The Decomposition Tree is a Power BI visual that allows you to break down data hierarchically to analyze it in detail. It’s useful for root cause analysis or drilling down into specific dimensions of data. Let’s explore how to use the decomposition tree with the Titanic dataset as an example.
Get the Dataset here : https://github.com/mwaskom/seaborn-data/blob/master/titanic.csv
Step-by-Step Explanation: Decomposition Tree on Titanic Dataset
1. Prepare the Titanic Dataset
The Titanic dataset typically includes the following columns:
PassengerId
: Unique identifier for each passenger.Survived
: Whether the passenger survived (1 = Yes, 0 = No).Pclass
: Passenger class (1st, 2nd, or 3rd class).Sex
: Gender of the passenger.Age
: Age of the passenger.SibSp
: Number of siblings/spouses aboard.Parch
: Number of parents/children aboard.Fare
: Ticket fare.Embarked
: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
2. Add the Titanic Dataset to Power BI
- Open Power BI Desktop.
- Load the Titanic dataset (CSV or Excel file) using the Get Data option.
- Ensure the data is clean and has no missing or invalid values (use Power Query if needed).
3. Create the Decomposition Tree
- Add the Decomposition Tree visual to the Power BI canvas (available in the Visualization pane).
- Define the Analyzed Field (the metric you want to break down):
- Drag
Survived
into the Analyzed Field. This will be the primary measure. - Set it to Count (to count the number of passengers who survived/didn’t survive).
- Drag
- Define Explain By fields (dimensions to break down the data):
- Add fields like
Pclass
,Sex
,Age
,Embarked
,Fare
, andSibSp
.
- Add fields like
4. Analyze Survival Rates
Now you can use the decomposition tree to analyze the survival rates:
- Root Node:
- The root node will display the total count of passengers, split into
Survived = 1
(survived) andSurvived = 0
(did not survive).
- The root node will display the total count of passengers, split into
- Drill Down:
- Expand by
Pclass
to see survival rates across passenger classes (1st, 2nd, and 3rd). - Drill down further by
Sex
to analyze survival by gender within each class. - Drill down by
Embarked
to explore survival rates by port of embarkation. - Use
Age
to break down survival rates by age groups (e.g., children vs. adults).
- Expand by
5. Insights from the Decomposition Tree
The decomposition tree allows you to visually uncover patterns in the Titanic dataset. For example:
- By Class:
- Higher survival rates in 1st class, lower in 3rd class.
- By Gender:
- Females have a significantly higher survival rate compared to males.
- By Age:
- Children (age < 12) often have higher survival rates due to evacuation priorities.
- By Embarkation Point:
- Survival rates may vary based on the port of embarkation (e.g., higher for passengers boarding at Cherbourg).
6. Dynamic Analysis with AI Splits
The decomposition tree has built-in AI capabilities:
- Click on + Add AI Split to let Power BI automatically determine the most significant factor contributing to survival rates.
- This feature helps identify relationships that may not be obvious (e.g., a combination of
Pclass
andFare
could be critical).
Example Walkthrough
Suppose the tree starts with 891 passengers:
- Split by
Survived
:- 342 survived, 549 did not.
- Drill down by
Pclass
:- In 1st class: 136 survived, 80 did not.
- In 3rd class: 119 survived, 372 did not.
- Further drill by
Sex
within 1st class:- Females: 91 survived, 3 did not.
- Males: 45 survived, 77 did not.
Benefits of Using Decomposition Tree
- Dynamic Drilldowns: Easily explore multiple dimensions of the data without predefined hierarchies.
- Root Cause Analysis: Quickly identify which factors contribute most to survival or other metrics.
- AI-Powered Insights: Automatically discover key influencers in the dataset.
In Power BI, whether you use Count or Sum depends on how the Survived
column is structured in your dataset:
Scenario 1: Survived is Binary (0 or 1)
If the Survived
column contains 0 (did not survive) and 1 (survived):
- Sum: This is the correct choice.
- The sum of the
Survived
column will give the total number of passengers who survived (since 1 represents survival). - For example, if there are 342 passengers who survived, the sum will be
342
.
- The sum of the
- Count: Would return the total number of rows, not distinguishing between survival statuses.
Scenario 2: Using Count for Survival Rates
If you’re splitting passengers based on Survived = 0
and Survived = 1
:
- You can count rows grouped by the
Survived
value to determine how many passengers fall into each category:- Count of rows where
Survived = 1
gives the number of survivors. - Count of rows where
Survived = 0
gives the number of non-survivors.
- Count of rows where
Best Practice for Titanic Dataset
- If you want the total survivors:
- Use Sum on the
Survived
column.
- Use Sum on the
- If you want to show a breakdown of survival counts (survived vs. not survived):
- Use Count of rows, grouped by
Survived
.
- Use Count of rows, grouped by
Decomposition Tree Setup Recap:
- Analyzed Field: Use Sum(Survived) to show total survivors.
- Explain By: Add fields like
Pclass
,Sex
,Age
, etc., to break down survival trends.