Categories: Uncategorized
Tags:

The Decomposition Tree is a Power BI visual that allows you to break down data hierarchically to analyze it in detail. It’s useful for root cause analysis or drilling down into specific dimensions of data. Let’s explore how to use the decomposition tree with the Titanic dataset as an example.

Get the Dataset here : https://github.com/mwaskom/seaborn-data/blob/master/titanic.csv


Step-by-Step Explanation: Decomposition Tree on Titanic Dataset

1. Prepare the Titanic Dataset

The Titanic dataset typically includes the following columns:

  • PassengerId: Unique identifier for each passenger.
  • Survived: Whether the passenger survived (1 = Yes, 0 = No).
  • Pclass: Passenger class (1st, 2nd, or 3rd class).
  • Sex: Gender of the passenger.
  • Age: Age of the passenger.
  • SibSp: Number of siblings/spouses aboard.
  • Parch: Number of parents/children aboard.
  • Fare: Ticket fare.
  • Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).

2. Add the Titanic Dataset to Power BI

  • Open Power BI Desktop.
  • Load the Titanic dataset (CSV or Excel file) using the Get Data option.
  • Ensure the data is clean and has no missing or invalid values (use Power Query if needed).

3. Create the Decomposition Tree

  • Add the Decomposition Tree visual to the Power BI canvas (available in the Visualization pane).
  • Define the Analyzed Field (the metric you want to break down):
    • Drag Survived into the Analyzed Field. This will be the primary measure.
    • Set it to Count (to count the number of passengers who survived/didn’t survive).
  • Define Explain By fields (dimensions to break down the data):
    • Add fields like Pclass, Sex, Age, Embarked, Fare, and SibSp.

4. Analyze Survival Rates

Now you can use the decomposition tree to analyze the survival rates:

  • Root Node:
    • The root node will display the total count of passengers, split into Survived = 1 (survived) and Survived = 0 (did not survive).
  • Drill Down:
    • Expand by Pclass to see survival rates across passenger classes (1st, 2nd, and 3rd).
    • Drill down further by Sex to analyze survival by gender within each class.
    • Drill down by Embarked to explore survival rates by port of embarkation.
    • Use Age to break down survival rates by age groups (e.g., children vs. adults).

5. Insights from the Decomposition Tree

The decomposition tree allows you to visually uncover patterns in the Titanic dataset. For example:

  • By Class:
    • Higher survival rates in 1st class, lower in 3rd class.
  • By Gender:
    • Females have a significantly higher survival rate compared to males.
  • By Age:
    • Children (age < 12) often have higher survival rates due to evacuation priorities.
  • By Embarkation Point:
    • Survival rates may vary based on the port of embarkation (e.g., higher for passengers boarding at Cherbourg).

6. Dynamic Analysis with AI Splits

The decomposition tree has built-in AI capabilities:

  • Click on + Add AI Split to let Power BI automatically determine the most significant factor contributing to survival rates.
  • This feature helps identify relationships that may not be obvious (e.g., a combination of Pclass and Fare could be critical).

Example Walkthrough

Suppose the tree starts with 891 passengers:

  1. Split by Survived:
    • 342 survived, 549 did not.
  2. Drill down by Pclass:
    • In 1st class: 136 survived, 80 did not.
    • In 3rd class: 119 survived, 372 did not.
  3. Further drill by Sex within 1st class:
    • Females: 91 survived, 3 did not.
    • Males: 45 survived, 77 did not.

Benefits of Using Decomposition Tree

  1. Dynamic Drilldowns: Easily explore multiple dimensions of the data without predefined hierarchies.
  2. Root Cause Analysis: Quickly identify which factors contribute most to survival or other metrics.
  3. AI-Powered Insights: Automatically discover key influencers in the dataset.

In Power BI, whether you use Count or Sum depends on how the Survived column is structured in your dataset:


Scenario 1: Survived is Binary (0 or 1)

If the Survived column contains 0 (did not survive) and 1 (survived):

  • Sum: This is the correct choice.
    • The sum of the Survived column will give the total number of passengers who survived (since 1 represents survival).
    • For example, if there are 342 passengers who survived, the sum will be 342.
  • Count: Would return the total number of rows, not distinguishing between survival statuses.

Scenario 2: Using Count for Survival Rates

If you’re splitting passengers based on Survived = 0 and Survived = 1:

  • You can count rows grouped by the Survived value to determine how many passengers fall into each category:
    • Count of rows where Survived = 1 gives the number of survivors.
    • Count of rows where Survived = 0 gives the number of non-survivors.

Best Practice for Titanic Dataset

  1. If you want the total survivors:
    • Use Sum on the Survived column.
  2. If you want to show a breakdown of survival counts (survived vs. not survived):
    • Use Count of rows, grouped by Survived.

Decomposition Tree Setup Recap:

  • Analyzed Field: Use Sum(Survived) to show total survivors.
  • Explain By: Add fields like Pclass, Sex, Age, etc., to break down survival trends.