The box plot of Survived vs Fare in the Titanic dataset visualizes the distribution of fares for passengers who survived (1) versus those who did not survive (0). Here’s an explanation of what this plot typically reveals:
Key Observations:
- Fare Range for Survivors vs. Non-Survivors:
- Survivors tended to pay higher fares on average than non-survivors. Passengers who paid higher fares were more likely to be in first-class cabins, which had better access to lifeboats.
- Non-survivors generally paid lower fares, indicating they were more likely to be in second or third class.
- Median Fare:
- The median fare (line inside the box) for survivors is noticeably higher than that for non-survivors.
- This points to a disparity in survival chances linked to socioeconomic status.
- Spread of Fares:
- Survivors show a larger spread of fares, including both high and low values, which indicates survival wasn’t exclusive to first-class passengers. However, most outliers (extremely high fares) belong to survivors.
- Non-survivors have a narrower distribution with fewer high fares, suggesting most were from lower-paying ticket classes.
- Outliers:
- There are outliers in the Survived = 1 group, indicating some passengers with very high fares (likely luxury travelers) survived.
- Fewer outliers are present for non-survivors, as most passengers in lower classes didn’t pay extremely high fares.
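The observations above can be checked numerically. The following is a minimal sketch, assuming a pandas DataFrame with the same `Survived` and `Fare` columns; the helper name `fare_summary` and the tiny `toy` frame are hypothetical, and the toy values are made up for illustration rather than taken from the Titanic data. It computes the quantities a box plot draws for each group: the median, the quartiles, and the number of points beyond 1.5 × IQR from the box (the default whisker rule).

```python
import pandas as pd


def fare_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-group numbers behind a box plot of Fare grouped by Survived."""
    rows = {}
    for survived, fares in df.groupby("Survived")["Fare"]:
        q1, median, q3 = fares.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        # Default whisker rule: points more than 1.5 * IQR beyond the
        # quartiles are drawn as individual outliers.
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        rows[survived] = {
            "median": median,
            "q1": q1,
            "q3": q3,
            "iqr": iqr,
            "n_outliers": int(((fares < lower) | (fares > upper)).sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("Survived")


# Illustrative values only (NOT real passenger records).
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "Fare": [7.0, 8.0, 8.0, 10.0, 13.0, 10.0, 26.0, 30.0, 52.0, 500.0],
})
print(fare_summary(toy))
```

Calling `fare_summary(df)` on the real data frame should show whether the median, spread, and outlier counts actually differ between the two groups in the way the plot suggests.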
Insights:
- Socioeconomic Advantage: Fare is a proxy for social class, and first-class passengers had a clear survival advantage due to their proximity to lifeboats and social norms at the time.
- Survival Opportunities: Although paying a high fare generally increased survival chances, some lower-fare passengers also survived, likely due to factors like age, sex, and location during the sinking.
- Importance in Modeling: The relationship between Fare and Survived is significant and should be considered in predictive models. However, it’s not the sole determinant of survival.
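One quick way to gauge how useful Fare might be as a model feature is to look at the survival rate within fare bands. This is only a sketch under assumptions: the function name `survival_rate_by_fare_band` is hypothetical, the bands are quantile bins from `pd.qcut`, and the `toy` frame holds made-up values rather than real passenger records.

```python
import pandas as pd


def survival_rate_by_fare_band(df: pd.DataFrame, bands: int = 4) -> pd.Series:
    """Share of survivors within roughly equal-sized fare bands."""
    # duplicates="drop" merges bins when many passengers share the same fare.
    fare_band = pd.qcut(df["Fare"], q=bands, duplicates="drop")
    # Survived is 0/1, so its mean within a band is the survival rate.
    return df.groupby(fare_band, observed=True)["Survived"].mean()


# Illustrative values only (NOT real passenger records).
toy = pd.DataFrame({
    "Fare": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Survived": [0, 0, 0, 1, 0, 1, 1, 1],
})
print(survival_rate_by_fare_band(toy, bands=2))
```

If the rate rises steadily from the lowest band to the highest on the real data, that supports keeping Fare as a feature. A flat or erratic pattern would suggest that other variables such as age, sex, or class carry most of the signal.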
Example Code to Generate the Box Plot:
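import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the data; 'train.csv' is an assumed path to the Kaggle Titanic training file
df = pd.read_csv('train.csv')
# Box plot of Fare grouped by Survived (0 = did not survive, 1 = survived)
# Note: seaborn 0.13+ emits a FutureWarning when palette is passed without hue
sns.boxplot(x='Survived', y='Fare', data=df, palette='pastel')
plt.title('Fare vs Survival')
plt.xlabel('Survived')
plt.ylabel('Fare')
plt.show()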
This plot provides visual evidence that fare (and indirectly class) played a crucial role in survival on the Titanic.