This dataset focuses on factors influencing the likelihood of heart attacks in young adults (ages 18–35) in India. It encompasses demographic, lifestyle, medical, and clinical data, offering a comprehensive resource for analyzing connections between these factors and heart attack risk in this age group.
The dataset is available on Kaggle: https://www.kaggle.com/datasets/ankushpanday1/heart-attack-in-youth-of-india
License: MIT
We are exploring this dataset for educational research purposes.
Key Features
Demographics
- Age
- Gender
- Region (state or locality)
- Urban/Rural residence
- Socioeconomic Status (SES)
Lifestyle Factors
- Smoking and alcohol consumption
- Dietary preferences (vegetarian/non-vegetarian)
- Physical activity levels
- Screen time
- Sleep duration
Medical History
- Family history of heart disease
- Diabetes and hypertension history
- Cholesterol levels
- Body Mass Index (BMI)
- Stress levels
Clinical and Test Results
- Blood pressure (systolic and diastolic)
- Resting heart rate
- Electrocardiogram (ECG) results
- Chest pain type
- Maximum heart rate during exercise
- Exercise-induced angina
- Blood oxygen levels (SpO₂)
- Triglyceride levels
Target Variable
- Heart Attack Likelihood: Yes/No
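The sketch below shows one way to organize these feature groups in Python when loading the CSV, and to encode the Yes/No target as 1/0. The names in `FEATURE_GROUPS` and `TARGET` are hypothetical placeholders that mirror the list above. They are not the dataset's actual headers, so compare them against the real header row (for example `csv.DictReader(...).fieldnames`) before reusing the mapping.

```python
import csv
import io

# Hypothetical column names mirroring the feature list above; the real CSV
# headers may differ, so compare them against reader.fieldnames first.
FEATURE_GROUPS = {
    "demographics": ["Age", "Gender", "Region", "Urban_Rural", "SES"],
    "lifestyle": ["Smoking", "Alcohol", "Diet", "Physical_Activity",
                  "Screen_Time", "Sleep_Duration"],
    "medical_history": ["Family_History", "Diabetes", "Hypertension",
                        "Cholesterol", "BMI", "Stress_Level"],
    "clinical": ["Systolic_BP", "Diastolic_BP", "Resting_HR", "ECG_Result",
                 "Chest_Pain_Type", "Max_HR", "Exercise_Angina", "SpO2",
                 "Triglycerides"],
}
TARGET = "Heart_Attack_Likelihood"  # hypothetical name for the Yes/No target


def missing_columns(fieldnames: list[str]) -> list[str]:
    """Return expected (hypothetical) columns that are absent from a CSV header."""
    expected = [col for cols in FEATURE_GROUPS.values() for col in cols]
    expected.append(TARGET)
    return [col for col in expected if col not in fieldnames]


def encode_target(value: str) -> int:
    """Map the Yes/No target to 1/0 and reject any other value."""
    normalized = value.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"unexpected target value: {value!r}")


def load_rows(text: str) -> tuple[list[dict[str, str]], list[int]]:
    """Split CSV text into raw feature dicts (values left as strings) and 0/1 labels."""
    reader = csv.DictReader(io.StringIO(text))
    features, labels = [], []
    for row in reader:
        labels.append(encode_target(row.pop(TARGET)))
        features.append(row)
    return features, labels
```

For example, `load_rows("Age,Gender,Heart_Attack_Likelihood\n25,Male,No\n")` returns `[{"Age": "25", "Gender": "Male"}]` and `[0]`. Feature values stay as strings, so numeric conversion and categorical encoding remain separate steps.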
Potential Use Cases
1. Machine Learning for Risk Prediction
- Train classification models (e.g., logistic regression, random forests, neural networks) to predict heart attack likelihood based on features.
2. Public Health Insights
- Identify high-risk groups, such as sedentary individuals with a family history of heart disease, to guide targeted interventions.
3. Lifestyle Recommendations
- Analyze the effects of modifiable factors like diet, exercise, and sleep on heart attack risk.
4. Regional and Socioeconomic Analysis
- Study disparities in heart attack risk based on geographic or socioeconomic differences within India.
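The classification models named above are usually trained with a library such as scikit-learn (`sklearn.linear_model.LogisticRegression`). The sketch below shows what logistic regression actually fits by training one with plain-Python batch gradient descent on standardized features. It uses synthetic data. The two features (BMI and weekly activity hours) and the coefficients that generate the labels are assumptions for illustration, not properties of the Kaggle dataset.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Scale each column to zero mean and unit variance; returns (rows, means, stds)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    rows = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    return rows, means, stds


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Minimize mean log-loss by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (predicted - actual).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, row):
    """Predicted probability of the positive (Yes) class for one standardized row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


def make_synthetic(n=500, seed=42):
    """Hypothetical toy data: [BMI, weekly activity hours] with an assumed risk model."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        bmi = rng.gauss(25, 4)
        activity = max(0.0, rng.gauss(3, 1.5))
        logit = 0.4 * (bmi - 25) - 0.8 * (activity - 3)  # assumed, not estimated
        X.append([bmi, activity])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    Xs, means, stds = standardize(X)
    w, b = fit_logistic(Xs, y)
    acc = sum((predict_proba(w, b, r) >= 0.5) == bool(t) for r, t in zip(Xs, y)) / len(y)
    print("weights (BMI, activity):", w, "training accuracy:", round(acc, 3))
```

In practice a library estimator would replace `fit_logistic`. Standardization still matters for linear models, because columns such as cholesterol and SpO₂ sit on very different scales. Tree ensembles like random forests do not need it.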
Insights to Extract
1. Feature Interactions
- Examine interactions and non-linear relationships, such as whether physical activity mitigates the effect of high BMI on heart attack risk.
- Example: Individuals with high BMI who lead active lifestyles may have a lower risk than their sedentary counterparts.
2. Clustering
- Use clustering techniques (e.g., K-Means, DBSCAN) to group individuals with similar risk profiles for tailored interventions.
3. Latent Factors
- Apply dimensionality reduction techniques like PCA to uncover hidden health patterns influencing heart attack risk.
4. Causal Relationships
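- Use causal inference methods to assess whether specific factors (e.g., stress) directly influence others (e.g., cholesterol levels). Because the data is observational, such conclusions depend on explicit assumptions about confounding.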
5. Regional Variation
- Conduct geospatial analysis to study how environmental and cultural factors across regions impact lifestyle, health indicators, and heart attack risks.
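The clustering idea above can be sketched with plain-Python K-Means (Lloyd's algorithm). This sketch assumes the numeric features have already been standardized. K-Means uses Euclidean distance, so unscaled columns with large values, such as cholesterol, would otherwise dominate. Seeding here is deterministic farthest-first. The k-means++ initialization used by scikit-learn's `KMeans` is a randomized refinement that is less sensitive to outliers.

```python
import random


def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def farthest_first_init(points, k, seed=0):
    """Seed with one random point, then repeatedly add the point farthest from all chosen centroids."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels) with labels[i] in range(k)."""
    centroids = farthest_first_init(points, k, seed)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

The resulting labels can then be cross-tabulated against the Yes/No target to see which clusters carry a higher share of positive cases.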
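PCA finds the directions of greatest variance in the standardized feature matrix, which are the top eigenvectors of its covariance matrix. The plain-Python sketch below recovers only the first principal component, using power iteration. A full analysis would normally call a routine such as `numpy.linalg.eigh` or scikit-learn's `PCA`. The toy data used to exercise it is a hypothetical assumption: two columns that move together, like systolic and diastolic pressure, plus one unrelated column.

```python
import math
import random


def covariance_matrix(X):
    """Sample covariance matrix of the columns of X (a list of equal-length rows)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=200, seed=0):
    """Power iteration: (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start avoids an orthogonal guess
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue


def explained_variance_ratio(cov, eigenvalue):
    """Share of total variance (trace of the covariance matrix) captured by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Large absolute loadings in the returned vector show which features move together. Further components can be obtained by deflating the covariance matrix and repeating, though a library eigensolver is the more practical route.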
Example Insights to Explore
- Stress and Sleep Interaction: Do high-stress individuals with inadequate sleep show significantly higher heart attack risk?
- Exercise-Induced Angina: How does exercise-induced angina correlate with abnormal ECG results and high cholesterol?
- Gender-Specific Risk Factors: Are there differences in risk factor importance, such as smoking or BMI, between men and women?
- Dietary Impact: What is the differential impact of vegetarian vs. non-vegetarian diets on triglycerides and heart attack risk?
- Rare Patterns: Identify rare combinations, such as individuals with low SpO₂ and normal BMI whose heart attack likelihood is nonetheless labeled Yes.
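For questions like the stress and sleep interaction, a reasonable first step is to compare the share of Yes labels in each stress/sleep cell. The sketch below does this in plain Python. The record layout `(stress_score, sleep_hours, label)` and the cut-offs `stress_cut=7` and `sleep_cut=6.0` are hypothetical choices, not definitions taken from the dataset. It also computes the interaction contrast r11 - r10 - r01 + r00 on the risk-difference scale. A positive value means the two factors together raise the Yes rate by more than the sum of their separate increases.

```python
from collections import defaultdict


def stratified_rates(records, stress_cut=7, sleep_cut=6.0):
    """Share of positive labels per (high_stress, short_sleep) cell.

    records: iterable of (stress_score, sleep_hours, label), label in {0, 1}.
    The cut-offs are hypothetical illustration values, not clinical thresholds.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [positives, total]
    for stress, sleep, label in records:
        cell = (stress >= stress_cut, sleep < sleep_cut)
        counts[cell][0] += label
        counts[cell][1] += 1
    return {cell: pos / total for cell, (pos, total) in counts.items()}


def interaction_contrast(rates):
    """r11 - r10 - r01 + r00 on the risk-difference scale (needs all four cells).

    Positive: high stress plus short sleep raises the rate by more than the
    sum of each factor's separate increase over the (low, adequate) baseline.
    """
    r11 = rates[(True, True)]    # high stress, short sleep
    r10 = rates[(True, False)]   # high stress, adequate sleep
    r01 = rates[(False, True)]   # low stress, short sleep
    r00 = rates[(False, False)]  # low stress, adequate sleep
    return r11 - r10 - r01 + r00
```

Small cells make these rates noisy, so confidence intervals or a logistic model with a stress × sleep interaction term are natural follow-ups.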
Starting Points for Analysis
- Correlation Analysis:
  - Assess relationships between individual features and heart attack likelihood.
- Predictive Modeling:
  - Build models like Random Forest or Gradient Boosting to predict outcomes and analyze feature importance.
- Trend Analysis:
  - If the dataset includes time-series data, investigate trends in key health metrics (e.g., cholesterol, stress levels) over time.
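The correlation-analysis starting point can be sketched in plain Python. The sketch computes Pearson's r between each numeric feature and the 0/1 target, which is the point-biserial correlation, and ranks features by its absolute value. Column names passed in are up to the caller. This measures only linear, one-feature-at-a-time association, which is why model-based feature importance is a useful complement.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(columns, target):
    """Rank numeric columns by |r| against a 0/1 target, dropping constant columns.

    columns: dict mapping column name -> list of numbers aligned with target.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    ranked = [(name, r) for name, r in scores.items() if not math.isnan(r)]
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)
```

For example, with a hypothetical column, `rank_by_correlation({"BMI": [1, 2, 3, 4]}, [0, 0, 1, 1])` returns a single pair whose r is 2/√5, about 0.894.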
This dataset provides a foundation for deriving actionable insights and advancing research into young adult heart health.