Exploratory Data Analysis (EDA) is the process of examining datasets to summarize their main characteristics using statistics and visualizations.
It is a critical first step in any data-driven project: it lets analysts understand data structure, identify patterns, detect anomalies, and generate insights before applying advanced analytics or machine learning models. In this project, we performed an EDA on a Student Performance dataset using Python in Google Colab. The objective was to analyze how academic and lifestyle factors (study hours, attendance percentage, assignment completion, practice scores, sleep duration, and screen time) relate to overall student performance levels.
Using data analysis libraries like Pandas and NumPy, we explored the dataset’s structure, calculated descriptive statistics, and extracted key performance indicators (KPIs). To make the insights interactive, we used Plotly to create scatter plots, a box plot, a histogram, a correlation heatmap, and a pie chart. These visualizations let us observe relationships between variables and identify trends associated with academic success.
This project not only demonstrates technical implementation skills in data analytics but also highlights the importance of structured analysis in educational performance evaluation. The findings can assist institutions in understanding student behavior patterns and designing data-backed strategies to enhance academic outcomes.
Get the dataset here: https://colorstech.net/power-bi/student-performance-analysis-in-excel-step-by-step-guide-to-build-an-interactive-eda-dashboard/
🧱 Final Tech Stack Summary
Platform: Google Colab
Language: Python
Libraries: Pandas, NumPy, Plotly
Data Source: Excel Dataset
Output: Interactive Visual Analytics + KPI Insights
🐍 Python EDA Code
# Import Required Libraries
import pandas as pd # Used for data manipulation and analysis
import numpy as np # Used for numerical calculations
import plotly.express as px # Used for interactive visualizations
import plotly.graph_objects as go # Used for advanced Plotly charts
# Load Dataset
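df = pd.read_excel('/mnt/data/student_performance_iris_style_300.xlsx')
# Reads the Excel dataset into a pandas DataFrame
# Adjust the path to wherever the file lives in your session; files uploaded to Colab usually land under /content/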
# Display First 5 Rows
df.head()
# Shows first 5 records to understand dataset structure
📌 Dataset Column Description
- student_id → Unique ID for each student
- study_hours → Average daily study hours
- attendance_pct → Attendance percentage
- assignments_completed → Number of assignments completed
- practice_score → Practice test score
- sleep_hours → Average daily sleep hours
- screen_time → Daily screen time (hours)
- performance_level → Student performance category (Low/Medium/High)
# Dataset Info
df.info()
# Displays column types and non-null values
df.describe()
# Provides statistical summary of numeric columns
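Before computing KPIs, it can help to check for missing values, duplicate IDs, and out-of-range percentages. The following is a minimal sketch, assuming the column names described above. The toy rows are made up for illustration only, and `quality_report` is a hypothetical helper that is not part of the original notebook.

```python
import pandas as pd

# Toy rows for illustration only (made-up values, not from the real dataset)
toy = pd.DataFrame({
    "student_id": [1, 2, 2, 4],
    "study_hours": [2.5, 4.0, 4.0, None],
    "attendance_pct": [80.0, 95.0, 95.0, 104.0],
    "performance_level": ["Low", "High", "High", "Medium"],
})

def quality_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize basic data-quality problems."""
    return {
        # Number of missing values in each column
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows whose student_id repeats an earlier row
        "duplicate_ids": int(df["student_id"].duplicated().sum()),
        # Attendance values outside the valid 0 to 100 range
        "attendance_out_of_range": int((~df["attendance_pct"].between(0, 100)).sum()),
    }

report = quality_report(toy)
print(report)
```

On the real data, the same call would be `quality_report(df)`; any non-zero count is worth investigating before trusting the averages that follow.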
📈 5 Key Performance Indicators (KPIs)
# KPI 1: Total Students
total_students = df.shape[0]
# Counts total number of students
# KPI 2: Average Study Hours
avg_study_hours = df['study_hours'].mean()
# Calculates average study hours
# KPI 3: Average Attendance Percentage
avg_attendance = df['attendance_pct'].mean()
# Calculates average attendance
# KPI 4: Average Practice Score
avg_practice_score = df['practice_score'].mean()
# Calculates average practice test score
# KPI 5: Most Common Performance Level
top_performance = df['performance_level'].value_counts().idxmax()
# Finds most frequent performance category
print("Total Students:", total_students)
# Prints total students
print("Average Study Hours:", avg_study_hours)
# Prints average study hours
print("Average Attendance %:", avg_attendance)
# Prints average attendance
print("Average Practice Score:", avg_practice_score)
# Prints average practice score
print("Most Common Performance Level:", top_performance)
# Prints most common performance category
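The printed KPIs can also be collected into a single one-row table, which is easier to scan or export. This is a sketch under assumptions: the toy values are made up, and `kpi_table` and its column labels are hypothetical names chosen for illustration.

```python
import pandas as pd

# Toy rows for illustration only (made-up values, not from the real dataset)
toy = pd.DataFrame({
    "study_hours": [2.0, 4.0, 6.0],
    "attendance_pct": [70.0, 85.0, 95.0],
    "practice_score": [50.0, 65.0, 80.0],
    "performance_level": ["Low", "Medium", "Medium"],
})

def kpi_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the five KPIs as a one-row DataFrame."""
    return pd.DataFrame([{
        "Total Students": len(df),
        "Avg Study Hours": round(df["study_hours"].mean(), 2),
        "Avg Attendance %": round(df["attendance_pct"].mean(), 2),
        "Avg Practice Score": round(df["practice_score"].mean(), 2),
        "Most Common Level": df["performance_level"].value_counts().idxmax(),
    }])

kpis = kpi_table(toy)
print(kpis.to_string(index=False))
```

Rounding to two decimals is a presentation choice; the unrounded values remain available from the original columns.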
📊 6 Interactive Visualizations (Plotly)
1️⃣ Study Hours vs Practice Score
fig1 = px.scatter(df, x='study_hours', y='practice_score',
                  color='performance_level',
                  title='Study Hours vs Practice Score')
# Creates scatter plot showing relation between study hours and score
fig1.show()
# Displays scatter plot
2️⃣ Attendance vs Practice Score
fig2 = px.scatter(df, x='attendance_pct', y='practice_score',
                  color='performance_level',
                  title='Attendance vs Practice Score')
# Shows the relationship between attendance and practice score
fig2.show()
# Displays scatter plot
3️⃣ Box Plot – Study Hours by Performance Level
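fig3 = px.box(df, x='performance_level', y='study_hours',
              category_orders={'performance_level': ['Low', 'Medium', 'High']},
              title='Study Hours Distribution by Performance Level')
# Compares study hours across performance groups, shown in Low, Medium, High order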
fig3.show()
# Displays boxplot
4️⃣ Histogram – Practice Score Distribution
fig4 = px.histogram(df, x='practice_score',
                    color='performance_level',
                    barmode='overlay',
                    title='Practice Score Distribution')
# Shows distribution of practice scores
fig4.show()
# Displays histogram
5️⃣ Correlation Heatmap
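corr = df.drop(columns=['student_id'], errors='ignore').corr(numeric_only=True)
# Computes correlation matrix of numeric variables, excluding the ID column (an identifier, not a measurement)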
fig5 = px.imshow(corr, text_auto=True,
                 title='Correlation Heatmap')
# Creates heatmap to visualize feature correlations
fig5.show()
# Displays heatmap
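Because the correlation matrix covers only numeric columns, the categorical `performance_level` does not appear in the heatmap. The following is a minimal sketch of one way to include it, assuming the three levels can be treated as ordered (Low < Medium < High). The toy values and the `performance_rank` column name are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only (made-up values, not from the real dataset)
toy = pd.DataFrame({
    "practice_score": [40, 55, 60, 72, 85, 90],
    "screen_time": [7.0, 6.0, 5.5, 4.0, 3.0, 2.5],
    "performance_level": ["Low", "Low", "Medium", "Medium", "High", "High"],
})

# Assumed ordering of the categories
level_order = {"Low": 0, "Medium": 1, "High": 2}
encoded = toy.assign(performance_rank=toy["performance_level"].map(level_order))

# Spearman (rank-based) correlation suits an ordinal variable better than Pearson
numeric_cols = ["practice_score", "screen_time", "performance_rank"]
rank_corr = encoded[numeric_cols].corr(method="spearman")["performance_rank"]
print(rank_corr.drop("performance_rank"))
```

With the real data, `encoded[numeric_cols].corr(method="spearman")` could be passed to `px.imshow` in place of `corr` to show the performance level alongside the other variables.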
6️⃣ Performance Level Distribution
performance_count = df['performance_level'].value_counts().reset_index()
# Counts students in each performance category
performance_count.columns = ['performance_level', 'count']
# Renames columns for clarity
fig6 = px.pie(performance_count,
              names='performance_level',
              values='count',
              title='Performance Level Distribution')
# Creates pie chart showing performance distribution
fig6.show()
# Displays pie chart
✅ Final Conclusion
The Student Performance dataset provides valuable insights into the academic behavior and outcomes of students.
Key Findings:
• Study hours show a positive relationship with practice scores.
• Higher attendance percentage generally correlates with better performance levels.
• Practice score is strongly linked with overall performance category.
• Balanced sleep hours appear to contribute positively to performance.
• Excessive screen time may show inverse trends with academic performance.
The correlation heatmap indicates that:
✔ Study Hours
✔ Attendance
✔ Practice Score
stand out as the most influential academic factors, although correlation alone does not establish causation.
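The key findings above can be cross-checked numerically by comparing per-level averages. This is a rough sketch with made-up toy values, assuming the column names from the dataset description; it illustrates the technique and is not output from the real dataset.

```python
import pandas as pd

# Toy rows for illustration only (made-up values, not from the real dataset)
toy = pd.DataFrame({
    "study_hours": [1.5, 2.0, 3.5, 4.0, 5.5, 6.0],
    "attendance_pct": [60.0, 70.0, 80.0, 82.0, 92.0, 96.0],
    "sleep_hours": [5.0, 5.5, 7.0, 7.5, 7.0, 8.0],
    "screen_time": [7.0, 6.5, 5.0, 4.5, 3.0, 2.5],
    "performance_level": ["Low", "Low", "Medium", "Medium", "High", "High"],
})

# Make the category ordered so the groups print as Low, Medium, High
toy["performance_level"] = pd.Categorical(
    toy["performance_level"], categories=["Low", "Medium", "High"], ordered=True
)

# Mean of each factor within each performance level
factor_cols = ["study_hours", "attendance_pct", "sleep_hours", "screen_time"]
group_means = (
    toy.groupby("performance_level", observed=True)[factor_cols].mean().round(2)
)
print(group_means)
```

If a factor's mean rises or falls steadily from Low to High on the real data, that supports the corresponding finding; a flat or irregular pattern would call it into question.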
This EDA clearly shows that academic success is multi-dimensional — combining effort (study hours), consistency (attendance), practice, and lifestyle factors (sleep & screen time).
The dataset is well-structured and suitable for:
- Predictive Modeling
- Student Risk Identification
- Academic Performance Forecasting
- Institutional Analytics Dashboards
This Student Performance EDA project demonstrates how structured data analysis can turn raw educational data into actionable insights. By systematically exploring the dataset using Python in Google Colab, we uncovered relationships between academic habits, lifestyle patterns, and overall performance levels. The analysis reinforces the importance of combining statistical evaluation with interactive visualization to build a comprehensive understanding.
One of the most significant findings from the project is the positive relationship between study hours and practice scores. Students who consistently dedicate more time to focused study tend to achieve higher practice test results. This indicates that disciplined study behavior remains a fundamental driver of academic success. However, the analysis also suggests that study hours alone are not the only determining factor. Attendance percentage showed a noticeable correlation with practice performance, emphasizing the importance of classroom engagement and consistent participation.
Another valuable insight, visible in the scatter plots and histogram colored by performance level, is the strong link between practice scores and overall performance levels. (The correlation heatmap covers only numeric columns, so it does not include the categorical performance_level.) Practice assessments appear to act as a reliable indicator of the final performance classification (Low, Medium, High). This suggests that educational institutions could use practice test metrics as an early warning indicator to identify students who may need additional support.
Lifestyle factors also play an important role. Sleep hours showed patterns indicating that balanced rest contributes positively to performance. Students with extremely low sleep durations often demonstrated inconsistent scores, which aligns with established research on cognitive performance and rest cycles. Similarly, screen time analysis hinted that excessive non-academic screen exposure may negatively influence academic results. While moderate digital usage may not be harmful, excessive screen dependency can potentially reduce study effectiveness and focus levels.
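One way to examine the sleep pattern described above is to bin sleep duration and compare practice scores per band. The following is a rough sketch; the band edges of 6 and 8 hours are hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only (made-up values, not from the real dataset)
toy = pd.DataFrame({
    "sleep_hours": [4.5, 5.5, 6.5, 7.0, 7.5, 8.5, 9.0],
    "practice_score": [45, 70, 68, 75, 72, 66, 60],
})

# Hypothetical band edges; intervals are (0, 6], (6, 8], (8, 24]
bins = [0, 6, 8, 24]
labels = ["up to 6h", "6 to 8h", "over 8h"]
toy["sleep_band"] = pd.cut(toy["sleep_hours"], bins=bins, labels=labels)

# Count, mean, and spread of practice scores within each sleep band
summary = (
    toy.groupby("sleep_band", observed=True)["practice_score"]
    .agg(["count", "mean", "std"])
    .round(2)
)
print(summary)
```

A larger standard deviation in the lowest band would be consistent with the "inconsistent scores" observation; the same approach applies to `screen_time` with different band edges.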
The KPIs provided a quick executive summary of the dataset: total students, average study hours, average attendance percentage, average practice score, and the most common performance category. Presenting them in a compact horizontal table, rather than as separate printed lines, improves readability and supports decision-making, especially for academic administrators who need high-level summaries rather than raw data tables.
The use of Plotly significantly improved the quality of analysis. Unlike static charts, interactive visualizations allow users to hover, filter, zoom, and analyze specific clusters dynamically. For example, scatter plots clearly revealed performance clusters based on study behavior, while box plots highlighted distribution differences across performance levels. The correlation heatmap provided immediate clarity on which variables are strongly associated, enabling quicker identification of influential factors.
From a technical standpoint, this project exercises data loading, statistical summarization, KPI extraction, and interactive visualization. It uses Pandas for data manipulation and Plotly for interactive, dashboard-style visuals; NumPy is imported, but the calculations shown rely on Pandas. The structured approach followed in this analysis reflects real-world data analytics workflows used in business intelligence and educational data science.
Beyond technical insights, this project has strategic implications. Educational institutions can leverage such analyses to build early intervention systems. By monitoring attendance, study hours, and practice scores in real time, institutions can identify at-risk students before final assessments. Data-driven academic planning enables targeted mentoring programs, optimized study schedules, and performance improvement strategies.
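An early-intervention rule of the kind described above could start as simple threshold flags. The following is a sketch under stated assumptions: the cut-off values, the `min_flags` rule, and the `flag_at_risk` helper are all hypothetical, and the toy rows are made up. Real thresholds would need to be set by the institution.

```python
import pandas as pd

# Toy rows for illustration only (made-up values, not from the real dataset)
toy = pd.DataFrame({
    "student_id": [101, 102, 103, 104],
    "attendance_pct": [92.0, 68.0, 81.0, 59.0],
    "study_hours": [4.0, 1.5, 3.0, 1.0],
    "practice_score": [78, 52, 47, 40],
})

# Hypothetical cut-offs: a value below the threshold raises one flag
THRESHOLDS = {"attendance_pct": 75.0, "study_hours": 2.0, "practice_score": 50}

def flag_at_risk(df: pd.DataFrame, thresholds: dict, min_flags: int = 2) -> pd.DataFrame:
    """Hypothetical helper: mark students with at least `min_flags` low metrics."""
    below = pd.DataFrame({col: df[col] < cut for col, cut in thresholds.items()})
    out = df.copy()
    out["risk_flags"] = below.sum(axis=1)            # number of metrics below threshold
    out["at_risk"] = out["risk_flags"] >= min_flags  # True when enough flags are raised
    return out

flagged = flag_at_risk(toy, THRESHOLDS)
print(flagged.loc[flagged["at_risk"], ["student_id", "risk_flags"]])
```

Requiring two or more flags is one design choice that reduces false alarms from a single bad metric; a stricter or looser rule only needs a different `min_flags`.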
Furthermore, this dataset can serve as a foundation for predictive modeling. Since performance levels are associated with measurable variables, the next logical step would be to build classification models using machine learning techniques. Logistic regression, decision trees, or ensemble models are reasonable candidates, and their accuracy should be measured on held-out data before relying on them. Such predictive systems could shift academic support frameworks from reactive to proactive.
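Before fitting any classifier, it helps to know the accuracy a trivial model achieves by always predicting the most common level, since any real model should beat that number. This is a minimal sketch; the labels are made up and `majority_baseline_accuracy` is a hypothetical helper.

```python
import pandas as pd

# Toy labels for illustration only (made-up values, not from the real dataset)
labels = pd.Series(
    ["Low", "Medium", "Medium", "High", "Medium", "Low", "Medium", "High"],
    name="performance_level",
)

def majority_baseline_accuracy(y: pd.Series) -> tuple:
    """Hypothetical helper: return the most common class and its share of the data."""
    shares = y.value_counts(normalize=True)  # fraction of rows per class
    return shares.idxmax(), float(shares.max())

majority_class, baseline_acc = majority_baseline_accuracy(labels)
print(f"Always predicting '{majority_class}' is right {baseline_acc:.0%} of the time")
```

On the real data, the equivalent call would be `majority_baseline_accuracy(df['performance_level'])`.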
In conclusion, this Student Performance EDA project connects technical analytics skills with practical educational insights. It shows that structured data exploration is not just about numbers and charts; it is about understanding behavior patterns, identifying improvement opportunities, and enabling informed decision-making. The combination of statistical summaries, KPI summaries, and interactive Plotly visualizations creates an analytical foundation that can be expanded into predictive systems.
Ultimately, this project demonstrates how data analytics can empower educational institutions to improve student outcomes, optimize resource allocation, and create more personalized learning environments. By turning raw data into meaningful intelligence, we move one step closer to building smarter, data-driven education systems.
