The Employee Performance & Training Dataset is a synthetic dataset of 200 employee records, designed to help learners and professionals practice HR analytics and workforce insights. It includes numerical variables (age, years of experience, monthly salary, and training hours) and categorical variables (department, job level, education level, and performance rating). This mix makes it well suited to exploring how demographic and professional attributes relate to employee performance. The dataset can be used to study salary trends, training effectiveness, performance evaluation, and workforce segmentation, which makes it a useful resource for data analysis, visualization, and predictive modeling.
Title: Employee Performance & Training Dataset
- Rows: 200
- Columns: 9
  - id (Unique Identifier)
  - Numerical: age, years_experience, monthly_salary, training_hours
  - Categorical: department, job_level, education_level, performance_rating
🎯 Purpose of the Dataset
This dataset simulates employee profiles and performance records. It is designed for practicing:
- HR Analytics
- Performance evaluation modeling
- Workforce segmentation
- Training impact analysis
Get the dataset here: https://github.com/slidescope/Employee-Performance-Training-Data-Analysis-Power-BI-dashboard
❓ Example Questions to Solve
- Do training hours correlate with performance rating?
- Which department has the highest average salary?
- Do employees with higher education levels earn more?
- What is the average number of years of experience per job level?
- Are managers more likely to receive “Excellent” ratings?
- Which department invests most in employee training?
- Is there a relationship between age and performance rating?
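Several of these questions come down to a "group by one column, average another" calculation. The following is a minimal sketch of that idea in plain Python. The column names come from the dataset description, but the rows, department names, and job levels are made-up placeholders, and `average_by` is a hypothetical helper, not part of the dataset or any library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real dataset has 200 records and its
# actual department and job-level values may differ from these.
rows = [
    {"department": "Sales", "job_level": "Junior", "monthly_salary": 3000, "years_experience": 2},
    {"department": "Sales", "job_level": "Senior", "monthly_salary": 5000, "years_experience": 8},
    {"department": "IT", "job_level": "Junior", "monthly_salary": 4000, "years_experience": 3},
    {"department": "IT", "job_level": "Senior", "monthly_salary": 7000, "years_experience": 10},
    {"department": "HR", "job_level": "Junior", "monthly_salary": 3500, "years_experience": 1},
]


def average_by(records, group_col, value_col):
    """Return {group value: mean of value_col} for the given records."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_col]].append(record[value_col])
    return {key: mean(values) for key, values in groups.items()}


# Which department has the highest average salary?
salary_by_department = average_by(rows, "department", "monthly_salary")
top_department = max(salary_by_department, key=salary_by_department.get)

# What is the average number of years of experience per job level?
experience_by_level = average_by(rows, "job_level", "years_experience")

print(salary_by_department)
print(top_department)
print(experience_by_level)
```

The same pattern applies to the training question: grouping by `department` and averaging `training_hours` shows which department invests most in training.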
Notes for Students
📊 Difference Between Average (Mean) and Median
1. Average (Mean)
The average is the sum of all values divided by the total number of values:
\[ \text{Average (Mean)} = \frac{\text{Sum of all values}}{\text{Number of values}} \]
Example: Numbers = [10, 20, 30, 40, 100]
\[ \text{Average} = \frac{10 + 20 + 30 + 40 + 100}{5} = \frac{200}{5} = 40 \]
👉 The mean is sensitive to outliers (like 100 in this case).
2. Median
The median is the middle value when numbers are arranged in order. With an even count of numbers, it is the average of the two middle values.
Example: Numbers = [10, 20, 30, 40, 100]
Ordered → [10, 20, 30, 40, 100], middle value = 30 → Median = 30
👉 The median is largely unaffected by outliers: replacing 100 with 1,000 would still give a median of 30.
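The two worked examples above can be checked with Python's built-in `statistics` module. This short sketch also makes the outlier more extreme (the second list is an illustrative assumption, not part of the dataset) to show how the mean moves while the median stays put.

```python
from statistics import mean, median

values = [10, 20, 30, 40, 100]
values_mean = mean(values)      # (10 + 20 + 30 + 40 + 100) / 5
values_median = median(values)  # middle value of the sorted list

# Same list, but with the outlier 100 replaced by 1000
extreme = [10, 20, 30, 40, 1000]
extreme_mean = mean(extreme)
extreme_median = median(extreme)

print(values_mean, values_median)    # 40 30
print(extreme_mean, extreme_median)  # 220 30
```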
⚖️ Key Difference
- Average (Mean): Balance point of data, skewed by extreme values.
- Median: Central value, robust against outliers.
💡 Real-Life Example
If most employees earn around $40,000 but one earns $1,000,000:
- The average salary will look very high (skewed).
- The median salary gives a better picture of a “typical” employee.
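To put numbers on this, assume a hypothetical team of 10 people: 9 earn $40,000 and 1 earns $1,000,000 (the headcount is an assumption made only for illustration).

\[ \text{Average} = \frac{9 \times 40{,}000 + 1{,}000{,}000}{10} = \frac{1{,}360{,}000}{10} = 136{,}000 \]

With the salaries in order, the 5th and 6th values are both $40,000, so:

\[ \text{Median} = \frac{40{,}000 + 40{,}000}{2} = 40{,}000 \]

The average of $136,000 is more than three times what nine of the ten employees actually earn, while the median matches their salary exactly.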
