Dataset Overview
- Total Rows: 300 employees
- Total Columns: 18 variables
- Sheet Name: Sheet1
Get the Dataset here: https://colorstech.net/power-bi/human-resources-hr-employee-performance-dataset/
Columns in the Dataset
- Employee ID – Unique identifier for each employee
- Employee Name – Name of the employee
- Department – Department where the employee works
- Job Role – Specific role or designation
- Joining Date – Date the employee joined the company
- Experience (Years) – Total years of professional experience
- Age – Employee age
- Gender – Gender of the employee
- Work Location – Office / Remote / Hybrid location
- Working Hours (Per Week) – Average weekly working hours
- Performance Rating – Employee performance score
- Salary ($) – Employee salary
- Bonus ($) – Bonus received by the employee
- Training Hours (Per Year) – Training hours completed annually
- Job Satisfaction Score – Employee satisfaction rating
- Leave Days Taken – Number of leave days used
- Remote Work (%) – Percentage of remote work
- Promotion Status – Whether the employee was promoted (Yes/No)
This Dataset is Good For EDA Topics Like
This dataset can support analyses such as:
- Employee performance analysis
- Promotion prediction insights
- Salary vs performance correlation
- Department-wise productivity
- Training impact on performance
- Remote work vs productivity
- Job satisfaction vs retention indicators
Possible KPIs You Can Create
Examples:
- Average Performance Rating
- Promotion Rate
- Average Salary by Department
- Training Hours vs Performance
- Leave Days vs Job Satisfaction
- Remote Work % vs Performance
Python EDA Code – HR Employee Performance Dataset
# Import necessary libraries
import pandas as pd # pandas is used for data manipulation and analysis
import numpy as np # numpy is used for numerical operations
import matplotlib.pyplot as plt # matplotlib is used for plotting graphs
import seaborn as sns # seaborn is used for advanced statistical visualizations
# Load the dataset
df = pd.read_excel("HR_Employee_Performance_Dataset.xlsx") # reads the HR dataset from Excel file into a dataframe
# Display first 5 rows of the dataset
df.head() # shows the first five rows to understand the dataset structure
Dataset Column Description
| Column | Description |
|---|---|
| Employee ID | Unique identifier for each employee |
| Employee Name | Name of employee |
| Department | Department of the employee |
| Job Role | Employee’s role in organization |
| Joining Date | Date employee joined company |
| Experience (Years) | Total years of experience |
| Age | Age of employee |
| Gender | Gender of employee |
| Work Location | Work type (Office/Remote/Hybrid) |
| Working Hours (Per Week) | Average weekly working hours |
| Performance Rating | Employee performance evaluation score |
| Salary ($) | Annual salary |
| Bonus ($) | Annual bonus |
| Training Hours (Per Year) | Hours of training completed |
| Job Satisfaction Score | Satisfaction rating |
| Leave Days Taken | Number of leave days |
| Remote Work (%) | Percentage of remote work |
| Promotion Status | Promotion indicator (Yes/No) |
Basic Dataset Exploration
df.shape # returns number of rows and columns in the dataset
df.info() # shows data types and non-null values of each column
df.describe() # provides statistical summary of numeric columns
df.isnull().sum() # checks for missing values in each column
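The raw output of `df.isnull().sum()` lists every column, including those with no gaps. As a minimal sketch, the check can be condensed into a report that keeps only the affected columns and adds a percentage. The helper name `missing_report` and the sample rows are made up for illustration and are not part of the dataset.

```python
import pandas as pd

def missing_report(frame):
    """Summarize only the columns that contain missing values (hypothetical helper)."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,                         # absolute number of missing cells
        "missing_pct": counts / len(frame) * 100,  # share of rows that are missing
    })
    # Keep affected columns only, worst first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Made-up rows for illustration only; not taken from the real dataset
missing_sample = pd.DataFrame({
    "Department": ["IT", "HR", "Sales", "IT"],
    "Age": [29, None, 41, 35],
    "Salary ($)": [52000, None, None, 61000],
})
missing_demo = missing_report(missing_sample)
print(missing_demo)
```

On the real data the call would simply be `missing_report(df)`.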
Data Cleaning
df.drop_duplicates(inplace=True) # removes duplicate rows from the dataset
df['Joining Date'] = pd.to_datetime(df['Joining Date']) # converts joining date column to datetime format
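Once the joining date is a datetime, it can be turned into a tenure figure for later analysis. The sketch below is one possible way to do it, assuming the column is named `Joining Date` as in this dataset. The helper `add_tenure_years`, the new column name `Tenure (Years)`, and the sample rows are hypothetical.

```python
import pandas as pd

def add_tenure_years(frame, date_col="Joining Date", as_of=None):
    """Return a copy of frame with a 'Tenure (Years)' column (hypothetical helper)."""
    out = frame.copy()
    # errors="coerce" turns unparseable dates into NaT instead of raising an error
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    # Use a fixed reference date if given, otherwise today's date at midnight
    as_of = pd.Timestamp(as_of) if as_of is not None else pd.Timestamp.today().normalize()
    # Approximate years as days / 365.25; rows with NaT dates become NaN
    out["Tenure (Years)"] = (as_of - out[date_col]).dt.days / 365.25
    return out

# Made-up rows for illustration only
tenure_sample = pd.DataFrame({
    "Employee ID": [1, 2, 3],
    "Joining Date": ["2020-01-01", "2022-01-01", "not a date"],
})
tenure_demo = add_tenure_years(tenure_sample, as_of="2024-01-01")
print(tenure_demo)
```

Passing an explicit `as_of` date keeps the result reproducible. Without it the tenure changes every day the notebook is rerun.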
KPI Calculations
KPI 1 – Average Performance Rating
avg_performance = df['Performance Rating'].mean() # calculates the average performance rating of employees
print("Average Performance Rating:", avg_performance) # prints the average performance rating KPI
KPI 2 – Average Salary
avg_salary = df['Salary ($)'].mean() # calculates average employee salary
print("Average Salary:", avg_salary) # displays the average salary KPI
KPI 3 – Promotion Rate
promotion_rate = df['Promotion Status'].value_counts(normalize=True).get('Yes', 0) * 100 # calculates percentage of promoted employees; .get avoids a KeyError if no row is 'Yes'
print("Promotion Rate (%):", promotion_rate) # prints the promotion rate KPI
KPI 4 – Average Job Satisfaction Score
avg_satisfaction = df['Job Satisfaction Score'].mean() # calculates average employee satisfaction score
print("Average Job Satisfaction:", avg_satisfaction) # displays satisfaction KPI
KPI 5 – Average Training Hours
avg_training = df['Training Hours (Per Year)'].mean() # calculates average training hours per employee
print("Average Training Hours:", avg_training) # prints training hours KPI
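The KPIs above are company-wide single numbers, while the KPI list earlier also mentions department-level figures such as Average Salary by Department. A rough sketch of how these could be computed in one pass with `groupby` follows. The function name `department_kpis`, the output column names, and the sample rows are illustrative assumptions. Only the input column names come from the dataset.

```python
import pandas as pd

def department_kpis(frame):
    """Per-department headcount, averages, and promotion rate (hypothetical helper)."""
    # Boolean flag so the promotion rate is simply the mean of True/False values
    promoted = frame["Promotion Status"].eq("Yes")
    grouped = frame.assign(_promoted=promoted).groupby("Department")
    return grouped.agg(
        headcount=("Employee ID", "count"),
        avg_salary=("Salary ($)", "mean"),
        avg_performance=("Performance Rating", "mean"),
        promotion_rate_pct=("_promoted", lambda s: s.mean() * 100),
    )

# Made-up rows for illustration only
kpi_sample = pd.DataFrame({
    "Employee ID": [1, 2, 3, 4],
    "Department": ["IT", "IT", "HR", "HR"],
    "Salary ($)": [60000, 80000, 50000, 70000],
    "Performance Rating": [4, 5, 3, 3],
    "Promotion Status": ["Yes", "No", "No", "No"],
})
kpi_demo = department_kpis(kpi_sample)
print(kpi_demo)
```

On the real data, `department_kpis(df)` would return one row per department, which is easy to sort or export to a dashboard.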
Data Visualization (Seaborn)
Visualization 1 – Performance Rating Distribution
plt.figure(figsize=(8,5)) # sets the figure size for the plot
sns.histplot(df['Performance Rating'], kde=True) # plots distribution of performance ratings
plt.title("Employee Performance Rating Distribution") # sets title of the plot
plt.show() # displays the plot
Visualization 2 – Department vs Average Salary
plt.figure(figsize=(10,6)) # defines size of visualization
sns.barplot(x='Department', y='Salary ($)', data=df) # plots mean salary per department (barplot's default estimator is the mean) with error bars
plt.xticks(rotation=45) # rotates department labels for readability
plt.title("Average Salary by Department") # adds chart title
plt.show() # renders the visualization
Visualization 3 – Experience vs Performance Rating
plt.figure(figsize=(8,5)) # sets size of scatter plot
sns.scatterplot(x='Experience (Years)', y='Performance Rating', data=df) # plots relationship between experience and performance
plt.title("Experience vs Performance Rating") # adds plot title
plt.show() # displays the scatter plot
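A scatter plot of a rating against years of experience can be hard to read when many points overlap, so a single number for the strength of the relationship may help. The sketch below computes a Spearman rank correlation, chosen here as an assumption because ratings are often ordinal. It does this by ranking both columns and taking the Pearson correlation of the ranks, using pandas only. The helper name and sample rows are hypothetical.

```python
import pandas as pd

def spearman_rank_corr(frame, x_col, y_col):
    """Spearman correlation computed as Pearson correlation of ranks (hypothetical helper)."""
    pair = frame[[x_col, y_col]].dropna()  # use only rows where both values exist
    # rank() assigns average ranks to ties, which matches the usual Spearman definition
    return pair[x_col].rank().corr(pair[y_col].rank())

# Made-up rows for illustration only
exp_sample = pd.DataFrame({
    "Experience (Years)": [1, 3, 5, 10],
    "Performance Rating": [2, 3, 3, 5],
})
rho = spearman_rank_corr(exp_sample, "Experience (Years)", "Performance Rating")
print(round(rho, 3))
```

A value near +1 or -1 indicates a strong monotonic relationship, and a value near 0 indicates little or none. It does not say anything about cause and effect.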
Visualization 4 – Job Satisfaction by Department
plt.figure(figsize=(10,6)) # defines visualization size
sns.boxplot(x='Department', y='Job Satisfaction Score', data=df) # shows satisfaction distribution by department
plt.xticks(rotation=45) # rotates labels for clarity
plt.title("Job Satisfaction Across Departments") # adds visualization title
plt.show() # renders the chart
The box plot shows the distribution of job satisfaction scores for each department: Sales, Operations, HR, Marketing, Finance, and IT. Finance and IT appear to have somewhat higher median satisfaction than the other departments, while Sales and HR show slightly lower medians. Most departments have a wide spread of scores, so satisfaction varies considerably within each department, and some have low minimum values, indicating a few employees with very low satisfaction. Overall, the differences between departments are modest, with Finance and IT showing slightly higher satisfaction than the rest.
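Medians and spreads read off a box plot by eye can be checked numerically. As a minimal sketch, the grouped summary below reports the median, interquartile range, minimum, and maximum per department. The helper name `satisfaction_by_department` and the sample rows are made up for illustration.

```python
import pandas as pd

def satisfaction_by_department(frame):
    """Median, IQR, min and max of satisfaction per department (hypothetical helper)."""
    stats = frame.groupby("Department")["Job Satisfaction Score"].describe()
    stats["iqr"] = stats["75%"] - stats["25%"]  # height of the box in the box plot
    stats = stats.rename(columns={"50%": "median"})
    # Highest median first, mirroring how the plot is usually read
    return stats[["count", "median", "iqr", "min", "max"]].sort_values("median", ascending=False)

# Made-up rows for illustration only
sat_sample = pd.DataFrame({
    "Department": ["Finance", "Finance", "Finance", "Sales", "Sales", "Sales"],
    "Job Satisfaction Score": [7, 8, 9, 4, 5, 6],
})
sat_demo = satisfaction_by_department(sat_sample)
print(sat_demo)
```

Running `satisfaction_by_department(df)` on the real data gives the figures behind each box, which makes small differences between departments easier to judge.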
Visualization 5 – Remote Work vs Performance Rating
plt.figure(figsize=(8,5)) # sets plot size
sns.scatterplot(x='Remote Work (%)', y='Performance Rating', data=df) # plots relationship between remote work and performance
plt.title("Remote Work Percentage vs Performance Rating") # adds chart title
plt.show() # displays visualization
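When both axes take only a limited set of values, a scatter plot can hide how many employees sit at each point. One alternative, sketched below, is to group the remote work percentage into bands and compare the average rating per band. The band edges, the helper name `performance_by_remote_band`, and the sample rows are assumptions, as is the idea that `Remote Work (%)` is stored on a 0 to 100 scale.

```python
import pandas as pd

def performance_by_remote_band(frame, bins=(0, 25, 50, 75, 100)):
    """Count and mean performance rating per remote-work band (hypothetical helper)."""
    # include_lowest=True makes sure employees with exactly 0% fall in the first band
    bands = pd.cut(frame["Remote Work (%)"], bins=list(bins), include_lowest=True)
    # observed=True keeps only bands that actually contain employees
    return frame.groupby(bands, observed=True)["Performance Rating"].agg(["count", "mean"])

# Made-up rows for illustration only
remote_sample = pd.DataFrame({
    "Remote Work (%)": [0, 10, 40, 60, 100, 90],
    "Performance Rating": [3, 5, 4, 4, 2, 4],
})
remote_demo = performance_by_remote_band(remote_sample)
print(remote_demo)
```

The `count` column matters as much as the mean: a band with only a handful of employees should not be over-interpreted.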
Correlation Analysis
plt.figure(figsize=(10,8)) # sets figure size for correlation heatmap
sns.heatmap(df.corr(numeric_only=True), annot=True) # visualizes correlations between numerical variables
plt.title("Correlation Matrix of HR Dataset") # adds heatmap title
plt.show() # displays heatmap
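A full heatmap of 18 variables can be dense to read. If the main interest is one target such as the performance rating, the relevant column of the correlation matrix can be pulled out and sorted by strength. The following is a sketch under that assumption. The helper name `correlations_with` and the sample rows are hypothetical.

```python
import pandas as pd

def correlations_with(frame, target="Performance Rating"):
    """Pearson correlations of numeric columns with target, strongest first (hypothetical helper)."""
    # numeric_only=True skips text columns such as Department
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Made-up rows for illustration only
corr_sample = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Sales"],
    "Performance Rating": [1, 2, 3, 4],
    "Experience (Years)": [2, 4, 6, 8],
    "Leave Days Taken": [5, 1, 4, 2],
})
corr_demo = correlations_with(corr_sample)
print(corr_demo)
```

On the real data, identifier-like numeric columns such as `Employee ID` may also appear in the result and can be dropped before interpreting it.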
Final EDA Conclusion
Based on the exploratory analysis of the HR Employee Performance Dataset, several important insights can be observed. The average performance rating provides an overall indicator of employee productivity across the organization. Salary distribution across departments highlights how compensation varies depending on roles and responsibilities. Promotion rates indicate the proportion of employees who progress in their careers, reflecting internal growth opportunities.
The scatter plot also suggests that experience tends to be positively associated with performance ratings, although a scatter plot alone cannot show that experience causes higher ratings. Job satisfaction levels vary across departments, which could reflect differences in management practices, workload distribution, or workplace culture. Training hours and professional development programs may also contribute to employee performance and career advancement.
Additionally, the relationship between remote work percentage and performance gives insight into workplace flexibility and productivity. Some employees maintain strong performance even with high remote work percentages, which suggests that hybrid or remote arrangements can be compatible with strong performance.
Overall, this EDA helps HR managers and business leaders understand workforce performance trends, identify factors affecting employee satisfaction and productivity, and make data-driven decisions regarding promotions, training programs, and organizational policies.
