Categories: Python Pandas Tutorial

HR Employee Performance Dataset – EDA With Python Libraries

Dataset Overview

Total Rows: 300 employees
Total Columns: 18 variables
Sheet Name: Sheet1

Get the Dataset here: https://colorstech.net/power-bi/human-resources-hr-employee-performance-dataset/

Columns in the Dataset

Employee ID – Unique identifier for each employee
Employee Name – Name of the employee
Department – Department where the employee works
Job Role – Specific role or designation
Joining Date – Date the employee joined the company
Experience (Years) – Total years of professional experience
Age – Employee age
Gender – Gender of the employee
Work Location – Office / Remote / Hybrid location
Working Hours (Per Week) – Average weekly working hours
Performance Rating – Employee performance score
Salary ($) – Employee salary
Bonus ($) – Bonus received by the employee
Training Hours (Per Year) – Training hours completed annually
Job Satisfaction Score – Employee satisfaction rating
Leave Days Taken – Number of leave days used
Remote Work (%) – Percentage of remote work
Promotion Status – Whether the employee was promoted (Yes/No)

EDA Topics This Dataset Supports

This dataset can support analyses such as:

Employee performance analysis
Promotion prediction insights
Salary vs performance correlation
Department-wise productivity
Training impact on performance
Remote work vs productivity
Job satisfaction vs retention indicators

Possible KPIs You Can Create

Examples:

Average Performance Rating
Promotion Rate
Average Salary by Department
Training Hours vs Performance
Leave Days vs Job Satisfaction
Remote Work % vs Performance

Python EDA Code – HR Employee Performance Dataset

# Import necessary libraries
import pandas as pd  # pandas is used for data manipulation and analysis
import numpy as np  # numpy is used for numerical operations
import matplotlib.pyplot as plt  # matplotlib is used for plotting graphs
import seaborn as sns  # seaborn is used for advanced statistical visualizations

# Load the dataset
df = pd.read_excel("HR_Employee_Performance_Dataset.xlsx")  # reads the HR dataset from the Excel file into a DataFrame

# Display the first 5 rows of the dataset
df.head()  # shows the first five rows to understand the dataset structure
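
Before going further, it can help to confirm that the file's headers match the 18 column names listed in the overview. A stray space or a renamed header will break every later df['...'] lookup. The sketch below shows one way to check this. check_columns and EXPECTED_COLUMNS are hypothetical helper names, and the toy frame stands in for the real file, which is not bundled here.

```python
import pandas as pd

# Column names as listed in the dataset overview (18 columns)
EXPECTED_COLUMNS = [
    "Employee ID", "Employee Name", "Department", "Job Role", "Joining Date",
    "Experience (Years)", "Age", "Gender", "Work Location",
    "Working Hours (Per Week)", "Performance Rating", "Salary ($)", "Bonus ($)",
    "Training Hours (Per Year)", "Job Satisfaction Score", "Leave Days Taken",
    "Remote Work (%)", "Promotion Status",
]


def check_columns(df: pd.DataFrame) -> dict:
    """Report expected columns that are missing and unexpected extra columns."""
    # Compare on stripped header names, since Excel exports often carry stray spaces
    actual = [str(c).strip() for c in df.columns]
    missing = [c for c in EXPECTED_COLUMNS if c not in actual]
    extra = [c for c in actual if c not in EXPECTED_COLUMNS]
    return {"missing": missing, "extra": extra}


# Hypothetical toy frame with only two columns, one with a trailing space
toy = pd.DataFrame(columns=["Employee ID", "Salary ($) "])
report = check_columns(toy)
print(report["extra"])
print(len(report["missing"]))
```

Against the real file you would call check_columns(df) and expect both lists to be empty.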

Dataset Column Description

Column	Description
Employee ID	Unique identifier for each employee
Employee Name	Name of employee
Department	Department of the employee
Job Role	Employee’s role in organization
Joining Date	Date employee joined company
Experience (Years)	Total years of experience
Age	Age of employee
Gender	Gender of employee
Work Location	Work type (Office/Remote/Hybrid)
Working Hours (Per Week)	Average weekly working hours
Performance Rating	Employee performance evaluation score
Salary ($)	Annual salary
Bonus ($)	Annual bonus
Training Hours (Per Year)	Hours of training completed
Job Satisfaction Score	Satisfaction rating
Leave Days Taken	Number of leave days
Remote Work (%)	Percentage of remote work
Promotion Status	Promotion indicator (Yes/No)

Basic Dataset Exploration

df.shape  # returns number of rows and columns in the dataset

df.info()  # shows data types and non-null values of each column

df.describe()  # provides statistical summary of numeric columns

df.isnull().sum()  # checks for missing values in each column
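
df.isnull().sum() gives raw counts. Expressing them as a share of the 300 rows makes it easier to decide whether to drop or impute a column. The following is a minimal sketch that uses a hypothetical helper named missing_report and made-up toy values:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    percent = counts / len(df) * 100
    report = pd.DataFrame({"missing": counts, "percent": percent.round(1)})
    # Keep only columns that actually have gaps, sorted by missing count
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy data (not from the HR dataset) with a few gaps
toy = pd.DataFrame({
    "Age": [30, np.nan, 45, 28],
    "Bonus ($)": [np.nan, np.nan, 500, 700],
    "Department": ["IT", "HR", "Sales", "IT"],
})
r = missing_report(toy)
print(r)
```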

Data Cleaning

df.drop_duplicates(inplace=True)  # removes duplicate rows from the dataset

df['Joining Date'] = pd.to_datetime(df['Joining Date'])  # converts joining date column to datetime format
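
Once Joining Date is a datetime, it can be turned into a tenure figure. The Yes/No Promotion Status text can likewise be turned into a boolean, which makes it easier to aggregate. The dataset does not state a snapshot date, so the reference_date argument below is an assumption you supply. The add_derived_columns helper and the "Tenure (Years)" and "Promoted" column names are also hypothetical.

```python
import pandas as pd


def add_derived_columns(df: pd.DataFrame, reference_date: str) -> pd.DataFrame:
    """Return a copy with approximate tenure in years and a boolean promotion flag."""
    out = df.copy()
    joined = pd.to_datetime(out["Joining Date"])
    ref = pd.Timestamp(reference_date)
    # Approximate tenure: days between joining and the reference date / 365.25
    out["Tenure (Years)"] = ((ref - joined).dt.days / 365.25).round(2)
    # Normalise whitespace and case, then map "yes" to True and anything else to False
    out["Promoted"] = (
        out["Promotion Status"].astype(str).str.strip().str.lower().eq("yes")
    )
    return out


# Hypothetical toy rows
toy = pd.DataFrame({
    "Joining Date": ["2020-01-01", "2022-07-01"],
    "Promotion Status": ["Yes", " no"],
})
result = add_derived_columns(toy, "2024-01-01")
print(result)
```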

KPI Calculations

KPI 1 – Average Performance Rating

avg_performance = df['Performance Rating'].mean()  # calculates the average performance rating of employees
print("Average Performance Rating:", avg_performance)  # prints the average performance rating KPI

KPI 2 – Average Salary

avg_salary = df['Salary ($)'].mean()  # calculates average employee salary
print("Average Salary:", avg_salary)  # displays the average salary KPI

KPI 3 – Promotion Rate

promotion_rate = df['Promotion Status'].eq('Yes').mean() * 100  # percentage of employees whose status is "Yes" (returns 0 instead of raising a KeyError if nobody was promoted)
print("Promotion Rate (%):", promotion_rate)  # prints the promotion rate KPI

KPI 4 – Average Job Satisfaction Score

avg_satisfaction = df['Job Satisfaction Score'].mean()  # calculates average employee satisfaction score
print("Average Job Satisfaction:", avg_satisfaction)  # displays satisfaction KPI

KPI 5 – Average Training Hours

avg_training = df['Training Hours (Per Year)'].mean()  # calculates average training hours per employee
print("Average Training Hours:", avg_training)  # prints training hours KPI
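
The five KPIs above can also be gathered into one labelled table, which is easier to export or to compare across runs. This sketch uses a hypothetical kpi_summary helper and toy values. It computes the promotion rate as the mean of a Yes/No equality test, which returns 0 rather than raising a KeyError when no row says "Yes".

```python
import pandas as pd


def kpi_summary(df: pd.DataFrame) -> pd.Series:
    """Collect the five headline KPIs into one labelled Series."""
    return pd.Series({
        "Average Performance Rating": df["Performance Rating"].mean(),
        "Average Salary ($)": df["Salary ($)"].mean(),
        # Share of rows whose status is exactly "Yes", as a percentage
        "Promotion Rate (%)": df["Promotion Status"].eq("Yes").mean() * 100,
        "Average Job Satisfaction": df["Job Satisfaction Score"].mean(),
        "Average Training Hours": df["Training Hours (Per Year)"].mean(),
    }).round(2)


# Hypothetical toy data, not taken from the real dataset
toy = pd.DataFrame({
    "Performance Rating": [3, 4, 5, 4],
    "Salary ($)": [50000, 60000, 70000, 80000],
    "Promotion Status": ["Yes", "No", "No", "Yes"],
    "Job Satisfaction Score": [6, 7, 8, 9],
    "Training Hours (Per Year)": [10, 20, 30, 40],
})
kpis = kpi_summary(toy)
print(kpis)
```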

Data Visualization (Seaborn)

Visualization 1 – Performance Rating Distribution

plt.figure(figsize=(8,5))  # sets the figure size for the plot
sns.histplot(df['Performance Rating'], kde=True)  # plots distribution of performance ratings
plt.title("Employee Performance Rating Distribution")  # sets title of the plot
plt.show()  # displays the plot

Visualization 2 – Department vs Average Salary

plt.figure(figsize=(10,6))  # defines size of visualization
sns.barplot(x='Department', y='Salary ($)', data=df)  # creates bar chart of department vs salary
plt.xticks(rotation=45)  # rotates department labels for readability
plt.title("Average Salary by Department")  # adds chart title
plt.show()  # renders the visualization
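
sns.barplot draws the mean salary per department with error bars (a confidence interval by default), so the exact figures are not visible on the chart. A grouped table adds the median and headcount. These show whether a department's average is pulled up by a few high earners or rests on very few employees. The salary_by_department helper and the toy data are illustrative assumptions.

```python
import pandas as pd


def salary_by_department(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median and headcount of salary per department, highest mean first."""
    return (
        df.groupby("Department")["Salary ($)"]
        .agg(["mean", "median", "count"])
        .sort_values("mean", ascending=False)
    )


# Hypothetical toy data
toy = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "Sales", "Sales", "Sales"],
    "Salary ($)": [90000, 70000, 55000, 60000, 65000, 100000],
})
table = salary_by_department(toy)
print(table)
```

In the toy output, Sales has a mean of 75000 but a median of 65000, which is the kind of skew the bar chart alone would hide.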

Visualization 3 – Experience vs Performance Rating

plt.figure(figsize=(8,5))  # sets size of scatter plot
sns.scatterplot(x='Experience (Years)', y='Performance Rating', data=df)  # plots relationship between experience and performance
plt.title("Experience vs Performance Rating")  # adds plot title
plt.show()  # displays the scatter plot
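
A scatter plot shows the shape of a relationship but not its strength. Pearson correlation measures a linear trend. Spearman correlation (Pearson applied to ranks) measures any monotonic trend, which suits a rating scale. The sketch below computes both for two columns. The correlation_pair helper and the toy values are hypothetical, and neither coefficient says anything about cause and effect.

```python
import pandas as pd


def correlation_pair(df: pd.DataFrame, x: str, y: str) -> dict:
    """Pearson and Spearman correlation between two numeric columns."""
    pearson = df[x].corr(df[y])  # Series.corr defaults to Pearson
    # Spearman = Pearson correlation of the ranks (tied values get their average rank)
    spearman = df[x].rank().corr(df[y].rank())
    return {"pearson": pearson, "spearman": spearman}


# Hypothetical toy data
toy = pd.DataFrame({
    "Experience (Years)": [1, 2, 3, 4, 5],
    "Performance Rating": [2, 3, 3, 4, 5],
})
r = correlation_pair(toy, "Experience (Years)", "Performance Rating")
print(r)
```

On the real data, correlation_pair(df, 'Experience (Years)', 'Performance Rating') puts a number on the trend seen in the scatter plot.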

Visualization 4 – Job Satisfaction by Department

plt.figure(figsize=(10,6))  # defines visualization size
sns.boxplot(x='Department', y='Job Satisfaction Score', data=df)  # shows satisfaction distribution by department
plt.xticks(rotation=45)  # rotates labels for clarity
plt.title("Job Satisfaction Across Departments")  # adds visualization title
plt.show()  # renders the chart

The box plot shows the distribution of job satisfaction scores across the Sales, Operations, HR, Marketing, Finance, and IT departments. Finance and IT appear to have somewhat higher median satisfaction than the other departments, while Sales and HR show slightly lower medians. Most departments have a wide spread of scores, so satisfaction varies considerably among employees within the same department. Some departments also show low minimum values, indicating a few employees with very low satisfaction. Overall, the differences between departments are modest, but Finance and IT seem to maintain slightly stronger employee morale than the rest.
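
These observations come from reading the box plot by eye. To check them, you can print the medians, interquartile ranges and minimum scores behind each box directly. The following sketch uses a hypothetical satisfaction_by_department helper and toy scores rather than the real dataset:

```python
import pandas as pd


def satisfaction_by_department(df: pd.DataFrame) -> pd.DataFrame:
    """Median, interquartile range, minimum and headcount of satisfaction per department."""
    grouped = df.groupby("Department")["Job Satisfaction Score"]
    table = pd.DataFrame({
        "median": grouped.median(),
        # IQR = 75th minus 25th percentile, i.e. the height of the box
        "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
        "min": grouped.min(),
        "count": grouped.count(),
    })
    return table.sort_values("median", ascending=False)


# Hypothetical toy scores
toy = pd.DataFrame({
    "Department": ["IT", "IT", "IT", "IT", "HR", "HR", "HR"],
    "Job Satisfaction Score": [7, 8, 9, 10, 4, 5, 6],
})
summary = satisfaction_by_department(toy)
print(summary)
```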

Visualization 5 – Remote Work vs Performance Rating

plt.figure(figsize=(8,5))  # sets plot size
sns.scatterplot(x='Remote Work (%)', y='Performance Rating', data=df)  # plots relationship between remote work and performance
plt.title("Remote Work Percentage vs Performance Rating")  # adds chart title
plt.show()  # displays visualization
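
With 300 points, a scatter plot of Remote Work (%) can be hard to read. Grouping employees into remote-work bands and averaging their performance ratings gives a compact summary. The band edges below (0-25, 25-50, 50-75 and 75-100 percent) are an arbitrary assumption, as are the helper name and the toy values.

```python
import pandas as pd


def performance_by_remote_band(df: pd.DataFrame) -> pd.DataFrame:
    """Mean performance rating and headcount per remote-work band."""
    # Assumed band edges; intervals are right-inclusive and 0% falls in the first band
    bands = pd.cut(
        df["Remote Work (%)"],
        bins=[0, 25, 50, 75, 100],
        labels=["0-25%", ">25-50%", ">50-75%", ">75-100%"],
        include_lowest=True,
    )
    # observed=False keeps empty bands in the output so gaps stay visible
    return df.groupby(bands, observed=False)["Performance Rating"].agg(["mean", "count"])


# Hypothetical toy data
toy = pd.DataFrame({
    "Remote Work (%)": [0, 10, 30, 80, 100],
    "Performance Rating": [3, 4, 4, 5, 3],
})
bands_table = performance_by_remote_band(toy)
print(bands_table)
```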

Correlation Analysis

plt.figure(figsize=(10,8))  # sets figure size for correlation heatmap
sns.heatmap(df.corr(numeric_only=True), annot=True)  # visualizes correlations between numerical variables
plt.title("Correlation Matrix of HR Dataset")  # adds heatmap title
plt.show()  # displays heatmap
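
Three of the KPIs in the "Possible KPIs" list are pairwise relationships. Their Pearson coefficients can be read straight out of the correlation matrix instead of being judged from the heatmap colours. The sketch drops Employee ID if present, on the assumption that a numeric identifier carries no meaning in a correlation. kpi_correlations, KPI_PAIRS and the toy data are hypothetical.

```python
import pandas as pd

# Relationship KPIs from the "Possible KPIs" list
KPI_PAIRS = [
    ("Training Hours (Per Year)", "Performance Rating"),
    ("Leave Days Taken", "Job Satisfaction Score"),
    ("Remote Work (%)", "Performance Rating"),
]


def kpi_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson coefficient for each KPI pair, taken from the correlation matrix."""
    # Ignore the identifier column if it exists; non-numeric columns are skipped
    corr = df.drop(columns=["Employee ID"], errors="ignore").corr(numeric_only=True)
    return pd.Series({f"{a} vs {b}": corr.loc[a, b] for a, b in KPI_PAIRS}).round(2)


# Hypothetical toy data chosen to give perfect correlations
toy = pd.DataFrame({
    "Employee ID": [101, 102, 103],
    "Department": ["IT", "HR", "IT"],
    "Training Hours (Per Year)": [10, 20, 30],
    "Performance Rating": [1, 2, 3],
    "Leave Days Taken": [1, 2, 3],
    "Job Satisfaction Score": [9, 8, 7],
    "Remote Work (%)": [0, 50, 100],
})
kpi_corr = kpi_correlations(toy)
print(kpi_corr)
```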

Final EDA Conclusion

Based on the exploratory analysis of the HR Employee Performance Dataset, several important insights can be observed. The average performance rating provides an overall indicator of employee productivity across the organization. Salary distribution across departments highlights how compensation varies depending on roles and responsibilities. Promotion rates indicate the proportion of employees who progress in their careers, reflecting internal growth opportunities.

The scatter plot also suggests that experience is positively associated with performance ratings, with more experienced employees tending to receive higher ratings. Job satisfaction levels vary across departments, which could reflect differences in management practices, workload distribution, or workplace culture. Training hours and professional development programs may also play an important role in improving employee performance and supporting career advancement.
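
You can check the link between training and career advancement, at least descriptively, by comparing promoted and non-promoted employees. The following is a minimal sketch that uses a hypothetical compare_by_promotion helper and made-up values. A difference in averages here would show an association, not that training caused the promotion.

```python
import pandas as pd


def compare_by_promotion(df: pd.DataFrame) -> pd.DataFrame:
    """Average training hours and performance rating by promotion status."""
    return (
        df.groupby("Promotion Status")[["Training Hours (Per Year)", "Performance Rating"]]
        .mean()
        .round(2)
    )


# Hypothetical toy data
toy = pd.DataFrame({
    "Promotion Status": ["Yes", "Yes", "No", "No", "No"],
    "Training Hours (Per Year)": [40, 50, 10, 20, 30],
    "Performance Rating": [4, 5, 3, 3, 2],
})
comparison = compare_by_promotion(toy)
print(comparison)
```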

Additionally, the relationship between remote work percentage and performance offers insight into workplace flexibility and productivity. Some employees maintain strong performance even at high remote work percentages, suggesting that hybrid or remote arrangements need not reduce performance.

Overall, this EDA helps HR managers and business leaders understand workforce performance trends, identify factors affecting employee satisfaction and productivity, and make data-driven decisions regarding promotions, training programs, and organizational policies.