
Dataset Overview

  • Total Rows: 300 employees
  • Total Columns: 18 variables
  • Sheet Name: Sheet1

Get the dataset here: https://colorstech.net/power-bi/human-resources-hr-employee-performance-dataset/

Columns in the Dataset

  1. Employee ID – Unique identifier for each employee
  2. Employee Name – Name of the employee
  3. Department – Department where the employee works
  4. Job Role – Specific role or designation
  5. Joining Date – Date the employee joined the company
  6. Experience (Years) – Total years of professional experience
  7. Age – Employee age
  8. Gender – Gender of the employee
  9. Work Location – Office / Remote / Hybrid location
  10. Working Hours (Per Week) – Average weekly working hours
  11. Performance Rating – Employee performance score
  12. Salary ($) – Employee salary
  13. Bonus ($) – Bonus received by the employee
  14. Training Hours (Per Year) – Training hours completed annually
  15. Job Satisfaction Score – Employee satisfaction rating
  16. Leave Days Taken – Number of leave days used
  17. Remote Work (%) – Percentage of remote work
  18. Promotion Status – Whether the employee was promoted (Yes/No)
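Before running the analysis, it can help to confirm that a loaded copy of the file actually has these 18 columns. This is a minimal sketch: EXPECTED_COLUMNS and missing_columns are hypothetical names, and the check assumes the Excel headers are spelled exactly as listed above.

```python
import pandas as pd

# Column names as listed above; assumes the Excel headers use exactly this spelling.
EXPECTED_COLUMNS = [
    "Employee ID", "Employee Name", "Department", "Job Role", "Joining Date",
    "Experience (Years)", "Age", "Gender", "Work Location",
    "Working Hours (Per Week)", "Performance Rating", "Salary ($)", "Bonus ($)",
    "Training Hours (Per Year)", "Job Satisfaction Score", "Leave Days Taken",
    "Remote Work (%)", "Promotion Status",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df, in the listed order."""
    present = set(df.columns)
    return [col for col in EXPECTED_COLUMNS if col not in present]

# Toy frame that lacks the last expected column.
toy = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1])
print(missing_columns(toy))  # ['Promotion Status']
```

On the real file, an empty list from missing_columns(df) means every listed column is present.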

EDA Topics This Dataset Supports

This dataset can support analyses such as:

  • Employee performance analysis
  • Promotion prediction insights
  • Salary vs performance correlation
  • Department-wise productivity
  • Training impact on performance
  • Remote work vs productivity
  • Job satisfaction vs retention indicators
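For the promotion-related topics, a simple starting point is to compare average feature values for promoted and non-promoted employees. The sketch below assumes Promotion Status holds the strings "Yes" and "No" as described. profile_by_promotion is a hypothetical helper, and the four-row toy frame uses made-up values only to show the shape of the output.

```python
import pandas as pd

def profile_by_promotion(df, features):
    """Mean of each feature for promoted ("Yes") vs. non-promoted ("No") employees."""
    return df.groupby("Promotion Status")[features].mean()

# Toy rows with made-up values, not taken from the dataset.
toy = pd.DataFrame({
    "Promotion Status": ["Yes", "No", "Yes", "No"],
    "Performance Rating": [4, 3, 5, 2],
    "Training Hours (Per Year)": [40, 20, 60, 10],
})
summary = profile_by_promotion(toy, ["Performance Rating", "Training Hours (Per Year)"])
print(summary)
```

On the real data, passing more numeric columns (for example "Salary ($)" or "Job Satisfaction Score") gives a quick side-by-side profile of the two groups.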

Possible KPIs You Can Create

Examples:

  • Average Performance Rating
  • Promotion Rate
  • Average Salary by Department
  • Training Hours vs Performance
  • Leave Days vs Job Satisfaction
  • Remote Work % vs Performance
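Several of these KPIs can be computed in one pass. In this sketch, the "X vs Y" items are expressed as Pearson correlation coefficients, which is one reasonable choice rather than the only one. hr_kpis is a hypothetical helper, and the toy values are invented for illustration.

```python
import pandas as pd

def hr_kpis(df: pd.DataFrame) -> dict:
    """Compute several of the KPIs listed above from an HR DataFrame."""
    return {
        "avg_performance": df["Performance Rating"].mean(),
        # Percentage of rows whose Promotion Status is exactly "Yes".
        "promotion_rate_pct": df["Promotion Status"].eq("Yes").mean() * 100,
        "avg_salary_by_dept": df.groupby("Department")["Salary ($)"].mean().to_dict(),
        # The "X vs Y" items, expressed as Pearson correlation coefficients.
        "training_vs_performance": df["Training Hours (Per Year)"].corr(df["Performance Rating"]),
        "leave_vs_satisfaction": df["Leave Days Taken"].corr(df["Job Satisfaction Score"]),
        "remote_vs_performance": df["Remote Work (%)"].corr(df["Performance Rating"]),
    }

# Toy rows with made-up values, not taken from the dataset.
toy = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR"],
    "Salary ($)": [80000, 90000, 50000, 60000],
    "Performance Rating": [4, 5, 3, 2],
    "Promotion Status": ["Yes", "No", "No", "Yes"],
    "Training Hours (Per Year)": [10, 20, 30, 40],
    "Leave Days Taken": [5, 10, 15, 20],
    "Job Satisfaction Score": [8, 6, 4, 2],
    "Remote Work (%)": [0, 25, 50, 75],
})
kpis = hr_kpis(toy)
print(kpis)
```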

Python EDA Code – HR Employee Performance Dataset

# Import necessary libraries
import pandas as pd  # pandas is used for data manipulation and analysis
import numpy as np  # numpy is used for numerical operations
import matplotlib.pyplot as plt  # matplotlib is used for plotting graphs
import seaborn as sns  # seaborn is used for statistical visualizations

# Load the dataset
# read_excel needs the openpyxl package installed to read .xlsx files
df = pd.read_excel("HR_Employee_Performance_Dataset.xlsx")  # reads the HR dataset from the Excel file into a DataFrame

# Display the first 5 rows of the dataset
df.head()  # shows the first five rows to understand the dataset structure

Dataset Column Description

| Column | Description |
| --- | --- |
| Employee ID | Unique identifier for each employee |
| Employee Name | Name of employee |
| Department | Department of the employee |
| Job Role | Employee’s role in organization |
| Joining Date | Date employee joined company |
| Experience (Years) | Total years of experience |
| Age | Age of employee |
| Gender | Gender of employee |
| Work Location | Work type (Office/Remote/Hybrid) |
| Working Hours (Per Week) | Average weekly working hours |
| Performance Rating | Employee performance evaluation score |
| Salary ($) | Annual salary |
| Bonus ($) | Annual bonus |
| Training Hours (Per Year) | Hours of training completed |
| Job Satisfaction Score | Satisfaction rating |
| Leave Days Taken | Number of leave days |
| Remote Work (%) | Percentage of remote work |
| Promotion Status | Promotion indicator (Yes/No) |

Basic Dataset Exploration

df.shape  # returns number of rows and columns in the dataset
df.info()  # shows data types and non-null values of each column
df.describe()  # provides statistical summary of numeric columns
df.isnull().sum()  # checks for missing values in each column
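The checks above can also be combined into a single per-column overview. This is a minimal sketch (column_summary is a hypothetical name, and the toy frame is made up) that reports each column's dtype, missing-value count and percentage, and number of distinct non-missing values.

```python
import pandas as pd

def column_summary(df):
    """One row per column: dtype, missing count, missing %, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "missing_pct": df.isnull().mean() * 100,
        "unique": df.nunique(),  # nunique ignores missing values by default
    })

# Toy frame with one missing Age value.
toy = pd.DataFrame({"Age": [25, None, 40], "Gender": ["F", "M", "M"]})
summary = column_summary(toy)
print(summary)
```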

Data Cleaning

df.drop_duplicates(inplace=True)  # removes duplicate rows from the dataset
df['Joining Date'] = pd.to_datetime(df['Joining Date'])  # converts joining date column to datetime format
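Once Joining Date is a datetime column, tenure can be derived from it. The sketch below is one way to do it under two stated assumptions: tenure is measured up to a fixed reference date you choose (as_of, which keeps results reproducible), and a year is approximated as 365.25 days. add_tenure_years and the "Tenure (Years)" column are hypothetical names, not part of the dataset. Passing errors="coerce" turns unparseable dates into NaT instead of raising an error.

```python
import pandas as pd

def add_tenure_years(df, as_of):
    """Return a copy of df with a 'Tenure (Years)' column measured up to as_of."""
    out = df.copy()
    # Unparseable dates become NaT, which yields NaN tenure instead of an error.
    joined = pd.to_datetime(out["Joining Date"], errors="coerce")
    out["Tenure (Years)"] = (pd.Timestamp(as_of) - joined).dt.days / 365.25
    return out

# Toy rows: one valid date and one malformed entry.
toy = pd.DataFrame({"Joining Date": ["2020-01-01", "not a date"]})
result = add_tenure_years(toy, "2024-01-01")
print(result)
```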

KPI Calculations

KPI 1 – Average Performance Rating

avg_performance = df['Performance Rating'].mean()  # calculates the average performance rating of employees
print("Average Performance Rating:", avg_performance) # prints the average performance rating KPI

KPI 2 – Average Salary

avg_salary = df['Salary ($)'].mean()  # calculates average employee salary
print("Average Salary:", avg_salary) # displays the average salary KPI

KPI 3 – Promotion Rate

promotion_rate = df['Promotion Status'].eq('Yes').mean() * 100  # percentage of employees whose Promotion Status is "Yes" (gives 0 instead of a KeyError if nobody was promoted)
print("Promotion Rate (%):", promotion_rate) # prints the promotion rate KPI
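The same Yes/No column can be broken down by group, for example to compare promotion rates across departments (or across Work Location, Gender, and so on). A minimal sketch with a hypothetical helper name and made-up toy rows:

```python
import pandas as pd

def promotion_rate_by(df, group_col="Department"):
    """Percentage of employees with Promotion Status == 'Yes' within each group."""
    return (
        df["Promotion Status"].eq("Yes")
        .groupby(df[group_col])
        .mean()
        .mul(100)
        .sort_values(ascending=False)
    )

# Toy rows with made-up values, not taken from the dataset.
toy = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "HR"],
    "Promotion Status": ["Yes", "Yes", "No", "Yes", "No"],
})
rates = promotion_rate_by(toy)
print(rates)
```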

KPI 4 – Average Job Satisfaction Score

avg_satisfaction = df['Job Satisfaction Score'].mean()  # calculates average employee satisfaction score
print("Average Job Satisfaction:", avg_satisfaction) # displays satisfaction KPI

KPI 5 – Average Training Hours

avg_training = df['Training Hours (Per Year)'].mean()  # calculates average training hours per employee
print("Average Training Hours:", avg_training) # prints training hours KPI

Data Visualization (Seaborn)

Visualization 1 – Performance Rating Distribution

plt.figure(figsize=(8,5))  # sets the figure size for the plot
sns.histplot(df['Performance Rating'], kde=True) # plots distribution of performance ratings
plt.title("Employee Performance Rating Distribution") # sets title of the plot
plt.show() # displays the plot

Visualization 2 – Department vs Average Salary

plt.figure(figsize=(10,6))  # defines size of visualization
sns.barplot(x='Department', y='Salary ($)', data=df) # creates bar chart of department vs salary
plt.xticks(rotation=45) # rotates department labels for readability
plt.title("Average Salary by Department") # adds chart title
plt.show() # renders the visualization
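By default, sns.barplot draws the mean salary per department with an error bar, so exact figures are not visible on the chart. A companion table like the one below (a sketch; salary_by_department is a hypothetical name and the toy salaries are invented) also adds the median and headcount, which help judge whether a department's average rests on only a few employees.

```python
import pandas as pd

def salary_by_department(df):
    """Mean, median and headcount of Salary ($) per department, highest mean first."""
    return (
        df.groupby("Department")["Salary ($)"]
        .agg(["mean", "median", "count"])
        .sort_values("mean", ascending=False)
    )

# Toy rows with made-up values, not taken from the dataset.
toy = pd.DataFrame({
    "Department": ["IT", "IT", "HR"],
    "Salary ($)": [80000, 100000, 50000],
})
table = salary_by_department(toy)
print(table)
```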

Visualization 3 – Experience vs Performance Rating

plt.figure(figsize=(8,5))  # sets size of scatter plot
sns.scatterplot(x='Experience (Years)', y='Performance Rating', data=df) # plots relationship between experience and performance
plt.title("Experience vs Performance Rating") # adds plot title
plt.show() # displays the scatter plot
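A scatter plot shows the shape of the relationship; a correlation coefficient summarizes it in one number. The sketch below reports both Pearson (linear) and Spearman (rank-based, often better suited to ordinal scores such as ratings) correlation. Spearman is computed here as the Pearson correlation of the ranks, which avoids a SciPy dependency. The helper name and toy values are hypothetical.

```python
import pandas as pd

def experience_performance_corr(df):
    """Pearson and Spearman correlation between experience and performance rating."""
    x = df["Experience (Years)"]
    y = df["Performance Rating"]
    return {
        "pearson": x.corr(y),
        # Spearman's rho equals the Pearson correlation of the (average) ranks.
        "spearman": x.rank().corr(y.rank()),
    }

# Toy rows: monotonic but not linear, so Spearman is 1 while Pearson is below 1.
toy = pd.DataFrame({
    "Experience (Years)": [1, 2, 5, 10],
    "Performance Rating": [1, 2, 3, 4],
})
corrs = experience_performance_corr(toy)
print(corrs)
```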

Visualization 4 – Job Satisfaction by Department

plt.figure(figsize=(10,6))  # defines visualization size
sns.boxplot(x='Department', y='Job Satisfaction Score', data=df) # shows satisfaction distribution by department
plt.xticks(rotation=45) # rotates labels for clarity
plt.title("Job Satisfaction Across Departments") # adds visualization title
plt.show() # renders the chart

The box plot shows the distribution of job satisfaction scores across departments (Sales, Operations, HR, Marketing, Finance, and IT). Finance and IT appear to have somewhat higher median satisfaction than the other departments, while Sales and HR show slightly lower medians. Most departments have a wide spread of scores, meaning satisfaction varies considerably within each department, and some departments have low minimum values, indicating a few employees with very low satisfaction. Overall, Finance and IT seem to maintain slightly stronger employee morale than the other departments.
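Readings like these come from eyeballing the chart. To check them numerically, the quartiles behind each box, plus the minimum and group size, can be tabulated directly. A minimal sketch (hypothetical helper name, made-up toy scores):

```python
import pandas as pd

def satisfaction_by_department(df):
    """Quartiles, minimum and count of Job Satisfaction Score per department, by median."""
    scores = df.groupby("Department")["Job Satisfaction Score"]
    return pd.DataFrame({
        "q1": scores.quantile(0.25),
        "median": scores.median(),
        "q3": scores.quantile(0.75),
        "min": scores.min(),
        "n": scores.count(),
    }).sort_values("median", ascending=False)

# Toy rows with made-up values, not taken from the dataset.
toy = pd.DataFrame({
    "Department": ["IT", "IT", "IT", "HR", "HR", "HR"],
    "Job Satisfaction Score": [7, 8, 9, 3, 5, 6],
})
sat_table = satisfaction_by_department(toy)
print(sat_table)
```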

Visualization 5 – Remote Work vs Performance Rating

plt.figure(figsize=(8,5))  # sets plot size
sns.scatterplot(x='Remote Work (%)', y='Performance Rating', data=df) # plots relationship between remote work and performance
plt.title("Remote Work Percentage vs Performance Rating") # adds chart title
plt.show() # displays visualization
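With a continuous x-axis such as Remote Work (%), a scatter plot can hide trends. Grouping employees into bands and averaging the rating within each band is one simple alternative. The band edges (0, 25, 50, 75, 100) are an arbitrary choice for illustration, and the helper name and toy rows are hypothetical.

```python
import pandas as pd

def performance_by_remote_band(df, bins=(0, 25, 50, 75, 100)):
    """Mean Performance Rating and headcount per Remote Work (%) band."""
    # include_lowest=True keeps employees with exactly 0% remote work in the first band.
    bands = pd.cut(df["Remote Work (%)"], bins=list(bins), include_lowest=True)
    # observed=False keeps empty bands in the output so all bands are always listed.
    return df.groupby(bands, observed=False)["Performance Rating"].agg(["mean", "count"])

# Toy rows with made-up values, not taken from the dataset.
toy = pd.DataFrame({
    "Remote Work (%)": [0, 10, 30, 60, 90, 100],
    "Performance Rating": [3, 5, 4, 4, 2, 4],
})
bands_table = performance_by_remote_band(toy)
print(bands_table)
```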

Correlation Analysis

plt.figure(figsize=(10,8))  # sets figure size for correlation heatmap
sns.heatmap(df.corr(numeric_only=True), annot=True) # visualizes correlations between numerical variables
plt.title("Correlation Matrix of HR Dataset") # adds heatmap title
plt.show() # displays heatmap
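With many numeric columns, the annotated heatmap can get crowded, so it may help to list the strongest pairwise correlations explicitly. The sketch below (hypothetical helper, toy columns a, b and c with made-up values) takes each pair from the upper triangle of the matrix once, drops undefined correlations, and sorts by absolute value.

```python
import pandas as pd

def top_correlations(df, n=5):
    """Return the n numeric column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    cols = corr.columns
    # Upper triangle only (j > i), so each pair appears once and the diagonal is skipped.
    pairs = pd.Series({
        (cols[i], cols[j]): corr.iloc[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    }).dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy frame with made-up values; the text column is ignored by numeric_only=True.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [1, 3, 2, 4],
    "name": ["w", "x", "y", "z"],
})
top = top_correlations(toy)
print(top)
```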

Final EDA Conclusion

Based on the exploratory analysis of the HR Employee Performance Dataset, several important insights can be observed. The average performance rating provides an overall indicator of employee productivity across the organization. Salary distribution across departments highlights how compensation varies depending on roles and responsibilities. Promotion rates indicate the proportion of employees who progress in their careers, reflecting internal growth opportunities.

The analysis also suggests that more experienced employees tend to receive higher performance ratings, which is consistent with experience contributing to productivity. Job satisfaction levels vary across departments, which could reflect differences in management practices, workload distribution, or workplace culture. Training hours and professional development programs may also play an important role in improving employee performance and career advancement.

Additionally, the relationship between remote work percentage and performance offers insight into workplace flexibility and productivity. Some employees maintain strong performance even at high remote-work percentages, suggesting that hybrid or remote arrangements can be compatible with strong performance.

Overall, this EDA helps HR managers and business leaders understand workforce performance trends, identify factors affecting employee satisfaction and productivity, and make data-driven decisions regarding promotions, training programs, and organizational policies.