Case Study (Human Resources Retention):

Importing & Merging DataFrames

Importing Human Resources Data

In [2]:
import pandas as pd
import numpy as np

df_hr = pd.read_csv('human_resource.csv')

df_hr.head()
Out[2]:
   employee_id  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  department  salary
0         1003               2                   157                   3              0     1                      0       sales     low
1         1005               5                   262                   6              0     1                      0       sales  medium
2         1486               7                   272                   4              0     1                      0       sales  medium
3         1038               5                   223                   5              0     1                      0       sales     low
4         1057               2                   159                   3              0     1                      0       sales     low

Importing Employee Satisfaction Data

In [3]:
df_es = pd.read_csv('satisfaction.csv')

df_es.head()
Out[3]:
   EMPLOYEE #  satisfaction_level  last_evaluation
0        1003                0.38             0.53
1        1005                0.80             0.86
2        1486                0.11             0.88
3        1038                0.72             0.87
4        1057                0.37             0.52

Number of Rows & Columns (Human Resources)

In [4]:
df_hr.shape
Out[4]:
(14999, 9)

Number of Rows & Columns (Employee Satisfaction)

In [5]:
df_es.shape
Out[5]:
(14999, 3)
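
Both files report 14,999 rows, but equal row counts alone do not guarantee that every employee appears exactly once in each file. A minimal sketch of a key check that could be run before joining, using small toy frames that mirror the first rows shown above (the toy data and the names keys_unique and keys_match are assumptions for illustration, not part of the original notebook):

```python
import pandas as pd

# Toy frames mirroring the first five rows shown above,
# not the full 14,999-row files.
df_hr = pd.DataFrame({
    "employee_id": [1003, 1005, 1486, 1038, 1057],
    "number_project": [2, 5, 7, 5, 2],
})
df_es = pd.DataFrame({
    "EMPLOYEE #": [1003, 1005, 1486, 1038, 1057],
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.37],
})

hr_keys = df_hr["employee_id"]
es_keys = df_es["EMPLOYEE #"]

# Each employee should appear at most once in each file ...
keys_unique = hr_keys.is_unique and es_keys.is_unique
# ... and both files should cover the same set of employees.
keys_match = set(hr_keys) == set(es_keys)

print(keys_unique, keys_match)
```

If either check fails on the real data, the join could silently duplicate rows or leave gaps, so it is worth inspecting the offending IDs first.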

Merge DataFrames

- Set the employee ID column (employee_id in df_hr, EMPLOYEE # in df_es) as the index of each DataFrame, then join on that index
In [6]:
df = df_hr.set_index('employee_id').join(df_es.set_index('EMPLOYEE #'))

df.head()
Out[6]:
             number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  department  salary  satisfaction_level  last_evaluation
employee_id
1003                      2                   157                   3              0     1                      0       sales     low                0.38             0.53
1005                      5                   262                   6              0     1                      0       sales  medium                0.80             0.86
1486                      7                   272                   4              0     1                      0       sales  medium                0.11             0.88
1038                      5                   223                   5              0     1                      0       sales     low                0.72             0.87
1057                      2                   159                   3              0     1                      0       sales     low                0.37             0.52
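
The same result can also be sketched with pd.merge, which accepts differently named key columns through left_on and right_on and can verify the key relationship through validate. This is an alternative to the set_index/join approach above, not the notebook's own method, and the toy frames below merely stand in for the real files:

```python
import pandas as pd

# Toy frames standing in for human_resource.csv and satisfaction.csv.
df_hr = pd.DataFrame({
    "employee_id": [1003, 1005, 1486, 1038, 1057],
    "department": ["sales"] * 5,
    "salary": ["low", "medium", "medium", "low", "low"],
})
df_es = pd.DataFrame({
    "EMPLOYEE #": [1003, 1005, 1486, 1038, 1057],
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.37],
    "last_evaluation": [0.53, 0.86, 0.88, 0.87, 0.52],
})

merged = pd.merge(
    df_hr,
    df_es,
    left_on="employee_id",
    right_on="EMPLOYEE #",
    how="left",               # same default behaviour as DataFrame.join
    validate="one_to_one",    # raises MergeError if a key is duplicated
).drop(columns="EMPLOYEE #")  # the right-hand key is redundant after merging

print(merged)
```

With this variant employee_id stays an ordinary column throughout, at the cost of having to drop the duplicate key column by hand.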

Merge DataFrames

- Reset the index so that employee_id becomes a regular column again
In [7]:
df = df_hr.set_index('employee_id').join(df_es.set_index('EMPLOYEE #'))

df = df.reset_index()

df.head()
Out[7]:
   employee_id  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  department  salary  satisfaction_level  last_evaluation
0         1003               2                   157                   3              0     1                      0       sales     low                0.38             0.53
1         1005               5                   262                   6              0     1                      0       sales  medium                0.80             0.86
2         1486               7                   272                   4              0     1                      0       sales  medium                0.11             0.88
3         1038               5                   223                   5              0     1                      0       sales     low                0.72             0.87
4         1057               2                   159                   3              0     1                      0       sales     low                0.37             0.52
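
Because join performs a left join by default, any employee in df_hr without a matching row in df_es would receive NaN in the satisfaction columns. A quick sketch of how that could be checked, on toy data in which one satisfaction record is deliberately left out (an assumption made only for illustration; the real files may have no such gaps):

```python
import pandas as pd

df_hr = pd.DataFrame({
    "employee_id": [1003, 1005, 1486],
    "salary": ["low", "medium", "medium"],
})
# Hypothetical gap: employee 1486 has no satisfaction record here.
df_es = pd.DataFrame({
    "EMPLOYEE #": [1003, 1005],
    "satisfaction_level": [0.38, 0.80],
})

df = (
    df_hr.set_index("employee_id")
    .join(df_es.set_index("EMPLOYEE #"))
    .reset_index()
)

# Count rows that found no partner in the satisfaction data.
missing = int(df["satisfaction_level"].isna().sum())
print(missing)
```

A non-zero count on the real data would be a signal to investigate those employees before modelling retention.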

Export the merged DataFrame to a new CSV file

In [9]:
df.to_csv('hr_satisfaction.csv', index=False)

df.head()
Out[9]:
   employee_id  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  department  salary  satisfaction_level  last_evaluation
0         1003               2                   157                   3              0     1                      0       sales     low                0.38             0.53
1         1005               5                   262                   6              0     1                      0       sales  medium                0.80             0.86
2         1486               7                   272                   4              0     1                      0       sales  medium                0.11             0.88
3         1038               5                   223                   5              0     1                      0       sales     low                0.72             0.87
4         1057               2                   159                   3              0     1                      0       sales     low                0.37             0.52
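
Note that df.head() above shows the in-memory DataFrame, not the file that was written. One way to confirm the export itself is to read the file back and compare. A minimal sketch, assuming a toy DataFrame and a hypothetical temporary path so that nothing in the working directory is overwritten:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the merged DataFrame.
df = pd.DataFrame({
    "employee_id": [1003, 1005, 1486],
    "salary": ["low", "medium", "medium"],
    "satisfaction_level": [0.38, 0.80, 0.11],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical path used only for this round-trip check.
    path = os.path.join(tmp_dir, "hr_satisfaction.csv")
    df.to_csv(path, index=False)  # index=False avoids an extra unnamed column
    df_back = pd.read_csv(path)

same_shape = df_back.shape == df.shape
same_columns = list(df_back.columns) == list(df.columns)
print(same_shape, same_columns)
```

Without index=False, the row index would be written as an additional first column and come back from read_csv as "Unnamed: 0".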