Case Study (Human Resources Retention):

Pivot Table & GroupBy

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv('hr_satisfaction.csv')

df.head()
Out[1]:
employee_id number_project average_montly_hours time_spend_company Work_accident left promotion_last_5years department salary satisfaction_level last_evaluation
0 1003 2 157 3 0 1 0 sales low 0.38 0.53
1 1005 5 262 6 0 1 0 sales medium 0.80 0.86
2 1486 7 272 4 0 1 0 sales medium 0.11 0.88
3 1038 5 223 5 0 1 0 sales low 0.72 0.87
4 1057 2 159 3 0 1 0 sales low 0.37 0.52

Removing employee ID

In [2]:
df.drop(columns='employee_id', inplace=True)

df.head()
Out[2]:
number_project average_montly_hours time_spend_company Work_accident left promotion_last_5years department salary satisfaction_level last_evaluation
0 2 157 3 0 1 0 sales low 0.38 0.53
1 5 262 6 0 1 0 sales medium 0.80 0.86
2 7 272 4 0 1 0 sales medium 0.11 0.88
3 5 223 5 0 1 0 sales low 0.72 0.87
4 2 159 3 0 1 0 sales low 0.37 0.52

Total number of employees who left the company

    - Pivot Table : to see Total number (All) in the table, add -> margins=True
In [46]:
df.pivot_table(values='left', index='department', aggfunc=np.sum, margins=True)
Out[46]:
left
department
IT 273
RandD 121
accounting 204
hr 215
management 91
marketing 203
product_mng 198
sales 1014
support 555
technical 697
All 3571

In each Department

- number of employees who left the company

    Two ways : 

    1. Pivot Table
    2. GroupBy

1. Pivot Table

In [46]:
df.pivot_table(values='left', index='department', aggfunc=np.sum, margins=True)
Out[46]:
left
department
IT 273
RandD 121
accounting 204
hr 215
management 91
marketing 203
product_mng 198
sales 1014
support 555
technical 697
All 3571

2. GroupBy

In [4]:
df_grouped = df.groupby('department')

df_grouped['left'].sum()
Out[4]:
department
IT              273
RandD           121
accounting      204
hr              215
management       91
marketing       203
product_mng     198
sales          1014
support         555
technical       697
Name: left, dtype: int64

In each Department

- number of

- Work_accident
- average_montly_hours
- last_evaluation   
- left
- number_project    
- promotion_last_5years 
- satisfaction_level    
- time_spend_company

Two ways : 

    1. Pivot Table
    2. GroupBy

1. Pivot Table

In [47]:
df.pivot_table(values = df.loc[:], index='department', aggfunc=np.sum, margins=True)
Out[47]:
Work_accident average_montly_hours last_evaluation left number_project promotion_last_5years satisfaction_level time_spend_company
department
IT 164 248119 879.452250 273 4683 3 758.172830 4256
RandD 134 158030 560.446125 121 3033 27 487.800000 2650
accounting 96 154292 550.706125 204 2934 14 446.682830 2702
hr 89 146828 524.006125 215 2701 15 442.535660 2480
management 103 126787 456.234499 91 2432 69 391.765660 2711
marketing 138 171073 613.946125 203 3164 43 530.622830 3063
product_mng 132 180369 644.662250 198 3434 0 559.195660 3135
sales 587 831773 2938.236749 1014 15634 100 2543.779810 14631
support 345 447490 1611.534499 555 8479 20 1377.908490 7563
technical 381 550793 1961.930624 697 10548 28 1653.372640 9279
All 2169 3015554 10741.155369 3571 57042 319 9191.836409 52470

2. GroupBy

In [56]:
df_grouped = df.groupby('department')

df_grouped.sum()
Out[56]:
number_project average_montly_hours time_spend_company Work_accident left promotion_last_5years satisfaction_level last_evaluation
department
IT 4683 248119 4256 164 273 3 758.17283 879.452250
RandD 3033 158030 2650 134 121 27 487.80000 560.446125
accounting 2934 154292 2702 96 204 14 446.68283 550.706125
hr 2701 146828 2480 89 215 15 442.53566 524.006125
management 2432 126787 2711 103 91 69 391.76566 456.234499
marketing 3164 171073 3063 138 203 43 530.62283 613.946125
product_mng 3434 180369 3135 132 198 0 559.19566 644.662250
sales 15634 831773 14631 587 1014 100 2543.77981 2938.236749
support 8479 447490 7563 345 555 20 1377.90849 1611.534499
technical 10548 550793 9279 381 697 28 1653.37264 1961.930624