Case Study (Human Resources Retention):¶

Pivot Table & GroupBy¶

In [1]:

import pandas as pd
import numpy as np

df = pd.read_csv('hr_satisfaction.csv')

df.head()

Out[1]:

	employee_id	number_project	average_montly_hours	time_spend_company	left	department	salary	satisfaction_level	last_evaluation
0	1003	2	157	3	1	sales	low	0.38	0.53
1	1005	5	262	6	1	sales	medium	0.80	0.86
2	1486	7	272	4	1	sales	medium	0.11	0.88
3	1038	5	223	5	1	sales	low	0.72	0.87
4	1057	2	159	3	1	sales	low	0.37	0.52

Removing employee ID¶

In [2]:

df.drop(columns='employee_id', inplace=True)

df.head()

Out[2]:

	number_project	average_montly_hours	time_spend_company	left	department	salary	satisfaction_level	last_evaluation
0	2	157	3	1	sales	low	0.38	0.53
1	5	262	6	1	sales	medium	0.80	0.86
2	7	272	4	1	sales	medium	0.11	0.88
3	5	223	5	1	sales	low	0.72	0.87
4	2	159	3	1	sales	low	0.37	0.52

Total number of employees who left the company¶

    - Pivot Table : to see Total number (All) in the table, add -> margins=True

In [46]:

df.pivot_table(values='left', index='department', aggfunc=np.sum, margins=True)

Out[46]:

	left
department
IT	273
RandD	121
accounting	204
hr	215
management	91
marketing	203
product_mng	198
sales	1014
support	555
technical	697
All	3571

In each Department¶

- number of employees who left the company¶

    Two ways : 

    1. Pivot Table
    2. GroupBy

1. Pivot Table¶

In [46]:

df.pivot_table(values='left', index='department', aggfunc=np.sum, margins=True)

Out[46]:

	left
department
IT	273
RandD	121
accounting	204
hr	215
management	91
marketing	203
product_mng	198
sales	1014
support	555
technical	697
All	3571

2. GroupBy¶

In [4]:

df_grouped = df.groupby('department')

df_grouped['left'].sum()

Out[4]:

department
IT              273
RandD           121
accounting      204
hr              215
management       91
marketing       203
product_mng     198
sales          1014
support         555
technical       697
Name: left, dtype: int64

In each Department¶

- number of¶

- Work_accident
- average_montly_hours
- last_evaluation   
- left
- number_project    
- promotion_last_5years 
- satisfaction_level    
- time_spend_company

Two ways : 

    1. Pivot Table
    2. GroupBy

1. Pivot Table¶

In [47]:

df.pivot_table(values = df.loc[:], index='department', aggfunc=np.sum, margins=True)

Out[47]:

	Work_accident	average_montly_hours	last_evaluation	left	number_project	promotion_last_5years	satisfaction_level	time_spend_company
department
IT	164	248119	879.452250	273	4683	3	758.172830	4256
RandD	134	158030	560.446125	121	3033	27	487.800000	2650
accounting	96	154292	550.706125	204	2934	14	446.682830	2702
hr	89	146828	524.006125	215	2701	15	442.535660	2480
management	103	126787	456.234499	91	2432	69	391.765660	2711
marketing	138	171073	613.946125	203	3164	43	530.622830	3063
product_mng	132	180369	644.662250	198	3434	0	559.195660	3135
sales	587	831773	2938.236749	1014	15634	100	2543.779810	14631
support	345	447490	1611.534499	555	8479	20	1377.908490	7563
technical	381	550793	1961.930624	697	10548	28	1653.372640	9279
All	2169	3015554	10741.155369	3571	57042	319	9191.836409	52470

2. GroupBy¶

In [56]:

df_grouped = df.groupby('department')

df_grouped.sum()

Out[56]:

	number_project	average_montly_hours	time_spend_company	Work_accident	left	promotion_last_5years	satisfaction_level	last_evaluation
department
IT	4683	248119	4256	164	273	3	758.17283	879.452250
RandD	3033	158030	2650	134	121	27	487.80000	560.446125
accounting	2934	154292	2702	96	204	14	446.68283	550.706125
hr	2701	146828	2480	89	215	15	442.53566	524.006125
management	2432	126787	2711	103	91	69	391.76566	456.234499
marketing	3164	171073	3063	138	203	43	530.62283	613.946125
product_mng	3434	180369	3135	132	198	0	559.19566	644.662250
sales	15634	831773	14631	587	1014	100	2543.77981	2938.236749
support	8479	447490	7563	345	555	20	1377.90849	1611.534499
technical	10548	550793	9279	381	697	28	1653.37264	1961.930624