Pandas - Part4 (Merging/Joining DataFrames)

Join (Left Join, Right Join, Outer Join)

In this lesson we use join by Category for two distinc situations:

  1. when values for the Category is the same for both Dataframe
  1. when values for the Category is not completely the same for Dataframes

then we learn how to join when Column names are different in Dataframes.

Import NUMPY, PANDAS, Series & DataFrame

In [77]:
import numpy as np

import pandas as pd

from pandas import Series, DataFrame

Creating DataFrame 1

In [78]:
# Creating DataFrame1

nan=np.nan

df1 = DataFrame ({ 'Category': (['Books', 'Computers', 'Home']), 'sales_Number': np.array([1,3,5])})

df1
Out[78]:
Category sales_Number
0 Books 1
1 Computers 3
2 Home 5

Creating DataFrame 2

In [79]:
# Creating DataFrame2

nan=np.nan

df2 = DataFrame ({ 'Category': (['Books', 'Computers', 'Home']), 'sales_Number': np.array([4,7,9])})

df2
Out[79]:
Category sales_Number
0 Books 4
1 Computers 7
2 Home 9

Creating DataFrame3

In [80]:
# Creating DataFrame3

nan=np.nan

df3 = DataFrame ({ 'Category': (['Clothes', 'Computers', 'Home']), 'sales_Number': np.array([10,30,50])})

df3
Out[80]:
Category sales_Number
0 Clothes 10
1 Computers 30
2 Home 50

Creating DataFrame4

In [90]:
# Creating DataFrame4

nan=np.nan

df4 = DataFrame ({ 'Categories': (['Books', 'Computers', 'Home']), 'sales_Number': np.array([10,30,50])})

df4
Out[90]:
Categories sales_Number
0 Books 10
1 Computers 30
2 Home 50

Join by Category: (Left Join, Right Join, Outer Join)

df1 Category: Books, Computers, Home

df2 Category: Books, Computers, Home

df3 Category: Clothes, Computers, Home

df4 Categories: Books, Computers, Home

Join by Category: (Left Join, Right Join, Outer Join) (same values)

In [82]:
# Left Join by Category (df1 , df2)(same data in category)

pd.merge(df1, df2, on='Category', how='left')
Out[82]:
Category sales_Number_x sales_Number_y
0 Books 1 4
1 Computers 3 7
2 Home 5 9
In [83]:
# Right Join by Category (df1 , df2)(same data in category)

pd.merge(df1, df2, on='Category', how='right')
Out[83]:
Category sales_Number_x sales_Number_y
0 Books 1 4
1 Computers 3 7
2 Home 5 9
In [84]:
# Outer Join by Category (df1 , df2)(same data in category)

pd.merge(df1, df2, on='Category', how='outer')
Out[84]:
Category sales_Number_x sales_Number_y
0 Books 1 4
1 Computers 3 7
2 Home 5 9

Result: Shows values of Category for both DataFrame

When values of Category for both DataFrame is the same, left join and right join and outer join will have the same result

sales_number_x are values for first dataframe(df1) & sales_number_y are values for second dataframe(df2)

Join by Category: (Left Join, Right Join, Outer Join) (not same values)

In [85]:
# Left Join by Category (df1 , df3)(not same data in category)

pd.merge(df1, df3, on='Category', how='left')
Out[85]:
Category sales_Number_x sales_Number_y
0 Books 1 NaN
1 Computers 3 30.0
2 Home 5 50.0

Result:

Since second DataFrame does not have Books in its category, it is empty(has missing value)

In [86]:
# Right Join by Category (df1 , df3)(not same data in category)

pd.merge(df1, df3, on='Category', how='right')
Out[86]:
Category sales_Number_x sales_Number_y
0 Computers 3.0 30
1 Home 5.0 50
2 Clothes NaN 10

Result:

Since First DataFrame does not have Clothes in its category, it is empty(has missing value)

In [87]:
# Outer Join by Category (df1 , df3)(not same data in category)

pd.merge(df1, df3, on='Category', how='outer')
Out[87]:
Category sales_Number_x sales_Number_y
0 Books 1.0 NaN
1 Computers 3.0 30.0
2 Home 5.0 50.0
3 Clothes NaN 10.0

Result:

Shows all values in Category for both DataFrame df1 & df3

Different Column Name:

In [91]:
pd.merge(df1, df4, left_on='Category', right_on='Categories')
Out[91]:
Category sales_Number_x Categories sales_Number_y
0 Books 1 Books 10
1 Computers 3 Computers 30
2 Home 5 Home 50