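In this lesson we join DataFrames on the Category column in two distinct situations: when both DataFrames contain the same Category values, and when they do not.
Then we learn how to join when the key column has a different name in each DataFrame.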
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Creating DataFrame1
df1 = DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': np.array([1, 3, 5])})
df1
# Creating DataFrame2 (same Category values as df1)
df2 = DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': np.array([4, 7, 9])})
df2
# Creating DataFrame3 (has Clothes instead of Books)
df3 = DataFrame({'Category': ['Clothes', 'Computers', 'Home'], 'sales_Number': np.array([10, 30, 50])})
df3
# Creating DataFrame4 (the key column is named 'Categories' instead of 'Category')
df4 = DataFrame({'Categories': ['Books', 'Computers', 'Home'], 'sales_Number': np.array([10, 30, 50])})
df4
# Left join on Category (df1, df2): both have the same Category values
pd.merge(df1, df2, on='Category', how='left')
# Right join on Category (df1, df2): both have the same Category values
pd.merge(df1, df2, on='Category', how='right')
# Outer join on Category (df1, df2): both have the same Category values
pd.merge(df1, df2, on='Category', how='outer')
When both DataFrames contain the same Category values, the left, right, and outer joins all give the same result.
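Because both DataFrames have a column named sales_Number, pandas adds suffixes to tell them apart: sales_Number_x holds the values from the first DataFrame (df1) and sales_Number_y holds the values from the second DataFrame (df2).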
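A quick check of both points, as a minimal sketch that rebuilds df1 and df2 with plain lists: the three join types compare equal, and the `suffixes` parameter of `pd.merge` can replace the default `_x`/`_y`. The suffix names `_df1` and `_df2` are just an illustrative choice.

```python
import pandas as pd

# Recreate df1 and df2 from this lesson: same Category values, different sales
df1 = pd.DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': [1, 3, 5]})
df2 = pd.DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': [4, 7, 9]})

left = pd.merge(df1, df2, on='Category', how='left')
right = pd.merge(df1, df2, on='Category', how='right')
outer = pd.merge(df1, df2, on='Category', how='outer')

# With identical keys, every join type produces the same table
print(left.equals(right) and left.equals(outer))  # True

# suffixes replaces the default ('_x', '_y') with more descriptive names
named = pd.merge(df1, df2, on='Category', suffixes=('_df1', '_df2'))
print(named.columns.tolist())  # ['Category', 'sales_Number_df1', 'sales_Number_df2']
```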
# Left join on Category (df1, df3): the Category values differ
pd.merge(df1, df3, on='Category', how='left')
Since df3 has no Books row, sales_Number_y is NaN (missing) for Books. Clothes, which appears only in df3, is dropped, because a left join keeps only the keys of the first DataFrame (df1).
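To see which rows found a match, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. A minimal sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': [1, 3, 5]})
df3 = pd.DataFrame({'Category': ['Clothes', 'Computers', 'Home'], 'sales_Number': [10, 30, 50]})

# indicator=True records where each row's key was found
left = pd.merge(df1, df3, on='Category', how='left', indicator=True)
print(left)

# Keys found only in df1 are the rows with NaN in df3's column
missing = left[left['_merge'] == 'left_only']
print(missing['Category'].tolist())  # ['Books']
```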
# Right join on Category (df1, df3): the Category values differ
pd.merge(df1, df3, on='Category', how='right')
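Since df1 has no Clothes row, sales_Number_x is NaN (missing) for Clothes. Books, which appears only in df1, is dropped, because a right join keeps only the keys of the second DataFrame (df3).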
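A right join is the mirror image of a left join: swapping the two DataFrames and using `how='left'` gives the same rows. The sketch below checks this; it swaps the suffixes as well so that `_x` still labels df1's column, and it sorts both results by key so the comparison does not depend on row order.

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': [1, 3, 5]})
df3 = pd.DataFrame({'Category': ['Clothes', 'Computers', 'Home'], 'sales_Number': [10, 30, 50]})

right = pd.merge(df1, df3, on='Category', how='right')

# Same join written as a left join with the arguments swapped;
# df3 (now on the left) gets '_y' and df1 (now on the right) gets '_x'
swapped = pd.merge(df3, df1, on='Category', how='left', suffixes=('_y', '_x'))
swapped = swapped[right.columns]  # match the column order of `right`

# Sort by key and reset the index before comparing
a = right.sort_values('Category').reset_index(drop=True)
b = swapped.sort_values('Category').reset_index(drop=True)
print(a.equals(b))  # True
```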
# Outer join on Category (df1, df3): the Category values differ
pd.merge(df1, df3, on='Category', how='outer')
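An outer join keeps every Category value from both df1 and df3 (Books, Clothes, Computers, Home) and fills in NaN wherever one DataFrame has no matching row.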
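For contrast, an inner join (the default `how` in `pd.merge`) keeps only the keys present in both DataFrames. This sketch compares the two and uses `indicator=True` to show where each key of the outer join came from:

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': [1, 3, 5]})
df3 = pd.DataFrame({'Category': ['Clothes', 'Computers', 'Home'], 'sales_Number': [10, 30, 50]})

outer = pd.merge(df1, df3, on='Category', how='outer', indicator=True)
inner = pd.merge(df1, df3, on='Category', how='inner')

# Outer keeps the union of the keys, inner keeps only the intersection
print(sorted(outer['Category']))  # ['Books', 'Clothes', 'Computers', 'Home']
print(sorted(inner['Category']))  # ['Computers', 'Home']

# _merge shows which DataFrame(s) each key came from
print(outer[['Category', '_merge']])
```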
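# Join when the key column has a different name in each DataFrame: Category in df1, Categories in df4
pd.merge(df1, df4, left_on='Category', right_on='Categories')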
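With `left_on`/`right_on`, the result keeps both key columns, and since no `how` is given the join is an inner join (all three keys match here). A minimal sketch of two ways to end up with a single key column: drop `Categories` after merging, or rename it before merging so that `on=` works directly.

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Books', 'Computers', 'Home'], 'sales_Number': [1, 3, 5]})
df4 = pd.DataFrame({'Categories': ['Books', 'Computers', 'Home'], 'sales_Number': [10, 30, 50]})

# left_on/right_on keeps both key columns in the result
merged = pd.merge(df1, df4, left_on='Category', right_on='Categories')
print(merged.columns.tolist())
# ['Category', 'sales_Number_x', 'Categories', 'sales_Number_y']

# Option 1: drop the duplicate key column afterwards
dropped = merged.drop(columns='Categories')

# Option 2: rename the key first, then join with on=
renamed = pd.merge(df1, df4.rename(columns={'Categories': 'Category'}), on='Category')
print(dropped.equals(renamed))  # True
```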