Case Study (Wine Quality): Checking Duplicate Values

In [110]:
import pandas as pd
import numpy as np

# Red wine dataset
df_red = pd.read_csv('red.csv')

# White wine dataset
df_white = pd.read_csv('white.csv')

# Combined (red + white) dataset
df = pd.read_csv('wine.csv')

Number of duplicate rows in the combined dataset

In [113]:
len(df)-len(df.drop_duplicates())
Out[113]:
0
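The length difference above counts the rows that drop_duplicates() would remove. pandas also offers DataFrame.duplicated(), which returns a boolean Series flagging every row that repeats an earlier one, so summing it gives the same count more directly. A minimal sketch on a small hypothetical DataFrame (toy values, not the wine data):

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 are identical
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8, 10.0],
    "quality": [5, 6, 6, 6],
})

# Approach used in this notebook: rows removed by drop_duplicates()
diff_count = len(toy) - len(toy.drop_duplicates())

# Direct approach: duplicated() flags each row that repeats an earlier row
flag_count = toy.duplicated().sum()

print(diff_count, flag_count)  # 1 1
```

Both expressions use the default keep="first" rule, so they always agree.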

Number of duplicate rows in the red wine dataset

In [111]:
len(df_red)-len(df_red.drop_duplicates())
Out[111]:
0

Number of duplicate rows in the white wine dataset

In [112]:
len(df_white)-len(df_white.drop_duplicates())
Out[112]:
0
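When a count like the ones above is non-zero, it helps to look at the duplicated rows themselves before dropping anything. Passing keep=False to duplicated() flags every member of a duplicate group (not only the repeats), and sorting by all columns places identical rows next to each other. A sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 4 match, rows 1 and 2 match
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8, 10.0, 9.4],
    "quality": [5, 6, 6, 6, 5],
})

# keep=False marks every row that has at least one identical twin
dupes = toy[toy.duplicated(keep=False)]

# Sort by all columns so identical rows appear together for inspection
grouped = dupes.sort_values(list(toy.columns))
print(grouped)
```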

Removing Duplicates from the Dataset

- The following example shows how to remove duplicates from a dataset.
- Our datasets contain no duplicates, so this step is for illustration only.
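By default, drop_duplicates() compares all columns and keeps the first row of each duplicate group. Its keep and subset parameters change which rows survive and which columns define a duplicate. A sketch on hypothetical toy values (the column names are only for illustration):

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 are identical
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8, 10.0],
    "quality": [5, 6, 6, 6],
})

# keep="first" (default): the first row of each duplicate group survives
first = toy.drop_duplicates()

# keep="last": the last row of each duplicate group survives
last = toy.drop_duplicates(keep="last")

# keep=False: every row that has a duplicate is dropped
no_dupes = toy.drop_duplicates(keep=False)

# subset: only the listed columns decide what counts as a duplicate
by_quality = toy.drop_duplicates(subset=["quality"])

print(list(first.index), list(last.index), list(no_dupes.index), list(by_quality.index))
# [0, 1, 3] [0, 2, 3] [0, 3] [0, 1]
```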

Example: Removing Duplicates from the Dataset

In [115]:
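# Drop duplicate rows, keeping the first occurrence of each
df = df.drop_duplicates()

# After removing duplicates, you can export the result to the existing file or to a new one.
# Here it is written back to the existing file; index=False stops the row index from
# being saved as an extra column.
df.to_csv('wine.csv', index=False)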
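Writing back to a CSV file deserves one check: by default, to_csv() also writes the DataFrame's row index as an unnamed first column, which read_csv() later loads as "Unnamed: 0". The sketch below uses an in-memory buffer (io.StringIO) and hypothetical toy values instead of wine.csv so it runs anywhere:

```python
import io

import pandas as pd

# Hypothetical toy data standing in for the cleaned dataset
toy = pd.DataFrame({"alcohol": [9.4, 9.8], "quality": [5, 6]})

# Default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written, so the file round-trips cleanly
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'alcohol', 'quality']
print(cols_without)  # ['alcohol', 'quality']
```

This matters more after drop_duplicates(), because the remaining index has gaps that would otherwise be saved into the file.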