Dealing with Missing Data
1. Check for missing values -> df.isnull().sum() (returns the number of missing values in each column)
2. Decide what to do with the missing values:
- A. Remove them using -> df.dropna(axis=0)
- axis {0 or 'index', 1 or 'columns'}, default 0
0, or 'index' : Drop rows which contain missing values
1, or 'columns' : Drop columns which contain missing values
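- B. Replace them with the average (mean) of each column
df.fillna(df.mean(numeric_only=True), inplace=True)
-> numeric_only=True restricts the mean to numeric columns; recent pandas versions raise an error if df.mean() is called on a frame with non-numeric columns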
- C. Replace them with zeros, or Forward Fill (ffill) or Back Fill (bfill)
1. Replace with zeros
df['number_of_fires'].fillna(0)
-> returns a new Series; assign it back to df['number_of_fires'] to keep the result
2. Replace with Forward Fill (ffill)
- Forward Fill df.ffill(axis=0)
-> any missing value is filled with the value in the previous row
- Forward Fill df.ffill(axis=1)
-> any missing value is filled with the value in the previous column
3. Replace with Back Fill (bfill)
- Back Fill df.bfill(axis=0)
-> any missing value is filled with the value in the next row
- Back Fill df.bfill(axis=1)
-> any missing value is filled with the value in the next column
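The replacement options B and C can be compared side by side on a small frame. This is only a sketch: the toy data and the area_burned column are made up for illustration, and only the number_of_fires column name comes from these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; area_burned is an invented column for illustration.
df = pd.DataFrame({
    "number_of_fires": [10.0, np.nan, 30.0, np.nan],
    "area_burned": [1.0, 2.0, np.nan, 4.0],
})

# B. Replace each missing value with the mean of its own column.
mean_filled = df.fillna(df.mean(numeric_only=True))

# C.1 Replace missing values in one column with zero (returns a new Series).
zero_filled = df["number_of_fires"].fillna(0)

# C.2 Forward fill down the rows: each gap takes the value from the previous row.
ffilled = df.ffill(axis=0)

# C.3 Back fill up the rows: each gap takes the value from the next row.
bfilled = df.bfill(axis=0)

print(mean_filled)
print(zero_filled)
print(ffilled)
print(bfilled)
```

Note that a gap in the last row has no next row to copy from, so it is still missing after bfill(axis=0); likewise a gap in the first row survives ffill(axis=0).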
- Here, we dropped the rows that contain missing values:
df.dropna(axis=0)
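A minimal sketch of the check-then-drop workflow follows. The toy data and the state and year columns are hypothetical, chosen only to show how axis=0 and axis=1 differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; state and year are invented columns for illustration.
df = pd.DataFrame({
    "number_of_fires": [10.0, np.nan, 30.0],
    "state": ["Acre", "Bahia", None],
    "year": [2015, 2016, 2017],
})

# Step 1: count the missing values in each column.
missing_per_column = df.isnull().sum()

# Option A with axis=0: drop every row that contains a missing value.
rows_dropped = df.dropna(axis=0)

# Option A with axis=1: drop every column that contains a missing value.
cols_dropped = df.dropna(axis=1)

print(missing_per_column)
print(rows_dropped)
print(cols_dropped)
```

dropna returns a new DataFrame and leaves df unchanged, so the result has to be assigned (for example df = df.dropna(axis=0)) for the drop to persist.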