Search
drop

Data Science

Deleting rows and columns with missing values:

In [57]:
import numpy as np

import pandas as pd

from pandas import Series, DataFrame

# Creating DataFrame1

# We put np.nan in nan as it is easier to write nan than np.nan 
nan=np.nan

#nan is a missing value in np.array([nan,2,5]) and np.array([nan,nan,nan])  
df1 = DataFrame ({ 'Category': (['Books', 'Computers', 'Home']), 
                   'sales_Number': np.array([nan,3,5]),
                   'Purchase_Date': pd.Timestamp('20200212'),
                   'Customers_number': np.array([nan,nan,nan]) 
            
                 })

df1
Out[57]:
Category sales_Number Purchase_Date Customers_number
0 Books NaN 2020-02-12 NaN
1 Computers 3.0 2020-02-12 NaN
2 Home 5.0 2020-02-12 NaN

Delete Columns with missing values:

Columns with missing values: sales_Number & Customers_number)

df1.dropna(axis=1)

or

df1.dropna(axis=1, how='any')

It drops even the columns with just one missing value like sales_number

In [49]:
df1.dropna(axis=1)
Out[49]:
Category Purchase_Date
0 Books 2020-02-12
1 Computers 2020-02-12
2 Home 2020-02-12
In [58]:
# when you add how=any, the result is same as df1.dropna(axis=1)

df1.dropna(axis=1, how='any')
Out[58]:
Category Purchase_Date
0 Books 2020-02-12
1 Computers 2020-02-12
2 Home 2020-02-12

Delete Columns with missing values in all rows:

Columns with all the rows empty: Customers_number

df1.dropna(axis=1, how='all')

In [51]:
df1.dropna(axis=1, how='all')
Out[51]:
Category sales_Number Purchase_Date
0 Books NaN 2020-02-12
1 Computers 3.0 2020-02-12
2 Home 5.0 2020-02-12

Delete all rows with missing values:

df1.dropna(how='any')

It drops even the rows with just one missing value like row 0 and row 1

In [52]:
#shows the original table

df1
Out[52]:
Category sales_Number Purchase_Date Customers_number
0 Books NaN 2020-02-12 NaN
1 Computers 3.0 2020-02-12 NaN
2 Home 5.0 2020-02-12 NaN
In [53]:
df1.dropna(how='any')
Out[53]:
Category sales_Number Purchase_Date Customers_number

Delete rows with all columns have missing values:

In [54]:
df1.dropna(how='all')
Out[54]:
Category sales_Number Purchase_Date Customers_number
0 Books NaN 2020-02-12 NaN
1 Computers 3.0 2020-02-12 NaN
2 Home 5.0 2020-02-12 NaN

Result: Nothing happens as all the rows have at least one value

Delete rows with a rule : thresh= X

Example:

thresh=4 -> delete all rows with less than 4 value : all rows will be deleted as all rows have a least one missing value

thresh=3 -> delete all rows with less than 3 value like Books (it has two values)

In [55]:
df1.dropna(thresh=4)
Out[55]:
Category sales_Number Purchase_Date Customers_number
In [56]:
df1.dropna(thresh=3)
Out[56]:
Category sales_Number Purchase_Date Customers_number
1 Computers 3.0 2020-02-12 NaN
2 Home 5.0 2020-02-12 NaN