Data Science
Deleting rows and columns with missing values:
In [57]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Create DataFrame df1
# Alias np.nan as nan so it is shorter to type
nan = np.nan
# nan marks a missing value in np.array([nan, 3, 5]) and np.array([nan, nan, nan])
df1 = DataFrame({'Category': ['Books', 'Computers', 'Home'],
                 'sales_Number': np.array([nan, 3, 5]),
                 'Purchase_Date': pd.Timestamp('20200212'),  # the same date is repeated in every row
                 'Customers_number': np.array([nan, nan, nan])
                 })
df1
Out[57]:
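Every dropna result below depends on how many values are missing in each column and in each row. A minimal sketch for checking those counts, assuming the same df1 rebuilt here (the names missing_per_column and present_per_row are only illustrative), is to sum the boolean masks returned by isna() and notna():

```python
import numpy as np
import pandas as pd

nan = np.nan
df1 = pd.DataFrame({
    'Category': ['Books', 'Computers', 'Home'],
    'sales_Number': np.array([nan, 3, 5]),
    'Purchase_Date': pd.Timestamp('20200212'),
    'Customers_number': np.array([nan, nan, nan]),
})

# isna() is True where a value is missing; summing counts missing values per column
missing_per_column = df1.isna().sum()

# notna() is True where a value is present; axis=1 counts present values per row
present_per_row = df1.notna().sum(axis=1)

print(missing_per_column.tolist())  # [0, 1, 0, 3]
print(present_per_row.tolist())     # [2, 3, 3]
```

Books has only two values (Category and Purchase_Date), while Computers and Home have three, and Customers_number is missing in every row. These counts explain each of the results that follow.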
In [49]:
# axis=1 drops every column that contains at least one missing value
df1.dropna(axis=1)
Out[49]:
In [58]:
# how='any' is the default, so the result is the same as df1.dropna(axis=1)
df1.dropna(axis=1, how='any')
Out[58]:
In [51]:
# how='all' drops a column only if every value in it is missing (here only Customers_number)
df1.dropna(axis=1, how='all')
Out[51]:
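The two column rules can also be written as boolean masks over the columns. A sketch, assuming the same df1 rebuilt below: how='any' keeps a column only if all of its values are present, while how='all' keeps it if at least one value is present. The variable names are illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

nan = np.nan
df1 = pd.DataFrame({
    'Category': ['Books', 'Computers', 'Home'],
    'sales_Number': np.array([nan, 3, 5]),
    'Purchase_Date': pd.Timestamp('20200212'),
    'Customers_number': np.array([nan, nan, nan]),
})

# how='any' (the default): keep only columns with no missing value at all
by_any = df1.dropna(axis=1, how='any')
mask_any = df1.loc[:, df1.notna().all()]

# how='all': drop a column only when every one of its values is missing
by_all = df1.dropna(axis=1, how='all')
mask_all = df1.loc[:, df1.notna().any()]

print(by_any.columns.tolist())  # ['Category', 'Purchase_Date']
print(by_all.columns.tolist())  # ['Category', 'sales_Number', 'Purchase_Date']
```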
In [52]:
# dropna returns a new DataFrame, so the original table is unchanged
df1
Out[52]:
In [53]:
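# Drop every row with at least one missing value: every row has NaN in Customers_number, so every row is dropped
df1.dropna(how='any')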
Out[53]:
Delete rows in which all columns have missing values:
In [54]:
df1.dropna(how='all')
Out[54]:
Result: nothing is deleted, because every row has at least one non-missing value
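The row rules work the same way along the default axis=0. The sketch below rebuilds the same df1, checks both how values, and also confirms that dropna returns a new DataFrame by default instead of editing df1, which is why the original table is still intact after these calls.

```python
import numpy as np
import pandas as pd

nan = np.nan
df1 = pd.DataFrame({
    'Category': ['Books', 'Computers', 'Home'],
    'sales_Number': np.array([nan, 3, 5]),
    'Purchase_Date': pd.Timestamp('20200212'),
    'Customers_number': np.array([nan, nan, nan]),
})

# how='any': every row has NaN in Customers_number, so every row is dropped
rows_any = df1.dropna(how='any')

# how='all': no row is entirely NaN, so nothing is dropped
rows_all = df1.dropna(how='all')

# df1 itself keeps all 3 rows and 4 columns
print(rows_any.shape, rows_all.shape, df1.shape)  # (0, 4) (3, 4) (3, 4)
```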
Delete rows with a threshold rule: thresh=X
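Example (thresh=X keeps only the rows that have at least X non-missing values):
thresh=4 -> delete all rows with fewer than 4 values: every row is deleted, because df1 has only 4 columns and every row has at least one missing value
thresh=3 -> delete all rows with fewer than 3 values, such as Books (it has only two values)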
In [55]:
df1.dropna(thresh=4)
Out[55]:
In [56]:
df1.dropna(thresh=3)
Out[56]:
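thresh=X counts the non-missing values in each row and keeps the rows that reach X. A minimal sketch of that rule, using a hypothetical helper keep_rows_with_at_least (not a pandas function) and comparing it against dropna(thresh=...) on the same df1:

```python
import numpy as np
import pandas as pd

nan = np.nan
df1 = pd.DataFrame({
    'Category': ['Books', 'Computers', 'Home'],
    'sales_Number': np.array([nan, 3, 5]),
    'Purchase_Date': pd.Timestamp('20200212'),
    'Customers_number': np.array([nan, nan, nan]),
})


def keep_rows_with_at_least(df, n):
    """Hypothetical helper: keep rows that have at least n non-missing values."""
    return df[df.notna().sum(axis=1) >= n]


for n in (3, 4):
    by_thresh = df1.dropna(thresh=n)
    by_mask = keep_rows_with_at_least(df1, n)
    # thresh=3 keeps Computers and Home; thresh=4 keeps nothing
    print(n, by_thresh['Category'].tolist(), by_mask['Category'].tolist())
```

The helper only makes the counting explicit; in practice dropna(thresh=...) is the direct way to apply the rule.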