Pandas DataFrame.replace() function:

  • Replaces values matched by a string, regex, list, dictionary, Series, number, etc.

df.replace(to_replace="x", value="y") -> replacing one value x with y

df.replace(to_replace=["x", "z"], value="y") -> replacing more than one value x & z with y

df.replace(to_replace=np.nan, value="x") -> replacing missing value with x

  • The keywords 'to_replace=' and 'value=' are optional; both arguments can be passed positionally

df.replace("x", "y") -> replacing one value x with y

df.replace(["x", "z"], "y") -> replacing more than one value x & z with y

df.replace(np.nan, "x") -> replacing missing value with x
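
To check that the keyword and positional forms really behave the same, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are assumptions for illustration only; they are not taken from bank.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, standing in for a real dataset
toy = pd.DataFrame({
    "job": ["unemployed", "services", "management"],
    "marital": ["married", np.nan, "single"],
})

# Keyword form and positional form of the same replacement
with_keywords = toy.replace(to_replace="unemployed", value="jobless")
positional = toy.replace("unemployed", "jobless")

# A list of values mapped to one replacement value
merged = toy.replace(["married", "single"], "private")

print(with_keywords.equals(positional))
print(merged)
```

Both calls should produce identical frames, so the shorter positional form is used freely in the later cells.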

In [36]:
import numpy as np
import pandas as pd 

df = pd.read_csv("bank.csv", sep=';') 

# Showing first 5 rows
df[:5]
Out[36]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 unemployed married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services NaN secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management single tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
3 30 management married tertiary no 1476 yes yes unknown 3 jun 199 4 -1 0 unknown no
4 59 blue-collar married secondary no 0 yes no unknown 5 may 226 1 -1 0 unknown no
In [35]:
# replacing "unemployed" with "jobless"

df.replace(to_replace="unemployed", value="jobless")[:5]
Out[35]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 jobless married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services NaN secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management single tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
3 30 management married tertiary no 1476 yes yes unknown 3 jun 199 4 -1 0 unknown no
4 59 blue-collar married secondary no 0 yes no unknown 5 may 226 1 -1 0 unknown no
In [37]:
# replacing "married", "single" with "private"

df.replace(to_replace=["married", "single"], value="private")[:5]
Out[37]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 unemployed private primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services NaN secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management private tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
3 30 management private tertiary no 1476 yes yes unknown 3 jun 199 4 -1 0 unknown no
4 59 blue-collar private secondary no 0 yes no unknown 5 may 226 1 -1 0 unknown no
In [33]:
# replacing missing values (NaN) in the dataframe with the value "unknown"

# see the second row (index 1): the NaN in the marital column becomes "unknown"

df.replace(to_replace=np.nan, value="unknown")[:5]
Out[33]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 unemployed married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services unknown secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management single tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
3 30 management married tertiary no 1476 yes yes unknown 3 jun 199 4 -1 0 unknown no
4 59 blue-collar married secondary no 0 yes no unknown 5 may 226 1 -1 0 unknown no
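
Note that replace() returns a new DataFrame by default and leaves the original untouched, which is why every cell above still starts from the unmodified df. A minimal sketch of that behaviour, using a made-up `toy` frame (an assumption for illustration, not the bank.csv data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
toy = pd.DataFrame({
    "job": ["unemployed", "services"],
    "marital": ["married", np.nan],
})

# replace() returns a new frame; toy itself is not modified
filled = toy.replace(np.nan, "unknown")

print(toy["marital"].isna().sum())     # missing values still present in toy
print(filled["marital"].isna().sum())  # missing values gone in the copy

# To keep the change, assign the result back to a name
toy = toy.replace(np.nan, "unknown")
```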

Examples: omitting 'to_replace =' and 'value ='; the result is the same

In [30]:
df.replace("unemployed", "jobless")[:3]
Out[30]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 jobless married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services NaN secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management single tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
In [31]:
df.replace(np.nan,"unknown")[:3]
Out[31]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 unemployed married primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services unknown secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management single tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
In [32]:
df.replace(["married", "single"],"private")[:3]
Out[32]:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome y
0 30 unemployed private primary no 1787 no no cellular 19 oct 79 1 -1 0 unknown no
1 33 services NaN secondary no 4789 yes yes cellular 11 may 220 1 339 4 failure no
2 35 management private tertiary no 1350 yes no cellular 16 apr 185 1 330 1 failure no
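
The introduction lists dictionaries and regular expressions among the accepted inputs, but the cells above only cover single values, lists and NaN. The following is a minimal sketch of the other two forms on a made-up `toy` frame; the data, the regex pattern and the replacement strings are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data, standing in for a real dataset
toy = pd.DataFrame({
    "job": ["unemployed", "services", "unknown"],
    "marital": ["married", "single", "married"],
})

# Dictionary: each key is replaced by its own value
by_dict = toy.replace({"married": "wed", "single": "unwed"})

# Nested dictionary: restrict the replacement to one column
by_column = toy.replace({"marital": {"married": "private", "single": "private"}})

# Regex: replace every string that starts with "un"
by_regex = toy.replace(to_replace=r"^un.*", value="other", regex=True)

print(by_dict)
print(by_column)
print(by_regex)
```

The nested dictionary form is handy when the same string appears in several columns but should only change in one of them.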