Case Study (autos): Handling Missing Values

In [133]:
# Import the CSV file (the raw file has no header row)

import pandas as pd
import numpy as np

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

df = pd.read_csv(url, header=None)


# Add the column headers

headers = ["symboling", "normalized_losses", "make", "fuel-type", "aspiration", "num_of_doors", "body_style","drive_wheels",
           "engine_location", "wheel_base", "length", "width", "height", "curb_weight", "engine_type", "num_of_cylinders",
           "engine_size", "fuel_system",  "bore",  "stroke", "compression_ratio", "horsepower",  "peak_rpm",  "city_mpg",
           "highway_mpg", "price" ]

df.columns = headers


# Show the first 5 rows

df.head(5)
Out[133]:
symboling normalized_losses make fuel-type aspiration num_of_doors body_style drive_wheels engine_location wheel_base ... engine_size fuel_system bore stroke compression_ratio horsepower peak_rpm city_mpg highway_mpg price
0 3 ? alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111 5000 21 27 13495
1 3 ? alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111 5000 21 27 16500
2 1 ? alfa-romero gas std two hatchback rwd front 94.5 ... 152 mpfi 2.68 3.47 9.0 154 5000 19 26 16500
3 2 164 audi gas std four sedan fwd front 99.8 ... 109 mpfi 3.19 3.40 10.0 102 5500 24 30 13950
4 2 164 audi gas std four sedan 4wd front 99.4 ... 136 mpfi 3.19 3.40 8.0 115 5500 18 22 17450

5 rows × 26 columns

Checking columns with missing values

In [134]:
df.columns[df.isnull().any()]
Out[134]:
Index([], dtype='object')
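
The empty Index does not mean the data is complete: this dataset marks missing values with the string '?', which isnull() treats as an ordinary value. A minimal sketch of counting the placeholders instead, using a hypothetical three-row toy frame (the names toy, placeholder_counts and columns_with_placeholder are illustrative only, not part of the notebook):

```python
import pandas as pd

# Hypothetical toy frame that mimics the autos data: '?' marks a missing value
toy = pd.DataFrame({
    "normalized_losses": ["?", "164", "?"],
    "make": ["alfa-romero", "audi", "audi"],
    "price": ["13495", "?", "17450"],
})

# isnull() reports nothing, because '?' is a regular string and not NaN
nan_columns = list(toy.columns[toy.isnull().any()])

# Count the '?' placeholders per column instead
placeholder_counts = (toy == "?").sum()
columns_with_placeholder = list(placeholder_counts[placeholder_counts > 0].index)

print(nan_columns)
print(columns_with_placeholder)
```

The same comparison, (df == "?").sum(), should work on the full df, assuming '?' is the only placeholder used in the file.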

Handling Missing Values

1. Replacing question mark (?) with NaN -> replace()
2. Converting object to a numeric dtype -> astype()
3. Dropping rows with missing values -> dropna()
In [135]:
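df["normalized_losses"] = df["normalized_losses"].replace({'?': np.nan}).astype(float)

# Assigning a Series back to df realigns on the index, so a Series-level
# dropna().astype(int) would be undone (NaN returns, dtype becomes float).
df["price"] = df["price"].replace({'?': np.nan}).astype(float)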
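
Note on dropna() and assignment: when a Series is assigned back to a DataFrame column, pandas aligns it on the index. Rows removed by a Series-level dropna() therefore come back as NaN, and the column ends up as float, which is why price prints as 13950.0 in the output below. A minimal sketch on a hypothetical toy frame (toy, realigned and dropped are illustrative names), contrasting this with dropping the rows on the DataFrame via dropna(subset=...) before converting to int:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one price is missing and marked with '?'
toy = pd.DataFrame({
    "make": ["audi", "bmw", "isuzu"],
    "price": ["13950", "?", "16430"],
})

# Series-level dropna(): the row is gone from the Series only
cleaned = toy["price"].replace("?", np.nan).dropna().astype(int)

# Assigning back aligns on the index, so row 1 returns as NaN (float column)
realigned = toy.copy()
realigned["price"] = cleaned

# Row-level drop: remove the rows from the frame first, then convert to int
dropped = toy.copy()
dropped["price"] = dropped["price"].replace("?", np.nan)
dropped = dropped.dropna(subset=["price"])
dropped["price"] = dropped["price"].astype(int)

print(realigned)
print(dropped)
```

Which variant fits depends on whether rows without a price should stay in the data. The row-level variant removes them for good; the notebook keeps them and filters only when displaying.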
In [137]:
# Show the first 5 rows without NaN (dropna() returns a copy; df is unchanged)

df.dropna().head(5)
Out[137]:
symboling normalized_losses make fuel-type aspiration num_of_doors body_style drive_wheels engine_location wheel_base ... engine_size fuel_system bore stroke compression_ratio horsepower peak_rpm city_mpg highway_mpg price
3 2 164.0 audi gas std four sedan fwd front 99.8 ... 109 mpfi 3.19 3.40 10.0 102 5500 24 30 13950.0
4 2 164.0 audi gas std four sedan 4wd front 99.4 ... 136 mpfi 3.19 3.40 8.0 115 5500 18 22 17450.0
6 1 158.0 audi gas std four sedan fwd front 105.8 ... 136 mpfi 3.19 3.40 8.5 110 5500 19 25 17710.0
8 1 158.0 audi gas turbo four sedan fwd front 105.8 ... 131 mpfi 3.13 3.40 8.3 140 5500 17 20 23875.0
10 2 192.0 bmw gas std two sedan rwd front 101.2 ... 108 mpfi 3.50 2.80 8.8 101 5800 23 29 16430.0

5 rows × 26 columns
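
As an alternative to replace(), the placeholder can be declared when the file is read, through the na_values parameter of read_csv. A minimal sketch, assuming '?' is the only missing-value marker; the three-row excerpt and the names raw and toy are hypothetical and only mimic the layout of the file:

```python
import io
import pandas as pd

# Hypothetical excerpt in the same header-less, comma-separated layout
raw = io.StringIO("164,audi,13950\n?,bmw,?\n192,bmw,16430\n")

# na_values="?" makes read_csv parse every '?' as NaN,
# so the numeric columns are read as float straight away
toy = pd.read_csv(
    raw,
    header=None,
    names=["normalized_losses", "make", "price"],
    na_values="?",
)

# The isnull() check now finds the affected columns
missing_columns = list(toy.columns[toy.isnull().any()])
print(missing_columns)
```

With this option the isnull() check from In [134] would list the affected columns directly, and the replace() step is no longer needed.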