Case Study (autos): Converting datatypes

In [37]:
import pandas as pd
import numpy as np

df = pd.read_csv('carsinfo.csv')

Replacing "?" with NaN

As we discussed before, we should handle "?" in the following columns before converting their datatypes:

normalized_losses     object
bore                  object
stroke                object
horsepower            object
peak_rpm              object
price                 object

Therefore, let's get rid of "?" by replacing it with NaN, and then convert the datatypes.
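A minimal sketch of the idea on a toy column may help here. The values below are made up for illustration and are not taken from carsinfo.csv. A single "?" is enough to keep the whole column as object, and replacing it with NaN lets the cast to float succeed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; the values are illustrative only.
raw = pd.Series(["3.47", "?", "2.68"], dtype=object, name="bore")
print(raw.dtype)  # object, because the values are strings

# Replace the placeholder with NaN, then cast the remaining strings to float.
clean = raw.replace("?", np.nan).astype(float)
print(clean.dtype)         # float64
print(clean.isna().sum())  # 1 missing value, where "?" used to be
```

Casting without the replacement step would fail, since "?" cannot be parsed as a number.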

Replacing "?" with NaN and converting datatypes

In [38]:
# "?" marks a missing value: replace it with NaN, then cast each column to float.
# NaN is itself a float, so normalized_losses cannot be stored as int64 here.
numeric_cols = ["normalized_losses", "bore", "stroke", "horsepower", "peak_rpm", "price"]
for col in numeric_cols:
    df[col] = df[col].replace("?", np.nan).astype(float)

df.head()
Out[38]:
symboling normalized_losses make fuel_type aspiration num_of_doors body_style drive_wheels engine_location wheel_base ... engine_size fuel_system bore stroke compression_ratio horsepower peak_rpm city_mpg highway_mpg price
0 3 NaN alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111.0 5000.0 21 27 13495.0
1 3 NaN alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111.0 5000.0 21 27 16500.0
2 1 NaN alfa-romero gas std two hatchback rwd front 94.5 ... 152 mpfi 2.68 3.47 9.0 154.0 5000.0 19 26 16500.0
3 2 164.0 audi gas std four sedan fwd front 99.8 ... 109 mpfi 3.19 3.40 10.0 102.0 5500.0 24 30 13950.0
4 2 164.0 audi gas std four sedan 4wd front 99.4 ... 136 mpfi 3.19 3.40 8.0 115.0 5500.0 18 22 17450.0

5 rows × 26 columns
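The NaN entries and the trailing ".0" in normalized_losses above follow from NaN being a float value: an int64 column cannot hold it. The sketch below, on made-up toy data, shows that dropping the missing rows before casting to int does not change this once the result is assigned back, because assignment aligns on the index. It also shows pandas' nullable "Int64" dtype as one possible option if integer values are wanted; that choice is an assumption and not something this case study uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame(
    {"normalized_losses": pd.Series(["?", "164", "158"], dtype=object)}
)

# dropna() removes row 0, so the integer Series only has index labels 1 and 2.
ints = toy["normalized_losses"].replace("?", np.nan).dropna().astype(int)

# Assigning back aligns on the index: row 0 has no value and becomes NaN,
# which forces the whole column to float64.
toy["normalized_losses"] = ints
print(toy["normalized_losses"].dtype)

# Optional: the nullable "Int64" dtype keeps integers and marks gaps as <NA>.
nullable = toy["normalized_losses"].astype("Int64")
print(nullable.dtype)
```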

In [39]:
# Let's check the datatypes again to see the modifications

df.dtypes
Out[39]:
symboling              int64
normalized_losses    float64
make                  object
fuel_type             object
aspiration            object
num_of_doors          object
body_style            object
drive_wheels          object
engine_location       object
wheel_base           float64
length               float64
width                float64
height               float64
curb_weight            int64
engine_type           object
num_of_cylinders      object
engine_size            int64
fuel_system           object
bore                 float64
stroke               float64
compression_ratio    float64
horsepower           float64
peak_rpm             float64
city_mpg               int64
highway_mpg            int64
price                float64
dtype: object
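As an alternative sketch, the placeholder can also be handled when the file is read: read_csv accepts an na_values argument that lists extra strings to treat as missing. The in-memory CSV below is a made-up stand-in for carsinfo.csv, so the column values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for carsinfo.csv; values are illustrative only.
csv_text = (
    "make,horsepower,price\n"
    "alfa-romero,111,13495\n"
    "audi,?,17450\n"
    "bmw,101,?\n"
)

# na_values="?" tells read_csv to treat "?" as missing while parsing,
# so the numeric columns are never read as object in the first place.
toy = pd.read_csv(io.StringIO(csv_text), na_values="?")

print(toy.dtypes)        # horsepower and price come out as float64
print(toy.isna().sum())  # number of missing values per column
```

Counting the missing values per column with isna().sum() is a quick way to confirm how many "?" entries each column contained.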