Case Study (autos) : Checking Data Types

In [14]:
import pandas as pd
import numpy as np

df= pd.read_csv('carsinfo.csv')

Data Types : df.dtypes

In [53]:
df.dtypes
Out[53]:
symboling              int64
normalized_losses     object
make                  object
fuel-type             object
aspiration            object
num_of_doors          object
body_style            object
drive_wheels          object
engine_location       object
wheel_base           float64
length               float64
width                float64
height               float64
curb_weight            int64
engine_type           object
num_of_cylinders      object
engine_size            int64
fuel_system           object
bore                  object
stroke                object
compression_ratio    float64
horsepower            object
peak_rpm              object
city_mpg               int64
highway_mpg            int64
price                 object
dtype: object

Result : Following columns should be Numeric not object

normalized_losses     object

bore                  object

stroke                object

horsepower            object

peak_rpm              object

price                 object

key point:

- Before converting data types, we should deal with missing values, otherwise

- You will see error "ValueError: could not convert string to float: '?'"

- Check first three rows of "normalized_losses" to see ? in dataset
In [13]:
df.head() 
Out[13]:
symboling normalized_losses make fuel_type aspiration num_of_doors body_style drive_wheels engine_location wheel_base ... engine_size fuel_system bore stroke compression_ratio horsepower peak_rpm city_mpg highway_mpg price
0 3 ? alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111 5000 21 27 13495
1 3 ? alfa-romero gas std two convertible rwd front 88.6 ... 130 mpfi 3.47 2.68 9.0 111 5000 21 27 16500
2 1 ? alfa-romero gas std two hatchback rwd front 94.5 ... 152 mpfi 2.68 3.47 9.0 154 5000 19 26 16500
3 2 164 audi gas std four sedan fwd front 99.8 ... 109 mpfi 3.19 3.40 10.0 102 5500 24 30 13950
4 2 164 audi gas std four sedan 4wd front 99.4 ... 136 mpfi 3.19 3.40 8.0 115 5500 18 22 17450

5 rows × 26 columns

In [12]:
df["normalized_losses"].astype(float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-c32c4d90ac0c> in <module>
----> 1 df["price"].astype(float)

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5696         else:
   5697             # else, only a single dtype is given
-> 5698             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
   5699             return self._constructor(new_data).__finalize__(self)
   5700 

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    580 
    581     def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    583 
    584     def convert(self, **kwargs):

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444 

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    623             vals1d = values.ravel()
    624             try:
--> 625                 values = astype_nansafe(vals1d, dtype, copy=True)
    626             except (ValueError, TypeError):
    627                 # e.g. astype_nansafe can fail on object-dtype of strings

~/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    895     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    896         # Explicit copy, or required since NumPy can't view from / to object.
--> 897         return arr.astype(dtype, copy=True)
    898 
    899     return arr.view(dtype)

ValueError: could not convert string to float: '?'