Case Study (Salary & Position):

Sklearn (Polynomial Linear Regression)

- Considering the following scatter plot:

    - A Simple Linear Regression does not fit the data well

    - Solution: a Polynomial Linear Regression fits the data much better
In [11]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X, y)

plt.scatter(X, y, color = 'red')

plt.plot(X, model.predict(X), color = 'blue')

plt.title('Salary vs Position Level (Simple Linear Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')

plt.show()
- Polynomial Linear Regression   

    - It is a special case of Multiple Linear Regression


        1. Creating LinearRegression Model

            -> model = LinearRegression()

        2. Transforming features (input variables) to polynomial features

            ->  poly_features = PolynomialFeatures(degree = 4)

            ->  X_poly = poly_features.fit_transform(X)

        3. Fitting the model with polynomial features 

            -> model.fit(X_poly, y)
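The three steps above can be sketched end-to-end as follows. The data here is made up purely for illustration (the notebook's real dataset is loaded later from a URL):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up 1-D feature and target, for illustration only
X = np.arange(1, 11).reshape(-1, 1)
y = X.ravel() ** 3 + 5.0

# 1. Creating the LinearRegression model
model = LinearRegression()

# 2. Transforming the feature into polynomial features (1, x, x^2, x^3, x^4)
poly_features = PolynomialFeatures(degree=4)
X_poly = poly_features.fit_transform(X)

# 3. Fitting the model on the polynomial features
model.fit(X_poly, y)

print(X_poly.shape)  # (10, 5): one column per power x^0 .. x^4
```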


- Polynomial Linear Regression examples:

    - Describing how diseases spread

    - Describing how pandemics & epidemics spread across a territory or population


- Polynomial Equation:

    - y = b0 + b1 x1 + b2 x1^2 + b3 x1^3 + ... + bn x1^n 


- Why is it still called a Linear regression?

    - The relationship between Y and X is non-linear

    - But the regression function itself is linear: it is a linear combination of the coefficients b0 ... bn
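One way to see this: the same polynomial fit can be reproduced by ordinary least squares on the expanded design matrix, because the model is an ordinary linear system in the coefficients. A sketch with made-up cubic data (not from the notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up data: y is a cubic function of x
x = np.linspace(1, 10, 20).reshape(-1, 1)
y = 2 + 3 * x.ravel() - 0.5 * x.ravel() ** 3

# Expand x into [1, x, x^2, x^3]: y = b0 + b1*x + b2*x^2 + b3*x^3 is
# non-linear in x but linear in the coefficients b0..b3
X_poly = PolynomialFeatures(degree=3).fit_transform(x)

# Ordinary least squares solves the same linear system directly
coef_ols, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

# fit_intercept=False because X_poly already contains the bias column of ones
model = LinearRegression(fit_intercept=False).fit(X_poly, y)

print(np.allclose(coef_ols, model.coef_))  # both solve the same linear problem
```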

Overview

- Importing the Relevant Libraries

- Loading the Data

- Declaring the Dependent and the Independent variables

- Splitting the dataset into the Training set and Test set

- Polynomial Regression Model

    - Polynomial Features Transform
    - Creating a Linear Regression 
    - Fitting The Model
    - Predicting the Results
    - Making a Single Observation Prediction

- Visualising the Polynomial Linear Regression 

- Visualising the Polynomial Linear Regression (Higher Resolution)

- Comparing the Results with Simple Linear Regression 

    - Simple Linear Regression 

        - Creating & Training the Model

        - Predicting a Single Observation

    - Visualising the Simple Linear Regression

Importing the Relevant Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Loading the Data

In [2]:
url = "https://DataScienceSchools.github.io/Machine_Learning/Regression_Models_Intuition/Position_Salaries.csv"

dataset = pd.read_csv(url)

dataset
Out[2]:
   Position           Level  Salary
0  Business Analyst   1      45000
1  Junior Consultant  2      50000
2  Senior Consultant  3      60000
3  Manager            4      80000
4  Country Manager    5      110000
5  Region Manager     6      150000
6  Partner            7      200000
7  Senior Partner     8      300000
8  C-level            9      500000
9  CEO                10     1000000

Declaring the Dependent and the Independent variables

- Exclude Position (index 0)
- Position and Level encode the same information, so Level alone is used
In [3]:
X = dataset.iloc[:, 1:-1].values

y = dataset.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set

- The dataset is small (only 10 observations), so we will not split it into a training set & test set

Polynomial Regression Model

Polynomial Features Transform

- Transforming features (input variables) to polynomial features

- degree = 4  -> polynomial terms up to x1 to the power of 4

- y = b0 + b1 x1 + b2 x1^2 + b3 x1^3 + b4 x1^4 

- PolynomialFeatures Class from preprocessing Module of sklearn Library

- poly_features -> Object of PolynomialFeatures Class

- poly_features.fit_transform(X) -> fitting & transforming X in one step
In [4]:
from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree = 4)

X_poly = poly_features.fit_transform(X)
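As a quick illustration (not part of the notebook) of what the transform produces: with degree = 4, each single level value is expanded into the five columns 1, x, x^2, x^3, x^4.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=4)

# A single level value, e.g. 3, expands into [1, 3, 9, 27, 81]
row = poly.fit_transform(np.array([[3.0]]))
print(row)  # [[ 1.  3.  9. 27. 81.]]
```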

Creating a Linear Regression

- LinearRegression Class from linear_model Module of sklearn Library

- model -> Object of LinearRegression Class
In [5]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()

Fitting The Model

- fit method -> training the model
In [6]:
model.fit(X_poly, y)
Out[6]:
LinearRegression()

Predicting the Results

- y_pred -> the predicted salaries
In [7]:
y_pred = model.predict(X_poly)

Making a Single Observation Prediction

- level: 6.5 -> Salary = 158,862

- fit_transform method accepts a 2D array -> [[]]
In [8]:
new_X_poly = poly_features.fit_transform([[6.5]])

model.predict(new_X_poly)
Out[8]:
array([158862.45265153])
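A side note: because poly_features has already been fitted on X, a new observation can also be expanded with transform alone, which avoids re-fitting the preprocessor. A sketch with stand-in data (the notebook's actual dataset comes from the URL above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data: levels 1..10 with a cubic salary curve
X = np.arange(1, 11).reshape(-1, 1)
y = 1000.0 * X.ravel() ** 3

poly_features = PolynomialFeatures(degree=4)
X_poly = poly_features.fit_transform(X)

model = LinearRegression().fit(X_poly, y)

# transform (rather than fit_transform) reuses the fitted feature expansion
new_X_poly = poly_features.transform([[6.5]])
print(model.predict(new_X_poly))
```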

Visualising the Polynomial Linear Regression

- Red points -> Actual Values

- Blue line  -> Predicted values
In [9]:
plt.scatter(X, y, color = 'red')

plt.plot(X, y_pred, color = 'blue')

plt.title('Salary vs Position Level (Polynomial Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')

plt.show()

Visualising the Polynomial Linear Regression (Higher Resolution)

- Revising the code for a higher resolution and a smoother curve
In [10]:
X_grid = np.arange(X.min(), X.max() + 0.1, 0.1)  # include the last level (10)
X_grid = X_grid.reshape((len(X_grid), 1))

X_poly = poly_features.fit_transform(X_grid)

plt.scatter(X, y, color = 'red')

plt.plot(X_grid, model.predict(X_poly), color = 'blue')

plt.title('Salary vs Position Level (Polynomial Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')

plt.show()