- Consider the following scatter plot:
- A Simple Linear Regression does not fit the data well
- Solution: a Polynomial Linear Regression fits the data much more closely
from sklearn.linear_model import LinearRegression
# X (position levels) and y (salaries) are defined later in this notebook
model = LinearRegression()
model.fit(X, y)
plt.scatter(X, y, color = 'red')                  # actual values
plt.plot(X, model.predict(X), color = 'blue')     # fitted straight line
plt.title('Salary vs Position Level (Simple Linear Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
- Polynomial Linear Regression
- It is a special case of Multiple Linear Regression
1. Creating LinearRegression Model
-> model = LinearRegression()
2. Transforming features (input variables) to polynomial features
-> poly_features = PolynomialFeatures(degree = 4)
-> X_poly = poly_features.fit_transform(X)
3. Fitting the model with polynomial features
-> model.fit(X_poly, y)
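- These three steps can also be chained into a single estimator with sklearn's Pipeline (a minimal sketch, assuming X and y are already defined; poly_model is a name introduced here for illustration):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_model = make_pipeline(PolynomialFeatures(degree = 4), LinearRegression())
poly_model.fit(X, y)            # transform + fit in one call
poly_model.predict([[6.5]])     # transform + predict in one call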
- Polynomial Linear Regression examples:
- Describing how diseases spread
- Describing how pandemics & epidemics spread across a territory or population
- Polynomial Equation:
- y = b0 + b1 x1 + b2 x1^2 + b3 x1^3 + ... + bn x1^n
- Why is it still called a Linear Regression?
- The relationship between y and x1 is non-linear
- But the regression function is linear: a linear combination of the coefficients b0 ... bn (see the sketch below)
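- A minimal sketch to make this concrete: PolynomialFeatures only expands the input into extra columns, and the model fitted on them is still linear in its coefficients

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree = 3)
print(poly.fit_transform(np.array([[2.0]])))
# [[1. 2. 4. 8.]] -> columns are 1, x1, x1^2, x1^3; the model is linear in these columns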
- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and the Independent variables
- Splitting the dataset into the Training set and Test set
- Polynomial Regression Model
- Polynomial Features Transform
- Creating a Linear Regression
- Fitting The Model
- Predicting the Results
- Making a Single Observation Prediction
- Visualising the Polynomial Linear Regression
- Visualising the Polynomial Linear Regression (Higher Resolution)
- Comparing the Results with Simple Linear Regression
- Simple Linear Regression
- Creating & Training the Model
- Predicting a Single Observation
- Visualising the Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = "https://DataScienceSchools.github.io/Machine_Learning/Regression_Models_Intuition/Position_Salaries.csv"
dataset = pd.read_csv(url)
dataset
- Exclude the Position column (index 0)
- Position & Level encode the same information, so Level alone is enough
X = dataset.iloc[:, 1:-1].values   # all rows, Level column only -> 2D feature matrix
y = dataset.iloc[:, -1].values     # all rows, Salary column -> 1D target vector
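- A quick sanity check (a sketch; the expected shapes assume the 10-row Position_Salaries dataset):

print(X.shape, y.shape)   # expected: (10, 1) (10,)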
- The dataset is small, so we will not split it into a training set & test set
- Transforming the features (input variables) into polynomial features
- degree = 4 -> powers of x1 up to x1^4
- y = b0 + b1 x1 + b2 x1^2 + b3 x1^3 + b4 x1^4
- PolynomialFeatures Class from the preprocessing Module of the sklearn Library
- poly_features -> Object of the PolynomialFeatures Class
- poly_features.fit_transform(X) -> fitting & transforming X in one step
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree = 4)
X_poly = poly_features.fit_transform(X)
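- Each row of X_poly now holds [1, x1, x1^2, x1^3, x1^4]; the leading 1 is the bias column PolynomialFeatures adds by default (a quick check):

print(X_poly[1])   # for level 2 this should read [ 1.  2.  4.  8. 16.]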
- LinearRegression Class from the linear_model Module of the sklearn Library
- model -> Object of the LinearRegression Class
from sklearn.linear_model import LinearRegression
model = LinearRegression()
- fit method -> training the model
model.fit(X_poly, y)
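- The fitted parameters map straight onto the equation above (a quick inspection; because LinearRegression fits its own intercept, the bias column's coefficient comes out as 0):

print(model.intercept_)   # b0
print(model.coef_)        # [0, b1, b2, b3, b4]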
- y_pred -> the predicted salaries
y_pred = model.predict(X_poly)
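- Comparing actual and predicted salaries side by side (a sketch):

print(np.c_[y, y_pred])   # column 0: actual, column 1: predicted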
- level: 6.5 -> Salary = 158,862
- transform expects a 2D array -> [[ ]]
- poly_features is already fitted on X, so transform alone is enough here
new_X_poly = poly_features.transform([[6.5]])
model.predict(new_X_poly)
- Red points -> actual values
- Blue line -> predicted values
plt.scatter(X, y, color = 'red')
plt.plot(X, y_pred, color = 'blue')
plt.title('Salary vs Position Level (Polynomial Linear Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
- Revising the code for a higher-resolution, smoother curve
X_grid = np.arange(X.min(), X.max(), 0.1)        # dense grid of levels, step 0.1
X_grid = X_grid.reshape((len(X_grid), 1))        # reshape into a 2D column
X_grid_poly = poly_features.transform(X_grid)    # polynomial features for the grid (keeps X_poly intact)
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, model.predict(X_grid_poly), color = 'blue')
plt.title('Salary vs Position Level (Polynomial Linear Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
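- Comparing the two fits numerically (a sketch, assuming the objects above are still in scope; an R^2 closer to 1 means a better fit):

from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
simple_model = LinearRegression().fit(X, y)
print('Simple R^2:    ', r2_score(y, simple_model.predict(X)))
print('Polynomial R^2:', r2_score(y, model.predict(poly_features.transform(X))))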