- Considering the scatter plot:
    - Simple Linear Regression does not fit the data well
    - Polynomial Linear Regression fits the data much more closely
- Example:
    - Describing how diseases spread 
    - Describing how pandemics & epidemics spread across a territory or population
- Polynomial Equation:
    - y = b0 + b1 x1 + b2 x1^2 + b3 x1^3 + ... + bn x1^n 
- Why is it still called a linear regression?
    - Considering the scatter plot:
        - The relationship between y and x1 is non-linear
        - A non-linear curve fits the data
    - But the regression function is linear in the coefficients
        - y is a linear combination of the coefficients b0, b1, ..., bn
        - "Linear" refers to the coefficients, not to x1
    - It is a special case of Multiple Linear Regression, with x1, x1^2, ..., x1^n as the features

- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and the Independent variables
- Splitting the dataset into the Training set and Test set
- Polynomial Regression Model
    - Polynomial Features Transform
    - Creating a Linear Regression 
    - Fitting The Model
    - Predicting the Results
    - Making a Single Observation Prediction
- Visualising the Polynomial Linear Regression 
- Visualising the Polynomial Linear Regression (Higher Resolution)
- Comparing the Results with Simple Linear Regression 
    - Simple Linear Regression 
        - Creating & Training the Model
        - Predicting a Single Observation
    - Visualising the Simple Linear Regression

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = "https://DataScienceSchools.github.io/Machine_Learning/Sklearn/Regression/Position_Salaries.csv"
dataset = pd.read_csv(url)
dataset
| | Position | Level | Salary |
|---|---|---|---|
| 0 | Business Analyst | 1 | 45000 | 
| 1 | Junior Consultant | 2 | 50000 | 
| 2 | Senior Consultant | 3 | 60000 | 
| 3 | Manager | 4 | 80000 | 
| 4 | Country Manager | 5 | 110000 | 
| 5 | Region Manager | 6 | 150000 | 
| 6 | Partner | 7 | 200000 | 
| 7 | Senior Partner | 8 | 300000 | 
| 8 | C-level | 9 | 500000 | 
| 9 | CEO | 10 | 1000000 | 
- Exclude Position (column index 0)
- Position & Level carry the same information, so Level alone is the feature

X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
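
- The slice `1:-1` (rather than the single index `1`) keeps X two-dimensional, which is the shape scikit-learn estimators expect for features.
- A minimal sketch of the difference, using a hypothetical three-row frame `demo` with the same columns as the dataset:

```python
import pandas as pd

# Hypothetical three-row frame with the same columns as Position_Salaries.csv
demo = pd.DataFrame({
    "Position": ["Business Analyst", "Junior Consultant", "Senior Consultant"],
    "Level": [1, 2, 3],
    "Salary": [45000, 50000, 60000],
})

X_2d = demo.iloc[:, 1:-1].values  # slice of columns -> 2D array, shape (3, 1)
X_1d = demo.iloc[:, 1].values     # single column    -> 1D array, shape (3,)
y_demo = demo.iloc[:, -1].values  # target column    -> 1D array, shape (3,)

print(X_2d.shape, X_1d.shape, y_demo.shape)  # (3, 1) (3,) (3,)
```
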
- Dataset is small, so we will not split it into training set & test set

- Creating new versions of the input variable
- Transforming features to polynomial features
- degree = 4 -> powers of x1 up to x1^4
- y = b0 + b1 x1 + b2 x1^2 + b3 x1^3 + b4 x1^4
- PolynomialFeatures Class from preprocessing Module of sklearn Library
- poly_features -> Object of PolynomialFeatures Class
- poly_features.fit_transform(X) -> fitting & transforming X at the same time

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree = 4)
X_poly = poly_features.fit_transform(X)
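
- To see what the transform produces, the sketch below applies it to a small hypothetical input `X_demo` (not the salary data) and compares the result with powers computed by hand.
- With the default settings and a single feature, the columns should be 1, x1, x1^2, x1^3, x1^4:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical input: three levels in a 2D column
X_demo = np.array([[1.0], [2.0], [3.0]])

poly_demo = PolynomialFeatures(degree=4)
X_demo_poly = poly_demo.fit_transform(X_demo)

# The same columns built by hand: x^0, x^1, x^2, x^3, x^4
manual = np.hstack([X_demo ** p for p in range(5)])

print(X_demo_poly)
print(np.allclose(X_demo_poly, manual))  # True
```

- The leading column of ones comes from the default `include_bias=True`.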
- LinearRegression Class from linear_model Module of sklearn Library
- model -> Object of LinearRegression Class

from sklearn.linear_model import LinearRegression
model = LinearRegression()
- fit method -> training the model

model.fit(X_poly, y)
LinearRegression()
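
- Because the model is linear in its coefficients, the fit is an ordinary least-squares problem on the columns 1, x1, ..., x1^4.
- The sketch below is one way to check this, not the notebook's own method: it solves the same problem with `np.linalg.lstsq` and compares the predictions with scikit-learn's. The names `levels`, `salaries`, `A`, and `coeffs` are illustrative; the values are copied from the table above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries copied from the dataset table
levels = np.arange(1, 11, dtype=float)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000], dtype=float)

# Design matrix with columns 1, x, x^2, x^3, x^4
A = np.vander(levels, N=5, increasing=True)

# Plain least squares on the power columns (multiple linear regression)
coeffs, *_ = np.linalg.lstsq(A, salaries, rcond=None)
lstsq_pred = A @ coeffs

# The same fit through scikit-learn
X_sk = PolynomialFeatures(degree=4).fit_transform(levels.reshape(-1, 1))
sk_pred = LinearRegression().fit(X_sk, salaries).predict(X_sk)

print(np.allclose(lstsq_pred, sk_pred, rtol=1e-6))
```
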
- y_pred -> the predicted salaries

y_pred = model.predict(X_poly)
- level: 6.5 -> Salary = 158,862
- transform method accepts a 2D array -> [[]]
- poly_features is already fitted, so transform is enough for new data

new_X_poly = poly_features.transform([[6.5]])
model.predict(new_X_poly)
array([158862.45265153])
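
- An alternative to transforming each new observation by hand is to chain the two steps in a scikit-learn pipeline, so that `predict` accepts the raw level directly.
- This is a sketch of that alternative, not what the notebook does; `poly_model`, `X_levels`, and `y_salaries` are illustrative names, with values copied from the table above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries copied from the dataset table
X_levels = np.arange(1, 11).reshape(-1, 1)
y_salaries = np.array([45000, 50000, 60000, 80000, 110000,
                       150000, 200000, 300000, 500000, 1000000])

# The pipeline applies the polynomial transform, then the linear regression
poly_model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
poly_model.fit(X_levels, y_salaries)

# predict takes the untransformed level; the pipeline handles the transform
single_pred = poly_model.predict([[6.5]])
print(single_pred)
```

- The result should agree with the manual two-step prediction, since the pipeline runs the same transform and model.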
- Red points -> Actual Values
- Blue line -> Predicted values

plt.scatter(X, y, color = 'red')
plt.plot(X, y_pred, color = 'blue')
plt.title('Salary vs Position Level (Polynomial Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
- Revising code for higher resolution and smoother curve

X_grid = np.arange(X.min(), X.max(), 0.1)  # levels in steps of 0.1 (stop value excluded)
X_grid = X_grid.reshape((len(X_grid), 1))  # 2D column, as sklearn expects
X_grid_poly = poly_features.transform(X_grid)  # reuse the fitted transformer
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, model.predict(X_grid_poly), color = 'blue')
plt.title('Salary vs Position Level (Polynomial Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Simple Linear Regression for comparison (note: this reassigns model)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)
model.predict([[6.5]])
array([330378.78787879])
plt.scatter(X, y, color = 'red')
plt.plot(X, model.predict(X), color = 'blue')
plt.title('Salary vs Position Level (Simple Linear Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
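
- The comparison can also be put in numbers. The sketch below computes R^2 for both models on the same ten points.
- Since the notebook has no test set, these are training-set scores and will flatter the more flexible model.
- `lin_model`, `poly_model`, `r2_lin`, and `r2_poly` are illustrative names; the values are copied from the table above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries copied from the dataset table
X_levels = np.arange(1, 11).reshape(-1, 1)
y_salaries = np.array([45000, 50000, 60000, 80000, 110000,
                       150000, 200000, 300000, 500000, 1000000])

# Simple linear regression on the raw level
lin_model = LinearRegression().fit(X_levels, y_salaries)

# Degree-4 polynomial regression as a pipeline
poly_model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
poly_model.fit(X_levels, y_salaries)

# Training-set R^2 for each model
r2_lin = r2_score(y_salaries, lin_model.predict(X_levels))
r2_poly = r2_score(y_salaries, poly_model.predict(X_levels))

print(f"Simple linear R^2:       {r2_lin:.4f}")
print(f"Polynomial (deg 4) R^2:  {r2_poly:.4f}")
```
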