- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and the Independent variables
- Splitting the dataset into the Training set and Test set
- Reshaping the Dependent Variable y
- Feature Scaling
- Support Vector Regression (SVR)
- Creating SVR Model
- Reshaping the Dependent Variable y
- Fitting The Model
- Predicting the Results
- Making a Single Observation Prediction
- Visualising the SVR
- Visualising the SVR (Higher Resolution)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = "https://DataScienceSchools.github.io/Machine_Learning/Sklearn/Regression/Position_Salaries.csv"
dataset = pd.read_csv(url)
dataset
| | Position | Level | Salary |
|---|---|---|---|
| 0 | Business Analyst | 1 | 45000 |
| 1 | Junior Consultant | 2 | 50000 |
| 2 | Senior Consultant | 3 | 60000 |
| 3 | Manager | 4 | 80000 |
| 4 | Country Manager | 5 | 110000 |
| 5 | Region Manager | 6 | 150000 |
| 6 | Partner | 7 | 200000 |
| 7 | Senior Partner | 8 | 300000 |
| 8 | C-level | 9 | 500000 |
| 9 | CEO | 10 | 1000000 |
- Exclude Position (column index 0)
- Position & Level carry the same information, so Level alone is enough
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
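- Optional sanity check: X should hold only the Level column, y the salaries
print(X.shape, y.shape)
print(X.ravel())
print(y)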
- The dataset is small, so we will not split it into a training set & test set
- In the feature scaling stage:
- the fit_transform method of the StandardScaler class expects a 2D array, so y is reshaped first
y = y.reshape(-1, 1)
print(y.shape)
(10, 1)
- Always apply feature scaling after splitting
- Linear regression models
- No need to apply feature scaling
- The coefficients can compensate for the high values of the features
- SVR models
- Apply feature scaling
- There are no coefficients to compensate for the high values of the features
- Apply feature scaling for models with an implicit equation
- Implicit relationship between the dependent variable y and the features X
- Apply feature scaling to both the feature & the dependent variable
- The feature takes values from 1 to 10
- The dependent variable takes values from 45000 to 1 million
- By applying feature scaling
- The feature will not be neglected by the SVR model
- Without scaling, the feature takes much lower values than the dependent variable (a quick demonstration follows this list)
- No need to apply feature scaling
- When the dependent variable takes binary values 0 and 1
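- A minimal demonstration of the point above: on the raw, unscaled data, SVR's default C and epsilon are tiny compared to salaries in the hundreds of thousands, so the fit collapses to a near-constant prediction (illustrative sketch; unscaled_model is a hypothetical name)
from sklearn.svm import SVR
unscaled_model = SVR(kernel = 'rbf')
unscaled_model.fit(X, y.ravel())   # fit on unscaled data; ravel() gives the 1D y that fit expects
print(unscaled_model.predict(X))   # predictions barely vary across levels 1-10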
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y)
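- Quick check: StandardScaler standardises each column to mean ~0 and standard deviation 1
print(X_scaled.mean(), X_scaled.std())   # approximately 0.0 and 1.0
print(y_scaled.mean(), y_scaled.std())   # approximately 0.0 and 1.0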
- SVR Class from svm Module of sklearn Library
- model -> Object of SVR Class
- kernel = 'rbf' -> recommended (radial basis function; a short sketch of this kernel follows the code below)
from sklearn.svm import SVR
model = SVR(kernel = 'rbf')
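- For reference, the RBF kernel scores the similarity of two points as K(x, z) = exp(-gamma * ||x - z||^2); a minimal sketch, where rbf is a hypothetical helper and gamma = 0.5 an arbitrary illustrative value
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, z, gamma):
    # K(x, z) = exp(-gamma * squared Euclidean distance)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([[1.0]]), np.array([[2.0]])
print(rbf(x, z, gamma = 0.5))          # exp(-0.5) ≈ 0.6065
print(rbf_kernel(x, z, gamma = 0.5))   # sklearn agrees: [[0.6065...]]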
- fit method expects the dependent variable y as a 1D array
y_scaled = y_scaled.reshape(-1, )
print(y_scaled.shape)
(10,)
- fit method -> training the model
model.fit(X_scaled, y_scaled)
SVR()
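- Optionally, the fitted model exposes which (scaled) training points became support vectors
print(model.support_)           # indices of the support vectors
print(model.support_vectors_)   # the scaled levels acting as support vectors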
- y_pred -> the predicted salaries
- predict returns a 1D array; recent scikit-learn versions require a 2D array for inverse_transform, hence the reshape
y_pred = sc_y.inverse_transform(model.predict(X_scaled).reshape(-1, 1))
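- Optional side-by-side view of actual vs predicted salaries (illustrative sketch)
print(np.column_stack((y, y_pred)))   # column 0: actual salary, column 1: SVR prediction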
- position level: 6.5
- sc_X.transform([[6.5]])
-> applies feature scaling to the single observation
-> the same scaling used to scale the features (X)
-> the predict method expects a scaled feature
-> transform expects a 2D array -> [[]]
- apply sc_y.inverse_transform
-> the predicted salary is on the scaled scale
-> inverse_transform recovers the original salary scale
-> reshape the prediction to 2D first (recent scikit-learn versions require it)
sc_y.inverse_transform(model.predict(sc_X.transform([[6.5]])).reshape(-1, 1))
array([[170370.0204065]])
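- The whole chain can be wrapped in a small helper (predict_salary is a hypothetical name, shown only to summarise the steps above)
def predict_salary(level):
    # 1) scale the input level with the X scaler (transform expects a 2D array)
    level_scaled = sc_X.transform([[level]])
    # 2) predict in the scaled space, then reshape to 2D for inverse_transform
    salary_scaled = model.predict(level_scaled).reshape(-1, 1)
    # 3) map the prediction back to the original salary scale
    return sc_y.inverse_transform(salary_scaled)[0, 0]

print(predict_salary(6.5))   # ≈ 170370.02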
plt.scatter(X, y, color = 'red')
plt.plot(X, y_pred, color = 'blue')
plt.title('Salary vs Position Level (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
- Revising the code for a higher resolution and smoother curve
- X.min() / X.max() return scalars, as np.arange expects; adding 0.1 to the upper bound keeps level 10 in the grid
X_grid = np.arange(X.min(), X.max() + 0.1, 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, sc_y.inverse_transform(model.predict(sc_X.transform(X_grid)).reshape(-1, 1)), color = 'blue')
plt.title('Salary vs Position Level (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()