- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and the Independent variables
- Splitting the dataset into the Training set and Test set
- Simple Linear Regression
- Creating a Linear Regression
- Fitting The Model
- Predicting the Results
- Making a Single Observation Prediction
- Visualising the Simple Linear Regression
- Result
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = "https://DataScienceSchools.github.io/Machine_Learning/Sklearn/Regression/Position_Salaries.csv"
dataset = pd.read_csv(url)
dataset
|   | Position | Level | Salary |
|---|---|---|---|
| 0 | Business Analyst | 1 | 45000 |
| 1 | Junior Consultant | 2 | 50000 |
| 2 | Senior Consultant | 3 | 60000 |
| 3 | Manager | 4 | 80000 |
| 4 | Country Manager | 5 | 110000 |
| 5 | Region Manager | 6 | 150000 |
| 6 | Partner | 7 | 200000 |
| 7 | Senior Partner | 8 | 300000 |
| 8 | C-level | 9 | 500000 |
| 9 | CEO | 10 | 1000000 |
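- Exclude the Position column (column index 0) from the features
- Position and Level carry the same information: Level is the numeric encoding of Position, so Level is the only feature in X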
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
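The slice `1:-1` is used instead of the single index `1` because scikit-learn estimators expect a 2-D feature matrix, and a column slice keeps both dimensions. A minimal sketch of the difference, assuming an inline copy of the table above rather than the downloaded CSV (the names `demo`, `X_demo`, `y_demo` and `level_1d` are hypothetical, introduced only for illustration):

```python
import pandas as pd

# Inline copy of the table shown above (assumption: stands in for the CSV download)
demo = pd.DataFrame({
    "Position": ["Business Analyst", "Junior Consultant", "Senior Consultant",
                 "Manager", "Country Manager", "Region Manager", "Partner",
                 "Senior Partner", "C-level", "CEO"],
    "Level": list(range(1, 11)),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})

X_demo = demo.iloc[:, 1:-1].values   # column slice: stays 2-D
y_demo = demo.iloc[:, -1].values     # single column: 1-D target vector
level_1d = demo.iloc[:, 1].values    # integer index: collapses to 1-D

print(X_demo.shape)    # (10, 1)
print(y_demo.shape)    # (10,)
print(level_1d.shape)  # (10,)
```

Passing the 1-D `level_1d` to `fit` would raise an error, which is why the notebook slices the column range instead.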
- The dataset has only 10 rows, so we do not split it into a training set and a test set
- Import the LinearRegression class from the linear_model module of the sklearn library
- model -> an instance of the LinearRegression class
from sklearn.linear_model import LinearRegression
model = LinearRegression()
- fit method -> trains the model on X and y
model.fit(X, y)
LinearRegression()
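After fitting, the learned line can be read off the estimator's `coef_` and `intercept_` attributes. A minimal sketch, assuming the Level and Salary values are copied inline from the table above (the names `levels`, `salaries` and `line` are hypothetical stand-ins for `X`, `y` and `model`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Inline copy of the Level and Salary columns (assumption: replaces the CSV download)
levels = np.arange(1, 11).reshape(-1, 1)   # 2-D feature matrix, shape (10, 1)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000])

line = LinearRegression().fit(levels, salaries)

slope = line.coef_[0]        # salary change per position level
intercept = line.intercept_  # predicted salary at level 0

print(f"salary = {intercept:.2f} + {slope:.2f} * level")
```

On this data the slope comes out near 80,879 per level and the intercept near -195,333, so the fitted line predicts a negative salary for the lowest levels.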
- y_pred -> the predicted salaries
y_pred = model.predict(X)
- Predict the salary for position level 6.5 (predict expects a 2-D array, hence the double brackets)
model.predict([[6.5]])
array([330378.78787879])
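The single prediction can be checked by hand against the fitted line, and the same call accepts several observations at once. A minimal sketch, assuming an inline copy of the Level and Salary columns (the names `levels`, `salaries`, `line`, `single`, `by_hand` and `batch` are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Inline copy of the Level and Salary columns (assumption: replaces the CSV download)
levels = np.arange(1, 11).reshape(-1, 1)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000])

line = LinearRegression().fit(levels, salaries)

# One observation: outer list = rows, inner list = features of that row
single = line.predict([[6.5]])

# Same value computed directly from the line equation
by_hand = line.intercept_ + line.coef_[0] * 6.5

# Several observations in one call: one inner list per row
batch = line.predict([[6.5], [7.0]])

print(single[0], by_hand)
print(batch.shape)  # (2,)
```

Both `single[0]` and `by_hand` should agree with the value printed above, up to floating-point rounding.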
plt.scatter(X, y, color = 'red')
plt.plot(X, model.predict(X), color = 'blue')
plt.title('Salary vs Position Level (Simple Linear Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
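- Considering the scatter plot:
- A simple linear regression does not fit the data well: salary grows much faster than linearly with position level, so the points curve away from the straight line
- Let's try Polynomial Linear Regression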
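The poor fit seen in the plot can also be expressed numerically. A minimal sketch, assuming an inline copy of the Level and Salary columns and using the estimator's `score` method, which returns the coefficient of determination R² (the names `levels`, `salaries`, `line`, `r2` and `residuals` are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Inline copy of the Level and Salary columns (assumption: replaces the CSV download)
levels = np.arange(1, 11).reshape(-1, 1)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000])

line = LinearRegression().fit(levels, salaries)

# R^2 on the same data the model was fitted on (there is no held-out test set here)
r2 = line.score(levels, salaries)

# Residual = actual salary minus predicted salary, one value per level
residuals = salaries - line.predict(levels)

print(f"R^2 = {r2:.3f}")
for level, res in zip(levels[:, 0], residuals):
    print(level, round(res))
```

R² comes out at about 0.67, and the residuals are positive at both ends of the level range and negative in the middle. That systematic pattern is what a straight line produces on curved data, which motivates the polynomial model.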