- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and Independent Variables
- Linear Regression Model
- Creating a Linear Regression
- Fitting the Model
- Finding R-squared
- Finding Adjusted R-squared
- Function for calculating Adjusted R-squared
- Comparing It with R-squared
- Comparing It with the R-squared of Simple Linear Regression
Note: the dependent variable is 'price'; the independent variables are 'size' and 'year'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression
url = "https://datascienceschools.github.io/real_estate_price_size_year.csv"
df = pd.read_csv(url)
df.head()
- x: independent variables -> inputs or features
- y: dependent variable -> output or target
x = df[['size','year']]
y = df['price']
print(x.shape)
print(y.shape)
model = LinearRegression()
- scikit-learn expects the feature input as a 2D object
- Since x holds two columns ('size' and 'year'), it is already 2D, so there is no need to reshape it before fitting the model, unlike the single-feature case (see the sketch below)
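For contrast, a minimal sketch of the single-feature case, where a 1D Series would have to be reshaped into a matrix first (the variable names here are only illustrative):

x_single = df['size']                      # 1D Series, shape (n,)
x_matrix = x_single.values.reshape(-1, 1)  # 2D array, shape (n, 1), as sklearn expects
print(x_single.shape, x_matrix.shape)

# x = df[['size', 'year']] is already a 2D DataFrame, shape (n, 2), so no reshape is needed
print(x.shape)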
model.fit(x,y)
model.score(x,y)
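For a regressor, model.score returns the coefficient of determination R-squared. As a quick cross-check (not part of the original notebook), the same value can be computed from the model's predictions with sklearn.metrics.r2_score:

from sklearn.metrics import r2_score

# R-squared computed from the predictions; should match model.score(x, y)
print(r2_score(y, model.predict(x)))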
$R^2_{adj.} = 1 - (1 - R^2)\cdot\frac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ is the number of independent variables
def adjusted_r2(x, y):
    # penalise R-squared for the number of predictors,
    # using the fitted model defined above
    r2 = model.score(x, y)
    n = x.shape[0]  # number of observations
    p = x.shape[1]  # number of independent variables
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return adj_r2
adjusted_r2(x,y)
- Adjusted R-squared of Multiple Linear Regression : 0.77187
- R-squared of Multiple Linear Regression : 0.77648
The R-squared is only slightly larger than the adjusted R-squared
=> the model was not penalized much for including two independent variables
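As a sanity check, the reported values are consistent with $n = 100$ observations (which print(x.shape) above would confirm) and $p = 2$ predictors:

$R^2_{adj.} = 1 - (1 - 0.77648)\cdot\frac{100-1}{100-2-1} = 1 - 0.22352\cdot\frac{99}{97} \approx 0.77187$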
- Adjusted R-squared of Multiple Linear Regression : 0.77187
- R-squared of Simple Linear Regression : 0.74473
Even after the adjustment penalty, the improvement over the simple regression (0.77187 vs 0.74473) is modest
=> 'year' does not add much explanatory power to the model
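A minimal sketch of the simple linear regression behind this comparison, assuming it is fitted on 'size' alone (the 0.74473 figure comes from the text above, not from running this code):

simple_model = LinearRegression()

# regress price on 'size' only; double brackets keep the single column 2D
simple_model.fit(df[['size']], y)

# should be close to the reported 0.74473
print(simple_model.score(df[['size']], y))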