- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and the Independent variables
- Linear Regression Model
- Creating a Multiple linear regression
- Fitting The Model
- Calculating the R-squared
- Finding the intercept
- Finding the coefficients
- Finding Adjusted R-squared
- Function for calculating Adjusted R-squared
- Making predictions
- Adding New Apartments
- Predicting Price of New Apartments
- Creating Summary Table
- Plotting a Scatter Plot
- Plotting Regression Line
- Finding Coefficient & Intercept
- Calculating yhat
- Plotting Regression Line
Note: the dependent variable is 'price';
the independent variables are 'size' and 'year'.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression
url = "https://datascienceschools.github.io/real_estate_price_size_year.csv"
df = pd.read_csv(url)
df.head()
- x: independent variables (inputs or features)
- y: dependent variable (output or target)
x = df[['size','year']]
y = df['price']
model = LinearRegression()
- scikit-learn expects the features as a 2D object of shape (n_samples, n_features)
- x was selected with a list of two columns, so it is already a 2D DataFrame and does not need to be reshaped before fitting the model
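As a minimal sketch of the point above, the snippet below uses a small made-up DataFrame (the values are illustrative only, not rows from the dataset) to show that selecting columns with a list yields a 2D object, whereas a single column is 1D and would have to be reshaped before being used as a feature matrix.

```python
import pandas as pd

# Toy values for illustration only; not rows from the real dataset.
toy = pd.DataFrame({'size': [500.0, 750.0, 1000.0],
                    'year': [2009, 2012, 2015]})

two_features = toy[['size', 'year']]   # list of columns -> DataFrame
one_feature = toy['size']              # single label -> Series

print(two_features.shape)  # (3, 2): already (n_samples, n_features)
print(one_feature.shape)   # (3,): 1D, not a feature matrix

# A single feature has to be made 2D explicitly, for example:
one_feature_2d = one_feature.values.reshape(-1, 1)
print(one_feature_2d.shape)  # (3, 1)
```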
model.fit(x,y)
RSquared = model.score(x,y)
print('R-Squared is:', RSquared)
Intercept = model.intercept_
print('Intercept is:', Intercept)
Coefficient = model.coef_
print('coefficient is:', Coefficient)
$R^2_{adj.} = 1 - (1 - R^2) \cdot \frac{n-1}{n-p-1}$
where n is the number of observations and p is the number of independent variables
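def adjusted_r2(x, y):
    # Uses the model fitted above
    r2 = model.score(x, y)
    n = x.shape[0]  # number of observations
    p = x.shape[1]  # number of independent variables
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return adj_r2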
print('Adjusted R-Squared is:', adjusted_r2(x,y))
- Adjusted R-squared of Multiple Linear Regression : 0.77187
- R-squared of Multiple Linear Regression : 0.77648
The R-squared is only slightly larger than the adjusted R-squared
=> the penalty for including 2 independent variables is small
- Adjusted R-squared of Multiple Linear Regression : 0.77187
- R-squared of Simple Linear Regression : 0.74473
The adjusted R-squared of the multiple regression is only about 0.027 higher than the R-squared of the simple regression
=> 'year' adds only a little explanatory power to the model
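The size of the penalty can be checked by hand with the adjusted R-squared formula. The sketch below is plain Python and does not use the fitted model. The sample size n = 100 is an assumption (it is not stated in this section, although it reproduces the reported 0.77187), and `adjusted_r2_from_values` is a hypothetical helper name.

```python
def adjusted_r2_from_values(r2, n, p):
    """Adjusted R-squared from R-squared, n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2_multiple = 0.77648   # R-squared reported for the model with size and year
n_assumed = 100         # ASSUMPTION: number of rows in the dataset
p = 2                   # two predictors: size and year

adj = adjusted_r2_from_values(r2_multiple, n_assumed, p)
penalty = r2_multiple - adj  # how much the adjustment lowers R-squared

print(round(adj, 5))      # 0.77187
print(round(penalty, 5))  # 0.00461
```

With only two predictors and an assumed 100 observations, the factor (n-1)/(n-p-1) is 99/97, which is why the adjustment is so small.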
- What should be the price of an apartment with a size of 500, 750 or 1000 sq. ft. in the year 2009?
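# Column names must match the feature names used when fitting ('size', 'year')
new_apartment = pd.DataFrame(np.array([[500, 2009], [750, 2009], [1000, 2009]]),
                             columns=['size', 'year'])
new_apartment
model.predict(new_apartment)
# Select only the feature columns so this cell can be re-run safely
new_apartment['Predicted_Price'] = model.predict(new_apartment[['size', 'year']])
new_apartment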
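A fitted linear regression predicts with the intercept plus the dot product of the features and the coefficients, which is what `predict` evaluates. The sketch below illustrates this on synthetic data generated from a known linear rule (the numbers are made up so the expected result is known in advance). It fits its own `toy_model`, a name introduced here for illustration, and leaves the notebook's `model` untouched.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known rule: y = 1 + 2*x1 + 3*x2 (illustration only).
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 5.0]])
y_toy = 1 + 2 * X_toy[:, 0] + 3 * X_toy[:, 1]

toy_model = LinearRegression().fit(X_toy, y_toy)

X_new = np.array([[4.0, 2.0], [0.5, 0.5]])

# Manual prediction: intercept plus features times coefficients.
manual = toy_model.intercept_ + X_new @ toy_model.coef_

# The manual values agree with predict() up to floating-point error.
print(np.allclose(manual, toy_model.predict(X_new)))  # True
print(manual)  # approximately 15 and 3.5
```

The same expression, `model.intercept_ + X_new @ model.coef_`, should apply to the notebook's model for any 2D array `X_new` whose columns are in the order size, year.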