Case Study (Real Estate):

SKLearn (Multiple Linear Regression)

- Finding the best fitting model
- With more than one independent variable, it is no longer about the best fitting line (as in Simple Linear Regression)

Overview

- Importing the Relevant Libraries

- Loading the Data

- Declaring the Dependent and the Independent variables

- Linear Regression Model

    - Creating a Multiple Linear Regression
    - Fitting The Model
    - Calculating the R-squared
    - Finding the intercept
    - Finding the coefficients
    - Finding Adjusted R-squared
         - Function for calculating Adjusted R-squared
    - Making predictions
         - Adding New Apartments
         - Predicting Price of New Apartments
         - Creating Summary Table

- Plotting a Scatter Plot

- Plotting Regression Line

    - Finding Coefficient & Intercept
    - Calculating yhat
    - Plotting Regression Line

Note: the dependent variable is 'price'; the independent variables are 'size' and 'year'.

Importing the Relevant Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.linear_model import LinearRegression

Loading the data

In [2]:
url = "https://datascienceschools.github.io/real_estate_price_size_year.csv"

df = pd.read_csv(url)

df.head()
Out[2]:
        price     size  year
0  234314.144   643.09  2015
1  228581.528   656.22  2009
2  281626.336   487.29  2018
3  401255.608  1504.75  2015
4  458674.256  1275.46  2009

Declaring the dependent and the independent variables

    - x: the independent variables ('size' and 'year') -> inputs or features
    - y: the dependent variable ('price') -> output or target
In [3]:
x = df[['size','year']]

y = df['price']

Linear Regression Model

Creating a Multiple Linear Regression

In [4]:
model = LinearRegression()

Fitting The Model

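- Sklearn expects the features as a 2D object (rows = observations, columns = features)
- Here x is a DataFrame with two columns, so it is already 2D and we do not need to reshape it before fitting the model (unlike a single feature in Simple Linear Regression)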
In [5]:
model.fit(x,y)
Out[5]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
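The shape requirement can be illustrated with a minimal sketch on synthetic data. The column names 'a' and 'b', the sample size and the coefficients below are hypothetical and are not taken from the real-estate file; the point is only to contrast a two-column selection (already 2D) with a single column (1D, which needs reshaping).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): target = 3 + 2*a - 1*b, with no noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.uniform(0, 10, 50), "b": rng.uniform(0, 10, 50)})
target = 3 + 2 * demo["a"] - 1 * demo["b"]

# Two columns selected with double brackets: already 2D, shape (50, 2)
X_two = demo[["a", "b"]]
demo_model = LinearRegression().fit(X_two, target)

# One column selected with single brackets: 1D, shape (50,)
x_one = demo["a"]
X_one = x_one.values.reshape(-1, 1)  # reshape to (50, 1) before fitting
single_model = LinearRegression().fit(X_one, target)

print(X_two.shape, x_one.shape, X_one.shape)
print(demo_model.intercept_, demo_model.coef_)
```

Because the synthetic target is an exact linear function of both columns, the two-feature model should recover an intercept close to 3 and coefficients close to 2 and -1.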

Calculating the R-squared

In [6]:
RSquared = model.score(x,y)

print('R-Squared is:', RSquared)
R-Squared is: 0.7764803683276793

Finding the Intercept

In [7]:
Intercept = model.intercept_

print('Intercept is:', Intercept)
Intercept is: -5772267.017463278

Finding the Coefficients

In [8]:
Coefficient = model.coef_

print('coefficient is:', Coefficient)
coefficient is: [ 227.70085401 2916.78532684]
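
The coefficients are listed in the order of the columns of x ('size', then 'year'). Together with the intercept they give the fitted equation, written here with rounded values:

$\widehat{price} = -5772267.02 + 227.70 \times size + 2916.79 \times year$

Holding the year fixed, each additional sq.ft adds about 227.70 to the predicted price; holding the size fixed, each later year adds about 2916.79.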

Finding Adjusted R-squared

$R^2_{adj.} = 1 - (1 - R^2) \times \frac{n-1}{n-p-1}$

Function for calculating Adjusted R-Squared

In [9]:
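def adjusted_r2(model, x, y):
    # R-squared of the fitted model on (x, y)
    r2 = model.score(x, y)
    # n: number of observations, p: number of independent variables
    n, p = x.shape
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print('Adjusted R-Squared is:', adjusted_r2(model, x, y))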
Adjusted R-Squared is: 0.77187171612825
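
As a quick check of the formula, the sketch below applies it directly to the R-squared printed earlier, without needing the fitted model. The helper name adjusted_r2_from_values is hypothetical, and n = 100 is an assumption: it is the sample size implied by the two reported scores with p = 2, not a figure stated in the notebook.

```python
def adjusted_r2_from_values(r2, n, p):
    """Adjusted R-squared from R-squared, n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = 0.7764803683276793   # R-squared printed by the notebook
n_assumed = 100           # assumption: inferred from the reported scores
p = 2                     # 'size' and 'year'

adj = adjusted_r2_from_values(r2, n_assumed, p)
print(adj)
```

Under that assumption the result agrees with the adjusted R-squared printed above; the penalty factor (n-1)/(n-p-1) is 99/97, which is why the two scores differ so little.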

RESULT:

Comparing R-squared and Adjusted R-squared

- Adjusted R-squared of Multiple Linear Regression : 0.77187

- R-squared of Multiple Linear Regression : 0.77648

The R-squared is only slightly larger than the Adjusted R-squared


 => we were not penalized a lot for the inclusion of 2 independent variables


Comparing Adjusted R-squared with R-squared of the simple linear regression

- Adjusted R-squared of Multiple Linear Regression : 0.77187

- R-squared of Simple Linear Regression : 0.74473


=> Adding 'year' raises the score only from 0.74473 to 0.77187 (about 0.027), so 'year' adds a modest amount of explanatory power on top of 'size'

Making Predictions

- What should the price be for apartments of 500, 750 and 1000 sq.ft in the year 2009?

New Apartments (500, 750, 1000 sq.ft) , Year: 2009

In [10]:
# Use the same column names as the training data ('size', 'year');
# recent sklearn versions reject feature names that differ from those seen in fit
new_apartment = pd.DataFrame(np.array([[500, 2009], [750, 2009], [1000, 2009]]),
                             columns=['size', 'year'])

new_apartment
Out[10]:
   size  year
0   500  2009
1   750  2009
2  1000  2009

Predicting the price of New Apartments

In [11]:
model.predict(new_apartment)
Out[11]:
array([201405.13115808, 258330.34465995, 315255.55816181])

Creating Summary Table

In [12]:
# Predict from the feature columns only, so the cell can be re-run safely
new_apartment['Predicted_Price'] = model.predict(new_apartment[['size', 'year']])

new_apartment
Out[12]:
   size  year  Predicted_Price
0   500  2009    201405.131158
1   750  2009    258330.344660
2  1000  2009    315255.558162
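
The predictions in the summary table can be reproduced by hand from the intercept and coefficients printed earlier, which shows what model.predict computes for a linear model. This is a minimal sketch: the helper name predict_price is hypothetical, and the constants are copied from the printed output, so they carry only the digits shown there.

```python
# Intercept and coefficients as printed in the notebook (rounded to the digits shown)
intercept = -5772267.017463278
coef_size = 227.70085401
coef_year = 2916.78532684

def predict_price(size, year):
    """Evaluate price = intercept + coef_size * size + coef_year * year."""
    return intercept + coef_size * size + coef_year * year

sizes = (500, 750, 1000)
manual = [predict_price(size, 2009) for size in sizes]

for size, price in zip(sizes, manual):
    print(size, price)
```

Because the printed coefficients are rounded, the manual values can differ from the model.predict output in the last few decimal places, but they should agree to well within a cent.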