Case Study (Real Estate)

SKLearn (R-squared and Adjusted R-squared)

Overview

- Importing the Relevant Libraries

- Loading the Data

- Declaring the Dependent and the Independent variables

- Linear Regression Model

    - Creating a linear regression model
    - Fitting the Model
    - Finding R-squared
    - Finding Adjusted R-squared

         - Function for calculating Adjusted R-squared
         - Comparing it with R-squared
         - Comparing it with the R-squared of the simple linear regression

Note: the dependent variable is 'price'; the independent variables are 'size' and 'year'.

Importing the relevant libraries

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.linear_model import LinearRegression

Loading the data

In [3]:
url = "https://datascienceschools.github.io/real_estate_price_size_year.csv"

df = pd.read_csv(url)

df.head()
Out[3]:
        price     size  year
0  234314.144   643.09  2015
1  228581.528   656.22  2009
2  281626.336   487.29  2018
3  401255.608  1504.75  2015
4  458674.256  1275.46  2009

Declaring the dependent and the independent variables

    - x : independent variables -> inputs or features ('size' and 'year')
    - y : dependent variable -> output or target ('price')
In [4]:
x = df[['size','year']]
y = df['price']

print(x.shape)
print(y.shape)
(100, 2)
(100,)

Regression Model

In [5]:
model = LinearRegression()

Fitting The Model

- sklearn expects the inputs as a 2D object of shape (n_samples, n_features)
- x already has two columns, so its shape is (100, 2) and it does not need to be reshaped before fitting the model
In [6]:
model.fit(x,y)
Out[6]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
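
The point about input shape can be illustrated with a short sketch. The arrays below are made-up toy values, not the real estate data, and the variable names are hypothetical. It shows that two stacked features are already 2D, while a single feature has to be reshaped into one column first.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (hypothetical values, not the real estate dataset)
size = np.array([500.0, 750.0, 1000.0, 1250.0, 1500.0])
year = np.array([2006.0, 2015.0, 2009.0, 2018.0, 2012.0])
price = 100.0 * size + 50.0 * year + 1000.0

# Two features stacked column-wise are already 2D: shape (5, 2)
x_multi = np.column_stack([size, year])
LinearRegression().fit(x_multi, price)

# A single feature is 1D, shape (5,), which sklearn rejects
try:
    LinearRegression().fit(size, price)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Reshaping to one column, shape (5, 1), makes it acceptable
x_single = size.reshape(-1, 1)
LinearRegression().fit(x_single, price)

print(x_multi.shape, size.shape, x_single.shape, rejected_1d)
```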

Finding R-Squared

In [7]:
model.score(x,y)
Out[7]:
0.7764803683276793
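
For reference, the value returned by `score` is the coefficient of determination, 1 - SS_res / SS_tot. The sketch below checks this on synthetic data (randomly generated, not the real estate data); the names `reg`, `ss_res` and `ss_tot` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: two features and a noisy linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

reg = LinearRegression().fit(X, y)
y_hat = reg.predict(X)

# Residual sum of squares and total sum of squares
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_from_score = reg.score(X, y)

# The two values agree up to floating-point error
print(np.isclose(r2_manual, r2_from_score))
```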

Finding Adjusted R-Squared

$R^2_{adj.} = 1 - (1 - R^2) \times \frac{n-1}{n-p-1}$

where n is the number of observations and p is the number of independent variables

Function for calculating Adjusted R-Squared

In [10]:
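def adjusted_r2(x, y):
    # R-squared of the model fitted above (relies on the global `model`)
    r2 = model.score(x, y)
    n = x.shape[0]  # number of observations
    p = x.shape[1]  # number of independent variables
    # Penalize R-squared for the number of independent variables
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return adj_r2

adjusted_r2(x, y)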
Out[10]:
0.77187171612825
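
The result can be checked by hand from the numbers already shown: R-squared = 0.7764803683276793, n = 100 observations and p = 2 independent variables. The helper below is a hypothetical variant, not part of the original notebook, that takes the score directly so it does not depend on a fitted model.

```python
def adjusted_r2_from_score(r2, n, p):
    # Same formula as above, applied to an already computed R-squared
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = 0.7764803683276793  # value reported by model.score(x, y)
adj = adjusted_r2_from_score(r2, n=100, p=2)

# 1 - 0.22352 * 99 / 97
print(round(adj, 5))  # 0.77187
```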

Comparing R-Squared and Adjusted R-Squared

- Adjusted R-squared of Multiple Linear Regression : 0.77187

- R-squared of Multiple Linear Regression : 0.77648

The R-squared is only slightly larger than the Adjusted R-squared (a difference of about 0.0046)

 => the penalty for including 2 independent variables is small


Comparing Adjusted R-Squared with R-Squared of the simple linear regression

- Adjusted R-squared of Multiple Linear Regression : 0.77187

- R-squared of Simple Linear Regression : 0.74473


=> Adding 'year' raises the score by only about 0.027, so 'year' adds little explanatory value beyond 'size'
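
For a like-for-like comparison, the simple regression's R-squared can also be adjusted. The sketch below assumes the simple model was fitted on the same 100 observations with a single independent variable (p = 1); neither is stated explicitly here, so treat the result as indicative only.

```python
n = 100                    # assumed: same number of observations
p_simple = 1               # assumed: one independent variable
r2_simple = 0.74473        # R-squared of the simple linear regression
adj_r2_multiple = 0.77187  # Adjusted R-squared of the multiple regression

# Adjust the simple model's R-squared with the same formula
adj_r2_simple = 1 - (1 - r2_simple) * (n - 1) / (n - p_simple - 1)

# Gain in Adjusted R-squared from adding the second variable
gain = adj_r2_multiple - adj_r2_simple

print(round(adj_r2_simple, 4), round(gain, 4))
```

Under these assumptions the adjusted value for the simple model comes out near 0.742, so the gain from the second variable is about 0.03.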