Case Study (Real Estate):

SKLearn (Simple Linear Regression)

 - Simple Linear Regression: Finding the best-fitting line through the data points

Overview

- Importing the Relevant Libraries

- Loading the Data

- Declaring the Dependent and the Independent variables

- Linear Regression Model

    - Creating a Linear Regression 
    - Reshaping x into a matrix (2D object)
    - Fitting The Model
    - Calculating the R-squared
    - Finding the intercept
    - Finding the coefficients
    - Making predictions 

         1. Adding New Apartments
         2. Predicting Price of New Apartments
         3. Creating Summary Table


- Plotting a Scatter Plot

- Plotting Regression Line 

    - Finding Coefficient & Intercept
    - Calculating yhat
    - Plotting Regression Line 

Note: The dependent variable is 'price' and the independent variable is 'size'.

Importing the Relevant Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.linear_model import LinearRegression

Loading the data

In [2]:
url = "https://datascienceschools.github.io/real_estate_price_size.csv"

df = pd.read_csv(url)

df.head()
Out[2]:
price size
0 234314.144 643.09
1 228581.528 656.22
2 281626.336 487.29
3 401255.608 1504.75
4 458674.256 1275.46

Declaring the dependent and the independent variables

    - x: independent variable (input or feature)
    - y: dependent variable (output or target)
In [3]:
x = df['size']
y = df['price']

print(x.shape)
print(y.shape)
(100,)
(100,)

Linear Regression Model

Creating a Simple Linear Regression

In [4]:
model = LinearRegression()

Reshaping x into a matrix (2D object)

- scikit-learn expects the input as a matrix (two-dimensional array) of shape (n_samples, n_features), so the one-dimensional x is reshaped before fitting the model
In [5]:
x = x.values.reshape(-1,1)

x.shape
Out[5]:
(100, 1)
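
As a minimal sketch of what the reshape does, the snippet below applies it to a small array. The three sizes are made-up values for illustration, not rows from the dataset.

```python
import numpy as np

# Hypothetical sizes (assumption), not taken from the dataset
sizes = np.array([500.0, 750.0, 1000.0])
print(sizes.shape)      # 1D: one axis with 3 entries

# -1 lets NumPy infer the number of rows; 1 fixes a single feature column
sizes_2d = sizes.reshape(-1, 1)
print(sizes_2d.shape)   # 2D: 3 rows (samples), 1 column (feature)
```

Each row is now one sample and the single column is the one feature, which is the layout fit and predict expect.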

Fitting The Model

In [6]:
model.fit(x,y)
Out[6]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
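
For a single feature, the ordinary least squares line can also be computed by hand, which may help to see what fitting does. The sketch below uses made-up toy arrays (an assumption for illustration, not the real estate data) and the textbook formulas for slope and intercept.

```python
import numpy as np

# Made-up toy data (assumption), not the real estate dataset
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_mean, y_mean = x_toy.mean(), y_toy.mean()

# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
slope = np.sum((x_toy - x_mean) * (y_toy - y_mean)) / np.sum((x_toy - x_mean) ** 2)

# The least squares line always passes through (x_mean, y_mean)
intercept = y_mean - slope * x_mean

print('slope:', slope, 'intercept:', intercept)
```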

Calculating the R-Squared

In [7]:
RSquared = model.score(x,y)

print('R-Squared is:', RSquared)
R-Squared is: 0.7447391865847586
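
For a linear regression, score returns the coefficient of determination, R-squared = 1 - SS_res / SS_tot. The sketch below computes it by hand on made-up toy data (an assumption for illustration), using np.polyfit as a stand-in for the fitted model.

```python
import numpy as np

# Made-up toy data (assumption), not the real estate dataset
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x_toy, y_toy, deg=1)
y_pred = slope * x_toy + intercept

ss_res = np.sum((y_toy - y_pred) ** 2)        # variation the line leaves unexplained
ss_tot = np.sum((y_toy - y_toy.mean()) ** 2)  # total variation around the mean
r_squared = 1 - ss_res / ss_tot

print('R-Squared is:', r_squared)
```

A value close to 1 means the line explains most of the variation in the target; the value of about 0.74 reported above means size explains roughly 74% of the variation in price.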

Finding the intercept

In [8]:
Intercept = model.intercept_

print('Intercept is:', Intercept)
Intercept is: 101912.60180122906

Finding the coefficients

In [9]:
Coefficient = model.coef_

print('coefficient is:', Coefficient)
coefficient is: [223.17874259]

Making predictions

- What should be the price of an apartment with a size of 500, 750 or 1000 sq.ft?

New Apartments (500, 750, 1000 sq.ft)

In [10]:
new_apartment = pd.DataFrame({'size': [500,750,1000]})

new_apartment
Out[10]:
size
0 500
1 750
2 1000

Predicting the price of New Apartments

In [11]:
model.predict(new_apartment)
Out[11]:
array([213501.97309853, 269296.65874718, 325091.34439584])
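
These predictions can be checked by hand with the regression equation price = intercept + coefficient * size. The sketch below plugs in the intercept and coefficient printed earlier; the coefficient is rounded as it was printed, so the results agree with the model output only up to a small rounding difference.

```python
# Values copied from the printed model output above
intercept = 101912.60180122906
coefficient = 223.17874259  # rounded as printed

sizes = [500, 750, 1000]

# price = intercept + coefficient * size
manual_prices = [intercept + coefficient * size for size in sizes]

for size, price in zip(sizes, manual_prices):
    print(size, round(price, 2))
```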

Creating Summary Table

In [12]:
# Select only the 'size' column so the cell still works if it is run again
new_apartment['predicted_price'] = model.predict(new_apartment[['size']])

new_apartment
Out[12]:
size predicted_price
0 500 213501.973099
1 750 269296.658747
2 1000 325091.344396

Plotting a Scatter Plot

- The scatter plot shows a positive linear relationship between size and price: larger apartments tend to cost more
In [13]:
plt.scatter(x,y)
plt.xlabel('Size',fontsize=20)
plt.ylabel('Price',fontsize=20)
plt.show()

Plotting Regression Line

1. Finding Coefficient & Intercept

2. Calculating yhat

3. Plotting Regression Line 

1. Finding Coefficient & Intercept

In [16]:
Coefficient = model.coef_

Intercept = model.intercept_

print("Coefficient is:", Coefficient, '\n Intercept is:', Intercept)
Coefficient is: [223.17874259] 
 Intercept is: 101912.60180122906

2. Calculating yhat (Simple Linear Regression Equation)

In [17]:
yhat = Coefficient * x + Intercept
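
This line relies on NumPy broadcasting: the coefficient is an array of shape (1,) and x has shape (n, 1), so the result keeps the shape of x. A small sketch with hypothetical inputs (the three sizes are illustrative, and the coefficient and intercept are the printed values):

```python
import numpy as np

coefficient = np.array([223.17874259])  # shape (1,), same layout as model.coef_
intercept = 101912.60180122906          # scalar, same layout as model.intercept_

# Hypothetical 2D input (assumption): 3 samples, 1 feature
x_demo = np.array([[500.0], [750.0], [1000.0]])

# (1,) * (3, 1) broadcasts to (3, 1); the scalar intercept is added elementwise
yhat_demo = coefficient * x_demo + intercept

print(yhat_demo.shape)
print(yhat_demo.ravel())
```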

3. Plotting Regression Line

In [18]:
plt.scatter(x,y)
plt.xlabel('Size', fontsize = 20)
plt.ylabel('Price', fontsize = 20)

# yhat was calculated in the previous step
plt.plot(x, yhat, lw = 4, c ='red', label ='regression line')

# Without a legend the label above is never displayed
plt.legend()

plt.show()