- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and the Independent variables
- Plotting a Scatter Plot
- OLS Regression
1. Adding a Constant
2. Fitting the Model
3. OLS Regression Results (Summary)
- Plotting Regression Line
1. Finding Coefficient & Intercept
2. Calculating yhat
3. Plotting Regression Line
- Making Predictions
1. Adding New Apartments
2. Predicting Price of New Apartments
3. Creating Summary Table
Note: the dependent variable is 'price', while the independent variable is 'size'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression
url = "https://datascienceschools.github.io/real_estate_price_size.csv"
df = pd.read_csv(url)
df.head()
- x: independent variable (input, or feature)
- y: dependent variable (output, or target)
x = df['size']
y = df['price']
print(x.shape)
print(y.shape)
- The scatter plot shows a positive linear relationship between size and price
plt.scatter(x,y)
plt.xlabel('Size',fontsize=20)
plt.ylabel('Price',fontsize=20)
plt.show()
- OLS stands for ordinary least squares
- OLS is the most common method for estimating the linear regression equation
- It finds the line that minimises the sum of squared errors (SSE), where each error is the difference between an observed price and the price the line predicts
1. Adding a Constant
2. Fitting the Model
3. OLS Regression Results (Summary)
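For a single feature, the line that minimises the sum of squared errors has a closed-form solution: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. A minimal sketch in plain Python, assuming a small made-up dataset (not the real-estate data); the helper names `fit_simple_ols` and `sum_squared_errors` are hypothetical and only for illustration:

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


# Made-up numbers for illustration only
toy_sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_prices = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(toy_sizes, toy_prices)
best_sse = sum_squared_errors(toy_sizes, toy_prices, intercept, slope)

# Any other line, e.g. one with a slightly steeper slope, has a larger SSE
nudged_sse = sum_squared_errors(toy_sizes, toy_prices, intercept, slope + 0.1)
print(intercept, slope, best_sse, nudged_sse)
```

Nudging the slope (or the intercept) away from the fitted values always increases the SSE, which is what "least squares" means in practice.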
- sm.OLS does not include an intercept by default, so we add a column of 1s to x
- x_constant = sm.add_constant(x) -> adds a column named 'const', filled with 1s, in front of 'size'
import statsmodels.api as sm
x_constant = sm.add_constant(x)
x_constant
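To see why the column of 1s matters, the same kind of design matrix can be built by hand with NumPy: the coefficient attached to the 1s column is the intercept. The sketch below is only an illustration on made-up numbers, and it uses `np.linalg.lstsq` as a stand-in for statsmodels; all variable names are assumptions.

```python
import numpy as np

# Made-up data lying exactly on the line price = 3 + 2 * size
toy_size = np.array([1.0, 2.0, 3.0, 4.0])
toy_price = 3.0 + 2.0 * toy_size

# Design matrix with a leading column of 1s, similar to the output of sm.add_constant
design = np.column_stack([np.ones_like(toy_size), toy_size])
coef_with_const, *_ = np.linalg.lstsq(design, toy_price, rcond=None)

# Without the 1s column the fitted line is forced through the origin
coef_no_const, *_ = np.linalg.lstsq(toy_size.reshape(-1, 1), toy_price, rcond=None)

print(coef_with_const)  # intercept first, then slope
print(coef_no_const)    # a single slope, no intercept
```

With the constant, the fit recovers both the intercept and the slope of the toy line; without it, the slope is distorted because it has to absorb the missing intercept.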
- Fitting the model according to the OLS method
results = sm.OLS(y,x_constant).fit()
results.summary()
- The coefficient of size is positive, so the regression line slopes upward, matching the positive linear relationship between size and price
1. Finding Coefficient & Intercept
2. Calculating yhat
3. Plotting Regression Line
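- Values from the coef column of the summary table:
  - const: 1.019e+05 -> Intercept
  - size: 223.1787 -> Coefficient (slope)
- yhat = Coefficient * x + Intercept = 223.1787 * x + 1.019e+05
- plt.plot(x, yhat, lw=4, c='red', label='regression line') -> draws the regression line over the scatter plot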
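plt.scatter(x, y)
plt.xlabel('Size', fontsize=20)
plt.ylabel('Price', fontsize=20)
# Coefficient and intercept as displayed (rounded) in the summary table
yhat = 223.1787*x + 1.019e+05
plt.plot(x, yhat, lw=4, c='red', label='regression line')
plt.legend()  # without this call the label passed to plt.plot is not shown
plt.show()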
- What should be the price of apartments with a size of 500, 750 & 1000 sq.ft?
- Adding new apartments
- Predicting the price of new apartments
- Creating summary table
# Column order must match x_constant: the constant first, then size
new_apartment = pd.DataFrame({'const': 1, 'size': [500, 750, 1000]})
new_apartment
results.predict(new_apartment)
new_apartment['predicted_price'] = results.predict(new_apartment)
new_apartment
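The predictions can be checked by hand by plugging each size into yhat = Coefficient * x + Intercept. The sketch below uses the rounded values shown in the summary table, so its results may differ slightly from results.predict, which uses the unrounded parameters; `predict_price` is a hypothetical helper name.

```python
# Coefficients as displayed (rounded) in the summary table
INTERCEPT = 1.019e+05
SLOPE = 223.1787


def predict_price(size):
    """Price given by the regression line for an apartment of the given size."""
    return SLOPE * size + INTERCEPT


for size in (500, 750, 1000):
    print(size, round(predict_price(size), 2))
```

With these rounded coefficients, the three apartments come out at roughly 213,489, 269,284 and 325,079.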