- Importing the Relevant Libraries
- Loading the Data
- Declaring the Dependent and Independent Variables
- F-Regression
- Finding the p-values
- Creating a Summary Table
- Result
Note: the dependent variable is 'price'; the independent variables are 'size' and 'year'.
import numpy as np
import pandas as pd
url = "https://datascienceschools.github.io/real_estate_price_size_year.csv"
df = pd.read_csv(url)
df.head()
- x: independent variables (inputs, or features)
- y: dependent variable (output, or target)
x = df[['size','year']]
y = df['price']
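The double versus single brackets above matter: scikit-learn expects a 2-D feature matrix and a 1-D target. A minimal sketch on a hypothetical three-row frame (the values are made up purely for illustration; only the column names come from the tutorial's data):

```python
import pandas as pd

# Hypothetical toy data with the same column names as the tutorial's CSV
df = pd.DataFrame({"price": [250000.0, 300000.0, 210000.0],
                   "size": [700.0, 850.0, 600.0],
                   "year": [2009, 2015, 2018]})

x = df[["size", "year"]]  # double brackets -> DataFrame, shape (n_samples, 2)
y = df["price"]           # single brackets -> Series, shape (n_samples,)

print(type(x).__name__, x.shape)  # DataFrame (3, 2)
print(type(y).__name__, y.shape)  # Series (3,)
```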
from sklearn.feature_selection import f_regression
f_regression(x, y)  # returns a tuple (F-statistics, p-values), one entry per feature
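To make the output less of a black box: for each feature on its own, f_regression fits a simple linear regression with an intercept and converts the Pearson correlation r into an F-statistic, F = r² / (1 − r²) · (n − 2), whose p-value comes from the F(1, n − 2) distribution. The sketch below recomputes this by hand and compares it with scikit-learn; the data are synthetic and hypothetical (a 'size'-like column that drives price and a pure-noise column), and `manual_f_regression` is an illustrative helper, not a library function.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

# Hypothetical synthetic data: price depends on size; the second column is pure noise
rng = np.random.default_rng(0)
n = 100
size = rng.uniform(500, 2000, n)
noise = rng.normal(size=n)
price = 200 * size + rng.normal(scale=20000, size=n)
X = np.column_stack([size, noise])

def manual_f_regression(X, y):
    """Univariate F-test of each column of X against y (simple regression with intercept)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each feature
    yc = y - y.mean()        # center the target
    # Pearson correlation between each feature and the target
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    dof = n - 2              # an intercept and one slope are estimated
    f = r**2 / (1 - r**2) * dof
    p = stats.f.sf(f, 1, dof)  # upper-tail probability of the F(1, n - 2) distribution
    return f, p

f_manual, p_manual = manual_f_regression(X, price)
f_sk, p_sk = f_regression(X, price)
print(np.allclose(f_manual, f_sk), np.allclose(p_manual, p_sk))
```

Because each feature is tested in isolation, the F-statistic only depends on how strongly that single feature correlates with the target.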
- Feature selection: the p-values are the second element (index 1) of f_regression's output
p_values = f_regression(x,y)[1]
print(p_values)
print(p_values.round(3))
summary = pd.DataFrame(data = x.columns.values, columns=['Features'])
summary['p-values'] = p_values.round(3)
summary
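The summary table can also carry the F-statistics and an explicit significance flag, and unpacking both arrays at once avoids calling f_regression twice. A sketch on hypothetical synthetic data where price depends on 'size' but not on 'year'; the 0.05 threshold is the conventional choice, assumed here rather than taken from the tutorial:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression

# Hypothetical synthetic data: price depends on size, year is unrelated
rng = np.random.default_rng(42)
n = 100
x = pd.DataFrame({"size": rng.uniform(500, 2000, n),
                  "year": rng.integers(2006, 2019, n)})
y = 200 * x["size"] + rng.normal(scale=20000, size=n)

f_stats, p_values = f_regression(x, y)  # unpack both arrays in one call
alpha = 0.05                            # assumed significance level

summary = pd.DataFrame({"Features": list(x.columns),
                        "F-statistic": f_stats.round(3),
                        "p-values": p_values.round(3)})
summary["significant"] = p_values < alpha
print(summary)
```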
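- 'year' is not significant (its p-value is above the conventional 0.05 threshold), so it can be removed from the model. Note that f_regression tests each feature on its own, so this is a univariate screen rather than a test of 'year' within the joint two-variable model.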
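The same p-value rule can be applied automatically with scikit-learn's SelectFpr, which keeps only the features whose p-value from the given score function is below alpha; the regression is then refit on the retained columns. This is a sketch on hypothetical synthetic data (price driven by 'size' only), and alpha = 0.05 is an assumed threshold:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFpr, f_regression
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: price depends on size, year is unrelated
rng = np.random.default_rng(0)
n = 100
x = pd.DataFrame({"size": rng.uniform(500, 2000, n),
                  "year": rng.integers(2006, 2019, n)})
y = 200 * x["size"] + rng.normal(scale=20000, size=n)

# Keep the features whose f_regression p-value is below alpha
selector = SelectFpr(score_func=f_regression, alpha=0.05).fit(x, y)
kept = x.columns[selector.get_support()]
print("kept features:", list(kept))

# Refit the linear regression on the retained features only
reg = LinearRegression().fit(x[kept], y)
print("coefficients:", dict(zip(kept, reg.coef_.round(2))))
print("R^2:", round(reg.score(x[kept], y), 3))
```

With synthetic noise, an unrelated feature still passes the 0.05 test about 5% of the time, which is one reason a p-value screen is best treated as a guide rather than a rule.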