Case Study (University Admission):

StatsModels (Logistic Regression) - Logit Regression Results

Overview

- Importing the relevant libraries

- Loading data

- Dummy Variables

- Declaring the dependent and independent variables

- Adding a Constant

- Creating a Logit Regression 

- Fitting the Model

- Logit Regression Summary

- Finding the odds 

Importing the relevant libraries

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Loading data

In [2]:
url = 'https://datascienceschools.github.io/Machine_Learning/StatsModel/admission.csv'

df = pd.read_csv(url)

df.head()
Out[2]:
SAT Admitted Gender
0 1363 No Male
1 1792 Yes Female
2 1954 Yes Female
3 1653 No Male
4 1593 No Male

Dummy Variables

- Replace No entries with 0 and Yes entries with 1 in Admitted; replace Female with 1 and Male with 0 in Gender
In [3]:
data = df.copy()

data['Admitted'] = data['Admitted'].map({'Yes': 1, 'No': 0})
data['Gender'] = data['Gender'].map({'Female': 1, 'Male': 0})

data.head()
Out[3]:
SAT Admitted Gender
0 1363 0 0
1 1792 1 1
2 1954 1 1
3 1653 0 0
4 1593 0 0
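As a side note, the same 0/1 encoding can be obtained without map() by comparing against the positive label and casting to int. A small sketch on a made-up frame with the same labels as the admission data:

```python
import pandas as pd

# Hypothetical frame with the same labels as the admission data
df = pd.DataFrame({'Admitted': ['No', 'Yes', 'No'],
                   'Gender': ['Male', 'Female', 'Male']})

# Comparing against the positive label and casting to int gives the same 0/1 codes
admitted = (df['Admitted'] == 'Yes').astype(int)
gender = (df['Gender'] == 'Female').astype(int)

print(admitted.tolist())  # [0, 1, 0]
print(gender.tolist())    # [0, 1, 0]
```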

Declaring the dependent and independent variables

In [4]:
y = data['Admitted']

x = data[['SAT','Gender']]

Adding a Constant

In [5]:
x_constant = sm.add_constant(x)

Creating a Logit Regression

In [6]:
model = sm.Logit(y,x_constant)

Fitting the Model

In [7]:
results = model.fit()
Optimization terminated successfully.
         Current function value: 0.120117
         Iterations 10
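The reported "Current function value" is the average negative log-likelihood per observation. A quick sanity check: multiplying it by the 168 observations (shown in the summary below) and negating recovers the total Log-Likelihood reported there.

```python
# "Current function value" is the average negative log-likelihood per observation.
# Multiplying by the number of observations (168, per the summary) and negating
# recovers the total Log-Likelihood.
avg_neg_ll = 0.120117  # "Current function value" from the fit output
n_obs = 168            # No. Observations from the summary

log_likelihood = -avg_neg_ll * n_obs
print(round(log_likelihood, 3))  # -20.18
```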

Logit Regression Summary

- MLE (Maximum Likelihood Estimation): the method used to estimate the coefficients

- Log-Likelihood: the value of the log likelihood is almost always negative

- The bigger the log likelihood, the better the model fits the data

- LL-Null: the log likelihood of a model with no independent variables (only a constant)

- LLR p-value: compares the log likelihood of our model with LL-Null to test whether the model has any explanatory power, i.e. whether it is statistically different from a model with no predictors

- Here the LLR p-value is essentially 0.00, so our model is significant

- Pseudo R-squ.: McFadden's pseudo R-squared; a good value is typically somewhere between 0.2 and 0.4
In [8]:
results.summary()
Out[8]:
Logit Regression Results
Dep. Variable: Admitted No. Observations: 168
Model: Logit Df Residuals: 165
Method: MLE Df Model: 2
Date: Mon, 05 Oct 2020 Pseudo R-squ.: 0.8249
Time: 04:52:03 Log-Likelihood: -20.180
converged: True LL-Null: -115.26
Covariance Type: nonrobust LLR p-value: 5.118e-42
coef std err z P>|z| [0.025 0.975]
const -68.3489 16.454 -4.154 0.000 -100.598 -36.100
SAT 0.0406 0.010 4.129 0.000 0.021 0.060
Gender 1.9449 0.846 2.299 0.022 0.287 3.603


Possibly complete quasi-separation: A fraction 0.27 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.
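The pseudo R-squared in the table is McFadden's: 1 − LL / LL-Null. Recomputing it from the two log-likelihoods in the summary confirms the reported value (the fitted results object also exposes these directly as results.llf, results.llnull, and results.prsquared):

```python
# McFadden's pseudo R-squared: 1 - LL / LL-Null, using the two
# log-likelihoods from the summary table above
ll = -20.180       # Log-Likelihood
ll_null = -115.26  # LL-Null

pseudo_r2 = 1 - ll / ll_null
print(round(pseudo_r2, 4))  # 0.8249
```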

Finding the odds

- π : the probability of the event occurring

- 1 − π : the probability of the event not occurring

- odds = π / (1 − π)

- coef
    - SAT       0.0406
    - Gender    1.9449

- Gender coding: Female = 1, Male = 0

- Given the same SAT score,

        -> the odds of admission for a female are about 7 times the odds for a male (the exponentiated coefficient is an odds ratio, not "7 times more likely")
In [16]:
np.exp(1.9449)
Out[16]:
6.992932526814459
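The same exponentiation applies to every coefficient; np.exp(results.params) would do this directly on the fitted model. A small sketch using the coefficients from the summary above:

```python
import numpy as np

# Coefficients copied from the summary table; np.exp(results.params)
# would exponentiate all of them at once on the fitted model
coefs = {'const': -68.3489, 'SAT': 0.0406, 'Gender': 1.9449}
odds_ratios = {name: np.exp(b) for name, b in coefs.items()}

print(round(odds_ratios['Gender'], 2))  # 6.99   -> odds of admission ~7x for females
print(round(odds_ratios['SAT'], 4))     # 1.0414 -> each extra SAT point scales the odds by ~4%
```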