Support Vector Machine (SVM)

Bank Customers Retirement Prediction

You work as a data scientist at a major bank in NYC and have been tasked with developing a model that predicts whether a customer is able to retire. The model uses two features: the customer's age and net 401K savings (a U.S. retirement savings plan).

Dr. Ryan @STEMplicity

Importing the Relevant Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Importing the Dataset

In [2]:
url = "https://DataScienceSchools.github.io/Machine_Learning/Classification_Models_CaseStudies/Bank_Customer_Retirement.csv"

df = pd.read_csv(url)

df.head()
Out[2]:
Customer ID Age 401K Savings Retire
0 0 39.180417 322349.8740 0
1 1 56.101686 768671.5740 1
2 2 57.023043 821505.4718 1
3 3 43.711358 494187.4850 0
4 4 54.728823 691435.7723 1

Dropping the Unnecessary Column

- Customer ID
In [3]:
df.drop(['Customer ID'],  axis=1, inplace=True)

Data Visualisation

In [4]:
sns.pairplot(df, hue = 'Retire')

plt.show()
In [5]:
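# Pass the column by name; recent seaborn versions treat a lone positional argument as `data`
sns.countplot(x='Retire', data=df)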

plt.show()

Declaring the Dependent & the Independent Variables

In [6]:
X = df.iloc[:, :-1].values

y = df.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 5)
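The split above is purely random, so the share of retirees in the test set can drift from the share in the full dataset. As an optional variation (not part of the original notebook), `train_test_split` accepts a `stratify` argument that preserves the class ratio in both subsets. The sketch below uses synthetic data as a stand-in; `X_demo`, `y_demo`, and the 57% positive rate are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data (assumption): 500 rows, 2 features.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(500, 2))
y_demo = (rng.random(500) < 0.57).astype(int)

# stratify=y_demo keeps the class ratio nearly identical in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=5, stratify=y_demo
)

print("Positive share, full set :", round(y_demo.mean(), 3))
print("Positive share, train set:", round(y_tr.mean(), 3))
print("Positive share, test set :", round(y_te.mean(), 3))
```

With a dataset as small as this one, stratifying tends to make the reported test metrics a little less sensitive to the choice of `random_state`.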

Feature Scaling

In [8]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)
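To make the scaling step concrete, here is a minimal sketch of what `StandardScaler` computes: it learns the mean and standard deviation from the training set only, then applies those same statistics to the test set. The toy numbers below are illustrative assumptions, not rows from the bank dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy values on very different scales (age in years, savings in dollars).
train = np.array([[40.0, 300_000.0],
                  [50.0, 500_000.0],
                  [60.0, 700_000.0]])
test = np.array([[55.0, 600_000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the training mean and std

# Manual equivalent: z = (x - mean) / std, using the population std (ddof=0).
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print("Scaled test row:", test_scaled)
print("Manual result  :", manual)
```

Calling `fit_transform` on the test set instead would leak test statistics into preprocessing, which is why the notebook uses `transform` there.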

Training the Support Vector Machine Model

In [9]:
from sklearn.svm import SVC

model = SVC()

model.fit(X_train, y_train)
Out[9]:
SVC()

Predicting the Test Set Results

In [10]:
y_pred = model.predict(X_test)

Confusion Matrix

In [11]:
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f} %".format(accuracy*100))

sns.heatmap(cm, annot=True, fmt='d')

plt.show()
Accuracy is: 95.00 %

Classification Report

In [12]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.95      0.93      0.94        43
           1       0.95      0.96      0.96        57

    accuracy                           0.95       100
   macro avg       0.95      0.95      0.95       100
weighted avg       0.95      0.95      0.95       100
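The figures in the report can be derived directly from confusion-matrix counts. The sketch below does this for class 1. The notebook only shows the matrix as a heatmap, so the counts used here are an assumption: they were reconstructed from the rounded values and supports in the report above.

```python
import numpy as np

# Rows: true class (0, 1); columns: predicted class (0, 1).
# Assumption: counts inferred from the rounded report, not printed by the notebook.
cm_demo = np.array([[40, 3],
                    [2, 55]])

tn, fp, fn, tp = cm_demo.ravel()

precision_1 = tp / (tp + fp)   # of predicted retirees, how many were correct
recall_1 = tp / (tp + fn)      # of actual retirees, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy_demo = (tp + tn) / cm_demo.sum()

print(f"precision(1) = {precision_1:.2f}")
print(f"recall(1)    = {recall_1:.2f}")
print(f"f1(1)        = {f1_1:.2f}")
print(f"accuracy     = {accuracy_demo:.2f}")
```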

k-Fold Cross Validation

In [13]:
from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator = model, X = X_train, y = y_train, cv = 10)

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))
Accuracy: 94.25 %
Standard Deviation: 3.54 %
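One caveat with the cell above: `X_train` was scaled once using statistics from the whole training set, so each validation fold has already influenced the scaler. A common alternative, sketched here as an optional variation rather than the notebook's method, wraps the scaler and the classifier in a pipeline so that scaling is refit inside every fold. The synthetic data from `make_classification` is an assumption standing in for the unscaled bank features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-feature binary problem (assumption, not the bank data).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=5
)

# The scaler is fit on the training part of each fold only.
pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, X_demo, y_demo, cv=10)

print("Accuracy: {:.2f} %".format(scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(scores.std() * 100))
```

On a dataset this small the difference is usually minor, but the pipeline form keeps the cross-validated estimate free of preprocessing leakage.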

Improving the Model

Parameter Optimisation

- Applying Grid Search to find the best parameters

sklearn.svm.SVC Parameters

In [18]:
from sklearn.model_selection import GridSearchCV

# Candidate values shared across kernels
C_values = [0.1, 0.25, 0.5, 0.75, 1, 10]
gamma_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

parameters = [
    {'C': C_values, 'kernel': ['linear']},
    {'C': C_values, 'kernel': ['rbf'], 'gamma': gamma_values},
    {'C': C_values, 'kernel': ['poly'], 'gamma': gamma_values},
    {'C': C_values, 'kernel': ['sigmoid'], 'gamma': gamma_values},
]


grid_model = GridSearchCV(estimator = SVC(),
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)

grid_model.fit(X_train, y_train)

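# Predict on the test set with the best estimator (refit on the full training set)
y_grid_pred = grid_model.predict(X_test)

# best_score_ is the mean cross-validated accuracy on the training set, not the test accuracy
best_accuracy = grid_model.best_score_

best_parameters = grid_model.best_params_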

print("Best Accuracy: {:.2f} %".format(best_accuracy*100))

print("Best Parameters:", best_parameters)
Best Accuracy: 94.75 %
Best Parameters: {'C': 0.1, 'gamma': 0.8, 'kernel': 'rbf'}
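The "Best Accuracy" printed above is a cross-validated score on the training data, so it is worth comparing against accuracy on the held-out test set and looking at how close the runner-up parameter settings were. The following is a minimal sketch on synthetic data with a deliberately small grid. `X_demo`, `y_demo`, and `small_grid` are illustrative assumptions, and scaling is omitted because the synthetic features are already on similar scales.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic two-feature binary problem (assumption, not the bank data).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=5
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=5
)

# Small illustrative grid; the kernel stays at SVC's default ('rbf').
small_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1]}
search = GridSearchCV(SVC(), small_grid, scoring='accuracy', cv=5)
search.fit(X_tr, y_tr)

cv_acc = search.best_score_                            # mean CV accuracy
test_acc = accuracy_score(y_te, search.predict(X_te))  # held-out accuracy

# Rank every parameter combination by its mean cross-validated score.
results = (
    pd.DataFrame(search.cv_results_)
    [['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
    .sort_values('rank_test_score')
)

print("CV accuracy  : {:.2f} %".format(cv_acc * 100))
print("Test accuracy: {:.2f} %".format(test_acc * 100))
print(results.head())
```

If several settings score within one standard deviation of the best, the selected parameters are not strongly favoured over their neighbours, which helps explain why a tuned model may perform about the same as the default `SVC()`.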

Confusion Matrix

In [19]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_grid_pred)

sns.heatmap(cm, annot=True , fmt='d')

plt.show()