Support Vector Machine (SVM)¶

Bank Customers Retirement Prediction¶

You work as a data scientist at a major bank in NYC and have been tasked with developing a model that predicts whether a customer is able to retire. The model uses two features: the customer's age and net 401K savings (retirement savings in the U.S.).

Dr. Ryan @STEMplicity

Importing the Relevant Libraries¶

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Importing the Dataset¶

url = "https://DataScienceSchools.github.io/Machine_Learning/Classification_Models_CaseStudies/Bank_Customer_Retirement.csv"

df = pd.read_csv(url)

df.head()

Dropping the Unnecessary Column¶

- Customer ID

df.drop(columns=['Customer ID'], inplace=True)

Data Visualisation¶

sns.pairplot(df, hue = 'Retire')

plt.show()

# Pass the column by keyword; recent seaborn versions no longer accept it positionally
sns.countplot(x='Retire', data=df)

plt.show()

Declaring the Dependent & the Independent Variables¶

X = df.iloc[:, :-1].values

y = df.iloc[:, -1].values
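The positional slicing above works only while `Retire` remains the last column. As a minimal sketch, the same split can be made by column name. The small frame below is built from the five rows printed by `df.head()` (without `Customer ID`); the names `sample`, `X_named`, `y_named`, `X_pos` and `y_pos` are illustrative and not part of the original notebook.

```python
import pandas as pd

# The five rows shown by df.head(), with 'Customer ID' already dropped.
sample = pd.DataFrame({
    "Age": [39.180417, 56.101686, 57.023043, 43.711358, 54.728823],
    "401K Savings": [322349.8740, 768671.5740, 821505.4718, 494187.4850, 691435.7723],
    "Retire": [0, 1, 1, 0, 1],
})

# Name-based selection: independent of the column order.
X_named = sample[["Age", "401K Savings"]].values
y_named = sample["Retire"].values

# Position-based selection, as used in the notebook.
X_pos = sample.iloc[:, :-1].values
y_pos = sample.iloc[:, -1].values

print(X_named.shape, y_named.shape)
```

Both approaches return the same arrays here; the name-based form simply fails loudly if a column is renamed or reordered.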

Splitting the Dataset into the Training Set and Test Set¶

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 5)
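The split above is purely random. One optional variation, not used in the original notebook, is to pass `stratify` so that the class proportions are preserved in both subsets. The sketch below uses synthetic data (500 rows with an assumed 50/50 class balance); `X_demo`, `y_demo` and the `_tr`/`_te` names are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: 500 rows, 2 features, balanced labels (assumed).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
y_demo = np.array([0] * 250 + [1] * 250)

# stratify=y_demo keeps the 50/50 class ratio in both the training and the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=5, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print("Share of class 1 in the test set:", y_te.mean())
```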

Feature Scaling¶

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)
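The scaler is fitted on the training set only, and the test set is transformed with the training mean and standard deviation. A tiny worked example, with made-up numbers chosen for easy arithmetic, illustrates this; the arrays `train` and `test` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (age, savings) rows, chosen so the statistics are easy to check by hand.
train = np.array([[40.0, 300000.0],
                  [50.0, 500000.0],
                  [60.0, 700000.0]])
test = np.array([[55.0, 600000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from the training rows
test_scaled = scaler.transform(test)        # reuses the training statistics

print(scaler.mean_)                # training means: 50 and 500000
print(train_scaled.mean(axis=0))   # approximately 0 for each feature
print(test_scaled)                 # approximately 0.612 for each feature
```

Each test value sits half a step above the training mean, so after dividing by the training standard deviation both features land at about 0.612, even though the raw scales differ by four orders of magnitude.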

Training the Support Vector Machine Model¶

from sklearn.svm import SVC

model = SVC()

model.fit(X_train, y_train)

SVC()

Predicting the Test Set Results¶

y_pred = model.predict(X_test)

Confusion Matrix¶

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f} %".format(accuracy*100))

sns.heatmap(cm, annot=True, fmt='d')

plt.show()

Accuracy is: 95.00 %

Classification Report¶

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.95      0.93      0.94        43
           1       0.95      0.96      0.96        57

    accuracy                           0.95       100
   macro avg       0.95      0.95      0.95       100
weighted avg       0.95      0.95      0.95       100
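To make the report easier to read, the sketch below recomputes the class 1 metrics from a confusion matrix. The matrix `cm_demo` is an assumption: it is inferred from the rounded supports and recalls in the report above (43 and 57 samples, 40 and 55 correct), not copied from the notebook's heatmap.

```python
import numpy as np

# Assumed confusion matrix (rows: true class, columns: predicted class),
# inferred from the rounded classification report.
cm_demo = np.array([[40, 3],
                    [2, 55]])

tn, fp, fn, tp = cm_demo.ravel()

precision_1 = tp / (tp + fp)   # of all predicted "retire", how many were correct
recall_1 = tp / (tp + fn)      # of all true "retire", how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy_demo = (tp + tn) / cm_demo.sum()

print("Precision (class 1): {:.2f}".format(precision_1))
print("Recall (class 1): {:.2f}".format(recall_1))
print("F1 (class 1): {:.2f}".format(f1_1))
print("Accuracy: {:.2f}".format(accuracy_demo))
```

Under this assumed matrix the values round to 0.95, 0.96, 0.96 and 0.95, which agrees with the class 1 row and the accuracy row of the report.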

k-Fold Cross Validation¶

from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator = model, X = X_train, y = y_train, cv = 10)

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Accuracy: 94.25 %
Standard Deviation: 3.54 %
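In the cross-validation above, `X_train` was scaled once before the folds were created, so each validation fold has already influenced the scaling statistics. A possible alternative, sketched here on synthetic data, wraps the scaler and the classifier in a pipeline so that scaling is refitted inside every fold. The data generator and the names `X_demo`, `y_demo`, `pipe` and `scores` are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for (age, 401K savings); the distribution parameters are arbitrary.
rng = np.random.default_rng(0)
n = 400
age = rng.normal(47, 7, n)
savings = rng.normal(535000, 190000, n)
X_demo = np.column_stack([age, savings])
y_demo = ((age - 47) / 7 + (savings - 535000) / 190000 > 0).astype(int)

# The scaler is refitted on the training part of each fold, then applied to the held-out part.
pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, X_demo, y_demo, cv=10)

print("Accuracy: {:.2f} %".format(scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(scores.std() * 100))
```

With only two features and a few hundred rows the difference is usually small, but the pipeline form keeps the validation folds strictly unseen.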

Improving the Model¶

Parameter Optimisation¶

- Applying Grid Search to find the best parameters

sklearn.svm.SVC Parameters

from sklearn.model_selection import GridSearchCV

parameters = [
    {'C': [0.1, 0.25, 0.5, 0.75, 1, 10], 'kernel': ['linear']},
    {'C': [0.1, 0.25, 0.5, 0.75, 1, 10], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]},
    {'C': [0.1, 0.25, 0.5, 0.75, 1, 10], 'kernel': ['poly'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]},
    {'C': [0.1, 0.25, 0.5, 0.75, 1, 10], 'kernel': ['sigmoid'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]},
]
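Before launching the search it can help to know how many models will be trained. The sketch below counts the candidates in the grid with `ParameterGrid`; the helper names `C_values`, `gamma_values` and `grid_spec` are introduced here for brevity and hold the same values as the grid in this section.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.1, 0.25, 0.5, 0.75, 1, 10]
gamma_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

grid_spec = [
    {'C': C_values, 'kernel': ['linear']},
    {'C': C_values, 'kernel': ['rbf'], 'gamma': gamma_values},
    {'C': C_values, 'kernel': ['poly'], 'gamma': gamma_values},
    {'C': C_values, 'kernel': ['sigmoid'], 'gamma': gamma_values},
]

# 6 linear candidates plus 3 kernels x 6 C values x 10 gamma values.
n_candidates = len(ParameterGrid(grid_spec))
n_fits = n_candidates * 10  # one fit per candidate per fold with cv = 10

print(n_candidates, n_fits)  # 186 1860
```

The final refit on the full training set adds one more fit on top of these.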


grid_model = GridSearchCV(estimator = SVC(),
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)

grid_model.fit(X_train, y_train)

y_grid_pred = grid_model.predict(X_test)

best_accuracy = grid_model.best_score_

best_parameters = grid_model.best_params_

print("Best Accuracy: {:.2f} %".format(best_accuracy*100))

print("Best Parameters:", best_parameters)

Best Accuracy: 94.75 %
Best Parameters: {'C': 0.1, 'gamma': 0.8, 'kernel': 'rbf'}
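The "Best Accuracy" printed above is the mean cross-validated score on the training set, not the accuracy on the held-out test set. A minimal sketch of reporting both is shown below on synthetic data with a reduced grid; the data generator, the smaller grid and the names `grid_demo`, `cv_acc` and `test_acc` are assumptions made to keep the example fast and self-contained.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for (age, 401K savings); the distribution parameters are arbitrary.
rng = np.random.default_rng(0)
n = 300
age = rng.normal(47, 7, n)
savings = rng.normal(535000, 190000, n)
X_demo = np.column_stack([age, savings])
y_demo = ((age - 47) / 7 + (savings - 535000) / 190000 > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=5)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

# Reduced grid (assumption) so the example runs in a few seconds.
grid_demo = GridSearchCV(
    estimator=SVC(),
    param_grid={'C': [0.1, 1, 10], 'gamma': [0.1, 1], 'kernel': ['rbf']},
    scoring='accuracy',
    cv=5,
)
grid_demo.fit(X_tr, y_tr)

cv_acc = grid_demo.best_score_                             # mean CV accuracy on training data
test_acc = accuracy_score(y_te, grid_demo.predict(X_te))   # accuracy on unseen test data

print("Best CV Accuracy: {:.2f} %".format(cv_acc * 100))
print("Test Accuracy: {:.2f} %".format(test_acc * 100))
print("Best Parameters:", grid_demo.best_params_)
```

Because `GridSearchCV` refits the best candidate on the whole training set by default, `predict` here uses that refitted model.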

Confusion Matrix¶

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_grid_pred)

sns.heatmap(cm, annot=True , fmt='d')

plt.show()

df.head() output:

   Customer ID        Age  401K Savings  Retire
0            0  39.180417   322349.8740       0
1            1  56.101686   768671.5740       1
2            2  57.023043   821505.4718       1
3            3  43.711358   494187.4850       0
4            4  54.728823   691435.7723       1