GridSearchCV (Parameter Optimisation)

Support Vector Machine (Breast Cancer Wisconsin)

Importing the Relevant Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Importing the Dataset

In [2]:
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# Combine the 30 feature columns and the target into a single DataFrame
df = pd.DataFrame(
    np.c_[dataset['data'], dataset['target']],
    columns = np.append(dataset['feature_names'], ['target'])
)

df.head()
Out[2]:
  | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target
0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | ... | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | 0.0
1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | 0.0
2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | ... | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | 0.0
3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | ... | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | 0.0
4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | ... | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | 0.0

5 rows × 31 columns
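
Since accuracy is the scoring metric used later in this notebook, it is worth checking how balanced the two classes are. A minimal sketch, loading the data again so the cell stands alone (the names `data` and `class_counts` are illustrative and not part of the original notebook):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Count how many samples fall into each target class.
# In this dataset 0 encodes 'malignant' and 1 encodes 'benign'.
labels, counts = np.unique(data.target, return_counts=True)
class_counts = {
    str(data.target_names[label]): int(count)
    for label, count in zip(labels, counts)
}

print(class_counts)  # {'malignant': 212, 'benign': 357}
```

Roughly 63 % of the samples are benign, so a classifier that always predicts the majority class would already reach about 63 % accuracy. That is the baseline to keep in mind when reading the scores below.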

Declaring the Dependent & the Independent Variables

In [3]:
X = df.iloc[:, :-1].values

y = df.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 5)

Feature Scaling

In [5]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)
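
To make explicit what the scaler does, here is a small sketch on toy data (the arrays are made up for illustration). StandardScaler learns the per-column mean and standard deviation from the training data only, then applies those same statistics to the test data, which is why the cell above calls fit_transform on X_train but only transform on X_test.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this example
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 0.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learn mean/std from train, then scale it
test_scaled = scaler.transform(test)        # reuse the training mean/std

# Manual z-score using the training statistics (population std, ddof=0)
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)
manual_test = (test - train_mean) / train_std

print(np.allclose(test_scaled, manual_test))  # True
```

The scaled test columns generally do not have mean 0 and standard deviation 1 themselves; only the training columns do.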

GridSearchCV (Parameter Optimisation)

- Applying Grid Search to find the best parameters

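Parameters searched (sklearn.svm.SVC): C, kernel and gamma. gamma has no effect on the linear kernel, so it is only varied for the rbf, poly and sigmoid kernels.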
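
Grid search fits one model per parameter combination per cross-validation fold, so the cost grows quickly with the grid size. The sketch below counts the candidates for a grid with the same values as the one used in the next cell (the names `C_grid`, `gamma_grid` and `grid` are illustrative), using ParameterGrid from scikit-learn.

```python
from sklearn.model_selection import ParameterGrid

C_grid = [0.1, 0.25, 0.5, 0.75, 1, 10, 100, 1000]
gamma_grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

grid = [
    {'C': C_grid, 'kernel': ['linear']},
    {'C': C_grid, 'kernel': ['rbf'], 'gamma': gamma_grid},
    {'C': C_grid, 'kernel': ['poly'], 'gamma': gamma_grid},
    {'C': C_grid, 'kernel': ['sigmoid'], 'gamma': gamma_grid},
]

# Each dict expands to the product of its value lists; the dicts are then summed
n_candidates = len(ParameterGrid(grid))  # 8 + 3 * (8 * 10)
n_folds = 10
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 248 2480
```

With the default refit behaviour, GridSearchCV performs one additional fit of the best candidate on the full training set.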

In [6]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values shared by all kernels
C_values = [0.1, 0.25, 0.5, 0.75, 1, 10, 100, 1000]
gamma_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

# One sub-grid per kernel; gamma is not used by the linear kernel
parameters = [
    {'C': C_values, 'kernel': ['linear']},
    {'C': C_values, 'kernel': ['rbf'], 'gamma': gamma_values},
    {'C': C_values, 'kernel': ['poly'], 'gamma': gamma_values},
    {'C': C_values, 'kernel': ['sigmoid'], 'gamma': gamma_values},
]


grid_model = GridSearchCV( estimator = SVC(),
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)

grid_model.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best candidate (not a test-set score)
best_accuracy = grid_model.best_score_

best_parameters = grid_model.best_params_

print("Best Accuracy: {:.2f} %".format(best_accuracy*100))
print("Best Parameters:", best_parameters)
Best Accuracy: 98.25 %
Best Parameters: {'C': 0.25, 'kernel': 'linear'}
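
One caveat about the search above: the scaler was fitted on the whole training set before cross-validation, so each validation fold has already contributed to the scaling statistics. The effect is usually small, but a common way to avoid it is to put the scaler and the classifier in a Pipeline so that scaling is re-fitted inside every fold. The following is a sketch under that assumption, not the notebook's original method; the names `pipe`, `pipe_grid` and `pipe_search` are illustrative, and the grid is deliberately reduced to keep it quick.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_raw, y_raw = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.20, random_state=5
)

# Scaling is now a pipeline step, so it is fitted on the training part of each fold only
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Parameters of a pipeline step are addressed as '<step name>__<parameter>'
pipe_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 0.25, 1, 10]},
    {'svc__kernel': ['rbf'], 'svc__C': [0.1, 1, 10], 'svc__gamma': [0.1, 0.5, 1]},
]

pipe_search = GridSearchCV(pipe, param_grid=pipe_grid, scoring='accuracy', cv=10)
pipe_search.fit(X_tr, y_tr)  # note: unscaled training data goes in

print("Best CV accuracy: {:.2f} %".format(pipe_search.best_score_ * 100))
print("Best parameters:", pipe_search.best_params_)
```

The fitted `pipe_search` can then predict directly on the unscaled test data, because the pipeline applies the scaler itself.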

Training the SVM Model with Best Parameters

In [7]:
# SVC is already imported above; rebuild the model with the parameters found by the grid search
model = SVC(kernel = 'linear', C = 0.25)

model.fit(X_train, y_train)
Out[7]:
SVC(C=0.25, kernel='linear')
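
Retraining by hand works, but it is not strictly necessary: with its default refit=True, GridSearchCV refits the best candidate on the full training set and exposes it as best_estimator_. The sketch below, which uses a reduced grid and illustrative names (`small_grid`, `search`, `refit_model`, `manual_model`), checks that both routes give the same test predictions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.20, random_state=5
)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

# Reduced grid, for illustration only
small_grid = {'C': [0.1, 0.25, 1], 'kernel': ['linear']}
search = GridSearchCV(SVC(), param_grid=small_grid, scoring='accuracy', cv=10)
search.fit(X_tr, y_tr)

# Route 1: the estimator GridSearchCV already refitted on the full training set
refit_model = search.best_estimator_

# Route 2: a fresh SVC built from the best parameters and trained by hand
manual_model = SVC(**search.best_params_).fit(X_tr, y_tr)

same_predictions = np.array_equal(
    refit_model.predict(X_te), manual_model.predict(X_te)
)
print(same_predictions)
```

Passing `**search.best_params_` also avoids copying the winning values by hand, which matters if the grid or the data changes later.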

Predicting the Test Set Results

In [8]:
y_pred = model.predict(X_test)

Confusion Matrix

In [9]:
from sklearn.metrics import confusion_matrix, accuracy_score

# Rows are the true classes and columns the predicted classes (0 = malignant, 1 = benign)
cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f} %".format(accuracy*100))

sns.heatmap(cm, annot=True , fmt='d')

plt.show()
Accuracy is: 97.37 %
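
Overall accuracy hides which class the errors fall on, and for this dataset a missed malignant case is usually the more costly mistake. The sketch below repeats the notebook's steps in one self-contained cell and reads per-class figures from the confusion matrix; the variable names (`clf`, `cm_check`, `malignant_recall`) are illustrative, and exact numbers may vary with the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.20, random_state=5
)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

clf = SVC(kernel='linear', C=0.25).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

cm_check = confusion_matrix(y_te, y_hat)

# With labels ordered [0, 1], ravel() yields, in order:
# malignant predicted malignant, malignant predicted benign,
# benign predicted malignant, benign predicted benign
mal_correct, mal_missed, ben_flagged, ben_correct = cm_check.ravel()

# Share of malignant cases the model actually caught
malignant_recall = mal_correct / (mal_correct + mal_missed)
print("Malignant recall: {:.2f} %".format(malignant_recall * 100))

# Precision, recall and F1 for both classes
print(classification_report(y_te, y_hat, target_names=list(data.target_names)))
```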