GridSearchCV (Parameter Optimisation)¶

Support Vector Machine (Breast Cancer Wisconsin)¶

Importing the Relevant Libraries¶

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Importing the Dataset¶

from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

df = pd.DataFrame(np.c_[dataset['data'], dataset['target']], columns = np.append(dataset['feature_names'], ['target']))

df.head()

Declaring the Dependent & the Independent Variables¶

X = df.iloc[:, :-1].values

y = df.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set¶

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 5)

Feature Scaling¶

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)

GridSearchCV (Parameter Optimisation)¶

- Applying Grid Search to find the best parameters

sklearn.svm.SVC Parameters

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

parameters = [
    
{'C': [0.1, 0.25, 0.5, 0.75, 1, 10, 100, 1000], 'kernel': ['linear']},
{'C': [0.1, 0.25, 0.5, 0.75, 1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]},
{'C': [0.1, 0.25, 0.5, 0.75, 1, 10, 100, 1000], 'kernel': ['poly'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]},
{'C': [0.1, 0.25, 0.5, 0.75, 1, 10, 100, 1000], 'kernel': ['sigmoid'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]}
 
 ]


grid_model = GridSearchCV( estimator = SVC(),
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)

grid_model.fit(X_train, y_train)

best_accuracy = grid_model.best_score_

best_parameters = grid_model.best_params_

print("Best Accuracy: {:.2f} %".format(best_accuracy*100))
print("Best Parameters:", best_parameters)

Best Accuracy: 98.25 %
Best Parameters: {'C': 0.25, 'kernel': 'linear'}

Training the SVM Model with Best Parameters¶

from sklearn.svm import SVC

model = SVC(kernel = 'linear', C = 0.25)

model.fit(X_train, y_train)

SVC(C=0.25, kernel='linear')

Predicting the Test Set Results¶

y_pred = model.predict(X_test)

Confusion Matrix¶

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f} %".format(accuracy*100))

sns.heatmap(cm, annot=True , fmt='d')

plt.show()

Accuracy is: 97.37 %

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	...	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension
0	17.99	10.38	122.80	1001.0	0.11840	0.27760	0.3001	0.14710	0.2419	0.07871	...	17.33	184.60	2019.0	0.1622	0.6656	0.7119	0.2654	0.4601	0.11890
1	20.57	17.77	132.90	1326.0	0.08474	0.07864	0.0869	0.07017	0.1812	0.05667	...	23.41	158.80	1956.0	0.1238	0.1866	0.2416	0.1860	0.2750	0.08902
2	19.69	21.25	130.00	1203.0	0.10960	0.15990	0.1974	0.12790	0.2069	0.05999	...	25.53	152.50	1709.0	0.1444	0.4245	0.4504	0.2430	0.3613	0.08758
3	11.42	20.38	77.58	386.1	0.14250	0.28390	0.2414	0.10520	0.2597	0.09744	...	26.50	98.87	567.7	0.2098	0.8663	0.6869	0.2575	0.6638	0.17300
4	20.29	14.34	135.10	1297.0	0.10030	0.13280	0.1980	0.10430	0.1809	0.05883	...	16.67	152.20	1575.0	0.1374	0.2050	0.4000	0.1625	0.2364	0.07678