CatBoost Classification¶

Breast Cancer Wisconsin¶

Installing CatBoost¶

!pip install catboost

Requirement already satisfied: catboost in /home/bahar/anaconda3/lib/python3.7/site-packages (0.24.3)

Attribute Information:¶

The file loaded below has 11 columns, as shown in the dataset.head() output:

1) Sample code number (ID number)
2) Clump Thickness
3) Uniformity of Cell Size
4) Uniformity of Cell Shape
5) Marginal Adhesion
6) Single Epithelial Cell Size
7) Bare Nuclei
8) Bland Chromatin
9) Normal Nucleoli
10) Mitoses
11) Class (2 = benign, 4 = malignant)

In the UCI original Breast Cancer Wisconsin dataset, attributes 2-10 are integer scores from 1 to 10.

Importing the Relevant Libraries¶

import numpy as np
import pandas as pd

Importing the Dataset¶

url = "https://DataScienceSchools.github.io/Machine_Learning/Sklearn/Case_Study/Classification/BreastCancerWisconsin/BreastCancer.csv"

dataset = pd.read_csv(url)

dataset.head()

Declaring the Dependent & the Independent Variables¶

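# Features: every column except the last. This includes column 0, "Sample code number", which is an identifier rather than a measurement.
X = dataset.iloc[:, :-1].values

# Target: the last column, "Class".
y = dataset.iloc[:, -1].values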
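
The positional slice above keeps the ID column among the features. A minimal sketch of selecting columns by name instead is shown here, using only the five rows printed by dataset.head() so that it runs without a download. The names `sample`, `X_features`, and `y_target` are illustrative and not part of the original notebook, and the accuracy figures reported later were produced with the ID column included.

```python
import pandas as pd

columns = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion",
    "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
    "Normal Nucleoli", "Mitoses", "Class",
]

# The first five rows of the dataset, copied from the dataset.head() output.
rows = [
    [1000025, 5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
    [1002945, 5, 4, 4, 5, 7, 10, 3, 2, 1, 2],
    [1015425, 3, 1, 1, 1, 2, 2, 3, 1, 1, 2],
    [1016277, 6, 8, 8, 1, 3, 4, 3, 7, 1, 2],
    [1017023, 4, 1, 1, 3, 2, 1, 3, 1, 1, 2],
]
sample = pd.DataFrame(rows, columns=columns)

# Drop the identifier and the target by name, leaving the nine measurements.
X_features = sample.drop(columns=["Sample code number", "Class"])
y_target = sample["Class"]

print(X_features.shape)
print(list(X_features.columns))
```

Selecting by name makes it explicit which columns reach the model and does not silently change meaning if the column order of the file changes.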

Splitting the Dataset into the Training Set and Test Set¶

from sklearn.model_selection import train_test_split

# Hold out 25% of the rows for testing; random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
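
The split above shuffles rows without regard to class balance. As a hedged sketch of an alternative, the `stratify` parameter of `train_test_split` keeps the class ratio the same in both subsets. The data below is synthetic (60 rows labelled 2 and 40 labelled 4, counts chosen for illustration only, not the real class distribution), and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 60 labelled 2 and 40 labelled 4.
# These counts are illustrative only.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([2] * 60 + [4] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# With stratify, the share of label 4 is about the same in both subsets.
print("train share of label 4:", (y_tr == 4).mean())
print("test share of label 4: ", (y_te == 4).mean())
```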

Training the CatBoost Classification Model¶

from catboost import CatBoostClassifier

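# All hyperparameters are left at CatBoost's defaults.
model = CatBoostClassifier()

# Fit on the training split only; the test split stays unseen until evaluation.
model.fit(X_train, y_train)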

Predicting the Test Set Results¶

y_pred = model.predict(X_test)

Confusion Matrix¶

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: ", accuracy, "\n\n Confusion Matrix:\n\n ", cm)

Accuracy is:  0.9473684210526315 

 Confusion Matrix:

  [[103   4]
 [  5  59]]
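
The reported accuracy can be recomputed from the confusion matrix by hand, along with precision and recall. The sketch below assumes scikit-learn's default layout (rows are true labels, columns are predicted labels, both in sorted label order) and treats label 4, the second row and column, as the positive class; that choice of positive class is an assumption made for illustration.

```python
# Confusion matrix as printed above.
cm = [[103, 4],
      [5, 59]]

# Assumption: the second label (4) is the positive class.
tn, fp = cm[0]  # true label 2: predicted 2, predicted 4
fn, tp = cm[1]  # true label 4: predicted 2, predicted 4

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"test samples = {total}")
print(f"accuracy     = {accuracy:.4f}")
print(f"precision    = {precision:.4f}")
print(f"recall       = {recall:.4f}")
print(f"F1           = {f1:.4f}")
```

The accuracy computed this way, 162 correct out of 171, matches the value printed by accuracy_score.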

K-Fold Cross Validation¶

from sklearn.model_selection import cross_val_score

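# 10-fold cross-validation on the training set. For a classifier and an integer cv, scikit-learn uses stratified folds and refits a fresh copy of the model on each one.
accuracies = cross_val_score(estimator=model, X=X_train, y=y_train, cv=10)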

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Accuracy: 97.26 %
Standard Deviation: 2.18 %
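
To make the fold mechanics concrete, here is a minimal pure-Python sketch of how k-fold partitioning assigns each sample to exactly one validation fold. It is a plain, unshuffled, non-stratified split written for illustration, not the stratified splitter that cross_val_score applies to classifiers; the function name `kfold_indices` and the sample count of 23 are hypothetical.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


n_samples = 23  # hypothetical sample count, chosen to show uneven folds
folds = list(kfold_indices(n_samples, 10))
fold_sizes = [len(val_idx) for _, val_idx in folds]

print("validation fold sizes:", fold_sizes)
print("total validated:", sum(fold_sizes))
```

Each of the ten models is trained on the remaining nine folds and scored on the held-out one, and the mean and standard deviation printed above summarize those ten scores.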

Output of dataset.head() (first five rows):

Index	Sample code number	Clump Thickness	Uniformity of Cell Size	Uniformity of Cell Shape	Marginal Adhesion	Single Epithelial Cell Size	Bare Nuclei	Bland Chromatin	Normal Nucleoli	Mitoses	Class
0	1000025	5	1	1	1	2	1	3	1	1	2
1	1002945	5	4	4	5	7	10	3	2	1	2
2	1015425	3	1	1	1	2	2	3	1	1	2
3	1016277	6	8	8	1	3	4	3	7	1	2
4	1017023	4	1	1	3	2	1	3	1	1	2