Random Forest Classification

Bank Note Authentication

Data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera normally used for print inspection. The final images are 400 x 400 pixels. Given the object lens and the distance to the specimen, the resulting grayscale images have a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images.

Importing the Relevant Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Importing the Dataset

In [2]:
url = "https://datascienceschools.github.io/Machine_Learning/Classification_Models_CaseStudies/BankNote_Authentication.csv"

df = pd.read_csv(url)

df.head()
Out[2]:
   variance  skewness  curtosis   entropy  class
0   3.62160    8.6661   -2.8073  -0.44699      0
1   4.54590    8.1674   -2.4586  -1.46210      0
2   3.86600   -2.6383    1.9242   0.10645      0
3   3.45660    9.5228   -4.0112  -3.59440      0
4   0.32924   -4.4552    4.5718  -0.98880      0

Declaring the Dependent & the Independent Variables

In [3]:
X = df.iloc[:,:-1].values

y = df.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
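
The split above samples rows at random without regard to class. As a minimal sketch, assuming a synthetic stand-in for the banknote features (the `make_classification` data and the `*_demo`, `X_tr`, `X_te` names are illustrative and not part of the original notebook), passing `stratify` keeps the class shares close to equal in both subsets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with four features (assumption: not the real banknote data)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.55, 0.45], random_state=0,
)

# stratify=y_demo preserves the class proportions in the train and test subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print("Train class share:", np.bincount(y_tr) / len(y_tr))
print("Test class share: ", np.bincount(y_te) / len(y_te))
```

Whether stratification changes the results on the real dataset has not been checked here; it mostly matters when one class is rare.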

Training the Random Forest Model

In [5]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()

model.fit(X_train, y_train)
Out[5]:
RandomForestClassifier()
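
A random forest draws bootstrap samples and feature subsets at random, so a model built with `RandomForestClassifier()` and no seed can give slightly different accuracies and importances on each run. The sketch below, on synthetic data (an assumption, as are the names `rf_a` and `rf_b`), shows that fixing `random_state` makes two fits agree:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with four features (assumption: not the real banknote data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Two forests with the same seed are trained on the same data
rf_a = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_demo, y_demo)
rf_b = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# With identical seeds, the learned importances should match
same = np.allclose(rf_a.feature_importances_, rf_b.feature_importances_)
print("Identical importances:", same)
```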

Feature Importances

In [6]:
df_feature = df.drop('class', axis=1)

feature_importances = pd.DataFrame(data = df_feature.columns.values, columns = ['Features'])

feature_importances['Importance'] =  model.feature_importances_

feature_importances.sort_values('Importance',ascending=False)
Out[6]:
   Features  Importance
0  variance    0.540481
1  skewness    0.235015
2  curtosis    0.168179
3   entropy    0.056325
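
The values in `feature_importances_` are impurity-based and computed from the training data. A different technique, permutation importance, measures how much the score on held-out data drops when one feature is shuffled. The following is a sketch on synthetic data (an assumption; the feature labels `feature_0` to `feature_3` are placeholders), not a result for the banknote dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in with four features (assumption: not the real banknote data)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and record the drop in accuracy
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)

for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```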

Predicting the Test Set Results

In [7]:
y_pred = model.predict(X_test)

Confusion Matrix

In [8]:
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f} %".format(accuracy*100))

sns.heatmap(cm, annot=True, fmt='d')

plt.show()
Accuracy is: 99.03 %
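
In the matrix returned by `confusion_matrix`, rows are the true classes and columns are the predicted classes. For a binary problem the four cells can be unpacked with `ravel()`. The labels below are made up for illustration and are not taken from the banknote data:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Layout for labels [0, 1]: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Accuracy is the share of predictions on the diagonal
accuracy_demo = (tn + tp) / (tn + fp + fn + tp)

print("TN, FP, FN, TP:", tn, fp, fn, tp)  # 4 1 1 4
print("Accuracy: {:.2f} %".format(accuracy_demo * 100))
```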

Classification Report

In [9]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       1.00      0.99      0.99       232
           1       0.98      0.99      0.99       180

    accuracy                           0.99       412
   macro avg       0.99      0.99      0.99       412
weighted avg       0.99      0.99      0.99       412
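
The per-class numbers in the report follow from the confusion-matrix counts: precision is TP / (TP + FP), recall is TP / (TP + FN), and the F1 score is their harmonic mean. A small check with made-up labels (an assumption, not the banknote data) compares these formulas with `classification_report(..., output_dict=True)`:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Metrics for class 1 computed by hand from the counts
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# The same metrics as reported by scikit-learn, returned as a nested dict
report = classification_report(y_true_demo, y_pred_demo, output_dict=True)

print("By hand:", precision_1, recall_1, f1_1)
print("Report: ", report["1"]["precision"], report["1"]["recall"], report["1"]["f1-score"])
```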

K-Fold Cross Validation

In [10]:
from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator = model, X = X_train, y = y_train, cv = 9)

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))
Accuracy: 99.37 %
Standard Deviation: 0.77 %
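
When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses stratified folds without shuffling. To control shuffling and the seed, a splitter object can be passed instead. The sketch below uses synthetic data (an assumption) and a smaller forest to keep it quick:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in with four features (assumption: not the real banknote data)
X_demo, y_demo = make_classification(
    n_samples=450, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Nine stratified folds, shuffled with a fixed seed so the folds are repeatable
cv = StratifiedKFold(n_splits=9, shuffle=True, random_state=0)

scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0), X_demo, y_demo, cv=cv
)

print("Accuracy: {:.2f} %".format(scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(scores.std() * 100))
```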

Saving the Model

In [11]:
import pickle

# The with-block closes the file automatically, even if dump() raises an error
with open("RFClassifier.pkl", "wb") as pickle_out:
    pickle.dump(model, pickle_out)
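
A saved model is only useful if it can be loaded again. The sketch below round-trips a small forest through a temporary file and checks that the restored model predicts the same labels. The synthetic data, the file name `RFClassifier_demo.pkl`, and the variable names are assumptions made for illustration:

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with four features (assumption: not the real banknote data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Write the model to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "RFClassifier_demo.pkl")
    with open(path, "wb") as f:
        pickle.dump(rf, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions
same_predictions = bool((restored.predict(X_demo) == rf.predict(X_demo)).all())
print("Same predictions after reload:", same_predictions)
```

Pickle files should only be loaded from trusted sources, and a pickled model is best loaded with the same scikit-learn version that saved it.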

Making Predictions (New Dataset)

In [12]:
url = "https://datascienceschools.github.io/Machine_Learning/Classification_Models_CaseStudies/BankNote_Authentication_Test.csv"

new_data = pd.read_csv(url)

new_data.head()
Out[12]:
   variance  skewness  curtosis   entropy
0   3.62160    8.6661   -2.8073  -0.44699
1   4.54590    8.1674   -2.4586  -1.46210
2   3.86600   -2.6383    1.9242   0.10645
3   3.45660    9.5228   -4.0112  -3.59440
4  -0.47465   -4.3496    1.9901   0.75170

Declaring Independent Variables

In [13]:
# Use a new name so the earlier X_test hold-out split is not overwritten
X_new = new_data.iloc[:,:].values

Predicting Dependent Variable (class)

In [14]:
y_pred_new = model.predict(X_new)

# Store the predicted class (0 or 1) next to the features
new_data['predicted_class'] = y_pred_new

new_data
Out[14]:
   variance  skewness  curtosis   entropy  predicted_class
0   3.62160   8.66610  -2.80730  -0.44699                0
1   4.54590   8.16740  -2.45860  -1.46210                0
2   3.86600  -2.63830   1.92420   0.10645                0
3   3.45660   9.52280  -4.01120  -3.59440                0
4  -0.47465  -4.34960   1.99010   0.75170                1
5   1.05520   1.18570  -2.64110   0.11033                1
6   1.16440   3.80950  -4.94080  -4.09090                1
7  -4.47790   7.37080  -0.31218  -6.77540                1
8  -2.73380   0.45523   2.43910   0.21766                1
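
`model.predict` returns only the class label. When a measure of confidence is useful, `predict_proba` gives the share of trees' averaged class probabilities for each row. This is a sketch on synthetic data (an assumption, as are the names `rf` and `X_new_demo`), not output for the banknote file:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with four features (assumption: not the real banknote data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Pretend the first five rows are new, unlabeled observations
X_new_demo = X_demo[:5]

# One row per observation, one column per class, in the order of rf.classes_
proba = rf.predict_proba(X_new_demo)
labels = rf.predict(X_new_demo)

for row, label in zip(proba, labels):
    print("P(class 0) = {:.2f}, P(class 1) = {:.2f}, predicted = {}".format(row[0], row[1], label))
```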