Random Forest Classification

Bank Note Authentication

The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera normally used for print inspection. The final images are 400 x 400 pixels. Given the object lens and the distance to the specimen, the resulting grayscale images have a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images. Each row of the dataset holds four such features (variance, skewness, curtosis and entropy) and a binary class label.

Importing the Relevant Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Importing the Dataset

url = "https://datascienceschools.github.io/Machine_Learning/Classification_Models_CaseStudies/BankNote_Authentication.csv"

df = pd.read_csv(url)

df.head()

Declaring the Dependent & the Independent Variables

X = df.iloc[:,:-1].values

y = df.iloc[:, -1].values

Splitting the Dataset into the Training Set and Test Set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
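The split above draws rows at random, so the class proportions can differ slightly between the training and test sets. A minimal sketch of a stratified split follows, assuming synthetic stand-in data (`X_demo`, `y_demo` and the 60/40 class balance are hypothetical, not the banknote data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 4 features, 60/40 class balance
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.array([0] * 60 + [1] * 40)

# stratify=y_demo keeps the class ratio the same in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Share of class 1 in each subset
print(y_tr.mean(), y_te.mean())
```

Whether stratification matters here depends on how balanced the real classes are; it tends to help most with small or skewed datasets.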

Training the Random Forest Model

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()

model.fit(X_train, y_train)

RandomForestClassifier()

Feature Importances

df_feature = df.drop('class', axis=1)

feature_importances = pd.DataFrame(data = df_feature.columns.values, columns = ['Features'])

feature_importances['Importance'] =  model.feature_importances_

feature_importances.sort_values('Importance',ascending=False)
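The impurity-based importances that scikit-learn reports for a random forest are normalized, so they sum to 1 across features. A more compact way to pair names with importances is a pandas Series; the sketch below assumes synthetic stand-in data that only borrows the four column names (`demo`, `demo_y` and `demo_model` are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data using the same four column names
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["variance", "skewness", "curtosis", "entropy"],
)
# Synthetic label driven mostly by the first two columns
demo_y = (demo["variance"] + 0.5 * demo["skewness"] > 0).astype(int)

demo_model = RandomForestClassifier(n_estimators=50, random_state=0)
demo_model.fit(demo.values, demo_y.values)

# Index the importances by feature name and sort in one step
importances = pd.Series(demo_model.feature_importances_, index=demo.columns)
importances = importances.sort_values(ascending=False)

print(importances)
print(importances.sum())  # impurity-based importances are normalized
```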

Predicting the Test Set Results

y_pred = model.predict(X_test)

Confusion Matrix

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy is: {:.2f} %".format(accuracy*100))

sns.heatmap(cm, annot=True, fmt='d')

plt.show()

Accuracy is: 99.03 %
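Accuracy can be read directly off a confusion matrix: it is the sum of the diagonal (correct predictions) divided by the sum of all cells. `confusion_matrix` places true classes in rows and predicted classes in columns. The sketch below uses a hypothetical 2x2 matrix (`cm_demo`), not the matrix from this run:

```python
# Hypothetical confusion matrix (rows: true class, columns: predicted class)
cm_demo = [[50, 2],
           [1, 47]]

# Diagonal cells are the correctly classified samples
correct = sum(cm_demo[i][i] for i in range(len(cm_demo)))
total = sum(sum(row) for row in cm_demo)

accuracy_demo = correct / total
print("Accuracy is: {:.2f} %".format(accuracy_demo * 100))
```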

Classification Report

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.99      0.99       232
           1       0.98      0.99      0.99       180

    accuracy                           0.99       412
   macro avg       0.99      0.99      0.99       412
weighted avg       0.99      0.99      0.99       412
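The per-class columns of the report follow from confusion-matrix counts: precision divides the true positives by the column sum, recall divides them by the row sum (the support), and F1 is their harmonic mean. A minimal sketch, assuming a hypothetical matrix and a hypothetical helper `per_class_metrics`, and assuming every class is predicted at least once so no division by zero occurs:

```python
def per_class_metrics(cm, k):
    """Return (precision, recall, f1, support) for class k.

    cm is a square matrix with true classes in rows and
    predicted classes in columns.
    """
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: predicted as k
    support = sum(cm[k])                     # row sum: truly class k
    precision = tp / predicted_k
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support


# Hypothetical confusion matrix, not the one from this run
cm_example = [[50, 2],
              [1, 47]]

for k in range(len(cm_example)):
    p, r, f1, n = per_class_metrics(cm_example, k)
    print("{}  {:.2f}  {:.2f}  {:.2f}  {}".format(k, p, r, f1, n))
```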

K-Fold Cross Validation

from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator = model, X = X_train, y = y_train, cv = 9)

print("Accuracy: {:.2f} %".format(accuracies.mean()*100))

print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Accuracy: 99.37 %
Standard Deviation: 0.77 %
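When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses stratified folds without shuffling. If shuffled, reproducible folds are preferred, the splitter can be passed explicitly. A minimal sketch, assuming synthetic data from `make_classification` and a fixed `random_state` (both are assumptions made for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in data: 180 rows, 4 features, 2 classes
X_demo, y_demo = make_classification(
    n_samples=180, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Explicit splitter: 9 stratified folds, shuffled with a fixed seed
cv = StratifiedKFold(n_splits=9, shuffle=True, random_state=0)

scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_demo,
    y_demo,
    cv=cv,
)

print("Accuracy: {:.2f} %".format(scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(scores.std() * 100))
```

`cross_val_score` fits a fresh clone of the estimator on each fold, so the scores do not depend on any earlier call to `fit`.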

Saving the Model

import pickle

# The context manager closes the file even if pickle.dump raises
with open("RFClassifier.pkl", "wb") as pickle_out:
    pickle.dump(model, pickle_out)
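A saved model is only useful if it can be loaded back. The sketch below shows a full round trip with `pickle.load`, assuming a small synthetic model and a temporary file (`demo_model`, `loaded_model` and `RFClassifier_demo.pkl` are hypothetical names chosen so the real file is not touched):

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in model trained on synthetic data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_model = RandomForestClassifier(n_estimators=10, random_state=0)
demo_model.fit(X_demo, y_demo)

# Save to a temporary location
path = os.path.join(tempfile.mkdtemp(), "RFClassifier_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(demo_model, f)

# Load it back; the file must be opened in binary read mode
with open(path, "rb") as f:
    loaded_model = pickle.load(f)

# The restored model should reproduce the original predictions
print(np.array_equal(loaded_model.predict(X_demo), demo_model.predict(X_demo)))
```

Pickle files should only be loaded from trusted sources, and it is safest to load a model with the same scikit-learn version that saved it.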

Making Predictions (New Dataset)

url = "https://datascienceschools.github.io/Machine_Learning/Classification_Models_CaseStudies/BankNote_Authentication_Test.csv"

new_data = pd.read_csv(url)

new_data.head()

Declaring Independent Variables

# Use a new name so the earlier X_test split is not overwritten
X_new = new_data.iloc[:, :].values

Predicting Dependent Variable (class)

y_pred_new = model.predict(X_new)

new_data['predicted_class'] = y_pred_new

new_data

Cell Outputs

Output of df.head() (first five rows of the training data; the class column is not shown):

   variance  skewness  curtosis  entropy
0   3.62160    8.6661   -2.8073 -0.44699
1   4.54590    8.1674   -2.4586 -1.46210
2   3.86600   -2.6383    1.9242  0.10645
3   3.45660    9.5228   -4.0112 -3.59440
4   0.32924   -4.4552    4.5718 -0.98880

Output of feature_importances.sort_values('Importance', ascending=False):

   Features  Importance
0  variance    0.540481
1  skewness    0.235015
2  curtosis    0.168179
3   entropy    0.056325

Output of new_data.head() (first five rows of the new dataset):

   variance  skewness  curtosis  entropy
0   3.62160    8.6661   -2.8073 -0.44699
1   4.54590    8.1674   -2.4586 -1.46210
2   3.86600   -2.6383    1.9242  0.10645
3   3.45660    9.5228   -4.0112 -3.59440
4  -0.47465   -4.3496    1.9901  0.75170

Output of new_data after adding the predicted class:

   variance  skewness  curtosis  entropy  predicted_class
0   3.62160   8.66610  -2.80730 -0.44699                0
1   4.54590   8.16740  -2.45860 -1.46210                0
2   3.86600  -2.63830   1.92420  0.10645                0
3   3.45660   9.52280  -4.01120 -3.59440                0
4  -0.47465  -4.34960   1.99010  0.75170                1
5   1.05520   1.18570  -2.64110  0.11033                1
6   1.16440   3.80950  -4.94080 -4.09090                1
7  -4.47790   7.37080  -0.31218 -6.77540                1
8  -2.73380   0.45523   2.43910  0.21766                1
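Hard class labels hide how confident the forest is about each row. `predict_proba` returns one probability per class, with columns ordered as in `classes_`. A minimal sketch, assuming a synthetic model and synthetic new rows (`demo_model` and `X_new_demo` are hypothetical, not the banknote model or data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in model trained on synthetic data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_model = RandomForestClassifier(n_estimators=50, random_state=0)
demo_model.fit(X_demo, y_demo)

# Five hypothetical unseen rows with the same four features
X_new_demo = rng.normal(size=(5, 4))

# One row per sample, one column per class (order given by classes_)
proba = demo_model.predict_proba(X_new_demo)
labels = demo_model.predict(X_new_demo)

print(demo_model.classes_)
print(proba)
print(labels)
```

Rows with probabilities near 0.5 are the ones worth inspecting by hand before acting on the predicted label.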