Artificial Neural Network (Binary Classification)

Bank Customer Churn

Overview

Part 1 - Data Preprocessing

    - Importing the Relevant Libraries

    - Loading the Data

    - Declaring the Dependent and the Independent variables 

    - Encoding Categorical Data ("Gender" column)

    - One Hot Encoding the "Geography" column

    - Splitting Training set & Test set

    - Feature Scaling


Part 2 - Building the Artificial Neural Network (ANN)

    - Initializing the ANN

    - Adding the Input Layer and the first Hidden Layer

    - Adding the second Hidden Layer

    - Adding the output layer


Part 3 - Training the Artificial Neural Network (ANN)

    - Compiling the ANN with optimizer, loss & metrics 

    - Training the ANN on the Training set


Part 4 - Making the Predictions and Evaluating the Model

    - Predicting the Test set

    - Confusion Matrix & Accuracy

    - Predicting a Single Observation 

Part 1 - Data Preprocessing

Importing the Relevant Libraries

Loading the Data

Declaring the Dependent and the Independent variables

- Dropping unnecessary columns for X: 

        - RowNumber  (Index 0)
        - CustomerId (Index 1)
        - Surname    (Index 2) 
        - Exited     (Index -1)
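The loading and variable-declaration steps above might look like the following sketch. A tiny inline DataFrame stands in for the churn CSV (commonly distributed as "Churn_Modelling.csv" — the filename and sample values here are assumptions, not the tutorial's data):

```python
import pandas as pd

# Stand-in for the churn dataset; in practice: df = pd.read_csv("Churn_Modelling.csv")
df = pd.DataFrame({
    "RowNumber": [1, 2],
    "CustomerId": [15634602, 15647311],
    "Surname": ["Hargrave", "Hill"],
    "CreditScore": [619, 608],
    "Geography": ["France", "Spain"],
    "Gender": ["Female", "Female"],
    "Age": [42, 41],
    "Tenure": [2, 1],
    "Balance": [0.0, 83807.86],
    "NumOfProducts": [1, 1],
    "HasCrCard": [1, 0],
    "IsActiveMember": [1, 1],
    "EstimatedSalary": [101348.88, 112542.58],
    "Exited": [1, 0],
})

# Drop RowNumber, CustomerId, Surname (indices 0-2); the last column (Exited) is y
X = df.iloc[:, 3:-1].values
y = df.iloc[:, -1].values
```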

Label Encoding "Gender" column

- Index of "Gender" column is 2

One Hot Encoding the "Geography" column

- Index of "Geography" column is 1

- Result of Geography encoding can be seen in the first 3 Columns
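Both encoding steps can be sketched with scikit-learn, assuming a toy four-column feature matrix (the column indices match the notes: "Geography" at 1, "Gender" at 2):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Toy feature matrix: [CreditScore, Geography, Gender, Age]
X = np.array([[619, "France", "Female", 42],
              [608, "Spain", "Female", 41],
              [502, "Germany", "Male", 42]], dtype=object)

# Label-encode "Gender" (column index 2): Female -> 0, Male -> 1
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

# One-hot encode "Geography" (column index 1); the encoder's dummy columns
# are placed first, which is why they appear as the first 3 columns of X
ct = ColumnTransformer([("geo", OneHotEncoder(), [1])], remainder="passthrough")
X = np.array(ct.fit_transform(X))
```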

Splitting Training set & Test set

Feature Scaling
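A minimal sketch of the split and scaling steps, on synthetic numbers (the 80/20 split ratio and `random_state` are assumptions). Note the scaler is fitted on the training set only and then reused on the test set, to avoid information leakage:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(40, dtype=float).reshape(10, 4)  # toy features
y = np.array([0, 1] * 5)                        # toy binary targets

# 80/20 split; random_state pins the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```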

Part 2 - Building the Artificial Neural Network (ANN)

Initializing the ANN

- ann -> an object of the Sequential class from the models module of the keras library in tensorflow

- Sequential class -> allows building the Artificial Neural Network as a sequence of layers

Adding the Input Layer and the first Hidden Layer

- add method -> adds a fully connected layer

- Dense class from the layers module of the keras library in tensorflow

- a fully connected layer is an object of the Dense class

- units = 6 -> Number of hidden neurons (Based on experimentation)

- activation='relu' -> the standard activation function for the hidden layers of a fully connected network

- ReLU -> the Rectified Linear Unit activation function

Adding the second Hidden Layer

Adding the output layer

- units=1 -> one neuron for binary output (0 or 1)

- activation='sigmoid' -> the standard activation function for the output layer of a binary classifier

- a sigmoid activation function yields both

        - the prediction (whether the customer leaves the bank or not) &

        - the probability that the binary outcome is one (the probability that the customer will leave)

- the sigmoid squashes any input into the range (0, 1), so its output can be read directly as a probability
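Part 2 as a whole can be sketched as follows, assuming TensorFlow's Keras API (6 hidden units per layer comes from the notes' "based on experimentation"):

```python
import tensorflow as tf

# Initialize the ANN as a sequence of layers
ann = tf.keras.models.Sequential()

# Input layer + first hidden layer: 6 neurons, ReLU activation
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Second hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Output layer: a single sigmoid neuron emits P(customer leaves)
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
```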

Part 3 - Training the Artificial Neural Network (ANN)

        1. Compiling the ANN with optimizer, loss & metrics 

        2. Training the ANN on the Training set

Compiling the ANN with optimizer, loss & metrics

- optimizer = 'adam'

        -> the Adam optimization algorithm, an adaptive variant of stochastic gradient descent


- loss = 'binary_crossentropy'

        -> always 'binary_crossentropy' for binary classification (binary output)


- metrics = ['accuracy']

       -> 'accuracy' is the most common metric for evaluating the ANN during training

Training the ANN on the Training set

- fit method -> for training the model

- batch_size = 32

     -> 32 is the default value

     -> the network propagates observations in batches (comparing 32 outputs with 32 targets at a time)

     -> instead of propagating them one by one (comparing one output with one target)

- epochs = 100 -> the number of full passes over the training set

- accuracy: 0.8637 -> roughly 86 correct predictions out of every 100 observations
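The compile and fit steps can be sketched as below. The data is synthetic (random stand-ins for the scaled training set), and epochs is shortened to 2 so the sketch runs quickly — the notes use epochs=100:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the scaled training data (12 features after encoding)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(64, 12)).astype("float32")
y_train = rng.integers(0, 2, size=(64,)).astype("float32")

ann = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam optimizer, binary cross-entropy loss, accuracy reported each epoch
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# batch_size=32 is the Keras default; epochs shortened here (100 in the notes)
history = ann.fit(X_train, y_train, batch_size=32, epochs=2, verbose=0)
```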

Part 4 - Making the Predictions and Evaluating the Model

- Predicting the Test set

- Confusion Matrix & Accuracy

- Predicting the Result of a Single Observation    

Predicting the Test set

- ann returns this prediction as a probability rather than as the final outcome (0/1)

- the customer will leave the bank (1) or stay with the bank (0)

- if you don't want the outcome as a probability, the trick to convert it into the final prediction is a threshold: (y_pred > 0.5)

        - choose a threshold of 0.5
        - predicted probability > 0.5 -> outcome 1, the customer will leave the bank
        - predicted probability <= 0.5 -> outcome 0, the customer will not leave the bank
        - update & save the changes -> y_pred = (y_pred > 0.5)

- np.concatenate -> putting y_pred & y_test next to each other


        - np.concatenate((first_argument), second_argument)

        - first argument: y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)

        - second argument axis=0 -> concatenates vertically (stacks one on top of the other)

        - second argument axis=1 -> concatenates horizontally (places them side by side)


- comparing y_pred & y_test side by side therefore uses axis=1:

        -> y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)

        -> the reshape turns each horizontal (row) vector into a vertical (column) vector
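The threshold trick and the side-by-side comparison, sketched with NumPy on a few hand-made probabilities (illustrative values, not model output):

```python
import numpy as np

# Suppose the network returned these probabilities for four test customers
y_prob = np.array([[0.1], [0.8], [0.4], [0.9]])
y_test = np.array([0, 1, 0, 1])

# Threshold at 0.5: probability > 0.5 -> 1 (leaves), otherwise 0 (stays)
y_pred = (y_prob > 0.5)

# Reshape both to column vectors and concatenate along axis=1, so each
# row reads (prediction, ground truth)
side_by_side = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1)
```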

Confusion Matrix & Accuracy
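A minimal sketch of the confusion matrix and accuracy score, assuming scikit-learn and a tiny hand-made pair of label vectors (the numbers are illustrative, not the tutorial's results):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

y_test = np.array([0, 1, 0, 1, 1])  # ground truth
y_pred = np.array([0, 1, 1, 1, 0])  # thresholded predictions

cm = confusion_matrix(y_test, y_pred)  # rows: actual class, cols: predicted class
acc = accuracy_score(y_test, y_pred)   # fraction of correct predictions
```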

Predicting a Single Observation

- Predicting whether a customer with the following information will leave the bank:

   - Geography: France -> It was encoded as "1, 0, 0" -> 1, 0, 0
   - Credit Score: 600 -> 600 
   - Gender: Male -> It was encoded as "1" -> 1
   - Age: 40 years old -> 40
   - Tenure: 3 years -> 3
   - Balance: $ 60000 -> 60000
   - Number of Products: 2 -> 2 
   - Does this customer have a credit card ? Yes -> 1
   - Is this customer an Active Member: Yes -> 1
   - Estimated Salary: $ 50000 -> 50000


- The predict method always expects a 2D array as its input

    -> the feature values are therefore entered inside a double pair of square brackets

- sc.transform -> scaling the observation with the scaler fitted on the training set

- If the predicted probability > 0.5, the result is True (1)

- If the predicted probability <= 0.5, the result is False (0)
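The single observation above, assembled in the 2D shape predict() expects (double square brackets = 1 row, 12 features). The model call is left as a comment because it needs the trained `ann` and fitted `sc` from the earlier steps:

```python
import numpy as np

# France -> 1, 0, 0 | CreditScore 600 | Male -> 1 | Age 40 | Tenure 3 |
# Balance 60000 | 2 products | has card (1) | active member (1) | salary 50000
observation = np.array([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])

# With the trained model and fitted scaler, the call would be:
#   prob = ann.predict(sc.transform(observation))[0, 0]
#   leaves = prob > 0.5
```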

Result

- 0.02919498 -> The probability that this customer will leave

- False -> This customer will not leave the bank