Association Rule Mining via Eclat

Market Basket Optimisation

Overview

- Installing Apyori

- Importing the Relevant Libraries

- Loading the Data

- Data Preprocessing

- Training the Eclat Model

- Two Ways to Display the Result

    - 1. Displaying Products & Support

    - 2. Displaying the results in a Table (Pandas Dataframe)

Installing Apyori

In [1]:
!pip install apyori
Requirement already satisfied: apyori in /home/bahar/anaconda3/lib/python3.7/site-packages (1.1.2)

Importing the Relevant Libraries

In [2]:
import numpy as np
import pandas as pd

Loading the Data

- The CSV file has no header row

- When reading the CSV file, set header = None so that the first transaction is not used as column names
In [3]:
url = "https://DataScienceSchools.github.io/Machine_Learning/Unsupervised_Learning/Association_Rule/Market_Basket_Optimisation.csv"

df = pd.read_csv(url, header = None)

df.head()
Out[3]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 shrimp almonds avocado vegetables mix green grapes whole weat flour yams cottage cheese energy drink tomato juice low fat yogurt green tea honey salad mineral water salmon antioxydant juice frozen smoothie spinach olive oil
1 burgers meatballs eggs NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 chutney NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 turkey avocado NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 mineral water milk energy bar whole wheat rice green tea NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
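
As a small illustration of why header = None matters, the sketch below reads a made-up two-line CSV from memory both ways. The csv_text string and the variable names are hypothetical and are not part of the dataset.

```python
import io
import pandas as pd

# Hypothetical two-row CSV without a header line (illustrative data only)
csv_text = "burgers,meatballs,eggs\nchutney,,\n"

# Default behaviour: the first row is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is kept as data and columns are numbered 0, 1, 2
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)     # (1, 3): one transaction was lost to the header
print(without_header.shape)  # (2, 3): both transactions are kept
```

Without header = None, the first basket would silently disappear from the data and turn into column labels.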

Data Preprocessing

- The apriori function accepts a list of lists, not a pandas DataFrame

- Solution:

    - converting the pandas DataFrame into a list of lists

    - one inner list for each transaction in the dataset

    - transactions = [] -> creating an empty list

    - for i in range(0, 7501) -> loop over all 7501 rows

    - for j in range(0, 20) -> loop over all 20 columns in each row

    - transactions.append -> appending each row's items to the list (transactions)

    - df.values[i,j] -> getting the data of each cell (row i & column j)

    - str(df.values[i,j]) -> converting data to string (empty cells, NaN, become the string 'nan')

    - transactions -> the resulting list of transactions
In [4]:
transactions = []

# Build one list of 20 strings per row
for i in range(0, 7501):
    transactions.append([str(df.values[i,j]) for j in range(0, 20)])
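
Because str() is applied to every cell, empty cells end up in each transaction as the string 'nan'. The following sketch, on a hypothetical three-row toy_df, contrasts that conversion with one possible alternative that skips empty cells. It is an illustration under those assumptions, not a change to the notebook's pipeline.

```python
import pandas as pd

# Hypothetical miniature of the basket table: short rows are padded with NaN
nan = float("nan")
toy_df = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", nan, nan],
    ["turkey", "avocado", nan],
])

# Same conversion as in the cell above: NaN becomes the string 'nan'
padded = [[str(toy_df.values[i, j]) for j in range(toy_df.shape[1])]
          for i in range(toy_df.shape[0])]

# Alternative: keep only the cells that actually hold a product name
cleaned = [[str(item) for item in row if pd.notna(item)]
           for row in toy_df.values]

print(padded[1])   # ['chutney', 'nan', 'nan']
print(cleaned[1])  # ['chutney']
```

With the padded form, 'nan' behaves like an extra product that appears in almost every basket, which is worth keeping in mind when reading rules produced with a low min_lift.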

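Training the Eclat Model

- apyori provides no separate Eclat function: the apriori function is used to find the item pairs, and only the support of each pair is reported afterwards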

In [5]:
from apyori import apriori

relations = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

results = list(relations)
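
To make the min_support, min_confidence and min_lift thresholds more concrete, here is a rough sketch that computes the three measures by hand for one rule. The toy transactions and the support helper are invented for illustration and are not part of apyori.

```python
# Hypothetical toy transactions (illustrative only, not the course dataset)
toy = [
    ["pasta", "escalope"],
    ["pasta", "shrimp"],
    ["escalope", "pasta", "milk"],
    ["milk"],
    ["milk", "eggs"],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

s_pair = support({"escalope", "pasta"}, toy)  # 2 of 5 baskets
s_escalope = support({"escalope"}, toy)       # 2 of 5 baskets
s_pasta = support({"pasta"}, toy)             # 3 of 5 baskets

# Rule: escalope -> pasta
confidence = s_pair / s_escalope  # share of escalope baskets that also hold pasta
lift = confidence / s_pasta       # values above 1 suggest a positive association

print(s_pair, confidence, lift)
```

A rule passes the filter only if all three values reach their respective thresholds.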

Two Ways to Display the Result:

1. Displaying Products & Support

In [6]:
def inspect(results_list):
    for item in results_list:
        # item[0] is the frozenset of products in the pair, item[1] is its support
        pair = item[0]
        items = [x for x in pair]
        print("Product 1:", items[0] + ", Product 2:" + items[1])

        print("\nSupport: " + str(item[1]))

        print("\n=====================================\n")

inspect(results)
Product 1: chicken, Product 2:light cream

Support: 0.004532728969470737

=====================================

Product 1: escalope, Product 2:mushroom cream sauce

Support: 0.005732568990801226

=====================================

Product 1: escalope, Product 2:pasta

Support: 0.005865884548726837

=====================================

Product 1: fromage blanc, Product 2:honey

Support: 0.003332888948140248

=====================================

Product 1: ground beef, Product 2:herb & pepper

Support: 0.015997866951073192

=====================================

Product 1: ground beef, Product 2:tomato sauce

Support: 0.005332622317024397

=====================================

Product 1: olive oil, Product 2:light cream

Support: 0.003199573390214638

=====================================

Product 1: olive oil, Product 2:whole wheat pasta

Support: 0.007998933475536596

=====================================

Product 1: shrimp, Product 2:pasta

Support: 0.005065991201173177

=====================================
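
Since the pair supports above come from apyori's apriori function, it may help to see how an Eclat-style search could reach the same kind of result. The sketch below is a simplified, pairs-only version of the vertical (transaction-id set) idea. The eclat_pairs function and the toy baskets are hypothetical, and a full Eclat implementation would extend itemsets recursively beyond pairs.

```python
from itertools import combinations

def eclat_pairs(transactions, min_support):
    """Return {frozenset({a, b}): support} for item pairs meeting min_support."""
    n = len(transactions)

    # Vertical layout: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in set(basket):
            tidsets.setdefault(item, set()).add(tid)

    # Keep only the single items that are frequent on their own
    frequent = {item: tids for item, tids in tidsets.items()
                if len(tids) / n >= min_support}

    # Support of a pair = size of the intersection of the two id sets
    result = {}
    for a, b in combinations(sorted(frequent), 2):
        sup = len(frequent[a] & frequent[b]) / n
        if sup >= min_support:
            result[frozenset((a, b))] = sup
    return result

# Hypothetical toy baskets (illustrative only)
toy = [
    ["pasta", "escalope"],
    ["pasta", "shrimp"],
    ["escalope", "pasta", "milk"],
    ["milk"],
    ["milk", "eggs"],
]
pairs = eclat_pairs(toy, min_support=0.4)
print(pairs)
```

Only support is computed here. Confidence and lift play no role in this sketch, which matches the support-only display used in this section.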

2. Displaying the results in a Table (Pandas Dataframe)

- results_table.nlargest(n = 10, columns = 'Support')

    -> returns the rows with the largest Support values, in descending order

    -> n = 10 -> maximum number of rows to display
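
As a quick check of what nlargest does, the sketch below compares it with sorting and taking the head on a made-up three-row table. The toy_table frame and its values are illustrative only.

```python
import pandas as pd

# Hypothetical table with the same column names as results_table
toy_table = pd.DataFrame({
    "Product 1": ["a", "b", "c"],
    "Product 2": ["x", "y", "z"],
    "Support": [0.2, 0.5, 0.3],
})

# The two rows with the highest Support, largest first
top2 = toy_table.nlargest(n=2, columns="Support")

# Equivalent result for this table: sort in descending order, keep the first two rows
same = toy_table.sort_values("Support", ascending=False).head(2)

print(top2)
```

The original row index is preserved, which is why the index column in the output of the next cell is not in ascending order.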
In [7]:
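def inspect(results):
    # result[2][0] is the first rule of each record: [0] = left-hand side, [1] = right-hand side
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    # result[1] is the support of the item pair
    supports    = [result[1] for result in results]
    return list(zip(lhs, rhs, supports))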

results_table = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])

results_table.nlargest(n = 10, columns = 'Support')
Out[7]:
              Product 1    Product 2   Support
4         herb & pepper  ground beef  0.015998
7     whole wheat pasta    olive oil  0.007999
2                 pasta     escalope  0.005866
1  mushroom cream sauce     escalope  0.005733
5          tomato sauce  ground beef  0.005333
8                 pasta       shrimp  0.005066
0           light cream      chicken  0.004533
3         fromage blanc        honey  0.003333
6           light cream    olive oil  0.003200
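
The positional indices used in inspect (result[1], result[2][0][0]) can be hard to read. The sketch below rebuilds the shape of one result with stand-in named tuples, assuming apyori's records expose the fields items, support and ordered_statistics, and that each ordered statistic carries items_base, items_add, confidence and lift. The record and its numbers are made up.

```python
from collections import namedtuple

# Stand-ins mirroring the assumed field layout of apyori's result tuples
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

# Hypothetical record with invented numbers
record = RelationRecord(
    items=frozenset({"pasta", "escalope"}),
    support=0.4,
    ordered_statistics=[
        OrderedStatistic(frozenset({"escalope"}), frozenset({"pasta"}), 1.0, 1.67),
    ],
)

first_rule = record.ordered_statistics[0]  # same object as record[2][0]
lhs = tuple(first_rule.items_base)[0]      # same as tuple(record[2][0][0])[0]
rhs = tuple(first_rule.items_add)[0]       # same as tuple(record[2][0][1])[0]

print(lhs, "->", rhs, "| support:", record.support)
```

Reading the fields by name gives the same values as the positional indexing, so either style could be used in inspect.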