Association Rule Mining via Apriori

Market Basket Optimisation

Source of Association rule mining & Apriori Algorithm

Association rule mining:

- a technique for identifying underlying relations between different items

- identifying associations between products can be used to generate more profit

- Example:

    - a supermarket where customers can buy a variety of items

    - there is a pattern in what the customers buy

        - mothers with babies buy baby products such as milk and diapers

        - bachelors may buy beer and chips


- If items A and B are frequently bought together:

    - A and B can be placed together

    - discounts can be offered on these products if the customer buys both of them

Apriori Algorithm

- Three major components of the Apriori algorithm:

    - Support
    - Confidence
    - Lift

- Suppose we have a record of 1000 customer transactions

    - find the Support, Confidence, and Lift for two items (burgers and ketchup)

- Out of 1000 transactions,

    - 100 transactions contain ketchup
    - 150 transactions contain a burger
    - 50 transactions contain both a burger and ketchup


- 1. Support (B):

    - refers to the fraction of all transactions that contain item B

    - Support(B) = (Transactions containing B)/(Total Transactions)

    - Support(Ketchup) = (Transactions containing Ketchup)/(Total Transactions)

    - Support(Ketchup) = 100/1000 = 10%


- 2. Confidence (A→B):

    - refers to the likelihood of buying item B if item A is purchased

    - Confidence(A→B) = (Transactions containing both A and B)/(Transactions containing A)

    - Confidence(Burger→Ketchup) = (Transactions containing Burger and Ketchup)/(Transactions containing Burger)

    - Confidence(Burger→Ketchup) = 50/150 = 33.3%


- 3. Lift (A→B):

    - refers to how much more likely B is to be sold when A is sold, compared with how often B is sold overall

    - Lift(A→B) = (Confidence (A→B))/(Support (B))

    - Lift(Burger→Ketchup) = (Confidence (Burger→Ketchup))/(Support (Ketchup))

    - Lift(Burger→Ketchup) = 33.3/10 = 3.33

    - A customer who buys a burger is 3.33 times more likely to buy ketchup than a customer picked at random

    - Lift = 1 means there is no association between products A and B

    - Lift > 1 means products A and B are more likely to be bought together

    - Lift < 1 means products A and B are less likely to be bought together than if they were unrelated
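
The burger and ketchup figures above can be reproduced with a few lines of Python. This is only a minimal sketch of the arithmetic; the variable names are illustrative and not part of any library.

```python
# Counts taken from the worked example above.
total_transactions = 1000
ketchup_count = 100   # transactions containing ketchup
burger_count = 150    # transactions containing a burger
both_count = 50       # transactions containing both items

# Support(Ketchup): share of all transactions that contain ketchup.
support_ketchup = ketchup_count / total_transactions

# Confidence(Burger -> Ketchup): share of burger transactions that also contain ketchup.
confidence_burger_ketchup = both_count / burger_count

# Lift(Burger -> Ketchup): confidence relative to the overall ketchup rate.
lift_burger_ketchup = confidence_burger_ketchup / support_ketchup

print(f"Support(Ketchup)             = {support_ketchup:.1%}")
print(f"Confidence(Burger -> Ketchup) = {confidence_burger_ketchup:.1%}")
print(f"Lift(Burger -> Ketchup)       = {lift_burger_ketchup:.2f}")
```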

Overview

- Installing Apyori

- Importing the Relevant Libraries

- Loading the Data

- Data Preprocessing

- Training the Apriori Model

- Two Ways to Display the Result

    - 1. Displaying Rule, Support, Confidence & Lift

    - 2. Displaying the results in a Table (Pandas DataFrame)

Installing Apyori

!pip install apyori

Requirement already satisfied: apyori in /home/bahar/anaconda3/lib/python3.7/site-packages (1.1.2)

Importing the Relevant Libraries

import numpy as np
import pandas as pd

Loading the Data

- the CSV file does not have a header row

- when reading the CSV file -> set header = None, so the first transaction is not treated as column names

url = "https://DataScienceSchools.github.io/Machine_Learning/Unsupervised_Learning/Association_Rule/Market_Basket_Optimisation.csv"

df = pd.read_csv(url, header = None)

df.head()

Data Preprocessing

- The apriori function accepts a list of lists, not a pandas dataframe

- Solution:

    - converting the pandas dataframe into a list of lists

    - one inner list for each transaction in the dataset

    - transactions = [] -> creating an empty list

    - for i in range(0, 7501) -> loop over all rows (7501 rows, indices 0 to 7500)

    - for j in range(0, 20) -> loop over all 20 columns in each row

    - transactions.append -> appending data to the list (transactions)

    - df.values[i,j] -> getting the data of each cell (row i & column j)

    - str(df.values[i,j]) -> converting data to string

    - transactions -> displaying list of transactions

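# Build one list of item strings per transaction (one per row of the dataframe).
transactions = []

for i in range(0, 7501):
    # Empty cells are NaN in the dataframe, so str() turns them into the string 'nan'.
    transactions.append([str(df.values[i, j]) for j in range(0, 20)])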
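
The loop above converts every cell to a string, so empty cells end up as the item 'nan' in each transaction. As a possible alternative (not the approach used in this notebook), the empty cells could be skipped while building the lists. The sketch below assumes a tiny hand-made dataframe, `df_small`, standing in for the real data; `df_small` and `transactions_small` are hypothetical names.

```python
import pandas as pd

# A tiny stand-in for the header-less basket data: missing cells are None/NaN.
df_small = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", None, None],
    ["turkey", "avocado", None],
])

# One list per row, keeping only the cells that actually hold an item.
transactions_small = [
    [str(item) for item in row if pd.notna(item)]
    for row in df_small.values
]

print(transactions_small)
```

Iterating over `df_small.values` row by row also avoids hard-coding the number of rows and columns.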

Training the Apriori Model

- apriori function from the apyori library

- rules -> the rules generated by the apriori function


* The apriori function parameters:

- transactions

        - accepts the list of lists (transactions)

- min_support

        - keeping only the itemsets whose support is at least the specified value

- min_confidence

        - keeping only the rules whose confidence is at least the specified threshold

- min_lift

        - specifying the minimum lift value

- min_length

         - specifying the minimum number of items in rules


- Example: (the dataset covers a one-week period)

Suppose we want rules only for items that are purchased

    - at least 3 times a day

    - 7 x 3 = 21 times in one week

- min_support for those items can be calculated as

    - 21/7501 = 0.0028 -> rounded to 0.003

- min_confidence: 0.2

- min_lift: 3

- min_length & max_length: 2

    - at least two products in the rules

    - at most two products in the rules
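
The min_support threshold is plain arithmetic and can be checked directly. A minimal sketch, using the figures from the example (3 purchases a day, 7 days, 7501 transactions); the variable names are illustrative.

```python
purchases_per_day = 3
days_in_dataset = 7
n_transactions = 7501

# Minimum number of transactions an item must appear in during the week.
min_count = purchases_per_day * days_in_dataset

# The same requirement expressed as a fraction of all transactions.
min_support = min_count / n_transactions

print(min_count)               # 21
print(round(min_support, 4))   # 0.0028, which the example rounds to 0.003
```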


- results -> the rules converted into a list -> easier to view the results

from apyori import apriori

rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

results = list(rules)
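
To make the effect of the thresholds concrete, here is a small pure-Python sketch that mines two-item rules from a toy dataset. It is not apyori's implementation, only an illustration of how support, confidence and lift are combined; the function `pair_rules` and the `toy` transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Return (base, add, support, confidence, lift) for two-item rules."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # How many transactions contain each item and each pair of items.
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for base, add in ((a, b), (b, a)):
            confidence = count / item_counts[base]
            lift = confidence / (item_counts[add] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((base, add, support, confidence, lift))
    return rules


toy = [
    ["burger", "ketchup"],
    ["burger", "ketchup", "chips"],
    ["burger"],
    ["chips"],
    ["chips", "beer"],
    ["beer"],
]

toy_rules = pair_rules(toy, min_support=0.3, min_confidence=0.6, min_lift=1.5)
for rule in toy_rules:
    print(rule)
```

In this toy data only the burger and ketchup pair passes the support threshold, and both directions of the rule pass the confidence and lift thresholds.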

Two Ways to Display the Result

1. Displaying Rule, Support, Confidence & Lift

def inspect(results_list):
    for item in results_list:

        # item[2][0] is the first rule for this itemset:
        # (left-hand items, right-hand items, confidence, lift).
        # Reading the direction from it avoids relying on the arbitrary
        # iteration order of the frozenset in item[0].
        stat = item[2][0]
        print("Rule: " + tuple(stat[0])[0] + " -> " + tuple(stat[1])[0])

        print("Support: " + str(item[1]))

        print("Confidence: " + str(stat[2]))
        print("Lift: " + str(stat[3]))
        print("=====================================")

inspect(results)

Rule: light cream -> chicken
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
=====================================
Rule: mushroom cream sauce -> escalope
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049
=====================================
Rule: pasta -> escalope
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
=====================================
Rule: fromage blanc -> honey
Support: 0.003332888948140248
Confidence: 0.2450980392156863
Lift: 5.164270764485569
=====================================
Rule: herb & pepper -> ground beef
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
=====================================
Rule: tomato sauce -> ground beef
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083
=====================================
Rule: light cream -> olive oil
Support: 0.003199573390214638
Confidence: 0.20512820512820515
Lift: 3.1147098515519573
=====================================
Rule: whole wheat pasta -> olive oil
Support: 0.007998933475536596
Confidence: 0.2714932126696833
Lift: 4.122410097642296
=====================================
Rule: pasta -> shrimp
Support: 0.005065991201173177
Confidence: 0.3220338983050847
Lift: 4.506672147735896
=====================================
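
The numeric indices used to read each result (for example item[2][0][2] for the confidence) can be hard to follow. The sketch below builds local named tuples that mirror, as far as I understand it, the record layout apyori returns (items, support, ordered_statistics, with each statistic holding items_base, items_add, confidence and lift). The tuples here are stand-ins defined for illustration, filled with the values of the first rule shown above.

```python
from collections import namedtuple

# Local stand-ins mirroring the assumed layout of apyori's result records.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

record = RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

stat = record.ordered_statistics[0]

# Index access and named access refer to the same fields.
assert record[1] == record.support
assert record[2][0][2] == stat.confidence
assert record[2][0][3] == stat.lift

# The rule direction comes from items_base -> items_add.
rule_text = next(iter(stat.items_base)) + " -> " + next(iter(stat.items_add))
print(rule_text)
```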

2. Displaying the results in a Table (Pandas DataFrame)

- results_table.nlargest(n = 10, columns = 'Lift')

    -> displaying the rules sorted by the Lift column, largest first

    -> n = 10 -> maximum number of rules to display

def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

results_table = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])

results_table.nlargest(n = 10, columns = 'Lift')

Output of df.head() (first five rows of the raw data):

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	shrimp	almonds	avocado	vegetables mix	green grapes	whole weat flour	yams	cottage cheese	energy drink	tomato juice	low fat yogurt	green tea	honey	salad	mineral water	salmon	antioxydant juice	frozen smoothie	spinach	olive oil
1	burgers	meatballs	eggs	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	chutney	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	turkey	avocado	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	mineral water	milk	energy bar	whole wheat rice	green tea	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Output of results_table.nlargest(n = 10, columns = 'Lift'):

	Left Hand Side	Right Hand Side	Support	Confidence	Lift
3	fromage blanc	honey	0.003333	0.245098	5.164271
0	light cream	chicken	0.004533	0.290598	4.843951
2	pasta	escalope	0.005866	0.372881	4.700812
8	pasta	shrimp	0.005066	0.322034	4.506672
7	whole wheat pasta	olive oil	0.007999	0.271493	4.122410
5	tomato sauce	ground beef	0.005333	0.377358	3.840659
1	mushroom cream sauce	escalope	0.005733	0.300699	3.790833
4	herb & pepper	ground beef	0.015998	0.323450	3.291994
6	light cream	olive oil	0.003200	0.205128	3.114710