- a technique to identify underlying relations between different items
- identifying associations between products to generate more profit
- Example:
- a supermarket where customers can buy a variety of items
- customers' purchases follow patterns
- mothers with babies buy baby products such as milk and diapers
- bachelors may buy beer and chips
- If items A and B are frequently bought together:
- A and B can be placed next to each other
- a discount can be offered when the customer buys both of them
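To make "bought together more frequently" concrete, here is a minimal sketch that counts how many transactions contain each pair of items. The five toy transactions and the helper name `count_pairs` are hypothetical, made up for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (not taken from the dataset used later)
transactions = [
    {"milk", "diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "diapers"},
    {"beer", "chips", "salsa"},
    {"milk", "bread"},
]

def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    pair_counts = Counter()
    for basket in transactions:
        # sorted() gives each pair one canonical order, so (a, b) and (b, a) share a counter
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

# The most frequent pairs are the candidates for shelf placement or bundle discounts
for pair, count in count_pairs(transactions).most_common(2):
    print(pair, count)
```

Here ("diapers", "milk") and ("beer", "chips") each appear in two baskets, matching the two shopper patterns described above.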
- Three major components of the Apriori algorithm:
- Support
- Confidence
- Lift
- Suppose we have a record of 1000 customer transactions
- find the Support, Confidence, and Lift for two items (burgers and ketchup)
- Out of 1000 transactions,
- 100 transactions contain ketchup
- 150 transactions contain a burger
- 50 transactions contain both a burger and ketchup
- 1. Support (B):
- refers to the fraction of all transactions that contain item B
- Support(B) = (Transactions containing B)/(Total Transactions)
- Support(Ketchup) = (Transactions containing Ketchup)/(Total Transactions)
- Support(Ketchup) = 100/1000 = 10%
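The support formula can be checked directly on transaction data. This sketch rebuilds baskets that match the counts above (the split into 100 burger-only, 50 ketchup-only and 800 baskets holding a placeholder item "fries" is an assumption chosen only to reproduce those totals) and computes support as the fraction of baskets containing every item of an itemset.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)

# Hypothetical baskets reproducing the example's counts:
# 150 contain a burger, 100 contain ketchup, 50 contain both, 1000 in total
transactions = (
    [{"burger", "ketchup"}] * 50
    + [{"burger"}] * 100
    + [{"ketchup"}] * 50
    + [{"fries"}] * 800   # placeholder for baskets with neither item
)

print(support({"ketchup"}, transactions))            # 0.1
print(support({"burger", "ketchup"}, transactions))  # 0.05
```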
- 2. Confidence (A→B):
- refers to the likelihood of buying item B if item A is purchased
- Confidence(A→B) = (Transactions containing both)/(Transactions containing A)
- Confidence(Burger→Ketchup) = (Transactions containing Burger & Ketchup)/(Transactions containing Burger)
- Confidence(Burger→Ketchup) = 50/150 = 33.3%
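A short sketch of the confidence calculation from raw counts; the helper name `confidence` is our own. It also evaluates the reverse rule, which shows that confidence depends on direction: Ketchup→Burger divides by the 100 ketchup transactions instead of the 150 burger transactions.

```python
def confidence(count_a_and_b, count_a):
    """Confidence(A -> B) = transactions containing A and B / transactions containing A."""
    return count_a_and_b / count_a

burger_to_ketchup = confidence(50, 150)  # rule Burger -> Ketchup
ketchup_to_burger = confidence(50, 100)  # reverse rule Ketchup -> Burger

print(f"{burger_to_ketchup:.1%}")  # 33.3%
print(f"{ketchup_to_burger:.1%}")  # 50.0%
```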
- 3. Lift (A→B):
- measures how much buying A raises the likelihood of buying B, compared with how often B is bought in general
- Lift(A→B) = (Confidence (A→B))/(Support (B))
- Lift(Burger→Ketchup) = (Confidence (Burger→Ketchup))/(Support (Ketchup))
- Lift(Burger→Ketchup) = 33.3% / 10% = 3.33
- Customers who buy a burger are 3.33 times as likely to buy ketchup as customers in general
- Lift = 1 means there is no association between products A and B (they behave as if independent)
- Lift > 1 means products A and B are bought together more often than expected by chance
- Lift < 1 means products A and B are bought together less often than expected by chance
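The lift formula and its three interpretation cases can be sketched as follows. The helper names and the small tolerance used to treat a floating-point lift as "exactly 1" are our own assumptions. Computing the reverse rule gives the same value, because Lift(A→B) = Support(A and B) / (Support(A) × Support(B)) is symmetric in A and B.

```python
def lift(count_a_and_b, count_a, count_b, total):
    """Lift(A -> B) = Confidence(A -> B) / Support(B), computed from raw counts."""
    confidence_a_to_b = count_a_and_b / count_a
    support_b = count_b / total
    return confidence_a_to_b / support_b

def interpret(lift_value, tol=1e-9):
    """Map a lift value onto the three cases listed above (tol absorbs float rounding)."""
    if abs(lift_value - 1) < tol:
        return "no association"
    if lift_value > 1:
        return "bought together more often than chance"
    return "bought together less often than chance"

burger_to_ketchup = lift(50, 150, 100, 1000)
ketchup_to_burger = lift(50, 100, 150, 1000)
print(round(burger_to_ketchup, 2), interpret(burger_to_ketchup))
print(round(ketchup_to_burger, 2))  # same value: lift is symmetric
```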
- Installing Apyori
- Importing the Relevant Libraries
- Loading the Data
- Data Preprocessing
- Training the Apriori Model
- Two Ways to Display the Results
- 1. Displaying Rule, Support, Confidence & Lift
- 2. Displaying the results in a Table (Pandas Dataframe)
!pip install apyori
import numpy as np
import pandas as pd
- the CSV file has no header row
- while reading the CSV file -> set header = None, otherwise pandas uses the first transaction as column names
url = "https://DataScienceSchools.github.io/Machine_Learning/Unsupervised_Learning/Association_Rule/Market_Basket_Optimisation.csv"
df = pd.read_csv(url, header = None)
df.head()
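To see why `header = None` matters, this sketch reads a tiny made-up sample in the same layout (one basket per line, no header, rows of different lengths) with and without it. The sample rows are hypothetical; shorter rows are padded with NaN up to the width of the first row.

```python
import io

import pandas as pd

# Hypothetical sample in the dataset's layout: one basket per line, no header row
sample_text = "shrimp,almonds,avocado\nburgers,meatballs\nchutney\n"

with_header = pd.read_csv(io.StringIO(sample_text))             # first basket becomes column names
no_header = pd.read_csv(io.StringIO(sample_text), header=None)  # every line is kept as a basket

print(with_header.shape)             # (2, 3)
print(no_header.shape)               # (3, 3)
print(no_header.isna().sum().sum())  # 3 empty cells, filled with NaN
```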
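- The apriori function expects a list of lists (one list of items per transaction), not a pandas DataFrame
- Solution:
- converting the pandas DataFrame into a list of lists
- one inner list for each transaction in the dataset
- transactions = [] -> creating an empty list
- for i in range(0, 7501) -> loop over all rows (7501 transactions)
- for j in range(0, 20) -> loop over all columns in each row (at most 20 items per transaction)
- transactions.append -> appending each transaction to the list (transactions)
- df.values[i,j] -> getting the data of each cell (row i & column j)
- str(df.values[i,j]) -> converting data to string
- pd.notna(...) -> skipping empty cells, which pandas reads as NaN (str() would otherwise turn them into a 'nan' item)
- transactions -> displaying list of transactions
transactions = []
values = df.values                      # NumPy array of the DataFrame, indexed [row, column]
for i in range(0, 7501):                # one transaction per row
    # keep the non-empty cells of row i, converted to strings
    transactions.append([str(values[i, j]) for j in range(0, 20) if pd.notna(values[i, j])])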
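As an alternative to the nested pandas loop, the same list of lists can be built with Python's built-in `csv` module, which needs no row or column counts and simply keeps the non-empty cells of each line. The function name `read_baskets` and the inline sample are hypothetical; this is a swapped-in approach, not the one used in this notebook.

```python
import csv
import io

def read_baskets(text_stream):
    """Read one basket per CSV line, dropping empty cells and blank lines."""
    baskets = []
    for row in csv.reader(text_stream):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # skip blank lines
            baskets.append(items)
    return baskets

# Hypothetical sample; trailing empty cells mimic padding in a wide CSV
sample = io.StringIO("shrimp,almonds,avocado\nburgers,meatballs,,\nchutney\n")
baskets = read_baskets(sample)
print(baskets)  # [['shrimp', 'almonds', 'avocado'], ['burgers', 'meatballs'], ['chutney']]
```

With a local copy of the dataset, the file object from `open(path, newline="")` can be passed in place of the `StringIO` sample.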
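- apriori function from the apyori library
- rules -> the generator of results returned by apriori()
* The apriori parameters:
- transactions
- accepts the list of lists (transactions)
- min_support
- keeps only the itemsets whose support is at least the value specified
- min_confidence
- keeps only the rules whose confidence is at least the threshold specified
- min_lift
- keeps only the rules whose lift is at least the value specified
- max_length
- specifying the maximum number of items in a rule
- min_length
- intended as the minimum number of items in a rule; apyori's apriori() only reads min_support, min_confidence, min_lift and max_length, so min_length is silently ignored
- single-item "rules" still drop out in this example, because their lift is always 1, which is below min_lift = 3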
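To show how the three thresholds filter rules, here is a simplified from-scratch sketch restricted to 2-item rules. It is not apyori's implementation, and the function name `mine_pair_rules` is hypothetical. It does use the Apriori pruning idea: a pair can only reach min_support if both of its items do, so infrequent items are dropped before pairs are counted. Each threshold is treated as a minimum (value ≥ threshold). The toy baskets reproduce the burger/ketchup counts from earlier.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(transactions, min_support, min_confidence, min_lift):
    """Return (lhs, rhs, support, confidence, lift) for every 2-item rule passing all thresholds."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    # Pass 1: single-item counts; keep only items that meet min_support
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count only pairs made of frequent items (Apriori pruning)
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # try both rule directions
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((lhs, rhs, support, confidence, lift))
    return rules

# Hypothetical baskets matching the earlier counts (150 burger, 100 ketchup, 50 both, 1000 total)
transactions = (
    [["burger", "ketchup"]] * 50 + [["burger"]] * 100 + [["ketchup"]] * 50 + [["fries"]] * 800
)
for lhs, rhs, support, confidence, lift in mine_pair_rules(transactions, 0.03, 0.2, 3):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

Raising min_confidence to 0.4 keeps only Ketchup→Burger (confidence 50%), since Burger→Ketchup has confidence 33.3%.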
- Example: (dataset is for a one-week time period)
Let's suppose that we want rules for only the items that are purchased
- at least 3 times a day
- 7 x 3 = 21 times in one week
- min_support for those items can be calculated as
- 21/7501 ≈ 0.0028 -> rounded to 0.003
- min_confidence: 0.2
- min_lift: 3
- min_length & max_length: 2
- at least two products in each rule
- at most two products in each rule
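The min_support arithmetic from the example, written out. The inputs (7 days, 3 purchases per day, 7501 transactions) come from the example above; the last line checks what rounding up to 0.003 means in transaction counts.

```python
import math

# Inputs from the example: one week of data, 7501 transactions, at least 3 purchases per day
days = 7
purchases_per_day = 3
total_transactions = 7501

min_count = days * purchases_per_day          # 21 transactions per week
min_support = min_count / total_transactions  # about 0.0028
print(min_count, round(min_support, 4))       # 21 0.0028

# Using the rounded value 0.003 instead: an itemset now needs at least this many transactions
effective_min_count = math.ceil(0.003 * total_transactions)
print(effective_min_count)                    # 23
```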
- results -> converting the generator of rules into a list -> easier to view the results
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
results = list(rules)
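def inspect(results_list):
    for item in results_list:
        # item = (items, support, ordered_statistics); take the first rule found for this itemset
        stat = item[2][0]               # (items_base, items_add, confidence, lift)
        lhs = ", ".join(stat[0])        # left-hand side, same direction as the confidence below
        rhs = ", ".join(stat[1])        # right-hand side
        print("Rule: " + lhs + " -> " + rhs)
        print("Support: " + str(item[1]))
        print("Confidence: " + str(stat[2]))
        print("Lift: " + str(stat[3]))
        print("=====================================")
inspect(results)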
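The index chains such as `item[2][0][2]` are easier to read with field names. This sketch defines stand-in namedtuples that copy the field names apyori uses for its result records (`RelationRecord`: items, support, ordered_statistics; `OrderedStatistic`: items_base, items_add, confidence, lift) and fills one record by hand with the burger/ketchup numbers. The record and the helper `format_rule` are hypothetical; real records come from `list(apriori(...))`.

```python
from collections import namedtuple

# Stand-ins mirroring the field layout of apyori's result records
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])

# Hand-built record using the burger/ketchup example numbers
record = RelationRecord(
    items=frozenset({"burger", "ketchup"}),
    support=0.05,
    ordered_statistics=[
        OrderedStatistic(frozenset({"burger"}), frozenset({"ketchup"}), 50 / 150, (50 / 150) / 0.1),
    ],
)

def format_rule(rec):
    """Render the first rule of a record; rec[2][0][2] is rec.ordered_statistics[0].confidence."""
    stat = rec.ordered_statistics[0]
    return "{} -> {} (support={:.3f}, confidence={:.3f}, lift={:.2f})".format(
        ", ".join(sorted(stat.items_base)),
        ", ".join(sorted(stat.items_add)),
        rec.support, stat.confidence, stat.lift,
    )

print(format_rule(record))
# burger -> ketchup (support=0.050, confidence=0.333, lift=3.33)
```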
- results_table.nlargest(n = 10, columns = 'Lift')
-> displaying the rows with the largest values in the Lift column, in descending order
-> n = 10 -> number of rows to display
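def inspect(results):
    # Take the first rule (ordered_statistics[0]) of each record; tuple(...)[0] keeps
    # one item per side, which is enough here because max_length = 2
    lhs = [tuple(result[2][0][0])[0] for result in results]   # items_base (left-hand side)
    rhs = [tuple(result[2][0][1])[0] for result in results]   # items_add (right-hand side)
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))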
results_table = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
results_table.nlargest(n = 10, columns = 'Lift')
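For readers without pandas, the same "top n rules by lift" selection can be sketched with the standard library's `heapq.nlargest`, using rows in the same column order as `results_table`. The rows below are hypothetical placeholders, not results from the dataset, and `top_by_lift` is our own helper name.

```python
import heapq

# Hypothetical rows in results_table's column order:
# (Left Hand Side, Right Hand Side, Support, Confidence, Lift)
rows = [
    ("item A", "item B", 0.004, 0.30, 3.1),
    ("item C", "item D", 0.005, 0.25, 4.8),
    ("item E", "item F", 0.003, 0.41, 3.6),
]

def top_by_lift(rows, n=10):
    """Return the n rows with the largest lift, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[4])

for row in top_by_lift(rows, n=2):
    print(row)
```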