- Installing Apyori
- Importing the Relevant Libraries
- Loading the Data
- Data Preprocessing
- Training the Eclat Model
- Two Ways to Display the Result
- 1. Displaying the products & the support
- 2. Displaying the results in a Table (Pandas Dataframe)
!pip install apyori
import numpy as np
import pandas as pd
- The CSV file has no header row
- So pass header = None to read_csv; otherwise pandas would use the first transaction as the column names
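A small self-contained illustration of why header = None matters. It uses an in-memory CSV with made-up baskets instead of the real file (the toy data is hypothetical). Without header = None, pandas consumes the first basket as column names. With it, columns are numbered from 0 and shorter rows are padded with NaN.

```python
import io

import pandas as pd

# Toy baskets (hypothetical data); rows have different lengths, like the real file
csv_text = "milk,bread,eggs\nbeer,chips\nbread\n"

# Default header inference: the first basket is consumed as column names
wrong = pd.read_csv(io.StringIO(csv_text))
print(list(wrong.columns))  # ['milk', 'bread', 'eggs']

# header=None: every line is a transaction, columns are labelled 0, 1, 2
right = pd.read_csv(io.StringIO(csv_text), header=None)
print(right.shape)  # (3, 3)
print(right.isna().sum().sum())  # 3 padded (missing) cells
```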
url = "https://DataScienceSchools.github.io/Machine_Learning/Unsupervised_Learning/Association_Rule/Market_Basket_Optimisation.csv"
df = pd.read_csv(url, header = None)
df.head()
- The apriori function accepts a list of lists, not a pandas DataFrame
- Solution:
- convert the pandas DataFrame into a list of lists
- one inner list per transaction in the dataset
- transactions = [] -> create an empty list
- for i in range(0, 7501) -> loop over all 7501 rows (indices 0 to 7500)
- for j in range(0, 20) -> loop over all 20 columns in each row
- transactions.append -> append each row's list of items to transactions
- df.values[i,j] -> get the value of the cell at row i and column j
- str(df.values[i,j]) -> convert the value to a string (empty cells become the string 'nan')
- transactions -> display the list of transactions
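The same conversion can be sketched on a toy DataFrame (hypothetical data standing in for the real one). It reproduces the notebook's loop, where empty cells turn into the string 'nan', and reads the row and column counts from df.shape instead of hard-coding 7501 and 20. The last part shows one possible alternative that drops missing cells; that variant is an assumption of this sketch, not part of the original notebook, and it only removes itemsets that would contain 'nan'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Market_Basket_Optimisation DataFrame
df = pd.DataFrame([["milk", "bread", "eggs"],
                   ["beer", "chips", np.nan],
                   ["bread", np.nan, np.nan]])

n_rows, n_cols = df.shape  # instead of hard-coding 7501 and 20

# Notebook approach: every cell becomes a string, so NaN becomes 'nan'
transactions = []
for i in range(n_rows):
    transactions.append([str(df.values[i, j]) for j in range(n_cols)])
print(transactions[2])  # ['bread', 'nan', 'nan']

# Alternative (assumption, not in the original): keep only non-missing cells
transactions_clean = [[str(v) for v in row if pd.notna(v)] for row in df.values]
print(transactions_clean)  # [['milk', 'bread', 'eggs'], ['beer', 'chips'], ['bread']]
```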
transactions = []
for i in range(0, 7501):
    transactions.append([str(df.values[i,j]) for j in range(0, 20)])
from apyori import apriori
relations = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
results = list(relations)
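This notebook approximates Eclat by running apyori's apriori function and reading only the support of each item pair. For comparison, here is a minimal sketch of the idea behind Eclat itself, restricted to pairs: each item is mapped to the set of IDs of the transactions that contain it (the vertical layout), and the support of a pair is the size of the intersection of the two ID sets divided by the number of transactions. The function name eclat_pairs and the toy baskets are hypothetical, and the sketch applies only min_support, not the confidence and lift filters used above.

```python
from collections import defaultdict
from itertools import combinations


def eclat_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for item pairs meeting min_support."""
    n = len(transactions)
    # Vertical layout: item -> set of IDs of the transactions that contain it
    tids = defaultdict(set)
    for tid, basket in enumerate(transactions):
        for item in basket:
            tids[item].add(tid)
    # Keep only frequent single items; a pair's support can never exceed theirs
    frequent = sorted(item for item, ids in tids.items() if len(ids) / n >= min_support)
    pairs = {}
    for a, b in combinations(frequent, 2):
        support = len(tids[a] & tids[b]) / n  # intersect the two tid-sets
        if support >= min_support:
            pairs[(a, b)] = support
    return pairs


# Hypothetical baskets for illustration only
baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["beer", "chips"], ["milk", "eggs"]]
result = eclat_pairs(baskets, min_support=0.5)
print(result)  # {('bread', 'milk'): 0.5, ('eggs', 'milk'): 0.5}
```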
def inspect(results_list):
    for item in results_list:
        items = list(item[0])  # item[0] is the frozenset of products in the pair
        print("Product 1: " + items[0] + ", Product 2: " + items[1])
        print("\nSupport: " + str(item[1]))  # item[1] is the support of the pair
        print("\n=====================================\n")
inspect(results)
- results_table.nlargest(n = 10, columns = 'Support')
-> returns the rows with the largest values in the Support column, in descending order of support
-> n = 10 -> number of rows to display
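On a small table with hypothetical pairs and supports, nlargest returns the same rows as sorting by the column in descending order and keeping the first n:

```python
import pandas as pd

# Hypothetical pairs and supports, for illustration only
table = pd.DataFrame({
    "Product 1": ["a", "b", "c", "d"],
    "Product 2": ["x", "y", "z", "w"],
    "Support": [0.004, 0.010, 0.003, 0.007],
})

top = table.nlargest(n=2, columns="Support")
same = table.sort_values("Support", ascending=False).head(2)
print(top["Support"].tolist())  # [0.01, 0.007]
print(top.equals(same))  # True
```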
def inspect(results):
    # result[2] is the list of ordered statistics (rules); [0] is the first rule,
    # whose items_base ([0]) and items_add ([1]) each hold one product here
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    return list(zip(lhs, rhs, supports))
results_table = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])
results_table.nlargest(n = 10, columns = 'Support')
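The positional indices result[1], result[2][0][0] and result[2][0][1] read the support and the first rule's left- and right-hand sides. apyori returns these as namedtuples (RelationRecord with items, support, ordered_statistics; OrderedStatistic with items_base, items_add, confidence, lift), so the same table can be built with field names. To keep the sketch runnable without apyori installed, it defines stand-in namedtuples with those field names and a hypothetical record with made-up numbers; inspect_named is a hypothetical helper, not part of the notebook.

```python
from collections import namedtuple

import pandas as pd

# Stand-ins mirroring the field layout of apyori's result records
# (assumed here so the example runs without apyori installed)
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])


def inspect_named(results):
    rows = []
    for record in results:
        rule = record.ordered_statistics[0]            # same as result[2][0]
        rows.append((next(iter(rule.items_base)),      # same as tuple(result[2][0][0])[0]
                     next(iter(rule.items_add)),       # same as tuple(result[2][0][1])[0]
                     record.support))                  # same as result[1]
    return rows


# Hypothetical record with made-up numbers
fake = [RelationRecord(frozenset({"a", "b"}), 0.005,
                       [OrderedStatistic(frozenset({"a"}), frozenset({"b"}), 0.3, 3.5)])]
table = pd.DataFrame(inspect_named(fake), columns=["Product 1", "Product 2", "Support"])
print(table)
```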