Foreword

  • Output options: the ‘tango’ syntax and the ‘readable’ theme.
  • Snippets and results.


Mining Association Rules

When supermarkets or bookstores analyze patterns of purchasing behavior, they want to find products that are often purchased together.

The next step is to build recommendation systems to generate cross-sales. For bookstores, people who bought book A might also be interested in book B or C. For supermarkets, if we know people purchase beer with diapers, can a deal or a bundle induce them to buy more?

The concept of association

Customers have grocery carts that contain several items (samples) from a larger set of items (the grocery store stocks).

Association rule mining seeks to find affinities (affinity analysis) to predict patterns: if x is purchased, then y too.

Data miners work with transactions. From the dataset, they compute correlation and occurrence ratios. For example, the diapers-beer bundle is 0.67 correlated with 1.0 confidence (beer occurs 100% of the time with diapers).
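To make these ratios concrete, here is a minimal sketch in base R on a made-up set of three baskets. The `baskets` list and the helper names are assumptions for illustration only; they are not the data behind the figures quoted above. Support is the share of baskets containing all the listed items, and confidence is the support of the pair divided by the support of the LHS.

```r
# Hypothetical toy transactions (not taken from the Groceries data)
baskets <- list(
  c("diapers", "beer"),
  c("diapers", "beer", "milk"),
  c("milk")
)

# TRUE when a basket contains every item in `items`
contains <- function(basket, items) all(items %in% basket)

# Support: share of baskets that contain all the items
support <- function(items) mean(sapply(baskets, contains, items = items))

supp_both    <- support(c("diapers", "beer"))
supp_diapers <- support("diapers")

# Confidence of the rule diapers => beer
conf_rule <- supp_both / supp_diapers

print(round(supp_both, 2))  # 0.67
print(conf_rule)            # 1
```

In this toy set, two of the three baskets hold both items, and every basket with diapers also holds beer.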

Even a smaller confidence rate can reveal purchasing patterns. Once these association rules are known, the goal is to encourage the behaviour with recommendation systems, pricing, coupon offers, advertising, special offers, and other marketing tactics.

The package

Load the specialized package and datasets.

library(arules)

Find out more about the arules package (and associated packages) and go visit the GitHub repository.

The dataset

The arules package contains several datasets. Load the Groceries dataset, which is included in the package.

data(Groceries)

dim(Groceries)
## [1] 9835  169

We have a matrix of 9835 rows by 169 columns. Each row is a transaction (one grocery cart) and each column is an item that the cart may contain. However, Groceries is not a simple matrix as we would encounter in matrix algebra.

class(Groceries)
## [1] "transactions"
## attr(,"package")
## [1] "arules"

It is a special matrix: an object of class transactions, stored in sparse format.

summary(Groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55 
##   16   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   46   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage

The matrix is ‘sparse’ because very few of the items exist in any given grocery basket (see note 1). A cart may contain any of the 169 possible items. The 9835 rows are samples of customers’ baskets.

  • Item appears in a basket = 1.
  • Item does not appear = 0.

For each row, most of the cells are empty (0). There are very few items in a cart according to the summary statistics: a minimum of 1, a median of 3, and a maximum of 32.
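The density and the mean basket size reported by summary() can be checked by hand. The sketch below uses only figures copied from the output above; the variable names are ours.

```r
# Figures copied from the summary(Groceries) output
n_baskets <- 9835
n_items   <- 169

# Top-five item counts plus "(Other)" give the number of filled cells
filled <- sum(c(2513, 1903, 1809, 1715, 1372, 34055))

density   <- filled / (n_baskets * n_items)  # share of cells equal to 1
mean_size <- filled / n_baskets              # average number of items per basket

print(round(density, 4))    # 0.0261
print(round(mean_size, 3))  # 4.409
```

Only about 2.6% of the cells are filled, which is why a sparse representation pays off.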

Exploring the results – Items

We want to extract the relative frequency of occurrence of different items. For example, the item ‘yogurt’ appears in 1372 out of 9835 rows; in about 14% of cases.

This 14% is called the support; it is also referred to as occurrence, expectation, or frequency. In this document, these terms are interchangeable.
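The same ratio can be read directly from the data. As a short sketch, `itemFrequency()` from arules returns the relative frequency of every item; the variable name `item_support` is ours.

```r
library(arules)
data(Groceries)

# Relative frequency (support) of every single item
item_support <- itemFrequency(Groceries, type = "relative")

# Support of yogurt: 1372 of 9835 baskets
print(item_support["yogurt"])

# The five most frequent items
print(head(sort(item_support, decreasing = TRUE), 5))
```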

We set the support parameter to somewhere around 10%-15% and plot the results.

threshold <-  0.10

itemFrequencyPlot(Groceries, support = threshold)
grid(nx = NA, ny = NULL)
abline(h = threshold, lty = 1, lwd = 2, col = 'red3')

We stress that ‘yogurt’ has a 14% support and makes the cut.

We can experiment by changing support.

# 15 %
threshold <-  0.15

itemFrequencyPlot(Groceries, support = threshold)
grid(nx = NA, ny = NULL)
abline(h = threshold, lty = 1, lwd = 2, col = 'red3')

# 5 %
threshold <-  0.05

itemFrequencyPlot(Groceries, support = threshold, cex.names = 0.5)
grid(nx = NA, ny = NULL)
abline(h = threshold, lty = 1, lwd = 2, col = 'red3')

Exploring the results – Combos

We want to focus on items that occur with some meaningful frequency. apriori is a very commonly used algorithm for finding rules in the transaction data.

Rules take the form of ‘if LHS then RHS’: if the ‘Left Hand Side’ item occurs, the ‘Right Hand Side’ item occurs too with some probability. This conditional co-occurrence is called ‘confidence’.

  • Milk-Butter’s support is over 10% (frequency).
    • Milk’s support is 25% (frequency).
  • Milk-Butter’s confidence is 40% (co-occurrence).
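In symbols, the two measures can be written as follows. The notation is ours, added as a reading aid: $N$ is the number of transactions, $t$ a transaction, and $X$, $Y$ are itemsets.

```latex
\begin{align*}
\operatorname{supp}(X) &= \frac{\lvert \{\, t : X \subseteq t \,\} \rvert}{N} \\[4pt]
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\end{align*}
```

Taking the illustrative Milk-Butter support as exactly 10%, the confidence is 0.10 / 0.25 = 0.40.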

The LHS can be one or several items, but the RHS is always a single item.

  • Milk-Butter.
  • (Milk, Bread)-Butter.

Set parameters and see what we can extract from the dataset.

  • support = 0.5 % (frequency).
  • confidence = 50 % (co-occurrence).
  • The other parameters keep their default values.
    • minlen = 1 item; minimum rule length.
    • maxlen = 10 items; maximum rule length.
    • smax = 1; maximum support (100 %).

Recall from the summary that the number of items per cart has a mean of 4.4 and a median of 3.

apriori(Groceries, parameter = list(support = 0.005, confidence = 0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 49 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [120 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
## set of 120 rules

We obtain 120 rules or LHS-RHS associations.

Alternatively, we set the parameters to:

  • support = 1 % (frequency).
  • confidence = 50 % (co-occurrence).

ruleset <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
## set of 15 rules

We obtain 15 rules (down from the 120 above) because the minimum support rises from 0.5% to 1% (like a pyramid, the higher we climb, the fewer rules we find).
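The ‘Absolute minimum support count’ lines in the two outputs appear to be the relative threshold scaled by the number of transactions and rounded down. This is our reading of the printed figures, not a statement about the internals of arules; the helper `min_count` is hypothetical.

```r
n_transactions <- 9835

# Number of baskets implied by a relative support threshold, rounded down
min_count <- function(support) floor(support * n_transactions)

print(min_count(0.005))  # 49
print(min_count(0.01))   # 98
```

Doubling the support threshold doubles the number of baskets a rule must appear in, which explains the sharp drop in the rule count.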

We examine the output in more detail.

summary(ruleset)
## set of 15 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3 
## 15 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
## summary of quality measures:
##     support          confidence          lift      
##  Min.   :0.01007   Min.   :0.5000   Min.   :1.984  
##  1st Qu.:0.01174   1st Qu.:0.5151   1st Qu.:2.036  
##  Median :0.01230   Median :0.5245   Median :2.203  
##  Mean   :0.01316   Mean   :0.5411   Mean   :2.299  
##  3rd Qu.:0.01403   3rd Qu.:0.5718   3rd Qu.:2.432  
##  Max.   :0.02227   Max.   :0.5862   Max.   :3.030  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835    0.01        0.5

All 15 rules contain 3 items (two on the LHS and one on the RHS), so the median rule length is 3.

Inspect the first rule.

inspect(ruleset[1])
##     lhs              rhs          support    confidence lift    
## [1] {curd,yogurt} => {whole milk} 0.01006609 0.5823529  2.279125

Rule #1 has 3 items (LHS = curd, yogurt; RHS = whole milk).

Inspect all the rules.

inspect(ruleset)
##      lhs                     rhs                   support confidence     lift
## [1]  {curd,                                                                   
##       yogurt}             => {whole milk}       0.01006609  0.5823529 2.279125
## [2]  {other vegetables,                                                       
##       butter}             => {whole milk}       0.01148958  0.5736041 2.244885
## [3]  {other vegetables,                                                       
##       domestic eggs}      => {whole milk}       0.01230300  0.5525114 2.162336
## [4]  {yogurt,                                                                 
##       whipped/sour cream} => {whole milk}       0.01087951  0.5245098 2.052747
## [5]  {other vegetables,                                                       
##       whipped/sour cream} => {whole milk}       0.01464159  0.5070423 1.984385
## [6]  {pip fruit,                                                              
##       other vegetables}   => {whole milk}       0.01352313  0.5175097 2.025351
## [7]  {citrus fruit,                                                           
##       root vegetables}    => {other vegetables} 0.01037112  0.5862069 3.029608
## [8]  {tropical fruit,                                                         
##       root vegetables}    => {other vegetables} 0.01230300  0.5845411 3.020999
## [9]  {tropical fruit,                                                         
##       root vegetables}    => {whole milk}       0.01199797  0.5700483 2.230969
## [10] {tropical fruit,                                                         
##       yogurt}             => {whole milk}       0.01514997  0.5173611 2.024770
## [11] {root vegetables,                                                        
##       yogurt}             => {other vegetables} 0.01291307  0.5000000 2.584078
## [12] {root vegetables,                                                        
##       yogurt}             => {whole milk}       0.01453991  0.5629921 2.203354
## [13] {root vegetables,                                                        
##       rolls/buns}         => {other vegetables} 0.01220132  0.5020921 2.594890
## [14] {root vegetables,                                                        
##       rolls/buns}         => {whole milk}       0.01270971  0.5230126 2.046888
## [15] {other vegetables,                                                       
##       yogurt}             => {whole milk}       0.02226741  0.5128806 2.007235

Inspect the rules sorted by lift.

inspect(sort(ruleset, by = 'lift'))
##      lhs                     rhs                   support confidence     lift
## [1]  {citrus fruit,                                                           
##       root vegetables}    => {other vegetables} 0.01037112  0.5862069 3.029608
## [2]  {tropical fruit,                                                         
##       root vegetables}    => {other vegetables} 0.01230300  0.5845411 3.020999
## [3]  {root vegetables,                                                        
##       rolls/buns}         => {other vegetables} 0.01220132  0.5020921 2.594890
## [4]  {root vegetables,                                                        
##       yogurt}             => {other vegetables} 0.01291307  0.5000000 2.584078
## [5]  {curd,                                                                   
##       yogurt}             => {whole milk}       0.01006609  0.5823529 2.279125
## [6]  {other vegetables,                                                       
##       butter}             => {whole milk}       0.01148958  0.5736041 2.244885
## [7]  {tropical fruit,                                                         
##       root vegetables}    => {whole milk}       0.01199797  0.5700483 2.230969
## [8]  {root vegetables,                                                        
##       yogurt}             => {whole milk}       0.01453991  0.5629921 2.203354
## [9]  {other vegetables,                                                       
##       domestic eggs}      => {whole milk}       0.01230300  0.5525114 2.162336
## [10] {yogurt,                                                                 
##       whipped/sour cream} => {whole milk}       0.01087951  0.5245098 2.052747
## [11] {root vegetables,                                                        
##       rolls/buns}         => {whole milk}       0.01270971  0.5230126 2.046888
## [12] {pip fruit,                                                              
##       other vegetables}   => {whole milk}       0.01352313  0.5175097 2.025351
## [13] {tropical fruit,                                                         
##       yogurt}             => {whole milk}       0.01514997  0.5173611 2.024770
## [14] {other vegetables,                                                       
##       yogurt}             => {whole milk}       0.02226741  0.5128806 2.007235
## [15] {other vegetables,                                                       
##       whipped/sour cream} => {whole milk}       0.01464159  0.5070423 1.984385

Lift favors situations where LHS and RHS are not abundant, but where the relatively few occurrences always happen together.
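Lift is commonly defined as the confidence of the rule divided by the support of its RHS. As a sketch, the base R arithmetic below recomputes two lift values from figures printed earlier (item counts from summary(Groceries), confidence values from inspect(ruleset)); the variable names are ours.

```r
n_transactions <- 9835

# Item supports, from the counts in summary(Groceries)
supp_other_veg  <- 1903 / n_transactions
supp_whole_milk <- 2513 / n_transactions

# lift = confidence / support(RHS)
# {citrus fruit, root vegetables} => {other vegetables}
lift_citrus_root <- 0.5862069 / supp_other_veg
# {curd, yogurt} => {whole milk}
lift_curd_yogurt <- 0.5823529 / supp_whole_milk

print(round(lift_citrus_root, 3))  # 3.03
print(round(lift_curd_yogurt, 3))  # 2.279
```

The two rules have almost the same confidence, but ‘other vegetables’ is less frequent than ‘whole milk’, so the first rule has the higher lift.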

We seek the highest lift rates.

inspect(head(ruleset, by = 'lift'))
##     lhs                   rhs                   support confidence     lift
## [1] {citrus fruit,                                                         
##      root vegetables}  => {other vegetables} 0.01037112  0.5862069 3.029608
## [2] {tropical fruit,                                                       
##      root vegetables}  => {other vegetables} 0.01230300  0.5845411 3.020999
## [3] {root vegetables,                                                      
##      rolls/buns}       => {other vegetables} 0.01220132  0.5020921 2.594890
## [4] {root vegetables,                                                      
##      yogurt}           => {other vegetables} 0.01291307  0.5000000 2.584078
## [5] {curd,                                                                 
##      yogurt}           => {whole milk}       0.01006609  0.5823529 2.279125
## [6] {other vegetables,                                                     
##      butter}           => {whole milk}       0.01148958  0.5736041 2.244885

These rules have the highest levels of lift (the top two are over 3).

The fruit and vegetable items in the top two rules are less frequent than whole milk, which dominates the other rules, yet these rules still reach about 1% support and the highest confidence of the set (about 58%).

Or we can sift the results and extract lift rates over 3.

goodrules <- ruleset[quality(ruleset)$lift > 3]

inspect(goodrules)
##     lhs                  rhs                   support confidence     lift
## [1] {citrus fruit,                                                        
##      root vegetables} => {other vegetables} 0.01037112  0.5862069 3.029608
## [2] {tropical fruit,                                                      
##      root vegetables} => {other vegetables} 0.01230300  0.5845411 3.020999

We obtain the 2 best rules.
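The same filter can also be written with `subset()`, which lets us name the quality measure directly. This is a sketch of an equivalent call; `control = list(verbose = FALSE)` is added only to silence the progress log.

```r
library(arules)
data(Groceries)

ruleset <- apriori(Groceries,
                   parameter = list(support = 0.01, confidence = 0.5),
                   control = list(verbose = FALSE))

# Keep the rules whose lift exceeds 3
goodrules <- subset(ruleset, subset = lift > 3)

inspect(goodrules)
print(length(goodrules))
```

Conditions can be combined, for example `subset(ruleset, subset = lift > 2 & confidence > 0.55)`.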

We could carry on with more functions to extract more results.



Visualization

Exploring new results

In the last case the parameters were:

  • support = 1 % (frequency).
  • confidence = 50 % (co-occurrence).

These parameters are too stringent, so we relax them. The new parameters are:

  • support = 0.5 % (frequency).
  • confidence = 35 % (co-occurrence).

We run another extraction.

ruleset2 <- apriori(Groceries, parameter = list(support = 0.005,  confidence = 0.35))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.35    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 49 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [357 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(ruleset2)
## set of 357 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4 
##  71 251  35 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   2.899   3.000   4.000 
## 
## summary of quality measures:
##     support           confidence          lift      
##  Min.   :0.005084   Min.   :0.3510   Min.   :1.388  
##  1st Qu.:0.005796   1st Qu.:0.4000   1st Qu.:1.822  
##  Median :0.007117   Median :0.4487   Median :2.087  
##  Mean   :0.009615   Mean   :0.4624   Mean   :2.169  
##  3rd Qu.:0.009964   3rd Qu.:0.5182   3rd Qu.:2.431  
##  Max.   :0.074835   Max.   :0.7000   Max.   :4.085  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.005       0.35

We obtain 357 rules or LHS-RHS associations.

We inspect the first 10 rules.

inspect(ruleset2[1:10])
##      lhs                      rhs                support     confidence
## [1]  {cake bar}            => {whole milk}       0.005592272 0.4230769 
## [2]  {mustard}             => {whole milk}       0.005185562 0.4322034 
## [3]  {pot plants}          => {whole milk}       0.006914082 0.4000000 
## [4]  {pasta}               => {whole milk}       0.006100661 0.4054054 
## [5]  {herbs}               => {root vegetables}  0.007015760 0.4312500 
## [6]  {herbs}               => {other vegetables} 0.007727504 0.4750000 
## [7]  {herbs}               => {whole milk}       0.007727504 0.4750000 
## [8]  {processed cheese}    => {whole milk}       0.007015760 0.4233129 
## [9]  {semi-finished bread} => {whole milk}       0.007117438 0.4022989 
## [10] {detergent}           => {whole milk}       0.008947636 0.4656085 
##      lift    
## [1]  1.655775
## [2]  1.691492
## [3]  1.565460
## [4]  1.586614
## [5]  3.956477
## [6]  2.454874
## [7]  1.858983
## [8]  1.656698
## [9]  1.574457
## [10] 1.822228

Visualizing the results

This companion package adds visualization functions.

library(arulesViz)

We plot the results.

The maximum level of lift is about 4. We are looking for higher lift rates, so we look for darker shades of red.

All rules with a high lift seem to have support below 1%.
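The plotting call itself is not shown here. As a hedged sketch, a scatter plot of this kind could be drawn with `plot()` from arulesViz, reusing the `measure` and `shading` arguments that appear in the interactive example later on; the exact options used for the original figure are an assumption.

```r
library(arules)
library(arulesViz)
data(Groceries)

ruleset2 <- apriori(Groceries,
                    parameter = list(support = 0.005, confidence = 0.35),
                    control = list(verbose = FALSE))

# Scatter plot: support on the x axis, confidence on the y axis, colour by lift
plot(ruleset2, measure = c("support", "confidence"), shading = "lift")
```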

More scatter plots.

Matrix plots on a reduced sample (lift > 3).

## Itemsets in Antecedent (LHS)
##  [1] "{herbs}"                                            
##  [2] "{root vegetables,onions}"                           
##  [3] "{onions,other vegetables}"                          
##  [4] "{beef,other vegetables}"                            
##  [5] "{beef,whole milk}"                                  
##  [6] "{tropical fruit,curd}"                              
##  [7] "{pip fruit,whipped/sour cream}"                     
##  [8] "{tropical fruit,whipped/sour cream}"                
##  [9] "{citrus fruit,pip fruit}"                           
## [10] "{pip fruit,yogurt}"                                 
## [11] "{pip fruit,other vegetables}"                       
## [12] "{citrus fruit,root vegetables}"                     
## [13] "{citrus fruit,other vegetables}"                    
## [14] "{tropical fruit,root vegetables}"                   
## [15] "{other vegetables,whole milk,fruit/vegetable juice}"
## [16] "{other vegetables,whole milk,whipped/sour cream}"   
## [17] "{pip fruit,root vegetables,whole milk}"             
## [18] "{pip fruit,other vegetables,whole milk}"            
## [19] "{citrus fruit,root vegetables,whole milk}"          
## [20] "{citrus fruit,other vegetables,whole milk}"         
## [21] "{tropical fruit,root vegetables,whole milk}"        
## [22] "{tropical fruit,whole milk,yogurt}"                 
## [23] "{root vegetables,whole milk,yogurt}"                
## [24] "{tropical fruit,other vegetables,whole milk}"       
## [25] "{other vegetables,whole milk,yogurt}"               
## Itemsets in Consequent (RHS)
## [1] "{root vegetables}"  "{other vegetables}" "{yogurt}"          
## [4] "{tropical fruit}"
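The call behind this listing is not shown. A matrix plot restricted to the high-lift rules could be produced along these lines; this is a sketch with default plot options, which may differ from those used for the original figure.

```r
library(arules)
library(arulesViz)
data(Groceries)

ruleset2 <- apriori(Groceries,
                    parameter = list(support = 0.005, confidence = 0.35),
                    control = list(verbose = FALSE))

# Reduced sample: only the rules with lift above 3
subrules <- subset(ruleset2, subset = lift > 3)

# Matrix plot: one axis for LHS itemsets, the other for RHS itemsets
plot(subrules, method = "matrix")
```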

3D matrix plot.

The itemset index (LHS and RHS) printed for this plot is identical to the one listed for the matrix plot above.

Dual-scale matrix plot.

The itemset index (LHS and RHS) printed for this plot is identical to the one listed for the matrix plot above.

Sorted dual-scale matrix plot.

## Itemsets in Antecedent (LHS)
##  [1] "{other vegetables,whole milk,yogurt}"               
##  [2] "{other vegetables,whole milk,whipped/sour cream}"   
##  [3] "{citrus fruit,other vegetables}"                    
##  [4] "{tropical fruit,whole milk,yogurt}"                 
##  [5] "{beef,whole milk}"                                  
##  [6] "{onions,other vegetables}"                          
##  [7] "{beef,other vegetables}"                            
##  [8] "{pip fruit,other vegetables,whole milk}"            
##  [9] "{herbs}"                                            
## [10] "{citrus fruit,other vegetables,whole milk}"         
## [11] "{tropical fruit,other vegetables,whole milk}"       
## [12] "{tropical fruit,whipped/sour cream}"                
## [13] "{other vegetables,whole milk,fruit/vegetable juice}"
## [14] "{tropical fruit,curd}"                              
## [15] "{tropical fruit,root vegetables,whole milk}"        
## [16] "{citrus fruit,root vegetables,whole milk}"          
## [17] "{pip fruit,root vegetables,whole milk}"             
## [18] "{pip fruit,whipped/sour cream}"                     
## [19] "{root vegetables,onions}"                           
## [20] "{citrus fruit,root vegetables}"                     
## [21] "{tropical fruit,root vegetables}"                   
## [22] "{pip fruit,yogurt}"                                 
## [23] "{pip fruit,other vegetables}"                       
## [24] "{root vegetables,whole milk,yogurt}"                
## [25] "{citrus fruit,pip fruit}"                           
## Itemsets in Consequent (RHS)
## [1] "{tropical fruit}"   "{other vegetables}" "{yogurt}"          
## [4] "{root vegetables}"

Interactive visualization

We can turn the plot into an interactive tool in RStudio.

  • Select zones and Inspect the results.
  • Select zones and Zoom in or Zoom out the data points.
  • Filter the plot setting a lift level.
  • End the visualization.

# The code
plot(ruleset2, method = NULL, measure = c("support", "confidence"), shading = "lift", interactive = TRUE)

A .gif example.

We can also generate an interactive plot with the plotly package and save the plot for an online document.

plotly_arules(ruleset2, method = "scatterplot", measure = c("support", "confidence"), shading = "lift", max = 100)

A .gif example.

If we push further, the output can be integrated into a Shiny app, for example, to create online interactive dashboards.



Takeaway

The key takeaway here is that using good visualization tools to examine the preliminary results enhances the process of sorting through the rules and finding the most promising lifts.

There might be constraints other than maximizing lift. We might want to find the best association for a given item, or find B-type combos that do not show the highest lifts but have the highest profitability.
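As a sketch of the first case, the rules that predict one chosen item can be pulled out with `subset()` and the `%in%` operator from arules. The item ‘yogurt’ is an arbitrary pick, and the thresholds are those of the relaxed run.

```r
library(arules)
data(Groceries)

ruleset2 <- apriori(Groceries,
                    parameter = list(support = 0.005, confidence = 0.35),
                    control = list(verbose = FALSE))

# Rules whose RHS is the chosen item
yogurt_rules <- subset(ruleset2, subset = rhs %in% "yogurt")

# Strongest associations first
inspect(head(sort(yogurt_rules, by = "lift"), 5))
```

Profitability is not part of the dataset; it would have to be joined from another source before ranking the rules on it.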

Once we have identified the best lifts according to our objectives and formulated the final conclusion, we can recommend marketing recipes to turn forecasts into actuals.

Other industries

The arules package contains additional datasets.

  • Adult
  • Epub
  • Income
  • SunBai

Tools

Data mining can sometimes benefit from less coding.

A GUI provides less friction when it comes to intense exploration and analyses: Rattle by Togaware.




  1. Association rule mining is very similar to text mining and topic analysis (part of Natural Language Processing, or NLP). In NLP, we also deal with sparse matrices (corpus of documents), frequencies, co-occurrences, and associations (n-grams). Check out the qdap and tm packages.