Association Rule Mining

Overview

Association rule mining is a data mining technique that uncovers relationships between items that occur together within the transactions (or “baskets”) of a dataset. These relationships are expressed as rules of the form A → B, where the left-hand side A (the antecedent) and the right-hand side B (the consequent) are itemsets. The direction of a rule matters: A → B and B → A are different rules and generally have different confidence values.

Support, Confidence, and Lift

The key measures in association rule mining are support, confidence, and lift.

  • Support is the proportion of transactions in the dataset that contain every item in the rule, from both the left-hand and right-hand sides. It measures how frequently the rule occurs in the dataset and is the joint probability of observing both itemsets in the same transaction.

Support(A,B) = P(A \cap B)

  • Confidence is the conditional probability that the right-hand side of the rule appears in a transaction given that the left-hand side appears in the same transaction. From the definition of conditional probability:

Confidence(A,B) = P(B|A) = \frac{P(A\cap B)}{P(A)} = \frac{Support(A,B)}{P(A)}

  • Lift measures the degree of dependence between the items. A lift of one indicates that the left and right-hand sides are independent. If the lift value is greater than one or less than one, then the itemsets are positively or negatively correlated, respectively.

Lift(A,B) = \frac{P(A \cap B)}{P(A)P(B)} = \frac{Support(A,B)}{P(A)P(B)}
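The three measures above can be computed directly from their definitions. The following is a minimal sketch in Python, assuming each transaction is stored as a set of items; the item names and the five toy transactions are illustrative placeholders, not values from the project dataset.

```python
# Toy transactions (illustrative placeholders, not the project data).
transactions = [
    {"low_income", "IDA", "presidential"},
    {"low_income", "IDA", "parliamentary"},
    {"upper_middle_income", "IBRD", "presidential"},
    {"lower_middle_income", "Blend", "presidential"},
    {"low_income", "IDA", "presidential"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """P(rhs | lhs) = Support(lhs and rhs together) / Support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence divided by the support of the right-hand side."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


lhs, rhs = {"low_income"}, {"IDA"}
print("support:   ", support(lhs | rhs, transactions))    # 3 of 5 transactions
print("confidence:", confidence(lhs, rhs, transactions))  # every low_income row has IDA
print("lift:      ", lift(lhs, rhs, transactions))        # greater than one here
```

In this toy data the rule {low_income} → {IDA} has a lift above one, so the two itemsets appear together more often than independence would predict.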

Apriori

Apriori is an algorithm for mining frequent itemsets in a dataset. It relies on the Apriori property, which states that every subset of a frequent itemset must also be frequent. Equivalently, every superset of an infrequent itemset must also be infrequent. The algorithm generates candidate itemsets of increasing size and checks their frequency in the dataset. At each iteration, it discards the candidates that do not meet the minimum support threshold, and it never generates the supersets of those infrequent itemsets. This pruning reduces the number of candidates whose frequency must be counted, which makes the algorithm more efficient. The minimum confidence threshold is applied afterwards, when rules are generated from the frequent itemsets.
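The level-wise search described above can be sketched in a few lines of Python. This is a simplified illustration of frequent-itemset mining with Apriori-style pruning, not the implementation used for this project; the function name `apriori`, the toy transactions, and the threshold of 0.3 are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Toy transactions (illustrative placeholders, not the project data).
transactions = [
    {"low_income", "IDA", "presidential"},
    {"low_income", "IDA", "parliamentary"},
    {"upper_middle_income", "IBRD", "presidential"},
    {"lower_middle_income", "Blend", "presidential"},
    {"low_income", "IDA", "presidential"},
]

for itemset, s in sorted(apriori(transactions, 0.3).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(s, 2))
```

Items that appear in only one of the five transactions fall below the threshold at the first level, so no pair or triple containing them is ever counted.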

In the context of this project, association rule mining will reveal the relationships between economic and political indicators. We know that the World Bank uses income levels to determine how countries receive development loans. ARM may be useful in determining how political institutions/orientation, economic conditions, and geography are related. Understanding these rules could help researchers understand what attributes contribute to foreign aid donations.

Data Prep

Association rule mining requires only unlabeled transactional data. Since the dataset of this project is primarily quantitative, additional qualitative data needed to be collected. The Inter-American Development Bank (IDB) maintains a dataset of countries and their political institutions. Additionally, the World Bank classifies countries by income level and lending category. The lending category indicates which World Bank institution, the International Development Association (IDA) or the International Bank for Reconstruction and Development (IBRD), lends to that country. “The IBRD primarily lends to middle-income and creditworthy low-income countries, while the IDA provides interest-free loans – called credits – and grants to governments of the poorest countries” (World Bank). International financial institutions use these qualitative indicators to inform their lending decisions. Aid donor countries may also use these indicators to decide how they allocate foreign aid. These qualitative variables were transformed into a “transaction” dataset, with one transaction per country, for analysis with association rule mining.
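One common way to turn categorical records into transactions is to encode each attribute as a "column=value" item. The sketch below illustrates that idea in Python; the column names, country labels, and values are hypothetical placeholders, and the project's actual preparation steps may differ.

```python
# Hypothetical records: column names and values are placeholders.
rows = [
    {"country": "A", "income_level": "Low income",
     "lending_category": "IDA", "system": "Presidential"},
    {"country": "B", "income_level": "Upper middle income",
     "lending_category": "IBRD", "system": "Parliamentary"},
    {"country": "C", "income_level": "Lower middle income",
     "lending_category": "Blend", "system": None},  # missing value
]


def to_transaction(row, id_column="country"):
    """Encode one record as a set of 'column=value' items.

    The identifier column and missing values are skipped so that they
    do not appear as items in the rules.
    """
    return {
        f"{col}={val}"
        for col, val in row.items()
        if col != id_column and val is not None
    }


transactions = [to_transaction(r) for r in rows]
for t in transactions:
    print(sorted(t))
```

Prefixing each value with its column name keeps items from different attributes distinct even when two attributes share a value.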

Code

Results

For this ARM analysis, a minimum confidence of 0.3 and a minimum support of 0.2 provided the best results.
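To illustrate how the two thresholds act on the rule set, the sketch below enumerates rules from toy data, keeps those that meet a minimum support of 0.2 and a minimum confidence of 0.3, and ranks them by lift. It uses brute-force enumeration rather than Apriori for brevity; the function name `mine_rules`, the `max_len` limit, and the transactions are assumptions for the example and do not reproduce the project's results.

```python
from itertools import combinations

MIN_SUPPORT = 0.2      # thresholds quoted in the text
MIN_CONFIDENCE = 0.3

# Toy transactions (illustrative placeholders, not the project data).
transactions = [
    {"low_income", "IDA", "presidential"},
    {"low_income", "IDA", "parliamentary"},
    {"upper_middle_income", "IBRD", "presidential"},
    {"lower_middle_income", "Blend", "presidential"},
    {"low_income", "IDA", "presidential"},
]


def mine_rules(transactions, min_support, min_confidence, max_len=3):
    """Brute-force rules as tuples (lhs, rhs, support, confidence, lift)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = supp(itemset)
            if s == 0 or s < min_support:
                continue  # itemset is not frequent enough
            # Split the itemset into every non-empty lhs / rhs pair.
            for r in range(1, size):
                for left in combinations(combo, r):
                    lhs = frozenset(left)
                    rhs = itemset - lhs
                    conf = s / supp(lhs)
                    if conf >= min_confidence:
                        rules.append((lhs, rhs, s, conf, conf / supp(rhs)))
    return rules


rules = mine_rules(transactions, MIN_SUPPORT, MIN_CONFIDENCE)
top_by_lift = sorted(rules, key=lambda rule: rule[4], reverse=True)[:15]
for lhs, rhs, s, conf, lft in top_by_lift[:5]:
    print(sorted(lhs), "->", sorted(rhs), round(s, 2), round(conf, 2), round(lft, 2))
```

Sorting the same list by index 2 or 3 instead of 4 gives the rankings by support or confidence.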

Frequency Plot for Top 10 items
Top 15 rules, ordered by support, confidence, and lift
Network of top 15 rules by lift, support, and confidence (Click for interactive version)
Top 15 by support
Top 15 by confidence
Top 15 by lift

Conclusions

ARM uncovered several interesting rules about the relationships between economic, political, and geographic indicators. Specifically, the method discovered rules that relate income level to lending category, as expected. Additionally, ARM uncovered rules linking lending category and political institution. Because association rules describe co-occurrence rather than causation, this result is consistent with, but does not prove, the hypothesis that political institutions play a role in foreign aid allocation and in international finance. The results from ARM are useful not only for this analysis, but also for research on how political systems inform the way development loans are allocated.