Data Portfolio
Association Rules Analysis (R)
NIH Spending Categories
The National Institutes of Health (NIH) funds research in more than 280 areas, with investments totaling over $32 billion per year. Each grant is funded by a single institute; however, its funds may be categorized under several spending categories. The objective of this analysis is to identify common groupings of spending categories assigned to grants, which may indicate how funding could be reallocated to other spending categories if an NIH entity were dissolved.
All spending categories are included in the data set, but the analysis focuses on the top five (Biotechnology, Cancer, Clinical Research, Genetics, and Neurosciences) and their connections to other categories.
Top 5 Spending Categories

Apriori algorithm
For the analysis, the apriori function in the arules package was used. arules assigns a label to each value and creates a matrix indicating which items (spending categories) are part of each transaction (grant). From this matrix, Apriori identifies frequent itemsets: it tests each candidate itemset for inclusion by counting the number of transactions in which it appears. Itemsets that appear too rarely (below the minimum support threshold) are removed. Itemsets that appear often enough become candidates for rules.
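The level-wise search described above can be illustrated with a minimal, standard-library Python sketch. It is not the arules implementation used in this analysis; the grants list, the helper names (support, frequent_itemsets), and the 0.4 minimum support are all hypothetical values chosen for illustration.

```python
from itertools import combinations

# Hypothetical grants: each grant is the set of spending categories assigned to it.
grants = [
    {"Cancer", "Genetics", "Biotechnology"},
    {"Cancer", "Clinical Research"},
    {"Cancer", "Genetics"},
    {"Neurosciences", "Clinical Research"},
    {"Cancer", "Genetics", "Clinical Research"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Grow itemsets one item at a time, keeping only those that meet min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        # Keep the candidates that appear often enough; drop the "small" ones.
        kept = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        size += 1
        # Build larger candidates by joining kept itemsets, then prune any
        # candidate that has an infrequent subset (the Apriori property).
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        current = [
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        ]
    return frequent


result = frequent_itemsets(grants, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With this toy data, categories that appear in only one grant are dropped at the first level, so no larger itemset containing them is ever counted.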

Analysis
The top five focus spending categories and their corresponding number of rules are: Biotechnology (11), Cancer (16), Clinical Research (21), Genetics (12), and Neurosciences (27).
Each of the focus spending categories shows substantial overlap with other related spending categories.
Confidence is high across the generated rules, and very high for more than 75% of them.
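For readers unfamiliar with the measure, the confidence of a rule A => B is the share of grants containing A that also contain B. The short Python sketch below shows the calculation on made-up data; the grants list and the confidence helper are hypothetical and are not taken from the NIH data set or from arules.

```python
# Hypothetical grants: each grant is the set of spending categories assigned to it.
grants = [
    {"Cancer", "Genetics", "Biotechnology"},
    {"Cancer", "Clinical Research"},
    {"Cancer", "Genetics"},
    {"Neurosciences", "Clinical Research"},
    {"Cancer", "Genetics", "Clinical Research"},
]


def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0


# Confidence is directional: the two rules below share an itemset but differ.
genetics_to_cancer = confidence({"Genetics"}, {"Cancer"}, grants)
cancer_to_genetics = confidence({"Cancer"}, {"Genetics"}, grants)
print(genetics_to_cancer, cancer_to_genetics)
```

In this toy example every grant tagged Genetics is also tagged Cancer, while only three of the four Cancer grants are tagged Genetics, so the first rule has the higher confidence.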
The top spending categories (representative of top areas for awarded grants) are each interwoven with multiple spending categories.