Visualisation

Graph Construction

Each recipe (e.g. 理氣之劑, the qi-regulating formulae) contains several medicinal formulae (e.g. 補中益氣湯, Buzhong Yiqi Tang), so we first parsed every formula and catalogued the Chinese medicinal ingredients it contains. We then built an association dictionary that records which ingredients appear together in the same formula, aggregated over all formulae within a recipe. Each key is an ingredient pair, and each value is the number of formulae in which that pair co-occurs. From this dictionary we generated an undirected graph whose nodes are ingredients: two ingredients are joined by an edge if they co-occur in at least one formula, and the edge weight is their co-occurrence count.
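The construction can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: it assumes each formula is already available as a list of ingredient names, and the function names build_cooccurrence and to_adjacency, as well as the toy formulae, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(formulae):
    """Count the formulae in which each unordered ingredient pair appears.

    formulae: list of ingredient lists, one list per formula in a recipe.
    Returns a Counter keyed by sorted (ingredient_a, ingredient_b) tuples.
    """
    counts = Counter()
    for ingredients in formulae:
        # Deduplicate, then sort so every pair has one canonical key
        unique = sorted(set(ingredients))
        counts.update(combinations(unique, 2))
    return counts


def to_adjacency(counts):
    """Convert pair counts into an undirected, weighted adjacency map."""
    adj = {}
    for (a, b), weight in counts.items():
        adj.setdefault(a, {})[b] = weight
        adj.setdefault(b, {})[a] = weight
    return adj


if __name__ == "__main__":
    # Hypothetical toy formulae, not taken from the dataset
    toy = [["人參", "黃芪", "甘草"], ["人參", "甘草", "當歸"]]
    graph = to_adjacency(build_cooccurrence(toy))
    print(graph["人參"]["甘草"])  # 2
```

Keying a Counter by sorted tuples mirrors the dictionary design described above while removing the explicit "is this key present yet" check; the adjacency map stores each edge in both directions so that neighbours can be looked up from either end.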

Code Demonstration

First, the code built a dictionary mapping each ingredient to an integer index:

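import csv
import itertools
import os

ingredrows = []   # full rows of the ingredient frequency table
ingred_list = []  # ingredient names, in file order
with open("ingredfreq.csv", "r", encoding="utf-8-sig") as file:
    csvreader = csv.reader(file)
    header = next(csvreader)  # skip the header row
    for row in csvreader:
        ingredrows.append(row)
        ingred_list.append(row[0])
# Map each ingredient name to its row index
ingred_dict = {key: i for i, key in enumerate(ingred_list)}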

It then read each recipe file and collected the formulae it contains, together with their ingredients:

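for fname in os.listdir():
    # Skip non-CSV files and the ingredient table itself
    # (assumes the recipe CSVs sit in the working directory)
    if not fname.endswith(".csv") or fname == "ingredfreq.csv":
        continue
    reciperows = []
    with open(fname, "r", encoding="utf-8-sig") as file:
        csvreader = csv.reader(file)
        for row in csvreader:
            row = list(dict.fromkeys(row))  # drop duplicate cells, keep order
            row = list(filter(None, row))   # drop empty cells
            # Keep formula rows (name + ingredients); "> 1" also skips
            # blank rows, which would otherwise break row[0] below
            if len(row) > 1:
                reciperows.append(row)

    # recipes[0]: formula names | recipes[1]: ingredients used (as indices)
    recipes = [[], []]
    for row in reciperows:
        if row[0] not in recipes[0]:
            recipes[0].append(row[0])
            recipes[1].append([])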

It then counted how often each pair of ingredients co-occurs within a formula and stored the counts in the association dictionary:

    # Pair counts for this recipe: {(ingredient_a, ingredient_b): count}
    ingred_asso_dict = dict()

    for line in reciperows:
        # Ingredients from the master list that appear in this formula
        appears = [ingred[0] for ingred in ingredrows if ingred[0] in line]
        # Sorting gives each unordered pair a single canonical key
        relationships = itertools.combinations(sorted(appears), 2)
        for relationship in relationships:
            ingred_asso_dict[relationship] = ingred_asso_dict.get(relationship, 0) + 1
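The association dictionary can then be flattened into the two spreadsheets that Gephi's spreadsheet importer reads. The sketch below is an assumption about how such tables could be written, not the project's own export code: the function name write_gephi_tables is hypothetical, ingredient names double as node IDs, and the column headers (Id, Label, Source, Target, Type, Weight) follow the names Gephi recognizes on import.

```python
import csv


def write_gephi_tables(asso_dict, node_path, edge_path):
    """Write node and edge CSV tables from a {(a, b): count} dictionary.

    Returns (number_of_nodes, number_of_edges).
    """
    # Every ingredient that takes part in at least one pair becomes a node
    nodes = sorted({name for pair in asso_dict for name in pair})
    with open(node_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])
        for name in nodes:
            writer.writerow([name, name])

    # One undirected edge per pair, weighted by its co-occurrence count
    with open(edge_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type", "Weight"])
        for (a, b), weight in sorted(asso_dict.items()):
            writer.writerow([a, b, "Undirected", weight])

    return len(nodes), len(asso_dict)
```

Called once per recipe with the dictionary built above (and output file names of one's choosing), it yields one node table and one edge table per recipe graph.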

Visualization of Co-occurrence Frequencies

The generated graph was then visualized using Gephi (http://gephi.org), an open-source network visualization tool. Our custom Python code produced an edge table and a node table, as spreadsheets for import into Gephi. The nodes were assigned random initial coordinates, which served as the starting conditions for the iterations of the force-directed Fruchterman–Reingold layout algorithm. Upon convergence, more strongly connected nodes (i.e. ingredients that co-occur more frequently) generally lie closer together. For readability, we colour-coded both node degree and edge weight, with red representing higher values and blue lower values. An example (from 祛風之劑, the wind-dispelling formulae) is shown below.
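Gephi provides Fruchterman–Reingold as a built-in layout, so no custom code is needed for the figures. To illustrate what the algorithm does, here is a minimal pure-Python sketch; it is not Gephi's implementation. It assumes the graph is stored as an adjacency map {node: {neighbour: weight}} with comparable node keys, scales the attractive force by edge weight (our assumption, so that frequently co-occurring ingredients are pulled closer), and uses a simple linear cooling schedule. The function name and parameters are hypothetical.

```python
import math
import random


def fruchterman_reingold(adj, iterations=200, size=1.0, seed=0):
    """Minimal weighted Fruchterman-Reingold layout.

    adj: {node: {neighbour: weight}}, symmetric. Returns {node: (x, y)}.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    # Random initial coordinates: the starting conditions for the iterations
    pos = {v: [rng.random() * size, rng.random() * size] for v in nodes}
    k = size / math.sqrt(max(len(nodes), 1))  # ideal pairwise distance
    t0 = size / 10                            # initial maximum step

    for step in range(iterations):
        temp = t0 * (1 - step / iterations)   # linear cooling
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes, magnitude k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attraction along each edge, magnitude weight * d^2 / k
        for u in nodes:
            for v, w in adj[u].items():
                if not u < v:  # handle each undirected edge once
                    continue
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = w * d * d / k
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f

        # Move each node along its net force, capped by the temperature
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                scale = min(length, temp) / length
                pos[v][0] += dx * scale
                pos[v][1] += dy * scale

    return {v: tuple(p) for v, p in pos.items()}
```

The temperature cap is what makes the layout settle: early on nodes may move far, while late iterations only make small adjustments around the balance point between repulsion and attraction.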

Descriptive Statistics

We also identified the most commonly used ingredients and colour-coded them by xingwei (性味, an ingredient's nature and flavour), with green, red, and blue denoting neutral, hot, and cold nature respectively. Three of the five most commonly used ingredients are neutral in nature, possibly because of their role in regulating and harmonizing the other ingredients in a formula.
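A frequency ranking of this kind takes only a Counter. The sketch assumes a lookup table from ingredient to nature (hot, cold, or neutral) is available and counts each ingredient at most once per formula; the function name, the placeholder ingredients, and the grey fallback colour are hypothetical, while the green/red/blue scheme follows the text.

```python
from collections import Counter

# Colour scheme used in the figure: green = neutral, red = hot, blue = cold
NATURE_COLOURS = {"neutral": "green", "hot": "red", "cold": "blue"}


def top_ingredients(formulae, nature, top=5):
    """Return (ingredient, count, nature, colour) for the most used ingredients.

    formulae: list of ingredient lists (one per formula); an ingredient is
    counted at most once per formula.
    nature: {ingredient: "neutral" | "hot" | "cold"}; unlisted ingredients
    are reported as "unknown" and coloured grey.
    """
    freq = Counter(ing for ingredients in formulae for ing in set(ingredients))
    result = []
    for ing, count in freq.most_common(top):
        label = nature.get(ing, "unknown")
        result.append((ing, count, label, NATURE_COLOURS.get(label, "grey")))
    return result


if __name__ == "__main__":
    # Hypothetical placeholder data
    formulae = [["A", "B"], ["A", "C"], ["A", "B", "B"]]
    print(top_ingredients(formulae, {"A": "neutral", "B": "hot"}, top=2))
```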

Analysis of Xingwei

We further hypothesized that the xingwei of the ingredients should correlate with the function of a recipe. To test this, we chose two recipes, 清暑之劑 (the summer-heat-clearing formulae) and 祛寒之劑 (the cold-dispelling formulae), where the former is believed to reduce heat (熱) and the latter cold (寒), and plotted the proportion of each xingwei among the ingredients used in each recipe. As one would expect, 祛寒之劑 uses more ingredients of hot nature, whereas 清暑之劑 uses a more balanced mix of hot and cold ingredients, with cold ingredients outnumbering hot ones overall.
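The comparison reduces to normalized counts per recipe. In this sketch, whether an ingredient is counted once per recipe or once per formula in which it appears is left to the caller, since the text does not specify it; the function name and the placeholder data are hypothetical.

```python
from collections import Counter


def nature_proportions(ingredients, nature):
    """Fraction of ingredient uses falling into each xingwei category.

    ingredients: iterable of ingredient names, one entry per use.
    nature: {ingredient: "neutral" | "hot" | "cold"}; missing -> "unknown".
    """
    counts = Counter(nature.get(ing, "unknown") for ing in ingredients)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical placeholder data for two recipes
    nature = {"A": "hot", "B": "cold", "C": "neutral"}
    recipes = {"recipe_1": ["A", "A", "C"], "recipe_2": ["A", "B", "B", "C"]}
    for name, uses in recipes.items():
        print(name, nature_proportions(uses, nature))
```

Working with proportions rather than raw counts lets recipes with different numbers of formulae be compared on the same axis.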

Clustering of Ingredients across all Recipes

We also clustered the ingredients by their connectivity to identify communities of ingredients that are commonly used together. Various clustering methods, such as spectral clustering, could be applied to the graph; here we show an example using the Circle Pack Layout in Gephi, with nodes grouped by their modularity class.
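Gephi's Modularity statistic assigns each node a modularity class by optimizing Newman's modularity with the Louvain method. The sketch below does not reproduce that optimization; it only scores a given partition, which is a quick way to check how well a set of communities separates the graph. It assumes the same {node: {neighbour: weight}} adjacency map without self-loops, and the function name is hypothetical.

```python
def modularity(adj, community):
    """Newman's modularity Q of a partition of a weighted undirected graph.

    adj: {node: {neighbour: weight}} (symmetric).
    community: {node: community label}.
    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the
    total edge weight, L_c the weight inside c, and d_c the summed weighted
    degree of the nodes in c.
    """
    two_m = sum(w for u in adj for w in adj[u].values())  # equals 2m
    if two_m == 0:
        return 0.0
    inside, degree = {}, {}
    for u, nbrs in adj.items():
        c = community[u]
        degree[c] = degree.get(c, 0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if community[v] == c:
                # Each internal edge is seen from both ends, giving 2 * L_c
                inside[c] = inside.get(c, 0) + w
    return sum(inside.get(c, 0) / two_m - (degree[c] / two_m) ** 2 for c in degree)


if __name__ == "__main__":
    # Two disconnected triangles, split into their natural communities
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
    g = {}
    for u, v in edges:
        g.setdefault(u, {})[v] = 1
        g.setdefault(v, {})[u] = 1
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(modularity(g, split))  # 0.5
```

Putting every node in one community always gives Q = 0, so positive values indicate more within-community weight than a random graph with the same degrees would have.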

Interactive Graph Visualizer

Finally, we built an interactive graph visualizer with the SigmaJS Exporter plugin of Gephi, to make the data easier to explore. Clicking on a node displays its degree and the highest weight among its edges.
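Values like these can be precomputed in Python and attached to the node table as extra columns, so that they are available as node attributes when the graph is exported. This is a sketch under that assumption: the function name and the idea of storing the values as extra columns are ours, and it does not describe how the SigmaJS Exporter itself renders them.

```python
def node_summary(adj):
    """Per-node degree and heaviest incident edge.

    adj: {node: {neighbour: weight}}.
    Returns {node: (degree, strongest_neighbour, strongest_weight)};
    isolated nodes map to (0, None, 0).
    """
    summary = {}
    for node, nbrs in adj.items():
        if nbrs:
            best = max(nbrs, key=nbrs.get)  # neighbour with the largest weight
            summary[node] = (len(nbrs), best, nbrs[best])
        else:
            summary[node] = (0, None, 0)
    return summary


if __name__ == "__main__":
    # Hypothetical placeholder graph
    adj = {"a": {"b": 3, "c": 1}, "b": {"a": 3}, "c": {"a": 1}}
    for node, info in node_summary(adj).items():
        print(node, info)
```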