Author Frequency Presented in Word Clouds

A word cloud compares the frequencies of the data values in a dataset by displaying the values in different font sizes.  More frequent values appear in a larger font and less frequent values in a smaller one.
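As a rough illustration of this idea, the sketch below maps frequencies to font sizes with simple linear scaling.  The function name scale_font_sizes, the sample counts and the size limits are assumptions made for this example; the wordcloud library used later on this page applies its own layout and scaling rules.

```python
def scale_font_sizes(frequencies, min_size=10, max_size=60):
    """Map each value's frequency to a font size between min_size and max_size."""
    if not frequencies:
        return {}
    lowest = min(frequencies.values())
    highest = max(frequencies.values())
    span = highest - lowest
    sizes = {}
    for value, count in frequencies.items():
        if span == 0:
            # All values are equally frequent, so they share one size.
            sizes[value] = max_size
        else:
            # Position of this count between the lowest and highest counts (0 to 1).
            share = (count - lowest) / span
            sizes[value] = round(min_size + share * (max_size - min_size))
    return sizes


# Made-up counts for illustration only.
sizes = scale_font_sizes({"Author A": 30, "Author B": 10, "Author C": 20})
```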

Creating Word Clouds

Ready-made tools for creating word clouds exist, but they may not be able to process a large amount of data.  They may also not let creators choose the shape and colour scheme of a word cloud or the number of data values displayed.  This page therefore introduces how to create word clouds with Python, which avoids these limitations.

To create a word cloud, we need a data file listing every appearance of the data values and an image template for the word cloud.  The data file (such as the one excerpted in Fig. 3) can be in .txt format.  It does not need to contain any statistics of the data values.  To display the author frequency of The Weekly, we used the function create_full_name_list to create a data file with a total of 55,506 author names extracted from the metadata of The Weekly records in the Hong Kong Literature Database.  Running the code on this page with the data file and an image template creates a word cloud that presents the more frequent author names across all issues of The Weekly collected in the Database.

Fig. 3.  Data values in a data file of extracted author names (excerpt)
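The data file needs no statistics because the frequencies can be derived from the raw list of appearances.  A minimal sketch of that counting step follows, assuming one author name per line (the exact layout of the real data file may differ) and using a hypothetical helper count_names with made-up names:

```python
from collections import Counter


def count_names(text):
    """Count how often each non-empty line (one author name) occurs."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return Counter(names)


# Made-up sample standing in for the contents of a data file.
sample = "Author A\nAuthor B\nAuthor A\n\nAuthor C\nAuthor A\n"
counts = count_names(sample)
print(counts.most_common(1))
```

A dictionary of counts like this can also be passed to the wordcloud library's generate_from_frequencies method, whereas the code on this page passes the raw text to generate and lets the library do the counting.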

Using different image templates can change the shape and colour scheme of a word cloud.  Fig. 4 is a word cloud created based on a single-colour rectangle image.

Fig. 4.  A word cloud of relatively higher frequency author names in The Weekly (1952–1974, except the missing issues).

Since the Peanuts comics were printed in The Weekly, we used Snoopy and his dog house as image templates for creating word clouds (Figs 5 and 6).

Fig. 5.  Snoopy-shaped word cloud of frequent authors in The Weekly. 

Fig. 6.  Dog house-shaped word cloud of top-100 frequent authors in The Weekly.

The last line of our code calls the function MakeImage:

MakeImage("Image_Template.jpg", text, 100, "out_WordCloud.jpg")

Put the file name of the image template in the first parameter, the number of data values to be presented in the third, and the file name of the word cloud to be created in the last; the second parameter is the text read from the data file.  Running the code on this page then creates a word cloud in the shape of the image template, containing the specified number of data values and saved under the stated file name.  The dog-house word cloud in Fig. 6 consists of the 100 most frequent authors, while the one in Fig. 7 consists of the 300 most frequent authors.

Fig. 7.  Dog house-shaped word cloud of top-300 frequent authors in The Weekly.
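The third parameter limits the cloud to the most frequent values.  As an illustration only, the sketch below shows what such a cut-off amounts to, using made-up counts and a hypothetical helper top_values; in the code on this page the limit is handled inside the wordcloud library through its max_words setting.

```python
from collections import Counter


def top_values(counts, num):
    """Return the num most frequent values with their counts, most frequent first."""
    return Counter(counts).most_common(num)


# Made-up counts for illustration only.
counts = {"Author A": 12, "Author B": 7, "Author C": 3, "Author D": 1}
top_two = top_values(counts, 2)
```

Asking for more values than the data contains simply returns all of them, so a limit of 300 on a small data file still works.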

The colour schemes of the above word clouds come from the default settings of the wordcloud library in Python.  If you would like to create a word cloud based on the colour scheme of the image template, change the plt.imshow code line to:

plt.imshow(wc.recolor(color_func=image_colors))

Snoopy’s dog house is red, so with the above code line a word cloud in a red colour scheme is generated.

Fig. 8.  Word cloud of top-300 frequent authors in The Weekly in the original colour scheme of the dog house.
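The recolouring relies on ImageColorGenerator, which colours each word from the pixels of the image template in the area the word occupies.  The pure-Python sketch below illustrates that averaging idea on a tiny made-up image; mean_colour is a hypothetical helper written for this example, not the library's implementation.

```python
def mean_colour(pixels, top, left, height, width):
    """Average the RGB values inside a rectangle of a row-major pixel grid."""
    region = [
        pixels[row][col]
        for row in range(top, top + height)
        for col in range(left, left + width)
    ]
    count = len(region)
    # Integer average of each of the three colour channels.
    return tuple(sum(p[channel] for p in region) // count for channel in range(3))


# A made-up 2 x 3 image: two red columns followed by one white column.
red, white = (200, 0, 0), (255, 255, 255)
image = [
    [red, red, white],
    [red, red, white],
]

# A word placed entirely over the red area takes the red colour.
word_colour = mean_colour(image, 0, 0, 2, 2)
```

A word that straddles two colours in the template would receive a blend of them, which is why words near the edges of a coloured shape can look paler.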

Code

The following code in Python 3.7 is for creating word clouds.

import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
font_path = "msjh.ttc"
# Read the data file of extracted author names.
with open("Author name weekly.txt", encoding="utf-8") as file:
    text = file.read()

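def MakeImage(image, text, num, outfilename):
    # Load the image template; its shape defines the shape of the word cloud.
    mask = np.array(Image.open(image))
    # Generate the word cloud with at most num data values.
    wc = WordCloud(font_path=font_path, background_color="white", max_words=num, collocations=False, mask=mask)
    wc.generate(text)
    # Colour generator based on the colours of the image template.
    image_colors = ImageColorGenerator(mask)
    # To use the colour scheme of the image template, change the next line to:
    # plt.imshow(wc.recolor(color_func=image_colors))
    plt.imshow(wc)
    plt.axis("off")
    plt.savefig(outfilename)
    plt.show()

MakeImage("Image_Template.jpg", text, 100, "out_WordCloud.jpg")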