Topic results and timeline visualization


1. Obtain topic names

Since LDA models do not name the topics they produce (each topic is only a weighted list of keywords), we examined the keywords of each topic and summarized them into topic names.

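For example, we labelled topic #7 in our LDA result as ‘生意’ (business). Its keywords are ‘工作 政府 餐館 地方 雜貨店 生活 經營 華僑 原話 飛機 雜貨 名字 辦莊 西人 大使 貨店 退休金 西文 結果 兒女’ (roughly: work, government, restaurant, place, grocery store, life, run a business, overseas Chinese, original words, airplane, groceries, name, trading firm, Westerners, ambassador, store, pension, Western language, result, children).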

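Following the same procedure, we obtained five topics: ‘家鄉’ (hometown), ‘生意’ (business), ‘教育’ (education), ‘經濟生活’ (economic life) and ‘政治改革革命’ (political reform and revolution).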

2. Extract topics of each family letter

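We used an Excel formula to count how often each keyword occurs in the text of each family letter:

=(LEN(cell)-LEN(SUBSTITUTE(cell,keyword,"")))/LEN(keyword)

Here cell is the cell holding the letter text and keyword is the keyword (a cell reference or a quoted string). Removing every occurrence of the keyword shortens the text by LEN(keyword) characters per occurrence, so dividing the length difference by LEN(keyword) gives the number of occurrences.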
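The same count can be reproduced in Python, for example to check the spreadsheet or to process many letters at once. This is a minimal sketch, not the project's actual workflow: `keyword_count` and `topic_score` are hypothetical names, and scoring a topic by summing the counts of its keywords is an assumed rule. The keyword list is the one given above for topic #7.

```python
from typing import Iterable


def keyword_count(text: str, keyword: str) -> int:
    """Count non-overlapping occurrences of `keyword` in `text`.

    Mirrors =(LEN(cell)-LEN(SUBSTITUTE(cell,keyword,"")))/LEN(keyword):
    deleting every occurrence shortens the text by len(keyword) per hit.
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    removed = len(text) - len(text.replace(keyword, ""))
    return removed // len(keyword)


def topic_score(text: str, keywords: Iterable[str]) -> int:
    """Hypothetical topic score: the sum of the counts of the topic's keywords."""
    return sum(keyword_count(text, kw) for kw in keywords)


# Keywords of LDA topic #7, labelled 生意 (business) in step 1.
BUSINESS_KEYWORDS = [
    "工作", "政府", "餐館", "地方", "雜貨店", "生活", "經營", "華僑", "原話", "飛機",
    "雜貨", "名字", "辦莊", "西人", "大使", "貨店", "退休金", "西文", "結果", "兒女",
]

if __name__ == "__main__":
    sample = "餐館生意好，又開雜貨店"  # made-up sentence, for illustration only
    print(keyword_count(sample, "餐館"))
    print(topic_score(sample, BUSINESS_KEYWORDS))
```

Note that nested keywords are counted more than once: in the made-up sample, 雜貨店 also matches 雜貨 and 貨店, so `topic_score` returns 4 although only two distinct terms (餐館 and 雜貨店) appear. The spreadsheet formula behaves the same way when applied keyword by keyword.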

3. Topic extraction result

After the previous steps, we obtained the topics of each family letter and integrated them into one file.
As shown in the image below, column A contains the text of the letters, column B the date of each letter, and columns C to G the five consolidated topics.
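The integrated file could also be generated programmatically. The sketch below assumes that each topic column holds the summed keyword count for that topic; only the 生意 keyword list is given in the text, so the other lists are left empty as placeholders, and `build_table` is a hypothetical name. It uses `str.count`, which returns the same number as the Excel formula.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Topic labels from step 1, in the order of columns C to G.
TOPICS = ["家鄉", "生意", "教育", "經濟生活", "政治改革革命"]

# Placeholder keyword lists: only 生意 (LDA topic #7) is listed in the text;
# the other lists would be filled in from their own LDA topics.
TOPIC_KEYWORDS: Dict[str, List[str]] = {topic: [] for topic in TOPICS}
TOPIC_KEYWORDS["生意"] = [
    "工作", "政府", "餐館", "地方", "雜貨店", "生活", "經營", "華僑", "原話", "飛機",
    "雜貨", "名字", "辦莊", "西人", "大使", "貨店", "退休金", "西文", "結果", "兒女",
]


def build_table(letters: Iterable[Tuple[str, str]]) -> str:
    """Return CSV text: column A = letter text, B = date, C-G = topic scores."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["text", "date", *TOPICS])
    for text, date in letters:
        # str.count gives the same number as the Excel LEN/SUBSTITUTE formula.
        scores = [sum(text.count(kw) for kw in TOPIC_KEYWORDS[t]) for t in TOPICS]
        writer.writerow([text, date, *scores])
    return buf.getvalue()
```

To open the result in Excel without garbled Chinese characters, write it to disk with `encoding="utf-8-sig"` so that Excel recognizes the file as UTF-8.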

4. Timeline Visualization

To show how the topics of the family letters change and are distributed over time, we created a timeline visualization with the KnightLab Timeline (TimelineJS) platform.
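TimelineJS can read events from a Google Sheets template or from a JSON file. The sketch below converts letter rows into JSON; the field names (`events`, `start_date`, `text`, `headline`, `group`) follow TimelineJS's JSON format as we understand it and should be checked against Knight Lab's documentation. Labelling each letter with its highest-scoring topic (the hypothetical `dominant_topic`) and using that topic as the `group`, so that letters on the same topic share a row, are our assumptions rather than the project's stated method.

```python
import json
from typing import Dict, List


def dominant_topic(scores: Dict[str, int]) -> str:
    """Hypothetical rule: label a letter with its highest-scoring topic."""
    return max(scores, key=lambda topic: scores[topic])


def to_timeline_json(rows: List[Dict[str, str]], title: str) -> str:
    """Build a TimelineJS-style JSON document, one event per letter.

    Each row needs 'date' (YYYY-MM-DD), 'text' and 'topic'. Using the topic
    as 'group' places letters on the same topic in the same timeline row.
    """
    events = []
    for row in rows:
        year, month, day = (int(part) for part in row["date"].split("-"))
        events.append({
            "start_date": {"year": year, "month": month, "day": day},
            "text": {"headline": row["topic"], "text": row["text"]},
            "group": row["topic"],
        })
    document = {"title": {"text": {"headline": title}}, "events": events}
    # ensure_ascii=False keeps Chinese characters readable in the output file.
    return json.dumps(document, ensure_ascii=False, indent=2)
```

This sketch expects complete YYYY-MM-DD dates; letters dated only by year or month would need extra handling.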
