Visualization

Because of space limits on this web page, we present only part of the analysis results here. See our Google Colab and Google Drive for full details.

Word Cloud & Frequency

Because of the large number of images, we show the word cloud and word frequency for a single year (1953) as an example.

Word Cloud from 1953

Relationship Analysis

The size of the circle surrounding each person’s name represents the number of occurrences of that name. The thickness of the line between two names represents the strength of the connection between them. Again, we take the data for 1953 as an example.
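
The encoding described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used for the figures: it assumes that the "strength of the connection" is the number of articles in which two names appear together, and the function name `build_cooccurrence` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(documents, names):
    """Return (node_counts, edge_counts) for a name network.

    node_counts[name]  -> total occurrences of the name (circle size)
    edge_counts[(a,b)] -> number of documents containing both names (line thickness)
    """
    name_set = set(names)
    node_counts = Counter()
    edge_counts = Counter()
    for tokens in documents:
        present = [t for t in tokens if t in name_set]
        node_counts.update(present)
        # Assumption: each unordered pair of distinct names that share
        # a document adds one to the weight of the edge between them.
        for a, b in combinations(sorted(set(present)), 2):
            edge_counts[(a, b)] += 1
    return node_counts, edge_counts

# Hypothetical, already tokenized articles (placeholders, not real data).
docs = [
    ["A", "met", "B"],
    ["A", "wrote", "to", "B", "and", "C"],
    ["C", "spoke"],
]
nodes, edges = build_cooccurrence(docs, ["A", "B", "C"])
```

In a plot, `nodes` would drive the circle sizes and `edges` the line widths.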

Topic Modeling

After choosing the optimal transformer model for topic modeling, we obtained about ten topics for each year of the newspaper, then reduced the number of topics to exactly 10. We wanted the plots to show the topics clearly so that readers can get a sense of the topics in each year. We generated three kinds of plots.


The first is the heat map. Both the x-axis and the y-axis list the topics generated by the model, and each cell shows the similarity between two topics. The darker the color, the more closely related the two topics are.
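
As a rough sketch of what sits behind such a heat map, the snippet below builds a topic-by-topic similarity matrix. It assumes that each topic is represented by a non-zero vector and that similarity is measured by cosine similarity. The page does not state which measure was used, and the vectors here are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(topic_vectors):
    """Entry [i][j] is the similarity between topic i and topic j."""
    return [[cosine_similarity(u, v) for v in topic_vectors]
            for u in topic_vectors]

# Hypothetical 2-dimensional topic vectors (placeholders, not real data).
topic_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(topic_vectors)
```

Each cell of the heat map would then be colored according to the corresponding entry of `matrix`. The diagonal is always 1, because every topic is identical to itself.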

The second is the intertopic distance map. Like the heat map, it shows the relationships among the generated topics.

The third is the bar chart. The original bar chart contains only eight topics, each shown with its words and their scores. The score is computed with c-TF-IDF, a metric of a word’s importance within a topic. The higher a word’s score, the more representative the word is of that topic.
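
To make the score concrete, here is a minimal sketch of one common formulation of c-TF-IDF. All documents of a topic are treated as a single class. The frequency of a term within the class is multiplied by log(1 + A / f), where A is the average number of words per class and f is the term's frequency across all classes. The exact variant and normalization used for the charts are not stated on this page, so treat the details as assumptions. The function name `c_tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def c_tf_idf(topic_docs):
    """Score each term per topic.

    topic_docs: dict mapping a topic label to the list of tokens obtained
    by concatenating every document assigned to that topic.
    """
    term_counts = {topic: Counter(tokens) for topic, tokens in topic_docs.items()}

    # f: frequency of each term across all topics.
    total_per_term = Counter()
    for counts in term_counts.values():
        total_per_term.update(counts)

    # A: average number of words per topic.
    avg_words = sum(len(tokens) for tokens in topic_docs.values()) / len(topic_docs)

    scores = {}
    for topic, counts in term_counts.items():
        n_tokens = sum(counts.values())
        scores[topic] = {
            # Assumption: term frequency is normalized by the topic length.
            term: (count / n_tokens) * math.log(1 + avg_words / total_per_term[term])
            for term, count in counts.items()
        }
    return scores

# Hypothetical tokens for two topics (placeholders, not real data).
topics = {
    "t0": ["war", "war", "peace"],
    "t1": ["trade", "peace", "trade"],
}
scores = c_tf_idf(topics)
```

In this toy example "war" scores higher than "peace" in topic `t0`, because it is more frequent within that topic.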

Sentiment Analysis

We wanted to find the attitude of the newspaper 《天文臺》 toward different subjects over its publication period. We applied several methods (see the Methodology section for details) to obtain an attitude value (between -1 and 1) for each subject, then combined the values for the same word across years to trace how the sentiment changed.


Here we present three analyses as examples: “美國” (the United States), “蘇俄” (Soviet Russia), and “中國” (China). The results are shown as bar charts. The height of each bar indicates the word’s frequency in a given year: the orange part at the top counts the occurrences where the word appears in a positive context, and the blue part at the bottom counts the occurrences in a negative context.
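
The aggregation behind such a stacked bar chart can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: each mention of a word is assumed to carry an attitude value in [-1, 1], values above 0 count as positive and values below 0 as negative, and the function name `yearly_sentiment_counts` and the sample values are hypothetical.

```python
from collections import defaultdict

def yearly_sentiment_counts(mentions):
    """Count positive and negative mentions of one word per year.

    mentions: iterable of (year, score) pairs with score in [-1, 1].
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for year, score in mentions:
        if score > 0:
            counts[year]["positive"] += 1
        elif score < 0:
            counts[year]["negative"] += 1
        # Assumption: a score of exactly 0 is neutral and is not counted.
    return dict(counts)

# Hypothetical attitude values for one word (placeholders, not real data).
mentions = [(1953, 0.6), (1953, -0.2), (1953, 0.1), (1954, -0.8)]
counts = yearly_sentiment_counts(mentions)
```

For each year, the "negative" count would form the blue lower part of the bar and the "positive" count the orange upper part, so the total bar height equals the number of counted mentions.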