Yearly Topics Based on LDA Model – Digital Scholarship Projects, CUHK Library

Topic Model: LDA Model

Purpose of Analyzing Yearly Topics

Methodology

Principle of LDA Model

A topic model is an unsupervised model for discovering the abstract “topics” that occur in a collection of documents. If a document (such as a poem or an article) is about a particular topic, words related to that topic are expected to appear in it more frequently.

Latent Dirichlet Allocation (LDA) is a generative topic model proposed by Blei et al. in 2003 [1]. It is also known as a three-level Bayesian probability model because of its three-level structure of document (D), topic (Z), and word (W), a structure that is well suited to modelling text. With an LDA topic model, we can mine the latent topics in a data set and then analyze the main themes of the data set and the feature words associated with each topic.
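
The three-level structure can be made concrete by writing out the generative process. The following is a sketch in the commonly used smoothed form of LDA. The symbols $\alpha$, $\beta$, $\theta_d$, $\varphi_k$, $K$, and $N_d$ are notation assumed here for illustration and do not come from this page; $d$, $z$, and $w$ correspond to the document (D), topic (Z), and word (W) levels.

```latex
\begin{align*}
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, |D| \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Multinomial}(\varphi_{z_{d,n}}), & n &= 1, \dots, N_d
\end{align*}
```

Only the words $w_{d,n}$ are observed. Training infers the per-document topic mixtures $\theta_d$ and the per-topic word distributions $\varphi_k$ that best explain them.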

Data Preprocessing

Re-organize records with co-authors
Remove meaningless characters and keep Chinese characters only
Build a stopword dictionary
Segment the text into words with Jieba
Remove stopwords
Remove duplicate terms
Convert the result into a word dictionary
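
The middle steps of this list can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function name `preprocess`, the Unicode range used to define "Chinese characters", and the two-word stopword set are all assumptions, and duplicates are assumed to be removed within each document. The segmenter is passed in as a parameter so the sketch runs without third-party packages. The co-author re-organization step is omitted because the page does not describe it.

```python
import re

# Matches runs of common CJK unified ideographs
# (an assumed definition of "Chinese characters").
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def preprocess(text, segment, stopwords):
    """Return the deduplicated, stopword-free tokens of one document."""
    tokens = []
    seen = set()
    # Keep Chinese characters only; everything else acts as a separator.
    for run in CHINESE_RUN.findall(text):
        # Split each run into tokens with the supplied segmenter.
        for token in segment(run):
            # Drop stopwords and tokens already seen in this document.
            if token in stopwords or token in seen:
                continue
            seen.add(token)
            tokens.append(token)
    return tokens


# Hypothetical stopword dictionary; a real one would be loaded from a file.
stopwords = {"的", "了"}

# `list` is a stand-in segmenter that splits a run into single characters.
print(preprocess("Spring: 春天的花, 春天!", list, stopwords))
```

With Jieba installed, passing `jieba.lcut` as `segment` would split each run into words rather than single characters.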

Modelling

Call <corpora.dictionary>: build a mapping between words and their integer ids.
Call <dictionary.doc2bow>: convert each document into the bag-of-words (BoW) format.
Call <gensim.models.ldamodel>: train the LDA model.
Call <pyLDAvis.gensim>: visualize the LDA topic model results.
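
The four calls above could be chained roughly as follows. This is a sketch under stated assumptions, not the project's own script: the function names `train_lda` and `save_visualization`, the default `num_topics`, `passes`, and `random_state` values, and the output path are all hypothetical. Imports are placed inside the functions so the definitions load even where gensim or pyLDAvis is not installed. Newer pyLDAvis releases expose the gensim helper as `pyLDAvis.gensim_models`, so the sketch tries that name first and falls back to `pyLDAvis.gensim`.

```python
def train_lda(tokenized_docs, num_topics=10, passes=10, random_state=0):
    """Build the dictionary and BoW corpus, then train an LDA model.

    `tokenized_docs` is a list of token lists, one list per document.
    The default parameter values are illustrative assumptions.
    """
    from gensim import corpora
    from gensim.models.ldamodel import LdaModel

    # Map each distinct word to an integer id.
    dictionary = corpora.Dictionary(tokenized_docs)
    # Convert each document into a list of (word_id, count) pairs.
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    # Train the topic model on the bag-of-words corpus.
    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        passes=passes,
        random_state=random_state,
    )
    return dictionary, corpus, lda


def save_visualization(lda, corpus, dictionary, path):
    """Write an interactive pyLDAvis page for a trained model to `path`."""
    import pyLDAvis
    try:
        import pyLDAvis.gensim_models as gensim_vis  # newer pyLDAvis releases
    except ImportError:
        import pyLDAvis.gensim as gensim_vis  # older releases, as named above

    vis = gensim_vis.prepare(lda, corpus, dictionary)
    pyLDAvis.save_html(vis, path)
```

Calling `train_lda` on one year's token lists and passing the results to `save_visualization` with a path such as `"lda_2013.html"` (a hypothetical file name) would produce one interactive page per year.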

Visualization Results

Yearly Topics of Voice & Verse Magazine

① Result of 2013 (click here to see the interactive HTML)

② Result of 2019 (click here to see the interactive HTML)

To see the results of other years, click the corresponding year:

2011, 2012, 2014, 2015, 2016, 2017, 2018

Codes

References

[1]. Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3.Jan (2003): 993-1022.

[2]. Sievert, Carson, and Kenneth Shirley. “LDAvis: A method for visualizing and interpreting topics.” Proceedings of the workshop on interactive language learning, visualization, and interfaces. 2014.

[3]. Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the international working conference on advanced visual interfaces. 2012.