Yearly Topics Based on LDA Model

Topic Model: LDA Model

Purpose of Analyzing Yearly Topics


Methodology

  1. Principle of LDA Model

A topic model is a type of unsupervised model for discovering the abstract “topics” that occur in a collection of documents. If a document (such as a poem or an article) is about a particular topic, words related to that topic are expected to appear in it more frequently.

Latent Dirichlet Allocation (LDA) is a generative topic model proposed by Blei et al. in 2003 [1]. It is also known as a three-tier Bayesian probability model, with a three-level structure of document (D), topic (Z) and word (W). Using an LDA topic model, we can mine the latent topics in a data set and then analyze its main themes and the feature words associated with each topic.
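To make the three-level structure concrete, here is a minimal sketch of the generative process that LDA assumes, written with the Python standard library only. The toy topics, their vocabulary, and the `alpha` values are made up for illustration and are not taken from the magazine data.

```python
import random

# Hypothetical topics: each maps words (W) to their probability under that topic.
TOPICS = [
    {"moon": 0.5, "river": 0.3, "night": 0.2},
    {"city": 0.6, "street": 0.3, "tram": 0.1},
]

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document (D) of `length` words.

    theta is the document's topic mixture; for every word position a
    topic (Z) is drawn from theta, then a word (W) from that topic.
    """
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

theta, words = generate_document(TOPICS, [0.5, 0.5], 8, random.Random(0))
print(theta, words)
```

Training an LDA model runs this process in reverse: given only the observed words, it estimates the topic mixtures and the per-topic word distributions.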

  2. Data Preprocessing
  • Re-organize records with co-authors
  • Remove meaningless characters and keep only Chinese characters
  • Build a stopword dictionary
  • Segment the text into words with Jieba
  • Remove stopwords
  • Remove duplicate terms
  • Convert the result into a word dictionary
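The cleaning steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the function names are hypothetical, the regular expression covers only the basic CJK Unified Ideographs block, and duplicates are assumed to be removed within each document. The segmenter is passed in as a callable so that `jieba.lcut` can be supplied in practice; the demo line uses character-level splitting only to stay dependency-free.

```python
import re

# Runs of characters in the basic CJK Unified Ideographs block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def keep_chinese(text):
    """Drop everything except Chinese characters (basic CJK block only)."""
    return "".join(CJK_RUN.findall(text))

def preprocess(text, stopwords, segment):
    """Clean one document and return its unique, stopword-free tokens.

    `segment` is any callable that maps a string to a list of tokens,
    for example jieba.lcut.
    """
    tokens = segment(keep_chinese(text))
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(kept))

# Demo with a character-level stand-in for the segmenter.
print(preprocess("我的詩, my 詩 的", {"的"}, list))
```

With Jieba installed, the call would look like `preprocess(text, stopwords, jieba.lcut)`, where `stopwords` is a set loaded from the stopword dictionary.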
  3. Modelling
  • Call <corpora.dictionary>: builds a mapping between words and their integer ids.
  • Call <dictionary.doc2bow>: converts a document into the bag-of-words (BoW) format.
  • Call <gensim.models.ldamodel>: trains the LDA model.
  • Call <pyLDAvis.gensim>: visualizes the results of the LDA topic model.
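To show what the first two steps produce, here is a pure-Python illustration of a word-to-id mapping and of the bag-of-words format, a list of `(word_id, count)` pairs sorted by id. It is not gensim's implementation: the helper names are hypothetical, and ids are assigned here in first-seen order, which may differ from the ids gensim assigns.

```python
from collections import Counter

def build_dictionary(docs):
    """Map every distinct token to an integer id, in first-seen order."""
    token2id = {}
    for doc in docs:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id

def to_bow(doc, token2id):
    """Convert a tokenized document into sorted (word_id, count) pairs.

    Tokens missing from the dictionary are ignored.
    """
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

docs = [["海", "城市", "海"], ["城市", "詩"]]
token2id = build_dictionary(docs)
corpus = [to_bow(doc, token2id) for doc in docs]
print(token2id)
print(corpus)
```

A corpus in this format, together with the dictionary, is what the LDA training step consumes; the trained model is then handed to the visualization step.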

Visualization Results

     Yearly Topics of Voice & Verse Magazine

          ① Result of 2013 (click here to see the interactive HTML)

          ② Result of 2019 (click here to see the interactive HTML)

To see the results of other years, click the corresponding year:

2011, 2012, 2014, 2015, 2016, 2017, 2018


Code

 


[1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3.Jan (2003): 993-1022.

[2] Sievert, Carson, and Kenneth Shirley. “LDAvis: A Method for Visualizing and Interpreting Topics.” Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces. 2014.

[3] Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization Techniques for Assessing Textual Topic Models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. 2012.