Tuesday, Mar 22, 2016 - 3:30pm - Allen 14
Selecting the optimum number of topics in an LDA model
Dr. Dale Bowman, Mathematical Sciences, University of Memphis
Abstract: The Latent Dirichlet Allocation (LDA) model is one of the most widely used topic models in text analytics. LDA assumes that documents in a corpus are generated using a hierarchical Bayesian model in which each document is modeled as a finite mixture over a set of latent topics. The goals of analysis using the LDA model are to model these topics in order to reduce dimensionality and to cluster documents. Gibbs sampling and variational EM algorithms are often used to estimate the parameters of the underlying Bayesian model. Both algorithms assume that the number of latent topics is a fixed and known quantity. Of interest are criteria that may be used to select the optimum number of topics for a corpus of documents. In this presentation, several methods that have been proposed to determine the best number of topics will be discussed, and a new method based on goodness of fit of the LDA model will be proposed.
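For readers unfamiliar with the generative assumption described in the abstract, the following is a minimal sketch of it in Python, using only the standard library. It is an illustration, not the speaker's method: the function names, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and their default values are all assumptions made for this example. Note that the number of topics, `n_topics`, has to be supplied up front, which is the fixed quantity the talk is about selecting.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.5, beta=0.5, seed=0):
    """Simulate a corpus under the LDA generative assumption.

    alpha and beta are symmetric Dirichlet hyperparameters (illustrative
    values, not taken from the talk). Words are integer ids in
    range(vocab_size).
    """
    rng = random.Random(seed)

    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * vocab_size, rng)
              for _ in range(n_topics)]

    corpus = []
    for _ in range(n_docs):
        # Each document is a finite mixture over the latent topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=topics[z])[0]
            doc.append(w)
        corpus.append(doc)
    return topics, corpus


if __name__ == "__main__":
    topics, corpus = generate_corpus(n_docs=5, doc_len=20,
                                     n_topics=3, vocab_size=10)
    for doc in corpus:
        print(doc)
```

Estimation by Gibbs sampling or variational EM works in the opposite direction, recovering the topic and mixture distributions from observed documents for a given number of topics.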