Tuesday, Aug 30, 2016 - 3:30pm - Allen 14
Determining the optimum number of topics in a latent Dirichlet allocation topic model
Dr. Dale Bowman, Mathematical Statistics, University of Memphis
Abstract: Topic modeling is a useful tool for examining latent structures in a corpus of documents. Latent Dirichlet allocation (LDA) is a popular topic modeling method that assumes a Bayesian generative model for collections of exchangeable discrete observations, such as the words within a document. How useful an LDA model is for a given corpus depends, in part, on the number of topics selected. Too few topics can yield a model that does not separate the topics sufficiently, while too many can yield a model that is overly complex and difficult to interpret. Several ad hoc, heuristic methods for selecting the number of topics have been proposed. These typically require fitting the LDA model over a range of topic counts and measuring the performance of each resulting model by a criterion such as perplexity, the rate of perplexity change, or a goodness-of-fit statistic. We propose a new method based on a goodness-of-fit test and compare it with existing methods.
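For readers unfamiliar with the existing heuristics, the following is a minimal sketch of two of them: corpus perplexity and the rate of perplexity change (RPC) between consecutive candidate topic counts. It assumes that per-document log-likelihoods (natural log) have already been obtained from LDA models fitted at each topic count; the function names are hypothetical. This illustrates the baseline heuristics only and is not the speaker's proposed goodness-of-fit method.

```python
import math
from typing import List, Sequence


def perplexity(doc_log_likelihoods: Sequence[float], doc_lengths: Sequence[int]) -> float:
    """Corpus perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods[d] is log p(w_d) under a fitted model (natural log);
    doc_lengths[d] is the number of tokens N_d in document d.
    Lower perplexity means the model assigns higher probability to the text.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("need exactly one log-likelihood per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("corpus must contain at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def rate_of_perplexity_change(topic_counts: Sequence[int],
                              perplexities: Sequence[float]) -> List[float]:
    """RPC_i = |(P_i - P_{i-1}) / (t_i - t_{i-1})| for consecutive topic counts t_i.

    topic_counts must be listed in increasing order, one perplexity per count.
    """
    if len(topic_counts) != len(perplexities) or len(topic_counts) < 2:
        raise ValueError("need at least two (topic count, perplexity) pairs")
    return [
        abs((perplexities[i] - perplexities[i - 1]) / (topic_counts[i] - topic_counts[i - 1]))
        for i in range(1, len(topic_counts))
    ]


if __name__ == "__main__":
    # Sanity check with made-up numbers: a model giving every token
    # probability 1/50 has perplexity 50 (up to floating-point rounding).
    ll = [10 * math.log(1 / 50), 5 * math.log(1 / 50)]
    print(perplexity(ll, [10, 5]))

    # Hypothetical perplexities at 10, 20, and 30 topics (illustrative only).
    print(rate_of_perplexity_change([10, 20, 30], [500.0, 400.0, 380.0]))
```

In practice each perplexity would be computed on held-out documents, and the RPC values would be inspected to see where further increases in the number of topics stop paying off.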
Refreshments will be served in the faculty lounge at 3:00 pm. Everyone is welcome.