Dr. Michael Grabchak, Professor of Statistics, Department of Mathematics and Statistics, UNC Charlotte
Statistics Seminar Series (Hybrid)
On Turing’s Formula and the Estimation of the Missing Mass
In this talk, we address the question: How do we estimate the probability of something that we’ve never seen before? This probability is often called the missing mass. In ecological applications, it corresponds to the probability of observing a new species, while in authorship attribution studies, it corresponds to the probability that an author will use a word that he or she has not used before. Perhaps the most famous estimator of the missing mass is Turing’s formula. We give necessary and sufficient conditions for the consistency and asymptotic normality of this formula. We then show that these conditions always hold when the distribution is regularly varying with index α ∈ (0,1]. This part of the talk is primarily based on [1, 3].
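For readers unfamiliar with Turing’s formula, the following is a minimal illustrative sketch, not code from the talk or the cited papers. It assumes the standard form of the estimator: the number of distinct values observed exactly once, divided by the sample size. The function name `turing_missing_mass` and the example sentence are hypothetical, chosen only for illustration.

```python
from collections import Counter


def turing_missing_mass(sample):
    """Estimate the missing mass with Turing's formula: N1 / n.

    N1 is the number of distinct values seen exactly once (singletons)
    and n is the total number of observations. The function name is a
    hypothetical choice for this sketch.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    # Count the distinct values that appear exactly once in the sample.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# Illustrative data: 10 words, 5 of which appear exactly once.
words = "the cat sat on the mat with the other cat".split()
estimate = turing_missing_mass(words)
print(estimate)  # 0.5
```

In the authorship setting described above, the value 0.5 would be read as the estimated probability that the next word is one not yet seen in the sample.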
At the end of the talk, we will discuss an unrelated but important question: How do we perform a paired t-test when we don’t know how to pair? This is especially relevant in social science applications, where the pairing may be censored due to privacy concerns. This part of the talk is primarily based on [2].
[1] J. Chang and M. Grabchak (2023). Necessary and sufficient conditions for the asymptotic normality of higher order Turing estimators. To appear in Bernoulli.
[2] M. Grabchak (2022). How do we perform a paired t-test when we don’t know how to pair? The American Statistician, DOI 10.1080/00031305.2022.2115552.
[3] M. Grabchak and Z. Zhang (2017). Asymptotic properties of Turing’s formula in relative error. Machine Learning, 106(11):1771-1785.
Dr. Michael Grabchak is a professor in the Department of Mathematics and Statistics at UNC Charlotte. He works in the areas of applied probability and statistics. His recent research can be found at https://webpages.charlotte.edu/~mgrabcha/research.html.