Mathematics Seminar - 03/06/26

March 6, 3:30 pm
Speaker

Dr. John Terilla, Executive Officer and Professor, CUNY Graduate Center

Title

Mathematics Seminar Series

Subtitle

Interpretability in the Data: Word embeddings and the Nucleus of a Matrix

Physical Location

Allen 14

Abstract:

Large language models are often discussed as if they were cognitive agents.  In this talk I’ll take a different stance: an LLM is a computable function trained on data, and much of what people call “interpretability” should begin not inside the model but in the structure of the data it is trained on.

I’ll start with a concrete example where this viewpoint is clarifying: vector embeddings of words.  The word2vec algorithm can be understood as an implicit matrix factorization of a transformed word–context co‑occurrence matrix (a shifted PMI‑type matrix).  This reframes a classic “neural” success story as the extraction of geometry from data.
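As a minimal sketch of the matrix-factorization view (the toy counts, the shift parameter k, and the rank are illustrative choices, not from the talk), one can build a shifted positive PMI matrix from word–context co-occurrence counts and factor it with a truncated SVD to obtain word vectors:

```python
import numpy as np

# Hypothetical 4x4 word-context co-occurrence counts (rows = words,
# columns = context words); real corpora yield large sparse matrices.
counts = np.array([
    [8.0, 2.0, 1.0, 0.0],
    [2.0, 6.0, 0.0, 1.0],
    [1.0, 0.0, 5.0, 3.0],
    [0.0, 1.0, 3.0, 7.0],
])

total = counts.sum()
p_wc = counts / total                              # joint P(w, c)
p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P(w)
p_c = counts.sum(axis=0, keepdims=True) / total    # marginal P(c)

# Pointwise mutual information: log P(w,c) / (P(w) P(c));
# unseen pairs (count 0) are mapped to 0 rather than -inf.
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
pmi[np.isneginf(pmi)] = 0.0

# Shifted positive PMI: subtract log k (k = number of negative samples
# in word2vec's objective; k = 2 here is an arbitrary choice) and clip at 0.
k = 2
sppmi = np.maximum(pmi - np.log(k), 0.0)

# Truncated SVD gives low-dimensional word vectors whose dot products
# approximate the shifted PMI values -- geometry extracted from the data.
u, s, vt = np.linalg.svd(sppmi)
d = 2
word_vectors = u[:, :d] * np.sqrt(s[:d])
print(word_vectors.shape)  # (4, 2)
```

The clipping at zero and the log k shift are what turn the raw PMI matrix into the (positive, shifted) matrix that word2vec's skip-gram-with-negative-sampling objective implicitly factorizes.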

From there I’ll broaden the question: given a real matrix M, what other structures can we extract besides Euclidean embeddings? By changing how we view the real numbers, we are led naturally to a beautiful duality revealing a tropical polyhedral cell structure and a canonical projective metric.
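To give a flavor of the tropical viewpoint (a sketch only; the talk’s actual construction may differ), "changing how we view the real numbers" can mean working over the min-plus semiring, where addition becomes min and multiplication becomes +. The tropical matrix product is then:

```python
import numpy as np

def tropical_matmul(a, b):
    """Min-plus (tropical) matrix product: (A @ B)_ij = min_k (A_ik + B_kj)."""
    # a: (m, n), b: (n, p) -> result: (m, p)
    return np.min(a[:, :, None] + b[None, :, :], axis=1)

# A small example matrix; entries play the role of "distances" or costs.
M = np.array([[0.0, 3.0],
              [2.0, 0.0]])

# For this M, the tropical square equals M itself (tropical idempotence),
# the min-plus analogue of a shortest-path closure.
print(tropical_matmul(M, M))  # [[0. 3.], [2. 0.]]
```

Replacing (+, x) by (min, +) is exactly the change of viewpoint under which polyhedral and metric structures, rather than Euclidean inner products, become the natural objects attached to a matrix.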

Note:

Contact Prof. Shantia Yarahmadian at syarahmadian@math.msstate.edu or (662) 325-7143 for additional information.