Dr. Keren Li, Assistant Professor, Department of Mathematics, University of Alabama at Birmingham
Statistics Seminar Series
Big Data, Distributed Learning, and Representative Learning
In modern data exploration, the quest for collective intelligence has propelled innovative paradigms that transcend traditional data analysis. Distributed Learning, born of the synergy with Big Data, orchestrates machine learning across nodes characterized by distinct traits: they are massively distributed, non-iid, unbalanced, and constrained by data privacy and limited bandwidth. These multifaceted challenges disrupt the conventional norms of machine learning and call for solutions that can harmonize disparate data perspectives while addressing critical concerns of privacy and resource constraints.
Representative Learning crafts pseudo data points, called representatives, that encapsulate the inherent characteristics of each local data node. The representative sets are channeled to the central unit, where regular models are trained on them, enabling a convergence of insights across nodes. With this architecture, Representative Learning serves as a conduit that transcends the barriers posed by data privacy and scarce communication resources, sidestepping concerns that often hinder distributed analysis.
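A minimal sketch of the idea described above, assuming (since the abstract does not specify the construction) that each node's representatives are k-means centroids of its local data, which are then pooled at a central unit to fit an ordinary summary model; the node count, cluster count, and Gaussian data are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_representatives(X, k, iters=20):
    """Lloyd's k-means; the k centroids serve as the node's representatives."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each local point to its nearest current centroid.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

# Three nodes with heterogeneous (non-iid) local data.
nodes = [rng.normal(loc=m, scale=1.0, size=(200, 2)) for m in (0.0, 3.0, -2.0)]

# Each node ships only k representatives; the raw data never leaves the node,
# and communication cost drops from 200 points per node to 5.
reps = np.vstack([kmeans_representatives(X, k=5) for X in nodes])

# The central unit trains a regular model on the pooled representatives --
# here simply a global mean/covariance summary of the combined distribution.
global_mean = reps.mean(axis=0)
global_cov = np.cov(reps.T)
```

The key design point this sketch illustrates is the communication pattern: only the small representative sets cross the network, which is what sidesteps both the privacy and the bandwidth constraints.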
Moreover, Distributed Learning thrives under heterogeneity between node distributions, particularly in settings with vast numbers of nodes. Smaller nodes inherently yield more faithful representatives, while the dissimilarity between nodes keeps the variance of estimates produced by the Representative Learning approach low.