##### Speaker

Dr. Xiaoming Huo, A. Russell Chandler III Professor, Stewart School of Industrial & Systems Engineering, Georgia Tech; Fellow of ASA

##### Title

Statistics Seminar Series (Hybrid)

##### Subtitle

Two Statistical Results in Deep Learning

##### Physical Location

Allen 14

##### Digital Location

https://msstate.webex.com/msstate/j.php?MTID=m8ce377b82dde20f05e835b18379eff44

**Abstract:**

This talk has two parts.

(1) Regularization Matters for Generalization of Overparametrized Deep Neural Network under Noisy Observations. In part one, we study the generalization properties of overparametrized deep neural networks (DNNs) with ReLU activations. Under the non-parametric regression framework, we assume that the ground-truth function lies in a reproducing kernel Hilbert space (RKHS) induced by the neural tangent kernel (NTK) of a ReLU DNN, and that the observations are noisy. We prove that, without a carefully chosen early-stopping rule, an overparametrized DNN trained by vanilla gradient descent does not recover the ground-truth function: the L2 prediction error of the estimated DNN is bounded away from 0. Complementing this result, we show that L2-regularized gradient descent enables the overparametrized DNN to achieve the minimax optimal convergence rate of the L2 prediction error without early stopping. Notably, the rate we obtain is faster than the rates previously known in the literature.
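As a rough illustration of the contrast described in part one, the sketch below uses an overparametrized linear model as a stand-in for the DNN/NTK setting; it is not the construction used in the talk. The data, the dimensions, the helper `gradient_descent`, and the penalty `lam` are all assumptions made for illustration. It shows that unregularized gradient descent run to convergence interpolates the noisy labels, while L2-regularized gradient descent converges to the ridge solution and does not fit the noise exactly.

```python
import numpy as np

# Hypothetical toy data: n samples, d >> n features (overparametrized linear model).
rng = np.random.default_rng(0)
n, d = 20, 200
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)  # noisy observations


def gradient_descent(X, y, lam, step, n_steps):
    """Minimize 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 by plain gradient descent from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) + lam * w
        w = w - step * grad
    return w


lam = 0.5  # illustrative penalty, not a tuned or theoretically optimal value
w_plain = gradient_descent(X, y, lam=0.0, step=0.5, n_steps=500)
w_reg = gradient_descent(X, y, lam=lam, step=0.5, n_steps=500)

# Training residuals: the unregularized run fits the noisy labels (almost) exactly.
resid_plain = np.linalg.norm(X @ w_plain - y)
resid_reg = np.linalg.norm(X @ w_reg - y)

# Closed-form ridge solution, used only to check where the regularized run ends up.
w_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

print("training residual, no regularization:", resid_plain)
print("training residual, L2 regularization:", resid_reg)
```

The linear model is chosen only because its limits are known in closed form; the minimax rates in the abstract concern the DNN setting and are not reproduced here.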

(2) Directional Bias Helps SGD to Generalize. We study the stochastic gradient descent (SGD) algorithm in kernel regression. Specifically, SGD with a moderate and annealing step size converges along the direction corresponding to the large eigenvalue of the kernel matrix, whereas gradient descent (GD) with a moderate or small step size converges along the direction corresponding to the small eigenvalue. For a general squared risk minimization problem, we show that directional bias towards a large eigenvalue of the Hessian (which is the kernel matrix in our case) results in an estimator that is closer to the ground truth. Applying this result to kernel regression, we find that the directional bias helps the SGD estimator generalize better. This result gives one way to explain how noise helps generalization when learning with a nontrivial step size, which may be useful for promoting further understanding of stochastic algorithms in deep learning.
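The GD half of the statement in part two can be sketched on a two-dimensional quadratic. This is a minimal illustration under stated assumptions, not the analysis from the talk: the matrix `H`, the starting point, the step size, and the helper `gd_direction` are all hypothetical. With a small step size, the component of the iterate along the large-eigenvalue direction shrinks fastest, so the iterate approaches the minimizer along the small-eigenvalue direction. The SGD behavior with an annealing step size is not reproduced here.

```python
import numpy as np


def gd_direction(H, w0, step, n_steps):
    """Run GD on f(w) = 0.5 * w^T H w and return the unit direction of the final iterate."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w = w - step * (H @ w)
    return w / np.linalg.norm(w)


# Hypothetical 2x2 symmetric "kernel matrix" with eigenvalues 10 and 1.
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
H = Q @ np.diag([10.0, 1.0]) @ Q.T

# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(H)
v_small, v_large = eigvecs[:, 0], eigvecs[:, 1]

# Small step size: both per-step factors (1 - step * eigenvalue) lie in (0, 1).
direction = gd_direction(H, w0=[1.0, 1.0], step=0.05, n_steps=50)

align_small = abs(direction @ v_small)  # close to 1
align_large = abs(direction @ v_large)  # close to 0

print("alignment with small-eigenvalue direction:", align_small)
print("alignment with large-eigenvalue direction:", align_large)
```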

**Bio:**

Xiaoming Huo is the A. Russell Chandler III Professor in the Stewart School of Industrial & Systems Engineering at Georgia Tech.

Dr. Huo’s research interests include statistics, machine learning, and the foundations of data science. He has made numerous contributions to sparse representation, compressive sensing, wavelets, the theory of deep learning, and fast algorithms. His papers have appeared in top journals, and some are highly cited.

Dr. Huo received the B.S. degree in mathematics from the University of Science and Technology of China in 1993, and the M.S. degree in electrical engineering and the Ph.D. degree in statistics from Stanford University, Stanford, CA, in 1997 and 1999, respectively. Since August 1999, he has been an Assistant, Associate, and then Full Professor with the School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta. He represented China in the 30th International Mathematical Olympiad (IMO), held in Braunschweig, Germany, in 1989, and received a gold medal. From August 2013 to August 2015, he served the U.S. National Science Foundation as a Program Director in the Division of Mathematical Sciences (DMS).

Dr. Huo has presented keynote talks at major conferences and has given numerous invited colloquia and seminar presentations in the US, Asia, and Europe. Since April 2021, he has been the Specialty Chief Editor of Frontiers in Applied Mathematics and Statistics – Statistics.

Dr. Huo is now the Executive Director of TRIAD (Transdisciplinary Research Institute for Advancing Data Science), an NSF-funded research center located at Georgia Tech. In addition, he is the Associate Director for Research of the Institute for Data Engineering and Science. He is also an Associate Director of the Master of Science in Analytics program, overseeing a new branch at the Shenzhen, China, campus of the Georgia Institute of Technology.