Speakers
Donald Goldfarb
Columbia University
Event Description
Because Deep Neural Networks (DNNs) have an enormous number of parameters, using the Hessian matrix, or even a full approximation to it, is prohibitive. Hence, we have proposed, and will describe in this talk, efficient and effective ways to use second-order information to train DNNs. These include diagonal, block-diagonal, and Kronecker-factored quasi-Newton and natural-gradient (Fisher information matrix) approximations, as well as the concepts of tensor normal covariance and self-concordance, which give rise to methods that often outperform first-order methods.
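To make the flavor of such structured curvature approximations concrete, below is a minimal Kronecker-factored preconditioning sketch for a single fully connected layer, in the general spirit of K-FAC-style natural-gradient methods. It is an illustration under assumed settings (the layer dimensions, damping value `lam`, and random data are all hypothetical), not the specific algorithms presented in the talk.

```python
# Minimal sketch: Kronecker-factored preconditioning for one fully
# connected layer (K-FAC-style illustration; all settings are assumed).
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 32, 8, 4

a = rng.standard_normal((batch, d_in))    # layer inputs (activations)
g = rng.standard_normal((batch, d_out))   # gradients w.r.t. layer outputs

# Gradient of the loss w.r.t. the weight matrix W (d_out x d_in).
G = g.T @ a / batch

# Kronecker factors: the curvature (Fisher) block for W is approximated
# by A (x) S, where A and S are small d_in x d_in and d_out x d_out matrices.
A = a.T @ a / batch
S = g.T @ g / batch

# Damping (assumed value) keeps the factor inverses well conditioned.
lam = 1e-3
A_inv = np.linalg.inv(A + lam * np.eye(d_in))
S_inv = np.linalg.inv(S + lam * np.eye(d_out))

# Preconditioned step: (A (x) S)^{-1} vec(G) equals vec(S^{-1} G A^{-1}),
# so two small inverses stand in for inverting the full
# (d_in * d_out) x (d_in * d_out) curvature block.
step = S_inv @ G @ A_inv
```

The point of the factorization is cost: inverting the two small factors A and S replaces inverting one (d_in · d_out) × (d_in · d_out) curvature block, which is what makes second-order information affordable at DNN scale.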