I have developed and taught a graduate course at Purdue called Optimization for Deep Learning. This course covers the optimization algorithms that have powered the recent rise of machine learning (ML) and deep learning (DL). The “learning” in modern ML and DL tasks typically boils down to non-convex optimization problems with high-dimensional parameter spaces and objective functions involving millions of terms. Moreover, the success of DL models relies on solving these problems efficiently, so the optimization needs of ML differ from those of other fields. This course introduces students to the theoretical principles behind stochastic, gradient-based algorithms for DL, as well as practical considerations such as adaptivity, generalization, distributed learning, and the non-convex loss surfaces typical of modern DL problems.
The concatenated slides are available here for download and public use under a CC BY-NC-SA license.