Non-Convex Optimization from both mathematical and practical perspectives: SGD, SGDMomentum, AdaGrad, RMSprop, and Adam in Python

This article gives short mathematical expressions for common optimizers used on non-convex problems, together with from-scratch Python implementations of each. Understanding the math behind these algorithms will sharpen your intuition when training complex machine learning models. The article is structured as follows. First, I will talk about the…
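The update rules behind these five optimizers can be sketched roughly as follows. This is a minimal illustration that assumes scalar parameters for readability, and it is not the author's code. The toy objective `f`, the helper `minimize`, the `step(x, g)` interface, and every hyperparameter value are hypothetical, untuned choices. The momentum update follows one common convention; other formulations scale the gradient by `1 - beta` or fold the learning rate into the velocity.

```python
import math

# Hypothetical non-convex toy objective: f(x) = x^4 - 3x^2 + x.
# It has a local minimum near x = 1.13, a global minimum near x = -1.30,
# and a local maximum near x = 0.17 separating the two basins.
def f(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1


class SGD:
    """x <- x - lr * g (here g is the exact gradient of the toy function)."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, x, g):
        return x - self.lr * g


class SGDMomentum:
    """Heavy-ball momentum (one common convention): v <- beta*v + g; x <- x - lr*v."""
    def __init__(self, lr=0.01, beta=0.9):
        self.lr, self.beta, self.v = lr, beta, 0.0

    def step(self, x, g):
        self.v = self.beta * self.v + g
        return x - self.lr * self.v


class AdaGrad:
    """Sum of squared gradients: s <- s + g^2; x <- x - lr*g / (sqrt(s) + eps)."""
    def __init__(self, lr=0.1, eps=1e-8):
        self.lr, self.eps, self.s = lr, eps, 0.0

    def step(self, x, g):
        self.s += g * g
        return x - self.lr * g / (math.sqrt(self.s) + self.eps)


class RMSprop:
    """Decaying average of squared gradients: s <- rho*s + (1 - rho)*g^2."""
    def __init__(self, lr=0.01, rho=0.9, eps=1e-8):
        self.lr, self.rho, self.eps, self.s = lr, rho, eps, 0.0

    def step(self, x, g):
        self.s = self.rho * self.s + (1 - self.rho) * g * g
        return x - self.lr * g / (math.sqrt(self.s) + self.eps)


class Adam:
    """First and second moment estimates, both bias-corrected."""
    def __init__(self, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = 0.0, 0.0, 0

    def step(self, x, g):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2 ** self.t)  # bias-corrected second moment
        return x - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def minimize(opt, x0, steps=1000):
    """Apply `steps` updates of `opt` to the toy objective, starting from x0."""
    x = x0
    for _ in range(steps):
        x = opt.step(x, grad(x))
    return x


# Illustrative, untuned hyperparameters; every run starts from x0 = 2.0.
optimizers = {
    "SGD": SGD(lr=0.01),
    "SGDMomentum": SGDMomentum(lr=0.01, beta=0.9),
    "AdaGrad": AdaGrad(lr=0.5),
    "RMSprop": RMSprop(lr=0.01),
    "Adam": Adam(lr=0.01),
}
results = {name: minimize(opt, x0=2.0) for name, opt in optimizers.items()}
for name, x in results.items():
    print(f"{name:12s} x = {x:+.4f}  f(x) = {f(x):+.4f}")
```

Each of these updates acts on every coordinate independently, so the same `step` logic should carry over to NumPy parameter arrays if `math.sqrt` is swapped for `np.sqrt`. On a non-convex function, the optimizers may settle in different minima. For example, accumulated momentum can carry an iterate across the local maximum, so comparing the printed endpoints is more informative than assuming a single answer. With a constant learning rate, RMSprop and Adam may also keep hovering in a small neighbourhood of a minimum instead of landing on it exactly.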