Adam: A Method for Stochastic Optimization

We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients; the name Adam is derived from adaptive moment estimation.…
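
To make the moment-estimation idea concrete, here is a minimal Python sketch of a single per-parameter update of the kind described above. The specific hyperparameter values (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) and the function name are illustrative assumptions, not taken from this excerpt; the sketch is meant only to show how running first- and second-moment estimates yield an individual adaptive step size for each parameter.

```python
import numpy as np

def adam_update(param, grad, m, v, t,
                lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative Adam-style step for a single parameter array.

    m, v : running estimates of the first and second moments of the gradient
    t    : 1-based timestep, used for bias correction of the estimates
    """
    m = beta1 * m + (1 - beta1) * grad           # update biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # update biased second raw-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return param, m, v
```

Note that only `m` and `v` (one array each per parameter tensor) are carried between steps, which is why the memory overhead stays small relative to the parameters themselves.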