Full-waveform inversion (FWI) is a nonlinear inverse problem, and a typical optimization algorithm, such as nonlinear conjugate gradient or L-BFGS, iteratively updates the model along a gradient-descent direction of the misfit function or a slight modification of it. Rather than using a hand-designed optimization algorithm, we train a machine to learn an optimization algorithm, which we refer to as "ML-descent," and apply it to FWI. Using a recurrent neural network (RNN), we take the gradient of the misfit function as the input for training, and the hidden state of the RNN carries the history of the gradients, similar to the BFGS algorithm. However, unlike the fixed BFGS algorithm, the learned update rule evolves as the gradients direct it to. The loss function for training is formulated by summing the FWI misfit function, given by the L2-norm of the data residual, over the iterations. Any well-defined nonlinear inverse problem can be locally approximated by a linear convex problem; thus, to accelerate training, we train the neural network on the solutions of randomly generated quadratic functions instead of the time-consuming FWI gradients. We use the Marmousi example to demonstrate that ML-descent outperforms the steepest-descent method, and that the energy in the deeper part of the model is compensated well by ML-descent even when the pseudo-inverse of the Hessian is not incorporated in the FWI gradient.
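To make the training setup concrete, the following is a minimal toy sketch (not the paper's implementation): an update rule with tunable parameters is meta-trained on a batch of randomly generated convex quadratics by minimizing the misfit summed over all iterations, as described above. As simplifying assumptions, a two-parameter momentum rule stands in for the RNN, and random search stands in for backpropagation through the unrolled iterations; all function and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_quadratic(n=10):
    """Random convex quadratic f(x) = 0.5 x^T A x - b^T x with SPD matrix A."""
    M = rng.standard_normal((n, n))
    A = M @ M.T / n + 0.1 * np.eye(n)
    b = rng.standard_normal(n)
    return A, b

def rollout(theta, A, b, steps=20):
    """Apply the parameterized update rule and sum the misfit over all
    iterations, mirroring the summed training loss described in the text."""
    lr, mom = theta
    x = np.zeros_like(b)
    v = np.zeros_like(b)          # hidden state carrying gradient history
    total = 0.0
    for _ in range(steps):
        g = A @ x - b             # gradient of the quadratic misfit
        v = mom * v + g           # accumulate history (a crude stand-in
        x = x - lr * v            #   for the RNN's hidden-state memory)
        total += 0.5 * x @ A @ x - b @ x
    return total

# Meta-train on a batch of random quadratics; random search over the rule's
# parameters replaces gradient-based training of the RNN.
problems = [random_quadratic() for _ in range(8)]
candidates = [(0.01, 0.0)]        # plain steepest descent as the baseline
candidates += [(10 ** rng.uniform(-3, 0), rng.uniform(0.0, 0.99))
               for _ in range(200)]
losses = [sum(rollout(th, A, b) for A, b in problems) for th in candidates]
best_theta = candidates[int(np.argmin(losses))]
best_loss = min(losses)
sd_loss = losses[0]               # steepest descent with a fixed small step
print("learned rule:", best_theta, "summed misfit:", best_loss)
print("steepest-descent summed misfit:", sd_loss)
```

Because the quadratics are convex, a rule that converges faster accumulates a lower (more negative) summed misfit, which is what makes the summed-over-iterations loss a sensible meta-training objective.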