The adjoint source is an integral component of the optimization problem in full-waveform inversion (FWI). It is usually derived from the objective function, so its form is fixed regardless of the data. To let the data inform the adjoint source, we propose to learn it directly within FWI. We formulate the new method, which we refer to as ML-adjoint, in the framework of a Markov decision process (MDP). In the MDP, a policy network takes the predicted and measured data as input and outputs the adjoint source for back-propagation in FWI. To achieve fast convergence in training, we design the neural network architecture to mimic the computation of the data residual and the Jacobian matrix used in constructing the adjoint source. An example on the Marmousi model demonstrates the robustness of ML-adjoint in converging to an accurate model.
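
To make concrete what it means for the adjoint source to be derived from the objective function, the sketch below uses the standard least-squares misfit as an illustrative case. For that objective, the adjoint source is simply the data residual, whatever the data look like. The function names and the random 16-sample trace are hypothetical choices made for this illustration and are not taken from the ML-adjoint implementation. A finite-difference check confirms that the residual is the derivative of the misfit with respect to the predicted data.

```python
import random


def l2_misfit(d_pred, d_obs):
    """Least-squares objective: 0.5 * sum of squared data residuals."""
    return 0.5 * sum((p - o) ** 2 for p, o in zip(d_pred, d_obs))


def l2_adjoint_source(d_pred, d_obs):
    """Derivative of l2_misfit w.r.t. d_pred, i.e. the data residual."""
    return [p - o for p, o in zip(d_pred, d_obs)]


def fd_gradient(misfit, d_pred, d_obs, eps=1e-6):
    """Central finite-difference gradient of misfit w.r.t. d_pred."""
    grad = []
    for i in range(len(d_pred)):
        plus = list(d_pred)
        minus = list(d_pred)
        plus[i] += eps
        minus[i] -= eps
        grad.append((misfit(plus, d_obs) - misfit(minus, d_obs)) / (2 * eps))
    return grad


# Hypothetical single trace with 16 time samples (random stand-in data).
random.seed(0)
d_obs = [random.gauss(0.0, 1.0) for _ in range(16)]
d_pred = [random.gauss(0.0, 1.0) for _ in range(16)]

adj = l2_adjoint_source(d_pred, d_obs)
fd = fd_gradient(l2_misfit, d_pred, d_obs)
max_err = max(abs(a - f) for a, f in zip(adj, fd))
print(f"max |analytic - finite difference| = {max_err:.2e}")
```

In this picture, a learned adjoint source would take the place of `l2_adjoint_source`: a trained network with the same inputs (predicted and measured data) and the same output shape, rather than a formula fixed by the choice of misfit.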