The least-squares (L2-norm) misfit is susceptible to local minima when the low-wavenumber components of the initial model are inaccurate, which often happens with data corresponding to salt bodies. Deconvolution of the predicted and observed data offers an extended, more global comparison. When the velocity model is accurate and the predicted data match the observed data, the matching filter computed from the deconvolution has its energy focused at zero lag, like a Dirac delta function. We use a framework for designing a misfit function that measures the Wasserstein distance W2 between the resulting matching filter and a representation of the Dirac delta function, based on optimal transport theory. Unlike the data, the matching filter can easily be transformed into a distribution that satisfies the requirements of optimal transport theory. This optimal transport between two distributions leads to minimizing the least-squares difference of the mean and variance of the distributions. If the target for the matching filter is a Dirac delta function, i.e., a distribution with zero mean and zero variance, the optimization reduces to the adaptive waveform inversion (AWI) misfit. Combined with a total variation constraint, this method can invert for the BP model with high accuracy.
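As a minimal single-trace sketch of the reduction described above, and not the implementation used in this work, the following NumPy code computes a matching filter by stabilized spectral division, turns it into a distribution, and evaluates its squared W2 distance to a Dirac delta at zero lag. That distance equals the second moment of the distribution about zero, i.e., mean squared plus variance, which has the form of a normalized, lag-weighted AWI-type misfit. The function names (`ricker`, `matching_filter`, `w2_squared_to_delta`), the stabilization parameter `eps`, and the choice of squaring and normalizing the filter to obtain a distribution are assumptions made for illustration only.

```python
import numpy as np


def ricker(n, dt, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at time t0 (s)."""
    t = np.arange(n) * dt - t0
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def matching_filter(predicted, observed, eps=1e-3):
    """Filter w with w * predicted ~ observed, via stabilized spectral division.

    eps is a hypothetical water-level parameter, not taken from the source.
    The result is ordered so that zero lag sits at index len(w) // 2.
    """
    n = len(predicted)
    P = np.fft.fft(predicted, 2 * n)  # zero-pad to limit wrap-around
    D = np.fft.fft(observed, 2 * n)
    power = np.abs(P) ** 2
    W = D * np.conj(P) / (power + eps * power.max())
    return np.fft.fftshift(np.real(np.fft.ifft(W)))


def w2_squared_to_delta(w, dt):
    """Squared W2 distance between a filter-derived distribution and delta(0).

    Assumption: the filter is made a distribution by squaring and normalizing.
    For a Dirac delta target, W2^2 is the second moment about zero lag.
    """
    lags = (np.arange(len(w)) - len(w) // 2) * dt
    p = w ** 2
    p = p / p.sum()
    mean = np.sum(lags * p)
    variance = np.sum((lags - mean) ** 2 * p)
    misfit = np.sum(lags ** 2 * p)  # equals mean**2 + variance
    return misfit, mean, variance


dt = 0.004
observed = ricker(256, dt, 10.0, 0.5)
predicted_good = observed.copy()       # accurate model: data match
predicted_bad = np.roll(observed, 20)  # inaccurate model: 0.08 s delay

misfit_good, mean_good, var_good = w2_squared_to_delta(
    matching_filter(predicted_good, observed), dt)
misfit_bad, mean_bad, var_bad = w2_squared_to_delta(
    matching_filter(predicted_bad, observed), dt)
```

In this toy setting the filter for matching data is symmetric about zero lag, so its mean vanishes and only a small variance from the band limitation remains, while the delayed prediction moves the filter energy away from zero lag and increases the misfit.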