Full Waveform Inversion (FWI) updates the subsurface model iteratively by minimizing a misfit function that measures the difference between observed and predicted data. The conventional ℓ2-norm misfit function is widely used because it provides a simple, sample-by-sample, high-resolution measure of the data difference. However, it is susceptible to local minima if the low-wavenumber components of the initial model are inaccurate. A deconvolution between the predicted and observed data offers an extended-space comparison, which is more global. When the predicted data match the observed data, the matching filter calculated from the deconvolution has its energy focussed at zero lag, like a Dirac delta function. We use the Wasserstein distance to measure the difference between the matching filter and a Dirac delta function. Unlike the data themselves, the matching filter can easily be transformed into a distribution that satisfies the requirements of optimal transport theory. Compared with the conventional normalized penalty applied to the non-zero-lag energy of the matching filter, the new misfit function is a metric and has a solid mathematical foundation in optimal transport theory. Both synthetic and real data examples verify the effectiveness of the proposed misfit function.
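As a rough illustration of the idea above, the following NumPy sketch computes a matching filter by stabilized (Wiener-style) frequency-domain deconvolution and then measures its 2-Wasserstein distance to a Dirac delta at zero lag. Several details are assumptions rather than part of the method described here: the water-level parameter `eps`, the use of the normalized filter energy `w**2` as the probability distribution, and the choice of the quadratic (p = 2) Wasserstein distance. Because the target is a point mass at zero lag, all of the filter's mass must be moved there. The optimal transport plan is therefore trivial, and the distance has the closed form W_2^2 = Σ_k p_k t_k^2.

```python
import numpy as np


def matching_filter(pred, obs, eps=1e-3):
    """Stabilized least-squares matching filter w such that w * pred ~ obs.

    Returns (lags, w) with lags in samples and zero lag in the middle.
    `eps` is an assumed water level, relative to the peak of |P|^2.
    """
    n = len(pred)
    nfft = 2 * n  # zero-pad so the circular deconvolution does not wrap around
    P = np.fft.fft(pred, nfft)
    D = np.fft.fft(obs, nfft)
    power = np.abs(P) ** 2
    W = D * np.conj(P) / (power + eps * power.max())
    w = np.real(np.fft.ifft(W))
    # fftshift moves zero lag to the centre; lags then run from -nfft/2 to nfft/2 - 1
    return np.arange(-nfft // 2, nfft // 2), np.fft.fftshift(w)


def wasserstein2_to_delta(lags, w, dt=1.0):
    """Squared 2-Wasserstein distance between the filter energy and a delta at lag 0.

    Assumption: the filter becomes a probability distribution by normalizing
    its energy w**2 (non-negative, unit total mass). Since every unit of mass
    must travel to lag 0, W_2^2 = sum_k p_k * t_k**2.
    """
    p = w ** 2
    p = p / p.sum()
    t = lags * dt
    return float(np.sum(p * t ** 2))


def misfit(pred, obs, eps=1e-3, dt=1.0):
    """Hypothetical helper: matching filter followed by its W_2^2 distance to a delta."""
    lags, w = matching_filter(pred, obs, eps)
    return wasserstein2_to_delta(lags, w, dt)


if __name__ == "__main__":
    t = np.arange(200)

    def pulse(center, width=3.0):
        """Toy Gaussian pulse standing in for a seismic trace."""
        return np.exp(-0.5 * ((t - center) / width) ** 2)

    pred = pulse(50)
    for shift in (0, 2, 5, 20):
        print(shift, misfit(pred, pulse(50 + shift)))
```

In this toy setting, a Gaussian pulse delayed by s samples yields a filter that is roughly a smoothed delta at lag s. The misfit therefore grows approximately as s² plus a constant set by the filter width. A sample-by-sample ℓ2 difference behaves differently: it saturates once the two pulses stop overlapping.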