Standard full-waveform inversion (FWI) attempts to minimize the difference between observed and modeled data. When the initial velocity is kinematically accurate, FWI often converges to the best velocity model, usually one of high resolution. However, when the data modeled with the initial velocity are far from the observed data, conventional local gradient-based methods converge to a solution near the initial velocity instead of the global minimum. This behavior is known as the cycle-skipping problem; in the extreme case, the zero-lag correlation between the observed and modeled data vanishes. To reduce cycle skipping, we can compare the envelopes of the modeled and observed data instead of the original data. However, when the initial velocity is not good enough, the correlation of the envelopes of the modeled and observed data does not contribute accurately to the gradient. To mitigate this issue, we propose maximizing not only the zero-lag correlation of the envelopes but also their non-zero-lag correlations. A weighting function, which has its maximum value at zero lag and decays away from zero lag, is proposed to balance the contributions of the lags. The resulting objective function is less sensitive to the choice of the maximum allowed lag and has a wider radius of convergence than standard FWI and envelope inversion. The implementation has the same computational complexity as conventional FWI because the only difference in the calculation lies in the modified adjoint source. We implement the algorithm on an AMD GPU using OpenCL and obtain a speedup of about 14-fold over a CPU implementation based on OpenMP. Finally, several numerical examples demonstrate the proper convergence of the proposed method. Application to the Marmousi model shows that the method converges when starting from a linearly increasing velocity model, even with data free of frequencies below 3 Hz.
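The lag-weighted envelope correlation described above can be illustrated for a single trace with the following minimal sketch. It is not the implementation used in this work: the Gaussian form of the weighting function, the normalization by the envelope norms, and the names `ricker`, `envelope`, `weighted_envelope_correlation`, `max_lag`, and `sigma` are all assumptions made for illustration. The envelope is taken as the magnitude of an FFT-based analytic signal.

```python
import numpy as np


def ricker(n, center, dt, freq):
    """Ricker wavelet with peak frequency `freq` (Hz), centered at sample `center`."""
    t = (np.arange(n) - center) * dt
    a = (np.pi * freq * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def envelope(trace):
    """Envelope of a real trace: magnitude of its FFT-based analytic signal."""
    n = trace.size
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(trace) * gain))


def weighted_envelope_correlation(observed, modeled, max_lag, sigma):
    """Weighted sum of normalized envelope cross-correlations over lags.

    Assumption: the weight is a Gaussian in the lag (in samples), equal to 1
    at zero lag and decaying with width `sigma`. Requires max_lag < trace length.
    """
    e_obs = envelope(observed)
    e_mod = envelope(modeled)
    norm = np.linalg.norm(e_obs) * np.linalg.norm(e_mod)
    full = np.correlate(e_obs, e_mod, mode="full")
    zero = e_mod.size - 1  # index of the zero-lag term in `full`
    lags = np.arange(-max_lag, max_lag + 1)
    corr = full[zero - max_lag:zero + max_lag + 1]
    weights = np.exp(-0.5 * (lags / sigma) ** 2)
    return float(np.sum(weights * corr) / norm)


# Usage: a modeled arrival delayed by 40 samples relative to the observed one.
observed = ricker(500, 200, 0.004, 10.0)
modeled = ricker(500, 240, 0.004, 10.0)
zero_lag_only = weighted_envelope_correlation(observed, modeled, max_lag=0, sigma=1.0)
multi_lag = weighted_envelope_correlation(observed, modeled, max_lag=60, sigma=30.0)
```

In this sketch the quantity is maximized rather than minimized, and because it sums over many lags it is not bounded by 1. With `max_lag=0` it reduces to the normalized zero-lag envelope correlation; with a larger `max_lag`, a time-shifted arrival still contributes through the non-zero lags, which is the behavior the weighting function is meant to balance.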