Standard full waveform inversion (FWI) minimizes the misfit between observed and modeled data. When the initial velocity model is kinematically accurate, FWI usually converges to an accurate, high-resolution velocity model. However, when the data modeled with the initial velocity are far from the observed data, conventional local gradient-based methods converge to a local minimum near the initial model rather than to the global minimum. This is known as the cycle-skipping problem. In a correlation-based objective, it appears as a near-zero correlation when the modeled events are too far in time from the observed ones to overlap. To reduce cycle skipping, we compare the envelopes of the modeled and observed data instead of the data themselves. However, if the initial velocity model is too inaccurate, the correlation between the envelopes may still be zero. To mitigate this issue, we propose to maximize both the zero-lag and the non-zero-lag cross-correlations of the envelopes. A weighting function that peaks at zero lag and decays away from it balances the contributions of the different lags. The resulting objective function is less sensitive to the choice of the maximum allowed lag and has a wider region of convergence than standard FWI and envelope inversion. Its computational cost is the same as that of conventional FWI, because the only difference in the calculation is a modified adjoint source. We implemented the algorithm with OpenCL on an AMD GPU platform, obtaining a 14-fold speedup over an OpenMP-based CPU implementation. Several numerical examples demonstrate the convergence of the proposed method. An application to the Marmousi model shows that the method converges from a linearly increasing initial velocity model, even when the data contain no frequencies below 3 Hz.
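The weighted multi-lag envelope correlation can be sketched for a single trace as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The envelope is taken as the magnitude of the analytic signal, computed with an FFT-based Hilbert transform. The envelopes are normalized to unit norm (an assumption). The lag weight is a Gaussian of width `sigma` samples (a hypothetical choice; the abstract only requires a weight that peaks at zero lag and decays away from it). The function names `envelope`, `weighted_envelope_correlation`, and `ricker` are illustrative only. The sketch evaluates the objective to be maximized and does not include the adjoint source or the gradient.

```python
import numpy as np


def envelope(trace):
    """Envelope |trace + i*H[trace]| via an FFT-based Hilbert transform."""
    n = trace.size
    spectrum = np.fft.fft(trace)
    h = np.zeros(n)            # keep DC, double positive frequencies, zero negative ones
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0        # Nyquist bin kept once for even lengths
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))


def weighted_envelope_correlation(d_obs, d_mod, max_lag, sigma):
    """Objective to maximize: sum over |lag| <= max_lag of w(lag) * c(lag).

    c(lag) = sum_t e_obs(t + lag) * e_mod(t) uses unit-norm envelopes (assumed),
    and w(lag) = exp(-lag^2 / (2 sigma^2)) is a hypothetical Gaussian weight
    equal to 1 at zero lag and decaying with |lag|.
    """
    n = d_obs.size
    if d_mod.size != n or not 0 <= max_lag < n:
        raise ValueError("traces must match in length and max_lag must be < n")
    e_obs = envelope(d_obs)
    e_mod = envelope(d_mod)
    e_obs /= np.linalg.norm(e_obs)
    e_mod /= np.linalg.norm(e_mod)
    total = 0.0
    for lag in range(-max_lag, max_lag + 1):
        weight = np.exp(-0.5 * (lag / sigma) ** 2)
        if lag >= 0:
            c = np.dot(e_obs[lag:], e_mod[:n - lag])
        else:
            c = np.dot(e_obs[:n + lag], e_mod[-lag:])
        total += weight * c
    return total


def ricker(n, dt, freq, t0):
    """Ricker wavelet with peak frequency freq (Hz) centred at time t0 (s)."""
    arg = (np.pi * freq * (np.arange(n) * dt - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)


if __name__ == "__main__":
    n, dt = 1000, 0.002
    d_obs = ricker(n, dt, 10.0, 0.6)
    for shift in (0.04, 0.3):  # modeled arrival late by 20 and 150 samples
        d_mod = ricker(n, dt, 10.0, 0.6 + shift)
        zero_lag = weighted_envelope_correlation(d_obs, d_mod, max_lag=0, sigma=1.0)
        multi_lag = weighted_envelope_correlation(d_obs, d_mod, max_lag=200, sigma=60.0)
        print(f"shift={shift:.2f} s  zero-lag={zero_lag:.4f}  weighted={multi_lag:.4f}")
```

With a 150-sample shift, the zero-lag envelope correlation is nearly zero and gives no useful signal. The weighted sum over lags remains positive and is larger for the smaller shift, which is the behavior the abstract relies on to widen the region of convergence.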