We have presented a new method that can be dispersion free and unconditionally stable. Thus the computational cost and memory requirement will be reduced a lot. Based on this feature, we have implemented this algorithm on GPU based CUDA for the anisotropic Reverse time migration. There is almost no communication between CPU and GPU. For the prestack wavefield extrapolation, it can combine all the shots together to migration. However, it requires to solve a bigger dimensional problem and more meory which can't fit into one GPU cards. In this situation, we implement it based on domain decomposition method and MPI for distributed memory system.