In complex acoustic or elastic media, finite element meshes often require local refinement to honour external or internal topography or small-scale features. These small elements create a bottleneck for explicit time-stepping schemes, because the Courant-Friedrichs-Lewy (CFL) stability condition forces the whole mesh to advance at the time step allowed by its smallest elements. Recently developed local time-stepping (LTS) algorithms reduce the impact of these small elements by adapting the time-step size locally to the element size. The recursive, multi-level nature of our LTS scheme introduces an additional challenge, as standard partitioning schemes create a strong load imbalance across processors. We examine the use of multi-constraint graph and hypergraph partitioning tools to achieve effective, load-balanced parallelization. We implement LTS-Newmark in the seismology code SPECFEM3D and compare the performance and scalability of different partitioning tools on CPU and GPU clusters using examples from computational seismology.
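As a rough illustration of why multi-level time stepping affects load balance, the sketch below groups elements into time-step levels and builds one partitioning weight per level. It is not the SPECFEM3D implementation. The power-of-two spacing of levels, the Courant number of 0.5, and the function names `lts_levels`, `theoretical_speedup` and `constraint_weights` are assumptions made for this example, and the speedup estimate ignores any overhead at the boundaries between levels.

```python
def lts_levels(h, c, courant=0.5):
    """Assign each element a step multiplier p, so that it advances with p * dt_min.

    h: element sizes, c: wave speeds (same length). The Courant number is a
    placeholder, and the power-of-two levels are an assumption of this sketch.
    """
    # Stable time step of each element taken on its own: dt_e = C * h_e / c_e
    dt = [courant * he / ce for he, ce in zip(h, c)]
    dt_min = min(dt)
    p = []
    for dt_e in dt:
        mult = 1
        # Largest power of two such that mult * dt_min does not exceed dt_e
        while 2 * mult * dt_min <= dt_e:
            mult *= 2
        p.append(mult)
    return dt_min, p


def theoretical_speedup(p):
    """Element updates with a global dt_min divided by element updates with LTS."""
    # Global scheme: every element is updated once per dt_min.
    # LTS: an element on level p is updated once per p * dt_min.
    return len(p) / sum(1.0 / mult for mult in p)


def constraint_weights(p):
    """One 0/1 weight per level and element, for a multi-constraint partitioner."""
    levels = sorted(set(p))
    index = {mult: i for i, mult in enumerate(levels)}
    weights = [[1 if index[mult] == i else 0 for i in range(len(levels))]
               for mult in p]
    return levels, weights


if __name__ == "__main__":
    # Hypothetical five-element mesh with uniform wave speed
    dt_min, p = lts_levels([1.0, 2.0, 4.0, 4.0, 0.5], [1.0] * 5)
    levels, weights = constraint_weights(p)
    print(dt_min, p, theoretical_speedup(p))
    print(levels, weights)
```

Giving each level its own constraint asks the partitioner to balance the element count of every level separately, rather than only the total count. This is one common way to set up a multi-constraint partitioning problem and is shown here only as an assumption about how the weights might be built.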