Earthquake rupture models inferred from inversions of geophysical and/or geodetic data exhibit remarkable variability due to uncertainties in modelling assumptions, the use of different inversion algorithms, or variations in data selection and data processing. A robust statistical comparison of different rupture models obtained for a single earthquake is needed to quantify the intra-event variability, both for benchmark exercises and for real earthquakes. The same approach may be useful to characterize (dis-)similarities in events that are typically grouped into a common class of events (e.g. moderate-size crustal strike-slip earthquakes or tsunamigenic large subduction earthquakes). For this purpose, we examine the performance of the spatial prediction comparison test (SPCT), a statistical test developed to compare spatial (random) fields by means of a chosen loss function that describes an error relation between a 2-D field (‘model’) and a reference model. We implement and calibrate the SPCT approach for a suite of synthetic 2-D slip distributions, generated as spatial random fields with various characteristics, and then apply the method to results of a benchmark inversion exercise with known solution. We find the SPCT to be sensitive to different spatial correlations lengths, and different heterogeneity levels of the slip distributions. The SPCT approach proves to be a simple and effective tool for ranking the slip models with respect to a reference model.