A general stability-based algorithmic scheme for assessing the reliability of clustering solutions may be summarized in the following way:

1. Randomly perturb the original data.
2. Apply the clustering algorithm to the perturbed data.
3. Estimate the similarity between the clusterings obtained from the perturbed data.
4. Compute stability indices from the similarity measures and select the most stable/reliable clustering.
Several approaches have been proposed to implement the first step: a random "perturbation" of the data may be obtained through bootstrap samples drawn from the available data [15,18], through random noise injection into the data [17], or through random projections onto lower-dimensional subspaces [20,7].
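As a concrete illustration, the three perturbation strategies might be sketched as follows in Python/NumPy. The function names, the noise scale, and the plain Gaussian projection matrix are illustrative assumptions, not the exact constructions used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X, rng):
    """Step 1, bootstrap flavor: resample the rows with replacement [15,18]."""
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return X[idx]

def noise_injection(X, rng, sigma=0.1):
    """Step 1, noise flavor: add Gaussian noise to every entry [17].
    The noise scale sigma is an illustrative choice."""
    return X + rng.normal(scale=sigma, size=X.shape)

def random_projection(X, rng, d=2):
    """Step 1, projection flavor: project onto a random d-dimensional
    subspace [20,7]; a plain Gaussian matrix, scaled to roughly preserve
    distances, stands in for the specific projections used there."""
    P = rng.normal(size=(X.shape[1], d)) / np.sqrt(d)
    return X @ P
```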
The application of a given algorithm (step 2) is a choice based on a priori knowledge or assumptions about the characteristics of the data. To estimate the similarity between clusterings (step 3), classical measures such as the Rand index [19], the Jaccard coefficient, and the Fowlkes-Mallows coefficient [14], or their equivalent dot-product representations [4], may be applied.
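The pair-counting measures mentioned above can all be computed from four counts over unordered sample pairs. A minimal sketch, using the standard pair-counting definitions; the co-membership indicator vectors also make explicit how these measures reduce to dot products, in the spirit of the representations of [4]:

```python
import numpy as np

def comembership_pairs(labels):
    """Vector over all unordered sample pairs: 1 iff the pair is co-clustered."""
    labels = np.asarray(labels)
    M = labels[:, None] == labels[None, :]
    return M[np.triu_indices(len(labels), k=1)].astype(float)

def pair_counts(a, b):
    """n11/n10/n01/n00: pairs co-clustered in both / only a / only b / neither.
    Note that n11 is literally the dot product of the two pair-indicator
    vectors."""
    pa, pb = comembership_pairs(a), comembership_pairs(b)
    n11 = pa @ pb
    n10 = pa @ (1 - pb)
    n01 = (1 - pa) @ pb
    n00 = (1 - pa) @ (1 - pb)
    return n11, n10, n01, n00

def rand_index(a, b):
    n11, n10, n01, n00 = pair_counts(a, b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)

def jaccard(a, b):
    n11, n10, n01, _ = pair_counts(a, b)
    return n11 / (n11 + n10 + n01)

def fowlkes_mallows(a, b):
    n11, n10, n01, _ = pair_counts(a, b)
    return n11 / np.sqrt((n11 + n10) * (n11 + n01))
```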
Several stability indices for model order selection have been proposed in the literature (see, e.g., [17,20,18,16]). Very schematically, they can be divided into indices that use summary statistics of the similarity measures [20,7] and indices that use their overall empirical distribution [4].
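A minimal sketch of a statistic-based index, assuming k-means as the base algorithm, bootstrap perturbation, and the mean pairwise Fowlkes-Mallows similarity as the summary statistic; the specific indices in [20,7,4] differ in their choice of perturbation, similarity measure, and statistic:

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score

def stability(X, k, n_perturb=20, seed=0):
    """Mean pairwise similarity between clusterings of perturbed data,
    each extended to the full data set so the labelings are comparable.
    Higher values indicate more stable solutions."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_perturb):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap rows
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[idx])
        labelings.append(km.predict(X))                       # labels for all points
    sims = [fowlkes_mallows_score(a, b) for a, b in combinations(labelings, 2)]
    return float(np.mean(sims))
```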
The last step, the selection of the most stable/reliable clustering given a set of similarity measures and the related stability indices, has usually been approached by choosing the best-scoring clustering according to the chosen stability index. A major open problem in this last step is estimating the statistical significance of the discovered solutions.
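Continuing the sketch above, selection then reduces to scoring each candidate model order and keeping the best-scoring one; the data set and candidate range here are illustrative. Note that, as just discussed, comparing scores in this way says nothing by itself about whether the winning solution is statistically significant.

```python
from sklearn.datasets import make_blobs

# Illustrative data with three well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Score every candidate model order with the stability index sketched
# above and select the best-scoring clustering.
scores = {k: stability(X, k) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(scores, "-> selected k =", best_k)
```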