Indeed, cluster analysis has been used to investigate structure in microarray data, for example in the search for new tumor taxonomies [2],[9],[16]. It provides a way to validate groups of patients against prior biological knowledge or to discover new "natural groups" in the data. However, clustering algorithms always find some structure in the data, even when none is present. Hence we need methods for assessing the validity of the discovered clusters, in order to test whether biologically meaningful clusters actually exist.

To assess the reliability of the discovered classes, *clusterv* provides a set of measures that estimate the
stability of the clusters obtained by perturbing the original data set.
This perturbation is achieved through random projections of the original high-dimensional data to lower-dimensional
subspaces that approximately preserve the distances between examples, so that the data are not distorted too much.
These random projections are repeated many times, and a new clustering is performed each time.
The resulting clusterings are then compared with the clustering whose reliability we want to evaluate.
Intuitively, a cluster is reliable if it is maintained across the multiple clusterings performed in the lower-dimensional
subspaces. The measures provided by *clusterv* are based on the evaluation of the stability of the
clusters across multiple random projections. With these measures we can assess:

- the reliability of individual clusters within a clustering
- the reliability of the overall clustering (that is, an estimate of the "optimal" number of clusters)
- the confidence with which each example may be assigned to a cluster
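
The procedure described above (project, re-cluster, compare) can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, it does not use the *clusterv* API, and it does not reproduce the package's actual measures. The function names (`random_projection`, `kmeans`, `cluster_stability`), the Gaussian projection matrix, the small k-means routine, and the score itself (the average frequency with which pairs of examples from the same original cluster end up together after projection) are all assumptions chosen for illustration.

```python
import numpy as np

def random_projection(X, d, rng):
    """Project the rows of X (n x D) onto d dimensions with a Gaussian
    random matrix scaled by 1/sqrt(d), so that pairwise distances are
    approximately preserved (illustrative choice of projection)."""
    D = X.shape[1]
    R = rng.normal(size=(D, d)) / np.sqrt(d)
    return X @ R

def kmeans(X, k, rng, n_iter=50):
    """Small k-means with farthest-point initialisation (a stand-in for
    whatever clustering algorithm is being validated)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_stability(X, labels, k, d, n_proj, seed=0):
    """For each cluster in `labels`, return the mean fraction of projections
    in which two distinct members are again assigned to the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))  # co-clustering frequency across projections
    for _ in range(n_proj):
        proj_labels = kmeans(random_projection(X, d, rng), k, rng)
        co += proj_labels[:, None] == proj_labels[None, :]
    co /= n_proj
    scores = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        m = len(idx)
        if m < 2:
            scores[int(c)] = 1.0  # a singleton is trivially stable
            continue
        sub = co[np.ix_(idx, idx)]
        # average over ordered pairs of distinct members (diagonal removed)
        scores[int(c)] = (sub.sum() - m) / (m * (m - 1))
    return scores

# Synthetic data: two well-separated groups of 20 examples in 1000 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 1000)),
               rng.normal(3.0, 1.0, size=(20, 1000))])
labels = np.repeat([0, 1], 20)  # the clustering whose reliability is assessed
scores = cluster_stability(X, labels, k=2, d=20, n_proj=30)
print(scores)
```

In this sketch a score near 1 means that the members of a cluster stay together in almost every projected clustering, while a lower score suggests that the cluster is not maintained under perturbation.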

Our approach is based on random projections in Euclidean spaces,
and in the next section we provide a brief overview of this topic.
To learn more about our approach, please see [4].
A *clusterv* tutorial
introduces the use of the package and provides some examples
of applications of the stability measures to synthetic and real DNA microarray data.
To download the R software and documentation (comprising the tutorial and the reference manual in PDF format),
go to the section Download software and documentation.

The stability measures based on random projections implemented in the *clusterv* package have been jointly designed
by *Alberto Bertoni* (DSI, Università degli Studi di Milano) and *Giorgio Valentini*.
The author of the *clusterv* package thanks *Alberto Bertoni* for his fundamental theoretical and methodological contributions.