It is worth noting that, even though the proposed stability indices may be applied to any clustering algorithm, their computation is based on clusterings of subspaces of Euclidean spaces, where the low-distortion condition is guaranteed only if Euclidean distances are used.
With this limitation in mind, the logical steps needed to compute the stability indices for a generic clustering algorithm Clust applied to a d-dimensional data set D are:
For each logical step, the clusterv functions to be used are listed below:
From an implementation point of view, Clust must return the clustering represented either as a vector v or as a list l. More precisely, the vector v should be an integer vector whose indices represent the labels of the examples and whose entries are the (integer) labels of the clusters to which the examples belong (such as the clustering component of the partition.object list of the cluster package).
The elements of the list l must be vectors. Each vector represents a different cluster of the clustering; the elements of the vector are the (integer) labels of the examples belonging to that cluster (such as the orig.cluster component of the list returned by, e.g., the functions Random.hclustering.validity or Random.PAM.validity of the clusterv package).
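To make the two representations concrete, here is a small hand-built example in plain R (not clusterv code) showing the same clustering of six examples in both forms:

```r
# The same clustering of 6 examples into 2 clusters, in both representations
v <- c(1, 1, 2, 2, 1, 2)          # vector form: v[i] is the cluster label of example i
l <- list(c(1, 2, 5), c(3, 4, 6)) # list form: l[[k]] holds the labels of the examples in cluster k
```

Here examples 1, 2 and 5 belong to cluster 1 and examples 3, 4 and 6 to cluster 2, in both representations.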
Note that the functions implemented in clusterv require clusterings represented as lists. However, if your clustering algorithm outputs a vector v, you can use the function Transform.vector.to.list to convert it. Consider, for instance, that the kmeans function of the stats package returns a vector for the computed k-means clustering. To convert the vector to a list:
> M <- generate.sample0(n=10, m=2, sigma=1, dim=500)
> # transforming a clustering vector obtained with kmeans to a list
> r <- kmeans(t(M), 3, 100);
> clustering.list.kmeans <- Transform.vector.to.list(r$cluster);
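For reference, the conversion performed by Transform.vector.to.list can be sketched in a few lines of plain R (an illustrative re-implementation, not the clusterv source):

```r
# Illustrative sketch (not the clusterv implementation): convert a clustering
# vector into the list representation required by the clusterv functions
vector.to.list.sketch <- function(v) {
  # one vector per cluster label, holding the indices of the examples in that cluster
  lapply(sort(unique(v)), function(k) which(v == k))
}
```

For example, vector.to.list.sketch(c(1, 2, 1, 3)) returns a list of three vectors: c(1, 3), 2 and 4.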
As an example of how to write a function that computes the stability indices for a generic clustering algorithm, consider the code of the function Random.kmeans.validity, which computes the stability indices for the clusterings obtained with the k-means algorithm (note that Step 1, Step 2, etc. refer to the logical steps listed above):
Random.kmeans.validity <- function(M, dim, pmethod="PMO", c=3, it.max=1000,
                                   n=50, scale=TRUE, seed=-1, AC=TRUE) {
  dim.Sim.M <- ncol(M);
  if (seed == -1)
    seed <- round(runif(1, 1, 10000));
  # Step 1. Computing the clusters in the original space
  r <- kmeans(t(M), c, iter.max = it.max);
  cl.orig <- Transform.vector.to.list(r$cluster);
  # Steps 2 and 3. Perform multiple random projections and multiple
  # clusterings on the resulting projected data
  cl <- Multiple.Random.kmeans(M=M, dim=dim, pmethod=pmethod, c=c, n=n,
                               it.max=it.max, scale=scale, seed=seed);
  # Step 4. Compute the similarity matrix
  Sim.M <- Do.similarity.matrix(cl, dim.Sim.M);
  # Step 5. Computing the validity indices vi
  c <- length(cl.orig);  # it corrects for empty clusters
  vi <- Validity.indices(cl.orig, c, Sim.M);
  # Step 6. Computing the overall validity of the clustering
  ov.vi <- sum(vi)/c;
  # Steps 7 and 8. Computing the AC indices and storing the results in a list
  if (AC == TRUE) {
    ac <- AC.index(cl.orig, c, Sim.M);
    res <- list(validity=vi, overall.validity=ov.vi, similarity.matrix=Sim.M,
                dimension=dim, cluster=cl, orig.cluster=cl.orig, AC=ac);
  } else
    res <- list(validity=vi, overall.validity=ov.vi, similarity.matrix=Sim.M,
                dimension=dim, cluster=cl, orig.cluster=cl.orig);
  return(res);
}

For the meaning of the input parameters of the above function, please see the Reference manual. Finally, the code of the function Multiple.Random.kmeans, used for Steps 2 and 3 inside Random.kmeans.validity, is the following (see the Reference manual for the meaning of the input parameters):
Multiple.Random.kmeans <- function(M, dim, pmethod="PMO", c=3, n=50,
                                   it.max=1000, scale=TRUE, seed=100) {
  set.seed(seed);
  cl <- list();
  for (i in 1:n) {
    # A. Selection of the randomized map
    P.M <- switch(pmethod,
      RS = random.subspace(d=dim, M, scaling=scale),
      PMO = Plus.Minus.One.random.projection(d=dim, M, scaling=scale),
      Norm = norm.random.projection(d=dim, M, scaling=scale),
      Achlioptas = Achlioptas.random.projection(d=dim, M, scaling=scale),
      stop("Multiple.Random.kmeans: not supported random projection.", call.=FALSE));
    # B. Clustering on the projected data
    r <- kmeans(t(P.M), c, iter.max = it.max);
    cl[[i]] <- Transform.vector.to.list(r$cluster);
  }
  return(cl);
}
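The loop above depends on the clusterv projection functions. The following self-contained sketch reproduces its structure for the RS (random subspace) case only, replacing random.subspace with a naive feature-sampling step (an assumption for illustration; it omits the scaling that the clusterv projections can apply):

```r
set.seed(100)
M <- matrix(rnorm(500 * 30), nrow = 500)  # 500 features (rows) x 30 examples (columns)
n <- 5; dim <- 50; c <- 3                 # n projections, projected dimension, number of clusters

cl <- vector("list", n)
for (i in 1:n) {
  # Naive random subspace: keep dim randomly chosen features (rows of M);
  # clusterv's random.subspace can additionally rescale the projected data
  P.M <- M[sample(nrow(M), dim), , drop = FALSE]
  r <- kmeans(t(P.M), c, iter.max = 100)
  # vector -> list representation, as Transform.vector.to.list would do
  cl[[i]] <- split(seq_along(r$cluster), r$cluster)
}
```

The resulting cl, a list of n clusterings in the list representation, is exactly the shape expected by Do.similarity.matrix in Step 4.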