Overview of Clusterv functionalities

The clusterv R package provides functions to assess the reliability of clusters discovered in high-dimensional data.

Most of the functions are independent of the specific clustering algorithm: they can be used with different distance-based clustering algorithms (e.g. k-means, hierarchical clustering, Self-Organizing Maps, PAM) to compute the stability indices that assess the reliability of the clusters.

Other functions are high-level and tied to a specific clustering algorithm: they cluster the data directly and return the stability measures used to evaluate the reliability of the clusters that algorithm produces.
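
The distinction between the two groups can be illustrated with a small base-R sketch. It does not use the clusterv API: the names kmeans_labels, hclust_labels and cluster_sizes are hypothetical, and the data layout (examples in rows) is an assumption made only for this example. The point is that a routine which only needs cluster labels can accept the clustering algorithm as an argument.

```r
# Hypothetical wrappers (not clusterv functions): each takes a data matrix
# with examples in rows and a number of clusters k, and returns integer labels.
kmeans_labels <- function(X, k) kmeans(X, centers = k, nstart = 5)$cluster
hclust_labels <- function(X, k) cutree(hclust(dist(X), method = "average"), k)

# An algorithm-independent routine only needs the labels, so the clustering
# algorithm is passed in as a function argument.
cluster_sizes <- function(X, k, cluster_fun) {
  labels <- cluster_fun(X, k)
  as.vector(table(labels))
}

# Toy data: two Gaussian groups of 30 examples in 10 dimensions.
set.seed(42)
X <- rbind(matrix(rnorm(30 * 10), 30, 10),
           matrix(rnorm(30 * 10, mean = 5), 30, 10))

sizes_km <- cluster_sizes(X, 2, kmeans_labels)
sizes_hc <- cluster_sizes(X, 2, hclust_labels)
```

A high-level function, by contrast, would fix cluster_fun internally and expose only the data and the number of clusters.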

The functionalities provided by the clusterv package can be summarized in the following list:

Functions independent of the clustering algorithm
- Functions for high-dimensional synthetic data generation
- Functions to implement different types of random projections from high-dimensional to lower-dimensional subspaces
- Functions to evaluate the distortion induced by random projections
- Functions to compute the similarity matrix
- Functions to compute the stability indices:
  - Individual cluster stability index, to estimate the reliability of individual clusters within a clustering.
  - Overall cluster stability index, to estimate the "optimal" number of clusters.
  - Assignment-Confidence index, to estimate the confidence with which an example may be assigned to a specific cluster.
Functions dependent on the clustering algorithm
- Functions to perform multiple clusterings on multiple instances of projected data
- Functions to compute the stability indices for a specific clustering algorithm
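
The items in the list above can be tied together in a minimal end-to-end sketch. It is written in base R only and is not the clusterv implementation: every function name below is hypothetical, the Gaussian projection is just one possible type of random projection, examples are assumed to be stored in rows, and the index definitions (mean pairwise similarity within a cluster, mean similarity of one example to the other members) are one plausible reading of the descriptions above rather than the package's exact formulas.

```r
# Illustrative base-R sketch; none of these function names come from clusterv.
set.seed(1)

# Synthetic high-dimensional data: two Gaussian clusters, 20 examples each.
n_per <- 20; d <- 500
X <- rbind(matrix(rnorm(n_per * d, mean = 0), n_per, d),
           matrix(rnorm(n_per * d, mean = 1), n_per, d))
n <- nrow(X)

# Gaussian random projection from ncol(X) to d_proj dimensions.
# Scaling by 1/sqrt(d_proj) preserves squared distances in expectation.
random_projection <- function(X, d_proj) {
  R <- matrix(rnorm(ncol(X) * d_proj), ncol(X), d_proj) / sqrt(d_proj)
  X %*% R
}

# Distortion: ratio of projected to original squared pairwise distances.
distortion <- function(X, Xp) as.vector(dist(Xp))^2 / as.vector(dist(X))^2

# Hierarchical clustering (average linkage) cut at k clusters.
hclust_labels <- function(X, k) cutree(hclust(dist(X), method = "average"), k)

# Similarity matrix: fraction of runs in which two examples share a cluster,
# where each run clusters a fresh random projection of the data.
similarity_matrix <- function(X, k, d_proj, n_runs, cluster_fun) {
  M <- matrix(0, nrow(X), nrow(X))
  for (t in seq_len(n_runs)) {
    labels <- cluster_fun(random_projection(X, d_proj), k)
    M <- M + outer(labels, labels, "==")
  }
  M / n_runs
}

# Stability of one cluster: mean similarity over distinct pairs of members.
cluster_stability <- function(M, members) {
  m <- length(members)
  if (m < 2) return(NA_real_)
  S <- M[members, members]
  (sum(S) - sum(diag(S))) / (m * (m - 1))
}

# Assignment confidence of example i: mean similarity to the other members.
assignment_confidence <- function(M, i, members) {
  mean(M[i, setdiff(members, i)])
}

# Distortion induced by a single projection to 100 dimensions.
Xp <- random_projection(X, 100)
dist_range <- range(distortion(X, Xp))

# Similarity matrix over 20 projected clusterings, then the indices for the
# clusters found in the original space.
M <- similarity_matrix(X, k = 2, d_proj = 100, n_runs = 20,
                       cluster_fun = hclust_labels)
ref <- hclust_labels(X, 2)
s <- sapply(split(seq_len(n), ref), function(idx) cluster_stability(M, idx))
overall <- mean(s)
ac_first <- assignment_confidence(M, 1, which(ref == ref[1]))
```

Repeating the last step for several values of k and comparing the overall index is one way such a measure could be used to choose the number of clusters.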

Giorgio 2006-08-16