
Analysis of cluster reliability in melanoma patients.

In this subsection we study the reliability of the clusters obtained in melanoma patients, using a cDNA microarray data set of 38 examples, comprising 31 melanomas and 7 controls [7]. The melanoma array used in the experiments contains 8150 cDNAs representing 6971 unique genes. For this data set we directly downloaded the ratio expression levels of the 3613 already filtered genes from the web site associated with the Bittner et al. paper. Following [7], to avoid distortions of the data resulting from ratios where the signal in one channel (Cy5 or Cy3) is large and the signal in the other channel is undetectable, we truncated ratios higher than 50 or lower than 0.02. We restricted our experiments to the 31 melanoma examples only, to better verify the reliability of the "tightly clustered" set of 19 specimens found in [7]. We present here the stability indices computed using PMO and RS projections.
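The truncation step described above can be sketched as follows. This is a minimal illustration, assuming that "truncated" means clamping each ratio to the nearest threshold; the function name `truncate_ratios` and the example values are hypothetical and are not taken from the data set.

```python
def truncate_ratios(ratios, low=0.02, high=50.0):
    """Clamp each expression ratio into the closed interval [low, high].

    Assumption: ratios outside the interval are replaced by the nearest
    threshold rather than discarded.
    """
    return [min(max(r, low), high) for r in ratios]


# Made-up ratios for illustration only (not values from the melanoma data).
example = [0.001, 0.5, 1.0, 75.0]
print(truncate_ratios(example))  # [0.02, 0.5, 1.0, 50.0]
```

Ratios already inside the interval are left unchanged, so only the extreme values produced by an undetectable channel are affected.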

The overall stability index identifies $N=4$ as the optimal number of clusters, if we disregard the cases $N=2$ and $N=3$, which are characterized by the presence of singleton clusters (Tab. 2, 3 and Fig. 5). The stability indices of the individual clusters also strongly support their reliability. With $N=4$ clusters the first two clusters are singletons, the third is a large cluster of 23 examples that includes the 19-member melanoma subclass found in [7], and the fourth, very stable, cluster groups together the remaining 6 examples. To recover exactly the 19-member cluster of Bittner et al. we need to choose $N=9$ clusters: the fifth cluster corresponds exactly to it. However, the stability index of this cluster is quite low ($s \simeq 0.4$ with PMO, Achlioptas and Normal projections, but significantly larger with RS projections), as is the overall stability index for $N=9$.

Bittner et al. provided an overall stability measure for the clustering ($WADP_k$) based on perturbing the original data with random noise: their results support clusterings with $N \leq 9$ clusters, but they do not provide a stability measure for individual clusters. Note that the application of the Ben-Hur et al. method [3], which uses bootstrapping techniques to estimate the "natural" number of clusters, found $N=4$ to be the most reliable number of clusters in the data. Moreover, Smolkin and Ghosh, using the stability scores based on random subspace projections proposed in [18], found that the original 19-member cluster must be expanded to obtain a reasonably stable cluster. Dudoit and Fridlyand, using PAM clustering with the $100$ most variable genes, found that four additional observations join the 19-observation cluster [8]. These results support our findings: a more reliable melanoma cluster is composed of the 19 examples found by Bittner et al. plus 3-4 additional examples (Tab. 2, 3 and Fig. 5).


Table 2: Melanoma: Estimate of cluster stability with PMO projections

Overall stability index S, by number of clusters N:

N    $\epsilon=0.5$  $\epsilon=0.4$  $\epsilon=0.3$  $\epsilon=0.2$  $\epsilon=0.1$
2    0.8186          0.8613          0.9040          0.9253          1.0000
3    0.8946          0.8752          0.9129          0.9786          1.0000
4    0.8907          0.9266          0.9618          0.9728          0.9782
5    0.7010          0.7306          0.7384          0.7430          0.7316
6    0.5800          0.5950          0.5942          0.5929          0.5930
7    0.4789          0.4947          0.4950          0.4930          0.4969
8    0.4998          0.5272          0.5357          0.5407          0.5481
9    0.4977          0.5098          0.5149          0.5138          0.5187
10   0.5049          0.5245          0.5370          0.5389          0.5378
12   0.3993          0.3795          0.3998          0.3812          0.3910

Stability index s of each individual cluster (Cl.), by number of clusters N:

N    Cl.  $\epsilon=0.5$  $\epsilon=0.4$  $\epsilon=0.3$  $\epsilon=0.2$  $\epsilon=0.1$
2    1    0.6600          0.7400          0.8200          0.8600          1.0000
2    2    0.9773          0.9826          0.9880          0.9906          1.0000
3    1    0.9600          0.9600          0.9200          1.0000          1.0000
3    2    0.7600          0.7000          0.8400          0.9400          1.0000
3    3    0.9639          0.9658          0.9789          0.9958          1.0000
4    1    0.9600          1.0000          0.9800          1.0000          1.0000
4    2    0.8000          0.8800          0.9800          0.9800          1.0000
4    3    0.8098          0.8265          0.8875          0.9113          0.9130
4    4    0.9933          1.0000          1.0000          1.0000          1.0000
5    1    0.9600          1.0000          1.0000          1.0000          1.0000
5    2    0.9200          0.9800          0.9800          1.0000          1.0000
5    3    0.6534          0.6733          0.7124          0.7152          0.6580
5    4    0.0000          0.0000          0.0000          0.0000          0.0000
5    5    0.9720          1.0000          1.0000          1.0000          1.0000
6    1    1.0000          1.0000          1.0000          1.0000          1.0000
6    2    0.9800          1.0000          1.0000          1.0000          1.0000
6    3    0.5635          0.5802          0.5657          0.5577          0.5584
6    4    0.0000          0.0000          0.0000          0.0000          0.0000
6    5    0.0000          0.0000          0.0000          0.0000          0.0000
6    6    0.9366          0.9900          1.0000          1.0000          1.0000
9    1    1.0000          1.0000          1.0000          1.0000          1.0000
9    2    1.0000          1.0000          1.0000          1.0000          1.0000
9    3    0.0000          0.0000          0.0000          0.0000          0.0000
9    4    0.6066          0.5200          0.4933          0.4733          0.3466
9    5    0.3732          0.3888          0.3810          0.3914          0.4023
9    6    0.0000          0.0000          0.0000          0.0000          0.0000
9    7    0.0000          0.0000          0.0000          0.0000          0.0000
9    8    0.6600          0.7400          0.8000          0.7800          0.9400
9    9    0.8400          0.9400          0.9600          0.9800          0.9800
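The choice of $N$ discussed in the text can be checked directly against the overall stability indices of Table 2. The sketch below uses the $\epsilon=0.5$ column only; the helper `best_n` and its `excluded` argument are hypothetical names introduced for illustration and are not part of clusterv.

```python
# Overall stability index S from Table 2 (PMO projections), epsilon = 0.5.
overall_s = {2: 0.8186, 3: 0.8946, 4: 0.8907, 5: 0.7010, 6: 0.5800,
             7: 0.4789, 8: 0.4998, 9: 0.4977, 10: 0.5049, 12: 0.3993}


def best_n(scores, excluded=()):
    """Return the number of clusters with the highest overall stability
    index, skipping every value of N listed in `excluded`."""
    candidates = {n: s for n, s in scores.items() if n not in excluded}
    return max(candidates, key=candidates.get)


print(best_n(overall_s))                   # 3
print(best_n(overall_s, excluded=(2, 3)))  # 4
```

With this column, $N=3$ scores marginally higher than $N=4$, so $N=4$ emerges as the optimum only once the cases $N=2$ and $N=3$ are set aside, as stated in the text.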


Table 3: Melanoma: Estimate of cluster stability with RS projections

Overall stability index S, by number of clusters N:

N    $\epsilon=0.5$  $\epsilon=0.4$  $\epsilon=0.3$  $\epsilon=0.2$  $\epsilon=0.1$
2    0.4717          0.4437          0.4911          0.6213          0.8334
3    0.3083          0.3333          0.3860          0.5047          0.8382
4    0.3560          0.3792          0.4538          0.5404          0.8253
5    0.3455          0.3432          0.4154          0.4994          0.8016
6    0.2998          0.3291          0.3642          0.4666          0.8059
7    0.3167          0.3179          0.3615          0.4643          0.7657
8    0.3454          0.3825          0.4003          0.5383          0.8246
9    0.3941          0.4298          0.4601          0.5398          0.8297
10   0.3671          0.4066          0.4495          0.5341          0.7741
12   0.4275          0.4691          0.5269          0.6102          0.7941

Stability index s of each individual cluster (Cl.), by number of clusters N:

N    Cl.  $\epsilon=0.5$  $\epsilon=0.4$  $\epsilon=0.3$  $\epsilon=0.2$  $\epsilon=0.1$
2    1    0.1000          0.0600          0.1400          0.3600          0.7000
2    2    0.8434          0.8274          0.8422          0.8826          0.9669
3    1    0.1400          0.1400          0.2000          0.4400          0.9200
3    2    0.0800          0.1400          0.2400          0.3000          0.6600
3    3    0.7051          0.7199          0.7181          0.7742          0.9346
4    1    0.1800          0.1800          0.2400          0.4800          0.9600
4    2    0.1400          0.1600          0.3000          0.3000          0.7600
4    3    0.6800          0.7491          0.8035          0.9084          0.9895
4    4    0.4240          0.4280          0.4720          0.4733          0.5920
5    1    0.2400          0.2400          0.3000          0.5000          0.9800
5    2    0.2400          0.1600          0.3000          0.3000          0.7600
5    3    0.6036          0.6463          0.7294          0.8313          0.9661
5    4    0.2600          0.3200          0.3800          0.4200          0.7000
5    5    0.3840          0.3500          0.3680          0.4460          0.6020
6    1    0.2400          0.3000          0.3400          0.5000          0.9800
6    2    0.2400          0.2200          0.3000          0.3400          0.7600
6    3    0.4993          0.5750          0.5986          0.7498          0.9192
6    4    0.3000          0.3800          0.4200          0.5000          0.8600
6    5    0.1200          0.1400          0.1400          0.2400          0.6000
6    6    0.4000          0.3600          0.3866          0.4700          0.7166
9    1    0.4200          0.4000          0.5000          0.6600          0.9800
9    2    0.4600          0.4600          0.5200          0.5200          0.9000
9    3    0.2600          0.2600          0.3600          0.3800          0.6400
9    4    0.4266          0.4733          0.4933          0.4466          0.7466
9    5    0.4007          0.4356          0.4878          0.6516          0.8409
9    6    0.5200          0.6000          0.5600          0.6800          0.9800
9    7    0.4000          0.4200          0.4600          0.5600          0.9400
9    8    0.3800          0.3800          0.3600          0.4800          0.6400
9    9    0.2800          0.4400          0.4000          0.4800          0.8000
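As a worked check of the comparison between PMO and RS projections for the 19-member cluster, the following sketch computes the difference between the two stability indices of cluster 5 at $N=9$ for each value of $\epsilon$. It assumes that cluster 5 denotes the same cluster in Tables 2 and 3; the helper `gaps` is a hypothetical name used only for illustration.

```python
epsilons = [0.5, 0.4, 0.3, 0.2, 0.1]
pmo = [0.3732, 0.3888, 0.3810, 0.3914, 0.4023]  # Table 2, N = 9, cluster 5
rs = [0.4007, 0.4356, 0.4878, 0.6516, 0.8409]   # Table 3, N = 9, cluster 5


def gaps(a, b):
    """Element-wise difference b - a, rounded to four decimals."""
    return [round(y - x, 4) for x, y in zip(a, b)]


for eps, gap in zip(epsilons, gaps(pmo, rs)):
    print(f"epsilon={eps}: RS - PMO = {gap:+.4f}")
```

The difference is small at $\epsilon=0.5$ and grows as $\epsilon$ decreases, so the larger stability observed with RS projections comes mainly from the smaller values of $\epsilon$.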

Figure 5: Hierarchical clustering of melanoma samples (average linkage method with 1 - Pearson dissimilarity measure). Gray dotted lines cut the dendrogram so that exactly $k$ clusters are produced, for $k=2,4,6,9$. The large "stable" cluster discovered by Bittner et al. is marked. See Tables 2 and 3 for the corresponding stability indices.
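The clustering procedure named in the caption, average linkage with a 1 - Pearson dissimilarity cut at $k$ clusters, can be sketched as below. This is a naive standard-library illustration under stated assumptions, not the code used to produce the figure; the function names and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson_dissimilarity(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(samples, k):
    """Agglomerate samples until k clusters remain; return index lists."""
    n = len(samples)
    # Pairwise dissimilarities between samples (diagonal fixed at 0).
    d = [[0.0 if i == j else pearson_dissimilarity(samples[i], samples[j])
          for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def dist(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        p, q = min(pairs, key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy profiles (made up): samples 0-1 rise together, samples 2-3 fall together.
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 6.0, 8.2],
       [4.0, 3.0, 2.0, 1.0], [8.0, 6.1, 3.9, 2.0]]
print(average_linkage(toy, 2))  # [[0, 1], [2, 3]]
```

Stopping the merging at $k$ clusters is equivalent to cutting the dendrogram at the height where exactly $k$ branches remain, which is what the dotted lines in the figure indicate.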


Giorgio 2006-08-16