An example of the usage of HCGene for the analysis of the FunCat taxonomy for the yeast

This section shows how HCGene can be used to analyze the characteristics and the structure of the Functional Catalogue (FunCat), a hierarchically structured, organism-independent classification system for the functional description of proteins from any organism. In particular, the structure of the FunCat taxonomy is analyzed with respect to the unicellular eukaryote Saccharomyces cerevisiae (budding yeast).

We start with an analysis of the "universal" FunCat tree (that is, the tree that collects all the available FunCat classes); then we proceed with the analysis of the FunCat trees restricted to the nodes related to the yeast. The characteristics of the FunCat trees are analyzed in terms of the distribution of node depths, node outdegrees, FunCat class cardinalities, and gene label cardinalities.

We also provide a graphical representation of these trees, showing subtrees extracted from the "universal" FunCat tree according to some relevant properties of their nodes.

Here we present the results. The R code used to perform the analysis is downloadable from: http://homes.dsi.unimi.it/valenti/SW/hcgene/examples/AnalysisYeastFunCat.R.

Note that this kind of analysis can be easily applied to other organisms, like mouse or cress. The code is essentially the same as the previously introduced example file AnalysisYeastFunCat.R for the yeast (see above); we only need to change the code implementing the associations of genes to FunCat classes. To this end the HCGene function Get.Funcat.specific.classes can be used to obtain the genes annotated with the most specific classes, and then Get.Funcat.all.classes can be used to obtain all the annotations associated with each gene of the organism under investigation.

For instance, to obtain all the most specific FunCat annotations for the mouse, one only needs two lines of code:

data(MOUSEFUNCAT);
mouse.genes.to.Funcat <- Get.Funcat.specific.classes(MOUSEFUNCAT);

The first line loads the environment MOUSEFUNCAT, which maps Mus musculus genes to FunCat classes. Then Get.Funcat.specific.classes returns a list with all the genes of the mouse mapped to their most specific FunCat classes. The following code shows that the mouse protein Endonuclease VIII-like 3, which corresponds to the gene Neil3 (Mgi identifier mc8000734), belongs to the FunCat classes 10.01.05.01, 16.03.01, 16.17.09, 32.01.09, and 70.10:

> get("mc8000734",MOUSEFUNCAT)[[1]]$Prot.Descr;
[1] "Endonuclease VIII-like 3"
> get("mc8000734",MOUSEFUNCAT)[[1]]$Gene.Name;
[1] "Neil3"
> mouse.genes.to.Funcat$mc8000734;
[1] "10.01.05.01" "16.03.01"    "16.17.09"    "32.01.09"    "70.10"

To obtain all the available FunCat annotations for the genes of the mouse, we can use this code:

mouse.genes.to.all.Funcat <- Get.Funcat.all.classes(mouse.genes.to.Funcat);

For instance, all the classes associated with the gene mc8000734 are:

> mouse.genes.to.all.Funcat$mc8000734;
 [1] "00"          "10"          "10.01"       "10.01.05"    "10.01.05.01"
 [6] "16"          "16.03"       "16.03.01"    "16.17"       "16.17.09"   
[11] "32"          "32.01"       "32.01.09"    "70"          "70.10"
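
The expansion from the most specific classes to all the classes can be read directly off the FunCat ID codes, since the ancestors of a class are the prefixes of its code. The following is a minimal base R sketch of this idea, not the HCGene implementation of Get.Funcat.all.classes: the helper names funcat.ancestors and funcat.closure are hypothetical, and the dummy root "00" is assumed to be an ancestor of every class.

```r
# Hypothetical helper (not part of HCGene): proper ancestors of a FunCat ID,
# obtained by taking every proper prefix of its dot-separated fields.
funcat.ancestors <- function(id, root = "00") {
  fields <- strsplit(id, ".", fixed = TRUE)[[1]]
  anc <- vapply(seq_len(length(fields) - 1),
                function(k) paste(fields[1:k], collapse = "."),
                character(1))
  c(root, anc)
}

# Hypothetical helper: close a set of most specific classes under the
# ancestor relation (the classes themselves plus all their ancestors).
funcat.closure <- function(specific) {
  closed <- unlist(lapply(specific, function(id) c(funcat.ancestors(id), id)))
  # method = "radix" sorts in the C locale, independently of the session locale
  sort(unique(closed), method = "radix")
}

# Most specific classes of the mouse gene mc8000734, as listed above.
specific <- c("10.01.05.01", "16.03.01", "16.17.09", "32.01.09", "70.10")
closure <- funcat.closure(specific)
print(closure)
```

Applied to the five most specific classes of mc8000734, this sketch yields the same 15 classes listed above.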

In the rest of this section we only present the results of the analysis. Please see AnalysisYeastFunCat.R for the relevant R code.

FunCat hierarchical trees

In this section we analyze the characteristics of the FunCat tree for the hierarchical classification of genes. We consider the "universal" tree, available for all species from yeast to human. We provide some basic statistics about the distribution of node depths and outdegrees (i.e., number of children) for the universal FunCat tree.

The "universal" FunCat tree

We start by considering the general structure of the FunCat taxonomy. Then, we build the associated tree.

The "universal" FunCat tree is represented in Fig. 4. We added a "dummy" root node (FunCat ID code 00) to which all the 27 "first level" nodes are linked.

A detail of the "first level" nodes of the tree is available in Fig. 5. Note that the FunCat ID codes of the first level nodes are represented by 2 digits. The FunCat IDs of the second level nodes are represented by a pair of digits separated by a dot; e.g., the children of node "01" are: "01.01", "01.02", "01.03", "01.04", "01.05", "01.06", "01.07", "01.20", "01.25". Third level nodes, e.g., the children of "01.01", are represented as "01.01.03", "01.01.05", "01.01.06", "01.01.09", "01.01.11", "01.01.13". The same criterion is applied to all levels.
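
Because the ID codes encode the position of each class in the hierarchy, the depth and the parent of a node can be derived from the code alone. The following base R sketch illustrates this under the conventions described above; funcat.depth and funcat.parent are hypothetical names introduced here for illustration and are not HCGene functions.

```r
# Hypothetical helper: depth of each FunCat ID, i.e. the number of
# dot-separated fields; the dummy root "00" has depth 0.
funcat.depth <- function(ids, root = "00") {
  d <- lengths(strsplit(ids, ".", fixed = TRUE))
  d[ids == root] <- 0L
  d
}

# Hypothetical helper: parent of each FunCat ID, obtained by dropping the
# last field; first level nodes are attached to the dummy root.
funcat.parent <- function(ids, root = "00") {
  p <- sub("\\.[^.]+$", "", ids)
  p[!grepl(".", ids, fixed = TRUE)] <- root
  p[ids == root] <- NA_character_
  p
}

ids <- c("00", "01", "01.01", "01.01.03", "01.20")
depths <- funcat.depth(ids)
parents <- funcat.parent(ids)
print(data.frame(id = ids, depth = depths, parent = parents))
```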

The entire tree, excluding the dummy root node, has 1362 functional classes.

Figure 4: Tree of "universal" FunCat ontology with 1363 functional classes (including the "dummy" root node).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.up.graph.ps}\end{figure}

Figure 5: Tree of the "first level" classes of the FunCat ontology for the eukaryote S. cerevisiae
\begin{figure}\centering
\includegraphics [width = 14.0cm] {ps/Funcat.Universal.graph.level1.ps}\end{figure}

Distribution of the "depths" in the universal FunCat tree

Tab. 1 summarizes some basic statistics about the depth of the nodes of the universal FunCat tree: the mean depth is about 3.5 and the median depth is 4. The maximum depth is 6. The distribution of the depths is approximately normal, centered around depths 3-4 (Fig. 6).

Table 1: Statistics of the depth of the FunCat classes (all evidence codes)
Number of classes: 1362
mean depth : 3.54
median     : 4
st. dev.   : 1.01
quantiles  :   0%  25%  50%  75% 100%
                0    3    4    4    6
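
Statistics of this kind can be computed with base R once the depth of each node is known. The sketch below uses a small made-up set of ID codes (not the real FunCat tree) and is only meant to show which summaries correspond to the rows of the table; it is not the code of AnalysisYeastFunCat.R.

```r
# Toy set of FunCat-like ID codes, including the dummy root "00".
ids <- c("00", "01", "02", "01.01", "01.02", "02.01", "01.01.03")

# Depth = number of dot-separated fields; the dummy root has depth 0.
depth <- lengths(strsplit(ids, ".", fixed = TRUE))
depth[ids == "00"] <- 0L

# The same summaries reported in the table: mean, median, standard
# deviation and the 0%, 25%, 50%, 75%, 100% quantiles.
depth.stats <- list(mean      = mean(depth),
                    median    = median(depth),
                    sd        = sd(depth),
                    quantiles = quantile(depth))
print(depth.stats)
```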

Figure 6: Histogram of the depth of FunCat classes.
\begin{figure}\centering
\includegraphics [width = 12.0cm] {ps/Funcat.Universal.hist.depth.ps} \\\end{figure}

Distribution of the number of children in the FunCat universal tree

In the universal FunCat tree, the average outdegree of the nodes is 1 and the median is 0 (Tab. 2). The largest number of children is 27 (the first level of the tree, Fig. 5).

The histogram of Fig. 7 shows that more than 1000 nodes (about 70%) are leaves. Most of the remaining nodes have 2, 3 or 4 children (in this order).

The empirical cumulative distribution of outdegrees (Fig. 8) shows that more than 90% of the nodes have outdegree equal to or lower than 4. Finally, 19 nodes have more than 9 children (about 1%).
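
The outdegree of a node is simply the number of nodes that name it as their parent. Note also that in any rooted tree with n nodes there are n - 1 edges, so the mean outdegree is (n - 1)/n, which is why it is so close to 1 here. The following base R sketch illustrates both points on a small made-up set of ID codes; it is an illustration under these assumptions, not the code used for the analysis.

```r
# Toy set of FunCat-like ID codes, including the dummy root "00".
ids <- c("00", "01", "02", "01.01", "01.02", "02.01", "01.01.03")

# Parent of each node: drop the last field; first level nodes hang from "00".
parent <- sub("\\.[^.]+$", "", ids)
parent[!grepl(".", ids, fixed = TRUE)] <- "00"
parent[ids == "00"] <- NA_character_

# Outdegree = how many times each node occurs as a parent (0 for leaves).
tab <- table(factor(parent, levels = ids))
outdeg <- setNames(as.integer(tab), names(tab))

mean.outdeg <- mean(outdeg)      # equals (n - 1) / n for a tree
n.leaves <- sum(outdeg == 0)     # nodes without children
Fn <- ecdf(outdeg)               # empirical cumulative distribution
print(outdeg)
print(c(mean = mean.outdeg, leaves = n.leaves, frac.at.most.1 = Fn(1)))
```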


Table 2: Statistics of the node degrees in the FunCat universal tree
Number of classes: 1362
Statistics of the degree of the nodes
mean      : 1.00
median    : 0
st. dev.  : 2.46
quantiles :   0%  25%  50%  75% 100%
               0    0    0    1   27

Figure 7: Histogram of the degrees in the FunCat universal tree.
\begin{figure}\centering
\includegraphics [width = 12cm]{ps/Funcat.universal.tree.hist.outdegree.ps}\end{figure}
Figure 8: Empirical cumulative distribution of the degrees in the FunCat universal tree.
\begin{figure}\centering
\includegraphics [width = 12cm]{ps/Funcat.universal.tree.ecdf.outdegree.ps}\end{figure}

Subgraphs of the universal FunCat tree

From the "universal" FunCat tree we may extract subgraphs of interest for a specific investigation. For instance, we could extract a subgraph of the FunCat classes involved in a specific pathway, or, more generally, we could extract FunCat classes involved in the specific biological problem under investigation.

In this section we present some subgraphs extracted from the universal FunCat tree on the basis of the node depth, or starting from the subtree rooted at a specific node.

Subgraphs extracted according to the depth of the nodes

We may be interested in classifying functional classes at different levels of detail. This can be achieved by using the depth of the nodes in the graph: shallow nodes represent more general functional classes, while deeper nodes represent more specific classes within the ontology. For example, Fig. 5 represents only the nodes of depth 1 in the universal FunCat tree, while Fig. 9 and Fig. 10 represent the nodes with depth lower than or equal to 2 and 3, respectively.
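
Restricting the tree to the nodes up to a given depth amounts to a simple filter on the ID codes. The sketch below is a base R illustration on made-up ID codes; subtree.up.to.depth is a hypothetical helper name, not an HCGene function.

```r
# Hypothetical helper: keep only the nodes whose depth does not exceed
# max.depth (depth = number of dot-separated fields, root "00" = depth 0).
subtree.up.to.depth <- function(ids, max.depth, root = "00") {
  d <- lengths(strsplit(ids, ".", fixed = TRUE))
  d[ids == root] <- 0L
  ids[d <= max.depth]
}

ids <- c("00", "01", "02", "01.01", "01.02", "02.01", "01.01.03")
level1 <- subtree.up.to.depth(ids, 1)
level2 <- subtree.up.to.depth(ids, 2)
print(level1)
print(level2)
```

Since every ancestor of a node is shallower than the node itself, the selected nodes always form a tree rooted at the dummy node.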

Figure 9: Subtree of the FunCat taxonomy with nodes up to depth level 2.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.graph.level2.ps}\end{figure}

Figure 10: Subtree of the FunCat taxonomy with nodes up to depth level 3.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.graph.level3.ps}\end{figure}


Subtrees rooted at a specific node

Here we present some subgraphs obtained by choosing a specific node and the associated subtree rooted at that node. This may be useful if we would like to investigate a specific functional class together with its subclasses. For instance, consider the subtree rooted at FunCat ID node "01". This corresponds to the general functional class Metabolism; the corresponding subtree rooted at the Metabolism node is shown in Fig. 11.

Fig. 12 shows a subtree (FunCat class "01.01" - amino acid metabolism) of the previous example (FunCat node "01.01" is a child of the Metabolism node "01").

Fig. 13 shows a "second level" branch (FunCat class "01.02.02" - nitrogen metabolism) of the subtree rooted at "01" (Fig. 11).

Finally, a completely different subtree rooted at node "30" (Cellular communication/signal transduction mechanism) is shown in Fig. 14.
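
Given the structure of the ID codes, the subtree rooted at a node consists of the node itself plus all the classes whose code starts with the code of that node followed by a dot. The following base R sketch shows this selection on a few ID codes; subtree.rooted.at is a hypothetical helper name, not an HCGene function.

```r
# Hypothetical helper: the node itself plus all its descendants, i.e. all
# the IDs having "<node>." as a prefix.
subtree.rooted.at <- function(ids, node) {
  ids[ids == node | startsWith(ids, paste0(node, "."))]
}

ids <- c("01", "01.01", "01.01.03", "01.02", "01.02.02", "30", "30.01")
metabolism <- subtree.rooted.at(ids, "01")
amino.acid <- subtree.rooted.at(ids, "01.01")
print(metabolism)
print(amino.acid)
```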

Figure 11: Subtree of the FunCat taxonomy rooted at node "01" (Metabolism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.graph.01.ps}\end{figure}

Figure 12: Subtree of the FunCat taxonomy rooted at node "01.01" (amino acid metabolism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.graph.01.01.ps}\end{figure}

Figure 13: Subtree of the FunCat taxonomy rooted at node "01.02.02" (nitrogen metabolism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.graph.01.02.02.ps}\end{figure}

Figure 14: Subtree of the FunCat taxonomy rooted at node "30" (Cellular communication/signal transduction mechanism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Funcat.Universal.graph.30.ps}\end{figure}

Hierarchical trees of the FunCat classes involved in the biological processes of the yeast

In this section we analyze the characteristics of the FunCat trees restricted to the FunCat classes of interest for the yeast. We also provide some basic statistics about the distribution of the node depths and outdegrees for the FunCat ontology of the yeast, the distribution of the cardinalities of FunCat yeast classes, and the distribution of yeast labels relative to FunCat classes.

The "universal" FunCat tree for the yeast

We start by considering the FunCat annotated genes (according to any evidence) of the yeast. We build the tree including all FunCat classes with at least one associated yeast gene.

The "universal" FunCat tree for the yeast, considering all the evidence codes (universal yeast tree), is represented in Fig. 15. We added a "dummy" root node (FunCat ID code 00) to which all the 18 first level nodes are linked.

A detail of the first level nodes of the tree is available in Fig. 16. Note that the FunCat ID codes of the first level nodes are represented by 2 digits. The FunCat IDs of the second level nodes are represented by a pair of digits separated by a dot; e.g., the children of node "01" are: "01.01", "01.02", "01.03", "01.04", "01.05", "01.06", "01.07", "01.20", "01.25". Third level nodes, e.g., the children of "01.01", are represented as: "01.01.03", "01.01.05", "01.01.06", "01.01.09", "01.01.11", "01.01.13". The same criterion is applied to all levels.

The entire tree, excluding the dummy root node, has 507 functional classes.

Figure 15: Tree of the "universal-yeast" FunCat taxonomy for the eukaryote S. cerevisiae, with 508 functional classes (including the "dummy" root node).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.ps}\end{figure}

Figure 16: Tree of the first level classes of the FunCat ontology for the eukaryote S. cerevisiae
\begin{figure}\centering
\includegraphics [width = 14.0cm] {ps/Yeast.Funcat.graph.level1.ps}\end{figure}

Distribution of the "depths" in the universal yeast tree

Tab. 3 summarizes some basic statistics about the depth of the nodes of the universal yeast tree: the mean depth is about 3.4 and the median depth is 3. The maximum depth is 6. The distribution of the depths is approximately normal, centered around depth 3 (Fig. 17).

Table 3: Statistics of the depth of the yeast FunCat classes (all evidence codes)
Number of yeast classes: 507
Number of edges: 507
mean depth : 3.38
median     : 3
st. dev.   : 1.07
quantiles  :   0%  25%  50%  75% 100%
                0    3    3    4    6

Figure 17: Histogram of depths of FunCat classes for the yeast.
\begin{figure}\centering
\includegraphics [width = 12.0cm] {ps/Yeast.Funcat.hist.depth.ps} \\\end{figure}

Distribution of the number of children in the universal tree for the yeast

The outdegree distribution for the nodes in the universal yeast tree has average 1 and median 0 (Tab. 4). The maximum number of children is 18 (the first level of the tree, Fig. 16).

The histogram of Fig. 18 shows that more than 300 nodes (about 60%) are leaves. Most of the remaining nodes have two children, but three children or one child are also quite common.

The empirical cumulative distribution of outdegrees (Fig. 19) shows that about 90% of the nodes have an outdegree equal to or lower than 3. Only 17 nodes have more than 5 children (about 3%).


Table 4: Statistics of the node degrees in the FunCat universal tree for the yeast
Number of yeast classes: 507
Number of edges: 507
Statistics of the degree of the nodes
mean      : 1.00
median    : 0
st. dev.  : 2.04
quantiles :   0%  25%  50%  75% 100%
               0    0    0    2   18

Figure 18: Histogram of the distribution of the degrees in the universal tree for the yeast.
\begin{figure}\centering
\includegraphics [width = 12cm] {ps/Yeast.Funcat.hist.outdegree.ps}\end{figure}

Figure 19: Empirical cumulative distribution of the degrees in the universal tree for the yeast.
\begin{figure}\centering
\includegraphics [width = 12cm] {ps/Yeast.Funcat.ecdf.outdegree.ps} \\\end{figure}

Distribution of positive examples for FunCat classes

In this section we provide basic statistics about the distribution of the number of positive examples in FunCat classes for the unicellular eukaryote S. cerevisiae (budding yeast).

The FunCat project provides only the most specific annotations for each gene. The complete multilabel of a gene can be derived by transitivity; i.e., by adding all the classes (up to the root) that are ancestors of a class in the most specific annotation.

According to the FunCat annotation we have 6167 annotated genes for the yeast, considering every type of evidence in the annotation. There is a total of 507 FunCat classes with annotated genes (including all the annotations added by transitivity).

The statistics about the mean, median number of genes per class, and quantiles are shown in Tab. 5.

Fig. 20 shows the histogram of the number of positive examples in FunCat classes; Fig. 21 shows the number of classes with a number of positive examples greater than or equal to the value on the abscissa. From this figure we may observe that about 300 classes have at least 10 positive examples, about 150 have more than 50 positive examples, and about 100 have more than 100 positive examples. It is worth noting that more than 50 classes have more than 200 positive examples.
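
Once each gene is mapped to all its classes (including those added by transitivity), the number of positive examples of a class is the number of genes whose label set contains that class. The following base R sketch computes these counts, and the number of classes with at least n positive examples, on a made-up gene-to-class list; the names genes.to.classes and n.classes.at.least are hypothetical, and dropping the dummy root "00" before counting is an assumption of this sketch.

```r
# Toy list mapping genes to all their classes (dummy root included).
genes.to.classes <- list(g1 = c("00", "01", "01.01"),
                         g2 = c("00", "01"),
                         g3 = c("00", "02"))

# Number of positive examples (genes) per class.
card <- table(unlist(genes.to.classes))
card <- card[names(card) != "00"]   # assumption: the dummy root is not counted

# Hypothetical helper: number of classes with at least n positive examples.
n.classes.at.least <- function(card, n) sum(card >= n)

curve <- sapply(1:3, function(n) n.classes.at.least(card, n))
print(card)
print(curve)
```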


Table 5: Statistics of positive examples of the FunCat classes for all the annotated genes of the yeast
Number of annotated genes : 6167
Number of yeast classes: 507
mean number of genes : 79.57
median               : 15
st. dev.             : 177.78
quantiles            :   0%  25%  50%  75% 100%
                          1    5   15   64 1519

Figure 20: Histogram of the number of positive examples for FunCat classes in S. cerevisiae.
\begin{figure}\centering
\includegraphics [width = 12.0cm] {ps/Yeast.Funcat.general.per.class.hist.ps}\end{figure}

Figure 21: Distribution of positive examples for FunCat classes in S. cerevisiae: plot of the number of classes having at least as many positive examples as the value on the abscissa.
\begin{figure}\centering
\includegraphics [width = 12.0cm] {ps/Yeast.Funcat.general.per.class.distr.ps}\end{figure}

Distribution of gene labels

In this section we provide basic statistics about the distribution of gene labels for the unicellular eukaryote S. cerevisiae (budding yeast).

Tab. 6 summarizes some statistics about the distribution of the cardinality of gene labels (including all the FunCat annotations added by transitivity).

Fig. 22 and Fig. 23 show, respectively, the histogram and the empirical cumulative distribution (ecdf) of the number of labels per gene. Note that all the genes are labeled with the "dummy" FunCat class "00": hence all the genes with two labels are actually labeled with only one "true" FunCat class, genes with three labels with two, and so on. More than 1400 genes (more than 20%) are labeled with two classes (the dummy and one of the "first level" classes), but a considerable number of genes have between 4 and 10 labels (Fig. 22). From the ecdf (Fig. 23) we can observe that more than 40% of genes belong to more than 9 classes.
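
The number of labels of a gene is the size of its set of classes, and subtracting one (the dummy root) gives the number of "true" FunCat classes. The base R sketch below illustrates the computation of these quantities and of the ecdf on a made-up gene-to-class list; it is an illustration only, and the variable names are hypothetical.

```r
# Toy list mapping genes to all their classes (dummy root included).
genes.to.classes <- list(g1 = c("00", "01"),
                         g2 = c("00", "01", "01.01"),
                         g3 = c("00", "01", "01.01", "02", "02.01"),
                         g4 = c("00", "02"))

n.labels <- lengths(genes.to.classes)   # labels per gene, dummy root included
n.true <- n.labels - 1L                 # labels without the dummy root

# Fraction of genes labeled only with the root and one first level class.
frac.two.labels <- mean(n.labels == 2)

# Fraction of genes with more than k labels, read off the ecdf.
Fn <- ecdf(n.labels)
frac.more.than.2 <- 1 - Fn(2)
print(n.labels)
print(c(two.labels = frac.two.labels, more.than.2 = frac.more.than.2))
```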


Table 6: Statistics of gene labels for annotated genes in S. cerevisiae
Number of annotated genes : 6167
Number of yeast classes: 507
Statistics of all annotated genes:
mean      : 7.54
median    : 6
st. dev.  : 5.52
quantiles :   0%  25%  50%  75% 100%
               2    3    6   10   46

Figure 22: Histogram of the number of labels (FunCat classes) per gene in S. cerevisiae.
\begin{figure}\centering
\includegraphics [width = 12.0cm] {ps/Yeast.Funcat.general.labels.histogram.ps}\end{figure}

Figure 23: Empirical cumulative distribution of the number of labels (FunCat classes) per gene in S. cerevisiae.
\begin{figure}\centering
\includegraphics [width = 12.0cm] {ps/Yeast.Funcat.general.labels.ecdf.ps}\end{figure}

Subgraphs of the universal tree for the yeast

From the "universal" FunCat tree for the yeast we can extract subgraphs of interest for a specific investigation. For instance, we could extract a subgraph of the FunCat classes involved in a specific pathway, or, more generally, we could extract FunCat classes involved in the specific biological problem under investigation.

In this section we present some subgraphs extracted from the universal yeast tree on the basis of the node depth or cardinality, or starting from the subtree rooted at a specific node.

Subgraphs extracted according to the node depths

Some examples of subtrees extracted from the universal yeast tree are presented in the following figures: Fig. 16 only includes the nodes of depth 1 in the universal yeast tree, while Fig. 24 and Fig. 25 include the nodes with depth lower than or equal to 2 and 3, respectively.

Figure 24: Subtree of the FunCat yeast taxonomy with nodes up to depth level 2.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.level2.ps}\end{figure}

Figure 25: Subtree of the FunCat yeast taxonomy with nodes up to depth level 3.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.level3.ps}\end{figure}

Subgraphs extracted according to the cardinality of positive examples

It is well known that the generalization abilities of classification algorithms depend on the amount of training data. From this standpoint it is reasonable to attempt a classification of functional classes only if there is a sufficient number of available examples. The following figures depict subgraphs of the universal-yeast tree extracted according to the cardinality of positive examples for each FunCat class. Fig. 26 shows the subtree of the yeast FunCat tree including only the nodes with at least 20 positive examples. Fig. 27, 28, 29, and 30 show graphs extracted from the universal-yeast tree including the nodes having at least 50, 100, 200 and 400 positive examples, respectively.
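
Selecting the classes by cardinality is a threshold on the number of positive examples per class. When the annotations are closed by transitivity, every gene of a class is also a gene of its parent, so a parent never has fewer positive examples than its children and the selected classes still form a tree. The following base R sketch illustrates the selection and checks this property on made-up cardinalities; classes.with.at.least and parent.of are hypothetical helper names, not HCGene functions.

```r
# Toy cardinalities (number of positive examples) per class, root included.
card <- c("00" = 6, "01" = 5, "01.01" = 3, "01.01.03" = 1, "02" = 2)

# Hypothetical helper: classes having at least n positive examples.
classes.with.at.least <- function(card, n) names(card)[card >= n]

# Hypothetical helper: parent of each ID (first level nodes hang from root).
parent.of <- function(ids, root = "00") {
  p <- sub("\\.[^.]+$", "", ids)
  p[!grepl(".", ids, fixed = TRUE)] <- root
  p
}

keep <- classes.with.at.least(card, 3)

# The selection is a tree if the parent of every kept non-root class is kept.
kept.non.root <- setdiff(keep, "00")
is.tree <- all(parent.of(kept.non.root) %in% keep)
print(keep)
print(is.tree)
```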

Figure 26: Subtree of the yeast FunCat taxonomy with nodes having at least 20 positive examples.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.card20.ps}\end{figure}

Figure 27: Subtree of the yeast FunCat taxonomy with nodes having at least 50 positive examples (all evidence codes).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.card50.ps}\end{figure}

Figure 28: Subtree of the yeast FunCat taxonomy with nodes having at least 100 positive examples.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.card100.ps}\end{figure}

Figure 29: Subtree of the yeast FunCat taxonomy with nodes having at least 200 positive examples.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.card200.ps}\end{figure}

Figure 30: Subtree of the yeast FunCat taxonomy with nodes having at least 400 positive examples.
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.card400.ps}\end{figure}

Subtrees rooted at a specific node

Here we present the same examples of subtrees rooted at specific nodes as in Sect. 5.2.2, but in this case we only consider FunCat classes for which annotated yeast genes do exist. The subtree rooted at FunCat ID node "01", which corresponds to the general functional class Metabolism, is shown in Fig. 31.

Fig. 32 shows a subtree (FunCat class "01.01" - amino acid metabolism) of the previous example; indeed, FunCat node "01.01" is a child of the Metabolism node "01".

Fig. 33 shows a "second level" branch (FunCat class "01.02.02" - nitrogen metabolism) of the subtree rooted at "01" (Fig. 31).

Finally, a completely different subtree, rooted at node "30" (Cellular communication/signal transduction mechanism) is shown in Fig. 34.

Figure 31: Subtree of the yeast FunCat taxonomy rooted at node "01" (Metabolism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.01.ps}\end{figure}

Figure 32: Subtree of the yeast FunCat taxonomy rooted at node "01.01" (amino acid metabolism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.01.01.ps}\end{figure}

Figure 33: Subtree of the yeast FunCat taxonomy rooted at node "01.02.02" (nitrogen metabolism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.01.02.02.ps}\end{figure}

Figure 34: Subtree of the yeast FunCat taxonomy rooted at node "30" (Cellular communication/signal transduction mechanism).
\begin{figure}\centering
\includegraphics [width = 16.0cm] {ps/Yeast.Funcat.graph.30.ps}\end{figure}