Benchmarking results

In this section we report the time needed to perform typical tasks with the HCGene package. As an example we present measurements for the processing and analysis of FunCat trees. Comparable results can be obtained with the GO, allowing for some overhead due to its more complex DAG structure.

We measure the computation time of the script used in the previous example (Sect. 5). The script http://homes.dsi.unimi.it/valenti/SW/hcgene/examples/AnalysisYeastFunCat.R, executed on an Intel Pentium D 3.40 GHz CPU with 2 GB of RAM under Ubuntu Linux 7.10, requires about 140 seconds of computation overall.

The script first builds the "universal" FunCat tree (that is, the tree that collects all the available FunCat classes). It then generates a graphical representation of the tree, saves it in PostScript format, and computes several statistics on the "universal" tree: basic statistics about the depth of the tree (distance of each node from the root, median, quartiles and histograms of the distances) and statistics on the out degree of the nodes (e.g. histograms, ecdf). Finally, subtrees are extracted and plotted according to the level of the nodes, and other subtrees rooted at specific nodes are extracted and plotted.

In the second part of the script, FunCat trees containing only the nodes related to the yeast are constructed and analyzed. The yeast tree is analyzed in terms of the distribution of node depths, node out degrees, FunCat class cardinalities and gene label cardinalities, with the corresponding statistics and graphics. Moreover, the table of multilabels associated with each yeast gene is constructed and saved to a file, and several FunCat trees with annotated yeast genes are generated, either according to the depth of the nodes or rooted at specific nodes. Then subtrees with classes selected according to the number of examples are constructed and plotted.

In the rest of this section we provide benchmarking results specific to some of the tasks listed above. To estimate the computation time we used the system.time function of the base R package.

Considering the construction of the "universal" FunCat tree (composed of 1364 nodes), we obtained the following results:

> system.time( gUniversalFuncat <- Do.universal.tree.Funcat())
   user   system   elapsed 
   0.404   0.004   0.409
We recall that system.time provides computation time in seconds: hence in this case the construction of the tree from scratch required about 0.4 seconds.
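
As a reminder of how these measurements can be read, the following minimal sketch times a toy workload with system.time and extracts the three values shown in the transcripts. The workload and the variable names (timing, elapsed, median.elapsed) are placeholders chosen for illustration and are not part of HCGene.

```r
# Time a toy workload; this is a placeholder, not an HCGene call.
timing <- system.time({
  x <- sort(runif(1e5))
})

# system.time returns a "proc_time" object; the entries printed as
# user, system and elapsed are the user CPU time, the system CPU time
# and the wall-clock time, all in seconds.
print(timing[c("user.self", "sys.self", "elapsed")])

# Timings of a few milliseconds are close to the timer resolution,
# so repeating the call and taking the median gives a steadier estimate.
elapsed <- replicate(5, system.time(sort(runif(1e5)))[["elapsed"]])
median.elapsed <- median(elapsed)
```

The same pattern can be wrapped around any of the HCGene calls reported in this section.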

The computation of the statistics requires little time. For instance, computing the degree statistics of the nodes takes about 16 ms, and plotting the corresponding out-degree histogram about 8 ms.

> system.time(d<-Compute.statistics.Funcat.degree(gUniversalFuncat))
   user   system  elapsed 
  0.016    0.000    0.018 
> system.time( hist(d$degree$outDegree, main="",xlab="out degree", breaks=0:maxout, right=FALSE))
   user   system  elapsed 
  0.008    0.000    0.005
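
To make the depth and out-degree statistics concrete, here is a minimal base R sketch on a toy tree. It assumes that FunCat-style identifiers encode the path from the root (so that "01.01.03" is a child of "01.01"); the class list, the "root" label and all variable names are hypothetical, and the code does not reproduce the internals of Compute.statistics.Funcat.degree.

```r
# Toy set of FunCat-style class identifiers (a small made-up tree).
classes <- c("01", "01.01", "01.01.03", "01.02", "10", "10.03")

# Depth = number of dot-separated components
# (children of the root are at depth 1).
depth <- lengths(strsplit(classes, ".", fixed = TRUE))

# Parent = identifier without its last component; top-level classes
# are attached to an artificial node named "root" (hypothetical label).
parent <- ifelse(depth == 1, "root", sub("\\.[^.]+$", "", classes))

# Out degree = number of children of each node; leaves get 0.
all.nodes <- c("root", classes)
counts <- table(factor(parent, levels = all.nodes))
out.degree <- setNames(as.vector(counts), all.nodes)

# Summary statistics comparable to those discussed in the text.
print(summary(depth))
print(table(out.degree))
```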

The extraction of the subtree rooted at node 01.01 (amino acid metabolism) from the "universal" FunCat tree, together with its plotting on the screen, requires about 1.6 s:

> system.time({s.nodes <- Subtree.nodes(gUniversalFuncat, "01.01");
+ g <- subGraph(s.nodes, gUniversalFuncat);
+ Pretty.plot.graph(g,fontsize=12,fillcolor="lightgreen",height=1.2,width=3.2,color="black", fontcolor="transparent");})
   user  system elapsed 
  1.592   0.000   1.617

The construction of the list of the most specific annotations for the yeast genes (6167 genes) takes about 10 seconds:

> system.time(Yeast.Funcat.specific <- Get.yeast.Funcat.specific.classes())
   user      system   elapsed 
  10.216   0.000     10.219
while the computation of the list with all the available annotations for the yeast genes (obtained by transitive closure) requires about 2 seconds:
> system.time(Yeast.Funcat.general <- Get.yeast.Funcat.all.classes(Yeast.Funcat.specific))
   user    system elapsed 
   1.924  0.004   1.935
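
The idea of the transitive closure can be illustrated with a small base R sketch: every gene annotated to a class is also annotated to all the ancestors of that class. The sketch assumes that FunCat identifiers encode the path from the root and that annotations are stored as a named list of character vectors; the helper funcat.ancestors, the gene names and the list layout are hypothetical and are not taken from the HCGene sources.

```r
# Hypothetical helper (not part of HCGene): return a FunCat class
# together with all of its ancestors,
# e.g. "01.01.03" -> "01", "01.01", "01.01.03".
funcat.ancestors <- function(class.id) {
  parts <- strsplit(class.id, ".", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(k) paste(parts[1:k], collapse = "."),
         character(1))
}

# Toy most-specific annotations for two made-up genes.
specific <- list(geneA = c("01.01.03", "10.03"),
                 geneB = "01.01.03.02")

# Closure: union of the ancestor sets of every specific class of a gene.
general <- lapply(specific, function(cl)
  sort(unique(unlist(lapply(cl, funcat.ancestors)))))
```

In this toy example geneA ends up annotated to five classes: its two specific classes plus their three ancestors.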
The construction of the table of multilabels associated with each gene of the yeast takes about 7 seconds:
> system.time( Yeast.Funcat.Table <- Build.Funcat.Table.labels(Yeast.Funcat.general, store=FALSE))
   user   system   elapsed 
   7.036  0.200    7.253
Note that you need to compute this table only once; you can then use its columns to extract the data belonging to specific classes (see Sects. 4.2 and 4.3 for more details about the preparation of the data and the association of the genes to multilabels).
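
As an illustration of how the columns of such a table can be used, the sketch below works on a toy stand-in. The layout (gene names in the first column, one 0/1 indicator column per class) is an assumption made for this example, and the gene names, the threshold and the variable names are invented.

```r
# Toy stand-in for the multilabel table; the layout is an assumption:
# gene names in the first column, one 0/1 indicator column per class.
toy.table <- data.frame(gene    = c("geneA", "geneB", "geneC"),
                        "01"    = c(1, 1, 0),
                        "01.01" = c(1, 0, 0),
                        "10"    = c(0, 1, 1),
                        check.names = FALSE,
                        stringsAsFactors = FALSE)

# Genes annotated to class "01": select the rows where the column is 1.
genes.in.01 <- toy.table$gene[toy.table[["01"]] == 1]

# Class cardinalities: number of annotated genes per class.
cardinality <- colSums(toy.table[, 2:ncol(toy.table)])

# Classes with at least 2 annotated genes (toy threshold).
large.classes <- names(cardinality)[cardinality >= 2]
```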

Basic statistics about the cardinality of the multilabels are computed in milliseconds:

> system.time( l<-Compute.statistics.gene.labels(Yeast.Funcat.general))
   user   system  elapsed 
   0.032  0.000   0.031
and basic statistics about the distribution of the cardinality of the FunCat classes for the yeast are computed in about 1 second:
> system.time(l<-Compute.statistics.functional.classes(Yeast.Funcat.Table[,2:ncol(Yeast.Funcat.Table)]))
   user   system  elapsed 
  1.005   0.096   1.099
Finally, the construction of the tree of FunCat classes with at least 400 annotated yeast genes requires about 0.5 s:
> system.time({ Yeast.Funcat.classes <- names(Yeast.Funcat.Table);
+ gYeastFuncat <- subGraph(Yeast.Funcat.classes, gUniversalFuncat);
+ nodes <- Select.functional.classes.by.cardinality(Yeast.Funcat.Table, 400);
+ gYeast.card.400 <- subGraph(nodes, gYeastFuncat);})
   user    system  elapsed 
   0.452   0.096   0.548

In conclusion, processing FunCat trees with HCGene is efficient: all the computations needed to process and analyze graphs, labels and data can be performed in seconds on ordinary desktop and laptop computers.