Main functionalities of HCGene
In this section we summarize the main functionalities provided by the package.
We then describe some examples of R scripts to introduce the usage of the HCGene R package.
For details about the single functions implemented in the library, please see the
Reference manual.
The main functionalities of the software library can be summarized as follows.
- Graph processing:
building of hierarchical structures covering the main ontologies (trees and graphs);
methods to analyze the structure and the relationships between functional classes
(e.g., distribution of class nodes with respect to their depth, distribution of
indegree and outdegree, number of annotated genes for each class, distribution of
leaves at different levels); methods to extract biologically meaningful structures
from GO DAGs and FunCat trees.
- Multilabel generation:
extraction of the most specific annotations and building of the full annotation of
genes exploiting transitivity between class annotations;
derivation of a compact representation for the multilabel of each gene; mapping functions
to associate gene names or identifiers (e.g., ORF ID or EntrezGene IDs) to functional classes.
- Data building:
methods to associate gene names to different types of data;
methods to select positive and negative examples for each class according to different strategies;
methods to build data relative to specific functional classes.
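As an illustration of the multilabel generation step, the following base R sketch derives the full annotation of a gene from its most specific FunCat classes, and back. It assumes only that FunCat class IDs are dot-separated hierarchical codes (e.g., 01.01.03 is a child of 01.01). The helper names funcat_ancestors, funcat_closure and funcat_most_specific and the example IDs are hypothetical: they are not part of the HCGene API, whose own functions are documented in the Reference manual.

```r
# Hypothetical helpers written in base R; they are NOT HCGene functions.

# All ancestors of a FunCat class ID, from the top-level class downwards.
funcat_ancestors <- function(id) {
  parts <- strsplit(id, ".", fixed = TRUE)[[1]]
  if (length(parts) < 2) return(character(0))  # top-level class: no ancestors
  vapply(seq_len(length(parts) - 1),
         function(k) paste(parts[1:k], collapse = "."),
         character(1))
}

# Full annotation: the given classes plus all of their ancestors.
funcat_closure <- function(ids) {
  unique(c(ids, unlist(lapply(ids, funcat_ancestors))))
}

# Most specific annotations: drop every class that is an ancestor of another.
funcat_most_specific <- function(ids) {
  anc <- unique(unlist(lapply(ids, funcat_ancestors)))
  setdiff(unique(ids), anc)
}

# Made-up most specific annotations of a single gene.
specific <- c("01.01.03", "02.10")
full <- funcat_closure(specific)
print(full)
print(funcat_most_specific(full))
```

The closure only adds classes, and the most specific annotations can be recovered from it, so either form can serve as the compact representation of the multilabel of a gene.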
In more detail, the list of the main functionalities and functions provided by the package is the following:
- Functions to construct and process GO graphs and FunCat trees
  - Build.universal.graph.ontology.up : Construction of the full graph of a GO ontology with edges from children to parents
  - Build.universal.graph.ontology.down : Construction of the full graph of a GO ontology with edges from parents to children
  - Do.universal.tree.Funcat : Construction of the "universal tree" of the FunCat taxonomy
  - Select.functional.classes.by.cardinality : Selection of a set of GO/FunCat classes on the basis of the number of positive examples
  - Select.GO.classes.by.distance : Selection of a set of GO classes at a given distance from the root of a given ontology
  - Select.Funcat.classes.by.depth : Selection of a set of nodes at a given distance from the root of the FunCat tree
  - Select.GO.rooted.classes : Selection of nodes rooted at a set of given nodes in a given GO graph
  - Select.ontology.evidence : Selection of functional classes associated to genes with respect to the evidence code of the annotation
  - Subtree.nodes : Extraction of the subtree associated to a given node
  - Utility functions to navigate inside the FunCat tree : Functions to obtain children, parent and ancestors of a given FunCat tree node
- Functions to analyze properties of GO and FunCat graphs
  - Compute.distances.from.root : Computation of all distances from the root of a given ontology
  - Compute.statistics.GO.degree : Statistics about the distribution of the indegree and outdegree of GO nodes
  - Compute.statistics.Funcat.degree : Statistics about the distribution of the outdegree of FunCat nodes
  - Compute.depth.statistics.GO.graph : Statistics about the distribution of node depths in a given GO graph
  - Compute.depth.statistics.Funcat.labels : Statistics about the distribution of node depths in the FunCat tree
  - Get.depth.Funcat.labels : Statistics of the distribution of the depth of FunCat nodes
  - Get.leaf.distribution.by.depth.Funcat.labels : Distribution of leaves by depth value in a FunCat tree
- Functions to associate genes to GO or FunCat classes
  - Get.GO.specific.classes : Function to get all the most specific GO classes for each gene
  - GO.transitive.closure : Function to compute the transitive closure of a given list of GO classes
  - Funcat.transitive.closure : Function to compute the transitive closure of a given list of FunCat classes
  - Get.GO.all.classes : Function that provides all the GO classes for a list of genes
  - Code.decode.classes : Coding GO/FunCat classes to integers
  - Build.GO.class.labels : Construction of a data frame that maps genes to multiple GO classes
  - Get.Funcat.specific.classes : Function to get all the most specific FunCat classes for each gene of a given species
  - Get.Funcat.all.classes : Function that provides all the FunCat classes for a list of genes
  - Build.Funcat.Table.labels : Construction of a data frame that maps genes to multiple FunCat functional classes
  - Extract.class : Function to extract the examples belonging to a given GO/FunCat class
  - Build.class.labels.from.selected.classes : Extraction of a subset of classes from a given functional table
- Functions to associate genes of specific GO/FunCat classes to biological data and to build data
  - Get.matrix.data.for.two.selected.classes : Data matrix construction for a binary classification problem: negative examples are all those examples that belong to a specific class
  - Get.matrix.data.for.classid : Data matrix construction for a binary classification problem: negative examples are all those examples that do not belong to the class
  - Get.matrix.data.from.parent.only : Data matrix construction for a binary classification problem: negative examples are all those examples that do not belong to the class and belong to the parent class
  - Get.matrix.data.without.ancestors : Data matrix construction for a binary classification problem: negative examples are all those examples that do not belong to the class or to its ancestors
  - Get.positive.matrix.data.for.classid : Data matrix construction for a binary classification problem: only positive examples
  - Build.list.yeast.homology.data : Construction of the list of lists of homology data for yeast
  - Build.list.human.homology.data : Construction of the list of lists of homology data for human
  - Build.list.mouse.homology.data : Construction of the list of lists of homology data for mouse
  - Build.list.arabidopsis.homology.data : Construction of the list of lists of homology data for thale cress (A. thaliana)
  - Do.binary.data.homology : Function to build homology (phylogenetic) data in binary format
  - Do.float.data.homology : Function to build homology (phylogenetic) data in floating point format
  - Get.all.common.genes : Getting genes common to different data sets
- Graphic functions
  - Do.table.hist.cardinality.labels : Histogram of the cardinality of gene labels
  - Do.ecdf.cardinality.labels : Empirical cumulative distribution function (ECDF) of the cardinality of gene labels
  - Plot.histogram.gene.per.class : Plot of the histogram of the number of genes per class
  - Plot.distribution.gene.per.class : Plot of the distribution of the number of examples per class
  - Plot.ontology.graph : Plotting graphs of GO and FunCat
  - Pretty.plot.graph : Plotting graphs of GO and FunCat
  - Plot.hist.Funcat.depth.labels : Histogram of the depth of FunCat classes
- Mapping environments
  - ATDESCRIPTION : Description of A. thaliana AGI identifiers
  - ATFUNCAT : AGI identifiers to Functional Classification (FunCat) mapping
  - ATGO : AGI identifiers to Gene Ontology (GO) mapping
  - FID2TERM : Mapping of FunCat class ID to the corresponding FunCat functional term
  - TERM2FID : Mapping of FunCat functional term to FunCat class ID
  - HGID2ORF : Mapping from yeast Homologene ID to ORF ID
  - ORF2HGID : Mapping from yeast ORF ID to Homologene ID
  - ORF2RefSeq : Mapping from yeast ORF ID to RefSeq ID
  - RefSeq2ORF : Mapping from RefSeq ID to yeast ORF ID
  - HUMANGO : Human Entrez Gene identifiers to Gene Ontology (GO) mapping
  - MOUSEFUNCAT : Mouse Mfun MIPS identifiers to FunCat mapping
  - MOUSEGO : Mouse Entrez Gene identifiers to Gene Ontology (GO) mapping
  - YEASTFUNCAT : Yeast ORF identifiers to FunCat mapping
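The difference between two of the negative-selection strategies listed above (all genes outside the class, versus only genes outside the class but inside its parent class) can be sketched in base R. This is an illustrative toy under stated assumptions: the label matrix, the gene names and the helper select_examples are hypothetical, and they do not reproduce the signatures or the return values of Get.matrix.data.for.classid or Get.matrix.data.from.parent.only.

```r
# Toy full-annotation table (genes x FunCat classes); all values are made up.
labels <- matrix(
  c(1, 1, 1,
    1, 1, 0,
    1, 0, 0,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"),
                  c("01", "01.01", "01.01.03"))
)

# Hypothetical helper (NOT an HCGene function): returns the names of the
# positive and negative genes for a class under the chosen strategy.
select_examples <- function(labels, class_id,
                            strategy = c("all", "parent_only")) {
  strategy <- match.arg(strategy)
  genes <- rownames(labels)
  pos <- genes[labels[, class_id] == 1]
  neg <- genes[labels[, class_id] == 0]
  if (strategy == "parent_only") {
    # Parent ID obtained by dropping the last level: "01.01.03" -> "01.01".
    parent <- sub("\\.[^.]*$", "", class_id)
    # A top-level class has only the root as parent, so nothing is filtered.
    if (parent != class_id) {
      neg <- neg[labels[neg, parent] == 1]
    }
  }
  list(positive = pos, negative = neg)
}

all_neg    <- select_examples(labels, "01.01.03", "all")
parent_neg <- select_examples(labels, "01.01.03", "parent_only")
print(all_neg$negative)
print(parent_neg$negative)
```

In this toy table the parent-only strategy keeps as negatives only the genes annotated to 01.01 but not to 01.01.03, which gives a smaller negative set than the strategy that takes all genes outside the class.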
In the following sections we describe some examples of R scripts to introduce the usage of the
HCGene R package. The source code of the scripts and the related data can be downloaded from:
http://homes.dsi.unimi.it/valenti/SW/hcgene/examples.