Università degli Studi di Milano

Molecular Biotechnologies and Bioinformatics
Academic year 2016/17

Bioinformatics methods

Instructor: Giorgio Valentini

DI - Dipartimento di Informatica, Università degli Studi di Milano

e-mail: valentini@di.unimi.it



The lectures deal with computational methods and techniques for Computational Biology and Bioinformatics, covering both programming languages for Bioinformatics and Machine Learning Methods for Computational Biology.

At the end of the course the student should acquire:
  • Basic knowledge of machine learning algorithms
  • Ability to apply Machine Learning algorithms to the analysis of complex biomolecular data
  • Programming skills to develop software applications in Bioinformatics

Course contents

1. The R programming language.

  • Main data structures in R: vectors, factors, matrices, arrays, lists and environments.
  • Control of execution flow: blocks, conditional statements, loops.
  • Functions and scripts
  • I/O functions and operators; R data import/export
  • R graphics
  • Object-oriented programming in R
  • Packages and R "extensions"

2. Machine Learning and Computational Biology.

  • Learning from data: supervised, unsupervised and semi-supervised machine learning methods.
  • Some examples of Computational Biology applications of Machine Learning:
    - Automated functional annotation of proteins
    - Systems Biology approaches to disease gene prioritization and to the analysis of biological networks
    - Outcome and abnormal phenotype prediction from multiple sources of omics data
    - Prediction of genetic variants and mutations associated with genetic diseases and cancer
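
As an illustration of the network-based disease gene prioritization topic, here is a minimal sketch of a random walk with restart, the family of methods discussed in the Kohler et al. paper listed under Papers. It is not course material: the toy network, the gene names g1 to g6, the seed set and the restart probability of 0.3 are all invented for illustration, and the sketch is written in Python for compactness, whereas the course projects use R.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score each node by its steady-state proximity to the seed nodes.

    Assumes an undirected network in which every node has at least one
    neighbour, so that probability mass is conserved at each step.
    """
    nodes = sorted(adjacency)
    # Start (and restart) distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a neighbour chosen uniformly at random.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy interaction network (adjacency lists, undirected).
adjacency = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "g3"],
    "g3": ["g1", "g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4", "g6"],
    "g6": ["g5"],
}
seeds = {"g1", "g2"}  # hypothetical genes already associated with the disease

scores = random_walk_with_restart(adjacency, seeds)
# Rank the remaining genes: a higher score means closer to the seeds.
ranked = sorted((g for g in adjacency if g not in seeds),
                key=lambda g: scores[g], reverse=True)
print(ranked)
```

In this toy network the candidates are ranked by their network proximity to the seed genes, so g3, which interacts with both seeds, comes first.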

Methodology

Lectures and lab exercises. During the lab exercises each student will have a personal computer at his/her disposal.

Exam

Development of a Bioinformatics R software project.
The data needed for the project are downloadable from here.
Project groups with the corresponding data to be analyzed.

When and where

Lessons begin on 8 November 2016 and end in January 2017
  Tuesday: 10.30-13.30
  Wednesday: 13.30-16.30
Computer lab (Aula informatica), Via Celoria 20, Milano


Anacleto Lab (Computational Biology and Bioinformatics at the Computer Science Dept., University of Milan)



Teaching material

R slides (in Italian):


Link to the directory with the Machine Learning and Computational Biology slides


Papers (to be updated)
- P. Larranaga et al. Machine learning in bioinformatics, Briefings in Bioinformatics 7(1):86-112, 2006
- Y. Jiang et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy, Genome Biology, 17:184, September 2016.
- A.-L. Barabasi, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nature Rev Genet 12:56–68, 2011.
- X. Z. Zhou, J. Menche, A.-L. Barabási, A. Sharma, Human symptoms–disease network, Nature Communications 5:4212, pp. 1-10, 2014.
- J. Menche, A. Sharma, M. Kitsak, D. Ghiassian, M. Vidal, J. Loscalzo, A.-L. Barabasi, Uncovering disease-disease relationships through the incomplete interactome, Science 347(6224), 1257601, 2015.
- Y. Moreau, L. Tranchevent, Computational tools for prioritizing candidate genes: boosting disease gene discovery, Nature Rev Genet, 13(8), pp. 523-536, 2012.
- S. Aerts, D. Lambrechts, S. Maity, P. Van Loo, B. Coessens, F. De Smet, et al., Gene prioritization through genomic data fusion, Nature Biotechnology, 24(5), 2006.
- M. Kann, Protein interactions and disease: computational approaches to uncover the etiology of diseases, Brief Bioinform, 8(5), 2007.
- R. Sharan, I. Ulitsky and R. Shamir, Network-based prediction of protein function, Molecular Systems Biology 3:88, 2007.
- S. Kohler, S. Bauer, D. Horn and P. Robinson, Walking the Interactome for Prioritization of Candidate Disease Genes, Am J Hum Genet. 82(4): 949–958, 2008.
- S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios and Q. Morris, GeneMANIA: A Real-Time Multiple Association Network Integration Algorithm for Predicting Gene Function, Genome Biology, vol. 9, article S4, 2008.
- H. Chua, W. Sung and L. Wong, An Efficient Strategy for Extensive Integration of Diverse Biological Data for Protein Function Prediction, Bioinformatics, vol. 23, no. 24, pp. 3364-3373, 2007.
- M. Re, M. Mesiti and G. Valentini, A Fast Ranking Algorithm for Predicting Gene Functions in Biomolecular Networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(6), pp. 1812-1818, 2012. IEEE link
- M. Mesiti, M. Re and G. Valentini, Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction, GigaScience, 3:5, 2014, doi: 10.1186/2047-217X-3-5 gigascience link
- G. Valentini, A. Paccanaro, H. Caniza, A. Romero, M. Re, An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods, Artificial Intelligence in Medicine, 61(2), pp. 63-78, June 2014
- T. Sevimoglu, K. Y. Arga, The role of protein interaction networks in systems biomedicine, Computational and Structural Biotechnology Journal, Volume 11, Issue 18, pp. 22-27, August 2014