Overview

The Bioinformatics and Machine Learning Group was founded in 2015 in the Department of Computer Science of the Federal University of São Carlos, São Carlos, Brazil. Our research focuses on developing and applying Machine Learning methods, mainly to solve Biology-related problems.

Bioinformatics is a research field that uses computer science, statistics and mathematics to process and understand biological data. Before the advances in computer science, bioinformatics algorithms had to be explicitly programmed by hand. This has become extremely difficult, since the large amount of data generated nowadays makes it unfeasible to process without automated algorithms. In this context, Machine Learning has emerged as a research field with huge potential for applications in bioinformatics.

Machine Learning is a subfield of computer science that uses mathematics and statistics to develop algorithms able to learn a given task. Here, learning means progressively improving performance on a specific task from training data, without being explicitly programmed to perform that task. These algorithms can make predictions and extract patterns from data, and they have been applied to many bioinformatics fields, such as genomics, proteomics and evolution.

Research Topics

Our research aims at proposing machine learning methods to meet the growing demand for predictive and descriptive tasks in bioinformatics, especially complex ones involving multiple and structured outputs. Our group therefore has a special focus on supervised methods for multi-output learning, using neural networks, evolutionary methods and decision trees, among others. Our main research topics are listed below.

Protein function prediction: proteins can perform multiple functions simultaneously, and these functions can be structured in a hierarchical taxonomy such as a tree or a graph.

Protein subcellular localization: proteins can be located in different places in a cell, and these locations directly influence the functions the protein performs.

Protein-protein interaction: predicting the interactions among proteins is important for understanding their functions. Multiple interactions can occur among proteins.

Transposable element classification: transposable elements are DNA sequences able to move or copy themselves within the genome of a cell. There are many different types of these elements, organized in a hierarchical taxonomy. Their identification and prediction are important for understanding the roles they play in genomes.

Prediction of non-coding RNAs: non-coding RNAs are known to play important roles in organisms. Correctly identifying and predicting these sequences is important for better understanding their roles.

Prediction of microRNA target interactions: microRNAs can have different target sites. Predicting these interactions is important for understanding the roles of microRNAs in organisms.

Analysis of SNPs: a single-nucleotide polymorphism (SNP) is a variation in a single nucleotide at a specific position in the genome. Different SNPs can have different effects and can be related to different diseases. Thus, identifying and classifying SNPs is very important.

Apart from the topics listed above, our group also investigates machine learning methods in many other areas, such as Data Stream Mining, Active Learning, Semi-supervised Learning, Pattern Classification, Multi-objective Optimization, Hierarchical and Multi-label Classification, and Multi-target Classification and Regression.

Research Collaborations

Our group maintains formal research collaborations and projects with Brazilian and foreign universities, which have resulted in publications in international peer-reviewed journals and in student exchanges. Our current collaborations are listed below.

Dr. Celine Vens - Department of Public Health and Primary Care - Katholieke Universiteit Leuven - Belgium

Dr. Isaac Triguero - School of Computer Science - University of Nottingham - United Kingdom

Dr. João Gama - Laboratory of Artificial Intelligence and Decision Support - University of Porto - Portugal

Dr. Yaochu Jin - Nature Inspired Computing and Engineering (NICE) group - University of Surrey - United Kingdom

Dr. Rodrigo C. Barros - Faculdade de Informática - Pontifícia Universidade Católica do Rio Grande do Sul - Brazil

Dr. André C. P. L. F. de Carvalho - Instituto de Ciências Matemáticas e de Computação - Universidade de São Paulo - Brazil

Dr. Elaine Ribeiro de Faria Paiva - Faculdade de Computação - Universidade Federal de Uberlândia - Brazil

Dr. Jonathan de Andrade Silva - Universidade Federal do Mato Grosso do Sul - Campus Ponta Porã - Brazil

Dr. Carlos Norberto Fischer - Departamento de Estatística, Matemática Aplicada e Computação - Universidade Estadual Paulista - Brazil

Join the team

If you are enthusiastic about Machine Learning and Bioinformatics, consider applying for a Master's or PhD position in our group. Contact one of our members.