Project Description/Proposal


Advances in molecular and cellular biology have led to genome sequencing, protein structure prediction, gene expression microarrays, and the study of gene regulatory networks. Each of these areas produces large amounts of data that must be processed, and the sheer volume makes it difficult to analyze the data and extract information from it. Soft computing is gradually opening up ways to analyze these data by providing low-cost, low-precision solutions to search, classification, and prediction problems. Problems to which soft computing has been applied include the identification of genetic coding regions, the prediction of protein structure and function, and the clustering of gene expression data.

For my final project, I would like to develop a support vector machine (SVM) to predict protein function. As the number of sequenced genomes keeps growing, it is becoming more important to classify protein functions accurately and quickly. Currently, the only reliable way to determine a protein's function is by experiment, which is expensive and time-consuming.


SVMs have already been applied to this problem with promising results. Protein function can be predicted from protein sequence, protein interactions, and various other characteristics. Functions can be classified using the convention developed by the Gene Ontology Consortium: the Gene Ontology (GO) classification system describes protein function in terms of biological processes, cellular components, and molecular functions. Several GO-annotated genomes are available to provide training and test data for my project. Thus, the goal of my project is to develop an SVM that uses one or more of these predictive variables to classify proteins according to the GO system.
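To make the idea of classifying proteins from a sequence-derived variable concrete, here is a minimal sketch under stated assumptions. The proposal does not fix a feature set or an implementation, so everything specific here is hypothetical: the feature is amino-acid composition, the label is membership in a single unnamed GO term (+1 or -1), the sequences are made up rather than taken from an annotated genome, and the helper names (composition, train_linear_svm, predict) are illustrative. The SVM is a linear soft-margin one fitted by plain subgradient descent on the regularized hinge loss; the actual project would more likely rely on an established SVM library and real GO annotations.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def score(weights, bias, x):
    """Decision value w.x + b for one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(features, labels, lam=0.01, rate=0.1, epochs=200):
    """Fit a linear soft-margin SVM by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            violated = y * score(weights, bias, x) < 1
            # Shrink the weights (gradient of the L2 penalty).
            weights = [w * (1 - rate * lam) for w in weights]
            if violated:
                # Hinge loss is active: step toward the correct side.
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (has the hypothetical GO term) or -1 (does not)."""
    return 1 if score(weights, bias, x) >= 0 else -1


# Made-up toy sequences, NOT real proteins: hydrophobic-rich ones are
# labeled +1 and charged-rich ones -1 purely for illustration.
toy_data = [
    ("LLIVALLFVIAL", 1), ("KKEDRKEEDKRE", -1),
    ("AVILFAVILLVA", 1), ("DEKRDEKRKEED", -1),
    ("FLLIVVAALLIF", 1), ("RKDEERKKDDEK", -1),
]
features = [composition(seq) for seq, _ in toy_data]
labels = [label for _, label in toy_data]

weights, bias = train_linear_svm(features, labels)
print([predict(weights, bias, x) for x in features])
```

Because GO assigns many terms per protein, a sketch like this would have to be repeated as one binary classifier per term, or replaced by a multi-label scheme; which of the two the project uses is left open here.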


Mitra S, Hayashi Y. Bioinformatics with Soft Computing. IEEE Transactions on Systems, Man and Cybernetics-Part C: Applications and Reviews. 2006;36(5):616-635.

Vinayagam A, König R, Moormann J, Schubert F, Eils R, Glatting KH, Suhai S. Applying Support Vector Machines for Gene Ontology based gene function prediction. BMC Bioinformatics. 2004;5:116.

KDD Cup 2001 Guidelines. http://www.cs.wisc.edu/~dpage/kddcup2001/.

The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29.