PCI-Based Protein Secondary Structure Site Prediction Server

J.R. Green, M.J. Korenberg, M.O. Aboul-Magd

Description

Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein’s structure is of fundamental importance to biology. Here we report on a novel approach to predicting the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data. The approach makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.
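To make the PCI idea concrete, here is a minimal single-input sketch in Python. It is not the authors' implementation. It assumes a simplified form of parallel cascade identification: each cascade is a dynamic linear element (taken here as the first-order cross-correlation between the input and the current residual, over a hypothetical memory length `memory`) followed by a static polynomial nonlinearity of degree `degree` fitted by least squares, and each new cascade is fitted to whatever the earlier cascades left unexplained. The server itself works from multi-input PSI-BLAST profiles with GA-tuned parameters; all function names and parameter values below are illustrative.

```python
import random

def cross_correlation(x, z, memory):
    """First-order cross-correlation: mean of x(i-j) * z(i) for lags j = 0..memory-1."""
    n = len(x)
    return [sum(x[i - j] * z[i] for i in range(j, n)) / (n - j) for j in range(memory)]

def convolve(x, h):
    """Dynamic linear element: u(i) = sum_j h(j) * x(i-j), assuming zero initial conditions."""
    return [sum(h[j] * x[i - j] for j in range(len(h)) if i >= j) for i in range(len(x))]

def solve(a, b):
    """Solve the small linear system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c

def polynomial(coeffs, u):
    """Static nonlinearity: sum_d a_d * u**d."""
    return sum(a * u ** d for d, a in enumerate(coeffs))

def fit_polynomial(u, z, degree):
    """Least-squares polynomial z ~ sum_d a_d u^d, via the normal equations."""
    rows = [[ui ** d for d in range(degree + 1)] for ui in u]
    ata = [[sum(p[r] * p[c] for p in rows) for c in range(degree + 1)] for r in range(degree + 1)]
    atz = [sum(p[r] * zi for p, zi in zip(rows, z)) for r in range(degree + 1)]
    return solve(ata, atz)

def parallel_cascade(x, y, memory=4, degree=3, n_cascades=5):
    """Fit cascades one at a time, each to the residual left by the cascades before it."""
    residual = list(y)
    cascades = []
    for _ in range(n_cascades):
        h = cross_correlation(x, residual, memory)
        u = convolve(x, h)
        # Rescale the linear element so u has unit RMS; the polynomial absorbs the
        # scale, and the normal equations stay well conditioned.
        scale = (sum(ui * ui for ui in u) / len(u)) ** 0.5 or 1.0
        h = [hj / scale for hj in h]
        u = [ui / scale for ui in u]
        coeffs = fit_polynomial(u, residual, degree)
        residual = [r - polynomial(coeffs, ui) for r, ui in zip(residual, u)]
        cascades.append((h, coeffs))
    return cascades, residual

def predict(cascades, x):
    """Model output is the sum of all cascade outputs."""
    outputs = [[polynomial(coeffs, ui) for ui in convolve(x, h)] for h, coeffs in cascades]
    return [sum(col) for col in zip(*outputs)]

# Demo on a synthetic Wiener system (linear filter followed by a square-law term).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(300)]
v = convolve(x, [1.0, 0.5, 0.25])
y = [vi + 0.3 * vi ** 2 for vi in v]
cascades, residual = parallel_cascade(x, y)
print(sum(r * r for r in residual) / sum(yi * yi for yi in y))  # fraction of squared output left unexplained
```

Because each polynomial fit could always choose all-zero coefficients, adding a cascade never increases the training squared error in this sketch. Korenberg's formulation also allows linear elements built from slices of higher-order cross-correlations, plus stopping criteria, which are omitted here for brevity.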

Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems, and protein structure classifiers are built from two layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. When evaluated over a set of 63 new protein chains guaranteed to be dissimilar to all training data, the resulting prediction accuracies compare well with those of top contemporary methods based on artificial neural networks (ANNs) and hidden Markov models (HMMs).
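The decomposition into binary sub-problems can be pictured with a short sketch. It assumes a one-vs-rest split (helix vs. not helix, strand vs. not strand, coil vs. not coil, with labels assumed to be H, E and C) and, as a hypothetical stand-in for the second PCI layer, a moving average of each first-layer score followed by choosing the highest-scoring state. In the method described above the second layer is itself a PCI model; the names `binary_targets`, `smooth` and `combine` are illustrative only.

```python
STATES = "HEC"  # assumed labels: helix, strand, coil (non-regular)

def binary_targets(ss, state):
    """One-vs-rest target for one binary sub-problem: +1 where the residue is in `state`, else -1."""
    return [1.0 if s == state else -1.0 for s in ss]

def smooth(scores, half_window=2):
    """Hypothetical second layer: average each first-layer score over neighbouring residues."""
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def combine(scores_by_state, half_window=2):
    """Assign each residue the state whose smoothed binary score is highest."""
    smoothed = {s: smooth(scores_by_state[s], half_window) for s in STATES}
    n = len(smoothed[STATES[0]])
    return "".join(max(STATES, key=lambda s: smoothed[s][i]) for i in range(n))

ss = "CCHHEHHCC"
first_layer = {s: binary_targets(ss, s) for s in STATES}  # stand-in for three PCI outputs
print(combine(first_layer, half_window=0))  # → CCHHEHHCC
print(combine(first_layer, half_window=1))  # → CCHHHHHCC
```

With `half_window=1`, the isolated one-residue strand inside the helix is relabelled as helix, which illustrates why a second layer that sees neighbouring first-layer outputs can clean up implausible fragments.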

When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the combined classifier not only commits fewer errors than PSIPRED alone but also reduces the rate of occurrence of a particularly detrimental error by up to 25%. In fact, it achieves the highest Q3 and SOV accuracies of any method evaluated in the present study over a novel set of 63 protein chains guaranteed to be dissimilar to all proteins used to train all methods.
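Q3, one of the two accuracy measures quoted above, is the percentage of residues assigned the correct one of the three states. The sketch below computes it together with a count of helix/strand confusions. Treating H↔E swaps as the "particularly detrimental error" is an assumption made for illustration, the function names are hypothetical, and labels are assumed to be H, E and C. SOV, a segment-overlap score, is more involved and is not sketched here.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def helix_strand_confusions(predicted, observed):
    """Residues where a helix is predicted as a strand, or a strand as a helix."""
    return sum({p, o} == {"H", "E"} for p, o in zip(predicted, observed))

observed = "CHHHHCEEEC"
predicted = "CHHHECEECC"
print(q3(predicted, observed))                       # → 80.0
print(helix_strand_confusions(predicted, observed))  # → 1
```

For a test set of many chains, Q3 is often computed over all residues pooled together rather than averaged per chain; the two conventions can give different numbers, so comparisons should use the same one.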

References

Green, J.R. and Korenberg, M.J., "Nonlinear System Identification Provides Insight Into Protein Folding", CCECE06, Ottawa, Ontario, 7-10 May, 2006.

Green, J.R. and Korenberg, M.J., "MISO Dynamic Nonlinear Protein Secondary Structure Prediction", BMC Bioinformatics 10:222, 2009.

Jones, D.T., "Protein secondary structure prediction based on position-specific scoring matrices", J. Mol. Biol. 292: 195-202, 1999 (PSIPRED).
© 2007