STATISTICAL MODEL OF AMINO ACID CODE OF PROTEIN SECONDARY STRUCTURE
B. V. Shestopalov
Institute of Cytology RAS, St. Petersburg;
e-mail: shest@mail.cytspb.rssi.ru
In the previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a 
partial solution of the fundamental problem of calculating the three-dimensional structure of a protein from its 
amino acid sequence. Here we describe a statistical model of this code. The model is based on structural data from 
2258 protein chains (417 112 amino acid residues). The secondary structure calculated with the model coincides with 
the observed secondary structure by 60 % in the training subset and by 61 % in the test subset (104 protein chains, 
21 166 residues). This equals the threshold value reached by all secondary structure calculations based on models 
that, like this one, consider only nearest and middle-range interactions. The model can therefore be applied to 
predicting protein structure from the amino acid sequence, especially when additional information is used together 
with expert analysis, as in the most successful prediction methods. It can also be used to analyse changes in 
secondary structure during protein folding by comparing the calculated and observed secondary structures. 
Information about conformationally invariant segments can serve for simulating the formation of supersecondary 
structure. Finally, one can try to obtain and examine the subset of proteins in which the calculated and observed 
secondary structures are very similar.
Key words: protein secondary structure encoding, protein secondary structure prediction
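The agreement figures given in the abstract (60 % and 61 %) measure how often the calculated and observed secondary structures coincide. The following is a minimal sketch of such a measure, offered only as an illustration. It makes two assumptions of ours, not the author's. First, each structure is written as a string with one state letter per residue (for example H, E, C). Second, the figure for a subset is pooled over all of its residues rather than averaged over chains. The function names are hypothetical and are not part of the described model.

```python
def percent_agreement(calculated: str, observed: str) -> float:
    """Percentage of residues whose calculated state equals the observed state.

    Assumes (hypothetically) one state letter per residue, e.g. 'H', 'E', 'C'.
    """
    if len(calculated) != len(observed):
        raise ValueError("calculated and observed structures must have equal length")
    if not observed:
        raise ValueError("structures must not be empty")
    matches = sum(c == o for c, o in zip(calculated, observed))
    return 100.0 * matches / len(observed)


def pooled_agreement(pairs: list[tuple[str, str]]) -> float:
    """Agreement pooled over all residues of a subset of chains.

    Each element of `pairs` is (calculated, observed) for one chain. Pooling
    by residue, rather than averaging per-chain percentages, is an assumption.
    """
    total = 0
    matches = 0
    for calculated, observed in pairs:
        if len(calculated) != len(observed):
            raise ValueError("each pair must have equal length")
        matches += sum(c == o for c, o in zip(calculated, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues in subset")
    return 100.0 * matches / total


if __name__ == "__main__":
    # 8 of 10 positions coincide, so this prints 80.0
    print(percent_agreement("HHHHCCEEEC", "HHHCCCEEEE"))
```

Pooling by residue gives long chains proportionally more weight than short ones. This is one reason the residue counts of the training and test subsets matter when comparing such percentages.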