STATISTICAL MODEL OF AMINO ACID CODE OF PROTEIN SECONDARY STRUCTURE
B. V. Shestopalov
Institute of Cytology RAS, St. Petersburg;
e-mail: shest@mail.cytspb.rssi.ru
In the previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a
partial solution to the fundamental problem of calculating the three-dimensional structure of a protein from its amino
acid sequence. Here a statistical model of the code is described. The model is based on the structural data from
2258 protein chains (417 112 amino acid residues used). In the training and test subsets, respectively, 60 and 61 % of
the secondary structure calculated with the model coincides with the observed secondary structure (the test subset
comprises 104 protein chains and 21 166 residues). This equals the threshold value for all secondary structure
calculations based on models that, like this one, consider only the nearest and middle-range interactions.
Therefore the constructed model can be applied to protein structure prediction from the amino acid
sequence, especially when additional information is used along with expert analysis, as in the most successful
prediction methods. The model can also be used to analyse secondary structure changes during protein folding by
comparing the calculated and observed secondary structures. Information about the conformationally invariant
segments can serve in simulating supersecondary structure formation. One can also try to obtain and examine
the subset of proteins in which the calculated and observed secondary structures are very similar.
Key words: protein secondary structure encoding, protein secondary structure prediction
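
The comparison of calculated and observed secondary structures described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each structure is written as a string with one state label per residue (for example H, E and C) and that coincidence is counted residue by residue; the abstract does not specify either point. The function names percent_agreement and matching_segments, and the min_length parameter, are illustrative only and are not part of the model.

```python
def percent_agreement(calculated: str, observed: str) -> float:
    """Percentage of positions at which the two state strings coincide.

    Assumption: one character per residue, equal lengths.
    """
    if len(calculated) != len(observed):
        raise ValueError("structures must have the same length")
    if not observed:
        raise ValueError("structures must not be empty")
    matches = sum(c == o for c, o in zip(calculated, observed))
    return 100.0 * matches / len(observed)


def matching_segments(calculated: str, observed: str, min_length: int = 3):
    """Half-open (start, end) index ranges where the strings coincide.

    Only runs of at least min_length residues are reported. This is an
    illustrative stand-in, not the paper's definition of invariant segments.
    """
    if len(calculated) != len(observed):
        raise ValueError("structures must have the same length")
    segments = []
    start = None
    for i, (c, o) in enumerate(zip(calculated, observed)):
        if c == o:
            if start is None:
                start = i  # a new coinciding run begins here
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(observed) - start >= min_length:
        segments.append((start, len(observed)))
    return segments


if __name__ == "__main__":
    calc = "HHHHCCEEEE"  # hypothetical calculated structure
    obs = "HHHCCCEEEC"   # hypothetical observed structure
    print(percent_agreement(calc, obs))
    print(matching_segments(calc, obs))
```

Averaging percent_agreement over a set of chains, weighted by chain length, would give a figure of the same kind as the 60 and 61 % quoted above, under the stated per-residue assumption.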