Tsitologyia

Vol. 45 (2003), N 7, p. 707-713

STATISTICAL MODEL OF AMINO ACID CODE OF PROTEIN SECONDARY STRUCTURE

B. V. Shestopalov
Institute of Cytology RAS, St. Petersburg;
e-mail: shest@mail.cytspb.rssi.ru

In the previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a partial solution to the fundamental problem of calculating the three-dimensional structure of a protein from its amino acid sequence. Here a statistical model of the code is described. The model is based on structural data from 2258 protein chains (417 112 amino acid residues used). The secondary structure calculated with the model coincides with the observed secondary structure to 60 % in the training subset and 61 % in the test subset (104 protein chains and 21 166 residues used). This equals the threshold value for all secondary structure calculations based on models that, like this one, consider only the nearest and middle-range interactions. The constructed model can therefore be applied to protein structure prediction from the amino acid sequence, especially when additional information is used along with expert analysis, as in the most successful prediction methods. The model can also be used to analyse secondary structure changes during protein folding by comparing the calculated and observed secondary structures. Information about the conformationally invariant segments can serve for simulating the formation of the supersecondary structure. One can also try to obtain and examine the subset of proteins in which the calculated and observed secondary structures are very similar.
Key words: protein secondary structure encoding, protein secondary structure prediction
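
The coincidence percentages quoted in the abstract compare a calculated secondary structure with an observed one. A minimal sketch of such a comparison is given below. It is an illustration only, not the author's procedure: the three-letter state alphabet (H, E, C), the per-residue scoring, the function names `agreement` and `invariant_segments`, the `min_len` parameter, and the reading of "conformationally invariant segments" as runs where the two strings agree are all assumptions made for this example.

```python
def agreement(calculated: str, observed: str) -> float:
    """Fraction of residues whose calculated state equals the observed state.

    Both arguments are per-residue state strings of equal length
    (hypothetical alphabet: H = helix, E = strand, C = other).
    """
    if len(calculated) != len(observed):
        raise ValueError("state strings must have the same length")
    if not calculated:
        raise ValueError("state strings must not be empty")
    matches = sum(c == o for c, o in zip(calculated, observed))
    return matches / len(calculated)


def invariant_segments(calculated: str, observed: str, min_len: int = 3):
    """Return (start, end) index pairs, end exclusive, of runs where the
    calculated and observed states agree for at least min_len residues.

    Treating such runs as 'invariant segments' is an assumption of this
    sketch, not a definition taken from the paper.
    """
    if len(calculated) != len(observed):
        raise ValueError("state strings must have the same length")
    segments = []
    start = None  # start index of the current run of agreement, if any
    for i, (c, o) in enumerate(zip(calculated, observed)):
        if c == o:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(calculated) - start >= min_len:
        segments.append((start, len(calculated)))
    return segments


# Made-up ten-residue example, not data from the paper.
calc = "HHHHCCEEEC"
obs = "HHHCCCEEEE"
print(agreement(calc, obs))           # 0.8
print(invariant_segments(calc, obs))  # [(0, 3), (4, 9)]
```

Averaging `agreement` over all residues of a set of chains would give a single coincidence figure for that set, comparable in kind to the 60 % and 61 % values above.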
