AMINO ACID CODE OF PROTEIN SECONDARY STRUCTURE
B. V. Shestopalov
Institute of Cytology RAS, St. Petersburg;
e-mail: shest@mail.cytspb.rssi.ru
The calculation of protein three-dimensional structure from the amino acid sequence remains a fundamental
unsolved problem. This paper presents the principles of the code theory of protein secondary structure and their
consequence, the amino acid code of protein secondary structure. The doublet code model of protein secondary
structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The theory rests on the
following principles:
1) the name secondary structure is assigned to the conformation stabilized only by the nearest (intraresidual)
and middle-range (at a distance no greater than that between residues i and i + 5) interactions;
2) the secondary structure consists of regular (α-helical and β-structural) and irregular (coil)
segments;
3) the α-helices, β-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2)
and (i, i + 1), according to the numbers of residues per period: 3.6, 2 and 1;
4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons;
5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability;
6) overlaps of structurons of the same structure generate longer segments of that structure;
7) overlap of structurons of different structures is forbidden, so codon selection is required, and this
selection is hierarchical;
8) the code theory of protein secondary structure generates six variants of the amino acid code of protein
secondary structure.
There are two possible kinds of model construction based on the theory: the physical one using
physical properties of amino acid residues, and the statistical one using results of statistical analysis of a
large body of structural data. Some evident consequences of the theory are: a) the theory can be used for
calculating the secondary structure from the amino acid sequence as a partial solution of the problem of
calculating protein three-dimensional structure from the amino acid sequence, and the calculated secondary
structure and codon strength distribution can be used for simulating the next step of protein folding; b) one
can propose that the same secondary structures can fold into different tertiary structures and, vice
versa, different secondary structures can fold into the same tertiary structures, provided codon
distributions are also considered; c) codons can be considered the first elements of a language of protein
three-dimensional structure.
Key words: protein folding, protein secondary structure, protein structure encoding
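The pair offsets in principle 3 and the merging rule in principle 6 can be illustrated with a minimal Python sketch. It is not the author's implementation. The names `CODON_OFFSETS`, `enumerate_codons` and `merge_structurons` are hypothetical, indices are 0-based, and two points are assumptions made only for illustration: that a structuron spans residues i through i + k inclusive, and that two structurons overlap when they share at least one residue. The 21 codon strength types and the hierarchical codon selection are not specified in the abstract and are not modeled here.

```python
from typing import Dict, List, Tuple

# Pair offsets k for codons (i, i + k), as stated in the abstract:
# alpha-helix (i, i + 4), beta-strand (i, i + 2), coil (i, i + 1).
CODON_OFFSETS: Dict[str, int] = {"helix": 4, "strand": 2, "coil": 1}


def enumerate_codons(sequence: str) -> Dict[str, List[Tuple[int, int]]]:
    """List every residue pair (i, i + k) in the sequence for each structure type.

    Indices are 0-based. No codon strength is assigned (hypothetical helper).
    """
    return {
        name: [(i, i + k) for i in range(len(sequence) - k)]
        for name, k in CODON_OFFSETS.items()
    }


def merge_structurons(spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping structurons of ONE structure type into longer segments.

    Assumption: each span is (first_residue, last_residue), inclusive, and two
    spans overlap when they share at least one residue.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Shares a residue with the previous segment: extend that segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    codons = enumerate_codons("ACDEFGHIK")  # arbitrary 9-residue example
    print(codons["helix"])
    # Two overlapping helical structurons and one separate structuron.
    print(merge_structurons([(0, 4), (2, 6), (10, 14)]))
```

In this sketch, merging is applied to one structure type at a time. Deciding which codons survive when structurons of different structures would overlap (principle 7) requires the codon strengths, which would have to come from the physical or statistical model.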