Tsitologyia. Home page

Vol. 45 (2003), N 7, p. 702-706

AMINO ACID CODE OF PROTEIN SECONDARY STRUCTURE

B. V. Shestopalov
Institute of Cytology RAS, St. Petersburg;
e-mail: shest@mail.cytspb.rssi.ru

Calculating the three-dimensional structure of a protein from its amino acid sequence remains a fundamental unsolved problem. This paper presents the principles of the code theory of protein secondary structure and their consequence, the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The theory rests on the following principles: 1) the term secondary structure is assigned to the conformation stabilized only by nearest (intraresidue) and middle-range interactions (at a distance no greater than that between residues i and i + 5); 2) the secondary structure consists of regular (α-helical and β-structural) and irregular (coil) segments; 3) α-helices, β-strands and coil segments are encoded by the residue pairs (i, i + 4), (i, i + 2) and (i, i + 1), respectively, in accordance with the numbers of residues per period: 3.6, 2 and 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types according to their strength, i.e. their encoding capability; 6) overlapping structurons of the same structure generate longer segments of that structure; 7) overlapping of structurons of different structures is forbidden, so codons must be selected, and this selection is hierarchical; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. Two kinds of models can be constructed on the basis of the theory: a physical one, using the physical properties of amino acid residues, and a statistical one, using the results of statistical analysis of a large body of structural data.
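Principles 3, 4 and 6 above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the 0-based indexing, and the assumption that a structuron spans every residue from i to the partner residue inclusive are all hypothetical choices made for illustration. Codon strengths (principle 5) and the hierarchical selection of codons (principle 7) are not specified in this abstract and are deliberately left out.

```python
from typing import Iterable, List, Tuple

# Pair offsets taken from principle 3: helix (i, i + 4), strand (i, i + 2),
# coil (i, i + 1). The dictionary keys are illustrative names only.
OFFSETS = {"helix": 4, "strand": 2, "coil": 1}


def candidate_codons(sequence: str, structure: str) -> List[Tuple[int, int, str]]:
    """List every residue pair (i, i + k) for one structure type.

    Returns (i, j, pair) tuples with 0-based indices, where pair is the
    two-letter string formed by the residues at positions i and j.
    """
    k = OFFSETS[structure]
    return [
        (i, i + k, sequence[i] + sequence[i + k])
        for i in range(len(sequence) - k)
    ]


def merge_structurons(spans: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping spans of ONE structure type into longer segments.

    Assumption: a structuron covers residues i..j inclusive, and two
    structurons overlap when they share at least one residue.
    """
    segments: List[List[int]] = []
    for start, end in sorted(spans):
        if segments and start <= segments[-1][1]:
            # Shares a residue with the previous segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])
    return [(s, e) for s, e in segments]


if __name__ == "__main__":
    seq = "ACDEFGHIK"  # arbitrary example sequence, not taken from the paper
    for i, j, pair in candidate_codons(seq, "helix"):
        print(i, j, pair)
    print(merge_structurons([(0, 4), (2, 6), (8, 12)]))
```

In this sketch every sequence position yields one candidate codon per structure type, so a practical model would still need a strength table and a selection rule to decide which candidates are realized before merging.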
Some evident consequences of the theory are: a) the theory can be used to calculate the secondary structure from the amino acid sequence, as a partial solution to the problem of calculating protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used to simulate the next step of protein folding; b) one can propose that the same secondary structures can fold into different tertiary structures and, vice versa, that different secondary structures can fold into the same tertiary structure, provided that codon distributions are also taken into account; c) codons can be regarded as the first elements of a language of protein three-dimensional structure.
Key words: protein folding, protein secondary structure, protein structure encoding
