Promising tool to analyze sequences

Sequence Analysis

To extract knowledge
  • Naturally co-occurring amino acids in a protein family, a phenomenon termed coevolution, play a significant role in both protein engineering and folding. In recent years, the study of coevolution has expanded from the effects of single-site mutations to the complete redesign of a protein and its folding, especially in structure prediction. An approach (SAEC) has been developed to identify evolutionary signatures, i.e. highly ordered networks of coupled amino acids termed "residue communities" (RCs), from homologous protein sequences.


Multiple sequence alignment


Given a protein amino acid sequence, its multiple sequence alignment (MSA) is obtained from two databases using the profile HMM [1] and HHblits [2] homology search tools. First, an alignment of the target protein is obtained by searching its sequence against the Uniclust30 database using HHblits with default parameters (E-value threshold of 0.001). A second alignment of the same protein is generated with the default five search iterations of jackhmmer in the HMMER suite, searching the query sequence against the UniRef90 database. Finally, the two alignments are combined and aligned according to the target sequence. The combined alignment is then trimmed based on minimum coverage, following two basic rules [3,4]: (1) a site with more than 90% gaps across the MSA is removed; and (2) a sequence whose percentage of gaps exceeds a given threshold (80%) is deleted from the MSA. The weight, ω, of each sequence is computed using Leri [4].
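The two trimming rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAEC or Leri implementation: the function name `trim_msa`, its parameters, the order in which the rules are applied (columns first, then sequences), and the choice to always keep the first (query) sequence are all hypothetical. It also assumes that rule (2) discards sequences that are mostly gaps, i.e. whose gap fraction is above the threshold.

```python
def trim_msa(seqs, max_col_gap=0.9, max_seq_gap=0.8, gap="-"):
    """Trim an MSA given as a list of equal-length aligned strings.

    Hypothetical helper: column rule first, then sequence rule.
    """
    if not seqs:
        return []
    n_seq = len(seqs)
    length = len(seqs[0])

    # Rule (1): drop columns in which more than max_col_gap of entries are gaps.
    keep_cols = [
        j for j in range(length)
        if sum(s[j] == gap for s in seqs) / n_seq <= max_col_gap
    ]
    trimmed = ["".join(s[j] for j in keep_cols) for s in seqs]

    # Rule (2): drop sequences whose gap fraction exceeds max_seq_gap.
    # Assumption: the first sequence is the query and is always kept.
    kept = []
    for idx, s in enumerate(trimmed):
        gap_frac = s.count(gap) / len(s) if s else 1.0
        if idx == 0 or gap_frac <= max_seq_gap:
            kept.append(s)
    return kept


toy_msa = ["ACD-", "AC--", "A---", "----"]
print(trim_msa(toy_msa))
```

In the toy alignment, the last column is all gaps and is removed, and the last sequence is all gaps and is dropped.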

Degree of conservation

The Kullback-Leibler relative entropy [5] is used to measure how far the observed frequency of amino acid A at the ith position departs from the frequency expected if A occurred randomly according to a background probability distribution.
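As a rough illustration, a per-position conservation score of this kind can be written as a binary Kullback-Leibler divergence between the observed frequency f of amino acid A in a column and its background frequency q: D = f ln(f/q) + (1 − f) ln((1 − f)/(1 − q)). The binary form, the function name `kl_conservation`, and the optional use of sequence weights are assumptions made for this sketch; the text above does not spell out the exact formula used by SAEC.

```python
import math


def kl_conservation(column, aa, background, weights=None):
    """Binary KL relative entropy of residue `aa` in one alignment column.

    column     : string with one character per sequence at this position
    aa         : the amino acid whose conservation is scored
    background : expected frequency q of `aa` (assumed 0 < q < 1)
    weights    : optional per-sequence weights (assumed; defaults to 1 each)
    """
    if weights is None:
        weights = [1.0] * len(column)
    total = sum(weights)
    # Weighted observed frequency of `aa` in this column.
    f = sum(w for c, w in zip(column, weights) if c == aa) / total

    def term(p, r):
        # Convention: 0 * log(0 / r) = 0.
        return 0.0 if p == 0.0 else p * math.log(p / r)

    return term(f, background) + term(1.0 - f, 1.0 - background)


# A fully conserved column scores high; a column matching the background scores 0.
print(kl_conservation("AAAA", "A", 0.5))
print(kl_conservation("AC", "A", 0.5))
```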

Pairwise couplings

A pairwise coupling is a pattern of covariation between two amino acid positions. Such couplings are conserved across evolution because of functional and structural constraints. They can be used to predict protein structures, binding interfaces and protein-protein interactions, and to guide protein design.
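To make the idea of covariation between two positions concrete, the sketch below computes the mutual information between two MSA columns. Mutual information is a simple, generic covariation measure used here only for illustration; it is not stated to be the coupling model used by SAEC, and the function name `mutual_information` is hypothetical. Sequences are weighted equally.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an MSA.

    msa is a list of equal-length aligned strings; all sequences count equally.
    """
    n = len(msa)
    count_i = Counter(s[i] for s in msa)
    count_j = Counter(s[j] for s in msa)
    count_ij = Counter((s[i], s[j]) for s in msa)

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Columns that always change together carry high mutual information.
coupled = ["AK", "AK", "DE", "DE"]
independent = ["AK", "AE", "DK", "DE"]
print(mutual_information(coupled, 0, 1))
print(mutual_information(independent, 0, 1))
```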

References
  • [1] S. R. Eddy, PLoS Computational Biology 7(10), e1002195 (2011)
  • [2] M. Mirdita et al., Nucleic Acids Research 45, D170–D176 (2017)
  • [3] N. J. Cheung, W. Yu, BMC Bioinformatics 20, 1–11 (2019)
  • [4] N. J. Cheung et al., Comput. Struct. Biotechnol. J. (2021)
  • [5] S. Kullback, R. A. Leibler, The Annals of Mathematical Statistics 22, 79–86 (1951)

© AmoAi. All rights reserved.