Bioinformatic Identification of Rare Codon Clusters (RCCs) in HBV Genome and Evaluation of RCCs in Proteins Structure of Hepatitis B Virus.
ABSTRACT: BACKGROUND: Hepatitis B virus (HBV) causes an infectious disease and has nine genotypes (A - I) and a 'putative' genotype J. OBJECTIVES: The aim of this study was to identify rare codon clusters (RCCs) in the HBV genome and to evaluate these RCCs in the structures of HBV proteins. METHODS: To assign protein family (Pfam) accession numbers to HBV proteins, the UniProt database and the Pfam search tool were used. Pfam is a comprehensive and accurate collection of protein domains and families, each represented by multiple sequence alignments and hidden Markov models (HMMs). It annotates each family with textual descriptions, links to other resources, and literature references, and genome projects have used it extensively for large-scale functional annotation of genomic data. The Pfam search tool identifies the Pfam families to which a protein belongs. The resulting Pfam IDs were analyzed with the Sherlocc program, and the locations of RCCs in the HBV genome and proteins were detected and reported as translated EMBL nucleotide sequence data library (TrEMBL) entries. TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all translations of European Molecular Biology Laboratory (EMBL) nucleotide sequence entries not yet integrated into SWISS-PROT. Furthermore, the structures of the proteins in these TrEMBL entries were looked up in the PDB database, and the 3D structures of the HBV proteins and the locations of RCCs were visualized and studied using Swiss PDB Viewer software®. RESULTS: The Pfam search tool found nine protein families in three frames. Analysis of these Pfam families with the Sherlocc program identified no RCCs in the external core antigen (PF08290) or the truncated HBeAg gene (PF08290) of HBV.
By contrast, RCCs were identified in the genes of hepatitis core antigen (PF00906; residues 224 - 234 and 251 - 255), large envelope protein S (PF00695; residues 53 - 56 and 70 - 84), X protein (PF00739; residues 10 - 24, 29 - 83, 95 - 99, 122 - 129, and 139 - 143), DNA polymerase (viral) N-terminal domain (PF00242; residues 59 - 62, 214 - 217, and 407 - 413), and protein P (PF00336; residues 225 - 228). In the HBV genome, seven RCCs were identified in the gene regions of hepatitis core antigen, large envelope protein S, and DNA polymerase, while the protein structures of the TrEMBL entry sequences found in the Sherlocc outputs were not complete. CONCLUSIONS: Based on the locations of the detected RCCs in the structures of HBV proteins, these RCCs may have a critical role in the correct folding of HBV proteins and can be considered as drug targets. The results of this study provide new and detailed perspectives on the structure of HBV proteins for further research and for designing new drugs to treat HBV.
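ILLUSTRATION: The idea of a rare codon cluster can be sketched with a simple sliding-window scan over one coding sequence. This is not the Sherlocc algorithm, which takes Pfam IDs as input and whose statistics are not described in this abstract. The function name, the codon frequencies (per thousand), the window size, and the threshold below are all hypothetical values chosen for illustration, not HBV or host data.

```python
from typing import Dict, List, Tuple


def find_rare_codon_clusters(
    cds: str,
    codon_freq: Dict[str, float],
    window: int = 7,
    threshold: float = 10.0,
) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive residue ranges in which the mean codon
    frequency over a sliding window falls below `threshold`."""
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Codons absent from the table get infinite frequency, so any window
    # containing one is never flagged.
    freqs = [codon_freq.get(c, float("inf")) for c in codons]

    clusters: List[Tuple[int, int]] = []
    for start in range(0, len(freqs) - window + 1):
        mean = sum(freqs[start:start + window]) / window
        if mean < threshold:
            first, last = start + 1, start + window
            # Merge with the previous cluster when windows overlap or touch.
            if clusters and first <= clusters[-1][1] + 1:
                clusters[-1] = (clusters[-1][0], last)
            else:
                clusters.append((first, last))
    return clusters


# Hypothetical usage table (occurrences per thousand codons).
toy_freq = {
    "ATG": 22.0, "CTG": 40.0, "GCC": 28.0, "AAG": 32.0,
    "CTA": 7.0, "CGA": 6.0, "TCG": 4.0,
}
# Toy coding sequence with four consecutive low-frequency codons.
toy_cds = "".join(
    ["ATG", "CTG", "GCC", "AAG", "CTA", "CGA", "TCG", "CTA", "CTG", "GCC", "AAG"]
)

clusters = find_rare_codon_clusters(toy_cds, toy_freq, window=3, threshold=10.0)
print(clusters)  # [(5, 8)]
```

The ranges are reported as residue positions, matching the way the RCC locations are listed in the results above; a residue at position r corresponds to nucleotides 3(r - 1) + 1 through 3r of the coding sequence.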
PROVIDER: S-EPMC5116127 | BioStudies |