Other

Dataset Information

Cryptic endogenous retrovirus subfamilies in the primate lineage


ABSTRACT: Many endogenous retroviruses (ERVs) in the human genome are primate-specific and contribute novel cis-regulatory elements and transcripts. Classification and annotation of ERVs, which are flanked by long terminal repeats (LTRs) containing sequences that control their transcriptional activity, are important for better understanding their evolution and potential roles in the host. Here, we observed that many of the currently annotated LTR subfamilies that spread in the primate lineage have subsets of instances that appear to be misclassified. Focusing on the evolutionarily young MER11A/B/C subfamilies, we performed a phylogenetic analysis and relied on cross-species conservation to reveal the presence of four phyletic groups, suggesting a new annotation for 412 (19.8%) of the MER11 instances. Next, we showed that the epigenetic heterogeneity observed within the MER11A/B/C subfamilies was better explained in the context of these new phyletic groups. Using a massively parallel reporter assay (MPRA), we also demonstrated the regulatory potential of the four phyletic groups and identified motifs that were associated with their differential activities. The MPRA, combined with the use of phyletic groups across primates, revealed an ape-specific gain of SOX-related motifs through a single-nucleotide deletion. Finally, we applied a similar approach across all 53 simian-specific LTR subfamilies and determined the presence of 75 phyletic groups. We found that 3,807 (30.0%) instances from 26 of these LTR subfamilies changed annotation and could be assigned to one of these novel phyletic groups, many of which have a distinct epigenetic profile. This refined annotation of simian-specific LTRs could improve our understanding of the evolution of ERVs in primate genomes and reveal functional signals that would otherwise have been missed.

ORGANISM(S): Homo sapiens

PROVIDER: GSE245662 | GEO | 2023/10/23

REPOSITORIES: GEO

Similar Datasets

2018-02-13 | GSE102989 | GEO
2014-08-04 | E-GEOD-56567 | biostudies-arrayexpress
2014-08-04 | E-GEOD-56568 | biostudies-arrayexpress
2014-08-04 | E-GEOD-54848 | biostudies-arrayexpress
2022-04-01 | GSE186430 | GEO
2014-08-04 | GSE56568 | GEO
2014-08-04 | GSE56567 | GEO
2014-08-04 | GSE54848 | GEO
2013-03-15 | E-GEOD-33367 | biostudies-arrayexpress
2014-06-26 | E-GEOD-57092 | biostudies-arrayexpress