Insertion and deletion polymorphisms of the ancient AluS family in the human genome.
ABSTRACT: Polymorphic Alu elements account for 17% of structural variants in the human genome. The majority of these belong to the youngest AluY subfamilies, and most structural variant discovery efforts have focused on identifying Alu polymorphisms from these currently retrotranspositionally active subfamilies. In this report we analyze polymorphisms from the evolutionarily older AluS subfamily, whose peak activity was tens of millions of years ago. We annotate the AluS polymorphisms, assess their likely mechanism of origin, and evaluate their contribution to structural variation in the human genome. Of 52 previously reported polymorphic AluS elements ascertained for this study, 48 were confirmed to belong to the AluS subfamily using high stringency subfamily classification criteria. Of these, the majority (77%, 37/48) appear to be deletion polymorphisms. Two polymorphic AluS elements (4%) have features of non-classical Alu insertions and one polymorphic AluS element (2%) likely inserted by a mechanism involving internal priming. Seven AluS polymorphisms (15%) appear to have arisen by the classical target-primed reverse transcription (TPRT) retrotransposition mechanism. These seven TPRT products are 3' intact with 3' poly-A tails, and are flanked by target site duplications; L1 ORF2p endonuclease cleavage sites were also observed, providing additional evidence that these are L1 ORF2p endonuclease-mediated TPRT insertions. Further sequence analysis showed strong conservation of both the RNA polymerase III promoter and SRP9/14 binding sites, important for mediating transcription and interaction with retrotransposition machinery, respectively.
This conservation of functional features implies that some of these are fairly recent insertions since they have not diverged significantly from their respective retrotranspositionally competent source elements. Of the polymorphic AluS elements evaluated in this report, 15% (7/48) have features consistent with TPRT-mediated insertion, thus suggesting that some AluS elements have been more active recently than previously thought, or that fixation of AluS insertion alleles remains incomplete. These data expand the potential significance of polymorphic AluS elements in contributing to structural variation in the human genome. Future discovery efforts focusing on polymorphic AluS elements are likely to identify more such polymorphisms, and approaches tailored to identify deletion alleles may be warranted.
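The TPRT hallmarks named in this abstract (a 3' poly-A tail and flanking target site duplications) can be checked on a single locus with a few lines of Python. This is only an illustrative sketch, not the authors' annotation procedure: the function names, the exact-match rule for the duplication, and the thresholds `min_tsd` and `min_tail` are all assumptions introduced here.

```python
def longest_tsd(upstream: str, downstream: str, max_len: int = 25) -> str:
    """Return the longest sequence that ends the upstream flank and also
    starts the downstream flank (an exact-match target site duplication)."""
    upstream, downstream = upstream.upper(), downstream.upper()
    limit = min(max_len, len(upstream), len(downstream))
    for k in range(limit, 0, -1):  # try the longest candidate first
        if upstream[-k:] == downstream[:k]:
            return downstream[:k]
    return ""


def poly_a_tail_length(element: str) -> int:
    """Count the uninterrupted run of A residues at the 3' end of the element."""
    element = element.upper()
    return len(element) - len(element.rstrip("A"))


def looks_like_tprt(upstream: str, element: str, downstream: str,
                    min_tsd: int = 5, min_tail: int = 8) -> bool:
    """Hypothetical rule of thumb: both hallmarks must pass their thresholds."""
    has_tsd = len(longest_tsd(upstream, downstream)) >= min_tsd
    has_tail = poly_a_tail_length(element) >= min_tail
    return has_tsd and has_tail
```

Real target site duplications often carry mismatches and real tails are A-rich rather than pure poly-A, so a practical check would need to tolerate both; the exact-match version above only shows the shape of the test.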
Project description:Alu insertions have contributed to >11% of the human genome and ~30-35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired-end reads. Comparison with published calls obtained using PacBio long-reads indicates a false discovery rate below 5%, at the cost of reduced sensitivity due to the colocation of reference and non-reference repeats. We generate a highly accurate call set of 1614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project (HGDP) panel. We utilize the reconstructed alternative insertion haplotypes to genotype 1010 fully assembled insertions, obtaining >99% agreement with genotypes obtained by PCR. In our assembled sequences, we find evidence of premature insertion mechanisms and observe 5' truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5' truncations. Additionally, we identified variable AluJ and AluS elements that likely arose due to non-retrotransposition mechanisms.
Project description:Sequence analysis of the orangutan genome revealed that recent proliferative activity of Alu elements has been uncharacteristically quiescent in the Pongo (orangutan) lineage, compared with all previously studied primate genomes. With relatively few young polymorphic insertions, the genomic landscape of the orangutan seemed like the ideal place to search for a driver, or source element, of Alu retrotransposition. Here we report the identification of a nearly pristine insertion possessing all the known putative hallmarks of a retrotranspositionally competent Alu element. It is located in an intronic sequence of the DGKB gene on chromosome 7 and is highly conserved in Hominidae (the great apes), but absent from Hylobatidae (gibbon and siamang). We provide evidence for the evolution of a lineage-specific subfamily of this shared Alu insertion in orangutans and possibly the lineage leading to humans. In the orangutan genome, this insertion contains three orangutan-specific diagnostic mutations which are characteristic of the youngest polymorphic Alu subfamily, AluYe5b5_Pongo. In the Homininae lineage (human, chimpanzee and gorilla), this insertion has acquired three different mutations which are also found in a single human-specific Alu insertion. This seemingly stealth-like amplification, ongoing at a very low rate over millions of years of evolution, suggests that this shared insertion may represent an ancient backseat driver of Alu element expansion.
Project description:Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline, TypeTE, which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotypes for >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and is a valuable addition to the population genomics toolbox.
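The genotype-likelihood step described above can be illustrated with a generic biallelic binomial model. The abstract does not specify TypeTE's actual likelihood model, so everything below is an assumption made for illustration: the function names, the per-read `error` rate, and the idea that each re-mapped read is counted as supporting either the absence or the presence allele.

```python
import math


def genotype_log_likelihoods(n_absence: int, n_presence: int,
                             error: float = 0.01) -> dict:
    """Log10 likelihood of the read counts under each diploid genotype.

    Genotypes are written as insertion-allele dosage: 0/0 (no insertion),
    0/1 (heterozygous), 1/1 (homozygous insertion). The binomial coefficient
    is omitted because it is identical for all three genotypes.
    """
    # Probability that a single read supports the presence allele.
    p_presence = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: n_presence * math.log10(p) + n_absence * math.log10(1.0 - p)
        for gt, p in p_presence.items()
    }


def call_genotype(n_absence: int, n_presence: int, error: float = 0.01) -> str:
    """Return the genotype with the highest likelihood."""
    ll = genotype_log_likelihoods(n_absence, n_presence, error)
    return max(ll, key=ll.get)
```

For example, `call_genotype(10, 9)` favors the heterozygous call because neither homozygous genotype can explain a near-even split of reads without invoking many errors.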
Project description:The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages.
Project description:BACKGROUND: Alu polymorphisms are some of the most common polymorphisms in the genome, yet few methods have been developed for their detection. METHODS: We present algorithms to discover Alu polymorphisms using paired-end high throughput sequencing data from multiple individuals. We consider the problem of identifying sites containing polymorphic Alu insertions. RESULTS: We give efficient and practical algorithms that detect polymorphic Alus, both those that are inserted with respect to the reference genome and those that are deleted. The algorithms have a linear time complexity and can be run on a standard desktop machine in a very short amount of time on top of the output of standard sequencing analysis tools. CONCLUSIONS: In our simulated dataset we are able to locate 98.1% of Alus inserted with respect to the reference and 97.7% of Alus deleted. Our simulations also show an excellent correlation between the deletions detected in parents and children. We further run our algorithms on publicly available data from the 1000 Genomes Project and find several thousand Alu polymorphisms in each individual.
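The abstract above does not spell out its algorithms, so the following is not a reconstruction of them. It is a generic single-pass sketch of one common idea for paired-end data: collect the reference positions of reads whose mates align to an Alu sequence, then group nearby positions into candidate insertion sites. The function name, the `max_gap` and `min_support` parameters, and the assumption that positions arrive already sorted are all hypothetical.

```python
def cluster_anchor_positions(positions, max_gap=500, min_support=3):
    """Group sorted anchor-read positions into candidate insertion sites.

    positions   -- reference coordinates (one chromosome), sorted ascending
    max_gap     -- largest distance between neighbours within one cluster
    min_support -- minimum number of anchor reads needed to report a cluster

    Returns a list of (start, end, read_count) tuples. Each position is
    visited once, so the pass is linear in the number of anchor reads.
    """
    clusters = []
    start = prev = None
    count = 0
    for pos in positions:
        if prev is not None and pos - prev <= max_gap:
            count += 1  # extend the current cluster
        else:
            if count >= min_support:
                clusters.append((start, prev, count))
            start, count = pos, 1  # open a new cluster
        prev = pos
    if count >= min_support:  # flush the final cluster
        clusters.append((start, prev, count))
    return clusters
```

With `positions = [100, 150, 300, 5000, 9000, 9100, 9150, 9400]` and the default parameters, the isolated read at 5000 is discarded and two candidate sites remain.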
Project description:Background: The vast majority of the 1.1 million Alu elements are retrotranspositionally inactive, and only a few loci, referred to as 'source elements', can generate new Alu insertions. The first step in identifying the active Alu sources is to determine the loci transcribed by RNA polymerase III (pol III). Previous genome-wide analyses from normal and transformed cell lines identified multiple Alu loci occupied by pol III factors, making them candidate source elements. Findings: Analysis of the data from these genome-wide studies determined that the majority of pol III-bound Alus belonged to the older subfamilies AluS and AluJ, which varied between cell lines from 62.5% to 98.7% of the identified loci. The pol III-bound Alus were further scored for estimated retrotransposition potential (ERP) based on the absence or presence of selected sequence features associated with Alu retrotransposition capability. Our analyses indicate that most of the pol III-bound Alu loci candidates identified lack the sequence characteristics important for retrotransposition. Conclusions: These data suggest that Alu expression likely varies by cell type, growth conditions and transformation state. This variation could extend to the same cell lines presenting different Alu expression patterns in different laboratories. The vast majority of Alu loci potentially transcribed by RNA pol III lack important sequence features for retrotransposition and the majority of potentially active Alu loci in the genome (scored high ERP) belong to young Alu subfamilies. Our observations suggest that in an in vivo scenario, the contribution of Alu activity to somatic genetic damage may significantly vary between individuals and tissues.
Project description:Background: The evolution of Alu elements has been ongoing in primate lineages and Alu insertion polymorphisms are widely used in phylogenetic and population genetics studies. Alu subfamilies in the squirrel monkey (Saimiri), a New World Monkey (NWM), were recently reported. Squirrel monkeys are commonly used in biomedical research and often require species identification. The purpose of this study was two-fold: 1) Perform locus-specific PCR analyses on recently integrated Alu insertions in Saimiri to determine their amplification dynamics, and 2) Identify a subset of Alu insertion polymorphisms with species informative allele frequency distributions between the Saimiri sciureus and Saimiri boliviensis groups. Results: PCR analyses were performed on a DNA panel of 32 squirrel monkey individuals for 382 Alu insertion events ≤2% diverged from 46 different Alu subfamily consensus sequences, 25 Saimiri specific and 21 NWM specific Alu subfamilies. Of the 382 loci, 110 were polymorphic for presence/absence among squirrel monkey individuals, 35 elements from 14 different Saimiri specific Alu subfamilies and 75 elements from 19 different NWM specific Alu subfamilies (13 of 46 subfamilies analyzed did not contain polymorphic insertions). Of the 110 Alu insertion polymorphisms, 51 had species informative allele frequency distributions between Saimiri sciureus and Saimiri boliviensis groups. Conclusions: This study confirms the evolution of Alu subfamilies in Saimiri and provides evidence for an ongoing and prolific expansion of these elements in Saimiri with many active subfamilies concurrently propagating. The subset of polymorphic Alu insertions with species informative allele frequency distribution between Saimiri sciureus and Saimiri boliviensis will be instructive for specimen identification and conservation biology.
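Comparing insertion allele frequencies between two groups, as in the squirrel monkey study above, reduces to simple arithmetic on PCR presence/absence genotypes. The sketch below assumes each individual is coded by its number of insertion alleles (0, 1 or 2); the function names are hypothetical, and the abstract does not state what frequency difference the authors treated as "species informative".

```python
def insertion_allele_frequency(genotypes):
    """Frequency of the insertion allele in a group of diploid individuals.

    genotypes -- iterable of insertion-allele counts per individual
                 (0 = absent, 1 = heterozygous, 2 = homozygous present)
    """
    genotypes = list(genotypes)
    if not genotypes:
        raise ValueError("at least one genotyped individual is required")
    return sum(genotypes) / (2 * len(genotypes))


def frequency_difference(group_a, group_b):
    """Absolute difference in insertion allele frequency between two groups."""
    return abs(insertion_allele_frequency(group_a)
               - insertion_allele_frequency(group_b))
```

A locus fixed for the insertion in one group and absent from the other gives a difference of 1.0, the most informative case for specimen identification.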
Project description:The Alu family is a highly successful group of non-LTR retrotransposons ubiquitously found in primate genomes. Similar to the L1 retrotransposon family, Alu elements integrate primarily through an endonuclease-dependent mechanism termed target site-primed reverse transcription (TPRT). Recent studies have suggested that, in addition to TPRT, L1 elements occasionally utilize an alternative endonuclease-independent pathway for genomic integration. To determine whether an analogous mechanism exists for Alu elements, we have analyzed three publicly available primate genomes (human, chimpanzee and rhesus macaque) for endonuclease-independent recently integrated or lineage specific Alu insertions. We recovered twenty-three examples of such insertions and show that these insertions are recognizably different from classical TPRT-mediated Alu element integration. We suggest a role for this process in DNA double-strand break repair and present evidence to suggest its association with intra-chromosomal translocations, in-vitro RNA recombination (IVRR), and synthesis-dependent strand annealing (SDSA).
Project description:Alu elements are the most active and predominant type of short interspersed elements (SINEs) in the human genome. Recently inserted polymorphic (for presence/absence) Alu elements contribute to genome diversity among different human populations, and they are useful genetic markers for population genetic studies. The objective of this study is to identify polymorphic Alu insertions through an in silico comparative genomics approach and to analyze their distribution pattern throughout the human genome. By computationally comparing the public and Celera sequence assemblies of the human genome, we identified a total of 800 polymorphic Alu elements. We used polymerase chain reaction-based assays to screen a randomly selected set of 16 of these 800 Alu insertion polymorphisms using a human diversity panel to demonstrate the efficiency of our approach. Based on sequence analysis of the 800 Alu polymorphisms, we report three new Alu subfamilies, Ya3, Ya4b, and Yb11, with Yb11 being the smallest known Alu subfamily. Analysis of retrotransposition activity revealed Yb11, Ya8, Ya5, Yb9, and Yb8 as the most active Alu subfamilies and the maintenance of a very low level of retrotransposition activity or recent gene conversion events involving S subfamilies. The 800 polymorphic Alu insertions are characterized by the presence of target site duplications (TSDs) and longer than average polyA-tail length. Their pre-integration sites largely follow an extended "NT-AARA" motif. Among chromosomes, the density of Alu insertion polymorphisms is positively correlated with the Alu-site availability and is inversely correlated with the densities of older Alu elements and genes.
Project description:A human Alu repeat subfamily (the PV subfamily) whose members include insertional polymorphisms is found, as predicted, to differ by five tightly linked mutations relative to another subfamily of recently inserted Alu repeats. Based on these sequence differences some of the small number of polymorphic Alus can be selected from the background of nearly one million member sequences which are fixed in the human genome. Shared patterns of mutations suggest that PV subfamily members are the progeny of several different founder sequences. The additional observation that all members of the PV subfamily end in a stretch of uninterrupted polyadenine residues rather than merely A-rich sequences is evidence for post-transcriptional polyadenylation of the presumptive RNA intermediate. The drift of polyadenine sequences toward tandemly repeated A-rich motifs suggests a biological function that may select for the fixation of dispersed Alu repeats.