The role of recombination in the origin and evolution of Alu subfamilies.
ABSTRACT: Alus are the most abundant and successful short interspersed nuclear elements found in primate genomes. In humans, they represent about 10% of the genome, although few are retrotransposition-competent, and they are clustered into subfamilies according to the source gene from which they evolved. Recombination between them can lead to genomic rearrangements of clinical and evolutionary significance. In this study, we have addressed the role of recombination in the origin of chimeric Alu source genes by the analysis of all known consensus sequences of human Alus. From the allelic diversity of Alu consensus sequences, validated in extant elements resulting from whole genome searches, distinct events of recombination were detected in the origin of particular subfamilies of AluS and AluY source genes. These results indicate that at least two subfamilies are likely to have emerged from ectopic Alu-Alu recombination, which should stimulate further research into the potential of chimeric active Alus to punctuate the genome.
Project description:The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages.
Project description:Relative to genomes of other sequenced organisms, the human genome appears particularly enriched for large, highly homologous segmental duplications (≥90% sequence identity and ≥10 kbp in length). The molecular basis for this enrichment is unknown. We sought to gain insight into the mechanism of origin by systematically examining sequence features at the junctions of duplications. We analyzed 9,464 junctions within regions of high-quality finished sequence from a genomewide set of 2,366 duplication alignments. We observed a highly significant (P < .0001) enrichment of Alu short interspersed element (SINE) sequences near or within the junction. Twenty-seven percent of all segmental duplications terminated within an Alu repeat. The Alu junction enrichment was most pronounced for interspersed segmental duplications separated by ≥1 Mb of intervening sequence. Alu elements at the junctions showed higher levels of divergence, consistent with Alu-Alu-mediated recombination events. When we classified Alu elements into major subfamilies, younger elements (AluY and AluS) accounted for the enrichment, whereas the oldest primate family (AluJ) showed no enrichment. We propose that the primate-specific burst of Alu retroposition activity (which occurred 35-40 million years ago) sensitized the ancestral human genome for Alu-Alu-mediated recombination events, which, in turn, initiated the expansion of gene-rich segmental duplications and their subsequent role in nonallelic homologous recombination.
Project description:Polymorphic Alu elements account for 17% of structural variants in the human genome. The majority of these belong to the youngest AluY subfamilies, and most structural variant discovery efforts have focused on identifying Alu polymorphisms from these currently retrotranspositionally active subfamilies. In this report we analyze polymorphisms from the evolutionarily older AluS subfamily, whose peak activity was tens of millions of years ago. We annotate the AluS polymorphisms, assess their likely mechanism of origin, and evaluate their contribution to structural variation in the human genome. Of 52 previously reported polymorphic AluS elements ascertained for this study, 48 were confirmed to belong to the AluS subfamily using high stringency subfamily classification criteria. Of these, the majority (77%, 37/48) appear to be deletion polymorphisms. Two polymorphic AluS elements (4%) have features of non-classical Alu insertions and one polymorphic AluS element (2%) likely inserted by a mechanism involving internal priming. Seven AluS polymorphisms (15%) appear to have arisen by the classical target-primed reverse transcription (TPRT) retrotransposition mechanism. These seven TPRT products are 3' intact with 3' poly-A tails, and are flanked by target site duplications; L1 ORF2p endonuclease cleavage sites were also observed, providing additional evidence that these are L1 ORF2p endonuclease-mediated TPRT insertions. Further sequence analysis showed strong conservation of both the RNA polymerase III promoter and SRP9/14 binding sites, important for mediating transcription and interaction with retrotransposition machinery, respectively. This conservation of functional features implies that some of these are fairly recent insertions since they have not diverged significantly from their respective retrotranspositionally competent source elements. Of the polymorphic AluS elements evaluated in this report, 15% (7/48) have features consistent with TPRT-mediated insertion, thus suggesting that some AluS elements have been more active recently than previously thought, or that fixation of AluS insertion alleles remains incomplete. These data expand the potential significance of polymorphic AluS elements in contributing to structural variation in the human genome. Future discovery efforts focusing on polymorphic AluS elements are likely to identify more such polymorphisms, and approaches tailored to identify deletion alleles may be warranted.
Project description:Transposable elements (TEs) are interspersed DNA sequences that can move or copy to new positions within a genome. TEs are believed to promote speciation and their activities play a significant role in human disease. In the human genome, the 22 AluY and 6 AluS TE subfamilies have been the most recently active, and their transposition has been implicated in many inherited human diseases and in various forms of cancer. Therefore, understanding their transposition activity is very important and identifying the factors that affect their transpositional activity is of great interest. Recently, some work has been done to quantify the activity levels of active Alu TEs based on variation in the sequence. Given this activity data, an analysis of TE activity based on the position of mutations is conducted. A method/simulation is created to computationally predict so-called harmful mutation regions in the consensus sequence of a TE; that is, mutations that occur in these regions decrease the transpositional activity dramatically. The methods are applied to the most active subfamily, AluY, to identify the harmful regions, and seven harmful regions are identified within the AluY consensus with q-values less than 0.05. A supplementary simulation also shows that the identified harmful regions covering the AluYa5 RNA functional regions are not occurring by chance. This method is then applied to two additional TE families: the Alu family and the L1 family, to computationally detect the harmful regions in these elements. We use a computational method to identify a set of harmful mutation regions. Mutations within the identified harmful regions decrease the transpositional activity of active elements. The correlation between the mutations within these regions and the transpositional activity of TEs is shown to be statistically significant. Verifications are presented using the activity of AluY elements and the secondary structure of the AluYa5 RNA, providing evidence that the method is successfully identifying harmful mutation regions.
Project description:BACKGROUND: Nuclear receptors are hormone-regulated transcription factors whose signaling controls numerous aspects of development and physiology. Many receptors recognize DNA hormone response elements formed by direct repeats of RGKTCA motifs separated by 1 to 5 bp (DR1-DR5). Although many such known response elements are conserved in the mouse and human genomes, it is unclear to what extent transcriptional regulation by nuclear receptors has evolved specifically in primates. RESULTS: We have mapped the positions of all consensus DR-type hormone response elements in the human genome, and found that DR2 motifs, recognized by retinoic acid receptors (RARs), are heavily overrepresented (108,582 elements). 90% of these are present in Alu repeats, which also contain lesser numbers of other consensus DRs, including 50% of consensus DR4 motifs. Few DR2s are in potentially mobile AluY elements and the vast majority are also present in chimp and macaque. 95.5% of Alu-DR2s are distributed throughout subclasses of AluS repeats, and arose largely through deamination of a methylated CpG dinucleotide in a non-consensus motif present in AluS sequences. We find that Alu-DR2 motifs are located adjacent to numerous known retinoic acid target genes, and show by chromatin immunoprecipitation assays in squamous carcinoma cells that several of these elements recruit RARs in vivo. These findings are supported by ChIP-on-chip data from retinoic acid-treated HL60 cells revealing RAR binding to several Alu-DR2 motifs. CONCLUSION: These data provide strong support for the notion that Alu-mediated expansion of DR elements contributed to the evolution of gene regulation by RARs and other nuclear receptors in primates and humans.
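The definition of a consensus DR element given above (two RGKTCA half-sites in direct orientation, separated by 1 to 5 bp) can be illustrated with a short scan. This is a minimal sketch, not the pipeline used in the study: the function names `revcomp` and `find_dr_elements` are hypothetical, the IUPAC codes are expanded as R = A/G and K = G/T, and both strands are searched on the assumption that a response element may lie in either orientation.

```python
import re

# IUPAC RGKTCA expanded to a regular expression: R = A or G, K = G or T.
HALF_SITE = "[AG]G[GT]TCA"

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_dr_elements(seq, spacers=range(1, 6)):
    """List consensus DR elements as (start, strand, spacer, matched_text).

    `start` is the 0-based leftmost coordinate on the input sequence;
    `matched_text` is read 5'->3' on the strand where the motif was found.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for n in spacers:
            # A lookahead is used so that overlapping elements are all reported.
            pattern = re.compile(f"(?=({HALF_SITE}[ACGTN]{{{n}}}{HALF_SITE}))")
            for m in pattern.finditer(s):
                start = m.start()
                if strand == "-":
                    # Map the coordinate back onto the forward strand.
                    start = len(s) - (m.start() + len(m.group(1)))
                hits.append((start, strand, n, m.group(1)))
    return sorted(hits)
```

Calling `find_dr_elements("TTAGGTCAAAAGGTCATT")` reports a single DR2 element on the forward strand starting at position 2. Intersecting such hits with repeat annotations would be a separate step that is not shown here.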
Project description:Methylation of the cytosine is the most frequent epigenetic modification of DNA in mammalian cells. In humans, most of the methylated cytosines are found in CpG-rich sequences within tandem and interspersed repeats that make up to 45% of the human genome, with Alu repeats being the most common family. Demethylation of Alu elements occurs in aging and cancer processes and has been associated with gene reactivation and genomic instability. By targeting the unmethylated SmaI site within the Alu sequence as a surrogate marker, we have quantified and identified unmethylated Alu elements on the genomic scale. Normal colon epithelial cells contain on average 25,486 ± 10,157 unmethylated Alus per haploid genome, while in tumor cells this figure is 41,995 ± 17,187 (P = 0.004). There is an inverse relationship in Alu families with respect to their age and methylation status: the youngest elements exhibit the highest prevalence of the SmaI site (AluY: 42%; AluS: 18%; AluJ: 5%) but the lowest rates of unmethylation (AluY: 1.65%; AluS: 3.1%; AluJ: 12%). Data are consistent with a stronger silencing pressure on the youngest repetitive elements, which are closer to genes. Further insights into the functional implications of atypical unmethylation states in Alu elements will surely contribute to deciphering genomic organization and gene regulation in complex organisms.
Project description:BACKGROUND: Abundant pseudogenes are a feature of mammalian genomes. Processed pseudogenes (PPs) are reverse transcribed from mRNAs. Recent molecular biological studies show that mammalian long interspersed element 1 (L1)-encoded proteins may have been involved in PP reverse transcription. Here, we present the first comprehensive analysis of human PPs using all known human genes as queries. RESULTS: The human genome was queried and 3,664 candidate PPs were identified. The most abundant were copies of genes encoding keratin 18, glyceraldehyde-3-phosphate dehydrogenase and ribosomal protein L21. A simple method was developed to estimate the level of nucleotide substitutions (and therefore the age) of PPs. A Poisson-like age distribution was obtained with a mean age close to that of the Alu repeats, the predominant human short interspersed elements. These data suggest a nearly simultaneous burst of PP and Alu formation in the genomes of ancestral primates. The peak period of amplification of these two distinct retrotransposons was estimated to be 40-50 million years ago. Concordant amplification of certain L1 subfamilies with PPs and Alus was observed. CONCLUSIONS: We suggest that a burst of formation of PPs and Alus occurred in the genome of ancestral primates. One possible mechanism is that proteins encoded by members of particular L1 subfamilies acquired an enhanced ability to recognize cytosolic RNAs in trans.
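The abstract above does not describe how substitution levels were converted into ages, so the following is only a generic illustration of the idea, not the authors' method. It assumes the Jukes-Cantor correction for multiple hits and a constant neutral substitution rate; the function names and the `subs_per_site_per_year` parameter are hypothetical, and the rate must be supplied by the reader.

```python
import math

def jukes_cantor_distance(p):
    """Substitutions per site estimated from the observed mismatch fraction p.

    Jukes-Cantor model: K = -(3/4) * ln(1 - (4/3) * p), defined for 0 <= p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def estimated_age_years(p, subs_per_site_per_year):
    """Age of an insertion, assuming substitutions accumulate at a constant rate.

    `subs_per_site_per_year` is an assumed neutral rate, not a value from the study.
    """
    if subs_per_site_per_year <= 0:
        raise ValueError("the substitution rate must be positive")
    return jukes_cantor_distance(p) / subs_per_site_per_year
```

Under this model the corrected distance always exceeds the raw mismatch fraction for p > 0, so ages computed from uncorrected divergence would be underestimates, increasingly so for older elements.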
Project description:BACKGROUND: The vast majority of the 1.1 million Alu elements are retrotranspositionally inactive, and only a few loci, referred to as 'source elements', can generate new Alu insertions. The first step in identifying the active Alu sources is to determine the loci transcribed by RNA polymerase III (pol III). Previous genome-wide analyses from normal and transformed cell lines identified multiple Alu loci occupied by pol III factors, making them candidate source elements. FINDINGS: Analysis of the data from these genome-wide studies determined that the majority of pol III-bound Alus belonged to the older subfamilies Alu S and Alu J, which varied between cell lines from 62.5% to 98.7% of the identified loci. The pol III-bound Alus were further scored for estimated retrotransposition potential (ERP) based on the absence or presence of selected sequence features associated with Alu retrotransposition capability. Our analyses indicate that most of the pol III-bound Alu loci candidates identified lack the sequence characteristics important for retrotransposition. CONCLUSIONS: These data suggest that Alu expression likely varies by cell type, growth conditions and transformation state. This variation could extend to the same cell lines presenting different Alu expression patterns in different laboratories. The vast majority of Alu loci potentially transcribed by RNA pol III lack important sequence features for retrotransposition and the majority of potentially active Alu loci in the genome (scored high ERP) belong to young Alu subfamilies. Our observations suggest that in an in vivo scenario, the contribution of Alu activity to somatic genetic damage may vary significantly between individuals and tissues.
Project description:BACKGROUND: Research into great ape genomes has revealed widely divergent activity levels over time for Alu elements. However, the diversity of this mobile element family in the genome of the western lowland gorilla has previously been uncharacterized. Alu elements are primate-specific short interspersed elements that have been used as phylogenetic and population genetic markers for more than two decades. Alu elements are present at high copy number in the genomes of all primates surveyed thus far. The AluY subfamily and its derivatives have been recognized as the evolutionarily youngest Alu subfamily in the Old World primate lineage. RESULTS: Here we use a combination of computational and wet-bench laboratory methods to assess and catalog AluY subfamily activity level and composition in the western lowland gorilla genome (gorGor3.1). A total of 1,075 independent AluY insertions were identified and computationally divided into 10 subfamilies, with the largest number of gorilla-specific elements assigned to the canonical AluY subfamily. CONCLUSIONS: The retrotransposition activity level appears to be significantly lower than that seen in the human and chimpanzee lineages, while higher than that seen in orangutan genomes, indicative of differential Alu amplification in the western lowland gorilla lineage as compared to other Homininae.
Project description:The primate-specific Alu elements, which originated 65 million years ago, exist in over a million copies in the human genome. These elements have been involved in genome shuffling and various diseases not only through retrotransposition but also through large scale Alu-Alu mediated recombination. Only a few subfamilies of Alus are currently retropositionally active and show insertion/deletion polymorphisms with associated phenotypes. Retroposition occurs by means of RNA intermediates synthesised by RNA polymerase III from a promoter comprising the A-Box and B-Box in these elements. Alus have also been shown to harbour a number of transcription factor binding sites, as well as hormone responsive elements. The distribution of Alus has been shown to be non-random in the human genome and these elements are increasingly being implicated in diverse functions such as transcription, translation, response to stress, nucleosome positioning and imprinting. We conducted a retrospective analysis of putative functional sites, such as the RNA pol III promoter elements, pol II regulatory elements like hormone responsive elements and ligand-activated receptor binding sites, in Alus of various evolutionary ages. We observe a progressive loss of the RNA pol III transcriptional potential with concomitant accumulation of RNA pol II regulatory sites. We also observe a significant over-representation of Alus harboring these sites in promoter regions of signaling and metabolism genes of chromosome 22, when compared to genes of information pathway components, structural and transport proteins. This difference is less significant between functional categories in the intronic regions of the same genes. Our study suggests that Alu elements, through retrotransposition, could distribute functional and regulatable promoter elements, which in the course of subsequent selection might be stabilized in the genome. Exaptation of regulatory elements in the preexisting genes through Alus could thus have contributed to evolution of novel regulatory networks in the primate genomes. With such a wide spectrum of regulatory sites present in Alus, it also becomes imperative to screen for variations in these sites in candidate genes, which are otherwise repeat-masked in studies pertaining to identification of predisposition markers.