LTR retrotransposon landscape in Medicago truncatula: more rapid removal than in rice.
ABSTRACT: BACKGROUND: Long terminal repeat retrotransposons (LTR elements) are ubiquitous eukaryotic transposable elements (TEs) that transpose through RNA intermediates. Accounting for a significant proportion of many plant genomes, LTR elements are well established as one of the major forces underlying the evolution of plant genome size, structure and function. The availability of more than 40% of the genomic sequence of the model legume Medicago truncatula (Mt) has made a comprehensive study of its LTR elements possible. RESULTS: We use a newly developed tool, LTR_FINDER, to identify LTR retrotransposons in the Mt genome and detect 526 full-length elements as well as a great number of copies related to them. These elements constitute about 9.6% of currently available genomic sequences. They are classified into 85 families, of which 64 are reported for the first time. The majority of the LTR retrotransposons belong to either the Copia or the Gypsy superfamily, and the others are categorized as TRIMs or LARDs by their length. We find that the copy number of Copia-like families is 3 times that of Gypsy-like ones, but the latter contribute more to the genome. The analysis of the primer binding site (PBS) and protein-coding domain structure of the LTR families reveals that they tend to use only 4-5 types of tRNAs and that many families have quite conserved ORFs in addition to known TE domains. For several important families, we describe in detail their abundance, conservation, insertion time and structure. We investigate the amplification-deletion pattern of the elements and find that the detectable full-length elements are relatively young and that most of them were inserted within the last 0.52 MY. We also estimate that more than ten million bp of the Mt genomic sequences have been removed by the deletion of LTR elements and that the removal of full-length structures has been more rapid in Mt than in rice. CONCLUSION: This report is the first comprehensive description and analysis of LTR retrotransposons in the Mt genome. Many important novel LTR families are discovered and their evolution is elucidated. Our results may outline the LTR retrotransposon landscape of the model legume.
Project description:BACKGROUND:Long terminal repeat (LTR) retrotransposons constitute a major fraction of the genomes of higher plants. For example, retrotransposons comprise more than 50% of the maize genome and more than 90% of the wheat genome. LTR retrotransposons are believed to have contributed significantly to the evolution of genome structure and function. The genome sequencing of selected experimental and agriculturally important species is providing an unprecedented opportunity to view the patterns of variation existing among the entire complement of retrotransposons in complete genomes. RESULTS:Using a new data-mining program, LTR_STRUC, (LTR retrotransposon structure program), we have mined the GenBank rice (Oryza sativa) database as well as the more extensive (259 Mb) Monsanto rice dataset for LTR retrotransposons. Almost two-thirds (37) of the 59 families identified consist of copia-like elements, but gypsy-like elements outnumber copia-like elements by a ratio of approximately 2:1. At least 17% of the rice genome consists of LTR retrotransposons. In addition to the ubiquitous gypsy- and copia-like classes of LTR retrotransposons, the rice genome contains at least two novel families of unusually small, non-coding (non-autonomous) LTR retrotransposons. CONCLUSIONS:Each of the major clades of rice LTR retrotransposons is more closely related to elements present in other species than to the other clades of rice elements, suggesting that horizontal transfer may have occurred over the evolutionary history of rice LTR retrotransposons. Like LTR retrotransposons in other species with relatively small genomes, many rice LTR retrotransposons are relatively young, indicating a high rate of turnover.
Project description:The evolutionary dynamics of long terminal repeat (LTR) retrotransposons in tree genomes has remained largely unknown. The availability of the complete genome sequences of the mulberry tree (Morus notabilis) has offered an unprecedented opportunity for us to characterize these retrotransposon elements. We investigated 202 and 114 families of Copia and Gypsy superfamilies, respectively, comprising 2916 intact elements in the mulberry genome. The tRNAMet was the most frequently used type of tRNA in both superfamilies. Phylogenetic analysis suggested that Copia and Gypsy from mulberry can be grouped into eight and six lineages, respectively. All previously characterized families of such elements could also be found in the mulberry genome. About 95% of the identified Copia and Gypsy full elements were estimated to have been inserted into the mulberry genome within the past 2–3 million years. Meanwhile, the estimated insertion times of members of the three most abundant families of the Copia superfamily (908 members from the three most abundant families) and Gypsy superfamily (783 members from the three most abundant families) revealed divergent life histories. Compared with the situation in Gypsy elements, three families of Copia elements are under positive selection pressure, which suggested that Copia elements may have a dominant influence in the evolution of mulberry genes. Analysis of insertion and deletion dynamics suggested that Copia and Gypsy elements exhibited a very long half-life in the mulberry genome. The present work provides new insights into the insertion and deletion dynamics of LTR retrotransposons, and it will greatly improve our understanding of the important roles transposable elements play in the architecture of the mulberry genome.
Project description:Long terminal repeat (LTR) retrotransposons constitute a significant part of eukaryotic genomes and play an important role in genome evolution, especially in plants. Jute is an important fiber crop with a large genome of 1,250 Mbp that is still mostly unexplored. In this study, we aimed to identify and characterize the LTR retrotransposons of jute in order to understand its genome better. The reverse transcriptase (RT) domains of Ty1-copia and Ty3-gypsy LTR retrotransposons of jute were amplified with degenerate primers, and their expression was examined by reverse transcription PCR. Copy numbers of the RT genes of Ty1-copia and Ty3-gypsy elements were determined by dot blot analysis. Sequence analysis revealed higher heterogeneity among Ty1-copia retrotransposons than among Ty3-gypsy retrotransposons and clustered each superfamily into three groups. Dot blot hybridization showed that the copy number of RT genes was higher in Ty1-copia than in Ty3-gypsy elements. Cumulatively, Ty1-copia and Ty3-gypsy elements may constitute around 19% of the jute genome, and two groups of Ty1-copia were found to be transcriptionally active. Since LTR retrotransposons constitute a large portion of the jute genome, these findings imply the importance of these elements in the evolution of the jute genome.
Project description:BACKGROUND:The three superfamilies of Long Terminal Repeat (LTR) retrotransposons are a widespread kind of transposable element and a major factor in eukaryotic genome evolution. In metazoans, recent studies suggested that Copia LTR-retrotransposons display specific dynamics compared with the more abundant and diverse Gypsy elements. Indeed, Copia elements show a relative scarcity and the prevalence of only a few clades in specific hosts. Thus, BEL/Pao seems to be the second most abundant superfamily. However, the generality of these assumptions remains to be assessed. Therefore, we carried out the first large-scale comparative genomic analysis of LTR-retrotransposons in molluscs. The aim of this study was to analyse the diversity, copy numbers, genomic proportions and distribution of LTR-retrotransposons in a large host phylum. RESULTS:We compared nine mollusc genomes and added LTR-retrotransposon sequences detected in databases for 47 additional species. We identified 1709 families, which enabled us to define 31 clades. We show that clade richness was highly dependent on the superfamily considered. We found only three Copia clades, including GalEa and Hydra, which appear to be widely distributed and highly dominant, as they account for 96% of the characterised Copia elements. Among the seven BEL/Pao clades identified, Sparrow and Surcouf are characterised for the first time. We found no BEL or Pao elements, but the rare clades Dan and Flow are present in molluscs. Finally, we characterised 21 Gypsy clades, only five of which had been previously described, the C-clade being the most abundant one. Even though they are found in the same number of host species, Copia elements are clearly less abundant than BEL/Pao elements in copy number or genomic proportions, while Gypsy elements are always the most abundant whatever the parameter considered. CONCLUSIONS:Our analysis confirms the contrasting dynamics of Copia and Gypsy elements in metazoans and indicates that BEL/Pao represents the second most abundant superfamily, probably reflecting an intermediate dynamic. Altogether, the data obtained in several taxa strongly suggest that these patterns can be generalised to most metazoans. Finally, we highlight the importance of using database information to complement genome analyses when analysing transposable element diversity.
Project description:Long terminal repeat (LTR) retrotransposons are highly abundant in plant genomes and require transcriptional activity for their proliferative mode of replication. These sequences exist in plant genomes as diverse sublineages within the main element superfamilies (i.e., gypsy and copia). While transcriptional activity of these elements is increasingly recognized as a regular attribute of plant transcriptomes, the extent to which different sublineages of these elements are transcriptionally active, both within and across species, is currently unknown. In the current report, we utilize next generation sequencing methods to examine genomic copy number abundance of diverse LTR retrotransposon sublineages and their corresponding levels of transcriptional activity in three diploid wild sunflower species, Helianthus agrestis, H. carnosus and H. porteri. The diploid sunflower species under investigation differ in genome size 2.75-fold, with 2C values of 22.93 for H. agrestis, 12.31 for H. carnosus and 8.33 for H. porteri. The same diverse gypsy and copia sublineages of LTR retrotransposons were identified across species, but with gypsy sequences consistently more abundant than copia and with global gypsy sequence abundance positively correlated with nuclear genome size. Transcriptional activity was detected for multiple copia and gypsy sequences, with significantly higher activity levels detected for copia versus gypsy. Interestingly, of 11 elements identified as transcriptionally active, 5 exhibited detectable expression in all three species and 3 exhibited detectable expression in two species. Combined analyses of LTR retrotransposon genomic abundance and transcriptional activity across three sunflower species provide novel insights into genome size evolution and transposable element dynamics in this group. 
Despite considerable variation in nuclear genome size among species, relatively conserved patterns of LTR retrotransposon transcriptional activity were observed, with a highly overlapping set of copia and gypsy sequences observed to be transcriptionally active across species. A higher proportion of copia versus gypsy elements were found to be transcriptionally active and these sequences also were expressed at higher levels.
Project description:The most abundant transposable elements (TEs) in plant genomes are Class I long terminal repeat (LTR) retrotransposons, represented by the superfamilies gypsy and copia. Amplification of these superfamilies directly impacts genome structure and contributes to differential patterns of genome size evolution among plant lineages. Utilizing short-read Illumina data and sequence information from a panel of Helianthus annuus (sunflower) full-length gypsy and copia elements, we explore the contribution of these sequences to genome size variation among eight diploid Helianthus species and an outgroup taxon, Phoebanthus tenuifolius. We also explore transcriptional dynamics of these elements in both leaf and bud tissue via RT-PCR. We demonstrate that most LTR retrotransposon sublineages (i.e., families) display patterns of similar genomic abundance across species. A small number of LTR retrotransposon sublineages exhibit lineage-specific amplification, particularly in the genomes of species with larger estimated nuclear DNA content. RT-PCR assays reveal that some LTR retrotransposon sublineages are transcriptionally active across all species and tissue types, whereas others display species-specific and tissue-specific expression. The species with the largest estimated genome size, H. agrestis, has experienced amplification of LTR retrotransposon sublineages, some of which have proliferated independently in other lineages in the Helianthus phylogeny.
Project description:BACKGROUND: LTR Retrotransposons transpose through reverse transcription of an RNA intermediate and are ubiquitous components of all eukaryotic genomes thus far examined. Plant genomes, in particular, have been found to be comprised of a remarkably high number of LTR retrotransposons. There is a significant body of direct and indirect evidence that LTR retrotransposons have contributed to gene and genome evolution in plants. RESULTS: To explore the evolutionary history of long terminal repeat (LTR) retrotransposons and their impact on the genome of Oryza sativa, we have extended an earlier computer-based survey to include all identifiable full-length, fragmented and solo LTR elements in the rice genome database as of April 2002. A total of 1,219 retroelement sequences were identified, including 217 full-length elements, 822 fragmented elements, and 180 solo LTRs. In order to gain insight into the chromosomal distribution of LTR-retrotransposons in the rice genome, a detailed examination of LTR-retrotransposon sequences on Chromosome 10 was carried out. An average of 22.3 LTR-retrotransposons per Mb were detected in Chromosome 10. CONCLUSIONS: Gypsy-like elements were found to be >4 x more abundant than copia-like elements. Eleven of the thirty-eight investigated LTR-retrotransposon families displayed significant subfamily structure. We estimate that at least 46.5% of LTR-retrotransposons in the rice genome are older than the age of the species (< 680,000 years). LTR-retrotransposons present in the rice genome range in age from those just recently inserted up to nearly 10 million years old. Approximately 20% of LTR retrotransposon sequences lie within putative genes. The distribution of elements across chromosome 10 is non-random with the highest density (48 elements per Mb) being present in the pericentric region.
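The counts in the rice survey above can be turned into shares with a few lines of arithmetic. The following is only an illustrative sketch in Python, not part of the original study: the variable names are hypothetical, and the solo-to-full-length ratio is shown because such ratios are often read as a rough indicator of how much removal by recombination between LTRs has occurred, which is an interpretation the abstract itself does not state.

```python
# Counts reported in the abstract above (rice genome database as of April 2002).
# Variable names are illustrative only.
counts = {"full-length": 217, "fragmented": 822, "solo LTR": 180}

# The abstract reports 1,219 retroelement sequences in total.
total = sum(counts.values())

# Percentage share of each category among all identified sequences.
shares = {kind: 100.0 * n / total for kind, n in counts.items()}

# Ratio of solo LTRs to full-length elements (a rough, assumption-laden
# proxy for removal of elements by recombination between their LTRs).
solo_to_full = counts["solo LTR"] / counts["full-length"]

for kind, pct in shares.items():
    print(f"{kind}: {counts[kind]} sequences ({pct:.1f}% of {total})")
print(f"solo LTR : full-length ratio = {solo_to_full:.2f}")
```

The three categories sum to the reported total, so the shares are computed against the same denominator the abstract uses.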
Project description:Improved knowledge of genome composition, especially of its repetitive component, generates important information for both theoretical and applied research. The olive repetitive component is made up of two main classes of sequences: tandem repeats and retrotransposons (REs). In this study, we provide a characterization of a sample of 254 unique full-length long terminal repeat (LTR) REs. In the sample, Ty1-Copia elements were more numerous than Ty3-Gypsy elements. Mapping a large set of Illumina whole-genome shotgun reads onto the identified retroelement set revealed that Gypsy elements are more redundant than Copia elements. The insertion time of intact retroelements was estimated from the divergence between sister LTRs. Although some elements inserted relatively recently, the mean insertion age of the isolated retroelements is around 18 million years. Gypsy and Copia retroelements showed different waves of transposition, with Gypsy elements especially active between 10 and 25 million years ago and nearly inactive in the last 7 million years. The occurrence of numerous solo-LTRs related to isolated full-length retroelements was ascertained for two Gypsy elements and one Copia element. Overall, the results reported in this study show that RE activity (both retrotransposition and DNA loss) has impacted the olive genome structure in more ancient times than in other angiosperms.
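Dating an insertion from the divergence of an element's two LTRs, as mentioned above, relies on the two LTRs being identical at insertion and accumulating substitutions independently afterwards, so that T = K / (2r). The Python sketch below illustrates that idea under stated assumptions and is not the method of any study quoted here: it uses the Kimura two-parameter distance for K, and the default rate of 1.3e-8 substitutions per site per year is a value often used for grasses, chosen here purely for illustration. The function names and the toy sequences are hypothetical.

```python
import math

# Assumed substitution rate (substitutions per site per year). This value is
# an illustrative assumption, not taken from the abstracts above.
ASSUMED_RATE = 1.3e-8

def kimura_2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two pre-aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    purines = {"A", "G"}
    valid = {"A", "C", "G", "T"}
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in purines) == (b in purines):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)

def insertion_time_years(distance: float, rate: float = ASSUMED_RATE) -> float:
    """Insertion time T = K / (2r): both LTRs diverge from one ancestor."""
    return distance / (2 * rate)

if __name__ == "__main__":
    # Toy aligned LTR fragments (hypothetical, for demonstration only).
    ltr5 = "ACGTTGCAAGGCTTAACGGA"
    ltr3 = "ACGTTGCAAGGCTTAGCGGA"
    k = kimura_2p_distance(ltr5, ltr3)
    print(f"K = {k:.4f}, estimated insertion time = {insertion_time_years(k):,.0f} years")
```

Because the estimate scales inversely with the assumed rate, ages computed with different rates (or different distance corrections) are not directly comparable across studies.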
Project description:BACKGROUND:Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which are, due to the high diversity of their nucleotide sequences, usually specific to a single or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to analyzing a narrow range of plant species or to analyzing a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons. RESULTS:We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages, respectively. We also characterized various features of LTR-retrotransposon sequences, including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains. CONCLUSIONS:We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes. Access to REXdb-related tools is implemented in the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or through a standalone version of REXdb that can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).
Project description:Two retrotransposons from the superfamilies Copia and Gypsy, named Copia-LTR_SS and Gypsy-LTR_SS, respectively, were identified in the genome database of Sclerotinia sclerotiorum. These transposable elements (TEs) contained direct and conserved long terminal repeats (LTRs). Domains corresponding to the coding regions for gag protein, integrase, reverse transcriptase and RNAse H were identified in Copia-LTR_SS, whereas in Gypsy-LTR_SS only domains for gag, reverse transcriptase and RNAse H were found. The abundance of solo LTRs identified suggested possible genetic recombination events in the S. sclerotiorum genome. Furthermore, alignment of the LTR element sequences from each superfamily suggested the presence of a RIP (repeat-induced point mutation) silencing mechanism that may directly affect the evolution of this species.