TE-Tracker: systematic identification of transposition events through whole-genome resequencing.
ABSTRACT: BACKGROUND: Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. RESULTS: We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker . CONCLUSIONS: We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
Project description:<h4>Background</h4>Transposable elements (TE) are mobile genetic entities present in nearly all genomes. Previous work has shown that TEs tend to have a different nucleotide composition than the host genes, either considering codon usage bias or dinucleotide frequencies. We show here how these compositional differences can be used as a tool for detection and analysis of TE sequences.<h4>Results</h4>We compared the composition of TE sequences and host gene sequences using probabilistic models of nucleotide sequences. We used hidden Markov models (HMM), which take into account the base composition of the sequences (occurrences of words n nucleotides long, with n ranging here from 1 to 4) and the heterogeneity between coding and non-coding parts of sequences. We analyzed three sets of sequences containing class I TEs, class II TEs and genes respectively in three species: Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. Each of these sets had a distinct, homogeneous composition, enabling us to distinguish between the two classes of TE and the genes. However the particular base composition of the TEs differed in the three species studied.<h4>Conclusions</h4>This approach can be used to detect and annotate TEs in genomic sequences and complements the current homology-based TE detection methods. Furthermore, the HMM method is able to identify the parts of a sequence in which the nucleotide composition resembles that of a coding region of a TE. This is useful for the detailed annotation of TE sequences, which may contain an ancient, highly diverged coding region that is no longer fully functional.
Project description:The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genomes sequences.Using a high-resolution annotation of TEs in Release 4 genome sequence, we revise estimates of TE abundance in Drosophila melanogaster. We show that TEs are non-randomly distributed within regions of high and low TE abundance, and that pericentromeric regions with high TE abundance are mosaics of distinct regions of extreme and normal TE density. Comparative analysis revealed that this punctate pattern evolves jointly by transposition and duplication, but not by inversion of TE-rich regions from unsequenced heterochromatin. Analysis of genome-wide patterns of TE nesting revealed a 'nesting network' that includes virtually all of the known TE families in the genome. Numerous directed cycles exist among TE families in the nesting network, implying concurrent or overlapping periods of transpositional activity.Rapid restructuring of the genomic landscape by transposition and duplication has recently added hundreds of kilobases of TE sequence to pericentromeric regions in D. melanogaster. These events create ragged transitions between unique and repetitive sequences in the zone between euchromatic and beta-heterochromatic regions. Complex relationships of TE nesting in beta-heterochromatic regions raise the possibility of a co-suppression network that may act as a global surveillance system against the majority of TE families in D. melanogaster.
Project description:Transposable elements (TEs) and their relics play major roles in genome evolution. However, mobilization of TEs is usually deleterious and strongly repressed. In plants and mammals, this repression is typically associated with DNA methylation, but the relationship between this epigenetic mark and TE sequences has not been investigated systematically. Here, we present an improved annotation of TE sequences and use it to analyze genome-wide DNA methylation maps obtained at single-nucleotide resolution in Arabidopsis. We show that although the majority of TE sequences are methylated, ?26% are not. Moreover, a significant fraction of TE sequences densely methylated at CG, CHG and CHH sites (where H?=?A, T or C) have no or few matching small interfering RNA (siRNAs) and are therefore unlikely to be targeted by the RNA-directed DNA methylation (RdDM) machinery. We provide evidence that these TE sequences acquire DNA methylation through spreading from adjacent siRNA-targeted regions. Further, we show that although both methylated and unmethylated TE sequences located in euchromatin tend to be more abundant closer to genes, this trend is least pronounced for methylated, siRNA-targeted TE sequences located 5' to genes. Based on these and other findings, we propose that spreading of DNA methylation through promoter regions explains at least in part the negative impact of siRNA-targeted TE sequences on neighboring gene expression.
Project description:BACKGROUND:Transposable elements (TEs) are mobile DNA sequences known as drivers of genome evolution. Their impacts have been widely studied in animals, plants and insects, but little is known about them in microalgae. In a previous study, we compared the genetic polymorphisms between strains of the haptophyte microalga Tisochrysis lutea and suggested the involvement of active autonomous TEs in their genome evolution. RESULTS:To identify potentially autonomous TEs, we designed a pipeline named PiRATE (Pipeline to Retrieve and Annotate Transposable Elements, download: https://doi.org/10.17882/51795 ), and conducted an accurate TE annotation on a new genome assembly of T. lutea. PiRATE is composed of detection, classification and annotation steps. Its detection step combines multiple, existing analysis packages representing all major approaches for TE detection and its classification step was optimized for microalgal genomes. The efficiency of the detection and classification steps was evaluated with data on the model species Arabidopsis thaliana. PiRATE detected 81% of the TE families of A. thaliana and correctly classified 75% of them. We applied PiRATE to T. lutea genomic data and established that its genome contains 15.89% Class I and 4.95% Class II TEs. In these, 3.79 and 17.05% correspond to potentially autonomous and non-autonomous TEs, respectively. Annotation data was combined with transcriptomic and proteomic data to identify potentially active autonomous TEs. We identified 17 expressed TE families and, among these, a TIR/Mariner and a TIR/hAT family were able to synthesize their transposase. Both these TE families were among the three highest expressed genes in a previous transcriptomic study and are composed of highly similar copies throughout the genome of T. lutea. This sum of evidence reveals that both these TE families could be capable of transposing or triggering the transposition of potential related MITE elements. CONCLUSION:This manuscript provides an example of a de novo transposable element annotation of a non-model organism characterized by a fragmented genome assembly and belonging to a poorly studied phylum at genomic level. Integration of multi-omics data enabled the discovery of potential mobile TEs and opens the way for new discoveries on the role of these repeated elements in genomic evolution of microalgae.
Project description:Transposable elements (TEs) are mobile genomic DNA sequences found in most organisms. They so densely populate the genomes of many eukaryotic species that they are often the major constituents. With the rapid generation of many plant genome sequencing projects over the past few decades, there is an urgent need for improved TE annotation as a prerequisite for genome-wide studies. Analogous to the use of RNA-seq for gene annotation, we propose a new method for de novo TE annotation that uses as a guide 24 nt-siRNAs that are a part of TE silencing pathways. We use this new approach, called TASR (for Transposon Annotation using Small RNAs), for de novo annotation of TEs in Arabidopsis, rice and soybean and demonstrate that this strategy can be successfully applied for de novo TE annotation in plants.Executable PERL is available for download from: http://tasr-pipeline.sourceforge.net/.
Project description:The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (https://github.com/Dfam-consortium/RepeatModeler, http://www.repeatmasker.org/RepeatModeler/).
Project description:Advances in genomic sequencing have recently offered vast opportunities for biological exploration, unraveling the evolution and improving our understanding of Earth biodiversity. Due to distinct plant species characteristics in terms of genome size, ploidy and heterozygosity, transposable elements (TEs) are common characteristics of many genomes. TEs are ubiquitous and dispersed repetitive DNA sequences that frequently impact the evolution and composition of the genome, mainly due to their redundancy and rearrangements. For this study, we provided an atlas of TE data by employing an easy-to-use portal ( APTE website ). To our knowledge, this is the most extensive and standardized analysis of TEs in plant genomes. We evaluated 67 plant genomes assembled at chromosome scale, recovering a total of 49,802,023 TE records, representing a total of 47,992,091,043 (~47,62%) base pairs (bp) of the total genomic space. We observed that new types of TEs were identified and annotated compared to other data repositories. By establishing a standardized catalog of TE annotation on 67 genomes, new hypotheses, exploration of TE data and their influences on the genomes may allow a better understanding of their function and processes. All original code and an example of how we developed the TE annotation strategy is available on GitHub ( <i>Extended data</i>).
Project description:This is a report of many distant but significant protein sequence relationships between human proteins and transposable elements (TEs). The libraries of human repeated sequences contain the DNA sequences of many TEs. These were translated in all reading frames, ignoring stop codons, and were used as amino acid sequence probes to search with BLASTP for similar sequences in a library of 25,193 human proteins. The probes show regions of significant amino acid sequence similarity to 1,950 different human genes, with an expectation of <10(-3). In comparison with previous REPEATMASKER (Institute for Systems Biology, Seattle) studies, these probes detect many more TE sequences in more human coding sequences with greater length than previous work using DNA sequences. If the criterion is opened, very many matches are found occurring on 4,653 different genes after correction for the number seen with random amino acid sequence probes. The processes that led to these extensive sets of sequence relationships between TEs and coding sequences of human genes have been a major source of variation and novel genes during evolution. This paper lists the number of sequence similarities seen by amino acid sequence comparison, which is surely an underestimate of the actual number of significant relationships. It appears that many of these are the result of past events of duplication of genes or gene regions, rather than a direct result of TE insertion. This report of observable relationships leaves to the future the functional implications as well as the detection of the events of TE insertion.
Project description:Transposable elements (TEs) initially attracted attention because they comprise a major portion of the genomic sequences in plants and animals. TEs may jump around the genome and disrupt both coding genes as well as regulatory sequences to cause disease. Host cells have therefore evolved various epigenetic and functional RNA-mediated mechanisms to mitigate the disruption of genomic integrity by TEs. TE associated sequences therefore acquire the tendencies of attracting various epigenetic modifiers to induce epigenetic alterations that may spread to the neighboring genes. In addition to posting threats for (epi)genome integrity, emerging evidence suggested the physiological importance of endogenous TEs either as cis-acting control elements for controlling gene regulation or as TE-containing functional transcripts that modulate the transcriptome of the host cells. Recent advances in long-reads sequence analysis technologies, bioinformatics and genetic editing tools have enabled the profiling, precise annotation and functional characterization of TEs despite their challenging repetitive nature. The importance of specific TEs in preimplantation embryonic development, germ cell differentiation and meiosis, cell fate determination and in driving species specific differences in mammals will be discussed.
Project description:A plant genome usually encompasses different families of transposable elements (TEs) that may constitute up to 85% of nuclear DNA. Under stressful conditions, some of them may activate, leading to sequence variation. In vitro plant regeneration may induce either phenotypic or genetic and epigenetic changes. While DNA methylation alternations might be related, i.e., to the Yang cycle problems, DNA pattern changes, especially DNA demethylation, may activate TEs that could result in point mutations in DNA sequence changes. Thus, TEs have the highest input into sequence variation (SV). A set of barley regenerants were derived via in vitro anther culture. High Performance Liquid Chromatography (RP-HPLC), used to study the global DNA methylation of donor plants and their regenerants, showed that the level of DNA methylation increased in regenerants by 1.45% compared to the donors. The Methyl-Sensitive Transposon Display (MSTD) based on methylation-sensitive Amplified Fragment Length Polymorphism (metAFLP) approach demonstrated that, depending on the selected elements belonging to the TEs family analyzed, varying levels of sequence variation were evaluated. DNA sequence contexts may have a different impact on SV generated by distinct mobile elements belonged to various TE families. Based on the presented study, some of the selected mobile elements contribute differently to TE-related SV. The surrounding context of the TEs DNA sequence is possibly important here, and the study explained some part of SV related to those contexts.