Dataset Information

Informed kmer selection for de novo transcriptome assembly.

ABSTRACT:

Motivation

De novo transcriptome assembly is an integral part of many RNA-seq workflows. Common applications include sequencing of non-model organisms, cancer transcriptomes, or metatranscriptomes. Most de novo transcriptome assemblers use the de Bruijn graph (DBG) as the underlying data structure. The quality of the assemblies produced by such assemblers is highly influenced by the exact word length k. As such, no single kmer value leads to optimal results. Instead, DBGs over different kmer values are built and the assemblies are merged to improve sensitivity. However, no studies have thoroughly investigated the problem of automatically learning at which kmer value to stop the assembly. Instead, a suboptimal selection of kmer values is often used in practice.
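
As a rough illustration of why the word length k matters, the following minimal sketch builds a toy de Bruijn graph from a read at two kmer values and counts branching nodes. The function names (`build_dbg`, `count_branching_nodes`) and the toy read are assumptions made for this example; they are not taken from any assembler discussed here.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branching_nodes(graph):
    """Count nodes with more than one successor (ambiguous paths)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Hypothetical toy read containing the short repeat "ACG".
reads = ["ACGTACGA"]
for k in (3, 5):
    g = build_dbg(reads, k)
    print(f"k={k}: {len(g)} nodes, {count_branching_nodes(g)} branching")
```

In this toy case the repeat creates a branch at the smaller k that disappears at the larger k. Real assemblers also handle reverse complements, sequencing errors, and coverage, all of which this sketch ignores.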

Results

Here we investigate the contribution of a single kmer value in a multi-kmer-based assembly approach. We find that a comparative clustering of related assemblies can be used to estimate the importance of an additional kmer assembly. Using a model-fit-based algorithm, we predict the kmer value at which no further assemblies are necessary. Our approach is tested with different de novo assemblers on datasets with different coverage values and read lengths. Further, we suggest a simple post-processing step that significantly improves the quality of multi-kmer assemblies.
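
The idea of a model-fit-based stopping rule can be sketched as follows. This is not the authors' implementation. It assumes a hypothetical per-kmer contribution score (for example, the number of clusters that a new assembly improves), fits a log-linear (exponential decay) model to the scores seen so far, and stops once the predicted contribution of the next kmer value falls below a chosen fraction of the first one. The names `fit_log_linear`, `predicted_count`, `should_stop`, and `min_fraction` are illustrative assumptions.

```python
import math


def fit_log_linear(ks, counts):
    """Least-squares fit of log(count) = a + b * k; returns (a, b).

    Assumes all counts are positive and at least two distinct k values.
    """
    ys = [math.log(c) for c in counts]
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    b = sxy / sxx
    a = mean_y - b * mean_k
    return a, b


def predicted_count(a, b, k):
    """Contribution predicted by the fitted model at kmer value k."""
    return math.exp(a + b * k)


def should_stop(ks, counts, next_k, min_fraction=0.05):
    """Hypothetical rule: stop if the predicted contribution at next_k
    is below min_fraction of the first observed contribution."""
    a, b = fit_log_linear(ks, counts)
    return predicted_count(a, b, next_k) < min_fraction * counts[0]


# Made-up contribution scores for three kmer assemblies (illustration only).
ks = [25, 35, 45]
counts = [1000, 500, 250]
print(should_stop(ks, counts, next_k=55))
print(should_stop(ks, counts, next_k=75))
```

The threshold and the choice of an exponential model are assumptions made for this sketch; the published method should be consulted for the criterion it actually uses.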

Conclusion

We provide an automatic method for limiting the number of kmer values without a significant loss in assembly quality but with savings in assembly time. This is a step towards making multi-kmer methods more reliable and easier to use.

Availability and implementation

A general implementation of our approach can be found under: https://github.com/SchulzLab/KREATION

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact

mschulz@mmci.uni-saarland.de.

SUBMITTER: Durai DA

PROVIDER: S-EPMC4892416 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature

Publications

Informed kmer selection for de novo transcriptome assembly.

Durai Dilip A, Schulz Marcel H

Bioinformatics (Oxford, England), 2016-04-28, issue 11

PMID: 27153653

Similar Datasets

The Oyster River Protocol: a multi-assembler and kmer approach for de novo transcriptome assembly. | S-EPMC6078068 | biostudies-literature

De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish. | S-EPMC5968573 | biostudies-literature

Kollector: transcript-informed, targeted de novo assembly of gene loci. | S-EPMC5860073 | biostudies-literature

Kollector: transcript-informed, targeted de novo assembly of gene loci. | S-EPMC5572715 | biostudies-literature

Effect of de novo transcriptome assembly on transcript quantification. | S-EPMC6549443 | biostudies-literature

De novo transcriptome assembly of Setatria italica variety Taejin. | S-EPMC4878839 | biostudies-literature

De novo transcriptome assembly of two contrasting pumpkin cultivars. | S-EPMC4778644 | biostudies-literature

A Bayesian approach for accurate de novo transcriptome assembly. | S-EPMC8417280 | biostudies-literature

De novo transcriptome assembly of Sorghum bicolor variety Taejin. | S-EPMC4878842 | biostudies-literature

De novo transcriptome assembly of two different apricot cultivars. | S-EPMC4664767 | biostudies-literature