Dataset Information

SAKE: Strobemer-assisted k-mer extraction.

ABSTRACT: K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Long k-mers are often preferred because they can be uniquely associated with a specific genomic region. Unfortunately, standard exact k-mer counting methods cannot reliably extract long k-mers from high error rate reads. We propose SAKE, a method that extracts long k-mers from high error rate reads by using strobemers and generating consensus k-mers through partial order alignment. Our experiments show that on simulated data with error rates of up to 6%, SAKE extracts 97-mers with over 90% recall, whereas the recall of DSK, an exact k-mer counter, drops below 20%. Furthermore, the precision of SAKE remains similar to that of DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads than exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
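The abstract's central argument is that an exact counter only records a long k-mer if every one of its k bases was read correctly. Assuming independent per-base errors at rate e (a simplification; real long reads also contain insertions and deletions), a single occurrence of a k-mer is error-free with probability (1 - e)^k. The following Python sketch is a toy exact k-mer counter together with the recall and precision measures used above. The helper names (count_kmers, solid_kmers, error_free_probability, recall_precision) are hypothetical, and the code is neither DSK's disk-based algorithm nor SAKE's strobemer-based pipeline.

```python
from collections import Counter


def count_kmers(reads, k):
    """Exact k-mer counting: tally every length-k substring of every read."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times, a common way to drop
    k-mers that contain sequencing errors."""
    return {kmer for kmer, c in counts.items() if c >= min_count}


def error_free_probability(k, error_rate):
    """Probability that all k bases of one k-mer occurrence are correct,
    assuming independent substitution errors at the given per-base rate."""
    return (1.0 - error_rate) ** k


def recall_precision(extracted, truth):
    """Recall = share of true k-mers recovered;
    precision = share of extracted k-mers that are true."""
    true_positives = len(extracted & truth)
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(extracted) if extracted else 0.0
    return recall, precision


if __name__ == "__main__":
    genome = "ACGTTGCATG"
    # One error-free read and one read with a G->A substitution at index 5.
    reads = [genome, "ACGTTACATG"]
    k = 4
    truth = set(count_kmers([genome], k))
    extracted = solid_kmers(count_kmers(reads, k), min_count=2)
    # The single error destroys all k = 4 k-mers that overlap index 5.
    print(sorted(extracted))                   # ['ACGT', 'CATG', 'CGTT']
    print(recall_precision(extracted, truth))  # (0.42857142857142855, 1.0)
    # Prints "21 0.2727", then "97 0.0025".
    for k_long in (21, 97):
        print(k_long, f"{error_free_probability(k_long, 0.06):.4f}")
```

At e = 0.06 the formula gives roughly 0.27 for k = 21 but only about 0.0025 for k = 97, which illustrates why exact counting of 97-mers degrades so quickly at high error rates. These are per-occurrence probabilities under the stated error model, not the recall values reported in the study, which also depend on coverage and the count threshold.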

SUBMITTER: Leinonen M

PROVIDER: S-EPMC10686461 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature


Publications

SAKE: Strobemer-assisted k-mer extraction.

Leinonen Miika, Salmela Leena

PloS one, 2023-11-29, 11


PMID: 38019768

Similar Datasets

Project description: In recent years, there has been growing interest in bioactive plant compounds for their beneficial effects on health and for their potential in reducing the risk of developing certain diseases such as cancer, cardiovascular diseases, and neurodegenerative disorders. However, the extraction techniques conventionally used to obtain these phytocompounds rely on toxic solvents and high temperatures, and they are increasingly being supplanted by innovative and unconventional techniques, in line with the demand for environmentally and economically sustainable chemical processes. Among non-thermal technologies, cold plasma (CP), which has been used successfully for some years in the food industry as a treatment to improve food shelf life, seems to be one of the most promising solutions for green extraction processes. CP is characterized by its low environmental impact, low cost, and higher extraction yield of phytochemicals, saving time, energy, and solvents compared with other classical extraction processes. In light of these considerations, this review aims to provide an overview of the potential and critical issues related to the use of CP in the extraction of phytochemicals, particularly polyphenols and essential oils. To review the current state of knowledge and future prospects of CP in this sector, a bibliometric study, providing quantitative information on research activity based on the available published scientific literature, was carried out using the VOSviewer software (v. 1.6.18). The scientometric analysis revealed an increase in the number of scientific studies over the past two years, underlining the growing interest of the scientific community in this natural substance extraction technique. The studies analyzed have shown that, in general, the use of CP increased the yield of essential oils and polyphenols.
Furthermore, the composition of the phytoextract obtained with CP appears to be influenced by process parameters such as intensity (power and voltage), treatment time, and the working gas used. In general, the studies analyzed showed that the best yields in terms of total polyphenols, and the best antioxidant and antimicrobial properties of the phytoextracts, were obtained using mild process conditions and nitrogen as the working gas. The use of CP as a non-conventional extraction technique is very recent. Further studies are needed to identify the optimal process conditions and, above all, to understand the mechanisms of plasma-plant matrix interaction. Such studies should also determine whether side reactions in the highly oxidative plasma environment could generate potentially hazardous substances, which would limit the exploitation of this technique at the industrial level.

OmicsDI is part of the ELIXIR infrastructure