Dataset Information

A new statistic for efficient detection of repetitive sequences.


ABSTRACT:

Motivation

Detecting sequences that contain repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for specific types of repeat detection, but an efficient generic method that detects most types of repetitive sequences is still desirable. Inspired by the favorable properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic, D2R, that can efficiently discriminate between sequences with and without repetitive regions.
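For readers unfamiliar with the D2 family mentioned above, the classic D2 statistic is the inner product of the k-mer count vectors of two sequences. The following is a minimal sketch of that classic statistic only; it is not the D2R statistic introduced in the paper, whose definition is not given in this record. The function names `kmer_counts` and `d2` are illustrative choices, not taken from the authors' repository.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (one pass over the sequence)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2: sum over k-mers w of count_a(w) * count_b(w).

    Illustrative only; this is NOT the D2R statistic from the paper.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Iterate over the smaller table; a Counter returns 0 for missing keys.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[w] for w, n in counts_a.items())
```

For example, `d2("ACGT", "ACGT", 2)` sums the three shared 2-mers (AC, CG, GT), each occurring once in both sequences.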

Results

Using this statistic, we developed an algorithm with linear time and space complexity that detects most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats (CRISPR) regions in bacterial genomic or metagenomic sequences. Simulation and real-data experiments show that the method works well on both assembled sequences and unassembled short reads.
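To give a feel for how a k-mer-based score can flag repetitive sequences in time and space linear in the sequence length, here is a rough sketch under stated assumptions. It counts pairs of positions that share the same k-mer, which is a simple stand-in and not the D2R statistic or the authors' algorithm. The names `repeated_kmer_pairs` and `repeat_fraction` are hypothetical.

```python
from collections import Counter


def repeated_kmer_pairs(seq, k):
    """Number of position pairs (i < j) whose k-mers are identical.

    One pass builds the count table, so time and memory grow linearly
    with len(seq) for a fixed k. Hypothetical stand-in, not D2R.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(n * (n - 1) // 2 for n in counts.values())


def repeat_fraction(seq, k):
    """Share of all k-mer position pairs that match, between 0.0 and 1.0."""
    total = len(seq) - k + 1
    if total < 2:
        return 0.0
    return repeated_kmer_pairs(seq, k) / (total * (total - 1) // 2)
```

A sequence with a repeated unit, such as `"ACGTACGT"` with `k = 4`, yields a nonzero pair count, while a sequence with no repeated 4-mers yields zero. A real detector would also need a null model to decide when a score is significant, which this sketch does not attempt.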

Availability and implementation

The code is available at https://github.com/XuegongLab/D2R_codes under the GPL 3.0 license.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Chen S 

PROVIDER: S-EPMC7963086 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

A new statistic for efficient detection of repetitive sequences.

Sijie Chen, Yixin Chen, Fengzhu Sun, Michael S. Waterman, Xuegong Zhang

Bioinformatics (Oxford, England), 2019-11-01, issue 22



Similar Datasets

| S-EPMC2559850 | biostudies-literature
| S-EPMC6682110 | biostudies-literature
| S-EPMC4989159 | biostudies-literature
| S-EPMC7572123 | biostudies-literature
| S-EPMC2837741 | biostudies-literature
2015-08-26 | GSE60784 | GEO
| S-EPMC4745763 | biostudies-literature
| PRJEB34450 | ENA
| S-EPMC9415267 | biostudies-literature
| S-EPMC8046333 | biostudies-literature