Dataset Information

A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.


ABSTRACT: We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is the need for a suitable method that takes a sequence of scores corresponding to properties of positions in virus genomes and finds outlying regions of low scores. Statistical methods that do this without complex models or many assumptions are surprisingly lacking. We address this gap by developing a method that detects regions of low score within sequences of real numbers. The method makes no a priori assumptions about the length of such a region; it gives the explicit location of the region and scores it statistically. Because it does not use detailed mechanistic models, the method is fast and will be useful in a wide range of applications. We present our approach in detail and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies and that it can capture multiple signals in the same sequence. Finally, we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
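
As a rough illustration of the kind of task the abstract describes (locating a low-scoring region of unknown length and attaching a statistical score to it), the sketch below may help. It is not the authors' algorithm, which this record does not spell out. It swaps in a generic technique: a minimum-sum contiguous run over mean-centred scores, with a permutation p-value. The function names `lowest_region` and `permutation_p_value`, the mean-centring step, and the shuffle count are all assumptions made for this example.

```python
import random


def lowest_region(scores):
    """Return (start, end, total) for the contiguous run whose
    mean-centred scores have the smallest sum; `end` is exclusive.
    Generic minimum-sum run search, not the method from the paper."""
    centre = sum(scores) / len(scores)
    best_total, best_start, best_end = float("inf"), 0, 0
    run_total, run_start = 0.0, 0
    for i, value in enumerate(scores):
        # A run with a non-negative sum cannot help a later minimum,
        # so start a fresh run at the current position.
        if run_total >= 0:
            run_total, run_start = 0.0, i
        run_total += value - centre
        if run_total < best_total:
            best_total, best_start, best_end = run_total, run_start, i + 1
    return best_start, best_end, best_total


def permutation_p_value(scores, n_shuffles=500, seed=0):
    """Fraction of shuffled sequences whose lowest region is at least
    as extreme as the observed one, with a +1 correction so the
    estimate is never exactly zero."""
    rng = random.Random(seed)
    observed = lowest_region(scores)[2]
    shuffled = list(scores)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if lowest_region(shuffled)[2] <= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Toy sequence: a block of five low scores inside a run of high scores.
toy_scores = [1.0] * 10 + [0.0] * 5 + [1.0] * 10
region_start, region_end, region_total = lowest_region(toy_scores)
p_value = permutation_p_value(toy_scores)
```

On the toy sequence, the search returns the half-open interval covering the five low scores without being told the region length in advance. Shuffling breaks the ordering, so the permutation step asks how often a region this extreme arises by chance alone.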

SUBMITTER: Gog JR 

PROVIDER: S-EPMC5898753 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

Gog JR, Lever AML, Skittrall JP

PLoS ONE, 2018-04-13, issue 4


Similar Datasets

| S-EPMC3441209 | biostudies-literature
| S-EPMC2515872 | biostudies-literature
| S-EPMC3621466 | biostudies-literature
2019-03-19 | GSE114948 | GEO
| S-EPMC7329539 | biostudies-literature
2024-03-18 | GSE249817 | GEO
| S-EPMC120672 | biostudies-literature
| S-EPMC7003185 | biostudies-literature
| S-EPMC2395254 | biostudies-literature
| S-EPMC3878968 | biostudies-literature