Dataset Information


A systematic evaluation of pattern discovery algorithms

ABSTRACT: Pattern discovery algorithms, which identify recurrent, non-random motifs, are widely used in the analysis of biological sequences. Many algorithms exist, but few comparisons have been made among them. We systematically profiled eight representative methods at multiple parameter settings across 174 diverse experimental datasets, including ten novel ChIP-on-chip datasets. We executed 16,777 pattern discovery analyses to assess prediction accuracy, CPU usage, and memory consumption. For 144 datasets we developed a gold standard using machine-learning algorithms; cross-validation was used for the remaining datasets. Performance was highly disparate, with median accuracy ranging from 32% to 96%. Importantly, we were unable to replicate previously reported algorithm rankings, emphasizing the need to use many diverse experimental datasets. We found that deterministic algorithms such as Projection and Oligo/Dyad had the highest prediction accuracy. Computational efficiency was not linearly related to dataset size and became critical on large datasets, where some algorithms are intractably slow. This work provides the first combined assessment of the CPU usage, memory consumption, and prediction accuracy of pattern discovery algorithms on real experimental datasets.

HL60-Mnt-ChIP: ChIP-Chip with 10 biological replicates
HL60-Trrap-ChIP: ChIP-Chip with 13 biological replicates

ORGANISM(S): Homo sapiens

SUBMITTER: Linda Z Penn, Paul C Boutros, Adam P Hanley, Igor Jurisica

PROVIDER: E-GEOD-15370 | ArrayExpress | 2010-05-19



Similar Datasets

2010-05-17 | E-GEOD-8449 | ArrayExpress
2008-10-31 | E-GEOD-8448 | ArrayExpress
2010-05-17 | E-GEOD-8447 | ArrayExpress
2015-04-23 | E-GEOD-65101 | ArrayExpress
2012-10-10 | E-GEOD-34794 | ArrayExpress
2015-11-23 | E-GEOD-51817 | ArrayExpress
2015-05-07 | E-GEOD-60378 | ArrayExpress
2016-08-06 | E-GEOD-71860 | ArrayExpress
2013-07-11 | E-GEOD-47073 | ArrayExpress
2016-07-24 | E-GEOD-77854 | ArrayExpress