Dataset Information

RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.


ABSTRACT: Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines.
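The redundancy-reduction idea in the abstract (group similar motifs, then keep one entry per group) can be illustrated with a minimal sketch. This is not the matrix-clustering implementation: it assumes all motifs have the same width and are already aligned, uses plain Pearson correlation as the only similarity measure, and merges motifs by a simple single-linkage rule. The names `pearson`, `cluster_motifs`, `non_redundant`, `toy_motifs` and the `0.8` threshold are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no signal to correlate
    return cov / sqrt(var_a * var_b)


def flatten(motif):
    """Turn a motif (list of [A, C, G, T] columns) into one flat vector."""
    return [p for column in motif for p in column]


def cluster_motifs(motifs, threshold=0.8):
    """Group motif names so that any pair with correlation >= threshold
    ends up in the same cluster (single linkage via union-find).
    Assumption: all motifs share the same width and orientation."""
    names = list(motifs)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(flatten(motifs[a]), flatten(motifs[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def non_redundant(motifs, threshold=0.8):
    """Keep one representative (first name alphabetically) per cluster."""
    return [members[0] for members in cluster_motifs(motifs, threshold)]


# Toy position-frequency columns in A, C, G, T order (made-up values).
toy_motifs = {
    "m1": [[0.05, 0.05, 0.05, 0.85], [0.05, 0.05, 0.85, 0.05],
           [0.85, 0.05, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05]],
    "m2": [[0.10, 0.05, 0.05, 0.80], [0.05, 0.05, 0.85, 0.05],
           [0.80, 0.10, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05]],
    "m3": [[0.05, 0.85, 0.05, 0.05], [0.85, 0.05, 0.05, 0.05],
           [0.05, 0.85, 0.05, 0.05], [0.05, 0.05, 0.85, 0.05]],
}

print(cluster_motifs(toy_motifs))   # m1 and m2 group together, m3 stays alone
print(non_redundant(toy_motifs))
```

In this toy input, `m1` and `m2` are near-identical variants of one motif and `m3` is unrelated, so the sketch keeps two representatives out of three. Motifs of different widths, reverse-complement matches, and tree-based exploration, all of which the abstract attributes to the actual tool, are outside the scope of this simplification.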

SUBMITTER: Castro-Mondragon JA 

PROVIDER: S-EPMC5737723 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature

Publications

RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

Jaime Abraham Castro-Mondragon, Sébastien Jaeger, Denis Thieffry, Morgane Thomas-Chollier, Jacques van Helden

Nucleic Acids Research, 2017-07-01, issue 13



Similar Datasets

| S-EPMC3287167 | biostudies-literature
| S-EPMC4254000 | biostudies-literature
2014-04-17 | E-MTAB-1694 | biostudies-arrayexpress
| S-EPMC10477186 | biostudies-literature
| S-EPMC3312004 | biostudies-literature
| S-EPMC5758894 | biostudies-literature
| S-EPMC3055704 | biostudies-literature
| S-EPMC9997904 | biostudies-literature
| S-EPMC9252783 | biostudies-literature
| S-EPMC1925120 | biostudies-literature