
Dataset Information


Text-mining assisted regulatory annotation.


ABSTRACT:

BACKGROUND: Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature.

RESULTS: We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process.

CONCLUSION: Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.
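The abstract describes two text-mining steps: ranking abstracts with a vector space model, and pulling DNA sequences out of text to map them onto a genome. Below is a minimal, standard-library-only Python sketch of both. It rests on several assumptions that do not come from the record: TF-IDF weighting with a smoothed IDF, cosine similarity to the centroid of already-curated "positive" abstracts as the relevance score, a 15-nt minimum for a DNA string, and exact string matching on both strands in place of a real aligner. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import re
from collections import Counter

# --- Step 1: vector space model for abstract relevance ranking (sketch) ---

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document.

    Assumed weighting: raw term count times a smoothed IDF,
    log((1 + n) / (1 + df)) + 1, so every term keeps a positive weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_by_relevance(positives, candidates):
    """Rank candidate abstracts by cosine similarity to the centroid of
    known positive (already curated) abstracts.

    Returns (candidate_index, score) pairs, most relevant first.
    """
    vectors = tfidf_vectors(positives + candidates)
    centroid = Counter()
    for vec in vectors[:len(positives)]:
        centroid.update(vec)  # sums weights; overall scale does not affect cosine
    scores = [cosine(vec, centroid) for vec in vectors[len(positives):]]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# --- Step 2: pull DNA strings out of text and locate them in a genome ---

# Hypothetical cutoff: runs of >= 15 A/C/G/T letters are treated as DNA.
DNA_PATTERN = re.compile(r"\b[ACGTacgt]{15,}\b")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def extract_dna(text):
    """Return upper-cased candidate DNA sequences found in free text."""
    return [m.upper() for m in DNA_PATTERN.findall(text)]

def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def map_to_genome(seq, genome):
    """Exact-match seq against both strands of an upper-case genome string.

    Returns (0-based start on the forward strand, strand) pairs.
    """
    hits = []
    for strand, query in (("+", seq), ("-", reverse_complement(seq))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return hits
```

For example, `extract_dna("The probe 5'-GATCTAGGCTTAACGGTA-3' was used.")` yields `["GATCTAGGCTTAACGGTA"]`, which `map_to_genome` can then search for in a genome string. The regex misses sequences that papers print with internal spaces or line breaks, and exact matching misses probes containing deliberate mutations. A production pipeline would normalize those cases and use a mismatch-tolerant aligner.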

SUBMITTER: Aerts S 

PROVIDER: S-EPMC2374703 | biostudies-other | 2008

REPOSITORIES: biostudies-other


Publications



Similar Datasets

| S-EPMC3660268 | biostudies-literature
| S-EPMC3287586 | biostudies-literature
| S-EPMC2697649 | biostudies-literature
| S-EPMC4674139 | biostudies-literature
| S-EPMC5975701 | biostudies-literature
| S-EPMC3939821 | biostudies-literature
| S-EPMC3475109 | biostudies-literature
| S-EPMC6550425 | biostudies-literature