Dataset Information

Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum.


ABSTRACT: Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidating the organism's transcriptional regulation. To gain a detailed understanding of dynamically composed TU structures, we used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum with a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs were predicted from the four RNA-seq datasets, 44% of which contain multiple genes. We assessed the prediction method on an independent set of RNA-seq data with longer reads, and the evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the method, we also applied it to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program, named SeqTU, is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
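
As a rough illustration of what "expression-level continuity and variance" could look like as numeric features, the sketch below computes three simple quantities from per-base read depth over two adjacent genes and the region between them. It is not taken from SeqTU: the function names, the feature definitions (gap fraction, relative depth, coefficient of variation), and the toy depth values are all assumptions made for illustration, and the abstract does not specify how the actual parameters are computed.

```python
from statistics import mean, pvariance


def gap_fraction(coverage):
    """Fraction of positions with zero read depth (0.0 means fully continuous)."""
    if not coverage:
        raise ValueError("coverage must be non-empty")
    return sum(1 for depth in coverage if depth == 0) / len(coverage)


def coefficient_of_variation(coverage):
    """Population standard deviation of depth divided by mean depth."""
    mu = mean(coverage)
    if mu == 0:
        return 0.0
    return pvariance(coverage) ** 0.5 / mu


def region_features(gene_a, intergenic, gene_b):
    """Hypothetical feature set for one gene pair (not SeqTU's definition).

    Each argument is a list of per-base read depths on the same strand.
    """
    flank_mean = mean(gene_a + gene_b)
    return {
        # Continuity: how much of the intergenic region is uncovered.
        "gap_fraction": gap_fraction(intergenic),
        # Continuity: intergenic depth relative to the flanking genes.
        "relative_depth": mean(intergenic) / flank_mean if flank_mean else 0.0,
        # Variance: how uneven the depth is across the whole span.
        "cv": coefficient_of_variation(gene_a + intergenic + gene_b),
    }


# Toy depths: an evenly covered span versus one with an uncovered gap.
even = region_features([10, 10], [10, 10], [10, 10])
split = region_features([10, 10], [0, 0], [10, 10])
```

Feature vectors of this kind could then be passed to any binary classifier that decides whether two adjacent genes belong to the same TU; which features and classifier the authors used is described in the paper, not here.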

SUBMITTER: Chou WC 

PROVIDER: S-EPMC4446414 | biostudies-literature | 2015 May

REPOSITORIES: biostudies-literature

Publications

Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum.

Chou WC, Ma Q, Yang S, Cao S, Klingeman DM, Brown SD, Xu Y

Nucleic Acids Research, 2015-03-12, issue 10



Similar Datasets

| S-EPMC3990059 | biostudies-literature
| S-EPMC5975590 | biostudies-literature
| S-EPMC3479195 | biostudies-literature
| S-EPMC6529933 | biostudies-literature
| S-EPMC3623140 | biostudies-literature
| S-EPMC7644310 | biostudies-literature
| S-EPMC4555090 | biostudies-literature
| S-EPMC3425587 | biostudies-literature
| PRJEB4206 | ENA
2020-11-01 | GSE100047 | GEO