ENdb: a manually curated database of experimentally supported enhancers for human and mouse.
ABSTRACT: Enhancers are a class of cis-regulatory elements that can increase gene transcription by forming loops in intergenic regions, introns and exons. Enhancers, as well as their associated target genes and the transcription factors (TFs) that bind to them, are highly associated with human disease and biological processes. Although some enhancer databases have been published, most focus only on enhancers identified by high-throughput experimental techniques. Therefore, it is highly desirable to construct a comprehensive resource of manually curated enhancers and their related information based on low-throughput experimental evidence. Here, we established ENdb, a comprehensive manually curated enhancer database for human and mouse, which provides a resource of experimentally supported enhancers and annotates them in detail. The current release of ENdb documents 737 experimentally validated enhancers and their related information, including 384 target genes, 263 TFs, 110 diseases and 153 functions in human and mouse. Moreover, the enhancer-related information is supported by experimental evidence, such as RNAi, in vitro knockdown, western blotting, qRT-PCR, luciferase reporter assay, chromatin conformation capture (3C) and chromosome conformation capture-on-chip (4C) assays. ENdb provides a user-friendly interface to query, browse and visualize the detailed information of enhancers. The database is available at http://www.licpathway.net/ENdb.
Project description:Super-enhancers are important for controlling and defining the expression of cell-specific genes. With research on human disease and biological processes, human H3K27ac ChIP-seq datasets are accumulating rapidly, creating an urgent need to collect and process these data comprehensively and efficiently. More importantly, many studies have shown that super-enhancer-associated single nucleotide polymorphisms (SNPs) and transcription factors (TFs) strongly influence human disease and biological processes. Here, we developed a comprehensive human super-enhancer database (SEdb, http://www.licpathway.net/sedb) that aims to provide a large number of available resources on human super-enhancers. The database is annotated with potential functions of super-enhancers in gene regulation. The current version of SEdb documents a total of 331 601 super-enhancers from 542 samples. Notably, unlike existing super-enhancer databases, we manually curated and classified 410 available H3K27ac samples from >2000 ChIP-seq samples from NCBI GEO/SRA. Furthermore, SEdb provides detailed genetic and epigenetic annotation information on super-enhancers. This information includes common SNPs, motif changes, expression quantitative trait loci (eQTLs), risk SNPs, transcription factor binding sites (TFBSs), CRISPR/Cas9 target sites and DNase I hypersensitivity sites (DHSs) for in-depth analyses of super-enhancers. SEdb will help elucidate super-enhancer-related functions and find potential biological effects.
Project description:Genome-wide association studies have successfully identified thousands of genomic loci potentially associated with hundreds of complex traits in the past decade. Nevertheless, the fact that more than 90% of such disease-associated variants lie in non-coding DNA with unknown functional implications calls for more advanced analysis of these many genetic variants. Toward this goal, recent studies focusing on individual non-coding variants have revealed that complex diseases are often the consequence of erroneous interactions between enhancers and their target genes. However, such enhancer-disease associations are dispersed across a variety of independent studies, and it is still difficult to carry out comprehensive downstream analysis with these experimentally supported enhancer-disease associations. To fill this gap, we collected experimentally supported associations between complex diseases and enhancers and then developed a manually curated database called EnDisease (http://bioinfo.au.tsinghua.edu.cn/endisease/). Concretely, EnDisease documents 535 associations between 133 diseases and 454 enhancers, extracted from 199 articles. Moreover, after annotating these enhancers using 649 human and 115 mouse DNase-seq experiments, we find that cancer-related enhancers tend to be open across a large number of cell types. This database provides a user-friendly interface for browsing and searching, and it also allows users to download data freely. EnDisease has the potential to become a helpful and important resource for researchers who aim to understand the molecular mechanisms of enhancers involved in complex diseases.
Project description:Transcription factors (TFs) are pivotal regulatory proteins that control gene expression in a context-dependent and tissue-specific manner. In contrast to human, where comprehensive curated TF collections exist, bovine TFs are only rudimentarily recorded and characterized. In this article, we present a manually curated compendium of 865 sequence-specific DNA-binding bovine TFs, which we analyzed for domain family distribution, evolutionary conservation, and tissue-specific expression. In addition, we provide a list of putative transcription cofactors derived from known interactions with the identified TFs. Since there is a general lack of knowledge concerning the regulation of gene expression in cattle, the curated list of TFs should provide a basis for an improved comprehension of regulatory mechanisms that are specific to the species.
Project description:Large-scale sequencing studies have discovered substantial numbers of genetic variants in enhancers, which regulate genes via long-range chromatin interactions. Importantly, such variants can affect enhancer regulation by altering transcription factor binding or through enhancer hijacking and, in turn, make an essential contribution to disease progression. To facilitate better use of published data and the exploration of enhancer deregulation in various human diseases, we created DiseaseEnhancer (http://biocc.hrbmu.edu.cn/DiseaseEnhancer/), a manually curated database of disease-associated enhancers. As of July 2017, DiseaseEnhancer includes 847 disease-associated enhancers in 143 human diseases. Database features include basic enhancer information (i.e. genomic location and target genes); disease types; and associated variants on the enhancer and their mediated phenotypes (i.e. gain/loss of enhancer and alterations of transcription factor binding). Our website also lets users export any query results to a file and download the full database. DiseaseEnhancer provides a promising avenue for researchers to understand enhancer deregulation in disease pathogenesis and to identify new biomarkers for disease diagnosis and therapy.
Project description:Transcription factors (TFs) and their target genes have important functions in human diseases and biological processes. Gene expression profile analysis before and after knockdown or knockout is one of the most important strategies for obtaining target genes of TFs and exploring TF functions. Human gene expression profile datasets with TF knockdown and knockout are accumulating rapidly. Based on the urgent need to comprehensively and effectively collect and process these data, we developed KnockTF (http://www.licpathway.net/KnockTF/index.html), a comprehensive human gene expression profile database of TF knockdown and knockout. KnockTF provides a number of resources for human gene expression profile datasets associated with TF knockdown and knockout and annotates TFs and their target genes in a tissue/cell type-specific manner. The current version of KnockTF has 570 manually curated RNA-seq and microarray datasets associated with 308 TFs disrupted by different knockdown and knockout techniques and across multiple tissue/cell types. KnockTF collects upstream pathway information of TFs and functional annotation results of downstream target genes. It provides details about TFs binding to promoters, super-enhancers and typical enhancers of target genes. KnockTF constructs a TF-differentially expressed gene network and performs network analyses for genes of interest. KnockTF will help elucidate TF-related functions and potential biological effects.
Project description:Super-enhancers (SEs) are critical for the transcriptional regulation of gene expression. We developed the super-enhancer archive version 3.0 (SEA v. 3.0, http://sea.edbc.org) to extend SE research. SEA v. 3.0 provides the most comprehensive archive to date, consisting of 164 545 super-enhancers. Of these, 80 549 are newly identified from 266 cell types/tissues/diseases using an optimized computational strategy, and 52 have been experimentally confirmed with manually curated references. We now support super-enhancers in 11 species including 7 new species (zebrafish, chicken, chimp, rhesus, sheep, Xenopus tropicalis and stickleback). To facilitate super-enhancer functional analysis, we added several new regulatory datasets including 3 361 785 typical enhancers, chromatin interactions, SNPs, transcription factor binding sites and SpCas9 target sites. We also updated or developed new criteria query, genome visualization and analysis tools for the archive. This includes a tool based on Shannon Entropy to evaluate SE cell type specificity, a new genome browser that enables the visualization of SE spatial interactions based on Hi-C data, and an enhanced enrichment analysis interface that provides online enrichment analyses of SE related genes. SEA v. 3.0 provides a comprehensive database of all available SE information across multiple species, and will facilitate super-enhancer research, especially as related to development and disease.
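The Shannon-entropy idea mentioned for SEA v. 3.0 can be illustrated with a minimal sketch. This is not the SEA implementation: the function names, the input (one non-negative signal value per cell type for a single super-enhancer) and the normalization to a 0-1 score are all assumptions made for illustration.

```python
import math


def shannon_entropy(signals):
    """Entropy in bits of a signal profile across cell types (hypothetical input)."""
    if any(s < 0 for s in signals):
        raise ValueError("signals must be non-negative")
    total = sum(signals)
    if total <= 0:
        raise ValueError("signals must contain at least one positive value")
    # Convert the profile to a probability distribution, skipping zeros
    # because p * log2(p) tends to 0 as p tends to 0.
    probs = [s / total for s in signals if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def specificity_score(signals):
    """Assumed normalization: 1 = signal in one cell type, 0 = uniform signal."""
    if len(signals) < 2:
        raise ValueError("need at least two cell types")
    return 1.0 - shannon_entropy(signals) / math.log2(len(signals))


if __name__ == "__main__":
    print(specificity_score([5.0, 0.0, 0.0, 0.0]))  # signal in a single cell type
    print(specificity_score([1.0, 1.0, 1.0, 1.0]))  # uniform across cell types
```

Low entropy means the signal is concentrated in few cell types, so a score near 1 suggests a cell-type-specific super-enhancer under these assumptions.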
Project description:We introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of Bacillus subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs, and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, riboswitches, and small regulatory RNAs. Overall, regulatory information is included in the model for ~2500 of the ~4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same "ON" and "OFF" gene expression profiles across multiple samples of experimental data. We show how ARs for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how ARs can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.
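The notion of genes sharing the same "ON" and "OFF" profile across samples can be made concrete with a deliberately simplified sketch. It is not the Atomic Regulon reconstruction procedure used by the authors: the fixed expression threshold, the gene names and the exact-match grouping are all assumptions chosen only to illustrate the definition.

```python
from collections import defaultdict


def group_by_on_off_profile(expression, threshold):
    """Group genes whose binarized ON/OFF profiles are identical across samples.

    expression: dict mapping gene name -> list of expression values, one per sample.
    threshold:  assumed cutoff; a value >= threshold counts as ON.
    Returns a list of gene groups (each sorted), keeping only groups with 2+ genes.
    """
    groups = defaultdict(list)
    for gene, values in expression.items():
        profile = tuple(v >= threshold for v in values)  # True = ON, False = OFF
        groups[profile].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


if __name__ == "__main__":
    # Hypothetical genes and values, three samples each.
    toy = {
        "geneA": [10.0, 1.0, 9.0],
        "geneB": [12.0, 0.5, 8.0],
        "geneC": [1.0, 11.0, 1.0],
    }
    print(group_by_on_off_profile(toy, threshold=5.0))
```

In this toy input, geneA and geneB are ON in the first and third samples and OFF in the second, so they fall into one group, while geneC has a different profile and stays ungrouped.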
Project description:BACKGROUND:Recent studies have demonstrated that long non-coding RNAs (lncRNAs) can be intricately implicated in cancer-related molecular networks and related to cancer occurrence, development and prognosis. However, the clinicopathological and molecular features of these cancer-related lncRNAs, which are very important in bridging lncRNA basic research with clinical research, have not been well integrated. RESULTS:After manually reviewing more than 2500 published articles, we collected the cancer-related lncRNAs with experimentally proven functions. By integrating data from the literature and public databases, we constructed CRlncRNA, a database of cancer-related lncRNAs. The current version of CRlncRNA contains 355 entries of cancer-related lncRNAs, covering 1072 cancer-lncRNA associations across 76 types of cancer, and 1238 interactions with different RNAs and proteins. We further annotated clinicopathological features of these lncRNAs, such as the clinical stages and the cancer hallmarks. We also provide tools for data browsing, searching and download, as well as online BLAST, genome browser and gene network visualization services. CONCLUSIONS:CRlncRNA is a manually curated database for retrieving clinicopathological and molecular features of cancer-related lncRNAs supported by highly reliable evidence. CRlncRNA aims to provide a bridge from lncRNA basic research to clinical research. The lncRNA dataset collected by CRlncRNA can be used as a gold standard dataset for prospective experimental and in-silico studies of cancer-related lncRNAs. CRlncRNA is freely available for all users at http://crlnc.xtbg.ac.cn.
Project description:Polycystic ovary syndrome (PCOS) is a complex disorder affecting approximately 5-10 percent of all women of reproductive age. It is a multi-factorial endocrine disorder that presents with menstrual disturbance, infertility, anovulation, hirsutism, hyperandrogenism and other symptoms. It has been indicated that differential expression of genes, genetic variations, and other molecular alterations interplay in PCOS and are target sites for clinical applications. Therefore, integrating PCOS-associated genes with their alterations and underlying mechanisms should provide valuable information for understanding the disease. We manually curated information from 234 published articles, including gene, molecular alteration, details of association, significance of association, ethnicity, age, drug, and other annotated summaries. PCOSDB is an online resource that brings together comprehensive information about the disease and the implication of various genes and their mechanisms. We present curated information from the peer-reviewed literature, organized at various levels, including differentially expressed genes in PCOS and genetic variations, such as polymorphisms and mutations, causing PCOS across various ethnicities. We have covered both significant and non-significant associations along with conflicting studies. PCOSDB v1.0 contains 208 gene reports, 427 molecular alterations, and 46 phenotypes associated with PCOS.
Project description:Computational models of enhancer function generally assume that transcription factors (TFs) exert their regulatory effects independently, modeling an enhancer as a "bag of sites." These models fail on endogenous loci that harbor multiple enhancers, and a "two-tier" model appears better suited: in each enhancer TFs work independently, and the total expression is a weighted sum of their expression readouts. Here, we test these two opposing views on how cis-regulatory information is integrated. We fused two Drosophila blastoderm enhancers, measured their readouts, and applied the above two models to these data. The two-tier mechanism better fits these readouts, suggesting that these fused enhancers comprise multiple independent modules, despite having sequence characteristics typical of single enhancers. We show that short-range TF-TF interactions are not sufficient to designate such modules, suggesting unknown underlying mechanisms. Our results underscore that mechanisms of how modules are defined and how their outputs are combined remain to be elucidated.
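The contrast between the "bag of sites" view and the "two-tier" view can be sketched in a few lines. This is a toy illustration, not the models fitted in the study: the logistic readout, the per-site contribution values and the weights are assumptions introduced here only to show how the two ways of combining information differ.

```python
import math


def enhancer_readout(site_contributions, bias=0.0):
    """Readout of one enhancer: logistic of summed, independent TF site
    contributions (assumed functional form)."""
    return 1.0 / (1.0 + math.exp(-(bias + sum(site_contributions))))


def bag_of_sites(enhancers, bias=0.0):
    """Single tier: pool every site from every enhancer into one readout."""
    pooled = [c for sites in enhancers for c in sites]
    return enhancer_readout(pooled, bias)


def two_tier(enhancers, weights, bias=0.0):
    """Two tiers: compute each enhancer's readout separately, then take a
    weighted sum of those readouts."""
    if len(enhancers) != len(weights):
        raise ValueError("need exactly one weight per enhancer")
    return sum(w * enhancer_readout(sites, bias)
               for sites, w in zip(enhancers, weights))


if __name__ == "__main__":
    # Two hypothetical enhancers, each with activating site contributions.
    fused = [[1.5, 1.5], [2.0, 1.0]]
    print("bag of sites:", bag_of_sites(fused))
    print("two-tier:    ", two_tier(fused, weights=[0.5, 0.5]))
```

Under these assumptions the pooled model saturates when two active enhancers are fused, whereas the two-tier model returns a weighted average of the separate readouts, which is the kind of difference that measured readouts of fused enhancers could discriminate.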