Project description:Esophageal cancer (EC) involves many genomic, epigenetic and transcriptomic alterations, which play key roles in the heterogeneous progression of the disease. However, a multi-omics study of EC has not yet been conducted. This study identified a high consistency between DNA copy number variations (CNVs) and abnormal methylation in EC by analyzing genomic, epigenetic and transcriptomic data and by investigating the mutual correlations among DNA copy number variation, methylation and gene expression; it also stratified copy number variation genes (CNV-Gs) and methylation genes (MET-Gs). The methylation, CNV and expression profiles of the CNV-Gs and MET-Gs were clustered jointly using iCluster integration, which yielded three subtypes (iC1, iC2, iC3) with different molecular traits, prognostic characteristics and tumor immune microenvironment features. We also identified four prognostic genes (CLDN3, FAM221A, GDF15 and YBX2) that were differentially expressed among the three subtypes and could therefore serve as representative biomarkers for the three subtypes of EC. In conclusion, through a comprehensive analysis of genomic, epigenetic and transcriptomic regulation, the current study provides new insights into the multilayer molecular and pathological traits of EC and contributes to precision medicine for EC patients.
Project description:Integration of multiple profiling data and construction of functional gene networks may provide additional insights into the molecular mechanisms of complex diseases. Osteoporosis is a worldwide public health problem, but the complex gene-gene interactions, post-transcriptional modifications and regulation of functional networks involved are still unclear. To gain a comprehensive understanding of osteoporosis etiology, transcriptome gene expression microarray, epigenomic miRNA microarray and methylome sequencing were performed simultaneously in 5 subjects with high hip BMD (Bone Mineral Density) and 5 subjects with low hip BMD. The SPIA (Signaling Pathway Impact Analysis) and PCST (Prize Collecting Steiner Tree) algorithms were used to perform pathway-enrichment analysis and to construct the interaction networks. By integrating the transcriptomic and epigenomic data, we first identified 3 genes (FAM50A, ZNF473 and TMEM55B) and one miRNA (hsa-mir-4291) that showed consistent evidence of association in both the gene expression and methylation data. Second, in the network analysis we identified an interaction network module with 12 genes and 11 miRNAs, including AKT1, STAT3, STAT5A, FLT3, hsa-mir-141 and hsa-mir-34a, which have been associated with BMD in previous studies. This module revealed the crosstalk among miRNAs, mRNAs and DNA methylation and showed four potential regulatory patterns of gene expression that influence BMD status. In conclusion, the integration of multiple layers of omics data can yield more in-depth results than the analysis of each omics data type alone. Integrative analysis of transcriptomic and epigenomic data improves our ability to identify causal genetic factors and, more importantly, to uncover the functional regulation patterns of multi-omics in osteoporosis etiology.
Project description:Biomedical research studies have generated large multi-omic datasets to study complex diseases like Alzheimer's disease (AD). An important aim of these studies is the identification of candidate genes that demonstrate congruent disease-related alterations across the different data types measured by the study. We developed a new method to detect such candidate genes in large multi-omic case-control studies that measure multiple data types in the same set of samples. The method is based on a gene-centric integrative coefficient quantifying to what degree consistent differences are observed in the different data types. For statistical inference, a Bayesian hierarchical model is used to study the distribution of the integrative coefficient. The model employs a conditional autoregressive prior to integrate a functional gene network and to share information between genes known to be functionally related. We applied the method to an AD dataset consisting of histone acetylation, DNA methylation, and RNA transcription data from human cortical tissue samples of 233 subjects, and we detected 816 genes with consistent differences between persons with AD and controls. The findings were validated in protein data and in RNA transcription data from two independent AD studies. Finally, we found three subnetworks of jointly dysregulated genes within the functional gene network which capture three distinct biological processes: myeloid cell differentiation, protein phosphorylation and synaptic signaling. Further investigation of the myeloid network indicated an upregulation of this network in early stages of AD prior to accumulation of hyperphosphorylated tau and suggested that increased CSF1 transcription in astrocytes may contribute to microglial activation in AD. 
Thus, we developed a method that integrates multiple data types and external knowledge of gene function to detect candidate genes, applied the method to an AD dataset, and identified several disease-related genes and processes demonstrating the usefulness of the integrative approach.
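As a purely illustrative sketch of what a gene-centric consistency score across data types could look like: the function below is not the integrative coefficient or the Bayesian hierarchical model of the study, whose definitions are not given in this description. The name `toy_consistency_score`, the `expected_sign` argument, and the idea of orienting each data type by an expected direction are all assumptions made for this example; the per-gene case-control statistics are assumed to be precomputed.

```python
import numpy as np

def toy_consistency_score(stats_by_type, expected_sign):
    """Hypothetical per-gene score (not the published coefficient).

    stats_by_type: array of shape (n_data_types, n_genes) holding one
        case-control statistic per data type and gene.
    expected_sign: +1 or -1 per data type, giving the direction in which
        that data type is assumed to move when the gene is more active.

    Returns the smallest oriented magnitude when all data types agree in
    direction (signed by that direction), and 0 when they disagree.
    """
    oriented = np.asarray(stats_by_type, dtype=float) * np.asarray(expected_sign)[:, None]
    all_up = (oriented > 0).all(axis=0)
    all_down = (oriented < 0).all(axis=0)
    weakest = np.abs(oriented).min(axis=0)
    return np.where(all_up, weakest, np.where(all_down, -weakest, 0.0))

# Toy statistics for four genes in three data types (rows). The third row is
# assumed, for illustration only, to be anticorrelated with gene activity.
toy_stats = [
    [2.0, -1.0, 3.0, -2.0],
    [1.5, 2.0, -0.5, -3.0],
    [-2.5, 1.0, 4.0, 1.0],
]
toy_signs = [1, 1, -1]
print(toy_consistency_score(toy_stats, toy_signs).tolist())  # [1.5, 0.0, 0.0, -1.0]
```

Taking the weakest oriented statistic means a gene scores highly only when every data type supports the change, which is one simple way to encode "consistent differences"; the study's model additionally shares information between functionally related genes, which this sketch does not attempt.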
Project description:Integrating diverse genomics data can provide a global view of the complex biological processes related to complex human diseases. Although substantial efforts have been made to integrate different omics data, there are at least three challenges for multi-omics integration methods: (i) how to simultaneously consider the effects of various genomic factors, since these factors jointly influence the phenotypes; (ii) how to effectively incorporate the information from publicly accessible databases and omics datasets to fully capture the interactions among (epi)genomic factors from diverse omics data; and (iii) how to combine more than two omics datasets, which has been poorly explored to date. Current integration approaches are not sufficient to address all of these challenges together. We proposed a novel integrative analysis framework that incorporates a sparse model, multivariate analysis, a Gaussian graphical model, and network analysis to address these three challenges simultaneously. Based on this strategy, we performed a systemic analysis for glioblastoma multiforme (GBM) integrating genome-wide gene expression, DNA methylation, and miRNA expression data. We identified three regulatory modules of genomic factors associated with GBM survival time and revealed a global regulatory pattern for GBM by combining the three modules with respect to their common regulatory factors. Our method can not only identify disease-associated dysregulated genomic factors from different omics but, more importantly, can also incorporate the information from publicly accessible databases and omics datasets to infer a comprehensive interaction map of all these dysregulated genomic factors. Our work represents an innovative approach to enhance our understanding of the molecular genomic mechanisms underlying complex human diseases.
Project description:Background: Regulation of transcription is central to the emergence of new cell types during development, and it often involves activation of genes via proximal and distal regulatory regions. The activity of regulatory elements is determined by transcription factors (TFs) and epigenetic marks, but despite extensive mapping of such patterns, the extraction of regulatory principles remains challenging. Results: Here we study differentially and similarly expressed genes along with their associated epigenomic profiles, chromatin accessibility and DNA methylation, during lineage specification at gastrulation in mice. Comparison of the three lineages allows us to identify genomic and epigenomic features that distinguish the two classes of genes. We show that differentially expressed genes are primarily regulated by distal elements, while similarly expressed genes are controlled by proximal housekeeping regulatory programs. Differentially expressed genes are relatively isolated within topologically associated domains, while similarly expressed genes tend to be located in gene clusters. Transcription of differentially expressed genes is associated with differentially open chromatin at distal elements including enhancers, while that of similarly expressed genes is associated with ubiquitously accessible chromatin at promoters. Conclusion: Based on these associations of (linearly) distal genes' transcription start sites (TSSs) and putative enhancers for developmental genes, our findings allow us to link putative enhancers to their target promoters and to infer lineage-specific repertoires of putative driver transcription factors, within which we define subgroups of pioneers and co-operators.
Project description:Schizophrenia and bipolar disorder are serious mental illnesses that affect more than 2% of adults. While large-scale genetics studies have identified genomic regions associated with disease risk, less is known about the molecular mechanisms by which risk alleles with small effects lead to schizophrenia and bipolar disorder. To fill this gap between genetics and disease phenotype, we have undertaken a multi-cohort genomics study of postmortem brains from controls and from individuals with schizophrenia or bipolar disorder. Here we present a public resource of functional genomic data from the dorsolateral prefrontal cortex (DLPFC; Brodmann areas 9 and 46) of 986 individuals from 4 separate brain banks, including 353 diagnosed with schizophrenia and 120 with bipolar disorder. The genomic data include RNA-seq and SNP genotypes on 980 individuals, and ATAC-seq on 269 individuals, of whom 264 are a subset of the individuals with RNA-seq. We have performed extensive preprocessing and quality control on these data so that the research community can take advantage of this public resource, available on the Synapse platform at http://CommonMind.org.
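The assay counts quoted above imply how many individuals have one assay but not the other. A minimal sketch of that inclusion-exclusion arithmetic follows; the function name `overlap_summary` and the dictionary keys are hypothetical, and only the three counts (980, 269, 264) come from the description.

```python
def overlap_summary(n_first, n_second, n_both):
    """Inclusion-exclusion for two overlapping groups of individuals."""
    if n_both > min(n_first, n_second):
        raise ValueError("the overlap cannot exceed either group size")
    return {
        "first_only": n_first - n_both,
        "second_only": n_second - n_both,
        "either": n_first + n_second - n_both,
    }

# 980 individuals with RNA-seq, 269 with ATAC-seq, 264 with both.
summary = overlap_summary(980, 269, 264)
print(summary)  # {'first_only': 716, 'second_only': 5, 'either': 985}
```

Under these counts, 5 individuals have ATAC-seq without RNA-seq, and 985 have at least one of the two assays.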
Project description:This work describes the primer design details of the Quantitative Microbial Elemental Cycling (QMEC) method, a high-throughput qPCR method for detecting and assessing the microbial metabolic potential for carbon (C), nitrogen (N), phosphorus (P), sulfur (S) and methane. We designed 36 novel primers based on their amino acid sequences. Using Illumina sequencing technology, their phylogenetic taxonomy was identified and analyzed to validate the primer specificity.
Project description:Bamboo, one of the most crucial nontimber forest resources worldwide, has the capacity for rapid growth. In recent years, the genome of moso bamboo (Phyllostachys edulis) has been decoded, and a large amount of transcriptome data has been published. In this study, we generated genome-wide profiles of the histone modification H3K4me3 in leaf, stem, and root tissues of bamboo. The trends in the distribution patterns were similar to those in rice. We developed a processing pipeline for predicting novel transcripts to refine the structural annotation of the genome using H3K4me3 ChIP-seq data and 29 RNA-seq datasets. As a result, 12,460 novel transcripts were predicted in the bamboo genome. Compared with the transcripts in the newly released version 2.0 of the bamboo genome, these novel transcripts are tissue-specific and shorter, and most have a single exon. Some representative novel transcripts were validated by semiquantitative RT-PCR and qRT-PCR analyses. Furthermore, we put these novel transcripts back into the ChIP-seq analysis pipeline and found that the percentages of H3K4me3 in genic elements increased. Overall, this work integrated transcriptomic and epigenomic data to refine the annotation of the genome in order to discover more functional genes and to study bamboo growth and development, and the application of this prediction pipeline may help refine the structural annotation of the genome in other species.
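The description does not spell out how the pipeline combines the two data types. One plausible building block, shown here only as a hedged sketch, is to check whether a candidate transcript assembled from RNA-seq has an H3K4me3 peak near its 5' end, since H3K4me3 is typically enriched around active transcription start sites. The names `Peak`, `Transcript`, `tss_position` and `has_peak_near_tss`, the 500 bp window, and the 1-based inclusive coordinates are all assumptions for illustration and are not taken from the study.

```python
from collections import namedtuple

# Hypothetical record types; coordinates are assumed 1-based and inclusive.
Peak = namedtuple("Peak", "chrom start end")
Transcript = namedtuple("Transcript", "name chrom start end strand")

def tss_position(transcript):
    """The 5' end: the start on the plus strand, the end on the minus strand."""
    return transcript.start if transcript.strand == "+" else transcript.end

def has_peak_near_tss(transcript, peaks, window=500):
    """True if any peak on the same chromosome lies within `window` bp of the TSS."""
    pos = tss_position(transcript)
    return any(
        peak.chrom == transcript.chrom
        and peak.start - window <= pos <= peak.end + window
        for peak in peaks
    )

# Toy data, invented for the example.
toy_peaks = [Peak("chr1", 900, 1100), Peak("chr1", 4900, 5100)]
toy_transcripts = [
    Transcript("T1", "chr1", 1000, 2000, "+"),  # TSS at 1000, inside the first peak
    Transcript("T2", "chr1", 5000, 6000, "-"),  # TSS at 6000, too far from any peak
    Transcript("T3", "chr2", 1000, 2000, "+"),  # no peaks on chr2
]
supported = [t.name for t in toy_transcripts if has_peak_near_tss(t, toy_peaks)]
print(supported)  # ['T1']
```

The linear scan over peaks keeps the example short; for genome-scale data, a sorted or indexed interval structure would be the more usual choice.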
Project description:Temporal profiling of DNA replication timing (RT) in combination with chromatin modifications, chromatin accessibility, and gene expression provides new insights into the causal relationships between chromatin and RT during the cell cycle. Here, we describe a protocol for in-depth integrative computational analyses of Repli-seq, ATAC-seq, RNA-seq, and ChIP-seq or CUT&RUN data for multiple marks at various time points across the cell cycle, and of changes in their interrelationships upon an experimental perturbation (e.g., knockdown or overexpression of a regulatory protein). For complete details on the use and execution of this protocol, please refer to Van Rechem et al. (2021).