Identification of Regulatory Modules That Stratify Lupus Disease Mechanism through Integrating Multi-Omics Data.
ABSTRACT: Although recent advances in genetic studies have shed light on systemic lupus erythematosus (SLE), its detailed mechanisms remain elusive. In this study, using datasets on SLE transcriptomic profiles, we identified 750 differentially expressed genes (DEGs) in T and B lymphocytes and peripheral blood cells. Using transcription factor (TF) binding data derived from chromatin immunoprecipitation sequencing (ChIP-seq) experiments from the Encyclopedia of DNA Elements (ENCODE) project, we inferred networks of co-regulated genes (NcRGs) based on the binding profiles of significantly enriched TFs at the upregulated DEGs. Modularization analysis of NcRGs identified co-regulatory modules among the DEGs and master TFs vital for each module. Remarkably, the co-regulatory modules stratified the common SLE interferon (IFN) signature and revealed SLE pathogenesis pathways, including the complement cascade, cell cycle regulation, NETosis, and epigenetic regulation. By integrative analyses of disease-associated genes (DAGs), DEGs, and enriched TFs, as well as proteins interacting with them, we identified a hierarchical regulatory cascade in which DAGs regulate TFs, which in turn regulate gene expression. Integrative analysis of multi-omics data provided valuable insights into the molecular mechanisms of SLE.
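The abstract does not state which statistical test was used to call a TF "significantly enriched" among the upregulated DEGs. A common choice for this kind of question is a one-sided hypergeometric test, and the following is a minimal sketch under that assumption. The function name and all numbers are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_targets, n_degs, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: genes considered in total (assumed background)
    n_targets:  genes in the background bound by the TF
    n_degs:     upregulated DEGs
    n_overlap:  upregulated DEGs that are bound by the TF
    """
    max_k = min(n_targets, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_degs - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Invented toy numbers: 60 of 400 DEGs bound, where about 30 are expected by chance.
p_value = hypergeom_enrichment_p(20000, 1500, 400, 60)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the p-values for all tested TFs would also need a multiple-testing correction, which this sketch leaves out.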
Project description:This study aimed to gain a better understanding of the molecular circuitry of Schmid-type metaphyseal chondrodysplasia (SMCD), and to identify more potential genes associated with the pathogenesis of SMCD. Microarray data from GSE72261 were downloaded from the NCBI GEO database, including collagen X p.Asn617Lys knock-in mutation (ColXN617K), ablated XBP1 activity (Xbp1CartΔEx2), compound mutant (C/X), and wild-type (WT) specimens. Differentially expressed genes (DEGs) were screened in Xbp1 vs. WT, Col vs. WT and CX vs. WT, respectively. Pathway enrichment analysis of these DEGs was performed. Transcription factors (TFs) of the overlapping DEGs were identified. Weighted correlation network analysis (WGCNA) was performed to find modules of DEGs with high correlations, followed by gene function analysis and construction of a protein-protein interaction (PPI) network. In total, 481, 1,530 and 1,214 DEGs were identified in Xbp1 vs. WT, Col vs. WT and CX vs. WT, respectively. These DEGs were enriched in different pathways, such as extracellular matrix (ECM)-receptor interaction and metabolism-related pathways. A total of 7 TFs were found to regulate 19 common upregulated genes, and 4 TFs were identified to regulate 21 common downregulated genes. Two significant gene co-expression modules were identified, and the DEGs in these 2 modules were mainly enriched in different biological processes, such as ribosome biogenesis. Moreover, Kras (downregulated), Col5a1 (upregulated) and Furin (upregulated) were all identified in both the regulatory networks and the PPI network. On the whole, our findings indicate that the Kras, Col5a1 and Furin genes may play essential roles in the molecular mechanisms of SMCD, which warrants further investigation.
Project description:Sepsis is a type of systemic inflammatory response caused by infection. The present study aimed to identify novel targets for the treatment of sepsis. We conducted bioinformatic analysis of the microarray Gene Expression Omnibus dataset GSE12624, which includes data on 34 patients with sepsis and 36 healthy individuals without sepsis. Differentially expressed genes (DEGs) in sepsis patients were identified using Bayesian methods included in the limma package in R. Correlations among the expression values of DEGs were analyzed using the weighted gene co-expression network analysis (WGCNA) to construct a co-expression network. Subsequently, the generated co-expression network was visualized using Cytoscape 3.3 software. Additionally, a protein-protein interaction (PPI) network was constructed based on all the DEGs using STRING. Finally, the integrated regulatory network was constructed based on DEGs, microRNAs (miRNAs) and transcription factors (TFs). A total of 407 DEGs were identified in the sepsis samples, including 227 upregulated DEGs and 180 downregulated DEGs. WGCNA grouped the DEGs into 13 co-expressed modules. Additionally, MAP3K8 and RPS6KA5 in the MEyellow module were enriched in the MAPK and TNF signaling pathways. In addition, the PPI network comprised 48 nodes and 112 edges, which included the pairs MAP3K8-RPS6KA5, MAP3K8-IL10, RPS6KA5-EXOSC4 and EXOSC4-EXOSC5. Lastly, the TF-miRNA-target DEG regulatory network was constructed based on eight TFs (NF-κB), seven miRNAs (miR-152, miR-148A/B), and 52 TF-miRNA-target gene triplets (17 upregulated genes, including MAP3K8, and 10 downregulated genes, including RPS6KA5). Our analysis showed that the members of the miR-148 family (miR-148A/B and miR-152) are candidate biomarkers for sepsis.
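WGCNA builds its co-expression network by raising pairwise correlations to a soft-thresholding power, so that in an unsigned network the adjacency between two genes is |cor|^beta. The following is a minimal pure-Python sketch of that single step, not of the full WGCNA pipeline (no topological overlap, clustering or module detection). The gene names, expression values and the choice beta=6 are illustrative assumptions, not values from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta for every unordered gene pair.

    expr maps a gene name to its expression values across samples.
    beta is the soft-thresholding power (6 here is only an example).
    """
    genes = list(expr)
    adjacency = {}
    for i, gene in enumerate(genes):
        for other in genes[i + 1:]:
            adjacency[(gene, other)] = abs(pearson(expr[gene], expr[other])) ** beta
    return adjacency


# Invented toy expression values for three hypothetical genes in four samples.
toy_expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [1.0, 3.0, 2.0, 4.0],
}
adjacency = soft_threshold_adjacency(toy_expr)
for pair, weight in adjacency.items():
    print(pair, round(weight, 4))
```

Raising the correlation to a power keeps strong correlations close to 1 while pushing weak ones towards 0, which is why no hard cutoff is needed at this stage.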
Project description:<h4>Background</h4>Hypoplastic left heart syndrome (HLHS) is one of the most complex congenital cardiac malformations, and the molecular mechanism of heart failure (HF) in HLHS is still elusive.<h4>Methods</h4>Integrative bioinformatics analysis was performed to unravel the underlying genes and mechanisms involved in HF in HLHS. The microarray dataset GSE23959 was screened for differentially expressed genes (DEGs), after which Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses were carried out using Metascape. The protein-protein interaction (PPI) network was generated, and the modules and hub genes were identified with a Cytoscape plugin. Integrated networks of transcription factor (TF)-DEGs and miRNA-DEGs were then constructed.<h4>Results</h4>A total of 210 DEGs were identified, including 135 up-regulated and 75 down-regulated genes. The functional enrichment analysis of DEGs pointed towards mitochondrial-related biological processes, cellular components, molecular functions and signaling pathways. A PPI network was constructed with 155 nodes and 363 edges. Fifteen hub genes, such as <i>NDUFB6, UQCRQ, SDHD, ATP5H</i>, were identified based on three topological analysis methods, and mitochondrial components and functions were the most relevant. Furthermore, by integrating network interaction construction, 23 TFs (NFKB1, RELA, HIF1A, VHL, GATA1, PPAR-γ, etc.) as well as several miRNAs (hsa-miR-155-5p, hsa-miR-191-5p, hsa-miR-124-3p, hsa-miR-1-3p, etc.) were detected and indicated the possible involvement of NF-κB signaling pathways in mitochondrial dysfunction in HLHS.<h4>Conclusion</h4>The present study applied integrative bioinformatics analysis and revealed the mitochondrial-related key genes, regulatory pathways, TFs and miRNAs underlying HF in HLHS, improving the understanding of disease mechanisms and supporting the development of novel therapeutic targets.
Project description:The brain is a highly complex organ consisting of numerous types of cells with ample diversity at the epigenetic level to achieve distinct gene expression profiles. During neuronal cell specification, transcription factors (TFs) form regulatory modules with chromatin remodeling proteins to initiate the cascade of epigenetic changes. Currently, little is known about brain epigenetic regulatory modules and how they regulate gene expression in a cell-type specific manner. To infer TFs involved in neuronal specification, we applied a recursive motif search approach on the differentially methylated regions identified from single-cell methylomes. The epigenetic transcription regulatory modules (ETRMs), including EGR1 and MEF2C, were predicted, and the co-expression of TFs in ETRMs was examined with RNA-seq data from single or sorted brain cells using a conditional probability matrix. Lastly, computational predictions were validated with EGR1 ChIP-seq data. In addition, methylome and RNA-seq data generated from Egr1 knockout mice supported the essential role of EGR1 in brain epigenome programming, in particular for excitatory neurons. In summary, we demonstrated that brain single cell methylome and RNA-seq data can be integrated to gain a better understanding of how ETRMs control cell specification. The analytical pipeline implemented in this study is freely accessible in the Github repository (https://github.com/Gavin-Yinld/brain_TF).
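The abstract does not define how the conditional probability matrix of TF co-expression is computed. One plausible reading, assumed here, is that entry (A, B) is the fraction of cells expressing TF A that also express TF B. The sketch below illustrates that reading only; it is not the pipeline from the linked repository, and the per-cell detection calls are invented toy data.

```python
def conditional_coexpression(cells, tfs):
    """Return {(a, b): P(b detected | a detected)} for each ordered TF pair.

    cells: list of sets, each holding the TFs detected in one cell
    tfs:   TF names to include in the matrix
    """
    matrix = {}
    for a in tfs:
        cells_with_a = [cell for cell in cells if a in cell]
        for b in tfs:
            if a == b:
                continue
            if cells_with_a:
                hits = sum(1 for cell in cells_with_a if b in cell)
                matrix[(a, b)] = hits / len(cells_with_a)
            else:
                # TF a was never detected, so the probability is undefined.
                matrix[(a, b)] = float("nan")
    return matrix


# Invented toy detection calls for four cells.
toy_cells = [{"EGR1", "MEF2C"}, {"EGR1"}, {"EGR1", "MEF2C"}, set()]
coexpr = conditional_coexpression(toy_cells, ["EGR1", "MEF2C"])
print(coexpr)
```

The matrix is deliberately asymmetric: in the toy data every cell with MEF2C also has EGR1, but not the reverse.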
Project description:<h4>Background</h4>Systemic lupus erythematosus (SLE) is a multisystemic, chronic inflammatory disease characterized by destructive systemic organ involvement, which can cause decreased functional capacity and increased morbidity and mortality. Previous studies show that SLE is characterized by autoimmune and inflammatory processes and by tissue destruction. Some seriously ill patients develop lupus nephritis. However, the cause and underlying molecular events of SLE remain to be resolved.<h4>Methods</h4>The expression profiles of GSE144390, GSE4588, GSE50772 and GSE81622 were downloaded from the Gene Expression Omnibus (GEO) database to obtain differentially expressed genes (DEGs) between SLE and healthy samples. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed using Metascape and other online tools. The protein-protein interaction (PPI) networks of the DEGs were constructed with GeneMANIA. We performed Gene Set Enrichment Analysis (GSEA) to further understand the functions of the hub gene. Weighted gene co-expression network analysis (WGCNA) was used to build a gene co-expression network, and the most significant module and hub genes were identified. CIBERSORT was used to analyze immune cell infiltration patterns. Receiver operating characteristic (ROC) analyses were conducted to explore the value of the DEGs for SLE diagnosis.<h4>Results</h4>In total, 6 DEGs (IFI27, IFI44, IFI44L, IFI6, EPSTI1 and OAS1) were screened. Biological function analysis identified key related pathways, gene modules and co-expression networks in SLE. IFI27 may be closely correlated with the occurrence of SLE. We found that increased infiltration of monocytes and decreased infiltration of resting NK cells may be related to the occurrence of SLE.<h4>Conclusion</h4>IFI27 may be closely related to the pathogenesis of SLE, and represents a new candidate molecular marker of the occurrence and progression of SLE. Moreover, immune cell infiltration plays an important role in the progression of SLE.
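The ROC analysis mentioned in the Methods summarizes how well a gene's expression separates SLE from healthy samples. The area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The following minimal sketch computes it that way. The expression values are invented for illustration and the abstract does not say which software was used for the original analysis.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 if the case scores higher than the control,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented toy expression values for one hypothetical marker gene.
sle_expression = [5.1, 6.3, 4.8]
healthy_expression = [3.9, 4.8, 4.1]
auc = roc_auc(sle_expression, healthy_expression)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation; the pairwise loop is quadratic, which is fine for cohorts of this size.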
Project description:<h4>Purpose</h4> The variation in inflammation in chronic obstructive pulmonary disease (COPD) between individuals is genetically determined. This study aimed to identify gene signatures of COPD through bioinformatics analysis based on multiple gene sets and explore their immune characteristics and transcriptional regulation mechanisms. <h4>Methods</h4> Data from four microarrays were downloaded from the Gene Expression Omnibus database to screen differentially expressed genes (DEGs) between COPD patients and controls. Weighted gene co-expression network analysis was applied to identify trait-related modules and then select key module-related DEGs. The optimized gene signature was obtained using least absolute shrinkage and selection operator (LASSO) regression analysis. The CIBERSORT algorithm and Pearson correlation test were used to analyze the relationship between the signature genes and immune cells. Finally, public databases were used to predict the transcription factors (TFs) and upstream miRNAs. <h4>Results</h4> A total of 127 DEGs in COPD were identified from the combined dataset. By considering the intersection of DEGs and genes in two trait-related modules, 83 key module-related DEGs were identified, which were mainly enriched in interleukin-related pathways. A seven-gene signature, comprising MTHFD2, KANK3, GFPT2, PHLDA1, HS3ST2, FGG, and RPS4Y1, was further selected using the LASSO algorithm. These signature genes showed predictive potential for COPD risk and were significantly correlated with 18 types of immune cells. Finally, nine miRNAs and three TFs were predicted to target MTHFD2, GFPT2, PHLDA1, and FGG. <h4>Conclusion</h4> We proposed a seven-gene signature to predict COPD risk and explored its potential immune characteristics and regulatory mechanisms.
Project description:Summary: Non-alcoholic fatty liver disease (NAFLD) is a leading cause of chronic liver disease worldwide. We performed network analysis to investigate the dysregulated biological processes in disease progression and to reveal the molecular mechanism underlying NAFLD. Based on network analysis, we identified a highly conserved disease-associated gene module across three different NAFLD cohorts and highlighted the predominant role of key transcriptional regulators associated with lipid and cholesterol metabolism. In addition, we revealed the detailed metabolic differences between heterogeneous NAFLD patients through integrative systems analysis of transcriptomic data and a liver-specific genome-scale metabolic model. Furthermore, we identified transcription factors (TFs), including SREBF2, HNF4A, SREBF1, YY1, and KLF13, showing regulation of hepatic expression of genes in the NAFLD-associated modules and validated the TFs using data generated from a mouse NAFLD model. In conclusion, our integrative analysis facilitates the understanding of the regulatory mechanism of these perturbed TFs and their associated biological processes. Highlights: • Disease-associated gene modules are conserved across multiple NAFLD cohorts • The central genes in disease-associated modules are key enzymes in cholesterol synthesis • YY1 and KLF13 are potential key transcriptional regulators of NAFLD development. Hepatology; Gene network; Systems biology
Project description:<h4>Background</h4>Site-specific transcription factors (TFs) are coordinators of developmental and physiological gene expression programs. Their binding to cis-regulatory modules of target genes mediates the precise cell- and context-specific activation and repression of genes. The expression of TFs should therefore reflect the core expression program of each cell.<h4>Results</h4>We studied the expression dynamics of about 750 TFs using the available genomics resources in Drosophila melanogaster. We find that 95% of these TFs are expressed at some point during embryonic development, with a peak roughly between 10 and 12 hours after egg laying, the core stages of organogenesis. We address the differential utilization of DNA-binding domains in different developmental programs systematically in a spatio-temporal context, and show that the zinc finger class of TFs is predominantly early expressed, while Homeobox TFs exhibit later expression in embryogenesis.<h4>Conclusions</h4>Previous work, dissecting cis-regulatory modules during Drosophila development, suggests that TFs are deployed in groups acting in a cooperative manner. In contrast, we find that there is rapid exchange of co-expressed partners amongst the fly TFs, at rates similar to the genome-wide dynamics of co-expression clusters. This suggests there may also be a high level of combinatorial complexity of TFs at cis-regulatory modules.
Project description:Prostate cancer is a global health issue. Usually, men with metastatic disease will progress to castration-resistant prostate cancer (CRPC). We aimed to identify the differentially expressed genes (DEGs) in tumor samples from non-castrated and castrated men from LNCaP orthotopic xenograft models of prostate cancer and to study the mechanisms of CRPC. In this work, GSE46218, containing 4 samples from non-castrated men and 4 samples from castrated men, was downloaded from Gene Expression Omnibus. We identified DEGs using limma and GEOquery in R, the Robust Multi-array Average (RMA) method in Bioconductor, and Bias methods, followed by constructing an integrated regulatory network involving DEGs, miRNAs, and TFs using Cytoscape. Then, we analyzed network motifs of the integrated gene regulatory network using FANMOD. We selected regulatory modules corresponding to network motifs from the integrated regulatory network using a Perl script. We performed gene ontology (GO) and pathway enrichment analysis of DEGs in the regulatory modules using DAVID. We identified a total of 443 DEGs. We built an integrated regulatory network, found three motifs (motif 1, motif 2 and motif 3), and obtained two functional modules (module 1 corresponded to motif 1, and module 2 corresponded to motif 2). Several GO terms (such as regulation of cell proliferation, positive regulation of macromolecule metabolic process, phosphorylation, and phosphorus metabolic process) and two pathways (pathways in cancer and melanoma) were enriched. Furthermore, some significant DEGs (such as CAV1, LYN and FGFR3) were related to CRPC development. These genes might play important roles in the development and progression of CRPC.
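The abstract does not describe the three motifs it found. As an illustration of what a three-node motif in a TF-miRNA-gene network can look like, the sketch below enumerates one commonly studied pattern, the feed-forward loop, in which a TF regulates a miRNA and both regulate the same target gene. This is an assumed example pattern, and it only lists matching triplets; it does not reproduce FANMOD, which additionally tests motif frequency against randomized networks. All node names and edges are hypothetical.

```python
def feed_forward_triplets(edges, tfs, mirnas):
    """List (tf, mirna, gene) triplets with edges tf->mirna, tf->gene, mirna->gene.

    edges:  iterable of directed (regulator, target) pairs
    tfs:    set of node names that are transcription factors
    mirnas: set of node names that are miRNAs
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    triplets = []
    for tf in sorted(tfs):
        tf_targets = targets.get(tf, set())
        for mirna in sorted(tf_targets & mirnas):
            shared = tf_targets & targets.get(mirna, set())
            # Keep only shared targets that are ordinary genes.
            for gene in sorted(shared - tfs - mirnas):
                triplets.append((tf, mirna, gene))
    return triplets


# Hypothetical toy network.
toy_edges = [
    ("TF1", "miR_a"),
    ("TF1", "GENE1"),
    ("miR_a", "GENE1"),
    ("TF1", "GENE2"),
    ("miR_a", "GENE3"),
]
loops = feed_forward_triplets(toy_edges, {"TF1"}, {"miR_a"})
print(loops)
```

Only GENE1 is reported, because it is the single gene targeted by both the TF and the miRNA that the TF regulates.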
Project description:The combinatorial binding of trans-acting factors (TFs) to the DNA is critical to the spatial and temporal specificity of gene regulation. For certain regulatory regions, more than one regulatory module (set of TFs that bind together) are combined to achieve context-specific gene regulation. However, previous approaches are limited to either pairwise TF co-association analysis or assuming that only one module is used in each regulatory region. We present a new computational approach that models the modular organization of TF combinatorial binding. Our method learns compact and coherent regulatory modules from in vivo binding data using a topic model. We found that the binding of 115 TFs in K562 cells can be organized into 49 interpretable modules. Furthermore, we found that tens of thousands of regulatory regions use multiple modules, a structure that cannot be observed with previous hard clustering based methods. The modules discovered recapitulate many published protein-protein physical interactions, have consistent functional annotations of chromatin states, and uncover context-specific co-binding such as gene proximal binding of NFY + FOS + SP and distal binding of NFY + FOS + USF. For certain TFs, the co-binding partners of direct binding (motif present) differ from those of indirect binding (motif absent); the distinct set of co-binding partners can predict whether the TF binds directly or indirectly with up to 95% accuracy. Joint analysis across two cell types reveals both cell-type-specific and shared regulatory modules. Our results provide comprehensive cell-type-specific combinatorial binding maps and suggest a modular organization of combinatorial binding.