Robust de novo pathway enrichment with KeyPathwayMiner 5.
ABSTRACT: Identifying functional modules or novel active pathways, recently termed de novo pathway enrichment, is a computational systems biology challenge that has gained much attention during the last decade. Given a large biological interaction network, KeyPathwayMiner extracts connected subnetworks that are enriched for differentially active entities from a series of molecular profiles encoded as binary indicator matrices. Since interaction networks constantly evolve, an important question is how robust the extracted results are when the network is modified. We enable users to study this effect through several network perturbation techniques and over a range of perturbation degrees. In addition, users may now provide a gold-standard set to determine how enriched extracted pathways are with relevant genes compared to randomized versions of the original network.
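The robustness question raised in the abstract can be illustrated with a small sketch. It is not KeyPathwayMiner's search procedure: as a deliberately simplified stand-in, the "extracted pathway" here is the largest connected component formed by active nodes, and the only perturbation shown is random edge removal. The function names, the toy network, and the use of the Jaccard index as the stability measure are all assumptions made for illustration.

```python
import random
from collections import deque


def largest_active_component(edges, active):
    """Largest connected component using active nodes only (simplified stand-in)."""
    adj = {node: set() for node in active}
    for u, v in edges:
        if u in active and v in active:
            adj[u].add(v)
            adj[v].add(u)
    best, seen = set(), set()
    for start in sorted(active):  # sorted for deterministic tie-breaking
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search from the start node
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def remove_edges(edges, fraction, rng):
    """Perturb the network by deleting a random fraction of its edges."""
    keep = len(edges) - round(fraction * len(edges))
    return rng.sample(edges, keep)


def jaccard(a, b):
    """Jaccard index of two node sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def robustness_curve(edges, active, fractions, repeats=20, seed=0):
    """Mean overlap with the unperturbed result for each perturbation degree."""
    rng = random.Random(seed)
    reference = largest_active_component(edges, active)
    curve = {}
    for fraction in fractions:
        scores = []
        for _ in range(repeats):
            perturbed = remove_edges(edges, fraction, rng)
            scores.append(jaccard(reference, largest_active_component(perturbed, active)))
        curve[fraction] = sum(scores) / len(scores)
    return curve


# Toy network with hypothetical node names.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "G")]
active = {"A", "B", "C", "F", "G"}
print(robustness_curve(edges, active, [0.0, 0.2, 0.4]))
```

A curve that stays close to 1 as the perturbation degree grows would suggest that the result depends little on individual edges. A steep drop would suggest sensitivity to the network's current state.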
Project description:Triple-negative breast cancer (TNBC) is a heterogeneous disease characterized by an aggressive phenotype and reduced survival. The aim of the present study was to investigate the molecular mechanisms involved in the carcinogenesis of TNBC and to identify novel target molecules for therapy. The differentially expressed genes (DEGs) in TNBC and normal adjacent tissue were assessed by analyzing the GSE41970 microarray data using Qlucore Omics Explorer, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes. Pathway enrichment analyses for DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery online resource. A protein-protein interaction (PPI) network was constructed using Search Tool for the Retrieval of Interacting Genes, and subnetworks were analyzed by ClusterONE. The PPI network and subnetworks were visualized using Cytoscape software. A total of 121 DEGs were obtained, of which 101 were upregulated and 20 were downregulated. The upregulated DEGs were significantly enriched in 14 pathways and 83 GO biological processes, while the downregulated DEGs were significantly enriched in 18 GO biological processes. The PPI network with 118 nodes and 1,264 edges was constructed and three subnetworks were extracted from the entire network. The significant hub DEGs with high degrees were identified, including TP53, glyceraldehyde-3-phosphate dehydrogenase, cyclin D1, HRAS and proliferating cell nuclear antigen, which were predominantly enriched in the cell cycle pathway and pathways in cancer. A number of critical genes and pathways were revealed to be associated with TNBC. The present study may provide an improved understanding of the pathogenesis of TNBC and contribute to the development of therapeutic targets for TNBC.
Project description:Hepatitis C virus (HCV) infection is one of the most common chronic infections in the world, and hepatitis associated with HCV infection is a major risk factor for the development of cirrhosis and hepatocellular carcinoma (HCC). The rapidly growing number of known viral-host and host protein-protein interactions is enabling increasingly reliable network-based analyses of viral infection supported by omics data. The study of molecular interaction networks helps to elucidate the mechanistic pathways linking HCV molecular activities and the host response that modulates the stepwise hepatocarcinogenic process from preneoplastic lesions (cirrhosis and dysplasia) to HCC. Simulating the impact of HCV-host molecular interactions throughout the host protein-protein interaction (PPI) network, we ranked the host proteins in relation to their network proximity to viral targets. We observed that the set of proteins in the neighborhood of HCV targets in the host interactome is enriched in key players of the host response to HCV infection. In contrast to the HCV targets themselves, subnetworks of proteins in network proximity to HCV targets are significantly enriched in proteins reported as differentially expressed in preneoplastic and neoplastic liver samples by two independent studies. Using multi-objective optimization, we extracted subnetworks that are simultaneously "guilt-by-association" with HCV proteins and enriched in differentially expressed proteins. These subnetworks contain established, recently proposed and novel candidate proteins for the regulation of the mechanisms of the liver cell response to chronic HCV infection.
Project description:MOTIVATION:Functional enrichment testing methods can reduce data comprising hundreds of altered biomolecules to smaller sets of altered biological 'concepts' that help generate testable hypotheses. This study leveraged differential network enrichment analysis methodology to identify and validate lipid subnetworks that potentially differentiate chronic kidney disease (CKD) by severity or progression. RESULTS:We built a partial correlation interaction network, identified highly connected network components, applied network-based gene-set analysis to identify differentially enriched subnetworks, and compared the subnetworks in patients with early-stage versus late-stage CKD. We identified two subnetworks, 'triacylglycerols' and 'cardiolipins-phosphatidylethanolamines (CL-PE)', characterized by lower connectivity and a higher abundance of longer polyunsaturated triacylglycerols in patients with severe CKD (stage ≥4) from the Clinical Phenotyping Resource and Biobank Core. These findings were replicated in an independent cohort, the Chronic Renal Insufficiency Cohort. Using an innovative method for elucidating biological alterations in lipid networks, we demonstrated alterations in triacylglycerols and cardiolipins-phosphatidylethanolamines that precede the clinical outcome of end-stage kidney disease by several years. AVAILABILITY AND IMPLEMENTATION:A complete list of NetGSA results in HTML format can be found at http://metscape.ncibi.org/netgsa/12345-022118/cric_cprobe/022118/results_cric_cprobe/main.html. The DNEA is freely available at https://github.com/wiggie/DNEA. A Java wrapper leveraging the cytoscape.js framework is available at http://js.cytoscape.org. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:In silico approaches are increasingly considered to improve breast cancer treatment. One of these treatments, neoadjuvant TFAC chemotherapy, is used in cases where application of preoperative systemic therapy is indicated. Estimating response to treatment allows or improves clinical decision-making and this, in turn, may be based on a good understanding of the underlying molecular mechanisms. Ever increasing amounts of high throughput data become available for integration into functional networks. In this study, we applied our software tool ExprEssence to identify specific mechanisms relevant for TFAC therapy response, from a gene/protein interaction network. We contrasted the resulting active subnetwork to the subnetworks of two other such methods, OptDis and KeyPathwayMiner. We could show that the ExprEssence subnetwork is more related to the mechanistic functional principles of TFAC therapy than the subnetworks of the other two methods despite the simplicity of ExprEssence. We were able to validate our method by recovering known mechanisms and as an application example of our method, we identified a mechanism that may further explain the synergism between paclitaxel and doxorubicin in TFAC treatment: Paclitaxel may attenuate MELK gene expression, resulting in lower levels of its target MYBL2, already associated with doxorubicin synergism in hepatocellular carcinoma cell lines. We tested our hypothesis in three breast cancer cell lines, confirming it in part. In particular, the predicted effect on MYBL2 could be validated, and a synergistic effect of paclitaxel and doxorubicin could be demonstrated in the breast cancer cell lines SKBR3 and MCF-7.
Project description:Genes are organized in functional modules (or pathways), thus their action and their dysregulation in diseases may be better understood by the identification of the modules most affected by the disease (aka disease modules, or active subnetworks). We describe how an algorithm based on the Core&Peel method is used to detect disease modules in co-expression networks of genes. We first validate Core&Peel for the general task of functional module detection by comparison with 42 methods participating in the Disease Module Identification DREAM challenge. Next, we use four specific disease test cases (colorectal cancer, prostate cancer, asthma, and rheumatoid arthritis), four state-of-the-art algorithms (ModuleDiscoverer, Degas, KeyPathwayMiner, and ClustEx), and several pathway databases to validate the proposed algorithm. Core&Peel is the only method able to find significant associations of the predicted disease module with known validated relevant pathways for all four diseases. Moreover, for the two cancer datasets, Core&Peel detects eight further relevant pathways not discovered by the other methods used in the comparative analysis. Finally, we apply Core&Peel and other methods to explore the transcriptional response of human cells to SARS-CoV-2 infection, finding supporting evidence for drug repositioning efforts at a pre-clinical level.
Project description:Recent studies showed that somatic cancer mutations target genes that are in specific signaling and cellular pathways. However, in each patient only a few of the pathway genes are mutated. Current approaches consider only existing pathways and ignore the topology of the pathways. For this reason, new efforts have been focused on identifying significantly mutated subnetworks and associating them with cancer characteristics. We applied two well-established network analysis approaches to identify significantly mutated subnetworks in the breast cancer genome. We took network topology into account for measuring the mutation similarity of a gene-pair to allow us to infer the significantly mutated subnetworks. Our goals are to evaluate whether the identified subnetworks can be used as biomarkers for predicting breast cancer patient survival and provide the potential mechanisms of the pathways enriched in the subnetworks, with the aim of improving breast cancer treatment. Using the copy number alteration (CNA) datasets from the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) study, we identified a significantly mutated yet clinically and functionally relevant subnetwork using two graph-based clustering algorithms. The mutational pattern of the subnetwork is significantly associated with breast cancer survival. The genes in the subnetwork are significantly enriched in retinol metabolism KEGG pathway. Our results show that breast cancer treatment with retinoids may be a potential personalized therapy for breast cancer patients since the CNA patterns of the breast cancer patients can imply whether the retinoids pathway is altered. We also showed that applying multiple bioinformatics algorithms at the same time has the potential to identify new network-based biomarkers, which may be useful for stratifying cancer patients for choosing optimal treatments.
Project description:Online health communities (OHCs) provide a convenient and commonly used way for people to connect around shared health experiences, exchange information, and receive social support. Users often interact with peers via multiple communication methods, forming a multirelational social network. Use of OHCs is common among smokers, but to date, there have been no studies on users' online interactions via different means of online communications and how such interactions are related to smoking cessation. Such information can be retrieved in multirelational social networks and could be useful in the design and management of OHCs. To examine the social network structure of an OHC for smoking cessation using a multirelational approach, and to explore links between subnetwork position (ie, centrality) and smoking abstinence. We used NetworkX to construct 4 subnetworks based on users' interactions via blogs, group discussions, message boards, and private messages. We illustrated topological properties of each subnetwork, including its degree distribution, density, and connectedness, and compared similarities among these subnetworks by correlating node centrality and measuring edge overlap. We also investigated coevolution dynamics of this multirelational network by analyzing tie formation sequences across subnetworks. In a subset of users who participated in a randomized, smoking cessation treatment trial, we conducted user profiling based on users' centralities in the 4 subnetworks and identified user groups using clustering techniques. We further examined 30-day smoking abstinence at 3 months postenrollment in relation to users' centralities in the 4 subnetworks. The 4 subnetworks have different topological characteristics, with message board having the most nodes (36,536) and group discussion having the highest network density (4.35×10⁻³). Blog and message board subnetworks had the most similar structures with an in-degree correlation of .45, out-degree correlation of .55, and Jaccard coefficient of .23 for edge overlap. A new tie in the group discussion subnetwork had the lowest probability of triggering subsequent ties among the same two users in other subnetworks: 6.33% (54,142/855,893) for 2-tie sequences and 2.13% (18,207/855,893) for 3-tie sequences. Users' centralities varied across the 4 subnetworks. Among a subset of users enrolled in a randomized trial, those with higher centralities across subnetworks generally had higher abstinence rates, although high centrality in the group discussion subnetwork was not associated with higher abstinence rates. A multirelational approach revealed insights that could not be obtained by analyzing the aggregated network alone, such as the ineffectiveness of group discussions in triggering social ties of other types, the advantage of blogs, message boards, and private messages in leading to subsequent social ties of other types, and the weak connection between one's centrality in the group discussion subnetwork and smoking abstinence. These insights have implications for the design and management of online social networks for smoking cessation.
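The two similarity measures reported for the subnetworks, in-degree correlation and the Jaccard coefficient for edge overlap, can be sketched in plain Python. The study itself used NetworkX. The version below avoids that dependency and makes its own assumptions: edges are directed (source, target) pairs, the correlation is Pearson's, and the user names and toy edges are invented.

```python
from collections import Counter
from math import sqrt


def edge_jaccard(edges_a, edges_b):
    """Jaccard coefficient of two directed edge sets."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def in_degrees(edges, users):
    """In-degree of each user, in the order given by `users`."""
    counts = Counter(target for _, target in edges)
    return [counts[user] for user in users]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Toy directed subnetworks over hypothetical users.
users = ["u1", "u2", "u3"]
blog = [("u1", "u2"), ("u3", "u2"), ("u2", "u3")]
board = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]

print(edge_jaccard(blog, board))
print(pearson(in_degrees(blog, users), in_degrees(board, users)))
```

Comparing the same user list across subnetworks keeps the degree vectors aligned, which the correlation requires.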
Project description:Overlaying differential changes in gene expression on protein interaction networks has proven to be a useful approach to interpreting the cell's dynamic response to a changing environment. Despite successes in finding active subnetworks in the context of a single species, the idea of overlaying lists of differentially expressed genes on networks has not yet been extended to support the analysis of multiple species' interaction networks. To address this problem, we designed a scalable, cross-species network search algorithm, neXus (Network-cross(X)-species-Search), that discovers conserved, active subnetworks based on parallel differential expression studies in multiple species. Our approach leverages functional linkage networks, which provide more comprehensive coverage of functional relationships than physical interaction networks by combining heterogeneous types of genomic data. We applied our cross-species approach to identify conserved modules that are differentially active in stem cells relative to differentiated cells based on parallel gene expression studies and functional linkage networks from mouse and human. We find hundreds of conserved active subnetworks enriched for stem cell-associated functions such as cell cycle, DNA repair, and chromatin modification processes. Using a variation of this approach, we also find a number of species-specific networks, which likely reflect mechanisms of stem cell function that have diverged between mouse and human. We assess the statistical significance of the subnetworks by comparing them with subnetworks discovered on random permutations of the differential expression data. We also describe several case examples that illustrate the utility of comparative analysis of active subnetworks.
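The significance assessment described above, comparing discovered subnetworks with those discovered on random permutations of the differential expression data, can be sketched as an empirical p-value. This is not the neXus algorithm. As an assumed stand-in for subnetwork discovery, the sketch simply picks the best-scoring edge (the pair of connected genes with the highest summed score), and the gene names and scores are invented.

```python
import random


def best_edge_score(edges, score):
    """Toy 'discovery': the highest summed score over any connected gene pair."""
    return max(score[u] + score[v] for u, v in edges)


def permutation_pvalue(edges, score, n_perm=1000, seed=0):
    """Empirical p-value: how often discovery on permuted scores does as well."""
    rng = random.Random(seed)
    observed = best_edge_score(edges, score)
    genes = list(score)
    values = [score[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign expression scores to genes at random
        permuted = dict(zip(genes, values))
        if best_edge_score(edges, permuted) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical path-shaped network with two adjacent high-scoring genes.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"), ("g5", "g6")]
score = {"g1": 0.1, "g2": 0.2, "g3": 3.0, "g4": 2.8, "g5": 0.3, "g6": 0.1}
print(permutation_pvalue(edges, score))
```

Re-running discovery on each permutation, rather than rescoring one fixed subnetwork, is what lets the null distribution account for the search itself.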
Project description:Radiotherapy is a major traditional treatment for breast cancer; however, the key genes and pathways in breast cancer associated with irradiation are not clear. In this study, we aimed to explore the messenger RNA expression changes between preradiation and postradiation breast cancer. The gene expression data set (GSE59733) was downloaded from the Gene Expression Omnibus database. Using the criteria |log2FC (fold change)| ≥ 1 and false discovery rate adjusted P value <.05, differentially expressed genes (DEGs) were screened and annotated with R programming software. The protein-protein interaction (PPI) network was constructed using the STRING database, and subnetworks and hub genes were extracted with a Cytoscape plug-in. A total of 82 DEGs (74 upregulated and 8 downregulated genes) were identified. These DEGs were mainly enriched in an intrinsic apoptotic signaling pathway and G-protein-coupled receptor binding. In addition, the tumor necrosis factor signaling pathway and the interleukin 17 signaling pathway were abnormally activated in postradiation tumor samples. Two characteristic subnetworks and 3 hub genes (FOS, CCL2, and CXCL12) were clearly distinguished in the PPI network. Moreover, the expression level of the hub genes was confirmed in irradiated MCF-7 and SUM-159 cells using a quantitative real-time polymerase chain reaction assay. These findings imply that these hub genes may play an important role in the response of breast cancer to irradiation.
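The DEG screening rule in this abstract, an absolute log2 fold change of at least 1 together with a false discovery rate adjusted P value below .05, can be written out as a short sketch. The study used R. The Python below is only an illustration, the Benjamini-Hochberg procedure is an assumption (the abstract does not name the adjustment method), and the gene names and numbers are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def screen_degs(genes, log2fc, pvalues, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Split genes passing both cutoffs into upregulated and downregulated lists."""
    adjusted = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, adjusted):
        if abs(fc) >= fc_cutoff and q < fdr_cutoff:
            (up if fc > 0 else down).append(gene)
    return up, down


# Hypothetical genes and statistics, for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [1.5, -2.0, 0.4, -1.0]
pvalues = [0.01, 0.04, 0.03, 0.005]
print(screen_degs(genes, log2fc, pvalues))
```

Here `geneC` is dropped by the fold-change cutoff even though its adjusted P value passes, which shows that both criteria must hold at once.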
Project description:Untargeted metabolomics using high-resolution liquid chromatography-mass spectrometry (LC-MS) is becoming one of the major areas of high-throughput biology. Functional analysis, that is, analyzing the data based on metabolic pathways or the genome-scale metabolic network, is critical in feature selection and interpretation of metabolomics data. One of the main challenges in the functional analyses is the lack of the feature identity in the LC-MS data itself. By matching mass-to-charge ratio (m/z) values of the features to theoretical values derived from known metabolites, some features can be matched to one or more known metabolites. When multiple matchings occur, in most cases only one of the matchings can be true. At the same time, some known metabolites are missing in the measurements. Current network/pathway analysis methods ignore the uncertainty in metabolite identification and the missing observations, which could lead to errors in the selection of significant subnetworks/pathways. In this paper, we propose a flexible network feature selection framework that combines metabolomics data with the genome-scale metabolic network. The method adopts a sequential feature screening procedure and machine learning-based criteria to select important subnetworks and identify the optimal feature matching simultaneously. Simulation studies show that the proposed method has a much higher sensitivity than the commonly used maximal matching approach. For demonstration, we apply the method on a cohort of healthy subjects to detect subnetworks associated with the body mass index (BMI). The method identifies several subnetworks that are supported by the current literature, as well as detects some subnetworks with plausible new functional implications. The R code is available at http://web1.sph.emory.edu/users/tyu8/MSS.
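The matching step that creates the ambiguity discussed in this abstract can be sketched as follows. This is not the proposed feature selection framework, only an illustration of why one feature may match several metabolites while some metabolites go unobserved. The parts-per-million tolerance, the function names, and all m/z values are assumptions made for the example.

```python
def match_features(feature_mz, theoretical_mz, ppm=10.0):
    """Map each feature to every metabolite whose theoretical m/z lies within tolerance."""
    matches = {}
    for feature_id, mz in feature_mz.items():
        tolerance = mz * ppm / 1e6  # absolute tolerance for this feature
        matches[feature_id] = sorted(
            name for name, ref in theoretical_mz.items() if abs(mz - ref) <= tolerance
        )
    return matches


def unobserved_metabolites(matches, theoretical_mz):
    """Known metabolites that no measured feature matched."""
    matched = {name for names in matches.values() for name in names}
    return sorted(set(theoretical_mz) - matched)


# Hypothetical features and theoretical values, for illustration only.
features = {"f1": 181.0707, "f2": 300.0000}
known = {"metA": 181.0707, "metB": 181.0712, "metC": 250.0000}

matches = match_features(features, known)
print(matches)
print(unobserved_metabolites(matches, known))
```

In this toy case `f1` matches two metabolites, of which at most one can be correct, `f2` matches none, and `metC` is never observed. These are the uncertainties the abstract says current methods ignore.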