Project description:Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.
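The network construction described above can be illustrated with a minimal sketch. The abstract does not specify its phenotypic similarity measure, so this example substitutes plain Jaccard similarity over phenotype sets as a simple stand-in; the disease names, phenotype terms, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two phenotype sets (a stand-in, not the paper's measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(disease_phenotypes, threshold=0.5):
    """Return (disease1, disease2, score) edges for pairs scoring at or above threshold."""
    edges = []
    for d1, d2 in combinations(sorted(disease_phenotypes), 2):
        score = jaccard(disease_phenotypes[d1], disease_phenotypes[d2])
        if score >= threshold:
            edges.append((d1, d2, score))
    return edges


# Hypothetical toy data: diseases mapped to text-mined signs and symptoms.
toy = {
    "disease_A": {"fever", "cough", "fatigue"},
    "disease_B": {"fever", "cough", "headache"},
    "disease_C": {"rash"},
}

edges = similarity_network(toy)
print(edges)
```

In this toy input only disease_A and disease_B share enough phenotypes to be linked; an ontology-aware semantic similarity would additionally credit related but non-identical phenotype terms.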
Project description:Systemic sclerosis (SSc) is an autoimmune disease, in which the immune system attacks the body, that is associated with changes in the skin's structure. A recent meta-analysis reported a high incidence of cancer, including lung cancer (LC), leukemia (LK), and lymphoma (LP), as a comorbidity in patients with SSc, but the underlying mechanisms are yet to be revealed. To address this research gap, bioinformatics methodologies were developed to explore the comorbidity interactions between a pair of diseases. First, appropriate gene expression datasets on SSc and its comorbidities were collected from different repositories. Then the interconnection between SSc and its cancer comorbidities was identified by applying the developed pipeline. The pipeline was designed as a generic workflow for investigating a presumed comorbid condition: it integrates gene expression data, tissue/organ metadata, Gene Ontology (GO), molecular pathways, and other online resources, and analyzes them with Gene Set Enrichment Analysis (GSEA), pathway enrichment, and semantic similarity (SS). The pipeline was implemented in R and can be accessed through our GitHub repository: https://github.com/hiddenntreasure/comorbidity. Our results suggest that SSc and its cancer comorbidities share differentially expressed genes, functional terms (Gene Ontology), and pathways. The findings have led to a better understanding of disease pathways, and the developed methodologies may be applied to any set of diseases to find associations between them. This research may be used by physicians, researchers, biologists, and others.
Project description:Identifying the genes and proteins associated with a biological process or disease is a central goal of the biomedical research enterprise. However, relatively few systematic approaches are available that provide objective evaluation of the genes or proteins known to be important to a research topic, and hence researchers often rely on subjective evaluation of domain experts and laborious manual literature review. Computational bibliometric analysis, in conjunction with text mining and data curation, attempts to automate this process and return prioritized proteins in any given research topic. We describe here a method to identify and rank protein-topic relationships by calculating the semantic similarity between a protein and a query term in the biomedical literature while adjusting for the impact and immediacy of associated research articles. We term the calculated metric the weighted copublication distance (WCD) and show that it compares well to related approaches in predicting benchmark protein lists in multiple biological processes. We used WCD to extract prioritized "popular proteins" across multiple cell types, subanatomical regions, and standardized vocabularies containing over 20 000 human disease terms. The collection of protein-disease associations across the resulting human "diseasome" supports data analytical workflows to perform reverse protein-to-disease queries and functional annotation of experimental protein lists. We envision that the described improvement to the popular proteins strategy will be useful for annotating protein lists and guiding method development efforts as well as generating new hypotheses on understudied disease proteins using bibliometric information.
Project description:The extracellular matrix (ECM) is earning an increasingly relevant role in many disease states and aging. The analysis of these disease states is possible with the GWAS and PheWAS methodologies, and through our analysis, we aimed to explore the relationships between polymorphisms in the compendium of ECM genes (i.e., matrisome genes) in various disease states. A significant contribution on the part of ECM polymorphisms is evident in various types of disease, particularly those in the core-matrisome genes. Our results confirm previous links to connective-tissue disorders but also unearth new and underexplored relationships with neurological, psychiatric, and age-related disease states. Through our analysis of the drug indications for gene-disease relationships, we identify numerous targets that may be repurposed for age-related pathologies. The identification of ECM polymorphisms and their contributions to disease will play an integral role in future therapeutic developments, drug repurposing, precision medicine, and personalized care.
Project description:Background: Detecting epistatic interactions plays a significant role in improving pathogenesis, prevention, diagnosis, and treatment of complex human diseases. Applying machine learning or statistical methods to epistatic interaction detection will encounter some common problems, e.g., very limited number of samples, an extremely high search space, a large number of false positives, and ways to measure the association between disease markers and the phenotype. Results: To address the problems of computational methods in epistatic interaction detection, we propose a score-based Bayesian network structure learning method, EpiBN, to detect epistatic interactions. We apply the proposed method to both simulated datasets and three real disease datasets. Experimental results on simulation data show that our method outperforms some other commonly-used methods in terms of power and sample-efficiency, and is especially suitable for detecting epistatic interactions with weak or no marginal effects. Furthermore, our method is scalable to real disease data. Conclusions: We propose a Bayesian network-based method, EpiBN, to detect epistatic interactions. In EpiBN, we develop a new scoring function, which can reflect higher-order epistatic interactions by estimating the model complexity from data, and apply a fast Branch-and-Bound algorithm to learn the structure of a two-layer Bayesian network containing only one target node. To make our method scalable to real data, we propose the use of a Markov chain Monte Carlo (MCMC) method to perform the screening process. Applications of the proposed method to some real GWAS (genome-wide association studies) datasets may provide helpful insights into understanding the genetic basis of Age-related Macular Degeneration, late-onset Alzheimer's disease, and autism.
Project description:The apolipoprotein E (APOE) genotype is the major genetic risk factor for Alzheimer's disease (AD). We have access to cerebrospinal fluid (CSF) and plasma APOE protein levels from 641 individuals and genome-wide genotyped data from 570 of these samples. The aim of this study was to test whether CSF or plasma APOE levels could be a useful endophenotype for AD and to identify genetic variants associated with APOE levels. We found that CSF (P = 8.15 × 10^-4) but not plasma (P = 0.071) APOE protein levels are significantly associated with CSF Aβ42 levels. We used Mendelian randomization and genetic variants as instrumental variables to confirm that the association of CSF APOE with CSF Aβ42 levels and clinical dementia rating (CDR) is not because of a reverse causation or confounding effect. In addition, the association of CSF APOE with Aβ42 levels was independent of the APOE ε4 genotype, suggesting that APOE levels in CSF may be a useful endophenotype for AD. We performed a genome-wide association study to identify genetic variants associated with CSF APOE levels: the APOE ε4 genotype was the strongest single-genetic factor associated with CSF APOE protein levels (P = 6.9 × 10^-13). In aggregate, the Illumina chip single nucleotide polymorphisms explain 72% of the variability in CSF APOE protein levels, whereas the APOE ε4 genotype alone explains 8% of the variability. No other genetic variant reached the genome-wide significance threshold, but nine additional variants exhibited a P-value < 10^-6. Pathway mining analysis indicated that these nine additional loci are involved in lipid metabolism (P = 4.49 × 10^-9).
Project description:Prediction and confirmation of the presence of disease-related miRNAs is beneficial to understand disease mechanisms at the miRNA level. However, the use of experimental verification to identify disease-related miRNAs is expensive and time-consuming. Effective computational approaches used to predict miRNA-disease associations are highly specific. In this study, we develop the Network Consistency Projection for miRNA-Disease Associations (NCPMDA) method to reveal the potential associations between miRNAs and diseases. NCPMDA is a non-parametric universal network-based method that can simultaneously predict miRNA-disease associations in all diseases and does not require negative samples. NCPMDA can also confirm the presence of miRNAs in isolated diseases (diseases without any known miRNA association). Leave-one-out cross-validation and case studies have shown that the predictive performance of NCPMDA is superior to that of previous methods.
Project description:The genetic architecture of the plasma lipidome provides insights into the regulation of lipid metabolism and related diseases. We applied an unsupervised machine learning method, PGMRA, to discover many-to-many relations between genotype and plasma lipidome (phenotype) in order to identify the genetic architecture of the plasma lipidome profiled from 1,426 Finnish individuals aged 30-45 years. PGMRA involves biclustering genotype and lipidome data independently, followed by their inter-domain integration based on hypergeometric tests of the number of shared individuals. Pathway enrichment analysis was performed on the SNP sets to identify their associated biological processes. We identified 93 statistically significant (hypergeometric p-value < 0.01) lipidome-genotype relations. Genotype biclusters in these 93 relations contained 5977 SNPs across 3164 genes. Twenty-nine of the 93 relations contained genotype biclusters with more than 50% unique SNPs and participants, thus representing the most distinct subgroups. We identified 30 significantly enriched biological processes among the SNPs involved in 21 of these 29 most distinct genotype-lipidome subgroups, through which the identified genetic variants can influence and regulate plasma lipid-related metabolism and profiles. This study identified 29 distinct genotype-lipidome subgroups in the studied Finnish population that may have distinct disease trajectories and therefore could be useful in precision medicine research.
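The hypergeometric test on shared individuals mentioned above can be sketched as follows. This is a minimal illustration and not PGMRA's actual implementation: the function name, its parameters, and the example bicluster sizes are assumptions; only the cohort size (1,426) and the 0.01 significance cut-off come from the abstract.

```python
from math import comb


def shared_individuals_pvalue(shared, n_total, n_geno, n_lipid):
    """Upper-tail hypergeometric probability P(X >= shared).

    X is the number of individuals that a genotype bicluster of size n_geno
    and a lipidome bicluster of size n_lipid would share by chance when both
    are drawn from the same cohort of n_total individuals.
    """
    upper = min(n_geno, n_lipid)
    denom = comb(n_total, n_lipid)
    tail = sum(
        comb(n_geno, k) * comb(n_total - n_geno, n_lipid - k)
        for k in range(shared, upper + 1)
    )
    return tail / denom


# Hypothetical bicluster sizes within a cohort of 1,426 individuals.
p = shared_individuals_pvalue(shared=30, n_total=1426, n_geno=120, n_lipid=100)
print(p, p < 0.01)
```

A genotype-lipidome pair would be retained as a relation when this p-value falls below the chosen threshold; exact integer arithmetic via `math.comb` avoids floating-point underflow for cohort sizes of this order.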
Project description:Cardiomyopathies are progressive disease conditions that give rise to an abnormal heart phenotype and are a leading cause of heart failure in the general population. These are complex diseases that show co-morbidity with other diseases. Characterising the molecular interaction network in the localised disease neighbourhood is an important step toward deciphering the molecular mechanisms underlying these complex conditions. In this pursuit, we employed network medicine techniques to systematically investigate cardiomyopathy's genetic interplay with other diseases and uncover the molecular players underlying these associations. We predicted a set of candidate genes in cardiomyopathy by applying the DIAMOnD algorithm to the human interactome. We next revealed how these candidate genes form associations across different diseases and highlighted the predominant associations with brain, cancer and metabolic diseases. Through integrative systems analysis of molecular pathways, heart-specific mouse knockout data and disease tissue-specific transcriptomic data, we screened and ascertained prominent candidates that show an abnormal heart phenotype, including NOS3, MMP2 and SIRT1. Our computational analysis broadens the understanding of the genetic associations of cardiomyopathies with other diseases and holds great potential in cardiomyopathy research.