Inferring transcriptional compensation interactions in yeast via stepwise structure equation modeling.
ABSTRACT: BACKGROUND: With the abundant information produced by microarray technology, various approaches have been proposed to infer transcriptional regulatory networks. However, few approaches have studied subtle and indirect interactions such as genetic compensation, whose existence is widely recognized although its mechanism has yet to be clarified. Furthermore, when inferring gene networks most models include only observed variables, whereas latent factors that microarrays do not measure, such as proteins and mRNA degradation, do participate in networks in reality.
RESULTS: Motivated by inferring transcriptional compensation (TC) interactions in yeast, a stepwise structural equation modeling algorithm (SSEM) is developed. In addition to observed variables, SSEM also incorporates hidden variables to capture interactions (or regulations) from latent factors. Simulated gene networks are used to determine which of six possible model selection criteria (MSC) works best with SSEM. SSEM with the Bayesian information criterion (BIC) yields the highest true positive rates, the largest percentage of correctly predicted interactions among all existing interactions, and the highest true negative (non-existing interactions) rates. Next, we apply SSEM to real microarray data to infer TC interactions among (1) small groups of genes that are synthetic sick or lethal (SSL) with SGS1, and (2) a group of SSL pairs among 51 yeast genes of interest involved in DNA synthesis and repair. For (1), when checked against the results of qRT-PCR experiments, SSEM with BIC is shown to outperform three Bayesian network algorithms and a multivariate autoregressive model. The predictions for (2) are shown to coincide with several known pathways of Sgs1 and its partners that are involved in DNA replication, recombination and repair. In addition, experimentally testable interactions of Rad27 are predicted.
CONCLUSION: SSEM is a useful tool for inferring genetic networks, and the results reinforce the possibility of predicting pathways of protein complexes via genetic interactions.
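The sketch below illustrates only the BIC-guided stepwise idea. SSEM also fits hidden variables, and that part is not reproduced. For a single target gene, the code runs forward selection of candidate regulators in an ordinary linear regression and keeps adding whichever regulator lowers BIC most. The helper names (`forward_stepwise_bic`, `rss`, `bic`, `solve`) and the simulated genes are hypothetical. The code is not the authors' implementation.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(y, columns):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bk * v for bk, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def bic(y, columns):
    """Gaussian BIC up to a constant: n*ln(RSS/n) + k*ln(n), k = intercept + regressors."""
    n = len(y)
    return n * math.log(rss(y, columns) / n) + (1 + len(columns)) * math.log(n)


def forward_stepwise_bic(target, candidates):
    """Greedily add the candidate regulator that lowers BIC most; stop when none does."""
    selected, best = [], bic(target, [])
    while len(selected) < len(candidates):
        scores = {name: bic(target, [candidates[s] for s in selected + [name]])
                  for name in candidates if name not in selected}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break
        selected.append(name)
        best = scores[name]
    return selected


# Hypothetical simulated data: the target depends on g1 and g2 but not on g3.
random.seed(0)
n = 100
genes = {g: [random.gauss(0, 1) for _ in range(n)] for g in ("g1", "g2", "g3")}
target = [2 * genes["g1"][i] - 1.5 * genes["g2"][i] + random.gauss(0, 0.5) for i in range(n)]
print(forward_stepwise_bic(target, genes))
```

The ln(n) penalty per added parameter is what makes BIC conservative. With n = 100, a regulator must reduce n·ln(RSS) by more than about 4.6 to enter the model, which keeps false positive edges rare.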
Project description:Genetic interactions define overlapping functions and compensatory pathways. In particular, synthetic sick or lethal (SSL) genetic interactions are important for understanding how an organism tolerates random mutation, i.e., genetic robustness. Comprehensive identification of SSL relationships remains far from complete in any organism, because mapping these networks is highly labor intensive. The ability to predict SSL interactions, however, could efficiently guide further SSL discovery. Toward this end, we predicted pairs of SSL genes in Saccharomyces cerevisiae by using probabilistic decision trees to integrate multiple types of data, including localization, mRNA expression, physical interaction, protein function, and characteristics of network topology. Experimental evidence demonstrated the reliability of this strategy, which, when extended to human SSL interactions, may prove valuable in discovering drug targets for cancer therapy and in identifying genes responsible for multigenic diseases.
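The description does not give the exact tree-building procedure. The minimal sketch below assumes binary evidence features and entropy-based splits, and shows how a probabilistic decision tree turns integrated evidence into a probability that a gene pair is SSL. Each leaf stores a Laplace-smoothed fraction of SSL pairs. The feature names (`colocalized`, `coexpressed`, `physical_interaction`) and the toy labels are hypothetical.

```python
import math


def entropy(labels):
    """Binary entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def build_tree(rows, labels, features, depth):
    """Split on the feature with the largest information gain; leaves hold P(SSL)."""
    leaf = {"p": (sum(labels) + 1) / (len(labels) + 2)}  # Laplace smoothing
    if depth == 0 or not features or entropy(labels) == 0.0:
        return leaf

    def gain(f):
        left = [y for x, y in zip(rows, labels) if not x[f]]
        right = [y for x, y in zip(rows, labels) if x[f]]
        n = len(labels)
        return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

    best = max(features, key=gain)
    if gain(best) <= 0.0:
        return leaf
    rest = [f for f in features if f != best]
    split = {}
    for value in (False, True):
        sub = [(x, y) for x, y in zip(rows, labels) if bool(x[best]) == value]
        split[value] = build_tree([x for x, _ in sub], [y for _, y in sub], rest, depth - 1)
    return {"feature": best, "split": split}


def predict(tree, row):
    """Walk the tree and return the estimated probability that the pair is SSL."""
    while "feature" in tree:
        tree = tree["split"][bool(row[tree["feature"]])]
    return tree["p"]


# Hypothetical training pairs (1 = SSL, 0 = not SSL).
pairs = [
    {"colocalized": 1, "coexpressed": 1, "physical_interaction": 0},
    {"colocalized": 1, "coexpressed": 0, "physical_interaction": 1},
    {"colocalized": 1, "coexpressed": 1, "physical_interaction": 1},
    {"colocalized": 0, "coexpressed": 1, "physical_interaction": 0},
    {"colocalized": 0, "coexpressed": 0, "physical_interaction": 1},
    {"colocalized": 0, "coexpressed": 0, "physical_interaction": 0},
]
labels = [1, 1, 1, 0, 0, 0]
tree = build_tree(pairs, labels, ["colocalized", "coexpressed", "physical_interaction"], depth=2)
print(predict(tree, {"colocalized": 1, "coexpressed": 0, "physical_interaction": 0}))  # 0.8
```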
Project description:Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming because of the vast amount of gene expression data, combined with the off-target transcriptional responses that drug treatments often induce. This study demonstrates how experimental and computational methods can inform each other to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing conditions and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. S. cerevisiae cells were treated with the antifungal fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug's transcriptional effects were narrowed first by pathway and then by individual genes. Variables included (i) testing conditions (exposure time and concentration) and (ii) network training conditions (training compendium modifications). Two analyses of the SSEM-Lasso output, gene set and single gene, were conducted to better understand how SSEM-Lasso predicts perturbation targets. This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, it provides a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions.
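The sketch below is simplified and does not reproduce SSEM-Lasso's actual training and scoring procedure. It assumes that each gene is modeled as a sparse linear function of all other genes, fitted by Lasso on a training compendium. It also assumes that a gene's predicted perturbation is the residual between its observed expression in a test profile and the expression the network predicts from the other genes. Genes whose changes the network can explain get small residuals. Directly perturbed genes stand out. The penalty `lam = 0.05`, the sweep count, all helper names, and the simulated data are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Lasso shrinkage: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, sweeps=200):
    """Coordinate-descent Lasso with no intercept (assumes centered data).

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1; X is a list of sample rows.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    for _ in range(sweeps):
        for j in range(p):
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq
            for i in range(n):
                resid[i] -= X[i][j] * (new - b[j])
            b[j] = new
    return b


def fit_network(samples, lam):
    """Regress each gene on all other genes; weights[i][j] links regulator j to gene i."""
    g = len(samples[0])
    weights = []
    for i in range(g):
        X = [[s[j] for j in range(g) if j != i] for s in samples]
        coef = lasso(X, [s[i] for s in samples], lam)
        coef.insert(i, 0.0)  # no self-edge
        weights.append(coef)
    return weights


def perturbation_scores(weights, profile):
    """Observed expression minus the expression predicted from the other genes."""
    g = len(profile)
    return [profile[i] - sum(weights[i][j] * profile[j] for j in range(g)) for i in range(g)]


# Hypothetical training compendium: gene1 tracks gene0; gene2 is independent.
random.seed(1)
train = []
for _ in range(200):
    g0 = random.gauss(0, 1)
    train.append([g0, 0.9 * g0 + random.gauss(0, 0.3), random.gauss(0, 1)])
weights = fit_network(train, lam=0.05)
# Test profile: gene2 is hit directly; gene0 and gene1 co-vary as the network expects.
scores = perturbation_scores(weights, [1.0, 0.9, 3.0])
print([round(s, 2) for s in scores])  # the largest |score| should belong to gene2
```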
Project description:Ultraviolet radiation is an important etiologic factor in skin cancer. A better understanding of how solar simulated light (SSL) affects signal transduction pathways in human skin is needed to identify activated networks that could be targeted for skin cancer prevention. We used Reverse Phase Protein Microarray Analysis (RPPA), a technology that allows broad-scale, quantitative measurement of the activation/phosphorylation state of hundreds of key signaling proteins and protein pathways, to map the cell signaling networks altered in sun-protected skin by an acute dose of two minimal erythema doses (MED) of SSL. To that end, we exposed sun-protected skin in volunteers to acute doses of two MED of SSL and collected biopsies before and after SSL irradiation. Frozen biopsies were subjected to laser capture microdissection (LCM) and then assessed by RPPA. The activation/phosphorylation or total levels of 128 key signaling proteins and drug targets were selected for statistical analysis. Coordinate network-based analysis was performed on specific signaling pathways, including the PI3K/AKT/mTOR and Ras/Raf/MEK/ERK pathways. Overall, we found early and sustained activation of the PI3K/AKT/mTOR and MAPK pathways. Cell death and apoptosis-related proteins were activated at 5 and 24 h. Ultimately, expression profile patterns of phosphorylated proteins in the epidermal growth factor receptor (EGFR), AKT, mTOR, and other relevant pathways may be used to determine the pharmacodynamic activity of new, selective topical chemoprevention agents applied to a test area exposed to SSL, in order to detect drug-induced attenuation or reversal of skin carcinogenesis pathways.
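The description does not name the statistical test. For illustration only, the sketch assumes a common approach to matched biopsies: for each protein endpoint, compute per-volunteer log2(post/pre) ratios, then their mean and a paired t statistic. The intensities below are hypothetical and are not study data.

```python
import math
import statistics


def paired_log2_changes(pre, post):
    """Per-volunteer log2(post / pre) for one protein endpoint."""
    return [math.log2(b / a) for a, b in zip(pre, post)]


def paired_t(pre, post):
    """Mean log2 change and paired t statistic (n - 1 degrees of freedom)."""
    d = paired_log2_changes(pre, post)
    mean = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(len(d))
    return mean, mean / se


# Hypothetical RPPA intensities for one phosphoprotein in four volunteers.
pre = [1.0, 1.2, 0.9, 1.1]
post = [2.0, 2.6, 1.7, 2.3]
mean_change, t_stat = paired_t(pre, post)
print(f"mean log2 change = {mean_change:.2f}, paired t = {t_stat:.1f}")
```

Working on log ratios treats a doubling and a halving symmetrically. Pairing removes between-volunteer baseline differences.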
Project description:Accurate repair of DNA breaks is essential to maintain genome integrity and cellular fitness. Sgs1, the sole member of the RecQ family of DNA helicases in Saccharomyces cerevisiae, is important for both early and late stages of homology-dependent repair. Its large number of physical and genetic interactions with DNA recombination, repair, and replication factors has established Sgs1 as a key player in the maintenance of genome integrity. To determine the significance of Sgs1 binding to the strand-exchange factor Rad51, we identified a single amino acid change at the C-terminus of the Sgs1 helicase core that disrupts Rad51 binding. In contrast to an SGS1 deletion or a helicase-defective sgs1 allele, this new separation-of-function allele, sgs1-FD, does not cause DNA damage hypersensitivity or genome instability, but exhibits negative and positive genetic interactions with sae2Δ, mre11Δ, exo1Δ, srs2Δ, rrm3Δ, and pol32Δ that are distinct from those of known sgs1 mutants. Our findings suggest that the Sgs1-Rad51 interaction stimulates homologous recombination (HR). However, unlike sgs1 mutations, which impair the resection of DNA double-strand ends, negative genetic interactions of the sgs1-FD allele are not suppressed by YKU70 deletion. We propose that the Sgs1-Rad51 interaction stimulates HR by facilitating the formation of the presynaptic Rad51 filament, possibly through Sgs1 competing with single-stranded DNA for replication protein A binding during resection.
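The description does not state how the negative and positive genetic interactions were scored. The minimal sketch below uses the commonly used multiplicative definition: the interaction score is ε = W_ab − W_a·W_b, where fitness is expressed relative to wild type. A negative ε means the double mutant is sicker than expected, which is the aggravating, SSL-like case. A positive ε means it is healthier than expected. The fitness values and the 0.05 noise tolerance are hypothetical.

```python
def epistasis(w_a, w_b, w_ab):
    """Multiplicative-model interaction score eps = W_ab - W_a * W_b.

    Fitness values are relative to wild type (W = 1.0).
    """
    return w_ab - w_a * w_b


def classify(eps, tol=0.05):
    """Label an interaction; `tol` is a hypothetical noise threshold."""
    if eps < -tol:
        return "negative"  # aggravating, e.g. synthetic sick/lethal
    if eps > tol:
        return "positive"  # alleviating
    return "neutral"


# Hypothetical single- and double-mutant fitness values.
for w_ab in (0.30, 0.90, 0.72):
    print(w_ab, classify(epistasis(0.9, 0.8, w_ab)))
```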
Project description:Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. The Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interactions or comparing differential networks. However, existing approaches either lack statistical rigor or are inefficient for inference on high-dimensional data that include tens of thousands of variables. In this study, we propose an efficient algorithm that estimates a GGM and obtains a p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al. (2015). Through simulation studies, we demonstrate that the algorithm is several orders of magnitude faster than the currently available implementation of the Ren et al. method, without losing any accuracy. We then apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer's disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions, and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm that implements a statistically sound procedure for constructing Gaussian graphical models and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named "FastGGM".
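The sketch below does not implement the Ren et al. (2015) procedure or FastGGM. Those target high-dimensional data and provide edge-wise p-values. The sketch shows only what a GGM edge means in the easy low-dimensional case (n much larger than p). There, one inverts the sample covariance to get the precision matrix Ω, and the partial correlation is ρ_ij = −Ω_ij / sqrt(Ω_ii·Ω_jj). An edge is absent exactly when this is zero. The chain data are simulated and hypothetical.

```python
import random


def covariance(data):
    """Sample covariance matrix of a list of observations (rows) over p variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(p)] for a in range(p)]


def invert(m):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    p = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def partial_correlations(data):
    """rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), Omega = inverse covariance."""
    omega = invert(covariance(data))
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]


# Hypothetical chain x -> y -> z: x and z are correlated only through y.
random.seed(0)
data = []
for _ in range(2000):
    x = random.gauss(0, 1)
    y = x + random.gauss(0, 1)
    z = y + random.gauss(0, 1)
    data.append([x, y, z])
pc = partial_correlations(data)
print(round(pc[0][2], 2))  # near 0: no x-z edge in the GGM
```

For this chain the population precision matrix is [[2, −1, 0], [−1, 2, −1], [0, −1, 1]]. The partial correlations are therefore 0.5 (x–y), about 0.71 (y–z) and 0 (x–z), even though x and z are marginally correlated.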
Project description:Ecological communities are characterized by complex networks of trophic and nontrophic interactions, which shape the dynamics of the community. Machine learning and correlational methods are increasingly popular for inferring networks from co-occurrence and time series data, particularly in microbial systems. In this study, we test the suitability of these methods for inferring ecological interactions by constructing networks using Dynamic Bayesian Networks, Lasso regression, and Pearson's correlation coefficient, and then comparing the model networks to empirical trophic and nontrophic webs in two ecological systems. We find that although each model significantly replicates the structure of at least one empirical network, no model significantly predicts network structure in both systems, and no model is clearly superior to the others. We also find that networks inferred for the Tatoosh intertidal match the nontrophic network much more closely than the trophic one, possibly because of the difficulty of identifying trophic interactions from presence-absence data. Our findings suggest that although these methods hold some promise for ecological network inference, presence-absence data do not provide enough signal for models to consistently identify interactions, and networks inferred from these data should be interpreted with caution.
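Of the three methods compared, Pearson correlation is the simplest to sketch. The minimal example below assumes presence-absence vectors across sites and draws an edge wherever |r| reaches a threshold. It then scores agreement with an "empirical" web using the Jaccard index of edge sets. The species, sites, threshold of 0.5 and the empirical web are all hypothetical, and the study's own comparison statistic may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_network(occurrence, threshold):
    """Undirected edges between species whose |r| across sites meets the threshold."""
    names = sorted(occurrence)
    return {frozenset((a, b)) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson(occurrence[a], occurrence[b])) >= threshold}


def jaccard(inferred, empirical):
    """Shared edges divided by all edges in either network."""
    union = inferred | empirical
    return len(inferred & empirical) / len(union) if union else 1.0


# Hypothetical presence (1) / absence (0) of four species at eight sites.
occurrence = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0],
    "B": [1, 1, 1, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 1, 1, 1, 1],
    "D": [1, 0, 1, 0, 1, 0, 1, 0],
}
inferred = correlation_network(occurrence, threshold=0.5)
empirical = {frozenset(("A", "B")), frozenset(("C", "D"))}
print(sorted(tuple(sorted(e)) for e in inferred), jaccard(inferred, empirical))
```

A and C never co-occur, so r(A, C) = −1 and they get a strong edge. Co-occurrence alone cannot tell whether that pattern reflects competition, predation or habitat preference. This illustrates the caution the study urges.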
Project description:Neural networks trained by backpropagation have achieved tremendous success on numerous intelligent tasks. However, the intrinsic material properties of memristors impede the application of naïve gradient-based training and updating methods. Here, we built a 39 nm 1 Gb phase change memory (PCM) memristor array and quantified its characteristic resistance drift effect. On this basis, we developed a spontaneous sparse learning (SSL) scheme that leverages resistance drift to improve the training of PCM-based memristor networks. During training, SSL treats the drift effect as a spontaneous consistency-based distillation process that continuously reinforces array weights in the high-resistance state unless the gradient-based method switches them to low resistance. Experiments on handwritten digit classification show that SSL helps the network converge, with better performance and controllable sparsity, without additional computation. This work adapts learning algorithms to the intrinsic properties of memristor devices, opening a new direction for the development of neuromorphic computing chips.
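The description does not give enough detail to reproduce the SSL training rule itself. The sketch below shows only the widely reported empirical power-law form of PCM resistance drift, R(t) = R0·(t/t0)^ν, which such a scheme exploits. The drift exponents (0.1 for the high-resistance state, 0.005 for the low-resistance state), the resistances and the 1000 s horizon are illustrative assumptions, not values from the study.

```python
def drifted_resistance(r0, t, t0=1.0, nu=0.1):
    """Empirical PCM drift law R(t) = R0 * (t / t0) ** nu, for t >= t0.

    The default drift exponent nu is an illustrative assumption.
    """
    if t < t0:
        raise ValueError("t must be at least t0")
    return r0 * (t / t0) ** nu


# Hypothetical cells: high-resistance (amorphous) cells drift strongly,
# low-resistance (crystalline) cells barely drift.
for label, r0, nu in [("high-resistance", 1e6, 0.1), ("low-resistance", 1e4, 0.005)]:
    r = drifted_resistance(r0, t=1000.0, nu=nu)
    print(f"{label}: conductance falls to {r0 / r:.3f} of its initial value")
```

Because conductance is 1/R, weights stored in drifting high-resistance cells decay toward zero on their own. That spontaneous decay is the kind of effect a sparsity-oriented scheme can turn to its advantage.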
Project description:Tangeretin is a polymethoxyflavone found in citrus peel. Polymethoxyflavones have attracted interest for their anti-atherogenic, anti-inflammatory, and anti-carcinogenic properties, but their target pathways have not been fully characterized. Applying the concepts of chemical-genetic and genetic interactions in budding yeast to screen for putative tangeretin target pathways, we found that sgs1Δ yeast is sensitive to tangeretin treatment. We therefore used microarrays to investigate the pathways influenced by tangeretin in sgs1Δ cells, comparing the gene expression profiles of S. cerevisiae WT and sgs1Δ strains treated with 30 μM tangeretin for 4 hours. Each condition (tangeretin-treated WT, control-treated WT, tangeretin-treated sgs1Δ, control-treated sgs1Δ) was tested in triplicate.
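The description states the 2 × 2 design (genotype × tangeretin, in triplicate) but not the analysis. One simple way to ask whether tangeretin affects a gene differently in sgs1Δ than in WT is a per-gene difference-in-differences of mean log2 expression: (sgs1Δ response) − (WT response). The sketch below assumes that approach, and the expression values are invented for illustration.

```python
import math
import statistics


def mean_log2(values):
    """Mean of log2-transformed expression values across replicates."""
    return statistics.mean(math.log2(v) for v in values)


def interaction_log2(wt_ctrl, wt_drug, mut_ctrl, mut_drug):
    """Genotype x treatment contrast for one gene, in log2 units:
    (mutant response to drug) - (WT response to drug).
    """
    wt_response = mean_log2(wt_drug) - mean_log2(wt_ctrl)
    mut_response = mean_log2(mut_drug) - mean_log2(mut_ctrl)
    return mut_response - wt_response


# Hypothetical triplicate intensities for one gene: 2-fold induction in WT,
# 8-fold in sgs1-delta, i.e. an extra 4-fold (2 log2 units) in the mutant.
print(interaction_log2(
    wt_ctrl=[100, 110, 90], wt_drug=[200, 220, 180],
    mut_ctrl=[100, 110, 90], mut_drug=[800, 880, 720],
))
```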
Project description:There has been great interest in developing nonlinear structural equation models and associated statistical inference procedures, including estimation and model selection methods. In this paper, a general semiparametric structural equation model (SSEM) is developed in which the structural equation is composed of nonparametric functions of exogenous latent variables and fixed covariates on a set of latent endogenous variables. A basis representation is used to approximate these nonparametric functions in the structural equation, and the Bayesian Lasso method coupled with a Markov chain Monte Carlo (MCMC) algorithm is used for simultaneous estimation and model selection. The proposed method is illustrated using a simulation study and data from the Affective Dynamics and Individual Differences (ADID) study. Results demonstrate that the method can accurately estimate the unknown parameters and correctly identify the true underlying model.
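The paper's estimator places a Bayesian Lasso prior on the basis coefficients and samples them by MCMC; that part is not reproduced. The deliberately simpler sketch below covers only the basis-representation step. It approximates one nonparametric function f(ξ) ≈ Σ_k β_k B_k(ξ) using a truncated-power linear spline basis (1, ξ, (ξ − κ)₊ for each knot κ) fitted by ordinary least squares. The knots, the target function sin(ξ) and all helper names are hypothetical.

```python
import math


def basis(xi, knots):
    """Truncated-power linear spline basis: 1, xi, and (xi - k)_+ for each knot k."""
    return [1.0, xi] + [max(xi - k, 0.0) for k in knots]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_basis(xs, ys, knots):
    """Least-squares coefficients beta with f(xi) ~ sum_k beta_k * B_k(xi)."""
    rows = [basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def evaluate(beta, xi, knots):
    """Evaluate the fitted basis expansion at xi."""
    return sum(b * v for b, v in zip(beta, basis(xi, knots)))


# Approximate a nonlinear effect f(xi) = sin(xi) on [-3, 3] (hypothetical target).
knots = [-2.0, -1.0, 0.0, 1.0, 2.0]
xs = [i / 10 for i in range(-30, 31)]
ys = [math.sin(x) for x in xs]
beta = fit_basis(xs, ys, knots)
print(max(abs(evaluate(beta, x, knots) - y) for x, y in zip(xs, ys)))
```

With the basis fixed, the nonparametric function reduces to a finite coefficient vector β. A shrinkage prior such as the Bayesian Lasso can then act on β to decide which terms matter.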
Project description:A major goal in systems biology is a comprehensive description of all the complex interactions between different types of biomolecules (also referred to as the interactome) and of how these interactions give rise to higher cellular and organism-level functions or diseases. Numerous efforts have been undertaken to define such interactomes experimentally, for example yeast two-hybrid-based protein-protein interaction networks or ChIP-seq-based protein-DNA interactions for individual proteins. To complement these direct measurements, genome-scale quantitative multi-omics data (transcriptomics, proteomics, metabolomics, etc.) enable researchers to predict novel functional interactions between molecular species. Moreover, these data allow researchers to distinguish relevant functional interactions from non-functional ones in specific biological contexts. However, integrating multi-omics data is not straightforward because of their heterogeneity. Numerous methods exist for inferring interaction networks from homogeneous functional data, but with the advent of large-scale paired multi-omics data, a new class of methods for inferring comprehensive networks across different molecular species has begun to emerge. Here we review state-of-the-art techniques for inferring the topology of interaction networks from functional multi-omics data, encompassing graphical models with multiple node types and quantitative trait loci (QTL)-based approaches. In addition, we discuss Bayesian aspects of network inference, which allow already established biological information, such as known protein-protein or protein-DNA interactions, to guide the inference process.