OSCA: a tool for omic-data-based complex trait analysis.
ABSTRACT: The rapid increase of omic data has greatly facilitated the investigation of associations between omic profiles such as DNA methylation (DNAm) and complex traits in large cohorts. Here, we propose a mixed-linear-model-based method called MOMENT that tests for association between a DNAm probe and trait with all other distal probes fitted in multiple random-effect components to account for unobserved confounders. We demonstrate by simulations that MOMENT shows a lower false positive rate and more robustness than existing methods. MOMENT has been implemented in a versatile software package called OSCA together with a number of other implementations for omic-data-based analyses.
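As a rough sketch of the kind of model the abstract describes, the test for a single probe can be written as a mixed linear model with several random-effect components. The notation below is an assumption made for illustration and is not taken from the text above: y is the trait vector, w_i the target probe, W_k the matrix of distal probes assigned to component k, and K the number of components.

```latex
% Illustrative notation only; symbols and the number of components K are assumptions.
\begin{align}
  \mathbf{y} &= \mathbf{w}_i b_i + \sum_{k=1}^{K} \mathbf{W}_k \mathbf{u}_k + \mathbf{e},
  \qquad
  \mathbf{u}_k \sim N\!\left(\mathbf{0}, \mathbf{I}\sigma^2_{u_k}\right),
  \quad
  \mathbf{e} \sim N\!\left(\mathbf{0}, \mathbf{I}\sigma^2_{e}\right), \\
  \operatorname{var}(\mathbf{y}) &= \sum_{k=1}^{K} \mathbf{W}_k \mathbf{W}_k^{\top} \sigma^2_{u_k} + \mathbf{I}\sigma^2_{e},
  \qquad
  H_0 : b_i = 0 .
\end{align}
```

Under this formulation the fixed effect b_i of the target probe is tested while the random effects of the distal probes absorb correlation structure induced by unobserved confounders.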
Project description:Recent advances in experimental biology allow creation of datasets where several genome-wide data types (called omics) are measured per sample. Integrative analysis of multi-omic datasets in general, and clustering of samples in such datasets specifically, can improve our understanding of biological processes and discover different disease subtypes. In this work we present MONET (Multi Omic clustering by Non-Exhaustive Types), which takes a unique approach to multi-omic clustering. MONET discovers modules of similar samples, such that each module is allowed to have a clustering structure for only a subset of the omics. This approach differs from most existing multi-omic clustering algorithms, which assume a common structure across all omics, and from several recent algorithms that model distinct cluster structures. We tested MONET extensively on simulated data, on an image dataset, and on ten multi-omic cancer datasets from TCGA. Our analysis shows that MONET compares favorably with other multi-omic clustering methods. We demonstrate MONET's biological and clinical relevance by analyzing its results for Ovarian Serous Cystadenocarcinoma. We also show that MONET is robust to missing data, can cluster genes in multi-omic datasets, and can reveal modules of cell types in single-cell multi-omic data. Our work shows that MONET is a valuable tool that can provide complementary results to those provided by existing algorithms for multi-omic analysis.
Project description:Mechanically activated (MA) ion channels convert physical forces into electrical signals, and are essential for eukaryotic physiology. Despite their importance, few bona-fide MA channels have been described in plants and animals. Here, we show that various members of the OSCA and TMEM63 family of proteins from plants, flies, and mammals confer mechanosensitivity to naïve cells. We conclusively demonstrate that OSCA1.2, one of the Arabidopsis thaliana OSCA proteins, is an inherently mechanosensitive, pore-forming ion channel. Our results suggest that OSCA/TMEM63 proteins are the largest family of MA ion channels identified, and are conserved across eukaryotes. Our findings will enable studies to gain deep insight into molecular mechanisms of MA channel gating, and will facilitate a better understanding of mechanosensory processes in vivo across plants and animals.
Project description:BACKGROUND:Reception of and response to exogenous and endogenous osmotic changes is important to sustain plant growth and development, as well as reproductive formation. Hyperosmolality-gated calcium-permeable channels (OSCA) were first characterised as an osmosensor in Arabidopsis and are involved in the perception of extracellular changes to trigger hyperosmolality-induced [Ca(2+)]i increases (OICI). To explore the potential biological functions of OSCAs in rice, we performed a bioinformatics and expression analysis of the OsOSCA gene family. RESULTS:A total of 11 OsOSCA genes were identified from the genome database of Oryza sativa L. Japonica. Based on their sequence composition and phylogenetic relationship, the OsOSCA family was classified into four clades. Gene and protein structure analysis indicated that the 11 OsOSCAs shared similar structures with their homologs in Oryza sativa L. ssp. Indica, Oryza glaberrima, and Oryza brachyantha. Multiple sequence alignment analysis revealed a conserved DUF221 domain in these members, in which the first three TMs were conserved, while the others were not. The expression profiles of OsOSCA genes were analysed at different stages of vegetative growth, reproductive development, and under osmotic-associated abiotic stresses. We found that four and six OsOSCA genes showed a clear correlation between the expression profile and osmotic changes during caryopsis development and seed imbibition, respectively. Orchestrated transcription of three OsOSCAs was strongly associated with the circadian clock. Moreover, osmotic-related abiotic stress differentially induced the expression of 10 genes. CONCLUSION:The entire OSCA family is characterised by the presence of a conserved DUF221 domain, which functions as an osmotic-sensing calcium channel. The phylogenetic tree of OSCA genes showed that two subspecies of cultivated rice, Oryza sativa L. ssp. Japonica and Oryza sativa L. ssp. Indica, are more closely related than wild rice Oryza glaberrima, while Oryza brachyantha was less closely related. OsOSCA expression is organ- and tissue-specific and regulated by different osmotic-related abiotic stresses in rice. These findings will facilitate further research in this gene family and provide potential target genes for generation of genetically modified osmotic-stress-resistant plants.
Project description:The representation, integration, and interpretation of omic data is a complex task, in particular considering the huge amount of information that is produced daily in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture allows the analysis of the chromosome organization in the cell's natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighborhood starting from the information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.
Project description:We conducted DNA methylation association analyses using Illumina 450K data from whole blood for an Australian amyotrophic lateral sclerosis (ALS) case-control cohort (782 cases and 613 controls). Analyses used mixed linear models as implemented in the OSCA software. We found a significantly higher proportion of neutrophils in cases compared to controls, which replicated in an independent cohort from the Netherlands (1159 cases and 637 controls). The OSCA MOMENT linear mixed model has been shown in simulations to best account for confounders. When combined in a methylation profile score, the 25 most-associated probes identified by MOMENT significantly classified case-control status in the Netherlands sample (area under the curve, AUC = 0.65, CI95% = [0.62-0.68], p = 8.3 × 10^-22). The maximum AUC achieved was 0.69 (CI95% = [0.66-0.71], p = 4.3 × 10^-34) when cell-type proportion was included in the predictor.
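The two quantities named in this description, a methylation profile score and the AUC used to evaluate it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's pipeline: the probe IDs, weights, and sample values are invented toy data, the score is assumed to be a weighted sum of probe values, and the AUC is computed by direct pairwise comparison of case and control scores.

```python
# Toy sketch: methylation profile score (weighted sum) and pairwise AUC.
# All probe IDs, weights, and methylation values below are hypothetical.

def profile_score(sample, weights):
    """Weighted sum of a sample's probe values; weights are keyed by probe ID."""
    return sum(weights[probe] * sample[probe] for probe in weights)


def auc(case_scores, control_scores):
    """Fraction of case/control pairs where the case scores higher (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-probe effect estimates used as weights.
weights = {"cg_a": 0.5, "cg_b": -0.25}

# Hypothetical methylation values for two cases and two controls.
cases = [{"cg_a": 0.8, "cg_b": 0.2}, {"cg_a": 0.6, "cg_b": 0.4}]
controls = [{"cg_a": 0.4, "cg_b": 0.4}, {"cg_a": 0.7, "cg_b": 0.2}]

case_scores = [profile_score(s, weights) for s in cases]
control_scores = [profile_score(s, weights) for s in controls]
toy_auc = auc(case_scores, control_scores)
print(f"toy AUC = {toy_auc:.2f}")
```

In this toy data three of the four case/control pairs are ranked correctly, so the AUC is 0.75. The pairwise form is quadratic in sample size; a rank-based computation would be preferable for cohorts of the size described here.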
Project description:Background:Worldwide cultivation of maize is often impacted negatively by drought stress. Hyperosmolality-gated calcium-permeable channels (OSCA) have been characterized as osmosensors in Arabidopsis. However, the involvement of members of the maize OSCA (ZmOSCA) gene family in response to drought stress is unknown. It is furthermore unclear which ZmOSCA gene plays a major role in genetic improvement of drought tolerance in maize. Methods:We predicted the protein domain structure and transmembrane regions using the NCBI Conserved Domain Database and the TMHMM server, respectively. The phylogenetic tree was built with MEGA7. We used the mixed linear model in TASSEL to perform the family-based association analysis. Results:In this report, 12 ZmOSCA genes were uncovered in the maize genome by a genome-wide survey and analyzed systematically to reveal their synteny and phylogenetic relationship with the genomes of rice, maize, and sorghum. These analyses indicated a relatively conserved evolutionary history of the ZmOSCA gene family. Protein domain and transmembrane analysis indicated that most of the 12 ZmOSCAs shared similar structures with their homologs. Differential expression analysis under drought at various stages, together with the expression profiles in 15 tissues, revealed a functional divergence of ZmOSCA genes. Notably, the expression level of ZmOSCA4.1 was up-regulated in both seedlings and adult leaves. We also performed an association analysis between genetic variation in these genes and drought tolerance, and found significant associations between genetic variation in ZmOSCA4.1 and drought tolerance at the seedling stage. Our report provides a detailed analysis of the ZmOSCAs in the maize genome. These findings will contribute to future studies on the functional characterization of ZmOSCA proteins in response to water deficit stress, as well as understanding the mechanism of genetic variation in drought tolerance in maize.
Project description:Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome-wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic predictions ignored genomic information. Here, we designed a novel strategy of GP called multilayered least absolute shrinkage and selection operator (MLLASSO) by integrating multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by observed transcriptome and metabolome. Significantly, MLLASSO learns higher order information of gene interactions, which enables us to achieve a significant improvement of predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs) as their expressions were accurately predicted with genetic markers. Interestingly, we made three dramatic discoveries for the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait-related transcriptional factor families are enriched in GPGs. These findings support the notion that learned GFs not only are good predictors for traits but also have specific biological implications regarding regulation of gene expressions. To differentiate the new method from conventional GP models, we called MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.
Project description:Circadian rhythms play a fundamental role at all levels of biological organization. Understanding the mechanisms and implications of circadian oscillations continues to be the focus of intense research. However, there has been no comprehensive and integrated way of accessing and mining all circadian omic datasets. The latest release of CircadiOmics (http://circadiomics.ics.uci.edu) fills this gap by providing the most comprehensive web server for studying circadian data. The newly updated version contains 227 high-throughput omic datasets corresponding to over 74 million measurements sampled over 24 h cycles. Users can visualize and compare oscillatory trajectories across species, tissues and conditions. Periodicity statistics (e.g. period, amplitude, phase, P-value, q-value etc.) obtained from BIO_CYCLE and other methods are provided for all samples in the repository and can easily be downloaded in the form of publication-ready figures and tables. New features and substantial improvements in performance and data volume make CircadiOmics a powerful web portal for integrated analysis of circadian omic data.
Project description:Multi-omic studies combine measurements at different molecular levels to build comprehensive models of cellular systems. The success of a multi-omic data analysis strategy depends largely on the adoption of adequate experimental designs, and on the quality of the measurements provided by the different omic platforms. However, the field lacks a comparative description of performance parameters across omic technologies and a formulation for experimental design in multi-omic data scenarios. Here, we propose a set of harmonized Figures of Merit (FoM) as quality descriptors applicable to different omic data types. Employing this information, we formulate the MultiPower method to estimate and assess the optimal sample size in a multi-omics experiment. MultiPower supports different experimental settings, data types and sample sizes, and includes graphical tools for experimental design decision-making. MultiPower is complemented with MultiML, an algorithm to estimate sample size for machine learning classification problems based on multi-omic data.
Project description:Recent advances in omic technologies provide researchers with opportunities to search for disease biomarkers at the systems level. However, selection of biomarker candidates from a large number of molecules involved at various layers of the biological system is challenging. In this paper, we propose multi-omic integrative analysis (MOTA), a network-based method that uses information from multi-omic data to identify candidate disease biomarkers. We evaluated the performance of MOTA in selecting disease-associated molecules from four sets of multi-omic data representing three cohorts of hepatocellular carcinoma (HCC) cases and patients with liver cirrhosis. The results demonstrate that MOTA leads to the selection of more biomarker candidates that are shared by two different cohorts compared to traditional statistical methods. Also, the networks constructed by MOTA allow users to investigate the biological significance of the selected biomarker candidates.