Project description: Motivation: Although coexpression analysis via pairwise expression correlation is widely used to elucidate gene-gene interactions at the whole-genome scale, many complicated multi-gene regulations require more advanced detection methods. Liquid association (LA) is a powerful tool to detect the dynamic correlation of two gene variables depending on the expression level of a third variable (LA scouting gene). LA detection from a single transcriptomic study, however, is often unstable and not generalizable due to cohort bias, biological variation and limited sample size. With the rapid development of microarray and NGS technology, LA analysis combining multiple gene expression studies can provide more accurate and stable results. Results: In this article, we propose two meta-analytic approaches for LA analysis (MetaLA and MetaMLA) to combine multiple transcriptomic studies. To compensate for the demanding computation, we also propose a two-step fast screening algorithm for more efficient genome-wide screening: bootstrap filtering and sign filtering. We applied the methods to five Saccharomyces cerevisiae datasets related to environmental changes. The fast screening algorithm reduced running time by 98%. Compared with single-study analysis, MetaLA and MetaMLA provided stronger detection signals and more consistent and stable results. The top triplets are highly enriched in fundamental biological processes related to environmental changes. Our method can help biologists understand underlying regulatory mechanisms under different environmental exposures or disease states. Availability and implementation: A MetaLA R package, data and code for this article are available at http://tsenglab.biostat.pitt.edu/software.htm. Contact: ctseng@pitt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
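For reference, the liquid association score of a gene pair (X, Y) with scouting gene Z is commonly estimated, following Li's original formulation, as the mean triple product of the standardized expression values, with Z rank-transformed to normal scores. The sketch below illustrates that generic estimator together with a hypothetical bootstrap filter of the kind the fast screening step describes; the function names, resample count and sign-consistency threshold are assumptions, and the exact MetaLA/MetaMLA statistics may differ.

```python
import numpy as np
from scipy.stats import rankdata, norm

def la_score(x, y, z):
    """Generic liquid association score of (x, y) conditioned on scouting gene z.

    Standardize x and y, map z to normal quantiles via its ranks, then
    average the triple product (Li, 2002). Not necessarily the exact
    MetaLA/MetaMLA variant.
    """
    x, y, z = (np.asarray(v, float) for v in (x, y, z))
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    z = norm.ppf(rankdata(z) / (len(z) + 1))  # rank-based normal transform
    return np.mean(x * y * z)

def bootstrap_filter(x, y, z, n_boot=200, keep_frac=0.8, seed=None):
    """Hypothetical bootstrap filter: keep a triplet only if the LA score
    keeps the same sign in at least `keep_frac` of bootstrap resamples."""
    rng = np.random.default_rng(seed)
    x, y, z = (np.asarray(v, float) for v in (x, y, z))
    n = len(z)
    scores = np.array([la_score(x[idx], y[idx], z[idx])
                       for idx in (rng.integers(0, n, n) for _ in range(n_boot))])
    same_sign = max((scores > 0).mean(), (scores < 0).mean())
    return same_sign >= keep_frac
```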
Project description: Background: Current public concern over the spread of infectious diseases has underscored the importance of health surveillance systems for the speedy detection of disease outbreaks. Several international report-based monitoring systems have been developed, including GPHIN, Argus, HealthMap, and BioCaster. A vital feature of these report-based systems is the geo-temporal encoding of outbreak-related textual data. Until now, automated systems have tended to use an ad hoc strategy for processing geo-temporal information, normally involving the detection of locations that match pre-determined criteria and the use of document publication dates as a proxy for disease event dates. Although these strategies appear to be effective enough for reporting events at the country and province levels, they may be less effective at discovering geo-temporal information at finer levels of granularity. In order to improve the capabilities of current Web-based health surveillance systems, we introduce the design for a novel scheme called spatiotemporal zoning. Method: The proposed scheme classifies news articles into zones according to the spatiotemporal characteristics of their content. In order to study the reliability of the annotation scheme, we analyzed the inter-annotator agreement among a group of human annotators for over 1000 reported events. The results were evaluated qualitatively and quantitatively using kappa and percentage agreement. Results: The reliability evaluation of our scheme yielded very promising inter-annotator agreement, more than 0.9 kappa and 0.9 percentage agreement for event type annotation and temporal attribute annotation, respectively, with a slight degradation for the spatial attribute. However, for events indicating an outbreak situation, the annotators usually agreed only at the lowest level of location granularity. Conclusions: We developed and evaluated a novel spatiotemporal zoning annotation scheme. The results of the scheme evaluation indicate that our annotated corpus and the proposed annotation scheme are reliable and could be effectively used for developing an automatic system. Given the current advances in natural language processing techniques, including the availability of language resources and tools, we believe that a reliable automatic spatiotemporal zoning system can be achieved. In the next stage of this work, we plan to develop an automatic zoning system and evaluate its usability within an operational health surveillance system.
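The reliability figures above are based on chance-corrected agreement. As a reminder of what the kappa statistic measures, the minimal sketch below computes Cohen's kappa for two annotators; the example labels are hypothetical, and the study's full evaluation (multiple annotators, several attribute types) is more involved.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from each annotator's marginal label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# e.g. two annotators labelling the event type of five hypothetical reports
print(cohens_kappa(["outbreak", "case", "case", "other", "case"],
                   ["outbreak", "case", "other", "other", "case"]))  # ~0.69
```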
Project description:With varying, but substantial, proportions of heritability remaining unexplained by summaries of single-SNP genetic variation, there is a demand for methods that extract maximal information from genetic association studies. One source of variation that is difficult to assess is genetic interactions. A major challenge for naive detection methods is the large number of possible combinations, with a requisite need to correct for multiple testing. Assumptions of large marginal effects, to reduce the search space, may be restrictive and miss higher order interactions with modest marginal effects. In this paper, we propose a new procedure for detecting gene-by-gene interactions through heterogeneity in estimated low-order (e.g., marginal) effect sizes by leveraging population structure, or ancestral differences, among studies in which the same phenotypes were measured. We implement this approach in a meta-analytic framework, which offers numerous advantages, such as robustness and computational efficiency, and is necessary when data-sharing limitations restrict joint analysis. We effectively apply a dimension reduction procedure that scales to allow searches for higher order interactions. For comparison to our method, which we term phylogenY-aware Effect-size Tests for Interactions (YETI), we adapt an existing method that assumes interacting loci will exhibit strong marginal effects to our meta-analytic framework. As expected, YETI excels when multiple studies are from highly differentiated populations and maintains its superiority in these conditions even when marginal effects are small. When these conditions are less extreme, the advantage of our method wanes. We assess the Type-I error and power characteristics of complementary approaches to evaluate their strengths and limitations.
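The core signal YETI screens for is between-study heterogeneity in low-order effect sizes. As an illustrative analog (not the published YETI statistic), the sketch below applies Cochran's Q test to one variant's marginal effect estimates across studies; a large Q relative to a chi-square with k - 1 degrees of freedom flags ancestry-driven heterogeneity of the kind described above.

```python
import numpy as np
from scipy.stats import chi2

def cochran_q_test(betas, ses):
    """Fixed-effect heterogeneity test (Cochran's Q) for one variant's
    marginal effect estimates across k studies. Returns (Q, p-value)."""
    betas, ses = np.asarray(betas, float), np.asarray(ses, float)
    w = 1.0 / ses**2                                  # inverse-variance weights
    beta_bar = np.sum(w * betas) / np.sum(w)          # pooled fixed effect
    q = np.sum(w * (betas - beta_bar) ** 2)
    p = chi2.sf(q, df=len(betas) - 1)
    return q, p

# e.g. the same SNP estimated in three ancestrally distinct cohorts (made-up numbers)
print(cochran_q_test([0.12, 0.02, 0.25], [0.04, 0.05, 0.06]))
```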
Project description:With the advancement of high-throughput biotechnologies, we increasingly accumulate biomedical data about diseases, especially cancer. There is a need for computational models and methods to sift through, integrate, and extract new knowledge from the diverse available data, to improve the mechanistic understanding of diseases and patient care. To uncover molecular mechanisms and drug indications for specific cancer types, we develop an integrative framework able to harness a wide range of diverse molecular and pan-cancer data. We show that our approach outperforms the competing methods and can identify new associations. Furthermore, it captures the underlying biology predictive of drug response. Through the joint integration of data sources, our framework can also uncover links between cancer types and molecular entities for which no prior knowledge is available. Our new framework is flexible and can be easily reformulated to study any biomedical problem.
Project description:Public and private institutions have gained traction in developing interventions to alter people's behaviours in predictable ways without limiting the freedom of choice or significantly changing the incentive structure. A nudge is designed to facilitate actions by minimizing friction, while a sludge is an intervention that inhibits actions by increasing friction, but the underlying cognitive mechanisms behind these interventions remain largely unknown. Here, we develop a novel cognitive framework by organizing these interventions along six cognitive processes: attention, perception, memory, effort, intrinsic motivation and extrinsic motivation. In addition, we conduct a meta-analysis of field experiments (i.e. randomized controlled trials) that contained real behavioural measures (n = 184 papers, k = 184 observations, N = 2 245 373 participants) from 2008 to 2021 to examine the effect size of these interventions targeting each cognitive process. Our findings demonstrate that interventions changing effort are more effective than interventions changing intrinsic motivation, and nudge and sludge interventions had similar effect sizes. However, these results need to be interpreted with caution due to a potential publication bias. This new meta-analytic framework provides cognitive principles for organizing nudge and sludge with corresponding behavioural impacts. The insights gained from this framework help inform the design and development of future interventions based on cognitive insights.
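Pooled effect sizes in such a meta-analysis are typically obtained with a random-effects model. The sketch below shows a generic DerSimonian-Laird estimator as one plausible building block; the study's actual model, which compares interventions across the six cognitive processes, is necessarily richer, so this function is only an assumption-laden illustration.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects pooled effect via the DerSimonian-Laird estimator.

    Returns the pooled effect, its standard error, and the between-study
    variance tau^2. A generic meta-analytic building block, not the paper's model."""
    effects, variances = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)               # fixed-effect pooled estimate
    q = np.sum(w * (effects - fixed) ** 2)                # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)         # between-study variance
    w_star = 1.0 / (variances + tau2)                     # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2
```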
Project description: Accurate traffic prediction contributes significantly to the success of intelligent transportation systems (ITS), enabling ITS to rationally deploy road resources and enhance the utilization efficiency of road networks. Prediction performance improves noticeably when spatial-temporal correlations are modeled with synchronized rather than stepwise components. Some existing studies have designed graph structures containing spatial and temporal attributes to achieve spatial-temporal synchronous learning. However, two challenges remain due to the intricate dynamics: (a) accounting for the impact of external factors in spatial-temporal synchronous modeling, and (b) constructing spatial-temporal synchronous graphs from multiple perspectives. To address these limitations, a novel model named the dynamic multiple-graph spatial-temporal synchronous aggregation framework (DMSTSAF) for traffic prediction is proposed. Specifically, DMSTSAF utilizes a feature augmentation module (FAM) to adaptively incorporate traffic data with external factors and generate fused features as inputs to subsequent modules. Moreover, DMSTSAF introduces diverse spatial and temporal graphs according to different spatial-temporal relationships. Based on these, two types of spatial-temporal synchronous graphs and the corresponding synchronous aggregation modules are designed to simultaneously extract hidden features from various aspects. Extensive experiments conducted on four real-world datasets indicate that our model improves on the state-of-the-art baselines by 3.68-8.54%.
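To make the idea of a spatial-temporal synchronous graph concrete, the sketch below builds a localized block adjacency that links spatial neighbours within a time step and each sensor to itself across adjacent steps, so a single graph operation can aggregate both dimensions at once. This is a minimal illustration of the general construction, not DMSTSAF's specific dynamic multiple-graph design.

```python
import numpy as np

def build_st_synchronous_graph(spatial_adj, window=3):
    """Illustrative localized spatial-temporal synchronous adjacency.

    Stacks `window` copies of the spatial graph along the diagonal and adds
    identity blocks linking each node to itself at neighbouring time steps,
    so spatial and temporal neighbours are mixed in one aggregation step."""
    n = spatial_adj.shape[0]
    big = np.zeros((window * n, window * n))
    for t in range(window):
        big[t * n:(t + 1) * n, t * n:(t + 1) * n] = spatial_adj   # spatial edges at step t
        if t + 1 < window:                                        # temporal self-links t <-> t+1
            big[t * n:(t + 1) * n, (t + 1) * n:(t + 2) * n] = np.eye(n)
            big[(t + 1) * n:(t + 2) * n, t * n:(t + 1) * n] = np.eye(n)
    return big
```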
Project description: An increase in the number of smaller magnitude events, retrospectively named foreshocks, is often observed before large earthquakes. We show that the linear density probability of earthquakes occurring before and after small or intermediate mainshocks displays a symmetrical behavior, indicating that the size of the area fractured during the mainshock is encoded in the foreshock spatial organization. This observation can be used to discriminate spatial clustering due to foreshocks from that induced by aftershocks and is implemented in an alarm-based model to forecast m > 6 earthquakes. A retrospective study of the last 19 years of the Southern California catalog shows that the daily occurrence probability presents isolated peaks closely located in time and space to the epicenters of five of the six m > 6 earthquakes. We find daily probabilities as high as 25% (in cells of size 0.04° × 0.04°), with significant probability gains with respect to standard models.
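As a purely illustrative bookkeeping step (not the alarm-based model itself), the sketch below bins catalog events into 0.04° × 0.04° cells and converts the counts into empirical daily rates, which approximate daily occurrence probabilities when rates are small; the coordinate ranges are placeholder values roughly covering Southern California, not values from the study.

```python
import numpy as np

def daily_cell_rate(lons, lats, n_days, cell=0.04,
                    lon_range=(-122.0, -114.0), lat_range=(32.0, 37.0)):
    """Empirical daily event rate per 0.04 x 0.04 degree cell.

    Only the baseline bookkeeping step; the paper's alarm-based model
    further modulates these probabilities using foreshock clustering."""
    lon_edges = np.arange(lon_range[0], lon_range[1] + cell, cell)
    lat_edges = np.arange(lat_range[0], lat_range[1] + cell, cell)
    counts, _, _ = np.histogram2d(lons, lats, bins=[lon_edges, lat_edges])
    return counts / n_days  # events per cell per day
```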
Project description: Historically, crop models have been used to evaluate crop yield responses to nitrogen (N) rates after harvest, when it is too late for farmers to make in-season adjustments. We hypothesize that the use of a crop model as an in-season forecast tool will improve current N decision-making. To explore this, we used the Agricultural Production Systems sIMulator (APSIM) calibrated with long-term experimental data for central Iowa, USA (16 years in continuous corn and 15 years in soybean-corn rotation) combined with actual weather data up to a specific crop stage and historical weather data thereafter. The objectives were to: (1) evaluate the accuracy and uncertainty of corn yield and economic optimum N rate (EONR) predictions at four forecast times (planting time, 6th and 12th leaf, and silking phenological stages); (2) determine whether the use of analogous historical weather years based on precipitation and temperature patterns, as opposed to using a 35-year dataset, could improve the accuracy of the forecast; and (3) quantify the value added by the crop model in predicting annual EONR and yields, using the site-mean EONR and the yield at the EONR to benchmark predicted values. Results indicated that the mean corn yield predictions at planting time (R2 = 0.77) using 35 years of historical weather were close to the observed and predicted yield at maturity (R2 = 0.81). Across all forecasting times, the EONR predictions were more accurate in corn-corn than in soybean-corn rotation (relative root mean square error, RRMSE, of 25 vs. 45%, respectively). At planting time, the APSIM model predicted the direction of optimum N rates (above, below or at the average site-mean EONR) in 62% of the cases examined (n = 31), with an average error range of ±38 kg N ha-1 (22% of the average N rate). Across all forecast times, the prediction error of EONR was about three times higher than that of yield predictions. The use of the 35-year weather record was better than using selected historical weather years to forecast (RRMSE was on average 3% lower). Overall, the proposed approach of using the crop model as a forecasting tool could improve year-to-year predictability of corn yields and optimum N rates. Further improvements in modeling and set-up protocols are needed toward more accurate forecasts, especially for extreme weather years with the most significant economic and environmental costs.
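Two quantities used throughout the study are easy to state explicitly: for a quadratic yield response, the EONR is the N rate at which the marginal yield gain equals the N-to-corn price ratio, and forecast accuracy is reported as relative RMSE. The sketch below encodes both; the price values are illustrative placeholders, not the ones used in the paper.

```python
import numpy as np

def eonr_quadratic(a, b, c, n_price=0.88, corn_price=0.16):
    """Economic optimum N rate for a quadratic response Y(N) = a + b*N + c*N**2
    (with c < 0): profit is maximized where dY/dN = b + 2*c*N equals the
    N:corn price ratio, so EONR = (ratio - b) / (2*c).
    Prices (USD per kg) are illustrative placeholders."""
    price_ratio = n_price / corn_price
    return (price_ratio - b) / (2.0 * c)

def rrmse(observed, predicted):
    """Relative RMSE (%), the accuracy metric reported for yield and EONR forecasts."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return 100.0 * np.sqrt(np.mean((observed - predicted) ** 2)) / observed.mean()
```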
Project description:Large-scale mass spectrometry-based peptidomics for drug discovery is relatively unexplored because of challenges in peptide degradation and identification following tissue extraction. Here we present a streamlined analytical pipeline for large-scale peptidomics. We developed an optimized sample preparation protocol to achieve fast, reproducible and effective extraction of endogenous peptides from sub-dissected organs such as the brain, while diminishing unspecific protease activity. Each peptidome sample was analysed by high-resolution tandem mass spectrometry and the resulting data set was integrated with publically available databases. We developed and applied an algorithm that reduces the peptide complexity for identification of biologically relevant peptides. The developed pipeline was applied to rat hypothalamus and identifies thousands of neuropeptides and their post-translational modifications, which is combined in a resource format for visualization, qualitative and quantitative analyses.
Project description:In many complex physical phenomena such as wave propagation in scattering media, the process of interest often cannot be easily distinguished from other processes because only the total combined process is accessible. This makes it difficult to extract the precise knowledge of each subprocess. Here, we derive an analytic expression describing the way the eigenchannel coupling of the total process distributes its energy to the individual subprocesses, with only partial information on each subprocess such as the average eigenvalue 〈τ〉 and enhancement factor η. We found that the ratio of (η - 1)〈τ〉 between two subprocesses is a critical parameter determining the preferable subprocess in the energy coupling. This work provides a new analytic framework for understanding the effect of wavefront shaping in the control of wave propagation in disordered media.
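Taking the stated result at face value, the preferable subprocess can be read off from the figure of merit (η - 1)⟨τ⟩. The sketch below compares two subprocesses on that basis with made-up numbers; it does not reproduce the full analytic expression derived in the work.

```python
def coupling_preference(eta_a, tau_a, eta_b, tau_b):
    """Compare two subprocesses by the figure of merit (eta - 1) * <tau>:
    the subprocess with the larger value is favoured when the total
    process couples to its eigenchannel. Illustrative only; the full
    analytic expression is not reproduced here."""
    merit_a = (eta_a - 1.0) * tau_a
    merit_b = (eta_b - 1.0) * tau_b
    return merit_a / merit_b  # > 1 means subprocess A receives more energy

# e.g. subprocess A: enhancement 4, average eigenvalue 0.2;
#      subprocess B: enhancement 10, average eigenvalue 0.05 (made-up values)
print(coupling_preference(4.0, 0.2, 10.0, 0.05))  # ~1.33, so A is preferred
```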