Project description:The ability to predict the agronomic performance of single-crosses with high precision is essential for selecting superior candidates for hybrid breeding. With recent technological advances, thousands of new parent lines, and, consequently, millions of new hybrid combinations are possible in each breeding cycle, yet only a few hundred can be produced and phenotyped in multi-environment yield trials. Well established prediction approaches such as best linear unbiased prediction (BLUP) using pedigree data and whole-genome prediction using genomic data are limited in capturing epistasis and interactions occurring within and among downstream biological strata such as transcriptome and metabolome. Because mRNA and small RNA (sRNA) sequences are involved in transcriptional, translational and post-translational processes, we expect them to provide information influencing several biological strata. However, using sRNA data of parent lines to predict hybrid performance has not yet been addressed. Here, we gathered genomic, transcriptomic (mRNA and sRNA) and metabolomic data of parent lines to evaluate the ability of the data to predict the performance of untested hybrids for important agronomic traits in grain maize. We found a considerable interaction for predictive ability between predictor and trait, with mRNA data being a superior predictor for grain yield and genomic data for grain dry matter content, while sRNA performed relatively poorly for both traits. Combining mRNA and genomic data as predictors resulted in high predictive abilities across both traits and combining other predictors improved prediction over that of the individual predictors alone. We conclude that downstream "omics" can complement genomics for hybrid prediction, and, thereby, contribute to more efficient selection of hybrid candidates.
Project description:BackgroundBayesian hierarchical models have been proposed to combine evidence from different types of study designs. However, when combining evidence from randomised and non-randomised controlled studies, imbalances in patient characteristics between study arms may bias the results. The objective of this study was to assess the performance of a proposed Bayesian approach to adjust for imbalances in patient level covariates when combining evidence from both types of study designs.Methodology/principal findingsSimulation techniques, in which the truth is known, were used to generate sets of data for randomised and non-randomised studies. Covariate imbalances between study arms were introduced in the non-randomised studies. The performance of the Bayesian hierarchical model adjusted for imbalances was assessed in terms of bias. The data were also modelled using three other Bayesian approaches for synthesising evidence from randomised and non-randomised studies. The simulations considered six scenarios aimed at assessing the sensitivity of the results to changes in the impact of the imbalances and the relative number and size of studies of each type. For all six scenarios considered, the Bayesian hierarchical model adjusted for differences within studies gave results that were unbiased and closest to the true value compared to the other models.Conclusions/significanceWhere informed health care decision making requires the synthesis of evidence from randomised and non-randomised study designs, the proposed hierarchical Bayesian method adjusted for differences in patient characteristics between study arms may facilitate the optimal use of all available evidence leading to unbiased results compared to unadjusted analyses.
Project description:The measurement of this data aims to evaluate the wind shear variability at three selected sites in Malaysia. The sites are Kudat in Sabah, Kijal in Terengganu and Langkawi in Kedah. Both sites in Kudat and Kijal is located in coastal areas with few buildings or trees, while the site in Langkawi is a coastal area with many buildings or trees. The variables were measured using the sensors that mounted on the wind mast with the maximum height from 55 m to 70 m from ground level. The variables measured were wind speed, wind direction, temperature, and pressure, while the wind shear data were directly generated using the power law equation. The averaged wind shear based on measured multiple height wind speed at selected sites is larger than the 1/7 law (0.143). Also, the value of wind shear was higher in order Langkawi > Kudat > Kijal. Ultimately, the wind shear data are essential and can be reused in the wind energy potential study, especially for data extrapolation to desired wind turbine hub height.
Project description:Large-scale molecular interaction data sets have the potential to provide a comprehensive, system-wide understanding of biological function. Although individual molecules can be promiscuous in terms of their contribution to function, molecular functions emerge from the specific interactions of molecules giving rise to modular organisation. As functions often derive from a range of mechanisms, we demonstrate that they are best studied using networks derived from different sources. Implementing a graph partitioning algorithm we identify subnetworks in yeast protein-protein interaction (PPI), genetic interaction and gene co-regulation networks. Among these subnetworks we identify cohesive subgraphs that we expect to represent functional modules in the different data types. We demonstrate significant overlap between the subgraphs generated from the different data types and show these overlaps can represent related functions as represented by the Gene Ontology (GO). Next, we investigate the correspondence between our subgraphs and the Gene Ontology. This revealed varying degrees of coverage of the biological process, molecular function and cellular component ontologies, dependent on the data type. For example, subgraphs from the PPI show enrichment for 84%, 58% and 93% of annotated GO terms, respectively. Integrating the interaction data into a combined network increases the coverage of GO. Furthermore, the different annotation types of GO are not predominantly associated with one of the interaction data types. Collectively our results demonstrate that successful capture of functional relationships by network data depends on both the specific biological function being characterised and the type of network data being used. We identify functions that require integrated information to be accurately represented, demonstrating the limitations of individual data types. Combining interaction subnetworks across data types is therefore essential for fully understanding the complex and emergent nature of biological function.
Project description:This study investigates how backgrounded membrane imaging (BMI) can be used in combination with convolutional neural networks (CNNs) in order to quantitatively and qualitatively study subvisible particles in both protein biopharmaceuticals and samples containing synthetic model particles. BMI requires low sample volumes and avoids many technical complications associated with imaging particles in solution, e.g., air bubble interference, low refractive index contrast between solution and particles of interest, etc. Hence, BMI is an attractive technique for characterizing particles at various stages of drug product development. However, to date, the morphological information encoded in brightfield BMI images has scarcely been utilized. Here we show that CNN based methods can be useful in extracting morphological information from (label-free) brightfield BMI particle images. Images of particles from biopharmaceutical products and from laboratory prepared samples were analyzed with two types of CNN based approaches: traditional supervised classifiers and a recently proposed fingerprinting analysis method. We demonstrate that the CNN based methods are able to efficiently leverage BMI data to distinguish between particles comprised of different proteins, various fatty acids (representing polysorbate degradation related particles), and protein surrogates (NIST ETFE reference material) only based on BMI images. The utility of using the fingerprinting method for comparing morphological differences and similarities of particles formed in distinct drug products and/or laboratory prepared samples is further demonstrated and discussed through three case studies.
Project description:BackgroundOne of the important challenges in microarray analysis is to take full advantage of previously accumulated data, both from one's own laboratory and from public repositories. Through a comparative analysis on a variety of datasets, a more comprehensive view of the underlying mechanism or structure can be obtained. However, as we discover in this work, continual changes in genomic sequence annotations and probe design criteria make it difficult to compare gene expression data even from different generations of the same microarray platform.ResultsWe first describe the extent of discordance between the results derived from two generations of Affymetrix oligonucleotide arrays, as revealed in cluster analysis and in identification of differentially expressed genes. We then propose a method for increasing comparability. The dataset we use consists of a set of 14 human muscle biopsy samples from patients with inflammatory myopathies that were hybridized on both HG-U95Av2 and HG-U133A human arrays. We find that the use of the probe set matching table for comparative analysis provided by Affymetrix produces better results than matching by UniGene or LocusLink identifiers but still remains inadequate. Rescaling of expression values for each gene across samples and data filtering by expression values enhance comparability but only for few specific analyses. As a generic method for improving comparability, we select a subset of probes with overlapping sequence segments in the two array types and recalculate expression values based only on the selected probes. We show that this filtering of probes significantly improves the comparability while retaining a sufficient number of probe sets for further analysis.ConclusionsCompatibility between high-density oligonucleotide arrays is significantly affected by probe-level sequence information. With a careful filtering of the probes based on their sequence overlaps, data from different generations of microarrays can be combined more effectively.
Project description:A chronic disease such as asthma is the result of a complex sequence of biological interactions involving multiple genes and pathways in response to a multitude of environmental exposures. However, methods to model jointly all factors are still evolving. Some of the current challenges include how to integrate knowledge from different data types and different disciplines, as well as how to utilize relevant external information such as gene annotation to identify novel disease genes and gene-environment inter-actions.Using a Bayesian hierarchical modeling framework, we developed two alternative methods for joint analysis of an epidemiologic study of a disease endpoint and an experimental study of intermediate phenotypes, while incorporating external information.Our simulation studies demonstrated superior performance of the proposed hierarchical models compared to separate analysis with the standard single-level regression modeling approach. The combined analyses of the Southern California Children's Health Study and challenge study data suggest that these joint analytical methods detected more significant genetic main and gene-environment interaction effects than the conventional analysis.The proposed prior framework is very flexible and can be generalized for an integrative analysis of diverse sources of relevant biological data.
Project description:BackgroundPredictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.MethodsIn this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.ResultsContrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.ConclusionsOur findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.
Project description:Palaeognathae consists of five groups of extant species: flighted tinamous (1) and four flightless groups: kiwi (2), cassowaries and emu (3), rheas (4), and ostriches (5). Molecular studies supported the groupings of extinct moas with tinamous and elephant birds with kiwi as well as ostriches as the group that diverged first among the five groups. However, phylogenetic relationships among the five groups are still controversial. Previous studies showed extensive heterogeneity in estimated gene tree topologies from conserved nonexonic elements, introns, and ultraconserved elements. Using the noncoding loci together with protein-coding loci, this study investigated the factors that affected gene tree estimation error and the relationships among the five groups. Using closely related ostrich rather than distantly related chicken as the outgroup, concatenated and gene tree-based approaches supported rheas as the group that diverged first among groups (1)-(4). Whereas gene tree estimation error increased using loci with low sequence divergence and short length, topological bias in estimated trees occurred using loci with high sequence divergence and/or nucleotide composition bias and heterogeneity, which more occurred in trees estimated from coding loci than noncoding loci. Regarding the relationships of (1)-(4), the site patterns by parsimony criterion appeared less susceptible to the bias than tree construction assuming stationary time-homogeneous model and suggested the clustering of kiwi and cassowaries and emu the most likely with ∼40% support rather than the clustering of kiwi and rheas and that of kiwi and tinamous with 30% support each.
Project description:Compatibility between high-density oligonucleotide arrays is significantly affected by probe-level sequence information. With a careful filtering of the probes based on their sequence overlaps, data from different generations of microarrays can be combined more effectively. The dataset of 14 human muscle biopsy samples from patients with inflammatory myopathies that were hybridized on both HG-U95Av2 and HG-U133A human arrays for this purpose. Signal values from GCOS 1.2 with Detection call and p-value are provided here, and CEL files are also available for download. Keywords: parallel sample