ABSTRACT: Confirmatory and associated toxicity counter assays from the Tox21 project were filtered by compound purity and aggregated by CID. Data are presented by assay pairings and include relevant dataset annotations.
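The filter-then-aggregate step described above can be sketched as follows. This is a minimal illustration only, not the Tox21 pipeline: the record fields (`cid`, `purity`, `active`), the numeric purity score, the 0.9 threshold, and the "fraction of active calls" summary are all assumptions introduced for the example.

```python
from collections import defaultdict

def aggregate_by_cid(records, min_purity=0.9):
    """Drop records below a purity threshold, then summarize per CID.

    `records` is a list of dicts with hypothetical keys:
    'cid' (compound ID), 'purity' (0..1 score), 'active' (0 or 1 call).
    """
    grouped = defaultdict(list)
    for rec in records:
        # Purity filter (threshold is an assumed value, not from the source).
        if rec["purity"] >= min_purity:
            grouped[rec["cid"]].append(rec["active"])
    # One row per compound: fraction of retained assay calls that were active.
    return {cid: sum(calls) / len(calls) for cid, calls in grouped.items()}

records = [
    {"cid": 101, "purity": 0.95, "active": 1},
    {"cid": 101, "purity": 0.99, "active": 0},
    {"cid": 101, "purity": 0.50, "active": 1},  # removed by the purity filter
    {"cid": 202, "purity": 0.92, "active": 1},
]
summary = aggregate_by_cid(records)
```

Here compound 101 keeps two of its three records and compound 202 keeps its single record.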
Project description:Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, greatly reducing the need for user-defined aggregation in preprocessing while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbiome researchers gain insights into the structure and functioning of the underlying ecosystem of interest.
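For contrast, the fixed-rank aggregation that the description above attributes to traditional workflows can be sketched in a few lines. This is not the trac method itself, which learns aggregation levels from data; the function name, the lineage tuples, and the example taxa are hypothetical and serve only to show what "grouping by a fixed taxonomic rank" means for sparse counts.

```python
from collections import defaultdict

def aggregate_to_rank(counts, lineages, level):
    """Sum variant-level counts up to a fixed level of the taxonomic tree.

    counts:   dict mapping variant ID -> read count
    lineages: dict mapping variant ID -> tuple of taxon names, root-most first
    level:    index into the lineage tuple (0 = coarsest rank in the tuple)
    """
    totals = defaultdict(int)
    for variant, count in counts.items():
        # The lineage prefix identifies the ancestor node at the chosen rank.
        group = lineages[variant][: level + 1]
        totals[group] += count
    return dict(totals)

# Hypothetical sparse counts for three sequence variants.
counts = {"asv1": 3, "asv2": 0, "asv3": 5}
lineages = {
    "asv1": ("Firmicutes", "Blautia"),
    "asv2": ("Firmicutes", "Blautia"),
    "asv3": ("Firmicutes", "Roseburia"),
}
by_phylum = aggregate_to_rank(counts, lineages, level=0)
by_genus = aggregate_to_rank(counts, lineages, level=1)
```

The choice of `level` is exactly the user-defined preprocessing decision that a data-adaptive approach is meant to reduce.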
Project description:This is a dataset generated by the Drosophila Regulatory Elements modENCODE Project led by Kevin P. White at the University of Chicago. It contains ChIP-chip data on Affymetrix Drosophila Tiling 2.0R arrays for multiple transcription factor antibodies at various time-points of Drosophila development. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Project description:Collecting complete network data is expensive, time-consuming, and often infeasible. Aggregated Relational Data (ARD), which ask respondents questions of the form "How many people with trait X do you know?" provide a low-cost option when collecting complete network data is not possible. Rather than asking about connections between each pair of individuals directly, ARD collect the number of contacts the respondent knows with a given trait. Despite widespread use and a growing literature on ARD methodology, there is still no systematic understanding of when and why ARD should accurately recover features of the unobserved network. This paper provides such a characterization by deriving conditions under which statistics about the unobserved network (or functions of these statistics like regression coefficients) can be consistently estimated using ARD. We first provide consistent estimates of network model parameters for three commonly used probabilistic models: the beta-model with node-specific unobserved effects, the stochastic block model with unobserved community structure, and latent geometric space models with unobserved latent locations. A key observation is that cross-group link probabilities for a collection of (possibly unobserved) groups identify the model parameters, meaning ARD are sufficient for parameter estimation. With these estimated parameters, it is possible to simulate graphs from the fitted distribution and analyze the distribution of network statistics. We can then characterize conditions under which the simulated networks based on ARD will allow for consistent estimation of the unobserved network statistics, such as eigenvector centrality, or response functions by or of the unobserved network, such as regression coefficients.
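To make the ARD question format concrete, the sketch below computes the answers each respondent would give if the full network were known. It only illustrates the data structure, not the estimation procedure of the paper; the toy graph, the trait labels, and the function name are hypothetical.

```python
def ard_responses(adjacency, traits):
    """Answer "How many people with trait X do you know?" for every node.

    adjacency: dict mapping node -> set of neighbouring nodes
    traits:    dict mapping node -> set of traits that node has
    Returns a dict mapping node -> {trait: number of neighbours with trait}.
    """
    all_traits = sorted({t for ts in traits.values() for t in ts})
    return {
        node: {t: sum(1 for nb in nbrs if t in traits[nb]) for t in all_traits}
        for node, nbrs in adjacency.items()
    }

# Toy undirected network with four people (hypothetical data).
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
traits = {
    "a": {"teacher"},
    "b": {"farmer"},
    "c": {"teacher", "farmer"},
    "d": set(),
}
ard = ard_responses(adjacency, traits)
```

Each respondent reports only trait counts, so the individual links themselves never appear in the collected data.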
Project description:Methylation profiles of paired normal adjacent mucosa and tumor samples from 96 individuals, and of 48 healthy colon mucosae, were obtained with the Illumina Infinium Human Methylation 450K BeadChip. This dataset is part of the COLONOMICS project; additional information is available on the project website, www.colonomics.org.
Project description:Gene expression profiles of paired normal adjacent mucosa and tumor samples from 98 individuals, and of 50 healthy colon mucosae, were obtained with Affymetrix Human Genome U219 Arrays. This dataset is part of the COLONOMICS project; additional information is available on the project website, www.colonomics.org.
Project description:Social network data are often prohibitively expensive to collect, limiting empirical network research. We propose an inexpensive and feasible strategy for network elicitation using Aggregated Relational Data (ARD): responses to questions of the form "how many of your links have trait k?" Our method uses ARD to recover parameters of a network formation model, which permits sampling from a distribution over node- or graph-level statistics. We replicate the results of two field experiments that used network data and draw similar conclusions with ARD alone.
Project description:Jury deliberations provide a quintessential example of collective decision-making, but few studies have probed the available data to explore how juries reach verdicts. We examine how features of jury dynamics can be better understood from the joint distribution of final votes and deliberation time. To do this, we fit several different decision-making models to jury datasets from different places and times. In our best-fit model, jurors influence each other and have an increasing tendency to stick to their opinion of the defendant's guilt or innocence. We also show that this model can explain spikes in mean deliberation times when juries are hung, sub-linear scaling between mean deliberation times and trial duration, and unexpected final vote and deliberation time distributions. Our findings suggest that both stubbornness and herding play an important role in collective decision-making, providing a nuanced insight into how juries reach verdicts, and more generally, how group decisions emerge.
Project description:This is a dataset generated by the Drosophila Regulatory Elements modENCODE Project led by Kevin P. White at the University of Chicago. It contains a genome-wide binding profile of the factor H3K27me3 from D.pse_WPP, generated by ChIP and analyzed on the Illumina Genome Analyzer. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Project description:This is a dataset generated by the Drosophila Regulatory Elements modENCODE Project led by Kevin P. White at the University of Chicago. It contains a genome-wide binding profile of the factor Ttk from Kc167, generated by ChIP and analyzed on the Illumina Genome Analyzer. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Project description:This is a dataset generated by the Drosophila Regulatory Elements modENCODE Project led by Kevin P. White at the University of Chicago. It contains a genome-wide binding profile of the factor CNC from WPP, generated by ChIP and analyzed on the Illumina Genome Analyzer. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf