Project description: In genomic-scale data sets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df') compared to the nominal degrees of freedom (df). This issue has been known for some time, but its consequences have not been systematically quantified across the entire genome. Here, we measured pseudoreplication (quantified by the ratio df'/df) for a common metric of genetic differentiation (FST) and a common measure of linkage disequilibrium between pairs of loci (r2). Based on data simulated with SLiM and msprime, which allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees, we estimated df' and df'/df by measuring the rate of decline in the variance of mean FST and mean r2 as more loci were used. For both indices, df' increases with Ne and genome size, as expected. However, even for large Ne and large genomes, df' for mean r2 plateaus after a few thousand loci, and a variance-components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for FST, but df'/df ≤ 0.01 can occur in data sets using tens of thousands of loci. Commonly used block-jackknife methods consistently overestimated var(FST), producing very conservative confidence intervals. Predicting df' from our modelling results as a function of Ne, L, S, and genome size provides a robust way to quantify the precision of genomic-scale data sets.
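The plateau in df' has a simple toy illustration: if loci shared a constant pairwise correlation ρ, then var(mean of L loci) = σ²(1 + (L−1)ρ)/L, so df' = L / (1 + (L−1)ρ), which can never exceed 1/ρ no matter how many loci are added. A minimal numpy sketch of this (equicorrelation and ρ = 0.01 are illustrative assumptions, not the article's pedigree-based simulations):

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_df(L, rho, n_reps=100_000):
    """Empirical df' for the mean of L equicorrelated, unit-variance loci.

    df' is defined so that var(mean) = sigma^2 / df'; with sigma^2 = 1,
    df' = 1 / var(mean).  Equicorrelated loci can be written as
    x_i = sqrt(rho)*z + sqrt(1-rho)*e_i with a shared factor z, so the
    mean over L loci can be simulated directly from z and mean(e_i).
    """
    z = rng.standard_normal(n_reps)                   # shared factor
    ebar = rng.standard_normal(n_reps) / np.sqrt(L)   # mean of L independent e_i
    locus_mean = np.sqrt(rho) * z + np.sqrt(1 - rho) * ebar
    return 1.0 / locus_mean.var()

# df' plateaus near 1/rho = 100 no matter how many loci are added
print(effective_df(100, 0.01))     # ~50: df'/df is already 0.5
print(effective_df(10_000, 0.01))  # ~100: df'/df is now ~0.01
```

Going from 100 to 10,000 loci multiplies the nominal df by 100 but barely doubles df', which is the qualitative pattern the study quantifies for real genomic correlation structure.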
Project description: This article reviews how to analyze data from experiments designed to compare the cellular physiology of two or more groups of animals or people. This is commonly done by measuring several cells from each animal and using simple t tests or ANOVA to compare the groups. I use simulations to illustrate that this approach can produce spurious positive results because it treats the cells from each animal as if they were independent of each other. This problem, which may be responsible for much of the lack of reproducibility in the literature, can easily be avoided by using a hierarchical, nested statistical approach.
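The simulation the article describes can be sketched in a few lines: generate data with no true group difference, then compare a naive t test on all cells against a test on per-animal means. All parameter values below (5 animals per group, 20 cells per animal, equal animal and cell variances) are invented for illustration, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

def false_positive_rates(n_sims=2000, n_animals=5, n_cells=20,
                         sigma_animal=1.0, sigma_cell=1.0):
    """Simulate the null (no real group difference) and compare two tests:
    naive  -- pool all cells and treat them as independent observations
    nested -- average cells within each animal first (animal = replicate)
    Returns the fraction of simulations with |t| > 1.96 for each analysis.
    """
    naive_hits = nested_hits = 0
    for _ in range(n_sims):
        # two groups of animals drawn from identical distributions
        animal = rng.normal(0.0, sigma_animal, (2, n_animals))
        cells = animal[:, :, None] + rng.normal(0.0, sigma_cell,
                                                (2, n_animals, n_cells))
        # naive: every cell counted as an independent observation
        g1, g2 = cells[0].ravel(), cells[1].ravel()
        t = (g1.mean() - g2.mean()) / np.sqrt(
            g1.var(ddof=1) / g1.size + g2.var(ddof=1) / g2.size)
        naive_hits += abs(t) > 1.96
        # nested: one mean per animal, the genuine replicate
        m1, m2 = cells[0].mean(axis=1), cells[1].mean(axis=1)
        t = (m1.mean() - m2.mean()) / np.sqrt(
            m1.var(ddof=1) / n_animals + m2.var(ddof=1) / n_animals)
        nested_hits += abs(t) > 1.96
    return naive_hits / n_sims, nested_hits / n_sims

naive, nested = false_positive_rates()
print(naive, nested)  # naive is far above the nominal 0.05; nested is close
```

With these settings the naive analysis rejects the (true) null in a large fraction of simulations, while the animal-level analysis stays near the nominal 5%.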
Project description: Pseudoreplication occurs when the number of measured values or data points exceeds the number of genuine replicates, and the statistical analysis treats all data points as independent, so that each contributes fully to the result. By artificially inflating the sample size, pseudoreplication contributes to irreproducibility, and it is a pervasive problem in biological research. In some fields, more than half of published experiments contain pseudoreplication, making it one of the biggest threats to inferential validity. Researchers may be reluctant to use appropriate statistical methods when their hypothesis concerns the pseudoreplicates rather than the genuine replicates; for example, when an intervention is applied to pregnant female rodents (genuine replicates) but the hypothesis is about the effect on their multiple offspring (pseudoreplicates). We propose a Bayesian predictive approach, which enables researchers to make valid inferences about the biological entities of interest, even if they are pseudoreplicates, and we show the benefits of this approach using two in vivo data sets.
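The pregnant-rodent example can be made concrete with a toy variance-components sketch: a predictive statement about a new offspring from a new dam must carry both the between-dam and the within-litter variance, which a naive pooled standard error ignores. This is a method-of-moments caricature of the idea, not the article's full Bayesian predictive model; all numbers (8 dams, 6 pups, both variance components) are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy litter data: dams are the genuine replicates, pups the pseudoreplicates
n_dams, n_pups = 8, 6
dam_means = rng.normal(10.0, 2.0, n_dams)                    # between-dam sd = 2
pups = dam_means[:, None] + rng.normal(0.0, 1.0, (n_dams, n_pups))

# Naive SE treats every pup as an independent replicate
naive_se = pups.std(ddof=1) / np.sqrt(pups.size)

# A prediction for a *new pup from a new dam* must include both variance
# components (method-of-moments estimates; the article uses a Bayesian model)
within = pups.var(axis=1, ddof=1).mean()
between = pups.mean(axis=1).var(ddof=1) - within / n_pups
predictive_sd = np.sqrt(max(between, 0.0) + within)
print(naive_se, predictive_sd)
```

The predictive standard deviation is much larger than the naive standard error, because the relevant uncertainty for an offspring-level claim is dominated by variation between dams, not by the number of pups measured.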
Project description: Despite expanding data sets and advances in phylogenomic methods, deep-level metazoan relationships remain highly controversial. Recent phylogenomic analyses depart from classical concepts in recovering ctenophores as the earliest branching metazoan taxon and propose a sister-group relationship between sponges and cnidarians (e.g., Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745-749). Here, we argue that these results are artifacts stemming from insufficient taxon sampling and long-branch attraction (LBA). By increasing taxon sampling from previously unsampled nonbilaterians and using an identical gene set to that reported by Dunn et al., we recover monophyletic Porifera as the sister group to all other Metazoa. This suggests that the basal position of the fast-evolving Ctenophora proposed by Dunn et al. was due to LBA and that broad taxon sampling is of fundamental importance to metazoan phylogenomic analyses. Additionally, saturation in the Dunn et al. character set is comparatively high, possibly contributing to the poor support for some nonbilaterian nodes.
Project description: Quantifying fluxes of nitrous oxide (N2O), a potent greenhouse gas, from soils is necessary to improve our knowledge of terrestrial N2O losses. Developing universal sampling frequencies for calculating annual N2O fluxes is difficult, as fluxes are renowned for their high temporal variability. We demonstrate that daily sampling was generally required to achieve annual N2O flux estimates within 10% of the 'best' estimate for 28 annual datasets collected from three continents: Australia, Europe and Asia. Decreasing the regularity of measurements either under- or overestimated annual N2O fluxes, with a maximum overestimation of 935%. Measurement frequency could be lowered using a sampling strategy based on environmental factors known to affect temporal variability, but sampling more than once a week was still required. Consequently, the uncertainty in current global terrestrial N2O budgets associated with upscaling field-based datasets can be decreased significantly by using adequate sampling frequencies.
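The sensitivity to sampling frequency is easy to reproduce on synthetic data: because N2O emissions are dominated by short-lived pulses, sparse sampling combined with interpolation can badly miss (or badly extrapolate) those pulses. The series below is invented (gamma baseline plus eight random pulses), as is the linear-interpolation gap-filling; neither is taken from the article's datasets:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic daily N2O flux series: low baseline plus a few large emission
# pulses (e.g. after rain or fertilisation). All values are illustrative.
days = 365
flux = rng.gamma(shape=0.5, scale=2.0, size=days)      # baseline
pulse_days = rng.choice(days, size=8, replace=False)
flux[pulse_days] += rng.gamma(5.0, 20.0, size=8)       # episodic pulses

def annual_flux(series, every):
    """Annual sum estimated by sampling every `every` days and linearly
    interpolating between sampling dates (a common gap-filling approach)."""
    idx = np.arange(0, len(series), every)
    return np.interp(np.arange(len(series)), idx, series[idx]).sum()

best = flux.sum()                      # 'best' estimate: daily sampling
for every in (7, 28):
    err = 100 * (annual_flux(flux, every) - best) / best
    print(f"every {every} days: {err:+.1f}% error")
```

Whether sparse sampling over- or underestimates the annual total depends entirely on whether a sampling date happens to land on a pulse, which is the instability the study documents across 28 real datasets.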
Project description: Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; they are therefore subsamples, or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type I error rates, and reduced robustness and reproducibility. This includes methods that use a batch-effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
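The pseudo-bulk alternative mentioned above is simple to sketch: cells are collapsed to one aggregate value per individual before any between-group test, so the test operates on the genuine units of independence (at the cost of power relative to a mixed model). A minimal numpy sketch with invented dimensions; a real analysis would feed the per-individual aggregates to a bulk differential-expression test:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy single-cell counts for one gene: 6 individuals (3 per group), each
# with its own baseline expression, and 200 cells per individual. The
# individual-level baselines induce the within-individual correlation.
# All values are illustrative.
n_per_group, n_cells = 3, 200
baseline = rng.lognormal(mean=1.0, sigma=0.5, size=2 * n_per_group)
counts = rng.poisson(baseline[:, None], size=(2 * n_per_group, n_cells))
group = np.repeat([0, 1], n_per_group)

# Pseudo-bulk: collapse cells to one value per individual, so any
# downstream test compares independent units rather than 1,200
# correlated cells
pseudobulk = counts.sum(axis=1)
print(counts.shape, "->", pseudobulk.shape)   # (6, 200) -> (6,)
```

The article's point is that this aggregation is valid but conservative; a generalized linear mixed model with a per-individual random effect uses the cell-level data while still respecting the hierarchy.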
Project description: Background: Pseudoreplication occurs when observations are not statistically independent but are treated as if they were. This can occur when there are multiple observations on the same subjects, when samples are nested or hierarchically organised, or when measurements are correlated in time or space. Analysing such data without taking these dependencies into account can lead to meaningless results, and examples can easily be found in the neuroscience literature. Results: A single issue of Nature Neuroscience provided a number of examples and is used as a case study to highlight how pseudoreplication arises in neuroscientific studies and why the analyses in these papers are incorrect; appropriate analytical methods are also provided. 12% of papers had pseudoreplication, and a further 36% were suspected of having pseudoreplication, but it was not possible to determine this for certain because insufficient information was provided. Conclusions: Pseudoreplication can undermine the conclusions of a statistical analysis, and it would be easier to detect if the sample size, degrees of freedom, test statistic, and precise p-values were reported. This information should be a requirement for all publications.
Project description: Understanding changes in biodiversity requires monitoring programs that encompass different dimensions of biodiversity through varying sampling techniques. In this work, fish assemblages associated with the "outer" and "inner" sides of four marinas, two in the Canary Islands and two in southern Portugal, were investigated using three complementary sampling techniques: underwater visual censuses (UVCs), baited cameras (BCs), and fish traps (FTs). We first investigated the complementarity of these sampling methods for describing species composition. We then investigated differences in taxonomic (TD), phylogenetic (PD), and functional (FD) diversity between sides of the marinas according to each sampling method. Finally, we explored the applicability and reproducibility of each sampling technique for characterizing fish assemblages according to these diversity metrics. UVCs and BCs provided complementary information on the number and abundances of species, while FTs sampled a distinct assemblage. Patterns of TD, PD, and FD between sides of the marinas varied depending on the sampling method. UVC was the most cost-efficient technique in terms of personnel hours and is recommended for local studies. For large-scale studies, however, BCs are recommended, as they cover greater spatio-temporal scales at a lower cost. Our study highlights the need to implement complementary sampling techniques to monitor ecological change across various dimensions of biodiversity. The results presented here will be useful for optimizing future monitoring programs.
Project description: The metabolic profiling of tissue biopsies using high-resolution magic angle spinning (HR-MAS) 1H nuclear magnetic resonance (NMR) spectroscopy may be influenced by experimental factors such as the sampling method. We therefore compared the effects of two different sampling methods on the metabolome of brain tissue obtained from the brainstem and thalamus of healthy goats by 1H HR-MAS NMR spectroscopy: biopsies harvested in vivo by a minimally invasive stereotactic approach versus samples harvested postmortem by dissection with a scalpel. Lactate and creatine were elevated, and choline-containing compounds were altered, in the postmortem compared with the in vivo-harvested samples, demonstrating rapid changes most likely due to sample ischemia. In addition, acetate and inositols in the brainstem samples, and γ-aminobutyric acid in the thalamus samples, were relatively increased postmortem, demonstrating regional differences in tissue degradation. In conclusion, brain biopsies harvested in vivo show different metabolic alterations than postmortem-harvested samples, reflecting less tissue degradation. Sampling method and brain region should be taken into account in the analysis of metabolic profiles. To be as close as possible to the actual situation in the living individual, it is desirable to use brain samples obtained by stereotactic biopsy whenever possible.
Project description: BACKGROUND: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module ∂a∂i, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models. RESULTS: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000–50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. CONCLUSIONS: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus a larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.
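The AFS itself is a simple object to construct: given a 0/1 genotype matrix over haploid samples, entry k counts the SNPs whose derived allele appears on exactly k sampled chromosomes. A minimal numpy sketch (the toy data draw per-SNP frequencies at random purely for illustration; it is not a coalescent simulation of the kind ∂a∂i models):

```python
import numpy as np

rng = np.random.default_rng(5)

def derived_afs(genotypes):
    """Unfolded AFS from a (loci x haploid samples) 0/1 matrix, where 1
    marks the derived allele.  Entry k counts the SNPs whose derived
    allele appears on exactly k of the n sampled chromosomes."""
    n = genotypes.shape[1]
    return np.bincount(genotypes.sum(axis=1), minlength=n + 1)

# Toy data: 1,000 SNPs typed on 20 chromosomes, with a random
# derived-allele frequency per SNP (illustrative only)
freqs = rng.uniform(0.05, 0.5, size=(1000, 1))
genotypes = (rng.random((1000, 20)) < freqs).astype(int)
afs = derived_afs(genotypes)
print(afs.shape)   # (21,): entries for 0..20 copies of the derived allele
```

The sample-size effect discussed in the abstract enters here directly: more sampled chromosomes mean more AFS entries, and thus finer resolution on recent demographic events.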