Project description:Summary: Many genome-wide association studies and genome-wide screens for gene-environment (GxE) interactions have been performed to elucidate the underlying mechanisms of human traits and diseases. When the analyzed outcome is quantitative, the overall contribution of the identified genetic variants to the outcome is often expressed as the percentage of phenotypic variance explained. This is commonly done using individual-level genotype data, but it is challenging when results are derived through meta-analyses. Here, we present the R package 'VarExp', which allows the percentage of phenotypic variance explained to be estimated from summary statistics only. It allows a range of models to be evaluated, including marginal genetic effects, GxE interaction effects, and both effects jointly. Its implementation integrates all recent methodological developments and does not require users to upload external data. Availability and implementation: The R package is available at https://gitlab.pasteur.fr/statistical-genetics/VarExp.git. Supplementary information: Supplementary data are available at Bioinformatics online.
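As a rough illustration of the underlying quantity (not the VarExp API itself), the marginal variance explained by a set of approximately independent variants can be approximated from summary statistics alone, given per-allele effect estimates, coded-allele frequencies, and the phenotypic variance. The Python sketch below uses illustrative variable names and assumes Hardy-Weinberg equilibrium and no linkage disequilibrium between variants.

```python
import numpy as np

def variance_explained(beta, freq, var_y):
    """Approximate fraction of phenotypic variance explained by
    independent biallelic variants, using only summary statistics.

    beta  : per-allele effect estimates on the quantitative trait
    freq  : coded-allele frequencies
    var_y : phenotypic variance of the outcome
    """
    beta = np.asarray(beta, dtype=float)
    freq = np.asarray(freq, dtype=float)
    # Genotype variance of each variant under Hardy-Weinberg equilibrium.
    var_g = 2.0 * freq * (1.0 - freq)
    # Each variant contributes beta_k^2 * Var(G_k); summing assumes no LD.
    return np.sum(beta**2 * var_g) / var_y

# Toy example with three hypothetical variants.
print(variance_explained(beta=[0.12, -0.08, 0.05],
                         freq=[0.30, 0.45, 0.10],
                         var_y=1.0))
```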
Project description:Following the publication of the ENCODE project results, there has been increasing interest in investigating different areas of the chromosome and evaluating the relative contribution of each area to expressed phenotypes. This study aims to evaluate the contribution of variants, classified by minor allele frequency and gene annotation, to the observed interindividual differences. In this study, we fitted Bayesian linear regression models to data from Genetic Analysis Workshop 18 (n = 395) to estimate the variance of standardized and log-transformed systolic blood pressure that can be explained by subsets of genetic markers. Rare and very rare variants explained an overall higher proportion of the variance, as did markers located within a gene rather than flanking regions. The proportion of variance explained by rare and very rare variants decreased when we controlled for the number of markers, suggesting that the number of contributing rare alleles plays an important role in the genetic architecture of chronic disease traits. Our findings lend support to the "common disease, rare variant" hypothesis for systolic blood pressure and highlight allele frequency and functional annotation of a polymorphism as potentially crucial considerations in whole genome study designs.
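A minimal sketch of the general idea, not the workshop contribution's actual model: fit a Bayesian shrinkage regression separately to each marker subset and report the share of trait variance captured by the fitted genetic values. Here scikit-learn's BayesianRidge stands in for the Bayesian whole-genome regression, and all data are simulated placeholders.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Simulated stand-ins for the real data: n individuals, genotype matrices for
# two hypothetical marker subsets (e.g., "rare" vs "common"), and a trait.
n = 395
X_rare = rng.binomial(2, 0.02, size=(n, 200)).astype(float)
X_common = rng.binomial(2, 0.30, size=(n, 200)).astype(float)
y = 0.05 * X_rare[:, :20].sum(axis=1) + rng.normal(size=n)
y = (y - y.mean()) / y.std()           # standardized trait

def prop_var_explained(X, y):
    """In-sample (optimistic) proportion of trait variance captured by
    the fitted genetic values of a Bayesian ridge regression."""
    model = BayesianRidge().fit(X, y)
    g_hat = model.predict(X)
    return np.var(g_hat) / np.var(y)

for name, X in [("rare", X_rare), ("common", X_common)]:
    print(name, round(prop_var_explained(X, y), 3))
```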
Project description:There is a dearth of statistical models that adequately capture the total signal attributed to whole-brain imaging features. The total signal is often widely distributed across the brain, with individual imaging features exhibiting small effect sizes for predicting neurobehavioral phenotypes. The challenge of capturing the total signal is compounded by the distribution of neurobehavioral data, particularly responses to psychological questionnaires, which often feature zero-inflated, highly skewed outcomes. To close this gap, we have developed a novel Variational Bayes algorithm that characterizes the total signal captured by whole-brain imaging features for zero-inflated outcomes. Our zero-inflated variance (ZIV) estimator estimates the fraction of variance explained (FVE) and the proportion of non-null effects (PNN) from large-scale imaging data. In simulations, ZIV demonstrates superior performance over other linear models. When applied to data from the Adolescent Brain Cognitive Development (ABCD) Study, we found that whole-brain imaging features contribute to a larger FVE for externalizing behaviors compared to internalizing behaviors. Moreover, focusing on features contributing to the PNN, the ZIV estimator localized key neurocircuitry associated with neurobehavioral traits. To the best of our knowledge, the ZIV estimator is the first specialized method for analyzing zero-inflated neuroimaging data, enhancing future studies on brain-behavior relationships and improving the understanding of neurobehavioral disorders.
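The ZIV algorithm itself is not reproduced here, but the kind of outcome it targets can be sketched with a toy zero-inflated, right-skewed data-generating model in which many weak imaging effects drive both the chance of a zero score and the magnitude of positive scores. All parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Schematic of the kind of outcome ZIV targets: a zero-inflated, skewed
# questionnaire score driven weakly by many imaging features. This is a toy
# data-generating model, not the ZIV algorithm itself.
n, p = 2000, 500
X = rng.normal(size=(n, p))                       # standardized imaging features
nonnull = rng.random(p) < 0.10                    # proportion of non-null effects (PNN)
beta = np.where(nonnull, rng.normal(0, 0.05, p), 0.0)
signal = X @ beta

# Zero-inflation: a latent propensity decides whether the score is exactly zero;
# otherwise the score is positive and right-skewed.
p_zero = 1.0 / (1.0 + np.exp(signal))             # higher signal -> fewer zeros
is_zero = rng.random(n) < p_zero
y = np.where(is_zero, 0.0, np.exp(signal + rng.normal(0, 1, n)))

# Oracle fraction of variance explained (FVE) on the latent log scale, for reference.
fve_latent = signal.var() / (signal.var() + 1.0)
print(f"zeros: {is_zero.mean():.2f}, latent FVE: {fve_latent:.3f}, PNN: {nonnull.mean():.2f}")
```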
Project description:Understanding population dynamics requires reliable estimates of population density, yet this basic information is often surprisingly difficult to obtain. With rare or difficult-to-capture species, genetic surveys based on noninvasive collection of hair or scat have proved cost-efficient for estimating densities. Here, we explored whether noninvasive genetic sampling (NGS) also offers promise for sampling a relatively common species, the snowshoe hare (Lepus americanus Erxleben, 1777), in comparison with traditional live trapping. We optimized a protocol for single-session NGS sampling of hares. We compared spatial capture-recapture population estimates from live trapping to estimates derived from NGS, and assessed NGS costs. NGS provided population estimates similar to those derived from live trapping, but a higher density of sampling plots was required for NGS. The optimal NGS protocol for our study entailed deploying 160 sampling plots for 4 days and genotyping one pellet per plot. NGS laboratory costs ranged from approximately $670 to $3000 USD per field site. While live trapping does not incur laboratory costs, its field costs can be considerably higher than those of NGS, especially when study sites are difficult to access. We conclude that NGS can work for common species, but that it will require field and laboratory pilot testing to develop cost-effective sampling protocols.
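A back-of-the-envelope cost model can make the trade-off concrete; the per-sample figures below are hypothetical placeholders, not the study's actual price list, while the protocol parameters (160 plots, one pellet genotyped per plot) follow the text.

```python
# Back-of-the-envelope lab-cost model for a single-session NGS survey.
# Per-sample costs are assumed placeholders, not the study's actual figures.
n_plots = 160
pellets_per_plot = 1
cost_dna_extraction = 6.0       # USD per sample (assumed)
cost_genotyping = 10.0          # USD per sample per marker panel run (assumed)
retry_rate = 0.25               # fraction of samples re-run after failed genotyping (assumed)

samples = n_plots * pellets_per_plot
lab_cost = samples * (cost_dna_extraction + cost_genotyping * (1 + retry_rate))
print(f"{samples} samples, approx. lab cost: ${lab_cost:,.0f} per field site")
```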
Project description:We critically examined existing approaches for estimating the excess familial risk of cancer that can be attributed to identified common genetic risk variants and propose an alternative, more straightforward approach for calculating this proportion using well-established epidemiological methodology. We applied the underlying equations of the traditional approaches and the new epidemiological approach to colorectal cancer (CRC) in a large population-based case-control study in Germany with 4,447 cases and 3,480 controls, who were recruited from 2003 to 2016 and for whom interview, medical and genomic data were available. Having a family history of CRC (FH) was associated with a 1.77-fold risk increase in our study population (95% CI 1.52-2.07). Traditional approaches yielded estimates of the proportion of FH-associated risk explained by 97 common genetic variants ranging from 9.6% to 23.1%, depending on various assumptions. Our alternative approach resulted in smaller and more consistent estimates of this proportion, ranging from 5.4% to 14.3%. Commonly employed methods may lead to strongly divergent and possibly exaggerated estimates of the excess familial risk of cancer explained by associated known common genetic variants. Our results suggest that familial risk and risk associated with known common genetic variants might reflect two complementary major sources of risk.
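One widely cited traditional formula, shown below as a hedged sketch rather than the exact calculation used in this study, decomposes the log of the overall familial relative risk into per-variant contributions under a multiplicative model; the per-variant familial relative risks in the example are hypothetical.

```python
import numpy as np

def prop_familial_risk_explained(lambda_variants, lambda_fh):
    """Traditional log-scale decomposition: under a multiplicative model, the
    share of the familial relative risk attributable to known variants is
    sum(log lambda_k) / log(lambda_FH), where lambda_k is the familial relative
    risk conferred by variant k and lambda_FH the overall familial relative risk.
    """
    lambda_variants = np.asarray(lambda_variants, dtype=float)
    return np.sum(np.log(lambda_variants)) / np.log(lambda_fh)

# Toy example: five hypothetical variants with small per-variant familial RRs,
# compared against the FH-associated 1.77-fold risk increase reported above.
print(prop_familial_risk_explained([1.010, 1.006, 1.004, 1.008, 1.003], 1.77))
```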
Project description:We surveyed 26 quantitative traits and disease outcomes to understand the proportion of phenotypic variance explained by local ancestry in admixed African Americans. After inferring local ancestry as the number of African-ancestry chromosomes at hundreds of thousands of genotyped loci across all autosomes, we used a linear mixed effects model to estimate the variance explained by local ancestry in two large independent samples of unrelated African Americans. We found that local ancestry at major and polygenic effect genes can explain up to 20 and 8% of phenotypic variance, respectively. These findings provide evidence that most but not all additive genetic variance is explained by genetic markers undifferentiated by ancestry. These results also inform the proportion of health disparities due to genetic risk factors and the magnitude of error in association studies not controlling for local ancestry.
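A simplified sketch of the estimation idea, not necessarily the authors' exact mixed-model implementation: build a relationship matrix from standardized local-ancestry dosages and estimate the corresponding variance component, here via Haseman-Elston (method-of-moments) regression on simulated placeholder data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in: local ancestry coded as the number of African-ancestry
# chromosomes (0, 1, 2) at each locus for unrelated individuals.
n, m = 1000, 5000
p = rng.uniform(0.6, 0.9, m)                       # locus-specific ancestry frequencies
A = rng.binomial(2, p, size=(n, m)).astype(float)
A = (A - A.mean(0)) / A.std(0)                     # standardize each locus

# Ancestry relationship matrix, analogous to a GRM but built from local ancestry.
K = A @ A.T / m

# Simulate a trait where local ancestry explains ~10% of the variance.
u = A @ rng.normal(0, np.sqrt(0.10 / m), m)
y = u + rng.normal(0, np.sqrt(0.90), n)
y = (y - y.mean()) / y.std()

# Haseman-Elston regression: regress off-diagonal phenotype products on kinship.
iu = np.triu_indices(n, k=1)
prod = np.outer(y, y)[iu]
k_off = K[iu]
h2 = np.sum(k_off * prod) / np.sum(k_off**2)
print(f"estimated proportion of variance explained by local ancestry: {h2:.3f}")
```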
Project description:Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation, and technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on its magnitude. The size of technical variance and the role of sampling are examined in this manuscript. Results: Independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, estimates of the relative abundance of a transcript can disagree substantially, even when coverage levels are high. This may be due to the low sampling fraction; if so, it will persist as an issue to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with technical variability without dramatic cost increases.
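The role of random sampling can be illustrated with a toy simulation, not the paper's analysis, in which two technical replicates are drawn as multinomial samples of reads from the same transcript pool; detection and expression estimates then disagree most at low read depth.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model of technical replication as random sampling of reads: two aliquots
# of the same library are sequenced, so replicate counts are multinomial draws
# from identical underlying transcript proportions.
n_genes = 5000
abundance = rng.lognormal(mean=0.0, sigma=2.0, size=n_genes)
prop = abundance / abundance.sum()

for total_reads in (100_000, 1_000_000, 10_000_000):
    rep1 = rng.multinomial(total_reads, prop)
    rep2 = rng.multinomial(total_reads, prop)
    # Compare log2 expression estimates for genes detected in both replicates.
    both = (rep1 > 0) & (rep2 > 0)
    log_fc = np.log2(rep1[both] / rep2[both])
    detect_disagree = np.mean((rep1 > 0) != (rep2 > 0))
    print(f"{total_reads:>9} reads: detection disagreement {detect_disagree:.3f}, "
          f"SD of log2 ratio {log_fc.std():.2f}")
```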