Project description: Background: High-dimensional case-control studies are ubiquitous in the biological sciences, particularly genomics. To maximise power while constraining cost, and to minimise type-1 error rates, researchers typically seek to replicate findings in a second experiment on independent cohorts before proceeding with further analyses. This can be an expensive procedure, particularly when control samples are difficult to recruit or ascertain; for example, in inter-disease comparisons or studies of degenerative diseases. Results: This paper presents a method in which control (or case) samples from the discovery cohort are re-used in a replication study. The theoretical implications of this method are discussed, and simulated genome-wide association study (GWAS) tests are used to compare its performance against the standard approach in a range of circumstances. Using similar methods, a procedure is proposed for 'partial replication' using a new independent cohort consisting only of controls. This method can be used to provide some validation of findings when a full replication procedure is not possible. The new method has a different sensitivity to confounding in study cohorts than the standard procedure, which must be considered in its application. Type-1 error rates in these scenarios are analytically and empirically derived, and an online tool for comparing power and error rates is provided. Conclusions: In several common study designs, a shared-control method allows a substantial improvement in power while retaining control of the type-1 error rate. Although careful consideration must be given to all necessary assumptions, this method can enable more efficient use of data in GWAS and other applications.
Project description: Study cost remains the major limiting factor for genome-wide association studies due to the necessity of genotyping a large number of SNPs for a large number of subjects. Both DNA pooling strategies and two-stage designs have been proposed to reduce genotyping costs. In this study, we propose a cost-effective, two-stage approach with a DNA pooling strategy. During stage I, all markers are evaluated on a subset of individuals using DNA pooling. The most promising set of markers is then evaluated with individual genotyping for all individuals during stage II. The goal is to determine the optimal parameters (π_p^sample, the proportion of samples used during stage I with DNA pooling, and π_p^marker, the proportion of markers evaluated during stage II with individual genotyping) that minimize the cost of a two-stage DNA pooling design while maintaining a desired overall significance level and achieving a level of power similar to that of a one-stage individual genotyping design. We considered the effects of three factors on optimal two-stage DNA pooling designs. Our results suggest that, under most scenarios considered, the optimal two-stage DNA pooling design may be much more cost-effective than the optimal two-stage individual genotyping design, which uses individual genotyping during both stages.
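The cost trade-off this design exploits can be illustrated with a toy accounting model. The sketch below is not the paper's cost function: the pool size, unit costs, and stage-I bookkeeping are assumptions, and the power and significance constraints that the real optimisation of (π_p^sample, π_p^marker) must satisfy are omitted.

```python
import math

def two_stage_pooling_cost(n, m, pi_sample, pi_marker,
                           pool_size=50, c_ind=1.0, c_pool=1.0):
    """Hypothetical unit-cost sketch of the two-stage pooling design.
    Stage I: all m markers are assayed on pools built from
    pi_sample * n individuals (one pooled assay per pool per marker).
    Stage II: the top pi_marker * m markers are individually
    genotyped on all n individuals. Pool size and unit costs are
    illustrative assumptions, not values from the paper."""
    n_pools = math.ceil(pi_sample * n / pool_size)
    stage1 = n_pools * m * c_pool
    stage2 = n * pi_marker * m * c_ind
    return stage1 + stage2

def one_stage_cost(n, m, c_ind=1.0):
    """All n individuals genotyped at all m markers."""
    return n * m * c_ind
```

For example, with n = 1000 subjects, m = 100,000 markers, π_p^sample = 0.3 and π_p^marker = 0.01, the two-stage pooled cost under this toy model is a small fraction of the one-stage individual-genotyping cost, which is the qualitative effect the abstract reports.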
Project description: Finding a genetic marker associated with a trait is a classic problem in human genetics. Recently, two-stage approaches have gained popularity in marker-trait association studies, in part because researchers hope to reduce the multiple testing problem by testing fewer markers in the final stage. We compared a two-stage family-based approach to an analogous single-stage method, calculating the empirical type I error rates and power for both methods using fully simulated data sets modeled on nuclear families with rheumatoid arthritis, and data sets of real single-nucleotide polymorphism genotypes from Centre d'Etude du Polymorphisme Humain pedigrees with simulated traits. In these analyses, performed in the absence of population stratification, the single-stage method was consistently more powerful than the two-stage method for a given type I error rate. To explore the sources of this difference, we performed a case study comparing the individual steps of two-stage designs, the two-stage design itself, and the analogous one-stage design.
Project description: Studies using induced pluripotent stem cells (iPSCs) are gaining momentum in brain disorder modelling, but optimal study designs are poorly defined. Here, we compare commonly used designs and statistical analyses for different research aims. Furthermore, we generated immunocytochemical, electrophysiological, and proteomic data from iPSC-derived neurons of five healthy subjects, analysed the variation in these data, and conducted power simulations. These analyses show that published case-control iPSC studies are generally underpowered. Designs using isogenic iPSC lines typically have higher power than case-control designs, but the generalization of their conclusions is limited. We show that, for the realistic settings used in this study, a multiple isogenic pair design increases absolute power by up to 60%, or requires up to 5-fold fewer lines. A free web tool is presented to explore the power of different study designs, using any (pilot) data.
Project description: With its potential to discover a much greater amount of genetic variation, next-generation sequencing is fast becoming a key tool for genetic association studies. However, the cost of sequencing all individuals in a large-scale population study is still high in comparison to most alternative genotyping options. While the ability to identify individual-level data is lost (without bar-coding), sequencing pooled samples can substantially lower costs without compromising the power to detect significant associations. We propose a hierarchical Bayesian model that estimates the association of each variant using pools of cases and controls, accounting for the variation in read depth across pools and for sequencing error. To investigate the performance of our method across a range of numbers of pools, numbers of individuals within each pool, and average coverages, we undertook extensive simulations varying effect sizes, minor allele frequencies, and sequencing error rates. In general, the number of pools and the pool size have dramatic effects on power, while the total depth of coverage per pool has only a moderate impact. This information can guide the selection of a study design that maximizes power subject to cost, sample size, or other laboratory constraints. We provide an R package (hiPOD: hierarchical Pooled Optimal Design) to find the optimal design, allowing the user to specify a cost function, cost and sample size limitations, and distributions of effect size, minor allele frequency, and sequencing error rate.
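The reported sensitivity of power to the number of pools and pool size can be illustrated with a crude Monte Carlo proxy. The sketch below is not the paper's hierarchical Bayesian model or the hiPOD package: it simulates per-pool read fractions under binomial sampling with a symmetric sequencing-error rate and applies a simple two-sample z-test, with all parameter values chosen for illustration only.

```python
import math
import random

def simulate_pool_freqs(n_pools, pool_size, depth, p, err, rng):
    """Observed alternate-allele read fraction for each pool."""
    freqs = []
    for _ in range(n_pools):
        # allele count among the 2 * pool_size chromosomes in the pool
        ac = sum(rng.random() < p for _ in range(2 * pool_size))
        q = ac / (2 * pool_size)
        q_obs = q * (1 - err) + (1 - q) * err  # symmetric sequencing error
        reads = sum(rng.random() < q_obs for _ in range(depth))
        freqs.append(reads / depth)
    return freqs

def pooled_power(n_pools, pool_size, depth, p_case, p_control,
                 err=0.005, reps=300, seed=7):
    """Monte Carlo power of a two-sided, alpha = 0.05 two-sample
    z-test on per-pool allele frequencies (a frequentist stand-in
    for the paper's Bayesian model)."""
    rng = random.Random(seed)
    z_crit = 1.96
    hits = 0
    for _ in range(reps):
        a = simulate_pool_freqs(n_pools, pool_size, depth, p_case, err, rng)
        b = simulate_pool_freqs(n_pools, pool_size, depth, p_control, err, rng)
        ma, mb = sum(a) / n_pools, sum(b) / n_pools
        va = sum((x - ma) ** 2 for x in a) / (n_pools - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n_pools - 1)
        se = math.sqrt(va / n_pools + vb / n_pools)
        if se > 0 and abs(ma - mb) / se > z_crit:
            hits += 1
    return hits / reps
```

Varying `n_pools` and `pool_size` while holding `depth` fixed (and vice versa) reproduces the qualitative pattern described above: pool count and pool size move the power curve far more than per-pool coverage does.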
Project description: Background: Characterization of anti-malarial drug concentration profiles is necessary to optimize dosing, and thereby optimize cure rates and reduce both toxicity and the emergence of resistance. Population pharmacokinetic studies determine the drug concentration-time profiles in the target patient populations, including children, who have limited sampling options. Currently, population pharmacokinetic studies of anti-malarial drugs are designed based on logistical, financial and ethical constraints, and prior knowledge of the drug concentration-time profile. Although these factors are important, the proposed design may be unable to determine the desired pharmacokinetic profile because there is no formal consideration of the complex statistical models used to analyse the drug concentration data. Methods: Optimal design methods incorporate prior knowledge of the pharmacokinetic profile of the drug, the statistical methods used to analyse data from population pharmacokinetic studies, and the practical constraints of sampling the patient population. These methods determine the statistical efficiency of a design by evaluating the information content of candidate study designs before the pharmacokinetic study is conducted. Results: In a hypothetical population pharmacokinetic study of intravenous artesunate, in which the numbers of patients and blood samples to be assayed were constrained to 50 and 200 respectively, an evaluation of varying elementary designs using optimal design methods found that designs with more patients and fewer samples per patient improved the precision of the pharmacokinetic parameters and inter-patient variability, and the overall statistical efficiency, by at least 50%. Conclusion: Optimal design methods ensure that the proposed designs for population pharmacokinetic studies are robust and efficient. It is unethical to continue conducting population pharmacokinetic studies when the sampling schedule may be insufficient to estimate the pharmacokinetic profile precisely.
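The core idea of evaluating a candidate design's information content before the study runs can be sketched for a deliberately simplified case: a fixed-effects, single-subject, one-compartment IV bolus model with additive error. This omits the inter-patient variability that is central to the population analyses described above, and the parameter values are hypothetical; it only illustrates how a D-optimality criterion ranks candidate sampling schedules.

```python
import math

def fim_2x2(times, V, k, dose, sigma):
    """Fisher information for (V, k) of a one-compartment IV bolus
    model C(t) = (dose / V) * exp(-k * t) with additive error sigma,
    for a single subject sampled at the given times."""
    f_vv = f_vk = f_kk = 0.0
    for t in times:
        c = (dose / V) * math.exp(-k * t)
        d_dV = -c / V   # sensitivity of C(t) to V
        d_dk = -c * t   # sensitivity of C(t) to k
        f_vv += d_dV * d_dV
        f_vk += d_dV * d_dk
        f_kk += d_dk * d_dk
    s2 = sigma * sigma
    return f_vv / s2, f_vk / s2, f_kk / s2

def d_criterion(times, V=10.0, k=0.5, dose=100.0, sigma=0.5):
    """Determinant of the Fisher information matrix: larger values
    mean more precise joint estimation of (V, k)."""
    a, b, c = fim_2x2(times, V, k, dose, sigma)
    return a * c - b * b
```

With these (assumed) parameters, a schedule of early, spread-out samples yields a far larger D-criterion than the same number of late, clustered samples, which is the kind of comparison an optimal design evaluation makes before any patient is sampled.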
Project description: Background: Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. To choose the proper sample size and genotyping platform for such studies, power calculations are required that take into account the genetic model, tag SNP selection, and the population of interest. Results: The power of genome-wide association studies can be computed using a set of tag SNPs and a large number of genotyped SNPs in a representative population, such as that available through the HapMap project. As expected, power increases with increasing sample size and effect size. Power also depends on the tag SNPs selected. In some cases, more power is obtained by genotyping more individuals at fewer SNPs than fewer individuals at more SNPs. Conclusion: Genome-wide association studies should be designed thoughtfully, with the choice of genotyping platform and sample size determined from careful power calculations.
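The simplest building block of such a power calculation can be sketched for a single SNP under an allelic test. This is a generic normal-approximation calculation, not the tag-SNP-aware method described above (which additionally accounts for linkage disequilibrium with HapMap SNPs); the allele frequencies in the usage below are arbitrary examples.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(q):
    """Inverse standard normal CDF by bisection (accurate enough
    for a sketch; a stats library would normally be used)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allelic_test_power(p_case, p_control, n_per_group, alpha=5e-8):
    """Two-sided power of a per-SNP allelic test comparing allele
    frequencies between n cases and n controls (2n alleles each),
    using a normal approximation; alpha defaults to a conventional
    genome-wide significance threshold."""
    se = math.sqrt(p_case * (1 - p_case) / (2 * n_per_group)
                   + p_control * (1 - p_control) / (2 * n_per_group))
    z_crit = norm_ppf(1.0 - alpha / 2.0)
    ncp = abs(p_case - p_control) / se
    return norm_cdf(ncp - z_crit) + norm_cdf(-ncp - z_crit)
```

As the abstract states, power under this calculation increases monotonically with both the sample size and the case-control allele-frequency difference (the effect size).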
Project description: Genetic interaction is a crucial issue in understanding the functional pathways underlying complex diseases. However, detecting such interaction effects is challenging in terms of both methodology and statistical power. We address this issue by introducing a disease-concordant twin-case-only design, which applies to both monozygotic and dizygotic twins. To investigate the power, we conducted a computer simulation study using a series of parameter schemes with different minor allele frequencies and relative risks. Results from the simulation study reveal that the disease-concordant twin-case-only design substantially reduces the sample size required for sufficient power compared to the ordinary case-only design for detecting gene-gene interaction using unrelated individuals: sample sizes for dizygotic and monozygotic twins were roughly 1/2 and 1/4, respectively, of those in the ordinary case-only design. Since dizygotic twins are genetically as similar as ordinary siblings, the enrichment in power for dizygotic twins also applies to affected sibling pairs, which could greatly extend the application of the powerful twin-case-only design. In summary, our simulation reveals the high value of disease-concordant twins and siblings in efficiently detecting gene-by-gene interactions.
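The rough factors reported above translate into a back-of-envelope sample-size rule, sketched below. The exact reduction depends on the minor allele frequencies and relative risks in the simulation grid, so the fixed factors here are only the approximate ratios stated in the abstract.

```python
import math

def twin_case_only_sample_size(n_unrelated_case_only, zygosity):
    """Approximate sample size for the disease-concordant
    twin-case-only design, scaled from the ordinary (unrelated)
    case-only design using the rough factors reported above:
    about 1/2 for dizygotic (DZ) and 1/4 for monozygotic (MZ)
    pairs. A crude rule of thumb, not an exact calculation."""
    factor = {"MZ": 0.25, "DZ": 0.5}[zygosity]
    return math.ceil(n_unrelated_case_only * factor)
```

For instance, a setting that would need 2000 unrelated cases under the ordinary case-only design would need on the order of 1000 disease-concordant DZ pairs or 500 MZ pairs under this rule; the DZ figure also serves as a rough guide for affected sibling pairs.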