Dataset Information

Multi-sample pooling and Illumina Genome Analyzer sequencing methods to determine gene sequence variation for database development.

ABSTRACT: Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina Genome Analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13-16. Ten samples with 13 known unique variants ("singleton variants" within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36-base reads. For comparison, a single "control" sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3' read-end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10-sample pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17% ± 0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30-, 40-, and 50-sample pool data, expected singleton allele frequencies within pools, and variant detection methods, ≥30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing.
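
The "expected 5% allele frequency" quoted in the abstract follows from simple arithmetic: one heterozygous carrier among 10 diploid samples contributes 1 of 20 pooled alleles. The short Python sketch below illustrates that calculation for the pool sizes mentioned in the abstract. It is not the authors' code. The function name and the assumptions (diploid samples, a heterozygous singleton, perfectly equimolar pooling, unbiased amplification and sequencing) are ours, chosen only for illustration.

```python
def expected_singleton_frequency(n_samples: int, ploidy: int = 2) -> float:
    """Expected fraction of variant reads for a variant carried on one
    allele of exactly one sample in an equimolar pool.

    Assumptions (ours, not stated as code in the source): every sample
    contributes the same number of alleles, and amplification and
    sequencing introduce no allele bias.
    """
    if n_samples < 1 or ploidy < 1:
        raise ValueError("n_samples and ploidy must be positive integers")
    return 1.0 / (ploidy * n_samples)


if __name__ == "__main__":
    # Pool sizes taken from the abstract: the sequenced 10-sample pool
    # and the simulated 30-, 40-, and 50-sample pools.
    for n in (10, 30, 40, 50):
        freq = expected_singleton_frequency(n)
        print(f"{n:>2} samples: expected singleton frequency {freq:.2%}")
```

Under these assumptions the expected frequency falls from 5% for 10 samples to 1% for 50 samples, which is why the background error rate becomes the limiting factor as pool size grows.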

SUBMITTER: Margraf RL

PROVIDER: S-EPMC2922832 | biostudies-literature | 2010 Sep

REPOSITORIES: biostudies-literature

Publications

Multi-sample pooling and Illumina Genome Analyzer sequencing methods to determine gene sequence variation for database development.

Rebecca L. Margraf, Jacob D. Durtschi, Shale Dames, David C. Pattison, Jack E. Stephens, Rong Mao, Karl V. Voelkerding

Journal of Biomolecular Techniques: JBT, 2010-09-01, issue 3

PMID: 20808642

Similar Datasets

Project description: In this study, 3 pooling experiments were performed. In each of the 3 cohorts, a 'case' and a 'control' blood pool were compared, the goal being to identify single nucleotide polymorphisms with significantly different estimated pooling allele frequencies between cases and controls. For cohort 1, 100 individuals with blue eye color were placed in one pool (the 'control' pool) and 100 individuals with brown eye color were placed in another pool (the 'case' pool). In cohort 2, 131 individuals with age-related macular degeneration were placed in one pool, with 216 control individuals in another pool. In cohort 3, 100 individuals with pseudoexfoliation syndrome were placed in a case pool; in this case the cohort 2 control sample was used as 'controls'. The blue/brown pools were hybridized to Illumina HumanHap550 arrays. The cohort 2 and 3 pools were hybridized to Illumina 1M arrays. After processing, the raw data are summarized to give pooling allele frequency estimates for each pool. The abstract from the paper describing these data is as follows: Genome-wide association studies (GWAS) have now successfully identified important genetic variants associated with many human traits and diseases. The high cost of genotyping arrays in large datasets remains the major barrier to wider utilization of GWAS. We have developed a novel method in which whole blood from cases and controls, respectively, is pooled prior to DNA extraction for genotyping. We demonstrate proof of principle by clearly identifying the associated variants for eye color, age-related macular degeneration and pseudoexfoliation syndrome in cohorts not previously studied. Blood pooling has the potential to reduce GWAS cost by several orders of magnitude and dramatically shorten gene discovery time. This method has profound implications for translation of modern genetic approaches to a multitude of diseases and traits yet to be analysed by GWAS, and will enable developing nations to participate in GWAS.
