Dataset Information

Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

ABSTRACT: Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing errors, alignment errors, and missing data, all of which introduce noise and uncertainty into variant discovery and genotype calling and make meaningful analysis of the data difficult. Our primary interest is how to accurately infer, or impute, missing genotypes in HTS-derived datasets. Many existing genotype imputation algorithms and software packages were developed by, and optimized for, the human genetics community, a field in which a complete and accurate reference genome has been constructed and SNP arrays have largely been the standard genotyping platform. We set out to answer two questions: 1) can imputation methods developed by the human genetics community be used to impute missing genotypes in datasets from non-human species, and 2) are these methods, which were developed and optimized to impute ascertained variants, suitable for imputing missing genotypes at HTS-derived variants? We selected Beagle v.4, an algorithm widely used in the human genetics community with reportedly high accuracy, as our imputation contender. We performed a series of cross-validation experiments using GBS data collected from Manihot esculenta (cassava) by the Next Generation (NEXTGEN) Cassava Breeding Project. Because NEXTGEN currently imputes missing genotypes in its datasets using a LASSO-penalized linear regression method (denoted 'glmnet'), we selected glmnet as the benchmark imputation method.
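The glmnet benchmark is described here only as LASSO-penalized linear regression. As a rough illustration of that general idea (not NEXTGEN's actual pipeline), the sketch below predicts the dosage at one target marker from all other markers, fitting an L1-penalized regression by cyclic coordinate descent, the fitting strategy glmnet is known for. The helper names (`impute_site`, `lasso_fit`, `mean_fill`), the mean-filling of missing predictor values, the fixed penalty `lam=0.05`, and clamping predictions to [0, 2] are all illustrative assumptions. Unlike glmnet's default, predictors are not standardized, and in practice the penalty would usually be chosen by cross-validation.

```python
from typing import List, Optional, Tuple

# Rows = individuals, columns = markers; dosage = 0, 1, 2 alternate alleles, or None if missing.
Matrix = List[List[Optional[float]]]


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator used in each LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def mean_fill(column: List[Optional[float]]) -> List[float]:
    """Assumption: fill missing predictor dosages with that marker's observed mean."""
    observed = [g for g in column if g is not None]
    mu = sum(observed) / len(observed) if observed else 0.0
    return [mu if g is None else float(g) for g in column]


def lasso_fit(X: List[List[float]], y: List[float], lam: float,
              n_iter: int = 200) -> Tuple[float, List[float]]:
    """Minimise (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[k] - x_mean[k] for k in range(p)] for row in X]  # centred predictors
    resid = [v - y_mean for v in y]  # residual of the centred response with beta = 0
    z = [sum(Xc[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            if z[k] == 0.0:
                continue  # a monomorphic marker carries no information
            rho = sum(Xc[i][k] * (resid[i] + Xc[i][k] * beta[k]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / z[k]
            delta = new_b - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][k] * delta  # keep the residual in sync with beta
                beta[k] = new_b
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


def impute_site(genotypes: Matrix, target: int, lam: float = 0.05) -> List[float]:
    """Impute missing dosages at marker `target`, using every other marker as a predictor."""
    n_ind, n_sites = len(genotypes), len(genotypes[0])
    others = [j for j in range(n_sites) if j != target]
    filled = [mean_fill([genotypes[i][j] for i in range(n_ind)]) for j in others]
    X_all = [[col[i] for col in filled] for i in range(n_ind)]
    train = [i for i in range(n_ind) if genotypes[i][target] is not None]
    b0, beta = lasso_fit([X_all[i] for i in train],
                         [float(genotypes[i][target]) for i in train], lam)
    out = []
    for i in range(n_ind):
        g = genotypes[i][target]
        if g is None:
            pred = b0 + sum(b * x for b, x in zip(beta, X_all[i]))
            g = min(2.0, max(0.0, pred))  # clamp to the valid dosage range
        out.append(g)
    return out
```

For example, if marker 2 is missing in one individual whose other two markers both read 2, and marker 2 tracks those markers perfectly in everyone else, `impute_site(G, target=2)` returns a value a little below 2 for that individual, pulled toward the mean by the L1 penalty. Observed dosages are passed through unchanged.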
We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing them, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual levels; computation time served as a second metric for comparison. We then examined factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition.
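The masking-based accuracy estimate described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes genotypes are stored as an individuals-by-sites matrix of dosages with `None` for missing calls, and that the imputation method under test (Beagle, glmnet, or anything else) returns a filled matrix of the same shape. The function names `mask` and `accuracy` and the seeded random sampling are hypothetical choices.

```python
import random
from typing import Dict, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows = individuals, columns = sites
Cell = Tuple[int, int]                # (individual index, site index)


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Sample Pearson correlation; None when undefined (n < 2 or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def mask(genotypes: Matrix, fraction: float, seed: int = 0) -> Tuple[Matrix, List[Cell]]:
    """Hide a random subset of *observed* genotypes; return the masked copy and hidden cells."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(genotypes)
                for j, g in enumerate(row) if g is not None]
    hidden = rng.sample(observed, int(round(fraction * len(observed))))
    masked = [row[:] for row in genotypes]  # copy so the true genotypes are kept intact
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def accuracy(truth: Matrix, imputed: Matrix,
             hidden: List[Cell]) -> Tuple[Dict[int, Optional[float]], Dict[int, Optional[float]]]:
    """Correlate true and imputed dosages over masked cells, per site and per individual."""
    by_site: Dict[int, Tuple[List[float], List[float]]] = {}
    by_ind: Dict[int, Tuple[List[float], List[float]]] = {}
    for i, j in hidden:
        for key, table in ((j, by_site), (i, by_ind)):
            obs, imp = table.setdefault(key, ([], []))
            obs.append(truth[i][j])
            imp.append(imputed[i][j])
    site_r = {j: pearson(o, m) for j, (o, m) in by_site.items()}
    ind_r = {i: pearson(o, m) for i, (o, m) in by_ind.items()}
    return site_r, ind_r
```

A typical run would call `mask`, hand the masked matrix to the imputer, and pass the original matrix, the imputer's output, and the list of hidden cells to `accuracy`. Returning `None` rather than a number matters for low-MAF sites: if every masked genotype at a site is the same homozygote, the site-level correlation is undefined and must be excluded rather than scored.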

SUBMITTER: Chan AW

PROVIDER: S-EPMC4990193 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

Chan AW, Hamblin MT, Jannink JL

PLoS ONE, 18 August 2016, issue 8

PMID: 27537694

Similar Datasets

Imputation accuracy of wheat genotyping-by-sequencing (GBS) data using barley and wheat genome references.
| S-EPMC6322752 | biostudies-literature

Efficient imputation of missing markers in low-coverage genotyping-by-sequencing data from multiparental crosses.
| S-EPMC4012496 | biostudies-literature

TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline.
| S-EPMC3938676 | biostudies-literature

Genome-Wide SNP Calling from Genotyping by Sequencing (GBS) Data: A Comparison of Seven Pipelines and Two Sequencing Technologies.
| S-EPMC4993469 | biostudies-literature

All-FIT: allele-frequency-based imputation of tumor purity from high-depth sequencing data.
| S-EPMC7141867 | biostudies-literature

Discovery of Anthocyanin Acyltransferase1 (AAT1) in Maize Using Genotyping-by-Sequencing (GBS).
| S-EPMC6222571 | biostudies-other

Genotyping-by-sequencing (GBS): a novel, efficient and cost-effective genotyping method for cattle using next-generation sequencing.
| S-EPMC3656875 | biostudies-literature

Using genotyping-by-sequencing (GBS) for genomic discovery in cultivated oat.
| S-EPMC4105502 | biostudies-literature

Genetic Diversity Assessed by Genotyping by Sequencing (GBS) in Watermelon Germplasm.
| S-EPMC6826620 | biostudies-literature

Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes.
| S-EPMC10335927 | biostudies-literature