Dataset Information

Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey.


ABSTRACT:

Background

The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled Meleagris gallopavo (turkey) individuals.

Results

A total of 100 million 36 bp reads were generated, representing approximately 5-6% (approximately 62 Mbp) of the turkey genome, with an estimated sequence depth of 58. Reads consisting of bases called with less than 1% error probability were selected and assembled into contigs. Subsequently, high throughput discovery of nucleotide variation was performed on sequences with more than 90% reliability, using the assembled contigs of 50 bp or longer as the reference sequence. We identified more than 7,500 SNPs with a high probability of representing true nucleotide variation in turkeys. Extending the reference with publicly available turkey BAC-end sequences raised the number of SNPs to over 11,000. A comparison with the sequenced chicken genome indicated that the assembled turkey contigs were distributed uniformly across the turkey genome. Genotyping of a representative sample of 340 SNPs resulted in a SNP conversion rate of 95%. The correlation between the minor allele count (MAC) and the observed minor allele frequency (MAF) for the validated SNPs was 0.69.
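
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline: the function names are hypothetical, and reading "90% reliability" as a 10% per-base error probability is an assumption made here for the Phred conversion.

```python
import math


def expected_depth(n_reads, read_length_bp, target_size_bp):
    """Average sequence depth: total sequenced bases divided by target size."""
    return n_reads * read_length_bp / target_size_bp


def phred_from_error_probability(p_error):
    """Standard Phred scale: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(p_error)


def implied_genome_size_bp(target_size_bp, fraction_of_genome):
    """Genome size implied when the target is a given fraction of the genome."""
    return target_size_bp / fraction_of_genome


if __name__ == "__main__":
    # 100 million reads of 36 bp over a ~62 Mbp genome fraction.
    depth = expected_depth(100_000_000, 36, 62_000_000)
    print(f"estimated depth: {depth:.1f}")

    # A 1% error probability is Q20; 10% (assumed meaning of "90% reliability") is Q10.
    print(f"assembly threshold: Q{phred_from_error_probability(0.01):.0f}")
    print(f"SNP discovery threshold: Q{phred_from_error_probability(0.10):.0f}")

    # 62 Mbp being 5-6% of the genome implies a genome of roughly 1.0-1.2 Gbp.
    for fraction in (0.05, 0.06):
        size_gbp = implied_genome_size_bp(62_000_000, fraction) / 1e9
        print(f"{fraction:.0%} of genome -> about {size_gbp:.2f} Gbp")

    # A 95% conversion rate on 340 genotyped SNPs.
    print(f"converted SNPs: {round(0.95 * 340)}")
```

The depth calculation reproduces the stated value of about 58, which suggests the estimate was derived from the raw read count rather than from the quality-filtered reads.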

Conclusion

We provide an efficient and cost-effective approach for identifying thousands of high quality SNPs in species that currently lack a sequenced genome, and apply it to turkey. The methodology targets a random fraction of the genome, resulting in an even distribution of SNPs across the targeted genome.

SUBMITTER: Kerstens HH 

PROVIDER: S-EPMC2772860 | biostudies-literature | 2009 Oct

REPOSITORIES: biostudies-literature

Publications

Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey.

Kerstens HH, Crooijmans RP, Veenendaal A, Dibbits BW, Chin-A-Woeng TF, den Dunnen JT, Groenen MA

BMC Genomics, 2009-10-16


Similar Datasets

| S-EPMC3146956 | biostudies-literature
| S-EPMC5325534 | biostudies-literature
| S-EPMC5015895 | biostudies-literature
| S-EPMC521117 | biostudies-literature
| S-EPMC310784 | biostudies-literature
| S-EPMC4893331 | biostudies-literature
| S-EPMC2789104 | biostudies-literature
| S-EPMC1540726 | biostudies-literature
| S-EPMC3128068 | biostudies-literature
| S-EPMC117295 | biostudies-literature