Dataset Information

DNApod: DNA polymorphism annotation database from next-generation sequence read archives.

ABSTRACT: With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information.

SUBMITTER: Mochizuki T

PROVIDER: S-EPMC5325239 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

DNApod: DNA polymorphism annotation database from next-generation sequence read archives.

Takako Mochizuki, Yasuhiro Tanizawa, Takatomo Fujisawa, Tazro Ohta, Naruo Nikoh, Tokurou Shimizu, Atsushi Toyoda, Asao Fujiyama, Nori Kurata, Hideki Nagasaki, Eli Kaminuma, Yasukazu Nakamura

PLoS ONE, 2017-02-24, issue 2

PMID: 28234924

Similar Datasets

Exome-wide benchmark of difficult-to-sequence regions using short-read next-generation DNA sequencing. | S-EPMC10783491 | biostudies-literature

VibrioBase: a model for next-generation genome and annotation database development. | S-EPMC4138799 | biostudies-literature

SeqVItA: Sequence Variant Identification and Annotation Platform for Next Generation Sequencing Data. | S-EPMC6247818 | biostudies-literature

ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using next generation sequence. | S-EPMC3124440 | biostudies-literature

DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data. | S-EPMC3738164 | biostudies-literature

STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions. | S-EPMC8450716 | biostudies-literature

ART: a next-generation sequencing read simulator. | S-EPMC3278762 | biostudies-literature

SeqWiz: a modularized toolkit for next-generation protein sequence database management and analysis. | S-EPMC10189941 | biostudies-literature

DDBJ launches a new archive database with analytical tools for next-generation sequence data. | S-EPMC2808917 | biostudies-literature

WikiPathways 2024: next generation pathway database. | S-EPMC10767877 | biostudies-literature