Dataset Information


TMC-SNPdb: An Indian germline variant dataset derived from whole exome sequence

ABSTRACT: Cancer is predominantly a somatic disease. A mutant allele found in a cancer cell genome is considered somatic when it is absent from the paired normal genome and from dbSNP, the most comprehensive public SNP database. However, dbSNP inadequately represents several non-Caucasian populations, including those from the Indian subcontinent, which limits cancer genomic analyses of data from these populations. We present TMC-SNPdb, the first open-source, freely accessible (through ANNOVAR), flexible and upgradable SNP database built from whole exome data of 62 normal samples derived from cancer patients of Indian origin, representing 114,309 unique germline variants. TMC-SNPdb comes with a companion subtraction tool that can be run from the command line or through an easy-to-use graphical user interface (GUI), and that can deplete additional Indian population-specific SNPs over and above those removed using the dbSNP and 1000 Genomes databases. Using an institutionally generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb reduced false positive somatic events remaining after dbSNP depletion by 42%, 33% and 28% in Indian-origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate that TMC-SNPdb will be useful in the study of several Mendelian germline diseases.
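
The subtraction step described in the abstract can be illustrated with a minimal Python sketch. This is not the TMC-SNPdb companion tool or its interface; the function name `subtract_known_germline`, the tuple representation of a variant, and all coordinates below are hypothetical and chosen only for illustration. The sketch assumes that a candidate is dropped when it matches a known germline variant exactly on chromosome, position, reference allele and alternate allele.

```python
from typing import Iterable, List, Set, Tuple

# Assumed representation: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def subtract_known_germline(
    candidates: Iterable[Variant],
    *germline_sets: Set[Variant],
) -> List[Variant]:
    """Return the candidates absent from every supplied germline set, in input order."""
    # Merge all germline sources (paired normal, dbSNP-like set, population set) into one set.
    known: Set[Variant] = set().union(*germline_sets)
    return [variant for variant in candidates if variant not in known]


# Made-up example data; none of these coordinates come from the dataset.
candidates = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
]
paired_normal = {("chr1", 1000, "A", "G")}   # variant also seen in the patient's normal sample
public_snps: Set[Variant] = set()            # stand-in for a dbSNP-like resource
population_snps = {("chr2", 2000, "C", "T")} # stand-in for a population-specific resource

remaining = subtract_known_germline(candidates, paired_normal, public_snps, population_snps)
print(remaining)
```

In this toy input, the second candidate survives the paired-normal and public filters and is removed only by the population-specific set, which is the kind of additional depletion the abstract attributes to a population-matched database.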

INSTRUMENT(S): Illumina HiSeq 1500, Illumina Genome Analyzer IIx, Illumina NextSeq 500, Illumina HiSeq 2000

ORGANISM(S): Homo sapiens  

SUBMITTER: Amit Dutt  

PROVIDER: E-MTAB-4618 | ArrayExpress | 2018-06-06



Similar Datasets

2015-11-27 | E-MTAB-3961 | ArrayExpress
2015-10-29 | E-MTAB-3958 | ArrayExpress
2015-11-01 | E-MTAB-3960 | ArrayExpress
| GSE84430 | GEO
| GSE84428 | GEO
2018-03-15 | E-MTAB-6467 | ArrayExpress
| GSE94378 | GEO
2019-05-07 | PXD012845 | Pride
| PRJNA438039 | ENA
| PRJNA438859 | ENA