Genomics

Dataset Information

High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios


ABSTRACT: We performed whole genome sequencing (WGS) of 3,202 samples from the 1000 Genomes Project (1kGP) collection, including 602 trios. All samples were sequenced to a targeted depth of 30X genome coverage using Illumina NovaSeq 6000 instruments. We aligned reads to the GRCh38 reference and performed single nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery using GATK HaplotypeCaller. We also discovered and genotyped a comprehensive set of structural variants (SVs), including insertions, deletions, duplications, inversions, and multiallelic copy number variants, by integrating multiple algorithms and analytic pipelines, including GATK-SV, svtools, and Absinthe. In addition to the small variant and SV call sets, we generated an integrated haplotype-resolved SNV/INDEL/SV call set that can be used as a reference panel for imputation. To build it, we first phased high-quality non-singleton SNVs and INDELs across the 3,202-sample 1kGP cohort using statistical phasing with pedigree-based correction. We then used the phased SNV/INDEL call set as a haplotype scaffold onto which we phased high-quality non-singleton SV calls. For more information, please visit https://www.internationalgenome.org/data-portal/data-collection/30x-grch38.
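
The phasing step above is restricted to non-singleton variants. As a rough illustration only, and not the submitters' pipeline, the sketch below treats a variant as non-singleton when more than one non-reference allele copy is observed across the cohort. The function names and the use of VCF-style GT strings are assumptions made for this example, and it assumes biallelic sites (all non-reference alleles are counted together).

```python
from typing import Iterable


def alt_allele_count(genotypes: Iterable[str]) -> int:
    """Count non-reference allele copies across VCF-style GT strings.

    Hypothetical helper for illustration; assumes biallelic sites.
    """
    count = 0
    for gt in genotypes:
        # Alleles are separated by '/' (unphased) or '|' (phased).
        for allele in gt.replace("|", "/").split("/"):
            # '.' is a missing call and '0' is the reference allele.
            if allele not in (".", "0"):
                count += 1
    return count


def is_non_singleton(genotypes: Iterable[str]) -> bool:
    """Return True when more than one non-reference copy is observed."""
    return alt_allele_count(genotypes) > 1


if __name__ == "__main__":
    # One heterozygous carrier in three samples: a singleton.
    print(is_non_singleton(["0/0", "0/1", "./."]))
    # A heterozygous and a homozygous carrier: three copies in total.
    print(is_non_singleton(["0|1", "1|1", "0|0"]))
```

In practice a filter like this would be applied with a VCF toolkit on the cohort-level allele count rather than by parsing genotype strings by hand; the sketch only makes the definition concrete.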

INSTRUMENT(S): Illumina NovaSeq 6000

ORGANISM(S): Homo sapiens

SUBMITTER: New York Genome Center 

PROVIDER: PRJEB55077 | EVA | 2022-07-30

REPOSITORIES: EVA

Similar Datasets

Date | Accession | Repository
| EGAD00001005916 | EGA
| EGAD00001003455 | EGA
2012-03-03 | GSE36217 | GEO
| EGAD00001004127 | EGA
2012-03-03 | E-GEOD-36217 | biostudies-arrayexpress
2014-03-05 | E-GEOD-55586 | biostudies-arrayexpress
2018-12-05 | GSE120099 | GEO
| PRJNA1001912 | ENA
2014-03-05 | GSE55586 | GEO
| phs000572 | dbGaP