High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios
ABSTRACT: We performed whole genome sequencing (WGS) of 3,202 samples from the 1000 Genomes Project (1kGP) collection, including 602 trios. All samples were sequenced to a targeted depth of 30X genome coverage using Illumina NovaSeq 6000 instruments. We aligned reads to the GRCh38 reference and performed single nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery using GATK HaplotypeCaller. We also discovered and genotyped a comprehensive set of structural variants (SVs), including insertions, deletions, duplications, inversions, and multiallelic copy number variants, by integrating multiple algorithms and analytic pipelines, including GATK-SV, svtools, and Absinthe. In addition to the small variant and SV call sets, we generated an integrated haplotype-resolved SNV/INDEL/SV call set that can be used as a reference panel for imputation. To build it, we first phased high-quality non-singleton SNVs and INDELs across the 3,202-sample 1kGP cohort using statistical phasing with pedigree-based correction. We then used the phased SNV/INDEL call set as a haplotype scaffold onto which we phased high-quality non-singleton SV calls. For more information, please visit https://www.internationalgenome.org/data-portal/data-collection/30x-grch38.
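The abstract restricts phasing to non-singleton variants. As a minimal sketch of what such a filter might look like, the snippet below assumes that a singleton is a site whose alternate allele is observed exactly once across the cohort (alternate allele count of 1). The genotype encoding, the function names, and the pooling of all alternate alleles at multiallelic sites are illustrative assumptions and are not taken from the study's pipeline.

```python
from typing import Iterable, Tuple

# Assumed encoding: a diploid genotype is a pair of allele indices,
# where 0 = reference, >0 = an alternate allele, -1 = missing.
Genotype = Tuple[int, int]


def alt_allele_count(genotypes: Iterable[Genotype]) -> int:
    """Count alternate alleles across all samples, skipping missing (-1) alleles."""
    return sum(1 for gt in genotypes for allele in gt if allele > 0)


def is_non_singleton(genotypes: Iterable[Genotype]) -> bool:
    """Return True when the alternate allele is seen at least twice in the cohort.

    Assumption: all alternate alleles at a site are pooled, so a multiallelic
    site with two different alternate alleles seen once each also passes.
    """
    return alt_allele_count(genotypes) >= 2


# Hypothetical three-sample sites, for illustration only.
site_singleton = [(0, 0), (0, 1), (0, 0)]      # one alternate allele in total
site_common = [(0, 1), (1, 1), (0, 0)]         # three alternate alleles in total
site_missing = [(0, 0), (-1, -1), (0, 0)]      # no alternate alleles observed

kept = [
    name
    for name, gts in [
        ("site_singleton", site_singleton),
        ("site_common", site_common),
        ("site_missing", site_missing),
    ]
    if is_non_singleton(gts)
]
```

In practice this kind of filter would usually be applied to the allele count stored with each variant record rather than recomputed from genotypes, and the quality criteria behind "high-quality" are not specified in the abstract.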
INSTRUMENT(S): Illumina NovaSeq 6000
ORGANISM(S): Homo sapiens
SUBMITTER: New York Genome Center
PROVIDER: PRJEB55077 | EVA | 2022-07-30
REPOSITORIES: EVA