Dataset Information

NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data.


ABSTRACT:

Motivation

Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation.

Results

NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs.
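The image-similarity framing above can be illustrated with a toy sketch. This is not the NPSV-deep implementation: it assumes each pileup image has already been reduced to a small feature vector (in practice a trained network would produce such embeddings), that simulations are available for each candidate genotype, and that plain Euclidean distance stands in for the learned similarity. The names `predict_genotype`, `mean_vector`, and the example vectors are hypothetical.

```python
import math

# Candidate diploid genotypes: homozygous reference, heterozygous, homozygous alternate.
GENOTYPES = ("0/0", "0/1", "1/1")


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict_genotype(real_embedding, simulated_embeddings):
    """Pick the genotype whose simulated replicates are, on average, closest to the real data.

    real_embedding: feature vector for the pileup image from the actual reads (hypothetical).
    simulated_embeddings: dict mapping genotype -> list of feature vectors, one per
        simulated replicate of that genotype (hypothetical).
    """
    distances = {
        gt: euclidean(real_embedding, mean_vector(simulated_embeddings[gt]))
        for gt in GENOTYPES
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hand-made stand-in embeddings; the first two components loosely mimic the
# fraction of reads supporting the alternate and reference alleles.
real = [0.48, 0.52, 0.10]
simulated = {
    "0/0": [[0.00, 1.00, 0.10], [0.05, 0.95, 0.10]],
    "0/1": [[0.50, 0.50, 0.10], [0.45, 0.55, 0.12]],
    "1/1": [[1.00, 0.00, 0.10], [0.95, 0.05, 0.10]],
}

genotype, distances = predict_genotype(real, simulated)
print(genotype)  # prints 0/1
```

The sketch only shows the decision rule (the closest simulated genotype wins). The published method learns the similarity from data, and the source code linked below is the reference for how it actually does so.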

Availability and implementation

Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.

SUBMITTER: Linderman MD 

PROVIDER: S-EPMC10955255 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Publications

NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data.

Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A

Bioinformatics (Oxford, England), 2024-03-01, issue 3


Similar Datasets

| S-EPMC11373317 | biostudies-literature
| S-EPMC8138798 | biostudies-literature
| S-EPMC5860216 | biostudies-literature
| S-EPMC3483208 | biostudies-literature
| S-EPMC6771380 | biostudies-literature
| S-EPMC9481040 | biostudies-literature
| S-EPMC10207598 | biostudies-literature
| S-EPMC9034514 | biostudies-literature
| S-EPMC4301848 | biostudies-literature
| S-EPMC4494700 | biostudies-literature