Dataset Information

ECOLE: Learning to call copy number variants on whole exome sequencing data.


ABSTRACT: Copy number variants (CNVs) have been shown to contribute to the etiology of several genetic disorders. Accurate detection of CNVs on whole exome sequencing (WES) data has been a long sought-after goal for use in clinics. This has not been possible despite recent improvements in performance, because algorithms mostly suffer from low precision and even lower recall on expert-curated gold standard call sets. Here, we present a deep learning-based somatic and germline CNV caller for WES data, named ECOLE. Based on a variant of the transformer architecture, the model learns to call CNVs per exon, using high-confidence calls made on matched WGS samples. We further train and fine-tune the model with a small set of expert calls via transfer learning. We show that ECOLE achieves high performance on human expert-labelled data for the first time, with 68.7% precision and 49.6% recall. This corresponds to precision and recall improvements of 18.7% and 30.8% over the next best-performing methods, respectively. We also show that the same fine-tuning strategy using tumor samples enables ECOLE to detect RT-qPCR-validated variations in bladder cancer samples without the need for a control sample. ECOLE is available at https://github.com/ciceklab/ECOLE.
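The precision and recall figures quoted in the abstract are per-exon evaluation metrics. As a rough illustration of how such metrics can be computed, the sketch below scores exon-level calls against a gold standard. It is not taken from the ECOLE code base: the function name `precision_recall`, the label strings `DEL`, `DUP` and `NO-CALL`, and the toy data are all assumptions made for this example, and the paper's own evaluation protocol may differ.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[str], truth: Sequence[str],
                     negative: str = "NO-CALL") -> Tuple[float, float]:
    """Exon-level precision and recall for CNV calls (illustrative only).

    A true positive is an exon whose predicted label is a CNV
    (anything other than `negative`) and equals the truth label.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    # Correct CNV calls: label matches and is not the negative class.
    tp = sum(p == t and p != negative for p, t in zip(predicted, truth))
    called = sum(p != negative for p in predicted)  # all CNV calls made
    actual = sum(t != negative for t in truth)      # all CNVs in the gold standard
    precision = tp / called if called else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy labels invented for illustration; they are not from the paper.
truth = ["DEL", "NO-CALL", "DUP", "DUP", "NO-CALL", "DEL"]
predicted = ["DEL", "DUP", "DUP", "NO-CALL", "NO-CALL", "DUP"]
p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the four CNV calls are correct and two of the four true CNVs are recovered, so both metrics come out at 0.50.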

SUBMITTER: Mandiracioglu B 

PROVIDER: S-EPMC10762021 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature

Publications

ECOLE: Learning to call copy number variants on whole exome sequencing data.

Mandiracioglu B, Ozden F, Kaynar G, Yilmaz MA, Alkan C, Cicek AE

Nature Communications, 2024-01-02, 1


Similar Datasets

S-EPMC4053953 | biostudies-literature
S-EPMC4081054 | biostudies-literature
S-EPMC5175347 | biostudies-literature
S-EPMC4849420 | biostudies-literature
S-EPMC6283367 | biostudies-literature
S-EPMC4526866 | biostudies-literature
S-EPMC6126229 | biostudies-literature
S-EPMC9248885 | biostudies-literature
S-EPMC7604644 | biostudies-literature
S-EPMC6633262 | biostudies-literature