Dataset Information

Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq.


ABSTRACT: Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost 'capture' method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data and our ensemble method, VarCA, has the best overall performance.
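
The precision/recall figures quoted in the abstract compare a caller's variants against a trusted truth set. The snippet below is a minimal sketch of that arithmetic only, not the authors' evaluation pipeline: the `precision_recall` helper, the `(chrom, pos, ref, alt)` tuple representation, and the toy variants are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) for a call set against a truth set.

    precision = true positives / all called variants
    recall    = true positives / all truth variants
    Empty denominators are reported as 0.0 rather than raising.
    """
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data (made up): three of four calls match the truth set,
# and three of four truth variants are recovered.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "GA"),
    ("chr2", 300, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "GA"),
    ("chr3", 10, "A", "C"),
}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In practice both sets would be restricted to the same regions before comparison (the abstract reports results within ATAC-seq peak regions with at least 10 reads), and SNVs and indels would be scored separately.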

SUBMITTER: Massarat AR 

PROVIDER: S-EPMC8373110 | biostudies-literature | 2021 Aug

REPOSITORIES: biostudies-literature

Publications

Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq.

Arya R Massarat, Arko Sen, Jeff Jaureguy, Sélène T Tyndale, Yi Fu, Galina Erikson, Graham McVicker

Nucleic Acids Research, 2021 Aug 1, issue 14


Similar Datasets

| S-EPMC8289380 | biostudies-literature
| S-EPMC10638764 | biostudies-literature
| S-EPMC8699717 | biostudies-literature
| S-EPMC4546161 | biostudies-literature
| S-EPMC10094055 | biostudies-literature
| S-EPMC7523641 | biostudies-literature
| S-EPMC7934446 | biostudies-literature
| S-EPMC10657386 | biostudies-literature
| S-EPMC10776385 | biostudies-literature
| S-EPMC10539354 | biostudies-literature