
Dataset Information


Fast analysis of scATAC-seq data using a predefined set of genomic regions.


ABSTRACT: Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data has been accelerated by alignment-free techniques, the pipelines available for scATAC-seq data still require large computational resources. We propose an approach based on pseudoalignment that reduces execution time and hardware requirements at little cost in precision. Methods: Public data for 10k PBMCs were downloaded from the 10x Genomics website. Reads were aligned with kallisto to various references derived from DNase I hypersensitive sites (DHS) and quantified with bustools. We compared our results with the publicly available results produced by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in the quantification of known peaks; the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when the analysis uses a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data with kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS as the reference does not affect the ability to characterize the cell populations.
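
The quantification step described in the Methods can be pictured as collapsing per-read assignments into a cell-by-region count table. The sketch below is a toy illustration only, not the authors' pipeline and not how kallisto or bustools work internally: the function name `count_matrix`, the barcodes, and the `DHS_*` region identifiers are all hypothetical, and the input is assumed to be a list of (cell barcode, region) pairs already produced by some upstream assignment step. It counts every record and does not model deduplication or barcode correction.

```python
from collections import Counter


def count_matrix(records):
    """Collapse (barcode, region_id) pairs into a dense cell-by-region table.

    `records` is a hypothetical stand-in for the output of an upstream
    read-to-region assignment step; each pair is counted once.
    """
    counts = Counter(records)
    cells = sorted({barcode for barcode, _ in counts})
    regions = sorted({region for _, region in counts})
    # Rows follow `cells`, columns follow `regions`; absent pairs count as 0.
    matrix = [[counts.get((barcode, region), 0) for region in regions]
              for barcode in cells]
    return cells, regions, matrix


# Toy input: made-up barcodes and made-up DHS identifiers.
records = [
    ("AAAC", "DHS_1"),
    ("AAAC", "DHS_1"),
    ("AAAC", "DHS_3"),
    ("TTTG", "DHS_2"),
]
cells, regions, matrix = count_matrix(records)
```

Because the set of regions is fixed in advance, the columns of such a table are the same for every sample, which is the practical appeal of a predefined reference over de novo peak calling.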

SUBMITTER: Giansanti V 

PROVIDER: S-EPMC7308914 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Fast analysis of scATAC-seq data using a predefined set of genomic regions.

Valentina Giansanti, Ming Tang, Davide Cittaro

F1000Research, 2020-03-20



Similar Datasets

| S-EPMC9338487 | biostudies-literature
| S-EPMC11459382 | biostudies-literature
| S-EPMC10502032 | biostudies-literature
| S-EPMC11224678 | biostudies-literature
| S-EPMC11316085 | biostudies-literature
| S-EPMC8356148 | biostudies-literature
| S-EPMC10457667 | biostudies-literature
| S-EPMC9238247 | biostudies-literature
| S-EPMC9805575 | biostudies-literature
| S-EPMC9235505 | biostudies-literature