
Dataset Information


LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets.


ABSTRACT: The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high coverage afforded can, in principle, be used to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods, and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
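The abstract says LoFreq models run-specific error rates to separate rare variants from sequencing errors without approximations. As a rough illustration of that idea only, the sketch below assumes that each base's Phred quality Q gives an independent error probability of 10^(-Q/10). It treats the number of mismatching bases at one position as Poisson-binomial distributed and computes the exact tail probability by dynamic programming. It then applies a Bonferroni-style correction. The function names and the `alpha` and `n_tests` parameters are hypothetical. This is not LoFreq's source code, which is available at the SourceForge URL above.

```python
from typing import List, Sequence, Tuple


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs: Sequence[float], k: int) -> float:
    """Exact P(X >= k), where X counts errors among independent bases
    with per-base error probabilities error_probs (Poisson-binomial)."""
    n = len(error_probs)
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    # pmf[j] = probability of exactly j errors among the bases processed so far
    pmf: List[float] = [1.0] + [0.0] * n
    for i, p in enumerate(error_probs, start=1):
        # Update from high j to low j so each base contributes exactly once
        for j in range(i, 0, -1):
            pmf[j] = pmf[j] * (1 - p) + pmf[j - 1] * p
        pmf[0] *= 1 - p
    # Summing the upper tail directly avoids cancellation in 1 - P(X < k)
    return sum(pmf[k:])


def call_column(quals: Sequence[int], n_mismatch: int,
                alpha: float = 0.01, n_tests: int = 1) -> Tuple[float, bool]:
    """Hypothetical per-position test: return the p-value of observing at least
    n_mismatch non-reference bases by error alone, and whether it passes a
    Bonferroni-corrected threshold alpha / n_tests."""
    probs = [phred_to_error_prob(q) for q in quals]
    pvalue = poisson_binomial_tail(probs, n_mismatch)
    return pvalue, pvalue * n_tests < alpha


# Example: 1000 reads at Q30 (error probability 0.001 each)
quals = [30] * 1000
p_one, called_one = call_column(quals, 1, n_tests=30000)
p_ten, called_ten = call_column(quals, 10, n_tests=30000)
```

Here, one mismatch in 1000 Q30 bases is fully explained by error (p-value near 0.63), whereas ten mismatches are very unlikely under the error model and pass the corrected threshold. The dynamic program is O(n^2) per position. A real caller would need to handle mapping quality, strand bias, and per-alternative-base probabilities, none of which this sketch attempts.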

SUBMITTER: Wilm A 

PROVIDER: S-EPMC3526318 | biostudies-literature | 2012 Dec

REPOSITORIES: biostudies-literature


Publications

LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets.

Andreas Wilm, Pauline Poh Kim Aw, Denis Bertrand, Grace Hui Ting Yeo, Swee Hoe Ong, Chang Hua Wong, Chiea Chuen Khor, Rosemary Petric, Martin Lloyd Hibberd, Niranjan Nagarajan

Nucleic Acids Research, 2012-10-12, issue 22



Similar Datasets

| S-EPMC7182099 | biostudies-literature
| S-EPMC8034624 | biostudies-literature
| S-EPMC5570013 | biostudies-literature
| S-EPMC8414796 | biostudies-literature
| S-EPMC7245202 | biostudies-literature
| S-EPMC4492008 | biostudies-other