Dataset Information


SNVHMM: predicting single nucleotide variants from next generation sequencing.

ABSTRACT: The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Single nucleotide variants (SNVs) inferred from NGS data are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poor SNV detection capability in the regulatory regions of the genome. Methods based on posterior probabilities detect SNVs efficiently in high-coverage regions or in sequencing data with high depth. For data with low sequencing depth, however, the efficiency of such algorithms remains poor and needs to be improved.

A new tool, SNVHMM, based on a discrete hidden Markov model (HMM), was developed to infer the genotype at each position on the genome. We incorporated the mapping quality of each read and the corresponding base quality on the reads into the emission probability of the HMM. The context information of the whole observation, as well as its confidence, is fully used to infer the genotype at each position of the genome under study. The model can therefore gain more power than Bayes-based methods, which is very useful for SNV detection in data with low sequencing depth. Moreover, we verified our model by testing it on two lobular breast tumor datasets and two myelodysplastic syndromes (MDS) datasets. Compared with SNVMix2, a recently published SNV calling algorithm, our model substantially outperformed SNVMix2 when the sequencing depth was low, and also outperformed it when SNVMix2 was well trained on large datasets.

SNVHMM can detect SNVs from NGS cancer data efficiently even when the sequencing depth is very low. SNVHMM works even with a very small training dataset. SNVHMM incorporates the base quality and mapping quality of all observed bases and reads, and also lets users choose the confidence of the observation for SNV prediction.
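The idea described in the abstract, an HMM over per-position genotypes whose emission probability is weighted by base quality and mapping quality, can be illustrated with a minimal Python sketch. This is not the SNVHMM implementation: the three genotype states (aa, ab, bb), the emission formula, the Phred interpretation of the quality scores, and all prior and transition values below are assumptions chosen purely for illustration.

```python
import math

# Hypothetical genotype states: homozygous reference, heterozygous, homozygous variant.
STATES = ("aa", "ab", "bb")
# Assumed expected fraction of reference-matching reads under each genotype.
REF_FRACTION = {"aa": 0.99, "ab": 0.5, "bb": 0.01}


def phred_to_prob_correct(q):
    """Convert a Phred-scaled quality score to the probability of being correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def log_emission(reads, state):
    """Log-likelihood of the reads at one position given a genotype.

    Each read is (matches_reference, base_quality, mapping_quality).
    A read that may be mismapped contributes an uninformative 0.5.
    """
    mu = REF_FRACTION[state]
    total = 0.0
    for is_ref, base_q, map_q in reads:
        pb = phred_to_prob_correct(base_q)
        pm = phred_to_prob_correct(map_q)
        p_ref = mu * pb + (1.0 - mu) * (1.0 - pb)
        p_obs = p_ref if is_ref else 1.0 - p_ref
        total += math.log(pm * p_obs + (1.0 - pm) * 0.5)
    return total


def viterbi(positions, log_prior, log_trans):
    """Most likely genotype path over a non-empty list of positions."""
    scores = {s: log_prior[s] + log_emission(positions[0], s) for s in STATES}
    back = []
    for reads in positions[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + log_trans[p][s])
            pointers[s] = prev
            new_scores[s] = scores[prev] + log_trans[prev][s] + log_emission(reads, s)
        scores = new_scores
        back.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Illustrative parameters only (not trained values from the paper).
log_prior = {"aa": math.log(0.98), "ab": math.log(0.01), "bb": math.log(0.01)}
log_trans = {
    "aa": {"aa": math.log(0.98), "ab": math.log(0.01), "bb": math.log(0.01)},
    "ab": {"aa": math.log(0.90), "ab": math.log(0.05), "bb": math.log(0.05)},
    "bb": {"aa": math.log(0.90), "ab": math.log(0.05), "bb": math.log(0.05)},
}

ref_read = (True, 30, 40)   # matches reference, base Q30, mapping Q40
alt_read = (False, 30, 40)  # differs from reference
positions = [[ref_read] * 3, [ref_read] * 3 + [alt_read] * 3, [ref_read] * 2]
path = viterbi(positions, log_prior, log_trans)
print(path)
```

In this sketch a low-quality base or a poorly mapped read pulls its emission term toward 0.5, so it contributes little evidence for any genotype, while the transition probabilities let neighbouring positions inform the call where depth is low.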


PROVIDER: S-EPMC3718670 | BioStudies | 2013-01-01T00:00:00Z

REPOSITORIES: biostudies

Similar Datasets

2014-01-01 | S-EPMC3906084 | BioStudies
1000-01-01 | S-EPMC2832826 | BioStudies
2015-01-01 | S-EPMC4691076 | BioStudies
2019-01-01 | S-EPMC6738196 | BioStudies
2019-01-01 | S-EPMC6419332 | BioStudies
2014-01-01 | S-EPMC5755963 | BioStudies
2020-01-01 | S-EPMC6972009 | BioStudies
2019-01-01 | S-EPMC6547602 | BioStudies
1000-01-01 | S-EPMC4021345 | BioStudies
2020-01-01 | S-EPMC7300012 | BioStudies