Dataset Information

A gradient-boosting approach for filtering de novo mutations in parent-offspring trios.


ABSTRACT:

Motivation

Whole-genome and whole-exome sequencing of parent-offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates because of sequencing artifacts and alignment issues: they may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge.

Results

In this article, we curated 59 sequence features from the whole-genome and exome alignment context that are considered relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set from experimentally validated true and false de novo mutations, together with false de novo mutations collected from an in-house large-scale exome-sequencing project. We evaluated DNMFilter's theoretical performance and investigated the relative importance of different sequence features to the classification accuracy. Finally, we applied DNMFilter to our in-house whole-exome trios and to one CEU trio from the 1000 Genomes Project, and found that DNMFilter could be coupled with commonly used de novo mutation detection approaches as an effective filter that significantly reduces the false discovery rate without sacrificing sensitivity.
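
The classification step described above can be illustrated with a minimal sketch of gradient boosting for a binary true/false label. This is not the DNMFilter implementation (which the abstract describes only as a combination of Java and R). It assumes three hypothetical features (child alternate-allele fraction, parental alternate-allele fraction, mean mapping quality), synthetic training data, and depth-1 regression trees ("stumps") fitted to the gradient of the log loss. All function names, feature choices, and parameter values here are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Split as "value <= lo" so both sides are guaranteed to be non-empty.
        for lo in values[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= lo]
            right = [r for row, r in zip(X, residuals) if row[j] > lo]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, lo, lmean, rmean)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbm(X, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting on log loss: each stump fits the residuals y - p."""
    prior = sum(y) / len(y)
    base = math.log(prior / (1.0 - prior))  # initial log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    score = base + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(score)


def make_candidates(n, rng):
    """Synthetic candidates; the feature distributions are invented for illustration."""
    X, y = [], []
    for _ in range(n):
        is_true = rng.random() < 0.5
        if is_true:   # balanced child allele fraction, clean parents, good mapping
            row = [rng.gauss(0.48, 0.08), rng.gauss(0.00, 0.01), rng.gauss(58, 3)]
        else:         # skewed allele fraction, alternate reads in a parent, poor mapping
            row = [rng.gauss(0.20, 0.08), rng.gauss(0.06, 0.03), rng.gauss(45, 8)]
        X.append(row)
        y.append(1 if is_true else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_candidates(150, rng)
X_test, y_test = make_candidates(100, rng)
model = fit_gbm(X_train, y_train)
predictions = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic candidates: {accuracy:.2f}")
```

In a real pipeline the candidates would come from a de novo caller, and the 0.5 cut-off on `predict_proba` would be tuned to trade false discovery rate against sensitivity. The threshold used here is only a placeholder.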

Availability

The DNMFilter software, implemented in a combination of Java and R, is freely available at http://humangenome.duke.edu/software.

SUBMITTER: Liu Y 

PROVIDER: S-EPMC4071207 | biostudies-literature | 2014 Jul

REPOSITORIES: biostudies-literature

Publications

A gradient-boosting approach for filtering de novo mutations in parent-offspring trios.

Liu Yongzhuang, Li Bingshan, Tan Renjie, Zhu Xiaolin, Wang Yadong

Bioinformatics (Oxford, England), 2014-03-10, issue 13


Similar Datasets

| S-EPMC4410659 | biostudies-literature
| S-EPMC3709464 | biostudies-literature
| S-EPMC3576329 | biostudies-literature
| S-EPMC7332647 | biostudies-literature
| PRJNA430295 | ENA
| PRJNA430296 | ENA
| S-EPMC4907378 | biostudies-literature
| S-EPMC4309431 | biostudies-literature