
Dataset Information


Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project.


ABSTRACT: Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known indels in the known-positive set were limited. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review process completed by 42 reviewers, two advisors, and a judging panel of three researchers significantly enriched the known indel set with an additional 516 indels. The extended benchmarking indel set spans a large range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling across a wider range of low VAF values. Indel length was also variable, but the majority were under 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extended indel set has not undergone orthogonal validation. The extended benchmarking indel set, along with the indels in the previously published known-positive set, was the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.

SUBMITTER: Gong B 

PROVIDER: S-EPMC10963753 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature


Publications

Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project.

Gong Binsheng, Li Dan, Zhang Yifan, Kusko Rebecca, Lababidi Samir, Cao Zehui, Chen Mingyang, Chen Ning, Chen Qiaochu, Chen Qingwang, Dai Jiacheng, Gan Qiang, Gao Yuechen, Guo Mingkun, Hariani Gunjan, He Yujie, Hou Wanwan, Jiang He, Kushwaha Garima, Li Jian-Liang, Li Jianying, Li Yulan, Liu Liang-Chun, Liu Ruimei, Liu Shiming, Meriaux Edwin, Mo Mengqing, Moore Mathew, Moss Tyler J, Niu Quanne, Patel Ananddeep, Ren Luyao, Saremi Nedda F, Shang Erfei, Shang Jun, Song Ping, Sun Siqi, Urban Brent J, Wang Danke, Wang Shangzi, Wen Zhining, Xiong Xiangyi, Yang Jingcheng, Yin Lihui, Zhang Chao, Zhang Ruolan, Bhandari Ambica, Cai Wanshi, Eterovic Agda Karina, Megherbi Dalila B, Shi Tieliu, Suo Chen, Yu Ying, Zheng Yuanting, Novoradovskaya Natalia, Sears Renee L, Shi Leming, Jones Wendell, Tong Weida, Xu Joshua

Scientific Reports, 2024-03-25, 1



Similar Datasets

| S-EPMC7791862 | biostudies-literature
| S-EPMC5517277 | biostudies-other
| S-EPMC4725912 | biostudies-other
2006-09-08 | GSE5350 | GEO
| S-EPMC11790059 | biostudies-literature
2017-01-20 | GSE93848 | GEO
| S-EPMC4585973 | biostudies-literature
| S-EPMC6142223 | biostudies-literature
| S-EPMC7820859 | biostudies-literature
2010-09-10 | GSE24061 | GEO