Dataset Information

RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions.


ABSTRACT: Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for a wider range of important species. In this paper, we present an automatic algorithm called RAfilter for removing the false positives in the outputs of existing aligners. RAfilter takes advantage of rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Because large eukaryotic genomes contain huge numbers of rare k-mers, a series of high-performance computing techniques such as multi-threading and bit operations are used to improve time and space efficiency. The experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%-90% of false-positive HiFi alignments with almost no correct ones removed, while the sensitivity and precision on ONT datasets were about 80% and 50%, respectively.
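The rare k-mer idea in the abstract can be illustrated with a minimal sketch. This is not RAfilter's implementation: the function names (`encoded_kmers`, `rare_kmer_support`), the `max_count` cutoff, and the scoring rule (fraction of rare reference k-mers in the aligned window that also occur in the read) are all assumptions made for illustration. The sketch packs each k-mer into an integer at 2 bits per base, which is one common way to use bit operations to save space.

```python
from collections import Counter

# 2-bit codes for nucleotides; k-mers containing any other symbol are skipped.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encoded_kmers(seq, k):
    """Yield (start position, integer code) for each k-mer, using a rolling 2-bit encoding."""
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0  # consecutive valid bases ending at the current position
    for i, base in enumerate(seq.upper()):
        bits = ENCODE.get(base)
        if bits is None:
            code, valid = 0, 0
            continue
        code = ((code << 2) | bits) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, code


def rare_kmer_support(reference, read, ref_start, ref_end, k, max_count=1):
    """Hypothetical score: share of rare k-mers in reference[ref_start:ref_end] found in the read.

    A k-mer is treated as rare if it occurs at most max_count times in the
    whole reference (an assumed definition). Returns None when the window
    contains no rare k-mers, so no decision can be made.
    """
    counts = Counter(code for _, code in encoded_kmers(reference, k))
    read_codes = {code for _, code in encoded_kmers(read, k)}
    window_rare = [
        code
        for _, code in encoded_kmers(reference[ref_start:ref_end], k)
        if counts[code] <= max_count
    ]
    if not window_rare:
        return None
    shared = sum(1 for code in window_rare if code in read_codes)
    return shared / len(window_rare)


# Two repeat copies that differ at a single base; the read comes from copy 1.
copy1 = "ACGTACGGTTCAGT"
copy2 = "ACGTACGCTTCAGT"
reference = copy1 + copy2
read = copy1

score_copy1 = rare_kmer_support(reference, read, 0, 14, k=5)
score_copy2 = rare_kmer_support(reference, read, 14, 28, k=5)
print(score_copy1, score_copy2)
```

In this toy case the alignment to copy 1 is supported by every rare k-mer in its window, while the alignment to copy 2 is supported by none, so a filter built on such a score would keep the first and discard the second. Real reads contain sequencing errors, so a practical filter would need a tolerance threshold rather than an exact match requirement.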

SUBMITTER: Yang J 

PROVIDER: S-EPMC10107899 | biostudies-literature | 2023 Jan

REPOSITORIES: biostudies-literature

Publications

RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions.

Yang Jinbao, Zhao Xianjia, Jiang Heling, Yang Yingxue, Hou Yuze, Pan Weihua

Horticulture Research 2022-12-29 1


Similar Datasets

| S-EPMC10692869 | biostudies-literature
| S-EPMC3302978 | biostudies-literature
| S-EPMC1208854 | biostudies-literature
| S-EPMC4787761 | biostudies-literature
| S-EPMC11842051 | biostudies-literature
| S-EPMC8322061 | biostudies-literature
| S-EPMC6550178 | biostudies-literature
| S-EPMC1779562 | biostudies-literature
| S-EPMC4112476 | biostudies-literature
| S-EPMC10425934 | biostudies-literature