Dataset Information

Rapid and sensitive detection of genome contamination at scale with FCS-GX.


ABSTRACT: Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 minutes. Testing FCS-GX on artificially fragmented genomes demonstrates sensitivity >95% for diverse contaminant species and specificity >99.93%. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination (0.16% of total bases), with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/.

SUBMITTER: Astashyn A 

PROVIDER: S-EPMC10246020 | biostudies-literature | 2023 Jun

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC10898089 | biostudies-literature
S-EPMC6546082 | biostudies-literature
S-EPMC10831095 | biostudies-literature
S-EPMC6863201 | biostudies-literature
S-EPMC8348754 | biostudies-literature
S-EPMC6838224 | biostudies-literature
S-EPMC10157669 | biostudies-literature
S-EPMC3151148 | biostudies-literature
GSE165780 | GEO | 2021-01-30
S-EPMC4121612 | biostudies-other