Project description:Accurate and rapid typing of pathogens is essential for effective surveillance and outbreak detection. Conventional serotyping of Escherichia coli is a delicate, laborious, time-consuming, and expensive procedure. With whole-genome sequencing (WGS) becoming cheaper, it has vast potential in routine typing and surveillance. The aim of this study was to establish a valid and publicly available tool for WGS-based in silico serotyping of E. coli applicable for routine typing and surveillance. A FASTA database of specific O-antigen processing system genes for O typing and flagellin genes for H typing was created as a component of the publicly available Web tools hosted by the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org). All E. coli isolates available with WGS data and conventional serotype information were subjected to WGS-based serotyping employing this specific SerotypeFinder CGE tool. SerotypeFinder was evaluated on 682 E. coli genomes, 108 of which were sequenced for this study, where both the whole genome and the serotype were available. In total, 601 and 509 isolates were included for O and H typing, respectively. The O-antigen genes wzx, wzy, wzm, and wzt and the flagellin genes fliC, flkA, fllA, flmA, and flnA were detected in 569 and 508 genome sequences, respectively. SerotypeFinder for WGS-based O and H typing predicted 560 of 569 O types and 504 of 508 H types, consistent with conventional serotyping. In combination with other available WGS typing tools, E. coli serotyping can be performed solely from WGS data, providing faster and cheaper typing than current routine procedures and making WGS typing a superior alternative to conventional typing strategies.
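The detection and concordance figures reported above can be turned into rates with simple arithmetic. The following is a minimal sketch; the helper name `rate` is hypothetical and not part of SerotypeFinder, and only the counts come from the abstract.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts taken from the abstract above.
o_detected = rate(569, 601)    # genomes with O-antigen genes detected, of those included for O typing
h_detected = rate(508, 509)    # genomes with flagellin genes detected, of those included for H typing
o_concordant = rate(560, 569)  # O predictions consistent with conventional serotyping
h_concordant = rate(504, 508)  # H predictions consistent with conventional serotyping

for label, value in [
    ("O genes detected", o_detected),
    ("H genes detected", h_detected),
    ("O types concordant", o_concordant),
    ("H types concordant", h_concordant),
]:
    print(f"{label}: {value:.1f}%")
```

Note that the concordance rates use the genomes in which typing genes were detected as the denominator, not all isolates included for typing.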
Project description:Public health and food safety institutions around the world are adopting whole genome sequencing (WGS) to replace conventional methods for characterizing Salmonella for use in surveillance and outbreak response. Falling costs and increased throughput of WGS have resulted in an explosion of data, but questions remain as to the reliability and robustness of the data. Due to the critical importance of serovar information to public health, it is essential to have reliable serovar assignments available for all of the Salmonella records. The current study used a systematic assessment and curation of all Salmonella in the sequence read archive (SRA) to assess the state of the data and their utility. A total of 67,758 genomes were assembled de novo and quality-assessed for their assembly metrics as well as species and serovar assignments. A total of 42,400 genomes passed all of the quality criteria but 30.16% of genomes were deposited without serotype information. These data were used to compare the concordance of reported and predicted serovars for two in silico prediction tools, multi-locus sequence typing (MLST) and the Salmonella in silico Typing Resource (SISTR), which produced predictions that were fully concordant with 87.51 and 91.91% of the tested isolates, respectively. Concordance of in silico predictions increased when serovar variants were grouped together, 89.25% for MLST and 94.98% for SISTR. This study represents the first large-scale validation of serovar information in public genomes and provides a large validated set of genomes, which can be used to benchmark new bioinformatics tools.
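The difference between full concordance and concordance after grouping serovar variants can be illustrated with a small sketch. The function name, the serovar labels, and the grouping table below are all hypothetical placeholders; the study's actual grouping rules are not given in the abstract.

```python
from typing import Dict, List, Optional, Tuple


def concordance_rate(
    pairs: List[Tuple[str, str]],
    variant_groups: Optional[Dict[str, str]] = None,
) -> float:
    """Percentage of (reported, predicted) pairs that agree.

    If variant_groups is given, each label is first mapped to its group
    name, so that variants of one serovar count as the same call.
    """
    if not pairs:
        raise ValueError("no isolates to compare")
    groups = variant_groups or {}
    matches = sum(
        1
        for reported, predicted in pairs
        if groups.get(reported, reported) == groups.get(predicted, predicted)
    )
    return 100.0 * matches / len(pairs)


# Hypothetical example data: (reported serovar, predicted serovar).
example_pairs = [
    ("Alpha", "Alpha"),
    ("Alpha var. 1", "Alpha"),
    ("Beta", "Gamma"),
    ("Beta", "Beta"),
]
example_groups = {"Alpha var. 1": "Alpha"}

strict = concordance_rate(example_pairs)
grouped = concordance_rate(example_pairs, example_groups)
print(f"strict: {strict:.1f}%, grouped: {grouped:.1f}%")
```

Grouping can only raise or preserve the rate, which is consistent with the increase reported for both tools.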
Project description:We compared the performance of four open-source in silico Salmonella typing tools (SeqSero, SeqSero2, Salmonella In Silico Typing Resource [SISTR], and Metric Oriented Sequence Typer [MOST]) to assess their potential for replacing laboratory serological testing with serovar predictions from whole-genome sequencing data. We conducted a retrospective analysis of 1,624 Salmonella isolates of 72 serovars submitted to the German National Salmonella Reference Laboratory between 1999 and 2019. All isolates are derived from animal and foodstuff origins. We conducted Illumina short-read sequencing and compared the in silico serovar prediction results with the results of routine laboratory serotyping. We found the best-performing in silico serovar prediction tool to be SISTR, with 94% correctly typed isolates, followed by SeqSero2 (87%), SeqSero (81%), and MOST (79%). Furthermore, we found that mapping-based tools like SeqSero and SeqSero2 (allele mode) were more reliable for the prediction of monophasic variants, while sequence type and cluster-based methods like MOST and SISTR (core-genome multilocus sequence type [cgMLST]) showed greater resilience when confronted with GC-biased sequencing data. We showed that the choice of library preparation kit could substantially affect O antigen detection, due to the low GC content of the wzx and wzy genes. Although the accuracy of computational serovar predictions is still not quite on par with traditional serotyping by Salmonella reference laboratories, the command-line tools investigated in this study perform a rapid, efficient, inexpensive, and reproducible analysis, which can be integrated into in-house characterization pipelines. Based on our results, we find SISTR most suitable for automated, routine serotyping for public health surveillance of Salmonella. IMPORTANCE: Salmonella spp. are important foodborne pathogens.
To reduce the number of infected patients, it is essential to understand which subtypes of the bacteria cause disease outbreaks. Traditionally, characterization of Salmonella requires serological testing, a laboratory method by which Salmonella isolates can be classified into over 2,600 distinct subtypes, called serovars. Due to recent advances in whole-genome sequencing, many tools have been developed to replace traditional testing methods with computational analysis of genome sequences. It is crucial to validate that these tools, many already in use for routine surveillance, deliver accurate and reliable serovar information. In this study, we set out to compare which of the currently available open-source command-line tools is most suitable to replace serological testing. A thorough evaluation of the differing computational approaches is highly important to ensure the backward compatibility of serotyping data and to maintain comparability between laboratories.
Project description:Salmonella enterica subspecies enterica is a highly diverse subspecies with more than 1500 serovars and the ability to distinguish serovars within this group is vital for surveillance. With the development of whole-genome sequencing technology, serovar prediction by traditional serotyping is being replaced by molecular serotyping. Existing in silico serovar prediction approaches utilize surface antigen encoding genes, core genome MLST and serovar-specific gene markers or DNA fragments for serotyping. However, these serovar-specific gene markers or DNA fragments only distinguished a small number of serovars. In this study, we compared 2258 Salmonella accessory genomes to identify 414 candidate serovar-specific or lineage-specific gene markers for 106 serovars, which include 24 polyphyletic serovars and the paraphyletic serovar Enteritidis. A combination of several lineage-specific gene markers can be used for the clear identification of the polyphyletic serovars and the paraphyletic serovar. We designed and evaluated an in silico serovar prediction approach by screening 1089 genomes representing 106 serovars against a set of 131 serovar-specific gene markers. The presence or absence of one or more serovar-specific gene markers was used to predict the serovar of an isolate from genomic data. We show that serovar-specific gene markers have comparable accuracy to other in silico serotyping methods, with 84.8% of isolates assigned to the correct serovar with no false positives (FP) or false negatives (FN) and 10.5% of isolates assigned to a small subset of serovars containing the correct serovar with varied FP. Combined, 95.3% of genomes were correctly assigned to a serovar. This approach would be useful as diagnosis moves to culture-independent and metagenomic methods as well as providing a third alternative to confirm other genome-based analyses.
The identification of a set of gene markers may also be useful in the development of more cost-effective molecular assays designed to detect specific gene markers of all the major serovars in a region. These assays would be useful in serotyping isolates where cultures are no longer obtained and traditional serotyping is therefore impossible.
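The presence/absence logic described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name, the marker identifiers, and the serovar labels are hypothetical, and how marker genes are detected in a genome (for example by sequence search) is assumed to have happened upstream.

```python
from typing import Dict, List, Set


def predict_serovars(
    detected_genes: Set[str],
    marker_table: Dict[str, Set[str]],
) -> List[str]:
    """Return every serovar with at least one of its markers detected.

    One candidate corresponds to a unique assignment, several candidates
    to a small subset of serovars, and an empty list to no assignment.
    """
    return sorted(
        serovar
        for serovar, markers in marker_table.items()
        if markers & detected_genes
    )


# Hypothetical marker table: serovar -> set of marker gene identifiers.
MARKER_TABLE = {
    "SerovarA": {"marker_a1", "marker_a2"},  # several lineage markers, any one suffices
    "SerovarB": {"marker_b1"},
    "SerovarC": {"marker_b1", "marker_c1"},  # shares a marker with SerovarB
}

print(predict_serovars({"marker_a2", "unrelated_gene"}, MARKER_TABLE))
print(predict_serovars({"marker_b1"}, MARKER_TABLE))
print(predict_serovars({"unrelated_gene"}, MARKER_TABLE))
```

The second call returns two candidates, which mirrors the case in which an isolate is assigned to a small subset of serovars that contains the correct one.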
Project description:AIM: Members of the genus Citrobacter are important opportunistic pathogens responsible for high mortality rates. Therefore, in this study, we aimed to develop efficient and accurate Citrobacter typing schemes for clinical detection and epidemiological surveillance. MATERIALS & METHODS: Using genomic and experimental analyses, we located the O-antigen biosynthesis gene clusters in the Citrobacter genome for the first time, and used comparative genomic analyses to reveal the specific genes in different Citrobacter serotypes. RESULTS: Based on the specific genes in O-antigen biosynthesis gene clusters of Citrobacter, we established experimental and in silico serotyping systems for this bacterium. CONCLUSION: Both serotyping tools are reliable, and our observations are biologically and clinically relevant for understanding and managing Citrobacter infection.
Project description:Until recently, traditional serology and the Kauffmann White Scheme (KWS) have been the gold standard for <i>Salmonella</i> serotyping. Whole Genome Sequencing (WGS) has now emerged as an alternative in this field. Serotype information remains a cornerstone in food safety and public health activities to reduce the burden of salmonellosis. At the same time, recent advances in WGS have improved the ability to perform advanced pathogen characterization while improving trace back investigations to determine the source of foodborne illness during outbreaks. Serovar prediction based on WGS can be performed using <i>in silico</i> data analysis tools. Three such tools have been developed: (a) <i>Salmonella in silico</i> Typing Resource (SISTR), (b) SeqSero, and (c) <i>in silico</i> 7-gene MLST ST (Multilocus Sequence Typing Sub-Typing), which was generated using the SISTR platform. Public health officials around the world are diligently working to validate these tools for replacing traditional surveillance methods to provide a more powerful approach for molecular epidemiology in support of public health investigations. In this study, we report a retrospective analysis of our laboratory inventory of 1,041 <i>Salmonella</i> isolates collected between 1999 and 2017. These isolates are of public health significance since they all came from either food, feed or environmental swabs. They were all serotyped by both traditional serology and WGS using the <i>in silico</i> SeqSero tool for serovar prediction. Both methods predicted identical <i>Salmonella</i> serotypes in 899 isolates (86.4% of the 1,041 <i>Salmonella</i> isolates). SeqSero assignments differed from traditional serological testing in 80 isolates (7.7%) and no serotype prediction was obtained for 62 isolates (5.9%).
This retrospective study is an excellent example of using WGS and SeqSero as a data analysis tool to predict <i>Salmonella</i> serotypes that can provide numerous advantages including molecular and genetic details regarding the characteristics of the <i>Salmonella</i> isolates compared to traditional KWS serotyping. In conclusion, it is evident that using WGS and <i>in silico</i> tools for <i>Salmonella</i> serotyping might someday replace traditional serotyping.
Project description:Accurate typing methods are required for efficient infection control. The emergence of whole-genome sequencing (WGS) technologies has enabled the development of genome-based methods applicable for routine typing and surveillance of bacterial pathogens. In this study, we developed the Pseudomonas aeruginosa serotyper (PAst) program, which enabled in silico serotyping of P. aeruginosa isolates using WGS data. PAst has been made publicly available as a web service and aptly facilitates high-throughput serotyping analysis. The program overcomes critical issues such as the loss of in vitro typeability often associated with P. aeruginosa isolates from chronic infections and quickly determines the serogroup of an isolate based on the sequence of the O-specific antigen (OSA) gene cluster. Here, PAst analysis of 1,649 genomes resulted in successful serogroup assignments in 99.27% of the cases. This frequency is rarely achievable by conventional serotyping methods. The limited number of nontypeable isolates found using PAst was the result of either a complete absence of OSA genes in the genomes or the artifact of genomic misassembly. With PAst, P. aeruginosa serotype data can be obtained from WGS information alone. PAst is a highly efficient alternative to conventional serotyping methods in relation to outbreak surveillance of serotype O12 and other high-risk clones, while maintaining backward compatibility to historical serotype data.
Project description:In the work presented here, we designed and developed two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae. These tools will facilitate bacterial typing based on draft genomes of multidrug-resistant Enterobacteriaceae species by the rapid detection of known plasmid types. Replicon sequences from 559 fully sequenced plasmids associated with the family Enterobacteriaceae in the NCBI nucleotide database were collected to build a consensus database for integration into a Web tool called PlasmidFinder that can be used for replicon sequence analysis of raw, contig group, or completely assembled and closed plasmid sequencing data. The PlasmidFinder database currently consists of 116 replicon sequences that match, with at least 80% nucleotide identity, all replicon sequences identified in the 559 fully sequenced plasmids. For plasmid multilocus sequence typing (pMLST) analysis, a database that is updated weekly was generated from www.pubmlst.org and integrated into a Web tool called pMLST. Both databases were evaluated using draft genomes from a collection of Salmonella enterica serovar Typhimurium isolates. PlasmidFinder identified a total of 103 replicons and between zero and five different plasmid replicons within each of 49 S. Typhimurium draft genomes tested. The pMLST Web tool was able to subtype genomic sequencing data of plasmids, revealing both known plasmid sequence types (STs) and new alleles and ST variants. In conclusion, testing of the two Web tools using both fully assembled plasmid sequences and WGS-generated draft genomes showed them to be able to detect a broad variety of plasmids that are often associated with antimicrobial resistance in clinically relevant bacterial pathogens.
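The 80% nucleotide identity criterion mentioned above can be illustrated with a toy sketch. This is not how PlasmidFinder is implemented; the sketch assumes sequences that are already aligned and of equal length, and the function names, replicon names, and sequences are hypothetical.

```python
from typing import Dict, List


def percent_identity(query: str, reference: str) -> float:
    """Percentage of identical positions between two pre-aligned,
    equal-length nucleotide sequences (a simplifying assumption)."""
    if not reference or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def matching_replicons(
    query: str,
    database: Dict[str, str],
    threshold: float = 80.0,
) -> List[str]:
    """Names of database entries whose identity to the query meets the threshold."""
    return sorted(
        name
        for name, sequence in database.items()
        if len(sequence) == len(query)
        and percent_identity(query, sequence) >= threshold
    )


# Hypothetical toy database; real replicon sequences are far longer.
TOY_DATABASE = {
    "rep_example_1": "ACGTACGTAC",
    "rep_example_2": "TTTTTTTTTT",
}

print(matching_replicons("ACGTACGTTT", TOY_DATABASE))
```

In practice an alignment tool would handle gaps and partial coverage, which this sketch deliberately leaves out.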
Project description:<i>Shigella</i> and enteroinvasive <i>Escherichia coli</i> (EIEC) cause human bacillary dysentery with similar invasion mechanisms and share similar physiological, biochemical and genetic characteristics. Differentiation of <i>Shigella</i> from EIEC is important for clinical diagnostic and epidemiological investigations. However, phylogenetically, <i>Shigella</i> and EIEC strains are composed of multiple clusters and are different forms of <i>E. coli</i>, making it difficult to find genetic markers to discriminate between <i>Shigella</i> and EIEC. In this study, we identified 10 <i>Shigella</i> clusters, seven EIEC clusters and 53 sporadic types of EIEC by examining over 17000 publicly available <i>Shigella</i> and EIEC genomes. We compared <i>Shigella</i> and EIEC accessory genomes to identify cluster-specific gene markers for the 17 clusters and 53 sporadic types. The cluster-specific gene markers showed 99.64% accuracy and more than 97.02% specificity. In addition, we developed a freely available <i>in silico</i> serotyping pipeline named <i>Shigella</i> EIEC Cluster Enhanced Serotype Finder (ShigEiFinder) by incorporating the cluster-specific gene markers and established <i>Shigella</i> and EIEC serotype-specific O antigen genes and modification genes into typing. ShigEiFinder can process either paired-end Illumina sequencing reads or assembled genomes and almost perfectly differentiated <i>Shigella</i> from EIEC with 99.70 and 99.74% cluster assignment accuracy for the assembled genomes and read mapping respectively. ShigEiFinder was able to serotype over 59 <i>Shigella</i> serotypes and 22 EIEC serotypes and provided a high specificity of 99.40% for assembled genomes and 99.38% for read mapping for serotyping. 
The cluster-specific gene markers and our new serotyping tool, ShigEiFinder (installable package: https://github.com/LanLab/ShigEiFinder, online tool: https://mgtdb.unsw.edu.au/ShigEiFinder/), will be useful for epidemiological and diagnostic investigations.
Project description:Salmonella serotyping remains the gold-standard tool for the classification of Salmonella isolates and forms the basis of Canada's national surveillance program for this priority foodborne pathogen. Public health officials have been increasingly looking toward whole genome sequencing (WGS) to provide a large set of data from which all the relevant information about an isolate can be mined. However, rigorous validation and careful consideration of potential implications in the replacement of traditional surveillance methodologies with WGS data analysis tools are needed. Two in silico tools for Salmonella serotyping have been developed, the Salmonella in silico Typing Resource (SISTR) and SeqSero, while seven-gene MLST for serovar prediction can be adapted for in silico analysis. All three analysis methods were assessed and compared to traditional serotyping techniques using a set of 813 verified clinical and laboratory isolates, including 492 Canadian clinical isolates and 321 isolates of human and non-human sources. Successful results were obtained for 94.8, 88.2, and 88.3% of the isolates tested using SISTR, SeqSero, and MLST, respectively, indicating that all would be suitable for maintaining the historical records, surveillance systems, and communication structures currently in place. The choice of platform will ultimately depend on the user's needs. Results also pointed to the need to reframe serotyping in the genomic era as a test to understand the genes that are carried by an isolate, one which is not necessarily congruent with what is antigenically expressed. The adoption of WGS for serotyping will provide the simultaneous collection of information that can be used by multiple programs within the current surveillance paradigm; however, this does not negate the importance of the various programs or the role of serotyping going forward.