High Interlaboratory Reproducibility and Accuracy of Next-Generation-Sequencing-Based Bacterial Genotyping in a Ring Trial.
ABSTRACT: Today, next-generation whole-genome sequencing (WGS) is increasingly used to determine the genetic relationships of bacteria on a nearly whole-genome level for infection control purposes and molecular surveillance. Here, we conducted a multicenter ring trial comprising five laboratories to determine the reproducibility and accuracy of WGS-based typing. The participating laboratories sequenced 20 blind-coded Staphylococcus aureus DNA samples using 250-bp paired-end chemistry for library preparation in a single sequencing run on an Illumina MiSeq sequencer. The run acceptance criteria were sequencing outputs >5.6 Gb and Q30 read quality scores of >75%. Subsequently, spa typing, multilocus sequence typing (MLST), ribosomal MLST, and core genome MLST (cgMLST) were performed by the participants. Moreover, discrepancies in cgMLST target sequences in comparisons with the included and also published sequence of the quality control strain ATCC 25923 were resolved using Sanger sequencing. All five laboratories fulfilled the run acceptance criteria in a single sequencing run without any repetition. Of the 400 total possible typing results, 394 of the reported spa types, sequence types (STs), ribosomal STs (rSTs), and cgMLST cluster types were correct and identical among all laboratories; only six typing results were missing. An analysis of cgMLST allelic profiles corroborated this high reproducibility; only 3 of 183,927 (0.0016%) cgMLST allele calls were wrong. Sanger sequencing confirmed all 12 discrepancies of the ring trial results in comparison with the published sequence of ATCC 25923. In summary, this ring trial demonstrated the high reproducibility and accuracy of current next-generation sequencing-based bacterial typing for molecular surveillance when done with nearly completely locked-down methods.
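The two headline rates in this abstract follow directly from the counts it reports. The short sketch below recomputes them; the helper name `percent` is illustrative and not part of any tool used in the ring trial.

```python
def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    return 100.0 * part / total

# Counts taken from the abstract above.
wrong_calls, total_calls = 3, 183_927      # erroneous cgMLST allele calls
reported, possible = 394, 400              # typing results reported vs. possible

allele_error_pct = percent(wrong_calls, total_calls)
typing_reported_pct = percent(reported, possible)

print(f"cgMLST allele-call error rate: {allele_error_pct:.4f}%")
print(f"Typing results reported and concordant: {typing_reported_pct:.1f}%")
```

The first figure rounds to the 0.0016% quoted in the abstract; the second shows that the six missing results correspond to 1.5% of the 400 possible results.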
Project description:Staphylococcus aureus is a leading cause of bacteremia in hospitalized patients. Whether or not S. aureus bacteremia (SAB) is associated with clonality, implicating potential nosocomial transmission, has not, however, been investigated. Herein, we examined the epidemiology of SAB using whole genome sequencing (WGS). 152 SAB isolates collected over the course of 2015 at a single large Minnesota medical center were studied. Staphylococcus protein A (spa) typing was performed by PCR/Sanger sequencing; multilocus sequence typing (MLST) and core genome MLST (cgMLST) were determined by WGS. Forty-eight isolates (32%) were methicillin-resistant S. aureus (MRSA). The isolates encompassed 66 spa types, clustered into 11 spa clonal complexes (CCs) and 10 singleton types. 88% of 48 MRSA isolates belonged to spa CC-002 or -008. Methicillin-susceptible S. aureus (MSSA) isolates were more genotypically diverse, with 61% distributed across four spa CCs (CC-002, CC-012, CC-008 and CC-084). By MLST, there were 31 sequence types (STs), including 18 divided into 6 CCs and 13 singleton STs. Amongst MSSA isolates, the common MLST clones were CC5 (23%), CC30 (19%), CC8 (15%) and CC15 (11%). Common MRSA clones were CC5 (67%) and CC8 (25%); there were no MRSA isolates in CC45 or CC30. By cgMLST analysis, there were 9 allelic differences between two isolates, with the remaining 150 isolates differing from each other by over 40 alleles. The two isolates were retroactively epidemiologically linked by medical record review. Overall, cgMLST analysis resulted in higher resolution epidemiological typing than did multilocus sequence or spa typing.
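The cgMLST comparison described here comes down to counting, for each pair of isolates, the loci at which the allele numbers differ. The following is a minimal sketch of that pairwise count, assuming profiles are stored as lists of allele numbers and that loci with a missing call in either isolate are ignored; the function names and toy profiles are hypothetical and are not data from the study.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

# One allele number per locus; None marks a missing allele call.
Profile = Sequence[Optional[int]]

def allele_differences(a: Profile, b: Profile) -> int:
    """Count loci where both isolates have a call and the allele numbers differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )

def closest_pair(profiles: Dict[str, Profile]) -> Tuple[str, str, int]:
    """Return the pair of isolates with the fewest allele differences."""
    return min(
        (
            (i, j, allele_differences(profiles[i], profiles[j]))
            for i, j in combinations(sorted(profiles), 2)
        ),
        key=lambda t: t[2],
    )

# Toy profiles over six hypothetical loci.
profiles = {
    "iso_A": [1, 4, 2, 7, 3, 1],
    "iso_B": [1, 4, 2, 7, 5, None],
    "iso_C": [2, 9, 6, 8, 3, 4],
}
print(closest_pair(profiles))
```

In the toy data, `iso_A` and `iso_B` differ at a single comparable locus while every other pair differs at five, mirroring on a small scale how one closely related pair stands out from otherwise distant isolates.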
Project description:For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
Project description:Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism (SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks.
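The idea of an allele numbering system can be illustrated with a toy example: every distinct sequence observed at a target gene receives the next integer for that locus, so a genome is reduced to a list of integers. The sketch below is an assumption-laden simplification (the class `AlleleNomenclature`, the locus names, and the use of 0 for a missing target are all hypothetical); real cgMLST schemes rely on curated allele databases with additional quality checks.

```python
from collections import defaultdict
from typing import Dict, List

class AlleleNomenclature:
    """Toy allele database: each new sequence at a locus gets the next integer."""

    def __init__(self) -> None:
        # locus name -> {sequence -> allele number}
        self._alleles: Dict[str, Dict[str, int]] = defaultdict(dict)

    def allele_number(self, locus: str, sequence: str) -> int:
        """Return the allele number for a sequence, registering it if unseen."""
        known = self._alleles[locus]
        seq = sequence.upper()
        if seq not in known:
            known[seq] = len(known) + 1  # numbering starts at 1 for each locus
        return known[seq]

    def profile(self, genome: Dict[str, str], loci: List[str]) -> List[int]:
        """Translate per-locus sequences into an allele profile; 0 = locus not found."""
        return [
            self.allele_number(locus, genome[locus]) if locus in genome else 0
            for locus in loci
        ]

loci = ["locus_1", "locus_2"]
db = AlleleNomenclature()
p1 = db.profile({"locus_1": "ATGAAA", "locus_2": "ATGCCC"}, loci)
p2 = db.profile({"locus_1": "ATGAAG", "locus_2": "ATGCCC"}, loci)  # one SNP in locus_1
p3 = db.profile({"locus_1": "ATGAAA"}, loci)                        # locus_2 missing
print(p1, p2, p3)
```

Because any change within a gene, whether one SNP or several, yields a single new allele number, comparing profiles reduces to comparing short integer lists, which is one reason the approach is computationally lighter than genome-wide SNP calling.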
Project description:We report the investigation of an outbreak situation of methicillin-resistant Staphylococcus aureus (MRSA) that occurred at the Academic Hospital Paramaribo (AZP) in the Republic of Suriname from April to May 2013. We performed whole genome sequencing with complete gap closure for chromosomes and plasmids on all isolates. The outbreak involved 12 patients and 1 healthcare worker/nurse at the AZP. In total 24 isolates were investigated. spa typing, genome-wide single nucleotide polymorphism (SNP) analysis, ad hoc whole genome multilocus sequence typing (wgMLST), stable core genome MLST (cgMLST) and in silico PFGE were used to determine phylogenetic relatedness and to identify transmission. Whole-genome sequencing (WGS) showed that all isolates were members of genomic variants of the North American USA300 clone. However, WGS revealed a heterogeneous population structure of USA300 circulating at the AZP. We observed up to 8 SNPs or up to 5 alleles of difference by wgMLST when the isolates were recovered from different body sites of the same patient or if direct transmission between patients was most likely. This work describes the usefulness of complete genome sequencing of bacterial chromosomes and plasmids, which provides an unprecedented level of detail during outbreak investigations that is not visible with conventional typing methods.
Project description:Whole-genome sequencing (WGS)-based typing methods have emerged as promising and highly discriminative epidemiological tools. In this study, we combined gene-by-gene allele calling and core genome single nucleotide polymorphism (cgSNP) approaches to investigate the genetic relatedness of a well-characterized collection of OXA-48-producing Klebsiella pneumoniae isolates. We included isolates from the predominant sequence type ST405 (n = 31) OXA-48-producing K. pneumoniae clone and isolates from ST101 (n = 3), ST14 (n = 1), ST17 (n = 1), and ST1233 (n = 1), obtained from eight Catalan hospitals. Core-genome multilocus sequence typing (cgMLST) schemes from Institut Pasteur's BIGSdb-Kp (634 genes) and SeqSphere+ (2,365 genes), and a SeqSphere+ whole-genome MLST (wgMLST) scheme (4,891 genes) were used. Allele differences or allelic mismatches and the genetic distance, as the proportion of allele differences, were used to interpret the results from a gene-by-gene approach, whereas the number of SNPs was used for the cgSNP analysis. We observed between 0-10 and 0-14 allele differences among the predominant ST405 using cgMLST and wgMLST from SeqSphere+, respectively, and <2 allelic mismatches when using Institut Pasteur's BIGSdb-Kp cgMLST scheme. For ST101, we observed 14 and 54 allele differences when using cgMLST and wgMLST SeqSphere+, respectively, and 2-5 allelic mismatches for BIGSdb-Kp cgMLST. A low genetic distance (<0.0035, a previously established threshold for epidemiological link) was generally in concordance with a low number of allele differences (<8) when using the SeqSphere+ cgMLST scheme. The cgSNP analysis showed 6-29 SNPs in isolates with identical allelic SeqSphere+ cgMLST profiles and 16-61 cgSNPs among ST405 isolates. 
Furthermore, comparison of WGS-based typing results with previously obtained MLST and pulsed-field gel electrophoresis (PFGE) data showed some differences, demonstrating the different molecular principles underlying these techniques. In conclusion, the use of the different WGS-based typing methods that were used to elucidate the genetic relatedness of clonal OXA-48-producing K. pneumoniae all led to the same conclusions. Furthermore, threshold parameters in WGS-based typing methods should be applied with caution and should be used in combination with clinical epidemiological data and population and species characteristics.
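The abstract defines genetic distance as the proportion of allele differences. A minimal sketch of that calculation follows, assuming the denominator is the number of loci called in both isolates (the abstract does not state the exact denominator) and using a made-up pair of profiles sized to the 2,365-gene SeqSphere+ scheme; `genetic_distance` and `EPI_LINK_THRESHOLD` are illustrative names.

```python
from typing import Optional, Sequence

def genetic_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Proportion of differing alleles among loci called in both profiles."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no loci called in both profiles")
    return sum(x != y for x, y in shared) / len(shared)

EPI_LINK_THRESHOLD = 0.0035  # threshold quoted in the abstract

# Hypothetical pair: 2,365 shared loci, 8 of which differ.
profile_a = [1] * 2365
profile_b = [2] * 8 + [1] * 2357

distance = genetic_distance(profile_a, profile_b)
print(f"distance = {distance:.5f}, below threshold: {distance < EPI_LINK_THRESHOLD}")
```

In this toy case, 8 differences across 2,365 loci give a distance of about 0.0034, just under the quoted threshold, which illustrates why a low genetic distance and a low allele-difference count tend to agree for a scheme of this size.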
Project description:Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. 
It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage.
Project description:Streptococcus mutans is one of the primary pathogens responsible for the development of dental caries. Recent whole-genome sequencing (WGS)-based core genome multilocus sequence typing (cgMLST) approaches have been employed in epidemiological studies of specific human pathogens. However, this approach has not been reported in studies of S. mutans. Here, we therefore developed a cgMLST scheme for S. mutans. We surveyed 199 available S. mutans genomes as a means of identifying cgMLST targets, developing a scheme that incorporated 594 targets from the S. mutans UA159 reference genome. Sixty-eight sequence types (STs) were identified in this cgMLST scheme (cgSTs) in 80 S. mutans isolates from 40 children that were sequenced in this study, compared to 35 STs identified by multilocus sequence typing (MLST). Fifty-six cgSTs (82.35%) were associated with a single isolate based on our cgMLST scheme, which is significantly higher than in the MLST scheme (11.43%). In addition, 58.06% of all MLST profiles with ≥2 isolates were further differentiated by our cgMLST scheme. Topological analyses of the maximum likelihood phylogenetic trees revealed that our cgMLST scheme was more reliable than the MLST scheme. A minimum spanning tree of 145 S. mutans isolates from 10 countries developed based upon the cgMLST scheme highlighted the diverse population structure of S. mutans. This cgMLST scheme thus offers a new molecular typing method suitable for evaluating the epidemiological distribution of this pathogen and has the potential to serve as a benchmark for future global studies of the epidemiological nature of dental caries. IMPORTANCE: Streptococcus mutans is regarded as a major pathogen responsible for the onset of dental caries. S. mutans can transmit among people, especially within families. In this study, we established a new epidemiological approach to S. mutans classification.
This approach can effectively differentiate among closely related isolates and offers superior reliability relative to that of the traditional MLST molecular typing method. As such, it has the potential to better support effective public health strategies centered around this bacterium that are aimed at preventing and treating dental caries.
Project description:Among enterococci, Enterococcus faecalis occurs ubiquitously, with the highest incidence of human and animal infections. The high genetic plasticity of E. faecalis complicates both molecular investigations and phylogenetic analyses. Whole-genome sequencing (WGS) enables unraveling of epidemiological linkages and putative transmission events between humans, animals, and food. Core genome multilocus sequence typing (cgMLST) aims to combine the discriminatory power of classical multilocus sequence typing (MLST) with the extensive genetic data obtained by WGS. By sequencing a representative collection of 146 E. faecalis strains isolated from hospital outbreaks, food, animals, and colonization of healthy human individuals, we established a novel cgMLST scheme with 1,972 gene targets within the Ridom SeqSphere+ software. To test the E. faecalis cgMLST scheme and assess the typing performance, different collections comprising environmental and bacteremia isolates, as well as all publicly available genome sequences from the NCBI and SRA databases, were analyzed. In more than 98.6% of the tested genomes, >95% good cgMLST target genes were detected (mean, 99.2% target genes). Our genotyping results not only corroborate the known epidemiological background of the isolates but exceed previous typing resolution. In conclusion, we have created a powerful typing scheme, hence providing an international standardized nomenclature that is suitable for surveillance approaches in various sectors, linking public health, veterinary public health, and food safety in a true One Health fashion.
Project description:American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies, causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently largely relies on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST) on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed the different bacterial clones involved in the disease outbreak to be identified and the source of infection to be traced. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB-outbreaks.
Project description:The use of whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies demonstrated the usefulness of WGS data analysis through single-nucleotide polymorphism (SNP) calling from a reference sequence analysis for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set was comprised of 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.
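A cluster-inclusion cutoff of this kind can be applied by linking any two isolates whose cgMLST profiles differ at no more than the threshold number of loci. The sketch below assumes a cutoff of at most 6 differing loci and uses single-linkage grouping, which is an assumption made here for illustration; the abstract gives only the threshold, not the clustering procedure. All names and profiles are hypothetical.

```python
from typing import Dict, List, Sequence

def allele_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci at which two complete allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def single_linkage_clusters(
    profiles: Dict[str, Sequence[int]], max_diff: int = 6
) -> List[List[str]]:
    """Group isolates connected by chains of pairs differing at <= max_diff loci."""
    names = sorted(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_distance(profiles[a], profiles[b]) <= max_diff:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

# Toy profiles over 12 hypothetical loci.
profiles = {
    "A": [1] * 12,
    "B": [2] * 5 + [1] * 7,            # 5 differences from A
    "C": [2] * 5 + [3] * 4 + [1] * 3,  # 4 from B, 9 from A
    "D": [9] * 12,                     # 12 from everything
}
print(single_linkage_clusters(profiles))
```

Note that with single linkage, `A` and `C` end up in the same cluster through `B` even though they differ from each other at 9 loci; whether such chaining is acceptable depends on the outbreak definition in use.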