A Core Genome Multilocus Sequence Typing Scheme for Enterococcus faecalis.
ABSTRACT: Among enterococci, Enterococcus faecalis occurs ubiquitously, with the highest incidence of human and animal infections. The high genetic plasticity of E. faecalis complicates both molecular investigations and phylogenetic analyses. Whole-genome sequencing (WGS) enables unraveling of epidemiological linkages and putative transmission events between humans, animals, and food. Core genome multilocus sequence typing (cgMLST) aims to combine the discriminatory power of classical multilocus sequence typing (MLST) with the extensive genetic data obtained by WGS. By sequencing a representative collection of 146 E. faecalis strains isolated from hospital outbreaks, food, animals, and colonization of healthy human individuals, we established a novel cgMLST scheme with 1,972 gene targets within the Ridom SeqSphere+ software. To test the E. faecalis cgMLST scheme and assess the typing performance, different collections comprising environmental and bacteremia isolates, as well as all publicly available genome sequences from the NCBI and SRA databases, were analyzed. In more than 98.6% of the tested genomes, >95% good cgMLST target genes were detected (mean, 99.2% target genes). Our genotyping results not only corroborate the known epidemiological background of the isolates but exceed previous typing resolution. In conclusion, we have created a powerful typing scheme, hence providing an international standardized nomenclature that is suitable for surveillance approaches in various sectors, linking public health, veterinary public health, and food safety in a true One Health fashion.
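The two quality figures quoted above (the share of genomes with >95% good cgMLST targets, and the per-genome percentage of targets found) can be illustrated with a minimal sketch. The data layout is an assumption made for illustration: each genome is a list of allele calls, with `None` marking a target that was missing or failed quality checks. The function names are hypothetical and do not reflect how Ridom SeqSphere+ computes these values.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one allele call per cgMLST target; None marks a target
# that was not found or did not pass quality checks (not a "good" target).
Profile = List[Optional[int]]


def percent_good_targets(profile: Profile) -> float:
    """Percentage of cgMLST targets with a usable allele call."""
    if not profile:
        raise ValueError("profile has no targets")
    good = sum(1 for allele in profile if allele is not None)
    return 100.0 * good / len(profile)


def share_passing(profiles: Dict[str, Profile], cutoff: float = 95.0) -> float:
    """Percentage of genomes whose good-target percentage exceeds the cutoff."""
    if not profiles:
        raise ValueError("no genomes given")
    passing = sum(
        1 for profile in profiles.values()
        if percent_good_targets(profile) > cutoff
    )
    return 100.0 * passing / len(profiles)
```

The strict `>` comparison mirrors the ">95%" wording of the abstract; whether a genome at exactly 95% should pass is a choice the source does not specify.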
Project description:An inability to standardize the bioinformatic data produced by whole-genome sequencing (WGS) has been a barrier to its widespread use in tuberculosis phylogenetics. The aim of this study was to carry out a phylogenetic analysis of tuberculosis in Wales, United Kingdom, using Ridom SeqSphere software for core genome multilocus sequence typing (cgMLST) analysis of whole-genome sequencing data. The phylogenetics of tuberculosis in Wales have not previously been studied. Sixty-six Mycobacterium tuberculosis isolates (including 42 outbreak-associated isolates) from south Wales were sequenced using an Illumina platform. Isolates were assigned to principal genetic groups, single nucleotide polymorphism (SNP) cluster groups, lineages, and sublineages using SNP-calling protocols. WGS data were submitted to the Ridom SeqSphere software for cgMLST analysis and analyzed alongside 179 previously lineage-defined isolates. The data set was dominated by the Euro-American lineage, with the sublineage composition being dominated by T, X, and Haarlem family strains. The cgMLST analysis successfully assigned 58 isolates to major lineages, and the results were consistent with those obtained by traditional SNP mapping methods. In addition, the cgMLST scheme was used to resolve an outbreak of tuberculosis occurring in the region. This study supports the use of a cgMLST method for standardized phylogenetic assignment of tuberculosis isolates and for outbreak resolution and provides the first insight into Welsh tuberculosis phylogenetics, identifying the presence of the Haarlem sublineage commonly associated with virulent traits.
Project description:Traditional genotyping methods for infection control of antimicrobial-resistant bacteria in healthcare settings have been supplemented by whole-genome sequencing (WGS), often relying on a gene-based approach, e.g., core genome multilocus sequence typing (cgMLST), to cluster related samples. In this study, we compared clusters of methicillin-resistant <i>Staphylococcus aureus</i> (MRSA) and <i>Enterococcus faecium</i> analyzed with the commercial cgMLST software Ridom SeqSphere+ and with an open-source single-nucleotide polymorphism (SNP)-based phylogenetic analysis pipeline (PAPABAC). A total of 5,655 MRSA and 2,572 <i>E. faecium</i> patient isolates, collected between 2013 and 2018, were processed. Clusters of 1,844 MRSA and 1,355 <i>E. faecium</i> isolates were compared to cgMLST results, and epidemiological data were included when available. The phylogenies inferred by the two different technologies were highly concordant, and the MRSA SNP tree recaptured known hospital-related outbreaks and epidemiologically linked samples. Compared with Ridom SeqSphere+, PAPABAC has the advantage of generating stable, referable clusters without the need for sequence assembly, and it is a free-of-charge, open-source alternative to the commercial software.
Project description:Whole-genome sequencing (WGS)-based typing methods have emerged as promising and highly discriminative epidemiological tools. In this study, we combined gene-by-gene allele calling and core genome single nucleotide polymorphism (cgSNP) approaches to investigate the genetic relatedness of a well-characterized collection of OXA-48-producing Klebsiella pneumoniae isolates. We included isolates from the predominant sequence type ST405 (n = 31) OXA-48-producing K. pneumoniae clone and isolates from ST101 (n = 3), ST14 (n = 1), ST17 (n = 1), and ST1233 (n = 1), obtained from eight Catalan hospitals. Core-genome multilocus sequence typing (cgMLST) schemes from Institut Pasteur's BIGSdb-Kp (634 genes) and SeqSphere+ (2,365 genes), and a SeqSphere+ whole-genome MLST (wgMLST) scheme (4,891 genes) were used. Allele differences or allelic mismatches and the genetic distance, as the proportion of allele differences, were used to interpret the results from a gene-by-gene approach, whereas the number of SNPs was used for the cgSNP analysis. We observed between 0-10 and 0-14 allele differences among the predominant ST405 using cgMLST and wgMLST from SeqSphere+, respectively, and <2 allelic mismatches when using Institut Pasteur's BIGSdb-Kp cgMLST scheme. For ST101, we observed 14 and 54 allele differences when using cgMLST and wgMLST SeqSphere+, respectively, and 2-5 allelic mismatches for BIGSdb-Kp cgMLST. A low genetic distance (<0.0035, a previously established threshold for epidemiological link) was generally in concordance with a low number of allele differences (<8) when using the SeqSphere+ cgMLST scheme. The cgSNP analysis showed 6-29 SNPs in isolates with identical allelic SeqSphere+ cgMLST profiles and 16-61 cgSNPs among ST405 isolates. 
Furthermore, comparison of WGS-based typing results with previously obtained MLST and pulsed-field gel electrophoresis (PFGE) data showed some differences, demonstrating the different molecular principles underlying these techniques. In conclusion, the use of the different WGS-based typing methods that were used to elucidate the genetic relatedness of clonal OXA-48-producing K. pneumoniae all led to the same conclusions. Furthermore, threshold parameters in WGS-based typing methods should be applied with caution and should be used in combination with clinical epidemiological data and population and species characteristics.
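The gene-by-gene measures described above (allele differences, and genetic distance as the proportion of allele differences) can be sketched as follows. This is a minimal illustration, not the implementation used by SeqSphere+ or BIGSdb-Kp: profiles are assumed to be equal-length lists of allele numbers, and targets missing in either isolate (`None`) are assumed to be skipped in the pairwise comparison.

```python
from typing import Optional, Sequence, Tuple

Profile = Sequence[Optional[int]]


def allele_comparison(a: Profile, b: Profile) -> Tuple[int, int]:
    """Return (allele differences, targets compared) for two profiles.

    Targets missing (None) in either profile are ignored; this pairwise
    handling of missing data is an assumption of the sketch.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same targets")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences, compared


def genetic_distance(a: Profile, b: Profile) -> float:
    """Proportion of compared targets that carry different alleles."""
    differences, compared = allele_comparison(a, b)
    if compared == 0:
        raise ValueError("no targets shared by both profiles")
    return differences / compared
```

As a worked figure under these assumptions, 8 allele differences over all 2,365 targets of the SeqSphere+ cgMLST scheme give a distance of 8/2,365 ≈ 0.0034, just below the 0.0035 threshold cited above.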
Project description:Whole-genome sequencing (WGS) allows for effective tracing of Mycobacterium tuberculosis complex (MTBC) (tuberculosis pathogens) transmission. However, it is difficult to standardize and, therefore, is not yet employed for interlaboratory prospective surveillance. To allow its widespread application, solutions for data standardization and storage in an easily expandable database are urgently needed. To address this question, we developed a core genome multilocus sequence typing (cgMLST) scheme for clinical MTBC isolates using the Ridom SeqSphere(+) software, which transfers the genome-wide single nucleotide polymorphism (SNP) diversity into an allele numbering system that is standardized, portable, and not computationally intensive. To test its performance, we performed WGS analysis of 26 isolates with identical IS6110 DNA fingerprints and spoligotyping patterns from a longitudinal outbreak in the federal state of Hamburg, Germany (notified between 2001 and 2010). The cgMLST approach (3,041 genes) discriminated the 26 strains with a resolution comparable to that of SNP-based WGS typing (one major cluster of 22 identical or closely related and four outlier isolates with at least 97 distinct SNPs or 63 allelic variants). Resulting tree topologies are highly congruent and grouped the isolates in both cases analogously. Our data show that SNP- and cgMLST-based WGS analyses facilitate high-resolution discrimination of longitudinal MTBC outbreaks. cgMLST allows for a meaningful epidemiological interpretation of the WGS genotyping data. It enables standardized WGS genotyping for epidemiological investigations, e.g., on the regional public health office level, and the creation of web-accessible databases for global TB surveillance with an integrated early warning system.
Project description:Staphylococcus aureus is a major bacterial pathogen causing a variety of diseases ranging from wound infections to severe bacteremia or intoxications. Besides host factors, the course and severity of disease is also widely dependent on the genotype of the bacterium. Whole-genome sequencing (WGS), followed by bioinformatic sequence analysis, is currently the most extensive genotyping method available. To identify clinically relevant staphylococcal virulence and resistance genes in WGS data, we developed an in silico typing scheme for the software SeqSphere+ (Ridom GmbH, Münster, Germany). The implemented target genes (n = 182) correspond to those queried by the Identibac S. aureus Genotyping DNA microarray (Alere Technologies, Jena, Germany). The in silico scheme was evaluated by comparing the typing results of microarray and of WGS for 154 human S. aureus isolates. A total of 96.8% (n = 27,119) of all typing results were equally identified with microarray and WGS (40.6% present and 56.2% absent). Discrepancies (3.2% in total) were caused by WGS errors (1.7%), microarray hybridization failures (1.3%), wrong prediction of ambiguous microarray results (0.1%), or unknown causes (0.1%). Superior to the microarray, WGS enabled the distinction of allelic variants, which may be essential for the prediction of bacterial virulence and resistance phenotypes. Multilocus sequence typing clonal complexes and staphylococcal cassette chromosome mec element types inferred from microarray hybridization patterns were equally determined by WGS. In conclusion, WGS may substitute array-based methods due to its universal methodology, open and expandable nature, and rapid parallel analysis capacity for different characteristics in once-generated sequences.
Project description:<i>Campylobacter jejuni</i> is the leading cause of bacterial gastroenteritis, which has motivated the monitoring of genetic profiles circulating in Luxembourg for 13 years. From our integrated surveillance using a genotyping strategy based on an extended MLST scheme including <i>gyrA</i> and <i>porA</i> markers, an unexpected endemic pattern was discovered in the temporal distribution of genotypes. We aimed to test the hypothesis that stable lineages occur by implementing whole genome sequencing (WGS) associated with comprehensive and internationally validated schemes. This pilot study assessed four WGS-based typing schemes to classify a panel of 108 strains previously identified as recurrent or sporadic profiles using this in-house typing system. The strain collection included four common lineages in human infection (N = 67) initially identified from recurrent combinations of ST-<i>gyrA</i>-<i>porA</i> alleles also detected in non-human samples: veterinary (N = 19), food (N = 20), and environmental (N = 2) sources. An additional set of 19 strains belonging to sporadic profiles completed the tested panel. All strains were sequenced using Illumina technologies, and stringent criteria for filtering sequencing data were applied to ensure robustness of the genomic comparison. Four typing schemes were applied to classify the strains: (i) the cgMLST SeqSphere+ scheme of 637 loci, (ii) the cgMLST Oxford scheme of 1,343 loci, (iii) the cgMLST INNUENDO scheme of 678 loci, and (iv) the wgMLST INNUENDO scheme of 2,795 loci. A high concordance between the typing schemes was determined by comparing the calculated adjusted Wallace coefficients. After quality control and analyses with these four typing schemes, 60 strains were confirmed as members of the four recurrent lineages regardless of the method used (N = 32, 12, 7, and 9, respectively). 
Our results indicate that, regardless of the typing scheme used, epidemic or endemic signals were detected as reflected by lineage B (ST2254-<i>gyrA</i>9-<i>porA</i>1) in 2014 or lineage A (ST19-<i>gyrA</i>8-<i>porA</i>7), respectively. These findings support the clonal expansion of stable genomes in the <i>Campylobacter</i> population exhibiting a multi-host profile and accounting for the majority of clinical strains isolated over a decade. Such recurring genotypes suggest persistence in reservoirs, sources, or the environment, emphasizing the need to investigate their survival strategy in greater depth.
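The concordance measure named above, the adjusted Wallace coefficient, can be sketched in a few lines. The sketch assumes the standard definition: the Wallace coefficient of scheme A with respect to scheme B is the fraction of strain pairs sharing a type in A that also share a type in B, and the adjustment subtracts the value expected if the two partitions were independent. The function name and input layout (two equal-length lists of type labels, one entry per strain) are illustrative assumptions, and no confidence intervals are computed.

```python
from collections import Counter
from typing import Hashable, Sequence


def _pairs(n: int) -> int:
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_wallace(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Wallace coefficient of partition a with respect to partition b."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same strains")
    n = len(a)
    if n < 2:
        raise ValueError("at least two strains are required")

    # Pairs placed together by a, by b, and by both schemes at once.
    pairs_a = sum(_pairs(c) for c in Counter(a).values())
    pairs_b = sum(_pairs(c) for c in Counter(b).values())
    pairs_both = sum(_pairs(c) for c in Counter(zip(a, b)).values())
    if pairs_a == 0:
        raise ValueError("partition a groups no two strains together")

    wallace = pairs_both / pairs_a
    # Expected Wallace under independence: 1 - Simpson's diversity of b.
    expected = pairs_b / _pairs(n)
    if expected == 1.0:
        raise ValueError("partition b places all strains in one type")
    return (wallace - expected) / (1.0 - expected)
```

The coefficient is directional: `adjusted_wallace(a, b)` and `adjusted_wallace(b, a)` generally differ, which is why comparisons between schemes are usually reported in both directions.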
Project description:Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also enables, due to the availability of the automatically curated cgMLST nomenclature, interlaboratory exchange of data that are crucial, especially for rapid responses during transsectorial outbreaks.
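Grouping isolates by an allele-difference threshold, as in the outbreak clustering described above, can be sketched with single-linkage clustering. The abstract does not state the linkage rule, so single linkage is an assumption of this sketch, as are the function names, the default threshold of 10 allele differences, and the pairwise skipping of missing targets (`None`).

```python
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]


def allele_differences(a: Profile, b: Profile) -> int:
    """Count targets with different alleles, skipping targets missing in either."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage_clusters(
    profiles: Dict[str, Profile], max_diff: int = 10
) -> List[List[str]]:
    """Group isolates linked by chains of pairs within max_diff allele differences."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if allele_differences(profiles[first], profiles[second]) <= max_diff:
                parent[find(first)] = find(second)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())
```

With single linkage, two isolates can share a cluster even when they differ from each other by more than the threshold, provided intermediate isolates connect them. A stricter rule would need a different algorithm.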
Project description:In Austria, all laboratories are legally obligated to forward human and food/environmental L. monocytogenes isolates to the National Reference Laboratory/Center (NRL) for Listeria. Two invasive human isolates of L. monocytogenes serotype 1/2a of the same pulsed-field gel electrophoresis (PFGE) pattern, previously unknown in Austria, were cultured for the first time in January 2016. Five further human isolates, obtained from patients with invasive listeriosis between April 2016 and September 2017, showed this PFGE pattern. In Austria, the NRL started to use whole-genome sequencing (WGS)-based typing in 2016, using a core genome MLST (cgMLST) scheme developed by Ruppitsch et al. (2015), which contains 1,701 target genes. Sequence data are submitted to a publicly available nomenclature server (Ridom GmbH, Münster, Germany) for allocation of the core genome complex type (CT). The seven invasive human isolates differed from each other by zero to two alleles and were allocated to CT1234 (declared as outbreak strain). Among the Austrian strain collection of about 6,000 cgMLST-characterized non-human isolates (i.e., food/environmental isolates), 90 isolates shared CT1234. Out of these, 83 isolates were traced back to one meat-processing company. They differed from the outbreak strain by up to seven alleles; one isolate originated from the company's industrial slicer. The remaining seven CT1234 isolates were obtained from food products of four other companies (five fish products, one ready-to-eat dumpling, and one deer meat product) and differed from the outbreak strain by six to eleven alleles. The outbreak described shows the considerable potential of WGS to identify the source of a listeriosis outbreak. Compared to PFGE analysis, WGS-based typing has higher discriminatory power, yields better data accuracy, and allows higher laboratory throughput at lower cost. Utilization of WGS-based typing results of human and food/environmental L. monocytogenes isolates by appropriate public health analysts and epidemiologists is indispensable to support a successful outbreak investigation.
Project description:Burkholderia pseudomallei causes the severe disease melioidosis. Whole-genome sequencing (WGS)-based typing methods currently offer the highest resolution for molecular investigations of this genetically diverse pathogen. Still, its routine application in diagnostic laboratories is limited by the need for high computing power, bioinformatic skills, and variable bioinformatic approaches, with the latter affecting the results. We therefore aimed to establish and validate a WGS-based core genome multilocus sequence typing (cgMLST) scheme, applicable in routine diagnostic settings. A soft defined core genome was obtained by challenging the B. pseudomallei reference genome K96243 with 469 environmental and clinical genomes, resulting in 4,221 core and 1,351 accessory targets. The scheme was validated with 320 WGS data sets. We compared our novel typing scheme with single nucleotide polymorphism-based approaches investigating closely and distantly related strains. Finally, we applied our scheme for tracking the environmental source of a recent infection. The validation of the scheme detected >95% good cgMLST target genes in 98.4% of the genomes. Comparison with existing typing methods revealed very good concordance. Our scheme proved to be applicable to investigating not only closely related strains but also the global B. pseudomallei population structure. We successfully utilized our scheme to identify a sugarcane field as the presumable source of a recent melioidosis case. In summary, we developed a robust cgMLST scheme that integrates high resolution, maximized standardization, and fast analysis for the nonbioinformatician. Our typing scheme has the potential to serve as a routinely applicable classification system in B. pseudomallei molecular epidemiology.
Project description:Due to the potential of enterohemorrhagic <i>Escherichia coli</i> (EHEC) serogroup O157 to cause large foodborne outbreaks, national and international surveillance is necessary. For developing an effective method of molecular surveillance, a conventional method, multilocus variable-number tandem-repeat analysis (MLVA), and whole-genome sequencing (WGS) analysis were compared. WGS data of 369 isolates of EHEC O157 belonging to 7 major MLVA types and their relatives were subjected to comprehensive <i>in silico</i> typing, core genome single nucleotide polymorphism (cgSNP), and core genome multilocus sequence typing (cgMLST) analyses. The typing resolution was the highest in cgSNP analysis. However, determination of the sequence of the mismatch repair protein gene <i>mutS</i> is necessary because spontaneous deletion of the gene could lead to a hypermutator phenotype. MLVA had sufficient typing resolution for a short-term outbreak investigation and had advantages in rapidity and high throughput. cgMLST showed less typing resolution than cgSNP, but it is less time-consuming and does not require as much computer power. Therefore, cgMLST is suitable for comparisons using large data sets (e.g., international comparison using public databases). In conclusion, screening using MLVA followed by cgMLST and cgSNP analyses would provide the highest typing resolution and improve the accuracy and cost-effectiveness of EHEC O157 surveillance. <b>IMPORTANCE</b> Intensive surveillance for enterohemorrhagic <i>Escherichia coli</i> (EHEC) serogroup O157 is important to detect outbreaks and to prevent the spread of the bacterium. Recent advances in sequencing technology have made molecular surveillance using whole-genome sequencing (WGS) realistic. To develop rapid, high-throughput, and cost-effective typing methods for real-time surveillance, the typing resolution of WGS and of a conventional typing method, multilocus variable-number tandem-repeat analysis (MLVA), was evaluated. 
Nation-level systematic comparison of MLVA, core genome single nucleotide polymorphism (cgSNP), and core genome multilocus sequence typing (cgMLST) indicated that a combination of WGS and MLVA is a realistic approach to improve EHEC O157 surveillance.