Using whole genome sequencing to study American foulbrood epidemiology in honeybees.
ABSTRACT: American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies, causing major economic losses in apiculture. The pathogen is endemic to honeybees worldwide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently relies largely on placing isolates in one of four ERIC genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB epidemiology, we applied core genome MLST (cgMLST) to isolates from a recent outbreak of AFB in Sweden. The high resolution of cgMLST allowed the different bacterial clones involved in the outbreak to be identified and the source of infection to be traced. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB outbreaks.
Project description:Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism (SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks.
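The allele numbering idea described above can be illustrated with a minimal sketch. It assumes each isolate is reduced to a list of allele numbers, one per cgMLST target gene, and that the distance between two isolates is the count of loci whose called alleles differ. The function name, the toy profiles, and the choice to ignore loci with a missing call are illustrative assumptions, not details taken from the E. faecium scheme.

```python
from typing import Optional, Sequence


def allelic_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> int:
    """Count loci where both profiles have a called allele and the alleles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


# Hypothetical allele numbers at five loci; None marks a missing allele call.
isolate_a = [1, 4, 2, 7, 3]
isolate_b = [1, 4, 5, 7, None]
isolate_c = [2, 9, 5, 1, 3]

print(allelic_distance(isolate_a, isolate_b))
print(allelic_distance(isolate_a, isolate_c))
```

Comparing integers per locus, rather than aligning nucleotides, is what makes this kind of profile comparison cheap and portable between laboratories.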
Project description:Staphylococcus aureus is a leading cause of bacteremia in hospitalized patients. Whether or not S. aureus bacteremia (SAB) is associated with clonality, implicating potential nosocomial transmission, has not, however, been investigated. Herein, we examined the epidemiology of SAB using whole genome sequencing (WGS). 152 SAB isolates collected over the course of 2015 at a single large Minnesota medical center were studied. Staphylococcus protein A (spa) typing was performed by PCR/Sanger sequencing; multilocus sequence typing (MLST) and core genome MLST (cgMLST) were determined by WGS. Forty-eight isolates (32%) were methicillin-resistant S. aureus (MRSA). The isolates encompassed 66 spa types, clustered into 11 spa clonal complexes (CCs) and 10 singleton types. 88% of 48 MRSA isolates belonged to spa CC-002 or -008. Methicillin-susceptible S. aureus (MSSA) isolates were more genotypically diverse, with 61% distributed across four spa CCs (CC-002, CC-012, CC-008 and CC-084). By MLST, there were 31 sequence types (STs), including 18 divided into 6 CCs and 13 singleton STs. Amongst MSSA isolates, the common MLST clones were CC5 (23%), CC30 (19%), CC8 (15%) and CC15 (11%). Common MRSA clones were CC5 (67%) and CC8 (25%); there were no MRSA isolates in CC45 or CC30. By cgMLST analysis, there were 9 allelic differences between two isolates, with the remaining 150 isolates differing from each other by over 40 alleles. The two isolates were retroactively epidemiologically linked by medical record review. Overall, cgMLST analysis resulted in higher resolution epidemiological typing than did multilocus sequence or spa typing.
Project description:American foulbrood is the most destructive brood disease of honeybees (Apis mellifera) globally. The absence of a repeatable, universal typing scheme for the causative bacterium Paenibacillus larvae has restricted our understanding of disease epidemiology. We have created the first multilocus sequence typing scheme (MLST) for P. larvae, which largely confirms the previous enterobacterial repetitive intergenic consensus (ERIC)-polymerase chain reaction (PCR)-based typing scheme's divisions while providing added resolution and improved repeatability. We have used the new scheme to determine the distribution and biogeography of 294 samples of P. larvae from across six continents. We found that of the two most epidemiologically important ERIC types, ERIC I was more diverse than ERIC II. Analysis of the fixation index (FST) by distance showed a significant relationship between genetic and geographic distance, suggesting that population structure exists in populations of P. larvae. Interestingly, this effect was only observed within the native range of the host and was absent in areas where international trade has moved honeybees and their disease. Correspondence analysis demonstrated similar sequence type (ST) distributions between native and non-native countries and that ERIC I and II STs mainly have differing distributions. The new typing scheme facilitates epidemiological study of this costly disease of a key pollinator.
Project description:Human campylobacteriosis, caused by Campylobacter jejuni and C. coli, remains a leading cause of bacterial gastroenteritis in many countries, but the epidemiology of campylobacteriosis outbreaks remains poorly defined, largely due to limitations in the resolution and comparability of isolate characterization methods. Whole-genome sequencing (WGS) data enable the improvement of sequence-based typing approaches, such as multilocus sequence typing (MLST), by substantially increasing the number of loci examined. A core genome MLST (cgMLST) scheme defines a comprehensive set of those loci present in most members of a bacterial group, balancing very high resolution with comparability across the diversity of the group. Here we propose a set of 1,343 loci as a human campylobacteriosis cgMLST scheme (v1.0), the allelic profiles of which can be assigned to core genome sequence types. The 1,343 loci chosen were a subset of the 1,643 loci identified in the reannotation of the genome sequence of C. jejuni isolate NCTC 11168, chosen as being present in >95% of draft genomes of 2,472 representative United Kingdom campylobacteriosis isolates, comprising 2,207 (89.3%) C. jejuni isolates and 265 (10.7%) C. coli isolates. Validation of the cgMLST scheme was undertaken with 1,478 further high-quality draft genomes, containing 150 or fewer contiguous sequences, from disease isolate collections: 99.5% of these isolates contained ≥95% of the 1,343 cgMLST loci. In addition to the rapid and effective high-resolution analysis of large numbers of diverse isolates, the cgMLST scheme enabled the efficient identification of very closely related isolates from a well-defined single-source campylobacteriosis outbreak.
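The locus selection rule described above (keep loci present in more than 95% of the draft genomes) can be sketched as a simple presence filter. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the input layout (genome identifier mapped to the set of loci detected in it), and the toy locus names are all hypothetical.

```python
from typing import Dict, List, Set


def select_core_loci(genome_loci: Dict[str, Set[str]], min_fraction: float = 0.95) -> List[str]:
    """Return loci found in strictly more than min_fraction of the genomes."""
    n_genomes = len(genome_loci)
    if n_genomes == 0:
        return []
    counts: Dict[str, int] = {}
    for loci in genome_loci.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    return sorted(l for l, c in counts.items() if c / n_genomes > min_fraction)


# Toy data: 100 hypothetical genomes.
# locus_1 is in all of them, locus_2 in 97, locus_3 in 90.
genomes = {}
for i in range(100):
    loci = {"locus_1"}
    if i < 97:
        loci.add("locus_2")
    if i < 90:
        loci.add("locus_3")
    genomes[f"genome_{i}"] = loci

print(select_core_loci(genomes))
```

In this toy set only locus_1 and locus_2 clear the threshold; locus_3, at 90%, would be treated as accessory rather than core.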
Project description:Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. 
It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage.
Project description:For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
Project description:Among enterococci, Enterococcus faecalis occurs ubiquitously, with the highest incidence of human and animal infections. The high genetic plasticity of E. faecalis complicates both molecular investigations and phylogenetic analyses. Whole-genome sequencing (WGS) enables unraveling of epidemiological linkages and putative transmission events between humans, animals, and food. Core genome multilocus sequence typing (cgMLST) aims to combine the discriminatory power of classical multilocus sequence typing (MLST) with the extensive genetic data obtained by WGS. By sequencing a representative collection of 146 E. faecalis strains isolated from hospital outbreaks, food, animals, and colonization of healthy human individuals, we established a novel cgMLST scheme with 1,972 gene targets within the Ridom SeqSphere+ software. To test the E. faecalis cgMLST scheme and assess the typing performance, different collections comprising environmental and bacteremia isolates, as well as all publicly available genome sequences from the NCBI and SRA databases, were analyzed. In more than 98.6% of the tested genomes, >95% good cgMLST target genes were detected (mean, 99.2% target genes). Our genotyping results not only corroborate the known epidemiological background of the isolates but exceed previous typing resolution. In conclusion, we have created a powerful typing scheme, hence providing an international standardized nomenclature that is suitable for surveillance approaches in various sectors, linking public health, veterinary public health, and food safety in a true One Health fashion.
Project description:Today, next-generation whole-genome sequencing (WGS) is increasingly used to determine the genetic relationships of bacteria on a nearly whole-genome level for infection control purposes and molecular surveillance. Here, we conducted a multicenter ring trial comprising five laboratories to determine the reproducibility and accuracy of WGS-based typing. The participating laboratories sequenced 20 blind-coded Staphylococcus aureus DNA samples using 250-bp paired-end chemistry for library preparation in a single sequencing run on an Illumina MiSeq sequencer. The run acceptance criteria were sequencing outputs >5.6 Gb and Q30 read quality scores of >75%. Subsequently, spa typing, multilocus sequence typing (MLST), ribosomal MLST, and core genome MLST (cgMLST) were performed by the participants. Moreover, discrepancies in cgMLST target sequences in comparisons with the included and also published sequence of the quality control strain ATCC 25923 were resolved using Sanger sequencing. All five laboratories fulfilled the run acceptance criteria in a single sequencing run without any repetition. Of the 400 total possible typing results, 394 of the reported spa types, sequence types (STs), ribosomal STs (rSTs), and cgMLST cluster types were correct and identical among all laboratories; only six typing results were missing. An analysis of cgMLST allelic profiles corroborated this high reproducibility; only 3 of 183,927 (0.0016%) cgMLST allele calls were wrong. Sanger sequencing confirmed all 12 discrepancies of the ring trial results in comparison with the published sequence of ATCC 25923. In summary, this ring trial demonstrated the high reproducibility and accuracy of current next-generation sequencing-based bacterial typing for molecular surveillance when done with nearly completely locked-down methods.
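The reproducibility figures quoted above can be checked with a few lines of arithmetic. The counts come from the abstract. The breakdown of the 400 possible typing results as 5 laboratories × 20 samples × 4 typing methods is an assumed reading of the text, and the helper name `percent` is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100 * part / whole


# Assumed breakdown of the 400 possible typing results:
# 5 laboratories x 20 samples x 4 methods (spa, MLST, rMLST, cgMLST).
possible_results = 5 * 20 * 4

# Share of cgMLST allele calls reported as wrong (3 of 183,927).
allele_error_rate = percent(3, 183_927)

# Share of typing results that were correct and identical (394 of 400).
typing_concordance = percent(394, possible_results)

print(possible_results)
print(f"{allele_error_rate:.4f}%")
print(f"{typing_concordance:.1f}%")
```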
Project description:Many listeriosis outbreaks are caused by a few globally distributed clonal groups, designated clonal complexes or epidemic clones, of Listeria monocytogenes, several of which have been defined by classic multilocus sequence typing (MLST) schemes targeting 6 to 8 housekeeping or virulence genes. We have developed and evaluated core genome MLST (cgMLST) schemes and applied them to isolates from multiple clonal groups, including those associated with 39 listeriosis outbreaks. The cgMLST clusters were congruent with MLST-defined clonal groups, which had various degrees of diversity at the whole-genome level. Notably, cgMLST could distinguish among outbreak strains and epidemiologically unrelated strains of the same clonal group, which could not be achieved using classic MLST schemes. The precise selection of cgMLST gene targets may not be critical for the general identification of clonal groups and outbreak strains. cgMLST analyses further identified outbreak strains, including those associated with recent outbreaks linked to contaminated French-style cheese, Hispanic-style cheese, stone fruit, caramel apple, ice cream, and packaged leafy green salad, as belonging to major clonal groups. We further developed lineage-specific cgMLST schemes, which can include accessory genes when core genomes do not possess sufficient diversity, and this provided additional resolution over species-specific cgMLST. Analyses of isolates from different common-source listeriosis outbreaks revealed various degrees of diversity, indicating that the numbers of allelic differences should always be combined with cgMLST clustering and epidemiological evidence to define a listeriosis outbreak. Classic multilocus sequence typing (MLST) schemes targeting internal fragments of 6 to 8 genes that define clonal complexes or epidemic clones have been widely employed to study L. monocytogenes biodiversity and its relation to pathogenicity potential and epidemiology. 
We demonstrated that core genome MLST schemes can be used for the simultaneous identification of clonal groups and the differentiation of individual outbreak strains and epidemiologically unrelated strains of the same clonal group. We further developed lineage-specific cgMLST schemes that targeted more genomic regions than the species-specific cgMLST schemes. Our data revealed the genome-level diversity of clonal groups defined by classic MLST schemes. Our identification of U.S. and international outbreaks caused by major clonal groups can contribute to further understanding of the global epidemiology of L. monocytogenes.
Project description:The use of whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies demonstrated the usefulness of WGS data analysis through single-nucleotide polymorphism (SNP) calling from a reference sequence analysis for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set was comprised of 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.
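A cluster inclusion criterion of this kind (a maximum number of differing cgMLST loci) can be sketched as threshold-based single-linkage grouping. The abstract does not say which clustering procedure was used, so single linkage is an assumption here, as are the function names and the toy ten-locus profiles. Only the default cutoff of 6 loci reflects the criterion stated above.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def allelic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_by_threshold(profiles: Dict[str, Sequence[int]], max_diff: int = 6) -> List[List[str]]:
    """Group isolates linked by chains of pairs differing at <= max_diff loci."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical ten-locus profiles (a real scheme would have thousands of targets).
profiles = {
    "iso1": [1] * 10,
    "iso2": [1] * 7 + [2] * 3,   # differs from iso1 at 3 loci
    "iso3": [1] * 2 + [3] * 8,   # differs from iso1 and iso2 at 8 loci
}

print(cluster_by_threshold(profiles))
```

With the cutoff of 6, iso1 and iso2 fall into one cluster while iso3 stays on its own. Single linkage can chain distantly related isolates together, which is one reason to read such clusters alongside epidemiological evidence.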