Core Genome Multilocus Sequence Typing: a Standardized Approach for Molecular Typing of Mycoplasma gallisepticum.
ABSTRACT: Mycoplasma gallisepticum is the most virulent and economically important Mycoplasma species for poultry worldwide. Currently, M. gallisepticum strain differentiation based on sequence analysis of 5 loci remains insufficient for accurate outbreak investigation. Recently, whole-genome sequences (WGS) of many human and animal pathogens have been successfully used for microbial outbreak investigations. However, the massive sequence data and the diverse properties of different genes within bacterial genomes result in a lack of standard reproducible methods for comparisons among M. gallisepticum whole genomes. Here, we proposed the development of a core genome multilocus sequence typing (cgMLST) scheme for M. gallisepticum strains and field isolates. For development of this scheme, a diverse collection of 37 M. gallisepticum genomes was used to identify cgMLST targets. A total of 425 M. gallisepticum conserved genes (49.85% of M. gallisepticum genome) were selected as core genome targets. A total of 81 M. gallisepticum genomes from 5 countries on 4 continents were typed using M. gallisepticum cgMLST. Analyses of phylogenetic trees generated by cgMLST displayed a high degree of agreement with geographical and temporal information. Moreover, the high discriminatory power of cgMLST allowed differentiation between M. gallisepticum strains of the same outbreak. M. gallisepticum cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation among M. gallisepticum isolates. cgMLST provides stable and expandable nomenclature, allowing for comparison and sharing of typing results among laboratories worldwide. cgMLST offers an opportunity to harness the tremendous power of next-generation sequencing technology in applied avian mycoplasma epidemiology at both local and global levels.
Project description:At present, the most widely used methods for Klebsiella pneumoniae subtyping are multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE). However, the discriminatory power of MLST is insufficient for distinguishing outbreak from non-outbreak isolates, and PFGE is time-consuming and labor-intensive. A core genome multilocus sequence typing (cgMLST) scheme for whole-genome sequence-based typing of K. pneumoniae was developed to overcome the disadvantages of these traditional molecular subtyping methods. Firstly, we used the complete genome of K. pneumoniae strain HKUOPLC as the reference genome and 907 genomes of K. pneumoniae downloaded from the NCBI database as the original genome dataset to determine cgMLST target genes. A total of 1,143 genes were retained as cgMLST target genes. Secondly, we used 26 K. pneumoniae strains from a nosocomial infection outbreak to evaluate the cgMLST scheme. cgMLST enabled clustering of outbreak strains with <10 allele differences and unambiguous separation from unrelated outgroup strains. Moreover, cgMLST revealed that there may be several sub-clones of the epidemic ST11 clone. In conclusion, the novel cgMLST scheme not only showed higher discriminatory power compared with PFGE and MLST in outbreak investigations but also showed the ability to reveal more population structure characteristics than MLST.
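The clustering of outbreak strains below an allele-difference threshold, as described above, can be illustrated with a minimal sketch. It assumes that each isolate is represented as a mapping from locus name to allele number, that loci with missing calls are ignored, and that clusters are formed by single linkage. The isolate names, locus names, and toy profiles are hypothetical, and the linkage rule is an assumption rather than the method used in the study.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count shared loci where both alleles are called and they differ."""
    return sum(
        1
        for locus in profile_a.keys() & profile_b.keys()
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by chains of pairwise distances below threshold."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy profiles over 12 loci (real schemes use far more targets).
profiles = {
    "outbreak_1": {f"locus_{i}": 1 for i in range(12)},
    "outbreak_2": {**{f"locus_{i}": 1 for i in range(12)}, "locus_0": 2},
    "unrelated": {f"locus_{i}": i + 10 for i in range(12)},
}

print(single_linkage_clusters(profiles, threshold=10))
```

With these toy profiles, the two outbreak isolates differ at one locus and fall into the same cluster, while the unrelated isolate differs at all 12 loci and stays separate.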
Project description:Clostridium difficile, recently renamed Clostridioides difficile, is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, frequently a standardized methodology and typing nomenclature are missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile. Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping (n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyses of different cluster scenarios with cgMLST were concordant with published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative for the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange.
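The idea of defining cgMLST targets as genes present in all representative isolates, and then reporting the percentage of targets detected in a new genome, can be sketched as follows. This is a simplified illustration under stated assumptions: genes are matched by hypothetical identifiers, whereas real scheme definition relies on sequence comparison and additional filtering. The genome and gene names are invented for the example.

```python
def core_gene_targets(gene_sets):
    """Return the genes present in every representative genome."""
    genomes = iter(gene_sets.values())
    core = set(next(genomes))
    for genes in genomes:
        core &= set(genes)
    return core


def percent_targets_detected(core, query_genes):
    """Percentage of core targets that were found in a query genome."""
    return 100.0 * len(core & set(query_genes)) / len(core)


# Hypothetical representative genomes and their gene content.
representatives = {
    "rep_A": {"g1", "g2", "g3", "g4"},
    "rep_B": {"g1", "g2", "g3", "g5"},
    "rep_C": {"g1", "g2", "g3", "g4", "g5"},
}

core = core_gene_targets(representatives)
detected = percent_targets_detected(core, {"g1", "g3", "g9"})
print(sorted(core), round(detected, 1))
```

In this toy case only g1, g2, and g3 occur in all three representatives, and the query genome carries two of those three targets.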
Project description:The environmental bacterium Pseudomonas aeruginosa, particularly multidrug-resistant clones, is often associated with nosocomial infections and outbreaks. Today, core genome multilocus sequence typing (cgMLST) is frequently applied to delineate sporadic cases from nosocomial transmissions. However, until recently, no cgMLST scheme for a standardized typing of P. aeruginosa was available. To establish a novel cgMLST scheme for P. aeruginosa, we initially determined the breadth of the P. aeruginosa population based on MLST data with a Bayesian approach (BAPS). Using genomic data of representative isolates for the whole population and all 12 serogroups, we extracted target genes and further refined them using a random data set of 1,000 P. aeruginosa genomes. Subsequently, we investigated reproducibility and discriminatory ability with repeatedly sequenced isolates and isolates from well-defined outbreak scenarios, respectively, and compared clustering applying two recently published cgMLST schemes. BAPS generated seven P. aeruginosa groups. To cover these and all serogroups, 15 reference strains were used to determine genes common in all strains. After refinement with the data set of 1,000 genomes, the cgMLST scheme consisted of 3,867 target genes, which are representative of the P. aeruginosa population and highly reproducible using biological replicates. We finally evaluated the scheme by reanalyzing two published outbreaks where the authors used single-nucleotide polymorphism (SNP) typing. In both cases, cgMLST was concordant with the previous SNP results and the results of the two other cgMLST schemes. In conclusion, the highly reproducible novel P. aeruginosa cgMLST scheme facilitates outbreak investigations due to the publicly available cgMLST nomenclature.
Project description:We have employed whole genome sequencing to define and evaluate a core genome multilocus sequence typing (cgMLST) scheme for Acinetobacter baumannii. To define a core genome we downloaded a total of 1,573 putative A. baumannii genomes from NCBI as well as representative isolates belonging to the eight previously described international A. baumannii clonal lineages. The core genome was then employed against a total of fifty-three carbapenem-resistant A. baumannii isolates that were previously typed by PFGE and linked to hospital outbreaks in eight German cities. We defined a core genome of 2,390 genes, of which an average of 98.4% were called successfully from 1,339 A. baumannii genomes, while Acinetobacter nosocomialis, Acinetobacter pittii, and Acinetobacter calcoaceticus resulted in 71.2%, 33.3%, and 23.2% good targets, respectively. When tested against the previously identified outbreak strains, we found good correlation between PFGE and cgMLST clustering, with 0-8 allelic differences within a pulsotype, and 40-2,166 differences between pulsotypes. The highest number of allelic differences was between the isolates representing the international clones. This typing scheme was highly discriminatory and identified separate A. baumannii outbreaks. Moreover, because a standardised cgMLST nomenclature is used, the system will allow inter-laboratory exchange of data.
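The comparison of allelic differences within and between pulsotypes reported above amounts to splitting pairwise distances by group membership. The following is a minimal sketch, assuming that profiles are tuples of allele numbers in a fixed locus order and that every isolate has a group label. The isolate names, pulsotype labels, and profiles are hypothetical, and the function expects at least one within-group and one between-group pair.

```python
from itertools import combinations


def distance_ranges(profiles, groups):
    """Return (min, max) allelic differences within and between groups."""
    within, between = [], []
    for a, b in combinations(sorted(profiles), 2):
        # Count positions (loci) where the two allele numbers differ.
        diff = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        (within if groups[a] == groups[b] else between).append(diff)
    return (min(within), max(within)), (min(between), max(between))


# Hypothetical six-locus profiles and pulsotype assignments.
profiles = {
    "iso1": (1, 1, 1, 1, 1, 1),
    "iso2": (1, 1, 1, 1, 1, 2),
    "iso3": (3, 3, 3, 3, 1, 1),
    "iso4": (3, 3, 3, 3, 1, 1),
}
groups = {"iso1": "P1", "iso2": "P1", "iso3": "P2", "iso4": "P2"}

within_range, between_range = distance_ranges(profiles, groups)
print(within_range, between_range)
```

A clear gap between the two ranges, as in this toy example, is the pattern that supports choosing a cluster threshold between them.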
Project description:Whole-genome sequencing (WGS) has been established for bacterial subtyping and is regularly used to study pathogen transmission, to investigate outbreaks, and to perform routine surveillance. Core-genome multilocus sequence typing (cgMLST) is a bacterial subtyping method that uses WGS data to provide a high-resolution strain characterization. This study aimed at developing a novel cgMLST scheme for Bacillus anthracis, a notorious pathogen that causes anthrax in livestock and humans worldwide. The scheme comprises 3,803 genes that were conserved in 57 B. anthracis genomes spanning the whole phylogeny. The scheme has been evaluated and applied to 584 genomes from 50 countries. On average, 99.5% of the cgMLST targets were detected. The cgMLST results confirmed the classical canonical single-nucleotide-polymorphism (SNP) grouping of B. anthracis into major clades and subclades. Genetic distances calculated based on cgMLST were comparable to distances from whole-genome-based SNP analysis with similar phylogenetic topology and comparable discriminatory power. Additionally, the application of the cgMLST scheme to anthrax outbreaks from Germany and Italy led to a definition of a cutoff threshold of five allele differences to trace epidemiologically linked strains for cluster typing and transmission analysis. Finally, the association of two clusters of B. anthracis with human cases of injectional anthrax in four European countries was confirmed using cgMLST. In summary, this study presents a novel cgMLST scheme that provides high-resolution strain genotyping for B. anthracis. This scheme can be used in parallel with SNP typing methods to facilitate rapid and harmonized interlaboratory comparisons, essential for global surveillance and outbreak analysis. The scheme is publicly available for application by users, including those with little bioinformatics knowledge.
Project description:Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also enables, due to the availability of the automatically curated cgMLST nomenclature, interlaboratory exchange of data that are crucial, especially for rapid responses during transsectorial outbreaks.
Project description:BACKGROUND:Global tuberculosis (TB) control is challenged by uncontrolled transmission of Mycobacterium tuberculosis complex (Mtbc) strains, esp. of multidrug (MDR) or extensively resistant (XDR) variants. Precise analysis of transmission networks is the basis to trace outbreak M/XDR clones and improve TB control. However, classical genotyping tools lack discriminatory power due to the high similarity of strains of particular successful lineages, e.g. Beijing or outbreak strains. This can be overcome by whole genome sequencing (WGS) approaches, but these are not yet standardized to facilitate larger investigations encompassing different laboratories or outbreak tracing across borders. METHODS:We established and improved a whole genome gene-by-gene multi locus sequence typing approach encompassing a stable set of core genome genes (cgMLST) and linked it to a web-based nomenclature server (cgMLST.org) facilitating assignment and storage of allele numbers. FINDINGS:We evaluated and refined a previously suggested cgMLST schema by using a reference strain set (n = 251) reflecting the global diversity of the Mtbc. A set of 2891 genes showed excellent performance with at least 97% of the genes reliably identified in strains of all Mtbc lineages and in discriminating outbreak strains. cgMLST allele numbers were automatically retrieved from and stored at cgMLST.org. INTERPRETATION:The refined cgMLST schema provides high resolution genome-based typing of clinical strains of all Mtbc lineages. Combined with a web-based nomenclature server, it facilitates rapid, high-resolution, and harmonized tracing of clinical Mtbc strains needed for prospective local and global surveillance.
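The assignment and storage of allele numbers by a nomenclature service can be illustrated with a small in-memory stand-in. This sketch is not the cgMLST.org interface. The class name, method names, locus names, and sequences are all hypothetical, and it assumes only that identical sequences at a locus receive the same number while new sequences receive the next unused number.

```python
class AlleleNomenclature:
    """Hypothetical in-memory allele registry (not a real server API)."""

    def __init__(self):
        # Maps locus name -> {sequence: allele number}.
        self._alleles = {}

    def assign(self, locus, sequence):
        """Return the allele number for a sequence, registering it if new."""
        known = self._alleles.setdefault(locus, {})
        key = sequence.upper()
        if key not in known:
            known[key] = len(known) + 1
        return known[key]

    def type_isolate(self, gene_sequences):
        """Convert a locus -> sequence mapping into an allelic profile."""
        return {
            locus: self.assign(locus, seq)
            for locus, seq in sorted(gene_sequences.items())
        }


nomenclature = AlleleNomenclature()
profile_a = nomenclature.type_isolate({"gene_1": "ATGAAA", "gene_2": "ATGCCC"})
profile_b = nomenclature.type_isolate({"gene_1": "ATGAAA", "gene_2": "ATGCCT"})
print(profile_a, profile_b)
```

Because numbers are only ever appended, profiles typed earlier remain valid as new alleles are registered, which is the property that makes such a nomenclature stable and expandable across laboratories.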
Project description:Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. 
It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage.
Project description:Among enterococci, Enterococcus faecalis occurs ubiquitously, with the highest incidence of human and animal infections. The high genetic plasticity of E. faecalis complicates both molecular investigations and phylogenetic analyses. Whole-genome sequencing (WGS) enables unraveling of epidemiological linkages and putative transmission events between humans, animals, and food. Core genome multilocus sequence typing (cgMLST) aims to combine the discriminatory power of classical multilocus sequence typing (MLST) with the extensive genetic data obtained by WGS. By sequencing a representative collection of 146 E. faecalis strains isolated from hospital outbreaks, food, animals, and colonization of healthy human individuals, we established a novel cgMLST scheme with 1,972 gene targets within the Ridom SeqSphere+ software. To test the E. faecalis cgMLST scheme and assess the typing performance, different collections comprising environmental and bacteremia isolates, as well as all publicly available genome sequences from the NCBI and SRA databases, were analyzed. In more than 98.6% of the tested genomes, >95% good cgMLST target genes were detected (mean, 99.2% target genes). Our genotyping results not only corroborate the known epidemiological background of the isolates but exceed previous typing resolution. In conclusion, we have created a powerful typing scheme, hence providing an international standardized nomenclature that is suitable for surveillance approaches in various sectors, linking public health, veterinary public health, and food safety in a true One Health fashion.
Project description:Many listeriosis outbreaks are caused by a few globally distributed clonal groups, designated clonal complexes or epidemic clones, of Listeria monocytogenes, several of which have been defined by classic multilocus sequence typing (MLST) schemes targeting 6 to 8 housekeeping or virulence genes. We have developed and evaluated core genome MLST (cgMLST) schemes and applied them to isolates from multiple clonal groups, including those associated with 39 listeriosis outbreaks. The cgMLST clusters were congruent with MLST-defined clonal groups, which had various degrees of diversity at the whole-genome level. Notably, cgMLST could distinguish among outbreak strains and epidemiologically unrelated strains of the same clonal group, which could not be achieved using classic MLST schemes. The precise selection of cgMLST gene targets may not be critical for the general identification of clonal groups and outbreak strains. cgMLST analyses further identified outbreak strains, including those associated with recent outbreaks linked to contaminated French-style cheese, Hispanic-style cheese, stone fruit, caramel apple, ice cream, and packaged leafy green salad, as belonging to major clonal groups. We further developed lineage-specific cgMLST schemes, which can include accessory genes when core genomes do not possess sufficient diversity, and this provided additional resolution over species-specific cgMLST. Analyses of isolates from different common-source listeriosis outbreaks revealed various degrees of diversity, indicating that the numbers of allelic differences should always be combined with cgMLST clustering and epidemiological evidence to define a listeriosis outbreak. Classic multilocus sequence typing (MLST) schemes targeting internal fragments of 6 to 8 genes that define clonal complexes or epidemic clones have been widely employed to study L. monocytogenes biodiversity and its relation to pathogenicity potential and epidemiology.
We demonstrated that core genome MLST schemes can be used for the simultaneous identification of clonal groups and the differentiation of individual outbreak strains and epidemiologically unrelated strains of the same clonal group. We further developed lineage-specific cgMLST schemes that targeted more genomic regions than the species-specific cgMLST schemes. Our data revealed the genome-level diversity of clonal groups defined by classic MLST schemes. Our identification of U.S. and international outbreaks caused by major clonal groups can contribute to further understanding of the global epidemiology of L. monocytogenes.