In silico region of difference (RD) analysis of Mycobacterium tuberculosis complex from sequence reads using RD-Analyzer.
ABSTRACT: Whole-genome sequencing is increasingly used in clinical diagnosis of tuberculosis and study of Mycobacterium tuberculosis complex (MTC). MTC consists of several genetically homogeneous mycobacterial species which can cause tuberculosis in humans and animals. Regions of difference (RDs) are commonly regarded as gold standard genetic markers for MTC classification. We developed RD-Analyzer, a tool that can accurately infer the species and lineage of MTC isolates from sequence reads based on the presence and absence of a set of 31 RDs. Applied to a publicly available diverse set of 377 sequenced MTC isolates from known major species and lineages, RD-Analyzer achieved an accuracy of 98.14 % (370/377) in species prediction and a concordance of 98.47 % (257/261) in Mycobacterium tuberculosis lineage prediction compared to predictions based on single nucleotide polymorphism markers. By comparing sequencing read depths at each genomic position between isolates of different sublineages, we were able to identify the known RD markers in different sublineages of Lineage 4 and provide support for six potential delineating markers having high sensitivities and specificities for sublineage prediction. An extended version of RD-Analyzer was thus developed to allow user-defined RDs for lineage prediction. RD-Analyzer is a useful and accurate tool for species, lineage and sublineage prediction using known RDs of MTC from sequence reads and can be extended to accept user-defined RDs for analysis. RD-Analyzer is written in Python and is freely available at https://github.com/xiaeryu/RD-Analyzer .
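As a rough illustration of the presence/absence idea described in this abstract, the following Python sketch calls an RD deleted when too few of its positions are covered by reads. The function name `rd_is_deleted`, the depth cutoff and the coverage fraction are hypothetical choices made for this example; they are not taken from RD-Analyzer, whose actual criteria may differ.

```python
from typing import Sequence


def rd_is_deleted(depths: Sequence[int],
                  min_depth: int = 3,
                  min_covered_fraction: float = 0.5) -> bool:
    """Return True when a region looks absent from the isolate.

    depths: per-base read depth across one RD (hypothetical input).
    min_depth and min_covered_fraction are illustrative thresholds only.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    # Count positions with enough supporting reads to be considered covered.
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) < min_covered_fraction


# Toy depth profiles (invented numbers, not real sequencing data).
present_region = [30, 28, 35, 31, 29, 27]
deleted_region = [0, 0, 1, 0, 0, 2]

print(rd_is_deleted(present_region))  # False
print(rd_is_deleted(deleted_region))  # True
```

A species or lineage call would then compare the resulting set of deleted RDs against known signatures, which this sketch deliberately leaves out.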
Project description:Strains of Mycobacterium tuberculosis complex (MTBC) can be classified into major lineages based on their genotype. Further subdivision of major lineages into sublineages requires multiple biomarkers along with methods to combine and analyze multiple sources of information in one unsupervised learning model. Typically, spacer oligonucleotide type (spoligotype) and mycobacterial interspersed repetitive units (MIRU) are used for TB genotyping and surveillance. Here, we examine the sublineage structure of MTBC strains with multiple biomarkers simultaneously, by employing a tensor clustering framework (TCF) on multiple-biomarker tensors. Simultaneous analysis of the spoligotype and MIRU type of strains using TCF on multiple-biomarker tensors leads to coherent sublineages of major lineages with clear and distinctive spoligotype and MIRU signatures. Comparison of tensor sublineages with SpolDB4 families either supports tensor sublineages, or suggests subdivision or merging of SpolDB4 families. High prediction accuracy of major lineage classification with supervised tensor learning on multiple-biomarker tensors validates our unsupervised analysis of sublineages on multiple-biomarker tensors. TCF on multiple-biomarker tensors achieves simultaneous analysis of multiple biomarkers and suggests a new putative sublineage structure for each major lineage. Analysis of multiple-biomarker tensors gives insight into the sublineage structure of MTBC at the genomic level.
Project description:The Beijing genotype is a lineage of Mycobacterium tuberculosis that is distributed worldwide and responsible for large epidemics, associated with multidrug-resistance. However, its distribution in Africa is less understood due to the lack of data. Our aim was to investigate the prevalence and possible transmission of Beijing strains in Mozambique by a multivariate analysis of genotypic, geographic and demographic data. A total of 543 M. tuberculosis isolates from Mozambique were spoligotyped. Of these, 33 were of the Beijing lineage. The genetic relationships between the Beijing isolates were studied by identification of genomic deletions within some Regions of Difference (RD), Restriction Fragment Length Polymorphism (RFLP) and Mycobacterial Interspersed Repetitive Unit - variable number tandem repeat (MIRU-VNTR). Beijing strains from South Africa, representing different sublineages, were included as reference strains. The association between Beijing genotype, Human Immunodeficiency Virus (HIV) serology and baseline demographic data was investigated. HIV positive serostatus was significantly (p=0.023) more common in patients with Beijing strains than in patients with non-Beijing strains in a multivariable analysis adjusted for age, sex and province (14 (10.9%) of the 129 HIV positive patients had Beijing strains while 6/141 (4.3%) of HIV negative patients had Beijing strains). The majority of Beijing strains were found in the Southern region of Mozambique, particularly in Maputo City (17%). Only one Beijing strain was drug resistant (multi-drug resistant). By combined use of RD and spoligotyping, three genetic sublineages could be tentatively identified where a distinct group of four isolates had deletion of RD150, a signature of the "sublineage 7" recently emerging in South Africa.
The same group was very similar to South African "sublineage 7" by RFLP and MIRU-VNTR, suggesting that this sublineage could have been recently introduced in Mozambique from South Africa, in association with HIV infection.
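The proportions quoted in this abstract can be reproduced with a few lines of Python. The sketch below also computes a crude (unadjusted) odds ratio from the same counts; that figure is an illustration only and is not the multivariable estimate, adjusted for age, sex and province, that the authors report. The helper names `proportion` and `crude_odds_ratio` are hypothetical.

```python
def proportion(cases: int, total: int) -> float:
    """Fraction of patients in a group who carried a Beijing strain."""
    return cases / total


def crude_odds_ratio(exp_cases: int, exp_noncases: int,
                     unexp_cases: int, unexp_noncases: int) -> float:
    """Unadjusted odds ratio from a 2x2 table (cross-product ratio)."""
    return (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)


# Counts taken from the abstract.
hiv_pos_beijing, hiv_pos_total = 14, 129
hiv_neg_beijing, hiv_neg_total = 6, 141

print(f"{proportion(hiv_pos_beijing, hiv_pos_total):.1%}")  # 10.9%
print(f"{proportion(hiv_neg_beijing, hiv_neg_total):.1%}")  # 4.3%

odds_ratio = crude_odds_ratio(
    hiv_pos_beijing, hiv_pos_total - hiv_pos_beijing,
    hiv_neg_beijing, hiv_neg_total - hiv_neg_beijing,
)
print(f"{odds_ratio:.2f}")  # 2.74
```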
Project description:Mycobacterium tuberculosis is the principal etiologic agent of human tuberculosis (TB) and a member of the M. tuberculosis complex (MTC). Additional MTC species that cause TB in humans and other mammals include Mycobacterium africanum and Mycobacterium bovis. One result of studies interrogating recently identified MTC phylogenetic markers has been the recognition of at least two distinct lineages of M. africanum, known as West African-1 and West African-2. We screened a blinded non-random set of MTC strains isolated from TB patients in Ghana (n = 47) for known chromosomal region-of-difference (RD) loci and single nucleotide polymorphisms (SNPs). An MTC PCR-typing panel, single-target standard PCR, multi-primer PCR, PCR-restriction fragment analysis, and sequence analysis of amplified products were among the methods utilized for the comparative evaluation of targets and identification systems. The MTC distributions of novel SNPs were characterized in both the Ghana collection and two other diverse collections of MTC strains (n = 175 in total). The utility of various polymorphisms as species-, lineage-, and sublineage-defining phylogenetic markers for M. africanum was determined. Novel SNPs were also identified and found to be specific to either M. africanum West African-1 (Rv1332(523); n = 32) or M. africanum West African-2 (nat(751); n = 27). In the final analysis, a strain identification approach that combined multi-primer PCR targeting of the RD loci RD9, RD10, and RD702 was the simplest, most straightforward, and most definitive means of distinguishing the two clades of M. africanum from one another and from other MTC species. With this study, we have organized a series of consistent phylogenetically-relevant markers for each of the distinct MTC lineages that share the M. africanum designation. A differential distribution of each M. africanum clade in Western Africa is described.
Project description:Tuberculosis (TB) is a significant public health problem in Ecuador with an incidence of 43 per 100,000 inhabitants and an estimated multidrug-resistant-TB prevalence in all TB cases of 9%. Genotyping of Mycobacterium tuberculosis complex (MTBC) is important to understand regional transmission dynamics. This study aims to describe the main MTBC lineages and sublineages circulating in the country. A representative sample of 373 MTBC strains from 22 provinces of Ecuador, with data comprising geographic origin and drug susceptibility, were genotyped using 24-locus MIRU-VNTR. For strains with an ambiguous sublineage designation, the lineage was confirmed by Regions of Difference analysis or by Whole Genome Sequencing. We show that lineage 4 is predominant in Ecuador (98.3% of the strains). Only four strains belong to lineage 2, sublineage Beijing, and two strains to lineage 3, sublineage Delhi. Lineage 4 strains included sublineages LAM (45.7%), Haarlem (31.8%), S (13.1%), X (4.6%), Ghana (0.6%) and NEW (0.3%). The LAM sublineage showed the strongest association with antibiotic resistance. The X and S sublineages were found predominantly in the Coastal and the Andean regions respectively, and the reason for the high prevalence of these strains in Ecuador should be addressed in future studies. Our database constitutes a tool for MIRU-VNTR pattern comparison of M. tuberculosis isolates for national and international epidemiologic studies and phylogenetic purposes.
Project description:Genetic tracking of Mycobacterium tuberculosis is a cornerstone of tuberculosis (TB) control programs. The RD(Rio) M. tuberculosis sublineage was previously associated with TB in Brazil. We investigated 3847 M. tuberculosis isolates and registry data from New York City (NYC) (2001-2005) to: (1) affirm the position of RD(Rio) strains within the M. tuberculosis phylogenetic structure, (2) determine its prevalence, and (3) define transmission, demographic, and clinical characteristics associated with RD(Rio) TB. Isolates classified as RD(Rio) or non-RD(Rio) M. tuberculosis by multiplex PCR were further classified as clustered (≥2 isolates) or unique based primarily upon IS6110-RFLP patterns, and lineage-specific cluster proportions were calculated. The secondary case rate of RD(Rio) was compared with other prevalent M. tuberculosis lineages. Genotype data were merged with the data from the NYC TB Registry to assess demographic and clinical characteristics. RD(Rio) strains were found to: (1) be restricted to the Latin American-Mediterranean family, (2) cause approximately 8% of TB cases in NYC, and (3) be associated with heightened transmission as shown by: (i) a higher cluster proportion compared to other prevalent lineages, (ii) a higher secondary case rate, and (iii) cases in children. Furthermore, RD(Rio) strains were significantly associated with US-born Black or Hispanic race, birth in Latin American and Caribbean countries, and isoniazid resistance. The RD(Rio) genotype is a single M. tuberculosis strain population that is emerging in NYC. The findings suggest that expanded RD(Rio) case and exposure identification could be of benefit due to its association with heightened transmission.
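A lineage-specific cluster proportion of the kind mentioned in this abstract can be sketched as follows, assuming that an isolate counts as clustered when its genotype pattern is shared by two or more isolates in the dataset. The function name `cluster_proportions`, the input layout and the toy records are hypothetical; the study's own clustering rules (based primarily on IS6110-RFLP) are more involved.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def cluster_proportions(isolates: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of clustered isolates per lineage.

    isolates: (lineage, genotype_pattern) pairs, one per isolate.
    An isolate is clustered when its pattern occurs at least twice overall.
    """
    records = list(isolates)
    pattern_counts = Counter(pattern for _, pattern in records)
    totals = defaultdict(int)
    clustered = defaultdict(int)
    for lineage, pattern in records:
        totals[lineage] += 1
        if pattern_counts[pattern] >= 2:
            clustered[lineage] += 1
    return {lineage: clustered[lineage] / totals[lineage] for lineage in totals}


# Toy records (invented, not data from the study).
toy = [
    ("RD(Rio)", "pattern-A"),
    ("RD(Rio)", "pattern-A"),
    ("RD(Rio)", "pattern-B"),
    ("other", "pattern-C"),
    ("other", "pattern-D"),
]
result = cluster_proportions(toy)
print(result)
```

In the toy data, two of the three RD(Rio) isolates share a pattern, so that lineage has a cluster proportion of two thirds, while the other lineage has none.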
Project description:Generalist and specialist species differ in the breadth of their ecological niches. Little is known about the niche width of obligate human pathogens. Here we analyzed a global collection of Mycobacterium tuberculosis lineage 4 clinical isolates, the most geographically widespread cause of human tuberculosis. We show that lineage 4 comprises globally distributed and geographically restricted sublineages, suggesting a distinction between generalists and specialists. Population genomic analyses showed that, whereas the majority of human T cell epitopes were conserved in all sublineages, the proportion of variable epitopes was higher in generalists. Our data further support a European origin for the most common generalist sublineage. Hence, the global success of lineage 4 reflects distinct strategies adopted by different sublineages and the influence of human migration.
Project description:BACKGROUND: Kazakhstan remains a country with a high TB burden and a concomitant high burden of multidrug-resistant tuberculosis. For this reason, we performed an in-depth characterization of the genetic diversity and population structure of Mycobacterium tuberculosis complex (MTC) in Kazakhstan, with both patient and community benefit in mind. METHODS: A convenience sample of 700 MTC DNA extracts from cultures of 630 tuberculosis patients, recruited from 12 out of 14 regions in Kazakhstan between 2010 and 2015, was independently studied by two high-throughput hybridization-based methods, TB-SPRINT (59-Plex, n = 700) and TB-SNPID (50-Plex, n = 543). DNA from 391 clinical isolates was successfully typed by both methods. To resolve the population structure of drug-resistant clades in more detail, two complementary assays were run on the L2 isolates: an IS6110-NTF insertion site typing assay and a SigE SNP polymorphism assay. RESULTS: Strains belonged to L2/Beijing and L4/Euro-American sublineages; L2/Beijing prevalence totaled almost 80%. Fifty percent of all samples were resistant to RIF and to INH. Subtyping showed that (1) all L2/Beijing were "modern" Beijing, (2) most of these belonged to the previously described 94-32 sublineage (Central Asian/Russian), and (3) at least two populations of the Central Asian/Russian sublineages are circulating in Kazakhstan, with different evolutionary dynamics. CONCLUSIONS: For the first time, the global genetic diversity and population structure of M. tuberculosis genotypes circulating in Kazakhstan was obtained and compared to previous local studies. Results suggest a region-specific spread of a very limited number of L2/Beijing clonal complexes in Kazakhstan, many strongly associated with an MDR phenotype.
Project description:Mycobacterium tuberculosis is divided into several distinct lineages, and various genetic markers such as IS-elements, VNTR, and SNPs are used for lineage identification. We propose an M. tuberculosis classification approach based on functional polymorphisms in virulence genes. A catalog of M. tuberculosis virulence genes has been established, including 319 genes from various protein groups, such as proteases, cell wall proteins, fatty acid and lipid metabolism proteins, sigma factors, and toxin-antitoxin systems. Another catalog of 1,573 M. tuberculosis isolates of different lineages has been developed. The SNP-calling program developed for this study identified 3,563 nonsynonymous SNPs. The constructed SNP-based phylogeny reflected the evolutionary relationship between lineages and detected new sublineages. SNP analysis of sublineage F15/LAM4/KZN revealed four lineage-specific mutations in cyp125, mce3B, vapC25, and vapB34. The Ural lineage has been divided into two geographical clusters based on different SNPs in virulence genes. A new sublineage, B0/N-90, was detected inside the Beijing-B0/W-148 sublineage by SNPs in irtB, mce3F and vapC46. We have found 27 members of B0/N-90 among the 227 available genomes of the Beijing-B0/W-148 sublineage. Whole-genome sequencing demonstrated that strain B9741, isolated from an HIV-positive patient, belongs to the new B0/N-90 group. A primer set for PCR detection of B0/N-90 lineage-specific mutations has been developed. The prospective use of mce3 mutant genes as a genetically engineered vaccine is discussed.
Project description:A sample of 260 Mycobacterium tuberculosis strains assigned to the Euro-American family was studied to identify phylogenetically informative genomic regions of difference (RD). Mutually exclusive deletions of regions RD115, RD122, RD174, RD182, RD183, RD193, RD219, RD726 and RD761 were found in 202 strains; the RD(Rio) deletion was detected exclusively among the RD174-deleted strains. Although certain deletions were found more frequently in certain spoligotype families (i.e., deletion RD115 in T and LAM, RD174 in LAM, RD182 in Haarlem, RD219 in T and RD726 in the "Cameroon" family), the RD-defined sublineages did not specifically match with spoligotype-defined families, thus arguing against the use of spoligotyping for establishing exact phylogenetic relationships between strains. Notably, when tested for katG463/gyrA95 polymorphism, all the RD-defined sublineages belonged to Principal Genotypic Group (PGG) 2, except sublineage RD219 exclusively belonging to PGG3; the 58 Euro-American strains with no deletion were of either PGG2 or 3. A representative sample of 197 isolates was then analyzed by standard 15-locus MIRU-VNTR typing, a suitable approach to independently assess genetic relationships among the strains. Analysis of the MIRU-VNTR typing results by using a minimum spanning tree (MST) and a classical dendrogram showed groupings that were largely concordant with those obtained by RD-based analysis. Isolates of a given RD profile show, in addition to closely related MIRU-VNTR profiles, related spoligotype profiles that can serve as a basis for better spoligotype-based classification.
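The mutually exclusive deletions listed in this abstract lend themselves to a simple lookup. The Python sketch below assigns an RD-defined sublineage from a set of deleted regions and raises an error if two of the nine markers co-occur, which would contradict the reported mutual exclusivity. The function name `rd_sublineage`, the "no deletion" label and the "RDRio" spelling in the usage example are hypothetical conventions for this example.

```python
from typing import Iterable

# The nine mutually exclusive deletions named in the abstract.
MUTUALLY_EXCLUSIVE_RDS = (
    "RD115", "RD122", "RD174", "RD182", "RD183",
    "RD193", "RD219", "RD726", "RD761",
)


def rd_sublineage(deleted_rds: Iterable[str]) -> str:
    """Return the RD-defined sublineage for one strain.

    deleted_rds: names of regions found deleted in the strain.
    Regions outside the nine markers (for example the RD(Rio) deletion,
    which was seen only in RD174-deleted strains) are ignored here.
    """
    hits = sorted(set(deleted_rds) & set(MUTUALLY_EXCLUSIVE_RDS))
    if not hits:
        return "no deletion"
    if len(hits) > 1:
        raise ValueError(f"markers expected to be mutually exclusive: {hits}")
    return hits[0]


print(rd_sublineage({"RD174", "RDRio"}))  # RD174
print(rd_sublineage(set()))               # no deletion
```

In the study, 202 of the 260 strains would fall into one of the nine deletion groups and the remaining 58 into the "no deletion" group.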
Project description:The Beijing strain is one of the most successful genotypes of Mycobacterium tuberculosis worldwide and appears to be highly homogenous according to existing genotyping methods. To type Beijing strains reliably we developed a robust typing scheme using single nucleotide polymorphisms (SNPs) and regions of difference (RDs) derived from whole-genome sequencing data of eight Beijing strains. SNP/RD typing of 259 M. tuberculosis isolates originating from 45 countries worldwide discriminated 27 clonal complexes within the Beijing genotype family. A total of 16 Beijing clonal complexes contained more than one isolate of known origin, of which two clonal complexes were strongly associated with South African origin. The remaining 14 clonal complexes encompassed isolates from different countries. Even highly resolved clonal complexes comprised isolates from distinct geographical sites. Our results suggest that Beijing strains spread globally on multiple occasions and that the tuberculosis epidemic caused by the Beijing genotype is at least partially driven by modern migration patterns. The SNPs and RDs presented in this study will facilitate future molecular epidemiological and phylogenetic studies on Beijing strains.