Volatile fingerprinting of human respiratory viruses from cell culture.
ABSTRACT: Volatile metabolites are currently under investigation as potential biomarkers for the detection and identification of pathogenic microorganisms, including bacteria, fungi, and viruses. Unlike bacteria and fungi, which produce distinct volatile metabolic signatures associated with innate differences in both primary and secondary metabolic processes, viruses are wholly reliant on the metabolic machinery of infected cells for replication and propagation. In the present study, the ability of volatile metabolites to discriminate between respiratory cells infected and uninfected with virus, in vitro, was investigated. Two important respiratory viruses, namely respiratory syncytial virus (RSV) and influenza A virus (IAV), were evaluated. Data were analyzed using three different machine learning algorithms (random forest (RF), linear support vector machines (linear SVM), and partial least squares-discriminant analysis (PLS-DA)), with volatile metabolites identified from a training set used to predict sample classifications in a validation set. The discriminatory performances of RF, linear SVM, and PLS-DA were comparable for the comparison of IAV-infected versus uninfected cells, with area under the receiver operating characteristic curves (AUROCs) between 0.78 and 0.82, while RF and linear SVM demonstrated superior performance in the classification of RSV-infected versus uninfected cells (AUROCs between 0.80 and 0.84) relative to PLS-DA (0.61). A subset of discriminatory features were assigned putative compound identifications, with an overabundance of hydrocarbons observed in both RSV- and IAV-infected cell cultures relative to uninfected controls. This finding is consistent with increased oxidative stress, a process associated with viral infection of respiratory cells.
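The AUROC values reported above summarize how well a classifier's scores rank infected samples above uninfected ones. As a minimal sketch, assuming binary labels (1 = infected, 0 = uninfected) and one score per sample, AUROC can be computed as the fraction of positive/negative pairs that are ranked correctly. The function name `auroc` and the example values are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """Return the area under the ROC curve.

    Computed as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation-set scores, for illustration only.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]
example_auroc = auroc(example_labels, example_scores)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values (for example 0.61 versus 0.80 to 0.84) should be read.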
Project description:Volatile molecules in exhaled breath represent potential biomarkers in the setting of infectious diseases, particularly those affecting the respiratory tract. In particular, Pseudomonas aeruginosa is a critically important respiratory pathogen in specific subsets of the population, such as those with cystic fibrosis (CF). Infections caused by P. aeruginosa can be particularly problematic when co-infection with respiratory syncytial virus (RSV) occurs, as this is correlated with the establishment of chronic P. aeruginosa infection. In the present study, we evaluate the volatile metabolites produced by P. aeruginosa (PAO1)-infected, RSV-infected, co-infected, or uninfected CF bronchial epithelial (CFBE) cells, in vitro. We identified a volatile metabolic signature that could discriminate between P. aeruginosa-infected and non-P. aeruginosa-infected CFBE with an area under the receiver operating characteristic curve (AUROC) of 0.850, using the machine learning algorithm random forest (RF). Although we could not discriminate between RSV-infected and non-RSV-infected CFBE (AUROC = 0.431), we note that sample classification probabilities for RSV-infected cells, generated using RF, were between those of uninfected CFBE and P. aeruginosa-infected CFBE, suggesting that RSV infection may result in a volatile metabolic profile that shares attributes with both of these groups. To more precisely elucidate the biological origins of the volatile metabolites that were discriminatory between P. aeruginosa-infected and non-P. aeruginosa-infected CFBE, we measured the volatile metabolites produced by P. aeruginosa grown in the absence of CFBE. Our findings suggest that the discriminatory metabolites produced likely result from the interaction of P. aeruginosa with the CFBE cells, rather than the metabolism of media components by the bacterium. Taken together, our findings support the notion that P. aeruginosa interacting with CFBE yields a particular volatile metabolic signature. Such a signature may have clinical utility in the monitoring of individuals with CF.
Project description:Due to the existence of Lingzhi adulteration, there is a growing demand for species classification of medicinal mushrooms by various techniques. The objective of this study was to explore a rapid and reliable way to distinguish between different Lingzhi species and compare the influence of data pretreatment methods on the recognition results. To this end, 120 fresh fruiting bodies of Lingzhi were collected, and all of them were analyzed by attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR). Random forest (RF), support vector machine (SVM) and partial least squares discriminant analysis (PLS-DA) classification models were established for raw and pretreated second derivative (SD) spectral matrices to authenticate different Lingzhi species. The results of multivariate statistical analysis indicated that the SD preprocessing method displayed a higher classification ability, which may be attributed to the analysis of powder samples that requires removal of overlapping peaks and baseline shifts. Compared with RF, the results of the SVM and PLS-DA methods were more satisfying, and their accuracies for the test set were both 100%. Among SVM and PLS-DA, the training set and test set accuracy of PLS-DA were both 100%. In conclusion, ATR-FTIR spectroscopy data pretreated by SD combined with PLS-DA is a simple, rapid, non-destructive and relatively inexpensive method to discriminate between mushroom species and provide a good reference to quality assessment.
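The abstract above attributes the benefit of second derivative (SD) pretreatment to the removal of baseline shifts and overlapping peaks. A minimal sketch of the baseline part of that idea follows, assuming evenly spaced spectral points and a plain central finite difference. The study does not state which derivative algorithm was used (smoothed derivatives such as Savitzky-Golay are common in practice), so the function `second_derivative` and the toy spectrum are illustrative assumptions only.

```python
def second_derivative(spectrum, spacing=1.0):
    """Central finite-difference second derivative of an evenly spaced spectrum.

    The result has two fewer points than the input because the first and
    last points have no neighbour on one side.
    """
    return [
        (spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]) / spacing ** 2
        for i in range(1, len(spectrum) - 1)
    ]


# Toy absorbance band (hypothetical values) and a sloping linear baseline.
peak = [0, 0, 1, 4, 1, 0, 0]
baseline = [2 + 3 * i for i in range(len(peak))]
shifted = [p + b for p, b in zip(peak, baseline)]

# The second derivative of a linear baseline is zero, so both spectra
# give the same derivative trace.
sd_peak = second_derivative(peak)
sd_shifted = second_derivative(shifted)
```

Because any constant offset or linear slope vanishes under a second derivative, spectra that differ only by such a baseline become identical after pretreatment, which is one plausible reason the SD models classified better.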
Project description:Edible gelatin has been widely used as a food additive in the food industry, and illegal adulteration with industrial gelatin will cause serious harm to human health. The present work used laser-induced breakdown spectroscopy (LIBS) coupled with the partial least square-support vector machine (PLS-SVM) method for the fast and accurate estimation of edible gelatin adulteration. Gelatin samples with 11 different adulteration ratios were prepared by mixing pure edible gelatin with industrial gelatin, and the LIBS spectra were recorded to analyze their elemental composition differences. The PLS, SVM, and PLS-SVM models were separately built for the prediction of gelatin adulteration ratios, and the hybrid PLS-SVM model yielded a better performance than only the PLS and SVM models. Besides, four different variable selection methods, including competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE), random frog (RF), and principal component analysis (PCA), were adopted to combine with the SVM model for comparative study; the results further demonstrated that the PLS-SVM model was superior to the other SVM models. This study reveals that the hybrid PLS-SVM model, with the advantages of low computational time and high prediction accuracy, can be employed as a preferred method for the accurate estimation of edible gelatin adulteration.
Project description:Soybean (Glycine max) is a major crop cultivated in various regions and consumed globally. The formation of volatile compounds in soybeans is influenced by the cultivar as well as environmental factors, such as the climate and soil in the cultivation areas. This study used gas chromatography-mass spectrometry (GC-MS) combined with headspace solid-phase microextraction (HS-SPME) to analyze the volatile compounds of soybeans cultivated in Korea, China, and North America. Two multivariate data analysis methods, partial least squares-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA), were then applied to the GC-MS data sets. The soybeans could be clearly discriminated according to their geographical origins on the PLS-DA score plot. In particular, 25 volatile compounds, including terpenes (limonene, myrcene), esters (ethyl hexanoate, butyl butanoate, butyl prop-2-enoate, butyl acetate, butyl propanoate), and aldehydes (nonanal, heptanal, (E)-hex-2-enal, (E)-hept-2-enal, acetaldehyde), were the main contributors to the discrimination of soybeans cultivated in China from those cultivated in other regions in the PLS-DA score plot. On the other hand, 15 volatile compounds, such as 2-ethylhexan-1-ol, 2,5-dimethylhexan-2-ol, octanal, and heptanal, were related to Korean soybeans located on the negative PLS 2 axis, whereas 12 volatile compounds, such as oct-1-en-3-ol, heptan-4-ol, butyl butanoate, and butyl acetate, were responsible for North American soybeans. However, the multivariate statistical analysis (PLS-DA) was not able to clearly distinguish soybeans cultivated in Korea, except for those from the Gyeonggi and Kyeongsangbuk provinces.
Project description:Paris polyphylla, a traditional herb with a long history, has been widely used to treat diseases among multiple nationalities of China. Nevertheless, the quality of P. yunnanensis fluctuates among different geographical origins, so a fast and accurate classification method is needed. In our study, 462 P. yunnanensis rhizome and leaf samples from Kunming, Yuxi, Chuxiong, Dali, Lijiang, and Honghe were analyzed by Fourier transform mid infrared (FT-MIR) spectroscopy, combined with partial least squares discriminant analysis (PLS-DA), random forest (RF), and hierarchical cluster analysis (HCA) methods, to identify their geographical origins. Principal component analysis (PCA) displayed an obvious clustering tendency in the FT-MIR spectra of rhizomes and leaves. The distribution of the variable importance for the projection (VIP) was more uniform than that of the important variables obtained by RF, and the PLS-DA models achieved higher classification abilities. Hence, the PLS-DA model was more suitable than the RF model for classifying the different geographical origins of P. yunnanensis. Additionally, the clustering results of different geographical origins obtained from HCA dendrograms also demonstrated the difference in chemical information between rhizomes and leaves. The identification performances of the PLS-DA and RF models on the leaf FT-MIR matrixes were better than those on the rhizome datasets. In addition, the classification abilities of models built on the combined datasets were higher than those built on the individual matrixes of rhizome and leaf spectra. Our study provides a reference for the rational utilization of resources, as well as a fast and accurate identification method for P. yunnanensis samples.
Project description:In recent years, mass spectrometry (MS)-based metabolomics has been extensively applied to characterize biochemical mechanisms, and study physiological processes and phenotypic changes associated with disease. Metabolomics has also been important for identifying biomarkers of interest suitable for clinical diagnosis. For the purpose of predictive modeling, in this chapter, we will review various supervised learning algorithms such as random forest (RF), support vector machine (SVM), and partial least squares-discriminant analysis (PLS-DA). In addition, we will also review feature selection methods for identifying the best combination of metabolites for an accurate predictive model. We conclude with best practices for reproducibility by including internal and external replication, reporting metrics to assess performance, and providing guidelines to avoid overfitting and to deal with imbalanced classes. An analysis of an example data will illustrate the use of different machine learning methods and performance metrics.
Project description:Experimental pEC(50)s for 216 selective respiratory syncytial virus (RSV) inhibitors are used to develop classification models as a potential screening tool for a large library of target compounds. A variable selection algorithm coupled with random forests (VS-RF) is used to extract the physicochemical features most relevant to RSV inhibition. Based on the selected small set of descriptors, four other widely used approaches, i.e., support vector machine (SVM), Gaussian process (GP), linear discriminant analysis (LDA) and k nearest neighbors (kNN) routines, are also employed and compared with the VS-RF method in terms of several rigorous evaluation criteria. The obtained results indicate that the VS-RF model is a powerful tool for classification of RSV inhibitors, producing the highest overall accuracy of 94.34% for the external prediction set, which significantly outperforms the other four methods, whose average accuracy is 80.66%. The proposed model, with excellent prediction capacity in both internal and external validation, should be important for screening and optimization of potential RSV inhibitors prior to chemical synthesis in drug development.
Project description:Even with effective viral control, HIV-infected individuals are at a higher risk for morbidities associated with older age than the general population, and these serious non-AIDS events (SNAEs) track with plasma inflammatory and coagulation markers. The cell subsets driving inflammation in aviremic HIV infection are not yet elucidated. Also, whether ART-suppressed HIV infection causes premature induction of the inflammatory events found in uninfected elderly or if a novel inflammatory network ensues when HIV and older age co-exist is unclear. In this study we measured combinational expression of five inhibitory receptors (IRs) on seven immune cell subsets and 16 plasma markers from peripheral blood mononuclear cells (PBMC) and plasma samples, respectively, from a HIV and Aging cohort comprised of ART-suppressed HIV-infected and uninfected controls stratified by age (≤35 or ≥50 years old). For data analysis, multiple multivariate computational algorithms [cluster identification, characterization, and regression (CITRUS), partial least squares regression (PLSR), and partial least squares-discriminant analysis (PLS-DA)] were used to determine if immune parameter disparities can distinguish the subject groups and to investigate if there is a cross-impact of aviremic HIV and age on immune signatures. IR expression on gamma delta (γδ) T cells exclusively separated HIV+ subjects from controls in CITRUS analyses and secretion of inflammatory cytokines and cytotoxic mediators from γδ T cells tracked with TIGIT expression among HIV+ subjects. Also, plasma markers predicted the percentages of TIGIT+ γδ T cells in subjects with and without HIV in PLSR models, and a PLS-DA model of γδ T cell IR signatures and plasma markers significantly stratified all four of the subject groups (uninfected younger, uninfected older, HIV+ younger, and HIV+ older). These data implicate γδ T cells as an inflammatory driver in ART-suppressed HIV infection and provide evidence of distinct "inflamm-aging" processes with and without ART-suppressed HIV infection.
Project description:Dendrobium is the largest genus of orchids, most of which have excellent medicinal properties. Fresh stems of some species have been consumed in daily life by Asians for thousands of years. However, there are differences in flavour and clinical efficacy among different species. Therefore, it is necessary to establish an effective and rapid method for controlling the botanical origins of these crude materials. In our study, three spectroscopies including mid-infrared (MIR) (transmission and reflection mode) and near-infrared (NIR) spectra were investigated for authentication of 12 Dendrobium species. Generally, two fusion strategies, reflection MIR and NIR spectra, were combined with three mathematical models (random forest, support vector machine with grid search (SVM-GS) and partial least-squares discrimination analysis (PLS-DA)) for discrimination analysis. In conclusion, a low-level fusion strategy comprising two spectra pretreated by the second derivative and multiplicative scatter correction was recommended for discrimination analysis because of its excellent performance in all three models. Compared with MIR spectra, NIR spectra contributed more to the discrimination according to a bi-plot analysis of PLS-DA. Moreover, SVM-GS and PLS-DA were suitable for accurate discrimination (100% accuracy rates) of calibration and validation sets. The protocol combining a low-level fusion strategy with chemometrics provides a rapid and effective reference for control of botanical origins in crude Dendrobium materials.
Project description:INTRODUCTION:Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models. OBJECTIVES:We hypothesise that for binary classification using metabolomics data, non-linear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard PLS discriminant analysis. METHODS:We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks. RESULTS:There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of uncertainty of model prediction such that the quality of metabolomics data was observed to be a bigger influence on generalised performance than model choice. CONCLUSION:The size of the data set, and choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.
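The abstract above reports that bootstrap confidence intervals gave a measure of the uncertainty of model prediction. The sketch below illustrates the general idea with a simple percentile bootstrap of classification accuracy. It is not the out-of-bag procedure used in the study, and the function names, the default of 1000 resamples, and the 95% level are assumptions chosen for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    Samples are resampled with replacement, accuracy is recomputed on each
    resample, and the alpha/2 and 1 - alpha/2 percentiles of the resulting
    distribution are returned as (lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(accuracy([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical labels and predictions, for illustration only.
truth = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
preds = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
ci_low, ci_high = bootstrap_ci(truth, preds)
```

With small data sets the interval is typically wide, which is consistent with the abstract's conclusion that data set size can matter more than the choice of algorithm.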