Attenuated Total Reflection-Fourier Transform Infrared Spectroscopy (ATR-FTIR) Combined with Chemometrics Methods for the Classification of Lingzhi Species.
ABSTRACT: Due to the existence of Lingzhi adulteration, there is a growing demand for species classification of medicinal mushrooms by various techniques. The objective of this study was to explore a rapid and reliable way to distinguish between different Lingzhi species and compare the influence of data pretreatment methods on the recognition results. To this end, 120 fresh fruiting bodies of Lingzhi were collected, and all of them were analyzed by attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR). Random forest (RF), support vector machine (SVM) and partial least squares discriminant analysis (PLS-DA) classification models were established for raw and pretreated second derivative (SD) spectral matrices to authenticate different Lingzhi species. The results of multivariate statistical analysis indicated that the SD preprocessing method displayed a higher classification ability, which may be attributed to the analysis of powder samples that requires removal of overlapping peaks and baseline shifts. Compared with RF, the results of the SVM and PLS-DA methods were more satisfying, and their accuracies for the test set were both 100%. Among SVM and PLS-DA, the training set and test set accuracy of PLS-DA were both 100%. In conclusion, ATR-FTIR spectroscopy data pretreated by SD combined with PLS-DA is a simple, rapid, non-destructive and relatively inexpensive method to discriminate between mushroom species and provide a good reference to quality assessment.
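The claim that second derivative (SD) pretreatment removes baseline shifts can be illustrated with a minimal sketch. It assumes a plain central second difference on an evenly spaced grid, which is not necessarily the derivative algorithm used in the study (smoothed derivatives such as Savitzky-Golay are common in practice). The function name `second_derivative` and the synthetic Gaussian band are hypothetical.

```python
import math

def second_derivative(spectrum):
    """Central second difference; returns len(spectrum) - 2 values."""
    return [spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]

# Hypothetical band: one Gaussian absorption peak on a 101-point grid.
peak = [math.exp(-((i - 50) ** 2) / 50.0) for i in range(101)]
# Same band with a sloping linear baseline added (offset 0.3, slope 0.01 per point).
shifted = [p + 0.3 + 0.01 * i for i, p in enumerate(peak)]

sd_peak = second_derivative(peak)
sd_shifted = second_derivative(shifted)

# A linear baseline has zero second difference, so both results agree
# up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(sd_peak, sd_shifted))
print(max_diff)
```

The peak maximum shows up as the most negative value of the derivative, which is why overlapping bands tend to separate more clearly after this pretreatment.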
Project description:Volatile metabolites are currently under investigation as potential biomarkers for the detection and identification of pathogenic microorganisms, including bacteria, fungi, and viruses. Unlike bacteria and fungi, which produce distinct volatile metabolic signatures associated with innate differences in both primary and secondary metabolic processes, viruses are wholly reliant on the metabolic machinery of infected cells for replication and propagation. In the present study, the ability of volatile metabolites to discriminate between respiratory cells infected and uninfected with virus, in vitro, was investigated. Two important respiratory viruses, namely respiratory syncytial virus (RSV) and influenza A virus (IAV), were evaluated. Data were analyzed using three different machine learning algorithms (random forest (RF), linear support vector machines (linear SVM), and partial least squares-discriminant analysis (PLS-DA)), with volatile metabolites identified from a training set used to predict sample classifications in a validation set. The discriminatory performances of RF, linear SVM, and PLS-DA were comparable for the comparison of IAV-infected versus uninfected cells, with area under the receiver operating characteristic curves (AUROCs) between 0.78 and 0.82, while RF and linear SVM demonstrated superior performance in the classification of RSV-infected versus uninfected cells (AUROCs between 0.80 and 0.84) relative to PLS-DA (0.61). A subset of discriminatory features were assigned putative compound identifications, with an overabundance of hydrocarbons observed in both RSV- and IAV-infected cell cultures relative to uninfected controls. This finding is consistent with increased oxidative stress, a process associated with viral infection of respiratory cells.
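The AUROC values quoted above can be read as the probability that a randomly chosen infected sample receives a higher classifier score than a randomly chosen uninfected one. The sketch below computes that quantity directly by pairwise comparison; the function name `auroc` and the scores are hypothetical and are not data from the study.

```python
def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores (higher means "more likely infected").
infected = [0.9, 0.8, 0.4]
uninfected = [0.7, 0.3, 0.2]

value = auroc(infected, uninfected)
print(round(value, 3))  # 8 of 9 pairs are ranked correctly -> 0.889
```

A value of 0.5 corresponds to chance-level ranking, so the reported 0.61 for PLS-DA on RSV is only slightly better than chance.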
Project description:Meningiomas are the most common type of tumour in the central nervous system (CNS). They are a benign type of tumour divided into three WHO grades (I, II and III) associated with tumour growth rate and likelihood of recurrence, and surgical outcomes and patient treatments depend on the meningioma grade and histological subtype. The development of alternative approaches based on attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy could aid meningioma grade determination and its biospectrochemical profiling in an automated fashion. Herein, ATR-FTIR in combination with chemometric techniques is employed to distinguish grade I meningiomas, grade II meningiomas and grade I meningiomas that recurred. Ninety-nine patients were investigated in this study, and their formalin-fixed paraffin-embedded (FFPE) brain tissue samples were analysed by ATR-FTIR spectroscopy. Subsequent classification was performed via principal component analysis plus linear discriminant analysis (PCA-LDA) and partial least squares discriminant analysis (PLS-DA). PLS-DA gave the best results: grade I and grade II meningiomas were discriminated with 79% accuracy, 80% sensitivity and 73% specificity, while grade I versus grade I recurrence and grade II versus grade I recurrence were discriminated with 94% accuracy (94% sensitivity and specificity) and 97% accuracy (97% sensitivity and 100% specificity), respectively. Several wavenumbers were identified as possible biomarkers for tumour differentiation. The majority of these were associated with lipid, protein, DNA/RNA and carbohydrate alterations. These findings demonstrate the potential of ATR-FTIR spectroscopy for meningioma grade discrimination as a fast, low-cost, non-destructive and sensitive tool for clinical settings.
Graphical abstract Attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy was used to discriminate meningioma WHO grade I, grade II and grade I recurrence tumours.
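Accuracy, sensitivity and specificity, as reported for the PLS-DA models above, all derive from the same two-class confusion matrix. The sketch below shows the arithmetic on hypothetical labels; treating grade II as the positive class is an assumption made for illustration, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

# Hypothetical grades: 4 grade II and 6 grade I samples (not study data).
y_true = ["II", "II", "II", "II", "I", "I", "I", "I", "I", "I"]
y_pred = ["II", "II", "II", "I", "I", "I", "I", "I", "I", "II"]

metrics = binary_metrics(y_true, y_pred, positive="II")
print(metrics)
```

Because the three figures weight the two classes differently, a model can show, for example, 97% sensitivity alongside 100% specificity when the class sizes are unequal.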
Project description:Postmortem interval (PMI) evaluation remains a challenge in the forensic community due to the lack of efficient methods. Studies have focused on chemical analysis of biofluids for PMI estimation; however, no reports using spectroscopic methods in pericardial fluid (PF) are available. In this study, Fourier transform infrared (FTIR) spectroscopy with attenuated total reflectance (ATR) accessory was applied to collect comprehensive biochemical information from rabbit PF at different PMIs. The PMI-dependent spectral signature was determined by two-dimensional (2D) correlation analysis. The partial least square (PLS) and nu-support vector machine (nu-SVM) models were then established based on the acquired spectral dataset. Spectral variables associated with amide I, amide II, COO-, C-H bending, and C-O or C-OH vibrations arising from proteins, polypeptides, amino acids and carbohydrates, respectively, were susceptible to PMI in 2D correlation analysis. Moreover, the nu-SVM model appeared to achieve a more satisfactory prediction than the PLS model in calibration; the reliability of both models was determined in an external validation set. The study shows the possibility of application of ATR-FTIR methods in postmortem interval estimation using PF samples.
Project description:Paris polyphylla, a traditional herb with a long history, has been widely used to treat diseases by multiple ethnic groups in China. Nevertheless, the quality of P. yunnanensis fluctuates among different geographical origins, so a fast and accurate classification method needs to be established. In our study, 462 P. yunnanensis rhizome and leaf samples from Kunming, Yuxi, Chuxiong, Dali, Lijiang, and Honghe were analyzed by Fourier transform mid-infrared (FT-MIR) spectroscopy, combined with partial least squares discriminant analysis (PLS-DA), random forest (RF), and hierarchical cluster analysis (HCA) methods, to identify their geographical origin. The obvious clustering tendency of rhizome and leaf FT-MIR spectra was displayed by principal component analysis (PCA). The distribution of the variable importance for the projection (VIP) was more uniform than that of the important variables obtained by RF, while PLS-DA models obtained higher classification abilities. Hence, a PLS-DA model was more suitable than the RF model for classifying the different geographical origins of P. yunnanensis. Additionally, the clustering results of different geographical origins obtained by HCA dendrograms also proved the chemical information difference between rhizomes and leaves. The identification performances of the PLS-DA and RF models built on leaf FT-MIR matrices were better than those built on rhizome datasets. In addition, the classification abilities of models built on the combined datasets were higher than those built on the individual rhizome and leaf spectral matrices. Our study provides a reference for the rational utilization of resources, as well as for the fast and accurate identification of P. yunnanensis samples.
Project description:Dendrobium is the largest genus of orchids, most of which have excellent medicinal properties. Fresh stems of some species have been consumed in daily life by Asians for thousands of years. However, there are differences in flavour and clinical efficacy among different species. Therefore, it is necessary to establish an effective and rapid method for controlling the botanical origins of these crude materials. In our study, three kinds of spectra, namely mid-infrared (MIR) spectra in transmission and reflection modes and near-infrared (NIR) spectra, were investigated for authentication of 12 Dendrobium species. Generally, two fusion strategies for reflection MIR and NIR spectra were combined with three mathematical models (random forest, support vector machine with grid search (SVM-GS) and partial least-squares discriminant analysis (PLS-DA)) for discrimination analysis. In conclusion, a low-level fusion strategy comprising the two spectra after pretreatment by the second derivative and multiplicative scatter correction was recommended for discrimination analysis because of its excellent performance in all three models. Compared with MIR spectra, NIR spectra contributed more to the discrimination according to a bi-plot analysis of PLS-DA. Moreover, SVM-GS and PLS-DA were suitable for accurate discrimination (100% accuracy rates) of calibration and validation sets. The protocol combining the low-level fusion strategy with chemometrics provides a rapid and effective reference for control of botanical origins in crude Dendrobium materials.
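Low-level fusion is usually understood as concatenating the preprocessed variables of each instrument for the same sample. The sketch below pairs it with a textbook form of multiplicative scatter correction (MSC), in which each spectrum is regressed on the mean spectrum and then corrected by the fitted intercept and slope. Both function names and all numbers are hypothetical, and the second derivative step mentioned in the abstract is omitted here.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n, width = len(spectra), len(spectra[0])
    mean_spec = [sum(s[j] for s in spectra) / n for j in range(width)]
    m_bar = sum(mean_spec) / width
    var_m = sum((m - m_bar) ** 2 for m in mean_spec)
    corrected = []
    for s in spectra:
        s_bar = sum(s) / width
        # Least-squares fit: s ~ intercept + slope * mean_spec
        slope = sum((m - m_bar) * (x - s_bar)
                    for m, x in zip(mean_spec, s)) / var_m
        intercept = s_bar - slope * m_bar
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

def low_level_fusion(block_a, block_b):
    """Concatenate the variables of two blocks sample by sample."""
    return [a + b for a, b in zip(block_a, block_b)]

# Hypothetical MIR block: the same underlying band with different
# additive offsets and multiplicative scatter per sample.
ref = [0.1, 0.5, 0.9, 0.4]
mir = [[0.2 + 1.0 * r for r in ref], [-0.1 + 2.0 * r for r in ref]]
# Hypothetical NIR block with three variables per sample.
nir = [[0.3, 0.6, 0.2], [0.4, 0.7, 0.1]]

mir_corrected = msc(mir)
fused = low_level_fusion(mir_corrected, msc(nir))
print(len(fused), len(fused[0]))  # 2 samples, 4 + 3 = 7 variables
```

In this toy case the two MIR rows differ only by offset and scale, so MSC maps them onto the same corrected spectrum before the blocks are joined.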
Project description:Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA) when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.
Project description:INTRODUCTION: Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models. OBJECTIVES: We hypothesise that for binary classification using metabolomics data, non-linear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard PLS discriminant analysis. METHODS: We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks. RESULTS: There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of uncertainty of model prediction such that the quality of metabolomics data was observed to be a bigger influence on generalised performance than model choice. CONCLUSION: The size of the data set, and choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.
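One way to read "out-of-bag bootstrap confidence intervals" is the following: resample the data with replacement, fit the model on the resampled (in-bag) samples, score it on the samples left out (out-of-bag), and take percentiles of the resulting scores. The sketch below follows that reading with a toy one-feature threshold classifier standing in for the real models; it is an assumption about the general technique, not the authors' published code, and every name and value in it is hypothetical.

```python
import random

def fit_threshold(xs, ys):
    """Toy linear classifier: midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0, mean1 >= mean0

def predict(model, x):
    threshold, one_is_high = model
    return int((x > threshold) == one_is_high)

def oob_bootstrap_accuracy(xs, ys, n_boot=200, seed=0):
    """95% percentile interval and mean of out-of-bag accuracy."""
    rng = random.Random(seed)
    n = len(xs)
    accs = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_set = set(in_bag)
        oob = [i for i in range(n) if i not in in_set]
        train_y = [ys[i] for i in in_bag]
        if not oob or len(set(train_y)) < 2:
            continue  # skip degenerate resamples
        model = fit_threshold([xs[i] for i in in_bag], train_y)
        correct = sum(predict(model, xs[i]) == ys[i] for i in oob)
        accs.append(correct / len(oob))
    accs.sort()
    lower = accs[int(0.025 * (len(accs) - 1))]
    upper = accs[int(0.975 * (len(accs) - 1))]
    return lower, upper, sum(accs) / len(accs)

# Hypothetical single-metabolite data with partially overlapping classes.
xs = [i * 0.1 for i in range(20)] + [1.0 + i * 0.1 for i in range(20)]
ys = [0] * 20 + [1] * 20

lower, upper, mean_acc = oob_bootstrap_accuracy(xs, ys)
print(lower, upper, mean_acc)
```

A wide interval on a small data set signals that differences between algorithms may sit within the resampling noise, which is consistent with the conclusion that data set size mattered more than model choice.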
Project description:Edible gelatin has been widely used as a food additive in the food industry, and illegal adulteration with industrial gelatin will cause serious harm to human health. The present work used laser-induced breakdown spectroscopy (LIBS) coupled with the partial least square-support vector machine (PLS-SVM) method for the fast and accurate estimation of edible gelatin adulteration. Gelatin samples with 11 different adulteration ratios were prepared by mixing pure edible gelatin with industrial gelatin, and the LIBS spectra were recorded to analyze their elemental composition differences. The PLS, SVM, and PLS-SVM models were separately built for the prediction of gelatin adulteration ratios, and the hybrid PLS-SVM model yielded a better performance than only the PLS and SVM models. Besides, four different variable selection methods, including competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE), random frog (RF), and principal component analysis (PCA), were adopted to combine with the SVM model for comparative study; the results further demonstrated that the PLS-SVM model was superior to the other SVM models. This study reveals that the hybrid PLS-SVM model, with the advantages of low computational time and high prediction accuracy, can be employed as a preferred method for the accurate estimation of edible gelatin adulteration.
Project description:Multi-sensor data fusion can provide more comprehensive and more accurate analysis results. However, it also brings some redundant information, which is an important issue with respect to finding a feature-mining method for intuitive and efficient analysis. This paper demonstrates a feature-mining method based on variable accumulation to find the best expression form and variables' behavior affecting beer flavor. First, e-tongue and e-nose were used to gather the taste and olfactory information of beer, respectively. Second, principal component analysis (PCA), genetic algorithm-partial least squares (GA-PLS), and variable importance of projection (VIP) scores were applied to select feature variables of the original fusion set. Finally, the classification models based on support vector machine (SVM), random forests (RF), and extreme learning machine (ELM) were established to evaluate the efficiency of the feature-mining method. The result shows that the feature-mining method based on variable accumulation obtains the main feature affecting beer flavor information, and the best classification performance for the SVM, RF, and ELM models with 96.67%, 94.44%, and 98.33% prediction accuracy, respectively.
Project description:Fourier-transform infrared (FTIR) spectroscopy enables the chemical characterization and identification of pollen samples, leading to a wide range of applications, such as paleoecology and allergology. This is of particular interest in the identification of grass (Poaceae) species since they have pollen grains of very similar morphology. Unfortunately, the correct identification of FTIR microspectroscopy spectra of single pollen grains is hindered by strong spectral contributions from Mie scattering. Embedding of pollen samples in paraffin helps to retrieve infrared spectra without scattering artifacts. In this study, pollen samples from 10 different populations of five grass species (Anthoxanthum odoratum, Bromus inermis, Hordeum bulbosum, Lolium perenne, and Poa alpina) were embedded in paraffin, and their single grain spectra were obtained by FTIR microspectroscopy. Spectra were subjected to different preprocessing in order to suppress paraffin influence on spectral classification. It is shown that decomposition by non-negative matrix factorization (NMF) and extended multiplicative signal correction (EMSC) that utilizes a paraffin constituent spectrum, respectively, leads to good success rates for the classification of spectra with respect to species by a partial least square discriminant analysis (PLS-DA) model in full cross-validation for several species. PLS-DA, artificial neural network, and random forest classifiers were applied on the EMSC-corrected spectra using an independent validation to assign spectra from unknown populations to the species. Variation within and between species, together with the differences in classification results, is in agreement with the systematics within the Poaceae family. The results illustrate the great potential of FTIR microspectroscopy for automated classification and identification of grass pollen, possibly together with other, complementary methods for single pollen chemical characterization.