Original plant traceability of Dendrobium species using multi-spectroscopy fusion and mathematical models.
ABSTRACT: Dendrobium is one of the largest genera of orchids, and many of its species have excellent medicinal properties. Fresh stems of some species have been consumed in daily life in Asia for thousands of years. However, flavour and clinical efficacy differ among species, so an effective and rapid method for confirming the botanical origin of these crude materials is needed. In our study, three types of spectra were investigated for the authentication of 12 Dendrobium species: mid-infrared (MIR) spectra in transmission and reflection modes, and near-infrared (NIR) spectra. Two fusion strategies for the reflection MIR and NIR spectra were combined with three mathematical models (random forest, support vector machine with grid search (SVM-GS) and partial least-squares discriminant analysis (PLS-DA)) for discrimination analysis. A low-level fusion strategy combining the two spectra after pretreatment by the second derivative and multiplicative scatter correction is recommended for discrimination analysis because it performed excellently in all three models. According to a bi-plot analysis of PLS-DA, NIR spectra contributed more to the discrimination than MIR spectra. Moreover, SVM-GS and PLS-DA achieved accurate discrimination (100% accuracy) for both the calibration and validation sets. The protocol combining a low-level fusion strategy with chemometrics provides a rapid and effective reference for controlling the botanical origin of crude Dendrobium materials.
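The pretreatment and low-level fusion described above can be sketched as follows. This is a minimal illustration, not the authors' code. Several details are assumptions: the MSC reference is the mean spectrum, MSC is applied before the second derivative, and plain finite differences stand in for a smoothed Savitzky-Golay derivative. The random matrices stand in for real spectra.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction (MSC).

    Each spectrum s is regressed on a reference r (by default the mean
    spectrum), s ~ offset + slope * r, and corrected as (s - offset) / slope.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, s in enumerate(X):
        slope, offset = np.polyfit(ref, s, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (s - offset) / slope
    return corrected


def second_derivative(spectra):
    """Second derivative along the variable axis by finite differences.

    Simplified stand-in for a Savitzky-Golay derivative; drops 2 variables.
    """
    return np.diff(np.asarray(spectra, dtype=float), n=2, axis=1)


def low_level_fusion(*blocks):
    """Low-level fusion: concatenate preprocessed blocks column-wise.

    Rows must refer to the same samples in the same order in every block.
    """
    n_samples = blocks[0].shape[0]
    if any(b.shape[0] != n_samples for b in blocks):
        raise ValueError("all blocks must have the same number of samples")
    return np.hstack(blocks)


# Hypothetical data: 12 samples, 50 MIR and 40 NIR variables.
rng = np.random.default_rng(0)
mir = rng.random((12, 50))
nir = rng.random((12, 40))
fused = low_level_fusion(second_derivative(msc(mir)), second_derivative(msc(nir)))
print(fused.shape)  # (12, 86)
```

Concatenation keeps every variable from both blocks, so the fused matrix can be passed to any of the three classifiers unchanged. Scaling the blocks so that neither dominates is a common refinement that this sketch omits.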
Project description: Origin traceability is important for controlling the efficacy of Chinese medicinal materials and Chinese patent medicines. Paris polyphylla var. yunnanensis is widely distributed and well known worldwide. In our study, two spectroscopic techniques, Fourier transform mid-infrared (FT-MIR) and near-infrared (NIR) spectroscopy, were combined with low-, mid-, and high-level data fusion strategies to trace the geographical origin of 196 wild P. yunnanensis samples. Partial least squares discriminant analysis (PLS-DA) and random forest (RF) were used to establish classification models. Both feature extraction (principal component analysis, PCA) and important-variable selection (recursive feature elimination and Boruta) were applied, and models built on the extracted features classified better than those built on the selected variables. FT-MIR spectra contributed more than NIR spectra. In addition, high-level data fusion based on principal component (PC) features achieved a satisfactory accuracy of 100%. Hence, data fusion of FT-MIR and NIR signals can effectively identify the geographical origin of wild P. yunnanensis.
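Mid-level fusion, one of the three strategies compared above, fuses extracted features rather than raw variables. The sketch below assumes that PCA scores are computed separately for each block by singular value decomposition and then concatenated. The number of components per block and the random data are illustrative assumptions, not values from the study.

```python
import numpy as np


def pca_scores(block, n_components):
    """Scores of the first n_components principal components of a block.

    The block (samples x variables) is column-centred, decomposed as
    Xc = U S Vt, and the scores are T = U S (equivalently Xc @ Vt.T).
    """
    X = np.asarray(block, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def mid_level_fusion(blocks, n_components):
    """Concatenate per-block PCA scores into one feature matrix."""
    return np.hstack([pca_scores(b, k) for b, k in zip(blocks, n_components)])


# Hypothetical data: 30 samples, 100 FT-MIR and 60 NIR variables.
rng = np.random.default_rng(1)
ft_mir = rng.normal(size=(30, 100))
nir = rng.normal(size=(30, 60))
features = mid_level_fusion([ft_mir, nir], [5, 3])
print(features.shape)  # (30, 8)
```

The signs of individual components from an SVD are arbitrary, so score columns may flip sign between runs or software packages. Classifiers such as PLS-DA and RF are unaffected by this.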
Project description: This study investigated whether visible and near-infrared (VIS/NIR) hyperspectral imaging can discriminate viable from non-viable wheat seeds. Both sides of individual seeds were imaged (400-1000 nm) to acquire reflectance spectra. Four spectral datasets were used to construct the models: the ventral groove side, the reverse side, the mean (the average of the two sides' spectra for each seed), and the mixture (both sides' spectra for each seed). Classification models based on partial least squares discriminant analysis (PLS-DA) and support vector machines (SVM), coupled with several pre-processing methods and the successive projections algorithm (SPA), were built to identify viable and non-viable seeds. The standard normal variate (SNV)-SPA-PLS-DA model, built on the mixture dataset using only 16 wavebands, achieved high classification accuracy in the prediction set for whole seeds (>85.2%) and for viable seeds (>89.5%). After screening with this model, the final germination of the seed lot could exceed 89.5%. We thus developed a reliable method for predicting wheat seed viability and showed that VIS/NIR hyperspectral imaging can accurately and non-destructively classify viable and non-viable wheat seeds.
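The SNV pretreatment and the mean and mixture datasets described above can be sketched as follows. This illustration rests on several assumptions: SNV uses the sample standard deviation (ddof=1), the class labels are hypothetical, and the random arrays stand in for the two sides' reflectance spectra.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) separately."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def build_datasets(ventral, reverse, labels):
    """Build the 'mean' and 'mixture' datasets from the two seed sides.

    mean:    one averaged spectrum per seed, labels unchanged.
    mixture: both sides kept as separate samples, labels duplicated.
    """
    ventral = np.asarray(ventral, dtype=float)
    reverse = np.asarray(reverse, dtype=float)
    labels = np.asarray(labels)
    mean_ds = (ventral + reverse) / 2.0
    mixture_ds = np.vstack([ventral, reverse])
    mixture_labels = np.concatenate([labels, labels])
    return (mean_ds, labels), (mixture_ds, mixture_labels)


# Hypothetical data: 6 seeds, 20 wavebands per side; 1 = viable, 0 = non-viable.
rng = np.random.default_rng(2)
ventral = rng.random((6, 20))
reverse = rng.random((6, 20))
viable = [1, 1, 0, 1, 0, 0]
(mean_X, mean_y), (mix_X, mix_y) = build_datasets(ventral, reverse, viable)
mix_X_snv = snv(mix_X)
```

In the mixture dataset both spectra of a seed share one label. When splitting it into calibration and prediction sets, keeping both sides of a seed in the same set avoids optimistic estimates.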
Project description: Background: In recent years, genetically modified technology has developed rapidly, and the potential impact of genetically modified foods on human health and the ecological environment has received increasing attention. Current methods for testing genetically modified foods are cumbersome, time-consuming, and expensive. This paper proposes a more efficient and convenient detection method. Methods: Near-infrared diffuse reflectance spectroscopy (NIRDRS) combined with multivariate calibration methods, including principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and support vector machines (SVM), was used to identify different rice varieties and to distinguish transgenic (Bt63) from non-transgenic rice. Spectral pretreatment methods, including Norris-Williams smoothing (NWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and the Savitzky-Golay first derivative (SG 1st-Der), were used to reduce spectral noise and enhance useful information. Accuracy was used to evaluate the qualitative discriminant models. Results: SG 1st-Der pretreatment combined with SVM gave the optimal model for distinguishing rice varieties, with an accuracy of 98.33%. For discriminating transgenic from non-transgenic rice, the SNV-SVM, MSC-SVM, and SG 1st-Der-PLS-DA models all achieved 100% accuracy. Conclusion: Portable NIR spectroscopy combined with chemometric methods can identify rice varieties and the transgenic trait (Bt63) quickly, non-destructively, and accurately.
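The Savitzky-Golay derivative used above fits a low-order polynomial to a sliding window by least squares and takes the derivative of that polynomial at the window centre. Below is a minimal NumPy sketch. The window length and polynomial order are illustrative assumptions, and edge points are simply dropped. In practice, `scipy.signal.savgol_filter(x, window_length, polyorder, deriv=1)` provides this kind of filter with edge handling.

```python
import math

import numpy as np


def savgol_coeffs(window, polyorder, deriv=0):
    """Least-squares Savitzky-Golay coefficients for the window centre.

    Row `deriv` of pinv(A), with A the Vandermonde matrix of offsets
    z = -h..h, estimates the polynomial coefficient a_deriv; multiplying
    by deriv! turns it into the deriv-th derivative at z = 0.
    """
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("need an odd window > polyorder and deriv <= polyorder")
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    A = np.vander(z, polyorder + 1, increasing=True)  # columns z**0 .. z**polyorder
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)


def savgol_derivative(spectrum, window=11, polyorder=2, deriv=1, delta=1.0):
    """Apply the filter to the interior points of one spectrum.

    Returns len(spectrum) - window + 1 values; `delta` is the variable spacing.
    """
    c = savgol_coeffs(window, polyorder, deriv)
    x = np.asarray(spectrum, dtype=float)
    # np.convolve flips its second argument, so reverse c to get a correlation.
    return np.convolve(x, c[::-1], mode="valid") / delta**deriv


t = np.arange(50, dtype=float)
print(savgol_derivative(3.0 * t + 2.0)[:3])  # close to 3 (the slope) everywhere
print(savgol_derivative(t**2, deriv=2)[:3])  # close to 2 (the second derivative)
```

A wider window smooths more but also broadens peaks. The window length is therefore usually tuned together with the classifier.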
Project description: Enzyme activity is a basis for evaluating honey quality. Beekeepers often use concentrators to turn natural honey into concentrated honey by heating it to high temperatures. Active enzymes are very sensitive to heat and lose their activity above a certain temperature. The objectives of this work were to study the kinetics of the temperature effect on diastase activity and to develop a nondestructive approach for rapidly determining the diastase activity of honey during heating, based on visible and near-infrared (Vis/NIR) spectroscopy. A total of 110 samples from three botanical origins were used. To explore the kinetics of diastase activity at high temperatures, honeys from the three botanical origins were heat-treated to obtain a range of diastase activities. Diastase activity, expressed as the diastase number (DN), was measured according to the national standard method. Diastase activity decreased with increasing temperature and heating time, and acacia and longan honeys were more sensitive to temperature than linen honey. The optimum temperature for production and processing is 60 °C. Unsupervised clustering analysis of the spectra showed that honeys of different botanical origins can be distinguished in principal component space. Partial least squares (PLS) and least squares-support vector machine (LS-SVM) algorithms were applied to build quantitative relationships between Vis/NIR spectra and diastase activity. The best result was obtained with Gaussian filter smoothing-standard normal variate (GF-SNV) pretreatment and the LS-SVM model (GF-SNV-LS-SVM), with a coefficient of determination of prediction (R²) of 0.8872 and a root mean square error of prediction (RMSE) of 0.2129. Overall, the results show that Vis/NIR spectroscopy can determine the diastase activity of honey quickly and non-destructively. It can be used to monitor DN during honey production and processing and can help maximize the nutrient content of honey.
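The two figures of merit quoted above can be computed as follows. This sketch assumes the usual definitions on the prediction set: R² = 1 - SS_res/SS_tot and RMSE = sqrt(mean squared residual). Some studies instead report R² as the squared correlation between measured and predicted values, which can differ. The DN values are made up for illustration.

```python
import numpy as np


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    residual = y - y_hat
    ss_res = np.sum(residual**2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean(residual**2)))
    return float(r2), rmse


# Hypothetical measured vs predicted diastase numbers.
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(r2, rmse)  # R² ≈ 0.98, RMSE ≈ 0.158
```

RMSE is expressed in the units of DN, so it is only comparable across studies that use the same measurement scale. R² is unitless.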
Project description: Paris polyphylla, a traditional herb with a long history, has been widely used by many ethnic groups in China to treat diseases. Nevertheless, the quality of P. yunnanensis varies among geographical origins, so a fast and accurate classification method is needed. In our study, the geographical origins of 462 P. yunnanensis rhizome and leaf samples from Kunming, Yuxi, Chuxiong, Dali, Lijiang, and Honghe were identified using Fourier transform mid-infrared (FT-MIR) spectra combined with partial least squares discriminant analysis (PLS-DA), random forest (RF), and hierarchical cluster analysis (HCA). Principal component analysis (PCA) showed a clear clustering tendency in the rhizome and leaf FT-MIR spectra. The important variables identified by variable importance in projection (VIP) scores were distributed more uniformly than those obtained by RF, and the PLS-DA models achieved higher classification ability. Hence, PLS-DA was more suitable than RF for classifying P. yunnanensis from different geographical origins. The HCA dendrograms of the different geographical origins also revealed differences in chemical information between rhizomes and leaves. PLS-DA and RF models performed better on the leaf FT-MIR matrices than on the rhizome datasets. Models built on the combined datasets classified better than those built on the rhizome or leaf spectra alone. Our study provides a reference for the rational utilization of resources and a fast, accurate method for identifying P. yunnanensis samples.
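VIP scores like those compared above summarise how much each variable contributes to the y-variance explained by a PLS model. The sketch below uses a single-response PLS1 model fitted with the NIPALS algorithm, for example one origin coded 1 against the rest coded 0. This is a simplification of multi-class PLS-DA, and the synthetic data are illustrative. The VIP values are normalised so that their squares average to 1, which is why a threshold of VIP > 1 is commonly used to flag important variables.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model with NIPALS on centred data.

    Returns the weights W (p x A), scores T (n x A) and y-loadings q (A,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = X @ w                    # scores
        tt = t @ t
        p_load = X.T @ t / tt        # X-loadings
        q[a] = (y @ t) / tt          # y-loading
        X = X - np.outer(t, p_load)  # deflate X
        y = y - q[a] * t             # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a)."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)  # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)
    return np.sqrt(p * (w_norm**2 @ ss) / ss.sum())


# Hypothetical data: y depends mainly on variables 3 and 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 25))
y = X[:, 3] - 0.5 * X[:, 10] + 0.1 * rng.normal(size=40)
W, T, q = pls1_nipals(X, y, n_components=3)
vip = vip_scores(W, T, q)
print(np.flatnonzero(vip > 1.0))  # indices of variables with VIP > 1
```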
Project description: Despite its intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been widely used in real-world applications such as quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters presents a substantial hurdle to robust classification in LIBS studies, even when standard multivariate chemometric techniques are used for analysis. Taking pharmaceutical samples as an example, we propose support vector machines (SVM), a nonlinear classification method, for discrimination based on LIBS measurements. We compare it with conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA). Using over-the-counter pharmaceutical samples, we demonstrate that SVM yields statistically significant improvements in prospective classification accuracy (sensitivity) because it can address variability in LIBS sample ablation and plasma self-absorption. Furthermore, when measurements from samples not included in the training set are added to the test data, SVM improves the correct allocation rate by nearly 10%. It also reduces misclassification rates by 75% relative to PLS-DA and by 80% relative to SIMCA, highlighting its robustness. Further studies on a wider range of sample types, performed with different LIBS systems, are needed to fully characterize the capacity of SVM for superior predictions. Nevertheless, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox to drug screening, counterfeit detection, and related areas of forensic and biological sample analysis.
Project description: Because Lingzhi is subject to adulteration, there is growing demand for techniques that classify medicinal mushroom species. This study aimed to find a rapid and reliable way to distinguish Lingzhi species and to compare the influence of data pretreatment methods on the recognition results. To this end, 120 fresh fruiting bodies of Lingzhi were collected and analyzed by attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR). Random forest (RF), support vector machine (SVM) and partial least squares discriminant analysis (PLS-DA) classification models were built on raw and second-derivative (SD) pretreated spectral matrices to authenticate the Lingzhi species. Multivariate statistical analysis showed that SD preprocessing gave higher classification ability. This may be because analysis of powder samples requires removal of overlapping peaks and baseline shifts. SVM and PLS-DA gave more satisfactory results than RF, both reaching 100% accuracy on the test set, and PLS-DA also reached 100% accuracy on the training set. In conclusion, ATR-FTIR spectroscopy with SD pretreatment combined with PLS-DA is a simple, rapid, non-destructive and relatively inexpensive method for discriminating mushroom species and provides a good reference for quality assessment.
Project description: Near-infrared (NIR) spectroscopy is a widespread detection method with a high signal-to-noise ratio (SNR), but its models are hard to interpret because of overlapping features. Mid-infrared (MIR) spectroscopy, by contrast, shows more chemical features and gives models that are easier to explain, but it suffers from a low SNR. To build a model with rich chemical features and a higher SNR, NIR and MIR spectra are combined in a high-level fusion strategy for quantitative analysis. A novel chemometric method, named Mahalanobis distance weighted (MDW), is proposed to integrate the NIR and MIR techniques. The Mahalanobis distance (MD), which reflects spectral similarity, is used to calculate a weight for each sample; specifically, each weight is inversely proportional to the corresponding MD. The proposed MDW method is applied to NIR and MIR spectra for the quantitative analysis of active ingredients in deltamethrin and emamectin benzoate formulations. Overall, the results show that MDW is promising. In high-level fusion for quantitative analysis, it noticeably improves predictive performance over the individual methods.
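The weighting rule described above, with weights inversely proportional to MD, can be sketched as follows. How and where the distances are computed is an assumption here. Each block's MD is taken between a new sample and that block's calibration set. In practice this is often done in a low-dimensional score space, because the full-spectrum covariance is singular; a pseudo-inverse is used here to cope with that. The weights are normalised to sum to 1 before the NIR and MIR predictions are combined. The function names are hypothetical.

```python
import numpy as np


def mahalanobis_to_set(x, reference):
    """Mahalanobis distance of sample x from the mean of a reference set."""
    R = np.asarray(reference, dtype=float)
    mu = R.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(R, rowvar=False))  # pinv copes with singular covariance
    d = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(max(d @ cov_inv @ d, 0.0)))  # clip tiny negative rounding errors


def mdw_weights(distances, eps=1e-12):
    """Weights inversely proportional to each distance, normalised to sum to 1."""
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)  # eps avoids division by zero
    return inv / inv.sum()


def mdw_fuse(predictions, distances):
    """High-level fusion: distance-weighted average of the block predictions."""
    return float(np.dot(mdw_weights(distances), np.asarray(predictions, dtype=float)))


# Hypothetical case: the NIR model predicts 10.0 (MD = 1.0) and the MIR model
# predicts 20.0 (MD = 3.0), so NIR gets weight 0.75 and MIR 0.25.
print(mdw_fuse([10.0, 20.0], [1.0, 3.0]))  # approximately 12.5
```

The block in which a sample looks most like the calibration data therefore dominates that sample's fused prediction. This is the intuition behind weighting by spectral similarity.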
Project description: Honey is one of the food commodities most frequently affected by fraud. Although the addition of extraneous sugars is the most common type of fraud, analytical methods are also needed to detect masking of origin and misdescription of botanical variety. In this work, multivariate analysis of the content of certain macro- and trace elements, determined by energy-dispersive X-ray fluorescence (ED-XRF) without any sample treatment, was used to classify honeys by botanical variety and geographical origin. Principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were used to build classification models for nine botanical varieties (orange, robinia, lavender, rosemary, thyme, lime, chestnut, eucalyptus and manuka) and seven geographical origins (Italy, Romania, Spain, Portugal, France, Hungary and New Zealand). Although the PCA models had 100% sensitivity, they lacked specificity. The PLS-DA models built for specific botanical variety-country (BV-C) combinations classified the honey samples successfully, as verified with external validation samples.
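The contrast above between high sensitivity and poor specificity can be made concrete with the standard one-class definitions: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). In the hypothetical example below, a model that accepts every sample as manuka catches all true manuka honeys (sensitivity 1.0) but rejects none of the others (specificity 0.0). The labels are invented for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity and specificity of one target class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


y_true = ["manuka", "manuka", "thyme", "lime", "chestnut"]
y_pred = ["manuka"] * len(y_true)  # every sample accepted as manuka
sens, spec = sensitivity_specificity(y_true, y_pred, "manuka")
print(sens, spec)  # 1.0 0.0
```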
Project description: Gentiana is one of the largest genera of Gentianoideae. Most of its species have potential pharmaceutical value and are used in local traditional medicine. The phytochemical diversity and the differences in bioactive compounds among species make accurate identification of authentic Gentiana species crucial. In this paper, the feasibility of identifying Gentiana and its related species using infrared spectroscopy combined with chemometric analysis was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000-4000 cm⁻¹) and Fourier transform mid-infrared (MIR: 4000-600 cm⁻¹) spectroscopy. First, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Second, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the calibration and prediction sets, the MIR-SVM model had higher classification accuracy than the other models. Five feature selection strategies were used to reduce the number of variables for modeling: VIP (variable importance in projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation. Finally, 101 NIR and 73 FT-MIR bands were selected as feature variables. Third, stacking models were built on the optimal spectral dataset, and most of them performed better than the full-spectrum models. Using RF and SVM as base learners with an SVM meta-classifier was the optimal stacked generalization strategy. For the SG-Ven-MIR-SVM model, the accuracy (ACC) was 100% for both the calibration and validation sets. The sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, indicating optimal identification performance. These results suggest that stacked generalization combined with feature selection may be an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safe and effective clinical application of medicinal Gentiana.
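The MCC and kappa values reported above can be computed from a multi-class confusion matrix. This sketch uses the multi-class form of MCC (the form used, for example, by scikit-learn's `matthews_corrcoef`) and the standard Cohen's kappa. Both equal 1 when every sample is classified correctly. The class labels are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    C = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        C[index[t], index[p]] += 1
    return C


def mcc_multiclass(C):
    """Multi-class Matthews correlation coefficient from a confusion matrix."""
    s = C.sum()        # number of samples
    c = np.trace(C)    # correctly classified samples
    t = C.sum(axis=1)  # true count per class
    p = C.sum(axis=0)  # predicted count per class
    num = c * s - t @ p
    den = np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return float(num / den) if den > 0 else 0.0


def cohen_kappa(C):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    s = C.sum()
    p_observed = np.trace(C) / s
    p_chance = (C.sum(axis=1) @ C.sum(axis=0)) / s**2
    return float((p_observed - p_chance) / (1.0 - p_chance))


labels = ["G1", "G2"]
C = confusion_matrix(["G1", "G1", "G2", "G2"], ["G1", "G2", "G2", "G2"], labels)
print(mcc_multiclass(C), cohen_kappa(C))  # about 0.577 and 0.5
```

Unlike raw accuracy, both statistics discount agreement expected by chance. This makes them more informative when class sizes are unequal.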