An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images.
ABSTRACT: This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated as a trade-off among several between-cluster measures from well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm to cluster nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. Several classifiers (multi-layer perceptron, Support Vector Machine (SVM), and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves accuracies of 96.72% and 96.67% using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.
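The distinction between within-cluster and between-cluster scatter can be made concrete with a small sketch. The code below is not the SDM measure from the abstract; it only computes the two classical scatter sums for a hard clustering, assuming points are plain coordinate tuples and labels are arbitrary hashable values. The function names are hypothetical.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(points, labels):
    """Return (within, between) scatter sums for a hard clustering.

    within  = sum of squared distances from each point to its cluster centre
    between = sum over clusters of size * squared distance from the
              cluster centre to the overall mean
    """
    overall = mean(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        centre = mean(members)
        within += sum(sq_dist(p, centre) for p in members)
        between += len(members) * sq_dist(centre, overall)
    return within, between


# Illustrative data (assumed, not from the paper): two well-separated pairs.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = scatter(pts, [0, 0, 1, 1])  # compact, well-separated clusters
bad = scatter(pts, [0, 1, 0, 1])   # clusters that straddle the gap
```

A method that minimises only the within-cluster term, as the abstract says of Fuzzy C-means, ignores the second number; the two always add up to the total scatter of the data, so a good partition has a small first value and a large second one.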
Project description: Multi-sensor data fusion technology based on Dempster-Shafer evidence theory is widely applied in many fields. However, how to determine the basic belief assignment (BBA) is still an open issue. Existing BBA methods pay more attention to the uncertainty of information but do not simultaneously consider the reliability of information sources. Real-world information is not only uncertain but also only partially reliable, so uncertainty and partial reliability are strongly associated with each other. To take this into account, this paper proposes a new method, named reliability-based BBA, that represents BBAs along with their associated reliabilities. Several examples are presented to show the validity of the proposed method.
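To illustrate how reliability can interact with a BBA, the sketch below combines two mass functions with Dempster's rule, with and without classical Shafer discounting of the less reliable source. This is a standard textbook construction, not the reliability-based BBA proposed in the abstract; the frame, the mass values, and the reliability of 0.5 are assumptions chosen for illustration.

```python
from itertools import product


def discount(m, reliability, frame):
    """Shafer discounting: keep `reliability` of each mass and move the
    remainder onto the whole frame (total ignorance)."""
    out = {focal: reliability * v for focal, v in m.items() if focal != frame}
    out[frame] = 1.0 - reliability + reliability * m.get(frame, 0.0)
    return out


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets over the same frame."""
    joint = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb  # mass falling on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {focal: v / (1.0 - conflict) for focal, v in joint.items()}


# Hypothetical two-hypothesis frame and two conflicting sensors.
FRAME = frozenset({"a", "b"})
A = frozenset({"a"})
B = frozenset({"b"})
sensor1 = {A: 0.8, FRAME: 0.2}
sensor2 = {B: 0.6, FRAME: 0.4}

fused_plain = combine(sensor1, sensor2)
# Assume sensor2 is only 50% reliable and discount it before fusing.
fused_discounted = combine(sensor1, discount(sensor2, 0.5, FRAME))
```

Discounting the second sensor moves part of its mass to the whole frame, so it contradicts the first sensor less strongly and the fused mass on `a` increases.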
Project description: A method is described for analysing the results of docking studies on a protein target with small-molecule chemical compounds as ligands, drawn from various sources and obtained using different docking tools. We show the use of Dempster-Shafer Theory (DST) to select the top-ranking compounds for further analysis and consideration. AVAILABILITY: The application is freely available at http://allamapparao.org/dst/
Project description: A decision-in decision-out fusion architecture can be used to fuse the outputs of multiple classifiers from different diagnostic sources. In this paper, Dempster-Shafer Theory (DST) is used to fuse classification results of breast cancer data from two different sources: gene-expression patterns in peripheral blood cells and Fine-Needle Aspirate Cytology (FNAc) data. Each source is classified by a Support Vector Machine (SVM) with linear, polynomial, and Radial Basis Function (RBF) kernels. The output beliefs of the classifiers for both data sources are combined to arrive at one final decision. Dynamic uncertainty assessment is based on class differentiation of the breast cancer. Experimental results show that the proposed breast cancer data fusion methodology outperforms single classification models.
Project description: Human involvement influences traditional service quality evaluations, which leads to low accuracy, poor reliability, and weak predictability. This paper proposes a method, called SVMs-DS, that employs a support vector machine (SVM) and Dempster-Shafer evidence theory to evaluate the service quality of a production process while handling a large number of input features with a small sample data set. Features that can affect production quality are extracted by a large number of sensors. Preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, the basic probability assignments (BPAs) are constructed, which support the evaluation both qualitatively and quantitatively. The process service quality evaluation results are validated by Dempster's rule; the decision threshold for resolving conflicting results is generated from the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.
Project description: To improve the accuracy of wheat quality detection, a recognition method using the terahertz (THz) spectrum and multi-source information fusion is proposed. The absorption and refractive index spectra of normal, germinated, moldy, and worm-eaten wheat samples were combined, and support vector machines (SVMs) with different kernel functions, together with Dempster-Shafer (DS) evidence theory, were used to establish a classification fusion model for the multiple optical indexes of wheat. The results showed that the recognition rate of the fusion model for wheat samples can be as high as 96%. Furthermore, this approach was compared to the regression model based on single-spectrum analysis. The results indicate that the average recognition rates of the fusion models can reach 90%, and the recognition rate of the SVM radial basis function (SVM-RBF) fusion model can reach 97.5%. These preliminary results indicate that THz time-domain spectroscopy (THz-TDS) combined with DS evidence theory is suitable for determining wheat quality with better detection accuracy.
Project description: Image segmentation, as a key step of microarray image processing, is crucial for obtaining the spot expressions simultaneously. However, state-of-the-art clustering-based segmentation algorithms are sensitive to noise. To solve this problem and improve segmentation accuracy, this article introduces several improvements to the fast and simple clustering methods (K-means and Fuzzy C-means). First, a contrast enhancement algorithm is applied during image preprocessing to improve gridding precision. Second, data-driven means are proposed for cluster center initialization in place of the usual random setting. Third, multiple features, including intensity, spatial, and shape features, are used in feature selection to replace the sole pixel intensity feature of traditional clustering-based methods, so that noise is not mistaken for spot pixels. Moreover, principal component analysis is adopted for feature extraction. Finally, an adaptive adjustment algorithm based on data mining and learning is proposed to further deal with missing or low-contrast spots. Experiments on real and simulated data sets indicate that, with these improvements, the proposed method obtains higher segmentation precision than the traditional K-means and Fuzzy C-means clustering methods.
Project description: Rapid advances in single-cell RNA sequencing (scRNA-seq) allow measurement of gene expression at single-cell resolution in complex diseases or tissues. While many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We propose a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we conducted initial filtering and SCnorm normalization. We considered various case studies by selecting different cluster numbers (cl = 2 to a user-defined number) and applied the fuzzy c-means clustering algorithm to each individually. For each case, we evaluated the scores of four cluster validity index measures: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure as a minimization objective and the remaining three as maximization objectives, and then applied a multi-objective decision-making technique, TOPSIS, to identify the optimal solution. The case study with the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma by comparing the expression of the samples in each resultant cluster with that in the remaining clusters. We applied our approach to a scRNA-seq dataset for the rare intestinal cell type in mice [GEO ID: GSE62270, 23,630 features (genes) and 288 cells]. The optimal cluster result (TOPSIS optimal score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The scores of the four cluster validity indices, FSI, PE, PC, and MPC, for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2). The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob and Rps17. In this list, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.
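The selection step described above (score each candidate clustering on several validity indices, then rank the candidates with TOPSIS) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard definitions of PC, PE, and MPC, omits FSI because that index also needs pairwise distances, gives every criterion equal weight, and uses two toy membership matrices (rows are cells, columns are clusters) in place of real fuzzy c-means output. The function names are hypothetical.

```python
import math


def partition_coefficient(u):
    """PC: mean squared membership; ranges from 1/c (fully fuzzy) to 1 (crisp)."""
    return sum(x * x for row in u for x in row) / len(u)


def partition_entropy(u):
    """PE: mean membership entropy; 0 for crisp, log(c) for fully fuzzy."""
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / len(u)


def modified_partition_coefficient(u):
    """MPC: PC rescaled to [0, 1] to remove its dependence on cluster count."""
    c = len(u[0])
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(u))


def topsis(matrix, benefit):
    """Equal-weight TOPSIS. `matrix` holds one row per alternative and one
    column per criterion; benefit[j] is True when larger values are better."""
    n_crit = len(benefit)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    scaled = [[row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*scaled))
    ideal = [max(col) if b else min(col) for col, b in zip(cols, benefit)]
    worst = [min(col) if b else max(col) for col, b in zip(cols, benefit)]
    scores = []
    for row in scaled:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Toy candidates (assumed data): a fairly crisp 2-cluster partition and a
# nearly uniform 3-cluster partition.
two_clusters = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
three_clusters = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
candidates = [two_clusters, three_clusters]

table = [[partition_entropy(u), partition_coefficient(u),
          modified_partition_coefficient(u)] for u in candidates]
scores = topsis(table, benefit=[False, True, True])  # PE is minimized
best = max(range(len(scores)), key=scores.__getitem__)
```

Here `best` indexes the candidate with the highest TOPSIS score, mirroring how the abstract picks the final cluster number.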
Project description: In the k-Nearest Neighbor (kNN) classifier, a query instance is classified based on the most frequent class of its nearest neighbors among the training instances. On imbalanced datasets, kNN becomes biased towards the majority instances of the training space. To solve this problem, we propose a method called the Proximity weighted Evidential kNN classifier. In this method, each neighbor of a query instance is considered a piece of evidence from which we calculate the probability of the class label given the feature values, so as to give more preference to the minority instances. This is then discounted by the proximity of the neighbor to prioritize the closer instances in the local neighborhood. These pieces of evidence are then combined using the Dempster-Shafer theory of evidence. A rigorous experiment over 30 benchmark imbalanced datasets shows that our method performs better than 12 popular methods. In pairwise comparison of these 12 methods with our method, our method wins on 29 datasets in the best case and on at least 19 datasets in the worst case. More importantly, according to the Friedman test, the proposed method ranks higher than all other methods in terms of AUC at the 5% level of significance.
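The idea of treating each neighbor as a piece of evidence that is weakened with distance can be sketched with the classical evidential kNN construction. This is not the proximity-weighted, imbalance-aware variant proposed in the abstract: here each neighbor simply supports its own class with mass `alpha * exp(-gamma * d**2)` and leaves the rest on the whole frame, and the neighbors are fused with Dempster's rule in closed form. The parameter values and toy data are assumptions.

```python
import math
from collections import defaultdict


def evidential_knn(train, labels, query, k=3, alpha=0.95, gamma=1.0):
    """Return (class_masses, ignorance_mass) for `query`.

    Each of the k nearest neighbors is a simple support function for its
    own class. Because every focal set is a singleton or the whole frame,
    Dempster's rule reduces to products of the per-class leftover masses.
    """
    nearest = sorted(
        ((math.dist(x, query), y) for x, y in zip(train, labels)),
        key=lambda pair: pair[0],
    )[:k]
    # leftover[c] = product over neighbors of class c of (1 - support)
    leftover = defaultdict(lambda: 1.0)
    for d, y in nearest:
        leftover[y] *= 1.0 - alpha * math.exp(-gamma * d * d)
    all_leftover = math.prod(leftover.values())
    # Unnormalized mass on {c}: class c speaks, every other class abstains.
    raw = {c: (1.0 - lo) * all_leftover / lo for c, lo in leftover.items()}
    norm = sum(raw.values()) + all_leftover  # equals 1 - conflict
    masses = {c: v / norm for c, v in raw.items()}
    return masses, all_leftover / norm


# Toy data (assumed): two tight groups far apart.
train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "b", "b"]
masses, ignorance = evidential_knn(train, labels, (0.0, 0.5), k=3)
predicted = max(masses, key=masses.get)
```

Keeping `alpha` below 1 guarantees that every leftover product stays positive, so the division is safe; the returned ignorance mass shows how much belief remains uncommitted after fusing the neighbors.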
Project description: Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant to a problem and should not be assigned to any cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed in which all three eventualities are possible: a gene may be assigned exclusively to a single cluster, assigned to multiple clusters, or not assigned to any cluster. These possibilities are realised through the primary novelty of the paper, the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet the requirements of different gene discovery studies.
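The aggregate-then-binarize pipeline can be sketched in a few lines. The abstract does not spell out its binarization techniques, so the value-threshold rule below is an assumption chosen for illustration, as is the premise that cluster labels have already been aligned across runs. Matrices are laid out as clusters by genes, and the function names are hypothetical.

```python
def consensus(partitions):
    """Average several aligned partition matrices (clusters x genes) into
    one fuzzy consensus partition matrix."""
    n = len(partitions)
    rows, cols = len(partitions[0]), len(partitions[0][0])
    return [[sum(p[r][c] for p in partitions) / n for c in range(cols)]
            for r in range(rows)]


def binarize(copam, threshold):
    """Assign a gene to every cluster whose consensus membership reaches
    `threshold` (illustrative rule, tunable between wide and tight)."""
    return [[1 if v >= threshold else 0 for v in row] for row in copam]


# Two hypothetical clustering runs over three genes and two clusters.
run1 = [[1, 1, 0],
        [0, 0, 1]]
run2 = [[1, 0, 0],
        [0, 1, 1]]
copam = consensus([run1, run2])  # gene 1 is split 0.5 / 0.5

wide = binarize(copam, 0.5)      # gene 1 belongs to both clusters
tight = binarize(copam, 0.9)     # gene 1 belongs to no cluster
```

Lowering the threshold yields wide, overlapping clusters, while raising it yields tight clusters and leaves ambiguous genes unassigned, which covers the three eventualities listed in the abstract.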
Project description: Network anomaly detection has attracted growing attention with the rapid development of computer networks. Some researchers have applied fusion methods and DS evidence theory to network anomaly detection, but with low performance, and they did not consider that network features are complicated and varied. To achieve a high detection rate, we present a novel network anomaly detection system with optimized Dempster-Shafer evidence theory (ODS) and a regression basic probability assignment (RBPA) function. In this model, we weight each sensor according to its previous prediction accuracy to optimize DS evidence theory, and RBPA uses each sensor's regression ability to handle complex networks. Across four kinds of experiments, we find that the model achieves a better detection rate and that both the RBPA and ODS optimizations improve system performance significantly.