ABSTRACT: Multi-sensor data fusion technology based on Dempster-Shafer evidence theory is widely applied in many fields. However, how to determine the basic belief assignment (BBA) remains an open issue. Existing BBA methods focus on the uncertainty of information but do not simultaneously consider the reliability of the information sources. Real-world information is not only uncertain but also partially reliable; thus, uncertainty and partial reliability are strongly associated with each other. To account for this fact, a new method that represents BBAs together with their associated reliabilities, named reliability-based BBA, is proposed in this paper. Several examples illustrate the validity of the proposed method.
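The idea of weighting a BBA by the reliability of its source can be illustrated with Shafer's classical discounting operation; this is a standard construction, not necessarily the paper's exact reliability-based BBA, and the frame and masses below are illustrative:

```python
# Shafer's classical discounting: scale a mass function by a
# source-reliability factor and transfer the removed mass to
# total ignorance (the whole frame of discernment).

def discount(bba, reliability, frame):
    """Discount a BBA by a reliability factor alpha in [0, 1].

    bba: dict mapping focal elements (frozensets) to masses summing to 1.
    reliability: alpha = 1 keeps the BBA unchanged; alpha = 0 yields
    total ignorance.
    frame: frozenset of all hypotheses (the frame of discernment).
    """
    discounted = {A: reliability * m for A, m in bba.items() if A != frame}
    # Mass taken from the focal elements goes to the full frame.
    discounted[frame] = reliability * bba.get(frame, 0.0) + (1.0 - reliability)
    return discounted

frame = frozenset({"a", "b"})
bba = {frozenset({"a"}): 0.7, frozenset({"b"}): 0.3}
print(discount(bba, 0.8, frame))
```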
Project description:Decision-in decision-out fusion architecture can be used to fuse the outputs of multiple classifiers from different diagnostic sources. In this paper, Dempster-Shafer Theory (DST) has been used to fuse classification results of breast cancer data from two different sources: gene-expression patterns in peripheral blood cells and Fine-Needle Aspirate Cytology (FNAc) data. Classification of the individual sources is done by Support Vector Machines (SVM) with linear, polynomial and Radial Basis Function (RBF) kernels. Output beliefs of the classifiers for both data sources are combined to arrive at one final decision. Dynamic uncertainty assessment is based on class differentiation of the breast cancer. Experimental results have shown that the newly proposed breast cancer data fusion methodology has outperformed single classification models.
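The decision-level fusion step described above rests on Dempster's rule of combination. A minimal sketch, with made-up classifier beliefs over a frame {malignant, benign} (the actual BBAs in the paper come from the SVM outputs):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (frozenset focal elements) with
    Dempster's rule: multiply masses on intersecting focal elements
    and renormalize by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence; rule undefined.")
    return {A: m / (1.0 - conflict) for A, m in combined.items()}

# Illustrative beliefs from two hypothetical classifiers:
m_gene = {frozenset({"malignant"}): 0.6,
          frozenset({"malignant", "benign"}): 0.4}
m_fnac = {frozenset({"malignant"}): 0.7,
          frozenset({"benign"}): 0.2,
          frozenset({"malignant", "benign"}): 0.1}
print(dempster_combine(m_gene, m_fnac))
```

The combined mass concentrates on "malignant" because both sources mostly agree; the 0.12 of conflicting mass is discarded by the normalization.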
Project description:Our previous study demonstrated the application of the Dempster-Shafer theory of evidence to dose/volume/outcome data analysis. Specifically, it applied Yager's rule to fuse data from different institutions pertaining to radiotherapy pneumonitis versus mean lung dose. The present work is a follow-on study that employs the optimal unified combination rule, which optimizes data similarity among independent sources. Specifically, we construct belief and plausibility functions on the lung cancer radiotherapy dose outcome datasets, and then apply the optimal unified combination rule to obtain combined belief and plausibility, which bound the probabilities of pneumonitis incidence. To estimate the incidence of pneumonitis at any value of mean lung dose, we use the Lyman-Kutcher-Burman (LKB) model to fit the combined belief and plausibility curves. The results show that the optimal unified combination rule yields a narrower uncertainty range (as represented by the belief-plausibility range) than Yager's rule, which is also theoretically proven.
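The belief and plausibility functions that bound the probability of an event, as described above, are computed directly from a BBA: Bel(A) sums the mass of all focal elements inside A, Pl(A) sums the mass of all focal elements intersecting A. A minimal sketch with illustrative masses (not the study's data):

```python
def belief(bba, A):
    """Bel(A): total mass committed to subsets of A (lower bound on P(A))."""
    return sum(m for B, m in bba.items() if B <= A)

def plausibility(bba, A):
    """Pl(A): total mass not contradicting A (upper bound on P(A))."""
    return sum(m for B, m in bba.items() if B & A)

# Toy BBA over the outcome of a single patient:
bba = {frozenset({"pneumonitis"}): 0.3,
       frozenset({"no_pneumonitis"}): 0.5,
       frozenset({"pneumonitis", "no_pneumonitis"}): 0.2}
event = frozenset({"pneumonitis"})
print(belief(bba, event), plausibility(bba, event))
```

Here Bel = 0.3 and Pl = 0.5, so any probability of pneumonitis consistent with this evidence lies in [0.3, 0.5]; a narrower Bel-Pl gap means less epistemic uncertainty.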
Project description:This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% accuracy using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.
Project description:In the theory of belief functions, the approximation of a basic belief assignment (BBA) aims to reduce the high computational cost, especially when a large number of focal elements is present. In traditional BBA approximation approaches, a focal element's own characteristics, such as its mass assignment and cardinality, are usually used separately or jointly as criteria for the removal of focal elements. Besides the computational cost, the distance between the original BBA and the approximated one is also a concern, since it represents the loss of information in BBA approximation. In this paper, an iterative approximation approach is proposed based on maximizing closeness, i.e., minimizing the distance between the approximated BBA in the current iteration and the BBA obtained in the previous iteration, where one focal element is removed in each iteration. The iteration stops when the desired number of focal elements is reached. Performance evaluation approaches for BBA approximations, including the traditional time-based and closeness-based ones as well as newly proposed ones, are also discussed and used to compare traditional BBA approximations with the approach proposed in this paper. Experimental results and related analyses are provided to show the rationality and efficiency of the proposed BBA approximation.
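The iterative removal scheme can be sketched as follows. This toy version redistributes the removed focal element's mass proportionally over the remaining ones and scores closeness with a plain L1 distance; the paper uses a dedicated BBA distance, so treat this as an assumed simplification:

```python
def remove_one(bba):
    """One iteration: try deleting each focal element, renormalize the
    remaining masses, and keep the candidate closest (L1 distance here)
    to the current BBA."""
    best, best_dist = None, float("inf")
    for victim in bba:
        rest_total = 1.0 - bba[victim]
        if rest_total <= 0.0:
            continue  # cannot remove the only mass-bearing element
        candidate = {A: m / rest_total for A, m in bba.items() if A != victim}
        dist = sum(abs(candidate.get(A, 0.0) - m) for A, m in bba.items())
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

def approximate(bba, k):
    """Iteratively remove focal elements until only k remain."""
    while len(bba) > k:
        bba = remove_one(bba)
    return bba

bba = {frozenset("a"): 0.4, frozenset("b"): 0.3,
       frozenset("ab"): 0.2, frozenset("abc"): 0.1}
print(approximate(bba, 2))
```

With proportional redistribution, the L1 loss of removing an element works out to twice its mass, so this sketch removes the lightest focal element at each step; a richer BBA distance would also weigh cardinality overlap.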
Project description:Human involvement influences traditional service quality evaluation, leading to low accuracy, poor reliability and weak predictability. This paper proposes a method, called SVMs-DS, that employs support vector machines (SVM) and Dempster-Shafer evidence theory to evaluate the service quality of a production process while handling a large number of input features with a small sampling data set. Features that can affect production quality are extracted by a large number of sensors, and preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, basic probability assignments (BPAs) are constructed, which support the evaluation in both a qualitative and a quantitative way. The process service quality evaluation results are validated by Dempster's rule; the decision threshold used to resolve conflicting results is generated from the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.
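One common way to turn a classifier's raw scores into a BPA is to normalize them and reserve a fixed mass for total ignorance; this is an assumed construction for illustration (the paper's exact BPA design and threshold differ), with made-up class names and scores:

```python
import math

def svm_scores_to_bpa(scores, ignorance=0.1):
    """Turn per-class SVM decision scores into a BPA.

    scores: dict class -> decision score (hypothetical values).
    A softmax spreads (1 - ignorance) over the singletons, and the
    remaining mass goes to the full frame as ignorance.
    """
    exps = {c: math.exp(s) for c, s in scores.items()}
    total = sum(exps.values())
    bpa = {frozenset({c}): (1.0 - ignorance) * e / total
           for c, e in exps.items()}
    bpa[frozenset(scores)] = ignorance  # mass on the whole frame
    return bpa

print(svm_scores_to_bpa({"pass": 1.2, "fail": -0.3}))
```

BPAs built this way from each of the three SVM models could then be fused pairwise with Dempster's rule, with the conflict mass serving as a natural ingredient for a decision threshold.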
Project description:The databases included in this article refer to variables and parameters belonging to the Space Traffic Management (STM), Evidence Theory and Machine Learning (ML) fields. They have been used to implement ML for autonomously predicting the risk associated with a close encounter between two space objects (Sanchez and Vasile, On the Use of Machine Learning and Evidence Theory to Improve Collision Risk Management, Acta Astronautica, Special Issue for ICSSA2020, In Press). The position of the objects is assumed to be affected by epistemic uncertainty, which has been modeled according to Dempster-Shafer Evidence theory (DSt). Six datasets are presented. Two (DB1 and DB2, respectively) include samples of space object close encounters subject to epistemic uncertainty on the relative position. Another two databases (DB3 and DB4, respectively) include the values of the Cumulative Plausibility and Belief Curves (CPC and CBC, respectively) of each sample included in DB1. The remaining databases (DB5 and DB6) contain the values of the CPC and CBC of each sample included in DB2. All of them are synthetic databases created using computer simulation to obtain the results presented in . DB1 is constituted by 9,000 samples, 45 columns and a header, while DB2 is formed by 28,800 samples, 45 columns and a header. These databases come from a set of, respectively, 5 and 14 different families of encounter geometries defined by the range of values that can be assigned to the bounds of the intervals for the uncertain variables, assumed to be affected by epistemic uncertainty and considered to have been provided by two sources of information. The uncertain variables are: the miss distance, [µx, µy], on the impact plane (B plane); the standard deviation of the relative position projected on the B plane, [σx, σy]; and the Hard Body Radius of the combined objects, HBR.
The dataset is completed with STM-related parameters: the miss distance and covariance matrix of the uncertainty ellipse projected on the B plane enclosing all samples defined by the uncertainty intervals, the Probability of Collision (Pc) of this ellipse, or the elapsed time to the Time of Closest Approach (TCA); with DSt-related parameters: Belief and Plausibility of certain values of Pc; and the class of the event according to the classification detailed in . DB3 and DB4 are constituted by 34 columns and 9,000 rows containing the Plausibility and Belief for Pc values and the corresponding Probabilities of Collision necessary to build the CPC and CBC of the events in DB1, while DB5 and DB6 are constituted by 34 columns and 28,800 rows containing the same quantities for the events in DB2. These databases have potential usage for the ML community interested in STM as well as for the space community, especially space operators interested in introducing epistemic uncertainty into collision risk assessment. These databases also help populate a scarce resource, namely databases of encounter events .
Project description:Although the concept of genomic selection relies on linkage disequilibrium (LD) between quantitative trait loci and markers, reliability of genomic predictions is strongly influenced by family relationships. In this study, we investigated the effects of LD and family relationships on reliability of genomic predictions and the potential of deterministic formulas to predict reliability using population parameters in populations with complex family structures. Five groups of selection candidates were simulated by taking different information sources from the reference population into account: (1) allele frequencies, (2) LD pattern, (3) haplotypes, (4) haploid chromosomes, and (5) individuals from the reference population, thereby having real family relationships with reference individuals. Reliabilities were predicted using genomic relationships among 529 reference individuals and their relationships with selection candidates, and with a deterministic formula where the number of effective chromosome segments (M(e)) was estimated based on genomic and additive relationship matrices for each scenario. At a heritability of 0.6, reliabilities based on genomic relationships were 0.002 ± 0.0001 (allele frequencies), 0.022 ± 0.001 (LD pattern), 0.018 ± 0.001 (haplotypes), 0.100 ± 0.008 (haploid chromosomes), and 0.318 ± 0.077 (family relationships). At a heritability of 0.1, relative differences among groups were similar. For all scenarios, reliabilities were similar to predictions with a deterministic formula using estimated M(e). So, reliabilities can be predicted accurately using empirically estimated M(e), and the level of relationship with reference individuals has a much greater effect on the reliability than linkage disequilibrium per se. Furthermore, accumulated length of shared haplotypes is more important in determining the reliability of genomic prediction than the individual shared haplotype length.
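Deterministic predictions of this kind are typically of the Daetwyler form, where expected reliability grows with reference population size N and heritability h² and shrinks with the number of effective chromosome segments Me. The Me value below is illustrative only, not taken from the study:

```python
def expected_reliability(n_ref, h2, m_e):
    """Daetwyler-type deterministic expectation of genomic-prediction
    reliability: r^2 = N * h^2 / (N * h^2 + Me).

    n_ref: number of reference individuals (N).
    h2: trait heritability.
    m_e: number of effective chromosome segments (Me), which captures
    how much of the genome is effectively independent given the
    relationships between reference and candidate individuals.
    """
    return n_ref * h2 / (n_ref * h2 + m_e)

# N = 529 as in the study's reference set; Me = 1000 is an assumed value.
print(round(expected_reliability(529, 0.6, 1000), 3))
```

A smaller empirically estimated Me (stronger relationships, fewer independent segments) raises the expected reliability, which mirrors the study's finding that relatedness matters more than LD per se.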
Project description:A method is described for the analysis of the results obtained from docking studies applied to a protein target and small-molecule chemical compounds as ligands from various sources using different docking tools. We show the use of Dempster-Shafer Theory (DST) to select the high-ranking top compounds for further analysis and consideration. AVAILABILITY:Application is freely available at http://allamapparao.org/dst/
Project description:In order to improve the detection accuracy for the quality of wheat, a recognition method for wheat quality using the terahertz (THz) spectrum and multi-source information fusion technology is proposed. Through a combination of the absorption and refractive index spectra of samples of normal, germinated, moldy, and worm-eaten wheat, support vector machines (SVM) with different kernel functions and Dempster-Shafer (DS) evidence theory were used to establish a classification fusion model for the multiple optical indexes of wheat. The results showed that the recognition rate of the fusion model for wheat samples can be as high as 96%. Furthermore, this approach was compared to the regression model based on single-spectrum analysis. The results indicate that the average recognition rate of the fusion models for wheat can reach 90%, and the recognition rate of the SVM radial basis function (SVM-RBF) fusion model can reach 97.5%. These preliminary results indicate that THz-TDS combined with DS evidence theory analysis is suitable for determining wheat quality with better detection accuracy.
Project description:The Centers for Medicare and Medicaid Services will introduce the reporting of patient surveys in 2008. The Consumer Assessment of Health Care Providers and Systems (CAHPS) Hospital Survey contains 18 questions about hospital care. Internal consistency reliability of the discharge information scale is relatively low and some important domains of care are not represented. To determine whether adding questions increases the reliability and validity of the survey. Surveys of patients at 181 hospitals participating in the California Hospitals Assessment and Reporting Taskforce (CHART), an initiative for voluntary public reporting of hospital performance in California. CHART added nine questions to the CAHPS Hospital Survey; two to improve reliability of the discharge information domain, five to create a coordination of care domain, and two relating to interpreter services. Surveys were sent to randomly selected patients from each CHART hospital. A total of 40,172 surveys were included. Adding the new discharge information questions improved the internal consistency reliability from 0.45 to 0.72 and the hospital-level reliability from 0.75 to 0.81. New coordination of care composites had good internal consistency reliabilities ranging from 0.58 to 0.70 and hospital-level reliabilities ranging from 0.84 to 0.87. The new coordination of care composites were more closely correlated with overall hospital ratings and willingness to recommend than six of the seven original domains. The additional discharge information questions and the new coordination of care questions significantly improved the psychometric properties of the CAHPS Hospital Survey.