100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox.
ABSTRACT: The most widely spread measure of performance, accuracy, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing classification error rate, high accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis where every possible contingency matrix of 2, 3 and 4 classes classifiers are depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We are then able to obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer factor (NIT), a measure of how efficient is the transmission of information from the input to the output set of classes. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier instead of classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and also makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind reading task competition that aims at decoding the identity of a video stimulus based on magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based in accuracy, choosing more meaningful and interpretable classifiers.
Project description:Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
Project description:BACKGROUND: Accurate diagnosis of cancer subtypes remains a challenging problem. Building classifiers based on gene expression data is a promising approach; yet the selection of non-redundant but relevant genes is difficult. The selected gene set should be small enough to allow diagnosis even in regular clinical laboratories and ideally identify genes involved in cancer-specific regulatory pathways. Here an entropy-based method is proposed that selects genes related to the different cancer classes while at the same time reducing the redundancy among the genes. RESULTS: The present study identifies a subset of features by maximizing the relevance and minimizing the redundancy of the selected genes. A merit called normalized mutual information is employed to measure the relevance and the redundancy of the genes. In order to find a more representative subset of features, an iterative procedure is adopted that incorporates an initial clustering followed by data partitioning and the application of the algorithm to each of the partitions. A leave-one-out approach then selects the most commonly selected genes across all the different runs and the gene selection algorithm is applied again to pare down the list of selected genes until a minimal subset is obtained that gives a satisfactory accuracy of classification. The algorithm was applied to three different data sets and the results obtained were compared to work done by others using the same data sets. CONCLUSION: This study presents an entropy-based iterative algorithm for selecting genes from microarray data that are able to classify various cancer sub-types with high accuracy. In addition, the feature set obtained is very compact, that is, the redundancy between genes is reduced to a large extent. This implies that classifiers can be built with a smaller subset of genes.
Project description:Different performance measures are used to assess the behaviour, and to carry out the comparison, of classifiers in Machine Learning. Many measures have been defined on the literature, and among them, a measure inspired by Shannon's entropy named the Confusion Entropy (CEN). In this work we introduce a new measure, MCEN, by modifying CEN to avoid its unwanted behaviour in the binary case, that disables it as a suitable performance measure in classification. We compare MCEN with CEN and other performance measures, presenting analytical results in some particularly interesting cases, as well as some heuristic computational experimentation.
Project description:OBJECTIVES:Autism spectrum disorders (ASD) are diagnosed based on early-manifesting clinical symptoms, including markedly impaired social communication. We assessed the viability of resting-state functional MRI (rs-fMRI) connectivity measures as diagnostic biomarkers for ASD and investigated which connectivity features are predictive of a diagnosis. METHODS:Rs-fMRI scans from 59 high functioning males with ASD and 59 age- and IQ-matched typically developing (TD) males were used to build a series of machine learning classifiers. Classification features were obtained using 3 sets of brain regions. Another set of classifiers was built from participants' scores on behavioral metrics. An additional age and IQ-matched cohort of 178 individuals (89 ASD; 89 TD) from the Autism Brain Imaging Data Exchange (ABIDE) open-access dataset (http://fcon_1000.projects.nitrc.org/indi/abide/) were included for replication. RESULTS:High classification accuracy was achieved through several rs-fMRI methods (peak accuracy 76.67%). However, classification via behavioral measures consistently surpassed rs-fMRI classifiers (peak accuracy 95.19%). The class probability estimates, P(ASD|fMRI data), from brain-based classifiers significantly correlated with scores on a measure of social functioning, the Social Responsiveness Scale (SRS), as did the most informative features from 2 of the 3 sets of brain-based features. The most informative connections predominantly originated from regions strongly associated with social functioning. CONCLUSIONS:While individuals can be classified as having ASD with statistically significant accuracy from their rs-fMRI scans alone, this method falls short of biomarker standards. Classification methods provided further evidence that ASD functional connectivity is characterized by dysfunction of large-scale functional networks, particularly those involved in social information processing.
Project description:Machine learning algorithms hold the promise to effectively automate the analysis of histopathological images that are routinely generated in clinical practice. Any machine learning method used in the clinical diagnostic process has to be extremely accurate and, ideally, provide a measure of uncertainty for its predictions. Such accurate and reliable classifiers need enough labelled data for training, which requires time-consuming and costly manual annotation by pathologists. Thus, it is critical to minimise the amount of data needed to reach the desired accuracy by maximising the efficiency of training. We propose an accurate, reliable and active (ARA) image classification framework and introduce a new Bayesian Convolutional Neural Network (ARA-CNN) for classifying histopathological images of colorectal cancer. The model achieves exceptional classification accuracy, outperforming other models trained on the same dataset. The network outputs an uncertainty measurement for each tested image. We show that uncertainty measures can be used to detect mislabelled training samples and can be employed in an efficient active learning workflow. Using a variational dropout-based entropy measure of uncertainty in the workflow speeds up the learning process by roughly 45%. Finally, we utilise our model to segment whole-slide images of colorectal tissue and compute segmentation-based spatial statistics.
Project description:Data transformations prior to analysis may be beneficial in classification tasks. In this article we investigate a set of such transformations on 2D graph-data derived from facial images and their effect on classification accuracy in a high-dimensional setting. These transformations are low-variance in the sense that each involves only a fixed small number of input features. We show that classification accuracy can be improved when penalized regression techniques are employed, as compared to a principal component analysis (PCA) pre-processing step. In our data example classification accuracy improves from 47% to 62% when switching from PCA to penalized regression. A second goal is to visualize the resulting classifiers. We develop importance plots highlighting the influence of coordinates in the original 2D space. Features used for classification are mapped to coordinates in the original images and combined into an importance measure for each pixel. These plots assist in assessing plausibility of classifiers, interpretation of classifiers, and determination of the relative importance of different features.
Project description:The transfer entropy is a well-established measure of information flow, which quantifies directed influence between two stochastic time series and has been shown to be useful in a variety fields of science. Here we introduce the transfer entropy of the backward time series called the backward transfer entropy, and show that the backward transfer entropy quantifies how far it is from dynamics to a hidden Markov model. Furthermore, we discuss physical interpretations of the backward transfer entropy in completely different settings of thermodynamics for information processing and the gambling with side information. In both settings of thermodynamics and the gambling, the backward transfer entropy characterizes a possible loss of some benefit, where the conventional transfer entropy characterizes a possible benefit. Our result implies the deep connection between thermodynamics and the gambling in the presence of information flow, and that the backward transfer entropy would be useful as a novel measure of information flow in nonequilibrium thermodynamics, biochemical sciences, economics and statistics.
Project description:BACKGROUND:Class prediction models have been shown to have varying performances in clinical gene expression datasets. Previous evaluation studies, mostly done in the field of cancer, showed that the accuracy of class prediction models differs from dataset to dataset and depends on the type of classification function. While a substantial amount of information is known about the characteristics of classification functions, little has been done to determine which characteristics of gene expression data have impact on the performance of a classifier. This study aims to empirically identify data characteristics that affect the predictive accuracy of classification models, outside of the field of cancer. RESULTS:Datasets from twenty five studies meeting predefined inclusion and exclusion criteria were downloaded. Nine classification functions were chosen, falling within the categories: discriminant analyses or Bayes classifiers, tree based, regularization and shrinkage and nearest neighbors methods. Consequently, nine class prediction models were built for each dataset using the same procedure and their performances were evaluated by calculating their accuracies. The characteristics of each experiment were recorded, (i.e., observed disease, medical question, tissue/cell types and sample size) together with characteristics of the gene expression data, namely the number of differentially expressed genes, the fold changes and the within-class correlations. Their effects on the accuracy of a class prediction model were statistically assessed by random effects logistic regression. The number of differentially expressed genes and the average fold change had significant impact on the accuracy of a classification model and gave individual explained-variation in prediction accuracy of up to 72% and 57%, respectively. Multivariable random effects logistic regression with forward selection yielded the two aforementioned study factors and the within class correlation as factors affecting the accuracy of classification functions, explaining 91.5% of the between study variation. CONCLUSIONS:We evaluated study- and data-related factors that might explain the varying performances of classification functions in non-cancerous datasets. Our results showed that the number of differentially expressed genes, the fold change, and the correlation in gene expression data significantly affect the accuracy of class prediction models.
Project description:Deep learning emerges as a powerful tool for analyzing medical images. Retinal disease detection by using computer-aided diagnosis from fundus image has emerged as a new method. We applied deep learning convolutional neural network by using MatConvNet for an automated detection of multiple retinal diseases with fundus photographs involved in STructured Analysis of the REtina (STARE) database. Dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using a random forest transfer learning based on VGG-19 architecture. The classification results depended greatly on the number of categories. As the number of categories increased, the performance of deep learning models was diminished. When all 10 categories were included, we obtained results with an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. Considering three integrated normal, background diabetic retinopathy, and dry age-related macular degeneration, the multi-categorical classifier showed accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance. The transfer learning incorporated with ensemble classifier of clustering and voting approach presented the best performance with accuracy of 36.7%, 0.053 RCI, and 0.225 kappa in the 10 retinal diseases classification problem. First, due to the small size of datasets, the deep learning techniques in this study were ineffective to be applied in clinics where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that the transfer learning incorporated with ensemble classifiers can improve the classification performance in order to detect multi-categorical retinal diseases. Further studies should confirm the effectiveness of algorithms with large datasets obtained from hospitals.
Project description:Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification.We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control.The two RED classifiers achieved 80.9-83.0% in overall accuracy on the two datasets, which is 1.3-3% higher than SVM's accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1-10.3% of the total instances and 43.8-53.0% of SVM's misclassifications).Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance.