A detailed comparison of analysis processes for MCC-IMS data in disease classification: Automated methods can replace manual peak annotations.
ABSTRACT: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi-capillary column-ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high-throughput use of the technology.
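The idea behind a local-maxima (LM) peak-identification step, as named in the abstract, can be illustrated with a minimal sketch. It is not the published implementation: the function name `local_maxima`, the 8-neighbour rule, the intensity threshold, and the toy matrix are all assumptions made for illustration. Rows are assumed to index retention time and columns inverse reduced mobility.

```python
def local_maxima(matrix, threshold):
    """Return (row, col, intensity) for every cell that exceeds `threshold`
    and is strictly greater than all of its (up to 8) neighbours.

    Hypothetical helper for illustration; not the paper's LM method.
    """
    peaks = []
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value <= threshold:
                continue  # too weak to be a peak candidate
            # Collect the neighbours inside the matrix bounds.
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Toy intensity matrix (made-up values, not real MCC-IMS data).
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.1],
    [0.0, 0.0, 0.3, 0.7, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
print(local_maxima(toy, threshold=0.5))  # [(1, 1, 0.9), (3, 3, 0.7)]
```

In a full pipeline, the peak lists from many measurements would then be clustered and passed to a classifier; real data would typically also need smoothing and baseline correction before a step like this.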
Project description:BACKGROUND: An ion mobility (IM) spectrometer coupled with a multi-capillary column (MCC) measures volatile organic compounds (VOCs) in the air or in exhaled breath. This technique is utilized in several biotechnological and medical applications. Each peak in an MCC/IM measurement represents a certain compound, which may be known or unknown. For clustering and classification of measurements, the raw data matrix must be reduced to a set of peaks. Each peak is described by its coordinates (retention time in the MCC and reduced inverse ion mobility) and shape (signal intensity, further shape parameters). This fundamental step is referred to as peak extraction. It is the basis for identifying discriminating peaks, and hence putative biomarkers, between two classes of measurements, such as a healthy control group and a group of patients with a confirmed disease. Current state-of-the-art peak extraction methods require human interaction, such as hand-picking approximate peak locations, assisted by a visualization of the data matrix. In a high-throughput context, however, it is preferable to have robust methods for fully automated peak extraction. RESULTS: We introduce PEAX, a modular framework for automated peak extraction. The framework consists of several steps in a pipeline architecture. Each step performs a specific sub-task and can be instantiated by different methods implemented as modules. We provide open-source software for the framework and several modules for each step. Additionally, an interface that allows easy extension by a new module is provided. Combining the modules in all reasonable ways leads to a large number of peak extraction methods. We evaluate all combinations using intrinsic error measures and by comparing the resulting peak sets with an expert-picked one. CONCLUSIONS: Our software PEAX is able to automatically extract peaks from MCC/IM measurements within a few seconds. 
The automatically obtained results keep up with the results provided by current state-of-the-art peak extraction methods. This opens a high-throughput context for the MCC/IM application field. Our software is available at http://www.rahmannlab.de/research/ims.
Project description:The manual review of an electroencephalogram (EEG) for seizure detection is a laborious and error-prone process. Thus, automated seizure detection based on machine learning has been studied for decades. Recently, deep learning has been adopted in order to avoid manual feature extraction and selection. In the present study, we systematically compared the performance of different combinations of input modalities and network structures on a fixed window size and dataset to ascertain an optimal combination of input modalities and network structures. The raw time-series EEG, periodogram of the EEG, 2D images of short-time Fourier transform results, and 2D images of raw EEG waveforms were obtained from 5-s segments of intracranial EEGs recorded from a mouse model of epilepsy. A fully connected neural network (FCNN), recurrent neural network (RNN), and convolutional neural network (CNN) were implemented to classify the various inputs. The classification results for the test dataset showed that CNN performed better than FCNN and RNN, with the area under the curve (AUC) for the receiver operating characteristics curves ranging from 0.983 to 0.984, from 0.985 to 0.989, and from 0.989 to 0.993 for FCNN, RNN, and CNN, respectively. As for input modalities, 2D images of raw EEG waveforms yielded the best result with an AUC of 0.993. Thus, CNN can be the most suitable network structure for automated seizure detection when applied to the images of raw EEG waveforms, since CNN can effectively learn a general spatially-invariant representation of seizure patterns in 2D representations of raw EEG.
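The AUC values reported above summarize how well a classifier's scores separate the two classes. As a rough sketch, the AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive, e.g. seizure).
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of examples; for large test sets a rank-based formulation or a library routine would be the more practical choice.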
Project description:New chest compression detection technology allows for the recording and graphical depiction of clinical cardiopulmonary resuscitation (CPR) chest compressions. The authors sought to determine the inter-rater reliability of chest compression pattern classifications by human raters. Agreement with automated chest compression classification was also evaluated by computer analysis. This was an analysis of chest compression patterns from cardiac arrest patients enrolled in the ongoing Resuscitation Outcomes Consortium (ROC) Continuous Chest Compressions Trial. Thirty CPR process files from patients in the trial were selected. Using written guidelines, research coordinators from each of eight participating ROC sites classified each chest compression pattern as 30:2 chest compressions, continuous chest compressions (CCC), or indeterminate. A computer algorithm for automated chest compression classification was also developed for each case. Inter-rater agreement between manual classifications was tested using Fleiss's kappa. The criterion standard was defined as the classification assigned by the majority of manual raters. Agreement between the automated classification and the criterion standard manual classifications was also tested. The majority of the eight raters classified 12 chest compression patterns as 30:2, 12 as CCC, and six as indeterminate. Inter-rater agreement between manual classifications of chest compression patterns was κ = 0.62 (95% confidence interval [CI] = 0.49 to 0.74). The automated computer algorithm classified chest compression patterns as 30:2 (n = 15), CCC (n = 12), and indeterminate (n = 3). Agreement between automated and criterion standard manual classifications was κ = 0.84 (95% CI = 0.59 to 0.95). In this study, good inter-rater agreement in the manual classification of CPR chest compression patterns was observed. Automated classification showed strong agreement with human ratings.
These observations support the consistency of manual CPR pattern classification as well as the use of automated approaches to chest compression pattern analysis.
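For readers unfamiliar with Fleiss's kappa, the statistic used above for agreement among the eight raters, the following minimal sketch shows how it is computed from a table of rating counts. The function name `fleiss_kappa` and the example table are assumptions for illustration; the counts are invented and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa for a table of rating counts.

    counts[i][j] = number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall proportion of ratings falling into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement per subject, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical table: 5 patterns, 8 raters, categories (30:2, CCC, indeterminate).
example = [
    [8, 0, 0],
    [7, 0, 1],
    [0, 8, 0],
    [1, 6, 1],
    [2, 2, 4],
]
print(round(fleiss_kappa(example), 3))
```

Values near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance.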
Project description:Sleep spindles are neural events during sleep that are related to memory consolidation, and researchers classify them using scalp electroencephalography (EEG). Manual classification is time-consuming and susceptible to low inter-rater agreement. This could be improved using an automated approach. This study presents an optimized filter-based and thresholding (FBT) model to set up a baseline for comparison to evaluate machine learning models using naïve features, such as raw signals, peak frequency, and dominant power. The FBT model allows us to formally define sleep spindles using signal processing but may miss examples most human scorers would agree are spindles. Machine learning methods in theory should be able to approach the performance of human raters, but they require a large quantity of scored data, proper feature representation, intensive feature engineering, and model selection. We evaluate both the FBT model and machine learning models with naïve features. We show that the machine learning models derived from the FBT model improve classification performance. An automated approach designed for the current data was applied to the DREAMS dataset. With one expert's annotations as the gold standard, our pipeline yields an excellent sensitivity that is close to a second expert's scores, with the advantage that it can classify spindles based on multiple channels if more channels are available. More importantly, our pipeline could be modified as a guide to aid manual annotation of sleep spindles based on multiple channels quickly (6-10 s for processing a 40-min EEG recording), making spindle detection faster and more objective.
Project description:Manual identification of brain tumors is an error-prone and tedious process for radiologists; therefore, it is crucial to adopt an automated system. Binary classification, such as malignant versus benign, is relatively trivial, whereas multimodal brain tumor classification (T1, T2, T1CE, and Flair) is a challenging task for radiologists. Here, we present an automated multimodal classification method using deep learning for brain tumor type classification. The proposed method consists of five core steps. In the first step, linear contrast stretching is employed using edge-based histogram equalization and discrete cosine transform (DCT). In the second step, deep learning feature extraction is performed. By utilizing transfer learning, two pre-trained convolutional neural network (CNN) models, namely VGG16 and VGG19, were used for feature extraction. In the third step, a correntropy-based joint learning approach was implemented along with the extreme learning machine (ELM) for the selection of the best features. In the fourth step, the partial least squares (PLS)-based robust covariant features were fused in one matrix. In the fifth step, the combined matrix was fed to ELM for final classification. The proposed method was validated on the BraTS datasets, and accuracies of 97.8%, 96.9%, and 92.5% were achieved for BraTS2015, BraTS2017, and BraTS2018, respectively.
Project description:Understanding the detailed dynamics of neuronal networks will require the simultaneous measurement of spike trains from hundreds of neurons (or more). Currently, approaches to extracting spike times and labels from raw data are time consuming, lack standardization, and involve manual intervention, making it difficult to maintain data provenance and assess the quality of scientific results. Here, we describe an automated clustering approach and associated software package that addresses these problems and provides novel cluster quality metrics. We show that our approach has accuracy comparable to or exceeding that achieved using manual or semi-manual techniques with desktop central processing unit (CPU) runtimes faster than acquisition time for up to hundreds of electrodes. Moreover, a single choice of parameters in the algorithm is effective for a variety of electrode geometries and across multiple brain regions. This algorithm has the potential to enable reproducible and automated spike sorting of larger scale recordings than is currently possible.
Project description:The acoustic startle reflex (ASR) is a rapid, involuntary movement to sound, found in many species. The ASR can be modulated by external stimuli and internal state, making it a useful tool in many disciplines. ASR data collection and interpretation vary greatly across laboratories, making comparisons a challenge. Here we investigate the animal movement associated with a startle in mouse (CBA/CaJ). Movements were simultaneously captured with high-speed video and a piezoelectric startle plate. We also use simple mathematical extrapolations to convert startle data (force) into center of mass displacement ("height"), which incorporates the animal's mass. Startle plate force data revealed a stereotyped waveform associated with a startle that contained three distinct peaks. This waveform allowed researchers to separate trials into 'startles' and 'no-startles' (termed 'manual classification'). Fleiss' kappa and Krippendorff's alpha (0.865 for both) indicate very good levels of agreement between researchers. Further work uses this waveform to develop an automated startle classifier. The automated classifier compares favorably with manual classification. A two-way ANOVA reveals no significant difference in the magnitude of the 3 peaks as classified by the manual and automated methods (P1: p=0.526, N1: p=0.488, P2: p=0.529). The ability of the automated classifier was compared with three other commonly used classification methods; the automated classifier far outperformed these methods. The improvements made allow researchers to automatically separate startle data from noise, and normalize for an individual animal's mass. These steps ease inter-animal and inter-laboratory comparisons of startle data.
Project description:<h4>Background</h4>The microarray data analysis realm is ever growing through the development of various tools, open source and commercial. However there is absence of predefined rational algorithmic analysis workflows or batch standardized processing to incorporate all steps, from raw data import up to the derivation of significantly differentially expressed gene lists. This absence obfuscates the analytical procedure and obstructs the massive comparative processing of genomic microarray datasets. Moreover, the solutions provided, heavily depend on the programming skills of the user, whereas in the case of GUI embedded solutions, they do not provide direct support of various raw image analysis formats or a versatile and simultaneously flexible combination of signal processing methods.<h4>Results</h4>We describe here Gene ARMADA (Automated Robust MicroArray Data Analysis), a MATLAB implemented platform with a Graphical User Interface. This suite integrates all steps of microarray data analysis including automated data import, noise correction and filtering, normalization, statistical selection of differentially expressed genes, clustering, classification and annotation. In its current version, Gene ARMADA fully supports 2 coloured cDNA and Affymetrix oligonucleotide arrays, plus custom arrays for which experimental details are given in tabular form (Excel spreadsheet, comma separated values, tab-delimited text formats). It also supports the analysis of already processed results through its versatile import editor. Besides being fully automated, Gene ARMADA incorporates numerous functionalities of the Statistics and Bioinformatics Toolboxes of MATLAB. In addition, it provides numerous visualization and exploration tools plus customizable export data formats for seamless integration by other analysis tools or MATLAB, for further processing. 
Gene ARMADA requires MATLAB 7.4 (R2007a) or higher and is also distributed as a stand-alone application with MATLAB Component Runtime.<h4>Conclusion</h4>Gene ARMADA provides a highly adaptable, integrative, yet flexible tool which can be used for automated quality control, analysis, annotation and visualization of microarray data, constituting a starting point for further data interpretation and integration with numerous other tools.
Project description:Cone penetration testing (CPT) is one of the most efficient and versatile methods currently available for geotechnical, lithostratigraphic and hydrogeological site characterization. Currently available methods for soil behaviour type classification (SBT) of CPT data however have severe limitations, often restricting their application to a local scale. For parameterization of regional groundwater flow or geotechnical models, and delineation of regional hydro- or lithostratigraphy, regional SBT classification would be very useful. This paper investigates the use of model-based clustering for SBT classification, and the influence of different clustering approaches on the properties and spatial distribution of the obtained soil classes. We additionally propose a methodology for automated lithostratigraphic mapping of regionally occurring sedimentary units using SBT classification. The methodology is applied to a large CPT dataset, covering a groundwater basin of ~60 km² with predominantly unconsolidated sandy sediments in northern Belgium. Results show that the model-based approach is superior in detecting the true lithological classes when compared to more frequently applied unsupervised classification approaches or literature classification diagrams. We demonstrate that automated mapping of lithostratigraphic units using advanced SBT classification techniques can provide a large gain in efficiency, compared to more time-consuming manual approaches and yields at least equally accurate results.
Project description:<h4>Background</h4>Hospital-acquired pneumonia (HAP) is a common problem in intensive care medicine, and the patient outcome depends on the rapid start of adequate antibiotic therapy. To date, pathogen identification is performed using conventional microbiological methods with turnaround times of at least 24 h for the first results. The aim of this study was to investigate the potential of headspace analyses detecting bacterial species-specific patterns of volatile organic compounds (VOCs) for the rapid differentiation of HAP-relevant bacteria.<h4>Methods</h4>Eleven HAP-relevant bacteria (Acinetobacter baumannii, Acinetobacter pittii, Citrobacter freundii, Enterobacter cloacae, Escherichia coli, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas aeruginosa, Proteus mirabilis, Staphylococcus aureus, Serratia marcescens) were each grown for 6 hours in Lysogeny Broth and the headspace over the grown cultures was investigated using multi-capillary column-ion mobility spectrometry (MCC-IMS) to detect differences in the VOC composition between the bacteria in the panel. Peak areas with changing signal intensities were statistically analysed, including significance testing using one-way ANOVA or Kruskal-Wallis test (p < 0.05).<h4>Results</h4>30 VOC signals (23 in the positive ion mode and 7 in the negative ion mode of the MCC-IMS) showed statistically significant differences in at least one of the investigated bacteria. The VOC patterns of the bacteria within the HAP panel differed substantially and allowed species differentiation.<h4>Conclusions</h4>MCC-IMS headspace analyses allow differentiation of bacteria within the HAP-relevant panel after 6 h of incubation in a complex fluid growth medium. The method has the potential to be developed towards a feasible point-of-care diagnostic tool for pathogen differentiation in HAP.