ABSTRACT: Deep profiling the phenotypic landscape of tissues using high-throughput flow cytometry (FCM) can provide important new insights into the interplay of cells in both healthy and diseased tissue. But often, especially in clinical settings, the cytometer cannot measure all the desired markers in a single aliquot. In these cases, tissue is separated into independently analysed samples, leaving a need to electronically recombine these to increase dimensionality. Nearest-neighbour (NN) based imputation fulfils this need but can produce artificial subpopulations. Clustering-based NN imputation can reduce these, but it requires prior domain knowledge to parameterize the clustering, so it is unsuited to discovery settings. We present flowBin, a parameterization-free method for combining multitube FCM data into a higher-dimensional form suitable for deep profiling and discovery. FlowBin allocates cells to bins defined by the common markers across tubes in a multitube experiment, then computes aggregate expression for each bin within each tube, to create a matrix of expression of all markers assayed in each tube. We show, using simulated multitube data, that flowType analysis of flowBin output reproduces the results of that same analysis on the original data for cell types of >10% abundance. We used flowBin in conjunction with classifiers to distinguish normal from cancerous cells. We used flowBin together with flowType and RchyOptimyx to profile the immunophenotypic landscape of NPM1-mutated acute myeloid leukemia, and present a series of novel cell types associated with that mutation.
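The binning idea in the abstract above (allocate cells to bins on the markers shared by all tubes, then aggregate the tube-specific markers per bin) can be sketched roughly as follows. This is only an illustration, not flowBin's actual implementation: the fixed bin edges, the use of the median as the aggregate, and the marker names `CD34` and `CD117` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def bin_key(common_values, edges):
    """Map a cell's common-marker values to a tuple of bin indices.

    edges[i] lists the cut points for the i-th common marker; the bin index
    is the number of cut points the value meets or exceeds.
    """
    return tuple(
        sum(value >= edge for edge in marker_edges)
        for value, marker_edges in zip(common_values, edges)
    )


def combine_tubes(tubes, edges):
    """Aggregate tube-specific markers per bin of the common markers.

    tubes: {tube_name: [(common_values, {marker: value}), ...]}
    Returns {bin_key: {marker: median expression}} over all tubes.
    """
    per_bin = defaultdict(lambda: defaultdict(list))
    for cells in tubes.values():
        for common_values, specific in cells:
            key = bin_key(common_values, edges)
            for marker, value in specific.items():
                per_bin[key][marker].append(value)
    return {
        key: {marker: median(values) for marker, values in markers.items()}
        for key, markers in per_bin.items()
    }


# Hypothetical toy data: two common markers, each split at 0.5 (assumed edges).
edges = [[0.5], [0.5]]
tubes = {
    "tube1": [
        ((0.2, 0.1), {"CD34": 1.0}),
        ((0.3, 0.4), {"CD34": 3.0}),
        ((0.9, 0.8), {"CD34": 10.0}),
    ],
    "tube2": [
        ((0.1, 0.2), {"CD117": 5.0}),
        ((0.8, 0.9), {"CD117": 7.0}),
    ],
}
result = combine_tubes(tubes, edges)
```

Each bin in `result` then carries an aggregate value for markers that were never measured together on the same cell, which is the sense in which the combined matrix has higher dimensionality than any single tube.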
Project description:BACKGROUND:Flow cytometry (FCM) is a powerful single-cell based measurement method to ascertain multidimensional optical properties of millions of cells. FCM is widely used in medical diagnostics and health research. There is also a broad range of applications in the analysis of complex microbial communities. The main concern in microbial community analyses is to track the dynamics of microbial subcommunities. So far, this can be achieved with the help of time-consuming manual clustering procedures that require extensive user-dependent input. In addition, several tools have recently been developed by using different approaches which, however, focus mainly on the clustering of medical FCM data or of microbial samples with a well-known background, while much less work has been done on high-throughput, online algorithms for two-channel FCM. RESULTS:We bridge this gap with flowEMMi, a model-based clustering tool based on multivariate Gaussian mixture models with subsampling and foreground/background separation. These extensions provide a fast and accurate identification of cell clusters in FCM data, in particular for microbial community FCM data that are often affected by irrelevant information like technical noise, beads or cell debris. flowEMMi outperforms other available tools with regard to running time and information content of the clustering results and provides near-online results and optional heuristics to reduce the running-time further. CONCLUSIONS:flowEMMi is a useful tool for the automated cluster analysis of microbial FCM data. It overcomes the user-dependent and time-consuming manual clustering procedure and provides consistent results with ancillary information and statistical proof.
Project description:As a high-throughput technology that offers rapid quantification of multidimensional characteristics for millions of cells, flow cytometry (FCM) is widely used in health research, medical diagnosis and treatment, and vaccine development. Nevertheless, there is an increasing concern about the lack of appropriate software tools to provide an automated analysis platform to parallelize the high-throughput data-generation platform. Currently, to a large extent, FCM data analysis relies on the manual selection of sequential regions in 2-D graphical projections to extract the cell populations of interest. This is a time-consuming task that ignores the high-dimensionality of FCM data. In view of the aforementioned issues, we have developed an R package called flowClust to automate FCM analysis. flowClust implements a robust model-based clustering approach based on multivariate t mixture models with the Box-Cox transformation. The package provides the functionality to identify cell populations whilst simultaneously handling the commonly encountered issues of outlier identification and data transformation. It offers various tools to summarize and visualize a wealth of features of the clustering results. In addition, to ensure its convenience of use, flowClust has been adapted for the current FCM data format, and integrated with existing Bioconductor packages dedicated to FCM analysis. flowClust addresses the issue of a dearth of software that helps automate FCM analysis with a sound theoretical foundation. It tends to give reproducible results, and helps reduce the significant subjectivity and human time cost encountered in FCM analysis. The package contributes to the cytometry community by offering an efficient, automated analysis platform which facilitates the active, ongoing technological advancement.
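For readers unfamiliar with the Box-Cox transformation mentioned in the abstract above, a minimal sketch of the textbook form is given below. It assumes strictly positive inputs and is not taken from flowClust; the package may use a variant of the transformation, and the function name `box_cox` is chosen here only for illustration.

```python
import math


def box_cox(x, lam):
    """Textbook Box-Cox transform of a strictly positive value x.

    (x**lam - 1) / lam for lam != 0, and log(x) in the limit lam == 0.
    """
    if x <= 0:
        raise ValueError("the textbook Box-Cox transform needs x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# lam = 1 only shifts the data; lam = 0.5 behaves like a square root;
# lam = 0 is the natural logarithm.
shifted = box_cox(3.0, 1.0)
root_like = box_cox(4.0, 0.5)
log_like = box_cox(1.0, 0.0)
```

In a model-based clustering setting the parameter `lam` would typically be estimated together with the mixture parameters rather than fixed by hand as it is here.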
Project description:BACKGROUND:Preanalytic factors such as time and temperature can have significant effects on laboratory test results. For example, ammonium concentration will increase 31% in blood samples stored at room temperature for 30 min before centrifugation. To reduce preanalytic error, blood samples may be placed in precooled tubes and chilled on ice or in ice water baths; however, the effectiveness of these modalities in cooling blood samples has not been formally evaluated. The purpose of this study was to evaluate the effectiveness of various cooling modalities on reducing temperature of EDTA whole blood samples. METHODS:Pooled samples of canine EDTA whole blood were divided into two aliquots. Saline was added to one aliquot to produce a packed cell volume (PCV) of 40% and to the second aliquot to produce a PCV of 20% (simulated anemia). Thirty samples from each aliquot were warmed to 37.7 °C and cooled in 2 ml allotments under one of three conditions: in ice, in ice after transfer to a precooled tube, or in an ice water bath. Temperature of each sample was recorded at one minute intervals for 15 min. RESULTS:Within treatment conditions, sample PCV had no significant effect on cooling. Cooling in ice water was significantly faster than cooling in ice only or transferring the sample to a precooled tube and cooling it on ice. Mean temperature of samples cooled in ice water was significantly lower at 15 min than mean temperatures of those cooled in ice, whether or not the tube was precooled. By 4 min, samples cooled in an ice water bath had reached mean temperatures less than 4 °C (refrigeration temperature), while samples cooled in other conditions remained above 4.0 °C for at least 11 min. For samples with a PCV of 40%, precooling the tube had no significant effect on rate of cooling on ice. For samples with a PCV of 20%, transfer to a precooled tube resulted in a significantly faster rate of cooling than direct placement of the warmed tube onto ice. 
DISCUSSION:Canine EDTA whole blood samples cool most rapidly and to a greater degree when placed in an ice-water bath rather than in ice. Samples stored on ice water can rapidly drop below normal refrigeration temperatures; this should be taken into consideration when using this cooling modality.
Project description:BACKGROUND:Advances in multiparameter flow cytometry (FCM) now allow for the independent detection of larger numbers of fluorochromes on individual cells, generating data with increasingly higher dimensionality. The increased complexity of these data has made it difficult to identify cell populations from high-dimensional FCM data using traditional manual gating strategies based on single-color or two-color displays. METHODS:To address this challenge, we developed a novel program, FLOCK (FLOw Clustering without K), that uses a density-based clustering approach to algorithmically identify biologically relevant cell populations from multiple samples in an unbiased fashion, thereby eliminating operator-dependent variability. RESULTS:FLOCK was used to objectively identify seventeen distinct B-cell subsets in a human peripheral blood sample and to identify and quantify novel plasmablast subsets responding transiently to tetanus and other vaccinations in peripheral blood. FLOCK has been implemented in the publicly available Immunology Database and Analysis Portal, ImmPort (http://www.immport.org), for open use by the immunology research community. CONCLUSIONS:FLOCK is able to identify cell subsets in experiments that use multiparameter FCM through an objective, automated computational approach. The use of algorithms like FLOCK for FCM data analysis obviates the need for subjective and labor-intensive manual gating to identify and quantify cell subsets. Novel populations identified by these computational approaches can serve as hypotheses for further experimental study.
Project description:Predicting crash injury severity is a crucial constituent of reducing the consequences of traffic crashes. This study developed machine learning (ML) models to predict crash injury severity using 15 crash-related parameters. Separate ML models for each cluster were obtained using fuzzy c-means, which enhanced the predicting capability. Finally, four ML models were developed: feed-forward neural networks (FNN), support vector machine (SVM), fuzzy C-means clustering based feed-forward neural network (FNN-FCM), and fuzzy c-means based support vector machine (SVM-FCM). Features that were easily identified with little investigation on crash sites were used as an input so that the trauma center can predict the crash severity level based on the initial information provided from the crash site and prepare accordingly for the treatment of the victims. The input parameters mainly include vehicle attributes and road condition attributes. This study used the crash database of Great Britain for the years 2011-2016. A random sample of crashes representing each year was used considering the same share of severe and non-severe crashes. The models were compared based on injury severity prediction accuracy, sensitivity, precision, and harmonic mean of sensitivity and precision (i.e., F1 score). The SVM-FCM model outperformed the other developed models in terms of accuracy and F1 score in predicting the injury severity level of severe and non-severe crashes. This study concluded that the FCM clustering algorithm enhanced the prediction power of FNN and SVM models.
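The comparison metrics named in the abstract above (accuracy, sensitivity, precision, and F1 as their harmonic mean) can be computed from predicted and true labels as in the sketch below. The label encoding (1 = severe, 0 = non-severe) and the toy label vectors are assumptions for illustration, not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, precision and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sensitivity + precision
    # F1 is the harmonic mean of sensitivity (recall) and precision.
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical labels: 1 = severe crash, 0 = non-severe crash.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With a balanced sample of severe and non-severe crashes, as described in the abstract, accuracy and F1 tend to move together; on imbalanced data they can diverge sharply, which is one reason to report both.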
Project description:BACKGROUND:Cancer typically exhibits genotypic and phenotypic heterogeneity, which can have prognostic significance and influence therapy response. Computed Tomography (CT)-based radiomic approaches calculate quantitative features of tumour heterogeneity at a mesoscopic level, regardless of macroscopic areas of hypo-dense (i.e., cystic/necrotic), hyper-dense (i.e., calcified), or intermediately dense (i.e., soft tissue) portions. METHOD:With the goal of achieving the automated sub-segmentation of these three tissue types, we present here a two-stage computational framework based on unsupervised Fuzzy C-Means Clustering (FCM) techniques. No existing approach has specifically addressed this task so far. Our tissue-specific image sub-segmentation was tested on ovarian cancer (pelvic/ovarian and omental disease) and renal cell carcinoma CT datasets using both overlap-based and distance-based metrics for evaluation. RESULTS:On all tested sub-segmentation tasks, our two-stage segmentation approach outperformed conventional segmentation techniques: fixed multi-thresholding, the Otsu method, and automatic cluster number selection heuristics for the K-means clustering algorithm. In addition, experiments showed that the integration of the spatial information into the FCM algorithm generally achieves more accurate segmentation results, whilst the kernelised FCM versions are not beneficial. The best spatial FCM configuration achieved average Dice similarity coefficient values starting from 81.94±4.76 and 83.43±3.81 for hyper-dense and hypo-dense components, respectively, for the investigated sub-segmentation tasks. CONCLUSIONS:The proposed intelligent framework could be readily integrated into clinical research environments and provides robust tools for future radiomic biomarker validation.
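The Dice similarity coefficient reported in the abstract above is an overlap-based metric: twice the size of the intersection of two segmentations divided by the sum of their sizes. A minimal sketch follows, representing each segmentation as a set of voxel indices; that representation, the function name, and the convention of returning 1.0 for two empty masks are assumptions made here.

```python
def dice_coefficient(segmentation, reference):
    """Dice similarity coefficient between two sets of voxel indices."""
    segmentation, reference = set(segmentation), set(reference)
    total = len(segmentation) + len(reference)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * len(segmentation & reference) / total


# Hypothetical masks given as voxel indices.
predicted_mask = {1, 2, 3, 4}
reference_mask = {3, 4, 5, 6}
dice = dice_coefficient(predicted_mask, reference_mask)
```

The abstract reports Dice values as percentages, which corresponds to multiplying this ratio by 100.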
Project description:Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis, intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps. Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster. Source code and binaries are freely available at http://meringlab.org/software/hpc-clust/; the pipeline is implemented in C++ and uses the Message Passing Interface (MPI) standard for distributed computing.
Project description:Segmenting suspicious lesions or organs is a challenging task in most medical image analysis, medical diagnosis and computer-aided diagnosis systems. Various image segmentation methods have been proposed in previous studies, with varying levels of success, but problems such as lack of versatility, low robustness, high complexity and low accuracy remain unsolved in current image segmentation practice. Fuzzy c-means clustering (FCM) methods are well suited to segmenting regions, and the traditional FCM method segments noise-free images effectively. However, its segmentation result is highly sensitive to noise because it neglects spatial information. To solve this issue, this paper implements super-pixel-based FCM (SPOFCM), which incorporates the influence of spatially neighbouring and similar super-pixels. A crow search algorithm is also adopted to optimize the degree of influence, thereby improving segmentation performance. The feasibility of SPOFCM in clinical applications is verified by tumour segmentation tests on multi-spectral MRIs and actual single-spectrum mammograms. Finally, the results of the proposed SPOFCM are compared against well-known competing segmentation techniques: k-means, entropy thresholding (ET), FCM, FCM with spatial constraints (FCM_S) and kernel FCM (KFCM). Experimental results on multi-spectral MRIs and actual single-spectrum mammograms indicate that the proposed algorithm can provide better performance for suspicious lesion or organ segmentation in computer-assisted clinical applications.
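As background for the abstract above, the traditional fuzzy c-means update that the paper builds on can be sketched for one-dimensional intensities as follows. This is the plain textbook algorithm only: it includes neither the super-pixel and spatial terms nor the crow search optimization that SPOFCM adds, and the function name, the evenly spread initial centres, and the toy intensities are assumptions made here.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Plain fuzzy c-means on 1-D values; m > 1 is the fuzzifier.

    Returns (centres, memberships), where memberships[j][i] is the degree
    to which values[j] belongs to cluster i, as computed in the final
    iteration.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centres]
            if 0.0 in dists:
                # The point sits exactly on a centre: share full membership
                # among the coinciding centres.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [
                    1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                    for d_i in dists
                ]
            memberships.append(row)
        # Each centre becomes the mean of the data weighted by membership**m.
        new_centres = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            new_centres.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships


# Hypothetical pixel intensities forming a dark and a bright group.
intensities = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
centres, memberships = fuzzy_c_means(intensities)
```

Because each membership depends only on a pixel's own intensity, a single noisy pixel is assigned independently of its neighbours, which is the noise sensitivity the abstract attributes to the neglect of spatial information.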
Project description:We have developed flowMeans, a time-efficient and accurate method for automated identification of cell populations in flow cytometry (FCM) data based on K-means clustering. Unlike traditional K-means, flowMeans can identify concave cell populations by modelling a single population with multiple clusters. flowMeans uses a change point detection algorithm to determine the number of sub-populations, enabling the method to be used in high throughput FCM data analysis pipelines. Our approach compares favorably to manual analysis by human experts and current state-of-the-art automated gating algorithms. flowMeans is freely available as an open source R package through Bioconductor.
Project description:Large (> 1 μm) tumor-derived extracellular vesicles (tdEVs) enriched from the cell fraction of centrifuged whole blood are prognostic in metastatic castration-resistant prostate cancer (mCRPC) patients. However, the highest concentration of tdEVs is expected in the cell-free plasma fraction. In this pilot study, we determine whether mCRPC patients can be discriminated from healthy controls based on detection of tdEVs (< 1 μm, EpCAM+) and/or other EVs, in cell-free plasma and/or urine. The presence of marker+ EVs in plasma and urine samples from mCRPC patients (n = 5) and healthy controls (n = 5) was determined by flow cytometry (FCM) and surface plasmon resonance imaging (SPRi) using an antibody panel and lactadherin. For FCM, the concentrations of marker positive (+) particles and EVs (refractive index <1.42) were determined. Only the lactadherin+ particle and EV concentration in plasma measured by FCM differed significantly between patients and controls (p = 0.017). All other markers did not result in signals exceeding the background on both FCM and SPRi, or did not differ significantly between patients and controls. In conclusion, no difference was found between patients and controls based on the detection of tdEVs. For FCM, the measured sample volumes are too small to detect tdEVs. For SPRi, the concentration of tdEVs is probably too low to be detected. Thus, to detect tdEVs in cell-free plasma and/or urine, EV enrichment and/or concentration is required. Furthermore, we recommend testing other markers and/or a combination of markers to discriminate mCRPC patients from healthy controls.