Accurate Classification of Protein Subcellular Localization from High-Throughput Microscopy Images Using Deep Learning.
ABSTRACT: High-throughput microscopy of many single cells generates high-dimensional data that are far from straightforward to analyze. One important problem is automatically detecting the cellular compartment where a fluorescently-tagged protein resides, a task relatively simple for an experienced human, but difficult to automate on a computer. Here, we train an 11-layer neural network on data from mapping thousands of yeast proteins, achieving per cell localization classification accuracy of 91%, and per protein accuracy of 99% on held-out images. We confirm that low-level network features correspond to basic image characteristics, while deeper layers separate localization classes. Using this network as a feature calculator, we train standard classifiers that assign proteins to previously unseen compartments after observing only a small number of training examples. Our results are the most accurate subcellular localization classifications to date, and demonstrate the usefulness of deep learning for high-throughput microscopy.
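The gap between the 91% per-cell and 99% per-protein figures reflects aggregation over the many cells imaged for each protein; below is a minimal sketch of one plausible aggregation rule (a majority vote over toy labels, not necessarily the paper's exact procedure):

```python
from collections import Counter

def aggregate_per_protein(cell_predictions):
    """Majority vote over the single-cell class predictions for one protein."""
    return Counter(cell_predictions).most_common(1)[0][0]

# Toy example: 10 cells of one protein, 8 predicted 'nucleus', 2 'cytoplasm'.
cells = ['nucleus'] * 8 + ['cytoplasm'] * 2
assert aggregate_per_protein(cells) == 'nucleus'
```

Even with 91% per-cell accuracy, independent errors across dozens of cells rarely outvote the correct class, which is how per-protein accuracy can climb to 99%.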
Project description:Live-cell microscopy is quickly becoming an indispensable technique for studying the dynamics of cellular processes. Maintaining the specimen in focus during image acquisition is crucial for high-throughput applications, especially for long experiments or when a large sample is being continuously scanned. Automated focus control methods are often expensive, imperfect, or ill-adapted to a specific application, and they are a bottleneck for widespread adoption of high-throughput, live-cell imaging. Here, we demonstrate a neural network approach for automatically maintaining focus during bright-field microscopy. Z-stacks of yeast cells growing in a microfluidic device were collected and used to train a convolutional neural network to classify images according to their z-position. We studied the effect of the various hyperparameters of the neural network, including downsampling, batch size, and z-bin resolution, on prediction accuracy. The network was able to predict the z-position of an image with ±1 µm accuracy, outperforming human annotators. Finally, we used our neural network to control microscope focus in real time during a 24-hour growth experiment. The method robustly maintained the correct focal position, compensating for 40 µm of focal drift, and was insensitive to changes in the field of view. Approximately 100 annotated z-stacks were required to train the network, making our method practical for custom autofocus applications.
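The z-bin resolution hyperparameter turns the continuous focus position into discrete classes for the network. A small sketch of that discretization, with the bin size and z-range chosen purely for illustration (they are not the paper's values):

```python
def z_to_bin(z_um, bin_size_um=0.5, z_min_um=-10.0):
    """Discretize a continuous z-position into a class index for the classifier.
    Bin size and z-range are illustrative assumptions."""
    return int((z_um - z_min_um) // bin_size_um)

def bin_center(idx, bin_size_um=0.5, z_min_um=-10.0):
    """Map a predicted class index back to a focal position in micrometres."""
    return z_min_um + (idx + 0.5) * bin_size_um

# A prediction counts as accurate if its bin centre lies within +/-1 um of true z.
pred = bin_center(z_to_bin(1.4))
assert abs(pred - 1.2) <= 1.0  # predicted 1.25 um vs. a true z of 1.2 um
```

Coarser bins make the classification task easier but cap the achievable focus precision at half a bin width, which is the trade-off the z-bin resolution experiments probe.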
Project description:BACKGROUND:Cell counting from cell cultures is required in multiple biological and biomedical research applications. In particular, accurate brightfield-based cell counting methods are needed for cell growth analysis. With deep learning, cells can be detected with high accuracy, but manually annotated training data are required. We propose a method for cell detection that requires annotated training data for only one cell line and generalizes to other, unseen cell lines. RESULTS:Training a deep learning model with only one cell line can provide accurate detections for similar unseen cell lines (domains). However, if the new domain is very dissimilar from the training domain, high precision but lower recall is achieved. The generalization capabilities of the model can be improved with training data transformations, but only to a certain degree. To further improve the detection accuracy on unseen domains, we propose an iterative unsupervised domain adaptation method. High-precision predictions on unseen cell lines enable automatic generation of training data, which is used to train the model together with parts of the previously used annotated training data. We used a U-Net-based model and three consecutive focal planes from brightfield image z-stacks. We trained the model initially with the PC-3 cell line and used the LNCaP, BT-474, and 22Rv1 cell lines as target domains for domain adaptation. The highest improvement in accuracy was achieved for 22Rv1 cells: the F1-score after supervised training was only 0.65, but after unsupervised domain adaptation we achieved a score of 0.84. The mean accuracy for the target domains was 0.87, with a mean improvement of 16 percent. CONCLUSIONS:With our method for generalized cell detection, we can train a model that accurately detects different cell lines from brightfield images.
A new cell line can be introduced to the model without a single manual annotation, and after iterative domain adaptation the model is ready to detect these cells with high accuracy.
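The iterative adaptation loop described above can be sketched with a toy stand-in for the detector: a 1-D nearest-centroid classifier whose confident target-domain predictions are pseudo-labelled and folded back into training. The classifier, scalar features, and confidence rule here are illustrative assumptions, not the paper's U-Net pipeline:

```python
def fit_centroids(xs, ys):
    """Per-class mean of 1-D features (a toy stand-in for refitting the detector)."""
    return [sum(x for x, y in zip(xs, ys) if y == c) /
            max(1, sum(1 for y in ys if y == c)) for c in (0, 1)]

def predict(centroids, x):
    """Predicted class plus a crude confidence from relative centroid distance."""
    d = [abs(x - c) for c in centroids]
    cls = d.index(min(d))
    conf = max(d) / (d[0] + d[1]) if (d[0] + d[1]) else 1.0
    return cls, conf

def self_train(x_src, y_src, x_tgt, confidence=0.75, iterations=3):
    """Iterative pseudo-labelling: confident target-domain predictions are
    added to the annotated source data and the model is refit."""
    xs, ys = list(x_src), list(y_src)
    for _ in range(iterations):
        centroids = fit_centroids(xs, ys)
        xs, ys = list(x_src), list(y_src)   # keep all annotated source data
        for x in x_tgt:
            cls, conf = predict(centroids, x)
            if conf >= confidence:          # only trusted pseudo-labels join
                xs.append(x)
                ys.append(cls)
    return fit_centroids(xs, ys)
```

With each iteration the model absorbs more of the shifted target distribution, so pseudo-labels that were initially below the confidence cut-off can be recovered later, which mirrors why iterating the adaptation improves recall on dissimilar domains.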
Project description:Lateralization of temporal lobe epilepsy (TLE) is critical for successful outcome of surgery to relieve seizures. TLE affects brain regions beyond the temporal lobes and has been associated with aberrant brain networks, based on evidence from functional magnetic resonance imaging. We present here a machine learning-based method for determining the laterality of TLE, using features extracted from resting-state functional connectivity of the brain. A comprehensive feature space was constructed to include network properties within local brain regions, between brain regions, and across the whole network. Feature selection was performed based on random forest and a support vector machine was employed to train a linear model to predict the laterality of TLE on unseen patients. A leave-one-patient-out cross validation was carried out on 12 patients and a prediction accuracy of 83% was achieved. The importance of selected features was analyzed to demonstrate the contribution of resting-state connectivity attributes at voxel, region, and network levels to TLE lateralization.
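Leave-one-patient-out cross-validation simply refits the model with each patient held out in turn and scores the held-out prediction. A compact sketch, with a toy 1-nearest-neighbour classifier on a scalar feature standing in for the random-forest-selected features and linear SVM:

```python
def nn_predict(train, x_new):
    """1-nearest-neighbour on a scalar feature (illustrative stand-in only)."""
    return min(train, key=lambda pair: abs(pair[0] - x_new))[1]

def loocv_accuracy(features, labels):
    """Leave-one-patient-out CV: refit on all but one patient, test on that one."""
    hits = 0
    for i in range(len(features)):
        train = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        hits += int(nn_predict(train, features[i]) == labels[i])
    return hits / len(features)

# Toy cohort: left-lateralized patients cluster near 0, right near 5.
acc = loocv_accuracy([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], ['L', 'L', 'L', 'R', 'R', 'R'])
```

With only 12 patients, each fold changes the training set by roughly 8%, which is why LOOCV is the natural choice here despite its cost of one refit per patient.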
Project description:Purpose: To develop a machine learning-based calculator to improve the accuracy of IOL power predictions for highly myopic eyes. Methods: Data of 1,450 highly myopic eyes from 1,450 patients who had cataract surgery at our hospital were used as the internal dataset (training and validation). Another 114 highly myopic eyes from other hospitals were used as the external test dataset. A new calculator was developed using an XGBoost regression model based on features including demographics, biometrics, IOL powers, A-constants, and the refractions predicted by the Barrett Universal II (BUII) formula. Accuracies were compared between our calculator and the BUII formula, and an axial length (AL) subgroup analysis (26.0–28.0, 28.0–30.0, or ≥30.0 mm) was further conducted. Results: The median absolute errors (MedAEs) and median squared errors (MedSEs) were lower with the XGBoost calculator (internal: 0.25 D and 0.06 D²; external: 0.29 D and 0.09 D²) vs. the BUII formula (all P ≤ 0.001). The mean absolute errors were 0.33 ± 0.28 vs. 0.45 ± 0.31 D (internal), and 0.35 ± 0.24 vs. 0.43 ± 0.29 D (external). The mean squared errors were 0.19 ± 0.32 vs. 0.30 ± 0.36 D² (internal), and 0.18 ± 0.21 vs. 0.27 ± 0.29 D² (external). The percentages of eyes within ±0.25 D of the prediction error were significantly greater with the XGBoost calculator (internal: 49.66 vs. 29.66%; external: 78.28 vs. 60.34%; both P < 0.05). The same trend held for MedAEs and MedSEs in all subgroups (internal) and in the AL ≥30.0 mm subgroup (external) (all P < 0.001). Conclusions: The new XGBoost calculator showed promising accuracy for highly or extremely myopic eyes.
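The comparison metrics reported above (MedAE, MedSE, and the fraction of eyes within ±0.25 D) are straightforward to compute from per-eye prediction errors. A small sketch with made-up error values; the error convention (actual minus predicted refraction) is an assumption:

```python
from statistics import median

def refraction_error_metrics(predicted, actual, tol=0.25):
    """MedAE (D), MedSE (D^2), and % of eyes within +/- tol D of prediction error.
    Error convention (actual minus predicted refraction) is an assumption."""
    err = [a - p for p, a in zip(predicted, actual)]
    abs_err = [abs(e) for e in err]
    return {
        'MedAE_D': median(abs_err),
        'MedSE_D2': median(e * e for e in err),
        'pct_within_tol': 100.0 * sum(e <= tol for e in abs_err) / len(err),
    }

# Four toy eyes with prediction errors of 0.1, -0.2, 0.5 and 0.3 D.
m = refraction_error_metrics([0.0, 0.0, 0.0, 0.0], [0.1, -0.2, 0.5, 0.3])
```

Median-based metrics are preferred in this setting because a few extreme myopes with large errors would otherwise dominate the means.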
Project description:In tomographic reconstruction, the image quality of the reconstructed images can be significantly degraded by defects in the measured two-dimensional (2D) raw image data. Despite the importance of screening out defective 2D images for robust tomographic reconstruction, manual inspection and rule-based automation suffer from low throughput and insufficient accuracy, respectively. Here, we present deep learning-enabled quality control for holographic data to produce robust and high-throughput optical diffraction tomography (ODT). The key idea is to distil the knowledge of an expert into a deep convolutional neural network. We built an extensive database of optical field images with clean/noisy annotations, and then trained a binary-classification network on these data. The trained network outperformed visual inspection by non-expert users and a widely used rule-based algorithm, with >90% test accuracy. Subsequently, we confirmed that the superior screening performance significantly improved the tomogram quality. To further confirm the trained model's performance and generalisability, we evaluated it on unseen biological cell data obtained with a setup that was not used to generate the training dataset. Lastly, we interpreted the trained model using various visualisation techniques that provided the saliency map underlying each model inference. We envision that the proposed network would be a powerful lightweight module in the tomographic reconstruction pipeline.
Project description:Capillary electrophoresis (CE) is a powerful approach for structural analysis of nucleic acids, with recent high-throughput variants enabling three-dimensional RNA modeling and the discovery of new rules for RNA structure design. Among the steps composing CE analysis, the process of finding each band in an electrophoretic trace and mapping it to a position in the nucleic acid sequence has required significant manual inspection and remains the most time-consuming and error-prone step. The few available tools seeking to automate this band annotation have achieved limited accuracy and have not taken advantage of the information across dozens of profiles routinely acquired in high-throughput measurements. We present a dynamic-programming-based approach to automate band annotation for high-throughput capillary electrophoresis. The approach is uniquely able to define and optimize a robust target function that takes into account multiple CE profiles (sequencing ladders, different chemical probes, different mutants) collected for the RNA. Over a large benchmark of multi-profile datasets for biological RNAs and designed RNAs from the EteRNA project, the method significantly outperforms prior tools (QuSHAPE and FAST) in accuracy against gold-standard manual annotations. The amount of computation required is reasonable, at a few seconds per dataset. We also introduce an 'E-score' metric to automatically assess the reliability of the band annotation and show it to be practically useful in flagging uncertainties in band annotation for further inspection. The implementation of the proposed algorithm is included in the HiTRACE software, freely available as an online server and for download at http://email@example.com or firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.
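Band annotation can be framed as a monotonic alignment problem: each expected band is matched, in order, to one observed peak in the trace, minimizing total deviation. A minimal dynamic-programming sketch of that idea, using a toy single-profile squared-deviation cost rather than HiTRACE's multi-profile target function:

```python
def align_bands(expected, observed):
    """Monotonic DP alignment: expected[i] is matched to observed[path[i]],
    with path strictly increasing; cost is total squared deviation.
    Toy single-profile version; peaks may be skipped, bands may not."""
    INF = float('inf')
    n, m = len(expected), len(observed)
    cost = [[INF] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = (expected[0] - observed[j]) ** 2
    for i in range(1, n):
        best, bj = INF, -1                       # best cost[i-1][j'] over j' < j
        for j in range(i, m):
            if cost[i - 1][j - 1] < best:
                best, bj = cost[i - 1][j - 1], j - 1
            if best < INF:
                cost[i][j] = best + (expected[i] - observed[j]) ** 2
                back[i][j] = bj
    jf = min(range(m), key=lambda j: cost[n - 1][j])
    total, path = cost[n - 1][jf], [jf]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return total, path[::-1]

# Three expected bands against four observed peaks; the spurious peak at 15.0
# is skipped by the optimal alignment.
total, path = align_bands([10, 20, 30], [9.8, 15.0, 20.3, 29.9])
```

The DP guarantees the globally optimal monotonic assignment in O(n·m) time, which is what makes a few seconds per dataset feasible even for long traces.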
Project description:Identifying the localization of proteins and of their specific subpopulations associated with certain cellular compartments is crucial for understanding protein function and interactions with other macromolecules. Fluorescence microscopy is a powerful method for assessing protein localization, with increasing demand for automated high-throughput analysis methods to complement the technical advancements in high-throughput imaging. Here, we study the applicability of deep neural network-based artificial intelligence to the classification of protein localization in 13 cellular subcompartments. We use deep learning based on a convolutional neural network and a fully convolutional network with similar architectures for the classification task, aiming at accurate classification but, importantly, also at a comparison of the networks. Our results show that both types of convolutional neural networks perform well in protein localization classification tasks for major cellular organelles. Yet, in this study, the fully convolutional network outperforms the convolutional neural network in the classification of images with multiple simultaneous protein localizations. We find that the fully convolutional network, with its output visualizing the identified localizations, is a very useful tool for systematic protein localization assessment.
Project description:While the analysis of mitochondrial morphology has emerged as a key tool in the study of mitochondrial function, efficient quantification of mitochondrial microscopy images presents a challenging task and a bottleneck for statistically robust conclusions. Here, we present the Mitochondrial Segmentation Network (MitoSegNet), a pretrained deep learning segmentation model that enables researchers to easily exploit the power of deep learning for the quantification of mitochondrial morphology. We tested the performance of MitoSegNet against three feature-based segmentation algorithms and the machine-learning segmentation tool Ilastik. MitoSegNet outperformed all other methods in both pixelwise and morphological segmentation accuracy. We successfully applied MitoSegNet to unseen fluorescence microscopy images of mitoGFP-expressing mitochondria in wild-type and catp-6 ATP13A2 mutant C. elegans adults. Additionally, MitoSegNet was capable of accurately segmenting mitochondria in HeLa cells treated with fragmentation-inducing reagents. We provide MitoSegNet in a toolbox for Windows and Linux operating systems that combines segmentation with morphological analysis.
Project description:Pinpointing subcellular protein localizations from microscopy images is easy to the trained eye, but challenging to automate. Based on the Human Protein Atlas image collection, we held a competition to identify deep learning solutions to solve this task. Challenges included training on highly imbalanced classes and predicting multiple labels per image. Over 3 months, 2,172 teams participated. Despite convergence on popular networks and training techniques, there was considerable variety among the solutions. Participants applied strategies for modifying neural networks and loss functions, augmenting data and using pretrained networks. The winning models far outperformed our previous effort at multi-label classification of protein localization patterns by ~20%. These models can be used as classifiers to annotate new images, feature extractors to measure pattern similarity or pretrained networks for a wide range of biological applications.
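Multi-label prediction differs from ordinary classification in that each image may activate any number of labels, typically via per-class score thresholds, and imbalanced classes are usually scored with macro-averaged F1 so that rare compartments count as much as common ones. A small illustrative sketch (thresholds and encodings are assumptions, not the competition's scoring code):

```python
def multilabel_predict(scores, thresholds):
    """Per-class thresholding: any number of labels may be active per image."""
    return [int(s >= t) for s, t in zip(scores, thresholds)]

def macro_f1(y_true, y_pred):
    """Macro F1 over binary label matrices: rare classes weigh equally."""
    n_classes = len(y_true[0])
    f1s = []
    for c in range(n_classes):
        tp = sum(t[c] and p[c] for t, p in zip(y_true, y_pred))
        fp = sum((not t[c]) and p[c] for t, p in zip(y_true, y_pred))
        fn = sum(t[c] and not p[c] for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 1.0)  # empty class scores 1.0
    return sum(f1s) / n_classes

# One image scored against three localization classes with a 0.5 threshold.
pred = multilabel_predict([0.9, 0.2, 0.7], [0.5, 0.5, 0.5])  # -> [1, 0, 1]
```

Tuning thresholds per class (rather than using a single 0.5 cut-off) is one of the standard tricks for imbalanced multi-label problems like this one.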
Project description:Gene regulatory networks give important insights into the mechanisms underlying physiology and pathophysiology. Deriving gene regulatory networks from high-throughput expression data via machine learning strategies is problematic, as the reliability of these models is often compromised by limited and highly variable samples, heterogeneity in transcript isoforms, noise, and other artifacts. Here, we develop a novel algorithm, dubbed Dandelion, in which we construct and train intraspecies Bayesian networks that are translated and assessed on independent test sets from other species in a reiterative procedure. The interspecies disease networks are subjected to multiple layers of analysis and evaluation, leading to the identification of the most consistent relationships within the network structure. In this study, we demonstrate the performance of our algorithms on datasets from animal models of oculopharyngeal muscular dystrophy (OPMD) and patient materials. We show that the interspecies network of genes coding for the proteasome provides highly accurate predictions of gene expression levels and disease phenotype. Moreover, the cross-species translation increases the stability and robustness of these networks. Unlike existing modeling approaches, our algorithms do not require assumptions about the notoriously difficult one-to-one mapping of protein orthologues or alternative transcripts, and they can deal with missing data. We show that the identified key components of the OPMD disease network can be confirmed in an unseen and independent disease model. This study presents a state-of-the-art strategy for constructing interspecies disease networks that provide crucial information on regulatory relationships among genes, leading to a better understanding of disease molecular mechanisms.