Project description: Cryo-soft X-ray tomography (cryo-SXT) is a powerful method for investigating the ultrastructure of cells, offering resolution in the tens-of-nanometers range and strong contrast for membranous structures without requiring labeling or chemical fixation. The short acquisition time and the relatively large field of view lead to the fast acquisition of large amounts of tomographic image data. Segmenting these data into accessible features is a necessary step in extracting biologically relevant information from cryo-soft X-ray tomograms. However, manual image segmentation still requires several orders of magnitude more time than data acquisition. To address this challenge, we have developed an end-to-end automated 3D segmentation pipeline based on semi-supervised deep learning. Our approach is suitable for high-throughput analysis of large amounts of tomographic data, while remaining robust to limited manual annotations and variations in tomographic conditions. We validate our approach by extracting three-dimensional information on cellular ultrastructure and by quantifying nanoscopic morphological parameters of filopodia in mammalian cells.
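The description does not spell out the training objective. As a minimal sketch, assuming a PyTorch 3D segmentation model and a standard consistency-regularization recipe (one common way to exploit unlabeled tomogram patches; not necessarily the authors' published pipeline, and the noise perturbation and loss weight `lam` are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_x, labeled_y, unlabeled_x, optimizer, lam=0.1):
    """One training step: supervised loss on annotated patches plus a
    consistency penalty on unlabeled patches (a common semi-supervised recipe)."""
    model.train()
    optimizer.zero_grad()

    # Supervised term: cross-entropy against the sparse manual annotations.
    logits = model(labeled_x)                       # (B, C, D, H, W)
    sup_loss = F.cross_entropy(logits, labeled_y)   # labeled_y: (B, D, H, W), int64

    # Unsupervised term: predictions should stay stable under a mild
    # perturbation (here, additive Gaussian noise) of the same patch.
    with torch.no_grad():
        target = F.softmax(model(unlabeled_x), dim=1)
    noisy = unlabeled_x + 0.05 * torch.randn_like(unlabeled_x)
    pred = F.softmax(model(noisy), dim=1)
    cons_loss = F.mse_loss(pred, target)

    loss = sup_loss + lam * cons_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

The consistency weight `lam` is typically ramped up over training so that early, unreliable predictions on unlabeled data do not dominate the objective.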
Project description: We developed Solo, a semi-supervised deep learning framework for the identification of doublets in scRNA-seq analysis. To validate our method, we used MULTI-seq with cholesterol-modified oligos (CMOs) to experimentally identify doublets in a solid tissue with diverse cell types, mouse kidney, and showed that Solo recapitulated the experimentally identified doublets.
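A minimal usage sketch of Solo as distributed in scvi-tools is shown below; the input file name is hypothetical, and the exact API may differ across scvi-tools versions, so treat this as illustrative rather than definitive:

```python
import scanpy as sc
import scvi
from scvi.external import SOLO

adata = sc.read_h5ad("kidney_multiseq.h5ad")  # hypothetical input file

# Train the underlying variational autoencoder on the raw counts.
scvi.model.SCVI.setup_anndata(adata)
vae = scvi.model.SCVI(adata)
vae.train()

# Solo classifies cells as singlets vs. simulated doublets on top of the VAE.
solo = SOLO.from_scvi_model(vae)
solo.train()
doublet_calls = solo.predict()  # per-cell doublet/singlet probabilities
```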
Project description: The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has recently been observed, however, that no single inference method performs optimally across all datasets. It has also been shown that integrating predictions from multiple inference methods is more robust and yields high performance across diverse datasets. Inspired by this research, we propose a machine learning solution that learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it to carry substantial benefits: the method adapts automatically to patterns in the outputs of individual inference methods, making it possible to identify regulatory interactions more reliably when those patterns occur. This article demonstrates the benefits (in terms of the accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised, ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in a multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction on real datasets is more difficult for S. cerevisiae than for E. coli. The software, all datasets used in the experiments, and all results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827.
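As an illustration of the general idea, the sketch below treats each inference method's edge scores as one feature column (one "view") and self-trains a classifier from a small seed of known interactions; the random forest is a stand-in for exposition, not the paper's actual multi-view algorithm:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ensemble_grn(view_scores, known_labels, n_iter=5, thresh=0.9):
    """view_scores: (n_edges, n_methods) matrix, one column of confidence
    scores per inference method. known_labels: 1/0 for a small seed of known
    (non-)interactions, -1 for unlabeled edges; both classes must be seeded."""
    labels = known_labels.copy()
    for _ in range(n_iter):
        mask = labels != -1
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(view_scores[mask], labels[mask])
        proba = clf.predict_proba(view_scores)[:, 1]
        # Self-training: promote the most confident unlabeled edges to labels.
        confident_pos = (proba > thresh) & (labels == -1)
        confident_neg = (proba < 1 - thresh) & (labels == -1)
        if not confident_pos.any() and not confident_neg.any():
            break
        labels[confident_pos] = 1
        labels[confident_neg] = 0
    return proba  # final edge confidences for network reconstruction
```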
Project description: Deep learning and computer vision algorithms can deliver highly accurate and automated interpretation of medical imaging to augment and assist clinicians. However, medical imaging presents uniquely pertinent obstacles, such as a lack of accessible data or a high cost of annotation. To address this, we developed data-efficient deep learning classifiers for prediction tasks in cardiology. Using pipelined supervised models to focus on relevant structures, we achieve 94.4% accuracy for 15-view still-image echocardiographic view classification and 91.2% accuracy for binary left ventricular hypertrophy classification. We then develop semi-supervised generative adversarial network models that can learn from both labeled and unlabeled data in a generalizable fashion. We achieve greater than 80% accuracy in view classification with only 4% of the labeled data used in the purely supervised techniques, and 92.3% accuracy for left ventricular hypertrophy classification. In exploring trade-offs between model type, resolution, data resources, and performance, we present a comprehensive analysis of, and improvements to, efficient deep learning solutions for medical imaging assessment, especially in cardiology.
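A semi-supervised GAN classifier of this kind is commonly formulated with a (K+1)-class discriminator: the first K outputs are the real classes and the last output is "fake". The sketch below assumes a discriminator `disc` that returns n_classes + 1 logits; it shows the standard objective, not necessarily the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def sgan_discriminator_loss(disc, x_labeled, y, x_unlabeled, x_fake, n_classes):
    """Semi-supervised GAN discriminator objective: outputs 0..n_classes-1
    are real classes (e.g., echo views); output n_classes is 'fake'."""
    # Supervised term: labeled frames must receive the correct class.
    loss_sup = F.cross_entropy(disc(x_labeled), y)

    # Unsupervised terms: real unlabeled frames should land in any real
    # class (i.e., low 'fake' probability); generated frames should be 'fake'.
    fake_idx = n_classes
    p_fake_on_real = F.softmax(disc(x_unlabeled), dim=1)[:, fake_idx]
    loss_real = -torch.log(1.0 - p_fake_on_real + 1e-8).mean()

    fake_targets = torch.full((x_fake.size(0),), fake_idx,
                              dtype=torch.long, device=x_fake.device)
    loss_fake = F.cross_entropy(disc(x_fake), fake_targets)
    return loss_sup + loss_real + loss_fake
```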
Project description: Disease classification based on machine learning has become a crucial research topic in genetics and molecular biology. Disease classification generally follows a supervised learning style; i.e., it requires a large number of labelled samples to achieve good classification performance. In the majority of cases, however, labelled samples are hard to obtain, so the amount of training data is limited. Meanwhile, many unclassified (unlabelled) sequences have been deposited in public databases and can help the training procedure. This approach is called semi-supervised learning and is very useful in many applications. Self-training can be implemented by admitting pseudo-labelled samples in order from high to low confidence, which prevents noisy samples from undermining the robustness of semi-supervised learning during training. The deep forest method, with the hyperparameter settings used in this paper, can achieve excellent performance. Therefore, in this work we propose a novel approach combining a deep learning model with semi-supervised self-training to improve disease classification performance; it utilizes unlabelled samples through an updating mechanism designed to increase the number of high-confidence pseudo-labelled samples. The experimental results show that our proposed model achieves good performance in disease classification and disease-causing gene identification.
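A minimal sketch of high- to low-confidence self-training is given below; a scikit-learn random forest stands in for the paper's deep forest, and the decreasing threshold schedule is an assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # stand-in for deep forest

def self_train(X_lab, y_lab, X_unlab, thresholds=(0.95, 0.9, 0.85, 0.8)):
    """Self-training from high- to low-confidence pseudo-labels: each round
    lowers the confidence bar, so the cleanest samples enter training first."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for t in thresholds:
        clf = RandomForestClassifier(n_estimators=300, random_state=0)
        clf.fit(X_train, y_train)
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        conf = proba.max(axis=1)
        keep = conf >= t  # only pseudo-label samples above the current bar
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate([y_train, clf.classes_[proba[keep].argmax(axis=1)]])
        pool = pool[~keep]  # lower-confidence samples wait for a later round
    return clf
```

Admitting pseudo-labels in confidence order is what keeps early, noisy predictions from contaminating the training set and compounding over rounds.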
Project description: Investigating the 3D structures and rearrangements of organelles within a single cell is critical for better characterizing cellular function. Imaging approaches such as soft X-ray tomography have been widely applied to reveal a complex subcellular organization involving multiple inter-organelle interactions. However, 3D segmentation of organelle instances has remained challenging despite its importance for organelle characterization. Here we propose an intensity-based post-processing tool to identify and separate organelle instances. Our tool separates sphere-like (insulin vesicle) and columnar-shaped (mitochondrion) organelle instances based on the intensity of raw tomograms, semantic segmentation masks, and organelle morphology. We validate our tool using synthetic tomograms of organelles and experimental tomograms of pancreatic β-cells, separating insulin vesicle and mitochondrion instances. Compared with the commonly used connected-regions labeling, watershed, and watershed + Gaussian filter methods, our tool achieves improved accuracy in identifying organelles in the synthetic tomograms and an improved description of organelle structures in β-cell tomograms. In addition, under different experimental treatment conditions, significant changes in the volumes and intensities of both insulin vesicles and mitochondria are observed in our instance results, revealing their potential roles in maintaining normal β-cell function. We expect our tool to be applicable to improving the instance segmentation of other images obtained from different cell types using multiple imaging modalities.
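For reference, the watershed + Gaussian filter baseline mentioned above can be implemented with scikit-image roughly as follows; the proposed intensity-based tool goes further by also using raw-tomogram intensities and organelle morphology. The function works on 2D or 3D binary masks:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import gaussian
from skimage.segmentation import watershed

def watershed_instances(semantic_mask, sigma=1.0, min_distance=5):
    """Baseline instance separation on a binary semantic mask: smooth the
    distance transform, seed on its local maxima, then run watershed."""
    distance = ndi.distance_transform_edt(semantic_mask)
    distance = gaussian(distance, sigma=sigma)  # suppress spurious peaks
    peaks = peak_local_max(distance, min_distance=min_distance,
                           labels=semantic_mask.astype(int))
    markers = np.zeros(semantic_mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Flood from the markers over the inverted distance map, confined to the mask.
    return watershed(-distance, markers, mask=semantic_mask)
```

This baseline tends to over- or under-split touching organelles depending on sigma and min_distance, which is the failure mode an intensity- and morphology-aware post-processing step is designed to avoid.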
Project description: Despite its remarkable potential for transforming low-resolution images, deep learning faces significant challenges in achieving high-quality superresolution microscopy imaging from wide-field (conventional) microscopy. Here, we present X-Microscopy, a computational tool comprising two deep learning subnets, UR-Net-8 and X-Net, which enables STORM-like superresolution microscopy image reconstruction from wide-field images with input-size flexibility. X-Microscopy was trained using samples of various subcellular structures, including cytoskeletal filaments and dot-like, beehive-like, and nanocluster-like structures, to generate prediction models capable of producing images of quality comparable to that of STORM images. In addition to enabling multicolour superresolution image reconstructions, X-Microscopy also facilitates superresolution image reconstruction from different conventional microscopic systems. These capabilities offer promising prospects for making superresolution microscopy accessible to a broader range of users, beyond the confines of well-equipped laboratories.
Project description: New developments at synchrotron beamlines and the ongoing upgrades of synchrotron facilities make it possible to study complex structures with far better spatial and temporal resolution than ever before. The downside is that the collected data are also significantly larger than ever before (more than several terabytes), and post-processing and analyzing these data manually is very challenging. This issue can be addressed by employing automated methods such as machine learning, which offer significantly better performance in data processing and image segmentation than manual methods. In this work, a 3D U-net deep convolutional neural network (DCNN) model with four layers and base-8 characteristic features has been developed to segment precipitates and porosities in synchrotron transmission X-ray micrographs. Transmission X-ray microscopy experiments were conducted on micropillars prepared from additively manufactured 316L steel to evaluate precipitate information. After training, the 3D U-net DCNN model was applied to unseen data, and its predictions were compared with manual segmentation; good agreement was found between the two. An ablation study revealed that the proposed model shows better statistics than models with fewer layers and/or characteristic features. The proposed model can segment several hundred gigabytes of data in a few minutes and could be applied to other materials and tomography techniques. The code and the fitted weights are made available with this paper for any interested researcher to use (https://github.com/manasvupadhyay/erc-gamma-3D-DCNN).
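A compact PyTorch sketch of a four-level 3D U-net with base-8 feature channels (8/16/32/64) is shown below. Padding, normalization, and the class count are assumptions; the authors' actual code and fitted weights are available at the linked repository:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """Two 3x3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, 3, padding=1), nn.BatchNorm3d(c_out), nn.ReLU(inplace=True),
        nn.Conv3d(c_out, c_out, 3, padding=1), nn.BatchNorm3d(c_out), nn.ReLU(inplace=True),
    )

class UNet3D(nn.Module):
    """Four-level 3D U-net with 8 base feature channels (8/16/32/64)."""
    def __init__(self, in_ch=1, n_classes=3, base=8):
        super().__init__()
        chs = [base * 2 ** i for i in range(4)]  # [8, 16, 32, 64]
        self.enc = nn.ModuleList([block(in_ch, chs[0])] +
                                 [block(chs[i], chs[i + 1]) for i in range(3)])
        self.pool = nn.MaxPool3d(2)
        self.up = nn.ModuleList([nn.ConvTranspose3d(chs[i + 1], chs[i], 2, stride=2)
                                 for i in range(3)])
        self.dec = nn.ModuleList([block(chs[i] * 2, chs[i]) for i in range(3)])
        self.head = nn.Conv3d(chs[0], n_classes, 1)  # e.g., matrix/precipitate/pore

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.enc):
            x = enc(x)
            if i < 3:          # keep a skip connection, then downsample
                skips.append(x)
                x = self.pool(x)
        for i in reversed(range(3)):  # upsample and fuse with the skips
            x = self.up[i](x)
            x = self.dec[i](torch.cat([x, skips[i]], dim=1))
        return self.head(x)
```

With three pooling steps, input patch dimensions must be divisible by 8; large volumes are typically processed as overlapping patches and stitched back together, which is how terabyte-scale data can be handled in minutes on a GPU.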
Project description: Background: With the development of modern sequencing technology, hundreds of thousands of single-cell RNA-sequencing (scRNA-seq) profiles make it possible to explore heterogeneity at the cell level, but they face the challenges of high dimensionality and high sparsity. Dimensionality reduction is essential for downstream analyses, such as clustering to identify cell subpopulations, and usually follows an unsupervised approach. Results: In this paper, we introduce a semi-supervised dimensionality reduction method named scSemiAE, based on an autoencoder model. It transfers the information contained in available datasets with cell subpopulation labels to guide the search for better low-dimensional representations, which eases further analysis. Conclusions: Experiments on five public datasets show that scSemiAE outperforms both unsupervised and semi-supervised baselines regardless of whether the transferred information, embodied in the number of labeled cells and labeled cell subpopulations, is abundant or scarce.
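The abstract does not give scSemiAE's exact objective. As a hedged sketch, one plausible two-term formulation combines a reconstruction loss over all cells with a label-guided term pulling labeled cells toward their subpopulation centroid in latent space; the architecture, centroid term, and weight `alpha` are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AE(nn.Module):
    """Autoencoder over expression profiles; the bottleneck z is the
    low-dimensional representation handed to downstream clustering."""
    def __init__(self, n_genes, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, n_genes))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def semi_supervised_loss(model, x_all, x_lab, y_lab, centroids, alpha=1.0):
    """Reconstruction on all cells plus a label-guided term on labeled cells.
    centroids: (K, latent) tensor, one row per labeled subpopulation."""
    _, recon = model(x_all)
    loss_rec = F.mse_loss(recon, x_all)            # unsupervised term
    z_lab, _ = model(x_lab)
    loss_lab = F.mse_loss(z_lab, centroids[y_lab])  # supervised guidance
    return loss_rec + alpha * loss_lab
```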