Project description:Metagenomic analysis has been explored for disease diagnosis and biomarker discovery. Low sample sizes, high dimensionality, and sparsity of metagenomic data challenge metagenomic investigations. Here, an unsupervised microbial embedding, grouping, and mapping algorithm (MEGMA) was developed to transform metagenomic data into individualized multichannel 2D microbiome representations by manifold learning and clustering of microbial profiles (e.g., composition, abundance, hierarchy, and taxonomy). These 2D representations enable enhanced disease prediction by established ConvNet-based AggMapNet models, outperforming commonly used machine learning and deep learning models on metagenomic benchmark datasets. Combined with the AggMapNet explainable module, these 2D representations robustly identified more reliable and replicable disease-prediction microbes (biomarkers). When the MEGMA-AggMapNet pipeline was applied to biomarker identification in 5 disease datasets, 84% of the identified biomarkers had been described in over 74 distinct works as important for these diseases. Moreover, the method also discovered highly consistent sets of biomarkers in cross-cohort colorectal cancer (CRC) patients and microbial shifts across different CRC stages.
Project description:Omics-based biomedical learning frequently relies on high-dimensional data (up to thousands of dimensions) with low sample sizes (dozens to hundreds), which challenges efficient deep learning (DL) algorithms, particularly in low-sample omics investigations. Here, a novel unsupervised feature aggregation tool, AggMap, was developed to Aggregate and Map omics features into multi-channel, spatially correlated, image-like 2D feature maps (Fmaps) based on their intrinsic correlations. AggMap exhibits strong feature reconstruction capabilities on a randomized benchmark dataset, outperforming existing methods. With AggMap multi-channel Fmaps as inputs, newly developed multi-channel DL AggMapNet models outperformed state-of-the-art machine learning models on 18 low-sample omics benchmark tasks. AggMapNet exhibited better robustness in learning noisy data and in disease classification. The AggMapNet explainable module Simply-explainer identified key metabolites and proteins for COVID-19 detection and severity prediction. The unsupervised AggMap algorithm, with its good feature restructuring abilities, combined with the supervised explainable AggMapNet architecture, establishes a pipeline for enhanced learning and interpretability of low-sample omics data.
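As a rough illustration of what "intrinsic correlations" between features can mean in the AggMap description above, the sketch below computes pairwise correlation distances between feature columns. The function names and toy data are hypothetical, and the use of 1 - r (with r the Pearson correlation) as the distance is an assumption made for illustration, not a detail taken from the abstract. The mapping of features onto a 2D grid is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant feature has undefined correlation")
    return cov / math.sqrt(var_x * var_y)

def correlation_distance_matrix(samples):
    """samples: list of rows (one per sample).

    Returns a symmetric feature-by-feature matrix of 1 - r
    (assumed distance; 0 = perfectly correlated, 2 = perfectly anti-correlated).
    """
    features = list(zip(*samples))  # transpose: one tuple per feature
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - pearson(features[i], features[j])
            dist[i][j] = dist[j][i] = d
    return dist

# Toy data (hypothetical): 4 samples x 3 features.
samples = [
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
]
dist = correlation_distance_matrix(samples)
```

In this toy data the first two features move together, so their distance is close to 0, while the third runs opposite to them and sits close to the maximum distance of 2. Features with small mutual distances are the ones a correlation-based layout would tend to place near each other.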
Project description:Molecular vibrational spectroscopy is widely used in various sensing and imaging applications, providing intrinsic information at the molecular level. Nonlinear optical interactions using ultrashort laser pulses facilitate the selective coherent excitation of molecular vibrational modes by focusing energy into specific molecular bonds, boosting the signal level by multiple orders of magnitude. The dephasing of such coherence, which is susceptible to the local molecular environment, however, is often neglected. The capability of vibrational dephasing dynamics to serve as a unique probe for complex molecular interactions and for the effects of local nano- and microenvironments is beyond the reach of conventional, intensity-based spectroscopy. Here, we developed a novel multiorder coherent Raman spectroscopy platform with a special focus on the temporal evolution of molecular vibrational dephasing, termed time-resolved coherent Raman scattering (T-CRS) spectroscopy. By utilizing high dynamic range detection, molecular vibrational dynamics and environmental effects are demonstrated with multidimensional spectroscopic sensing, which promises a new range of applications in biology, materials, and chemical sciences.
Project description:The fingerprinting method is generally performed to determine specific molecules or the behavior of specific molecular bonds in the desired sample content. A novel, robust and simple method based on surface enhanced Raman spectroscopy (SERS) was developed to obtain the full spectrum of tea varieties for detection of the purity of the samples based on the type of processing and cultivation. For this purpose, the fingerprints of seven different varieties of tea samples (herbal tea (rose hip, chamomile, linden, green and sage tea), black tea and Earl Grey tea) combined with silver colloids were obtained by SERS in the range of 200-2000 cm⁻¹ with an analysis time of 20 s. Each of the thirty-nine tea samples tested showed its own specific SERS spectrum. Principal Component Analysis (PCA) was also applied to separate each tea variety, and different models were developed for the tea samples, including three models for the herbal teas and two models for the black and Earl Grey tea samples. Herbal tea samples were separated using mean centering, smoothing and median centering pre-processing steps, while baselining and derivatisation pre-processing steps were applied to the SERS data of black and Earl Grey tea. The novel spectroscopic fingerprinting technique combined with PCA is an accurate, rapid and simple methodology for the assessment of tea types based on differences in processing and cultivation. This method is proposed as an alternative tool for determining the characteristics of tea varieties.
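The pre-processing steps named above for the herbal teas (mean centering, median centering, smoothing) can be sketched on toy data as follows. This is a minimal illustration, not the authors' pipeline: the function names and numbers are hypothetical, centering is assumed to be applied per wavenumber column across samples, and the moving-average smoother with a window of 3 is an assumed stand-in, since the abstract does not say which smoothing method was used. The PCA step itself is not shown.

```python
from statistics import mean, median

def center_columns(spectra, stat=mean):
    """Subtract a per-column statistic (mean or median) from every spectrum.

    spectra: list of rows, one spectrum per sample, one column per wavenumber.
    """
    offsets = [stat(column) for column in zip(*spectra)]
    return [[v - o for v, o in zip(row, offsets)] for row in spectra]

def moving_average(values, window=3):
    """Centered moving average along one spectrum; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Toy intensities (hypothetical): 3 samples x 3 wavenumber channels.
spectra = [
    [1.0, 4.0, 2.0],
    [3.0, 8.0, 2.0],
    [5.0, 6.0, 5.0],
]
mean_centered = center_columns(spectra, stat=mean)
median_centered = center_columns(spectra, stat=median)
smoothed = moving_average([2.0, 4.0, 9.0, 4.0, 6.0])
```

After mean centering, every column sums to zero, which is the form PCA normally expects. Median centering is less sensitive to a single outlying spectrum.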
Project description:Archetypes represent extreme manifestations of a population with respect to specific characteristic traits or features. In linear feature space, archetypes approximate the data convex hull, allowing all data points to be expressed as convex mixtures of archetypes. As mixing of archetypes is performed directly on the input data, linear Archetypal Analysis requires additivity of the input, which is a strong assumption unlikely to hold, e.g., in the case of image data. To address this problem, we propose learning an appropriate latent feature space while simultaneously identifying suitable archetypes. We thus introduce a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a deep variational information bottleneck and an optimal representation, together with the archetypes, can be learned end-to-end. Moreover, the information bottleneck framework allows for a natural incorporation of arbitrarily complex side information during training. As a consequence, learned archetypes become easily interpretable as they derive their meaning directly from the included side information. Applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions while using multi-rater based emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. By using different kinds of side information, we demonstrate how identified archetypes, along with their interpretation, largely depend on the side information provided. Supplementary information: The online version contains supplementary material available at 10.1007/s11263-020-01390-3.
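The linear archetype model described above, in which each data point is expressed as a convex mixture of archetypes, can be illustrated with a small sketch. The function name and the toy archetypes are hypothetical, and only the linear mixing step is shown, not the deep variational model the abstract proposes.

```python
import math

def convex_mixture(weights, archetypes):
    """Return sum_k weights[k] * archetypes[k].

    The weights must be convex: non-negative and summing to 1, so the
    result always lies inside the convex hull of the archetypes.
    """
    if len(weights) != len(archetypes):
        raise ValueError("need one weight per archetype")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    dim = len(archetypes[0])
    return [sum(w * a[d] for w, a in zip(weights, archetypes)) for d in range(dim)]

# Toy archetypes (hypothetical): corners of a triangle in 2D.
archetypes = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
point = convex_mixture([0.5, 0.25, 0.25], archetypes)
```

A weight vector such as `[1.0, 0.0, 0.0]` reproduces an archetype exactly, while mixed weights give interior points. This direct mixing of inputs is what requires the additivity assumption mentioned in the abstract.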
Project description:Temperature is a fundamental environmental factor that shapes the evolution of organisms. Learning thermal determinants of protein sequences in evolution thus has profound significance for basic biology, drug discovery, and protein engineering. Here, we use a data set of over 3 million BRENDA enzymes labeled with optimal growth temperatures (OGTs) of their source organisms to train a deep neural network model (DeepET). The protein-temperature representations learned by DeepET provide a temperature-related statistical summary of protein sequences and capture structural properties that affect thermal stability. For prediction of enzyme optimal catalytic temperatures and protein melting temperatures via a transfer learning approach, our DeepET model outperforms classical regression models trained on rationally designed features and other deep-learning-based representations. DeepET thus holds promise for understanding enzyme thermal adaptation and guiding the engineering of thermostable enzymes.
Project description:Recent crystallographic results revealed conformational changes of zwitterionic ectoine upon hydration. By means of confocal Raman spectroscopy and density functional theory calculations, we present a detailed study of this transformation process as part of a Fermi resonance analysis. The corresponding findings highlight that all resonant couplings are lifted upon exposure to water vapor as a consequence of molecular binding processes. The importance of the involved molecular groups for water binding and conformational changes upon hydration is discussed. Our approach further shows that the underlying rapid process can be reversed by carbon dioxide saturated atmospheres. For the first time, we also confirm that the conformational state of ectoine in aqueous bulk solution coincides with crystalline ectoine in its dihydrate state, thereby highlighting the important role of a few bound water molecules.
Project description:Apoptotic cell death within the brain represents a significant contributing factor to impaired post-traumatic tissue function and poor clinical outcome after traumatic brain injury. Previous investigations have reported a reduction in apoptosis in various tissues after irradiation with light in the wavelength range of 600-1200 nm (photobiomodulation). This study investigates the effect of 660 nm photobiomodulation on organotypic slice cultured hippocampal tissue of rats, examining the effect on apoptotic cell loss. Raman spectroscopic changes in the tissue were also evaluated. A significantly higher proportion of apoptotic cells per region was observed in the control group than in the photobiomodulation group (62.8±12.2% vs 48.6±13.7%, P<0.0001). After photobiomodulation, Raman spectroscopic observations demonstrated a 1440/1660 cm⁻¹ spectral shift. Photobiomodulation has the potential for therapeutic utility, reducing cell loss to apoptosis in injured neurological tissue, as demonstrated in this in vitro model. A clear Raman spectroscopic signal was observed after apparent optimal irradiation, potentially integrable into therapeutic light delivery apparatus for real-time dose metering.
Project description:Deep learning offers a powerful approach for analyzing hippocampal changes in Alzheimer's disease (AD) without relying on handcrafted features. Nevertheless, an input format needs to be selected to pass the image information to the neural network, a choice that has wide ramifications for the analysis but has not yet been evaluated. We compare five hippocampal representations (and their respective tailored network architectures) that span from raw images to geometric representations like meshes and point clouds. We performed a thorough evaluation for AD diagnosis prediction and time-to-dementia prediction, with experiments on an independent test dataset. In addition, we evaluated the ease of interpretability for each representation-network pair. Our results show that choosing an appropriate representation of the hippocampus for predicting Alzheimer's disease with deep learning is crucial, since it impacts performance and ease of interpretation.
Project description:Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering and in elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ∼230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a Web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
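Saturation mutagenesis, as used above, means considering every single amino acid substitution at every position, which gives 19 variants per residue for the 20 standard amino acids. The sketch below only enumerates those variants for a toy sequence. The function name and the example sequence are hypothetical, and no stability prediction (the part RaSP performs) is attempted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

def single_substitutions(sequence):
    """Yield (position, wild_type, mutant) for every single substitution.

    Positions are 1-based, as is conventional for protein variants.
    """
    for index, wild_type in enumerate(sequence, start=1):
        if wild_type not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {wild_type!r} at position {index}")
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield index, wild_type, mutant

# Toy three-residue sequence (hypothetical): 3 positions x 19 substitutions.
variants = list(single_substitutions("MKT"))
```

At 19 substitutions per residue, the roughly 230 million stability changes reported above correspond to on the order of 12 million residue positions.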