Project description:Motivation: Scaling by sequencing depth is usually the first step in the analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, which puts the validity of downstream analysis at risk. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results: We call an analysis method 'scale-invariant' (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network-based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. Availability and implementation: The source code of SINC is available at https://www.nd.edu/∼jli9/SINC.zip. Supplementary information: Supplementary data are available at Bioinformatics online.
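The definition of scale invariance above can be checked numerically. The following is a minimal sketch, not the SINC architecture: it assumes strictly positive counts and uses a centered log-ratio transform purely as a familiar example of a scale-invariant function. The names `centered_log_ratio` and `is_scale_invariant`, and the count values, are hypothetical.

```python
import math

def centered_log_ratio(counts):
    """Log of each count minus the mean log count (assumes all counts > 0)."""
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def plain_log(counts):
    """Plain log transform, which is NOT scale-invariant."""
    return [math.log(c) for c in counts]

def is_scale_invariant(fn, counts, factors=(0.5, 2.0, 10.0), tol=1e-9):
    """Return True if fn gives the same output when counts are rescaled."""
    base = fn(counts)
    for s in factors:
        scaled = fn([c * s for c in counts])
        if any(abs(a - b) > tol for a, b in zip(base, scaled)):
            return False
    return True

# Made-up counts for one sample; each factor mimics a different depth estimate.
counts = [12, 30, 7, 51]
print(is_scale_invariant(centered_log_ratio, counts))
print(is_scale_invariant(plain_log, counts))
```

Multiplying every count by a factor s adds log(s) to each log count and to their mean, so the difference is unchanged. The plain log transform shifts by log(s) and therefore fails the check.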
Project description:Changes of glycosylation pattern in serum proteins have been linked to various diseases including cancer, suggesting possible development of novel biomarkers based on the glycomic analysis. In this study, N-linked glycans from human serum were quantitatively profiled by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) and compared between healthy controls and ovarian cancer patients. A training set consisting of 40 healthy controls and 40 ovarian cancer cases demonstrated an inverse correlation between P value of ANOVA and area under the curve (AUC) of each candidate biomarker peak from MALDI-TOF MS, providing standards for the classification. A multibiomarker panel composed of 15 MALDI-TOF MS peaks resulted in AUC of 0.89, 80~90% sensitivity, and 70~83% specificity in the training set. The performance of the biomarker panel was validated in a separate blind test set composed of 23 healthy controls and 37 ovarian cancer patients, leading to 81~84% sensitivity and 83% specificity with cut-off values determined by the training set. Sensitivity of CA-125, the most widely used ovarian cancer marker, was 74% in the training set and 78% in the test set, respectively. These results indicate that MALDI-TOF MS-mediated serum N-glycan analysis could provide critical information for the screening of ovarian cancer.
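The evaluation scheme described above (cut-off fixed on the training set, then applied unchanged to the blind test set) can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say how the cut-off values were chosen, so maximizing Youden's J is assumed here, and all scores and labels are made-up toy values rather than study data.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """labels: 1 = case, 0 = control; a score >= cutoff is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def pick_cutoff(scores, labels):
    """Assumed rule: pick the cutoff maximizing sensitivity + specificity."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(scores, labels, c)))

# Toy panel scores (hypothetical), not data from the study.
train_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
test_scores = [0.15, 0.3, 0.5, 0.45, 0.85, 0.25]
test_labels = [0, 0, 0, 1, 1, 1]

cutoff = pick_cutoff(train_scores, train_labels)  # chosen on training data only
sens, spec = sensitivity_specificity(test_scores, test_labels, cutoff)
print(f"cutoff={cutoff}, test sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Choosing the cut-off only from the training data keeps the test-set sensitivity and specificity an honest estimate of performance on unseen samples.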
Project description:Background: Exploring the cellular processes of genes through biological networks is of great interest for understanding the properties of complex diseases and biological systems. Biological networks, such as protein-protein interaction networks and gene regulatory networks, provide insights into the molecular basis of cellular processes and often form functional clusters in different tissue and disease contexts. Results: We present scGraph2Vec, a deep learning framework for generating informative gene embeddings. scGraph2Vec extends the variational graph autoencoder framework and integrates single-cell datasets and gene-gene interaction networks. We demonstrate that the gene embeddings are biologically interpretable and enable the identification of gene clusters representing functional or tissue-specific cellular processes. In comparisons with similar tools, scGraph2Vec clearly distinguished different gene clusters and aggregated more biologically functional genes. scGraph2Vec can be widely applied in diverse biological contexts. We illustrate that the embeddings generated by scGraph2Vec can infer disease-associated genes from genome-wide association study data (e.g., COVID-19 and Alzheimer's disease), identify additional driver genes in lung adenocarcinoma, and reveal regulatory genes responsible for maintaining or transitioning melanoma cell states. Conclusions: scGraph2Vec not only reconstructs tissue-specific gene networks but also obtains a latent representation of genes that reflects their biological functions.
Project description:Background: Automatic cell type identification has become an urgent task given the rapid development of single-cell RNA-seq techniques. Generally, the current approach to cell type identification is to generate cell clusters by unsupervised clustering and later assign labels to each cell cluster by manual annotation. Methods: Here, we introduce LIDER (celL embeddIng based Deep nEural netwoRk classifier), a deep supervised learning method that combines cell embedding and a deep neural network classifier for automatic cell type identification. Based on a stacked denoising autoencoder with a tailored and reconstructed loss function, LIDER identifies cell embeddings and predicts cell types with a deep neural network classifier. LIDER builds on a stacked denoising autoencoder to learn encoder-decoder structures for identifying cell embeddings. Results: LIDER accurately identifies cell types by using a stacked denoising autoencoder. Benchmarked against state-of-the-art methods across eight types of single-cell data, LIDER achieves comparable or even superior performance. Moreover, LIDER shows comparable robustness to batch effects. Our results show the potential of deep supervised learning for automatic cell type identification of single-cell RNA-seq data. The LIDER code is available at https://github.com/ShiMGLab/LIDER.
Project description:The established model-free methods for the processing of two-electron dipolar spectroscopy data [DEER (double electron-electron resonance), PELDOR (pulsed electron double resonance), DQ-EPR (double-quantum electron paramagnetic resonance), RIDME (relaxation-induced dipolar modulation enhancement), etc.] use regularized fitting. In this communication, we describe an attempt to process DEER data using artificial neural networks trained on large databases of simulated data. Accuracy and reliability of neural network outputs from real experimental data were found to be unexpectedly high. The networks are also able to reject exchange interactions and to return a measure of uncertainty in the resulting distance distributions. This paper describes the design of the training databases, discusses the training process, and rationalizes the observed performance. Neural networks produced in this work are incorporated as options into Spinach and DeerAnalysis packages.
Project description:Environmental and metabolic processes shape the profile of glycoprotein glycans expressed by cells, whether in culture, developing tissues, or mature organisms. Quantitative characterization of glycomic changes associated with these conditions has been achieved historically by reductive coupling of oligosaccharides to various fluorophores following release from glycoprotein and subsequent HPLC or capillary electrophoretic separation. Such labeling-based approaches provide a robust means of quantifying glycan amount based on fluorescence yield. Mass spectrometry, on the other hand, has generally been limited to relative quantification in which the contribution of the signal intensity for an individual glycan is expressed as a percent of the signal intensity summed over the total profile. Relative quantification has been valuable for highlighting changes in glycan expression between samples; sensitivity is high, and structural information can be derived by fragmentation. We have investigated whether MS-based glycomics is amenable to absolute quantification by referencing signal intensities to well-characterized oligosaccharide standards. We report the qualification of a set of N-linked oligosaccharide standards by NMR, HPLC, and MS. We also demonstrate the dynamic range, sensitivity, and recovery from complex biological matrices for these standards in their permethylated form. Our results indicate that absolute quantification for MS-based glycomic analysis is reproducible and robust utilizing currently available glycan standards.
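The difference between relative and absolute quantification described above can be made concrete with a small sketch. It assumes a single spiked standard of known amount and an equal molar response for all glycans. Both are simplifying assumptions for illustration, not the calibration procedure of the study, and the glycan names and intensities are made up.

```python
def relative_abundance(intensities):
    """Each glycan's signal as a percent of the summed profile intensity."""
    total = sum(intensities.values())
    return {name: 100.0 * value / total for name, value in intensities.items()}

def absolute_amount(intensities, standard_intensity, standard_amount_pmol):
    """Single-point calibration against a standard of known amount.

    Assumes every glycan gives the same signal per mole as the standard.
    """
    return {name: standard_amount_pmol * value / standard_intensity
            for name, value in intensities.items()}

# Hypothetical signal intensities (arbitrary units) for three glycans.
profile = {"glycan_A": 500.0, "glycan_B": 1500.0, "glycan_C": 2000.0}

print(relative_abundance(profile))
print(absolute_amount(profile, standard_intensity=1000.0,
                      standard_amount_pmol=10.0))
```

Relative abundances always sum to 100%, so they cannot show whether total glycan content changed between samples. Referencing a standard of known amount removes that limitation.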
Project description:Neutrophils are the most abundant white blood cells in humans and play a vital role in several aspects of the immune response. Numerous reports have implicated neutrophil glycosylation as an important factor in mediating these interactions. We report here the application of high-sensitivity glycomics methodologies, including matrix-assisted laser desorption ionisation time-of-flight (MALDI-TOF) and MALDI-TOF/TOF analyses, to the structural analysis of N- and O-linked carbohydrates released from two samples of neutrophils, prepared by two separate and geographically remote laboratories. The data demonstrate that the cells display a diverse range of sialylated and fucosylated complex glycans, with a high level of similarity between the two preparations.
Project description:Network traffic must be monitored and analyzed for abnormal activity in order to detect intrusions and notify administrators of attacks. A novel ensemble deep learning technique is proposed to enhance the efficiency of packet flow classification in a Network Intrusion Detection System (NIDS). The proposed work consists of three phases: (i) a Feature Augmented Convolutional Neural Network (FA-CNN), (ii) a Deep Autoencoder, and (iii) an ensemble of the FA-CNN and the Deep Autoencoder. In the FA-CNN, a CNN is trained with augmented features selected using Mutual Information. The FA-CNN is then combined with the Deep Autoencoder to form the ensemble classifier. To assess the ensemble model, numerous experiments are conducted on benchmark datasets such as NSL-KDD and CICIDS2017. The results are compared with recent methodologies to assess the performance of the proposed work. The results indicate that the proposed work performs better than existing works, with an overall accuracy of 97% on the NSL-KDD dataset and 95% on the CICIDS2017 dataset. The proposed method also improved the detection rate of minority attack classes such as U2R in NSL-KDD and Heartbleed in CICIDS2017.
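Feature selection by Mutual Information, as mentioned above, can be sketched for discrete features as follows. This is a minimal illustration, not the FA-CNN pipeline: the abstract does not specify the estimator or how the selected features are augmented, and the feature names and values here are hypothetical toy data.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def top_k_features(columns, labels, k):
    """Rank feature columns by mutual information with the labels."""
    ranked = sorted(columns,
                    key=lambda name: mutual_information(columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Toy flows: label 1 = attack, 0 = normal (made-up values).
labels = [0, 0, 1, 1]
columns = {
    "flag": [0, 0, 1, 1],     # matches the label exactly
    "service": [0, 0, 0, 1],  # partially informative
    "protocol": [0, 1, 0, 1], # independent of the label
}
print(top_k_features(columns, labels, k=2))
```

A feature that is statistically independent of the label scores zero, so it is ranked last and dropped when only the top-k features are kept.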
Project description:Pure shift NMR spectroscopy enables robust probing of molecular structure and dynamics, benefiting from great resolution enhancements. Despite its broad range of applications in various branches of chemistry, the long experimental times induced by the additional time dimension generally hinder its further development and practical deployment, especially for multi-dimensional pure shift NMR. Herein, this study proposes and implements fast, reliable, and robust reconstruction for accelerated pure shift NMR spectroscopy with a lightweight attention-assisted deep neural network. This deep learning protocol allows one to regain high-resolution signals, suppress undersampling artifacts, and obtain high-fidelity signal intensities from accelerated pure shift acquisition, benefiting from the introduction of the attention mechanism to highlight the spectral features and information of interest. Extensive results on simulated and experimental NMR data demonstrate that this attention-assisted deep learning protocol enables the effective recovery of weak signals that are almost drowned in serious undersampling artifacts, and the distinction and recognition of close chemical shifts even when using merely 5.4% of the data, highlighting its great potential for fast pure shift NMR spectroscopy. As a result, this study affords a promising paradigm for AI-assisted NMR protocols toward broader applications in chemistry, biology, materials, and life sciences, among others.
Project description:Concussion is a global health concern. Despite its high prevalence, a sound understanding of the mechanisms underlying this type of diffuse brain injury remains elusive. It is, however, well established that concussions cause significant functional deficits; that children and youths are disproportionately affected and have longer recovery times than adults; and that individuals suffering from a concussion are more prone to experience additional concussions, with each successive injury increasing the risk of long-term neurological and mental health complications. Currently, the most significant challenge in concussion management is the lack of objective, clinically accepted, brain-based approaches for determining whether an athlete has suffered a concussion. Here, we report on our efforts to address this challenge. Specifically, we introduce a deep learning long short-term memory (LSTM)-based recurrent neural network that is able to distinguish between non-concussed and acute post-concussed adolescent athletes using only short (i.e. 90 s long) samples of resting state EEG data as input. The athletes were neither required to perform a specific task nor expected to respond to a stimulus during data collection. The acquired EEG data were neither filtered, cleaned of artefacts, nor subjected to explicit feature extraction. The LSTM network was trained and validated using data from 27 male adolescent athletes with sports-related concussion, benchmarked against 35 non-concussed adolescent athletes. During rigorous testing, the classifier consistently identified concussions with an accuracy of > 90% and achieved an ensemble median Area Under the Receiver Operating Characteristic Curve (ROC/AUC) equal to 0.971. This is the first instance of a high-performing classifier that relies only on easy-to-acquire resting state, raw EEG data.
Our concussion classifier represents a promising first step towards the development of an easy-to-use, objective, brain-based, automatic classification of concussion at an individual level.
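The "ensemble median" ROC AUC reported above can be illustrated with a short sketch. It computes the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, then takes the median over several models. The three score lists are made-up toy outputs, not results from the study, and how the study's ensemble members were formed is not described in the abstract.

```python
from statistics import median

def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy labels: 1 = concussed, 0 = non-concussed (hypothetical).
labels = [1, 1, 1, 0, 0, 0]
ensemble_scores = [
    [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    [0.6, 0.8, 0.3, 0.5, 0.2, 0.4],
]

aucs = [roc_auc(scores, labels) for scores in ensemble_scores]
print(f"per-model AUC: {[round(a, 3) for a in aucs]}")
print(f"ensemble median AUC: {median(aucs):.3f}")
```

Reporting the median rather than the best single value makes the summary less sensitive to one unusually good or bad model in the ensemble.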