Project description:Graph neural networks (GNNs) have proven to be effective in the prediction of chemical reaction yields. However, their performance tends to deteriorate when they are trained using an insufficient training dataset in terms of quantity or diversity. A promising solution to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of GNN pre-training in chemical reaction yield prediction. We present a novel GNN pre-training method for performance improvement. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce the dimensionality of these descriptors by applying principal component analysis. We define a pre-text task by assigning a vector of principal component scores as the pseudo-label to each molecule in the database. A GNN is then pre-trained to perform the pre-text task of predicting the pseudo-label for the input molecule. For chemical reaction yield prediction, a prediction model is initialized using the pre-trained GNN and then fine-tuned with the training dataset containing chemical reactions and their yields. We demonstrate the effectiveness of the proposed method through experimental evaluation on benchmark datasets.
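The pseudo-label construction described above (descriptors, then principal component analysis, then a score vector per molecule) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the descriptor matrix is random toy data, the function name `pca_pseudo_labels` is hypothetical, and standardizing the descriptors before PCA is an assumption the abstract does not state.

```python
import numpy as np

def pca_pseudo_labels(descriptors, n_components):
    """Return principal component scores for each row of `descriptors`.

    Hypothetical helper: rows are molecules, columns are molecular descriptors.
    """
    X = np.asarray(descriptors, dtype=float)
    # Standardize each descriptor column (an assumption, not stated in the abstract).
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant descriptors
    Z = (X - mean) / std
    # SVD of the standardized matrix: the rows of Vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    # Project every molecule onto the leading axes to get its score vector.
    return Z @ components.T

# Toy stand-in for a molecular database: 50 molecules, 8 descriptors each.
rng = np.random.default_rng(0)
toy_descriptors = rng.normal(size=(50, 8))
pseudo_labels = pca_pseudo_labels(toy_descriptors, n_components=3)
print(pseudo_labels.shape)
```

Each row of `pseudo_labels` would serve as the regression target for the corresponding molecule during pre-training; the GNN itself is outside the scope of this sketch.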
Project description:Owing to differences in biogenesis, miRNAs can be divided into canonical microRNAs and mirtrons. Compared with canonical microRNAs, mirtrons are less conserved and harder to identify. Besides stringent experiment-based annotation, many in silico computational methods have been developed to classify miRNAs. Although several machine learning classifiers have delivered high classification performance, all of these predictors depend heavily on the selection of calculated features. Here, we introduce nucleotide-level convolutional neural networks (CNNs) for pre-miRNA classification. Using "one-hot" encoding and padding, pre-miRNAs are converted into matrices of the same shape. The convolution and max-pooling operations can then automatically extract features from pre-miRNA sequences. Evaluation on the test dataset showed that our models performed satisfactorily. Our investigation showed that it is feasible to apply CNNs to extract features from biological sequences. Since many hyperparameters can be tuned in CNNs, we believe that the performance of nucleotide-level convolutional neural networks can be greatly improved in the future.
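The "one-hot" encoding and padding step can be illustrated with a short sketch. The alphabet order `ACGU`, the all-zero padding rows, the handling of unknown symbols, and the function name `one_hot_encode` are assumptions made for this example; the abstract does not specify them.

```python
NUCLEOTIDES = "ACGU"  # assumed column order for the one-hot matrix

def one_hot_encode(sequence, max_len):
    """Encode an RNA sequence as a max_len x 4 matrix of 0/1 values.

    Sequences shorter than max_len are padded with all-zero rows;
    longer ones are truncated (both choices are assumptions).
    """
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    matrix = [[0] * len(NUCLEOTIDES) for _ in range(max_len)]
    # Accept DNA-style input by mapping T to U.
    normalized = sequence.upper().replace("T", "U")[:max_len]
    for pos, nt in enumerate(normalized):
        if nt in index:  # unknown symbols such as N stay all-zero
            matrix[pos][index[nt]] = 1
    return matrix

encoded = one_hot_encode("ACGU", max_len=6)
for row in encoded:
    print(row)
```

Because every sequence is mapped to the same `max_len x 4` shape, a batch of pre-miRNAs can be stacked into a single tensor and fed to convolution and max-pooling layers.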
Project description:Machine learning approaches, including deep learning models, have shown promising performance in the automatic detection of Parkinson's disease. These approaches rely on different types of data, with voice recordings being the most used because data acquisition is convenient and non-invasive. Our group has developed a novel approach that uses a convolutional neural network with transfer learning to analyze spectrogram images of the sustained vowel /a/ to identify people with Parkinson's disease. We tested this approach by collecting a dataset of voice recordings via analog telephone lines, which support limited bandwidth. The convolutional neural network with transfer learning approach showed superior performance against conventional machine learning methods that collapse measurements across time to generate feature vectors. This study builds upon our prior results and presents two novel contributions. First, we tested the performance of our approach on a larger voice dataset recorded using smartphones with wide bandwidth. Our results show comparable performance between the two datasets generated using different recording platforms, despite the differences in the most important features resulting from the limited bandwidth of analog telephone lines. Second, we compared the classification performance achieved using linear-scale and mel-scale spectrogram images and showed a small but statistically significant gain using mel-scale spectrograms.
Project description:Machine learning approaches, including deep learning models, have shown promising performance in the automatic detection of Parkinson's disease. These approaches rely on different types of data, with voice recordings being the most used because data acquisition is convenient and non-invasive. Our group has developed a novel approach that uses a convolutional neural network with transfer learning to analyze spectrogram images of the sustained vowel /a/ to identify people with Parkinson's disease. We tested this approach by collecting a dataset of voice recordings via telephone lines, which have limited bandwidth. This study builds upon our prior results in two major ways. First, we tested the performance of our approach on a larger voice dataset recorded using smartphones with wide bandwidth. Our results show comparable performance between the two datasets generated using different recording platforms, and we report differences in the most important features resulting from the limited bandwidth of telephone lines. Second, we compared the classification performance achieved using linear-scale and mel-scale spectrogram images and showed a small but statistically significant gain using mel-scale spectrograms. The convolutional neural network with transfer learning approach showed superior performance against conventional machine learning methods that collapse measurements across time to generate feature vectors.
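To make the linear-scale versus mel-scale distinction concrete, the sketch below converts between hertz and mels and lists the band edges that a mel-spaced frequency axis would use. It assumes the common formula mel = 2595 · log10(1 + f/700); the abstract does not say which mel variant, frequency range, or number of bands was used, so the values here are illustrative only.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# Illustrative range and band count, not taken from the study.
edges = mel_band_edges(0.0, 4000.0, n_bands=10)
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(e, 1) for e in edges])
```

The band widths in `widths` grow with frequency, which is the defining property of the mel axis: low frequencies get finer resolution than they would on a linear-scale spectrogram.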
Project description:Constructing effective and scalable protection strategies against epidemic propagation is a challenging problem that has attracted interest in both theoretical and empirical studies. However, most recent developments are limited to simplified single-layer networks. Multiplex social networks are social networks with multiple layers, where the same set of nodes appears in different layers. Consequently, a single attack can trigger simultaneous propagation in all corresponding layers. Suppressing propagation in multiplex topologies is therefore more challenging, given that each layer also has a different structure. In this paper, we address the problem of suppressing epidemic propagation in multiplex social networks by allocating protection resources across different layers. Given a multiplex graph, such as a social network, and a budget of k protection resources, we aim to protect a set of nodes such that the percentage of surviving nodes at the end of the epidemic is maximized. We propose MultiplexShield, which uses graph spectral properties, degree centrality, and layer-wise stochastic propagation rates to pre-emptively select k nodes for protection. We comprehensively evaluate our proposal under two approaches: multiplex-based and layer-based node protection schemes. Two common kinds of attack are also evaluated: random and targeted. Experimental results on real-world datasets show the effectiveness of our proposal.
Project description:Background: Because recent studies indicate that the deregulation of microRNAs (miRNAs) in T cells contributes to increased severity of rheumatoid arthritis, we hypothesized that deregulated miRNAs may interact with key mRNA targets controlling the function or differentiation of these cells in this disease. Methodology/principal findings: To test our hypothesis, we used microarrays to survey, for the first time, the expression of all known mouse miRNAs in parallel with genome-wide mRNAs in thymocytes and naïve and activated peripheral CD3(+) T cells from two mouse strains: the DBA-1/J strain (MHC-H2q), which is susceptible to collagen-induced arthritis (CIA), and the DBA-2/J strain (MHC-H2d), which is resistant. Hierarchical clustering of the data showed several T cell miRNAs and mRNAs differentially expressed between the mouse strains at different stages of immunization with collagen. Bayesian statistics using the GenMir(++) algorithm allowed reconstruction of post-transcriptional miRNA-mRNA interaction networks for target prediction. We revealed the participation of miR-500, miR-202-3p and miR-30b*, which established interactions with at least one of the following mRNAs: Rorc, Fas, Fasl, Il-10 and Foxo3. Among the interactions that were validated by calculating the minimal free energy of base pairing between the miRNA and the 3'UTR of the mRNA target and by luciferase assay, we highlight the interaction of miR-30b* with Rorc mRNA, because this mRNA encodes a protein implicated in pro-inflammatory Th17 cell differentiation (Rorγt). FACS analysis revealed that Rorγt protein levels and Th17 cell counts were comparatively reduced in the DBA-2/J strain. Conclusions/significance: This result showed that the miRNAs and mRNAs identified in this study represent new candidates regulating T cell function and controlling susceptibility and resistance to CIA.
Project description:Due to the cost and complexity of biological experiments, many computational methods have been proposed to predict potential miRNA-disease associations by utilizing known miRNA-disease associations and other related information. However, these computational methods face some challenges. First, the relationships between miRNAs and diseases are complex, so a computational method should consider both the local and the global influence of neighborhoods in the network. Second, predicting miRNAs related to diseases that have no known associations is also very important. This study presents a new computational method that constructs a heterogeneous network composed of a miRNA similarity network, a disease similarity network, and a known miRNA-disease association network. The miRNA similarity considers the miRNAs and their possible families and clusters. The information of each node in the heterogeneous network is obtained by aggregating neighborhood information with graph convolutional networks (GCNs), which can pass the information of a node to its intermediate and distant neighbors. Disease-related miRNAs with no known associations can be predicted with the reconstructed heterogeneous matrix. We apply 5-fold cross-validation, leave-one-disease-out cross-validation, and global and local leave-one-out cross-validation to evaluate our method. The corresponding areas under the curves (AUCs) are 0.9616, 0.9946, 0.9656, and 0.9532, confirming that our approach significantly outperforms the state-of-the-art methods. Case studies show that this approach can effectively predict miRNAs for new diseases without any known associated miRNAs.
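The way stacked graph convolutions pass a node's information to more distant neighbors can be seen on a toy graph. The sketch below uses the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); whether the study uses exactly this layer form is an assumption, and the three-node path graph, features, and weights are made up for illustration.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph convolution: symmetric-normalized aggregation, linear map, ReLU."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^(-1/2) (A + I) D^(-1/2).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Toy path graph 0 - 1 - 2; only node 0 carries a signal initially.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
w = [[1.0]]  # identity weight, to isolate the effect of aggregation

h1 = gcn_layer(adj, x, w)   # one hop: node 2 has received nothing yet
h2 = gcn_layer(adj, h1, w)  # two hops: node 0's signal reaches node 2
print(h1, h2)
```

After one layer node 2 is still zero, and after two layers it holds a nonzero value, which is the sense in which deeper GCNs reach intermediate and distant neighbors.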
Project description:Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.
Project description:EphB2 and EphA2 control stemness and differentiation in the intestinal mucosa, but the way they cooperate with the complex mechanisms underlying tumor heterogeneity and how they affect the therapeutic outcome in colorectal cancer (CRC) patients, remain unclear. MicroRNA (miRNA) expression profiling along with pathway analysis provide comprehensive information on the dysregulation of multiple crucial pathways in CRC. Through a network-based approach founded on the characterization of progressive miRNAomes centered on EphA2/EphB2 signaling during tumor development in the AOM/DSS murine model, we found a miRNA-dependent orchestration of EphB2-specific stem-like properties in earlier phases of colorectal tumorigenesis and the EphA2-specific control of tumor progression in the latest CRC phases. Furthermore, two transcriptional signatures that are specifically dependent on the EphA2/EphB2 signaling pathways were identified, namely EphA2, miR-423-5p, CREB1, ADAMTS14, and EphB2, miR-31-5p, miR-31-3p, CRK, CXCL12, ARPC5, SRC. EphA2- and EphB2-related signatures were validated for their expression and clinical value in 1663 CRC patients. In multivariate analysis, both signatures were predictive of survival and tumor progression. The early dysregulation of miRs-31, as observed in the murine samples, was also confirmed on 49 human tissue samples including preneoplastic lesions and tumors. In light of these findings, miRs-31 emerged as novel potential drivers of CRC initiation. Our study evidenced a miRNA-dependent orchestration of EphB2 stem-related networks at the onset and EphA2-related cancer-progression networks in advanced stages of CRC evolution, suggesting new predictive biomarkers and potential therapeutic targets.
Project description:Network measures have proven very successful in identifying structural patterns in complex systems (e.g., a living cell, a neural network, the Internet). How such measures can be applied to understand the rational and experimental design of chemical reaction networks (CRNs) is unknown. Here, we develop a procedure to model CRNs as a mathematical graph on which network measures and a random graph analysis can be applied. We used an enzymatic CRN (for which a mass-action model was previously developed) to show that the procedure provides insights into its network structure and properties. Temporal analyses, in particular, revealed when feedback interactions emerge in such a network, indicating that CRNs comprise various reactions that are being added and removed over time. We envision that the procedure, including the temporal network analysis method, could be broadly applied in chemistry to characterize the network properties of many other CRNs, promising data-driven analysis of future molecular systems of ever greater complexity.
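One simple way to represent a reaction network as a graph and check for feedback is sketched below. The construction (a directed edge from every reactant to every product of a reaction) and the species names are assumptions for illustration; the abstract does not describe the authors' exact graph model or network measures.

```python
def species_graph(reactions):
    """Build a directed species graph: an edge reactant -> product per reaction.

    `reactions` is a list of (reactants, products) tuples of species names.
    """
    graph = {}
    for reactants, products in reactions:
        for species in list(reactants) + list(products):
            graph.setdefault(species, set())
        for r in reactants:
            for p in products:
                graph[r].add(p)
    return graph

def has_feedback(graph):
    """Return True if the directed graph contains a cycle (depth-first search)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: a cycle closes here
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

# Hypothetical network at an early time point: a linear cascade A -> B -> C.
early = [(("A",), ("B",)), (("B",), ("C",))]
# Later, a reaction is added in which C produces A, closing a loop.
late = early + [(("C",), ("A",))]

print(has_feedback(species_graph(early)), has_feedback(species_graph(late)))
```

Rebuilding the graph at successive time points and repeating the check is one way to see when a feedback loop first appears as reactions are added or removed.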