Project description:Liquid biopsies that measure circulating cell-free RNA (cfRNA) offer an opportunity to study the development of pregnancy-related complications in a non-invasive manner and to bridge gaps in clinical care [1-4]. Here we used 404 blood samples from 199 pregnant mothers to identify and validate cfRNA transcriptomic changes that are associated with preeclampsia, a multi-organ syndrome that is the second largest cause of maternal death globally [5]. We find that changes in cfRNA gene expression between normotensive and preeclamptic mothers are marked and stable early in gestation, well before the onset of symptoms. These changes are enriched for genes specific to neuromuscular, endothelial and immune cell types and tissues that reflect key aspects of preeclampsia physiology [6-9], suggest new hypotheses for disease progression and correlate with maternal organ health. This enabled the identification and independent validation of a panel of 18 genes that when measured between 5 and 16 weeks of gestation can form the basis of a liquid biopsy test that would identify mothers at risk of preeclampsia long before clinical symptoms manifest themselves. Tests based on these observations could help predict and manage who is at risk for preeclampsia, an important objective for obstetric care [10,11].
Project description:Introduction: CircRNA-protein binding plays a critical role in complex biological activity and disease. Various deep learning-based algorithms have been proposed to identify CircRNA-protein binding sites. These methods predict at the sequence level whether a CircRNA sequence includes protein binding sites, and primarily concentrate on analysing the sequence specificity of CircRNA-protein binding. However, they perform unsatisfactorily when it comes to accurately predicting motif sites that have special functions in gene expression. Methods: In this study, building on deep learning models that implement pixel-level binary classification in computer vision, we treated CircRNA-protein binding site prediction as a nucleotide-level binary classification task and used a fully convolutional neural network to identify CircRNA-protein binding motif sites (CPBFCN). Results: CPBFCN provides a new path to predict CircRNA motifs. Using the MEME tool and existing CircRNA-related and protein-related databases, we analysed the functions of the motifs discovered by CPBFCN. We also investigated the correlation between CircRNA sponge and motif distribution. Furthermore, by comparing the motif distribution with different input sequence lengths, we found that some motifs in the flanking sequences of the CircRNA-protein binding region may contribute to CircRNA-protein binding. Conclusion: This study contributes to identifying circRNA-protein binding and helps in understanding the role of circRNA-protein binding in gene expression regulation.
Project description:DNA methylation is critical to the regulation of gene expression because it affects the stability of DNA and changes the structure of chromosomes. Identifying DNA methylation modification sites lays a solid basis for gaining more insight into their biological functions. Existing machine learning-based methods of predicting DNA methylation have not fully exploited the hidden multidimensional information in DNA gene sequences, such that the prediction accuracy of models is significantly limited. Besides, most models have been built for a single methylation type. To address these issues, a deep learning-based method was proposed in this study for DNA methylation site prediction, termed the MEDCNN model. The MEDCNN model is capable of extracting feature information from gene sequences in three dimensions (i.e., positional information, biological information, and chemical information). Moreover, the proposed method employs a convolutional neural network model with double convolutional layers and double fully connected layers, trained iteratively by gradient descent on the cross-entropy loss function to increase the prediction accuracy of the model. Besides, the MEDCNN model can predict different types of DNA methylation sites. As indicated by the experimental results, the deep learning method based on coding from multiple dimensions outperformed single coding methods, and the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation between different species. As revealed by these findings, the MEDCNN model can be effective in predicting DNA methylation sites.
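The sequence-encoding step described in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the MEDCNN encodings, so this uses a generic one-hot encoding of the four DNA bases as an assumption; the helper name one_hot_encode is hypothetical.

```python
# Hypothetical helper, not taken from the MEDCNN paper: a generic one-hot
# encoding of a DNA sequence, one 4-element row per base (order A, C, G, T).
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a list of 4-element rows; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = NUCLEOTIDES.find(base)
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A matrix of this shape (sequence length by 4) is the usual input to a one-dimensional convolutional layer; further encodings would add columns to each row.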
Project description:Segmentation of the optic disc (OD) and the optic cup (OC) is a key step in fundus medical image analysis. Previously, FCN-based methods have been proposed for medical image segmentation tasks. However, the consecutive convolution and pooling operations usually hinder dense prediction tasks which require detailed spatial information, such as image segmentation. In this paper, we propose a network called Recurrent Fully Convolution Network (RFC-Net) for automatic joint segmentation of the OD and the OC, which can capture more high-level information and subtle edge information. The RFC-Net can minimize the loss of spatial information. It is mainly composed of a multi-scale input layer, a recurrent fully convolutional network, a multiple output layer and a polar transformation. In RFC-Net, the multi-scale input layer constructs an image pyramid. We propose four recurrent units, which are respectively applied to RFC-Net. The recurrent convolution layer effectively ensures feature representation for OD and OC segmentation tasks through feature accumulation. For each multiple output image, the multiple output cross entropy loss function is applied. To better balance the cup ratio of the segmented image, the polar transformation is used to transform the fundus image from the Cartesian coordinate system to the polar coordinate system. We evaluate the effectiveness and generalization of the proposed method on the DRISHTI-GS1 dataset. Compared with the original FCN method and other state-of-the-art methods, the proposed method achieves better segmentation performance.
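The Cartesian-to-polar step mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RFC-Net implementation: the function name to_polar, the nearest-neighbour sampling and the grid sizes are all hypothetical choices.

```python
import math


def to_polar(image, center, max_radius, n_radii, n_angles):
    """Resample a 2-D list-of-lists image onto a (radius, angle) grid.

    Output row i corresponds to radius max_radius * i / n_radii and output
    column j to angle 2*pi*j / n_angles. Nearest-neighbour sampling is used,
    and samples that fall outside the image are left at 0.
    """
    cy, cx = center
    height, width = len(image), len(image[0])
    polar = [[0] * n_angles for _ in range(n_radii)]
    for i in range(n_radii):
        radius = max_radius * i / n_radii
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            y = int(round(cy + radius * math.sin(theta)))
            x = int(round(cx + radius * math.cos(theta)))
            if 0 <= y < height and 0 <= x < width:
                polar[i][j] = image[y][x]
    return polar


# Toy 5x5 "image": a bright centre pixel and one bright pixel to its right.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 9
image[2][4] = 5
polar = to_polar(image, center=(2, 2), max_radius=4, n_radii=4, n_angles=4)
```

In the polar image, a roughly circular structure centred on the chosen point becomes a horizontal band, which is one reason such a transform can change the proportion of the image that a small central region occupies.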
Project description:RNA editing exerts critical impacts on numerous biological processes. While millions of RNA editing sites have been identified in humans, many more are expected to be discovered. In this work, we constructed Convolutional Neural Network (CNN) models to predict human RNA editing events in both Alu regions and non-Alu regions. With a validation dataset resulting from CRISPR/Cas9 knockout of the ADAR1 enzyme, the validation accuracies reached 99.5% and 93.6% for Alu and non-Alu regions, respectively. We deployed our CNN models in a web service named EditPredict. EditPredict not only works on reference genome sequences but can also take into consideration single nucleotide variants in personal genomes. In addition to the human genome, EditPredict handles other model organisms, including the bumblebee, fruit fly, mouse, and squid genomes. EditPredict can be used stand-alone to predict novel RNA editing, and it can be used to assist in filtering candidate RNA editing sites detected from RNA-Seq data.
Project description:Preeclampsia (PE) is a hypertensive complication affecting 8-10% of US pregnancies annually. While there is no cure for PE, aspirin may reduce complications for those at high risk for PE. Furthermore, PE disproportionately affects racial minorities, with a higher burden of morbidity and mortality. Previous studies have shown early prediction of PE would allow for prevention. We approached the prediction of PE using a new method based on a cost-sensitive deep neural network (CSDNN) by considering the severe imbalance and sparse nature of the data, as well as racial disparities. We validated our model using large extant rich data sources that represent a diverse cohort of minority populations in the US. These include Texas Public Use Data Files (PUDF), Oklahoma PUDF, and the Magee Obstetric Medical and Infant (MOMI) databases. We identified the most influential clinical and demographic features (predictor variables) relevant to PE for both general populations and smaller racial groups. We also investigated the effectiveness of multiple network architectures using three hyperparameter optimization algorithms: Bayesian optimization, Hyperband, and random search. Our proposed models equipped with focal loss function yield superior and reliable prediction performance compared with the state-of-the-art techniques with an average area under the curve (AUC) of 66.3% and 63.5% for the Texas and Oklahoma PUDF respectively, while the CSDNN model with weighted cross-entropy loss function outperforms with an AUC of 76.5% for the MOMI data. Furthermore, our CSDNN model equipped with focal loss function leads to an AUC of 66.7% for Texas African American and 57.1% for Native American. 
The best results are obtained with 62.3% AUC with CSDNN with weighted cross-entropy loss function for Oklahoma African American, 58% AUC with DNN and balanced batch for Oklahoma Native American, and 72.4% AUC using either CSDNN with weighted cross-entropy loss function or CSDNN with focal loss with balanced batch method for MOMI African American dataset. Our results provide the first evidence of the predictive power of clinical databases for PE prediction among minority populations.
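The two cost-sensitive losses named in the abstract above can be written down for a single binary prediction. This is a sketch under stated assumptions, not the authors' CSDNN code: the function names are hypothetical, and the default gamma, alpha and class weights are illustrative values rather than settings reported in the abstract.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def weighted_cross_entropy(p, y, pos_weight=1.0, neg_weight=1.0):
    """Cross-entropy for predicted probability p and label y (0 or 1),
    with a separate weight for each class."""
    p = min(max(p, EPS), 1.0 - EPS)
    if y == 1:
        return -pos_weight * math.log(p)
    return -neg_weight * math.log(1.0 - p)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t),
    where p_t is the probability assigned to the true class."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a badly missed one.
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
# Up-weighting the rare positive class in the cross-entropy.
print(weighted_cross_entropy(0.1, 1, pos_weight=10.0))
```

With gamma set to 0 the focal loss reduces to an alpha-weighted cross-entropy; larger gamma values shrink the contribution of examples the model already classifies well, which is why the loss is often used on imbalanced data.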
Project description:The reaction-diffusion system is naturally used in chemistry to represent substances reacting and diffusing over the spatial domain. Its solution illustrates the underlying process of a chemical reaction and displays diverse spatial patterns of the substances. Numerical methods like the finite element method (FEM) are widely used to derive the approximate solution for the reaction-diffusion system. However, these methods require long computation times and large computational resources when the system becomes complex. In this paper, we study the physics of a two-dimensional one-component reaction-diffusion system by using machine learning. An encoder-decoder based convolutional neural network (CNN) is designed and trained to directly predict the concentration distribution, bypassing the expensive FEM calculation process. Different simulation parameters, boundary conditions, geometry configurations and time are considered as the input features of the proposed learning model. In particular, the trained CNN model manages to learn the time-dependent behaviour of the reaction-diffusion system through the input time feature. Thus, the model is capable of predicting the concentration at a given time directly, with high test accuracy (mean relative error <3.04%) and 300 times faster than the traditional FEM. Our CNN-based learning model provides a rapid and accurate tool for predicting the concentration distribution of the reaction-diffusion system.
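For readers unfamiliar with the kind of system being solved in the abstract above, the sketch below advances a one-component, two-dimensional reaction-diffusion field by one time step. It deliberately swaps in an explicit finite-difference scheme instead of the FEM used in the study, and it assumes a linear decay reaction term and fixed boundary values, none of which are specified in the abstract; the function name step is hypothetical.

```python
def step(u, diffusion, rate, dx, dt):
    """One explicit Euler step of du/dt = D * laplacian(u) - k * u.

    u is a list of lists; boundary cells are held fixed. The scheme is
    stable only for dt <= dx**2 / (4 * diffusion).
    """
    n_rows, n_cols = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            # Five-point stencil approximation of the Laplacian.
            laplacian = (
                u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            ) / (dx * dx)
            new[i][j] = u[i][j] + dt * (diffusion * laplacian - rate * u[i][j])
    return new


# A single unit of concentration in the middle of a 5x5 grid spreads outward.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
grid = step(grid, diffusion=1.0, rate=0.0, dx=1.0, dt=0.1)
```

A solver like this has to be stepped repeatedly to reach a target time, whereas the abstract describes a network that takes time as an input feature and predicts the concentration field at that time directly.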
Project description:RNA modification is a post-transcriptional modification that occurs in all organisms, plays a crucial role throughout the RNA life cycle, and is closely related to many life processes. As one of the newly discovered modifications, N1-methyladenosine (m1A) plays an important role in gene expression regulation and is closely related to the occurrence and development of diseases. However, due to the low abundance of m1A, verifying the associations between m1As and diseases through wet experiments requires a great quantity of manpower and resources. In this study, we proposed a computational method for predicting the associations of RNA methylation and disease based on a graph convolutional network (RMDGCN) with an attention mechanism. We build an adjacency matrix from the collected m1A-disease associations, and use positive-unlabeled learning to increase the number of positive samples. By extracting the features of m1As and diseases, a heterogeneous network is constructed, and a GCN with an attention mechanism is adopted to predict the associations between m1As and diseases. The experimental results indicate that under 5-fold cross-validation, RMDGCN is superior to other methods (AUC = 0.9892 and AUPR = 0.8682). In addition, case studies indicate that RMDGCN can predict the relationships between unknown m1As and diseases. In summary, RMDGCN is an effective method for predicting the associations between m1As and diseases.
Project description:In early gastric cancer (EGC), tumor invasion depth is an important factor for determining the treatment method. However, endoscopic ultrasonography has limitations when measuring the exact depth in a clinical setting, and endoscopists often depend on gross findings and personal experience. The present study aimed to develop a model optimized for EGC detection and depth prediction, and we investigated factors affecting artificial intelligence (AI) diagnosis. We employed a visual geometry group (VGG)-16 model for the classification of endoscopic images as EGC (T1a or T1b) or non-EGC. To induce the model to activate EGC regions during training, we proposed a novel loss function that simultaneously measured classification and localization errors. We experimented with 11,539 endoscopic images (896 T1a-EGC, 809 T1b-EGC, and 9834 non-EGC). The areas under the receiver operating characteristic curves for EGC detection and depth prediction were 0.981 and 0.851, respectively. Among the factors affecting AI prediction of tumor depth, only histologic differentiation was significantly associated, where undifferentiated-type histology exhibited a lower AI accuracy. Thus, the lesion-based model is an appropriate training method for AI in EGC. However, further improvements and validation are required, especially for undifferentiated-type histology.
Project description:Deep convolutional neural networks have been successfully applied to many image-processing problems in recent works. Popular network architectures often add additional operations and connections to the standard architecture to enable training deeper networks. To achieve accurate results in practice, a large number of trainable parameters are often required. Here, we introduce a network architecture based on using dilated convolutions to capture features at different image scales and densely connecting all feature maps with each other. The resulting architecture is able to achieve accurate results with relatively few parameters and consists of a single set of operations, making it easier to implement, train, and apply in practice, and automatically adapts to different problems. We compare results of the proposed network architecture with popular existing architectures for several segmentation problems, showing that the proposed architecture is able to achieve accurate results with fewer parameters, with a reduced risk of overfitting the training data.
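The dilated convolution at the heart of the architecture described above can be shown in one dimension. This is a minimal sketch, not the authors' network: the function name dilated_conv1d is hypothetical, and it computes an unpadded cross-correlation, which is what deep learning libraries conventionally call a convolution.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Unpadded 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a kernel of length K
    covers (K - 1) * dilation + 1 input samples without extra parameters.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[start + k * dilation] for k in range(len(kernel)))
        for start in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # each output spans 3 samples
print(dilated_conv1d(signal, kernel, dilation=2))  # each output spans 5 samples
```

The same three weights see a wider stretch of the input as the dilation grows, which is how dilated convolutions capture features at different scales with few parameters.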