Project description:Long non-coding RNAs (lncRNAs) play an important regulatory role in gene transcription and post-transcriptional modification, and lncRNA regulatory dysfunction leads to a variety of complex human diseases. Hence, it might be beneficial to detect the underlying biological pathways and functional categories of genes that encode lncRNA. This can be carried out by using gene set enrichment analysis, a widely used bioinformatic technique. However, accurately performing gene set enrichment analysis of lncRNAs remains a challenge. Most conventional enrichment analysis methods do not fully exploit the rich association information among genes, which usually affects the regulatory functions of genes. Here, we developed a novel tool for lncRNA set enrichment analysis (TLSEA) to improve the accuracy of gene functional enrichment analysis. TLSEA extracts low-dimensional vectors of lncRNAs in two functional annotation networks with a graph representation learning method. A novel lncRNA-lncRNA association network was constructed by merging lncRNA-related heterogeneous information obtained from multiple sources with the different lncRNA-related similarity networks. In addition, the random walk with restart method was adopted to effectively expand the lncRNAs submitted by users according to the lncRNA-lncRNA association network of TLSEA. Finally, a case study of breast cancer was performed, which demonstrated that TLSEA could detect breast cancer more accurately than conventional tools. TLSEA can be accessed freely at http://www.lirmed.com:5003/tlsea.
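The random walk with restart step mentioned above can be illustrated with a minimal sketch. It is not the TLSEA implementation: the toy network, the node names (`lnc_A` to `lnc_D`), the restart probability of 0.5 and the function name are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return a steady-state visiting probability for every node.

    adjacency: dict mapping node -> {neighbour: weight} (assumed symmetric,
               every node assumed to have at least one neighbour).
    seeds:     set of nodes the walker restarts from (the user-submitted set).
    """
    nodes = sorted(adjacency)
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    # Restart distribution: uniform over the seed nodes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed ...
        nxt = {u: restart * p0[u] for u in nodes}
        # ... otherwise move to a neighbour in proportion to edge weight.
        for u in nodes:
            for v, w in adjacency[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical 4-node chain: lnc_A - lnc_B - lnc_C - lnc_D
network = {
    "lnc_A": {"lnc_B": 1.0},
    "lnc_B": {"lnc_A": 1.0, "lnc_C": 1.0},
    "lnc_C": {"lnc_B": 1.0, "lnc_D": 1.0},
    "lnc_D": {"lnc_C": 1.0},
}
seeds = {"lnc_A"}
scores = random_walk_with_restart(network, seeds)

# Expand the submitted set with the two highest-scoring non-seed nodes.
expanded = sorted((n for n in scores if n not in seeds),
                  key=scores.get, reverse=True)[:2]
print(expanded)
```

Nodes closer to the seed accumulate more probability mass, so ranking the non-seed nodes by score gives a simple way to expand a submitted lncRNA list.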
Project description:The stock market is an important part of the capital market, and research on stock market price fluctuations has always been a hot topic for scholars. As a dynamic and complex system, the stock market is affected by various factors. However, with the development of information technology, information presents multi-source and heterogeneous characteristics, and the transmission speed and mode of information have changed greatly. The explanation and influence of multi-source and heterogeneous information on stock market price fluctuations need further study. In this paper, a graph fusion and embedding method for multi-source heterogeneous information of the Chinese stock market is established. Relational dimension information is introduced to realize the effective fusion of multi-source heterogeneous data. A multi-attention graph neural network based on nodes and semantics is constructed to mine the implied semantics of the fused graph data and capture the influence of multi-source heterogeneous information on stock market price fluctuations. Experiments show that the proposed multi-source heterogeneous information fusion method is superior to tensor or vector fusion methods, and that the constructed multi-attention graph neural network has a better ability to explain stock market price fluctuations.
Project description:Modeling the outbreak of a novel epidemic, such as coronavirus disease 2019 (COVID-19), is crucial for estimating its dynamics, predicting future spread and evaluating the effects of different interventions. However, there are three issues that make this modeling a challenging task: uncertainty in data, roughness in models, and complexity in programming. We addressed these issues by presenting an interactive individual-based simulator, which is capable of modeling an epidemic through multi-source information fusion.
Project description:Emerging evidence has shown that microRNAs (miRNAs) play an important role in human disease research. Identifying potential associations between miRNAs and diseases is significant for the development of pathology, diagnosis and therapy. However, only a tiny portion of all miRNA-disease pairs in the current datasets are experimentally validated. This prompts the development of high-precision computational methods to predict real interaction pairs. In this paper, we propose a new model of Logistic Model Tree for predicting miRNA-Disease Association (LMTRDA) by fusing multi-source information including miRNA sequences, miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. In particular, we introduce miRNA sequence information and extract its features using a natural language processing technique for the first time in a miRNA-disease prediction model. In the cross-validation experiment, LMTRDA obtained 90.51% prediction accuracy with 92.55% sensitivity and an AUC of 90.54% on the HMDD V3.0 dataset. To further evaluate the performance of LMTRDA, we compared it with different classifier and feature descriptor models. In addition, we also validated the predictive ability of LMTRDA in human diseases including Breast Neoplasms, Breast Neoplasms and Lymphoma. As a result, 28, 27 and 26 out of the top 30 miRNAs associated with these diseases were verified by experiments in different kinds of case studies. These experimental results demonstrate that LMTRDA is a reliable model for predicting associations between miRNAs and diseases.
Project description:This study offers an integrated service management system for rural tourist information based on a cloud platform to address the three main issues of high platform concurrency, difficulty storing and managing data, and trouble sharing data functions. Three levels (data, process, and architecture) are considered in the analysis and design of the platform. The Hadoop data storage system makes possible the collection, storage, administration, and exchange of data functions for large amounts of heterogeneous data from many different sources by utilising Netty data transmission technology, hybrid data storage technology, and the Web Foundation. The results demonstrate that the system's response time is low, and the CPU consumption time and the average utilisation rate meet the actual needs. The system resolves issues with current rural tourism platform applications, such as the difficulty of data collection, the low rate of reuse, the low rate of sharing, the lack of timely updates, and severe data-island phenomena.
Project description:Accurately obtaining roll angles is one of the key technologies to improve the positioning accuracy and operation quality of agricultural equipment. Given the demand for the acquisition of agricultural equipment roll angles, a roll angle monitoring model based on Kalman filtering and multi-source information fusion was established by using the MTi-300 AHRS inertial sensor (INS) and XW-GI 5630 BeiDou Navigation Satellite System (BDS), which were installed on agricultural equipment. Data from the INS and BDS were fused in MATLAB; a Kalman filter was then used to optimize the data, and the state equation and measurement equation of the integrated system were established. Next, a man-machine interactive interface for the integrated monitoring terminal was designed with MATLAB GUI, and a roll angle monitoring system based on the INS and BDS was built and applied in field experiments. The mean absolute error of the integrated monitoring system based on multi-source information fusion during field experiments was 0.72°, which was smaller than the mean absolute errors of the roll angle monitored by the INS and BDS independently (0.78° and 0.75°, respectively). Thus, the integrated roll angle model improves monitoring precision and underlies future research on navigation and independent operation of agricultural equipment.
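The idea of fusing two roll-angle sources with a Kalman filter can be sketched as follows. This is only a scalar illustration under stated assumptions, not the state and measurement equations of the study (which were implemented in MATLAB and are not given in the abstract): the roll angle is modelled as a near-constant random walk, both sensors are assumed to have the same known noise variance, and all readings are simulated.

```python
import random


def kalman_fuse(ins, bds, r_ins, r_bds, q=1e-3, x0=0.0, p0=1.0):
    """Fuse two measurement streams of one scalar state (roll angle, degrees).

    q is the assumed process-noise variance of a random-walk state model;
    r_ins and r_bds are the assumed measurement-noise variances.
    """
    x, p = x0, p0
    estimates = []
    for z_ins, z_bds in zip(ins, bds):
        # Predict: the state is carried over, its uncertainty grows by q.
        p = p + q
        # Update: apply the two sensor readings one after the other.
        for z, r in ((z_ins, r_ins), (z_bds, r_bds)):
            k = p / (p + r)        # Kalman gain
            x = x + k * (z - x)    # correct the estimate with the innovation
            p = (1.0 - k) * p      # shrink the uncertainty
        estimates.append(x)
    return estimates


# Simulated data (hypothetical): constant true roll of 2 degrees, noisy sensors.
rng = random.Random(0)
true_roll = 2.0
ins = [true_roll + rng.gauss(0.0, 0.8) for _ in range(500)]
bds = [true_roll + rng.gauss(0.0, 0.8) for _ in range(500)]
fused = kalman_fuse(ins, bds, r_ins=0.64, r_bds=0.64)


def mae(values):
    """Mean absolute error against the simulated true roll angle."""
    return sum(abs(v - true_roll) for v in values) / len(values)


print(mae(ins), mae(bds), mae(fused))
```

On simulated data of this kind the fused estimate has a lower mean absolute error than either raw stream, which mirrors the direction of the field result reported above; the size of the improvement depends entirely on the assumed noise model.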
Project description:Recent studies have revealed that the subcellular location of long non-coding RNAs (lncRNAs) can provide significant information on their function. Due to the lack of experimental data, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are highly imbalanced. The prediction of the subcellular location of lncRNAs is therefore a multi-class, small-sample, imbalanced classification problem. The imbalance of data results in poor recognition of small data subsets by machine learning models, which is a puzzling and challenging problem in existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance part of the features, and a binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. This improves the representation ability of the data and mitigates the problem of unbalanced multi-class data. Through comprehensive experiments on different feature combinations and machine learning models, we select the optimal feature and classifier scheme to construct the subcellular location prediction tool lncLocation. lncLocation obtains an accuracy of 87.78% using 5-fold cross-validation on the benchmark data, which is higher than that of state-of-the-art tools, and the classification performance, especially for small classes, is improved significantly.
Project description:Background: With the current technological advances in high-throughput biology, the necessity to develop tools that help to analyse the massive amount of data being generated is evident. A powerful method of inspecting large-scale data sets is gene set enrichment analysis (GSEA), and investigation of protein structural features can guide determining the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not been developed yet. In order to fill this niche, we developed the user-friendly, web-based application, PhenoFam. Results: PhenoFam performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Our tool is designed to analyse complete sets of results from quantitative high-throughput studies (gene expression microarrays, functional RNAi screens, etc.) without prior pre-filtering or hits-selection steps. PhenoFam utilizes Ensembl databases to link a list of user-provided identifiers with protein features from the InterPro database, and assesses whether results associated with individual domains differ significantly from the overall population. To demonstrate the utility of PhenoFam we analysed a genome-wide RNA interference screen and discovered a novel function of plexins containing the cytoplasmic RasGAP domain. Furthermore, a PhenoFam analysis of breast cancer gene expression profiles revealed a link between breast carcinoma and altered expression of PX domain containing proteins. Conclusions: PhenoFam provides a user-friendly, easily accessible web interface to perform GSEA based on high-throughput data sets and structural-functional protein information, and therefore aids in functional annotation of genes.
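Testing whether the results for genes sharing one domain differ from the overall population, without selecting hits first, can be illustrated with a small permutation test. The abstract does not name the statistical test PhenoFam uses, so the permutation approach here is a stand-in chosen for simplicity; the gene identifiers, scores and group size are simulated.

```python
import random
import statistics


def permutation_pvalue(scores, members, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a gene group versus all genes.

    scores:  dict gene_id -> quantitative result (e.g. screen score).
    members: set of gene_ids annotated with the domain of interest.
    """
    rng = random.Random(seed)
    ids = list(scores)
    group = [scores[g] for g in ids if g in members]
    overall = statistics.fmean(scores.values())
    observed = abs(statistics.fmean(group) - overall)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the domain group.
        sample = rng.sample(ids, len(group))
        diff = abs(statistics.fmean(scores[g] for g in sample) - overall)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Simulated screen: 100 genes, the first 10 share a hypothetical domain
# and have their scores shifted upwards.
data_rng = random.Random(1)
gene_scores = {f"gene_{i}": data_rng.gauss(0.0, 1.0) for i in range(100)}
domain_genes = {f"gene_{i}" for i in range(10)}
for g in domain_genes:
    gene_scores[g] += 3.0

p_value = permutation_pvalue(gene_scores, domain_genes)
print(p_value)
```

Because every gene contributes its quantitative score, no threshold for calling hits is needed, which is the property the abstract emphasises.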
Project description:Ranked set sampling (RSS) is known to increase the efficiency of estimators compared with simple random sampling. Missing data create a gap in the information that needs to be addressed before proceeding to estimation. Very little work has been carried out on handling missingness under RSS. This paper proposes some logarithmic-type methods of imputation for the estimation of the population mean under RSS using auxiliary information. The properties of the suggested imputation procedures are examined. A simulation study is carried out to show that the proposed imputation procedures perform better than some of the existing imputation procedures. A few real applications of the proposed imputation procedures are also provided to support the simulation study.
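The efficiency gain of RSS over simple random sampling can be checked with a short simulation. The sketch assumes perfect ranking on the study variable itself and a simulated normal population, and it does not implement the logarithmic-type imputation procedures of the paper, whose formulas are not given in the abstract.

```python
import random
import statistics


def srs_mean(population, n, rng):
    """Mean of a simple random sample of size n (without replacement)."""
    return statistics.fmean(rng.sample(population, n))


def rss_mean(population, set_size, cycles, rng):
    """Mean of a balanced ranked set sample of size set_size * cycles.

    For each rank r, draw set_size units, rank them (here by their true
    value, i.e. perfect ranking is assumed) and measure only the r-th one.
    """
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = sorted(rng.sample(population, set_size))
            measured.append(candidates[rank])
    return statistics.fmean(measured)


rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]

set_size, cycles, replicates = 3, 4, 2000
n = set_size * cycles  # both designs measure the same number of units

srs_estimates = [srs_mean(population, n, rng) for _ in range(replicates)]
rss_estimates = [rss_mean(population, set_size, cycles, rng) for _ in range(replicates)]

var_srs = statistics.pvariance(srs_estimates)
var_rss = statistics.pvariance(rss_estimates)
print(var_srs / var_rss)  # relative efficiency of RSS over SRS
```

Both designs measure 12 units per replicate, so a ratio above 1 reflects the gain from ranking alone.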
Project description:Emerging evidence has revealed that circular RNA (circRNA) is widely distributed in mammalian cells and functions as a microRNA (miRNA) sponge involved in transcriptional and posttranscriptional regulation of gene expression. Recognizing circRNA-miRNA interactions provides a new perspective for the detection and treatment of complex human diseases. Traditional biological experiments used to identify molecular associations are small in scale, time-consuming and laborious, whereas computational models can provide a basis for biological experiments at low cost. Because few computational models have been proposed so far, it is necessary to develop an effective computational method to predict circRNA-miRNA interactions. This study thus proposes a novel computational method, named KGDCMI, to predict the interactions between circRNA and miRNA based on multi-source information extraction and fusion. KGDCMI obtains RNA attribute information from sequence and similarity, and captures the behavior information in RNA associations through a graph-embedding algorithm. The resulting feature vector is then reduced by principal component analysis and passed to a deep neural network for information fusion and prediction. KGDCMI achieves an area under the curve (AUC) of 89.30% and an area under the precision-recall curve (AUPR) of 87.67%. On the same dataset, these values are 2.37% and 3.08% higher, respectively, than those of the only existing model. We also conducted three groups of comparative experiments to obtain the best classification strategy, feature extraction parameters, and dimensions. In addition, in the case study, 7 of the top 10 interaction pairs were confirmed in PubMed. These results suggest that KGDCMI is a feasible and useful method to predict circRNA-miRNA interactions and can act as a reliable candidate for related RNA biological experiments.
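For readers unfamiliar with the AUC metric reported above, the following sketch computes it directly from its rank interpretation: the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The labels and scores are made up for illustration and are unrelated to the KGDCMI experiments.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six candidate circRNA-miRNA pairs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(auc)  # 8 of the 9 positive-negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is fine for an illustration; library implementations typically use a sort-based computation instead.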