Project description: Objective: The cause and mechanism of non-obstructive azoospermia (NOA) are complicated; therefore, an effective therapeutic strategy has yet to be developed. This study aimed to analyse the pathogenesis of NOA at the molecular biological level and to identify core regulatory genes that could be utilised as potential biomarkers. Methods: Three NOA microarray datasets (GSE45885, GSE108886, and GSE145467) were collected from the GEO database and merged into a training set; a further dataset (GSE45887) was defined as the validation set. Differential gene analysis, consensus cluster analysis, and WGCNA were used to identify preliminary signature genes, and enrichment analysis was then applied to these signature genes. Next, four machine learning algorithms (RF, SVM, GLM, and XGB) were used to detect the potential biomarkers most closely associated with NOA. Finally, a diagnostic model was constructed from these potential biomarkers and visualised as a nomogram. The differential expression and predictive reliability of the biomarkers were confirmed using the validation set. Furthermore, a competing endogenous RNA (ceRNA) network was constructed to identify the regulatory mechanisms of the potential biomarkers, and the CIBERSORT algorithm was used to estimate immune cell infiltration in the samples. Results: A total of 215 differentially expressed genes (DEGs) were identified between the NOA and control groups (27 upregulated and 188 downregulated). WGCNA identified 1123 genes in the MEblue module as target genes highly correlated with NOA status. Consensus clustering divided the NOA samples into two clusters, and WGCNA identified 1027 genes in the MEblue module as target genes highly correlated with this NOA classification. The 129 overlapping genes were then established as signature genes. 
The XGB algorithm, which had the maximum AUC (0.946) and the minimum residuals, was used to further screen the signature genes, and IL20RB, C9orf117, HILS1, PAOX, and DZIP1 were identified as potential NOA biomarkers. The five-biomarker model achieved an AUC of 0.982, higher than that of any single-biomarker model, and its performance was verified in the validation set. Conclusions: IL20RB, C9orf117, HILS1, PAOX, and DZIP1 showed the strongest association with NOA and could therefore serve as potential therapeutic targets for NOA patients. Furthermore, the five-gene model, which had the highest diagnostic accuracy, may be an effective diagnostic model that warrants further experimental validation.
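The AUC figures above (0.946 for XGB, 0.982 for the five-biomarker model) summarize how well a score ranks NOA samples above controls. A minimal sketch, assuming one continuous score per sample in which higher values indicate NOA: the AUC equals the Mann-Whitney probability that a randomly chosen NOA sample outscores a randomly chosen control, with ties counted as one half. The function name and scores are hypothetical, not the study's data or code.

```python
from itertools import product


def roc_auc(scores_pos, scores_neg):
    """AUC as P(random positive score > random negative score); ties count 0.5.

    This is the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores, not data from the study.
noa = [0.9, 0.8, 0.7, 0.4]
control = [0.5, 0.3, 0.2, 0.1]
print(roc_auc(noa, control))  # 0.9375 (15 of 16 pairs ranked correctly)
```

For a downregulated marker, negate the scores first. The pairwise loop is quadratic in sample size, which is adequate for cohorts of this size; rank-based formulas scale better.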
Project description: Systemic lupus erythematosus (SLE) is an autoimmune disease involving multiple systems. Its recurrent episodes and fluctuating disease course have a severe impact on patients, yet biomarkers to predict disease prognosis and remission in SLE are still lacking. We downloaded the GSE50772 dataset from the Gene Expression Omnibus database and identified differentially expressed genes (DEGs) between SLE patients and healthy controls. Weighted gene co-expression network analysis was used to identify key gene modules and their corresponding genes in SLE. Genes shared by the DEGs and the key modules were used as key genes for subsequent analysis. These key genes were analyzed using three machine learning algorithms: the least absolute shrinkage and selection operator, support vector machine recursive feature elimination, and random forest. Genes selected by all three algorithms were taken as potential biomarkers, and their possible functions, regulatory mechanisms, diagnostic value, and expression levels were investigated and validated. Finally, we compared the level of immune cell infiltration between groups and explored the relationship between the potential biomarkers and immunity. A total of 234 genes shared by the DEGs and key modules were used as key genes for subsequent analysis. Intersecting the key genes selected by the three algorithms yielded four potential biomarkers (ARID2, CYSTM1, DDIT3, and RNASE1) with high diagnostic value. Immune infiltration analysis further showed differences in various immune cell types between the SLE and healthy control samples. ARID2, CYSTM1, DDIT3, and RNASE1 can affect the immune function of SLE patients and could be used as immune-related potential biomarkers and as therapeutic or diagnostic targets in further research.
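The selection logic described above (key genes = DEGs ∩ key-module genes; biomarkers = key genes retained by all three algorithms) reduces to set intersections. A minimal sketch, assuming each step has already produced a gene list; the gene identifiers and per-method selections are hypothetical placeholders, not the study's results.

```python
# Hypothetical gene identifiers for illustration only; not the study's results.
degs = {"G1", "G2", "G3", "G4", "G5", "G6"}
key_module_genes = {"G2", "G3", "G4", "G5", "G7"}

# Key genes: present both in the DEG list and in the key WGCNA module.
key_genes = degs & key_module_genes

# Genes retained by each feature-selection method when run on key_genes
# (hypothetical outputs standing in for LASSO, SVM-RFE, and random forest).
selected_by = {
    "LASSO": {"G2", "G3", "G4"},
    "SVM-RFE": {"G2", "G3", "G5"},
    "RandomForest": {"G2", "G3", "G4", "G5"},
}

# Potential biomarkers: key genes chosen by every method.
biomarkers = key_genes.intersection(*selected_by.values())

print(sorted(key_genes))   # ['G2', 'G3', 'G4', 'G5']
print(sorted(biomarkers))  # ['G2', 'G3']
```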
Project description: Background: Breast cancer (BC) ranks first in incidence among cancers in women, with approximately 2 million new cases per year. It is therefore essential to investigate new targets for the diagnosis and prognosis of BC. Methods: We analyzed gene expression data from 99 normal and 1,081 BC tissues in The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) were identified using the "limma" R package, and relevant modules were chosen through Weighted Gene Coexpression Network Analysis (WGCNA). Intersection genes were obtained by matching DEGs to WGCNA module genes. Functional enrichment analyses were performed on these genes using the Gene Ontology (GO), Disease Ontology (DO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Biomarkers were screened via Protein-Protein Interaction (PPI) networks and multiple machine-learning algorithms. The Gene Expression Profiling Interactive Analysis (GEPIA), The University of ALabama at Birmingham CANcer (UALCAN), and Human Protein Atlas (HPA) databases were used to examine the mRNA and protein expression of eight biomarkers, and the Kaplan-Meier Plotter tool was used to assess their prognostic value. Key biomarkers were analyzed via single-cell sequencing, and their relationship with immune infiltration was examined using the Tumor Immune Estimation Resource (TIMER) database and the "xCell" R package. Lastly, drug prediction was performed based on the identified biomarkers. Results: We identified 1,673 DEGs and 542 important genes through differential analysis and WGCNA, respectively. Intersection analysis revealed 76 genes, which are involved in immune-related viral infection and IL-17 signaling pathways. 
DIX domain containing 1 (DIXDC1), Dual specificity phosphatase 6 (DUSP6), Pyruvate dehydrogenase kinase 4 (PDK4), C-X-C motif chemokine ligand 12 (CXCL12), Interferon regulatory factor 7 (IRF7), Integrin subunit alpha 7 (ITGA7), NIMA related kinase 2 (NEK2), and Nuclear receptor subfamily 3 group C member 1 (NR3C1) were selected as BC biomarkers using machine-learning algorithms. NEK2 was the most critical gene for diagnosis. Prospective drugs targeting NEK2 include etoposide and lukasunone. Conclusions: Our study identified DIXDC1, DUSP6, PDK4, CXCL12, IRF7, ITGA7, NEK2, and NR3C1 as potential diagnostic biomarkers for BC, with NEK2 having the highest potential to aid in diagnosis and prognosis in clinical settings.
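Differential expression calls such as the 1,673 DEGs above usually combine an effect-size cut-off with a false-discovery-rate adjustment. The sketch below assumes per-gene p-values and log2 fold changes are already available (for example, from limma's moderated t-test) and shows only the Benjamini-Hochberg adjustment and the filtering step; the |log2FC| > 1 and adjusted p < 0.05 cut-offs are illustrative assumptions, since the source does not state its thresholds. Benjamini-Hochberg is also the default adjustment in limma's `topTable`.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, log2fc, pvalues, fc_cut=1.0, fdr_cut=0.05):
    """Keep genes with |log2 fold change| > fc_cut and adjusted p < fdr_cut.

    Both cut-offs are illustrative assumptions, not the study's settings.
    """
    padj = benjamini_hochberg(pvalues)
    return [g for g, fc, q in zip(genes, log2fc, padj)
            if abs(fc) > fc_cut and q < fdr_cut]


# Hypothetical per-gene statistics, not data from the study.
genes = ["G1", "G2", "G3", "G4"]
log2fc = [2.1, -1.5, 0.4, 3.0]
pvals = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(pvals))
print(call_degs(genes, log2fc, pvals))  # ['G1']
```

G2 passes the fold-change cut-off but fails after adjustment, which is the point of controlling the FDR across thousands of genes.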
Project description: Biliary atresia (BA) is a severe and progressive obstructive biliary disease in infants that requires early diagnosis and new therapeutic targets. This study employed bioinformatics methods to identify diagnostic biomarkers and potential therapeutic targets for BA. Our analysis of mRNA expression from Gene Expression Omnibus datasets revealed 3,273 differentially expressed genes between patients with BA and those without BA (nBA). Weighted gene coexpression network analysis determined that the turquoise gene coexpression module, consisting of 298 genes, is predominantly associated with BA. A machine learning method then selected the two most important genes, CXCL8 and TMSB10, from the turquoise module. The areas under the receiver operating characteristic curves for TMSB10 and CXCL8 were 0.961 and 0.927, respectively, in the training group and 0.819 and 0.791 in the testing group, indicating high diagnostic value. In addition, a nomogram combining TMSB10 and CXCL8, with better diagnostic performance, was built for clinical translation. Several studies have highlighted the potential of CXCL8 as a therapeutic target for BA, while TMSB10 has been shown to regulate cell polarity, which is related to BA progression. Our qRT-PCR and immunohistochemistry analyses also confirmed the upregulation of TMSB10 at the mRNA and protein levels in BA liver samples. These findings highlight the sensitivity of CXCL8 and TMSB10 as diagnostic biomarkers and their potential as therapeutic targets for BA.
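The AUCs above describe ranking ability, but diagnostic use also needs a cut-off. The source does not say how one was chosen; one common choice, assumed here, is the threshold that maximizes Youden's J (sensitivity + specificity - 1). A minimal sketch with hypothetical values for a single upregulated marker such as TMSB10; the function name is illustrative.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called positive when score >= threshold; labels are 1 (case)
    or 0 (control). Only observed scores are tried as thresholds.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical expression values of one upregulated marker (1 = BA, 0 = nBA).
scores = [2.0, 3.5, 4.1, 5.0, 5.2, 6.3]
labels = [0, 0, 1, 0, 1, 1]
print(youden_cutoff(scores, labels))
```

Here the thresholds 4.1 and 5.2 tie at J = 2/3; which one to prefer (higher sensitivity versus higher specificity) is a clinical choice, not a statistical one.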
Project description: Bronchopulmonary dysplasia (BPD) is often seen as a pulmonary complication of extremely preterm birth, resulting in persistent respiratory symptoms and diminished lung function. Unfortunately, current diagnostic and treatment options for this condition are insufficient. Hence, this study aimed to identify potential biomarkers in the peripheral blood of neonates affected by BPD. The BPD expression dataset GSE32472 was obtained from the Gene Expression Omnibus, and differentially expressed genes (DEGs) were first identified in this dataset. We then conducted gene set enrichment analysis on the DEGs and used weighted gene co-expression network analysis (WGCNA) to screen the modules most relevant to BPD. Mapping the DEGs to the WGCNA module genes yielded a set of intersecting genes, on which we conducted detailed functional enrichment analyses. To identify hub genes, we used three machine learning algorithms: SVM-RFE, LASSO, and random forest. We constructed a diagnostic nomogram for predicting BPD based on the hub genes. Additionally, we carried out transcription factor analysis to predict regulatory mechanisms and identified drugs associated with these biomarkers. Differential analysis yielded 470 DEGs and WGCNA identified 1351 significant genes; the intersection of these two approaches yielded 273 common genes. Using machine learning algorithms, we identified CYYR1, GALNT14, and OLAH as potential biomarkers for BPD. Moreover, we predicted flunisolide, budesonide, and beclomethasone as potential anti-BPD drugs. CYYR1, GALNT14, and OLAH have the potential to serve as diagnostic biomarkers for BPD, which may prove beneficial in the clinical diagnosis and prevention of BPD.
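LASSO, one of the three hub-gene selectors named above, keeps a gene only if its coefficient survives an L1 penalty. For illustration, the sketch below solves the squared-error LASSO by cyclic coordinate descent with soft-thresholding; this is a simplification, since a case-control study would more typically fit a penalized logistic model (for example, glmnet with `family = "binomial"`). The function names, penalty values, and toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    Assumes standardized columns of X and centered y (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # x_j . (partial residual that ignores feature j)
            norm = 0.0  # x_j . x_j
            for i in range(n):
                partial = y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (norm / n)
    return b


# Hypothetical toy data: two orthogonal, standardized "gene" columns.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.5, 1.5, -1.5, -2.5]
for lam in (0.0, 1.0, 3.0):
    print(lam, lasso_cd(X, y, lam))
```

Raising the penalty zeroes out the weaker column first: on this orthogonal toy design the fit is [2.0, 0.5] at lam = 0, [1.0, 0.0] at lam = 1, and all zeros at lam = 3. Genes with nonzero coefficients would be carried forward.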
Project description: Gastric cancer (GC) is among the cancers with the highest mortality rates globally, and the current survival rate is 30% even with combination therapies. Recently, mounting evidence has indicated a potential role for miRNAs in the diagnosis and prognostic assessment of cancers. In state-of-the-art cancer research, machine learning (ML) has gained increasing attention as a way to find clinically useful biomarkers. The present study aimed to identify potential diagnostic and prognostic miRNAs in GC by applying ML. Using the TCGA database and ML algorithms such as Support Vector Machine (SVM), Random Forest, and k-NN, a panel of 29 miRNAs was obtained. Among the ML algorithms, SVM was chosen (AUC: 88.5%; accuracy: 93% in GC). To find common molecular mechanisms of the miRNAs, their common gene targets were predicted using online databases such as miRWalk, miRDB, and TargetScan. Functional and enrichment analyses were performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein-protein interactions (PPI) were identified using the STRING database. Pathway analysis of the target genes revealed the involvement of several cancer-related pathways, including miRNA-mediated inhibition of translation, regulation of gene expression by genetic imprinting, and the Wnt signaling pathway. Survival and ROC curve analyses showed that the expression levels of hsa-miR-21, hsa-miR-133a, hsa-miR-146b, and hsa-miR-29c were associated with higher mortality and could potentially enable earlier detection in GC patients. Using machine learning, which represents a powerful tool for biomarker identification, we identified a panel of dysregulated miRNAs that may serve as reliable biomarkers for gastric cancer.
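k-NN, one of the classifiers listed above, labels a sample by majority vote among its nearest training samples. A minimal sketch, assuming each sample is a vector of miRNA expression values and using Euclidean distance with k = 3; the data are hypothetical two-miRNA profiles rather than the 29-miRNA TCGA panel, and accuracy here is simply the fraction of correct calls.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-miRNA expression profiles (arbitrary units), not study data.
train_X = [[5.0, 1.0], [5.5, 1.2], [6.0, 0.8], [1.0, 4.0], [1.2, 4.5], [0.8, 5.0]]
train_y = ["GC", "GC", "GC", "normal", "normal", "normal"]
test_X = [[5.2, 1.1], [1.1, 4.2]]
test_y = ["GC", "normal"]

preds = [knn_predict(train_X, train_y, x) for x in test_X]
print(preds, accuracy(test_y, preds))
```

In practice the expression values would be scaled first, because Euclidean distance is dominated by whichever miRNA has the largest range.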
Project description: Background: Cardioembolic stroke (CS) and atrial fibrillation (AF) are prevalent diseases that significantly impair quality of life and impose considerable financial burdens on society. Despite increasing evidence of a significant association between the two diseases, their complex interactions remain inadequately understood. We conducted bioinformatics analysis and employed machine learning techniques to investigate potential biomarkers shared by CS and AF. Methods: We retrieved the CS and AF datasets from the Gene Expression Omnibus (GEO) database and applied Weighted Gene Co-Expression Network Analysis (WGCNA) to build co-expression networks and identify pivotal modules. Next, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis on the genes shared by the CS- and AF-related modules. The STRING database was used to build a protein-protein interaction (PPI) network, from which hub genes were identified. Finally, several commonly used machine learning approaches were applied to construct a clinical predictive model for CS and AF, and ROC curve analysis was used to evaluate the diagnostic value of the identified biomarkers for AF and CS. Results: Functional enrichment analysis indicated that pathways intrinsic to the immune response may be significantly involved in CS and AF. PPI network analysis identified four key genes potentially associated with both CS and AF: PIK3R1, ITGAM, FOS, and TLR4. Conclusion: In our study, we used WGCNA, PPI network analysis, and machine learning to identify four hub genes significantly associated with CS and AF. Functional annotation suggested that immune-response pathways linked to these genes are involved, which could pave the way for further research on the etiological mechanisms and therapeutic targets of CS and AF.
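Hub genes in a STRING-derived PPI network are often ranked by a centrality measure. The source does not specify which one was used; the minimal sketch below assumes degree (the number of distinct interaction partners) and uses a hypothetical edge list and function name.

```python
from collections import defaultdict


def rank_by_degree(edges, top_n=4):
    """Rank proteins in an undirected PPI edge list by degree (distinct partners)."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            partners[a].add(b)
            partners[b].add(a)
    # Highest degree first; ties broken alphabetically for a stable result.
    ranked = sorted(partners, key=lambda g: (-len(partners[g]), g))
    return [(g, len(partners[g])) for g in ranked[:top_n]]


# Hypothetical interactions; the duplicate A-B edge is counted once.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("A", "B")]
print(rank_by_degree(edges, top_n=2))  # [('A', 3), ('B', 2)]
```

Degree is the simplest choice; other centralities (betweenness, closeness, or clique-based scores) can rank the same network differently.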
Project description: The etiologies and pathogenesis of dilated cardiomyopathy (DCM) with heart failure (HF) remain to be defined, so specific diagnostic biomarkers and mechanisms urgently need to be explored. In this study, three gene expression profiling datasets (GSE29819, GSE21610, and GSE17800) and one single-cell RNA sequencing dataset (GSE95140) were obtained from the Gene Expression Omnibus (GEO) database. GSE29819 and GSE21610 were combined into the training group, while GSE17800 served as the test group. Using weighted gene co-expression network analysis (WGCNA), we identified fifteen driver genes in the key module that were highly associated with DCM with HF. We applied the least absolute shrinkage and selection operator (LASSO) to the driver genes and then constructed five machine learning classifiers (random forest, gradient boosting machine, neural network, eXtreme gradient boosting, and support vector machine). Random forest, built on the five LASSO-selected genes, was the best-performing classifier and was used to select NPPA, OMD, and PRELP for diagnosing DCM with HF. Moreover, we observed up-regulated mRNA levels and robust diagnostic accuracy of NPPA, OMD, and PRELP in both the training and test groups. Single-cell RNA-seq analysis further demonstrated their stable up-regulated expression in various cardiomyocytes of DCM patients. In addition, gene set enrichment analysis (GSEA) showed that the TGF-β signaling pathway, which was correlated with NPPA, OMD, and PRELP, was an underlying mechanism of DCM with HF. Overall, our study revealed that NPPA, OMD, and PRELP can serve as diagnostic biomarkers for DCM with HF, deepening the understanding of its pathogenesis.
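GSEA, used above to link NPPA, OMD, and PRELP to TGF-β signaling, scores a gene set by walking down a ranked gene list and tracking a running sum that rises at set members and falls otherwise. A minimal sketch of the weighted enrichment score with the standard weight p = 1, assuming genes are already ranked by their correlation with the phenotype; it omits the permutation test and the normalized enrichment score (NES) of the full method, and the genes, correlations, and function name are hypothetical.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """GSEA running-sum enrichment score.

    ranked_genes: genes sorted by decreasing correlation with the phenotype.
    correlations: the matching correlation values, in the same order.
    Returns the running-sum value farthest from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    # Hits step up in proportion to |correlation|^p; misses step down equally.
    hit_norm = sum(abs(c) ** p for c, h in zip(correlations, hits) if h)
    running, es = 0.0, 0.0
    for c, h in zip(correlations, hits):
        if h:
            running += abs(c) ** p / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical ranked list and correlations, not data from the study.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
corr = [0.9, 0.7, 0.5, -0.2, -0.4, -0.8]
print(enrichment_score(ranked, corr, {"G1", "G3"}))  # about 0.75
print(enrichment_score(ranked, corr, {"G6"}))        # about -1.0
```

A positive score means the set concentrates near the top of the ranking (genes positively correlated with the phenotype); a negative score means it concentrates near the bottom.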
Project description: Background: Recurrent pregnancy loss (RPL), defined as the occurrence of two or more pregnancy losses before 20-24 weeks of gestation, is a prevalent and significant pathological condition that affects human reproductive health. However, the underlying mechanism of RPL remains unclear. This study aimed to investigate the biomarkers and molecular mechanisms associated with RPL and to explore novel treatment strategies for clinical application. Methods: The RPL gene expression profile GSE165004 was retrieved from the GEO database. This profile underwent differential expression analysis, WGCNA, and functional enrichment analysis, after which hub genes were screened using LASSO regression, SVM-RFE, and random forest algorithms. An artificial neural network (ANN) model was constructed to assess the performance of the hub genes in the dataset. The expression of the hub genes in RPL and control samples was validated using RT-qPCR. The immune cell infiltration level in RPL was assessed using CIBERSORT. Additionally, pan-cancer analysis was conducted using Sangerbox, and small-molecule drug screening was performed using CMap. Results: A total of 352 differentially expressed genes (DEGs) were identified, including 198 up-regulated and 154 down-regulated genes. Enrichment analysis indicated that the DEGs were primarily associated with Fc gamma R-mediated phagocytosis, the Fc epsilon RI signaling pathway, and various metabolism-related pathways. The turquoise module, which showed the highest relevance to clinical symptoms in the WGCNA results, contained 104 DEGs. Three hub genes, WBP11, ACTR2, and NCSTN, were identified using machine learning algorithms. ROC curves demonstrated strong diagnostic value when the three hub genes were combined. RT-qPCR confirmed the low expression of WBP11 and ACTR2 in RPL, whereas NCSTN exhibited high expression. The immune cell infiltration analysis indicated an imbalance of macrophages in RPL. 
Meanwhile, these three hub genes exhibited aberrant expression in multiple malignancies and were associated with poor prognosis. Furthermore, we identified several candidate small-molecule drugs. Conclusion: This study identifies and validates hub genes in RPL, which may lead to significant advances in understanding the molecular mechanisms of, and treatment strategies for, this condition.
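The RT-qPCR validation above reports lower WBP11 and ACTR2 and higher NCSTN expression in RPL. The source does not state its quantification method; a common choice, assumed here, is the Livak 2^-ΔΔCt method, sketched below with hypothetical Ct values and a hypothetical function name.

```python
def fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ΔΔCt (Livak) method.

    Assumes near-100% amplification efficiency for both the target and the
    reference (housekeeping) gene.
    """
    d_ct_case = ct_target_case - ct_ref_case  # normalize to the reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_case - d_ct_ctrl
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, not data from the study.
print(fold_change_ddct(26.0, 18.0, 25.0, 18.0))  # 0.5: one cycle later, lower in case
print(fold_change_ddct(24.0, 18.0, 25.0, 18.0))  # 2.0: one cycle earlier, higher in case
```

Because each PCR cycle roughly doubles the product, a one-cycle shift in normalized Ct corresponds to a two-fold change; if efficiencies differ markedly from 100%, an efficiency-corrected model is more appropriate.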
Project description: Drug-induced liver injury (DILI) is a form of liver dysfunction caused by drugs, and the gut microbiota can affect liver injury. However, the relationship between the gut microbiota and its metabolites in DILI patients is not clear. Total gut microbiota DNA was extracted from 28 DILI patients and 28 healthy control volunteers (HC), and the 16S rDNA gene was amplified. Next, differential metabolites were screened. Finally, the correlations between the diagnostic strains and the differential metabolites were studied. The richness and uniformity of the bacterial communities decreased in DILI patients, and the structure of the gut microbiota changed markedly. Enterococcus and Veillonella (Firmicutes) increased in DILI, whereas Blautia and Dialister (Firmicutes) and Ralstonia (Proteobacteria) increased in HC. In addition, these diagnostic OTUs of DILI were associated with the DILI damage mechanism. Furthermore, 66 metabolites differed between DILI and HC samples, and these metabolites were mainly enriched in the pyrimidine metabolism and steroid hormone biosynthesis pathways. A collinear network map of the key microbiota and metabolites was then constructed, and the results indicated that Cortodoxone, Prostaglandin I1, Bicyclo Prostaglandin E2, and Anacardic acid were positively correlated with Blautia and Ralstonia and negatively correlated with Veillonella. This study analyzed the changes in DILI from the perspective of the gut microbiota and metabolites, screened key strains and differential metabolites of DILI, and studied the correlations between them, further illustrating the mechanism of DILI.
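The decrease in community richness and uniformity (evenness) reported above is commonly quantified with the observed number of OTUs (richness), the Shannon index, and Pielou's evenness. The source does not name the indices it used, so this choice is an assumption; the OTU counts and function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H / ln(S), where S is the observed richness."""
    richness = sum(1 for c in counts if c > 0)
    return shannon_index(counts) / math.log(richness) if richness > 1 else 0.0


# Hypothetical OTU read counts for two samples with the same richness (4 OTUs).
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 5, 5, 5]
for counts in (even_sample, skewed_sample):
    print(shannon_index(counts), pielou_evenness(counts))
```

Both samples have the same richness, so the lower Shannon index and evenness of the skewed sample come entirely from one OTU dominating the reads.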