Project description: Objective: The cause and mechanism of non-obstructive azoospermia (NOA) are complicated; therefore, an effective therapy strategy is yet to be developed. This study aimed to analyse the pathogenesis of NOA at the molecular biological level and to identify the core regulatory genes, which could be utilised as potential biomarkers. Methods: Three NOA microarray datasets (GSE45885, GSE108886, and GSE145467) were collected from the GEO database and merged into a training set; a further dataset (GSE45887) was then defined as the validation set. Differential gene analysis, consensus cluster analysis, and WGCNA were used to identify preliminary signature genes; then, enrichment analysis was applied to these previously screened signature genes. Next, 4 machine learning algorithms (RF, SVM, GLM, and XGB) were used to detect potential biomarkers that are most closely associated with NOA. Finally, a diagnostic model was constructed from these potential biomarkers and visualised as a nomogram. The differential expression and predictive reliability of the biomarkers were confirmed using the validation set. Furthermore, a competing endogenous RNA network was constructed to identify the regulatory mechanisms of potential biomarkers, and the CIBERSORT algorithm was used to calculate immune infiltration status among the samples. Results: A total of 215 differentially expressed genes (DEGs) were identified between NOA and control groups (27 upregulated and 188 downregulated genes). The WGCNA results identified 1123 genes in the MEblue module as target genes that are highly correlated with NOA positivity. The NOA samples were divided into 2 clusters using consensus clustering; further, 1027 genes in the MEblue module, which were screened by WGCNA, were considered to be target genes that are highly correlated with NOA classification. The 129 overlapping genes were then established as signature genes.
The XGB algorithm, which had the maximum AUC value (AUC=0.946) and the minimum residual value, was used to further screen the signature genes. IL20RB, C9orf117, HILS1, PAOX, and DZIP1 were identified as potential NOA biomarkers. This 5-biomarker model had the highest AUC value, of up to 0.982, compared with the single-biomarker models; additionally, the results of this biomarker model were verified in the validation set. Conclusions: As IL20RB, C9orf117, HILS1, PAOX, and DZIP1 have been determined to possess the strongest association with NOA, these five genes could be used as potential therapeutic targets for NOA patients. Furthermore, the model constructed using these five genes, which possessed the highest diagnostic accuracy, may be an effective biomarker model that warrants further experimental validation.
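As an illustration of how the diagnostic performance of a single marker could be scored, the sketch below computes the AUC as the fraction of case-control pairs in which the case has the higher value, with ties counted as half. It is a minimal Python example and not the study's pipeline; the function name `auc_from_scores` and the toy expression values are assumptions made purely for illustration.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random case scores above a random control.

    Ties count as half a win, which matches the Mann-Whitney U formulation.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one marker (not data from the study).
expression = [2.1, 3.4, 1.8, 4.0, 3.9, 1.2]
is_case = [0, 1, 0, 1, 1, 0]
print(auc_from_scores(expression, is_case))
```

For a marker that is lower in cases than in controls, the same function returns a value below 0.5; negating the scores before the call gives the conventional AUC.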
Project description: Background: Breast cancer (BC) ranks first in incidence among women, with approximately 2 million new cases per year. Therefore, it is essential to investigate emerging targets for BC patients' diagnosis and prognosis. Methods: We analyzed gene expression data from 99 normal and 1,081 BC tissues in The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) were identified using the "limma" R package, and relevant modules were chosen through Weighted Gene Coexpression Network Analysis (WGCNA). Intersection genes were obtained by matching DEGs to WGCNA module genes. Functional enrichment analyses were performed on these genes using the Gene Ontology (GO), Disease Ontology (DO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Biomarkers were screened via Protein-Protein Interaction (PPI) networks and multiple machine-learning algorithms. The Gene Expression Profiling Interactive Analysis (GEPIA), The University of ALabama at Birmingham CANcer (UALCAN), and Human Protein Atlas (HPA) databases were employed to examine mRNA and protein expression of eight biomarkers. The Kaplan-Meier mapper tool was used to assess their prognostic capabilities. Key biomarkers were analyzed via single-cell sequencing, and their relationship with immune infiltration was examined using the Tumor Immune Estimation Resource (TIMER) database and the "xCell" R package. Lastly, drug prediction was conducted based on the identified biomarkers. Results: We identified 1,673 DEGs and 542 important genes through differential analysis and WGCNA, respectively. Intersection analysis revealed 76 genes, which play significant roles in immune-related viral infection and IL-17 signaling pathways.
DIX domain containing 1 (DIXDC1), Dual specificity phosphatase 6 (DUSP6), Pyruvate dehydrogenase kinase 4 (PDK4), C-X-C motif chemokine ligand 12 (CXCL12), Interferon regulatory factor 7 (IRF7), Integrin subunit alpha 7 (ITGA7), NIMA related kinase 2 (NEK2), and Nuclear receptor subfamily 3 group C member 1 (NR3C1) were selected as BC biomarkers using machine-learning algorithms. NEK2 was the most critical gene for diagnosis. Prospective drugs targeting NEK2 include etoposide and lukasunone. Conclusions: Our study identified DIXDC1, DUSP6, PDK4, CXCL12, IRF7, ITGA7, NEK2, and NR3C1 as potential diagnostic biomarkers for BC, with NEK2 having the highest potential to aid in diagnosis and prognosis in clinical settings.
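The step of matching DEGs to WGCNA module genes can be pictured as a filter followed by a set intersection. The Python sketch below is only a schematic under stated assumptions: the cutoffs (|log2FC| >= 1, adjusted p < 0.05), the helper name `select_degs`, and the placeholder gene identifiers are hypothetical and are not taken from the abstract, which does not report its thresholds.

```python
def select_degs(stats, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return genes whose |log2 fold change| and adjusted p-value pass the cutoffs.

    `stats` maps gene -> (log2 fold change, adjusted p-value).
    Both cutoffs are illustrative defaults, not values from the study.
    """
    return {
        gene
        for gene, (log2fc, padj) in stats.items()
        if abs(log2fc) >= lfc_cutoff and padj < padj_cutoff
    }


# Placeholder statistics and module membership (hypothetical, for illustration).
stats = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.7, 0.01),
    "GENE_C": (0.4, 0.0001),   # significant but small effect: filtered out
    "GENE_D": (-2.0, 0.2),     # large effect but not significant: filtered out
}
module_genes = {"GENE_B", "GENE_C", "GENE_E"}

degs = select_degs(stats)
candidates = sorted(degs & module_genes)  # genes supported by both analyses
print(candidates)
```

Requiring support from both analyses trades sensitivity for specificity, which is why the intersection (76 genes in the abstract) is much smaller than either input list.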
Project description: Bronchopulmonary dysplasia (BPD) is often seen as a pulmonary complication of extreme preterm birth, resulting in persistent respiratory symptoms and diminished lung function. Unfortunately, current diagnostic and treatment options for this condition are insufficient. Hence, this study aimed to identify potential biomarkers in the peripheral blood of neonates affected by BPD. The expression dataset GSE32472 for BPD was obtained from the Gene Expression Omnibus. We first identified differentially expressed genes (DEGs) in GSE32472. Subsequently, we conducted gene set enrichment analysis on the DEGs and employed weighted gene co-expression network analysis (WGCNA) to screen the modules most relevant to BPD. We then mapped the DEGs to the WGCNA module genes, resulting in a gene intersection. We conducted detailed functional enrichment analyses on these overlapping genes. To identify hub genes, we used 3 machine learning algorithms: SVM-RFE, LASSO, and Random Forest. We constructed a diagnostic nomogram model for predicting BPD based on the hub genes. Additionally, we carried out transcription factor analysis to predict the regulatory mechanisms and identify drugs associated with these biomarkers. Differential analysis yielded 470 DEGs, and WGCNA identified 1351 significant genes. The intersection of these 2 approaches yielded 273 common genes. Using machine learning algorithms, we identified CYYR1, GALNT14, and OLAH as potential biomarkers for BPD. Moreover, we predicted flunisolide, budesonide, and beclomethasone as potential anti-BPD drugs. The genes CYYR1, GALNT14, and OLAH have the potential to serve as diagnostic biomarkers for BPD. This may prove beneficial in clinical diagnosis and prevention of BPD.
Project description: Background: In recent years, research on the pathogenesis of systemic lupus erythematosus (SLE) has made great progress. However, the prognosis of the disease remains poor, and highly sensitive and accurate biomarkers are particularly important for the early diagnosis of SLE. Methods: SLE patient information was acquired from three Gene Expression Omnibus (GEO) datasets and used for differential gene expression analysis, weighted gene coexpression network analysis (WGCNA), and functional enrichment analysis. Subsequently, three algorithms, random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and least absolute shrinkage and selection operator (LASSO), were used to analyze the resulting key genes. Furthermore, the expression levels of the final core genes in peripheral blood from SLE patients were confirmed by real-time quantitative polymerase chain reaction (RT-qPCR) assay. Results: Five key genes (ABCB1, CD247, DSC1, KIR2DL3 and MX2) were found in this study. Moreover, these key genes had good reliability and validity, which were further confirmed by clinical samples from SLE patients. The receiver operating characteristic (ROC) curves of the five genes also revealed that they had critical roles in the pathogenesis of SLE. Conclusion: In summary, five key genes were obtained and validated through machine-learning analysis, offering a new perspective on the molecular mechanism of SLE and potential therapeutic targets for it.
Project description: Objective: Limited evidence is available on biomarkers to recognize systemic lupus erythematosus (SLE) patients at risk of developing erosive arthritis. Anti-citrullinated peptide antibodies (ACPA) have been widely investigated and identified in up to 50% of X-ray detected erosive arthritis; conversely, few studies have evaluated anti-carbamylated protein antibodies (anti-CarP). Here, we considered the application of machine learning models to identify relevant factors in the development of ultrasonography (US)-detected erosive damage in a large cohort of SLE patients with joint involvement. Methods: We enrolled consecutive SLE patients with arthritis/arthralgia. All patients underwent joint (DAS28, STR) and laboratory assessment (detection of ACPA, anti-CarP, Rheumatoid Factor, SLE-related antibodies). The bone surfaces of metacarpophalangeal and proximal interphalangeal joints were assessed by US: the presence of erosions was registered with a dichotomous value (0/1), obtaining a total score (0-20). Concerning machine learning techniques, we applied and compared Logistic Regression and Decision Trees in conjunction with the Forward Wrapper feature selection method. Results: We enrolled 120 SLE patients [M/F 8/112, median age 47.0 years (IQR 15.0); median disease duration 120.0 months (IQR 156.0)], 73.3% of them reporting at least one episode of arthritis. Erosive damage was identified in 25.8% of patients (mean±SD 0.7±1.6), all of them with clinically evident arthritis. We applied Logistic Regression in conjunction with the Forward Wrapper method, obtaining an AUC value of 0.806±0.02. As a result of the learning procedure, we evaluated the relevance of the different factors: this value was higher than 35% for ACPA and anti-CarP. Conclusion: The application of Machine Learning Models allowed the identification of factors associated with US-detected erosive bone damage in a large SLE cohort and of their relevance in determining this phenotype.
Although the scope of this study is limited by the small sample size and its cross-sectional nature, the results suggest the relevance of ACPA and anti-CarP antibodies in the development of erosive damage as also pointed out in other studies.
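A forward wrapper of the kind named in the Methods can be sketched as a greedy loop: start from an empty feature set, add whichever feature most improves a model-quality score, and stop when no addition helps. The Python sketch below is a generic illustration, not the authors' implementation. The scoring function stands in for something like the cross-validated AUC of a logistic regression fitted on the candidate subset; `forward_select`, `toy_score`, the feature names, and every number in `TOY_GAIN` are hypothetical.

```python
def forward_select(features, score_fn, min_gain=1e-9):
    """Greedy forward wrapper.

    Repeatedly adds the feature that most improves score_fn(subset) and
    stops as soon as the best available gain is not above min_gain.
    """
    selected = []
    best_score = score_fn(selected)
    remaining = list(features)
    while remaining:
        trials = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best_score <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical per-feature contributions; placeholders, not study results.
TOY_GAIN = {"feat_a": 0.20, "feat_b": 0.15, "feat_c": 0.02, "feat_d": 0.05}


def toy_score(subset):
    # Stand-in for a cross-validated AUC: additive gains minus a small
    # complexity penalty per feature, so weak features are not worth adding.
    return 0.5 + sum(TOY_GAIN[f] for f in subset) - 0.03 * len(subset)


selected, score = forward_select(list(TOY_GAIN), toy_score)
print(selected, round(score, 2))
```

In a real analysis `score_fn` would refit and cross-validate the classifier for every candidate subset, which is what makes wrapper methods more expensive than filter methods.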
Project description: Gastric cancer (GC) is among the cancers with the highest mortality worldwide, and the current survival rate is 30% even with the use of combination therapies. Recently, mounting evidence has indicated a potential role for miRNAs in the diagnosis and prognostic assessment of cancers. In state-of-the-art cancer research, machine learning (ML) has gained increasing attention as a way to find clinically useful biomarkers. The present study aimed to identify potential diagnostic and prognostic miRNAs in GC with the application of ML. Using the TCGA database and ML algorithms such as Support Vector Machine (SVM), Random Forest, k-NN, etc., a panel of 29 miRNAs was obtained. Among the ML algorithms, SVM was chosen (AUC: 88.5%, accuracy: 93% in GC). To find common molecular mechanisms of the miRNAs, their common gene targets were predicted using online databases such as miRWalk, miRDB, and TargetScan. Functional and enrichment analyses were performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein-protein interactions (PPI) were identified using the STRING database. Pathway analysis of the target genes revealed the involvement of several cancer-related pathways including miRNA-mediated inhibition of translation, regulation of gene expression by genetic imprinting, and the Wnt signaling pathway. Survival and ROC curve analysis showed that the expression levels of hsa-miR-21, hsa-miR-133a, hsa-miR-146b, and hsa-miR-29c were associated with higher mortality and potentially earlier detection of GC patients. A panel of dysregulated miRNAs that may serve as reliable biomarkers for gastric cancer was identified using machine learning, which represents a powerful tool in biomarker identification.
Project description: Purpose: This study aims to identify potential myopia biomarkers using machine learning algorithms, enhancing myopia diagnosis and prognosis prediction. Methods: The GSE112155 and GSE15163 datasets from the GEO database were analyzed. We used "limma" for differential expression analysis and "GOplot" and "clusterProfiler" for functional and pathway enrichment analyses. The LASSO and SVM-RFE algorithms were employed to screen myopia-related biomarkers, followed by ROC curve analysis for diagnostic performance evaluation. Single-gene GSEA enrichment analysis was executed using GSEA 4.1.0. Results: The functional analysis of differentially expressed genes indicated their role in carbohydrate generation and polysaccharide synthesis. We identified 23 differentially expressed genes associated with myopia, four of which were highly effective diagnostic biomarkers. Single-gene GSEA results showed that these genes control the ubiquitin-mediated protein hydrolysis pathway. Conclusion: Our study identifies four key myopia biomarkers, providing a foundation for future clinical and experimental validation studies.
Project description: Background: The genetic factors and pathogenesis of idiopathic dilated cardiomyopathy-induced heart failure (IDCM-HF) have not been understood thoroughly; there is a lack of specific diagnostic markers and treatment methods for the disease. Hence, we aimed to identify the mechanisms of action at the molecular level and potential molecular markers for this disease. Methods: Gene expression profiles of IDCM-HF and non-heart failure (NF) specimens were acquired from the Gene Expression Omnibus (GEO) database. We then identified the differentially expressed genes (DEGs) and analyzed their functions and related pathways by using "Metascape". Weighted gene co-expression network analysis (WGCNA) was utilized to search for key module genes. Candidate genes were identified by intersecting the key module genes identified via WGCNA with DEGs and further screened via the support vector machine-recursive feature elimination (SVM-RFE) method and the least absolute shrinkage and selection operator (LASSO) algorithm. Finally, the diagnostic efficacy of the biomarkers was evaluated by the area under the curve (AUC) value, and their differential expression in the IDCM-HF and NF groups was confirmed using an external database. Results: We detected 490 genes exhibiting differential expression between IDCM-HF and NF specimens from the GSE57338 dataset, most of them enriched in biological processes and pathways related to the extracellular matrix (ECM). After screening, 13 candidate genes were identified. Aquaporin 3 (AQP3) and cytochrome P450 2J2 (CYP2J2) showed high diagnostic efficacy in the GSE57338 and GSE6406 datasets, respectively. In comparison to the NF group, AQP3 was significantly down-regulated in the IDCM-HF group, while CYP2J2 was significantly up-regulated. Conclusion: As far as we know, this is the first study that combines WGCNA and machine learning algorithms to screen for potential biomarkers of IDCM-HF.
Our findings suggest that AQP3 and CYP2J2 could be used as novel diagnostic markers and treatment targets of IDCM-HF.
Project description: Patients with atrial fibrillation (AF) often have coronary artery disease (CAD), but the biological link between them remains unclear. This study aims to explore the common pathogenesis of AF and CAD and identify common biomarkers. Gene expression profiles for AF and stable CAD were downloaded from the Gene Expression Omnibus database. Overlapping genes related to both diseases were identified using weighted gene co-expression network analysis (WGCNA), followed by functional enrichment analysis. Hub genes were then identified using a machine learning algorithm. Immune cell infiltration and correlations with hub genes were explored, followed by drug predictions. Hub gene expression in AF and CAD patients was validated by real-time qPCR. We obtained 28 common overlapping genes in AF and stable CAD, mainly enriched in the PI3K-Akt, ECM-receptor interaction, and relaxin signaling pathways. Two hub genes, COL6A3 and FKBP10, were positively correlated with the abundance of MDSC, plasmacytoid dendritic cells, and regulatory T cells in AF and negatively correlated with the abundance of CD56dim natural killer cells in CAD. The AUCs of COL6A3 and FKBP10 were all above or close to 0.7. Drug prediction suggested that collagenase clostridium histolyticum and ocriplasmin, which target COL6A3, may be potential drugs for AF and stable CAD. Additionally, COL6A3 and FKBP10 were upregulated in patients with AF and CAD. COL6A3 and FKBP10 may be key biomarkers for AF and CAD, providing new insights into the diagnosis and treatment of these diseases.
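The reported correlations between hub gene expression and immune cell abundance can be illustrated with a rank correlation. The abstract does not say which coefficient was used, so Spearman's rho is an assumption here, as are the helper names and the toy values. The sketch computes rho as the Pearson correlation of average ranks, using only the Python standard library.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample values (not data from the study).
gene_expr = [5.1, 6.3, 4.8, 7.0, 6.1]
cell_abundance = [0.10, 0.22, 0.08, 0.30, 0.25]
rho = spearman(gene_expr, cell_abundance)
print(round(rho, 3))
```

A rank-based coefficient is a common choice for this kind of comparison because estimated cell fractions are often skewed, but it measures monotonic association only and says nothing about causation.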