Project description: This paper identifies prognostic factors for survival in patients with acute myeloid leukemia (AML) using machine learning techniques. We integrated machine learning with feature selection methods and compared their performance to identify the most suitable factors for assessing the survival of AML patients. Six data mining algorithms, namely Decision Tree, Random Forest, Logistic Regression, Naive Bayes, W-Bayes Net, and Gradient Boosted Tree (GBT), were employed for the detection model and implemented using the common data mining tool RapidMiner and open-source R package. To improve the predictive ability of our model, a set of features was selected by employing multiple feature selection methods. The accuracy of classification was obtained using 10-fold cross-validation for the various combinations of feature selection methods and machine learning algorithms. The performance of the models was assessed by various measurement indexes including accuracy, kappa, sensitivity, specificity, positive predictive value, negative predictive value, and area under the ROC curve (AUC). Our results showed that GBT combined with feature selection via the Relief algorithm, with an accuracy of 85.17% and an AUC of 0.930, has the best performance in predicting the survival rate of AML patients.
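The measurement indexes listed above (except AUC) can all be derived from a 2x2 confusion matrix. The following is a minimal sketch rather than the authors' RapidMiner or R workflow; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification indexes from a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Illustrative counts only (not data from the paper).
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a 10-fold cross-validation setting, these values would typically be computed per fold, or on the pooled out-of-fold predictions, and then summarized.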
Project description: Background: Acute myeloid leukemia (AML) is a heterogeneous clonal disease that prevents normal myeloid differentiation with its common features. Its incidence increases with age, and it has a poor prognosis. Studies have shown that DNA methylation and abnormal gene expression are closely related to AML. Methods: The methylation array data and mRNA array data were obtained from the Gene Expression Omnibus (GEO) database. Using the GEO data, we identified differential genes between tumor and normal samples. We then performed Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses on these differential genes. Protein-protein interaction (PPI) network construction and module analysis were performed to screen the highest-scoring modules. Next, we used SurvExpress software to analyze the genes in the highest-scoring module and selected potential prognostic genes by univariate and multivariate Cox analysis. Finally, the three genes screened by SurvExpress were analyzed using the methylation analysis site MethSurv to explore AML-associated methylation biomarkers. Results: We found three genes that can be used as independent prognostic factors for AML. These three genes are the low expression/methylation genes ATP11A and ITGAM, and the high expression/low methylation gene ZNRF2. Conclusions: In this study, we performed a comprehensive analysis of DNA methylation and gene expression to identify key epigenetic genes in AML.
Project description: The role of Notch signaling in acute myeloid leukemia (AML) is still under investigation. We have previously shown that high levels of Notch receptors and ligands could interfere with drug response. In this study, the protein expression of 79 AML blast samples collected from newly diagnosed patients was examined through flow cytometry. Gamma-secretase inhibitors were used in AML mouse xenograft models to evaluate the contribution of Notch pharmacological inhibition to mouse survival. We used univariate analysis to test the correlation and/or association between protein expression and well-known prognostic markers. All four receptors (Notch1-4) and some ligands (Jagged2, DLL-3) were highly expressed in less mature subtypes (M0-M1). Notch3, Notch4, and Jagged2 were overexpressed in the adverse cytogenetic risk group compared to good cytogenetic risk patients. Chi-square analysis revealed a positive association between the complete remission rate after induction therapy and weak expression of Notch2 and Notch3. We also found an association between low levels of Notch4 and Jagged2 and three-year remission following allogeneic stem cell transplantation (HSCT). Accordingly, Kaplan-Meier analysis showed improved OS for patients lacking significant expression of Notch4, Jagged2, and DLL3. In vivo experiments in an AML mouse model highlighted both improved survival and a significant reduction of leukemia cell burden in the bone marrow of mice treated with the combination of Notch pan-inhibitors (GSIs) plus chemotherapy (Ara-C). Our results suggest that Notch can be useful as a prognostic marker and therapeutic target in AML.
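The Kaplan-Meier analysis mentioned above estimates a survival curve from follow-up times that may be censored. The following is a minimal sketch of the product-limit estimator, not the software used in the study; the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Toy data (hypothetical): five patients, two of them censored.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Comparing two such curves (for example, patients with and without significant expression of a marker) would additionally require a test such as the log-rank test, which is not shown here.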
Project description: Background: Although it is well known that adult and pediatric acute myeloid leukemias (AMLs) are genetically distinct diseases, they still share certain gene expression profiles. The age-related genetic heterogeneities of AMLs have been well studied, but the common prognostic signatures and molecular mechanisms of adult and pediatric AMLs are less investigated. Aim: To identify genes and pathways that are associated with both pediatric and adult AMLs and discover a gene signature for overall survival (OS) prediction. Methods: Through mining the transcriptome profiles of The Cancer Genome Atlas (TCGA) data sets of adult cancers and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data of pediatric cancers, we identified genes that are commonly dysregulated in both pediatric and adult AMLs, further discovered a common gene signature, and built two risk score models for the TCGA and TARGET cohorts, respectively, with L0-regularized global AUC (area under the receiver operating characteristic curve) summary maximization. Results: We identified 57 genes that are differentially expressed and prognostically significant in both adult and childhood AMLs. The top 4 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched with those 57 genes include transcriptional misregulation, focal adhesion, the PI3K-Akt signaling pathway, and signaling pathways regulating pluripotency of stem cells. We further identified a 6-gene signature comprising ADAMTS3, DNMT3B, NYNRIN, SORT1, ZFHX3, and ZG16B for risk prediction. We constructed a risk score model with one dataset (either TCGA or TARGET) and evaluated its performance with the other.
The test AUCs for the risk prediction of TCGA data with 2-year and 5-year OS cutoffs are 0.762 (P = 2.33e-13, 95% CI: 0.69-0.83) and 0.759 (P = 7.26e-08, 95% CI: 0.66-0.85), respectively, while the test AUCs of TARGET data with the same cutoffs are 0.71 (P = 3.3e-07, 95% CI: 0.62-0.79) and 0.72 (P = 5.25e-09, 95% CI: 0.65-0.80), respectively. We further stratified patients into 3 equal-sized prognostic subtypes with the 6-gene risk scores. The P-values of the tertile partitions are 1.74e-07 and 3.28e-08 for the TARGET and TCGA cohorts, respectively, which are significantly better than the standard cytogenetic risk stratification of both cohorts (TARGET: P = 1.64e-06; TCGA: P = 1.79e-05). When validated with two other independent cohorts, the 6-gene risk score models remained significant predictors of OS. Investigating the common gene expression program is significant in that we may extrapolate the findings from adults to children and avoid unnecessary pediatric clinical trials.
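The tertile stratification described above splits patients into three groups of (near-)equal size by ranked risk score. A minimal sketch of that partitioning step follows; the function name `tertile_groups`, the group labels, and the example scores are hypothetical and are not taken from the TCGA or TARGET cohorts.

```python
def tertile_groups(risk_scores):
    """Assign each patient to one of three rank-based groups.

    Patients are ranked by risk score; the lowest third is labeled "low",
    the middle third "intermediate", and the highest third "high".
    """
    n = len(risk_scores)
    labels = ("low", "intermediate", "high")
    # Indices of patients ordered from lowest to highest risk score.
    order = sorted(range(n), key=lambda i: risk_scores[i])
    groups = [None] * n
    for rank, idx in enumerate(order):
        # rank * 3 // n maps ranks 0..n-1 onto 0, 1, 2 in equal blocks.
        groups[idx] = labels[rank * 3 // n]
    return groups


# Hypothetical risk scores for six patients.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 1.2]
groups = tertile_groups(scores)
print(groups)
```

The survival difference between the resulting groups would then be assessed with a survival test (the abstract reports P-values for the tertile partitions), which this sketch does not implement.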
Project description: Our objective is to develop a prognostic model focused on cuproptosis, aimed at predicting overall survival (OS) outcomes among acute myeloid leukemia (AML) patients. The model utilized machine learning algorithms incorporating stacking. The GSE37642 dataset was used as the training data, and the GSE12417 and TCGA-LAML cohorts were used as the validation data. Stacking was used to merge the three prediction models, and a random survival forest algorithm was subsequently used to refit the final model using the stacking linear predictor and clinical factors. The prediction model, featuring the stacking linear predictor and clinical factors, achieved AUC values of 0.840, 0.876 and 0.892 at 1, 2 and 3 years within the GSE37642 dataset. In the external validation dataset, the corresponding AUCs were 0.741, 0.754 and 0.783. The predictive performance of the model in the external dataset surpassed that of a model that simply incorporated all predictors. Additionally, the final model exhibited good calibration accuracy. In conclusion, our findings indicate that the novel prediction model refines the prognostic prediction for AML patients, while the stacking strategy displays potential for model integration.
Project description: Background: Acute myeloid leukemia (AML) is one of the most common hematologic malignancies, with a poor prognosis and a high recurrence rate. The discovery of new predictive models and therapeutic agents plays a crucial role. Methods: Differentially expressed genes that were specifically highly expressed in The Cancer Genome Atlas (TCGA) and GSE9476 transcriptome databases were screened and included in a least absolute shrinkage and selection operator (LASSO) regression model to derive risk coefficients and build a risk score model. Functional enrichment analysis was conducted on the screened hub genes to explore the potential mechanisms. Subsequently, critical genes were incorporated into a nomogram model based on risk scores to analyze prognostic value. Finally, this study combined network pharmacology to find potential natural compounds for hub genes and used molecular docking to verify the binding ability of molecular structures to natural compounds, in order to explore drug development for possible efficacy in AML. Results: A total of 33 highly expressed genes may be associated with poor prognosis of AML patients. After LASSO and multivariate Cox regression analysis of the 33 critical genes, Rho-related BTB domain containing 2 (RHOBTB2), phospholipase A2 (PLA2G4A), interleukin-2 receptor-α (IL2RA), cysteine and glycine-rich protein 1 (CSRP1), and olfactomedin-like 2A (OLFML2A) were found to play a significant role in the prognosis of AML patients. CSRP1 and OLFML2A were independent prognostic factors of AML. In the nomogram, the predictive power of these 5 hub genes in combination with clinical features was better than that of clinical data alone, with better predictive value at 1, 3, and 5 years.
Finally, through network pharmacology and molecular docking, this study found that diosgenin in Guadi docked well with PLA2G4A, beta-sitosterol in Fangji docked well with IL2RA, and OLFML2A docked well with 3,4-di-O-caffeoylquinic acid in Beiliujinu. Conclusions: The predictive model of RHOBTB2, PLA2G4A, IL2RA, CSRP1, and OLFML2A combined with clinical features can better guide the prognosis of AML. In addition, the stable docking of PLA2G4A, IL2RA, and OLFML2A with natural compounds may provide new options for treating AML.
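A risk score model of the kind described in this abstract is commonly a weighted sum of gene expression values, with the weights taken from the fitted regression coefficients. The sketch below assumes that form. The coefficient values are placeholders invented for illustration (the abstract does not report them), and the median split into high- and low-risk groups is likewise an assumption rather than the authors' stated cutoff.

```python
from statistics import median

# Placeholder coefficients (hypothetical, NOT the published values).
HYPOTHETICAL_COEFS = {
    "RHOBTB2": 0.12,
    "PLA2G4A": 0.08,
    "IL2RA": 0.15,
    "CSRP1": 0.20,
    "OLFML2A": 0.10,
}


def risk_score(expression, coefs):
    """Weighted sum of expression values: sum(coef_g * expr_g) over genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


# Hypothetical patients: every gene set to the same expression level.
levels = {"P1": 1.0, "P2": 2.0, "P3": 0.5, "P4": 3.0}
patients = {
    pid: {gene: level for gene in HYPOTHETICAL_COEFS}
    for pid, level in levels.items()
}

scores = {pid: risk_score(expr, HYPOTHETICAL_COEFS) for pid, expr in patients.items()}

# Assumed cutoff: patients above the cohort median are labeled high risk.
cutoff = median(scores.values())
risk_groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
print(scores)
print(risk_groups)
```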
Project description: Background: A growing number of studies have shown that miR-200 family (miR-200s) clusters are aberrantly expressed in multiple human cancers, and that miR-200s clusters function as tumor suppressor genes by affecting cell proliferation, self-renewal, differentiation, division and apoptosis. Herein, we aimed to investigate the expression and clinical implication of miR-200s clusters in acute myeloid leukemia (AML). Methods: RT-qPCR was performed to detect expression of miR-200s clusters in 19 healthy donors, 98 newly diagnosed AML patients, and 35 AML patients who achieved complete remission (CR). Results: Expression of the miR-200a/200b/429 cluster, but not the miR-200c/141 cluster, was decreased in newly diagnosed AML patients as compared to healthy donors and AML patients who achieved CR. Although no significant associations were observed between miR-200s clusters and most of the features, low expression of miR-200s clusters seems to be associated with higher white blood cell counts, especially for miR-200a/200b. Of the five members of the miR-200s clusters, low expression of miR-200b/429/200c was found to be associated with a lower CR rate. Logistic regression analysis further revealed that low expression of miR-429 acted as an independent risk factor for CR in AML. Based on Kaplan-Meier analysis, low expression of miR-200b/429/200c was associated with shorter OS, whereas miR-200a/141 showed a trend. Moreover, multivariate analysis of Cox regression models confirmed the independent prognostic value of miR-200b expression for OS in AML. Conclusions: Expression of the miR-200a/200b/429 cluster was frequently down-regulated in AML; low expression of miR-429 acted as an independent risk factor for CR, whereas low expression of miR-200b acted as an independent prognostic biomarker for OS.
Project description: Background: Neutrophil extracellular traps (NETs) play pivotal roles in various pathological processes. The formation of NETs is impaired in acute myeloid leukemia (AML), which can result in immunodeficiency and increased susceptibility to infection. Methods: The gene set variation analysis (GSVA) algorithm was employed for the calculation of NET score, while the consensus clustering algorithm was utilized to identify molecular subtypes. Weighted gene coexpression network analysis (WGCNA) revealed potential genes and biological pathways associated with NETs, and a total of 10 machine learning algorithms were applied to construct the optimal prognostic model. Results: Through the analysis of multiomics data, we identified two molecular subtypes with high and low NET scores. The low-NET score subgroup exhibited increased infiltration of immune effector cells. Conversely, the high-NET score subtype presented an abundance of monocytes and M2 macrophages, accompanied by elevated expression levels of immune checkpoint genes. These findings suggest that a pronounced immunosuppressive effect is associated with a significantly worse prognosis for this subtype. The optimal risk score model was selected by employing the C-index as the criterion on the basis of training 10 machine learning algorithms on 9 multicenter AML cohorts. Survival analysis confirmed that patients with high-risk scores had considerably poorer prognoses than those with lower scores. Receiver operating characteristic (ROC) curve and Cox regression analyses further validated the strong independent prognostic value of the risk score model. The nomogram, which was constructed by integrating the risk score model and clinicopathological factors, demonstrated high accuracy in predicting the overall survival of AML patients. Moreover, patients with refractory or chemotherapy-unresponsive AML had significantly higher risk scores.
By analyzing drug therapy data from in vitro AML cells, we identified a subset of drugs that demonstrated increased sensitivity in the high-risk score group. Additionally, patients with a high risk score were also predicted to exhibit a favorable response to anti-PD-1 therapy, suggesting that these individuals may derive greater benefits from immunotherapy. Conclusion: The NET-related signature, derived from a combination of diverse machine learning algorithms, has promising potential as a valuable tool for prognostic prediction, preventive measures, and personalized medicine in patients with AML.
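The C-index used above as the model selection criterion measures how often a model ranks patients correctly: among comparable patient pairs, the patient with the shorter observed survival should have the higher risk score. Below is a minimal sketch of Harrell's concordance index under that definition. The function name `concordance_index` and the toy data are assumptions, and ties in survival time are simply treated as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when patient i also has the higher risk score;
    tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so selecting among candidate algorithms amounts to choosing the one with the highest C-index on the evaluation cohorts.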
Project description: Machine learning (ML) is a useful tool for advancing our understanding of the patterns and significance of biomedical data. Given the growing trend in the application of ML techniques in precision medicine, here we present an ML technique which predicts the likelihood of complete remission (CR) in patients diagnosed with acute myeloid leukemia (AML). In this study, we explored the question of whether ML algorithms designed to analyze gene-expression patterns obtained through RNA sequencing (RNA-seq) can be used to accurately predict the likelihood of CR in pediatric AML patients who have received induction therapy. We employed tests of statistical significance to determine which genes were differentially expressed between the samples derived from patients who achieved CR after 2 courses of treatment and the samples taken from patients who did not benefit. We tuned classifier hyperparameters to optimize performance and used multiple methods to guide our feature selection as well as our assessment of algorithm performance. To identify the model which performed best within the context of this study, we plotted receiver operating characteristic (ROC) curves. Using the top 75 genes from the k-nearest neighbors algorithm (K-NN) model (K = 27) yielded the best area-under-the-curve (AUC) score that we obtained: 0.84. When we finally tested the previously unseen test data set, the top 50 genes yielded the best AUC = 0.81. Pathway enrichment analysis for these 50 genes showed that the guanosine diphosphate fucose (GDP-fucose) biosynthesis pathway is the most significant with an adjusted P value = .0092, which may suggest the vital role of N-glycosylation in AML.
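To illustrate how a K-NN classifier can be scored with ROC AUC, the sketch below uses the fraction of positive labels among the k nearest training samples as the predicted score and computes AUC as the probability that a positive case outranks a negative one. It is a toy example under stated assumptions: the two-dimensional points stand in for gene-expression vectors, k = 3 is used instead of the study's K = 27 because the toy set is tiny, and the function names `knn_score` and `roc_auc` are hypothetical.

```python
import math


def knn_score(train_x, train_y, query, k):
    """Fraction of positive labels among the k nearest training points."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    return sum(y for _, y in neighbors[:k]) / k


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy training set (hypothetical): label 1 = achieved CR, 0 = did not.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

# Held-out samples scored with k = 3 neighbors.
test_x = [(0.5, 0.5), (5.5, 5.5), (1, 1), (4, 4)]
test_y = [0, 1, 0, 1]
test_scores = [knn_score(train_x, train_y, q, k=3) for q in test_x]
auc = roc_auc(test_y, test_scores)
print(f"AUC: {auc:.2f}")
```

In practice, K and the number of selected genes would be tuned on training or validation data only, with the unseen test set reserved for the final AUC estimate, as the abstract describes.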
Project description: Despite the transformative impact of venetoclax-azacitidine in treating acute myeloid leukemia (AML), reliable markers for accurately predicting patient responses remain urgently needed. To address this challenge, we employed a multidisciplinary approach that combined transcriptomic profiling, ex vivo drug sensitivity testing, functional assays, and clinical data to identify robust predictors of venetoclax-azacitidine response. We pinpointed a set of core genes linked to both ex vivo and in vivo drug responsiveness, validated through CRISPR-Cas9 screens in the setting of both venetoclax and azacitidine therapies. In particular, silencing BCL2L1 and PINK1 preferentially enhanced response to the venetoclax-azacitidine treatment. Building on these insights, we further developed and validated an 8-gene random forest model (RF8) that demonstrated high specificity and sensitivity in four independent cohorts comprising 498 patients. This model was capable of distilling downstream effects of genetic alterations to assist in predicting treatment response and outperformed existing genetic mutation-based signatures. Furthermore, the RF8 score demonstrated a nearly monotonic relationship with venetoclax-azacitidine response probabilities and patient outcomes, enabling precise stratification of patients. These findings illustrated the feasibility of translating integrated transcriptomic and drug-response profiling data into more refined risk stratification approaches, offering a new avenue for optimizing clinical decision-making in AML.