Project description:Blood biomarkers for dementia have the potential to identify preclinical disease and improve participant selection for clinical trials. Machine learning is an efficient analytical strategy to simultaneously identify multiple candidate biomarkers for dementia. We aimed to identify important candidate blood biomarkers for dementia using three machine learning models. We included 1642 (mean 69 ± 6 yr, 53% women) dementia-free Framingham Offspring Cohort participants attending examination 7 who had available blood biomarker data. We developed three machine learning models, support vector machine (SVM), eXtreme gradient boosting of decision trees (XGB), and artificial neural network (ANN), to identify candidate biomarkers for incident dementia. Over a mean 12 ± 5 yr follow-up, 243 (14.8%) participants developed dementia. In multivariable models including all 38 available biomarkers, the XGB model demonstrated the strongest predictive accuracy for incident dementia (AUC 0.74 ± 0.01), followed by ANN (AUC 0.72 ± 0.01), and SVM (AUC 0.69 ± 0.01). Stepwise feature elimination by random sampling identified a subset of the nine most highly informative biomarkers. Machine learning models confined to these nine biomarkers showed improved model predictive accuracy for dementia (XGB, AUC 0.76 ± 0.01; ANN, AUC 0.75 ± 0.004; SVM, AUC 0.73 ± 0.01). A parsimonious panel of nine candidate biomarkers was identified which showed moderately good predictive accuracy for incident dementia, although our results require external validation.
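The AUC values quoted above can be read as the probability that a randomly chosen participant who developed dementia receives a higher model score than a randomly chosen participant who did not. The following is a minimal sketch of that rank-based calculation; the function name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = developed dementia, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of participants, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for larger cohorts.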
Project description:Colorectal cancer (CRC) is a leading cause of cancer deaths worldwide, and the identification of biomarkers can improve early detection and personalized treatment. In this study, RNA-seq data and gene chip data from TCGA and GEO were used to explore potential biomarkers for CRC. The SMOTE method was used to address class imbalance, and four feature selection algorithms (MCFS, Boruta, mRMR, and LightGBM) were used to select genes from the gene expression matrix. Four machine learning algorithms (SVM, XGBoost, RF, and kNN) were then employed to obtain the optimal number of genes for model construction. Through interpretable machine learning (IML), co-predictive networks were generated to identify rules and uncover underlying relationships among the selected genes. Survival analysis revealed that INHBA, FNBP1, PDE9A, HIST1H2BG, and CADM3 were significantly correlated with prognosis in CRC patients. In addition, the CIBERSORT algorithm was used to investigate the proportion of immune cells in CRC tissues, and gene mutation rates for the five selected biomarkers were explored. The biomarkers identified in this study have significant implications for the development of personalized therapies and could ultimately lead to improved clinical outcomes for CRC patients.
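SMOTE, mentioned above for handling class imbalance, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that core idea in plain Python; the function name `smote_oversample`, its parameters, and the toy expression values are assumptions for illustration and do not reproduce the study's pipeline.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample (excluding itself).
        others = [x for x in minority if x is not base]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-gene expression values for four minority-class samples.
toy_minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
toy_synthetic = smote_oversample(toy_minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies on the segment between them, which is why SMOTE is normally applied to the training split only.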
Project description:Radiomics extracts and mines a large number of medical imaging features quantifying tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure the unbiased evaluation of different machine-learning methods, publicly available implementations along with reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We found that the Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and the random forest classification method RF (RSD = 3.52%, AUC = 0.66 ± 0.03) had the highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the most dominant source of performance variation (34.21% of total variance). Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
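The classifier stability above is reported as an RSD. Assuming this denotes the usual relative standard deviation (standard deviation of the AUC values across perturbed data sets divided by their mean, in percent), it could be computed as in the sketch below. The use of the sample standard deviation and the toy AUC values are assumptions; the abstract does not state the exact definition used.

```python
import statistics


def relative_standard_deviation(values):
    """RSD in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / abs(mean)


# Hypothetical AUC values from repeated resampling of one classifier.
toy_aucs = [0.64, 0.66, 0.68]
toy_rsd = relative_standard_deviation(toy_aucs)
```

A lower RSD means the classifier's AUC changes less under data perturbation, which is the sense in which the abstract calls a method stable.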
Project description:Background: Microglia play complex and crucial roles in multiple sclerosis (MS). This study aimed to explore the biological significance of microglia-associated genes in experimental autoimmune encephalomyelitis (EAE). Methods: Differentially expressed genes (DEGs) were screened with six machine learning (ML) methods, which were also utilized to validate the microglia-associated DEGs in three public databases. ceRNA and protein–protein interaction (PPI) network analyses were utilized to identify the interaction of the 6 novel biomarkers with other molecules. Then, CIBERSORT and single-sample gene set enrichment analysis (ssGSEA) were employed to quantify the relative abundance of each immune cell infiltration, respectively. qRT-PCR was performed to test the expression of key DEGs in murine models. Results: A total of 247 DEmRNAs, 499 DElncRNAs and 269 DEcircRNAs were identified. With a screening strategy of five ML algorithms, 6 DEmRNAs were obtained including NGP, HIST1H2BJ, PBLD1, MBLN3, CD180 and F10. Then the 6 DEmRNAs were used as a multigene signature to construct models to differentiate EAE from normal microglia, and the AUC value for each model was greater than 0.8. The diagnostic value of these 6 DEmRNAs was identified and further verified by qRT-PCR. Then, differential expression for five out of these 6 DEmRNAs, namely NGP, HIST1H2BJ, PBLD1, MBLN3, and F10, was confirmed. Using PPI analysis, DEmRNAs frequently interacting with transcription factors (TFs), potential drugs and RBPs were identified. With immune cell infiltration analyses, we found EAE microglia presented high levels of immune infiltration, especially Natural Killer (NK) cells and CD8+ T cells. We also report that a circRNA (circRNA_00638) was predicted to bind to 76 RBPs.
Conclusions: We identified and validated 6 novel microglia related genes and developed a multigene signature with ML methods to confirm their ability to accurately diagnose and characterize biological alterations in EAE microglia. The six key DEmRNAs might also be latent targets for immunoregulatory therapy.
Project description:Abstract Background Early diagnosis of liver metastasis is of great importance for enhancing the survival of colorectal adenocarcinoma (CAD) patients, and the combined use of a single biomarker in a classifier model has shown great improvement in predicting the metastasis of several types of cancers. However, this has rarely been reported for CAD. This study therefore aimed to screen an optimal classifier model of CAD with liver metastasis and explore the metastatic mechanisms of genes when applying this classifier model. Methods The differentially expressed genes between primary CAD samples and CAD with metastasis samples were screened from the Moffitt Cancer Center (MCC) dataset GSE131418. The classification performances of six selected algorithms, namely, LR, RF, SVM, GBDT, NN, and CatBoost, for classification of CAD with liver metastasis samples were compared using the MCC dataset GSE131418 by detecting their classification test accuracy. In addition, the consortium datasets of GSE131418 and GSE81558 were used as internal and external validation sets to screen the optimal method. Subsequently, functional analyses and a drug‐targeted network construction of the feature genes when applying the optimal method were conducted. Results The optimal CatBoost model, with the highest accuracy of 99% and an area under the curve of 1, was screened; it consisted of 33 feature genes. A functional analysis showed that the feature genes were closely associated with a “steroid metabolic process” and “lipoprotein particle receptor binding” (eg APOB and APOC3). In addition, the feature genes were significantly enriched in the “complement and coagulation cascade” pathways (eg FGA, F2, and F9). In a drug‐target interaction network, F2 and F9 were predicted as targets of menadione. Conclusion The CatBoost model constructed using 33 feature genes showed the optimal classification performance for identifying CAD with liver metastasis.
APOB, APOC3, FGA, F2, F9, and NKX2‐3 were potential biomarkers for classification of CAD with liver metastasis. Menadione might be a promising anti‐metastatic drug for CAD cells, acting on F2 and F9.
Project description:Cystic echinococcosis (CE) is a chronic parasitic disease characterized by slow progression and non-specific clinical symptoms, often leading to delayed diagnosis and treatment. Early and precise diagnosis is crucial for effective treatment, particularly considering the five stages of CE as outlined by the World Health Organization (WHO). This study explores the development of an advanced system that leverages artificial intelligence (AI) and machine learning (ML) techniques to classify CE cysts into stages using various imaging modalities, including computed tomography (CT), ultrasound (US), and magnetic resonance imaging (MRI). A total of ten ML algorithms were evaluated across these datasets, using performance metrics such as accuracy, precision, recall (sensitivity), specificity, and F1 score. These metrics offer diverse criteria for assessing model performance. To address this, we propose a normalization and scoring technique that consolidates all metrics into a final score, allowing for the identification of the best model that meets the desired criteria for CE cyst classification. The experimental results demonstrate that hybrid models, such as CNN+ResNet and Inception+ResNet, consistently outperformed other models across all three datasets. Specifically, CNN+ResNet, selected as the best model, achieved 97.55% accuracy on CT images, 93.99% accuracy on US images, and 100% accuracy on MRI images. This research underscores the potential of hybrid and pre-trained models in advancing medical image classification, providing a promising approach to improving the differential diagnosis of CE disease.
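The abstract above does not spell out its normalization and scoring technique. One plausible reading, sketched here purely as an assumption, is to min-max normalise each metric across the candidate models and then average the normalised values into a single score per model. The function name `consolidated_scores`, the equal weighting, and the toy metric values are all hypothetical.

```python
def consolidated_scores(metrics_by_model):
    """Min-max normalise each metric across models, then average per model."""
    models = list(metrics_by_model)
    metric_names = list(next(iter(metrics_by_model.values())))
    totals = {m: 0.0 for m in models}
    for name in metric_names:
        column = [metrics_by_model[m][name] for m in models]
        low, high = min(column), max(column)
        span = high - low
        for m in models:
            # If all models tie on a metric, give each the full score of 1.0.
            if span == 0:
                normalised = 1.0
            else:
                normalised = (metrics_by_model[m][name] - low) / span
            totals[m] += normalised
    return {m: totals[m] / len(metric_names) for m in models}


# Hypothetical metric values for three placeholder models.
toy_metrics = {
    "model_a": {"accuracy": 0.90, "precision": 0.88, "recall": 0.92},
    "model_b": {"accuracy": 0.95, "precision": 0.93, "recall": 0.90},
    "model_c": {"accuracy": 0.85, "precision": 0.90, "recall": 0.94},
}
toy_final = consolidated_scores(toy_metrics)
best_model = max(toy_final, key=toy_final.get)
```

Equal weighting treats every metric as equally important; a weighted average would be the natural variant if, for example, sensitivity mattered more than specificity for a given clinical use.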
Project description:Small RNA libraries made from the rdr2/mop1 maize mutant, using RNA extracted from young ears (3 to 5 cm in length). Four libraries were constructed representing biological replicates. The mop1 mutant is depleted for heterochromatic siRNAs and enriched for 22-nt siRNAs of unknown function (as shown by Nobuta et al., 2008). These libraries were made in ~2010 but used for this study to assess the properties of the 22-nt, mop1-independent siRNAs.
Project description:Abundant accumulation of digital histopathological images has led to the increased demand for their analysis, such as computer-aided diagnosis using machine learning techniques. However, digital pathological images and related tasks have some issues to be considered. In this mini-review, we introduce the application of digital pathological image analysis using machine learning algorithms, address some problems specific to such analysis, and propose possible solutions.
Project description:The knowledge that metastasis is the primary cause of cancer-related deaths has incentivized research directed towards unraveling the complex cellular processes that drive metastasis. Advancement in technology, specifically the advent of high-throughput sequencing, provides knowledge of such processes. This knowledge led to the development of therapeutic and clinical applications, and is now being used to predict the onset of metastasis to improve diagnostics and disease therapies. In this regard, predicting metastasis onset has also been explored using artificial intelligence approaches that are machine learning-based and, more recently, deep learning-based. This review summarizes the different machine learning and deep learning-based metastasis prediction methods developed to date. We also detail the different types of molecular data used to build the models and the critical signatures derived from the different methods. We further highlight the challenges associated with using machine learning and deep learning methods, and provide suggestions to improve the predictive performance of such methods.
Project description:COVID-19, a severe respiratory disease caused by a new type of coronavirus, SARS-CoV-2, has been spreading all over the world. Patients infected with SARS-CoV-2 may have no pathogenic symptoms, i.e., presymptomatic patients and asymptomatic patients. Both types of patients could further spread the virus to other susceptible people, thereby making the control of COVID-19 difficult. The two major challenges for COVID-19 diagnosis at present are as follows: (1) patients could share similar symptoms with other respiratory infections, and (2) patients may not have any symptoms but could still spread the virus. Therefore, new biomarkers at different omics levels are required for the large-scale screening and diagnosis of COVID-19. Although some initial analyses could identify a group of candidate gene biomarkers for COVID-19, the previous work still could not identify biomarkers suitable for clinical use in COVID-19, which requires disease-specific diagnosis that distinguishes it from multiple other infectious diseases. As an extension of the previous study, optimized machine learning models were applied in the present study to identify some specific qualitative host biomarkers associated with COVID-19 infection on the basis of a publicly released transcriptomic dataset, which included healthy controls and patients with bacterial infection, influenza, COVID-19, and other kinds of coronavirus. This dataset was first analysed by the Boruta and Max-Relevance and Min-Redundancy feature selection methods one by one, resulting in a feature list. This list was fed into the incremental feature selection method, incorporating one of the classification algorithms to extract essential biomarkers and build efficient classifiers and classification rules. The capacity of these findings to distinguish COVID-19 from other similar respiratory infectious diseases at the transcriptomic level was also validated, which may improve the efficacy and accuracy of COVID-19 diagnosis.
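Incremental feature selection, as described above, takes an already ranked feature list, evaluates a classifier on the top 1, top 2, ... top k features, and keeps the prefix with the best performance. The sketch below captures that loop in a generic form; the `evaluate` callback, the toy scoring rule, and the gene names are hypothetical stand-ins for a real cross-validated classifier and the study's transcriptomic features.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score every top-k prefix of a ranked feature list and keep the best one."""
    best_subset, best_score = [], float("-inf")
    history = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        history.append((k, score))
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_subset, best_score = list(subset), score
    return best_subset, best_score, history


# Hypothetical evaluator: stands in for cross-validated classifier performance.
def toy_evaluate(subset):
    informative = {"gene_a", "gene_b", "gene_c"}
    hits = len(informative.intersection(subset))
    noise = len(subset) - hits
    return hits - 0.1 * noise


toy_ranked = ["gene_a", "gene_b", "gene_x", "gene_c", "gene_y"]
toy_subset, toy_score, toy_history = incremental_feature_selection(
    toy_ranked, toy_evaluate
)
```

In practice `evaluate` would train and cross-validate the chosen classifier on the given feature subset, so the quality of the final panel depends on both the upstream ranking and that evaluation protocol.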