Project description:Renal cell carcinoma (RCC) is diagnosed through expensive cross-sectional imaging, frequently followed by renal mass biopsy, which is not only invasive but also prone to sampling errors. Hence, there is a critical need for a noninvasive diagnostic assay. RCC exhibits altered cellular metabolism, and the tumor(s) lie in close proximity to the urine in the kidney, which suggests that urine metabolomic profiling is an excellent choice for assay development. Here, we acquired liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) data and then used machine learning (ML) to discover candidate metabolomic panels for RCC. The study cohort consisted of 105 RCC patients and 179 controls separated into two subcohorts: the model cohort and the test cohort. Univariate, wrapper, and embedded methods were used to select discriminatory features using the model cohort. Three ML techniques, each with different inductive biases, were used for training and hyperparameter tuning. RCC status prediction was evaluated on the test cohort using the selected biomarkers and the optimally tuned ML algorithms. A seven-metabolite panel predicted RCC in the test cohort with 88% accuracy, 94% sensitivity, 85% specificity, and 0.98 AUC. Metabolomics Workbench Study IDs are ST001705 and ST001706.
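The univariate selection step described above can be illustrated with a minimal sketch. It assumes Welch's t statistic as the ranking criterion, which is an assumption here: the description does not say which univariate test was used. The feature names (`met_A`, `met_B`, `met_C`) and intensities are hypothetical toy values, not study data. The point it illustrates is that features are ranked using the model cohort only, so the test cohort plays no part in selection.

```python
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / ((var_a / len(a) + var_b / len(b)) ** 0.5)

def rank_features(rows, labels, names):
    """Rank features by |t| between cases (label 1) and controls (label 0)."""
    scores = {}
    for j, name in enumerate(names):
        cases = [r[j] for r, y in zip(rows, labels) if y == 1]
        controls = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[name] = abs(welch_t(cases, controls))
    return sorted(names, key=lambda n: scores[n], reverse=True)

# Hypothetical model-cohort intensities (toy values, not from the study).
names = ["met_A", "met_B", "met_C"]
model_rows = [
    [5.1, 2.0, 1.0], [4.9, 2.1, 1.4], [5.3, 1.9, 0.9],  # RCC
    [3.0, 2.0, 1.1], [3.2, 2.2, 1.3], [2.9, 1.8, 1.0],  # controls
]
model_labels = [1, 1, 1, 0, 0, 0]

# Ranking uses the model cohort only; the test cohort is never touched here.
ranking = rank_features(model_rows, model_labels, names)
print(ranking)
```

In a real workflow the top-ranked features would then be passed to the wrapper or embedded methods, and the test cohort would be used only once for the final evaluation.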
Project description:Urine metabolomics profiling has potential for non-invasive RCC staging, in addition to providing metabolic insights into disease progression. In this study, we utilized liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), and machine learning (ML) for the discovery of urine metabolites associated with RCC progression. Two machine learning questions were posed in the study: binary classification into early RCC (stage I and II) versus advanced RCC (stage III and IV), and RCC tumor size estimation through regression analysis. A total of 82 RCC patients with known tumor size and metabolomic measurements were used for the regression task, and 70 RCC patients with complete tumor-nodes-metastasis (TNM) staging information were used for the classification task, both under ten-fold cross-validation. A voting ensemble regression model consisting of elastic net, ridge, and support vector regressor predicted RCC tumor size with an R² value of 0.58. A voting classifier model consisting of random forest, support vector machines, logistic regression, and adaptive boosting yielded an AUC of 0.96 and an accuracy of 87%. Metabolites identified as associated with RCC progression included 4-guanidinobutanoic acid, 7-aminomethyl-7-carbaguanine, 3-hydroxyanthranilic acid, lysyl-glycine, glycine, citrate, and pyruvate. Overall, we identified a urine metabolic phenotype associated with RCC stage, exploring the promise of a urine-based metabolomic assay for staging this disease.
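As a rough illustration of how a voting ensemble regressor and its R² score work, the sketch below averages the predictions of three base models and scores the result. The three base models are hypothetical stand-in functions, not fitted elastic net, ridge, or support vector regressors, and the data are toy values rather than study measurements.

```python
def voting_predict(models, x):
    """Average the predictions of all base models (unweighted voting)."""
    return sum(model(x) for model in models) / len(models)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical stand-ins for three already-fitted base regressors.
base_models = [
    lambda x: 2.0 * x,
    lambda x: 1.5 * x + 1.0,
    lambda x: 1.0 * x + 3.0,
]

# Toy inputs and targets (not study data).
xs = [1.0, 2.0, 3.0, 4.0]
y_true = [2.0, 4.0, 6.0, 8.0]

y_pred = [voting_predict(base_models, x) for x in xs]
r2 = r2_score(y_true, y_pred)
print(round(r2, 3))
```

An R² of 1 means the predictions explain all of the variance in the targets, while 0 means they do no better than always predicting the mean.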
Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.
Project description:Objective: To investigate the ability of ultrasomics to noninvasively predict epidermal growth factor receptor (EGFR) expression status in patients with hepatocellular carcinoma (HCC). Methods: A total of 198 HCC patients were included in the study (n = 138 in the training dataset and n = 60 in the test dataset). EGFR expression was detected by immunohistochemistry. Ultrasomics features were extracted from gray-scale ultrasound images. Intra-class correlation coefficient (ICC) screening, variance filtering, the mutual information method, and the extreme gradient boosting (XGBoost) embedding method were applied to select the best features. Five machine learning algorithms, random forest (RF), XGBoost, support vector machine (SVM), decision tree (DT), and logistic regression (LR), were used to construct clinical models, ultrasomics models, and clinical-ultrasomics combined models. Area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, decision curve analysis (DCA), and calibration curves were used to assess the predictive performance of the models. Results: Of the 198 patients, 100 had high EGFR expression and 98 had low EGFR expression. The RF ultrasomics model performed well: in the training and test datasets, respectively, the AUC was 0.929 (95% CI, 0.874-0.966) and 0.807 (95% CI, 0.684-0.897), the sensitivity was 0.843 and 0.767, the specificity was 0.857 and 0.800, and the accuracy was 0.850 and 0.783. Integrating ultrasomics features with clinical baseline characteristics improved predictive performance: for the RF combined model in the training and test datasets, respectively, the AUC was 0.937 (95% CI, 0.884-0.971) and 0.822 (95% CI, 0.702-0.909), the sensitivity was 0.857 and 0.833, the specificity was 0.857 and 0.800, and the accuracy was 0.857 and 0.817. Conclusion: The ultrasomics models and combined models built with the five machine learning algorithms can serve as efficient and noninvasive techniques for predicting EGFR expression status in HCC patients, and the ultrasomics model and combined model built with the RF classifier have the best predictive performance.
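To make the reported metrics concrete, here is a minimal sketch of how sensitivity, specificity, and accuracy follow from confusion-matrix counts. The counts are hypothetical: they assume the 60 test patients split evenly into 30 high-EGFR and 30 low-EGFR cases, which the description does not state. Under that assumption they reproduce the test-set figures reported above for the RF ultrasomics model.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts for a 60-patient test set, ASSUMING a 30/30 class split
# (the split is not given in the description).
tp, fn = 23, 7   # high-EGFR patients: correctly / incorrectly classified
tn, fp = 24, 6   # low-EGFR patients: correctly / incorrectly classified

sens, spec, acc = confusion_metrics(tp, fn, tn, fp)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```

Other class splits could produce similar rounded values, so these counts are only one consistent possibility.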
Project description:In this study we assessed the utility of a microarray to identify changes in gene expression predictive of health status by interrogating blood samples from California sea lions in rehabilitation. The dataset comprises 73 California sea lion blood samples (28 females and 45 males). Animals were divided into 4 groups based on preliminary diagnosis at the rehabilitation center: domoic acid toxicosis (n=33, DAT), Leptospirosis infection (n=24, Lepto), control (n=4, Healthy) and other diseases (n=12, Outgroup).
Project description:We developed a novel prediction model for recurrence and survival in patients with localized renal cell carcinoma (RCC) after surgery, together with a novel machine learning (ML) method to improve the accuracy of outcome prediction, using a large Asian nationwide dataset: the updated KOrean Renal Cell Carcinoma (KORCC) database, which covers 10,068 patients who underwent surgery for RCC. After data pre-processing, feature selection was performed with an elastic net. Nine variables for recurrence and 13 variables for survival were extracted from 206 variables. The synthetic minority oversampling technique (SMOTE) was applied to the training data set to address class imbalance. We applied most existing ML algorithms to evaluate performance. We also performed subgroup analysis according to histologic type. All prediction models achieved high accuracy (range, 0.77-0.94) and F1-score (range, 0.77-0.97). In an external validation set, high accuracy and F1-score were well maintained for both recurrence and survival. In subgroup analysis, prediction performance was also good in both the clear cell and non-clear cell RCC groups.
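The oversampling step mentioned above can be sketched as follows. This is a simplified illustration of the SMOTE idea: each synthetic sample is interpolated between a minority-class sample and one of its nearest minority-class neighbours. It is not the implementation used in the study, and the function name `smote_sketch`, its parameters, and the toy points are all hypothetical.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority-class training points (hypothetical, not KORCC data).
minority_train = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_points = smote_sketch(minority_train, n_new=6)
print(len(new_points))
```

Oversampling is applied to the training data only, as in the description above; applying it before the train/validation split would leak synthetic copies of training information into the evaluation data.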
Project description:Renal cell carcinoma (RCC) is the sixth most common cancer in men and is often asymptomatic, leading to incidental detection in advanced disease stages that are associated with aggressive histology and poorer outcomes. Various cancer biomarkers are found in urine samples from patients with RCC. In this study, we investigate the use of Attenuated Total Reflection-Fourier Transform Infrared Spectroscopy (ATR-FTIR) on dried urine samples for distinguishing RCC. We analyzed dried urine samples from 49 patients with RCC, confirmed by histopathology, and 39 healthy donors using ATR-FTIR spectroscopy. The vibrational bands of the dried urine were identified by comparing them with spectra from dried artificial urine, individual urine components, and dried artificial urine spiked with urine components. Urea dominated all spectra, but smaller intensity peaks, corresponding to creatinine, phosphate, and uric acid, were also identified. Statistically significant differences between the FTIR spectra of the two groups were obtained only for creatinine, with lower intensities for RCC cases. The discrimination of RCC was performed through Principal Component Analysis combined with Linear Discriminant Analysis (PCA-LDA) and Support Vector Machine (SVM). PCA-LDA, using only six principal components to avoid overfitting, achieved a higher discrimination accuracy (82%) than SVM (76%). Our results demonstrate the potential of urine ATR-FTIR combined with machine learning techniques for RCC discrimination. However, further studies, especially ones that include other urological diseases, must validate this approach.