Project description:Background: Modern neuropathology is challenged by an increasing number of clinically relevant CNS tumor subgroups that require assessment of a multitude of molecular markers for classification, as well as highly trained medical staff. Failure to meet this challenge leads to tumor misclassification, which can have severe consequences for affected patients. Methods: We compiled a cohort of genome-wide DNA methylation profiles of 2,682 tumors from 82 histologically and/or molecularly distinct CNS tumor classes across all ages and histologies that served as the reference for a Random Forest-based diagnostic classifier. This classifier was used to prospectively investigate a further 1,104 CNS tumor samples in order to determine its clinical utility. Results: The classifier reliably assigned tumor samples to a given diagnostic category with a misclassification rate of less than 2%. The system functioned robustly across laboratories and with different DNA methylation profiling techniques. Prospective application to clinical samples resulted in a reclassification of 12% of tumors compared with standard practice alone. A further 12% could not be classified by methylation profiling; this subset was highly enriched for unusual syndrome-associated tumors and likely novel entities. Conclusion: This study represents a proof of concept for the application of machine learning approaches in molecular diagnostics using a single, easy-to-use assay. The reference cohort and Random Forest-based classifier are available online as a community tool for improving precision in brain tumor diagnostics. We expect that approaches similar to the one presented here will rapidly restructure diagnostic practice in neurooncology and across tumor pathology.
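The three prospective outcomes described above (diagnosis confirmed, tumor reclassified, sample not classifiable) can be illustrated with a minimal sketch. It assumes that a forest's class score is the fraction of trees voting for that class and that a call is made only when the top score reaches a cutoff. The cutoff of 0.9, the class labels, and all function names are hypothetical; the abstract does not state how unclassifiable samples were defined.

```python
from collections import Counter


def vote_fractions(tree_votes):
    """Turn per-tree class votes into a score per class (fraction of trees)."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: n / total for label, n in counts.items()}


def call_with_reject(scores, threshold=0.9):
    """Return the top-scoring class, or None if its score is under the cutoff."""
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else None


def triage(tree_votes, standard_diagnosis, threshold=0.9):
    """Compare the thresholded call with the standard-practice diagnosis."""
    call = call_with_reject(vote_fractions(tree_votes), threshold)
    if call is None:
        return "unclassifiable"
    return "confirmed" if call == standard_diagnosis else "reclassified"


# Hypothetical samples: (votes from 10 trees, diagnosis by standard practice).
samples = [
    (["A"] * 10, "A"),            # unanimous vote matching the diagnosis
    (["B"] * 9 + ["A"], "A"),     # confident vote for a different class
    (["A"] * 5 + ["B"] * 5, "A"), # split vote, below the assumed cutoff
]
outcomes = Counter(triage(votes, dx) for votes, dx in samples)
print(dict(outcomes))
```

A reject option of this kind trades coverage for reliability: raising the cutoff leaves more samples uncalled but lowers the chance of a confident wrong assignment.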
Project description:Background: We have recently constructed a DNA methylation classifier that can discriminate between pancreatic ductal adenocarcinoma (PAAD) liver metastasis and intrahepatic cholangiocarcinoma (iCCA) with high accuracy (PAAD-iCCA-Classifier). PAAD is one of the leading causes of cancer of unknown primary, and its diagnosis is based on exclusion of other malignancies. We therefore investigated whether the PAAD-iCCA-Classifier can be used to diagnose PAAD metastases from other sites. Methods: To this end, the anomaly detection filter of the initial classifier was expanded by 8 additional mimicker carcinomas, amounting to a total of 10 carcinomas in the negative class. We validated the updated version of the classifier on a validation set, which consisted of a biological cohort (n = 3579) and a technical cohort (n = 15). We then assessed the performance of the classifier on a test set, which included a positive control cohort of 16 PAAD metastases from various sites and a cohort of 124 negative control samples consisting of 96 breast cancer metastases from 18 anatomical sites and 28 carcinoma metastases to the brain. Results: The updated PAAD-iCCA-Classifier achieved 98.21% accuracy on the biological validation samples and 100% on the technical validation samples. The classifier also correctly identified 15/16 (93.75%) metastases of the positive control cohort as PAAD, and it correctly classified 122/124 (98.39%) negative control samples, for a 97.85% overall accuracy on the test set. We used this DNA methylation dataset to explore the organotropism of PAAD metastases and observed that PAAD liver metastases are distinct from PAAD peritoneal carcinomatosis and primary PAAD, and are characterized by specific copy number alterations and hypomethylation of enhancers involved in epithelial-mesenchymal transition.
Conclusions: The updated PAAD-iCCA-Classifier (available at https://classifier.tgc-research.de/) can accurately classify PAAD samples from various metastatic sites and can serve as a diagnostic aid.
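The test-set figures can be checked with a short worked calculation. The sketch below pools the positive and negative control cohorts using only the counts given in the abstract; the helper name `accuracy` is illustrative. The pooled value is 137/140, about 97.86% when rounded to two decimals, which corresponds to the reported 97.85% if the figure was truncated rather than rounded.

```python
def accuracy(correct, total):
    """Percentage of correctly classified samples."""
    return 100.0 * correct / total


# Counts taken from the abstract.
positive = accuracy(15, 16)            # PAAD metastases called as PAAD
negative = accuracy(122, 124)          # negative controls called as non-PAAD
pooled = accuracy(15 + 122, 16 + 124)  # whole test set, 137 of 140

print(f"positive control: {positive:.2f}%")  # 93.75%
print(f"negative control: {negative:.2f}%")  # 98.39%
print(f"pooled test set:  {pooled:.2f}%")    # 97.86%
```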
Project description:The 2021 WHO Classification of Tumors of the Central Nervous System includes several tumor types and subtypes for which the diagnosis relies at least partially on whole-genome methylation profiling. The current approach to array-based DNA methylation profiling uses a reference library of tumor DNA methylation data and a machine learning-based tumor classifier. This approach was pioneered and popularized by the German Cancer Research Center (DKFZ) and University Hospital Heidelberg. This research group has kindly made their classifier for central nervous system tumors freely available as a research tool via a web-based portal. However, this classifier is not maintained in a clinical testing environment. Therefore, we validated our own DNA methylation-based classifier of central nervous system tumors, using the same training and validation datasets as the DKFZ group. In addition, we validated samples tested in our own laboratory and compared the performance of both classifiers. On the validation data set, our classifier showed high concordance with the DKFZ classifier (92%) and comparable accuracy (specificity 94.0% vs. 84.9% for DKFZ, sensitivity 88.6% vs. 94.7% for DKFZ). Receiver operating characteristic curves showed areas under the curve of 0.964 vs. 0.966 for the NM and DKFZ classifiers, respectively. Our classifier performed comparably well with samples tested in our own laboratory and is currently offered for clinical testing.
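The comparison metrics named above can be sketched as follows. The abstract does not say how sensitivity and specificity were defined for a multi-class classifier, so this example assumes a binary framing in which each sample is either truly positive or negative. All data values and function names are hypothetical, and the area under the curve is computed with the pairwise (rank-based) definition rather than any particular library routine.

```python
def concordance(calls_a, calls_b):
    """Fraction of samples on which two classifiers give the same call."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both classifiers must score the same samples")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


def sensitivity_specificity(truth, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def auc(positive_scores, negative_scores):
    """Probability that a positive outscores a negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical toy data, not taken from the study.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
agreement = concordance(["A", "B", "C", "A"], ["A", "B", "A", "A"])
area = auc([0.9, 0.8, 0.4], [0.7, 0.3])
print(sens, spec, agreement, round(area, 3))
```

Reporting sensitivity and specificity together with the area under the curve is useful here because the two classifiers trade off differently at their operating points while having nearly identical overall discrimination.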
Project description:Environmental pollution is a worldwide problem, and metals are the largest group of contaminants in soil. Microarray toxicogenomic studies with ecologically relevant organisms such as springtails supplement traditional ecotoxicological research, but are presently rather descriptive. Classifier analysis, a more analytical application of the microarray technique, can predict the biological classes of unknown samples. We used the uncorrelated shrunken centroid (USC) method to classify gene expression profiles of the springtail Folsomia candida exposed to soil spiked with six different metals (barium, cadmium, cobalt, chromium, lead, and zinc). We identified a gene set (classifier) of 188 genes that can discriminate between the six metals present in soil, which allowed us to predict the correct classes for samples of an independent test set with an accuracy of 83% (error rate = 0.17). This study further shows that applying classifier analysis to actual contaminated field soil samples requires more insight into the transcriptional responses of soil organisms to different soil types (properties) and mixtures of contaminants. Gene expression was measured in springtails after 2 days of exposure to soil containing either the EC10 or the EC50 of 6 different metals. The exposure experiment was performed in two separate series (1 and 2), each with its own non-spiked (LUFA 2.2) soil control. In addition, two field soil samples were tested. The samples were divided into a training set and a validation set for USC classifier analysis.
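The idea behind centroid-based gene selection can be sketched in a few lines. The example below is a simplified relative of shrunken centroid methods, not the USC method used in the study: it soft-thresholds each class centroid toward the overall centroid and classifies by nearest centroid, but it omits the per-gene standardization and the removal of correlated genes. The expression values, the shrinkage amount `delta`, and all function names are hypothetical.

```python
def soft_threshold(x, delta):
    """Shrink x toward zero by delta; values within delta of zero become 0."""
    if x > delta:
        return x - delta
    if x < -delta:
        return x + delta
    return 0.0


def mean_profile(profiles):
    """Per-gene mean over a list of expression profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def fit_shrunken_centroids(training, delta):
    """training maps class label -> list of profiles (lists of floats)."""
    pooled = [p for profiles in training.values() for p in profiles]
    overall = mean_profile(pooled)
    centroids = {}
    for label, profiles in training.items():
        class_mean = mean_profile(profiles)
        centroids[label] = [
            o + soft_threshold(m - o, delta) for m, o in zip(class_mean, overall)
        ]
    return overall, centroids


def informative_genes(overall, centroids):
    """Genes whose shrunken centroid still differs from the overall mean."""
    return [
        i for i, o in enumerate(overall)
        if any(c[i] != o for c in centroids.values())
    ]


def predict(profile, centroids):
    """Assign the class whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(profile, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical three-gene profiles for two of the six metals.
training = {
    "cadmium": [[5.0, 1.0, 2.0], [5.2, 1.1, 2.1]],
    "zinc": [[1.0, 1.0, 6.0], [1.2, 0.9, 6.2]],
}
overall, centroids = fit_shrunken_centroids(training, delta=0.5)
print(informative_genes(overall, centroids))  # gene 1 is shrunk away
print(predict([4.8, 1.0, 2.4], centroids))
```

Genes whose class means barely differ from the overall mean are shrunk to it and drop out of the classifier, which is how methods of this family arrive at a compact discriminating gene set.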