Project description:Recent progress in unbiased metagenomic next-generation sequencing (mNGS) allows microbial and host genetic material to be examined simultaneously in a single test. Leveraging affordable bronchoalveolar lavage fluid (BALF) mNGS data, we used machine learning to build a diagnostic approach that distinguishes lung cancer from pulmonary infections, two conditions prone to misdiagnosis in clinical settings. This prospective study analyzed BALF-mNGS data from patients with lung cancer and patients with pulmonary infections, delineating differences in DNA/RNA microbial composition, bacteriophage abundance, and host responses, including gene expression, transposable element levels, immune cell composition, and tumor fraction derived from copy number variation (CNV). A host/microbe metagenomics-driven machine learning model integrating these metrics (Model VI) proved robust in distinguishing lung cancer from pulmonary infections, achieving an AUC of 0.87 (95% CI = 0.857-0.883), a sensitivity of 73.8%, and a specificity of 84.5% in the training cohort, and an AUC of 0.831 (95% CI = 0.819-0.843), a sensitivity of 67.1%, and a specificity of 94.4% in the validation cohort. A composite predictive model based on a rule-in/rule-out strategy significantly improved accuracy (ACC) in distinguishing lung cancer from tuberculosis (ACC = 0.913), fungal infection (ACC = 0.955), and bacterial infection (ACC = 0.836). These findings highlight the potential of cost-effective mNGS-based analysis as a tool for early differentiation between lung cancer and pulmonary infections within a single comprehensive test.
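A rule-in/rule-out strategy can be illustrated with a minimal sketch. It assumes a primary model that outputs a lung-cancer score, two hypothetical cut-offs on that score, and a hypothetical secondary model that decides samples in the intermediate "grey zone"; the description above does not state the actual thresholds or how the component models are combined. Sensitivity, specificity, and accuracy are computed with lung cancer as the positive class.

```python
from typing import List, Sequence, Tuple


def composite_predict(
    primary: Sequence[float],
    secondary: Sequence[float],
    rule_in: float = 0.8,   # hypothetical: primary score >= this rules lung cancer IN
    rule_out: float = 0.2,  # hypothetical: primary score <= this rules lung cancer OUT
    cutoff: float = 0.5,    # hypothetical: secondary-model threshold for the grey zone
) -> List[int]:
    """Return 1 (lung cancer) or 0 (pulmonary infection) for each sample."""
    calls = []
    for p, s in zip(primary, secondary):
        if p >= rule_in:
            calls.append(1)  # confident positive from the primary model
        elif p <= rule_out:
            calls.append(0)  # confident negative from the primary model
        else:
            calls.append(1 if s >= cutoff else 0)  # defer to the secondary model
    return calls


def sens_spec_acc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy with 1 = lung cancer as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    n_pos = sum(t == 1 for t, _ in pairs)
    n_neg = len(pairs) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(pairs)


# Toy scores, invented for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0]
primary = [0.90, 0.50, 0.10, 0.15, 0.60, 0.85]
secondary = [0.20, 0.70, 0.90, 0.90, 0.30, 0.10]
calls = composite_predict(primary, secondary)
print(calls)  # [1, 1, 0, 0, 0, 1]
print(sens_spec_acc(y_true, calls))
```

Only the grey-zone samples reach the secondary model, which is what lets a composite trade a small amount of coverage for higher confidence at the extremes.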
2024-01-08 | GSE252118 | GEO
Project description:Comparative subtyping analysis between mNGS and mTGS
Project description:Whereas current cancer diagnostic tools focus on identifying markers in a limited range of cancer types (for example, the PAM50 kit, known primarily for subtyping breast cancer), our research reveals that 27 different cancers have unique glycosyltransferase (GT) fingerprints that can be detected with a single test, with 93.7% accuracy in external validation. Four models were built on the expression patterns of 71 GTs that are universally influential in inducing the formation of cancer cells and driving their metastasis. The first model differentiates cancer tissue from normal tissue, the second distinguishes cancer types, and the third and fourth differentiate subtypes within specific cancers, namely breast cancer and glioma. Data from The Cancer Genome Atlas (TCGA) were used to derive the models, while independent publicly available databases and newly collected patient samples were used for validation. On the external databases, the models produced results highly comparable to those achieved in internal testing. Furthermore, the breast cancer classifier distinguished breast cancer subtypes with an average accuracy of 81% in external testing, far exceeding the 43% accuracy of the industry-standard PAM50. Meanwhile, a separate prognostic model for predicting survival in glioma patients achieved high accuracy by focusing on four GT genes strongly associated with prognosis. Taken together, the models demonstrate the potential of focusing on GT genes to simplify cancer diagnostics.
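One way to read the four-model design is as a hierarchy: a tumor-versus-normal model, then a cancer-type model, then a subtype model where one exists. The sketch below assumes that reading and uses toy functions in place of the real GT-expression models; the function names, label strings, and decision rules are all hypothetical, and the description does not say how the models are actually chained.

```python
from typing import Callable, Dict, Optional, Sequence

Profile = Sequence[float]  # expression of the GT genes (ordering assumed)
Classifier = Callable[[Profile], str]


def hierarchical_diagnosis(
    profile: Profile,
    tumor_vs_normal: Classifier,
    cancer_type: Classifier,
    subtype_models: Dict[str, Classifier],
) -> Dict[str, Optional[str]]:
    """Apply the models in sequence, stopping when a later stage does not apply."""
    result: Dict[str, Optional[str]] = {
        "status": tumor_vs_normal(profile), "type": None, "subtype": None,
    }
    if result["status"] != "cancer":
        return result  # normal tissue: no typing or subtyping
    ctype = cancer_type(profile)
    result["type"] = ctype
    sub_model = subtype_models.get(ctype)  # only some cancer types have a subtype model
    if sub_model is not None:
        result["subtype"] = sub_model(profile)
    return result


# Toy stand-ins for trained models (hypothetical rules on a 3-value profile).
def toy_status(x: Profile) -> str:
    return "cancer" if x[0] > 1.0 else "normal"


def toy_type(x: Profile) -> str:
    return "breast" if x[1] > 0.5 else "glioma" if x[1] > 0.0 else "lung"


toy_subtypes: Dict[str, Classifier] = {
    "breast": lambda x: "subtype_1" if x[2] > 0.5 else "subtype_2",
    "glioma": lambda x: "subtype_1" if x[2] > 0.0 else "subtype_2",
}
print(hierarchical_diagnosis([2.0, 1.0, 0.9], toy_status, toy_type, toy_subtypes))
# {'status': 'cancer', 'type': 'breast', 'subtype': 'subtype_1'}
```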
Project description:Establishment and subsequent validation of an aCGH protocol for WGA (whole genome amplification) products originating from single cells or small amounts of starting material (e.g., microdissected FFPE tissue samples). Establishing the protocol involved testing three DNA labeling protocols. Two labeling protocols were designed specifically for Ampli1(TM) WGA products. Additionally, a random-primed isothermal (Klenow-based) labeling approach was tested (Möhlendick et al., PLoS One. 2013 Jun 25;8(6):e67031.). In addition, two different types of reference samples were tested, and a reference based on single-cell WGA products was ultimately chosen as the most suitable. The validation of the protocol assessed the following aspects: (1) performance of the protocol on primary and reamplified WGA products, (2) accuracy of the protocol in terms of sensitivity of CNA detection, (3) accuracy in terms of recapitulating complex patterns of CNAs, (4) accuracy in terms of quantitative assessment of CNAs, (5) ability to detect genomic heterogeneity among single cells (obtained either from in vitro cultures or from clinical patient material), and (6) ability to detect minimal regions of aberration within a panel of disseminated cancer cells and corresponding tumor tissues.
Project description:Alzheimer’s disease (AD) is the most common subtype of dementia, followed by vascular dementia (VaD) and dementia with Lewy bodies (DLB). Recently, microRNAs (miRNAs) have received considerable attention as novel biomarkers for dementia. Here, using serum miRNA expression data from 1,601 Japanese individuals, we investigated potential miRNA biomarkers and constructed subtype-specific risk prediction models based on a supervised principal component analysis (PCA) logistic regression method. On a validation cohort, the final risk prediction models achieved an accuracy of 0.873 for AD using 78 miRNAs, 0.836 for VaD using 86 miRNAs, and 0.825 for DLB using 110 miRNAs. To our knowledge, this is the first report applying miRNA-based risk prediction models to a prospective dementia cohort. Our study demonstrates that our models are effective in prospective disease risk prediction and, with further improvement, may contribute to practical clinical use in dementia.
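Supervised PCA logistic regression generally follows three steps: screen features by their univariate association with the outcome, run PCA on the screened features only, and fit a logistic regression on the leading components. The sketch below follows that generic recipe on simulated data; the screening score (absolute difference of standardized class means), the number of kept features and components, and the gradient-descent fit are assumptions, not the study's settings.

```python
import numpy as np


def fit_spca_logit(X, y, n_features=3, n_components=1, lr=0.1, n_iter=2000):
    """Supervised PCA + logistic regression (a generic sketch, not the study's exact method)."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd  # standardize each miRNA
    # 1) Supervised screening: keep features whose class means differ most
    #    (swapped-in score; the original screening statistic is not stated).
    score = np.abs(Z[y == 1].mean(axis=0) - Z[y == 0].mean(axis=0))
    keep = np.argsort(score)[::-1][:n_features]
    # 2) PCA on the screened features via SVD (columns of Z are already centered).
    _, _, Vt = np.linalg.svd(Z[:, keep], full_matrices=False)
    W = Vt[:n_components].T
    T = Z[:, keep] @ W  # component scores
    # 3) Logistic regression on the components, fitted by plain gradient descent.
    A = np.column_stack([np.ones(len(T)), T])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        beta -= lr * A.T @ (p - y) / len(y)
    return {"mu": mu, "sd": sd, "keep": keep, "W": W, "beta": beta}


def predict_proba(model, X):
    """Project new samples onto the fitted components and return P(case)."""
    Z = (X - model["mu"]) / model["sd"]
    T = Z[:, model["keep"]] @ model["W"]
    A = np.column_stack([np.ones(len(T)), T])
    return 1.0 / (1.0 + np.exp(-A @ model["beta"]))


# Simulated data (not study data): 200 samples, 20 "miRNAs", 3 of them shifted in cases.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 20))
X[:, :3] += 1.5 * y[:, None]
model = fit_spca_logit(X, y)
acc = np.mean((predict_proba(model, X) >= 0.5) == y)
print(sorted(model["keep"].tolist()), round(float(acc), 3))
```

Screening before PCA is the "supervised" part: plain PCA on all features could let high-variance but uninformative miRNAs dominate the components.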
Project description:For a number of clinical and biological reasons, accurate classification of non-small cell lung carcinoma (NSCLC) into adenocarcinoma (ADC) and squamous cell carcinoma (SCC) is essential. DNA-based tests, which are not currently used for this purpose, are more robust when applied to formalin-fixed paraffin-embedded (FFPE) tissues. To develop a molecular classification of NSCLC based on genome-wide copy number variations (CNVs), the corresponding TCGA, SPORE, and CANARY patient datasets were used as training and independent validation data. Signature genes were selected by advanced supervised classification algorithms and restricted to known important oncogenes and tumor suppressors, resulting in a final 27-gene signature that classified ADC versus SCC with accuracies of 0.85-0.87 on the SPORE validation sets and 0.96-0.98 on the CANARY validation sets. Even when only the top 7 genes of this signature were used, accuracies on the validation sets remained as high as 0.80 and 0.97, respectively. These signature genes also distinguished adenocarcinomas and squamous cell carcinomas from non-malignant lung samples with accuracies of 91-97%.
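Ranking genes and then classifying with a reduced panel can be sketched as follows. The description does not name the "advanced supervised classification algorithms", so this swaps in a simple stand-in: genes are ranked by the absolute Welch t-statistic of their copy-number values between ADC and SCC, and a nearest-centroid classifier is fitted on the top-ranked panel. Gene names and values are hypothetical toy data.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def rank_genes(X: Matrix, y: Sequence[str], genes: List[str]) -> List[str]:
    """Rank genes by |Welch t| of copy-number values between 'ADC' and 'SCC'."""
    scores: Dict[str, float] = {}
    for j, g in enumerate(genes):
        a = [row[j] for row, lab in zip(X, y) if lab == "ADC"]
        b = [row[j] for row, lab in zip(X, y) if lab == "SCC"]
        se = sqrt(variance(a) / len(a) + variance(b) / len(b)) or 1e-12
        scores[g] = abs(mean(a) - mean(b)) / se
    return sorted(genes, key=lambda g: scores[g], reverse=True)


def fit_centroids(
    X: Matrix, y: Sequence[str], genes: List[str], panel: List[str]
) -> Tuple[List[int], Dict[str, List[float]]]:
    """Per-class mean copy number over the selected gene panel."""
    idx = [genes.index(g) for g in panel]
    cents: Dict[str, List[float]] = {}
    for lab in sorted(set(y)):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [mean(r[i] for r in rows) for i in idx]
    return idx, cents


def predict(x: Sequence[float], idx: List[int], cents: Dict[str, List[float]]) -> str:
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    v = [x[i] for i in idx]
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, cents[lab])))


# Toy log2 copy-number ratios for hypothetical genes g1..g4 (not study data).
genes = ["g1", "g2", "g3", "g4"]
X = [
    [0.0, 0.1, 0.6, 0.0], [0.1, -0.1, 0.7, 0.1], [-0.1, 0.0, 0.5, -0.1],  # ADC
    [0.9, 0.8, 0.0, 0.1], [1.0, 0.7, 0.1, -0.1], [0.8, 0.9, -0.1, 0.0],   # SCC
]
y = ["ADC"] * 3 + ["SCC"] * 3
ranking = rank_genes(X, y, genes)
idx, cents = fit_centroids(X, y, genes, ranking[:2])  # reduced panel, like the top-7 subset
print(ranking)                                         # ['g1', 'g2', 'g3', 'g4']
print(predict([0.85, 0.75, 0.0, 0.0], idx, cents))     # SCC
```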
Project description:Small-cell lung cancer (SCLC) is an aggressive malignancy composed of distinct transcriptional subtypes, each with unique therapeutic vulnerabilities. Implementing subtyping in the clinic has remained challenging because of limited tissue availability, particularly for longitudinal monitoring. Given the known epigenetic regulation of critical SCLC transcriptional programs, we hypothesized that subtype-specific patterns of DNA methylation could be detected in tumor tissue or blood from SCLC patients. Using genome-wide reduced-representation bisulfite sequencing (RRBS) in two cohorts totaling 179 SCLC patients, together with machine learning approaches, we developed a highly accurate DNA methylation-based classifier (SCLC-DMC) that distinguished SCLC subtypes in clinical tumor samples with 95.8% accuracy in the testing set, using mRNA-based profiling as the reference. We further adapted the classifier to circulating cell-free DNA (cfDNA) to subtype SCLC from plasma. Using this cfDNA classifier (cfDMC), we demonstrated that SCLC phenotypes can evolve during disease progression, highlighting the need for longitudinal tracking of SCLC during clinical treatment. Furthermore, methylation-based subtyping predicted response to a wide variety of drugs in preclinical models, and clinical outcomes were indistinguishable between cohorts of patients subtyped using mRNA profiling and those subtyped using SCLC-DMC. These data establish that tumor and cfDNA methylation can be used to identify SCLC subtypes and guide precision SCLC therapy.
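Accuracy measured against mRNA-based subtypes is a concordance between two label sets. The sketch below computes that agreement, a pairwise confusion count, and Cohen's kappa (which discounts agreement expected by chance). The A/N/P/I subtype labels and the toy calls are assumptions for illustration; the description does not list the subtype names or report kappa.

```python
from collections import Counter
from typing import Sequence, Tuple


def concordance_and_kappa(reference: Sequence[str], predicted: Sequence[str]) -> Tuple[float, float]:
    """Overall agreement (accuracy vs. the reference) and Cohen's kappa."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Agreement expected by chance if the two label sets were independent.
    expected = sum(ref_counts[lab] * pred_counts[lab] for lab in ref_counts) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return observed, kappa


def confusion(reference: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Counts of (reference subtype, predicted subtype) pairs."""
    return Counter(zip(reference, predicted))


# Toy labels (not study data); A/N/P/I subtype names are an assumption.
mrna = ["A", "A", "N", "P", "I", "A"]
dmc = ["A", "A", "N", "P", "A", "A"]
acc, kappa = concordance_and_kappa(mrna, dmc)
print(round(acc, 3), round(kappa, 3))   # 0.833 0.727
print(confusion(mrna, dmc)[("I", "A")])  # 1
```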
Project description:Accuracy of sepsis prediction was estimated by cross-validation of gene expression data from 12 human spleen samples and 16 mouse spleen samples. For blood studies, classifiers were constructed from a training set of 26 microarrays. The error rate of the classifiers was estimated first on seven de-identified microarrays and then in a subsequent cross-validation over all 33 blood microarrays. Estimated classification accuracy for sepsis was 67.1% in human spleen, 96% in mouse spleen, and 94.4% in mouse blood (all estimates were based on nested cross-validation). Lists of genes with substantial expression changes between study and control groups were used to identify nine common mouse inflammatory response genes, six of which were mapped to a single pathway using contemporary pathway analysis tools. Keywords: genomics, diagnosis, microarray, calprotectin
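Nested cross-validation keeps model tuning inside the training folds: an inner loop chooses settings using only outer-training samples, and the untouched outer fold then scores the whole procedure. The sketch below is a minimal illustration, assuming a hypothetical tuning parameter (the number of top genes `m`), a hypothetical gene-ranking rule (absolute class-mean difference), and a nearest-centroid classifier; the description does not say which classifier or settings were used.

```python
from statistics import mean
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def folds(n: int, k: int) -> List[List[int]]:
    """Deterministic round-robin split of indices 0..n-1 into k folds."""
    return [list(range(i, n, k)) for i in range(k)]


def top_genes(X: Matrix, y: Sequence[int], rows: List[int], m: int) -> List[int]:
    """The m genes with the largest |class-mean difference|, computed on `rows` only."""
    def gap(j: int) -> float:
        a = [X[i][j] for i in rows if y[i] == 1]
        b = [X[i][j] for i in rows if y[i] == 0]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:m]


def holdout_error(X: Matrix, y: Sequence[int], train: List[int], test: List[int], m: int) -> float:
    """Fit gene selection + nearest centroid on `train`; return the error rate on `test`."""
    genes = top_genes(X, y, train, m)
    cent = {c: [mean(X[i][j] for i in train if y[i] == c) for j in genes] for c in (0, 1)}

    def predict(i: int) -> int:
        return min(cent, key=lambda c: sum((X[i][j] - cj) ** 2 for j, cj in zip(genes, cent[c])))

    return sum(predict(i) != y[i] for i in test) / len(test)


def nested_cv_error(X: Matrix, y: Sequence[int], grid: List[int], k_outer: int = 3, k_inner: int = 3) -> float:
    outer_errors = []
    for test in folds(len(y), k_outer):
        train = [i for i in range(len(y)) if i not in test]

        def inner_error(m: int) -> float:
            # Inner loop tunes m using the outer-training rows only.
            errs = []
            for f in folds(len(train), k_inner):
                inner_test = [train[t] for t in f]
                inner_train = [i for i in train if i not in inner_test]
                errs.append(holdout_error(X, y, inner_train, inner_test, m))
            return mean(errs)

        best_m = min(grid, key=inner_error)
        # The untouched outer fold scores the whole tuning procedure.
        outer_errors.append(holdout_error(X, y, train, test, best_m))
    return mean(outer_errors)


# Toy data (not study data): gene 0 separates the classes, genes 1-2 are noise.
y = [i % 2 for i in range(12)]
X = [[2.0 * y[i], (i % 5) * 0.1, ((i * 7) % 11) * 0.1] for i in range(12)]
print(nested_cv_error(X, y, grid=[1, 2, 3]))  # 0.0
```

Because gene selection happens inside every training split, the outer error is not inflated by genes chosen with knowledge of the test samples.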
Project description:Establishment and subsequent validation of an aCGH protocol for WGA (whole genome amplification) products originating from single cells or small amounts of starting material (e.g., microdissected FFPE tissue samples). Establishing the protocol involved testing three DNA labeling protocols. Two labeling protocols were designed specifically for Ampli1(TM) WGA products. Additionally, a random-primed isothermal (Klenow-based) labeling approach was tested (Möhlendick et al., PLoS One. 2013 Jun 25;8(6):e67031.). In addition, two different types of reference samples were tested, and a reference based on single-cell WGA products was ultimately chosen as the most suitable. The validation of the protocol assessed the following aspects: (1) performance of the protocol on primary and reamplified WGA products, (2) accuracy of the protocol in terms of sensitivity of CNA detection, (3) accuracy in terms of recapitulating complex patterns of CNAs, (4) accuracy in terms of quantitative assessment of CNAs, (5) ability to detect genomic heterogeneity among single cells (obtained either from in vitro cultures or from clinical patient material), and (6) ability to detect minimal regions of aberration within a panel of disseminated cancer cells and corresponding tumor tissues. Establishment and validation of the single-cell aCGH protocol: two-condition experiments (PCR-based labeling technique 1 vs. PCR-based labeling technique 2; PCR-based labeling technique 2 vs. random-primed isothermal (Klenow) labeling approach; reference DNA from a cell-pool WGA product vs. reference DNA from single-cell WGA products). Validation of the protocol: comparison of CNA profiles between single-cell WGA products and the corresponding bulk DNA.
Analysis of the disseminated cancer cells (DCCs) and corresponding FFPE tumor tissue samples: single-condition experiments performed on samples collected at different stages of the disease (DCCs from bone marrow) and/or from different sites (primary tumor in breast; metastases in lymph nodes; DCCs in bone marrow). The microarray data correspond to the data presented in the manuscript titled "Reliable single cell array CGH for clinical samples".
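Comparing a single-cell CNA profile with its bulk DNA counterpart can be sketched at the segment level: call each segment as gain, loss, or balanced from its log2 ratio, then measure overall call concordance and the fraction of bulk aberrations recovered in the single cell (a simple sensitivity). The ±0.3 log2 thresholds, the segment-level simplification, and the toy values are assumptions for illustration, not parameters of this protocol.

```python
from typing import List, Sequence, Tuple


def call_cna(log2_ratios: Sequence[float], gain: float = 0.3, loss: float = -0.3) -> List[int]:
    """Per-segment call: +1 gain, -1 loss, 0 balanced (thresholds are illustrative)."""
    return [1 if r >= gain else -1 if r <= loss else 0 for r in log2_ratios]


def compare_to_bulk(single_cell: Sequence[float], bulk: Sequence[float]) -> Tuple[float, float]:
    """Overall call concordance, and the fraction of bulk CNAs recovered in the single cell."""
    sc, bk = call_cna(single_cell), call_cna(bulk)
    concordance = sum(a == b for a, b in zip(sc, bk)) / len(bk)
    aberrant = [i for i, b in enumerate(bk) if b != 0]  # segments altered in bulk DNA
    sensitivity = sum(sc[i] == bk[i] for i in aberrant) / len(aberrant) if aberrant else float("nan")
    return concordance, sensitivity


# Toy segment-level log2 ratios (not data from this series).
bulk = [0.5, 0.0, -0.6, 0.1, 0.7]
single_cell = [0.6, 0.2, -0.2, 0.0, 0.9]
print(call_cna(single_cell))               # [1, 0, 0, 0, 1]
print(compare_to_bulk(single_cell, bulk))
```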