Project description:Analytes qualified as biomarkers are powerful tools for diagnosing diseases, monitoring therapy responses, and designing therapeutic interventions. Early assessment of the diversity of human disease is essential for the rapid and cost-efficient implementation of personalized medicine. We developed g3mclass, Gaussian mixture modeling software for classifying molecular assay data. The software automates a validated multiclass classifier applicable to single-analyte tests and multiplex assays. g3mclass achieves this automation with an original semi-constrained expectation-maximization (EM) algorithm that allows inference from test, control, and query data that human experts cannot interpret. In this study, we used real-world clinical data and gene expression datasets (ERBB2, ESR1, PGR) to show how g3mclass may help overcome over-/underdiagnosis and equivocal results in diagnostic tests for breast cancer. We demonstrated the accuracy, robustness, scalability, and interpretability of the g3mclass output. The user-friendly interface and free distribution of this multi-platform software are intended to ease its use by research laboratories, biomedical and pharmaceutical companies, companion diagnostic developers, and healthcare regulators. Furthermore, because g3mclass extracts information automatically through probabilistic modeling, it can be combined with machine learning and artificial intelligence approaches.
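To make the idea of Gaussian mixture classification concrete, the following is a minimal sketch of plain, unconstrained EM for a one-dimensional Gaussian mixture, written in standard-library Python. It is not the semi-constrained EM algorithm used by g3mclass, whose details are not given in this description; the function names `fit_gmm_1d` and `classify`, the quantile-based initialisation, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, n_components=2, n_iter=200, min_sd=1e-3):
    """Fit a 1D Gaussian mixture with plain (unconstrained) EM.

    Hypothetical helper: returns (weights, means, sds), sorted by mean.
    """
    data = sorted(values)
    n = len(data)
    # Assumed initialisation: means at evenly spaced quantiles, shared spread.
    means = [data[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sds = [max(overall_sd, min_sd)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(n_components):
            rk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = rk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / rk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / rk
            sds[k] = max(math.sqrt(var), min_sd)

    order = sorted(range(n_components), key=lambda k: means[k])
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [sds[k] for k in order],
    )


def classify(x, weights, means, sds):
    """Return the index of the most probable mixture component for x."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    return max(range(len(dens)), key=lambda k: dens[k])
```

In a two-class setting, `fit_gmm_1d` would be run on the measured values of one analyte, and `classify` would then assign a new measurement to the lower or higher component. A constrained variant would additionally tie some parameters to reference (control) data, which this sketch does not attempt.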
Project description:Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid malignancy in adults. Although considered a single disease, DLBCL varies in morphology, genetics, and biological behavior, which results in heterogeneous outcomes among patients. Although new tools have been developed for the classification and management of patients, 40% of them still have primary refractory disease or relapse. In addition, multiple factors in the pathogenesis of this disease remain unclear, and novel biomarkers are needed. In this context, recent investigations point to microRNAs as useful biomarkers in cancer as well as important players in disease development. For DLBCL, however, the data reported to date are inconsistent. Therefore, the main goal of this work was to determine a set of microRNAs useful as biomarkers for DLBCL diagnosis, classification, prognosis, and treatment response. To achieve this, we analyzed microRNA expression in a cohort of 78 DLBCL samples at diagnosis and 17 controls using small RNA sequencing. We were thus able to define new microRNA expression signatures for diagnosis, classification, treatment response, and prognosis. In summary, our study highlights that microRNAs could play an important role as biomarkers for diagnosis, classification, treatment response, and prognosis in DLBCL.
Project description:Cerebrospinal fluid (CSF) liquid biopsies serve as a rich source of tumor-derived cell-free DNA (cfDNA) for evaluating patients with central nervous system (CNS) tumors. However, challenges stemming from trace cfDNA yields and low mutational burden have hindered sensitivity, whereas first-generation clinical assays have relied on genetic alterations as biomarkers. Leveraging the diagnostic utility of DNA methylation classification in CNS tumors, we developed M-PACT (Methylation-based Predictive Algorithm for CNS Tumors), a robust deep neural network that accurately classifies tumors from sub-nanogram input cfDNA methylomes acquired through enzymatic methylation sequencing. In addition to tumor classification, this workflow enables methylation-based cellular deconvolution and sensitive copy number variation (CNV) detection. We benchmark our methodology in pediatric CNS embryonal tumors and further demonstrate accurate classification of intra-operative CSF, balanced tumor genomes, and secondary malignancies. Altogether, we provide a blueprint for CNS tumor classification from low input cfDNA methylomes, motivating prospective validation for future clinical implementation.
Project description:The advent of large-scale single-cell chromatin accessibility profiling has accelerated our ability to map gene regulatory landscapes, but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; www.ArchRProject.com) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element to gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility, and multi-omic integration with scRNA-seq. Enabling the analysis of over 1.2 million single cells within 8 hours on a standard Unix laptop, ArchR is a comprehensive analytical suite for end-to-end analysis of single-cell chromatin accessibility data that will accelerate the understanding of gene regulation at the resolution of individual cells.
Project description:As CRISPR/Cas9-mediated screens with pooled sgRNA libraries in somatic cells become increasingly established, an unmet need for rapid and accurate companion informatics tools has emerged. We have developed lightweight standalone software that turns large raw next-generation sequencing (NGS) datasets derived from such screens into an informative relational context with graphical support. We demonstrate the software's ability to extract meaningful results from an in vitro viability screen using Tumor Necrosis Factor-alpha (TNFa). The results identified not only stereotypical players in extrinsic apoptotic signaling but also two as-yet-uncharacterized members of the apoptotic cascade, Smg7 and Ces2a. We further characterized cell lines containing mutations in these genes against a panel of cell death stimuli. In summary, this software enables bench scientists without access to informatics cores to rapidly access and interpret results from large-scale CRISPR/Cas9 library screens.
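As a rough illustration of the kind of processing a pooled sgRNA screen requires, the sketch below counts reads per guide by exact sequence match and computes a depth-normalised log2 fold change between a treated and a control sample. It is not the software described above; the function names, the fixed guide position `start`, the assumption that all guides share one length, the counts-per-million scaling, and the pseudocount are all hypothetical choices.

```python
import math


def count_guides(reads, library, start=0):
    """Count reads whose window at `start` exactly matches a library guide.

    `library` maps guide sequence -> guide name; all guides are assumed
    to have the same length (an assumption of this sketch).
    """
    guide_len = len(next(iter(library)))
    counts = {name: 0 for name in library.values()}
    for read in reads:
        name = library.get(read[start:start + guide_len].upper())
        if name is not None:
            counts[name] += 1
    return counts


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-guide log2 ratio of counts-per-million (treated / control)."""
    t_total = sum(treated.values()) or 1
    c_total = sum(control.values()) or 1
    result = {}
    for name in control:
        # Scale to counts per million, then add a pseudocount to avoid log(0).
        t = treated.get(name, 0) / t_total * 1e6 + pseudocount
        c = control[name] / c_total * 1e6 + pseudocount
        result[name] = math.log2(t / c)
    return result
```

In a viability screen, guides with strongly positive values would be enriched among surviving cells, and guides with negative values would be depleted. A real pipeline would also need to handle sequencing errors, variable guide offsets, and replicate-level statistics, none of which are covered here.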
Project description:Many toxic chemicals contaminate the environment and harm humans and other organisms. Quickly discriminating these compounds and characterizing their potential molecular mechanisms and toxicity is essential. High-throughput transcriptomic profiling, such as microarray analysis, has proven useful for identifying biomarkers for classification and toxicity prediction. Here we investigate how to use microarrays to predict chemical contaminants and their possible mechanisms. In this study, we divided 105 compounds plus vehicle control into 14 compound classes. Using gene expression profiles of in vitro primary cultured hepatocytes, we comprehensively compared various normalization, feature selection, and classification algorithms for classifying these 14 compound classes. We found that normalization had little effect on the average classification accuracy. Two support vector machine methods, LibSVM and SMO, had better classification performance. When feature sizes were smaller, LibSVM outperformed the other classification methods. The simple logistic algorithm also performed well. At the training stage, the feature selection method SVM-RFE usually performed best, and PCA was the poorest feature selection algorithm. Overall, however, SVM-RFE had the highest overfitting rate when an independent dataset was used for prediction. Therefore, we developed a new feature selection algorithm, called the gradient method, which achieved high training classification accuracy and prediction accuracy with the lowest overfitting rate. By analyzing the biomarkers that distinguished the 14 compound classes, we found a group of genes mainly involved in the cell cycle that were significantly downregulated by the metal and inflammatory compounds but induced by anti-microbial compounds, cancer-related drugs, pesticides, and PXR mediators.
For the in vitro experiment, primary cultured rat hepatocytes were treated with one of 105 compounds, with corresponding controls. At least three biological replicates were used for each unique condition. In total, 531 arrays were used.