Project description:Cell-free DNA (cfDNA) sequencing has demonstrated great potential for early cancer detection. However, most large-scale studies have focused only on either targeted methylation sites or whole-genome sequencing, limiting comprehensive analysis that integrates both epigenetic and genetic signatures. In this study, we present a platform that enables simultaneous analysis of whole-genome methylation, copy number, and fragmentomic patterns of cfDNA in a single assay. Using a total of 950 plasma (361 healthy and 589 cancer) and 240 tissue samples, we demonstrate that a multifeature cancer signature ensemble (CSE) classifier integrating all features outperforms single-feature classifiers. At 95.2% specificity, the cancer detection sensitivity with methylation, copy number, and fragmentomic models was 77.2%, 61.4%, and 60.5%, respectively, but sensitivity was significantly increased to 88.9% with the CSE classifier (p value < 0.0001). For tissue of origin, the CSE classifier enhanced the accuracy beyond the methylation classifier, from 74.3% to 76.4%. Overall, this work proves the utility of a signature ensemble integrating epigenetic and genetic information for accurate cancer detection.
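The "sensitivity at 95.2% specificity" figures above depend on fixing a score cutoff on the healthy samples first and only then counting detected cancers. The sketch below illustrates that general procedure in plain Python. It is not the authors' CSE classifier: the function names and the toy scores are hypothetical, and the 90% specificity target is chosen only to fit the ten made-up healthy samples.

```python
import math

def threshold_at_specificity(healthy_scores, specificity):
    """Smallest observed cutoff with at least `specificity` of healthy scores at or below it."""
    ordered = sorted(healthy_scores)
    # Number of healthy samples that must fall at or below the cutoff;
    # the small tolerance guards against floating-point round-up.
    k = math.ceil(specificity * len(ordered) - 1e-9)
    return ordered[max(k, 1) - 1]

def sensitivity(cancer_scores, cutoff):
    """Fraction of cancer samples scoring strictly above the cutoff."""
    return sum(s > cutoff for s in cancer_scores) / len(cancer_scores)

# Toy classifier scores (invented for illustration, not study data).
healthy = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.31, 0.35, 0.40, 0.90]
cancer = [0.25, 0.38, 0.45, 0.60, 0.70, 0.80, 0.85, 0.95]

cutoff = threshold_at_specificity(healthy, 0.90)
sens = sensitivity(cancer, cutoff)
print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f}")
```

Comparing classifiers at the same specificity, as the abstract does, keeps the false-positive rate fixed so that differences in sensitivity are directly comparable.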
Project description:Non-invasive diagnostic methods for nasopharyngeal carcinoma (NPC) still suffer from low sensitivity. The hypermethylation of tumor suppressor genes is an established early event in NPC pathogenesis. Consequently, we conducted whole-genome methylation sequencing on plasma cell-free DNA (cfDNA) from six NPC cases and four healthy controls, integrating Illumina Human Methylation 450K microarray data from the GEO database comprising six NPC cases and six samples of non-cancerous nasopharyngeal tissue (NP). As a result, we identified a single CpG island associated with cell type-specific regulation within the candidate tumor suppressor gene VILL (Villin Like), which exhibits specific methylation patterns in NPC. We validated our findings by methylation microarray analysis using 25 pairs of NPC and NP samples from GEO, alongside 9,736 pan-cancer tissues from TCGA and 656 healthy human leukocyte samples sourced from GEO. Based on this, we designed a methylation-specific qPCR (qMSP) system for the VILL gene and tested it on 192 primary NPC and 154 NC plasma samples. Compared with EBV DNA qPCR, the new qMSP system showed a sensitivity for primary NPC of 80.2% vs. 81.3% (78.8% vs. 54.5% for early-stage NPC) and a specificity of 100% vs. 93.5%. Notably, combining the two assays further increased sensitivity to 94.8%, including 90.9% for early-stage NPC. Therefore, VILL methylation assessment combined with EBV DNA detection presents a promising avenue for non-invasive diagnosis of NPC, particularly for early detection.
Project description:Motivation: Microsatellite instability (MSI) is a promising biomarker for cancer prognosis and chemosensitivity. Techniques are rapidly evolving for the detection of MSI from tumor-normal paired or tumor-only sequencing data. However, tumor tissues are often insufficient, unavailable, or otherwise difficult to procure. Increasing clinical evidence indicates the enormous potential of plasma circulating cell-free DNA (cfDNA) technology as a noninvasive MSI detection approach. Results: We developed MSIsensor-ct, a bioinformatics tool based on a machine learning protocol, dedicated to detecting MSI status using cfDNA sequencing data with a potentially stable MSIscore threshold of 20%. Evaluation of MSIsensor-ct on independent testing datasets with various levels of circulating tumor DNA (ctDNA) and sequencing depth showed 100% accuracy within the limit of detection (LOD) of 0.05% ctDNA content. MSIsensor-ct requires only BAM files as input, making it user-friendly and easy to integrate into next-generation sequencing (NGS) analysis pipelines. Availability: MSIsensor-ct is freely available at https://github.com/niu-lab/MSIsensor-ct. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
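To make the 20% MSIscore threshold concrete, here is a minimal sketch of how a percentage-based score could be turned into an MSI call. It assumes the score is the percentage of evaluated microsatellite sites called unstable and that a score strictly above the threshold is reported as MSI. Both are assumptions for illustration; the function names are hypothetical and this is not the MSIsensor-ct implementation or its command-line interface.

```python
def msi_score(site_calls):
    """Percentage of evaluated sites called unstable (True means unstable)."""
    if not site_calls:
        raise ValueError("no microsatellite sites were evaluated")
    return 100.0 * sum(site_calls) / len(site_calls)

def classify_msi(site_calls, threshold_pct=20.0):
    """Return ("MSI" or "MSS", score); a score strictly above the threshold is MSI (assumed rule)."""
    score = msi_score(site_calls)
    label = "MSI" if score > threshold_pct else "MSS"
    return label, score

# Toy per-site calls (invented): 6 of 20 unstable, then 2 of 20 unstable.
print(classify_msi([True] * 6 + [False] * 14))
print(classify_msi([True] * 2 + [False] * 18))
```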
Project description:This study aimed to evaluate the cost-effective, genome-wide cell-free reduced representation bisulfite sequencing (cfRRBS) method combined with computational deconvolution for effective disease monitoring in patients with esophageal adenocarcinoma (EAC). cfDNA methylation profiling with cfRRBS was performed on 162 blood plasma samples from 33 EAC patients and 28 blood plasma samples from 20 healthy donors. In addition, to test the reproducibility of the method, 9 plasma samples were re-prepped (the library was re-made) and re-sequenced once (n=9) or twice (n=1). As a reference for the data deconvolution, cfRRBS was performed on 7 EAC tumor tissue (FFPE) samples.
Project description:Background: Upper gastrointestinal cancer (UGC) is an important cause of cancer death in China, with low five-year survival rates because the majority of UGC patients are diagnosed at an advanced stage. Therefore, there is an urgent need to develop cost-effective, reliable and non-invasive methods for the early detection of UGC. Methods: A novel plasma-based methylation panel combining simultaneous detection of three methylated biomarkers (ELMO1, ZNF582 and TFPI2) and an internal control gene was developed and used to examine plasma samples from 186 UGC patients and 190 control subjects. Results: The results indicated excellent PCR amplification efficiency and reproducibility of ELMO1, ZNF582 and TFPI2 in the range of 10-100,000 copies of fully methylated genomic DNA per PCR reaction. The methylation levels of ELMO1, ZNF582 and TFPI2 were significantly higher in UGC samples than in control subjects. The sensitivities of ELMO1, ZNF582 and TFPI2 alone for UGC detection were 32.3%, 61.3% and 30.6%, respectively; when the three markers were combined, the sensitivity improved to 71.0%, with a specificity of 90.0%, and the area under the curve (AUC) was 0.870 (95% CI: 0.832-0.902). Conclusion: Methylated ELMO1, ZNF582 and TFPI2 were specific for UGC, and the three-gene methylation panel provides an alternative non-invasive option for early detection of UGC.
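The abstract above does not state how the three markers were combined. One common rule, used here purely as an assumption, is to call a sample positive when any single marker is positive. The sketch below shows how such a rule raises panel sensitivity above that of each marker alone; the function names and the five toy patients are hypothetical, not study data.

```python
def marker_sensitivity(patient_calls, index):
    """Fraction of cancer patients positive for the marker at position `index`."""
    return sum(calls[index] for calls in patient_calls) / len(patient_calls)

def panel_sensitivity(patient_calls):
    """Fraction of cancer patients positive for at least one marker (any-positive rule)."""
    return sum(any(calls) for calls in patient_calls) / len(patient_calls)

# Toy calls per cancer patient, ordered as (ELMO1, ZNF582, TFPI2); invented values.
patients = [
    (True, False, False),
    (False, True, False),
    (False, False, False),
    (False, True, True),
    (False, False, False),
]

single = [marker_sensitivity(patients, i) for i in range(3)]
combined = panel_sensitivity(patients)
print(single, combined)
```

An any-positive rule can only keep or increase sensitivity, but each added marker also adds its own false positives, which is why the combined specificity has to be reported alongside the combined sensitivity.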
Project description:Epithelial ovarian cancer (EOC) is the deadliest women's cancer and has a poor prognosis. Early detection is the key to improving survival (the 5-year survival rate is over 70% in stage I/II compared with 25% in stage III/IV) and can be achieved through methylation markers from circulating cell-free DNA (cfDNA) using a liquid biopsy. In this study, we first identify the top 500 markers differentiating EOC from healthy female controls among 3.3 million methylome-wide CpG sites and validate them in 1,800 independent cfDNA samples. We then use a pretrained AI transformer system called MethylBERT to develop an EOC diagnostic model that achieves 80% sensitivity and 95% specificity in early-stage EOC diagnosis. We next develop a simple digital droplet PCR (ddPCR) assay that achieves good performance, facilitating early EOC detection.
Project description:Introduction: The exponential growth of genomic datasets necessitates advanced analytical tools to effectively identify genetic loci from large-scale high-throughput sequencing data. This study presents Deep-Block, a multi-stage deep learning framework that incorporates biological knowledge into its AI architecture to identify genetic regions significantly associated with Alzheimer's disease (AD). The framework employs a three-stage approach: (1) genome segmentation based on linkage disequilibrium (LD) patterns, (2) selection of relevant LD blocks using sparse attention mechanisms, and (3) application of TabNet and Random Forest algorithms to quantify single nucleotide polymorphism (SNP) feature importance, thereby identifying genetic factors contributing to AD risk. Methods: Deep-Block was applied to a large-scale whole genome sequencing (WGS) dataset from the Alzheimer's Disease Sequencing Project (ADSP), comprising 7416 non-Hispanic white (NHW) participants (3150 cognitively normal older adults (CN), 4266 AD). Results: A total of 30,218 LD blocks were identified and then ranked by their relevance to Alzheimer's disease. Subsequently, Deep-Block identified novel SNPs within the top 1500 LD blocks and confirmed previously known variants, including APOE rs429358 and rs769449. Expression Quantitative Trait Loci (eQTL) analysis across 13 brain regions provided functional evidence for the identified variants. The results were cross-validated against established AD-associated loci from the European Alzheimer's and Dementia Biobank (EADB) and the GWAS catalog. Discussion: The Deep-Block framework effectively processes large-scale high-throughput sequencing data while preserving SNP interactions during dimensionality reduction, minimizing bias and information loss. The framework's findings are supported by tissue-specific eQTL evidence across brain regions, indicating the functional relevance of the identified variants.
Additionally, the Deep-Block approach has identified both known and novel genetic variants, enhancing our understanding of the genetic architecture and demonstrating its potential for application in large-scale sequencing studies. Highlights: Growing genomic datasets require advanced tools to identify genetic loci in sequencing. Deep-Block, a novel AI framework, was used to process large-scale ADSP WGS data. Deep-Block identified both known and novel AD-associated genetic loci. rs429358 (APOE) was key; rs11556505 (TOMM40), rs34342646 (NECTIN2) were significant. The AI framework uses biological knowledge to enhance detection of Alzheimer's loci.
Project description:Epigenetic alterations by promoter DNA hypermethylation and gene silencing in cancer have been reported over the past few decades. DNA hypermethylation has great potential to serve as a screening marker, a prognostic marker, and a therapeutic surveillance marker in cancer clinics. Some bodily fluids, such as stool or urine, can be obtained without any invasive procedure, which makes them suitable samples for high-throughput cancer surveillance. Analyzing the methylation status of bodily fluids around the cancer tissue may, additionally, lead to the early detection of cancer, because several genes in cancer tissues are reported to be cancer-specifically hypermethylated. Recently, several studies have analyzed the methylation status of DNA in bodily fluids, and some of the results have potential for future development and further clinical use. In fact, a stool DNA test was approved by the U.S. Food and Drug Administration (FDA) for the screening of colorectal cancer. Other promising methylation markers have been identified in various bodily fluids for several cancers. We reviewed studies that analyzed DNA methylation in bodily fluids as a less-invasive approach to cancer screening.
Project description:Motivation: The use of liquid biopsies for cancer patients enables the non-invasive tracking of treatment response and tumor dynamics through single or serial blood draws. Next-generation sequencing assays allow for the simultaneous interrogation of extended sets of somatic single-nucleotide variants (SNVs) in circulating cell-free DNA (cfDNA), a mixture of DNA molecules originating from both normal and tumor tissue cells. However, low circulating tumor DNA (ctDNA) fractions, together with sequencing background noise and potential tumor heterogeneity, challenge the ability to confidently call SNVs. Results: We present a computational methodology, called Adaptive Base Error Model in Ultra-deep Sequencing data (ABEMUS), which combines platform-specific genetic knowledge and empirical signal to readily detect and quantify somatic SNVs in cfDNA. We tested the capability of our method to analyze data generated using different platforms with distinct sequencing error properties, and we compared the performance of ABEMUS with other popular SNV callers on both synthetic and real cancer patient sequencing data. Results show that ABEMUS performs better in most of the tested conditions, supporting its reliability in calling somatic SNVs with low variant allele frequencies in plasma samples with low ctDNA levels. Availability and implementation: ABEMUS is cross-platform and can be installed as an R package. The source code is maintained on GitHub at http://github.com/cibiobcg/abemus, and it is also available from the official CRAN R repository. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification for its role in regulating the stability, structure, processing, and translation of RNA. Instability of m6A homeostasis may result in flaws in stem cell regulation, decrease in fertility, and risk of cancer. To this day, experimental detection and quantification of RNA m6A modification are still time-consuming and labor-intensive. There is only a limited number of epitranscriptome samples in existing databases, and a matched RNA methylation profile is often not available for a biological problem of interest. As gene expression data are usually readily available for most biological problems, it would be appealing to estimate the RNA methylation status from gene expression data using in silico methods. In this study, we explored the possibility of computational prediction of RNA methylation status from gene expression data using classification and regression methods based on mouse RNA methylation data collected from 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) were constructed for classification. Both SVM and RF achieved the best performance with the mean area under the curve (AUC) = 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on those sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were found statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All 3 terms were found to be closely related to the m6A pathway. For regression analysis, Elastic Net was implemented, which yielded a mean Pearson correlation coefficient = 0.68 and a mean Spearman correlation coefficient = 0.64.
Our exploratory study suggested that gene expression data could be used to construct predictors for m6A methylation status with adequate accuracy. Our work showed for the first time that RNA methylation status may be predicted from the matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the very early stage of the study.
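The regression results above are summarized with Pearson and Spearman correlation coefficients between predicted and measured methylation levels. As a reminder of what the two numbers measure, the following plain-Python sketch computes both; Spearman is simply Pearson applied to ranks. The function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

# Toy measured vs. predicted methylation levels (invented for illustration).
measured = [0.10, 0.40, 0.35, 0.80, 0.90]
predicted = [0.20, 0.30, 0.50, 0.70, 0.95]
print(round(pearson(measured, predicted), 3), round(spearman(measured, predicted), 3))
```

Pearson rewards a linear relationship between predicted and measured values, while Spearman only asks that the ordering of sites is preserved, so reporting both gives a fuller picture of prediction quality.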