Construction of a Transferable NGS Algorithmic Model for Predicting EBV-Associated Nasopharyngeal Cancer and High-risk Mutation
ABSTRACT: Background: Epstein-Barr virus (EBV) infection is closely associated with the occurrence of nasopharyngeal carcinoma (NPC). The latent membrane protein 1 (LMP1) gene, known for its high heterogeneity, plays a crucial role in determining the oncogenic potential of NPC. This study aimed to develop a universal, transferable algorithm for analyzing fragmented viral genome data from next-generation sequencing (NGS), construct EBV-associated NPC (EBVaNPC) prediction models, and investigate the functional significance of a key mutation in LMP1. Method: Public EBV whole-genome sequencing data were collected and divided into training and test sets in a 2:1 ratio. In 26 clinical EBV-positive subjects, the EBV LMP1 region (aa 1-118) was sequenced by amplicon-based sequencing. An improved algorithm was developed to extract features and construct the EBVaNPC machine learning (ML) prediction models. The biological implications of the predicted key mutation for tumor cell behavior were investigated through qRT-PCR, EdU and transwell invasion assays, RNA-seq, and gene ontology fingerprint (GOF) analysis. Result: For randomly fragmented NGS data, read length had minimal impact on model performance, with all six ML models achieving F1 scores above 0.8 on the training dataset. On the test dataset, random forest (RF) and naive Bayes demonstrated superior performance using mutation and entropy features, respectively. The model was further validated on clinical cohorts to assess its transferability and generalizability to amplicon-based NGS data. Using only differential features instead of all features reduced data dimensionality while improving model performance. Interestingly, the RF model identified the H101R mutation in LMP1 as the most significant feature, and its oncogenic implication was confirmed through proliferation and invasion experiments in HNE-1MUT-LMP1 cells.
Integration of EBVaNPC GOF and RNA-seq data showed that the differentially expressed genes linked to the H101R mutation were involved primarily in immune regulation processes. Both approaches indicated a notable association between FOXP3-T cell anergy and WNT7A-stem cell population maintenance in HNE-1MUT-LMP1 cells. Conclusion: This study integrates algorithm design and experimental investigation to uncover the biological significance of EBV carcinogenesis, offering a clinically suitable technology for identifying high-risk NPC in EBV-infected subjects.
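The abstract mentions entropy features computed from fragmented reads and a 2:1 training/test split, but it does not give the algorithm itself. The following Python sketch is only an illustration of those two ideas under stated assumptions: reads are already aligned and supplied as (0-based start, sequence) pairs, entropy is per-position Shannon entropy in bits, and the function names `entropy_features` and `split_two_to_one` are hypothetical, not taken from the study.

```python
import math
import random
from collections import Counter


def column_entropy(bases):
    """Shannon entropy (bits) of the bases observed at one position."""
    counts = Counter(bases)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(reads, region_length):
    """Per-position entropy from fragmented, pre-aligned reads.

    reads: list of (start, sequence) pairs, start being 0-based (assumed format).
    Positions with no coverage get an entropy of 0.0.
    """
    columns = [[] for _ in range(region_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < region_length:
                columns[pos].append(base)
    return [column_entropy(col) if col else 0.0 for col in columns]


def split_two_to_one(samples, seed=0):
    """Shuffle samples reproducibly and split them 2:1 into train and test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


# Toy example: three short reads covering a 6-base region.
reads = [(0, "ACGT"), (2, "GTAA"), (0, "ACAT")]
features = entropy_features(reads, region_length=6)
train, test = split_two_to_one(range(9))
```

Because each position is scored independently, the feature vector has the same length whatever the read length, which is one plausible reason fragment size might matter little; the study's actual feature extraction may differ.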
ORGANISM(S): Homo sapiens
PROVIDER: GSE271486 | GEO | 2025/12/03
REPOSITORIES: GEO