Attention-based deep learning for analysis of pathology images and gene expression data in lung squamous premalignant lesions [HTAN cohort]
ABSTRACT: Background: Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions, which span a histologic spectrum from normal to hyperplasia, metaplasia, dysplasia (mild, moderate, and severe), carcinoma in situ, and invasive carcinoma. Several studies have identified molecular alterations associated with lesion histology and progression. However, the broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging.
Methods: We proposed a transformer-based framework that flexibly uses transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images (WSIs) of endobronchial biopsies and bulk gene expression (GE) data derived from endobronchial biopsies and brushings. These data came from previously published studies and ongoing lung precancer atlas efforts, and were obtained from patients at high risk for lung cancer.
Results: On an external testing dataset of WSIs, the model trained on WSIs plus GE achieved an area under the ROC curve (AUROC) of 0.884±0.040, compared with 0.829±0.046 for the model trained on WSIs alone. On external testing datasets of GE, the model trained on WSIs plus GE achieved an AUROC of 0.857±0.033, versus 0.713±0.098 for a model trained on GE alone. Based on these results, we leveraged data across 4 studies to train a flexible fusion model that allows one or both data modalities (WSIs and GE) to be used in training. This model achieved an AUROC of 0.906±0.034 on external testing WSI data and 0.870±0.023 on external testing GE data. Although the model was trained on a binary label, its predicted probabilities were associated with histologic grade, and the model identified gene expression alterations associated with bronchial dysplasia across multiple studies.
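The abstract does not describe the fusion architecture in detail. As a rough illustration of how a single transformer can accept one or both modalities, the sketch below projects WSI patch features and a bulk GE vector into a shared token space, runs one self-attention layer over whichever tokens are present, and reads a dysplasia-or-worse probability from a class token. Everything here is an assumption for illustration: the class `FlexibleFusionSketch`, its parameters, the use of precomputed patch feature vectors, and the choice to handle a missing modality by simply omitting its tokens. The weights are random and untrained, so the outputs carry no biological meaning, and this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class FlexibleFusionSketch:
    """Hypothetical one-layer attention fusion over whichever modalities are present."""

    def __init__(self, wsi_dim, ge_dim, d_model=32, seed=0):
        rng = np.random.default_rng(seed)
        # Modality-specific projections into a shared d_model token space (assumed design).
        self.proj_wsi = rng.normal(0, 1 / np.sqrt(wsi_dim), (wsi_dim, d_model))
        self.proj_ge = rng.normal(0, 1 / np.sqrt(ge_dim), (ge_dim, d_model))
        # Single-head self-attention weights.
        self.wq = rng.normal(0, 1 / np.sqrt(d_model), (d_model, d_model))
        self.wk = rng.normal(0, 1 / np.sqrt(d_model), (d_model, d_model))
        self.wv = rng.normal(0, 1 / np.sqrt(d_model), (d_model, d_model))
        # Learnable class token and binary classification head (random here).
        self.cls = rng.normal(0, 1, (1, d_model))
        self.w_out = rng.normal(0, 1 / np.sqrt(d_model), d_model)
        self.b_out = 0.0

    def forward(self, wsi_patches=None, ge_vector=None):
        """Return P(dysplasia or worse) from WSI patch features, a GE vector, or both."""
        if wsi_patches is None and ge_vector is None:
            raise ValueError("at least one modality is required")
        tokens = [self.cls]
        if wsi_patches is not None:
            # (n_patches, wsi_dim) -> (n_patches, d_model); any patch count works.
            tokens.append(np.asarray(wsi_patches) @ self.proj_wsi)
        if ge_vector is not None:
            # One bulk expression profile becomes a single (1, d_model) token.
            tokens.append(np.asarray(ge_vector).reshape(1, -1) @ self.proj_ge)
        x = np.vstack(tokens)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        # Attention is normalized over the tokens that exist, so a missing
        # modality is simply absent rather than imputed.
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        h = x + attn @ v  # residual connection
        logit = h[0] @ self.w_out + self.b_out  # read out from the class token
        return 1.0 / (1.0 + np.exp(-logit))


if __name__ == "__main__":
    model = FlexibleFusionSketch(wsi_dim=64, ge_dim=100)
    rng = np.random.default_rng(1)
    patches = rng.normal(size=(50, 64))  # placeholder patch features
    ge = rng.normal(size=100)  # placeholder expression vector
    print("WSI + GE:", model.forward(patches, ge))
    print("WSI only:", model.forward(wsi_patches=patches))
    print("GE only:", model.forward(ge_vector=ge))
```

Omitting absent tokens keeps one set of weights usable for samples with WSIs only, GE only, or both, which mirrors the flexibility the abstract describes; a real model would add multiple layers, training, and likely learned modality embeddings.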
Conclusions: Our multimodal transformer outperformed models trained on a single data modality and enabled the inclusion of samples with one or both modalities during training and/or testing. This increases the flexibility, scalability, and real-world applicability of disease severity assessment, allowing better risk stratification of bronchial premalignant lesions even when only routine histology data are accessible.
ORGANISM(S): Homo sapiens
PROVIDER: GSE320381 | GEO | 2026/05/27
REPOSITORIES: GEO