Dataset Information

BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation.


ABSTRACT: Although substantial efforts have been made using graph neural networks (GNNs) for artificial intelligence (AI)-driven drug discovery, effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which makes them time-consuming, computationally expensive and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the quality of the learned molecular representations. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular property prediction, drug-drug interaction and drug-target interaction, on 13 benchmark datasets, demonstrating its great potential and superiority in molecular representation learning.
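
The masking step that the abstract describes (hiding some nodes and edges of a molecular graph so that two branches can reconstruct them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_graph`, the `mask_ratio` value, the toy benzene graph, and the choice to sample nodes and edges independently and uniformly are all assumptions made for illustration, and the encoder/decoder networks are omitted entirely.

```python
import random


def mask_graph(num_nodes, edges, mask_ratio=0.5, seed=0):
    """Hypothetical helper: hide a fraction of nodes and of edges.

    Nodes and edges are sampled independently and uniformly (an assumption).
    Returns (visible_nodes, masked_nodes, visible_edges, masked_edges).
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))

    # How many items each branch would be asked to reconstruct.
    n_mask_nodes = round(mask_ratio * num_nodes)
    n_mask_edges = round(mask_ratio * len(edges))

    masked_nodes = sorted(rng.sample(nodes, n_mask_nodes))
    masked_edges = sorted(rng.sample(edges, n_mask_edges))

    masked_node_set = set(masked_nodes)
    masked_edge_set = set(masked_edges)
    visible_nodes = [n for n in nodes if n not in masked_node_set]
    visible_edges = [e for e in edges if e not in masked_edge_set]
    return visible_nodes, masked_nodes, visible_edges, masked_edges


# Toy molecular graph: the six-carbon ring of benzene (hydrogens omitted).
atoms = ["C"] * 6
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

vis_nodes, hid_nodes, vis_edges, hid_edges = mask_graph(len(atoms), bonds)

# Reconstruction targets: the node branch would predict the atom type of
# each hidden node, the edge branch the presence/type of each hidden bond.
node_targets = {i: atoms[i] for i in hid_nodes}
edge_targets = list(hid_edges)

print("visible nodes:", vis_nodes, "masked nodes:", hid_nodes)
print("visible edges:", vis_edges, "masked edges:", hid_edges)
```

In a full model, only the visible part of the graph would be fed to an encoder, and a decoder would be trained to predict `node_targets` and `edge_targets`. How the two branches share or split this work is not specified in the abstract.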

SUBMITTER: Wang Z 

PROVIDER: S-EPMC10783874 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature

Publications

BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation.

Wang Zhen, Feng Zheng, Li Yanjun, Li Bowen, Wang Yongrui, Sha Chulin, He Min, Li Xiaolin

Briefings in Bioinformatics, 2023-11-01, 1


Similar Datasets

| S-EPMC9868114 | biostudies-literature
| S-EPMC11886571 | biostudies-literature
2024-09-13 | GSE262953 | GEO
| S-EPMC7983260 | biostudies-literature
| S-EPMC11520410 | biostudies-literature
| PRJNA1094989 | ENA
| S-EPMC10070395 | biostudies-literature
| S-EPMC11606038 | biostudies-literature
| S-EPMC10879798 | biostudies-literature
| S-EPMC11362164 | biostudies-literature