KaMT:

A Knowledge-aligned Multi-modal Transformer for Enhanced Molecular Representation and Antibiotic Discovery




Introduction

Abstract: The global escalation of antimicrobial resistance has created an urgent need for novel antibiotic classes. However, traditional computational screening approaches often struggle due to the extreme sparsity of antibiotic-relevant chemical space and the severe class imbalance in experimental datasets. To address these challenges, we propose the Knowledge-aligned Multi-modal Transformer (KaMT), a framework that bridges structural topology and biophysical function in molecular representation learning. KaMT employs a dual-stream architecture that integrates molecular line graphs with high-dimensional physicochemical descriptors. This design enables the model to move beyond pure structural patterns and capture the key biophysical determinants of antimicrobial activity. Importantly, KaMT demonstrates strong robustness and predictive stability under rigorous scaffold-split validation, a challenging protocol that simulates real-world lead optimization by requiring generalization to structurally distinct chemical families. Through dynamic weight loss and dual-constraint regularization, the model effectively incorporates medicinal chemistry knowledge, achieving superior performance on core antibiotic screening benchmarks compared to several existing advanced models. In addition to high-precision activity prediction, KaMT exhibits powerful scaffold-hopping capability, successfully identifying structurally novel yet potent antibiotic candidates. By learning interpretable and biophysically grounded representations, KaMT provides a valuable computational tool to accelerate the discovery of new antibiotics against multidrug-resistant pathogens.
The flowchart of the KaMT model is shown in Figure 1.

Figure 1. An illustrative diagram of KaMT. (a) Pre-training workflow, showing the extraction of biophysical knowledge (molecular fingerprints and physicochemical descriptors) alongside the transformation of the unlabeled molecular graph into a molecular line graph (MLG) for masked node prediction. (b) Detailed architecture of KaMT, featuring input triplet encodings (node, distance, and path) passed through custom transformer layers with multi-modal pre-training alignment constraints. (c) Fine-tuning paradigm for labeled molecules, illustrating the integration of the pre-trained KaMT with knowledge fusion layers for downstream antibiotic discovery and property prediction tasks. (d) Fine-tuning optimization strategies, including DWL, R-Drop, and FGM.
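
The graph-to-line-graph transformation mentioned in the pre-training workflow can be illustrated with a minimal Python sketch. It assumes the standard graph-theoretic definition of a line graph (each bond becomes a node, and two nodes are linked when their bonds share an atom); the function name `molecular_line_graph` and the plain bond-list input are hypothetical and are not taken from the KaMT code, which may construct its line graphs and node features differently.

```python
from itertools import combinations


def molecular_line_graph(bonds):
    """Build the line graph of a molecular graph (illustrative sketch only).

    `bonds` is a list of (atom_i, atom_j) index pairs. Each bond becomes one
    line-graph node; two nodes are connected when their bonds share an atom.
    Returns (nodes, edges), where edges are pairs of indices into `nodes`.
    """
    # Normalize each bond so (2, 0) and (0, 2) describe the same node.
    nodes = [tuple(sorted(b)) for b in bonds]

    # Map each atom to the line-graph nodes (bonds) that touch it.
    incident = {}
    for idx, (a, b) in enumerate(nodes):
        incident.setdefault(a, []).append(idx)
        incident.setdefault(b, []).append(idx)

    # Every pair of bonds meeting at the same atom yields one line-graph edge.
    edges = set()
    for bond_ids in incident.values():
        edges.update(combinations(bond_ids, 2))
    return nodes, sorted(edges)


# Hypothetical example: heavy-atom skeleton of ethanol, C0-C1-O2.
nodes, edges = molecular_line_graph([(0, 1), (1, 2)])
```

For the ethanol skeleton, the two bonds become two line-graph nodes joined by a single edge, because both bonds meet at atom 1. Under this definition, a masked node prediction task on the line graph amounts to hiding a bond-centered node and recovering it from its neighbors.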