Introduction
Understanding peptide-protein interactions is crucial for deciphering cellular signaling processes and advancing targeted therapies. However, because multi-molecular associations are complex and non-covalent interactions are diverse, accurately predicting peptide-protein interactions and annotating non-covalent bonds at specific sites remain major computational challenges. Here, we propose KGIPA, a knowledge-guided interpretable pragmatic analysis framework that brings pragmatic concepts from natural language into life science research to account for the impact of real biological environments on non-covalent interactions. KGIPA leverages both intra-linguistic and extra-linguistic contextual representations to integrate single-molecule multimodal features and construct residue-level pairwise interaction maps. It also incorporates a knowledge-guided module that uses biological prior knowledge to coordinate the various types of non-covalent interactions. Results on multiple benchmark datasets show that KGIPA outperforms state-of-the-art methods in evaluating molecular binding, including the prediction of protein binding residues, peptide binding residues, and residue-pair interactions. Moreover, it provides interpretable insights by quantifying multimodal feature importance and profiling non-covalent bonding rules. The strong performance of KGIPA in additional tasks such as binding affinity prediction underscores the generalizability of the framework, which holds promise as a new paradigm for AI-driven life science research.

Figure 1. The model architecture of KGIPA. KGIPA is a neural network model designed for pragmatic analysis of biological sequences. It consists of two main parts: intra-linguistic contextual representation and extra-linguistic contextual representation.
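
To make the notion of a residue-level pairwise interaction map concrete, the following is a minimal sketch and not the KGIPA architecture. It assumes each peptide and protein residue has already been encoded as a fixed-size feature vector, and it uses a scaled dot product followed by a sigmoid as a stand-in scoring function. The function names `pairwise_interaction_map` and `binding_residues`, the embedding size, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_interaction_map(peptide_emb, protein_emb):
    """Score every (peptide residue, protein residue) pair.

    peptide_emb: array of shape (L_pep, d), one row per peptide residue.
    protein_emb: array of shape (L_prot, d), one row per protein residue.
    Returns an (L_pep, L_prot) matrix of values in [0, 1].

    Assumption: a scaled dot product plus sigmoid is used here purely as a
    placeholder scoring function; it is not the scoring used by KGIPA.
    """
    d = peptide_emb.shape[1]
    logits = peptide_emb @ protein_emb.T / np.sqrt(d)
    return 1.0 / (1.0 + np.exp(-logits))


def binding_residues(pair_map, threshold=0.5):
    """Derive per-residue binding calls from the pairwise map.

    A residue is called binding if its best-scoring partner reaches the
    (hypothetical) threshold. Returns index arrays for peptide and protein.
    """
    peptide_scores = pair_map.max(axis=1)  # best partner for each peptide residue
    protein_scores = pair_map.max(axis=0)  # best partner for each protein residue
    return (np.flatnonzero(peptide_scores >= threshold),
            np.flatnonzero(protein_scores >= threshold))


# Toy input: random vectors stand in for learned residue features.
rng = np.random.default_rng(0)
peptide_emb = rng.normal(size=(8, 16))     # 8-residue peptide
protein_emb = rng.normal(size=(120, 16))   # 120-residue protein

pair_map = pairwise_interaction_map(peptide_emb, protein_emb)
peptide_sites, protein_sites = binding_residues(pair_map)
print(pair_map.shape, len(peptide_sites), len(protein_sites))
```

The sketch shows how the two binding-residue tasks mentioned in the abstract can be read off as row-wise and column-wise summaries of a single pairwise map. Max-pooling is only one possible way to collapse the map and is an assumption here.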