Introduction
Accurately predicting peptide-protein interactions (PepPIs) is essential for biology and disease research. Because biological sequences resemble natural language, the grammar and semantics of peptides and proteins have been studied extensively to solve important tasks in protein sequence analysis. However, these language-inspired methods ignore the pragmatic information of proteins. Here, we introduce IIDL-PepPI, a progressive transfer learning model based on interpretable biological sequence pragmatic analysis that predicts both binary interactions and binding residues for specific peptide-protein pairs. Inspired by linguistics, IIDL-PepPI uses biological sequence pragmatic analysis to integrate multi-source features from peptides and proteins in different contexts via an interpretable bidirectional attention module. To enable multi-level prediction, IIDL-PepPI first builds a pre-trained model at the sequence level on the binary interaction prediction task, and then fine-tunes that model at the residue level through progressive transfer learning to achieve fine-grained profiling of the binding residues in a specific pair. IIDL-PepPI outperforms state-of-the-art methods and offers better interpretability, providing a more comprehensive solution for predicting PepPIs and identifying peptide-protein-specific binding residues. It is expected to facilitate therapeutic peptide development and protein function analysis.
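To make the idea of a bidirectional attention module concrete, the following is a minimal sketch only, not the IIDL-PepPI implementation. It assumes the module can be approximated by plain scaled dot-product cross-attention between per-residue embeddings of equal dimension. The function name `bidirectional_attention` and the toy embeddings are hypothetical and chosen purely for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_attention(peptide, protein):
    """Generic cross-attention in both directions (illustrative assumption).

    peptide: list of residue embedding vectors (lists of floats).
    protein: list of residue embedding vectors with the same dimension.
    Returns the two attention maps and the two sets of context vectors.
    """
    dim = len(peptide[0])
    scale = math.sqrt(dim)

    # scores[i][j]: scaled dot product of peptide residue i and protein residue j
    scores = [
        [sum(a * b for a, b in zip(p, q)) / scale for q in protein]
        for p in peptide
    ]

    # Peptide -> protein: each peptide residue attends over all protein residues
    pep_to_prot = [softmax(row) for row in scores]
    # Protein -> peptide: same scores, normalized along the other axis
    prot_to_pep = [softmax(list(col)) for col in zip(*scores)]

    # Context vectors: attention-weighted sums of the other sequence's embeddings
    pep_context = [
        [sum(w * q[k] for w, q in zip(weights, protein)) for k in range(dim)]
        for weights in pep_to_prot
    ]
    prot_context = [
        [sum(w * p[k] for w, p in zip(weights, peptide)) for k in range(dim)]
        for weights in prot_to_pep
    ]
    return pep_to_prot, prot_to_pep, pep_context, prot_context


# Toy example: 2 peptide residues, 3 protein residues, embedding size 2
peptide = [[1.0, 0.0], [0.0, 1.0]]
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pep_to_prot, prot_to_pep, pep_context, prot_context = bidirectional_attention(
    peptide, protein
)
```

In a sketch like this, the two attention maps are what would be inspected for interpretability: a row of `pep_to_prot` shows which protein residues a given peptide residue weights most heavily, and `prot_to_pep` shows the reverse.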