Accurately predicting peptide-protein interactions (PepPIs) is essential for biology and disease research. Because biological sequences resemble natural language, the grammar and semantics of peptides and proteins have been studied extensively to solve important tasks in protein sequence analysis. However, these methods ignore the pragmatic information of proteins. Here, we introduce IIDL-PepPI, a progressive transfer learning model based on interpretable biological sequence pragmatic analysis that predicts binary interactions and binding residues for specific peptide-protein pairs. Inspired by linguistics, IIDL-PepPI uses biological sequence pragmatic analysis to integrate multi-source features from peptides and proteins in different contexts via an interpretable bidirectional attention module. Furthermore, to enable multi-level prediction, IIDL-PepPI first builds a pre-trained model at the sequence level through binary interaction prediction, and then fine-tunes that model at the residue level through progressive transfer learning to achieve fine-grained profiling of binding residues in specific pairs. IIDL-PepPI outperforms state-of-the-art methods while offering better interpretability, providing a more comprehensive solution for predicting PepPIs and identifying peptide-protein-specific binding residues, which is expected to facilitate therapeutic peptide development and protein function analysis.
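To make the idea of a bidirectional attention module more concrete, the following is a minimal, illustrative sketch in plain Python. It is not the IIDL-PepPI implementation: the function names, the scaled dot-product affinity, and the toy feature vectors are all assumptions chosen for illustration. It shows how one affinity matrix can be normalised in two directions, so that peptide residues attend over protein residues and vice versa.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    """Combine vectors using one attention weight per vector."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def bidirectional_attention(pep, prot):
    """Hypothetical bi-attention over residue features.

    pep:  list of peptide residue vectors
    prot: list of protein residue vectors (same dimension as pep)
    """
    scale = math.sqrt(len(pep[0]))
    # affinity[i][j]: similarity of peptide residue i and protein residue j
    affinity = [[dot(p, q) / scale for q in prot] for p in pep]
    # Peptide -> protein: normalise each row over protein residues
    pep_to_prot = [softmax(row) for row in affinity]
    # Protein -> peptide: normalise each column over peptide residues
    prot_to_pep = [
        softmax([affinity[i][j] for i in range(len(pep))])
        for j in range(len(prot))
    ]
    # Context vectors: each side summarised from the other side's view
    pep_ctx = [weighted_sum(w, prot) for w in pep_to_prot]
    prot_ctx = [weighted_sum(w, pep) for w in prot_to_pep]
    return pep_to_prot, prot_to_pep, pep_ctx, prot_ctx

# Toy input (assumed): 2 peptide residues and 3 protein residues, 2 features each
pep = [[1.0, 0.0], [0.0, 1.0]]
prot = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pep_to_prot, prot_to_pep, pep_ctx, prot_ctx = bidirectional_attention(pep, prot)
```

In a sketch like this, the two attention maps are what one would inspect for interpretability: a large weight in `pep_to_prot[i][j]` suggests that peptide residue `i` relies heavily on protein residue `j` in this particular pair.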

Figure 1. Data preparation workflow and network architecture of IIDL-PepPI. a Data preparation workflow of IIDL-PepPI, which draws on the public databases RCSB PDB, PDBe, and UniProt. b Network architecture of IIDL-PepPI for peptide-protein binary interaction prediction and binding residue recognition, comprising sequence representation, feature encoding, the bi-attention module, and decoding. Based on biological sequence pragmatic analysis, the bi-attention module explicitly integrates features from the peptide and protein sides to distinguish different peptide-protein-specific interactions. c The progressive transfer learning architecture. In the first stage, IIDL-PepPI is pre-trained on peptide-protein binary interaction prediction using sequence-level datasets, which provides coarse-grained learning of the basic network parameters. In the second stage, the parameters of the basic network are transferred, the decoder is replaced, and the model is fine-tuned on a residue-level dataset for precise prediction of peptide- and protein-binding residues in specific peptide-protein pairs.
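The two-stage scheme in panel c (transfer the basic network, replace the decoder, fine-tune) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the dictionary-based model, the layer sizes, and every function name are hypothetical, and both training loops are omitted.

```python
import copy
import random

def make_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, x):
    """Plain matrix-vector product (no bias, no activation)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def encode(encoder, residues):
    """Shared basic network: one hidden vector per residue."""
    return [apply_layer(encoder, r) for r in residues]

def predict_interaction(model, residues):
    """Stage 1 (sequence level): mean-pool residues, output one pair score."""
    hidden = encode(model["encoder"], residues)
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    return apply_layer(model["decoder"], pooled)[0]

def predict_binding_residues(model, residues):
    """Stage 2 (residue level): output one score per residue."""
    hidden = encode(model["encoder"], residues)
    return [apply_layer(model["decoder"], h)[0] for h in hidden]

def transfer(pretrained, rng, hidden_dim):
    """Copy the pre-trained encoder and attach a fresh residue-level decoder."""
    return {
        "encoder": copy.deepcopy(pretrained["encoder"]),
        "decoder": make_layer(hidden_dim, 1, rng),
    }

rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM = 4, 3  # assumed toy sizes

stage1 = {
    "encoder": make_layer(FEATURE_DIM, HIDDEN_DIM, rng),
    "decoder": make_layer(HIDDEN_DIM, 1, rng),
}
# Pre-training on sequence-level data would update stage1 here (omitted).

stage2 = transfer(stage1, rng, HIDDEN_DIM)
# Fine-tuning on residue-level data would update stage2 here (omitted).

# Toy input (assumed): feature vectors for the 5 residues of one pair
pair = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(5)]
pair_score = predict_interaction(stage1, pair)
residue_scores = predict_binding_residues(stage2, pair)
```

The deep copy keeps the pre-trained model intact, so the sequence-level predictor remains usable after the residue-level model has been fine-tuned.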


If you use IIDL-PepPI, please cite the following:

Shutao Chen, Ke Yan, Xuelong Li, and Bin Liu*.
Peptide-Protein Interaction Profiling Based on Pragmatic Analysis and Progressive Transfer Learning. (Submitted)


In this work, we propose IIDL-PepPI, a progressive transfer learning model for peptide-protein-specific interaction profiling based on interpretable biological sequence pragmatic analysis. It progressively enables peptide-protein binary interaction prediction and pair-specific binding residue identification.


If you are interested in this research area or have any questions, please do not hesitate to contact us. We will do our best to answer, in the interest of mutual learning and progress. If you use our research results, please cite this article.

Copyright © bliu@bliulab. All rights reserved.
