Introduction
Protein-peptide interactions (PepPIs) are essential to a wide range of biological processes, including gene regulation, cellular homeostasis, and metabolic modulation. Several sequence-based deep learning predictors have been developed for PepPI prediction. However, the generalization performance of most of these methods is constrained by the limited number of protein-peptide complexes available in the RCSB Protein Data Bank (RCSB PDB). Moreover, it remains challenging to exploit the complex context of proteins and peptides when predicting PepPIs. In this study, we propose HGT-PepPI, a heterogeneous graph-based framework for PepPI prediction. Peptide and protein sequences are initialized as heterogeneous nodes whose semantic representations are obtained from the ProtT5 model. Three types of edges are constructed from sequence semantic information, evolutionary conservation profiles, and experimentally validated protein-peptide interactions, respectively. Because the graph inherently integrates multiple types of biological information, the model can learn transferable patterns of interaction semantics, which improves generalization. In addition, the proposed method employs message-passing operations to capture both local sequence characteristics and global contextual dependencies, thereby enabling comprehensive modeling of interaction semantics. Experimental results demonstrate that HGT-PepPI outperforms existing state-of-the-art approaches in both predictive performance and robustness. Finally, we designed an alanine scanning mutagenesis experiment and a binding affinity experiment, which verified the model's ability to identify key residues and to guide peptide drug design.
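To make the graph construction concrete, the following minimal Python sketch builds a toy heterogeneous graph with two node types and three relation types, then runs one round of message passing. It is an illustration under stated assumptions, not the HGT-PepPI implementation: the 3-dimensional feature vectors merely stand in for ProtT5 embeddings, the node names and the choice of which node pairs each relation connects are hypothetical, and a plain relation-wise mean with a fixed 0.5 self-weight replaces the learned, type-aware aggregation that a trained model would use.

```python
from collections import defaultdict

# Toy 3-d vectors standing in for ProtT5 embeddings (values are made up).
features = {
    ("protein", "P1"): [1.0, 0.0, 0.0],
    ("protein", "P2"): [0.0, 1.0, 0.0],
    ("peptide", "p1"): [0.0, 0.0, 1.0],
    ("peptide", "p2"): [1.0, 1.0, 0.0],
}

# Three relation types; which node pairs each one links is an assumption.
edges = {
    "semantic_similarity": [(("protein", "P1"), ("protein", "P2"))],
    "evolutionary_conservation": [(("peptide", "p1"), ("peptide", "p2"))],
    "known_interaction": [
        (("protein", "P1"), ("peptide", "p1")),
        (("protein", "P2"), ("peptide", "p2")),
    ],
}


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def message_passing_step(features, edges):
    """One round of relation-wise mean aggregation over undirected edges."""
    # neighbours[node][relation] -> list of neighbour feature vectors
    neighbours = defaultdict(lambda: defaultdict(list))
    for relation, pairs in edges.items():
        for u, v in pairs:
            neighbours[u][relation].append(features[v])
            neighbours[v][relation].append(features[u])

    updated = {}
    for node, own in features.items():
        # Average neighbours within each relation, then across relations.
        per_relation = [mean(vs) for vs in neighbours[node].values()]
        if not per_relation:
            updated[node] = list(own)  # isolated node keeps its features
            continue
        aggregated = mean(per_relation)
        # Fixed 0.5/0.5 mix of self and neighbourhood (not a learned weight).
        updated[node] = [0.5 * a + 0.5 * b for a, b in zip(own, aggregated)]
    return updated


updated = message_passing_step(features, edges)
```

After one step, each node's vector mixes its own features with information arriving through every relation it participates in. Stacking several such steps lets information propagate beyond direct neighbours, which is how a graph of this kind can combine local and more global context.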