Introduction

Intrinsically disordered proteins and regions (IDPs/IDRs) bind to partners and perform various molecular functions. These functions fall into five general categories: assembler, scavenger, effector, display site and chaperone. Existing predictors identify disordered regions that bind to specific partners, but computational methods for predicting the molecular functions of IDRs are lacking. Motivated by the growing number of experimentally annotated functional sequences and the need to expand the coverage of disordered protein function predictors, we developed DisoBMFpred for predicting the molecular functions of disordered regions. To the best of our knowledge, it is the first computational predictor for this task. DisoBMFpred employs the Protein Cubic Language Model (PCLM), which incorporates three protein language models that characterize the sequence, structural and functional features of proteins, together with an attention-based alignment that captures the relationships among the three feature types and generates a joint representation of proteins. The PCLM was pre-trained on large-scale IDR sequences and fine-tuned on functionally annotated sequences for molecular function prediction.


Figure 1. The flowchart of DisoBMFpred. (A) Architecture of the protein cubic language model (PCLM). (B) The function-specific fine-tuning of the PCLM for predicting binding-related molecular functions of intrinsically disordered proteins.
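
As a rough illustration of how an attention-based alignment can combine three per-residue feature sets into a joint representation, the sketch below applies scaled dot-product attention across the sequence, structure and function views of each residue and averages the result. It is a minimal sketch under stated assumptions, not the PCLM implementation: the function names (fuse_residue, fuse_protein), the feature dimension, the random inputs and the final averaging step are all hypothetical, and no learned projections are modeled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fuse_residue(views):
    """Fuse the feature vectors of one residue (one vector per view).

    Each view attends to all views with scaled dot-product attention;
    the attended vectors are then averaged (an assumed pooling choice).
    """
    d = len(views[0])
    attended = []
    for query in views:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in views])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, views)) for i in range(d)]
        )
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(d)]


def fuse_protein(seq_feat, struct_feat, func_feat):
    """Fuse three (length x dim) feature lists residue by residue."""
    return [
        fuse_residue(list(views))
        for views in zip(seq_feat, struct_feat, func_feat)
    ]


def random_features(length, dim):
    # Placeholder features; a real pipeline would take these from language models.
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]


random.seed(0)
length, dim = 8, 4
joint = fuse_protein(
    random_features(length, dim),
    random_features(length, dim),
    random_features(length, dim),
)
print(len(joint), len(joint[0]))
```

The joint representation keeps one vector per residue, so a downstream classifier for the five functional categories could in principle operate at residue level.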



Materials

The datasets and supporting information used in this study can be downloaded here:

· Molecular function benchmark dataset:

Training set

Evaluation set

Test set

· Intrinsically disordered pre-training dataset:

IDRs pre-training set

IDRs validation set

· ELM motifs:

Motif.zip



Acknowledgments

We acknowledge with thanks the following software used in this server:

PSI-BLAST: Protein sequence similarity search.

HH-suite3: Protein sequence alignment based on hidden Markov models (HMMs).

CCMpred: Prediction of residue-residue contacts in proteins.

FIMO: Motif search tool.



References

Users of this server are requested to cite the following:

· Yihe Pang, Bin Liu. DisoBMFpred: predicting binding related molecular functions of intrinsically disordered proteins and regions based on protein cubic language model. (Submitted)


