PreHom-PCLM: Protein Remote Homology Detection by Combing Motifs and Protein Cubic Language Model

The training, validation and test datasets can be downloaded from the following links:

training.fa, validation.fa, test.fa
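As a rough illustration of how these files might be loaded, here is a minimal sketch that assumes the .fa files are in standard FASTA format (a header line starting with ">" followed by one or more sequence lines). The function name read_fasta and the example headers are hypothetical and are not part of the PreHom-PCLM code; the header layout of the real files is not described here, so headers are kept verbatim.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs.

    Assumption: standard FASTA, where each record starts with a '>' header
    line and the sequence may be wrapped over several lines.
    """
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Flush the last record.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example; real headers may look different.
example = """>protein_1 hypothetical header
MKTAYIAK
QRQISFVK
>protein_2 hypothetical header
GSHMLEDP
""".splitlines()

records = read_fasta(example)
print(len(records))
```

With a downloaded file, the same function could be applied to an open file handle, for example `with open("training.fa") as fh: records = read_fasta(fh)`, and `len(records)` then gives the number of proteins in that file.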

For the independent test set, we extracted the proteins added to the SCOPe database between July 2020 and December 2021 and then reduced redundancy with MMseqs (sequence identity 0.95, E-value 10e-4) [1]. The independent test set contains 5990 proteins covering 839 superfamilies and can be downloaded from the following links:



Bin Liu
School of Computer Science and Technology, Beijing Institute of Technology, China.

