Protein Remote Homology Detection by Combining PageRank and Hyperlink-Induced Topic Search



As one of the most challenging tasks in sequence analysis, protein remote homology detection has attracted a great deal of interest. Methods based on PageRank models and HITS approaches have achieved state-of-the-art performance in information retrieval, and the two kinds of methods are complementary. However, a framework that integrates PageRank and HITS has never been explored for protein remote homology detection.

In this study, six basic models were used to construct predictors for protein remote homology detection: PSI-BLAST, Hmmer, HHblits, PsePro-HHblits, PsePro-Hmmer, and PsePro-PSI-BLAST. They are able to automatically extract local and global sequence-order information. HHblits achieved the best performance among the six basic models. A new method called HITS-PR-HHblits was therefore proposed by combining PageRank models, HITS approaches, and the ranking method HHblits. Tested on a widely used SCOP benchmark dataset, HITS-PR-HHblits achieved an ROC1 score of 0.9045 and an ROC50 score of 0.9124, significantly outperforming other existing state-of-the-art methods. Experimental results on the recent SCOPe datasets showed that HITS-PR-HHblits achieves stable performance. It is anticipated that HITS-PR-HHblits will become a very useful tool for protein remote homology detection.
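For readers unfamiliar with the ROC1 and ROC50 scores quoted above, the following is a minimal sketch of the usual ROCn definition: the area under the ROC curve up to the first n false positives, normalised by n times the number of true positives. The function name `roc_n`, its arguments, and the handling of hit lists that contain fewer than n false positives are assumptions made for illustration; they are not taken from the HITS-PR-HHblits programs.

```python
def roc_n(ranked_labels, n, total_positives=None):
    """Compute a ROCn score for one query.

    ranked_labels: list of booleans, best-scoring hit first;
                   True marks a true homolog, False a false positive.
    n:             number of false positives at which the curve is cut.
    total_positives: number of true homologs in the database
                     (defaults to the number of True labels in the list).
    """
    if total_positives is None:
        total_positives = sum(ranked_labels)
    if total_positives == 0 or n <= 0:
        return 0.0

    tp_seen = 0   # true positives ranked so far
    fp_seen = 0   # false positives ranked so far
    area = 0      # sum over false positives of the true positives ranked above each

    for is_positive in ranked_labels:
        if is_positive:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break

    # Assumption: if the list ends before n false positives are reached,
    # each missing false positive is credited with all true positives found.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_positives)


if __name__ == "__main__":
    # Toy ranking: two homologs, one false positive, one homolog, one false positive.
    ranking = [True, True, False, True, False]
    print("ROC1 :", roc_n(ranking, 1))
    print("ROC50:", roc_n(ranking, 50))
```

A score of 1.0 means every true homolog is ranked above the first false positive; benchmark figures such as those above are typically averages of this per-query score over all test queries.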

The flowchart of HITS-PR-HHblits
Figure 1. The integration framework combining HHblits, PageRank, and HITS.
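The two graph-ranking components named in the framework can be illustrated with a generic sketch. It runs textbook PageRank and HITS power iterations on a small weighted graph in which an edge from protein i to protein j stands for "i retrieved j as a hit". The toy graph, the edge weights, and the function names `pagerank` and `hits` are hypothetical, and the way HITS-PR-HHblits actually builds its network from HHblits results and fuses the scores is not reproduced here.

```python
import math


def pagerank(adj, damping=0.85, iters=100, tol=1e-10):
    """Textbook PageRank by power iteration on a weighted adjacency matrix.

    adj[i][j] is the non-negative weight of the edge i -> j.
    """
    n = len(adj)
    out_sum = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_sum[i] == 0)
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            if out_sum[i] == 0:
                continue
            for j in range(n):
                if adj[i][j]:
                    new[j] += damping * rank[i] * adj[i][j] / out_sum[i]
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def hits(adj, iters=100, tol=1e-10):
    """Textbook HITS: returns (hub, authority) scores, each L2-normalised."""
    n = len(adj)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # Authority of j: weighted sum of hub scores of nodes pointing to j.
        new_auth = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # Hub of i: weighted sum of authority scores of nodes i points to.
        new_hub = [sum(adj[i][j] * new_auth[j] for j in range(n)) for i in range(n)]
        norm_a = math.sqrt(sum(a * a for a in new_auth)) or 1.0
        norm_h = math.sqrt(sum(h * h for h in new_hub)) or 1.0
        new_auth = [a / norm_a for a in new_auth]
        new_hub = [h / norm_h for h in new_hub]
        converged = sum(abs(a - b) for a, b in zip(new_auth, auth)) < tol
        hub, auth = new_hub, new_auth
        if converged:
            break
    return hub, auth


# Hypothetical 4-protein hit graph: row i lists the proteins that i retrieved.
TOY_GRAPH = [
    [0, 1, 1, 0],  # protein 0 hits proteins 1 and 2
    [0, 0, 1, 0],  # protein 1 hits protein 2
    [0, 1, 0, 0],  # protein 2 hits protein 1
    [0, 0, 1, 0],  # protein 3 hits protein 2
]

if __name__ == "__main__":
    print("PageRank :", pagerank(TOY_GRAPH))
    print("Authority:", hits(TOY_GRAPH)[1])
```

In this toy graph, protein 2 is retrieved by the most other proteins and so receives the highest PageRank and authority scores; in a real setting the weights would presumably be derived from search results such as those produced by HHblits.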

Code and Dataset

The supplementary files consist of the main Python programs and the datasets used. To use these programs or datasets, please download the following files:


[SCOP-benchmark dataset, Sequences]


HITS-PR-HHblits depends on the following toolkits:






Please cite the following paper when using the programs or datasets on this website:

Liu B, Jiang S, Zou Q. HITS-PR-HHblits: Protein Remote Homology Detection by Combining PageRank and Hyperlink-Induced Topic Search. Briefings in Bioinformatics 2020;21:298-308.