As one of the most challenging tasks in sequence analysis, protein remote homology detection has attracted a great deal of interest. Methods based on PageRank models and HITS approaches have achieved state-of-the-art performance in information retrieval, and these two kinds of methods are complementary. However, an integrated framework combining the PageRank and HITS methods has not yet been explored in protein remote homology detection.
In this study, six basic models were used to construct predictors for protein remote homology detection: PSI-BLAST, Hmmer, HHblits, PsePro-HHblits, PsePro-Hmmer, and PsePro-PSI-BLAST. These models are able to automatically extract local and global sequence order information. HHblits achieved the best performance among the six basic models. Finally, a new method called HITS-PR-HHblits was proposed by combining PageRank models, HITS approaches, and the ranking method HHblits. Tested on a widely used SCOP benchmark dataset, HITS-PR-HHblits achieved an ROC1 score of 0.9045 and an ROC50 score of 0.9124, significantly outperforming other existing state-of-the-art methods. Experimental results on the recent SCOPe datasets showed that HITS-PR-HHblits can achieve stable performance. It is anticipated that HITS-PR-HHblits will become a very useful tool for protein remote homology detection.
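For readers unfamiliar with the ROC1 and ROC50 scores reported above, the following is a minimal sketch of how a ROC-n score is commonly computed for one query: the area under the ROC curve up to the first n false positives, normalized to lie in [0, 1]. The function name `roc_n`, its input format, and the handling of lists containing fewer than n false positives are illustrative assumptions and are not taken from this study.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], n: int) -> float:
    """ROC-n score for one ranked hit list (hypothetical helper).

    ranked_labels: True for a true homolog, False for a non-homolog,
                   ordered from the best-scoring hit to the worst.
    n:             number of false positives at which the curve is cut.
    """
    total_pos = sum(1 for label in ranked_labels if label)
    if total_pos == 0 or n <= 0:
        return 0.0

    tp_seen = 0  # true positives ranked so far
    fp_seen = 0  # false positives ranked so far
    area = 0     # sum of true-positive counts at each false positive

    for is_pos in ranked_labels:
        if is_pos:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the
    # missing ones are treated as ranking below every listed hit.
    if fp_seen < n:
        area += (n - fp_seen) * tp_seen

    return area / (n * total_pos)


if __name__ == "__main__":
    hits = [True, True, False, True, False]
    print(roc_n(hits, 1))
    print(roc_n(hits, 50))
```

A benchmark-level score of this kind is typically the average of the per-query values over all queries in the dataset.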