ProtDec-LTR 2.0

An improved method for protein remote homology detection by combining pseudo proteins and supervised Learning to Rank


Description of ProtDec-LTR 2.0 web server

Protein remote homology detection, which aims to find proteins with known structures that are distantly evolutionarily related to a query protein, is one of the most important sequence analysis tasks in computational biology [1]. It is critical both for basic research, such as protein attribute prediction, and for practical applications, such as modeling the 3D structures of target proteins for drug development.

In 2015 we published ProtDec-LTR [2], the first computational method to combine three state-of-the-art ranking predictors in a supervised framework using Learning to Rank (LTR). ProtDec-LTR has been widely used for protein remote homology detection as well as for protein function and structure prediction. However, it still leaves room for further work, for the following reasons:

  • Its ROC1 score leaves considerable room for improvement. Because false-positive hits near the top of the ranking list are the most likely to be chosen as targets, detection sensitivity must be improved by using a richer protein representation that incorporates the evolutionary information contained in sequence profiles.
  • Only the ProtDec-LTR method was proposed, without a web server. An easy-to-use online web server would certainly be welcomed by users. However, most existing web servers for protein remote homology detection only assign a simple yes/no label and offer no convenient way to analyze the structure and function of the homologous proteins. An updated version is therefore needed.
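The ROC1 score mentioned above rewards rankings in which true homologs appear before the first false positive. A minimal sketch of the commonly used ROC_n definition (the area under the ROC curve up to the first n false positives, normalized to the range 0 to 1) follows; the function name `roc_n`, its input format, and the handling of lists with fewer than n false positives are illustrative assumptions, not part of the ProtDec-LTR 2.0 server.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], n: int = 1) -> float:
    """Compute the ROC_n score for a single query.

    ranked_labels holds one entry per database protein, ordered from the
    highest-scoring hit to the lowest: True for a true homolog (positive),
    False for a non-homolog (false positive).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total_pos = sum(1 for is_pos in ranked_labels if is_pos)
    if total_pos == 0:
        raise ValueError("ROC_n is undefined for a query with no positives")

    tp_so_far = 0   # positives ranked above the current position
    fp_so_far = 0   # false positives encountered so far
    area = 0        # sum of tp_so_far at each of the first n false positives
    for is_pos in ranked_labels:
        if is_pos:
            tp_so_far += 1
        else:
            fp_so_far += 1
            area += tp_so_far
            if fp_so_far == n:
                break

    # Assumed convention: if fewer than n false positives occur, the missing
    # ones are treated as ranked below every hit in the list.
    area += (n - fp_so_far) * tp_so_far
    return area / (n * total_pos)


# Toy ranking: two homologs, then a false positive, then a third homolog.
ranked = [True, True, False, True, False]
print(roc_n(ranked, n=1))  # 2 of the 3 homologs precede the first false positive
```

With n = 1 the score reduces to the fraction of a query's homologs that are ranked above its highest-scoring false positive, which is why a single false positive near the top of the list lowers ROC1 sharply.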

We developed an improved method, ProtDec-LTR 2.0, which incorporates profile-based pseudo proteins [3] into the LTR framework. Experimental results show that the profile-based pseudo protein representation clearly improves the sensitivity of three state-of-the-art predictors. These three pseudo-protein predictors are then combined in a supervised manner by the LTR algorithm, which further improves predictive performance. A more detailed performance description can be found in the Document section. We have also developed a web server with which users can detect homologous proteins from protein sequences alone; the predicted results are returned in a user-friendly format.
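The supervised combination step can be illustrated with a deliberately simple LTR model. The sketch below is a RankNet-style pairwise ranker with a linear scoring function, trained by stochastic gradient descent on the pairwise logistic loss; this technique is an assumption chosen for brevity, and ProtDec-LTR 2.0 may use a different LTR algorithm. Each candidate protein is represented by three hypothetical features standing in for the scores of the three pseudo-protein predictors, and the toy data are invented, not taken from the benchmark.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical input: each hit for a query is (features, label), where the
# features are the scores of three base predictors and label 1 = homolog.
Hit = Tuple[Sequence[float], int]


def _sigmoid_neg(margin: float) -> float:
    """Numerically stable 1 / (1 + exp(margin)), i.e. sigmoid(-margin)."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def train_pairwise_ranker(queries: List[List[Hit]], epochs: int = 200,
                          lr: float = 0.1, seed: int = 0) -> List[float]:
    """Learn linear weights w so that w.x(homolog) > w.x(non-homolog)
    within each query, by SGD on log(1 + exp(-w.(x_pos - x_neg)))."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    # Pairs are formed only inside a query: the ranking is per query protein.
    pairs = []
    for hits in queries:
        pos = [x for x, y in hits if y == 1]
        neg = [x for x, y in hits if y == 0]
        pairs.extend((p, q) for p in pos for q in neg)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for p, q in pairs:
            diff = [a - b for a, b in zip(p, q)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # The loss gradient is -sigmoid(-margin) * diff, so descend along +diff.
            step = lr * _sigmoid_neg(margin)
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


def rank_hits(w: Sequence[float], hits: List[Hit]) -> List[int]:
    """Return hit indices ordered by the combined score w.x, best first."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x, _ in hits]
    return sorted(range(len(hits)), key=lambda i: scores[i], reverse=True)


# Toy query (invented values): homologs score high on predictor 1, low on predictor 2.
toy_query = [((0.9, 0.1, 0.5), 1), ((0.2, 0.8, 0.5), 0),
             ((0.8, 0.3, 0.4), 1), ((0.1, 0.7, 0.6), 0)]
weights = train_pairwise_ranker([toy_query])
order = rank_hits(weights, toy_query)
print(weights, order)
```

Forming pairs only within a query reflects the point of LTR in this setting: the model is trained to order the hits of each query protein correctly, rather than to fit a single global yes/no threshold.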

These improvements, together with the user-friendly web service, make ProtDec-LTR 2.0 a more efficient and powerful tool. Compared with other existing methods, ProtDec-LTR 2.0 has several advantages:

  • It is the first web server to incorporate profile-based pseudo proteins into the LTR framework;
  • It provides various result visualizations and functional interpretations, such as 3D structure visualization of homologous proteins and multiple sequence alignment interpretation;
  • According to the experimental results, ProtDec-LTR 2.0 is one of the most accurate web servers for protein remote homology detection.

ProtDec-LTR 2.0 is trained on an updated benchmark dataset, SCOPe v2.06, which contains more samples and protein families, making the predictor more robust and useful. The web server will be updated whenever a new version of SCOPe is released.

1. Chen J, Guo M, Wang X et al. A comprehensive review and comparison of different computational methods for protein remote homology detection. Briefings in Bioinformatics 2016:bbw108. (PMID: 27881430)
2. Liu B, Chen J, Wang X. Application of Learning to Rank to protein remote homology detection. Bioinformatics 2015;31:3492-3498. (PMID: 26163693)
3. Liu B, Zhang D, Xu R et al. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 2014;30:472-479. (PMID: 24318998)

Harbin Institute of Technology, Shenzhen.

Website ICP filing number: 粤ICP备19041859号-1