NCBRPred:

predicting nucleic acid binding residues in proteins based on multi-label learning




Introduction

NCBRPred is a sequence-based computational predictor for identifying DNA-binding residues and RNA-binding residues in proteins. It employs a multi-label learning framework and a sequence labeling model. NCBRPred improves the performance of nucleic acid binding residue prediction while maintaining a low cross-prediction rate, making it an important complement to existing methods. The framework and working process of NCBRPred are shown in Fig. 1. For more information, please refer to our paper.
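To make the multi-label idea concrete, the following minimal sketch shows one way to give every residue two independent binary labels (DNA-binding and RNA-binding) and to measure cross-prediction. It is an illustration only, not code from the NCBRPred package: the function names `encode_multilabel` and `cross_prediction_rate`, the toy binding sites, and the toy predictions are all hypothetical, and the cross-prediction rate is assumed here to be the share of RNA-only residues that are predicted as DNA-binding.

```python
from typing import List, Set, Tuple


def encode_multilabel(length: int, dna_sites: Set[int], rna_sites: Set[int]) -> List[Tuple[int, int]]:
    """Return one (dna, rna) binary label pair per residue (0-based positions)."""
    return [(int(i in dna_sites), int(i in rna_sites)) for i in range(length)]


def cross_prediction_rate(labels: List[Tuple[int, int]], pred_dna: List[int]) -> float:
    """Assumed definition: share of RNA-only residues predicted as DNA-binding."""
    rna_only = [p for (dna, rna), p in zip(labels, pred_dna) if rna == 1 and dna == 0]
    return sum(rna_only) / len(rna_only) if rna_only else 0.0


# Toy 8-residue protein: residue 2 binds both DNA and RNA.
labels = encode_multilabel(8, dna_sites={1, 2}, rna_sites={2, 4, 5})
# Hypothetical binary DNA-binding predictions for the same residues.
pred_dna = [0, 1, 1, 0, 1, 0, 0, 0]
rate = cross_prediction_rate(labels, pred_dna)
print(labels, rate)
```

Because each residue carries two labels rather than one of four exclusive classes, a residue that binds both DNA and RNA is represented naturally as (1, 1).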

Architecture of NCBRPred
Fig. 1. The framework and architecture of NCBRPred. (a) The overall framework of NCBRPred. Both DNA-binding proteins and RNA-binding proteins are fed into NCBRPred for training and testing. A sliding window strategy is used to capture the local dependencies among residues in a protein. (b) The network architecture of the multi-label sequence labeling model (MSLM). It contains three layers: two BiGRU layers and a TDMLFCB layer. The two BiGRU layers measure the correlations among residues along the protein in a global fashion so as to capture both long-distance and short-distance dependencies among residues. The TDMLFCB layer predicts DNA-binding residues and RNA-binding residues based on the hidden features learned by the two preceding BiGRU layers. The red, blue, orange and gray circles in the input layer represent DNA-binding residues, RNA-binding residues, residues that bind both DNA and RNA, and non-DNA/RNA-binding residues, respectively. (c) The network architecture of MLFCB. It integrates the predictive results for binding residues via the multi-label learning strategy, trained with both DNA-binding and RNA-binding residues, leading to a lower cross-prediction rate.
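The sliding window strategy mentioned in panel (a) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper name `sliding_windows`, the window length, the stride, and the padding character `X` are hypothetical choices, not values taken from the NCBRPred package or paper.

```python
from typing import List


def sliding_windows(sequence: str, window: int, stride: int, pad: str = "X") -> List[str]:
    """Split a protein sequence into fixed-length, possibly overlapping segments.

    The last segment is right-padded with `pad` so that all segments share
    the same length (window size, stride and padding are assumptions).
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    segments = []
    start = 0
    while True:
        chunk = sequence[start:start + window]
        segments.append(chunk.ljust(window, pad))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(sequence):
            break
        start += stride
    return segments


# Toy 10-residue sequence cut into windows of 4 residues with a stride of 2.
segments = sliding_windows("MKTAYIAKQR", window=4, stride=2)
print(segments)
```

Each fixed-length segment could then be converted into per-residue feature vectors and passed to a sequence labeling model, which outputs one prediction per residue in the segment.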

Acknowledgments

We gratefully acknowledge the following software used as part of this server:

(1) PSIBLAST - Generation of the position-specific scoring matrices (PSSMs);

(2) NRDB90 - The non-redundant database used by PSIBLAST to generate PSSMs;

(3) HHblits - Generation of the hidden Markov model (HMM) based evolutionary profiles;

(4) HHsuite database - The non-redundant database used by HHblits to generate HMM profiles;

(5) SSpro and ACCpro - Prediction of protein secondary structure and solvent accessibility, respectively;

(6) Biopython - Protein data preprocessing;

(7) Keras - Construction of the prediction model;

(8) TensorFlow - Backend of Keras for computation.

Stand-alone package

The stand-alone packages of NCBRPred for Python 2.7 and Python 3.7 can be downloaded from the links below:

[NCBRPred(Python2) | NCBRPred(Python3) | README.txt]

Note: For example command lines and a guide to configuring the stand-alone package, please refer to the README file above.

References

Users of this server are requested to cite the following paper:

Jun Zhang, Qingcai Chen, Bin Liu*. NCBRPred: predicting nucleic acid binding residues in proteins based on multi-label learning. Briefings in Bioinformatics, 2021, 22(5): bbaa397.