###############################################################################
#                                                                             #
#  Software  : NCBRPred                                                       #
#  Release   : 1.1 (January 2020)                                             #
#  Author(s) : Jun Zhang, Qingcai Chen, Bin Liu                               #
#  Copyright : School of Computer Science and Technology,                     #
#              Harbin Institute of Technology, China                          #
#                                                                             #
###############################################################################

User guide of NCBRPred
===============================

==================
** Introduction **
==================

NCBRPred is a new sequence-based computational predictor for identifying
DNA-binding and RNA-binding residues in proteins. It employs a multi-label
learning framework and a sequence labeling model. It improves the performance
of nucleic acid binding residue prediction while maintaining a low
cross-prediction rate, which makes it an important complement to existing
methods.

The program is implemented with Keras and TensorFlow and requires the
following programs and data:

(1) PSIBLAST   - Generation of the position-specific scoring matrices (PSSMs);
(2) HHblits    - Generation of the hidden Markov model (HMM) based
                 evolutionary profiles;
(3) SSpro      - Prediction of protein secondary structure;
(4) ACCpro     - Prediction of protein solvent accessibility;
(5) Biopython  - Protein data preprocessing;
(6) Keras      - Construction of the prediction model;
(7) TensorFlow - Backend of Keras for computing;
(8) NRDB90     - Non-redundant database used by PSIBLAST to generate PSSMs;
(9) Uniprot20  - Non-redundant database used by HHblits to generate HMM
                 profiles.

===========
** Usage **
===========

1. Download and install the required programs and databases;
2. Download and unzip the "NCBRPred.zip" file;
3. Go to the NCBRPred directory;
4. Configure the paths of the required programs and databases in conf.py;
5. Execute the command
   "python predict.py -i input_file -o output_file -m model"
   to make predictions.

Here, 'input_file' is the path of the protein sequence file in FASTA format,
'output_file' is the path where the predicted results are saved, and 'model'
is the name of the prediction model. Four prediction models are available:
yk17, yfk16_a3, yfk16_a5 and mw15. The first three were trained on the YK17,
YFK16-3.5 and YFK16-5 training sets, respectively. The mw15 model was trained
on the reduced YFK16-5 set. For more information, please refer to our paper.

=============
** Example **
=============

Four protein sequences in FASTA format are provided as examples in
"./examples/test.txt". Execute the command
"python predict.py -i examples/test.txt -o examples/results.txt -m yk17".
The prediction results will be saved in "examples/results.txt", as shown
below:

>example1
Amino Acid    probability_DNA    binary_DNA    probability_RNA    binary_RNA
M             0.0514             0             0.0041             0
P             0.0431             0             0.0018             0
...
L             0.0017             0             0.0001             0
Q             0.0470             0             0.0042             0
...
>example4
Amino Acid    probability_DNA    binary_DNA    probability_RNA    binary_RNA
K             0.0303             0             0.0024             0
W             0.0194             0             0.0016             0
...
K             0.0022             0             0.2056             1
D             0.0018             0             0.2087             1

The prediction for each protein is given in a column-wise fashion, where each
row corresponds to a residue and:

(1) The first column is the protein sequence, one amino acid per row;
(2) The second column gives the predicted DNA-binding scores;
(3) The third column gives the putative DNA-binding residues: 1 represents a
    DNA-binding residue and 0 represents a non-DNA-binding residue;
(4) The fourth column gives the predicted RNA-binding scores;
(5) The fifth column gives the putative RNA-binding residues: 1 represents an
    RNA-binding residue and 0 represents a non-RNA-binding residue.

=============
** License **
=============

This program is free for non-commercial users only.
If you have any questions, please contact us by email: bliu@bliulab.net or
jzhang@bliulab.net.

Enjoy!
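
=========================
** Reading the results **
=========================

The result layout described in the Example section (a ">" header line per
protein, a column header row, then one row of five fields per residue) can be
read back with a short script. The following is a minimal sketch and not part
of NCBRPred itself: the names parse_results and binding_positions are
hypothetical, and it assumes the columns are separated by whitespace (tabs or
spaces), which this guide does not state explicitly. The sample text reuses
the example4 rows shown in this guide.

```python
from typing import Dict, List, Optional, Tuple

# One parsed residue row:
# (amino acid, probability_DNA, binary_DNA, probability_RNA, binary_RNA)
Residue = Tuple[str, float, int, float, int]


def parse_results(text: str) -> Dict[str, List[Residue]]:
    """Parse result text into {protein_id: [residue rows]}.

    Assumption: fields are whitespace-separated (not confirmed by the guide).
    """
    proteins: Dict[str, List[Residue]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the "..." elision used in the guide
        if line.startswith(">"):
            current = line[1:].strip()  # protein identifier
            proteins[current] = []
            continue
        fields = line.split()
        if fields[:2] == ["Amino", "Acid"]:
            continue  # column header row
        if current is None or len(fields) != 5:
            raise ValueError(f"unexpected line: {line!r}")
        aa, p_dna, b_dna, p_rna, b_rna = fields
        proteins[current].append(
            (aa, float(p_dna), int(b_dna), float(p_rna), int(b_rna))
        )
    return proteins


def binding_positions(rows: List[Residue], nucleic_acid: str) -> List[int]:
    """Return 1-based row indices labeled as binding for "DNA" or "RNA"."""
    if nucleic_acid not in ("DNA", "RNA"):
        raise ValueError("nucleic_acid must be 'DNA' or 'RNA'")
    label_index = 2 if nucleic_acid == "DNA" else 4
    return [i for i, row in enumerate(rows, start=1) if row[label_index] == 1]


# Rows copied from the example4 block of this guide. The "..." line stands for
# rows omitted in the guide, so the indices below are positions within this
# excerpt, not positions in the full sequence.
SAMPLE = """>example4
Amino Acid probability_DNA binary_DNA probability_RNA binary_RNA
K 0.0303 0 0.0024 0
W 0.0194 0 0.0016 0
...
K 0.0022 0 0.2056 1
D 0.0018 0 0.2087 1
"""

if __name__ == "__main__":
    results = parse_results(SAMPLE)
    for name, rows in results.items():
        print(name, "RNA-binding rows:", binding_positions(rows, "RNA"))
```

To apply the same sketch to a real run, read the file given to the -o option
(for example "examples/results.txt") and pass its contents to parse_results.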