###############################################################################
#                                                                             #
#   Software   :  NCBRPred                                                    #
#   Release    :  1.1  (January 2020)                                         #
#   Author(s)  :  Jun Zhang, Qingcai Chen, Bin Liu                            #
#   Copyright  :  School of Computer Science and Technology,                  #
#                 Harbin Institute of Technology, China                       #
#                                                                             #
###############################################################################
			
			
                         User guide of NCBRPred 
		      ===============================


==================
** Introduction **
==================
NCBRPred is a new sequence-based computational predictor for identifying DNA-binding
residues and RNA-binding residues. It employs a multi-label learning framework and a
sequence labeling model. NCBRPred improves the performance of nucleic acid-binding
residue prediction while maintaining a low cross-prediction rate (i.e. few DNA-binding
residues predicted as RNA-binding, and vice versa), making it an important complement
to existing methods.

The program is implemented with Keras and TensorFlow and requires the following
programs and data:
(1) PSIBLAST - Generation of position-specific scoring matrices (PSSMs);
(2) HHblits - Generation of hidden Markov model (HMM)-based evolutionary profiles;
(3) SSpro - Prediction of protein secondary structure;
(4) ACCpro - Prediction of protein solvent accessibility;
(5) Biopython - Protein data preprocessing;
(6) Keras - Construction of the prediction model;
(7) TensorFlow - Backend of Keras for computation;
(8) NRDB90 - The non-redundant database used by PSIBLAST to generate PSSMs;
(9) Uniprot20 - The non-redundant database used by HHblits to generate HMM profiles.
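Before running NCBRPred, it can help to confirm that the Python packages and command-line
tools listed above are reachable. The helper below is a minimal sketch and not part of
NCBRPred. It assumes the executables are named `psiblast` and `hhblits` and are on PATH
(installations may differ), and it does not check SSpro, ACCpro or the databases, whose
paths are configured separately in conf.py.

```python
import importlib.util
import shutil

# Python packages from the list above; "Bio" is the import name of Biopython.
PYTHON_PACKAGES = ["Bio", "keras", "tensorflow"]

# Assumed executable names, looked up on PATH. SSpro/ACCpro and the
# databases are configured in conf.py and are not checked here.
EXECUTABLES = ["psiblast", "hhblits"]


def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for pkg in PYTHON_PACKAGES:
        # find_spec locates a top-level package without importing it.
        status[pkg] = importlib.util.find_spec(pkg) is not None
    for exe in EXECUTABLES:
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```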

===========
** Usage **
===========
1. Download and install the required programs and databases;
2. Download and unzip the "NCBRPred.zip" file;
3. Go to the NCBRPred directory;
4. Configure the paths of the required programs and databases in conf.py;
5. Execute the command "python predict.py -i input_file -o output_file -m model" to make predictions.
   Here, 'input_file' is the path of the protein sequence file in FASTA format,
   'output_file' is the path where the predicted results are saved, and
   'model' is the name of the prediction model. Four prediction models are available:
   yk17, yfk16_a3, yfk16_a5 and mw15. The first three were trained on the training
   sets of YK17, YFK16-3.5 and YFK16-5, respectively; mw15 was trained on the reduced YFK16-5.
   For more information, please refer to our paper.
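The command in step 5 can also be driven from Python, for example to run several models
in turn. The sketch below is a hypothetical wrapper (`build_command` and `run_prediction`
are illustrative names, not part of NCBRPred). It assumes it is run from the NCBRPred
directory and uses the current interpreter (`sys.executable`) in place of `python`.

```python
import subprocess
import sys

# The four model names accepted by the -m option, as listed above.
MODELS = ("yk17", "yfk16_a3", "yfk16_a5", "mw15")


def build_command(input_file, output_file, model):
    """Build the predict.py argument list, rejecting unknown model names early."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; choose one of {', '.join(MODELS)}")
    return [sys.executable, "predict.py",
            "-i", input_file, "-o", output_file, "-m", model]


def run_prediction(input_file, output_file, model):
    """Run predict.py in the current directory; raise CalledProcessError on failure."""
    subprocess.run(build_command(input_file, output_file, model), check=True)
```

For instance, `run_prediction("examples/test.txt", "examples/results.txt", "yk17")`
corresponds to running `python predict.py -i examples/test.txt -o examples/results.txt -m yk17`.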

=============
** Example **
=============
Four protein sequences in FASTA format are provided as examples in "./examples/test.txt".
Execute the command "python predict.py -i examples/test.txt -o examples/results.txt -m yk17".
The prediction results will be saved in "examples/results.txt", as shown below:

>example1
Amino Acid	probability_DNA	binary_DNA	probability_RNA	binary_RNA
M	0.0514	0	0.0041	0
P	0.0431	0	0.0018	0
        ...
L	0.0017	0	0.0001	0
Q	0.0470	0	0.0042	0
        ...
>example4
Amino Acid	probability_DNA	binary_DNA	probability_RNA	binary_RNA
K	0.0303	0	0.0024	0
W	0.0194	0	0.0016	0
        ...
K	0.0022	0	0.2056	1
D	0.0018	0	0.2087	1

The predictions for each protein are given column-wise: each row corresponds to one
residue, where:
(1) The first column gives the amino acid of the protein sequence;
(2) The second column gives the predicted DNA-binding score;
(3) The third column gives the putative DNA-binding residues: 1 represents a DNA-binding residue and 0
    represents a non-DNA-binding residue;
(4) The fourth column gives the predicted RNA-binding score;
(5) The fifth column gives the putative RNA-binding residues: 1 represents an RNA-binding residue and 0
    represents a non-RNA-binding residue.
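To post-process a results file in Python, the format above can be read with a small
parser. This is a sketch assuming the layout shown in the example: a ">" name line, a
header row starting with "Amino Acid", then one whitespace-separated row of five fields
per residue. `parse_results`, `binding_positions` and `Residue` are hypothetical helpers,
not part of NCBRPred.

```python
from typing import Dict, List, NamedTuple


class Residue(NamedTuple):
    amino_acid: str
    prob_dna: float
    is_dna_binding: bool
    prob_rna: float
    is_rna_binding: bool


def parse_results(text: str) -> Dict[str, List[Residue]]:
    """Parse NCBRPred output text into {protein name: [Residue, ...]}."""
    proteins: Dict[str, List[Residue]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            proteins[current] = []
        elif line.startswith("Amino Acid"):
            continue  # column header row
        else:
            if current is None:
                raise ValueError("residue row found before any '>' header")
            aa, p_dna, b_dna, p_rna, b_rna = line.split()
            proteins[current].append(
                Residue(aa, float(p_dna), b_dna == "1", float(p_rna), b_rna == "1"))
    return proteins


def binding_positions(residues: List[Residue], kind: str = "DNA") -> List[int]:
    """Return 1-based positions of residues whose binary column is 1 for kind."""
    if kind not in ("DNA", "RNA"):
        raise ValueError("kind must be 'DNA' or 'RNA'")
    flag = "is_dna_binding" if kind == "DNA" else "is_rna_binding"
    return [i for i, r in enumerate(residues, start=1) if getattr(r, flag)]


# Two-row excerpt of the example4 output shown above.
sample = (">example4\n"
          "Amino Acid\tprobability_DNA\tbinary_DNA\tprobability_RNA\tbinary_RNA\n"
          "K\t0.0303\t0\t0.0024\t0\n"
          "D\t0.0018\t0\t0.2087\t1\n")
print(binding_positions(parse_results(sample)["example4"], "RNA"))  # [2]
```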

=============
** License **
=============
This is a free program for non-commercial use only.
For any questions, please contact us by email: bliu@bliulab.net or jzhang@bliulab.net.

Enjoy!