ProtFold-DFG: protein fold recognition by combining Directed Fusion Graph and PageRank algorithm

This website provides the source code and related data for implementing and evaluating ProtFold-DFG. All downloadable data are compressed in zip format. The experiments depend on Windows Subsystem for Linux (WSL) with Ubuntu 18.04 LTS and Python 3.7.7.

src.zip contains a "src" folder, which holds two folders, "model" and "utils", and two Python files, experimental.py and testDeepFR.py. The "model" folder contains the source code implementing the ProtFold-DFG predictor. The "utils" folder contains five Python files: basemethod2rank.py, Evaluate.py, Plot.py, split_dataset.py and Utils.py. experimental.py shows how to perform 2-fold cross validation on the LINDAHL dataset [1].

LE.zip is the benchmark dataset. It contains seven files: 321a, 321b, 434a, 434b, 555a, 555b and le_all.seq. We evaluate ProtFold-DFG by 2-fold cross validation between 321a and 321b. In addition, two folders, "all" and "folds", provide the preprocessing results for the LINDAHL dataset.

DeepFR.zip, DeepSVM-fold.zip, LTR.zip and MotifCNN-fold.zip correspond to the basic methods. Each of these packages contains a 321_rankings folder, which holds the ranking results on the benchmark dataset. DeepFR.zip has two folders, 321_rankings and 976_rankings; the former is a subset of the latter, which was downloaded from http://protein.ict.ac.cn/deepfr/evaluation_data/lindahl_results/DeepFR/.
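The 2-fold cross validation between 321a and 321b mentioned above can be pictured with the following minimal sketch. It is not taken from experimental.py: the function name two_fold_cross_validation and the fit and score callables are hypothetical placeholders, and how the two per-fold scores are combined is left to the caller.

```python
from typing import Callable, List, Sequence


def two_fold_cross_validation(
    subset_a: Sequence[str],
    subset_b: Sequence[str],
    fit: Callable[[Sequence[str]], object],
    score: Callable[[object, Sequence[str]], float],
) -> List[float]:
    """Train on one subset and test on the other, then swap the roles.

    `fit` and `score` are hypothetical stand-ins for the real training and
    evaluation steps; they are assumptions made for illustration only.
    """
    scores = []
    for train, test in ((subset_a, subset_b), (subset_b, subset_a)):
        model = fit(train)                 # train on one half
        scores.append(score(model, test))  # evaluate on the other half
    return scores


if __name__ == "__main__":
    # Toy stand-ins: the "model" is the set of training IDs, and the score
    # is the fraction of test IDs that were seen during training.
    def toy_fit(train):
        return set(train)

    def toy_score(model, test):
        return sum(item in model for item in test) / len(test)

    print(two_fold_cross_validation(["p1", "p2", "p3"], ["p3", "p4"], toy_fit, toy_score))
```

In this scheme every protein is used for testing exactly once, and never by a model that saw it during training.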
All ranking lists follow this format:

qid:1hsq-d1hsq fold:2_21 feedback:1bia-d1bia_2 fold:2_21 score:3.152705192565918 same:1

Here "qid" is the query protein sequence (1hsq-d1hsq) and "feedback" is the template protein sequence (1bia-d1bia_2). Each "fold" field gives the fold type of the protein sequence that precedes it, "score" is the fold similarity between the two sequences, and "same" indicates whether they belong to the same fold type: 1 if they do, 0 if they do not.

After downloading the related data, we recommend that readers store the data and source code according to the following directory structure.

ProtFold-DFG
├── dataset
│   └── LE
│       ├── all
│       └── folds
├── evaluate
│   └── PRF
├── feature
│   ├── LTR
│   ├── PRF
│   ├── deepFR
│   ├── DeepSVM-fold
│   ├── MotifCNN-fold
│   └── prfMatrix
├── feedback
│   └── PRF
├── otherTest
│   └── DeepFR
├── src
│   ├── model
│   └── utils
├── experimental.py
└── testDeepFR.py

Running "python src/experimental.py" performs the 2-fold cross validation. The program is experimental, however, so readers should understand the source code and make sure that data import and result export work correctly in their environment.

Reference

[1] Lindahl E, Elofsson A. Identification of related proteins on family, superfamily and fold level. J Mol Biol 2000;295:613-625.
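For readers who want to load the ranking lists described above into their own scripts, the following is a minimal parsing sketch. It is not part of the released source code: the names RankingEntry and parse_ranking_line are hypothetical, and the sketch assumes that every line holds exactly the six whitespace-separated key:value fields shown in the example, in that order.

```python
from typing import NamedTuple


class RankingEntry(NamedTuple):
    """One line of a ranking list (hypothetical container, for illustration)."""
    query: str
    query_fold: str
    template: str
    template_fold: str
    score: float
    same: bool


# Field order assumed from the example line in this README.
EXPECTED_KEYS = ["qid", "fold", "feedback", "fold", "score", "same"]


def parse_ranking_line(line: str) -> RankingEntry:
    """Parse one ranking line made of whitespace-separated key:value fields."""
    keys, values = [], []
    for token in line.split():
        key, _, value = token.partition(":")  # split on the first colon only
        keys.append(key)
        values.append(value)
    if keys != EXPECTED_KEYS:
        raise ValueError(f"unexpected fields: {keys}")
    return RankingEntry(
        query=values[0],
        query_fold=values[1],        # the first "fold" belongs to the query
        template=values[2],
        template_fold=values[3],     # the second "fold" belongs to the template
        score=float(values[4]),
        same=(values[5] == "1"),
    )


if __name__ == "__main__":
    example = ("qid:1hsq-d1hsq fold:2_21 feedback:1bia-d1bia_2 "
               "fold:2_21 score:3.152705192565918 same:1")
    print(parse_ranking_line(example))
```

Because the "fold" key appears twice per line, the sketch relies on field position rather than on a dictionary, which would silently overwrite the first fold value.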