If you use ProGO-PSL for research, please cite this paper:
Jiangyi Shao, Shutao Chen, Bin Liu*. Hybrid Information-driven Protein Gene Ontology Annotation via the Protein Sequence Large Graph. (Submitted)
Benchmark dataset (SwissProt released April 2022)
Multiple Sequence Alignments (MSAs) of benchmark dataset
MSA data for the benchmark dataset (84 GB)
Independent test set (SwissProt newly added between May 2022 and March 2025)
Multiple Sequence Alignments (MSAs) of independent test set
Multiple Sequence Alignments (MSAs) of independent test set (2.2 GB)
Source code of ProGO-PSL:
Installation and Usage Guide
Requirements
- Python 3.10+
- Required Python libraries (install via requirements.txt):
pip install -r requirements.txt
- A GPU is recommended for training and evaluation
Usage Examples
Training Stage 1:
python scripts/construct_gendis.py -c configs/training_msa-v1/bpo-7-26.yml \
    /path/to/dataset_state_dict.pkl \
    /path/to/MSAs/ \
    /path/to/save/model/
Training Stage 2:
python scripts/construct_gendis.py -c configs/training_msa-v1/bpo-8-24.yml \
    /path/to/dataset_state_dict.pkl \
    /path/to/MSAs/ \
    /path/to/save/model/
Testing:
python scripts/construct_gendis.py -c configs/evaluating_msa-v1/bpo-8-24.yml \
    /path/to/dataset_state_dict.pkl \
    /path/to/MSAs/ \
    /path/to/trained/model/
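The three commands above can be chained from a small driver script. The sketch below is not part of ProGO-PSL: `build_command` and `run_pipeline` are hypothetical helpers, and passing the same model directory to all three steps is an assumption (the commands above use separate placeholder paths for saving and loading the model). With `dry_run=True` it only prints the commands.

```python
import subprocess
import sys

# Entry point and config paths are copied from the usage examples above.
SCRIPT = "scripts/construct_gendis.py"
STAGE_CONFIGS = [
    "configs/training_msa-v1/bpo-7-26.yml",    # training stage 1
    "configs/training_msa-v1/bpo-8-24.yml",    # training stage 2
    "configs/evaluating_msa-v1/bpo-8-24.yml",  # testing
]


def build_command(config, dataset, msa_dir, model_dir):
    """Assemble the argument list for one run of the entry script."""
    return [sys.executable, SCRIPT, "-c", config, dataset, msa_dir, model_dir]


def run_pipeline(dataset, msa_dir, model_dir, dry_run=True):
    """Run stage 1, stage 2, then testing, stopping at the first failure.

    Assumption: all three steps share one model directory.
    """
    commands = [
        build_command(cfg, dataset, msa_dir, model_dir) for cfg in STAGE_CONFIGS
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(
        "/path/to/dataset_state_dict.pkl",
        "/path/to/MSAs/",
        "/path/to/save/model/",
    )
```

Set `dry_run=False` only after checking the printed commands against your own paths.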
Configuration
Sample Configuration File (configs/training_netgo-v1/bp.yml):
mode: train
task: biological_process
epochs: 100
batch_size: 32
lr: 0.0001
top_k: 40
max_len: 2000
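To catch typos in a configuration before a long run, the sample values above can be loaded into a typed structure and checked. This is a minimal sketch using only the standard library: `RunConfig` and `parse_flat_config` are hypothetical names, the hand-written parser handles only flat `key: value` lines (a real project would normally use a YAML library), and the validation rules are assumptions rather than checks the project is known to perform.

```python
from dataclasses import dataclass

# The sample configuration shown above, one key per line.
SAMPLE = """\
mode: train
task: biological_process
epochs: 100
batch_size: 32
lr: 0.0001
top_k: 40
max_len: 2000
"""

# Expected type for each key in the sample configuration.
CASTS = {
    "mode": str, "task": str, "epochs": int, "batch_size": int,
    "lr": float, "top_k": int, "max_len": int,
}


@dataclass
class RunConfig:
    mode: str
    task: str
    epochs: int
    batch_size: int
    lr: float
    top_k: int
    max_len: int


def parse_flat_config(text):
    """Parse flat `key: value` lines into a validated RunConfig."""
    raw = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, value = line.partition(":")
        raw[key.strip()] = value.strip()
    missing = sorted(set(CASTS) - set(raw))
    if missing:
        raise ValueError(f"missing keys: {missing}")
    cfg = RunConfig(**{key: cast(raw[key]) for key, cast in CASTS.items()})
    # Assumed sanity checks; the modes come from the parameter list (train, test).
    if cfg.mode not in ("train", "test"):
        raise ValueError(f"unknown mode: {cfg.mode}")
    if cfg.epochs <= 0 or cfg.batch_size <= 0 or cfg.lr <= 0:
        raise ValueError("epochs, batch_size and lr must be positive")
    return cfg


config = parse_flat_config(SAMPLE)
print(config)
```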
Key Parameters
- General Arguments (positional, in the order used in the examples above):
  - file_address: Path to the dataset file
  - working_dir: Directory for MSA files
  - model_saving: Directory to save trained model
- Training Parameters:
  - --mode: Operation mode (train, test)
  - --batch-size: Batch size (default: 32)
  - --epochs: Number of training epochs
  - --lr: Learning rate
- Hardware Options:
  - --gpu-ids: GPU IDs to use
  - --amp: Use automatic mixed precision
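For orientation, the parameters listed above map naturally onto an `argparse` parser. The sketch below is illustrative only and is not the actual command-line interface of `scripts/construct_gendis.py`: the long option `--config`, the space-separated integer format for `--gpu-ids`, and the absence of defaults other than the batch size are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented parameters."""
    parser = argparse.ArgumentParser(description="ProGO-PSL-style CLI sketch")
    parser.add_argument("-c", "--config", help="YAML configuration file")

    # General arguments (positional).
    parser.add_argument("file_address", help="Path to the dataset file")
    parser.add_argument("working_dir", help="Directory for MSA files")
    parser.add_argument("model_saving", help="Directory to save trained model")

    # Training parameters.
    parser.add_argument("--mode", choices=["train", "test"], help="Operation mode")
    parser.add_argument("--batch-size", type=int, default=32, help="Batch size")
    parser.add_argument("--epochs", type=int, help="Number of training epochs")
    parser.add_argument("--lr", type=float, help="Learning rate")

    # Hardware options (assumed format: one or more integer device IDs).
    parser.add_argument("--gpu-ids", type=int, nargs="+", default=[],
                        help="GPU IDs to use")
    parser.add_argument("--amp", action="store_true",
                        help="Use automatic mixed precision")
    return parser


# Example invocation with placeholder paths.
args = build_parser().parse_args([
    "-c", "configs/training_msa-v1/bpo-7-26.yml",
    "/path/to/dataset_state_dict.pkl", "/path/to/MSAs/", "/path/to/save/model/",
    "--batch-size", "64", "--gpu-ids", "0", "1", "--amp",
])
print(args.batch_size, args.gpu_ids, args.amp)
```

Note that `argparse` exposes dashed options with underscores, so `--batch-size` is read as `args.batch_size`.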
Evaluation Details
Evaluation reports the following metrics:
- Fmax: Maximum F-score over all prediction thresholds
- AuPRC: Area under the precision-recall curve
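The two metrics above can be illustrated on toy data. The sketch below assumes the protein-centric Fmax commonly used in function-prediction benchmarks (precision averaged over proteins with at least one prediction, recall averaged over all proteins) and estimates AuPRC as average precision over all protein-term pairs. The project's own evaluation code may define either metric differently; the function names and the GO term labels are placeholders.

```python
def fmax_score(true_terms, pred_scores, steps=100):
    """Return (Fmax, threshold) over thresholds 1/steps, 2/steps, ..., 1."""
    n = len(true_terms)
    best_f, best_t = 0.0, 0.0
    if n == 0:
        return best_f, best_t
    for i in range(1, steps + 1):
        t = i / steps
        precisions, recall_sum = [], 0.0
        for truth, scores in zip(true_terms, pred_scores):
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & truth)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(tp / len(predicted))
            if truth:
                recall_sum += tp / len(truth)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n
        if precision + recall > 0:
            f = 2 * precision * recall / (precision + recall)
            if f > best_f:
                best_f, best_t = f, t
    return best_f, best_t


def micro_average_precision(true_terms, pred_scores):
    """Average precision over all (protein, term) pairs ranked by score.

    Tied scores are ranked in input order, so ties can shift the result.
    """
    pairs = []
    for truth, scores in zip(true_terms, pred_scores):
        for term, s in scores.items():
            pairs.append((s, term in truth))
    pairs.sort(key=lambda p: p[0], reverse=True)
    total_pos = sum(len(t) for t in true_terms)
    if total_pos == 0:
        return 0.0
    hits, ap = 0, 0.0
    for rank, (_, is_pos) in enumerate(pairs, start=1):
        if is_pos:
            hits += 1
            ap += hits / rank  # precision at this positive's rank
    return ap / total_pos


# Toy data: two proteins with placeholder term IDs.
truth = [{"GO:1", "GO:2"}, {"GO:3"}]
scores = [
    {"GO:1": 0.9, "GO:2": 0.4, "GO:3": 0.2},
    {"GO:3": 0.3, "GO:1": 0.8},
]
fmax, threshold = fmax_score(truth, scores)
auprc = micro_average_precision(truth, scores)
print(f"Fmax={fmax:.3f} at t={threshold:.2f}, AuPRC={auprc:.3f}")
# prints: Fmax=0.857 at t=0.21, AuPRC=0.806
```

On this toy input the best threshold keeps both correct terms for the first protein but also one wrong term for the second, which is why Fmax stays below 1.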
License
This project is distributed under the MIT License. See LICENSE.md for more details.