If you use ProGO-PSL for research, please cite this paper:
Jiangyi Shao, Shutao Chen, Bin Liu*;
Hybrid Information-driven Protein Gene Ontology Annotation via the Protein Sequence Large Graph
(Submitted)
Multiple Sequence Alignments (MSAs)
SwissProt Dataset
Source code of ProGO-PSL:
Installation and Usage Guide
Requirements
- Python 3.10+
- Required Python libraries (install via requirements.txt):
    pip install -r requirements.txt
- A GPU is recommended for training and testing the model
Usage Examples
Training Stage 1:
python scripts/construct_gendis.py -c configs/training_msa-v1/bpo-7-26.yml \
    /path/to/dataset_state_dict.pkl \
    /path/to/MSAs/ \
    /path/to/save/model/
Training Stage 2:
python scripts/construct_gendis.py -c configs/training_msa-v1/bpo-8-24.yml \
    /path/to/dataset_state_dict.pkl \
    /path/to/MSAs/ \
    /path/to/save/model/
Testing:
python scripts/construct_gendis.py -c configs/evaluating_msa-v1/bpo-8-24.yml \
    /path/to/dataset_state_dict.pkl \
    /path/to/MSAs/ \
    /path/to/trained/model/
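The three commands above share one argument order: the config file after -c, then the dataset file, the MSA directory, and the model directory. As a minimal sketch, the runs could be chained from Python as shown below. The helper names (build_command, run_all) are hypothetical and not part of ProGO-PSL, and the sketch assumes that testing reads the model from the same directory that training wrote to. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

SCRIPT = "scripts/construct_gendis.py"

# Placeholder paths copied from the usage examples; replace with real locations.
DATASET = "/path/to/dataset_state_dict.pkl"
MSA_DIR = "/path/to/MSAs/"
MODEL_DIR = "/path/to/save/model/"  # assumption: testing loads from this same directory

# (label, config file) for each run, in the order given in the usage examples.
STAGES = [
    ("training stage 1", "configs/training_msa-v1/bpo-7-26.yml"),
    ("training stage 2", "configs/training_msa-v1/bpo-8-24.yml"),
    ("testing", "configs/evaluating_msa-v1/bpo-8-24.yml"),
]


def build_command(config, dataset, msa_dir, model_dir):
    """Return the argv list for one run, using the documented argument order."""
    return [sys.executable, SCRIPT, "-c", config, dataset, msa_dir, model_dir]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it in sequence."""
    commands = []
    for label, config in STAGES:
        cmd = build_command(config, DATASET, MSA_DIR, MODEL_DIR)
        commands.append(cmd)
        print(f"[{label}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the chain as soon as one run fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) would execute the runs one after another and stop at the first failure.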
Configuration
Sample Configuration File (configs/training_netgo-v1/bp.yml):
mode: train
task: biological_process
epochs: 100
batch_size: 32
lr: 0.0001
top_k: 40
max_len: 2000
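To illustrate the expected types of these fields, the sketch below parses a flat key: value file like the sample into a typed object. It is not how ProGO-PSL reads its configuration. The RunConfig class, FIELD_TYPES table, and parse_flat_config function are hypothetical, the field types are inferred from the sample values, and the parser handles only flat one-line entries, which is a small subset of YAML.

```python
from dataclasses import dataclass

# Field names come from the sample configuration; the types are inferred from its values.
FIELD_TYPES = {
    "mode": str,
    "task": str,
    "epochs": int,
    "batch_size": int,
    "lr": float,
    "top_k": int,
    "max_len": int,
}

SAMPLE = """\
mode: train
task: biological_process
epochs: 100
batch_size: 32
lr: 0.0001
top_k: 40
max_len: 2000
"""


@dataclass
class RunConfig:
    mode: str
    task: str
    epochs: int
    batch_size: int
    lr: float
    top_k: int
    max_len: int


def parse_flat_config(text):
    """Parse flat `key: value` lines into a RunConfig, casting each value."""
    raw = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, value = line.partition(":")
        raw[key.strip()] = value.strip()
    missing = [name for name in FIELD_TYPES if name not in raw]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    config = RunConfig(**{name: cast(raw[name]) for name, cast in FIELD_TYPES.items()})
    # The documented operation modes are train and test.
    if config.mode not in ("train", "test"):
        raise ValueError(f"unsupported mode: {config.mode!r}")
    return config


if __name__ == "__main__":
    print(parse_flat_config(SAMPLE))
```

For real configuration files, a full YAML parser is the safer choice, since nested keys and lists are outside what this sketch handles.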
Key Parameters
- General Arguments:
    - file_address: Path to the dataset file
    - working_dir: Directory for MSA files
    - model_saving: Directory to save trained model
- Training Parameters:
    - --mode: Operation mode (train, test)
    - --batch-size: Batch size (default: 32)
    - --epochs: Number of training epochs
    - --lr: Learning rate
- Hardware Options:
    - --gpu-ids: GPU IDs to use
    - --amp: Use automatic mixed precision
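The parameters listed above could be declared with argparse roughly as follows. This is a hypothetical sketch, not the parser in scripts/construct_gendis.py. The long option name --config, the space-separated format for --gpu-ids, and the absence of defaults other than the documented batch size of 32 are all assumptions.

```python
import argparse


def build_parser():
    """Build a hypothetical parser that mirrors the documented parameters."""
    parser = argparse.ArgumentParser(
        description="Illustrative CLI; not the actual ProGO-PSL parser."
    )
    # -c appears in the usage examples; the long name --config is an assumption.
    parser.add_argument("-c", "--config", help="configuration file (.yml)")

    # General arguments (positional, in the order used by the usage examples).
    parser.add_argument("file_address", help="path to the dataset file")
    parser.add_argument("working_dir", help="directory for MSA files")
    parser.add_argument("model_saving", help="directory to save the trained model")

    # Training parameters.
    parser.add_argument("--mode", choices=["train", "test"], help="operation mode")
    parser.add_argument("--batch-size", type=int, default=32, help="batch size")
    parser.add_argument("--epochs", type=int, help="number of training epochs")
    parser.add_argument("--lr", type=float, help="learning rate")

    # Hardware options; space-separated integer IDs are an assumed format.
    parser.add_argument("--gpu-ids", type=int, nargs="+", default=[], help="GPU IDs to use")
    parser.add_argument("--amp", action="store_true", help="use automatic mixed precision")
    return parser


if __name__ == "__main__":
    example = [
        "-c", "configs/training_msa-v1/bpo-7-26.yml",
        "/path/to/dataset_state_dict.pkl", "/path/to/MSAs/", "/path/to/save/model/",
        "--mode", "train", "--amp", "--gpu-ids", "0", "1",
    ]
    print(build_parser().parse_args(example))
```

Placing --gpu-ids after the positional arguments keeps argparse from reading a path as a GPU ID.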
Evaluation Details
The evaluation process includes metrics such as:
- Fmax: Maximum F1-score over all prediction thresholds
- AuPRC: Area under the precision-recall curve
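The two metrics can be illustrated with a small pure-Python sketch. It assumes the protein-centric convention common in GO annotation benchmarks. At each threshold, precision is averaged over proteins with at least one prediction and recall over all proteins. AuPRC is then taken as the trapezoidal area under the sampled precision-recall points. The source does not state which averaging ProGO-PSL uses, so the function names, the toy data, and these conventions are assumptions.

```python
def pr_curve(true_terms, scores, thresholds):
    """Return (threshold, precision, recall) points averaged over proteins.

    true_terms: list of sets of GO terms, one non-empty set per protein.
    scores: list of dicts mapping GO term -> predicted score, one per protein.
    """
    points = []
    for t in thresholds:
        precision_sum, n_predicted, recall_sum = 0.0, 0, 0.0
        for truth, predicted_scores in zip(true_terms, scores):
            predicted = {term for term, s in predicted_scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:
                # Precision only counts proteins with at least one prediction.
                precision_sum += hits / len(predicted)
                n_predicted += 1
            recall_sum += hits / len(truth)
        if n_predicted == 0:
            continue  # no predictions at this threshold, precision undefined
        points.append((t, precision_sum / n_predicted, recall_sum / len(true_terms)))
    return points


def fmax(points):
    """Return (best F1, threshold at which it is first reached)."""
    best_f, best_t = 0.0, None
    for t, p, r in points:
        if p + r > 0:
            f = 2 * p * r / (p + r)
            if f > best_f:
                best_f, best_t = f, t
    return best_f, best_t


def auprc(points):
    """Trapezoidal area under the sampled points, sorted by recall."""
    curve = sorted((r, p) for _, p, r in points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Toy data: two proteins with made-up GO identifiers and scores.
TRUE_TERMS = [{"GO:1", "GO:2"}, {"GO:3"}]
SCORES = [
    {"GO:1": 0.9, "GO:2": 0.4, "GO:3": 0.2},
    {"GO:3": 0.35, "GO:1": 0.6},
]
THRESHOLDS = [i / 100 for i in range(1, 101)]

if __name__ == "__main__":
    curve_points = pr_curve(TRUE_TERMS, SCORES, THRESHOLDS)
    best_f, best_t = fmax(curve_points)
    print(f"Fmax = {best_f:.4f} at threshold {best_t:.2f}")
    print(f"AuPRC = {auprc(curve_points):.4f}")
```

Because the sampled curve need not start at recall 0, this trapezoidal estimate can understate the area compared with estimators that extrapolate to the origin.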
License
This project is distributed under the MIT License. See LICENSE.md for more details.