ProGO-PSL-Document

If you use ProGO-PSL for research, please cite this paper:

Jiangyi Shao, Shutao Chen, Bin Liu*.
Hybrid Information-driven Protein Gene Ontology Annotation via the Protein Sequence Large Graph. (Submitted)

Installation and Usage Guide

Requirements

- Python 3.10+
- Required Python libraries, installed from requirements.txt:
```
pip install -r requirements.txt
```
- GPU support is recommended for deep learning tasks

Usage Examples

Training Stage 1:

    python scripts/construct_gendis.py -c configs/training_msa-v1/bpo-7-26.yml \
        /path/to/dataset_state_dict.pkl \
        /path/to/MSAs/ \
        /path/to/save/model/

Training Stage 2:

    python scripts/construct_gendis.py -c configs/training_msa-v1/bpo-8-24.yml \
        /path/to/dataset_state_dict.pkl \
        /path/to/MSAs/ \
        /path/to/save/model/

Testing:

    python scripts/construct_gendis.py -c configs/evaluating_msa-v1/bpo-8-24.yml \
        /path/to/dataset_state_dict.pkl \
        /path/to/MSAs/ \
        /path/to/trained/model/

Configuration

Sample Configuration File (configs/training_netgo-v1/bp.yml):

    mode: train
    task: biological_process
    epochs: 100
    batch_size: 32
    lr: 0.0001
    top_k: 40
    max_len: 2000
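
As a rough illustration of how these fields might be read and sanity-checked, the sketch below parses the flat `key: value` text above into a typed object. `RunConfig`, `FIELD_TYPES`, and `parse_flat_config` are hypothetical names that are not part of ProGO-PSL, the small parser is only a stand-in for a real YAML library, and the checks on each field are assumptions about sensible values.

```python
from dataclasses import dataclass

# The sample configuration from this section, as plain text.
SAMPLE = """\
mode: train
task: biological_process
epochs: 100
batch_size: 32
lr: 0.0001
top_k: 40
max_len: 2000
"""

# Hypothetical mapping from field name to the type it is coerced to.
FIELD_TYPES = {
    "mode": str,
    "task": str,
    "epochs": int,
    "batch_size": int,
    "lr": float,
    "top_k": int,
    "max_len": int,
}


@dataclass
class RunConfig:
    """Hypothetical typed view of the sample file (not a ProGO-PSL class)."""
    mode: str
    task: str
    epochs: int
    batch_size: int
    lr: float
    top_k: int
    max_len: int


def parse_flat_config(text: str) -> RunConfig:
    """Parse flat `key: value` lines; a stand-in for a proper YAML loader."""
    raw = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        raw[key.strip()] = value.strip()

    cfg = RunConfig(**{name: cast(raw[name]) for name, cast in FIELD_TYPES.items()})

    # Assumed sanity checks; the real scripts may validate differently.
    if cfg.mode not in ("train", "test"):
        raise ValueError(f"unknown mode: {cfg.mode!r}")
    if cfg.epochs <= 0 or cfg.batch_size <= 0 or cfg.lr <= 0:
        raise ValueError("epochs, batch_size and lr must be positive")
    return cfg


if __name__ == "__main__":
    print(parse_flat_config(SAMPLE))
```

Coercing values up front means a typo such as a non-numeric `lr` fails when the file is loaded instead of partway through training.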

Key Parameters

General Arguments (positional, in the order shown in the usage examples):
- file_address: Path to the dataset file
- working_dir: Directory for MSA files
- model_saving: Directory to save the trained model

Training Parameters:
- -c: Path to the YAML configuration file
- --mode: Operation mode (train, test)
- --batch-size: Batch size (default: 32)
- --epochs: Number of training epochs
- --lr: Learning rate

Hardware Options:
- --gpu-ids: GPU IDs to use
- --amp: Use automatic mixed precision
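
To make the argument layout concrete, here is a minimal `argparse` sketch that mirrors the options listed above. It is not the parser used by scripts/construct_gendis.py: `build_parser` is a hypothetical name, and the value types (integer GPU IDs, `--amp` as an on/off switch) are assumptions, since the list above gives only names and descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented command-line interface"
    )
    # Configuration file, passed as -c in the usage examples.
    parser.add_argument("-c", dest="config", help="YAML configuration file")

    # General (positional) arguments, in the order used in the examples.
    parser.add_argument("file_address", help="path to the dataset file")
    parser.add_argument("working_dir", help="directory for MSA files")
    parser.add_argument("model_saving", help="directory to save the trained model")

    # Training parameters.
    parser.add_argument("--mode", choices=["train", "test"], help="operation mode")
    parser.add_argument("--batch-size", type=int, default=32, help="batch size")
    parser.add_argument("--epochs", type=int, help="number of training epochs")
    parser.add_argument("--lr", type=float, help="learning rate")

    # Hardware options (types assumed for illustration).
    parser.add_argument("--gpu-ids", type=int, nargs="+", help="GPU IDs to use")
    parser.add_argument("--amp", action="store_true",
                        help="use automatic mixed precision")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this layout, the three paths always come in the same order (dataset, MSA directory, model directory), while the optional flags can be added in any order after them.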

Evaluation Details

The evaluation reports the following metrics:

- Fmax: Maximum F-score over all prediction thresholds
- AuPRC: Area under the precision-recall curve
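
For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. It assumes a protein-centric Fmax (precision averaged over proteins with at least one prediction, recall averaged over all annotated proteins) and a micro-averaged AuPRC over all protein-term pairs. The function names `fmax` and `auprc` are hypothetical, and the project's own evaluation code may use different conventions.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax.

    y_true:  one list of 0/1 labels per protein (one entry per GO term)
    y_score: one list of predicted scores in [0, 1] per protein
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            n_true = sum(truth)
            if n_true == 0:
                continue  # proteins without annotations are skipped
            pred = [s >= t for s in scores]
            tp = sum(1 for p, y in zip(pred, truth) if p and y)
            recalls.append(tp / n_true)
            if any(pred):  # precision only counts proteins with predictions
                precisions.append(tp / sum(pred))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


def auprc(y_true, y_score):
    """Micro-averaged area under the precision-recall curve (trapezoid rule)."""
    pairs = sorted(
        ((s, y) for truth, scores in zip(y_true, y_score)
         for y, s in zip(truth, scores)),
        key=lambda pair: -pair[0],
    )
    total_pos = sum(y for _, y in pairs)
    if total_pos == 0:
        return 0.0
    area, tp = 0.0, 0
    prev_recall, prev_precision = 0.0, 1.0  # curve starts at recall 0, precision 1
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]  # tied scores enter the prediction set together
            j += 1
        recall, precision = tp / total_pos, tp / j
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
        i = j
    return area


if __name__ == "__main__":
    labels = [[1, 0, 1], [0, 1, 0]]
    scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
    print("Fmax:", fmax(labels, scores))
    print("AuPRC:", auprc(labels, scores))
```

Fmax picks the single best threshold, while AuPRC summarizes behavior across all thresholds, so the two can rank models differently.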

License

This project is distributed under the MIT License. See LICENSE.md for more details.
