About ProFun-SOM

Figure 1. The flowchart of ProFun-SOM.

ProFun-SOM: Protein Function Prediction for Specific Ontology based on Multiple Sequence Alignment Reconstruction

  • Motivation: Protein function prediction is essential for understanding species evolution, including virus mutations. Protein functions are categorized using the Gene Ontology (GO), which organizes them into three main groups: where proteins act within cells (Cellular Component Ontology), what they do (Molecular Function Ontology), and the processes they participate in (Biological Process Ontology). Existing methods are hindered by a mixed ontology problem: they do not account for the biological distinctions between the different ontologies and their sub-ontologies. This problem is also the underlying cause of the label dependencies and data sparsity that arise when building multi-label protein function predictors.
  • Results: To tackle this challenge, we propose ProFun-SOM, a novel multi-label protein function classifier that leverages multiple sequence alignments (MSAs) to discern protein functions across the three ontologies. ProFun-SOM applies an MSA reconstruction step to refine the raw MSAs and feeds the refined alignments into a deep architecture. It then predicts functions within the cellular component (CC), molecular function (MF), biological process (BP), and mixed (Mix) ontologies. Evaluation on three datasets (CAFA3, SwissProt, and NetGO2) shows that ProFun-SOM surpasses state-of-the-art methods. The results indicate that using multiple sequence alignments of proteins can mitigate the label dependency and data sparsity issues in protein function prediction, thereby alleviating their biological root cause, the mixed ontology problem.
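
To make the distinction between ontology-specific and mixed prediction targets concrete, the following minimal Python sketch builds multi-label target vectors for the CC, MF, BP, and Mix settings from one protein's GO annotations. It is an illustration only and is not taken from the ProFun-SOM code: the five-term vocabulary is a toy subset of the Gene Ontology, and the names TERM_NAMESPACE, build_vocabularies, and encode are hypothetical.

```python
from collections import defaultdict

# Toy vocabulary (assumption for illustration): GO term -> ontology namespace.
TERM_NAMESPACE = {
    "GO:0005634": "CC",  # nucleus
    "GO:0005737": "CC",  # cytoplasm
    "GO:0003677": "MF",  # DNA binding
    "GO:0005524": "MF",  # ATP binding
    "GO:0006355": "BP",  # regulation of DNA-templated transcription
}


def build_vocabularies(term_namespace):
    """Return one ordered term list per ontology, plus a combined 'Mix' list."""
    vocab = defaultdict(list)
    for term in sorted(term_namespace):
        vocab[term_namespace[term]].append(term)  # ontology-specific label space
        vocab["Mix"].append(term)                 # mixed label space: all terms
    return dict(vocab)


def encode(annotations, vocab):
    """Encode a protein's GO annotations as one multi-hot vector per label space."""
    annotated = set(annotations)
    return {
        name: [1 if term in annotated else 0 for term in terms]
        for name, terms in vocab.items()
    }


VOCAB = build_vocabularies(TERM_NAMESPACE)

# A hypothetical protein annotated as nuclear and DNA-binding.
labels = encode(["GO:0005634", "GO:0003677"], VOCAB)
for name in ("CC", "MF", "BP", "Mix"):
    print(name, labels[name])
```

In the mixed setting every term from all three ontologies shares one label vector, so the vector is longer and sparser than any ontology-specific one. This is one simple way to picture the data sparsity issue described above.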
