Microenvironment Awareness Pragmatic Analysis for Generative Enzyme Function Reasoning
Accurate annotation of enzyme functions is crucial for interpreting rapidly expanding sequence data, particularly as enzyme engineering and high-throughput screening continually create and discover enzymes with novel catalytic functions beyond known enzyme families. Despite recent advances, existing methods still face two limitations: 1) discriminative models are confined to classifying known function labels and, in the cold-start condition, cannot discover novel catalytic functions outside the training set distribution; 2) existing methods typically treat enzymes as isolated molecular entities, ignoring microenvironmental factors such as optimal pH and temperature. To address these issues, this work proposes the Microenvironment Awareness Pragmatic Analysis for Generative Enzyme Function Reasoning (MPA-EC) framework. The framework pioneers the integration of pragmatic analysis into enzyme function annotation, establishing a novel paradigm based on generative reasoning with large language models. MPA-EC can propose reference-worthy EC labels for enzymes with unknown or novel functions, thereby overcoming the "cold-start" limitation of discriminative models. Furthermore, it models biochemical factors such as optimal pH and temperature as essential "pragmatic context" and, through fine-tuning on our constructed Enzyme Microenvironment Database, learns the explicit causal logic of "sequence - microenvironment - function." This allows the fine-tuned model to perform context-aware reasoning, addressing the prior neglect of microenvironmental dependence. The approach provides a new perspective for understanding enzyme mechanisms and marks a transition from static, label-based classification to dynamic, context-aware generative reasoning.
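To make the idea of "pragmatic context" concrete, the following is a minimal sketch of how a sequence and its microenvironment (optimal pH and temperature) might be combined into a single text prompt for a language model. It is not the MPA-EC implementation: the `EnzymeQuery` class, the `build_prompt` function, the field names, and the prompt wording are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnzymeQuery:
    """Hypothetical container for one annotation request (not from the paper)."""
    sequence: str                           # amino-acid sequence, one-letter codes
    optimal_ph: Optional[float] = None      # microenvironment: optimal pH, if known
    optimal_temp_c: Optional[float] = None  # microenvironment: optimal temperature in Celsius


def build_prompt(query: EnzymeQuery) -> str:
    """Assemble a sequence-plus-microenvironment prompt.

    Microenvironment fields are included only when they are available, so the
    same function also covers the sequence-only case.
    """
    lines = [f"Sequence: {query.sequence}"]
    if query.optimal_ph is not None:
        lines.append(f"Optimal pH: {query.optimal_ph:.1f}")
    if query.optimal_temp_c is not None:
        lines.append(f"Optimal temperature: {query.optimal_temp_c:.1f} C")
    # Assumed instruction wording; the actual prompt format is not given in the text.
    lines.append("Task: reason about the likely catalytic function and propose an EC number.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy sequence fragment, for illustration only.
    print(build_prompt(EnzymeQuery("MKTAYIAKQR", optimal_ph=7.5, optimal_temp_c=37.0)))
```

Keeping the microenvironment fields optional means a sequence with no recorded pH or temperature still yields a valid prompt, which is one simple way to handle incomplete records.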
Experiments demonstrate that reasoning over microenvironment information through pragmatic analysis effectively enhances the accuracy of known enzyme annotations (with an F1 score 17.6% higher than the state-of-the-art model), and that the generative reasoning paradigm generates reference-worthy function labels for novel or unseen enzymes.
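For readers unfamiliar with how an F1 score applies to EC annotation, where an enzyme may carry more than one EC number, the sketch below computes a micro-averaged F1 over sets of predicted and true labels. The text above does not state which averaging scheme the reported figure uses, so micro-averaging is an assumption here, and the function name `micro_f1` and the toy labels are illustrative only.

```python
from typing import Iterable, Set


def micro_f1(true_sets: Iterable[Set[str]], pred_sets: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-enzyme sets of EC numbers.

    Counts true positives, false positives, and false negatives across all
    enzymes, then computes precision, recall, and their harmonic mean.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        tp += len(true_labels & pred_labels)  # labels predicted and correct
        fp += len(pred_labels - true_labels)  # labels predicted but wrong
        fn += len(true_labels - pred_labels)  # labels missed
    if tp == 0:
        return 0.0  # no correct predictions; avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: the second enzyme has two true EC numbers, one of which is missed.
    true_labels = [{"1.1.1.1"}, {"3.2.1.4", "3.2.1.91"}]
    pred_labels = [{"1.1.1.1"}, {"3.2.1.4"}]
    print(micro_f1(true_labels, pred_labels))
```

In the toy data there are two true positives, no false positives, and one false negative, so precision is 1, recall is 2/3, and the F1 score is 0.8.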
If you use MPA-EC for research, please cite this paper:
Author Names. "Microenvironment Awareness Pragmatic Analysis for Generative Enzyme Function Reasoning"