About IDP-Fusion

Motivation: Intrinsically disordered regions (IDRs) are widely distributed in proteins and are related to many important biological functions. Accurately identifying IDRs is of great significance for protein structure and function analysis. Because long disordered regions (LDRs) and short disordered regions (SDRs) have different characteristics, existing predictors fail to achieve stable performance on datasets with different ratios of LDRs and SDRs. The main reason is that existing predictors rely on network structures designed by hand from their authors' experience. As a result, it is hard for them to capture the feature representations hidden in protein sequences.

Results: In this study, the Neural Architecture Search (NAS) algorithm was employed to automatically construct the network structures so as to capture the hidden features in protein sequences. To predict both LDRs and SDRs stably, the model constructed by NAS was combined with length-dependent models, which capture the unique features of SDRs or LDRs, and general models, which capture the features common to LDRs and SDRs. The resulting predictor is called IDP-Fusion. Experimental results showed that IDP-Fusion achieves more stable performance than other existing predictors on independent test sets with different ratios of SDRs and LDRs.
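
Illustration: the idea of combining several per-residue predictors can be sketched as follows. This is a minimal sketch under stated assumptions, not the IDP-Fusion implementation. The text above does not specify how the component models are combined, so simple averaging is assumed here. The function names, the 0.5 score threshold, and the 30-residue cutoff separating SDRs from LDRs are also illustrative assumptions.

```python
from typing import Dict, List, Tuple


def fuse_scores(model_scores: Dict[str, List[float]]) -> List[float]:
    """Average per-residue disorder scores across models.

    Averaging is an ASSUMED fusion rule chosen for illustration only.
    """
    if not model_scores:
        raise ValueError("at least one model is required")
    lengths = {len(scores) for scores in model_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same number of residues")
    n_models = len(model_scores)
    # zip(*...) walks the sequence residue by residue across all models.
    return [sum(column) / n_models for column in zip(*model_scores.values())]


def disordered_regions(
    scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs scoring >= threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a disordered run begins here
        elif score < threshold and start is not None:
            regions.append((start, i - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))  # run extends to the last residue
    return regions


def label_region(region: Tuple[int, int], long_cutoff: int = 30) -> str:
    """Label a region as long (LDR) or short (SDR); the cutoff is an assumption."""
    start, end = region
    return "LDR" if end - start + 1 >= long_cutoff else "SDR"


# Hypothetical scores for a five-residue toy sequence.
toy_scores = {
    "nas": [0.9, 0.8, 0.2, 0.1, 0.7],
    "length_dependent": [0.7, 0.6, 0.4, 0.3, 0.9],
    "general": [0.8, 0.7, 0.3, 0.2, 0.8],
}
fused = fuse_scores(toy_scores)
regions = disordered_regions(fused)
labels = [label_region(r) for r in regions]
```

In this toy example the fused scores mark residues 0 to 1 and residue 4 as disordered, and both regions are labeled as short. A real predictor would derive the component scores from trained models rather than fixed lists.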