Motivation: Intrinsically disordered regions (IDRs) are widely distributed in proteins, and accurate prediction of IDRs is critical for analyzing protein structure and function. According to their lengths, IDRs are divided into long disordered regions (LDRs) and short disordered regions (SDRs). Previous studies have shown that LDRs and SDRs have different properties. However, existing computational methods do not extract separate features for LDRs and SDRs. As a result, their performance is unstable across datasets with different ratios of SDRs to LDRs.
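To make the length-based split concrete, the sketch below groups a per-residue binary disorder annotation ("1" = disordered, "0" = ordered) into contiguous disordered regions and labels each one as an LDR or an SDR. The cutoff of 30 consecutive residues is an assumption: it is a commonly used convention in the disorder literature, but the abstract does not state which threshold DeepIDP-2L uses. The function names are hypothetical and for illustration only. The residue counts returned at the end show how the SDR-to-LDR ratio of a dataset could be measured.

```python
from __future__ import annotations

from itertools import groupby

# Assumed cutoff (not stated in the abstract): disordered regions with at least
# this many consecutive residues are treated as LDRs, shorter ones as SDRs.
LDR_MIN_LENGTH = 30


def disordered_regions(labels: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive '1' labels."""
    regions = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == "1":
            regions.append((pos, pos + length))
        pos += length
    return regions


def split_ldr_sdr(labels: str, min_ldr: int = LDR_MIN_LENGTH):
    """Hypothetical helper: partition disordered regions into (LDRs, SDRs) by length."""
    ldrs, sdrs = [], []
    for start, end in disordered_regions(labels):
        (ldrs if end - start >= min_ldr else sdrs).append((start, end))
    return ldrs, sdrs


def sdr_ldr_residue_counts(labels: str, min_ldr: int = LDR_MIN_LENGTH):
    """Hypothetical helper: number of residues in SDRs and in LDRs, as (sdr, ldr)."""
    ldrs, sdrs = split_ldr_sdr(labels, min_ldr)
    return sum(e - s for s, e in sdrs), sum(e - s for s, e in ldrs)


if __name__ == "__main__":
    # One 4-residue disordered segment (SDR) and one 35-residue segment (LDR).
    labels = "0" * 5 + "1" * 4 + "0" * 10 + "1" * 35 + "0" * 3
    print(split_ldr_sdr(labels))           # ([(19, 54)], [(5, 9)])
    print(sdr_ldr_residue_counts(labels))  # (4, 35)
```

Summing these counts over every protein in a test set gives its overall SDR-to-LDR composition, which is the dataset property the abstract identifies as a source of unstable predictor performance.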
Results: In this study, a two-layer predictor called DeepIDP-2L is proposed. In the first layer, two kinds of attention-based models extract different features for LDRs and SDRs, respectively: a hierarchical attention network (HAN) captures the distribution pattern features of LDRs, and a convolutional attention network (CAN) captures the local correlation features of SDRs. The second layer of DeepIDP-2L maps the features extracted in the first layer into a new feature space. A convolutional neural network (CNN) and a bidirectional long short-term memory network (Bi-LSTM) are employed to capture local and long-range information, respectively, for predicting both SDRs and LDRs. Experimental results show that DeepIDP-2L achieves more stable performance than other existing predictors on independent test sets with different ratios of SDRs to LDRs.