DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation
Supplementary Information S2 -- The extended dataset contains 525 DNA-binding proteins and 2059 non-DNA-binding proteins. It is used for ensemble learning to improve prediction accuracy.
Supplementary Information S3 -- The benchmark dataset contains 1075 protein sequences, divided into subset S+ with 525 DNA-binding proteins (positive samples) and subset S- with 550 non-DNA-binding proteins (negative samples).
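
As a quick check on the two dataset descriptions above, the following Python sketch computes the total size and class balance of each dataset. The counts are taken from the descriptions; the dictionary layout and the function name `class_balance` are illustrative assumptions and are not part of the supplementary files.

```python
# Sequence counts as stated in the Supplementary Information descriptions.
# The keys and structure here are hypothetical, chosen only for illustration.
DATASETS = {
    "benchmark (S3)": {"positive": 525, "negative": 550},
    "extended (S2)": {"positive": 525, "negative": 2059},
}


def class_balance(counts):
    """Return (total, positive fraction, negative-to-positive ratio)."""
    total = counts["positive"] + counts["negative"]
    pos_fraction = counts["positive"] / total
    neg_to_pos = counts["negative"] / counts["positive"]
    return total, pos_fraction, neg_to_pos


for name, counts in DATASETS.items():
    total, pos_fraction, neg_to_pos = class_balance(counts)
    print(f"{name}: total={total}, "
          f"positive fraction={pos_fraction:.3f}, neg:pos={neg_to_pos:.2f}")
```

The benchmark total of 1075 matches the figure stated for S3, and the extended dataset comes to 2584 sequences. The benchmark set is close to balanced, while the extended set has roughly four negative samples per positive sample.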