Motivation: Protein remote homology detection is essential for structure prediction and function prediction, and even for understanding disease mechanisms. Remote homology relationships depend on multiple protein properties, such as structural information. A language model that considers multiple protein properties is therefore needed to detect remote homology accurately.
Results: We propose a novel deep neural network-based language model, the cubic biology language model (CBLM), which combines three styles of motifs. CBLM integrates different protein properties to identify remote homology relationships, much like reconstructing an original scene from multiple photos. Evaluation on the test set and an independent test set shows that CBLM outperforms other state-of-the-art methods. Furthermore, the sequence representations generated by CBLM separate proteins into different structural classes in a high-dimensional space.
School of Computer Science and Technology, Beijing Institute of Technology, China.
Copyright © Liu Lab, Beijing Institute of Technology.