a platform for analyzing DNA, RNA, and protein sequences based on biological language models

The introduction of BioSeq-BLM

To uncover the meaning of the “book of life”, this study introduces and discusses 155 different biological language models (BLMs) for DNA, RNA, and protein sequence analysis, which are able to extract the linguistic properties of the “book of life”. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing and contribute to the development of this very important field.

The similarity between Bioinformatics and Natural Language Processing: