To uncover the meaning of the “book of life”, this study introduces and discusses 155 different biological
language models (BLMs) for DNA, RNA and protein sequence analysis, which are able to extract the
linguistic properties of the “book of life”. We also extend the BLMs into a system called BioSeq-BLM for
automatically representing and analyzing sequence data. Experimental results show that the predictors
generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing
state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches
for biological sequence analysis based on natural language processing and contribute to the development
of this important field.
The similarity between Bioinformatics and Natural Language Processing: