BioSeq-Analysis is a platform for DNA, RNA, and protein sequence analysis based on machine learning approaches

Descriptions of BioSeq-Analysis

Because of the importance of biological sequence analysis, several web servers and stand-alone tools have recently been developed to advance the field. For example, repDNA [1], PseKNC [2], PseKNC-General [3], and repRNA [4] were built to generate features of DNA/RNA sequences, while PseAAC [5] and propy [6] were designed to extract features of protein sequences. More recently, Pse-in-One [7, 8] was established to generate and analyze features of DNA, RNA, and protein sequences using properties that users define according to their own requirements. Pse-Analysis [9] went a step further toward an intelligent system by combining the feature extraction algorithms with Support Vector Machines.

All of these tools and web servers have been widely used in biological sequence analysis and have helped stimulate the development of this important field. However, further work is needed for the following reasons: (1) Although some tools have been proposed for building biological sequence analysis predictors, each focuses on only a single step; an automated platform is required to promote the development of computational predictors in this field. (2) It is never easy to find the optimal features, classifiers, and combinations of the two when building a predictor for biological sequence analysis. (3) All of the available tools lack some of the features or machine learning classifiers proposed very recently.

The present study was initiated in an attempt to overcome these three shortcomings by establishing BioSeq-Analysis, a powerful platform for biological sequence analysis based on machine learning techniques. It combines three main processes: feature extraction, predictor construction, and performance evaluation. All the parameters in the three steps can be optimized automatically. It is anticipated that BioSeq-Analysis will be a powerful tool for biological sequence analysis.
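The three processes named above can be illustrated with a minimal, standard-library-only sketch. It is not BioSeq-Analysis code and does not use its API: every function name here is hypothetical, the features are plain k-mer frequencies, and a simple nearest-centroid classifier stands in for the Support Vector Machines and other classifiers a real platform would offer. The only parameter it optimizes is the k-mer length, chosen by cross-validated accuracy on toy data.

```python
from itertools import product


def kmer_features(seq, k):
    """Feature extraction: normalized k-mer frequencies (4**k values)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def train_centroids(vectors, labels):
    """Predictor construction: one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, vector):
    """Return the class whose centroid has the smallest squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vector, c))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def cross_val_accuracy(seqs, labels, k, folds):
    """Performance evaluation: accuracy under interleaved k-fold CV."""
    vectors = [kmer_features(s, k) for s in seqs]
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(seqs), folds))
        train_x = [v for i, v in enumerate(vectors) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_centroids(train_x, train_y)
        correct += sum(predict(model, vectors[i]) == labels[i] for i in test_idx)
    return correct / len(seqs)


def select_best_k(seqs, labels, candidates=(1, 2, 3), folds=4):
    """Parameter optimization: keep the k with the highest CV accuracy."""
    scores = {k: cross_val_accuracy(seqs, labels, k, folds) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


# Toy data (made up for illustration): AT-rich versus GC-rich sequences.
toy_seqs = ["ATATATTAAT", "TTAATATAAT", "AATTTATATA", "TATAATTATA",
            "GCGCGGCCGC", "CCGGCGCGCG", "GGCCGCGGCC", "CGCGCCGGCG"]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

best_k, cv_scores = select_best_k(toy_seqs, toy_labels)
print("selected k:", best_k, "CV accuracy per k:", cv_scores)
```

The same loop structure extends to a grid over several feature types and classifiers; the search cost grows with the product of the candidate lists, which is why automating it is attractive.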


1. Liu B, Liu F, Fang L, Wang X, Chou K-C: repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics 2014, 31(8):1307-1309. (PMID: 25504848, cited by 135)

2. Chen W, Lei T-Y, Jin D-C, Lin H, Chou K-C: PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition. Analytical Biochemistry 2014, 456:53-60. (PMID: 24732113, cited by 129)

3. Chen W, Zhang X, Brooker J, Lin H, Zhang L, Chou K-C: PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics 2014, 31(1):119-120. (PMID: 25231908, cited by 85)

4. Liu B, Liu F, Fang L, Wang X, Chou K-C: repRNA: a web server for generating various feature vectors of RNA sequences. Molecular Genetics and Genomics 2016, 291(1):473-481. (PMID: 26085220, cited by 77)

5. Shen H-B, Chou K-C: PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. Analytical Biochemistry 2008, 373(2):386-388. (PMID: 17976365, cited by 261)

6. Cao D-S, Xu Q-S, Liang Y-Z: propy: a tool to generate various modes of Chou’s PseAAC. Bioinformatics 2013, 29(7):960-962. (PMID: 23426256, cited by 169)

7. Liu B, Liu F, Wang X, Chen J, Fang L, Chou K-C: Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research 2015, 43(W1):W65-W71. (PMID: 25958395, cited by 205)

8. Liu B, Wu H, Chou K-C: Pse-in-One 2.0: an improved package of web servers for generating various modes of pseudo components of DNA, RNA, and protein sequences. Natural Science 2017, 9(04):67. (cited by 9)

9. Liu B, Wu H, Zhang D, Wang X, Chou K-C: Pse-Analysis: a python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods. Oncotarget 2017, 8(8):13338. (PMID: 28076851, cited by 20)

Harbin Institute of Technology, Shenzhen.

Website ICP registration number: 粤ICP备19041859号-1