Biological data type
Homogeneous biological sequence similarities
For homogeneous biological sequence similarities, the queries and the retrieved samples are of the same type.
Heterogeneous biological sequence similarities
For heterogeneous biological sequence similarities, the queries and the retrieved samples are of different types.
Biological sequence similarities calculation methods
Distribution methods
Representation methods
Interaction methods
The input data should be in the BLS format, which is described in detail below:
Required format of input biological sequences.
Please enter the biological sequences in FASTA format.
Example:
>5www_A
GHHHHHHMQAALLRRKSVNTTECVPVPSSEHVAEIVGRQLGMVLWIYKWFKPDGRLTDEQIADGMVGMLFPPFYIKTPVRGEEPIFVVTGRKEDVAMAKREILSAAEHFSMIRAS
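To check a sequence file before submitting it, a small parser can be handy. The following is a minimal sketch, not part of BioSimi-BLS itself: the function name parse_fasta is hypothetical, and it assumes plain FASTA in which each record starts with a ">" header line and the sequence may span several lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequence lines are concatenated
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Toy input: the header from the example above with a shortened sequence.
    demo = ">5www_A\nGHHHHHHMQ\nAALLRRKSV\n"
    for seq_name, seq in parse_fasta(demo):
        print(seq_name, len(seq))
```

Raising an error on sequence data that precedes any header is a design choice of this sketch; it makes a missing ">" line visible instead of silently dropping data.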
Required format of the input vectors.
Please enter the feature vectors in the following format (similar to FASTA format).
>vec_name1
vec_val1 vec_val2 vec_val3 ... vec_valn
Example:
>Data_A_ID:0
0.045 0.027 0.035 0.030 0.039 0.023 0.006 0.032 0.030 0.021 0.045 0.024 0.023 0.029 0.035 0.035 0.199 0.168 0.158
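The vector format above can be read with a few lines of code. This is a minimal sketch under stated assumptions: the function name parse_vectors is hypothetical, values are assumed to be whitespace-separated numbers, and a vector is allowed to continue over several lines until the next ">" header.

```python
def parse_vectors(text):
    """Parse FASTA-like vector text into a dict mapping name -> list of floats."""
    vectors = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if name in vectors:
                raise ValueError(f"duplicate vector name: {name}")
            vectors[name] = []
        elif name is None:
            raise ValueError("vector values found before the first '>' header")
        else:
            # float() raises ValueError on any non-numeric token.
            vectors[name].extend(float(token) for token in line.split())
    return vectors


if __name__ == "__main__":
    demo = ">Data_A_ID:0\n0.045 0.027 0.035\n>Data_A_ID:1\n0.030 0.039 0.023\n"
    parsed = parse_vectors(demo)
    # All vectors in one input would normally share the same length.
    lengths = {len(values) for values in parsed.values()}
    print(sorted(parsed), lengths)
```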
Required format of the input labels.
Please enter the associations in list format, with one pair of indices per line.
Example:
0 0
1 1
2 0
2 1
3 3
3 31
4 3
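The association list above can be loaded and grouped as follows. This is only a sketch: the function names are hypothetical, and it assumes that each line holds exactly two whitespace-separated integer indices. What the two columns refer to is not stated here, so the code treats them simply as a first and a second index.

```python
from collections import defaultdict


def parse_associations(text):
    """Parse lines of 'i j' into a list of (i, j) integer pairs."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def group_by_first(pairs):
    """Collect, for each first index, the list of associated second indices."""
    grouped = defaultdict(list)
    for first, second in pairs:
        grouped[first].append(second)
    return dict(grouped)


if __name__ == "__main__":
    demo = "0 0\n1 1\n2 0\n2 1\n3 3\n3 31\n4 3\n"
    print(group_by_first(parse_associations(demo)))
```

Grouping by the first index makes it easy to see that one index can take part in several associations, as indices 2 and 3 do in the example.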
Interaction with BioSeq-BLM
Format conversion
To generate feature vectors for input biological sequences, we recommend the pipeline software BioSeq-BLM.
Based on biological language models, BioSeq-BLM extracts features that represent the linguistic and biological attributes of biological sequences.
After a simple format conversion, the feature vectors generated by BioSeq-BLM can be used as the input of BioSimi-BLS.
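The conversion could look like the sketch below. It rests on assumptions that are not confirmed here: the feature vectors are assumed to be stored as comma-separated values with one vector per row and no header or ID column, and the function name csv_to_bls_vectors and the generated names (a prefix followed by ":" and the row index) are hypothetical. Adjust the parsing if your feature file is laid out differently.

```python
import csv
import io


def csv_to_bls_vectors(csv_text, prefix="vec"):
    """Convert comma-separated rows into the FASTA-like vector format.

    Assumes one feature vector per row, with no header or ID column.
    """
    lines = []
    rows = (row for row in csv.reader(io.StringIO(csv_text)) if row)
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row]
        for cell in cells:
            float(cell)  # validation only: raises ValueError on non-numeric cells
        lines.append(f">{prefix}:{index}")
        # Keep the original text of each value to avoid changing its precision.
        lines.append(" ".join(cells))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    demo = "0.045,0.027,0.035\n0.030,0.039,0.023\n"
    print(csv_to_bls_vectors(demo, prefix="Data_A_ID"), end="")
```

The values are written back as the original strings rather than reformatted floats, so the converted file carries exactly the numbers that were in the source file.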