In this study, we propose ncRNALocate-EL, a new multi-label subcellular localization prediction method for human ncRNA based on an ensemble learning strategy. The method consists of four levels. (1) Data level: three datasets are used to predict the subcellular localization of three types of human ncRNA, namely human snoRNA, human miRNA, and human lncRNA. (2) Feature level: four kinds of features are extracted to enrich the feature representation from different aspects, namely Syntax Rules features, Term Frequency–Inverse Document Frequency (TF-IDF) features, TextRank features, and sequence-based features. The optimal feature combination is selected through feature extraction, analysis, and fusion. (3) Algorithm level: an ensemble predictor is built from three kinds of machine learning base learners. (4) Prediction level: the subcellular locations of each ncRNA are identified and evaluated, since different categories of ncRNA have different subcellular locations. ncRNALocate-EL is provided to facilitate broader research.
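As a rough illustration of the feature and algorithm levels, the sketch below computes TF-IDF weights over sequence "words" and combines multi-label outputs from several base learners. It is not the ncRNALocate-EL implementation: the overlapping k-mer tokenization, the plain `log(N / df)` IDF, the majority-vote rule, and the function names `kmers`, `tfidf`, and `majority_vote` are all assumptions made for this example.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers, treated as 'words'.

    Assumption: k-mer tokenization; DNA-style 'T' is mapped to 'U'.
    """
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(sequences, k=3):
    """Return one {k-mer: TF-IDF weight} dict per input sequence.

    Assumption: TF is the relative k-mer frequency within a sequence and
    IDF is log(N / df), without smoothing.
    """
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: number of sequences containing each k-mer.
    df = Counter(term for doc in docs for term in doc)
    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in doc.items()
        })
    return weights


def majority_vote(predictions):
    """Combine multi-label 0/1 predictions from several base learners.

    `predictions` holds one equal-length 0/1 list per learner; a label is
    assigned when more than half of the learners predict it.
    Assumption: simple majority voting stands in for the ensemble rule.
    """
    n_learners = len(predictions)
    return [1 if 2 * sum(votes) > n_learners else 0
            for votes in zip(*predictions)]
```

For example, `tfidf(["AUGAUG", "AUGCCC"])` gives the shared k-mer `AUG` a weight of zero in both sequences, because a k-mer present in every sequence carries no discriminative information under this IDF. In practice the per-label votes would come from trained classifiers rather than hand-written lists.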

Figure 1. The workflow of ncRNALocate-EL.


If you use ncRNALocate-EL, please cite the following publication:

· Tao Bai, Bin Liu. ncRNALocate-EL: a novel multi-label subcellular locality prediction model of ncRNA based on ensemble learning. Briefings in Functional Genomics, 2023. DOI: 10.1093/bfgp/elad007.