a ranking framework for miRNA-disease association identification


MicroRNAs (micro ribonucleic acids, miRNAs) are endogenous non-coding RNAs (ncRNAs) of about 23 nucleotides that are present in cells and play a vital role in regulating a variety of biological activities. Accumulating evidence indicates that miRNAs are closely associated with the emergence and progression of complex diseases. Identifying miRNA-disease associations is therefore an important task for revealing the pathogenic mechanisms of complex diseases.

The task of identifying miRNA-related diseases is similar to an information retrieval task (Fig. 1). In document retrieval, given a new query, a ranking model ranks the associated documents and returns the top-k documents. The same procedure can be used to predict the diseases associated with a query miRNA through a trained ranking model, where miRNAs and diseases are analogous to queries and documents, respectively.

Fig. 1. The similarities between the information retrieval task and the miRNA-disease association identification task.
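The retrieval analogy can be illustrated with a minimal sketch: one query miRNA, a set of candidate diseases with association scores, and a top-k cut-off. The function name `top_k_diseases`, the disease names, and the score values are all hypothetical and chosen only for illustration; in practice the scores would come from a trained ranking model.

```python
from typing import Dict, List, Tuple


def top_k_diseases(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring candidate diseases for one query miRNA."""
    # Sort by descending score; ties are broken by name so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical association scores for a single query miRNA (illustrative values only).
example_scores = {
    "disease_A": 0.91,
    "disease_B": 0.15,
    "disease_C": 0.67,
    "disease_D": 0.42,
}

top2 = top_k_diseases(example_scores, k=2)
print(top2)
```

Here the miRNA plays the role of the query and each disease plays the role of a document, so returning the top-k diseases mirrors returning the top-k documents.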

Flowchart of idenMD-NRF

Motivated by the successful application of Learning to Rank (LTR) to information retrieval, we propose a predictor called idenMD-NRF (Fig. 2) that identifies new miRNA-disease associations with a ranking framework. Compared with existing methods, it makes the following contributions: (i) idenMD-NRF works in two application scenarios: detecting missing associations between known miRNAs and diseases, and predicting diseases associated with newly detected miRNAs. (ii) idenMD-NRF is an inclusive ensemble ranking framework, so complementary predictors can be integrated to further improve performance. (iii) Most existing methods treat unknown associations as negative samples, which introduces bias because some potential positive associations may be in the unknown sample set. idenMD-NRF focuses on the top-ranked samples, which greatly weakens the negative impact of unknown samples. (iv) idenMD-NRF employs node2vec, a network embedding technique with strong graph representation ability, to capture high-level association features.
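To give a feel for how node2vec explores a network, the following is a minimal sketch of its biased second-order random walk, assuming an undirected, unweighted toy graph. The graph, its node names, and the parameter values are hypothetical; the actual heterogeneous networks used by idenMD-NRF are not reproduced here. In node2vec the generated walks are then fed to a skip-gram model to learn node embeddings, a step this sketch omits.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def biased_walk(graph: Graph, start: str, length: int,
                p: float, q: float, rng: random.Random) -> List[str]:
    """Generate one node2vec-style walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy network mixing miRNA and disease nodes.
toy_graph: Graph = {
    "miR_1": {"miR_2", "dis_A"},
    "miR_2": {"miR_1", "dis_A", "dis_B"},
    "dis_A": {"miR_1", "miR_2", "dis_B"},
    "dis_B": {"miR_2", "dis_A"},
}

walk = biased_walk(toy_graph, "miR_1", length=6, p=1.0, q=0.5,
                   rng=random.Random(0))
print(walk)
```

A small `q` pushes the walk outward to more distant nodes, while a small `p` makes it return to the previous node more often, so the two parameters trade off local against global exploration.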

idenMD-NRF web server
Fig. 2. The framework of idenMD-NRF. The three main steps are as follows: (i) Extract association features. Node2vec is employed to extract global topological features from the constructed heterogeneous networks. (ii) Calculate association scores. Association scores are obtained from different component methods. (iii) Rank the disease list. A predictor based on LambdaMART is trained to rank candidate diseases for query miRNAs.
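Steps (ii) and (iii) can be pictured as turning component scores into the grouped input that a listwise ranker expects: one feature row per miRNA-disease pair, one relevance label per row, and one group size per query miRNA. The sketch below shows only this data layout under stated assumptions. The function `build_ranking_inputs`, the method names, and all scores are hypothetical, and unknown pairs are labeled 0 purely as a simplification. It does not train LambdaMART.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (miRNA, disease)


def build_ranking_inputs(
    component_scores: Dict[str, Dict[Pair, float]],
    known: Set[Pair],
    mirnas: List[str],
    diseases: List[str],
) -> Tuple[List[List[float]], List[int], List[int]]:
    """Arrange component scores as features grouped by query miRNA."""
    methods = sorted(component_scores)  # fixed feature order
    features: List[List[float]] = []
    labels: List[int] = []
    group_sizes: List[int] = []
    for m in mirnas:
        for d in diseases:
            # One feature per component method; missing scores default to 0.0.
            features.append(
                [component_scores[name].get((m, d), 0.0) for name in methods]
            )
            # Known associations are relevant; unknown pairs get 0 (an assumption).
            labels.append(1 if (m, d) in known else 0)
        group_sizes.append(len(diseases))  # one group per query miRNA
    return features, labels, group_sizes


# Hypothetical scores from two illustrative component methods.
component_scores = {
    "method_a": {("miR_1", "dis_A"): 0.9, ("miR_1", "dis_B"): 0.2,
                 ("miR_2", "dis_A"): 0.4},
    "method_b": {("miR_1", "dis_A"): 0.7, ("miR_2", "dis_B"): 0.8},
}
known = {("miR_1", "dis_A"), ("miR_2", "dis_B")}

features, labels, group_sizes = build_ranking_inputs(
    component_scores, known, ["miR_1", "miR_2"], ["dis_A", "dis_B"]
)
print(features, labels, group_sizes)
```

Grouping rows by query matters because a ranking model compares candidate diseases only within the same miRNA, never across different miRNAs.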