a ranking framework for miRNA-disease association identification



MicroRNAs (miRNAs) are endogenous non-coding RNAs (ncRNAs) of about 23 nucleotides that are present in cells and play a vital role in regulating many biological processes. Accumulating evidence indicates that miRNAs are closely associated with the emergence and progression of complex diseases. Identifying miRNA-disease associations is therefore an important step toward revealing the pathogenic mechanisms of these diseases.

Identifying miRNA-related diseases is similar to an information retrieval task (see Fig. 1). In document retrieval, given a new query, a ranking model ranks the associated documents and returns the top-k documents. The same procedure can be used to predict diseases associated with query miRNAs through a trained ranking model, with miRNAs playing the role of queries and diseases the role of documents.

Fig. 1. The similarities between the information retrieval task and the miRNA-disease association identification task.
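The query/document analogy can be illustrated with a minimal Python sketch. Everything named here (`top_k_diseases`, `toy_score`, the miRNA and disease labels, and the scores) is hypothetical and only stands in for a trained ranking model; it is not the idenMD-NRF implementation.

```python
from typing import Callable, List, Tuple


def top_k_diseases(
    query_mirna: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate disease for one query miRNA and return the k best."""
    scored = [(disease, score_fn(query_mirna, disease)) for disease in candidates]
    # Highest score first, exactly like ranking documents for a query.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical scores standing in for the output of a trained ranking model.
TOY_SCORES = {
    ("miR-A", "disease-1"): 0.2,
    ("miR-A", "disease-2"): 0.9,
    ("miR-A", "disease-3"): 0.5,
    ("miR-A", "disease-4"): 0.7,
}


def toy_score(mirna: str, disease: str) -> float:
    """Look up a toy score; unknown pairs default to 0.0."""
    return TOY_SCORES.get((mirna, disease), 0.0)


ranked = top_k_diseases(
    "miR-A",
    ["disease-1", "disease-2", "disease-3", "disease-4"],
    toy_score,
    k=2,
)
print(ranked)
```

Here the miRNA plays the role of the query and each disease the role of a document, so returning the top-k diseases corresponds to returning the top-k documents.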

Flowchart of idenMD-NRF

Motivated by the successful application of Learning to Rank (LTR) to information retrieval, we propose a predictor called idenMD-NRF (see Fig. 2) that identifies new miRNA-disease associations with a ranking framework. Compared with existing methods, it makes the following contributions: (i) idenMD-NRF works in two application scenarios: detecting missing associations between known miRNAs and diseases, and predicting diseases associated with newly detected miRNAs. (ii) idenMD-NRF is an inclusive ensemble ranking framework, so complementary predictors can be integrated to further improve performance. (iii) Most existing methods treat unknown associations as negative samples, which introduces bias because some potential positive associations may be hidden in the unknown sample set. idenMD-NRF focuses on the top-ranked samples, which greatly weakens the negative impact of unknown samples. (iv) idenMD-NRF employs node2vec, a deep learning technique with strong graph representation ability, to capture high-level association features.
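To make contribution (iii) concrete, the sketch below computes NDCG@k, a ranking metric in which only the top-k positions contribute to the score. The text does not state which metric idenMD-NRF optimizes; NDCG@k is assumed here only because LambdaMART is commonly trained against it. The function names and the toy relevance lists are illustrative.

```python
import math
from typing import List


def dcg_at_k(relevances: List[int], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: List[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels listed in ranked order: 1 = known association,
# 0 = unknown (treated as non-relevant only for scoring purposes).
perfect = [1, 1, 0, 0, 0]
imperfect = [1, 0, 1, 0, 0]

print(ndcg_at_k(perfect, k=3))
print(ndcg_at_k(imperfect, k=3))
```

Because the gain is discounted by position and truncated at k, unknown samples that end up far down the list have little or no influence on the value being optimized.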

idenMD-NRF web server
Fig. 2. The framework of idenMD-NRF. The three main steps are as follows: (i) Extract association features. A miRNA similarity network, a disease similarity network and a miRNA-disease association network are constructed and integrated into a heterogeneous network. In this step, the two application scenarios differ in whether the query miRNAs belong to the set of known miRNAs. In the first scenario, the query miRNAs are already included in the heterogeneous network. In the second scenario, the query miRNAs are not in the set of known miRNAs, so they must be combined with the known miRNAs and diseases to reconstruct the heterogeneous network. node2vec is then employed to extract global topological features from the constructed heterogeneous network. (ii) Calculate association scores. Association scores are obtained from different component methods. (iii) Rank the disease list. A predictor based on LambdaMART is trained and then ranks candidate diseases for the query miRNAs.
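Step (i) can be sketched in plain Python under simplifying assumptions. The sketch builds an unweighted heterogeneous graph from toy similarity and association data, then samples one node2vec-style biased random walk. The similarity threshold, the helper names, and all data are hypothetical, and the skip-gram training that turns walks into feature vectors is omitted. A new query miRNA (`miR-new`) enters the graph only through its similarity to a known miRNA, as in the second application scenario.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[str, str]  # (node type, name), e.g. ("miRNA", "miR-A")


def build_heterogeneous_graph(mirna_sim, disease_sim, associations, threshold=0.5):
    """Merge the three networks into one undirected adjacency dict.

    Assumption: similarity edges are kept only when the score reaches
    `threshold`; the threshold value is illustrative.
    """
    graph: Dict[Node, Set[Node]] = {}

    def add_edge(u: Node, v: Node) -> None:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    for (a, b), score in mirna_sim.items():
        if score >= threshold:
            add_edge(("miRNA", a), ("miRNA", b))
    for (a, b), score in disease_sim.items():
        if score >= threshold:
            add_edge(("disease", a), ("disease", b))
    for mirna, disease in associations:
        add_edge(("miRNA", mirna), ("disease", disease))
    return graph


def node2vec_walk(graph, start: Node, length: int, p: float = 1.0,
                  q: float = 1.0, rng: Optional[random.Random] = None) -> List[Node]:
    """Sample one second-order biased walk (unweighted node2vec variant)."""
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph.get(current, ()))
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)   # return to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy data; "miR-new" is a query miRNA with no known associations.
mirna_sim = {("miR-A", "miR-B"): 0.7, ("miR-new", "miR-A"): 0.8}
disease_sim = {("disease-1", "disease-2"): 0.6}
associations = [("miR-A", "disease-1"), ("miR-B", "disease-2")]

graph = build_heterogeneous_graph(mirna_sim, disease_sim, associations)
walk = node2vec_walk(graph, ("miRNA", "miR-new"), length=6, p=1.0, q=0.5,
                     rng=random.Random(42))
print(walk)
```

In a full pipeline, many such walks per node would be fed to a skip-gram model to obtain embeddings. Those features, together with the component association scores, would then form the input to the LambdaMART ranker described in steps (ii) and (iii).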