1. Datasets
The datasets used in this study come from the Genomics of Drug Sensitivity in Cancer (GDSC)[1], the Cancer Cell Line Encyclopedia (CCLE)[2], and The Cancer Genome Atlas (TCGA)[3]. Together they are used to construct three incremental learning scenarios: drug-incremental, cancer-incremental, and institute-incremental learning. The datasets can be downloaded from the following portals:
GDSC portal
CCLE portal
TCGA portal
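One plausible way to turn the downloaded data into the three scenarios is to group samples by a single attribute and present the groups to the model one task at a time. The sketch below assumes each record is a dictionary with hypothetical fields such as `drug`, `cancer_type`, or `institute`, and that groups are ordered by first appearance. It illustrates the general grouping idea only and is not the study's own task-construction procedure.

```python
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple

# A record is a flat mapping, e.g. {"cell_line": ..., "drug": ..., "cancer_type": ...}.
# The field names are hypothetical placeholders, not the datasets' real column names.
Record = Mapping[str, object]


def build_incremental_tasks(
    records: Sequence[Record],
    key: str,
    groups_per_task: int = 1,
) -> List[Tuple[List[Hashable], List[Record]]]:
    """Split records into sequential tasks by the value of `key`.

    Group values keep their first-seen order, and consecutive groups are
    bundled `groups_per_task` at a time. Each task is returned as
    (group values in this task, records belonging to those groups).
    """
    if groups_per_task < 1:
        raise ValueError("groups_per_task must be >= 1")

    # Plain dicts preserve insertion order (Python 3.7+), so groups stay in first-seen order.
    groups: Dict[Hashable, List[Record]] = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)

    keys = list(groups)
    tasks: List[Tuple[List[Hashable], List[Record]]] = []
    for start in range(0, len(keys), groups_per_task):
        task_keys = keys[start:start + groups_per_task]
        task_records = [r for k in task_keys for r in groups[k]]
        tasks.append((task_keys, task_records))
    return tasks


if __name__ == "__main__":
    # Toy records for illustration only (hypothetical fields, no real measurements).
    toy = [
        {"cell_line": "A", "drug": "cisplatin", "cancer_type": "LUAD"},
        {"cell_line": "B", "drug": "cisplatin", "cancer_type": "BRCA"},
        {"cell_line": "A", "drug": "paclitaxel", "cancer_type": "LUAD"},
        {"cell_line": "C", "drug": "erlotinib", "cancer_type": "LUAD"},
    ]
    # Drug-incremental: one drug per task.
    for i, (names, recs) in enumerate(build_incremental_tasks(toy, "drug")):
        print(i, names, len(recs))
    # Cancer-incremental: group by the (hypothetical) cancer_type field instead.
    for i, (names, recs) in enumerate(build_incremental_tasks(toy, "cancer_type")):
        print(i, names, len(recs))
```

Switching scenarios then only means changing `key`; an institute-incremental split would use whatever field records the contributing site.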
2. Tools
Several tools were used for feature extraction and downstream analysis: TCGAbiolinks[4,5], PubChem[6] (accessed through PubChemPy), DESeq2[7], TIMER[8], and HPAanalyze[9]. Installation, configuration, and usage instructions are available at the following links:
TCGAbiolinks: https://www.bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html
PubChem (PubChemPy): https://pubchem.ncbi.nlm.nih.gov/
DESeq2: https://www.bioconductor.org/packages/release/bioc/html/DESeq2.html
TIMER: https://cistrome.shinyapps.io/timer/
HPAanalyze: https://www.bioconductor.org/packages/release/bioc/html/HPAanalyze.html
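As an illustration of retrieving drug features from PubChem, the sketch below looks up a SMILES string for a drug name. It deliberately swaps PubChemPy for PubChem's PUG REST web service called through the Python standard library, so it runs without extra packages. Using SMILES as the drug feature, and the helper names `smiles_url`, `parse_smiles`, and `fetch_smiles`, are assumptions for illustration; the study's actual feature set is not specified here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_url(drug_name: str) -> str:
    """Build a PUG REST URL that requests a SMILES property for a compound name."""
    # Percent-encode the name so spaces and special characters are URL-safe.
    name = urllib.parse.quote(drug_name.strip(), safe="")
    return f"{PUG_REST}/compound/name/{name}/property/CanonicalSMILES/JSON"


def parse_smiles(payload: Dict[str, Any]) -> Optional[str]:
    """Extract the first SMILES string from a PUG REST property-table response."""
    props = payload.get("PropertyTable", {}).get("Properties", [])
    if not props:
        return None
    first = props[0]
    # Accept a few possible field names in case the response labels SMILES differently.
    for field in ("CanonicalSMILES", "ConnectivitySMILES", "SMILES"):
        if field in first:
            return first[field]
    return None


def fetch_smiles(drug_name: str, timeout: float = 10.0) -> Optional[str]:
    """Query PubChem over the network and return a SMILES string (or None)."""
    with urllib.request.urlopen(smiles_url(drug_name), timeout=timeout) as resp:
        return parse_smiles(json.load(resp))
```

Calling `fetch_smiles("aspirin")` requires network access; names that PubChem cannot resolve make `urlopen` raise `urllib.error.HTTPError`, which a batch job would want to catch and log. PubChemPy wraps the same service and is the more convenient choice when installing a dependency is acceptable.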
3. References
[1] Iorio, Francesco, Theo A. Knijnenburg, Daniel J. Vis, Graham R. Bignell, Michael P. Menden, Michael Schubert, Nanne Aben et al. "A landscape of pharmacogenomic interactions in cancer." Cell 166, no. 3 (2016): 740-754.
[2] Ghandi, Mahmoud, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, Christopher C. Lo, E. Robert McDonald III, Jordi Barretina et al. "Next-generation characterization of the cancer cell line encyclopedia." Nature 569, no. 7757 (2019): 503-508.
[3] Liu, Jianfang, Tara Lichtenberg, Katherine A. Hoadley, Laila M. Poisson, Alexander J. Lazar, Andrew D. Cherniack, Albert J. Kovatich et al. "An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics." Cell 173, no. 2 (2018): 400-416.
[4] Colaprico, Antonio, Tiago C. Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S. Sabedot et al. "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data." Nucleic Acids Research 44, no. 8 (2016): e71.
[5] Mounir, Mohamed, Marta Lucchetta, Tiago C. Silva, Catharina Olsen, Gianluca Bontempi, Xi Chen, Houtan Noushmehr, Antonio Colaprico, and Elena Papaleo. "New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx." PLoS Computational Biology 15, no. 3 (2019): e1006701.
[6] Kim, Sunghwan, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li et al. "PubChem 2019 update: improved access to chemical data." Nucleic Acids Research 47, no. D1 (2019): D1102-D1109.
[7] Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology 15 (2014): 1-21.
[8] Li, Taiwen, Jingxin Fu, Zexian Zeng, David Cohen, Jing Li, Qianming Chen, Bo Li, and X. Shirley Liu. "TIMER2.0 for analysis of tumor-infiltrating immune cells." Nucleic Acids Research 48, no. W1 (2020): W509-W514.
[9] Tran, Anh Nhat, Alex M. Dussaq, Timothy Kennell, Christopher D. Willey, and Anita B. Hjelmeland. "HPAanalyze: an R package that facilitates the retrieval and analysis of the Human Protein Atlas data." BMC Bioinformatics 20 (2019): 1-11.