Deisy Morselli Gysi, Ítalo do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, J. J. Patten, Robert A. Davey, Joseph Loscalzo, and Albert-László Barabási
Proceedings of the National Academy of Sciences
Introduction: The disruptive nature of the COVID-19 pandemic has unveiled the need for the rapid development, testing, and deployment of new drugs and cures. Given the compressed timescales, the de novo drug development process, which typically lasts a decade or longer, is not feasible. A time-efficient strategy must rely on drug repurposing (or repositioning), helping identify among the compounds approved for clinical use the few that may also have a therapeutic effect in patients with COVID-19. Yet, the lack of reliable repurposing methodologies has resulted in a winner-takes-all pattern, where more than one-third of registered clinical trials focus on hydroxychloroquine or chloroquine, siphoning away resources from testing a wider range of potentially effective drug candidates. While a full unbiased screening of all approved drugs could identify all possible treatments, given the combination of its high cost, extended timeline, and exceptionally low success rate (1), we need efficient strategies that enable effective drug prioritization.
Drug-repurposing algorithms rank drugs based on one or multiple streams of information, such as molecular profiles (2), chemical structures (3), adverse profiles (4), molecular docking (5), electronic health records (6), pathway analysis (7), genome (17), and network proximity (11) (Fig. 1A and B). To test the validity of the predictions, we identified 918 drugs ranked by all predictive pipelines, and experimentally screened them to identify those that inhibit viral infection and replication in cultured non-human primate cells (18); the successful outcomes were further validated in human-derived cells. We also collected clinical trial data to capture the medical community’s collective assessment of drug candidates. We found that the predictive power varies for the different datasets and metrics, indicating that in the absence of a priori ground truth, it is impossible to determine which algorithm to trust. Our key advance, therefore, is a multimodal ensemble forecasting approach that significantly improves the accuracy and the reliability of the predictions by seeking consensus among the predictive methods (15, 19).
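The idea of seeking consensus among pipelines can be illustrated with a simple rank aggregation. The aggregation procedure actually used (15, 19) is not described in this section, so the sketch below swaps in a plain mean-rank (Borda-style) rule; the function name, pipeline names, and drug names are all hypothetical.

```python
from statistics import mean


def consensus_rank(rankings):
    """Aggregate several rankings (best first) by mean rank position.

    `rankings` maps a pipeline name to an ordered list of drug names.
    Only drugs ranked by every pipeline are kept, mirroring the text's
    restriction to drugs ranked by all predictive pipelines.
    NOTE: mean-rank aggregation is an illustrative stand-in, not the
    aggregation method of the paper.
    """
    common = set.intersection(*(set(r) for r in rankings.values()))
    positions = {
        drug: mean(r.index(drug) + 1 for r in rankings.values())
        for drug in common
    }
    # Lower mean position = stronger consensus; ties broken alphabetically.
    return sorted(positions, key=lambda d: (positions[d], d))


# Hypothetical toy input: three pipelines, four placeholder drugs.
example = {
    "pipeline_a": ["drug_x", "drug_y", "drug_z", "drug_w"],
    "pipeline_b": ["drug_y", "drug_x", "drug_z"],
    "pipeline_c": ["drug_y", "drug_z", "drug_x"],
}
print(consensus_rank(example))
```

A drug that no single pipeline places first can still lead the consensus list if every pipeline places it near the top, which is the property an ensemble relies on when no individual method can be trusted a priori.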
Results: Network-Based Drug Repurposing. Repurposing strategies often prioritize drugs approved for (other) diseases whose molecular manifestations are similar to those caused by the pathogen or disease of interest (20). To search for diseases whose molecular mechanisms overlap with the COVID-19 disease, we first mapped the experimentally identified (21) 332 host protein targets of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins (Dataset S1) to the human interactome (22–25) (Dataset S2), a collection of 332,749 pairwise binding interactions between 18,508 human proteins (SI Appendix, Section 1.1). We found that 208 of the 332 viral targets form a large connected component (hereafter denoted the COVID-19 disease module) (Fig. 2B), indicating that the SARS-CoV-2 targets aggregate in the same network vicinity (13, 20). Next, we evaluated the network-based overlap between proteins associated with 299 diseases (26) (d) and the host protein targets of SARS-CoV-2 (v) using the S_vd metric (26), finding S_vd > 0 for all diseases, implying that the COVID-19 disease module does not directly overlap with the disease proteins associated with any single disease (SI Appendix, Figs. S1 and S2 and Dataset S5). In other words, a potential COVID-19 treatment cannot be derived from the arsenal of therapies approved for a specific disease, arguing for a network-based strategy that can identify repurposable drugs without regard for their established disease indication.
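The two network quantities used above, the largest connected component of a protein set and the separation S_vd, can be sketched in a few lines. The sketch assumes the separation measure of ref. 26 takes the form S_vd = d_vd - (d_vv + d_dd)/2, where d_vv and d_dd are mean shortest distances to the nearest other protein within each set and d_vd is the mean nearest distance between the sets, pooled over both directions. The toy graph and all function names are hypothetical, and each set needs at least two mutually reachable proteins.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def largest_connected_component(adj, nodes):
    """Largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    induced = {n: adj[n] & nodes for n in nodes}
    best, seen = set(), set()
    for n in nodes:
        if n not in seen:
            comp = set(bfs_distances(induced, n))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best


def closest_distances(adj, sources, targets, exclude_self=False):
    """For each source, the distance to its nearest (other) target."""
    out = []
    for s in sources:
        dist = bfs_distances(adj, s)
        reach = [dist[t] for t in targets
                 if t in dist and not (exclude_self and t == s)]
        if reach:
            out.append(min(reach))
    return out


def separation(adj, v, d):
    """S_vd = <d_vd> - (<d_vv> + <d_dd>) / 2 (assumed form, see lead-in)."""
    v, d = set(v), set(d)
    avg = lambda xs: sum(xs) / len(xs)
    d_vv = closest_distances(adj, v, v, exclude_self=True)
    d_dd = closest_distances(adj, d, d, exclude_self=True)
    d_vd = closest_distances(adj, v, d) + closest_distances(adj, d, v)
    return avg(d_vd) - (avg(d_vv) + avg(d_dd)) / 2


# Hypothetical toy interactome: a path a-b-c-d-e-f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
adj = {}
for u, w in edges:
    adj.setdefault(u, set()).add(w)
    adj.setdefault(w, set()).add(u)

print(largest_connected_component(adj, ["a", "b", "d"]))
print(separation(adj, {"a", "b"}, {"e", "f"}))
```

In the toy graph the two sets sit at opposite ends of the path, so the separation is positive, the same sign the text reports for every disease compared against the SARS-CoV-2 targets.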
Materials and Methods:
Human Interactome, SARS-CoV-2, and Drug Targets. The human interactome was assembled from 21 public databases that compile experimentally derived protein–protein interaction (PPI) data: 1) binary PPIs, derived from high-throughput yeast two-hybrid experiments and three-dimensional protein structures; 2) PPIs identified by affinity purification followed by mass spectrometry; 3) kinase substrate interactions; 4) signaling interactions; and 5) regulatory interactions. The final interactome used in our study contains 18,505 proteins and 327,924 interactions between them. We retrieved interactions between SARS-CoV-2 and human proteins detected by Gordon et al. (21), and drug–target information from the DrugBank database. A detailed description of the datasets can be found in SI Appendix, Section 1.1.
Graph Convolutional Networks. We designed a graph neural network for COVID-19 treatment recommendations (14), where nodes represent three distinct types of biomedical entities (i.e., drugs, proteins, diseases), and labeled edges represent four types of relations between the entities (PPIs, drug–target associations, disease–protein associations, and drug–disease treatments). A detailed description of the method is presented in SI Appendix, Section 2.1.
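The multi-type graph described here can be pictured as a small typed data structure. The sketch below is only an illustration of that structure plus an unweighted per-relation neighbor average, which is the kind of aggregation a graph convolution builds on. It is not the model of ref. 14, and the class, edge-type labels, entity names, and scalar features are hypothetical.

```python
from collections import defaultdict

NODE_TYPES = {"drug", "protein", "disease"}
# Edge type -> (first endpoint's node type, second endpoint's node type).
EDGE_TYPES = {
    "ppi": ("protein", "protein"),
    "drug_target": ("drug", "protein"),
    "disease_protein": ("disease", "protein"),
    "treats": ("drug", "disease"),
}


class HeteroGraph:
    """Undirected graph with three node types and four labeled edge types."""

    def __init__(self):
        self.node_type = {}
        # node -> edge type -> set of neighbors reached through that type
        self.neighbors = defaultdict(lambda: defaultdict(set))

    def add_node(self, name, ntype):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[name] = ntype

    def add_edge(self, u, v, etype):
        expected = EDGE_TYPES[etype]
        if (self.node_type[u], self.node_type[v]) != expected:
            raise ValueError(f"{etype} edge must connect {expected}")
        self.neighbors[u][etype].add(v)
        self.neighbors[v][etype].add(u)

    def aggregate(self, node, features):
        """Mean neighbor feature per edge type (no learned weights)."""
        return {
            etype: sum(features[n] for n in nbrs) / len(nbrs)
            for etype, nbrs in self.neighbors[node].items()
        }


# Hypothetical toy graph: one drug, two proteins, one disease.
g = HeteroGraph()
g.add_node("drug_a", "drug")
g.add_node("p1", "protein")
g.add_node("p2", "protein")
g.add_node("dis_1", "disease")
g.add_edge("drug_a", "p1", "drug_target")
g.add_edge("p1", "p2", "ppi")
g.add_edge("dis_1", "p2", "disease_protein")
g.add_edge("drug_a", "dis_1", "treats")

features = {"drug_a": 0.0, "p1": 1.0, "p2": 3.0, "dis_1": 2.0}
print(g.aggregate("p1", features))
```

Keeping neighbors separated by edge type is what lets a model treat a drug–target link differently from a protein–protein link; a trained network would replace the plain averages with learned, relation-specific transformations.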
Diffusion State Distance. The diffusion state distance (17) algorithm uses graph diffusion to derive a similarity metric for pairs of nodes that takes into account how similarly they impact the rest of the network. A detailed description of the method and its implementation is in SI Appendix, Section 2.2.
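A minimal sketch of the idea, assuming the common formulation in which each node is described by the expected number of visits a k-step random walk starting there makes to every node, and two nodes are compared by the L1 distance between those vectors. The step count k, the toy graph, and the function names are assumptions; the implementation in SI Appendix, Section 2.2 may differ.

```python
def diffusion_state_distance(adj, k=5):
    """Return a function dsd(a, b) for an undirected graph `adj`.

    Assumes every node has at least one neighbor. The default k is an
    arbitrary illustrative choice, not a value taken from the paper.
    """
    nodes = sorted(adj)
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    he = []  # he[i][j]: expected visits to j by a k-step walk from i
    for start in nodes:
        prob = [0.0] * n
        prob[idx[start]] = 1.0
        visits = prob[:]  # step 0 counts as a visit to the start node
        for _ in range(k):
            nxt = [0.0] * n
            for u in nodes:
                p = prob[idx[u]]
                if p:
                    share = p / len(adj[u])  # uniform step to a neighbor
                    for w in adj[u]:
                        nxt[idx[w]] += share
            prob = nxt
            visits = [a + b for a, b in zip(visits, prob)]
        he.append(visits)

    def dsd(a, b):
        return sum(abs(x - y) for x, y in zip(he[idx[a]], he[idx[b]]))

    return dsd


# Hypothetical toy graph: a hub h connected to three leaves.
star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
dsd = diffusion_state_distance(star, k=1)
print(dsd("x", "y"), dsd("x", "h"))
```

Because the distance compares whole visit profiles rather than a single shortest path, two nodes are close when their random walks spread over the network in a similar way.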
Network Proximity. We calculated the proximity of the SARS-CoV-2 targets to drug targets using the proximity measure (11). A detailed description of the method and randomization can be found in SI Appendix, Section 2.3.
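As a rough illustration, proximity can be computed as the mean shortest distance from each drug target to the nearest disease protein, then compared with the same quantity for randomly drawn protein sets to obtain a z-score. The sketch assumes a connected graph and draws random sets uniformly, which is a simplification: the randomization described in SI Appendix, Section 2.3 is not reproduced here. The toy graph, function names, and parameter defaults are hypothetical.

```python
import random
from collections import deque
from statistics import mean, pstdev


def bfs_distances(adj, source):
    """Shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_distance(adj, disease, targets):
    """Mean over drug targets of the distance to the nearest disease protein."""
    vals = []
    for t in targets:
        dist = bfs_distances(adj, t)
        vals.append(min(dist[s] for s in disease if s in dist))
    return mean(vals)


def proximity_z(adj, disease, targets, n_random=200, seed=0):
    """Observed proximity and its z-score against uniform random sets.

    Simplification: random sets match the originals in size only.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = closest_distance(adj, disease, targets)
    null = [
        closest_distance(
            adj,
            rng.sample(nodes, len(disease)),
            rng.sample(nodes, len(targets)),
        )
        for _ in range(n_random)
    ]
    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, z


# Hypothetical toy interactome: a path n0-n1-...-n7.
path = {f"n{i}": set() for i in range(8)}
for i in range(7):
    path[f"n{i}"].add(f"n{i + 1}")
    path[f"n{i + 1}"].add(f"n{i}")

observed, z = proximity_z(path, ["n0", "n1"], ["n2"])
print(observed, z)
```

A negative z-score indicates that the drug targets lie closer to the disease proteins than expected by chance, which is the signal a proximity-based ranking looks for.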
Data Availability. All study data are included in the article and supporting information. The code is available on GitHub at https://github.com/Barabasi-Lab/COVID-19.