sciencemag.org 20 FEBRUARY 2015 • VOL 347 ISSUE 6224
A disease is rarely a straightforward consequence of an abnormality in a single gene, but rather reflects the interplay of multiple molecular processes. The relationships among these processes are encoded in the interactome, a network that integrates all physical interactions within a cell, from protein-protein to regulatory protein–DNA and metabolic interactions. The documented propensity of disease-associated proteins to interact with each other suggests that they tend to cluster in the same neighborhood of the interactome, forming a disease module, a connected subgraph that contains all molecular determinants of a disease. The accurate identification of the corresponding disease module represents the first step toward a systematic understanding of the molecular mechanisms underlying a complex disease. Here, we present a network-based framework to identify the location of disease modules within the interactome and use the overlap between the modules to predict disease-disease relationships.
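To make the notion of a disease module as a connected subgraph concrete, the following is a minimal sketch, not the authors' implementation. It assumes the interactome is given as an adjacency dictionary and takes the largest connected component of the disease proteins as the observable part of the module. The toy network, the protein names, and the function name `largest_connected_component` are hypothetical.

```python
from collections import deque


def largest_connected_component(adjacency, nodes):
    """Return the largest connected set among `nodes`.

    Only edges whose two endpoints both belong to `nodes` are followed,
    i.e. the search runs on the subgraph induced by the disease proteins.
    """
    nodes = set(nodes)
    seen = set()
    best = set()
    for start in sorted(nodes):  # sorted for a deterministic tie-break
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search restricted to `nodes`
            current = queue.popleft()
            for neighbor in adjacency.get(current, ()):
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy interactome (undirected, stored as adjacency sets).
TOY_INTERACTOME = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}

# Hypothetical disease proteins: A, B, C interact directly; F is detached.
DISEASE_PROTEINS = {"A", "B", "C", "F"}

module = largest_connected_component(TOY_INTERACTOME, DISEASE_PROTEINS)
print(sorted(module))
```

In this toy case the proteins A, B, and C form the connected core, while F is linked to them only through proteins not associated with the disease. In an incomplete interactome such detached proteins may still belong to the true module through interactions that have not yet been mapped.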
Despite impressive advances in high-throughput interactome mapping and disease gene identification, both the interactome and our knowledge of disease-associated genes remain incomplete. This incompleteness prompts us to ask to what extent the current data are sufficient to map out the disease modules, the first step toward an integrated approach to human disease. To make progress, we must formulate mathematically the impact of network incompleteness on the identifiability of disease modules, quantifying the predictive power and the limitations of the current interactome.
Using the tools of network science, we show that we can only uncover disease modules for diseases whose number of associated genes exceeds a critical threshold determined by the network incompleteness. We find that disease proteins associated with 226 diseases are clustered in the same network neighborhood, displaying a statistically significant tendency to form identifiable disease modules. The higher the degree of agglomeration of the disease proteins within the interactome, the higher the biological and functional similarity of the corresponding genes. These findings indicate that many local neighborhoods of the interactome represent the observable part of the true, larger and denser disease modules. If two disease modules overlap, local perturbations causing one disease can disrupt pathways of the other disease module as well, resulting in shared clinical and pathobiological characteristics. To test this hypothesis, we measure the network-based separation of each disease pair, observing a direct relation between the pathobiological similarity of diseases and their relative distance in the interactome. We find that disease pairs with overlapping disease modules display significant molecular similarity, elevated coexpression of their associated genes, and similar symptoms and high comorbidity. At the same time, nonoverlapping disease pairs lack any detectable pathobiological relationships. The proposed network-based distance allows us to predict the pathobiological relationship even for diseases that do not share genes.
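This summary does not spell out the formula for the network-based separation. The following is a minimal sketch under a stated assumption: that the separation of diseases A and B takes the form s_AB = <d_AB> - (<d_AA> + <d_BB>)/2, where <d_AA> is the mean shortest distance from each protein of A to its nearest other protein of A, and <d_AB> is the mean shortest distance from each protein of one disease to the nearest protein of the other. The toy network and all function names are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def nearest_distances(adjacency, sources, targets, exclude_self):
    """For each source, the distance to its nearest reachable target."""
    result = []
    for source in sources:
        distances = bfs_distances(adjacency, source)
        candidates = [
            distances[target]
            for target in targets
            if target in distances and not (exclude_self and target == source)
        ]
        if candidates:
            result.append(min(candidates))
    return result


def mean(values):
    if not values:
        raise ValueError("no reachable protein pairs; distance is undefined")
    return sum(values) / len(values)


def separation(adjacency, set_a, set_b):
    """Assumed form: s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2."""
    d_aa = mean(nearest_distances(adjacency, set_a, set_a, exclude_self=True))
    d_bb = mean(nearest_distances(adjacency, set_b, set_b, exclude_self=True))
    # A protein shared by both diseases contributes a distance of zero here.
    d_ab = mean(
        nearest_distances(adjacency, set_a, set_b, exclude_self=False)
        + nearest_distances(adjacency, set_b, set_a, exclude_self=False)
    )
    return d_ab - (d_aa + d_bb) / 2


# Hypothetical toy interactome: a chain P1 - P2 - P3 - P4 - P5 - P6.
CHAIN = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4", "P6"},
    "P6": {"P5"},
}

s_far = separation(CHAIN, {"P1", "P2"}, {"P5", "P6"})
s_overlap = separation(CHAIN, {"P1", "P2", "P3"}, {"P2", "P3", "P4"})
print(s_far, s_overlap)
```

Under this assumed definition, the two diseases at opposite ends of the chain receive a positive separation, whereas the pair sharing proteins receives a negative one. A negative value would then correspond to overlapping modules and a positive value to topologically separated ones.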
Despite its incompleteness, the interactome has reached sufficient coverage to allow the systematic investigation of disease mechanisms and to help uncover the molecular origins of the pathobiological relationships between diseases. The introduced network-based framework can be extended to address numerous questions at the forefront of network medicine, such as interpreting genome-wide association study data.