Susan D Ghiassian, PhD; Ivan Voitalov; Johanna B Withers, PhD; Marc Santolini; Alif Saleh; Viatcheslav R Akmaev, PhD
Translational Research – The Journal of Laboratory and Clinical Medicine
Introduction: Ulcerative colitis (UC) is a chronic, relapsing disease characterized by diffuse mucosal inflammation of the colon. UC is part of a larger spectrum of chronic relapsing diseases of the intestinal tract classified as inflammatory bowel disease (IBD), which also includes Crohn’s disease (CD). IBD is a growing health problem, and the estimated prevalence is 568 cases per 100,000 persons in the US and 827 cases per 100,000 persons in Europe. Approximately 20% of patients with UC present symptoms before age 20. UC is diagnosed based on clinical presentation and endoscopic evidence of inflammation in the rectum that extends proximally into the colon. Clinical manifestations of active disease include bloody diarrhea, rectal urgency, abdominal pain, weight loss, and malaise.
The treatment goal in UC is to induce remission and maintain a corticosteroid-free remission, which often requires use of a targeted therapy. Approved targeted therapies include anti-integrin ɑ4β7 (e.g. vedolizumab), anti-interleukin-12 of 23 (eg, ustekinumab), tumor necrosis factor inhibitor (TNFi; eg, adalimumab, infliximab and golimumab) and Janus kinase inhibitor (JAKi; eg, tofacitinib) therapies.
Selecting the right drug for individual patients from day 1 of treatment results in faster recovery, less pain, and improved quality of life, particularly in chronic progressive diseases such as UC. However, response to targeted therapies in UC are as low 20%–50%. Clinical response in UC clinical trials is defined as a decline in Mayo score of ≥3 points and either a ≥30% relative decrease from baseline with at least a 1 point decrease in rectal bleeding or a rectal bleeding score of 0 or 1. This definition of response is less stringent than the treatment goals of clinical remission (Mayo score of 0-2) and mucosal healing (endoscopic score of 0-1). This highlights the urgent need to develop precision medicine tools for UC patients so that a therapy suitable to each patient’s disease biology can be prescribed.
Analysis of the map of human disease biology called the Human Interactome can be used to interpret a patients’ unique molecular signature in order to identify which therapy will work for which patient based on each individual’s unique biology. Analysis of the topology and dynamics of the Human Interactome can reveal the underlying biological processes regulating many of the most common and difficult to treat diseases. This has resulted in the ability to discover novel targets, reprioritize known targets, and develop new biomarkers to predict drug response. Application of this technology is especially useful in human complex diseases such as autoimmune conditions like rheumatoid arthritis and UC. The current goal is to develop diagnostic tests to predict which patients will have an inadequate response to targeted therapies and define new drug targets and pathways for novel therapeutic development.
Materials and Methods:
Training cohort, GSE14580: Twenty-four patients with active UC, refractory to corticosteroids or immunosuppression, underwent colonoscopy with biopsies from diseased colon within a week prior to the first intravenous infusion of 5 mg infliximab per kg body weight. Response to infliximab was defined as endoscopic and histologic healing at 4-6 weeks after first infliximab treatment (8 patients as responders and 16 patients as inadequate responders). This data also included 6 healthy controls. Total RNA was isolated from colonic mucosal biopsies, labelled, and hybridized to Affymetrix Human Genome U133 Plus 2.0 Arrays.
Validation cohort, GSE12251: Twenty-two patients underwent colonoscopy with biopsy before infliximab treatment. Response to infliximab was defined as endoscopic and histologic healing at week 8 (12 patients as responders and 11 patients as inadequate responders). For 1 patient, data from 2 samples taken at different timepoints were available. RNA was isolated from pre-infliximab biopsies, labeled and hybridized to Affymetrix Human Genome U133 Plus_2.0 Array.
This study was performed in accordance with the principles outlined in the Declaration of Helsinki.
The 2 datasets were downloaded using GEOquery R package. Before-treatment gene expression data were extracted by setting the visit time point to baseline. Probe IDs were converted to gene Entrez ID using the hgu133plus2.db database. The 2 datasets were merged by the common probe IDs. Batch effects were removed using ComBat from the sva R package. To retain the biological differences between responders and inadequate responders, cohort-specific biomarkers were derived prior to applying ComBat.
The Human Interactome, previously described, contains experimentally determined physical interactions between proteins. These interactions include, regulatory, metabolic, signaling, and binary interactions. The Human Interactome amalgamates data from more than 300 thousand interactions among 18 thousand proteins.
Identification of classifier genes
For all genes, the Pearson correlation between the gene expression values and the response to treatment was determined. The signal-to-noise ratio of each gene correlation was calculated by randomly shuffling the response outcome 100 times. Selected genes were then mapped onto the consolidated Human Interactome and the largest connected component (LCC) was determined.
Classifier design and validation
Genes identified as discriminatory between responders and inadequate responders to infliximab that were in the LCC were used as features of a probabilistic neural network. Classifier training was performed by implementing the R package pnn in Python. The classifier training included in-cohort validation using a leave-one-sample-out cross-validation where the classifier was blind to the response outcome of that left-out patient. The classifier was trained using the default smoothing parameter (σ = 0.8). The classifier was validated on the validation cohort where the training cohort was used for feature selection and classifier training and the validation cohort was used for independent validation.
The classifier provided a probability for each patient reflecting whether or not that individual responded to infliximab treatment. The log-likelihood ratio of response and inadequate response probabilities were used to define a score for each patient and draw the receiver operating characteristic (ROC) curves by comparing the score to actual response outcomes. The area under the curve (AUC) determined the performance of the classifier. In cross-cohort assessment, the trained classifier was blind to the clinical outcomes of the validation cohort patients.
Response module randomization
Response module was comprised of the largest connected component formed by top genes when derived from both training and validation cohorts. None of the shared genes (STC1, PAPPA, SOD2 and HGF) between the 2 cohorts’ top gene sets was a high degree node in the Human Interactome, that could have caused a high degree of perceived connectedness between the LCC genes from the 2 cohorts. Hence, nodes were randomly assigned to both cohorts uniformly at random.
Identification of gene expression features predictive of inadequate response to infliximab
To identify genes whose expression best distinguished responders from inadequate responders to infliximab, 2 publicly available UC patient gene expression datasets were downloaded for which the clinical outcomes data were available. The training cohort data were analyzed to find genes with significant gene expression deviations between responders and inadequate responders (response prediction genes). Unlike conventional differential expression methods that look for large fold-changes in gene expression between 2 groups, this analysis investigated small but significant changes – a high signal-to-noise ratio – between the 2 groups (see Material and methods). Genes were ranked by decreasing value of signal-to-noise ratio and the top genes with the highest signal-to-noise ratio were selected as infliximab response discriminatory genes (Fig 1A).
Refinement of molecular signature genes using the Human Interactome
The top genes from the training cohort for which expression values across patients were significantly correlated to clinical outcome after infliximab treatment were selected and mapped onto the Human Interactome network map of protein-protein interactions (see Material and Methods) (Fig 1B-C). The cut-off of the top 123 probes was empirically determined as the minimum number of genes needed for the LCC size to plateau (Supplementary Figure S1). Although these genes were identified from gene expression data only, proteins encoded by these genes formed several clusters with the largest 1 containing 12 proteins on the Human Interactome. The associated proteins to the set of 123 probes on the Human Interactome was significantly closer to each other than expected by chance (z-score of -2.07) (Fig 1B-C and Table I). Absolute z-scores > 1.6 have been associated with sub-networks of underlying disease biology.
Classifier training and blinded cross-cohort validation
The LCC genes from the training cohort were used to train a probabilistic neural network; an optimum pattern classifier that minimizes the risk of incorrectly classifying an object with high efficiency. The probabilistic neural networks were trained using the LCC genes and patient data to teach the predictive classifier the appropriate patient outcome (ie, response or inadequate response to infliximab) for each input (ie, gene expression levels of LCC genes) (see Material and Methods).
Blinded, independent cross-cohort validation assessed the performance of the predictive classifier. In this analysis, the classifier that was trained on the known data and outcomes from the training cohort was used to predict the outcomes on the validation cohort, ultimately testing the ability of the classifier to accurately predict inadequate response to infliximab in an unseen patient population. The classifier predicted probabilities were converted to a continuous classifier prediction score using log-likelihood ratio (see Material and Methods). ROC curves, which plot the rate of false positives versus the rate of true positives, were used to assess cross-cohort performance (Fig 2A). An AUC of 0.83 was observed for the classifier predicting inadequate response to infliximab among validation cohort patients. The distribution of classifier prediction scores within the validation cohort showed a significant difference between the classifier prediction scores for responders and inadequate responders (P-value = 0.004) (Fig 2B). Additionally, the cross-cohort positive predictive value (PPV) and sensitivity were estimated (Fig 2C), which are metrics that describe the accuracy of the inadequate response predictions. At a 100% PPV, the classifier had a sensitivity of 64%.
Baseline gene expression profiles of responders more closely resemble healthy controls
To further evaluate the 12-LCC classifier genes, the expression was compared between responders and inadequate responders. In general, the changes in gene expression between these 2 patient groups were small and no gene reached a threshold that indicated a significant differential expression (P-value ≤ 0.05 and enrichment of -log(0.05) ≥ 2.99) (Fig 3A). However, the expression of the 12-LCC classifier genes tended toward higher enrichment scores. One interpretation was that the genes with the greatest fold-change were not necessarily the most relevant to treatment response as they could potentially be downstream of master regulators or be altered by indirect/secondary consequences of disease-relevant signaling processes.
Next, the expression of the 12-LCC classifier genes was compared to that of healthy controls, comparing the fold changes relative to responders or to inadequate responders. The patients who were inadequate responders showed the largest divergence in gene expression pattern from that of healthy controls (Fig 3B). Unsupervised hierarchical clustering analysis showed that the baseline expression profiles of the patients who responded to infliximab more closely resembled the expression pattern of healthy controls than did the inadequate responders (Fig 3C).
The UC infliximab response module is a sub-network on the Human Interactome
The Human Interactome can serve as a blueprint to better understand the interconnectivity and underlying biology of the inadequate response prediction genes. Thus, the top 200 genes with the highest signal-to-noise ratio between responders and inadequate responders among the validation cohort data were also determined. When the 200 top genes from the training and validation cohorts were mapped simultaneously onto Human Interactome, the genes were not randomly scattered on the network, but instead significantly interacted with each other (z-score, absolute value of 7.68) forming a common response module LCC (Fig 4) that was significantly larger than the random expectation (139 genes; z-score of 2.09). To account for genes that were shared between the 2 cohort gene lists, including any high-degree node (UBC) on the Human Interactome, a careful randomization was made to estimate the significance of interconnectivity (see Material and Methods). Three proteins in the response module LCC (GK, FFAR2, CEBPB) are direct interaction partners of TNF-ɑ, the protein target of infliximab. Several proteins in the response module LCC were orphan genes that were not previously part of LCCs of the individual cohorts (eg, STC1 and IL7R) yet were integrated into this response module LCC (Fig 4A). These results shows that even though the response discriminatory genes identified from each cohort were apparently distinct with minimal overlap, their protein products tended to interact significantly on the network, reflecting the existence of an underlying disease biology sub-network, or response module, that defined a molecular signature of non-response to infliximab in UC patients.