Identifying Pathways Associated with Disease States
The complex nature of the most burdensome diseases has driven an explosion of interest in systems biology, an approach in biomedical research that seeks to elucidate mechanisms as a whole by putting the pieces together rather than reducing them to their constituent parts. The growth of bioinformatic pathway databases enables systems-based approaches by encapsulating knowledge about interaction networks in an algorithmically queryable form. This information can be combined with high-throughput omics data to identify systems associated with disease.
Using techniques derived from graph theory, machine learning, nonlinear dimension reduction, and dynamical systems modeling, we have developed innovative algorithms that summarize omics data at the systems level without relying on single-gene tests. These summary scores are compared between cases and controls to identify pathways associated with disease. For example, in our 2011 paper we described the Pathway Partition Decoupling Method (Pathway-PDM), a technique that uses nonlinear dimension reduction to summarize gene co-expression patterns across pathways and accurately identify molecular subtypes of samples. Such inferences have the potential to inform drug design and treatment decisions. In our 2017 paper, we extended these analyses to integrate multi-omics data at the systems level, elucidating the role of gene regulation by miRNAs.
Using Network Analysis Methods to Identify Disease Mechanisms
Systems-level analysis can provide high-level mechanistic insights and is often more robust than gene-level approaches. However, because pathways can comprise hundreds of genes, the results may be difficult to interpret or to follow up experimentally. There is thus a need for methods that not only detect significant pathways but also identify elements within those pathways that can be targeted experimentally (i.e., can we identify pathway “control knobs” or “driving genes”?). To this end, we developed GeneSurrounder, a method that identifies nodes in pathway networks that are “epicenters” of dysregulation. We demonstrated that incorporating pathway network information yielded more reliable gene-level results than simple tests of differential expression.
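To make the notion of an “epicenter” concrete, here is a minimal sketch of one way to score a gene by the dysregulation in its network neighborhood. It is not the GeneSurrounder algorithm itself, whose details are not described on this page. The toy pathway, the differential-expression values, and the function names `nodes_within` and `neighborhood_score` are all hypothetical and exist only for illustration.

```python
from collections import deque


def nodes_within(graph, source, radius):
    """Return {node: hop distance} for all nodes within `radius` hops of `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the requested radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def neighborhood_score(graph, diff_expr, source, radius):
    """Mean absolute differential expression over the neighborhood of `source`."""
    hood = nodes_within(graph, source, radius)
    return sum(abs(diff_expr[n]) for n in hood) / len(hood)


# Hypothetical toy pathway (undirected, stored as adjacency lists).
toy_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
# Hypothetical per-gene differential-expression statistics (cases vs. controls).
toy_diff_expr = {"A": 0.1, "B": 0.2, "C": 2.0, "D": 1.8, "E": 1.5}

scores = {
    gene: neighborhood_score(toy_graph, toy_diff_expr, gene, radius=1)
    for gene in toy_graph
}
top_gene = max(scores, key=scores.get)
print(top_gene, round(scores[top_gene], 3))
```

In this toy example the highest-scoring gene is not the one with the largest individual statistic but the one whose neighbors are also strongly dysregulated, which is the intuition behind using network context rather than single-gene tests. A real analysis would also need a significance assessment, which this sketch leaves out.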
How Can Network Analysis Methods Solve Real World Problems?
In collaboration with Seth Corey (VCU), we applied GeneSurrounder to understand how mutations in GCSFR, which are common in severe congenital neutropenia (SCN), impair signaling on the network to drive cells to a proliferative state. This work helped identify drivers of malignant transformation, has important implications for predicting the malignant transformation of myelodysplasias (SCN to secondary AML), and provides a modeling framework that can be used to guide treatment decisions.
Refining Our Understanding of Network Connections To Improve Biological Inference
Pathway analysis techniques, including the methods described above, require both experimental data and a candidate network as input. The networks are typically obtained from curated pathway databases and are assumed to accurately represent the underlying biology. However, these descriptions are known to be incomplete in places, and pathways may be altered (“rewired”) under certain conditions. Considerable effort has thus focused on inferring the underlying interaction networks from observed omics data. Yet despite advances in machine learning, no strategy has been able to accurately reconstruct large-scale biological networks de novo from omics data alone.
A more productive approach integrates what is already known about the pathway and uses omics data to refine the network model, rather than inferring it from scratch. We have developed a number of methods that take database-derived networks as an initial structure and then infer novel connections using experimental data [Study 1, Study 2, Study 3, Study 4]. Using these techniques, we were able to identify novel regulators of pathway activity and discover “regQTLs”: genetic variants that affect the regulatory relationship between miRNAs and their target genes.
Semi-supervised network refinement
We have developed several “semi-supervised” approaches for network reconstruction that use database-derived pathway networks as a starting point and then refine them in a data-driven manner, adding and removing links from the reference network based on experimental data. We pioneered a Bayesian approach that infers novel connections in pathways by comparing dependencies observed in time-course gene expression data to those derived from a dynamical systems model built on the prior network. We demonstrated that this approach outperformed other methods on both simulated and real data. The method was published in Bioinformatics, a leading statistical bioinformatics journal.
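The general idea of starting from a reference network and then adding or removing links based on data can be sketched as follows. This is a deliberately simplified stand-in that uses pairwise correlation with two thresholds. It is not the Bayesian, dynamical-systems-based method described above. The expression values, the prior edges, the thresholds, and the function name `refine_network` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def refine_network(prior_edges, expr, add_threshold=0.9, keep_threshold=0.3):
    """Keep prior edges with modest support; add new edges only with strong support."""
    genes = sorted(expr)
    refined = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            edge = frozenset((g, h))
            strength = abs(pearson(expr[g], expr[h]))
            # Edges in the reference network face a lower bar than novel edges.
            threshold = keep_threshold if edge in prior_edges else add_threshold
            if strength >= threshold:
                refined.add(edge)
    return refined


# Hypothetical expression profiles (one value per sample).
toy_expr = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "C": [3.0, 1.0, 4.0, 1.0, 3.0],
    "D": [1.0, 3.0, 2.0, 5.0, 4.0],
}
# Hypothetical database-derived reference network.
toy_prior = {frozenset(("A", "C")), frozenset(("A", "D"))}

refined = refine_network(toy_prior, toy_expr)
print(sorted(tuple(sorted(e)) for e in refined))
```

Here the prior edge A-C is dropped because the data do not support it, the prior edge A-D is retained on moderate evidence, and A-B is added as a novel link. Using a lower evidence bar for known edges than for new ones is one simple way to encode prior knowledge. A Bayesian treatment would instead express it through a prior distribution over edges.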
We also recently published a network reconstruction method based on the time-lagged ordered lasso that can be applied in either a “de novo” or a “semi-supervised” mode. Applying this method to time-course gene expression data from HeLa cells, we demonstrated that it could correctly detect previously unknown edges as well as exclude incorrect links from prior network models. The refined networks were confirmed by subsequent experiments.
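As a rough illustration of time-lagged sparse regression for edge detection, the sketch below regresses a target gene at time t on candidate regulators at time t-1 using a plain lasso fitted by coordinate descent. It is a simplification under stated assumptions: it uses a single lag, and it omits the ordering constraint on coefficients across lags that gives the ordered lasso its name. The time series, the penalty value, and all function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty` (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all predictors except j.
                pred = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred)
                z += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, penalty) / (z / n) if z > 0 else 0.0
    return coef


def lagged_design(series, regulators, target, lag=1):
    """Pair regulator values at time t-lag with the target value at time t."""
    length = len(series[target])
    X = [[series[r][t - lag] for r in regulators] for t in range(lag, length)]
    y = [series[target][t] for t in range(lag, length)]
    return X, y


# Hypothetical centered time courses; T follows R1 with a one-step delay.
toy_series = {
    "R1": [1.0, -1.0, 1.0, -1.0, 1.0],
    "R2": [1.0, 1.0, -1.0, -1.0, 1.0],
    "T": [0.0, 0.8, -0.8, 0.8, -0.8],
}
regulators = ["R1", "R2"]

X, y = lagged_design(toy_series, regulators, "T", lag=1)
coef = lasso_coordinate_descent(X, y, penalty=0.1)
inferred_edges = [(r, "T") for r, c in zip(regulators, coef) if c != 0.0]
print(inferred_edges)
```

In this toy example only the R1 to T edge receives a nonzero coefficient, so R2 is excluded as a regulator. One conceivable way to approximate a semi-supervised mode in such a sketch would be to apply a smaller penalty to edges present in a prior network, though that is an assumption for illustration rather than a description of the published method.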