Lab Developed Methods
Take a Look at Our Lab’s Methods!
Our lab develops methods for systems-level analysis of high-throughput data. Our packages and tools are written primarily in the R programming language and are available on GitHub, on Bioconductor, or as standalone R packages. Depending on the size of your dataset, our methods can be run locally on your own machine or may require a high-performance computing cluster, such as Quest at Northwestern. Links to our methods, with short descriptions, are below!
Please feel free to contact us if you have questions or run into issues implementing our code!
Network Analysis Methods
Network-Based Pathway Analysis
GeneSurrounder is an algorithm that ranks genes based on the evidence that they are sources of disruption on the network of interacting genes. Since the effects of a “disruptive” source gene would propagate outward through the interaction network, we find these genes by searching the data for a telltale pattern: biological signal that is correlated with the source gene and that attenuates with network distance from it.
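The "attenuating signal" idea can be illustrated with a toy example. The sketch below is written in Python for illustration only and is not the GeneSurrounder R code: it measures network distance from a candidate source gene by breadth-first search and then asks, with a rank correlation, whether the magnitude of a per-gene signal falls off with distance. The function names, the toy network, and the choice of Kendall's tau are all assumptions made for this example.

```python
from collections import deque


def network_distances(adjacency, source):
    """Breadth-first search: hop count from `source` to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbor in adjacency.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    pairs = n * (n - 1) // 2
    return score / pairs if pairs else 0.0


def decay_score(adjacency, signal, source):
    """Rank correlation between distance from `source` and |signal|.
    Values near -1 mean the signal attenuates with distance."""
    dist = network_distances(adjacency, source)
    genes = [g for g in dist if g != source and g in signal]
    return kendall_tau([dist[g] for g in genes],
                       [abs(signal[g]) for g in genes])


# Hypothetical path-shaped network A - B - C - D - E with made-up signal values.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
               "D": ["C", "E"], "E": ["D"]}
toy_signal = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.5}
```

Calling `decay_score(toy_network, toy_signal, "A")` scores gene A as a candidate source; a strongly negative value is the pattern a disruptive source would be expected to produce in this simplified picture.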
Time-lagged Ordered Lasso for network inference
Accurate gene regulatory networks can be used to explain the emergence of different phenotypes, disease mechanisms, and other biological functions. We adapted the time-lagged Ordered Lasso, a regularized regression method with temporal monotonicity constraints, for de novo network reconstruction. We also developed a semi-supervised method that embeds prior network information into the Ordered Lasso to discover novel regulatory dependencies in existing pathways.
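Two ingredients of a time-lagged, monotonicity-constrained regression can be sketched in a few lines. The Python below is an illustration only, not the lab's R implementation and not a full Ordered Lasso solver: `lagged_design` builds the matrix of lagged regulator values, and `project_monotone_nonneg` shows one way to enforce "influence does not grow with lag" by projecting a coefficient vector onto non-negative, non-increasing sequences with the pool-adjacent-violators algorithm. Both function names are hypothetical.

```python
def lagged_design(regulator, target, max_lag):
    """Build a time-lagged regression problem.
    Row t holds regulator[t-1], ..., regulator[t-max_lag] (most recent first)
    and is paired with target[t]."""
    rows, targets = [], []
    for t in range(max_lag, len(regulator)):
        rows.append([regulator[t - lag] for lag in range(1, max_lag + 1)])
        targets.append(target[t])
    return rows, targets


def project_monotone_nonneg(values):
    """Least-squares projection onto sequences that are non-increasing and >= 0.
    Pool-adjacent-violators enforces the ordering; clipping at zero enforces the sign."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([v, 1])
        # Merge while an earlier block's mean is smaller than the later one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([max(s / c, 0.0)] * c)
    return fitted
```

For example, `project_monotone_nonneg([3, 1, 2, -1])` pools the second and third entries so that the coefficient at lag 3 no longer exceeds the one at lag 2, and sets the negative entry to zero.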
Inferring the structure of gene regulatory networks from high-throughput datasets remains an important and unsolved problem. We developed a semi-supervised network reconstruction algorithm that combines information from partially known networks with time course gene expression data. We adapted partial least squares-variable importance in projection (PLS-VIP) for time course data. Reference networks are used to simulate expression data, from which null distributions of VIP scores are generated; these null distributions are then used to estimate edge probabilities for the input expression data.
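To make the role of a null distribution concrete, the Python sketch below compares one observed score against a set of scores generated under a null model. It assumes the VIP scores have already been computed elsewhere, and it uses the standard add-one empirical p-value as the comparison; that estimator and the function name are assumptions for illustration, and the way the published method converts null distributions into edge probabilities may differ.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.
    The add-one correction keeps the estimate strictly above zero."""
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)
```

A small value means the observed score for a candidate edge is rarely matched under the null, which is evidence in favor of that edge.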
Network-Based Pathway Analysis
- PoDA.R – Main script to perform the PoDA calculations
- PoDA-example.R – Example usage of PoDA.R
- PoDA-example-data.RData – Data used in PoDA-example.R (necessary for example)
- plotSvals.R – A script to generate the boxplots of the S values, similar to those shown in the paper.
PoDA is a pathway-based, multi-SNP analysis method for GWAS data. The method is based on the hypothesis that if a pathway is related to disease risk, cases will appear more similar to other cases than to controls for the SNPs associated with that pathway. By systematically applying the method to all pathways of potential interest, we can identify those for which the hypothesis holds true, i.e., pathways containing SNPs for which the samples exhibit greater within-class similarity than across-class similarity. PoDA improves on existing single-SNP and SNP-set enrichment analyses in that it does not require the SNPs in a pathway to exhibit independent main effects.
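The within-class versus across-class comparison can be illustrated with a toy calculation. The Python sketch below is not PoDA.R and does not reproduce the S statistic from the paper; it assumes genotypes coded 0/1/2 for the SNPs in one pathway, uses mean absolute difference as a stand-in distance, and leaves each sample out of its own group when averaging. All names are hypothetical, and each group needs at least two samples.

```python
def genotype_distance(a, b):
    """Mean absolute difference between two genotype vectors (coded 0/1/2)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_distance(sample, group, skip_index=None):
    """Average distance from `sample` to `group`, optionally leaving one member out."""
    others = [g for i, g in enumerate(group) if i != skip_index]
    return sum(genotype_distance(sample, g) for g in others) / len(others)


def similarity_contrast(cases, controls):
    """Per-sample contrast: distance to controls minus distance to cases.
    Positive values mean the sample sits closer to the cases."""
    case_scores = [
        mean_distance(s, controls) - mean_distance(s, cases, skip_index=i)
        for i, s in enumerate(cases)
    ]
    control_scores = [
        mean_distance(s, controls, skip_index=i) - mean_distance(s, cases)
        for i, s in enumerate(controls)
    ]
    return case_scores, control_scores
```

If a pathway's SNPs separate the classes, the case scores tend to be positive and the control scores negative; if the two sets of scores overlap heavily, the pathway shows no such distinction in this simplified view.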
Circadian Biology Methods
Circadian Time Prediction
TimeSignature is a machine-learning approach to predict physiological time based on gene expression in human blood. A powerful feature is TimeSignature’s generalizability, enabling it to be applied to samples from disparate studies and yield highly accurate results despite systematic differences between the studies. This quality is unique among expression-based predictors and addresses a major challenge in the development of reliable and clinically useful biomarker tests.
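Any predictor of time of day has to cope with the wrap-around at midnight: 23:00 and 01:00 are two hours apart, not twenty-two. A common way to handle this, sketched below in Python, is to represent time as a point on the unit circle, convert predictions back to hours with `atan2`, and score errors circularly. This is a general illustration with hypothetical function names, not the TimeSignature code, and it says nothing about how the gene expression model itself is fit.

```python
import math


def hours_to_circle(hours):
    """Map a time of day in hours (0-24) to (cos, sin) on the unit circle."""
    angle = 2 * math.pi * hours / 24
    return math.cos(angle), math.sin(angle)


def circle_to_hours(cos_value, sin_value):
    """Invert hours_to_circle: recover a time of day in [0, 24)."""
    angle = math.atan2(sin_value, cos_value)
    return (angle * 24 / (2 * math.pi)) % 24


def circular_error(predicted_hours, true_hours):
    """Absolute error in hours, taking the shorter way around the clock."""
    diff = abs(predicted_hours - true_hours) % 24
    return min(diff, 24 - diff)
```

With this representation, a model can be trained to predict the two coordinates separately, and `circular_error(23, 1)` correctly reports an error of 2 hours.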
High-Throughput Data Analysis Resources
Getting Up To Speed with High-Throughput Data Analysis
- Review – Systems Analysis of High-Throughput Data
- Review – Implications of Big Data for Cell Biology
- Review – Coming of age: ten years of next-generation sequencing technologies
- Review – Multi-Omic Approaches to Disease
- Review – Methods for the integration of multi-omics data: mathematical aspects
- Review – Statistical Methods in Integrative Genomics
- Review – Analyzing gene expression data in terms of gene sets: methodological issues
- Review – Studying and modeling dynamic biological processes using time-series gene expression data
Getting Up To Speed with R
- Coding Cheat Sheets
Getting Up To Speed with GitHub
Getting Up To Speed with the Northwestern Quest Supercomputer