Although life is inherently dynamic, observing gene expression dynamics on a genome-wide scale remains challenging. As a result, most datasets comprise either a single-timepoint observation or a short, sparse time course from which we seek to infer functionally relevant temporal dynamics. To this end, our group develops novel machine learning algorithms that make dynamical inferences from one or two measurements of gene expression.
An important application domain of this work is the study of circadian regulation. The circadian clock, an intrinsic, approximately 24-hour physiological rhythm that synchronizes to the Earth’s day, is an evolutionarily conserved mechanism that orchestrates a wide variety of biological processes across multiple scales (cell, tissue, and organism). Despite its apparent simplicity, the circadian clock cannot be explained as a simple chemical oscillator. For example, the clock is temperature compensated: even in the absence of light or other entrainment cues, it stably maintains an intrinsic free-running period of approximately 24 hours across a wide range of temperatures, even though reaction rates increase with temperature. Yet the clock is not temperature insensitive: it can entrain to fluctuations in environmental temperature by advancing or delaying its phase when the temperature changes. The combined features of temperature compensation and temperature-sensitive entrainment are crucial to maintaining life in a changing world. The molecular mechanisms governing these dynamics are still not fully understood.
Underscoring the clock’s fundamental importance is abundant epidemiological evidence linking circadian disruption to adverse health outcomes (including cardiovascular disease and neurodegeneration). Yet despite its importance, the clock’s mechanistic role in human health remains poorly characterized due to the burden of measuring internal physiological time. The current gold standard for measuring circadian phase requires serial melatonin sampling (hourly or half-hourly blood draws over the course of a day), a procedure that is too costly to be incorporated into most biomedical research studies and too burdensome to be implemented in the clinic. The discovery that nearly half of the genome is under circadian control inspires an alternative approach: inferring physiological time by using gene expression as a readout of the internal clock.
From Computation to the Clinic
TimeSignature: a universal predictor of physiological time
We devised a novel algorithm that robustly and accurately infers circadian phase to within 1.5 hours using the expression levels of ∼40 genes measured in only two blood draws. The “TimeSignature” method and its capabilities are described in two PNAS publications, and a patent application for the biomarker derived by our method is pending. A significant methodological innovation of this work is the predictor’s ability to generalize across experimental conditions (including different sleep protocols and expression profiling technologies) without retraining the model and without losing accuracy. This property is unique among expression-based circadian biomarkers (and rare in general).
The robustness of the TimeSignature algorithm stems from a key insight: the rotational symmetry of the signal of interest can be exploited to perform a within-subject normalization that removes non-circadian variation (such as noise or technical artifacts). By designing the machine learning algorithm specifically for circadian data, we obtain a predictor that is not only highly accurate but also highly generalizable. Importantly, because the trained predictor can be used with any type of gene expression assay (RT-PCR, various microarray platforms, RNA-seq), it overcomes an important obstacle to creating translatable biomarker diagnostics.
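The idea of within-subject normalization can be illustrated with a small simulation. The sketch below is not the TimeSignature implementation: it assumes two draws taken 12 hours apart, purely sinusoidal gene rhythms, and ordinary least squares as a stand-in regression, and every function name and parameter in it is hypothetical. Each gene is centered on its mean across a subject's own draws, which cancels any subject- or platform-specific baseline, and the time of day is then read out as the angle of a predicted (cos, sin) pair.

```python
import numpy as np

HOURS = 24.0

def center_within_subject(draws):
    """Subtract each gene's mean over ONE subject's draws (rows = draws)."""
    draws = np.asarray(draws, dtype=float)
    return draws - draws.mean(axis=0, keepdims=True)

def time_to_xy(t_hours):
    """Map clock time to a point on the unit circle."""
    angle = 2.0 * np.pi * np.asarray(t_hours, dtype=float) / HOURS
    return np.column_stack([np.cos(angle), np.sin(angle)])

def xy_to_time(xy):
    """Read clock time back from the angle of a (cos, sin) prediction."""
    angle = np.arctan2(xy[:, 1], xy[:, 0])
    return (angle * HOURS / (2.0 * np.pi)) % HOURS

def circular_error_hours(pred, true):
    """Absolute error on a 24-hour circle (23:00 vs 01:00 is 2 h, not 22 h)."""
    d = np.abs(np.asarray(pred) - np.asarray(true)) % HOURS
    return np.minimum(d, HOURS - d)

def simulate_cohort(rng, n_subjects, amp, peak, offset_scale, noise=0.1):
    """Simulated data only: two draws per subject, 12 h apart (an assumption)."""
    X, t = [], []
    for _ in range(n_subjects):
        t0 = rng.uniform(0.0, HOURS)
        times = np.array([t0, (t0 + 12.0) % HOURS])
        # Per-gene baseline shift standing in for subject or assay differences.
        offset = rng.normal(0.0, offset_scale, size=amp.size)
        rhythm = amp * np.cos(2.0 * np.pi * (times[:, None] - peak) / HOURS)
        expr = offset + rhythm + rng.normal(0.0, noise, size=rhythm.shape)
        X.append(center_within_subject(expr))
        t.append(times)
    return np.vstack(X), np.concatenate(t)

def run_demo(seed=0, n_genes=40):
    rng = np.random.default_rng(seed)
    amp = rng.uniform(0.5, 1.5, size=n_genes)     # hypothetical amplitudes
    peak = rng.uniform(0.0, HOURS, size=n_genes)  # hypothetical peak times
    X_train, t_train = simulate_cohort(rng, 100, amp, peak, offset_scale=1.0)
    # Test cohort has much larger baseline shifts, mimicking a different assay.
    X_test, t_test = simulate_cohort(rng, 50, amp, peak, offset_scale=5.0)
    # Least-squares map from centered expression to (cos, sin) of clock time.
    W, *_ = np.linalg.lstsq(X_train, time_to_xy(t_train), rcond=None)
    return circular_error_hours(xy_to_time(X_test @ W), t_test)

if __name__ == "__main__":
    errors = run_demo()
    print(f"median absolute error: {np.median(errors):.2f} h")
```

In this toy setting the test cohort's baseline shifts are five times larger than those seen in training, yet the centered features are unaffected by them, which is the sense in which a normalization of this kind can let a trained predictor carry over to a new assay without retraining.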
How Can Computational Methods For Complex Biological Dynamics Solve Real World Problems?
Our interests in complex biological dynamics have also yielded significant advances at the organism [Study 1, Study 2, Study 3, Study 4] and population [Study 5, Study 6, Study 7] scales. For instance, working closely with Phyllis Zee’s Center for Circadian and Sleep Medicine, we have modeled the links between slow wave sleep, autonomic control, memory formation, and health outcomes, and investigated how these can be modulated by acoustic stimulation or light exposure [Study 1, Study 2, Study 3, Study 4]. All of these findings have implications for preventive medicine.