Prediction of Viral Evolution from Biophysics and Machine Learning

Overview

In the wake of the COVID-19 pandemic, the need to understand viral protein evolution has never been greater. We are an interdisciplinary team that uses statistical mechanics, mathematical modeling, and machine learning to investigate the dynamics of protein evolution. We aim to develop predictive tools that assess how mutations alter the biophysical properties of proteins and, consequently, protein fitness. Physically, we use statistical mechanics to model viral fitness landscapes in terms of biophysical properties such as folding energy and antibody evasion. Computationally, we integrate state-of-the-art protein language models with molecular dynamics (MD) simulations to predict these biophysical properties. Mathematically, we model the trajectories of viral protein escape.

Predicting Viral Fitness from Biophysical Principles

We use statistical mechanics to build a biophysical model of viral fitness from free energy data on host-virus protein-protein interactions. We have shown that such biophysical models can recapitulate population-level fitness information for SARS-CoV-2 variants, making it possible to improve pandemic preparedness through binding affinity measurements obtained in vitro or with machine learning assistance.
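As an illustration of how free energies can be turned into a fitness estimate, here is a minimal sketch under a simple two-state thermodynamic picture. It is not the model from our papers: the multiplicative form, the function names, and the concentrations used below are assumptions chosen for clarity. Free energies are in kcal/mol, with binding free energies referenced to a 1 M standard state.

```python
import math

KT = 0.593  # thermal energy in kcal/mol at roughly 298 K


def folded_fraction(dg_fold):
    """Two-state folding: probability of the folded state.

    dg_fold is the folding free energy (negative means stable).
    """
    return 1.0 / (1.0 + math.exp(dg_fold / KT))


def bound_fraction(dg_bind, conc_molar):
    """Two-state binding: fraction of protein bound to a partner.

    dg_bind is the binding free energy (negative means favorable),
    so the dissociation constant is Kd = exp(dg_bind / kT) in molar units.
    """
    kd = math.exp(dg_bind / KT)
    return conc_molar / (conc_molar + kd)


def fitness_proxy(dg_fold, dg_receptor, dg_antibody, c_receptor, c_antibody):
    """Hypothetical fitness proxy (an assumption, not a published model).

    The viral protein must be folded, bound to the host receptor,
    and free of neutralizing antibody.
    """
    return (
        folded_fraction(dg_fold)
        * bound_fraction(dg_receptor, c_receptor)
        * (1.0 - bound_fraction(dg_antibody, c_antibody))
    )


if __name__ == "__main__":
    # Illustrative numbers only: a variant that weakens antibody binding
    # (less negative dg_antibody) gets a higher proxy fitness.
    wild_type = fitness_proxy(-3.0, -12.0, -12.0, 1e-9, 1e-9)
    escape = fitness_proxy(-3.0, -12.0, -8.0, 1e-9, 1e-9)
    print(f"wild type: {wild_type:.3f}, escape variant: {escape:.3f}")
```

In this picture a mutation changes fitness only through its effect on the three free energies, which is what makes measured or predicted binding affinities useful as inputs.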

Active Learning for Viral Variant Detection

When a new virus such as SARS-CoV-2 mutates, some variants can spread faster than others. Predicting these dangerous variants early, before they spread, is critical. However, testing every possible mutation in the lab is expensive and slow. We are developing active learning methods that provide “recommendations” for which experiments to perform in the lab in order to mitigate
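A generic active learning loop can be sketched as follows. This is a minimal illustration, not our published method: it assumes a Bayesian linear surrogate model, an upper-confidence-bound acquisition rule, and a hypothetical `oracle` function standing in for a lab measurement. The feature matrix `X_pool` and all parameter values are likewise placeholders.

```python
import numpy as np


def weight_posterior(X, y, prior_precision=1.0, noise_std=0.1):
    """Gaussian posterior over linear weights given measured data."""
    d = X.shape[1]
    precision = prior_precision * np.eye(d) + X.T @ X / noise_std**2
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_std**2
    return mean, cov


def pick_next(X_pool, measured_mask, mean, cov, beta=2.0):
    """Recommend the unmeasured variant with the highest mean + beta * std."""
    mu = X_pool @ mean
    var = np.einsum("ij,jk,ik->i", X_pool, cov, X_pool)
    score = mu + beta * np.sqrt(np.maximum(var, 0.0))
    score[measured_mask] = -np.inf  # never recommend a repeat experiment
    return int(np.argmax(score))


def run_campaign(X_pool, oracle, n_rounds):
    """Alternate between fitting the surrogate and requesting one measurement."""
    measured_mask = np.zeros(len(X_pool), dtype=bool)
    picks, values = [], []
    for _ in range(n_rounds):
        X = X_pool[np.array(picks, dtype=int)]
        y = np.array(values, dtype=float)
        mean, cov = weight_posterior(X, y)
        i = pick_next(X_pool, measured_mask, mean, cov)
        picks.append(i)
        values.append(oracle(i))  # stands in for a lab experiment
        measured_mask[i] = True
    return picks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(30, 4))          # placeholder variant features
    true_w = np.array([1.0, -0.5, 0.3, 0.0])   # hidden ground truth for the demo
    picks = run_campaign(X_pool, lambda i: float(X_pool[i] @ true_w), n_rounds=8)
    print("recommended experiments:", picks)
```

The `beta` parameter trades off exploring uncertain variants against exploiting variants already predicted to be fit; with a small experimental budget this trade-off largely determines which variants get tested.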

Inferring Microbial Fitness from Evolutionary Time Series Data

Evolution in complex settings, including competition between multiple microbial strains and large fluctuations in allele frequencies, can make it difficult to extract fitness information without a ground-truth model of epistasis. We are using population genetics theory to develop new computational tools that extract fitness information from evolutionary time series data without explicit assumptions about epistasis. We are currently exploring applications to yeast evolution experiments and to the extraction of viral strain fitness from global SARS-CoV-2 sequencing data.
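For intuition, the simplest textbook baseline for this kind of inference is shown below. It is not the method under development: it assumes deterministic exponential competition between strains, so that the log ratio of a strain's frequency to a reference strain's frequency changes linearly in time with slope equal to the relative fitness. Drift, sampling noise, and new mutations are ignored, and the function name and `pseudocount` parameter are illustrative.

```python
import numpy as np


def relative_fitness(times, counts, ref=0, pseudocount=0.5):
    """Estimate per-strain fitness relative to a reference strain.

    times:  sequence of T sampling times
    counts: array of shape (T, K) with sequence counts for K strains
    Returns K least-squares slopes of log(count_k / count_ref) versus time.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount  # avoid log(0)
    log_ratio = np.log(counts) - np.log(counts[:, [ref]])
    t = np.asarray(times, dtype=float)
    t_centered = t - t.mean()
    # Ordinary least-squares slope, computed for all strains at once.
    return t_centered @ log_ratio / (t_centered @ t_centered)


if __name__ == "__main__":
    # Synthetic example: three strains with known relative fitness.
    times = np.arange(10)
    true_s = np.array([0.0, 0.1, -0.05])
    counts = 1e4 * np.exp(np.outer(times, true_s))
    print(relative_fitness(times, counts))
```

Because the estimate is made at the level of whole strains, it needs no model of how individual mutations combine, but it breaks down when frequencies fluctuate strongly, which is the regime that motivates more careful approaches.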

Free Energy Prediction in Viral Protein Interactions

By combining novel machine learning architectures, biological features, and molecular dynamics simulations, we study protein complexes and the effects of mutations on protein-protein interactions. We also investigate the interpretability and robustness of these models.

Selected Papers

  1. Wang, D., Huot, M., Mohanty, V. & Shakhnovich, E. I. Biophysical principles predict fitness of SARS-CoV-2 variants. Proceedings of the National Academy of Sciences of the United States of America 121, e2314518121 (2024).
  2. Huot, M., Wang, D., Liu, J. & Shakhnovich, E. Few-Shot Viral Variant Detection via Bayesian Active Learning and Biophysics. bioRxiv (2025) doi:10.1101/2025.03.12.642881.
  3. Rotem, A. et al. Evolution on the Biophysical Fitness Landscape of an RNA Virus. Molecular Biology and Evolution 35, 2390–2400 (2018).