Episode 4 - Advanced sampling and mathematical foundations

Previous All episodes Next

Table of contents

Duration

Objectives

Content

Thermodynamic foundations

The canonical distribution

\[\rho(\mathbf{x}) = \frac{1}{Z} \exp\left(-\beta U(\mathbf{x})\right),\]

where $\beta = 1/(k_B T)$ sets the relation between energy and entropy. Simulated annealing strategies modify (T) gradually to explore high- and low-potential regions without abrupt jumps, and allow estimating free energy differences via

\[\Delta G = -k_B T \ln \frac{Z_{\text{solv}}}{Z_{\text{no\ soluto}}}.\]

To estimate restraint stability, also monitor the norm of the potential gradient

\[\mathbf{g} = \nabla U(\mathbf{x}), \qquad \langle \|\mathbf{g}\|^2 \rangle = \frac{1}{N} \sum_{i=1}^N \|\mathbf{g}_i\|^2,\]

which is reported periodically to identify numerical drift.

Energy funnel and sampling strategies

Official OpenMM tutorials

The OpenMM Cookbook chapters and related sections (building systems, simulation parameters, alchemical free energy) describe steps consistent with our scripts: defining the system, customizing external forces, and logging events. From there we adopt three pillars:

All linked scripts use this architecture and serve as guided examples for each system.

Simulated annealing

Refer to Example 5-1 in the “Simulated annealing” block of the OpenMM User Guide: Advanced Simulation Examples page. The sample here ramps the integrator temperature from 300 K toward lower values in configurable increments while running a fixed number of steps per temperature, mirroring the temperature schedule in that guide.

Script source: openmm_advanced_annealing.py

Guided demo

Exercise

Notebooks

The accompanying explanation notes that the loop simply updates the LangevinMiddleIntegrator temperature before each batch of 1,000 steps, which mirrors the ramping schedules we describe here. Keep that block as a quick reference when tuning integrator parameters or defining temperature sequences for your exercises so the practical code stays aligned with the theory in this episode.

Replica exchange (REMD)

The same temperature-ramping intuition underpins replica-exchange (REMD) methods. Instead of moving a single trajectory between high and low temperatures, REMD maintains multiple replicas at different thermodynamic states and swaps them through Metropolis trials to jump across barriers. OpenMMTools exposes ReplicaExchangeSampler (see https://openmmtools.readthedocs.io/en/stable/multistate.html#replicaexchangesampler-replica-exchange-among-thermodynamic-states) to manage state definitions, collect swap statistics, and maintain detailed balance for arbitrary Hamiltonians.

For a broader explanation of why REMD is statistically valid and how the exchanges let you preserve canonical sampling, see the review PMC6484850 <https://pmc.ncbi.nlm.nih.gov/articles/PMC6484850/>__ 1, which discusses accepted protocols, ensemble averages, and implementations. Pairing replica exchange with the annealing schedule above gives two complementary views on negotiating rough energy landscapes: direct temperature ramps and ensemble-wide swaps that both respect the Boltzmann distribution.

REMD for alanine dipeptide

This guided block runs REMD on the alanine dipeptide system already prepared for the previous exercises. Four Langevin trajectories at staggered temperatures attempt swaps between nearest neighbors, so the algorithm crosses barriers without needing a single long trajectory at the highest temperature. The notebook below is the canonical source; the Python script shown beside it is generated from the same cells and can be used directly in batch pipelines.

Script source: 04-remd.py

Exercise

Notebooks

Gaussian accelerated MD (GaMD) with OpenMM

Gaussian accelerated molecular dynamics (GaMD) adds a boost potential that smooths the energy surface and reduces barriers without requiring pre-selected reaction coordinates. The gamd-openmm repository (https://github.com/MiaoLab20/gamd-openmm/tree/main) wraps OpenMM integrators to apply GaMD on top of existing routines, making it easy to run the same systems we already prepare for this lecture. Running GaMD alongside the simulated annealing loop lets you sample the same conformations while still estimating unbiased observables via the provided reweighting formulas 2.

Guided demo: GaMD on alanine dipeptide

Script source: 04-gamd.py

Exercise

Notebooks

Martini coarse-graining with OpenMM

The MacCallum/Tieleman group has implemented both Martini 2 and Martini 3 directly in OpenMM (https://github.com/maccallumlab/martini_openmm/tree/martini3), taking advantage of OpenMM’s flexible force field layer to encode all Martini interactions, custom scripts to translate GROMACS topology files, and the GPU performance that makes OpenMM competitive with other engines. OpenMM’s extensibility—custom forces, fields, and integrators—lets these implementations reproduce the full Martini potential while still running inside the same infrastructure we use for simulated annealing and GaMD. See the Biophysical Journal paper by MacCallum et al. (2023) for the complete description of this OpenMM-friendly Martini pipeline 3.

Guided demo: Martini coarse-graining for the complex system

Setting up the complex protein-ligand system for Martini teaches how coarse-grained beads map back to the all-atom coordinates and how a Martini topology feeds into the rest of the pipeline. The notebook below is canonical; the helper script records the Martini command line plus a JSON summary so you can reproduce conversions later.

Script source: 04-martini-complex.py

Exercise

Notebooks

Umbrella sampling

The Umbrella Sampling tutorial from the OpenMM Cookbook summarizes how to add Gaussian bias potentials along a collective variable, combine per-window histograms, and reconstruct (F(x) = -k_B T \ln p(x)). We adopt that setup by biasing the distance between two backbone atoms in alanine dipeptide, recording the sampled distances, and plotting a coarse free-energy-like curve from the histograms collected across the windows.

The notebook 05-umbrella-sampling.ipynb reproduces those windows inline, gathers the sampled distances, stores the collected points in $COURSE_DIR/data/umbrella_samples.csv, and visualizes the emergent profile so you can poke at bin edges and the (F(x) = -k_B T \ln p(x)) conversion without leaving the browser. Its cells call the same systems described in the cookbook block, which demonstrates how each window’s bias shifts the sampled coordinate and why the umbrella reconstruction formula holds when you combine overlapping histograms.

The cookbook’s Bash section explains how to compile and run the WHAM binary to combine the windowed histograms and recover the free energy via the <rmin> <rmax> <dx> <temperature> <histogram-files> interface. Follow those steps locally with commands such as:

git clone https://github.com/choderalab/wham.git
cd wham
cmake .
make
./wham 0.18 0.35 0.005 300 $COURSE_DIR/data/umbrella_samples.csv > umbrella_free_energy.dat

These commands mirror the cookbook block: compile the OpenMM-provided WHAM executable, then pass your collected umbrella_samples.csv (or other histogram files you generated) to produce the final umbrella_free_energy.dat curve, matching the plot described on the page.

SBMOpenMM resources

This subsection highlights how sbmopenmm (repository: https://github.com/CompBiochBiophLab/sbm-openmm) and its tutorials help explore advanced sampling through structure-based models. The official guides show how to build a Hamiltonian from the protein topology and apply external forces, a workflow that aligns with the methodology of this episode.

These references can serve as a baseline for seeing how simplified SBM setups follow the same sampling and reporting patterns we implement with a full OpenMM stack.

Guide scripts (OpenMM Application Layer)

To feed the analysis, we rely on the official scripts described in the OpenMM guide:

In each case, the outputs (DCD, CSV, and energy reports) are used in this episode’s exercises.

Alanine dipeptide

Script source: 05-muestreo-avanzado_simple.py

Protein-ligand complex

Script source: 05-muestreo-avanzado.py

References

Previous All episodes Next
  1. Sugita, Y. and Okamoto, Y., “Replica-exchange molecular dynamics method for protein folding”, Chem. Phys. Lett. 2000, 314, 141–151, and review PMC6484850 <https://pmc.ncbi.nlm.nih.gov/articles/PMC6484850/>__. 

  2. Miao, Y.; Feher, V. A.; McCammon, J. A., “Gaussian accelerated molecular dynamics: Unconstrained enhanced sampling and free energy calculation”, J. Chem. Theory Comput. 2015, 11, 3584–3595. 

  3. MacCallum, J. L.; Tieleman, D. P. et al., “Martini implementation in OpenMM,” Biophys. J. 2023, and the open-source repository at https://github.com/maccallumlab/martini_openmm/tree/martini3; the OpenMM adaptation covers both Martini 2 and 3 force fields and leverages OpenMM’s extensible API to process GROMACS topologies with the same flexibility as the native package.