Episode 4 - Advanced sampling and mathematical foundations

Previous All episodes Next

Duration
Objectives
Content
Thermodynamic foundations
Official OpenMM tutorials
Simulated annealing
Replica exchange (REMD)
Gaussian accelerated MD (GaMD) with OpenMM
Martini coarse-graining with OpenMM
Umbrella sampling
SBMOpenMM resources
Guide scripts (OpenMM Application Layer)
References

Duration

Session: 60 min
Exercises: 45 min

Objectives

Connect Boltzmann statistics with advanced sampling algorithms.
Show how each course script represents a mathematical strategy (annealing, external forces, and quantitative reporting).
Validate periodic solvation in alanine and protein-ligand complex trajectories.

Content

Canonical distribution and free energy formulation.
Effective temperature control via adaptive integrators.
Energy and force reports to estimate potential derivatives.
Solvation and periodic box strategies with OpenMM.

Thermodynamic foundations

The canonical distribution

\[\rho(\mathbf{x}) = \frac{1}{Z} \exp\left(-\beta U(\mathbf{x})\right),\]

where $\beta = 1/(k_B T)$ sets the relation between energy and entropy. Simulated annealing strategies modify (T) gradually to explore high- and low-potential regions without abrupt jumps, and allow estimating free energy differences via

\[\Delta G = -k_B T \ln \frac{Z_{\text{solv}}}{Z_{\text{no\ soluto}}}.\]

To estimate restraint stability, also monitor the norm of the potential gradient

\[\mathbf{g} = \nabla U(\mathbf{x}), \qquad \langle \|\mathbf{g}\|^2 \rangle = \frac{1}{N} \sum_{i=1}^N \|\mathbf{g}_i\|^2,\]

which is reported periodically to identify numerical drift.

Energy funnel and sampling strategies

Official OpenMM tutorials

The OpenMM Cookbook chapters and related sections (building systems, simulation parameters, alchemical free energy) describe steps consistent with our scripts: defining the system, customizing external forces, and logging events. From there we adopt three pillars:

Simulated annealing by adjusting temperature in integrators to favor jumps between wells.
CustomExternalForce external forces to constrain regions and study mechanical responses.
State-based reporting to retrieve potential energies and forces and calibrate convergence.

All linked scripts use this architecture and serve as guided examples for each system.

Simulated annealing

Refer to Example 5-1 in the “Simulated annealing” block of the OpenMM User Guide: Advanced Simulation Examples page. The sample here ramps the integrator temperature from 300 K toward lower values in configurable increments while running a fixed number of steps per temperature, mirroring the temperature schedule in that guide.

Script source: openmm_advanced_annealing.py

Guided demo

Download notebook

Exercise

Run the notebook from within your COURSE_DIR so the annealing loop reads data/alanine-dipeptide.pdb and dumps snapshots into results/05-muestreo-avanzado/openmm-annealing; tweak steps or increments to explore more aggressive ramps.

Notebooks

Simulated annealing with OpenMM and configurable temperature ramps. (notebook script derived from this notebook)

The accompanying explanation notes that the loop simply updates the LangevinMiddleIntegrator temperature before each batch of 1,000 steps, which mirrors the ramping schedules we describe here. Keep that block as a quick reference when tuning integrator parameters or defining temperature sequences for your exercises so the practical code stays aligned with the theory in this episode.

Replica exchange (REMD)

The same temperature-ramping intuition underpins replica-exchange (REMD) methods. Instead of moving a single trajectory between high and low temperatures, REMD maintains multiple replicas at different thermodynamic states and swaps them through Metropolis trials to jump across barriers. OpenMMTools exposes ReplicaExchangeSampler (see https://openmmtools.readthedocs.io/en/stable/multistate.html#replicaexchangesampler-replica-exchange-among-thermodynamic-states) to manage state definitions, collect swap statistics, and maintain detailed balance for arbitrary Hamiltonians.

For a broader explanation of why REMD is statistically valid and how the exchanges let you preserve canonical sampling, see the review PMC6484850 <https://pmc.ncbi.nlm.nih.gov/articles/PMC6484850/>__ ¹, which discusses accepted protocols, ensemble averages, and implementations. Pairing replica exchange with the annealing schedule above gives two complementary views on negotiating rough energy landscapes: direct temperature ramps and ensemble-wide swaps that both respect the Boltzmann distribution.

REMD for alanine dipeptide

This guided block runs REMD on the alanine dipeptide system already prepared for the previous exercises. Four Langevin trajectories at staggered temperatures attempt swaps between nearest neighbors, so the algorithm crosses barriers without needing a single long trajectory at the highest temperature. The notebook below is the canonical source; the Python script shown beside it is generated from the same cells and can be used directly in batch pipelines.

Script source: 04-remd.py

Download notebook | Download script (.py)

Exercise

Run the notebook from your COURSE_DIR, let it read data/alanine-dipeptide.pdb, and track the swaps logged at results/05-muestreo-avanzado/remd/remd_swaps.csv. Increase the number of iterations and explorer steps-per-iteration to see how swap acceptance changes with longer integration windows.

Notebooks

Replica exchange with multiple replicas (alanine dipeptide). (notebook script derived from this notebook)

Gaussian accelerated MD (GaMD) with OpenMM

Gaussian accelerated molecular dynamics (GaMD) adds a boost potential that smooths the energy surface and reduces barriers without requiring pre-selected reaction coordinates. The gamd-openmm repository (https://github.com/MiaoLab20/gamd-openmm/tree/main) wraps OpenMM integrators to apply GaMD on top of existing routines, making it easy to run the same systems we already prepare for this lecture. Running GaMD alongside the simulated annealing loop lets you sample the same conformations while still estimating unbiased observables via the provided reweighting formulas ².

Guided demo: GaMD on alanine dipeptide

Script source: 04-gamd.py

Download notebook | Download script (.py)

Exercise

Launch the notebook from inside your COURSE_DIR, keep data/alanine-dipeptide.pdb as the input, and inspect results/05-muestreo-avanzado/gamd/gamd_log.csv to see how the boost amplitude adapts across iterations.

Notebooks

GaMD with OpenMM and adaptive boost logging. (notebook script derived from this notebook)

Martini coarse-graining with OpenMM

The MacCallum/Tieleman group has implemented both Martini 2 and Martini 3 directly in OpenMM (https://github.com/maccallumlab/martini_openmm/tree/martini3), taking advantage of OpenMM’s flexible force field layer to encode all Martini interactions, custom scripts to translate GROMACS topology files, and the GPU performance that makes OpenMM competitive with other engines. OpenMM’s extensibility—custom forces, fields, and integrators—lets these implementations reproduce the full Martini potential while still running inside the same infrastructure we use for simulated annealing and GaMD. See the Biophysical Journal paper by MacCallum et al. (2023) for the complete description of this OpenMM-friendly Martini pipeline ³.

Guided demo: Martini coarse-graining for the complex system

Setting up the complex protein-ligand system for Martini teaches how coarse-grained beads map back to the all-atom coordinates and how a Martini topology feeds into the rest of the pipeline. The notebook below is canonical; the helper script records the Martini command line plus a JSON summary so you can reproduce conversions later.

Script source: 04-martini-complex.py

Download notebook | Download script (.py)

Exercise

Run the notebook from your COURSE_DIR, ensure martini_openmm is installed, and compare the bead count/output topology with the original protein-ligand complex. Check that the .json summary under results/05-muestreo-avanzado/martini/ documents the executed command.

Notebooks

Martini coarse-graining for the complex protein-ligand system. (notebook script derived from this notebook)

Umbrella sampling

The Umbrella Sampling tutorial from the OpenMM Cookbook summarizes how to add Gaussian bias potentials along a collective variable, combine per-window histograms, and reconstruct (F(x) = -k_B T \ln p(x)). We adopt that setup by biasing the distance between two backbone atoms in alanine dipeptide, recording the sampled distances, and plotting a coarse free-energy-like curve from the histograms collected across the windows.

The notebook 05-umbrella-sampling.ipynb reproduces those windows inline, gathers the sampled distances, stores the collected points in $COURSE_DIR/data/umbrella_samples.csv, and visualizes the emergent profile so you can poke at bin edges and the (F(x) = -k_B T \ln p(x)) conversion without leaving the browser. Its cells call the same systems described in the cookbook block, which demonstrates how each window’s bias shifts the sampled coordinate and why the umbrella reconstruction formula holds when you combine overlapping histograms.

Download notebook

The cookbook’s Bash section explains how to compile and run the WHAM binary to combine the windowed histograms and recover the free energy via the <rmin> <rmax> <dx> <temperature> <histogram-files> interface. Follow those steps locally with commands such as:

git clone https://github.com/choderalab/wham.git
cd wham
cmake .
make
./wham 0.18 0.35 0.005 300 $COURSE_DIR/data/umbrella_samples.csv > umbrella_free_energy.dat

These commands mirror the cookbook block: compile the OpenMM-provided WHAM executable, then pass your collected umbrella_samples.csv (or other histogram files you generated) to produce the final umbrella_free_energy.dat curve, matching the plot described on the page.

SBMOpenMM resources

This subsection highlights how sbmopenmm (repository: https://github.com/CompBiochBiophLab/sbm-openmm) and its tutorials help explore advanced sampling through structure-based models. The official guides show how to build a Hamiltonian from the protein topology and apply external forces, a workflow that aligns with the methodology of this episode.

Quickstart and single-basin setup – the top-level README section explains how to import a PDB, generate contact lists, and run simulations with custom forces (perfect for experimenting with temperature control and energy barriers).
Repository tutorials – a directory of folding and reaction-path scripts showing how to add restraint forces, log energies and gradients, and combine them with enhanced observation techniques to estimate free energy landscapes.

These references can serve as a baseline for seeing how simplified SBM setups follow the same sampling and reporting patterns we implement with a full OpenMM stack.

Guide scripts (OpenMM Application Layer)

To feed the analysis, we rely on the official scripts described in the OpenMM guide:

simulatePdb.py: fast simulations from PDB, useful for annealing tests.
simulateAmber.py: AMBER input (prmtop/inpcrd) to compare energies with and without solvent.
simulateCharmm.py: PSF/CRD topologies to evaluate restraints and reporting.
simulateGromacs.py: GROMACS input (top/gro), ideal for validating preprocessing.
simulateTinker.py: AMOEBA, useful for contrasting polarization effects.

In each case, the outputs (DCD, CSV, and energy reports) are used in this episode’s exercises.

Alanine dipeptide

Script source: 05-muestreo-avanzado_simple.py

Download notebook

This notebook executes the advanced sampling routine for alanine (restraints, custom forces, reporting energies) and tracks gradient norms. (notebook

script)

Protein-ligand complex

Script source: 05-muestreo-avanzado.py

Download notebook

This notebook covers the complex system with solvation options, barostat control, and energy/force logging to validate enhanced sampling workflows. (notebook

script)

References

Previous All episodes Next

Sugita, Y. and Okamoto, Y., “Replica-exchange molecular dynamics method for protein folding”, Chem. Phys. Lett. 2000, 314, 141–151, and review PMC6484850 <https://pmc.ncbi.nlm.nih.gov/articles/PMC6484850/>__. ↩
Miao, Y.; Feher, V. A.; McCammon, J. A., “Gaussian accelerated molecular dynamics: Unconstrained enhanced sampling and free energy calculation”, J. Chem. Theory Comput. 2015, 11, 3584–3595. ↩
MacCallum, J. L.; Tieleman, D. P. et al., “Martini implementation in OpenMM,” Biophys. J. 2023, and the open-source repository at https://github.com/maccallumlab/martini_openmm/tree/martini3; the OpenMM adaptation covers both Martini 2 and 3 force fields and leverages OpenMM’s extensible API to process GROMACS topologies with the same flexibility as the native package. ↩

Episode 4 - Advanced sampling and mathematical foundations

Table of contents

Duration

Objectives

Content

Thermodynamic foundations

Official OpenMM tutorials

Simulated annealing

Guided demo

Exercise

Notebooks

Replica exchange (REMD)

REMD for alanine dipeptide

Exercise

Notebooks

Gaussian accelerated MD (GaMD) with OpenMM

Guided demo: GaMD on alanine dipeptide

Exercise

Notebooks

Martini coarse-graining with OpenMM

Guided demo: Martini coarse-graining for the complex system

Exercise

Notebooks

Umbrella sampling

SBMOpenMM resources

Guide scripts (OpenMM Application Layer)

Alanine dipeptide

Protein-ligand complex

References