Tutorial 11

Reaction Yield Optimization with BO

Optimise Suzuki coupling yield with BoTorch, and compare the Expected Improvement (EI) and Upper Confidence Bound (UCB) acquisition functions.

August 18, 2026 · 10:15 – 12:00
105 min
Python · Google Colab

Open in Google Colab

The notebook has most of the code pre-filled. Complete the exercises marked ### YOUR CODE HERE ###.


Getting started

Open the Colab notebook and run each cell in order; cells marked Exercise require you to fill in code.

  1. Set up the environment. Install botorch and torch. Load the Suzuki coupling yield dataset (Doyle et al., Science 2019) with 3960 experimental conditions and their yields.

  2. Encode reaction conditions. One-hot-encode the four categorical inputs: aryl halide, additive, base, and ligand. Combine into a single feature vector per reaction condition.

  3. Build the GP surrogate. Initialise a SingleTaskGP with a Matérn-5/2 kernel on 10 random initial observations. Fit hyperparameters with fit_gpytorch_mll.

  4. Run the BO loop. In each of 30 rounds, maximise the EI acquisition function with optimize_acqf, record the yield of the selected condition, and update the surrogate with it. Track the best yield found so far.

  5. Compare acquisition functions. Repeat the loop with UCB (β=0.1 and β=2.0). Plot cumulative best yield vs. round for EI, UCB-0.1, and UCB-2.0. Which converges fastest?


Answer these questions as you work through the notebook. Discuss with your neighbour; some have no single right answer.

    Warm-up · Easy

    What is the maximum yield in the dataset? After 10 random initial observations, what is the best yield observed so far as a fraction of the maximum?
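
The warm-up quantity can be computed as sketched here. The yields array is random placeholder data (not the real dataset), so the printed number says nothing about the actual answer; only the calculation carries over.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder yields in percent, NOT the real dataset.
yields = rng.uniform(0.0, 100.0, size=3960)

# Draw 10 distinct initial observations at random.
initial_idx = rng.choice(len(yields), size=10, replace=False)

max_yield = yields.max()
best_initial = yields[initial_idx].max()
fraction = best_initial / max_yield
print(f"best initial yield is {fraction:.1%} of the dataset maximum")
```
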
    Core · Medium

    After 30 BO rounds, how close is EI to the true maximum yield? How many rounds does it take to first exceed 90% of the maximum?
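
One way to read off the "first round above 90%" figure from a list of observed yields is sketched below. The function name first_round_above and the example trace are hypothetical; whether the initial random observations count as rounds is a choice you should state in your answer.

```python
import numpy as np


def first_round_above(observed_yields, max_yield, fraction=0.9):
    """Return the 1-based round at which the running best first exceeds
    fraction * max_yield, or None if it never does."""
    running_best = np.maximum.accumulate(np.asarray(observed_yields, dtype=float))
    hits = np.flatnonzero(running_best > fraction * max_yield)
    return int(hits[0]) + 1 if hits.size else None


# Hypothetical yields observed in successive BO rounds (percent).
trace = [40.0, 55.0, 52.0, 91.0, 88.0, 97.0]
round_hit = first_round_above(trace, max_yield=100.0)
print(round_hit)
```
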
    Core · Medium

    How does increasing UCB β from 0.1 to 2.0 change the exploration-exploitation balance? In which phase of optimisation (early vs. late) does high-β UCB have the advantage?
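
To see how β shifts the balance, recall that UCB scores a candidate as the posterior mean plus sqrt(β) times the posterior standard deviation, which is the form used by BoTorch's analytic UpperConfidenceBound. The sketch below applies that formula to two hypothetical candidates with made-up posterior values: one well-characterised with a high mean, one uncertain with a lower mean.

```python
import numpy as np


def ucb(mean, std, beta):
    """Upper confidence bound: mean + sqrt(beta) * std."""
    return mean + np.sqrt(beta) * std


# Hypothetical posterior at two candidates (yield in percent).
mean = np.array([80.0, 60.0])  # candidate 0: high mean, candidate 1: lower mean
std = np.array([2.0, 20.0])    # candidate 0: confident, candidate 1: uncertain

picks = {}
for beta in (0.1, 2.0):
    scores = ucb(mean, std, beta)
    picks[beta] = int(np.argmax(scores))
    print(f"beta={beta}: scores={np.round(scores, 2)}, pick={picks[beta]}")
```

With the small β the confident candidate wins; with the large β the uncertainty bonus is big enough for the uncertain candidate to be chosen instead.
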
    Challenge

    Add a synthetic constraint: the base must not be Cs₂CO₃ (assume a safety restriction). Modify the acquisition optimisation to exclude this base using a linear constraint. How does constrained BO perform compared with the unconstrained run?
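
With one-hot features, "base is not Cs₂CO₃" is linear in the inputs: the entry of the feature vector that flags that base must be zero, i.e. a·x ≤ 0 for an indicator vector a. The sketch below applies this to a finite candidate pool by masking infeasible rows before taking the argmax. The base labels and acquisition scores are hypothetical. For continuous optimisation, optimize_acqf accepts inequality_constraints and equality_constraints arguments, which the notebook may expect you to use instead.

```python
import numpy as np

# Hypothetical base labels for five candidate conditions.
bases = np.array(["K3PO4", "Cs2CO3", "KOH", "Cs2CO3", "K3PO4"])
categories = np.unique(bases)  # sorted unique labels define the one-hot columns
X_base = (bases[:, None] == categories[None, :]).astype(float)

# Linear constraint a . x <= 0, where a flags the forbidden base's column.
a = (categories == "Cs2CO3").astype(float)
feasible = X_base @ a <= 0.0

# Hypothetical acquisition values for the five candidates.
scores = np.array([0.2, 0.9, 0.4, 0.8, 0.1])

# Infeasible candidates get -inf so they can never be selected.
masked = np.where(feasible, scores, -np.inf)
choice = int(np.argmax(masked))
print("feasible:", feasible, "-> chosen candidate:", choice)
```

Here the two highest-scoring candidates use the excluded base, so the constrained choice falls to the best remaining one.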
