Tutorial 14

Building a Chemistry ReAct Agent

Wire up PubChem lookup, RDKit property tools, and a BO surrogate into a ReAct-style agent that autonomously designs experiments.

August 19, 2026 · 14:45 – 17:00
105 min
Python · Google Colab
Back to schedule

Open in Google Colab

The notebook has most of the code pre-filled. Complete the exercises marked ### YOUR CODE HERE ###.

Open Notebook

Getting started

Open the Colab notebook using the button above. Run each cell in order; cells marked Exercise require you to fill in code.

  1. Set up the environment. Install openai, rdkit, requests, and botorch. Define the three tool schemas as JSON for function calling: lookup_pubchem(name), calculate_properties(smiles), predict_yield(conditions).

  2. Implement the tools. Complete lookup_pubchem (PubChem REST API → canonical SMILES), calculate_properties (RDKit: MW, logP, TPSA, HBD, HBA, QED), and predict_yield (GP posterior mean for Suzuki conditions from Tutorial 11).

  3. Build the ReAct loop. Implement a loop that (a) calls the LLM with the current conversation, (b) parses the function call, (c) executes the tool, (d) appends the observation, and (e) repeats until the LLM returns a final answer.

  4. Add toxicity guardrails. Load a pre-trained toxicity classifier (provided). Intercept any predict_yield call: if the SMILES argument is predicted toxic (score > 0.8), return a refusal observation instead of the yield.

  5. Run the agent. Give the agent the task: "Find the Suzuki coupling conditions that maximise yield for iodobenzene with phenylboronic acid, while avoiding toxic additives." Log the full thought-action-observation trace.


Answer these questions as you work through the notebook. Discuss with your neighbour — some have no single right answer.

    Warm-up

    How many tool calls does the agent make before reaching a final answer? Which tool is called most frequently?

    Easy
    Core

    Does the agent ever call predict_yield for a molecule that the toxicity guardrail blocks? How does it respond to the refusal observation — does it recover gracefully or get stuck?

    Medium
    Core

    Examine the agent's "thought" traces. Does it reason correctly about the trade-off between exploration (trying new conditions) and exploitation (repeating high-yield conditions)? Quote an example of correct and incorrect reasoning.

    Medium
    Challenge

    Add a memory tool that stores and retrieves previous (conditions, yield) observations. Does the agent with memory reach a higher maximum yield in fewer steps than the memoryless version?

    Challenge

Notebook (Colab) GitHub repo Paired lecture notes