Tutorial 6

Explaining ML Predictions

Compute SHAP values on molecular fingerprints and visualise attention weights on GNN predictions to understand model behaviour.

August 12, 2026 · 14:45 – 17:00
105 min
Python · Google Colab

Open in Google Colab

The notebook has most of the code pre-filled. Complete the exercises marked ### YOUR CODE HERE ###.

Open Notebook

Getting started

Open the Colab notebook using the button above. Run each cell in order; cells marked Exercise require you to fill in code.

  1. Set up the environment. Install shap, lime, torch-geometric, and rdkit (RDKit is used in step 3). Load the two pre-trained models provided in the notebook: a random forest trained on the ESOL solubility dataset and a GAT trained on the QM9 HOMO-LUMO gap.

  2. Compute SHAP values. Run the SHAP TreeExplainer on the random forest and produce a beeswarm summary plot. Identify the 10 most important fingerprint bits (those with the largest mean absolute SHAP value).

  3. Map bits to substructures. For each of the top 10 bits, use RDKit to find the corresponding Morgan substructure (radius, center atom). Render the molecules with those substructures highlighted using rdkit.Chem.Draw.

  4. Compute attention weights. Pass five molecules through the GAT. Extract edge-level attention coefficients from the last attention head. Render each molecule with bond widths proportional to attention.

  5. Apply LIME. Use the LIME tabular explainer on three molecules that the RF mispredicts. Compare LIME explanations to SHAP explanations for the same molecules.
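
A minimal sketch of the ranking in step 2, using NumPy only. It assumes you already have a SHAP matrix of shape (n_molecules, n_bits), such as the array `shap.TreeExplainer(rf).shap_values(X)` returns for a single-output regressor; `rf`, `X` and the helper name `rank_bits` are placeholders, not names taken from the notebook. Bits are ranked by mean absolute SHAP value, which matches the default ordering of the summary plot, and each bit's direction is read from its average SHAP value over the molecules in which it is set.

```python
import numpy as np


def rank_bits(shap_values, X, k=10):
    """Rank fingerprint bits using a SHAP matrix.

    shap_values: (n_molecules, n_bits) SHAP values for a single-output model.
    X: (n_molecules, n_bits) 0/1 fingerprint matrix the values were computed on.
    Returns (indices of the k bits with the largest mean |SHAP|,
             mean SHAP value per bit over molecules where that bit is set).
    """
    shap_values = np.asarray(shap_values, dtype=float)
    X = np.asarray(X)

    # Global importance: mean absolute SHAP value per bit, largest first.
    importance = np.abs(shap_values).mean(axis=0)
    top_k = np.argsort(importance)[::-1][:k]

    # Direction: average SHAP value over molecules that contain the bit.
    # Bits that are never set get NaN (warnings from 0/0 are suppressed).
    on_mask = X == 1
    on_counts = on_mask.sum(axis=0)
    on_sums = np.where(on_mask, shap_values, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_when_on = np.where(on_counts > 0, on_sums / on_counts, np.nan)
    return top_k, mean_when_on
```

Typical use is `top, mean_on = rank_bits(shap_values, X)`. Because the ESOL target is log solubility, bits in `top` with a large positive `mean_on` are the ones that push predictions towards higher solubility when present.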

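Step 3 relies on the `bitInfo` dictionary that RDKit fills in when a Morgan fingerprint is computed with `AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=..., bitInfo=info)` (recent RDKit releases deprecate this function in favour of `rdFingerprintGenerator`). Each key is a bit index and each value is a tuple of `(center_atom, radius)` pairs. The sketch below, built around the hypothetical helper `environments_for_bits`, only performs the lookup in plain Python. It assumes the fingerprint radius and `nBits` match those used to train the random forest; otherwise the bit indices will not line up.

```python
def environments_for_bits(bit_info, bits):
    """Look up the Morgan environments behind selected bits for one molecule.

    bit_info: dict in RDKit's bitInfo layout, mapping
              bit -> tuple of (center_atom_index, radius) pairs.
    bits: iterable of bit indices, e.g. the top-10 SHAP bits.
    Returns {bit: [(center_atom, radius), ...]} for bits set in this molecule;
    bits the molecule does not set are skipped.
    """
    found = {}
    for bit in bits:
        envs = bit_info.get(int(bit))  # int() also accepts NumPy integer indices
        if envs:
            # One bit can be set by several environments (repeated substructures
            # or hash collisions), so keep them all, smallest radius first.
            found[int(bit)] = sorted(envs, key=lambda env: (env[1], env[0]))
    return found
```

A bit found this way can be drawn on its own with `Draw.DrawMorganBit(mol, bit, info)`, and `Chem.FindAtomEnvironmentOfRadiusN(mol, radius, center_atom)` returns the bond indices to highlight when drawing the whole molecule.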

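For step 4, PyTorch Geometric's `GATConv` returns attention when called as `conv(x, edge_index, return_attention_weights=True)`, giving `(out, (edge_index, alpha))` with `alpha` of shape `[num_edges, heads]`, so `alpha[:, -1]` selects the last head. How to reach the final `GATConv` layer depends on the notebook's model class. Two details matter before drawing: each bond appears as two directed edges, and `GATConv` adds self-loops by default. The hypothetical helpers below average the two directions, drop self-loops, and scale widths so the most-attended bond gets the maximum width. They assume graph node `i` is RDKit atom `i`; check this for QM9, whose PyG graphs include explicit hydrogens.

```python
def bond_attention(edge_index, alpha):
    """Collapse directed-edge attention to one score per undirected bond.

    edge_index: pair of sequences (sources, targets), PyG's 2 x E layout,
                e.g. edge_index.tolist().
    alpha: attention for one head, one value per edge, e.g. alpha[:, -1].tolist().
    Returns {(i, j): mean attention over i->j and j->i} with i < j.
    """
    sums, counts = {}, {}
    for src, dst, a in zip(edge_index[0], edge_index[1], alpha):
        src, dst = int(src), int(dst)
        if src == dst:  # self-loop added by GATConv, not a chemical bond
            continue
        key = (min(src, dst), max(src, dst))
        sums[key] = sums.get(key, 0.0) + float(a)
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def bond_widths(scores, max_width=6.0):
    """Scale scores so widths stay proportional and the top bond gets max_width."""
    if not scores:
        return {}
    top = max(scores.values())
    if top <= 0:
        return {bond: 0.0 for bond in scores}
    return {bond: max_width * s / top for bond, s in scores.items()}
```

With an RDKit molecule, `mol.GetBondBetweenAtoms(i, j).GetIdx()` converts each `(i, j)` key into the bond index expected by the drawing code.
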
Answer these questions as you work through the notebook. Discuss with your neighbour — some have no single right answer.

    Warm-up

    According to the SHAP summary plot, which fingerprint bits contribute most positively to predicted solubility? Can you identify the functional groups they correspond to?

    Easy
    Core

    For a correctly predicted molecule, compare the SHAP-highlighted substructure to the attention-highlighted bonds. Do they agree? What does agreement (or disagreement) tell you?

    Medium
    Core

    Find a molecule for which the model is confidently wrong (large error, low uncertainty). Do the SHAP values point to a plausible chemical reason for the error?

    Medium
    Challenge

    Compute SHAP interaction values for two fingerprint bits that are individually unimportant but jointly predictive. Visualise the interaction plot and interpret it chemically.

    Challenge
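
Step 5 comes down to comparing two attributions of the same prediction. A small sketch, assuming each explanation has already been turned into a dict of feature index to signed weight: for SHAP, one row of the SHAP matrix; for LIME, the `(feature, weight)` pairs from `exp.as_map()` after `LimeTabularExplainer(...).explain_instance(row, rf.predict, num_features=10)`. For regression, LIME appears to store these weights under label key `1`; confirm this in the notebook. The helper `compare_explanations` is hypothetical; it reports the Jaccard overlap of the top-k features by magnitude and the fraction of shared features whose signs agree.

```python
def compare_explanations(a, b, k=5):
    """Compare two feature attributions for the same prediction.

    a, b: dicts mapping feature index -> signed weight (e.g. SHAP, LIME).
    Returns (Jaccard overlap of the top-k features by |weight|,
             fraction of shared top-k features with the same sign, or NaN
             if the top-k sets do not overlap).
    """
    def top(expl):
        return set(sorted(expl, key=lambda f: abs(expl[f]), reverse=True)[:k])

    top_a, top_b = top(a), top(b)
    union = top_a | top_b
    shared = top_a & top_b
    jaccard = len(shared) / len(union) if union else 1.0
    if not shared:
        return jaccard, float("nan")
    # A weight of exactly zero counts as non-positive here.
    agree = sum((a[f] > 0) == (b[f] > 0) for f in shared)
    return jaccard, agree / len(shared)
```

Low overlap is not automatically a failure of either method: LIME fits a local linear surrogate around perturbed samples, while TreeExplainer attributes the forest's prediction exactly, so the two answer slightly different questions.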

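For the challenge question, `TreeExplainer.shap_interaction_values(X)` returns an M x M matrix per molecule, and `shap.dependence_plot((i, j), interaction_values, X)` plots one pair. To see what those matrices contain, the sketch below computes interaction values by brute force for a toy model. It is a simplified, swapped-in variant: absent features take a single baseline value, whereas TreeExplainer averages over the training data, and it enumerates every subset, so it only suits a handful of features. The name `shapley_interactions` is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_interactions(f, x, baseline):
    """Exact SHAP interaction values for a tiny model, by subset enumeration.

    Features outside a coalition take their baseline value. Off-diagonal
    entries are pairwise interactions, split equally between [i][j] and [j][i];
    diagonal entries are main effects. All entries sum to f(x) - f(baseline).
    """
    m = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(m)])

    def subsets(pool):
        for r in range(len(pool) + 1):
            for combo in combinations(pool, r):
                yield set(combo)

    # Ordinary Shapley values phi_i.
    phi = [0.0] * m
    for i in range(m):
        rest = [j for j in range(m) if j != i]
        for s in subsets(rest):
            w = factorial(len(s)) * factorial(m - len(s) - 1) / factorial(m)
            phi[i] += w * (value(s | {i}) - value(s))

    # Phi_ij = sum_S |S|!(M-|S|-2)! / (2(M-1)!) * [v(S+ij) - v(S+i) - v(S+j) + v(S)]
    inter = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            rest = [k for k in range(m) if k not in (i, j)]
            for s in subsets(rest):
                w = factorial(len(s)) * factorial(m - len(s) - 2) / (2 * factorial(m - 1))
                delta = value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
                inter[i][j] += w * delta

    # Main effect: what remains of phi_i after removing its interactions.
    for i in range(m):
        inter[i][i] = phi[i] - sum(inter[i][j] for j in range(m) if j != i)
    return inter
```

For `f = lambda z: float(z[0] and z[1])` with `x = [1, 1, 1]` and an all-zero baseline, every main effect is zero (up to floating-point rounding) and the prediction is split as 0.5 in each of the `[0][1]` and `[1][0]` cells. That is the pattern to look for in the notebook: two bits with small individual SHAP values but a large off-diagonal entry.
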
Notebook (Colab) · GitHub repo · Paired lecture notes