Tutorial 9

Fine-tuning ChemBERTa

Fine-tune a pre-trained chemical language model on BACE IC50 data using HuggingFace Transformers, comparing frozen vs. full fine-tuning.

August 17, 2026 · 10:15 – 12:00
105 min
Python · Google Colab

Open in Google Colab

The notebook has most of the code pre-filled. Complete the exercises marked ### YOUR CODE HERE ###.

Open Notebook

Getting started

Open the Colab notebook using the button above. Run each cell in order; cells marked Exercise require you to fill in code.

  1. Set up the environment. Install transformers, datasets, and scikit-learn. Load seyonec/ChemBERTa-zinc-base-v1 from HuggingFace Hub and verify the tokeniser handles a few SMILES strings correctly.

  2. Prepare the BACE dataset. Load BACE IC50 values and convert them to pIC50 = −log10(IC50), with IC50 expressed in mol/L (for values in nM, pIC50 = 9 − log10(IC50)). Apply a scaffold split, then tokenise all SMILES with max_length=128 and padding.

  3. Fine-tune (head only). Freeze all encoder layers (ChemBERTa is a RoBERTa-style model). Add a regression head (dropout + linear). Train for 10 epochs with AdamW (lr=1e-3). Record validation RMSE.

  4. Full fine-tune. Unfreeze all layers. Train for 5 epochs with a small learning rate (lr=2e-5) and linear warmup. Compare validation RMSE to head-only fine-tuning.

  5. Visualise embeddings. Extract CLS embeddings for all test-set molecules using the fully fine-tuned model. Run UMAP on them and colour the points by pIC50 value. Identify clusters.
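
The log transform in step 2 depends on the units of the raw IC50 values. The sketch below assumes they are stored in nanomolar, which is an assumption about the data file rather than something stated on this page; `pic50_from_nm` is a hypothetical helper name.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


# Illustrative values only, not taken from the BACE dataset.
for ic50 in (1.0, 100.0, 10_000.0):
    print(f"IC50 = {ic50:>8.1f} nM -> pIC50 = {pic50_from_nm(ic50):.2f}")
```

Higher pIC50 means a more potent compound, which is why a tenfold drop in IC50 shows up as one extra pIC50 unit.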

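The scaffold split in step 2 keeps every molecule that shares a scaffold in the same subset, so the validation and test sets contain chemotypes the model has not seen. The following is a minimal sketch of one common greedy variant (largest scaffold groups first). It takes precomputed scaffold keys as input; in practice these could be Murcko scaffold SMILES from RDKit. The function name, the 80/10/10 fractions, and the toy keys are assumptions, and the notebook may use a different splitter.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Return (train, valid, test) index lists with no scaffold shared between them.

    scaffolds[i] is the scaffold key of molecule i.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first index so the result is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy scaffold keys standing in for real Murcko scaffolds.
scaffolds = ["A"] * 8 + ["B"] * 4 + ["C"] * 3 + ["D"] * 2 + ["E"] * 2 + ["F"]
train, valid, test = scaffold_split(scaffolds)
print(len(train), len(valid), len(test))
```

Because whole groups are assigned at once, the realised split sizes only approximate the requested fractions.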

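Steps 3 and 4 differ only in which parameters receive gradients and in the learning rate. The sketch below shows that switch in plain PyTorch. To stay runnable without a download it uses a tiny stand-in encoder instead of ChemBERTa; the class and helper names are hypothetical, and with a real HuggingFace model the hidden states would come from the `last_hidden_state` field of the model output rather than directly from the encoder call.

```python
import torch
from torch import nn


class EncoderWithRegressionHead(nn.Module):
    """Encoder followed by a dropout + linear regression head."""

    def __init__(self, encoder: nn.Module, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(hidden_size, 1))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_ids)   # (batch, seq_len, hidden)
        first_token = hidden[:, 0]         # embedding at the first (CLS-like) position
        return self.head(first_token).squeeze(-1)


def set_encoder_trainable(model: EncoderWithRegressionHead, trainable: bool) -> None:
    for param in model.encoder.parameters():
        param.requires_grad = trainable


def count_trainable(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Stand-in encoder (assumption): embedding + linear layer, not a real transformer.
toy_encoder = nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 32))
model = EncoderWithRegressionHead(toy_encoder, hidden_size=32)

# Head-only: freeze the encoder and optimise the head with the larger learning rate.
set_encoder_trainable(model, False)
head_only_params = count_trainable(model)
head_optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Full fine-tuning: unfreeze everything and use the smaller learning rate.
set_encoder_trainable(model, True)
full_params = count_trainable(model)
full_optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

print(f"trainable parameters: head-only={head_only_params}, full={full_params}")
```

Comparing the two parameter counts is a quick sanity check that the freeze actually took effect before you start training.
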
Answer these questions as you work through the notebook. Discuss with your neighbour; some have no single right answer.

    Warm-up · Easy

    What is the baseline RMSE if you predict the mean pIC50 for all test molecules? How much does head-only fine-tuning improve on this?
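
For the warm-up question, a mean-predictor baseline can be sketched as follows. It uses the training-set mean as the constant prediction, which is one reasonable reading of the question; using the test-set mean instead would make the RMSE equal to the test-set standard deviation. The numbers are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up pIC50 values, not taken from BACE.
y_train = [5.0, 6.0, 7.0, 8.0]
y_test = [5.5, 6.5, 7.5]

train_mean = sum(y_train) / len(y_train)
baseline_pred = [train_mean] * len(y_test)   # same prediction for every test molecule
baseline_rmse = rmse(y_test, baseline_pred)
print(f"mean-predictor RMSE: {baseline_rmse:.3f}")
```

Any fine-tuned model should be judged against this number rather than against zero.
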

    Core · Medium

    At what epoch does validation RMSE plateau for full fine-tuning? Does it ever increase (overfitting)? How does learning rate warmup affect early-epoch stability?
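
To reason about the warmup part of this question, it helps to see the schedule itself. The sketch below computes the learning-rate multiplier for linear warmup followed by linear decay to zero, which is, to my understanding, the shape produced by `get_linear_schedule_with_warmup` in transformers. The decay phase, the step counts, and the function name are assumptions; the tutorial text only specifies linear warmup and lr=2e-5.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the peak learning rate at a given optimiser step."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Then decay linearly from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


peak_lr = 2e-5
total_steps = 100    # hypothetical: 5 epochs x 20 batches
warmup_steps = 10    # hypothetical: 10% of training

for step in (0, 5, 10, 55, 100):
    lr = peak_lr * linear_warmup_decay(step, warmup_steps, total_steps)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

During the first few updates the pre-trained weights therefore move only slightly, while the freshly initialised regression head still produces large, noisy gradients.
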

    Core · Medium

    In the UMAP embedding, are molecules with high pIC50 (potent) clustered together? Do structurally similar molecules (same scaffold) cluster regardless of potency?

    Challenge

    Replace the CLS embedding with mean pooling over all non-padding token embeddings. Does this change validation RMSE? Which pooling strategy gives embeddings whose UMAP layout separates molecules by pIC50 more clearly?
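
The two pooling strategies in this challenge can be sketched with NumPy arrays standing in for the model's hidden states. The key detail is that mean pooling should use the attention mask so padding positions do not dilute the average. Function names and the toy arrays are assumptions for illustration.

```python
import numpy as np


def cls_pool(hidden: np.ndarray) -> np.ndarray:
    """Take the embedding at the first token position: (batch, seq, dim) -> (batch, dim)."""
    return hidden[:, 0, :]


def mean_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions only."""
    mask = attention_mask[:, :, None].astype(hidden.dtype)   # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)                     # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)           # avoid division by zero
    return summed / counts


# Toy hidden states: 2 sequences, 3 positions, 2 dimensions.
hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],   # last position is padding
    [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]],
])
attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

print("CLS pooling:\n", cls_pool(hidden))
print("mean pooling:\n", mean_pool(hidden, attention_mask))
```

In the first toy sequence the padding vector is deliberately large, so an unmasked mean would be visibly wrong while the masked mean is not.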

Notebook (Colab) · GitHub repo · Paired lecture notes