Tutorial 7 — Few-Shot Learning · AI4Chemical Sciences Bootcamp

What you will learn

Fine-tune a pre-trained GNN encoder on a new property with only 20 labelled molecules
Compare frozen-encoder vs. full fine-tuning strategies at different label counts
Implement Prototypical Networks for few-shot molecular classification
Plot learning curves showing how performance scales with the number of labelled examples

Instructions

Getting started

Open the Colab notebook linked under Resources. Run each cell in order; cells marked Exercise require you to fill in code.

Set up the environment. Install torch-geometric and the Hugging Face datasets library. Load the GIN encoder provided in the notebook, which was pre-trained on ChEMBL with masked-atom prediction.
Prepare the few-shot tasks. Construct 10-shot and 20-shot tasks (10 or 20 labelled support molecules per class) from the MoleculeNet Tox21 benchmark. Ensure that no scaffold appears in both the support set and the query set.
Fine-tune with a frozen encoder. Attach a 2-layer MLP head to the frozen encoder and train only the head for 100 epochs on the support set. Report AUC on the query set.
Full fine-tune. Unfreeze all encoder layers and repeat training with a small learning rate (1e-4). Compare query AUC to the frozen-encoder baseline.
Prototypical Networks. Implement prototype computation (mean embedding of support molecules per class). Predict query labels by nearest prototype in embedding space. Report AUC.
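The scaffold-disjoint split asked for in the "Prepare the few-shot tasks" step can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the function name `sample_task` and the record layout `(mol_id, scaffold, label)` are assumptions, and the scaffold strings are taken as precomputed (for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`). Any scaffold that contributes a support molecule is excluded from the query set entirely.

```python
import random
from collections import defaultdict


def sample_task(records, k_shot, seed=0):
    """Draw a k-shot binary task with scaffold-disjoint support and query sets.

    records: iterable of (mol_id, scaffold, label), label in {0, 1}.
    This layout is a hypothetical one chosen for the sketch.
    """
    rng = random.Random(seed)
    by_scaffold = defaultdict(list)
    for mol_id, scaffold, label in records:
        by_scaffold[scaffold].append((mol_id, label))

    scaffolds = sorted(by_scaffold)  # sort first so the shuffle is reproducible
    rng.shuffle(scaffolds)

    need = {0: k_shot, 1: k_shot}  # support molecules still required per class
    support, used = [], set()
    for scaffold in scaffolds:
        if not any(need.values()):
            break
        for mol_id, label in by_scaffold[scaffold]:
            if need.get(label, 0) > 0:
                support.append((mol_id, label))
                need[label] -= 1
                used.add(scaffold)  # this scaffold is now off-limits for the query set

    if any(need.values()):
        raise ValueError("not enough molecules to fill the support set")

    # Query molecules come only from scaffolds that never touched the support set.
    query = [(m, y) for s in scaffolds if s not in used for m, y in by_scaffold[s]]
    return support, query
```

Leftover molecules from a support scaffold are discarded rather than moved to the query set, which costs some data but keeps the two sets disjoint by construction. The query set can still end up with a single class for rare assays, so it is worth checking before computing AUC.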

Questions

Answer these questions as you work through the notebook. Discuss with your neighbour — some have no single right answer.

Warm-up (Easy)

With 10 support molecules per class, what is the 95% confidence interval on query AUC across 10 random task samples? How does variance compare to 20-shot?
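One way to obtain the interval is a Student's t interval over the per-task AUC values. The sketch below assumes the AUCs from the repeated task samples are already collected in a list; the helper name `mean_ci95` and the small table of critical values are choices made for this example. Because AUC is bounded in [0, 1], the t interval is only an approximation, and a bootstrap over tasks is a reasonable alternative.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093}


def mean_ci95(aucs):
    """Return (mean, lower, upper) for a 95% t interval over per-task AUCs."""
    n = len(aucs)
    if n - 1 not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for n = {n}")
    mean = statistics.fmean(aucs)
    sem = statistics.stdev(aucs) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width
```

Calling `mean_ci95` once on the ten 10-shot AUCs and once on the ten 20-shot AUCs gives two intervals whose widths can be compared directly.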

Core (Medium)

At what support-set size does full fine-tuning start to outperform frozen-encoder fine-tuning? What does this threshold tell you about the quality of the pre-trained representations?
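When running this comparison, the two regimes differ only in which parameters receive gradients. The sketch below shows one way to switch between them in PyTorch. The names `make_head` and `configure` are hypothetical, the small `nn.Sequential` stands in for the notebook's GIN encoder, and the frozen-regime learning rate of 1e-3 is an assumption; only the 1e-4 rate for full fine-tuning comes from the instructions.

```python
import torch
from torch import nn


def make_head(embed_dim, hidden_dim=64):
    """2-layer MLP head producing one logit per molecule."""
    return nn.Sequential(
        nn.Linear(embed_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, 1),
    )


def configure(encoder, head, freeze_encoder):
    """Set requires_grad on the encoder and build a matching optimizer."""
    for p in encoder.parameters():
        p.requires_grad_(not freeze_encoder)

    if freeze_encoder:
        encoder.eval()  # keep dropout / batch-norm statistics fixed
        params = list(head.parameters())
        lr = 1e-3  # assumed value; the tutorial does not state it
    else:
        encoder.train()
        params = list(encoder.parameters()) + list(head.parameters())
        lr = 1e-4  # small learning rate from the instructions
    return torch.optim.Adam(params, lr=lr)


# Stand-in encoder for illustration only (NOT the pre-trained GIN).
encoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = make_head(embed_dim=16)
frozen_opt = configure(encoder, head, freeze_encoder=True)
```

For a fair comparison, both regimes should start from the same pre-trained weights at every support-set size, so reload the encoder checkpoint before each run. Note that calling `.train()` on a wrapper module later would put the frozen encoder back into training mode.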

Core (Medium)

How do Prototypical Networks compare to fine-tuning in the 5-shot regime? Explain why prototypical methods might be preferable when the support set is very small.
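The prototype step itself has no trainable parameters, which is part of why it behaves well with very few labels. A minimal NumPy sketch, assuming the encoder embeddings have already been computed as arrays (the function names are illustrative): each prototype is the mean support embedding of its class, and queries are scored with a softmax over negative squared Euclidean distances, as in the original Prototypical Networks formulation.

```python
import numpy as np


def class_prototypes(support_emb, support_y):
    """Mean support embedding per class; returns (classes, prototypes)."""
    support_emb = np.asarray(support_emb, dtype=float)
    support_y = np.asarray(support_y)
    classes = np.unique(support_y)
    protos = np.stack([support_emb[support_y == c].mean(axis=0) for c in classes])
    return classes, protos


def predict_nearest_prototype(query_emb, classes, protos):
    """Return hard labels and class probabilities for each query embedding."""
    query_emb = np.asarray(query_emb, dtype=float)
    # Squared Euclidean distance from every query to every prototype.
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return classes[d2.argmin(axis=1)], probs
```

For AUC, the probability column of the positive class can serve as the ranking score, for example as the `y_score` argument of scikit-learn's `roc_auc_score`.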

Challenge

Implement Model-Agnostic Meta-Learning (MAML) with a single inner-loop step on the Tox21 few-shot tasks. Compare its query AUC to that of Prototypical Networks at 10-shot.
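As a starting point, the single inner-loop step can be sketched on a linear classifier over fixed embeddings. Everything here is an assumption made for illustration: the function names, the inner learning rate of 0.1, and the synthetic tasks, which are random stand-ins and not Tox21 data. The key detail is `create_graph=True`, which keeps the inner update differentiable so the outer loss can backpropagate through it.

```python
import torch
import torch.nn.functional as F


def forward(params, x):
    """Linear classifier on fixed embeddings; one logit per molecule."""
    w, b = params
    return x @ w + b


def maml_meta_loss(params, tasks, inner_lr=0.1):
    """Average query loss after one inner-loop gradient step per task."""
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        support_loss = F.binary_cross_entropy_with_logits(forward(params, x_s), y_s)
        # create_graph=True keeps the inner step differentiable (second-order MAML).
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        query_loss = F.binary_cross_entropy_with_logits(forward(adapted, x_q), y_q)
        meta_loss = meta_loss + query_loss
    return meta_loss / len(tasks)


def make_toy_task(gen, dim=8, n=10):
    """Synthetic stand-in for one few-shot task (NOT Tox21 data)."""
    direction = torch.randn(dim, generator=gen)
    x = torch.randn(2 * n, dim, generator=gen)
    y = (x @ direction > 0).float()
    return x[:n], y[:n], x[n:], y[n:]


gen = torch.Generator().manual_seed(0)
tasks = [make_toy_task(gen) for _ in range(4)]
params = [torch.zeros(8, requires_grad=True), torch.zeros((), requires_grad=True)]
meta_opt = torch.optim.Adam(params, lr=1e-2)

for _ in range(5):  # a few outer-loop (meta) updates
    meta_opt.zero_grad()
    loss = maml_meta_loss(params, tasks)
    loss.backward()  # gradient flows through the inner update
    meta_opt.step()
```

To adapt an `nn.Module` such as the MLP head instead of raw tensors, one option in PyTorch 2.x is `torch.func.functional_call`, which evaluates a module with a supplied dictionary of adapted parameters.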

Resources

Notebook (Colab)
GitHub repo
Paired lecture notes
