Explore transfer learning and meta-learning for accurate molecular property prediction with very few labelled examples.
What you will learn
Instructions
Open the Colab notebook using the button above. Run each cell in order; cells marked Exercise require you to fill in code.
Set up the environment. Install torch-geometric and the Hugging Face datasets library. Load the GIN encoder provided in the notebook, which was pre-trained on ChEMBL with masked-atom prediction.
Prepare the few-shot tasks. Construct 10-shot and 20-shot datasets from the MoleculeNet Tox21 benchmark. Ensure no scaffold overlap between support and query sets.
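One way to guarantee the scaffold constraint is to assign whole scaffold groups to the support set and keep only molecules with unseen scaffolds for the query set. The sketch below is a minimal illustration, not the notebook's implementation. It assumes the scaffold identifiers have already been computed (for instance as Bemis-Murcko scaffold SMILES with RDKit) and that molecules with a missing label for the chosen assay were filtered out beforehand. The function name `sample_few_shot_task` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def sample_few_shot_task(labels, scaffolds, k_shot, seed=0):
    """Sample k_shot support molecules per class with a scaffold-disjoint query set.

    labels    : list of 0/1 labels for one assay (assumed: no missing values)
    scaffolds : list of scaffold identifiers, same order as labels (assumed precomputed)
    Returns (support_indices, query_indices).
    """
    rng = random.Random(seed)

    # Group molecule indices by scaffold so a scaffold is never split across sets.
    groups = defaultdict(list)
    for idx, scaf in enumerate(scaffolds):
        groups[scaf].append(idx)

    scaffold_order = list(groups)
    rng.shuffle(scaffold_order)

    support = {0: [], 1: []}
    support_scaffolds = set()
    for scaf in scaffold_order:
        for idx in groups[scaf]:
            cls = int(labels[idx])
            if len(support[cls]) < k_shot:
                support[cls].append(idx)
                support_scaffolds.add(scaf)
        if all(len(members) == k_shot for members in support.values()):
            break

    if any(len(members) < k_shot for members in support.values()):
        raise ValueError("not enough molecules per class for this k_shot")

    support_idx = support[0] + support[1]
    # Any molecule sharing a scaffold with a support molecule is excluded from the query set.
    query_idx = [i for i, s in enumerate(scaffolds) if s not in support_scaffolds]
    return support_idx, query_idx
```

Calling the function with different `seed` values yields different random tasks, which is useful when estimating the spread of query AUC across task samples. Molecules that share a scaffold with the support set but were not selected are dropped from both sets, so the query set can be noticeably smaller than the full dataset.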
Fine-tune with a frozen encoder. Attach a 2-layer MLP head to the frozen encoder. Train only the head for 100 epochs on the support set. Report AUC on the query set.
Full fine-tune. Unfreeze all encoder layers and repeat training with a small learning rate (1e-4). Compare query AUC to the frozen-encoder baseline.
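The difference between the frozen-encoder and full fine-tuning runs comes down to which parameters receive gradients. The following PyTorch sketch shows one way to switch between the two regimes. `StandInEncoder` is a hypothetical placeholder for the pre-trained GIN encoder supplied in the notebook (it operates on plain feature vectors rather than graphs), and `make_head`, `set_encoder_trainable`, and `train` are illustrative helper names, not part of any library.

```python
import torch
from torch import nn


class StandInEncoder(nn.Module):
    """Hypothetical placeholder for the notebook's pre-trained GIN encoder."""

    def __init__(self, in_dim=16, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, x):
        return self.net(x)


def make_head(emb_dim=32, hidden=64):
    # 2-layer MLP that outputs one logit for a binary assay.
    return nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))


def set_encoder_trainable(encoder, trainable):
    # Toggle gradient tracking for every encoder parameter.
    for p in encoder.parameters():
        p.requires_grad = trainable
    # Eval mode while frozen keeps dropout and batch-norm statistics fixed.
    encoder.train(trainable)


def train(encoder, head, x, y, epochs, lr):
    # Only parameters that currently require gradients are handed to the optimizer.
    params = [
        p
        for p in list(encoder.parameters()) + list(head.parameters())
        if p.requires_grad
    ]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        logits = head(encoder(x)).squeeze(-1)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
    return loss.item()
```

For the frozen run, one would call `set_encoder_trainable(encoder, False)` and then `train(..., epochs=100, lr=...)`, where the head learning rate is left to the reader because the instructions do not specify it. For the full fine-tune, create a fresh head with `make_head()`, call `set_encoder_trainable(encoder, True)`, and train with `lr=1e-4`. Reloading the pre-trained encoder weights between the two runs keeps the comparison fair.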
Prototypical Networks. Implement prototype computation (mean embedding of support molecules per class). Predict query labels by nearest prototype in embedding space. Report AUC.
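The prototype step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the notebook's reference solution: embeddings are assumed to have been computed by the encoder and converted to NumPy arrays, squared Euclidean distance is assumed as the metric, and the function names `compute_prototypes` and `predict_from_prototypes` are hypothetical.

```python
import numpy as np


def compute_prototypes(support_emb, support_labels):
    """Return (classes, prototypes): the mean support embedding of each class."""
    classes = np.unique(support_labels)
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, protos


def predict_from_prototypes(query_emb, classes, protos):
    """Assign each query to its nearest prototype and return class probabilities."""
    # Squared Euclidean distances, shape (n_query, n_classes).
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # Softmax over negative distances; subtract the row max for numerical stability.
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    preds = classes[d2.argmin(axis=1)]
    return preds, probs
```

With binary labels 0 and 1, the column `probs[:, 1]` can serve as the positive-class score when computing query AUC, since AUC needs a continuous score rather than the hard nearest-prototype label.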
Questions
Answer these questions as you work through the notebook. Discuss with your neighbour — some have no single right answer.
With 10 support molecules per class, what is the 95% confidence interval on query AUC across 10 random task samples? How does the variance compare to the 20-shot setting? (Easy)
At what support-set size does full fine-tuning start to outperform frozen-encoder fine-tuning? What does this threshold tell you about the quality of the pre-trained representations? (Medium)
How do Prototypical Networks compare to fine-tuning in the 5-shot regime? Explain why prototypical methods might be preferable when the support set is very small. (Medium)
Implement Model-Agnostic Meta-Learning (MAML) with a single inner-loop step on the Tox21 few-shot tasks. Compare its query AUC to that of Prototypical Networks at 10-shot. (Challenge)
Resources