Pre-training on large chemical corpora and fine-tuning on task-specific data — transferring chemical knowledge across domains.
Recording
Recording will be available after the bootcamp.
August 2026
Learning Objectives
Key Takeaways
Takeaway 1. Pre-training forces a model to learn general chemical knowledge (atom environments, bond patterns) before it ever sees task labels. This acts as a form of regularisation, and pre-trained models often outperform models trained only on task data in low-data regimes.
Takeaway 2. The choice of pre-training objective matters: masked-atom prediction captures local chemistry, while contrastive objectives (e.g., MolCLR) encourage representations that reflect whole-molecule similarity.
Takeaway 3. Fine-tuning all layers (full fine-tuning) beats frozen-encoder approaches when you have more than a few hundred labelled examples, but risks catastrophic forgetting on very small datasets.
Takeaway 4. Learned representations are not always better than ECFP — always run fingerprint baselines before investing in pre-training pipelines.
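To make the masked-atom objective from Takeaway 2 concrete, here is a minimal sketch of how one training example could be prepared. It is an illustration under stated assumptions, not the pipeline of any particular library: the `mask_atoms` helper, the `[MASK]` token, and the 15% mask fraction are all hypothetical choices, and a real model would mask node features on a molecular graph rather than a flat list of element symbols.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder token, not tied to any library


def mask_atoms(atoms, mask_fraction=0.15, seed=0):
    """Build one masked-atom prediction example from a list of element symbols.

    Returns (masked_atoms, labels). labels[i] holds the original symbol at
    masked positions and None elsewhere, so a loss would only be computed
    on the atoms the model has to reconstruct.
    """
    if not atoms:
        return [], []
    rng = random.Random(seed)  # seeded so the example is reproducible
    # Mask at least one atom so every molecule contributes a training signal.
    n_mask = max(1, round(mask_fraction * len(atoms)))
    masked_idx = set(rng.sample(range(len(atoms)), n_mask))
    masked = [MASK_TOKEN if i in masked_idx else a for i, a in enumerate(atoms)]
    labels = [a if i in masked_idx else None for i, a in enumerate(atoms)]
    return masked, labels


# Heavy atoms of aspirin, CC(=O)Oc1ccccc1C(=O)O, listed in SMILES order.
aspirin_atoms = ["C", "C", "O", "O", "C", "C", "C", "C", "C", "C", "C", "O", "O"]
masked, labels = mask_atoms(aspirin_atoms, mask_fraction=0.15, seed=0)

for original, shown, target in zip(aspirin_atoms, masked, labels):
    print(f"{original:>2} -> {shown:<6} target={target}")
```

The model only sees `masked` and is trained to predict the non-`None` entries of `labels` from the surrounding atoms, which is why this objective mostly rewards knowledge of local chemical environments.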
Further Reading & Resources