Labelling strategically: query strategies that minimise experimental cost while maximising model improvement.
Recording
Recording will be available after the bootcamp.
August 2026
Learning Objectives
Key Takeaways
Takeaway 1. Random sampling is a surprisingly strong baseline for active learning in chemistry — always benchmark against it before claiming that a fancy query strategy helps.
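Takeaway 2. Uncertainty sampling selects the points the model is least confident about, but high uncertainty is not the same as informativeness: the most uncertain points often sit at the boundary of the training distribution and can be redundant with one another. Methods that account for diversity, such as core-set selection and BADGE (which combines uncertainty with diversity), often do better.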
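Takeaway 3. BALD (Bayesian Active Learning by Disagreement) selects the points that maximise the mutual information between the model parameters and the predicted label. It is theoretically principled, but it needs several posterior samples (for example, ensemble members or stochastic forward passes) for every candidate, which makes it computationally expensive for large pools.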
Takeaway 4. In chemistry, the cost of labelling (DFT calculation, synthesis, assay) dwarfs the cost of the model. Even a modest reduction in the number of required experiments translates directly to saved resources.
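The query strategies named in the takeaways can be compared on a toy pool. The following is a minimal sketch, not a reference implementation: it assumes a classification model whose posterior is approximated by a small ensemble, with class probabilities stored in an array of shape (members, pool, classes). The function names (`random_query`, `predictive_entropy`, `bald_score`, `top_k`) and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an illustrative choice


def random_query(pool_size, k, seed=0):
    """Random-sampling baseline: k distinct pool indices, chosen uniformly."""
    rng = np.random.default_rng(seed)
    return rng.choice(pool_size, size=k, replace=False)


def predictive_entropy(probs):
    """Uncertainty sampling score: entropy of the ensemble-averaged prediction.

    probs has shape (members, pool, classes).
    """
    mean = probs.mean(axis=0)
    return -(mean * np.log(mean + EPS)).sum(axis=-1)


def bald_score(probs):
    """BALD score: entropy of the mean prediction minus the mean member entropy.

    This difference is the mutual information between the predicted label
    and the model parameters, estimated from the ensemble members.
    """
    member_entropy = -(probs * np.log(probs + EPS)).sum(axis=-1)
    return predictive_entropy(probs) - member_entropy.mean(axis=0)


def top_k(scores, k):
    """Indices of the k highest-scoring pool candidates."""
    return np.argsort(scores)[::-1][:k]


# Toy pool: 2 ensemble members, 3 candidates, 2 classes (made-up numbers).
probs = np.array([
    [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]],  # member 1
    [[0.5, 0.5], [0.1, 0.9], [0.99, 0.01]],  # member 2
])

entropy = predictive_entropy(probs)
bald = bald_score(probs)
baseline = random_query(pool_size=probs.shape[1], k=2)
```

In this toy pool, candidates 0 and 1 receive the same predictive entropy, so uncertainty sampling cannot separate them. BALD scores candidate 0 at roughly zero, because both members agree that it is a coin flip, and ranks candidate 1 highest, because the members disagree confidently. Any such comparison on real data should still be benchmarked against the `random_query` baseline.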
Further Reading & Resources