Build calibration curves, apply conformal prediction, and compare ensemble variance on molecular property prediction datasets.
What you will learn
Instructions
Open the Colab notebook using the button above. Run each cell in order; cells marked Exercise require you to fill in code.
Set up the environment. Install scikit-learn, torch, and mapie. Load the ESOL dataset and a pre-trained random forest with predict_std support (using the forest's tree variance).
Build calibration plots. Divide the predicted confidence levels into 10 bins. For each bin, compute the fraction of true values falling within the corresponding predicted confidence interval. Plot a reliability diagram and compute the expected calibration error (ECE).
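The calibration step can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes the model outputs a mean and a standard deviation per molecule, that intervals are central Gaussian intervals, and that ECE is taken as the mean absolute gap between observed and nominal coverage. The names `coverage_curve` and `calibration_error` and the synthetic data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def coverage_curve(y_true, mu, sigma, n_levels=10):
    """Return nominal levels and observed coverage of central Gaussian intervals."""
    y_true, mu, sigma = (np.asarray(a, dtype=float) for a in (y_true, mu, sigma))
    # Bin midpoints 0.05, 0.15, ..., 0.95 serve as the nominal confidence levels.
    levels = (np.arange(n_levels) + 0.5) / n_levels
    abs_err = np.abs(y_true - mu)
    observed = np.empty(n_levels)
    for i, p in enumerate(levels):
        # Half-width of the central p-interval of N(mu, sigma^2), in units of sigma.
        z = NormalDist().inv_cdf(0.5 + float(p) / 2.0)
        observed[i] = np.mean(abs_err <= z * sigma)
    return levels, observed


def calibration_error(levels, observed):
    """Mean absolute gap between observed and nominal coverage (equal weights)."""
    return float(np.mean(np.abs(observed - levels)))


# Synthetic stand-in for model outputs on a test set (assumption, not ESOL).
rng = np.random.default_rng(0)
n = 20_000
mu = rng.normal(size=n)
sigma_true = rng.uniform(0.5, 1.5, size=n)
y = mu + sigma_true * rng.normal(size=n)

levels, obs_good = coverage_curve(y, mu, sigma_true)    # honest uncertainties
_, obs_over = coverage_curve(y, mu, 0.5 * sigma_true)   # reported stds too small

ece_good = calibration_error(levels, obs_good)
ece_over = calibration_error(levels, obs_over)
print(f"ECE (honest stds):       {ece_good:.3f}")
print(f"ECE (stds halved):       {ece_over:.3f}")
```

Plotting `observed` against `levels` gives the reliability diagram. A curve below the diagonal means the intervals cover fewer true values than they claim, which indicates overconfidence.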
Apply temperature scaling. Fit a scalar temperature parameter T on a held-out calibration set. Re-plot the reliability diagram after temperature scaling. Report the change in ECE.
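One way to read this step for a regression model is to rescale every predicted standard deviation by a single factor T. The sketch below takes that reading as an assumption, along with a Gaussian likelihood, under which the T that minimises the negative log-likelihood on the calibration set has a closed form. The function names and synthetic data are hypothetical.

```python
import numpy as np


def fit_temperature(y_cal, mu_cal, sigma_cal):
    """Closed-form T minimising the Gaussian NLL of N(mu, (T * sigma)^2)."""
    y_cal, mu_cal, sigma_cal = (
        np.asarray(a, dtype=float) for a in (y_cal, mu_cal, sigma_cal)
    )
    # Setting d(NLL)/dT = 0 gives T^2 = mean of squared standardised residuals.
    z2 = ((y_cal - mu_cal) / sigma_cal) ** 2
    return float(np.sqrt(np.mean(z2)))


def gaussian_nll(y, mu, sigma):
    """Average Gaussian negative log-likelihood."""
    return float(
        np.mean(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2))
    )


# Synthetic stand-in: the model reports stds that are half the true noise level.
rng = np.random.default_rng(1)
n = 10_000
mu = rng.normal(size=n)
sigma_reported = rng.uniform(0.2, 0.6, size=n)
y = mu + 2.0 * sigma_reported * rng.normal(size=n)

cal, test = slice(0, n // 2), slice(n // 2, n)   # held-out calibration / test split
T = fit_temperature(y[cal], mu[cal], sigma_reported[cal])

nll_before = gaussian_nll(y[test], mu[test], sigma_reported[test])
nll_after = gaussian_nll(y[test], mu[test], T * sigma_reported[test])
print(f"T = {T:.3f}, test NLL before = {nll_before:.3f}, after = {nll_after:.3f}")
```

A fitted T above 1 widens every interval (the model was overconfident), and a T below 1 narrows them. Because T is a single scalar, it cannot fix miscalibration that differs between confidence levels.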
Conformal prediction. Use MAPIE's MapieRegressor with a ridge regression base model. Set the target coverage to 90%. Report the empirical coverage and mean interval width on the test set.
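To show what the conformal step computes, here is a hand-rolled split conformal sketch in NumPy. It is not MAPIE's implementation and does not call MapieRegressor. The closed-form ridge fit, the helper names, and the synthetic data are assumptions made for illustration.

```python
import math

import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge weights; a bias column is appended to X."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)


def predict_ridge(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w


def conformal_halfwidth(abs_residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for the requested coverage
        return float("inf")
    return float(np.sort(abs_residuals)[k - 1])


# Synthetic regression data standing in for molecular descriptors and targets.
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.7, size=2000)

X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:1500], y[1000:1500]
X_te, y_te = X[1500:], y[1500:]

w = fit_ridge(X_tr, y_tr)
cal_resid = np.abs(y_cal - predict_ridge(w, X_cal))

q90 = conformal_halfwidth(cal_resid, alpha=0.10)
q95 = conformal_halfwidth(cal_resid, alpha=0.05)

pred_te = predict_ridge(w, X_te)
coverage_90 = float(np.mean(np.abs(y_te - pred_te) <= q90))
print(f"90% target: coverage = {coverage_90:.3f}, mean width = {2 * q90:.3f}")
print(f"95% target: mean width = {2 * q95:.3f}")
```

In this simple variant every test point receives the same interval width, `2 * q90`. The coverage guarantee is marginal (an average over test points), so the empirical value can land slightly below the target on a finite test set.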
Deep ensemble. Train 5 independently initialised MLPs on the training set. Compute ensemble mean and variance on the test set. Compare interval width and coverage to conformal prediction.
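Once the ensemble members are trained, the aggregation and comparison can be sketched as below. The sketch assumes the member predictions are already stacked in an array of shape (members, samples), and it builds a 90% interval as the mean plus or minus 1.645 standard deviations, which is a Gaussian assumption rather than something the exercise specifies. The synthetic predictions stand in for the outputs of the 5 MLPs.

```python
import numpy as np


def ensemble_summary(member_preds):
    """Mean and unbiased variance across ensemble members (axis 0)."""
    member_preds = np.asarray(member_preds, dtype=float)
    return member_preds.mean(axis=0), member_preds.var(axis=0, ddof=1)


def interval_stats(y_true, lower, upper):
    """Empirical coverage and mean width of prediction intervals."""
    covered = (y_true >= lower) & (y_true <= upper)
    return float(covered.mean()), float(np.mean(upper - lower))


# Synthetic stand-in for 5 trained models evaluated on 200 test points.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 6.0, 200)
y_test = np.sin(x) + rng.normal(scale=0.3, size=x.size)
member_preds = np.stack(
    [np.sin(x) + rng.normal(scale=0.1, size=x.size) for _ in range(5)]
)

mean, var = ensemble_summary(member_preds)
z90 = 1.645  # two-sided 90% Gaussian quantile (assumption)
lower, upper = mean - z90 * np.sqrt(var), mean + z90 * np.sqrt(var)

coverage, width = interval_stats(y_test, lower, upper)
print(f"ensemble: coverage = {coverage:.3f}, mean width = {width:.3f}")

# Indices of the 10 test points with the highest ensemble variance.
top10 = np.argsort(var)[-10:][::-1]
```

Ensemble variance reflects only disagreement between members, not label noise, so these intervals can under-cover relative to the conformal intervals. Unlike the fixed-width conformal sketch, the width varies from point to point.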
Questions
Answer these questions as you work through the notebook. Discuss with your neighbour — some have no single right answer.
Before calibration, is your model overconfident or underconfident? How can you tell from the reliability diagram? (Easy)
After temperature scaling, how much does ECE decrease? Is the model now well-calibrated across all probability bins, or only in certain regions? (Medium)
For conformal prediction at 90% target coverage, what is the empirical coverage on the test set? Is it at least 90%? What happens to interval width if you raise the target to 95%? (Medium)
Identify the 10 test molecules with highest ensemble variance. Are they structurally different from the training set (measure Tanimoto distance to nearest training neighbour)? What does this tell you about the applicability domain? (Challenge)
Resources