Self-attention, tokenisation of SMILES, and chemical language models — from BERT to domain-adapted chemistry LLMs.
Recording
Recording will be available after the bootcamp.
August 2026
Learning Objectives
Key Takeaways
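Takeaway 1. Self-attention lets every token attend to every other token within a single layer, so the receptive field is global from the first layer onward. This is the key advantage over recurrent models, which must pass information step by step along the sequence. Because all positions are computed in parallel rather than sequentially, transformers also train efficiently at scale.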
Takeaway 2. SMILES tokenisation is non-trivial: character-level tokenisation splits multi-character element symbols (Cl becomes C and l, Br becomes B and r), so chlorine is conflated with carbon and bromine with boron; atom-level tokenisation with a chemistry-aware tokeniser is strongly preferred.
Takeaway 3. For property prediction, training only the classification head on top of a frozen BERT encoder is a fast baseline; full fine-tuning usually wins once ≥500 labelled molecules are available.
Takeaway 4. Transformers on SMILES and GNNs on molecular graphs are not interchangeable: they encode complementary inductive biases, and ensemble or multi-view models often outperform either alone.
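Takeaway 1 can be made concrete with a minimal NumPy sketch of single-head scaled dot-product self-attention. The function name `self_attention`, the toy shapes, and the random weights are illustrative assumptions rather than anything taken from the session material; real models add multiple heads, masking, positional information, and learned parameters.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (n_tokens, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns the attended outputs and the (n_tokens, n_tokens) weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every query is compared with every key in one matrix product,
    # which is what gives a single layer its global receptive field.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy input: 6 tokens with 8-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
n_tokens, d_model = 6, 8
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` is a probability distribution over all tokens in the sequence, so the output at any position mixes information from every other position after one layer. A recurrent model would need as many sequential steps as there are tokens between two positions to achieve the same.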
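To illustrate Takeaway 2, the sketch below contrasts character-level splitting with a simple regex-based atom-level tokeniser. The pattern and the name `tokenise` are assumptions made for illustration: the pattern covers bracket atoms, the organic subset, ring closures, and common bond and branch symbols, and it is not necessarily the tokeniser used in the session. A production tokeniser would need fuller coverage of SMILES syntax.

```python
import re

# Illustrative atom-level pattern (an assumption, not a complete SMILES grammar).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms, e.g. [C@@H], [NH4+], [Na+]
    r"|Br|Cl"              # two-letter organic-subset atoms, tried before B and C
    r"|[BCNOPSFI]"         # one-letter aliphatic atoms
    r"|[bcnops]"           # aromatic atoms
    r"|%\d{2}"             # two-digit ring closures, e.g. %12
    r"|\d"                 # single-digit ring closures
    r"|[()=#\-\\/:.~*$]"   # branches, bonds, dot separator, wildcard
)


def tokenise(smiles):
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if any character is not covered by the pattern,
    so unsupported syntax is not silently dropped.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


print(list("ClCCBr"))      # ['C', 'l', 'C', 'C', 'B', 'r']
print(tokenise("ClCCBr"))  # ['Cl', 'C', 'C', 'Br']
print(tokenise("C[C@@H](N)C(=O)O"))
```

In the character-level split, the `C` of chlorine is indistinguishable from a carbon atom and `l` becomes a meaningless token. The atom-level version keeps `Cl`, `Br`, and bracket atoms such as `[C@@H]` as single vocabulary entries.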
Further Reading & Resources