RAG pipelines, tool use, literature extraction, and the hallucination limits of large language models in the chemical sciences.
Recording
Recording will be available after the bootcamp.
August 2026
Learning Objectives
Key Takeaways
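Takeaway 1. Retrieval-augmented generation (RAG) grounds LLM answers in retrieved documents, which can substantially reduce hallucination on factual chemistry questions. However, the bottleneck is usually retrieval quality rather than the language model: if the relevant passage is never retrieved, the model cannot ground its answer in it.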
Takeaway 2. LLMs are poor chemistry calculators: they can hallucinate reaction mechanisms, confidently produce invalid SMILES, and make stoichiometry errors. Always validate their output programmatically with RDKit or similar tools.
Takeaway 3. Tool use turns LLMs from text generators into orchestrators of computation: given access to the right tools, a model can call a yield predictor, retrieve a crystal structure, or run a retrosynthesis engine. Once those tools have been wrapped and exposed to the model, the end user does not need to write integration code for each query.
Takeaway 4. The bottleneck in LLM-assisted chemistry is not language understanding but knowledge currency and access: models have training cutoffs, cannot read paywalled journals, and lack lab-specific institutional knowledge unless it is explicitly provided.
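The retrieve-then-prompt loop behind Takeaway 1 can be sketched in a few lines. This is a minimal illustration, not a production design. It assumes a bag-of-words cosine similarity in place of the dense embeddings and vector index a real pipeline would typically use. The three-sentence corpus, the prompt wording, and the function names `retrieve` and `build_prompt` are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Illustrative corpus; a real pipeline would index papers, SOPs, or notebook entries.
DOCS = [
    "Benzene has the molecular formula C6H6.",
    "Water boils at 100 degrees Celsius at a pressure of 1 atm.",
    "Sodium chloride crystallizes in the cubic rock-salt structure.",
]


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; no stemming or stop-word removal (a simplification).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[int, float]]:
    # Score every document against the query and keep the k best (index, score) pairs.
    q = Counter(tokenize(query))
    scored = [(i, cosine(q, Counter(tokenize(d)))) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Put only the retrieved passages in the prompt and ask the model to cite them.
    hits = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {docs[i]}" for i, _ in hits)
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


print(build_prompt("What is the molecular formula of benzene?", DOCS, k=1))
```

The prompt can only be as good as the ranking. If `retrieve` puts the wrong passage first, the model is faithfully grounded in the wrong text. This is why retrieval quality, not the language model, tends to limit answer quality.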
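Takeaway 2's advice to validate programmatically can start with something as simple as an atom-balance check on a reaction an LLM proposes. The sketch below uses only the standard library and does not use RDKit. It parses simple molecular formulas with optional parenthesised groups and compares element counts on the two sides of the reaction. The function names are hypothetical. Element symbols are not checked against the periodic table, and charges and hydrates are not handled. For SMILES strings, the analogous first-line check is RDKit's `Chem.MolFromSmiles`, which returns `None` when it cannot parse the input.

```python
import re
from collections import Counter

# One token: element symbol + optional count, "(", or ")" + optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")
# The whole formula must consist of such tokens (rejects e.g. "h2o").
VALID = re.compile(r"(?:[A-Z][a-z]?\d*|\(|\)\d*)+")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a formula such as 'Ca(OH)2'.

    Element symbols are not checked against the periodic table; charges,
    hydrates (e.g. CuSO4.5H2O) and isotopes are not supported.
    """
    if not VALID.fullmatch(formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    stack = [Counter()]
    for elem, count, lpar, rpar, mult in TOKEN.findall(formula):
        if elem:
            stack[-1][elem] += int(count or 1)
        elif lpar:
            stack.append(Counter())  # open a parenthesised group
        elif rpar:
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()  # close the group and apply its multiplier
            for e, n in group.items():
                stack[-1][e] += n * int(mult or 1)
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return stack[0]


def atom_totals(side: list[tuple[int, str]]) -> Counter:
    """Sum atoms over (coefficient, formula) pairs on one side of a reaction."""
    total = Counter()
    for coeff, formula in side:
        for elem, n in parse_formula(formula).items():
            total[elem] += coeff * n
    return total


def is_balanced(reactants, products) -> bool:
    """True if every element appears equally often on both sides."""
    return atom_totals(reactants) == atom_totals(products)


# Methane combustion: CH4 + 2 O2 -> CO2 + 2 H2O
print(is_balanced([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")]))  # True
print(is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")]))  # False
```

A check like this catches only conservation errors. A balanced equation can still describe a chemically implausible reaction, so it complements, rather than replaces, expert review.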
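The orchestration pattern in Takeaway 3 reduces to a registry of wrapped functions plus a dispatcher that executes the model's structured tool calls. The sketch below assumes the model emits JSON of the form `{"tool": ..., "arguments": {...}}`. That shape is an assumption, and each provider's function-calling API defines its own format. The single registered tool, `molar_mass`, is a hypothetical stand-in for the yield predictors or retrosynthesis engines a real deployment would expose. It uses standard atomic weights for four elements only.

```python
import inspect
import json
from typing import Any, Callable

# Registry of callable tools, keyed by the name the model uses.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the dispatcher can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


# Standard atomic weights (g/mol) for a few elements; extend as needed.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@tool
def molar_mass(counts: dict[str, int]) -> float:
    """Hypothetical tool: molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())


def dispatch(model_output: str) -> dict[str, Any]:
    """Run one JSON tool call from the model; return a result or an error message.

    Errors are returned rather than raised so they can be fed back to the model.
    """
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["tool"]]
        args = call.get("arguments", {})
        inspect.signature(fn).bind(**args)  # reject missing/unexpected arguments up front
        return {"tool": call["tool"], "result": fn(**args)}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"error": f"{type(exc).__name__}: {exc}"}


print(dispatch('{"tool": "molar_mass", "arguments": {"counts": {"H": 2, "O": 1}}}'))
print(dispatch('{"tool": "retrosynthesis", "arguments": {}}'))
```

The integration work happens once, when each tool is wrapped and registered. After that, any query the model can map onto a registered tool runs without new glue code. Returning errors as data lets the model see a malformed or unknown call and retry.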
Further Reading & Resources