Tutorial 13

Chemistry RAG Assistant

Index a corpus of chemistry abstracts and build a retrieval-augmented Q&A system that grounds LLM answers in the literature.

August 19, 2026 · 10:15 – 12:00
105 min
Python · Google Colab
Back to schedule

Open in Google Colab

The notebook has most of the code pre-filled. Complete the exercises marked ### YOUR CODE HERE ###.

Open Notebook

Getting started

Open the Colab notebook using the button above. Run each cell in order; cells marked Exercise require you to fill in code.

  1. Set up the environment. Install sentence-transformers, faiss-cpu, and openai (or use a local model via ollama). Load the provided dataset of 500 chemistry paper abstracts.

  2. Chunk and embed. Split each abstract into 200-token chunks with 20-token overlap. Embed all chunks using all-MiniLM-L6-v2. Store embeddings in a FAISS flat-L2 index.

  3. Implement retrieval. Complete the retrieve(query, k=5) function: embed the query, search the FAISS index, and return the top-k chunks with their source metadata.

  4. Build the RAG chain. Construct a prompt template that inserts retrieved chunks as context. Call the LLM (gpt-4o-mini or a local model) with the augmented prompt. Parse and display the answer with citations.

  5. Evaluate. Run 10 pre-defined chemistry questions from the notebook. Score each answer for faithfulness (does the answer contradict the retrieved context?) and relevance (does it address the question?).


Answer these questions as you work through the notebook. Discuss with your neighbour — some have no single right answer.

    Warm-up

    For the query "What solvents are used in Suzuki coupling?", what are the top-3 retrieved chunks? Do they contain relevant information?

    Easy
    Core

    Compare the LLM answer with retrieval (RAG) vs. without retrieval (bare LLM) for a question about a recent reaction. Which answer is more accurate? Does the bare LLM hallucinate?

    Medium
    Core

    How does increasing k (retrieved chunks) from 3 to 10 affect answer quality and latency? Is there a point of diminishing returns?

    Medium
    Challenge

    Implement re-ranking: after FAISS retrieval, re-rank the top-10 chunks using a cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2) before selecting the top-3 for the prompt. Does re-ranking improve faithfulness scores?

    Challenge

Notebook (Colab) GitHub repo Paired lecture notes