Document QA Assistant

What it does

Answers natural-language questions against a static document corpus by retrieving the most relevant passages and grounding the model's answer in those passages.

Pipeline

Ingest: chunk documents into ~500-token windows with overlap.
Embed: encode with a sentence-transformer model.
Retrieve: rank chunks by cosine similarity to the query embedding and keep the top k.
Generate: prompt the LLM with question + retrieved passages, ask for an answer with explicit source citations.
Evaluate: hand-grade answers on a held-out question set and compute automatic metrics (faithfulness, answer relevance).
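
The Ingest and Retrieve steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names (chunk_tokens, cosine, top_k), the default overlap of 50 tokens, and the use of plain Python lists in place of real sentence-transformer embeddings are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def chunk_tokens(tokens: Sequence[str], window: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split tokens into fixed-size windows; consecutive windows share `overlap` tokens.

    The 50-token default overlap is an assumed value for illustration.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In a real pipeline the vectors would come from the embedding model, and a vector index would usually replace the linear scan once the corpus grows.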

Why I'm extending it

The Phase-2 plan turns this into a public-facing Security Copilot RAG: the same architecture, but the corpus becomes CVE feeds and a target codebase, and the eval harness becomes the centrepiece. Most RAG demos online skip evaluation; that's exactly where the interesting engineering lives.

Lessons so far

Chunk boundaries matter more than chunk size. Overlap helps.
Cosine similarity is fine until it isn't; adding a small reranker over the top-50 candidates produced a measurable jump in answer quality.
An evaluation harness you actually run is worth ten you talk about.
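
The reranking lesson amounts to a two-stage retrieval: a cheap first pass collects candidates, then a more careful scorer reorders them. The sketch below shows only that shape. The rerank function, its score_fn parameter, and the word-overlap scorer are hypothetical stand-ins; a real setup would plug in a trained reranker model where overlap_score appears here.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    keep: int = 5,
) -> List[Tuple[str, float]]:
    """Re-score first-stage candidates with score_fn and keep the best `keep`."""
    scored = [(passage, score_fn(query, passage)) for passage in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


def overlap_score(query: str, passage: str) -> float:
    """Toy scorer (assumption): fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


# Candidates would normally be the top-50 passages from the cosine stage.
first_stage = [
    "the parser handles input",
    "a buffer overflow in the parser",
    "unrelated text",
]
best = rerank("buffer overflow in parser", first_stage, overlap_score, keep=2)
```

Keeping the scorer as a plain callable means the same harness can compare cosine-only retrieval against any reranker by swapping one argument.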