
Selected work

Projects

A few representative builds.

  • 2026

    A synthetic evaluation lab for AI-agent reliability, with 350 golden cases and 60 red-team cases. It covers RAG evaluation, safe refusal, safety classifiers, prevalence estimation, human-review simulation, mitigation-impact analysis, and release-gate reporting, and is built with FastAPI, OpenTelemetry (OTel) tracing, and CI.

    • Python
    • FastAPI
    • Streamlit
    • Pydantic
    • pytest
    • ruff
    • Docker
    • GitHub Actions
    • GitHub Pages