Selected work

Projects

A few representative builds.

Internal AI Agent Evaluation Lab
2026
Synthetic evaluation lab for AI-agent reliability, with 350 golden cases and 60 red-team cases. Covers RAG evaluation, safe refusal, safety classifiers, prevalence estimation, human-review simulation, mitigation impact, and release-gate reporting, with FastAPI, OpenTelemetry (OTel) tracing, and CI.
- Python
- FastAPI
- Streamlit
- Pydantic
- pytest
- ruff
- Docker
- GitHub Actions
- GitHub Pages
View source