[ Index ] 10 projects · one table

Projects

Evaluation harnesses, AI-safety work, and data products. Filter by stack, or sort by the evidence — every row links to a full write-up with the methods and numbers.

Project 01: London Cycle-Hire Analytics Platform
Python · PySpark · dbt · 3 min read
41.4M
journeys unified
Limits: a result published with its documented limits
LIVE
Project 02: England & Wales Housing Decision Support
dbt · DuckDB · Dagster · 3 min read
7,264
MSOAs scored
Limits: a result published with its documented limits
LIVE
Project 03: Community Energy Flex
Python · FastAPI · Pydantic · 3 min read
15
UK grid regions
Limits: a result published with its documented limits
LIVE
Project 04: Aerospace Prognostics
Python · FastAPI · Streamlit · 3 min read
Withdrawn: 0.24. Corrected to 0.42 event-wise recall.
Corrected: a published number was withdrawn and replaced
RUN LOG
Project 05: Agent Release Safety Gates
Python · uv · Inspect AI · 5 min read
Withdrawn: 99.31%. Corrected to 79.92% external retrieval hit@3.
Corrected: a published number was withdrawn and replaced
LIVE
Project 06: redteam-foundry
Python · Claude Sonnet 4.6 · Llama 3.1 8B · 4 min read
0–4%
attack success
Controlled: a null result, carried by a positive control
RUN LOG
Project 07: Cited Market Brief Agent
TypeScript · React · FastAPI · 3 min read
Withdrawn: 1.000. Corrected to 0.400 holdout precision.
Corrected: a published number was withdrawn and replaced
LIVE
Project 08: Responsible Neobank Growth
Python · dbt · BigQuery · 3 min read
217
dbt tests (cloud run)
Limits: a result published with its documented limits
LIVE
Project 09: Cashflow Risk Intelligence
Python · FastAPI · PostgreSQL · 3 min read
13-week
runway forecast
Limits: a result published with its documented limits
RUN LOG
Project 10: Marketing Effectiveness Lab
Python · pandas · NumPy · 2 min read
no metric published
ARCHIVED

Shown: 10 of 10
Tests: 1,681 total
Evidence: 3 corrected · 1 controlled
Live: 6