Selected work
Projects
A few representative builds. Filter by stack to narrow down.
- All
- A100
- Astro 5
- BigQuery
- Claude Sonnet 4.6
- Cloud Run
- Cloudflare Workers
- Docker
- DuckDB
- FBANK
- FastAPI
- GCP
- GitHub Actions
- GitHub Pages
- HPC
- Hugging Face
- Inspect AI
- Llama 3.1 8B
- Marimo
- NLP
- NumPy
- PDF processing
- PySpark
- PyTorch
- Pydantic
- Python
- Qwen2.5-7B
- React
- RoBERTa
- SQL
- Slurm
- Spark MLlib
- Streamlit
- Tailwind CSS v4
- Transformers
- TypeScript
- WebAssembly
- dbt
- mozjpeg
- mypy
- pdf-lib
- pdf.js
- pytest
- ruff
- scikit-learn
- synthetic control
2026
Synthetic evaluation lab for AI-agent reliability: 350 golden cases, 60 red-team cases, RAG evaluation, safe refusal, safety classifiers, prevalence estimation, human-review simulation, mitigation impact, release gate reporting, FastAPI, OTel tracing, and CI.
- Python
- FastAPI
- Streamlit
- Pydantic
- pytest
- ruff
- Docker
- GitHub Actions
- GitHub Pages
2026
Reproducible benchmark measuring how published adversarial prompts perform against 2026-era LLMs and whether prompt-only defences make a measurable difference, with cross-judge validation and bootstrap confidence intervals.
- Python
- Claude Sonnet 4.6
- Llama 3.1 8B
- Inspect AI
- GitHub Actions
- pytest
- ruff
- mypy
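As a rough illustration of the bootstrap confidence intervals mentioned above, here is a minimal percentile-bootstrap sketch using only the standard library. The function name `bootstrap_ci`, the resample count, and the 0/1 outcome data are all hypothetical and are not taken from the benchmark itself.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of 0/1 outcomes."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical data: 1 = attack succeeded, 0 = attack failed.
outcomes = [1] * 12 + [0] * 88
point_estimate = sum(outcomes) / len(outcomes)
low, high = bootstrap_ci(outcomes)
print(f"success rate {point_estimate:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

The percentile method is only one of several bootstrap variants; the project may well use a different one.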
2026
Zero-shot event extraction with Qwen2.5-7B-Instruct on MAVEN and WikiEvents. Compares unconstrained vs constrained-label prompting across trigger detection, type prediction, and argument extraction. A100 GPU inference via Hugging Face.
- Python
- Qwen2.5-7B
- Hugging Face
- PyTorch
- A100
2026
End-to-end synthetic fintech data product: dbt metrics, CUPED A/B experimentation, activation model, geo-lift referral analysis, pricing intelligence, FastAPI service, and a full GCP deployment path with BigQuery, Cloud Run, and Cloud Monitoring.
- Python
- dbt
- DuckDB
- BigQuery
- Cloud Run
- FastAPI
- Streamlit
- Marimo
- scikit-learn
- synthetic control
- GitHub Actions
- GCP
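For readers unfamiliar with CUPED, the idea is to subtract the part of the experiment metric that a pre-experiment covariate already explains, which lowers variance without shifting the mean. The sketch below is a standard-library illustration under that textbook definition; `cuped_adjust` and the spend figures are hypothetical, and in practice the coefficient is usually estimated on data pooled across both experiment arms.

```python
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted values: y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance between the covariate and the metric.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Hypothetical per-user spend before (pre) and during (post) an experiment.
pre = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
post = [11.0, 13.5, 9.5, 16.0, 11.5, 15.5]
adjusted = cuped_adjust(post, pre)

print("variance before:", round(variance(post), 3))
print("variance after: ", round(variance(adjusted), 3))
```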
2026
Extractive document question-answering pipeline using PDF text extraction, sentence-based chunking, RoBERTa-SQuAD2 inference, and answer evaluation scripts.
- Python
- Transformers
- RoBERTa
- PDF processing
- NLP
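Sentence-based chunking, as named in the pipeline above, can be sketched roughly as follows: split the extracted text at sentence boundaries, then pack whole sentences into chunks up to a size limit. The regex splitter, the `chunk_sentences` name, and the character budget are assumptions for illustration; the actual pipeline may split and size chunks differently.

```python
import re


def chunk_sentences(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars where possible."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk
            current = sentence      # start a new one with this sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "Alpha is first. Beta is second. Gamma is third."
chunks = chunk_sentences(text, max_chars=32)
for chunk in chunks:
    print(repr(chunk))
```

A sentence longer than the budget is kept whole rather than cut, so no answer span is split mid-sentence.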
2026
End-to-end analytics engineering on 4.99M UK Land Registry property transactions. dbt + DuckDB warehouse with staging, intermediate, fact, dimension, and reporting layers, 88 data tests, CI, docs, and a Streamlit dashboard.
- SQL
- dbt
- DuckDB
- Streamlit
- GitHub Actions
- Python
2026
Distributed data mining and ML pipelines on datasets of 1.9M to 20M records, run on the University of Sheffield Stanage HPC cluster. Covers web log mining, traffic prediction, HIGGS classification, and MovieLens recommendations.
- PySpark
- Python
- Slurm
- HPC
- Spark MLlib
2025
Classical ML benchmark on FBANK speech features. Improved speed-classification accuracy from 79.2% to 86.6% via feature standardisation and kNN tuning. Compared kNN, Logistic Regression, Linear SVM, and Random Forest, with detailed error analysis.
- Python
- scikit-learn
- NumPy
- FBANK
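To show why feature standardisation can matter for kNN, here is a toy standard-library sketch (not the project's scikit-learn code, and not its data). The feature values, labels, and helper names are hypothetical: one feature separates the classes but has a small scale, while a noisy feature with a large scale dominates raw Euclidean distance.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev


def standardise(train, queries):
    """Scale every feature to zero mean and unit variance using train statistics."""
    columns = list(zip(*train))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against constant features

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

    return scale(train), scale(queries)


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical data: feature 0 is informative, feature 1 is large-scale noise.
train_x = [[0.1, 100], [0.2, 900], [0.15, 500], [0.9, 120], [0.8, 880], [0.85, 480]]
train_y = ["slow", "slow", "slow", "fast", "fast", "fast"]
query = [0.88, 510]

raw_prediction = knn_predict(train_x, train_y, query)
scaled_train, scaled_queries = standardise(train_x, [query])
scaled_prediction = knn_predict(scaled_train, train_y, scaled_queries[0])
print("raw:", raw_prediction, "| standardised:", scaled_prediction)
```

Scaling statistics come from the training set only, so the query is transformed without leaking information from it.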