Selected work

Projects

A few representative builds. Filter by stack to narrow down.

  • 2026

    Synthetic evaluation lab for AI-agent reliability: 350 golden cases, 60 red-team cases, RAG evaluation, safe refusal, safety classifiers, prevalence estimation, human-review simulation, mitigation impact, release gate reporting, FastAPI, OpenTelemetry (OTel) tracing, and CI.

    • Python
    • FastAPI
    • Streamlit
    • Pydantic
    • pytest
    • ruff
    • Docker
    • GitHub Actions
    • GitHub Pages
    View source
  • 2026

    Reproducible benchmark measuring how published adversarial prompts perform against 2026-era LLMs and whether prompt-only defences move the needle — with cross-judge validation and bootstrap confidence intervals.

    • Python
    • Claude Sonnet 4.6
    • Llama 3.1 8B
    • Inspect AI
    • GitHub Actions
    • pytest
    • ruff
    • mypy
    View source
  • 2026

    Zero-shot event extraction with Qwen2.5-7B-Instruct on MAVEN and WikiEvents. Compares unconstrained vs constrained-label prompting across trigger detection, type prediction, and argument extraction. A100 GPU inference via Hugging Face.

    • Python
    • Qwen2.5-7B
    • Hugging Face
    • PyTorch
    • A100
    View source
  • 2026

    End-to-end synthetic fintech data product — dbt metrics, CUPED A/B experimentation, activation model, geo-lift referral analysis, pricing intelligence, FastAPI service, and a full GCP deployment path with BigQuery, Cloud Run, and Cloud Monitoring.

    • Python
    • dbt
    • DuckDB
    • BigQuery
    • Cloud Run
    • FastAPI
    • Streamlit
    • Marimo
    • scikit-learn
    • synthetic control
    • GitHub Actions
    • GCP
    View source
  • 2026

    Extractive document question-answering pipeline using PDF text extraction, sentence-based chunking, RoBERTa-SQuAD2 inference, and answer evaluation scripts.

    • Python
    • Transformers
    • RoBERTa
    • PDF processing
    • NLP
    View source
  • 2026

    End-to-end analytics engineering on 4.99M UK Land Registry property transactions. dbt + DuckDB warehouse with staging, intermediate, fact, dimension, and reporting layers, 88 data tests, CI, docs, and a Streamlit dashboard.

    • SQL
    • dbt
    • DuckDB
    • Streamlit
    • GitHub Actions
    • Python
    View source
  • 2026

    Distributed data mining and ML pipelines on datasets of 1.9M to 20M records, run on the University of Sheffield Stanage HPC cluster. Covers web log mining, traffic prediction, HIGGS classification, and MovieLens recommendations.

    • PySpark
    • Python
    • Slurm
    • HPC
    • Spark MLlib
    View source
  • 2025

    Classical ML benchmark on FBANK speech features. Improved speed-classification accuracy from 79.2% to 86.6% via feature standardisation and kNN tuning. Compared kNN, Logistic Regression, Linear SVM, and Random Forest, with detailed error analysis.

    • Python
    • scikit-learn
    • NumPy
    • FBANK
    View source
  • 2026

    A free, fully client-side file converter — image and PDF tools that run entirely in the browser via WebAssembly. Files never leave your device. Live at fromatob.app.

    • TypeScript
    • Astro 5
    • React
    • Tailwind CSS v4
    • WebAssembly
    • pdf-lib
    • pdf.js
    • mozjpeg
    • Cloudflare Workers
    View source