[ 08 ] · 2026 · Solo project · 3 min read

Responsible Neobank Growth

A synthetic neobank whose backend events misbehave on purpose — late, duplicated, reversed, schema-evolving — generated against a known-truth manifest so a governed dbt warehouse can be checked rather than trusted. On top sit the responsible-growth consumers: experimentation (CUPED, SRM, difference-in-differences, synthetic control), a calibrated activation model, and a release-gate that weighs customer-outcome guardrails. Run once on BigQuery: 68 dbt models under 217 data tests and 400 pytest tests, with full-refresh and incremental matching exactly at all six governed interfaces.

Python
dbt
BigQuery
DuckDB
FastAPI
Streamlit
scikit-learn
LookML
GitHub Actions


217 dbt tests (cloud run)
62.7% less compute used
400 pytest tests

What I built

A synthetic neobank whose backend events misbehave on purpose — late, duplicated, reversed, schema-evolving — and a governed dbt warehouse that turns them into trusted Growth and referral-reward interfaces. The events are generated against a known-truth manifest, so the warehouse’s correctness can be checked rather than asserted.

Everything is synthetic. No affiliation with any bank, and no real customer or proprietary data; Monzo’s public engineering writing shaped which problems I picked, not how any of it is built.

Why known truth

Most analytics demos start from clean data and treat growth as the only goal. A real neobank has neither luxury: events arrive late, duplicated, corrected and reversed, schemas change under you, and a bad growth decision can leave a vulnerable customer worse off. So the source is built with the answer known in advance — every duplicate, late arrival, reversal, malformed payload and missing posting is injected against a manifest. That is what makes the incremental-versus-full comparison and the reward reconciliation mean anything: there is a fixed truth to check against, not a plausible-looking output to trust.
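A minimal sketch of the known-truth idea, not the project's generator: faults are injected into clean events while a manifest records exactly what was injected, so a downstream step can be checked against that record. The names `Manifest`, `inject_faults` and `dedupe`, the two fault types shown, and the rates are all hypothetical and chosen for illustration only.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Manifest:
    # Hypothetical structure: event_id -> list of injected fault labels.
    faults: dict = field(default_factory=dict)

    def record(self, event_id, fault):
        self.faults.setdefault(event_id, []).append(fault)


def inject_faults(events, rng, dup_rate=0.1, late_rate=0.1):
    """Return (delivered_events, manifest) for a list of clean event dicts."""
    manifest = Manifest()
    delivered = []
    for event in events:
        out = dict(event)
        if rng.random() < late_rate:
            # Late arrival: delivery time lags the event time.
            out["delivered_at"] = out["occurred_at"] + rng.randint(3600, 86400)
            manifest.record(out["event_id"], "late")
        else:
            out["delivered_at"] = out["occurred_at"]
        delivered.append(out)
        if rng.random() < dup_rate:
            # Duplicate: the same event is delivered a second time.
            delivered.append(dict(out))
            manifest.record(out["event_id"], "duplicate")
    return delivered, manifest


def dedupe(delivered):
    """Stand-in for a warehouse step: keep the first delivery per event_id."""
    seen, kept = set(), []
    for event in delivered:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            kept.append(event)
    return kept


rng = random.Random(7)
clean = [{"event_id": i, "occurred_at": 1_700_000_000 + i} for i in range(200)]
delivered, manifest = inject_faults(clean, rng)

expected_dupes = sum("duplicate" in f for f in manifest.faults.values())
removed = len(delivered) - len(dedupe(delivered))
# The check is against the manifest, not against a plausible-looking count.
assert removed == expected_dupes
```

The point of the design is the final assertion: the number of rows the dedupe step removes is compared with what the manifest says was injected, rather than judged by eye.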

What holds up

The numbers below come from the committed run artefacts and the dbt manifest:

The four-layer warehouse ran once on BigQuery: 68 dbt models, 217 data tests and 4 unit tests green, alongside a 400-test pytest suite.
Full-refresh and incremental builds matched exactly at every governed interface, across all three phases.
The cost result is measured and mixed, and I report it that way: incremental builds billed 1.95% more bytes but used 62.7% less compute, and partitioning cut one query’s scan by a factor of 523.9, measured over 569k deliveries. The per-run spend figures the site used to quote are no longer published: the benchmark artefacts behind them were removed from the public repo, and a number whose evidence has gone is a number I will not keep asserting.
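One way an exact-match claim between full-refresh and incremental builds could be checked is by comparing row fingerprints as multisets, so that row order is ignored but duplicates still count. This is a sketch under assumptions, not the repo's comparison: `row_fingerprint`, `table_diff` and the example columns are hypothetical, and rows are plain dicts rather than BigQuery results.

```python
import hashlib
from collections import Counter


def row_fingerprint(row, columns):
    """Hash the chosen columns of one row into a stable hex digest."""
    payload = "|".join(repr(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def table_diff(full_rows, incremental_rows, columns):
    """Count rows present in one build but not the other (multiset diff)."""
    full = Counter(row_fingerprint(r, columns) for r in full_rows)
    inc = Counter(row_fingerprint(r, columns) for r in incremental_rows)
    return {
        "only_in_full": sum((full - inc).values()),
        "only_in_incremental": sum((inc - full).values()),
    }


columns = ["customer_id", "activated"]
full = [
    {"customer_id": 1, "activated": True},
    {"customer_id": 2, "activated": False},
]
# Same rows in a different order: the builds should count as matching.
incremental = list(reversed(full))
diff = table_diff(full, incremental, columns)
assert diff == {"only_in_full": 0, "only_in_incremental": 0}
```

Using a `Counter` rather than a set matters here: an incremental build that double-inserts a row would pass a set comparison but fail this one.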

Governed interfaces

The downstream work reads four governed interfaces, enforced against the real dbt manifest, with reconciliation covering six. Three are described here:

growth_acquisition — where applicants move or drop between application, approval and funded activation.
referral_economics — whether referrals bring in incremental activated customers at a reward cost worth paying.
reward_reconciliation — which expected rewards are missing, duplicated, mismatched, stale or wrongly reversed.
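The reward_reconciliation categories above can be sketched as a small classifier over expected rewards and observed postings. Everything here is an assumption for illustration: the function `reconcile`, the field names (`amount_pence`, `due_day`, `posted_day`, `reversed`), the seven-day staleness threshold and the order in which rules are applied are hypothetical, not the warehouse's logic.

```python
from collections import defaultdict


def reconcile(expected, postings, max_delay_days=7):
    """Label each expected reward that does not reconcile cleanly.

    expected: referral_id -> {"amount_pence", "due_day"} for rewards that
              should stand (so any reversal of one is treated as wrong).
    postings: list of {"referral_id", "amount_pence", "posted_day", "reversed"}.
    """
    by_ref = defaultdict(list)
    for posting in postings:
        by_ref[posting["referral_id"]].append(posting)

    issues = {}
    for ref_id, exp in expected.items():
        rows = by_ref.get(ref_id, [])
        if not rows:
            issues[ref_id] = "missing"
        elif len(rows) > 1:
            issues[ref_id] = "duplicated"
        elif rows[0]["reversed"]:
            issues[ref_id] = "wrongly_reversed"
        elif rows[0]["amount_pence"] != exp["amount_pence"]:
            issues[ref_id] = "mismatched"
        elif rows[0]["posted_day"] - exp["due_day"] > max_delay_days:
            issues[ref_id] = "stale"
    return issues


expected = {f"r{i}": {"amount_pence": 500, "due_day": 10} for i in range(1, 7)}
postings = [
    {"referral_id": "r1", "amount_pence": 500, "posted_day": 11, "reversed": False},
    {"referral_id": "r2", "amount_pence": 500, "posted_day": 11, "reversed": False},
    {"referral_id": "r2", "amount_pence": 500, "posted_day": 12, "reversed": False},
    {"referral_id": "r3", "amount_pence": 300, "posted_day": 11, "reversed": False},
    {"referral_id": "r4", "amount_pence": 500, "posted_day": 11, "reversed": True},
    {"referral_id": "r5", "amount_pence": 500, "posted_day": 30, "reversed": False},
]
issues = reconcile(expected, postings)
```

In this example `r1` reconciles cleanly and is absent from `issues`, while `r6` has no posting at all and is labelled missing.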

The responsible-growth work

The interfaces feed the analytics, not the other way round:

Experimentation: CUPED, SRM, heterogeneous effects, difference-in-differences and synthetic control, with the Welch/CUPED/SRM estimators running unchanged on the governed interface.
A calibrated activation model — isotonic scoring with a model card — behind a FastAPI service.
Fairness, wellbeing, inclusion and protection modules, plus a fair-value pricing check.
A responsible release-gate that resolves a change to ship / limited_rollout / experiment_only / needs_human_review / block on evidence and customer-outcome guardrails.
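Two of the experimentation checks named above, SRM and CUPED, can be sketched in plain Python. This is a textbook-style illustration rather than the project's estimators: `srm_p_value` and `cuped_adjust` are hypothetical names, the SRM check assumes two arms and a chi-square test with one degree of freedom, and the CUPED coefficient is estimated from the pooled sample with a single pre-experiment covariate.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Sample ratio mismatch: chi-square goodness-of-fit p-value for two arms."""
    total = n_control + n_treatment
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # For one degree of freedom the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def cuped_adjust(y, x):
    """Return y adjusted by a pre-experiment covariate x (CUPED).

    theta = cov(x, y) / var(x); adjusted y = y - theta * (x - mean(x)).
    The adjustment keeps the mean of y and removes variance explained by x.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    theta = cov_xy / var_x
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# A balanced split gives no evidence of mismatch; a 5000 / 5400 split
# under an intended 50/50 allocation is flagged.
balanced = srm_p_value(5000, 5000)
skewed = srm_p_value(5000, 5400)

# Outcome strongly driven by the pre-period covariate, plus small noise.
x = [float(i) for i in range(20)]
y = [2.0 * xi + (1.0 if i % 2 else -1.0) for i, xi in enumerate(x)]
y_adjusted = cuped_adjust(y, x)
```

A common convention is to treat a very small SRM p-value as a reason to stop and investigate assignment before reading any effect estimate; the exact threshold is a policy choice and is not specified in the source.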

Looker — written, not validated

There is a full LookML project: a model, four Explores, three dashboards, and Assert tests, all against the governed BigQuery interface. It has never run in a Looker instance — the trial signup returned a sales-contact page with no instance — so I claim no Looker experience and no validated LookML.

Honest limits

The data is engineered for coverage, not calibrated to any bank: activation rates, £CLV and fairness-gap sizes are illustrative magnitudes, not forecasts. The synthetic event dataset is published on Hugging Face under CC-BY-4.0, verified from a clean download. A real regulated deployment would still need formal data governance, keyless CI/CD, a model registry, and Consumer Duty and model-risk controls before any live customer decisioning — gaps the repo documents rather than hides.