About
Cheng-Yuan (Ross) King
MSc Artificial Intelligence · Empirical AI Safety & Evaluation
I'm an MSc Artificial Intelligence student at the University of Sheffield, with a Computer Science background from Queen's University Belfast. My focus is empirical AI safety and evaluation — designing and running experiments that measure model behaviour, failure modes, robustness, and safe refusal reliability.
Recent work centres on three AI safety and evaluation projects. The first is a synthetic internal AI agent evaluation lab covering RAG retrieval benchmarks, structured extraction, red-team safety testing, safe refusal evaluation, safety classifiers, tool governance, and OTel-style observability, with a public Streamlit dashboard (350 golden cases, 60 red-team cases, release gate reporting). The second is an LLM red-team harness that measures attack success rate across 12 evaluation cells, with cross-judge validation and bootstrap confidence intervals. The third is an LLM event extraction baseline comparing unconstrained and constrained-label prompting on MAVEN and WikiEvents, using Qwen2.5-7B on an A100 GPU. Supporting production-systems evidence comes from a synthetic fintech analytics platform with a dbt/BigQuery warehouse, an activation model, CUPED experimentation, and a GCP/Cloud Run deployment path.
I prefer building end-to-end systems over isolated notebooks. I care about reproducibility, honest evaluation, and clear documentation of limitations: understanding the failure modes matters as much as understanding the wins. When something scores well, I want to know whether the benchmark is actually well-posed, which is why cross-judge validation, benchmark transparency, and guardrail metrics show up across most of my projects.
I'm currently looking for empirical AI safety research, AI evaluation, and applied AI/ML roles in the UK from October 2026, with particular interest in measuring model behaviour, robustness, and safety. I have a UK Graduate Visa route lined up, so no sponsorship is needed for two years post-graduation.
The fastest way to reach me is by email, or through LinkedIn.
Education
MSc Artificial Intelligence
Current · University of Sheffield · Sheffield, UK
Sep 2025 – Sep 2026
- Core modules: Scalable Machine Learning, Natural Language Processing, Parallel Computing with GPUs, Machine Learning, Data Science, Text Processing
- Focus on applied AI systems, GenAI evaluation, NLP pipelines, scalable computing, and rigorous model evaluation
BSc Computer Science
Queen's University Belfast · Belfast, UK
Sep 2021 – Jun 2024
- Data Structures and Algorithms, Software Engineering, Advanced Computer Architecture, Cloud Computing
Toolbox
- AI Safety & Eval
  - RAG evaluation
  - LLM evaluation
  - Prompt engineering
  - Structured extraction
  - Red-team testing
  - Guardrail checks
  - Hugging Face Transformers
  - Safe refusal evaluation
  - Adversarial testing
  - Safety classifier evaluation
  - OpenTelemetry / OTel tracing
- AI & machine learning
  - PyTorch
  - scikit-learn
  - PySpark
  - Feature engineering
  - Model evaluation
  - Calibration
  - A/B testing & CUPED
- Data & cloud
  - dbt
  - DuckDB
  - BigQuery
  - GCP
  - Cloud Run
  - Cloud Storage
  - SQL
  - PostgreSQL
- Engineering
  - Python
  - FastAPI
  - Streamlit
  - Pydantic
  - Docker
  - GitHub Actions
  - Monitoring
- Analytics
  - Synthetic control
  - Causal inference
  - Customer segmentation
  - Dashboards
  - pandas
  - Statistical analysis
- Languages
  - English (fluent)
  - Mandarin (native)
  - Japanese (JLPT N1)