2025 · Solo project · 1 min read
Speech Speed & Tempo Classification
Classical ML benchmark on FBANK speech features. Improved speed-classification accuracy from 79.2% to 86.6% via feature standardisation and kNN tuning. Compared kNN, Logistic Regression, Linear SVM, and Random Forest with detailed error analysis.
- Python
- scikit-learn
- NumPy
- FBANK
The task
Classify whether a speech recording has been altered in either speed (playback rate) or tempo (perceived speech pace, holding pitch fixed), using FBANK acoustic features.
What I tried
Four classical classifiers, head-to-head:
- k-Nearest Neighbours
- Logistic Regression
- Linear SVM
- Random Forest
Each evaluated with cross-validated accuracy and confusion matrices.
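The comparison can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the project's actual code: the synthetic data from `make_classification` stands in for the real pooled FBANK vectors, and the model settings are defaults rather than the tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for pooled FBANK feature vectors (not the real data).
X, y = make_classification(n_samples=400, n_features=40, random_state=0)

models = {
    "kNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Same protocol for every model: 5-fold cross-validated accuracy.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Running every model through the identical `cross_val_score` protocol is what makes the comparison honest: one split scheme, one metric, no per-model favouritism.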
Result
Speed classification: 79.2% → 86.6% after two changes — feature standardisation (z-scoring per feature) and kNN hyperparameter tuning (k swept over odd integers, distance metric grid).
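The two changes compose naturally as a scikit-learn `Pipeline` searched with `GridSearchCV`. A sketch under the same assumptions as above (synthetic stand-in data; the k range and metric list are illustrative, not the exact grid from the project):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for pooled FBANK feature vectors.
X, y = make_classification(n_samples=400, n_features=40, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),      # z-score each feature
    ("knn", KNeighborsClassifier()),
])

param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11],  # odd k avoids voting ties
    "knn__metric": ["euclidean", "manhattan", "cosine"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Putting the scaler inside the pipeline matters: it is refit on each training fold, so no test-fold statistics leak into the standardisation.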
Tempo classification stayed weaker. Error analysis pointed to the cause: the temporal-mean pooling step I used to flatten time-varying features discarded exactly the timing information tempo classification needs. An architectural choice, not a hyperparameter problem.
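The failure mode is easy to demonstrate in NumPy. Below, a random matrix stands in for an FBANK feature sequence (shape: time frames × mel bins, both sizes hypothetical); holding each frame twice as long simulates half-tempo speech with unchanged spectral content:

```python
import numpy as np

# Hypothetical stand-in for an FBANK sequence: (time_frames, n_mel_bins).
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 40))

# A tempo change alters frame timing, not spectral content. Repeating
# each frame simulates the same utterance spoken at half tempo.
slow = np.repeat(frames, 2, axis=0)

# Temporal-mean pooling flattens both to the same 40-dim vector, so a
# classifier fed these vectors literally cannot see the tempo change.
pooled = frames.mean(axis=0)
pooled_slow = slow.mean(axis=0)
print(np.allclose(pooled, pooled_slow))  # True
```

Any pooling that keeps timing information (delta features, frame-rate statistics, or simply the sequence length) would break this degeneracy.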
Why this is on the CV
It's not the flashiest project, but it shows the discipline that matters most for an ML role: pick a tractable benchmark, run multiple models honestly, look at why one wins, and write up the failure modes as carefully as the wins.