2025 · Solo project · 1 min read
Speech Speed & Tempo Classification
Classical ML benchmark on FBANK speech features. Improved speed-classification accuracy from 79.2% to 86.6% via feature standardisation and kNN tuning. Compared kNN, Logistic Regression, Linear SVM, and Random Forest with detailed error analysis.
- Python
- scikit-learn
- NumPy
- FBANK
The task
Classify whether a speech recording has been altered in either speed (playback rate) or tempo (perceived speech pace, holding pitch fixed), using FBANK acoustic features.
What I tried
Four classical classifiers, head-to-head:
- k-Nearest Neighbours
- Logistic Regression
- Linear SVM
- Random Forest
Each evaluated with cross-validated accuracy and confusion matrices.
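The comparison can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the project's actual code: the synthetic data from `make_classification` stands in for the real pooled FBANK vectors, and the model settings are defaults rather than the tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for pooled FBANK feature vectors (not the real data).
X, y = make_classification(n_samples=400, n_features=40, random_state=0)

models = {
    "kNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Same protocol for every model: 5-fold cross-validated accuracy.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Running every model through the identical `cross_val_score` protocol is what makes the comparison honest: one split scheme, one metric, no per-model favouritism.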
Result
Speed classification: 79.2% → 86.6% after two changes — feature standardisation (z-scoring per feature) and kNN hyperparameter tuning (k swept over odd integers, distance metric grid).
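The two changes compose naturally as a scikit-learn `Pipeline` searched with `GridSearchCV`. A sketch under the same assumptions as above (synthetic stand-in data; the k range and metric list are illustrative, not the exact grid from the project):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for pooled FBANK feature vectors.
X, y = make_classification(n_samples=400, n_features=40, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),      # z-score each feature
    ("knn", KNeighborsClassifier()),
])

param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11],  # odd k avoids voting ties
    "knn__metric": ["euclidean", "manhattan", "cosine"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Putting the scaler inside the pipeline matters: it is refit on each training fold, so no test-fold statistics leak into the standardisation.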
Tempo classification stayed weaker. Error analysis pointed to the cause: the temporal-mean pooling step I used to flatten time-varying features discarded exactly the timing information tempo classification needs. An architectural choice, not a hyperparameter problem.
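The failure mode is easy to demonstrate in NumPy. Below, a random matrix stands in for an FBANK feature sequence (shape: time frames × mel bins, both sizes hypothetical); holding each frame twice as long simulates half-tempo speech with unchanged spectral content:

```python
import numpy as np

# Hypothetical stand-in for an FBANK sequence: (time_frames, n_mel_bins).
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 40))

# A tempo change alters frame timing, not spectral content. Repeating
# each frame simulates the same utterance spoken at half tempo.
slow = np.repeat(frames, 2, axis=0)

# Temporal-mean pooling flattens both to the same 40-dim vector, so a
# classifier fed these vectors literally cannot see the tempo change.
pooled = frames.mean(axis=0)
pooled_slow = slow.mean(axis=0)
print(np.allclose(pooled, pooled_slow))  # True
```

Any pooling that keeps timing information (delta features, frame-rate statistics, or simply the sequence length) would break this degeneracy.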
Why this is on the CV
It's not the flashiest project, but it shows the discipline that matters most for an ML role: pick a tractable benchmark, run multiple models honestly, look at why one wins, and write up the failure modes as carefully as the wins.