This article assumes you have read the first article on Speaker Verification using the GMM-UBM method.

GMM super-vectors

The SVM-based method, like the GMM-UBM method, relies on GMM parameters, but in another format. We must compute GMM super-vectors: the concatenation of the GMM mean vectors into a single vector. We concatenate the mean vector of each mixture component. Instead of having 512 Gaussian components of dimension 26 each, we have a single vector of size $ 512 \times 26 = 13312 $.
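
As a minimal sketch, here is one way to build such a super-vector, assuming the speaker models are scikit-learn `GaussianMixture` objects; the `adapted_gmm` name is a placeholder for the MAP-adapted speaker model from the GMM-UBM article.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(gmm: GaussianMixture) -> np.ndarray:
    """Stack the GMM mean vectors into a single super-vector."""
    # gmm.means_ has shape (n_components, n_features), e.g. (512, 26);
    # flattening it yields a vector of 512 * 26 = 13312 values.
    return gmm.means_.flatten()

# Hypothetical usage: `adapted_gmm` is a speaker's MAP-adapted GMM.
# supervector = gmm_supervector(adapted_gmm)  # shape (13312,)
```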

SVM classification

The Support Vector Machine (SVM) algorithm learns a discriminative boundary between two classes that maximizes the margin. It can leverage a non-linear kernel to map the data into a high-dimensional space in which it becomes linearly separable.

The two classes to distinguish are simply:

  • the target speaker
  • the impostor/background/population

The discriminative function of the SVM is given by:

$$ f(x) = \sum_{i=1}^{N} \alpha_i y_i \, K(x, x_i) + d $$
Where:

  • $ y_i $ is the label of support vector $ x_i $, either +1 or -1
  • $ x_i $ are the support vectors
  • $ \alpha_i $ are the corresponding weights
  • $ K(\cdot, \cdot) $ is the kernel function
  • $ d $ is a bias term
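
As an illustration (not from the original article), this decision function can be evaluated directly once the support vectors, weights and bias are known; a plain linear kernel is used here only for brevity.

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, d, kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x, x_i) + d."""
    return sum(a * y * kernel(x, sv)
               for a, y, sv in zip(alphas, labels, support_vectors)) + d

# Example kernel: a simple linear kernel K(u, v) = u . v
def linear_kernel(u, v):
    return float(np.dot(u, v))
```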

And that’s it! We just need to train the SVM model on GMM super-vectors with positive and negative labels. Applying an SVM with a non-linear kernel will identify the discriminative boundary.
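
A minimal training sketch, assuming we already have the target speaker's super-vectors and a matrix of background/impostor super-vectors (both names are placeholders), could look like this with scikit-learn:

```python
import numpy as np
from sklearn.svm import SVC

def train_speaker_svm(target_supervectors, impostor_supervectors):
    """Fit an SVM separating the target speaker (+1) from impostors (-1)."""
    X = np.vstack([target_supervectors, impostor_supervectors])
    y = np.concatenate([np.ones(len(target_supervectors)),
                        -np.ones(len(impostor_supervectors))])
    svm = SVC(kernel="rbf")  # non-linear kernel
    svm.fit(X, y)
    return svm
```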

Prediction is straightforward: we just need to extract the super-vector of the test utterance and run it through the trained SVM.
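
A possible verification step, reusing `gmm_supervector` and the trained SVM from the sketches above (the decision threshold is an assumption to be tuned on development data):

```python
def verify(speaker_svm, test_supervector, threshold=0.0):
    """Accept the identity claim if the SVM score exceeds the threshold."""
    score = speaker_svm.decision_function(test_supervector.reshape(1, -1))[0]
    return score > threshold
```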

