This article assumes you have read the first article, on Speaker Verification using the GMM-UBM method.
SVM-based methods, like the GMM-UBM method, rely on GMMs, but in a different format. We must compute SVM super-vectors: the concatenation of all the GMM mean vectors into a single vector, one mean vector per mixture component. Instead of having 512 Gaussian components of dimension 26 each, we have a single vector of size $ 512 \times 26 = 13312 $.
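As a minimal sketch of this concatenation, the snippet below flattens a matrix of GMM means into one super-vector. The means are random placeholders and the helper name `to_supervector` is hypothetical. In practice the means would come from the GMM adapted to one utterance.

```python
import numpy as np

N_COMPONENTS = 512  # Gaussian components in the GMM
FEATURE_DIM = 26    # dimension of each mean vector


def to_supervector(means: np.ndarray) -> np.ndarray:
    """Concatenate the per-component mean vectors into one 1-D vector."""
    # Row-major flattening keeps each component's 26 values contiguous.
    return means.reshape(-1)


# Placeholder values standing in for the adapted GMM means (assumption).
rng = np.random.default_rng(0)
means = rng.normal(size=(N_COMPONENTS, FEATURE_DIM))

supervector = to_supervector(means)
print(supervector.shape)  # (13312,)
```

The first 26 entries of `supervector` are the mean of the first component, the next 26 the mean of the second, and so on.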
The Support Vector Machine (SVM) algorithm learns a decision boundary between two classes that maximizes the margin. It can use a non-linear kernel to implicitly map the data into a high-dimensional space in which the classes become linearly separable.
The two classes to distinguish between are simply:
- the target speaker
- the impostor/background/population
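The discriminant function of the SVM is given by:

$$ f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + d $$

where:
- $ K $ is the kernel function and $ N $ is the number of support vectors
- $ y_i $ is the ground truth for the output value, either 1 or -1
- $ x_i $ are the support vectors
- $ \alpha_i $ are the corresponding weights
- $ d $ is a bias term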
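To make the role of each term concrete, here is a small sketch that evaluates a kernel SVM decision function by hand. The RBF kernel, the `gamma` value, and the toy support vectors, weights and bias are all assumptions chosen for illustration. A trained SVM would provide the real ones.

```python
import numpy as np


def rbf_kernel(x, x_i, gamma=0.5):
    """Gaussian (RBF) kernel, one common non-linear choice (assumption)."""
    return np.exp(-gamma * np.sum((x - x_i) ** 2))


def decision_function(x, support_vectors, alphas, labels, d, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x, x_i) + d."""
    total = 0.0
    for alpha_i, y_i, x_i in zip(alphas, labels, support_vectors):
        total += alpha_i * y_i * kernel(x, x_i)
    return total + d


# Toy 2-D values (assumption): one support vector per class.
support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0]])
labels = np.array([1, -1])      # y_i: target = 1, impostor = -1
alphas = np.array([1.0, 1.0])   # alpha_i
d = 0.0                         # bias term

score_target = decision_function(
    np.array([0.9, 1.1]), support_vectors, alphas, labels, d)
score_impostor = decision_function(
    np.array([-1.2, -0.8]), support_vectors, alphas, labels, d)
score_origin = decision_function(
    np.array([0.0, 0.0]), support_vectors, alphas, labels, d)

# The sign of the score gives the predicted class.
print(score_target > 0, score_impostor < 0)
```

A point close to the positive support vector gets a positive score, a point close to the negative one gets a negative score, and a point equidistant from both sits on the boundary.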
And that's it! We just need to train the SVM model on GMM super-vectors with positive labels (target speaker) and negative labels (impostors). Fitting an SVM with a non-linear kernel will identify the decision boundary.
Prediction is straightforward: we extract the super-vector of the test utterance and pass it to the trained SVM.
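A hedged end-to-end sketch of training and prediction, using scikit-learn's `SVC`, could look like the following. The super-vectors here are synthetic Gaussian data, not real speech, and the GMM is shrunk to 8 components so the example runs quickly. The class shift, sample counts and the RBF kernel are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy configuration (assumption): 8 components instead of 512 for speed.
N_COMPONENTS = 8
FEATURE_DIM = 26
DIM = N_COMPONENTS * FEATURE_DIM  # super-vector size

rng = np.random.default_rng(0)

# Synthetic super-vectors: target speaker shifted away from the background.
target = rng.normal(loc=1.0, size=(20, DIM))
background = rng.normal(loc=0.0, size=(200, DIM))

X = np.vstack([target, background])
y = np.concatenate([np.ones(20, dtype=int), -np.ones(200, dtype=int)])

# Non-linear (RBF) kernel SVM trained on labelled super-vectors.
svm = SVC(kernel="rbf", gamma="scale")
svm.fit(X, y)

# Prediction: score the super-vector of an unseen utterance.
test_target = rng.normal(loc=1.0, size=(1, DIM))
test_impostor = rng.normal(loc=0.0, size=(1, DIM))

score_t = svm.decision_function(test_target)[0]
score_i = svm.decision_function(test_impostor)[0]

# A positive score means "target speaker", a negative one "impostor".
print(score_t > 0, score_i < 0)
```

In a verification setting, the raw `decision_function` score is usually more useful than the hard label from `predict`, since it can be compared against a threshold tuned for the desired trade-off between false acceptances and false rejections.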