This article gathers fundamental speech processing papers, with a focus on speaker verification. I will add summaries of most of them over time.
LibriSpeech: A fundamental English corpus based on audiobook recordings, used for text-independent speaker recognition.
Speakers in the Wild (SITW): A large hand-annotated database recorded in real-world conditions for text-independent speaker recognition.
VoxCeleb 1: A large open-source dataset extracted from YouTube using computer vision techniques, for speaker recognition and speaker diarization.
VoxCeleb 2: An even larger corpus extracted with an improved pipeline.
RSR2015: A text-dependent speaker recognition database of English speech recorded in Singapore, using multiple pass-phrases.
Other well-known databases include the NIST Speaker Recognition Evaluation (SRE) data and the proprietary “Ok Google” speaker verification database.
Speaker Verification Fundamentals
SVM GMM-Supervector Speaker Verification: A method relying on GMM-supervectors and SVM to classify speakers in Speaker Verification tasks.
Probabilistic Linear Discriminant Analysis: A key element used for dimension reduction and speaker classification.
Front-End Factor Analysis For Speaker Verification: The paper describing the i-vector feature extraction.
X-vectors: Robust DNN Embeddings for Speaker Recognition: Applies a time-delay neural network (TDNN) and data augmentation to create robust embeddings called x-vectors.
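One step of the x-vector architecture that is easy to illustrate is statistics pooling: the frame-level outputs of the TDNN are summarised by their mean and standard deviation over time, so that utterances of any length map to a fixed-size vector. The snippet below is only a minimal NumPy sketch of that pooling step, not the authors' implementation; the function name `statistics_pooling` and the `(frames, features)` array layout are assumptions made for this example.

```python
import numpy as np

def statistics_pooling(frame_features):
    """Pool a (num_frames, feat_dim) array into a single (2 * feat_dim,) vector.

    Hypothetical helper: concatenates the per-dimension mean and standard
    deviation computed over the time axis.
    """
    frames = np.asarray(frame_features, dtype=float)
    mean = frames.mean(axis=0)  # average over time
    std = frames.std(axis=0)    # population standard deviation over time
    return np.concatenate([mean, std])

# Utterances of different lengths end up with the same pooled dimension.
short_utt = np.random.default_rng(0).normal(size=(50, 8))
long_utt = np.random.default_rng(1).normal(size=(400, 8))
print(statistics_pooling(short_utt).shape, statistics_pooling(long_utt).shape)
```

In the full system the pooled vector is fed to further segment-level layers, and the embedding is read from one of those layers rather than from the pooling output itself.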
Text-dependent Speaker Recognition
Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification: A neural network approach to feature extraction that produces an embedding called the d-vector.
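To make the d-vector idea concrete: the activations of the last hidden layer are averaged over the frames of an utterance, and a test utterance is compared with the enrolled speaker model using cosine similarity. The following is a rough sketch under simplifying assumptions. The network itself is left out, so the inputs stand in for already-computed hidden-layer activations, and the helper names and the `threshold` value are hypothetical.

```python
import numpy as np

def d_vector(frame_activations):
    """Average L2-normalised per-frame activations of one utterance."""
    frames = np.asarray(frame_activations, dtype=float)
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    frames = frames / np.maximum(norms, 1e-12)  # avoid division by zero
    return frames.mean(axis=0)

def cosine_score(a, b):
    """Cosine similarity between two embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll_utts, test_utt, threshold=0.5):
    """Score a test utterance against a speaker enrolled from several utterances.

    The threshold is an arbitrary placeholder; in practice it is tuned on
    development data.
    """
    speaker_model = np.mean([d_vector(u) for u in enroll_utts], axis=0)
    score = cosine_score(speaker_model, d_vector(test_utt))
    return score, score >= threshold
```

The speaker model here is simply the mean of the enrolment d-vectors, which keeps the enrolment step cheap.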
Text-independent Speaker Recognition
Deep Neural Network Embeddings for Text-Independent Speaker Verification: Learning speaker embeddings with a DNN and scoring them with a PLDA back-end. A building block of the x-vector.
Application-Independent Evaluation of Speaker Recognition Systems: A summary of the metrics used to evaluate speaker recognition systems, including false alarms, misses, the DET plot, the Equal Error Rate (EER) and the Detection Cost Function (DCF).
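The metrics listed above can be computed from two lists of trial scores, one for target (same-speaker) trials and one for non-target trials. Here is a minimal NumPy sketch, assuming that higher scores mean "more likely the same speaker". The function names and the default values of `p_target`, `c_miss` and `c_fa` are illustrative assumptions, and the cost computed is the unnormalised form C_miss * P_target * P_miss + C_fa * (1 - P_target) * P_fa.

```python
import numpy as np

def error_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every candidate threshold."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.append(np.unique(np.concatenate([tar, non])), np.inf)
    # A trial is accepted when its score is >= threshold.
    p_miss = np.array([(tar < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    return thresholds, p_miss, p_fa

def equal_error_rate(target_scores, nontarget_scores):
    """Error rate at the threshold where misses and false alarms are closest."""
    _, p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    idx = np.argmin(np.abs(p_miss - p_fa))
    return float((p_miss[idx] + p_fa[idx]) / 2)

def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum of the (unnormalised) detection cost over all thresholds."""
    _, p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    return float(dcf.min())

targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
print(min_dcf(targets, nontargets, p_target=0.5))
```

Plotting `p_miss` against `p_fa` on normal-deviate axes gives the DET plot. The EER and the minimum DCF are single points read off that curve.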