This article gathers fundamental Speech Processing papers, with a focus on Speaker Verification. I will add summaries of most of them over time.

Databases

LibriSpeech: A fundamental English database based on audiobook recordings for text-independent speaker recognition.

Speakers in the Wild: A large hand-annotated database recorded in real conditions for text-independent speaker recognition.

VoxCeleb 1: A large amount of open-source data extracted from YouTube using Computer Vision techniques, for speaker recognition and speaker diarization.

VoxCeleb 2: An even larger corpus extracted with an improved pipeline.

RSR2015: A text-dependent speaker recognition database using multiple pass-phrases, in English, recorded in Singapore.

Other famous databases include those of the NIST Speaker Recognition Evaluation challenges and the proprietary “Ok Google” speaker verification database.

Speaker Verification Fundamentals

SVM GMM-Supervector Speaker Verification: A method that feeds GMM supervectors to an SVM to classify speakers in Speaker Verification tasks.

Probabilistic Linear Discriminant Analysis: A key component used for dimensionality reduction and speaker classification.

Front-End Factor Analysis For Speaker Verification: The paper describing i-vector feature extraction.

X-Vectors: Robust DNN Embeddings for Speaker Recognition: Applies time-delay neural networks (TDNN) and data augmentation to create robust embeddings called x-vectors.

Text-dependent Speaker Recognition

Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification: A neural network approach to feature extraction that produces the d-vector.
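
To make the d-vector idea concrete, here is a minimal sketch of how such embeddings are typically used at verification time: frame-level hidden-layer activations are averaged into one vector per utterance, enrollment utterances are averaged into a speaker model, and a test utterance is scored with cosine similarity. The function names, the toy 2-dimensional activations and the 0.5 threshold are assumptions made for illustration only. The sketch also leaves out the network itself and details such as per-frame normalization.

```python
import math


def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enrollment_utterances, test_frames, threshold=0.5):
    """Score a test utterance against a speaker model.

    enrollment_utterances: list of utterances, each a list of per-frame
        activation vectors (assumed to come from a trained network).
    test_frames: per-frame activation vectors of the test utterance.
    threshold: hypothetical decision threshold, normally tuned on held-out data.
    """
    # One d-vector per enrollment utterance, then one model per speaker.
    utterance_dvectors = [average_vectors(u) for u in enrollment_utterances]
    speaker_model = average_vectors(utterance_dvectors)
    test_dvector = average_vectors(test_frames)
    score = cosine_similarity(speaker_model, test_dvector)
    return score, score >= threshold
```

In practice the threshold would be chosen from a score distribution on development trials rather than fixed by hand.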

Text-independent Speaker Recognition

Deep Neural Network Embeddings for Text-Independent Speaker Verification: Learning speaker embeddings with a DNN and scoring them with a PLDA back end. Building block of the x-vector.

Evaluation Metrics

Application-Independent Evaluation of Speaker Recognition Systems: A summary of the different metrics used to evaluate speaker recognition systems, including false alarms, misses, the DET plot, the Equal Error Rate (EER) and the Detection Cost Function.
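
As a rough illustration of these metrics, the sketch below computes miss and false-alarm rates at a threshold, approximates the EER by sweeping every observed score as a candidate threshold, and evaluates a detection cost of the usual weighted form. The function names and the default cost parameters (`p_target`, `c_miss`, `c_fa`) are assumptions chosen for the example, not values taken from the paper. Real toolkits usually interpolate the DET curve instead of picking the closest observed threshold.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (miss rate, false-alarm rate) when accepting scores >= threshold."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER and return (eer, threshold).

    Every observed score is tried as a threshold, and the one where the
    miss and false-alarm rates are closest is kept.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss, p_fa = error_rates(target_scores, nontarget_scores, threshold)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2, threshold)
    return best[1], best[2]


def detection_cost(p_miss, p_fa, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Weighted sum of the two error rates (parameter values are illustrative)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
```

For example, with target scores `[0.9, 0.8, 0.7, 0.4]` and non-target scores `[0.6, 0.3, 0.2, 0.1]`, the sweep settles on a threshold of 0.6, where one target trial in four is missed and one non-target trial in four is falsely accepted.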