In March 2020, I started my Ph.D. in Speech Processing at Idiap Research Institute, affiliated with EPFL. I have tried to document the process, whether by explaining technical concepts, writing about specific projects, or describing a typical day.
I’m working on the ROXANNE European project. ROXANNE (Real time network, text, and speaker analytics for combating organized crime) is an EU-funded collaborative research and innovation project. It aims to unmask criminal networks and their members, and to reveal the true identity of perpetrators, by combining speech/language technologies and visual analysis with network analysis.
You can learn more about the ROXANNE Project in the article I wrote on it here.
Below, you’ll find articles I wrote on topics related to my Ph.D.
Fundamental papers in Speaker Verification: A list of fundamental papers in Speaker Verification.
I. Maths and stats
Linear Discriminant Analysis (LDA) and QDA: In this article, we’ll cover the intuition behind LDA, when it should be used, and the maths behind it. We’ll also briefly cover its quadratic variant, QDA.
EM for Gaussian Mixture Models and Hidden Markov Models : 140 detailed and visual slides on GMMs, HMMs and EM.
Joint Factor Analysis (JFA): Coming soon.
II. Speech Processing
Introduction to Kaldi: How to install Kaldi, understand its key features and organization, and get started with it.
Kaldi for speaker verification: An example of how to run Kaldi for speaker verification.
Voice Processing Fundamentals
Introduction to Voice Processing in Python: Summary of the book “Voice Computing with Python” with concepts, code and examples.
Sound Feature Extraction: An overview of the main sound features to extract, with a Python implementation.
Sound Visualization: Dive into spectrograms, chromagrams, tempograms, spectral power density and more…
Speaker Verification Fundamentals
The basics of Speaker Verification: High-level overview of the speaker verification process.
Speaker Verification using Gaussian Mixture Model (GMM-UBM): Diving deeper into the training process of a GMM-UBM model.
Speaker Verification using SVM-based methods: Another method relying on Support Vector Machines for Speaker Verification.
Speaker Verification and i-vectors: Coming soon.
Automatic Speech Recognition
Introduction to Automatic Speech Recognition: What is ASR? What is the pipeline? What is acoustic modeling?
HMM Acoustic Modeling: Introduction to HMM Acoustic Modeling, context-dependent phone models, triphones…
Neural Network Acoustic Modeling: Deep neural networks for acoustic modeling, and an introduction to hybrid HMM-DNN acoustic models.
The decoding graph: Decoding a sequence with Viterbi becomes problematic with large vocabularies. How is the language model used in that case, and how does it improve the search for the best sequence? Learn about the decoding graph and WFSTs.
Speaker Adaptation: How can we handle the mismatch between training and test data in ASR?
Sequence discriminative training: State-of-the-art methods rely on sequence discriminative training criteria such as MMI. What changes?
Multilingual and Low-Resource Speech Recognition: Diving into some methods for handling low-resource languages.
Wav2Vec and Wav2Vec 2.0 tutorials: An in-depth tutorial covering the Wav2Vec and Wav2Vec 2.0 research papers and code.
III. Network analysis
Criminal and social networks are at the core of criminal investigations. More and more data is being collected in investigation cases, and identifying who knows whom and what is being said is crucial.
Introduction to Graphs: What is a graph? Where are graphs used? What are the components of a graph?
Graph Analysis, Erdős-Rényi, Barabási-Albert: In this article, we cover two classic random graph models and describe a first approach to graph analysis.
Graph Algorithms: An exploration of the main graph algorithms and several use cases, presented visually with direct examples in Python.
Graph Learning: How can we handle missing links or missing nodes in graphs?
Graph Embedding: A practical introduction to Graph Embedding with Node2Vec and Graph2Vec.
“Disrupting Resilient criminal networks through data analysis” paper summary: A summary and data exploration of an interesting paper on criminal networks in the Sicilian Mafia.
“Structural Analysis of Criminal Network and Predicting Hidden Links using Machine Learning” paper summary: Summary and discussion of a paper tackling hidden link prediction as a supervised learning problem.
“Social network analysis as a tool for criminal intelligence: understanding its potential from the perspective of intelligence analysts” paper summary: A qualitative review of how Law Enforcement Agencies use Criminal Network Analysis tools, and my personal view on it.
A supervised learning approach to predicting node betweenness centrality in time-varying networks: Can we predict which nodes will be central in the future? An exploratory approach applied to the Enron dataset, with encouraging results.
IV. When Speaker Identification meets graphs
Very few papers link graphs and speaker identification. Phonexia wrote this web article on how to leverage community detection for speaker identification, and it’s a good starting point. Below, I summarize the papers I found on this topic, along with some ideas of my own.
“Leveraging side information for speaker identification with the Enron conversational telephone speech collection” paper summary: A first approach to leveraging the structure of a network to enhance speaker identification on an e-mail and call database.
“Speaker Identification Enhancement using Network Knowledge in Criminal Investigations”: Our contribution to combining graphs and speaker identification. We introduce new metrics to measure the accuracy of a speaker identification system and a new criminal dataset, and we improve overall results. Mael Fabien, Seyyed Saeed Sarfjoo, Petr Motlicek, Srikanth Madikeri
Submitting a first paper to ArXiv: A few details about what might go wrong during your first submission, and how to troubleshoot it.
My Ph.D. timeline: A timeline recap of my Ph.D. process at Idiap.