Ph.D. at Idiap/EPFL on the ROXANNE EU Project

In March 2020, I started my Ph.D. in Speech Processing at Idiap Research Institute, affiliated with EPFL. I have tried to document the process, whether by describing technical concepts, writing about some projects, or describing a typical day…

I’m working on the ROXANNE European project. ROXANNE (Real time network, text, and speaker analytics for combating organized crime) is an EU-funded collaborative research and innovation project. It aims to unmask criminal networks and their members, and to reveal the true identity of perpetrators, by combining speech/language technologies and visual analysis with network analysis.


You can learn more about the ROXANNE Project in the article I wrote on it here.

Below, you’ll find the articles I wrote on topics related to my Ph.D.



Fundamental papers in Speaker Verification: A list of fundamental papers in Speaker Verification.

I. Maths and stats

Linear Discriminant Analysis (LDA) and QDA: In this article, we’ll cover the intuition behind LDA, when it should be used, and the maths behind it. We’ll also quickly cover the quadratic version of LDA.

Probabilistic Linear Discriminant Analysis (PLDA): Coming soon.

EM for Gaussian Mixture Models and Hidden Markov Models: 140 detailed and visual slides on GMMs, HMMs and EM.

Joint Factor Analysis (JFA): Introduction to joint factor analysis in speaker verification.

II. Speech Processing


Introduction to Kaldi: An introduction to installing Kaldi, understanding its key features and organization, and getting started with it.

Kaldi for speaker verification: An example of how to run Kaldi for speaker verification.

Voice Processing Fundamentals

Introduction to Voice Processing in Python: Summary of the book “Voice Computing with Python” with concepts, code and examples.

Sound Feature Extraction: An overview with a Python implementation of the different sound features to extract.

Sound Visualization: Dive into spectrograms, chromagrams, tempograms, power spectral density and more…

Speaker Verification fundamentals

The basics of Speaker Verification: High-level overview of the speaker verification process.

Speaker Verification using Gaussian Mixture Model (GMM-UBM): Diving deeper into the training process of a GMM-UBM model.

Speaker Verification using SVM-based methods: Another method relying on Support Vector Machines for Speaker Verification.

Speaker Verification and i-vectors: Overview of the i-vector method for Speaker Verification.

Deep Learning approach to speaker verification with X-vectors: How X-vectors improve the speaker verification process.

Automatic Speech Recognition

Introduction to Automatic Speech Recognition: What is ASR? What is the pipeline? What is acoustic modeling?

III. Network analysis

Criminal and social networks are at the core of criminal investigation. More and more data is being collected in investigation cases, and identifying who knows who and what is being said is crucial.

Graph theory

Introduction to Graphs: What is a graph? Where are graphs being used? What are the components of a graph?

Graph Analysis, Erdős-Rényi, Barabási-Albert: In this article, we cover the two main types of graphs, and describe a first approach to graph analysis.

Graph Algorithms: We’ll now explore the main graph algorithms and several use cases in a visual way with direct examples in Python.

Graph Learning: How can we handle missing links or missing nodes in graphs?

Graph Embedding: A practical introduction to Graph Embedding with Node2Vec and Graph2Vec.

Criminal networks

“Disrupting Resilient criminal networks through data analysis” paper summary: A summary and data exploration of an interesting paper on criminal networks in the Sicilian Mafia.

“Structural Analysis of Criminal Network and Predicting Hidden Links using Machine Learning” paper summary: Summary and discussion of a paper tackling hidden link prediction as a supervised learning problem.

“Social network analysis as a tool for criminal intelligence: understanding its potential from the perspective of intelligence analysts” paper summary: A qualitative review of how Law Enforcement Agencies use Criminal Network Analysis tools, and my personal view on that.

IV. When Speaker Identification meets graphs

There are very few papers linking graphs and speaker identification. Phonexia wrote this web article on how to leverage community detection for speaker identification. It’s a good starting point. Below, I’ll summarize the papers I found on this topic and the ideas I have.

“Leveraging side information for speaker identification with the Enron conversational telephone speech collection” paper summary: A first approach to leveraging the structure of a network to enhance speaker identification on an e-mail and call database.
