Posts by Year
2025
2020
How to install (py)Spark on MacOS (late 2020)
Apache Spark
Self-training and pre-training, understanding the wav2vec series
Speech Processing
My Ph.D. process
Process
Multilingual and Low-Resource Speech Recognition
Speech Processing
Sequence discriminative training
Speech Processing
Speaker Adaptation
Speech Processing
The decoding graph
Speech Processing
Neural Network acoustic modeling
Speech Processing
HMM acoustic modeling
Speech Processing
AutoHome, a tool to find your dream house
My girlfriend and I were recently looking for a house to buy. Rather than spending time on each of the real-estate websites individually, I decided to build ...
Introduction to Automatic Speech Recognition (ASR)
Speech Processing
Illustrating EM for GMMs and HMMs
I recently gave a talk on EM for GMMs and HMMs at EPFL and published the slides here. For the sake of the presentation, I built an interactive web applicatio...
Building a Dash Web application for Data Viz and ML
I recently had to build a Dash web application to illustrate what Dash-Plotly can do. I chose to present some capabilities regarding Data Viz and Machine Lea...
A supervised learning approach to predicting nodes betweenness-centrality in time-varying networks
Criminal Networks
Destabilizing Networks
Criminal Networks
Autograde, a grading tool for teachers
Many teachers are working from home during the COVID-19 crisis, receiving and grading work online. My mother is in this situation. To help her, I built AutoGrad...
Probabilistic Linear Discriminant Analysis (PLDA)
Speech Processing
Structural Analysis of Criminal Network and Predicting Hidden Links using Machine Learning
Criminal Networks
Leveraging side information for speaker identification with the Enron conversational telephone speech collection
Criminal Networks
Disrupting Resilient criminal networks through data analysis
Criminal Networks
Basics of Speaker Verification
Speech Processing
COVID-19 in Senegal Live application
Due to the lack of a regularly updated database of COVID-19 cases in Senegal, I decided to build an open database and a web application to display this informa...
Fundamental Speech Processing Papers
Speech Processing
Kaldi for Speaker Verification
Speech Processing
Introduction to Kaldi
Speech Processing
Voice Gender Identification
Signal Processing
Voice Gender Identification
Can we detect the gender of a voice using ML methods? I recently came across this article which I found quite interesting in the way it addresses Gender Iden...
Voice Activity Detection Application
Voice activity detection consists of identifying whether someone is speaking or not at a given moment. It can be useful to launch a vocal as...
2019
Interview by DataCast
Thoughts
Implementing YoloV3 for object detection
Yolo is one of the greatest algorithms for real-time object detection. In its large version, it can detect thousands of object types in a quick and efficient ...
Sound Feature Extraction
Signal Processing
Voice Computing in Python
Signal Processing
Introduction to Digital Signal Processing
Signal Processing
Introduction to Continuous Signal Processing
Signal Processing
Voice Activity Detection
Speech Processing
Sound Visualization
Signal Processing
Introduction to Continuous Signal Processing
Signal Processing
Easy Question Answering with AllenNLP
AllenNLP is an Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks....
Implementing YoloV3 for object detection
Computer Vision
Easy Question Answering with AllenNLP
Natural Language Processing
Speaker Verification using I-vector features
Speech Processing
Speaker Verification using SVM-based methods
Speech Processing
Speaker Verification using Gaussian Mixture Model (GMM-UBM)
Speech Processing
Deploy a container on GCP
In the previous article, we managed to build a container from a simple web application using Spacy, Streamlit and Docker. We ran the container locally. In th...
Deploy a Streamlit WebApp with Docker
Deploying your model in an interactive web application as a container can be challenging. Well, at least it used to be. In this project, I will show you how to ...
Data Augmentation in Natural Language Processing
Natural Language Processing
Character-Level LSTMs for Gender Classification from Name
Natural Language Processing
Improved Few-Shot Text classification
Natural Language Processing
Text classification from few training examples
Natural Language Processing
Planning by Dynamic Programming
Advanced AI
Markov Decision Process
Advanced AI
Introduction to Reinforcement Learning
Advanced AI
I trained a Network to Speak Like Me (and it’s funny)
Natural Language Processing
Emotion Recognition WebApp
We developed a multimodal emotion recognition platform to analyze the emotions of job candidates, in partnership with the French Employment Agency.
A No-SQL Big Data project from scratch
The GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the peop...
Face Detection
In this tutorial, we’ll see how to create and launch a face detection algorithm in Python using OpenCV. We’ll also add some features to detect eyes and mouth...
I trained a Network to Speak Like Me
Over the course of the past months, I wrote over 100 articles on my blog. That’s quite a large amount of content. An idea then came to mind: train a lang...
A guide to Data Acquisition
Applied Data Science
Predicting the next hit song (Part 2)
Applied Data Science
Predicting the next hit song (Part 1)
Applied Data Science
Who’s the painter ?
In this article, we will be using data from the Web Gallery of Art, a virtual museum and searchable database of European fine arts from the \(3^{rd}\) to \(1...
Interactive Map with D3.js
I developed an interactive D3.js plot of the population density of France. The tool highlights dense regions of France, and has a zoom feature.
How do Neural Networks learn ?
Deep Learning with PyTorch
Activation Functions
Deep Learning with PyTorch
Making your code production-ready
Tips & Tricks
Full introduction to Neural Nets
Deep Learning with PyTorch
Anomaly Detection
Supervised and Unsupervised Algorithms
Run jobs on Dataproc - Week 1 Module 2
Road to Google Cloud Platform Certification
Lab - Create a Cloud DataProc Cluster
Road to Google Cloud Platform Certification
Introduction to Cloud Dataproc - Week 1 Module 1
Road to Google Cloud Platform Certification
Handle Missing Values in Time Series
Time Series
Time Series Forecasting with Prophet
Time Series
Create a streaming data pipeline with Cloud DataFlow
In this project, we will analyze data from a taxi business. The aim of the lab is to: connect to a streaming data topic in Cloud Pub/Sub, ingest streami...
Lab - Create a streaming data pipeline with Cloud DataFlow
Road to Google Cloud Platform Certification
Create Streaming Data Pipelines - Week 2 Module 1
Road to Google Cloud Platform Certification
Basic Time Series Forecasting
Time Series
Run ML models in SQL with BigQuery ML - Week 1 Module 3
Road to Google Cloud Platform Certification
Lab - Recommend products using Cloud SQL and SparkML
Road to Google Cloud Platform Certification
Lab - Running an Apache Spark job on Cloud DataProc
Road to Google Cloud Platform Certification
Lab - Classify Images with Pre-Built ML models using Cloud Vision and AutoML
Road to Google Cloud Platform Certification
Classify Images using Vision API and Cloud AutoML - Week 2 Module 2
Road to Google Cloud Platform Certification
Using OpenPose on macOS
“OpenPose represents the first real-time multi-person system to jointly detect human body, hand, facial, and foot key points (in total 135 keypoints) on sing...
Earthquake Analysis on GCP
In this project, our aim will be to create a VM instance to process real earthquake data and make the analysis publicly available.
Recommendation Systems in GCP - Week 1 Module 2
Road to Google Cloud Platform Certification
Introduction to Google Cloud Platform - Week 1 Module 1
Road to Google Cloud Platform Certification
Explore a dataset on Google BigQuery
Storing and querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery is an enterprise dat...
Lab - Explore a BigQuery Public Dataset
Road to Google Cloud Platform Certification
Lab - VM and Storage for Earthquake Data
Road to Google Cloud Platform Certification
A Guide to Hyperparameter Optimization (HPO)
Parameters and Model Optimization
Interactive Data Visualization
For a recent project, I developed an interactive data visualization tool that was deployed on a web app. The corresponding GitHub repository can be found he...
Container Dwell Time prediction
More than 80% of the goods consumed worldwide are transported in containers. The invention of the container is no less than the greatest revolution in our mo...
The philosophy game in Wikipedia
According to ReadWriteWeb, all articles in the English version of Wikipedia lead to the article “Philosophy”. If you click on the first link of each article,...
Graph Embedding
Graph Analysis and Graph Learning
Estimate location using RSSI
Smart devices such as IoT sensors use low-energy networks such as the ones provided by Sigfox or LoRa. But without using GPS networks, it becomes h...
Face Classification using XCeption
In this challenge, our aim was to develop face classification algorithms using Deep Learning Architectures. I have explored hand-made CNNs, Inception, XCepti...
NLP on GitHub comments
The dataset I am using in this project (github_comments.tsv) contains 4000 comments that were published on pull requests on GitHub by developer teams.
Grid Search vs. Randomized Search
Advanced Machine Learning
Machine Learning Explainability
Advanced Machine Learning
Who’s the painter ?
Better features, better data
Introduction to Natural Language Processing
Natural Language Processing
Using Spark-Scala for Machine Learning
Parallel and Distributed Computing
Install Spark-Scala and PySpark
Parallel and Distributed Computing
Introduction to Spark
Parallel and Distributed Computing
Introduction to Bash Scripting
Tips & Tricks
Execute MapReduce Job in Python locally
Parallel and Distributed Computing
Linear Classification
Advanced Machine Learning
Introduction to Online Learning
Advanced Machine Learning
Introduction to D3.js
Data Viz
Create Ubuntu VMs with Virtual Box
Hadoop runs only on GNU/Linux platforms. Therefore, if you have another OS, you need to install VirtualBox. VirtualBox is software that lets you create a...
Virtual Machines with Virtual Box
Parallel and Distributed Computing
MapReduce Jobs in Python (4/4)
Parallel and Distributed Computing
Launch a MapReduce Job (3/4)
Parallel and Distributed Computing
Hadoop with the HortonWorks Sandbox (1/4)
Parallel and Distributed Computing
Load and move files to HDFS (2/4)
Parallel and Distributed Computing
Introduction to Hadoop
Parallel and Distributed Computing
MapReduce, Illustrated
Parallel and Distributed Computing
Hadoop Distributed File System (HDFS)
Parallel and Distributed Computing
Bayesian Hyperparameter Optimization
Parameters and Model Optimization
AutoML with h2o
Parameters and Model Optimization
Gradient Boosting Classification
Supervised Learning Algorithms
Gradient Boosting Regression
Supervised Learning Algorithms
Key Concepts of Data Visualization
Data Viz
Tableau-like in Python with Altair
Data Viz
Binary output prediction and Logistic Regression
Logistic Regression
Build a Language Recognition app
In this project, we will build a language recognition app using Markov Chains and a likelihood decoding algorithm. If you have not seen my previous articles on...
Build a Language Recognition app from scratch
Markov Processes and HMM
Hidden Markov Model (HMM)
Markov Processes and HMM
Markov Processes
Markov Processes and HMM
Word Embedding with Skip-Gram Word2Vec
Natural Language Processing
From text to vectors with BoW and TF-IDF
Natural Language Processing
Text Preprocessing
Natural Language Processing
Graph Learning
Graph Analysis and Graph Learning
Key Concepts of Time Series
Time Series
Introduction to Time Series
Time Series
XCeption Model and Depthwise Separable Convolutions
Deep Neural Networks
AWS Cloud Practitioner - Core Services
Amazon Web Services
AWS Cloud Practitioner - Core Services
Amazon Web Services
Key Computer Components
Parallel and Distributed Computing
AWS Cloud Practitioner - Cloud Concepts
Amazon Web Services
Graph Algorithms
Graph Analysis and Graph Learning
Graph Analysis
Graph Analysis and Graph Learning
Introduction to Graphs
Graph Analysis and Graph Learning
Image Alignment and Image Warping
Computer Vision
Local features, Detection, Description and Matching
Computer Vision
Image subsampling and downsampling
Computer Vision
Image formation and Filtering
Computer Vision
Large Scale Kernel Methods
Supervised Learning Algorithms
Forest Type Prediction
In this challenge, I am trying to predict the forest cover type (the predominant kind of tree cover) from strictly cartographic variables (as opposed to remo...
Introduction to Computer Vision
Computer Vision
Create an Auto-Encoder
An autoencoder is a type of neural network widely used for unsupervised dimension reduction. So, how does it work? What can it be used for? And how do we impleme...
Create an Auto-Encoder using Keras functional API
Deep Neural Networks
Statistics in Matlab
Matlab
A guide to Inception Model in Keras
Deep Neural Networks
Tests, p-values, restrictions
Statistical Hypothesis testing
Linear Discriminant Analysis (LDA), QDA
Supervised Learning Algorithms
A full guide to face detection
Tutorials
Getting Started with Dev Tools in Elasticsearch
Elastic Search, Logstash, Kibana
Install and run Elasticsearch + Kibana locally
Elastic Search, Logstash, Kibana
Getting started with Elastic Cloud
Elastic Search, Logstash, Kibana
Introduction to the ElasticStack
Elastic Search, Logstash, Kibana
Move Scala Dataframes to Cassandra
GDelt Project
Build an ETL in Scala for GDELT Data
GDelt Project
Bayes Classifier
Machine Learning Basics
Adaptive Boosting (AdaBoost)
Supervised Learning Algorithms
Install Apache Spark on EC2 instances
Amazon Web Services
Using Google Drive to store your data on Colab
Google Cloud Platform
TPU survival guide on Google Colaboratory
Google Cloud Platform
Install Zookeeper on EC2 instances
Amazon Web Services
Install Apache Cassandra on an AWS EC2 Cluster
Amazon Web Services
Launch and access an AWS EC2 Cluster
GDelt Project
Big (Open) Data, the GDELT Project
GDelt Project
Working with Amazon S3 buckets
Amazon Web Services
How to run a Zeppelin notebook on AWS EMR?
Amazon Web Services
How to run a Zeppelin notebook locally?
Amazon Web Services
Unsupervised Learning Cheat Sheet
Machine Learning Basics
Interview for DataCast
Thoughts
Setup your computer
Before we start
The important advice I wish I received
Data Analysis Basics
Functions
Data Analysis Basics
Loops
Data Analysis Basics
Manipulating strings
Data Analysis Basics
Data structures
Data Analysis Basics
Conditions
Data Analysis Basics
Basics of Python programming
Data Analysis Basics
Introduction to Python’s environment
Data Analysis Basics
Getting started without setup
Data Analysis Basics
The basis of Machine Learning
Machine Learning Basics
Installing external libraries
Data Analysis Basics
Introduction to Git
Data Analysis Basics
Business Analyst vs. Data Analyst vs. Data Scientist
Data Analysis Basics
What to expect from this training?
Before we start
The final step of the training
Final step
2018
How to use OpenPose on macOS ?
Tutorials
Convolutional Neural Networks
Deep Neural Networks
Prevent Overfitting of Neural Networks
Deep Neural Networks
Multilayer Perceptron
Deep Neural Networks
The Rosenblatt’s Perceptron
Deep Neural Networks
Full guide to Linear Regression (2/2)
Linear Model
Full guide to Linear Regression (Part 1)
Linear Model
Key Resources
Books, papers and talks