Natural Language Processing

A series of articles dedicated to natural language processing. All code and exercises for this section are hosted on GitHub in a dedicated repository.

Introduction to Natural Language Processing : What is NLP ? What is it used for ?

Text Preprocessing : Preprocessing in Natural Language Processing (NLP) is the process by which we try to “standardize” the text we want to analyze.

Text Embedding with Bag-Of-Words and TF-IDF : In order to analyze text and run algorithms on it, we need to embed the text. Embedding simply means that we convert the input text into a set of numerical vectors that algorithms can take as input. In this article, we’ll cover BOW and TF-IDF, two simple embedding techniques.

Text Embedding with Word2Vec : A deeper dive into the state-of-the-art embedding technique : Word2Vec.

Data Augmentation in NLP : Implementation details of the “Easy Data Augmentation” paper.

I trained a Network to Speak Like Me (and it’s funny) : Over the course of the past months, I wrote over 100 articles on my blog. That’s quite a large amount of content. An idea then came to my mind : train a language generation model to speak like me. Or more specifically, to write like me.

Few-Shot Text Classification with Human in the Loop : This article addresses the task of classifying texts when we have few training examples.

Improved Few-Shot Text Classification : As an extension of the previous article, I propose a method that leverages both Data Augmentation and better classifiers.

Character-level LSTMs for Gender Classification from First Name : Implementation of the paper “Predicting the gender of Indonesian Names” on names given in France and in the US, using a bidirectional character-level LSTM architecture. Achieved 90% accuracy.

Easy Question Answering with AllenNLP : Understand the core concepts and create a simple example of Question Answering.
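
To make the notion of embedding from the Bag-Of-Words and TF-IDF entry above more concrete, here is a minimal sketch in plain Python. It is not the code from the article: the function names (`tokenize`, `bag_of_words`, `tf_idf`), the toy documents, and the unsmoothed `log(N / df)` weighting are all assumptions chosen for illustration, and libraries often use smoothed variants of the formula.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive preprocessing (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary, then count each vocabulary word per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    # Term frequency (count / document length) times an unsmoothed
    # inverse document frequency, log(N / df).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    vectors = []
    for row in counts:
        total = sum(row) or 1  # guard against empty documents
        vectors.append(
            [(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(row)]
        )
    return vocab, vectors


# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)

print(vocab)
print(bow)
print(weights)
```

In this toy corpus, “the” appears in every document, so its TF-IDF weight is zero, while a word such as “dog” that appears in a single document gets the largest weight in its row. That is the intuition behind TF-IDF : words that are frequent everywhere carry little information about any one text.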