One of the main challenges, when dealing with text, is to build an efficient preprocessing pipeline.
What is Natural Language Processing?
Natural Language Processing (NLP) is at the crossroads of artificial intelligence, linguistics and machine learning.
Natural Language Processing aims to extract meaning from textual data.
Therefore, NLP has many applications, especially in :
- translation (DeepL or Google Translate)
- document classification
- spell-checkers
- automatic summary
- human-computer interactions
- speech recognition
- speech synthesis
- opinion analysis
NLP is generally divided in 2 to 3 main tasks :
- Text preprocessing: A step whose role is to standardize the text input according to the usage one wants to make of it
- Representing text as vectors, which can generally be done either using BoW/TF-IDF methods (which we’ll cover in future articles) or by learning an embedding of the text as a vector (Word2Vec for example)
In the next articles, we’ll cover these 2 major steps, but we’ll also cover some extensions and some cool applications of NLP!