I am currently getting to know Kaldi for my Ph.D. work. I thought that documenting the process would be interesting.
What is Kaldi?
Kaldi is a speech recognition toolkit written in C++, available on GitHub at https://github.com/kaldi-asr/kaldi. The homepage of the project is https://kaldi-asr.org.
Kaldi is a tool used for many speech-related tasks, such as:
- Automatic Speech Recognition (ASR)
- Speaker Verification (SV)
- Speaker Diarization
It implements efficient low-level algorithms and makes them available to the end user through bash and Python scripts. Kaldi is developed by Johns Hopkins University, and Idiap is a large contributor. The project started in 2009. Today, many ASR and speech-related companies rely on Kaldi.
What documentation to read?
Kaldi itself has great documentation, which you can find at https://kaldi-asr.org/doc/. Here, I’ll present some notes I took while getting used to Kaldi and working on various speech tasks.
Otherwise, it’s quite hard to find external resources on Kaldi. Some articles on Medium can help you get a general overview.
If you have questions about Kaldi, refer to the kaldi-help group.
Knowledge Requirements
If you are interested in Kaldi, chances are that you want to do some speech processing for a project or for research. Apart from a high-level idea of how an ASR or speaker verification model is trained, you should be familiar with bash scripting and Python. C++ is not necessary to get started, although it is good to know if you want to dive deeper into Kaldi afterwards.
Setup Requirements
Kaldi runs best on Unix environments. At Idiap, we use Debian, but I also installed it on macOS using Docker. Kaldi is computationally intensive by the nature of the jobs it runs, so it is advisable to work on a cluster (grid) of Linux machines with access to GPUs. This is, however, not required to get started with this article.
Kaldi requires several standard packages, such as git, wget, bash, perl, awk, grep and make. Chances are that you already have them all installed.
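To see quickly whether any of these tools is missing, a small check along the following lines can help. This is only a sketch: missing_tools is a name I made up, not a Kaldi script, and it only checks that each command can be found on your PATH, not which version is installed.

```shell
# missing_tools is a hypothetical helper, not part of Kaldi.
# It prints each command from its arguments that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Check the tools named in the paragraph above.
missing=$(missing_tools git wget bash perl awk grep make)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All listed tools were found"
fi
```

Kaldi ships its own, more thorough check (extras/check_dependencies.sh), which is used in the Git installation further down.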
Kaldi will also install other software (OpenFst, IRSTLM, SRILM…). Refer to the Kaldi documentation if you want to know more about what is installed.
Another option, which I will present below, is to simply rely on Docker to do the job :)
File formats you’ll encounter
In Kaldi, you will encounter many file formats, among which:
- .sh for bash scripts
- .py for Python scripts
- .cc for C++ code
- .h for header files, containing variables, functions… used by various C++ files
- .pl for Perl scripts, useful to process text files
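To get a feeling for how these formats are spread across the project, you can count the files per extension. The sketch below assumes a variable KALDI_DIR that points at your Kaldi directory; both this variable and the function name count_by_extension are my own and are not defined by Kaldi.

```shell
# count_by_extension is a hypothetical helper, not part of Kaldi.
# For each extension from the list above, it prints how many files
# under the directory "$1" use that extension.
count_by_extension() {
  local dir=$1 ext count
  for ext in sh py cc h pl; do
    # tr strips the padding that some versions of wc add around the number
    count=$(find "$dir" -type f -name "*.$ext" | wc -l | tr -d ' ')
    echo ".$ext $count"
  done
}

# KALDI_DIR is an assumed variable; nothing runs if it is unset.
if [ -d "${KALDI_DIR:-}" ]; then
  count_by_extension "$KALDI_DIR"
fi
```

Each output line holds an extension followed by its file count.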
Install Kaldi
Install Kaldi using Docker
Docker is a good option if you don’t want to deal with installing all the dependencies on your machine. I am running Kaldi on macOS this way, for example. The image of the Kaldi ASR toolkit is available on Docker Hub as kaldiasr/kaldi. Assuming that you have Docker installed and are signed in to pull the image, simply run:
docker pull kaldiasr/kaldi
If everything goes well, the 2.5 GB of the image will be downloaded, and you will see:
Status: Downloaded newer image for kaldiasr/kaldi:latest
Make sure that the image is available in your Docker images:
docker images
REPOSITORY       TAG      IMAGE ID       CREATED       SIZE
kaldiasr/kaldi   latest   314e2e8353b4   8 hours ago   11.5GB
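If you have many images, the same check can be scripted. The sketch below filters the docker images output for one repository and tag and prints its size. It assumes the default column layout shown above, where SIZE is the last column, and image_size is a name I made up for illustration.

```shell
# image_size is a hypothetical helper, not part of Docker or Kaldi.
# It reads `docker images` output on stdin and prints the last column
# (SIZE in the default layout) for the given repository and tag.
image_size() {
  awk -v repo="$1" -v tag="$2" '$1 == repo && $2 == tag { print $NF }'
}

# Usage (needs Docker, so it is left commented out here):
# docker images | image_size kaldiasr/kaldi latest
```

If nothing is printed, the image is not in your local list and the pull should be repeated.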
OK, you are now ready to access Kaldi by launching the container (-i keeps the session interactive and -t allocates a terminal, which together give you a shell inside the container).
docker run -it kaldiasr/kaldi
If everything worked fine, your terminal should display:
root@b28f0647d1f2:/opt/kaldi#
Install Kaldi through Git
To install Kaldi through Git, you will first need to clone the project.
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
cd kaldi
Then, go to tools:
cd tools
The file INSTALL gathers all the instructions. Check it out using Vim:
vim INSTALL
Read the file completely, as it contains warnings and explains how to solve potential issues. If everything goes well, the following commands should get you ready:
extras/check_dependencies.sh
make
What’s in Kaldi
Running ls at the root of the repository shows, among other files, the following folders:
egs misc src tools windows
The most important directories are:
- egs, which stands for examples
- tools, which contains Kaldi dependencies and setup instructions
- src, which contains the source code
For the sake of completeness, the other directories are:
- windows, to run Kaldi on Windows
- misc, which contains additional tools
Tools
Tools, apart from containing setup instructions and makefiles, also contains OpenFst, the library Kaldi uses to work with weighted finite-state transducers (WFSTs).
Src
Src contains all the internal code needed for the various Kaldi algorithms and functionalities. All folders ending in “bin” contain executables. To check that your setup is ready, simply run the following from the src directory:
make test
This runs the tests of the internal src code and lets you know if there is an issue.
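To see which folders hold executables, you can list the directories in src whose name ends in “bin”. This is a minimal sketch: list_bin_dirs is a name I made up, and KALDI_DIR is an assumed variable pointing at your Kaldi directory.

```shell
# list_bin_dirs is a hypothetical helper, not part of Kaldi.
# It prints the name of every directory directly under "$1"
# whose name ends in "bin".
list_bin_dirs() {
  local src=$1 dir
  for dir in "$src"/*bin/; do
    # If nothing matches, the pattern stays unexpanded, so test it first.
    if [ -d "$dir" ]; then
      basename "$dir"
    fi
  done
}

# KALDI_DIR is an assumed variable; nothing runs if it is unset.
kaldi_dir=${KALDI_DIR:-}
if [ -n "$kaldi_dir" ] && [ -d "$kaldi_dir/src" ]; then
  list_bin_dirs "$kaldi_dir/src"
fi
```

The trailing slash in the pattern restricts the match to directories, so a regular file whose name happens to end in “bin” is ignored.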
Egs
Egs contains examples. For example, wsj is the famous Wall Street Journal corpus for speech recognition, and callhome_diarization is a speaker diarization challenge. Each directory corresponds to a challenge for which scripts were built. Most of them require a Linguistic Data Consortium (LDC) membership, but some of them are free (e.g. voxforge). Read more about it in the README.txt file.
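Since egs holds many directories, it can be handy to search them by keyword, for instance to find everything related to diarization. The function below is a sketch under my own naming (find_recipes is not a Kaldi script): it takes the path to egs and a keyword, and prints the matching directory names.

```shell
# find_recipes is a hypothetical helper, not part of Kaldi.
# It prints the name of every directory directly under "$1"
# whose name contains the keyword "$2".
find_recipes() {
  local egs=$1 keyword=$2 dir name
  for dir in "$egs"/*/; do
    if [ -d "$dir" ]; then
      name=$(basename "$dir")
      case $name in
        *"$keyword"*) echo "$name" ;;
      esac
    fi
  done
}

# Usage, assuming you are at the root of your Kaldi directory:
# find_recipes egs diarization
```

Only directory names are searched, so a keyword that appears only inside a recipe’s scripts will not be found this way.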
In the next article, we’ll move on to a concrete example of speaker verification.