I am currently getting to know Kaldi for my Ph.D. work. I thought that documenting the process would be interesting.
What is Kaldi?
Kaldi is a toolkit used for many speech-related tasks, such as:
- Automatic Speech Recognition (ASR)
- Speaker Verification (SV)
- Speaker Diarization
It implements efficient low-level algorithms and makes them available to the end user through bash and Python scripts. Kaldi is developed by Johns Hopkins University, and Idiap is a large contributor. The project started in 2009. Today, many ASR and speech-related companies rely on Kaldi.
What documentation to read?
Kaldi itself has great documentation. You can find the documentation here. I’ll be presenting some notes I took while getting used to Kaldi and working on various speech tasks.
Otherwise, it’s quite hard to find external resources on Kaldi. Some articles on Medium can help you get a general overview, like this one.
If you have questions on Kaldi, refer to the help group.
If you are interested in Kaldi, chances are that you want to do some speech processing for a project or for research. Apart from a high-level idea of how an ASR or a speaker verification model is trained, you should be familiar with bash scripting and Python. C++ is not necessary to get started, although it is good to know if you want to dive deeper into Kaldi afterwards.
Kaldi runs best on Unix environments. At Idiap, we use Debian, but I also installed it on my macOS machine using Docker. Kaldi is computationally intensive by the nature of the jobs it runs. It is advised to work on a cluster of Linux machines (a grid) and to have access to GPUs. This is, however, not required to follow this article.
Kaldi depends on several packages. Some of them are required, such as git, wget, bash, perl, awk, grep, make… Chances are that you already have them all installed.
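A quick way to check this is to ask the shell whether each tool is on your PATH. This is only a minimal sketch: check_tools is a helper name I made up for illustration, and the list of tools is simply the one given above, not Kaldi's full dependency list.

```shell
# check_tools is a hypothetical helper (not part of Kaldi).
# It prints every tool that cannot be found on the PATH, one per line,
# and returns a non-zero status if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# The tools named in this article; anything printed here still has to be installed.
check_tools git wget bash perl awk grep make || echo "Install the tools listed above first."
```

If nothing is printed, all of the listed tools were found.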
Kaldi will also install other software (OpenFst, IRSTLM, SRILM…). Refer to this page if you want to know more about what is installed.
Another option, which I will present below, is to simply rely on Docker to do the job :)
File formats you’ll encounter
In Kaldi, you will encounter many file formats, among which:
- .sh for bash scripts
- .py for Python scripts
- .cc for C++ code
- .h for header files, which declare the variables, functions… shared by various C++ files
- .pl for Perl scripts, useful to process text files
Install Kaldi using Docker
Docker is a good option if you don’t want to bother with installing all the dependencies on your machine. I am running Kaldi on macOS this way, for example. The image of the Kaldi ASR toolkit is available on DockerHub, right here. Supposing that you have Docker installed and are signed in to pull the image, simply run:
docker pull kaldiasr/kaldi
If everything goes well, the 2.5 GB of the project will be downloaded, and you should see:
Status: Downloaded newer image for kaldiasr/kaldi:latest
Make sure that the image is available in your Docker images:
docker images
REPOSITORY       TAG      IMAGE ID       CREATED       SIZE
kaldiasr/kaldi   latest   314e2e8353b4   8 hours ago   11.5GB
OK, you are now ready to access Kaldi by launching the container (-i keeps the session interactive and -t allocates a terminal, which together give you a shell inside the container).
docker run -it kaldiasr/kaldi
If everything worked fine, your terminal should display:
Install Kaldi through Git
To install Kaldi through Git, you will first need to clone the project.
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
cd kaldi
Then, go to tools:
cd tools
The file INSTALL gathers all the instructions. Check it out using Vim:
vim INSTALL
Read the file completely, as it contains warnings and explains how to solve potential issues. If everything goes well, the following commands should get you ready:
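As a rough sketch, the usual sequence described in tools/INSTALL and src/INSTALL can be wrapped as follows. Treat it as an assumption about the current state of those files rather than a replacement for reading them: build_kaldi is a helper name I made up, and the job count of 4 is an arbitrary choice to adapt to your machine.

```shell
# build_kaldi is a hypothetical wrapper (not part of Kaldi).
# Argument: path to the cloned repository (defaults to ./kaldi).
# It refuses to run if the path does not look like a Kaldi checkout.
build_kaldi() {
    kaldi_root="${1:-kaldi}"
    if [ ! -f "$kaldi_root/tools/INSTALL" ]; then
        echo "No Kaldi checkout found at $kaldi_root" >&2
        return 1
    fi
    (
        # Runs in a subshell so the caller's working directory is unchanged.
        cd "$kaldi_root/tools" &&
        extras/check_dependencies.sh &&   # reports missing system packages
        make -j 4 &&                      # builds OpenFst and other tools
        cd ../src &&
        ./configure --shared &&
        make depend -j 4 &&
        make -j 4                         # builds Kaldi itself
    )
}
```

From the directory that contains the clone, you would call build_kaldi kaldi. The chain stops at the first step that fails, so the last message printed tells you where to look.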
What’s in Kaldi
If you run ls, the files and folders are:
README.txt cmd.sh conf diarization local path.sh run.sh sid steps utils
The most important directories are:
- egs, which stands for examples
- tools, which contains Kaldi dependencies and setup instructions
- src, which contains the source code
For the sake of completeness, the other directories are:
- windows, to run Kaldi on Windows
- misc, which contains additional tools
tools, apart from containing setup instructions and makefiles, also contains OpenFst, the library used to work with weighted finite-state transducers (WFSTs).
Src contains all the internal code needed for the various Kaldi algorithms and functionalities. All folders ending in “bin” contain executables. To check that your setup is ready, simply type:
This will run tests of internal src code and let you know if there is an issue.
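To see which folders of src hold executables, you can list the ones whose name ends in “bin”. A minimal sketch, where list_bin_dirs is a helper name I made up and the path to src is assumed to be passed as an argument:

```shell
# list_bin_dirs is a hypothetical helper (not part of Kaldi).
# It prints the direct sub-directories of a src tree whose names end in "bin".
list_bin_dirs() {
    src_dir="${1:-src}"
    find "$src_dir" -mindepth 1 -maxdepth 1 -type d -name '*bin' | sort
}

# Assumes the current directory is the Kaldi root; does nothing otherwise.
if [ -d src ]; then
    list_bin_dirs src
fi
```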
egs contains examples. For example:
- wsj is the famous Wall Street Journal corpus for speech recognition
- callhome_diarization is a speaker diarization challenge
Each directory corresponds to a challenge for which scripts were built. Most of them require a Linguistic Data Consortium (LDC) membership, but some of them are free (e.g. voxforge). Read more about it in the README.txt file.
In the next article, we’ll move on to a concrete example of speaker verification.