I am currently getting to know Kaldi for my Ph.D. work. I thought that documenting the process would be interesting.

What is Kaldi?

Kaldi is a speech recognition tool written in C++, available on Github right here. The homepage of the project can be found here.

Kaldi is a tool user for many speech-related tasks, such as:

  • Automatic Speech Recogniton (ASR)
  • Speaker Verification (SV)
  • Speaker Diarization

It implements low-level efficient algorithms and makes them available to the end-user through bash and Python scripts. Kaldi is developped by Johns Hopkins University, and Idiap is a large contributor. The project started in 2009. Many ASR or speech-related companies rely today on Kaldi.

What documentation to read?

Kaldi has itself a great documentation. I’ll be presenting here some notes I took through the process of getting used to Kaldi and working on various speech tasks. You can find the documentation here.

Otherwise, it’s quite hard to find external resources on Kaldi. Some articles on Medium can help getting a general overview, like this one.

If you have questions on Kaldi, refer to the help group.

Knowledge Requirements

If you are interested in Kaldi, there are chances that you want to do some Speech Processing for a project or research. Apart from a high-level idea of the process of training an ASR or a Speaker Verification model, one should be familiar with bash scripting and Python. C++ is not necessary to get started, although good to know if you want to dive deeper in Kaldi afterwards.

Setup Requirements

Kaldi runs best on Unix environments. At Idiap, we use Debian. But I also installed it on my MacOS environment using Docker. Kaldi is computationally intensive by the nature of the jobs it will run. It is advised to work on a cluster of Linux machines on the grid, and have access to GPUs. This is however not required to get started in this article.

Kaldi will require you to install several packages. Some of them are required, such as git, wget, bash, perl, awk, grep, make… There are chances that you already have them all installed.

Kaldi will also install other softwares (OpenFst, IRSTLM, SRILM…). Refer to this page if you want to know more about what is installed.

Another option, which I will present below, is to simply rely on Docker to do the job :)

File formats you’ll encounter

In Kaldi, you will encounter many file formats, among which:

  • .sh for bash scripts
  • .py for Python scripts
  • .cc for C++ code
  • .h for header files, containing variables, functions… used by various C++ files
  • .pl for Perl scripts, useful to process text files

Install Kaldi

Install Kaldi using Docker

Docker is a good option if you don’t want to bother with all dependencies for your machine. I am running Kaldi on MacOS for example. The image of the Kaldi ASR tookit is available on DockerHub, right here. Supposing that you have Docker installed and are signed in to pull the image, simply run:

docker pull kaldiasr/kaldi

If everything goes well, the 2.5Gb of the project will be downloaded, and you will obtain:

Status: Downloaded newer image for kaldiasr/kaldi:latest

Make sure that the image is available in your Docker images:

docker images

REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
kaldiasr/kaldi       latest              314e2e8353b4        8 hours ago         11.5GB

Ok, you are now ready to access Kaldi by launching the container (-it stands for interactive and will give you access to a terminal window).

docker run -it kaldiasr/kaldi

If everything worked fine, you terminal should display:

root@b28f0647d1f2:/opt/kaldi#

Install Kaldi through Git

To install Kaldi through Git, you will first need to clone the project.

git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
cd kaldi

Then, go to tools:

cd tools

The file INSTALL gathers all the instructions. Check it out using Vim:

vim INSTALL

Read the fill completely, as it provides warning messages and how to solve potential issues. If everything goes well, the following commands should get you ready:

extras/check_dependencies.sh
make

What’s in Kaldi

By running ls, the folders are:

README.txt  cmd.sh  conf  diarization  local  path.sh  run.sh  sid  steps  utils

The most important directories are:

  • egs, which stands for examples
  • tools, which contains Kaldi dependencies and setup instructions
  • src, which contains the source code

For the sake of completeness, the other directories are:

  • windows to run Kaldi on Windows
  • misc which contains additional tools

Tools

Tools, apart from containing setup instructions and makefiles, also contains OpenFST, which is the library used for computing Weighted Finite State Transfucer.

Src

Src contains all the internal code needed for the various Kaldi algorithms and functionalities. All folders ending in “bin” contain executables. To check that your setup is ready, simply type:

make test

This will run tests of internal src code and let you know if there is an issue.

Egs

Egs contains examples. For example, wsj is the famous Wall Street Journal Corpus for Speech Recognition, callhome_diarization is a speaker diarization challenge. Each directory corresponds to a challenge for which scripts were built. Most of them require a Linguistic Data Consortium membership (LDC) but some of them are free (e.g voxforge). Read more about it in the README.txt file.

In the next article, we’ll move on to a concrete example of speaker verification.