
Hardware requirements to train Kaldi models #72

Open
OleksandrChekmez opened this issue Aug 30, 2019 · 4 comments

Comments

@OleksandrChekmez

Dear Guenter,
It would be very helpful to know the hardware requirements, to avoid problems with insufficient RAM, disk, or GPU memory, and to avoid wasting time trying to train on an underpowered computer.

I understand that there may be no well-defined requirements; everything depends on the corpora used, the configs, etc.

But would you mind at least sharing your hardware specs, so we can understand what was enough to build the kaldi-generic-en-tdnn_f model, and how long it took?
Thank you!

@pguyot
Contributor

pguyot commented Sep 25, 2019

Most of the training (time-wise) does not actually require a GPU. The GPU is only used at the very end of the process, but the stock script demands it too early and aborts. I have been using an alternate script for building French models and moving the data to another VM for the GPU part.

The process is often CPU-bound, at least on my setup, and not always optimized for several cores.
[Chart: CPU usage while training the French model]

Günter uses a 64 GB machine, but in my experience 16 GB or even 12 GB can prove sufficient.
You need quite a lot of disk to store every clip in 16 kHz WAV format (that's 150 GB for 1200 hours), plus some more to handle the conversion between formats.
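The 150 GB figure is consistent with a quick back-of-envelope calculation for 16-bit mono PCM at 16 kHz (the remaining ~10 GB presumably being WAV headers and filesystem overhead):

```python
# Raw PCM data rate for 16 kHz, 16-bit, mono audio.
bytes_per_second = 16000 * 2 * 1  # sample rate * bytes per sample * channels

hours = 1200
total_gb = hours * 3600 * bytes_per_second / 1e9
print(f"{total_gb:.0f} GB")  # ~138 GB of raw samples for 1200 hours
```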

For the initial French model (200 hours), it took me about a month, including working on transcripts, IPAs, and French-specific adaptations. I expect a newer model with 400 hours to take about the same time, as I am running it on a larger VM (4 vCPUs). Günter uses a faster box (two CPUs with 6 cores each) and reported that the English model (1200 hours) took him 5-6 weeks.

@joazoa

joazoa commented Oct 7, 2019

@pguyot Can you share your split CPU/GPU scripts? How many CPU cores and how much memory are required per GPU? What other issues have you seen while training French models? I found that the .ipa files have missing entries, the quality flag in the transcripts is wrong, and the CNTRL sentence import hangs. What else can I expect, and how can I help? I can use 3 machines with 28 cores each; do you have a way to split the work over multiple PCs?

@OleksandrChekmez, I ran the small German model on 50 hours of audio with 28 cores and one 1080 Ti card in ~24 hours. Memory usage was ~16 GB.

@pguyot
Contributor

pguyot commented Oct 7, 2019

@joazoa Sorry if my previous message was unclear about CPU/GPU requirements.

I have been renting a VM with a GPU, and I found that the GPU is required too early by the script kaldi-run-chain.sh:
https://github.com/gooofy/zamia-speech/blob/master/data/src/speech/kaldi-run-chain.sh#L55

It is not actually used until stage 1 of train.py, which is invoked in stage 11:
https://github.com/gooofy/zamia-speech/blob/master/data/src/speech/kaldi-run-chain.sh#L250

The rest is CPU- or I/O-bound (mostly CPU). Too many cores can be a waste of computing power, as Kaldi splits the data into jobs and some jobs can prove significantly longer than others (eventually n-1 cores are waiting for a single core to finish). You can set the number of jobs, as printed out by this line:
https://github.com/gooofy/zamia-speech/blob/master/data/src/speech/kaldi-run-chain.sh#L65
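To make the idle-core effect concrete, here is a toy model (the durations are invented): wall-clock time is set by the slowest job, so cores beyond the number of well-balanced jobs buy very little.

```python
# Toy model of Kaldi-style job splitting: each job runs on its own core,
# and the whole stage finishes only when the slowest job does.
job_minutes = [30, 32, 31, 95]  # invented durations; one unlucky split is much longer

wall_time = max(job_minutes)            # stage finishes with the slowest job
busy = sum(job_minutes)                 # total useful core-minutes
utilization = busy / (len(job_minutes) * wall_time)
print(wall_time, f"{utilization:.0%}")  # 95 min wall time, ~49% core utilization
```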

My script is just an adaptation of kaldi-run-chain.sh that writes snapshots after every step, which allowed me to debug the transcripts, the IPAs, and some of the scripts.
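The snapshot-per-step idea can be sketched like this (the paths and stage names are hypothetical, and the actual script is a shell adaptation of kaldi-run-chain.sh, not this Python sketch):

```python
import pathlib
import shutil

WORK_DIR = pathlib.Path("work")        # hypothetical Kaldi work directory
SNAP_DIR = pathlib.Path("snapshots")   # where per-stage snapshots are kept

def run_stage(name, stage_fn):
    """Run one stage, then copy the work dir so it can be inspected later."""
    stage_fn()
    dest = SNAP_DIR / name
    if dest.exists():
        shutil.rmtree(dest)            # overwrite an earlier snapshot of this stage
    shutil.copytree(WORK_DIR, dest)
```

With a snapshot per stage, a bad transcript or IPA entry discovered at stage N can be debugged against the previous stage's snapshot instead of re-running the whole pipeline.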

I've been working on the French model, which we may discuss in another thread. My patches may require careful review for reproducibility, and I am very glad you are trying!

Indeed, the quality flag of transcripts is ignored, as verbatims are not stored in tokenized form in the CSVs. This may or may not be a good idea, but does it prevent you from using the standard script?
What do you mean by "the .ipa files have missing entries"? Many words from the verbatim entries are not in the IPA file, yet their pronunciations are generated by Sequitur. I tried to add as many entries as possible, especially those for which the Sequitur model generated wrong pronunciations.
What do you mean by "the CNTRL sentence import hangs"? Please do not hesitate to open a ticket for this with details; I'll look into it.

Considering parallelization on several boxes:

  • Studying the dependencies between steps, a few steps can be performed in parallel with others.
  • Kaldi scripts themselves are designed to run on several boxes. This mostly applies to the CPU- and I/O-bound parts and requires shared network storage. It relies on GridEngine but could be adapted to other setups. I haven't tried this.
    http://kaldi-asr.org/doc/queue.html
  • Obviously, you can perform the final training for the smaller model on a GPU-equipped box and the training for the larger model on another box. However, I am not sure train.py itself, which takes 12-13 days on a K80 for 400 hours of French (about half of the total wall-clock training time), can be parallelized across two boxes, let alone two GPUs.

@joazoa

joazoa commented Oct 8, 2019

Hello,

I noticed during a test run for German that the GPU did not get used until epoch 1 of 10; I probably spent half a day debugging why my CUDA wasn't working before I simply let it run longer once :)

I will try to document the use of multiple GPUs, and maybe SLURM usage, once I get to that stage with the French model.

I will leave a comment for everything French-related in the other ticket.
