[BUG] Model loading becomes abnormally slow when --no-gpu is removed. #2392

aoom · 2024-08-28T14:07:14Z

~/whisper.cpp-master$ ./main -m qqml-tiny-q5_1.bin -f ./samples/jfk.wav -pc --no-gpu
whisper_init_from_file_with_params_no_state: loading model from 'qqml-tiny-q5_1.bin'
whisper_init_with_params_no_state: use gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 1
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 31.57 MB
whisper_model_load: model size = 31.57 MB
whisper_init_state: kv self size = 9.44 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 13.19 MB
whisper_init_state: compute buffer (encode) = 85.53 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 95.89 MB

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:10.560] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: load time = 110.93 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 47.28 ms
whisper_print_timings: sample time = 131.60 ms / 131 runs ( 1.00 ms per run)
whisper_print_timings: encode time = 2595.64 ms / 1 runs ( 2595.64 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 609.37 ms / 129 runs ( 4.72 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 3531.42 ms
~/whisper.cpp-master$ ./main -m qqml-tiny-q5_1.bin -f ./samples/jfk.wav -pc
whisper_init_from_file_with_params_no_state: loading model from 'qqml-tiny-q5_1.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 9
whisper_model_load: qntvr = 1
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Xavier, compute capability 7.2, VMM: yes
whisper_model_load: CUDA0 total size = 31.58 MB
whisper_model_load: model size = 31.57 MB
whisper_backend_init_gpu: using CUDA backend
whisper_init_state: kv self size = 9.44 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 13.19 MB
whisper_init_state: compute buffer (encode) = 85.53 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 97.96 MB

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:10.580] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: load time = 25600.27 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 32.80 ms
whisper_print_timings: sample time = 128.63 ms / 131 runs ( 0.98 ms per run)
whisper_print_timings: encode time = 1718.51 ms / 1 runs ( 1718.51 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 258.43 ms / 129 runs ( 2.00 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 27770.06 ms
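
To make the difference between the two runs easier to see, here is a small sketch that parses the `whisper_print_timings` lines from both logs and prints them side by side. The timing values are copied from the output above; the helper name `parse_timings` and the regular expression are only illustrative and are not part of whisper.cpp.

```python
import re

# Timing lines copied from the two runs above (all values in milliseconds).
CPU_RUN = """\
whisper_print_timings: load time = 110.93 ms
whisper_print_timings: encode time = 2595.64 ms / 1 runs ( 2595.64 ms per run)
whisper_print_timings: batchd time = 609.37 ms / 129 runs ( 4.72 ms per run)
whisper_print_timings: total time = 3531.42 ms
"""

GPU_RUN = """\
whisper_print_timings: load time = 25600.27 ms
whisper_print_timings: encode time = 1718.51 ms / 1 runs ( 1718.51 ms per run)
whisper_print_timings: batchd time = 258.43 ms / 129 runs ( 2.00 ms per run)
whisper_print_timings: total time = 27770.06 ms
"""

# Captures the stage name and the first millisecond value on each timing line.
TIMING_RE = re.compile(r"whisper_print_timings:\s*(\w+) time\s*=\s*([\d.]+) ms")


def parse_timings(log: str) -> dict[str, float]:
    """Hypothetical helper: map stage name -> elapsed milliseconds."""
    return {stage: float(ms) for stage, ms in TIMING_RE.findall(log)}


cpu = parse_timings(CPU_RUN)
gpu = parse_timings(GPU_RUN)

for stage in cpu:
    print(f"{stage:>7}: --no-gpu {cpu[stage]:10.2f} ms | gpu {gpu[stage]:10.2f} ms")

# Fraction of the GPU run's total time that is spent in model loading.
load_share = gpu["load"] / gpu["total"]
print(f"load share of GPU run: {load_share:.1%}")
```

With the numbers above, encode and batched decode are both faster on the GPU, but load time accounts for most of the GPU run's total time, so the end-to-end run is much slower.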
