
Adds support for precomputing conditioning latents to xtts for repeated inference on the same reference wavs for significant performance gains. #2956

Closed
wants to merge 1 commit into from

Conversation


@Iamgoofball Iamgoofball commented Sep 16, 2023

This accidentally includes #2951 because I was lazy.

To use:

xtts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)
xtts.synthesizer.tts_model.precompute_conditioning_latents("./path_to_wavs")

...

xtts.tts_to_file(
	text,
	file_path="./path_to_output.wav",
	speaker_wav="./xtts_ref_wavs/ref_speaker.wav",
	language="en",
	precomputed_latents=True)

While benchmarking the inference stack, I found that roughly two-thirds of the inference time was spent precomputing the GPT conditioning latents. That is wasted work when you're running repeated inference against fixed reference audio.

Without precomputed latents: [benchmark screenshot]
With precomputed latents: [benchmark screenshot]
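The pattern this PR implements can be sketched in plain Python: compute the conditioning latents once per reference wav, store them keyed by path, then reuse them on every synthesis call. The names below (`compute_latents`, `LatentCache`) are illustrative stand-ins, not the PR's actual API, and the expensive step is simulated with a sleep:

```python
import time

def compute_latents(wav_path):
    # Stand-in for GPT conditioning-latent extraction (illustrative only).
    time.sleep(0.05)  # simulate the expensive step
    return hash(wav_path)  # pretend this is the latent tensor

class LatentCache:
    """Precompute latents once per reference wav, reuse on every call."""
    def __init__(self):
        self._cache = {}

    def precompute(self, wav_paths):
        for p in wav_paths:
            self._cache[p] = compute_latents(p)

    def get(self, wav_path):
        # Fall back to on-the-fly computation for uncached wavs.
        if wav_path not in self._cache:
            self._cache[wav_path] = compute_latents(wav_path)
        return self._cache[wav_path]

cache = LatentCache()
cache.precompute(["ref_speaker.wav"])  # pay the cost once, up front

start = time.perf_counter()
latent = cache.get("ref_speaker.wav")  # cache hit: no recomputation
elapsed = time.perf_counter() - start
print(elapsed < 0.05)  # True: the hit skips the expensive step
```

The same idea applies per speaker: as long as the reference wavs don't change, the latents are pure functions of those files and can be computed once and reused indefinitely.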

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@skshadan

Please provide the full code.

@Iamgoofball
Author

This is the full code for this feature.

@skshadan

I used it, but my inference time is still the same. :(

@Iamgoofball
Author

I recommend doing some local benchmarking.

@skshadan

Locally it is taking too much time. :((

@skshadan

[screenshot]

@PranjalyaDS

I have noticed that latent computation takes the longest on the first run; subsequent latent computations take only a minuscule amount of time, even when different reference audio files are used. I am not sure of the reason; the most obvious explanation would be that the GPT and diffusion latent generators are lazily loaded on first use?
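If lazy initialization is indeed the cause, the slow-first-run behavior described above falls out naturally: the heavy model is loaded on the first call, so only that call pays the load cost, regardless of which reference wav is passed. A minimal illustration of the mechanism (all names hypothetical, with the model load simulated):

```python
class LatentGenerator:
    """Loads a heavy model lazily, on the first compute() call."""
    def __init__(self):
        self._model = None  # nothing loaded yet

    def _load_model(self):
        # Stand-in for loading GPT/diffusion weights onto the GPU.
        return object()

    def compute(self, wav_path):
        if self._model is None:      # only the first call pays this cost
            self._model = self._load_model()
        return (wav_path, id(self._model))

gen = LatentGenerator()
first = gen.compute("a.wav")   # triggers the model load
second = gen.compute("b.wav")  # reuses the already-loaded model
print(first[1] == second[1])   # True: same model instance both times
```

This would explain why the second run is fast even with a different reference file: it is the model load, not the per-file latent computation, that dominates the first run.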

@stale

stale bot commented Oct 28, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Oct 28, 2023
@stale stale bot closed this Nov 5, 2023
Labels
wontfix This will not be worked on but feel free to help.

4 participants