other language support? #3
Hey @zdj97, at the moment we don't support other languages. What languages do you have in mind? Would you like to open a PR to add support for them?
@ylacombe Why did you choose g2p specifically? I had to swap it for the espeak-ng phonemizer for Spanish, because g2p doesn't support Spanish. Happy to push my changes later in the week.
@ittailup, this work started as a reproduction of the research paper Natural language guidance of high-fidelity text-to-speech with synthetic annotations, which uses g2p.
@ittailup I am interested in fine-tuning the current model on other languages, e.g. Spanish. Did you use the existing trained model and prompt tokenizer "parler-tts/parler_tts_mini_v0.1", or did you train from scratch with a custom tokenizer for espeak-ng? Thank you for your insights.
@taalua I took the mini_v0.1 checkpoint and fine-tuned it with my dataset. This was my "rate_apply" (written by Claude).
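For context, dataspeech's rate_apply computes a speaking rate as phonemes per second over the clip duration. Here is a minimal sketch of the espeak-ng swap, with the phonemizer passed in as a plain function so the English-only g2p dependency drops out. This is a hypothetical reconstruction under assumed column names ("text", "audio"), not @ittailup's actual code:

```python
def speaking_rate(phonemes: str, num_samples: int, sampling_rate: int) -> float:
    # Phonemes per second over the audio clip's duration.
    duration_s = num_samples / sampling_rate
    return len(phonemes.replace(" ", "")) / duration_s

def rate_apply(batch, phonemize_fn):
    # phonemize_fn: text -> phoneme string. In practice this would wrap the
    # phonemizer package's espeak-ng backend, e.g.
    #   phonemizer.phonemize(text, language="es", backend="espeak", strip=True)
    # which covers Spanish, unlike g2p.
    phonemes = phonemize_fn(batch["text"])
    batch["phonemes"] = phonemes
    batch["speaking_rate"] = speaking_rate(
        phonemes,
        len(batch["audio"]["array"]),
        batch["audio"]["sampling_rate"],
    )
    return batch
```

In a datasets pipeline this would be applied per row with `dataset.map(rate_apply, ...)`, with `phonemize_fn` bound to whichever backend covers the target language.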
@taalua I did not have to change the prompt at dataspeech/scripts/run_prompt_creation.py (line 317 in 8fd2dd4).
I did add a nationality to the "text description", so "A man" would become "A {country_name} man", but this was a text replace after building the initial dataspeech dataset.
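The nationality tweak described above is just a post-hoc string replace on the already-generated descriptions; a minimal sketch (the helper name and the man/woman patterns are illustrative assumptions, not the actual script):

```python
def add_nationality(description: str, country: str) -> str:
    # Rewrite e.g. "A man" -> "A Spanish man" in a text description,
    # after the dataspeech dataset has already been built.
    return (
        description.replace("A man", f"A {country} man")
                   .replace("A woman", f"A {country} woman")
    )
```

For example, `add_nationality("A man speaks slowly.", "Spanish")` yields "A Spanish man speaks slowly."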
@ittailup Thank you, I appreciate your help. So the tokenizer remains the same, i.e. parler-tts/parler_tts_mini_v0.1. Does fine-tuning work well for Spanish with mini_v0.1? How much data did you use for fine-tuning, and how many epochs did you need?
Parler gave me the best results of all the pipelines and models I had tested. Better than Piper, easier to train than pflow, vits2, styletts2. The voice quality with ~15h of speech and 39 epochs was very impressive. Even after 10k steps the quality was probably good enough to stop, we did 54k. |
Hey @ittailup, this is great to hear!
Thanks, I tried using your "rate_apply" (not using g2p) and fine-tuned on an Indonesian speech dataset from Common Voice 13. It works, and the result is good even with only 1706 samples. Here is the result: output.mp4
Yes, but the model is not stable yet. In my experience, using the espeak backend on a larger amount of data caused a memory leak, so I decided to use a custom phoneme module, since I only need Indonesian. I will let you know once I have finished training.
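A custom module like that can stay small because Indonesian spelling is largely phonemic. A hedged sketch of what a rule-based replacement for the espeak backend might look like (the digraph table and function name are illustrative assumptions, not the actual module):

```python
# Indonesian digraphs are the main deviation from one-letter-one-sound.
DIGRAPHS = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x"}

def phonemize_id(text: str) -> str:
    # Greedy left-to-right scan: match a digraph first, else pass the
    # character through unchanged. Deliberately incomplete (e.g. "c" -> /tʃ/
    # and the two values of "e" are not handled).
    text = text.lower()
    out, i = [], 0
    while i < len(text):
        if text[i:i + 2] in DIGRAPHS:
            out.append(DIGRAPHS[text[i:i + 2]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Because it is a pure function with no native backend, it sidesteps the espeak memory-leak issue entirely on long dataset runs.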
How is the training going? Let me know if I can help! |
This method had some issues when I used it for the Tamil language. Here is the updated version.