Unintended Pauses Between Words When Using text_stream_sample for Chinese TTS #2596

Open
dmingke opened this issue Sep 19, 2024 · 3 comments

dmingke commented Sep 19, 2024

Bug Description

When using text_stream_sample with the zh-CN-YunxiaNeural voice, the synthesized speech contains unintended pauses between words, which disrupts its natural flow. For instance, when synthesizing the sentence "今天的天气真好", there is a noticeable, unnatural pause between "今" and "天", that is, inside the word "今天". This may be related to how the OpenAI-generated text is fed to the synthesizer in chunks during real-time synthesis. I am looking for a way to prevent these pauses.
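One way to test the chunking hypothesis above is to regroup the streamed text so that each piece handed to the synthesizer ends at punctuation rather than at an arbitrary token boundary. The following is a minimal sketch, not part of the sample: `coalesce_chunks` and the `BOUNDARIES` set are hypothetical names, and whether this actually removes the pauses has not been confirmed in this thread.

```python
from typing import Iterable, Iterator

# Punctuation treated as a safe flush point (an assumption; adjust as needed).
BOUNDARIES = "。！？；，、.!?;,\n"


def coalesce_chunks(chunks: Iterable[str], boundaries: str = BOUNDARIES) -> Iterator[str]:
    """Regroup arbitrary text fragments so each yielded piece ends at punctuation."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Position of the last boundary character currently buffered (-1 if none).
        cut = max(buffer.rfind(ch) for ch in boundaries)
        if cut >= 0:
            yield buffer[: cut + 1]
            buffer = buffer[cut + 1:]
    if buffer:
        # Trailing text that never reached a punctuation mark.
        yield buffer


# Simulated token stream that splits "今天" across two fragments.
pieces = list(coalesce_chunks(["今", "天的天", "气真好", "。明", "天见"]))
```

Each yielded piece would then be passed to the sample's text-stream write call in place of the raw fragments. The trade-off is added latency, since nothing is sent until a boundary character arrives.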

Steps to Reproduce

  1. Start from the text_stream_sample in the repo.
  2. Set the speech_synthesis_voice_name to zh-CN-YunxiaNeural.
  3. Send the synthesized audio chunks (using audio_buffer.tobytes()) through the WebSocket and play the PCM audio data on the client-side.
  4. Notice that the longer the response, the more frequently unnatural pauses between words occur.
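To narrow down whether the pauses are already present in the synthesized audio or are introduced by the WebSocket transport or client-side playback, the PCM bytes can be scanned for silent runs on the server before they are sent. This is a rough diagnostic sketch under stated assumptions: `find_silences` is a hypothetical helper, and the audio is assumed to be 16-bit mono little-endian PCM at 24 kHz, which the issue does not state. The threshold and minimum duration are arbitrary starting values.

```python
import array
import sys
from typing import List, Tuple


def find_silences(
    pcm: bytes,
    sample_rate: int = 24000,  # assumed output format, not stated in the issue
    threshold: int = 500,      # amplitude below which a sample counts as silent
    min_ms: int = 150,         # shortest silent run worth reporting
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where the signal stays below the threshold."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a dangling odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the PCM data is assumed to be little-endian
    min_len = sample_rate * min_ms // 1000
    spans = []
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                spans.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Report a silent run that extends to the end of the buffer.
    if run_start is not None and len(samples) - run_start >= min_len:
        spans.append((run_start / sample_rate, len(samples) / sample_rate))
    return spans
```

Running this over the concatenated `audio_buffer.tobytes()` output and comparing the reported spans with the text chunk boundaries would show whether the silences line up with where the input was split. If no long silent runs are found on the server side, the pauses are more likely caused by buffering or underruns during playback.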

Expected Behavior

The synthesized speech should flow smoothly, with no unintended pauses between words unless indicated by appropriate punctuation. Each sentence should be delivered naturally and continuously.

Version of the Cognitive Services Speech SDK

  • SDK Version: azure-cognitiveservices-speech==1.40.0
  • Programming Language: Python 3.x

Additional Context

  • No SSML was used, only plain text input.
  • The issue is most prominent with Chinese voice models. I have tested both zh-CN-YunxiaNeural and zh-CN-YunxiNeural, and the issue seems consistent across them.
  • The issue does not seem to occur with English voice models, such as en-US-BrianMultilingualNeural.
dmingke changed the title from "Unintended Pauses Between Words using text_stream_sample.py for Chinese TTS" to "Unintended Pauses Between Words When Using text_stream_sample.py for Chinese TTS" on Sep 19, 2024
dmingke changed the title from "Unintended Pauses Between Words When Using text_stream_sample.py for Chinese TTS" to "Unintended Pauses Between Words When Using text_stream_sample for Chinese TTS" on Sep 19, 2024
@yulin-li
Contributor

Thanks for reporting this issue.

@niuzheng168 could you check?

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

github-actions bot added the "update needed" label (for items that are in progress but have not been updated) on Oct 14, 2024
@dmingke
Author

dmingke commented Oct 16, 2024

Hi all, could someone take a look at this problem?

github-actions bot removed the "update needed" label (for items that are in progress but have not been updated) on Oct 17, 2024