Unintended Pauses Between Words When Using text_stream_sample for Chinese TTS #2596

dmingke · 2024-09-19T11:43:22Z

Bug Description

This issue occurs when using text_stream_sample with the zh-CN-YunxiaNeural voice: the synthesized speech contains unintended pauses between words, which disrupts its natural flow. For instance, when synthesizing the sentence "今天的天气真好" ("The weather is really nice today"), there is a noticeable, unnatural pause between the characters "今" and "天", which together form the single word "今天" ("today"). This behavior might be related to how the OpenAI-generated text arrives and is processed in chunks during real-time synthesis. I am looking for ways to prevent this from happening.
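If the chunking hypothesis above is right, one possible mitigation is to buffer the streamed text and forward it only at punctuation boundaries, so that a word such as "今天" is not split across two writes. The following is a minimal sketch under that assumption. It has not been confirmed to resolve this issue; `coalesce_chunks` and `BOUNDARIES` are hypothetical names and are not part of the Speech SDK or the sample.

```python
from typing import Iterable, Iterator

# Punctuation treated as a safe flush point (an assumption; adjust as needed).
BOUNDARIES = "，。！？；：、,.!?;:\n"


def coalesce_chunks(chunks: Iterable[str], boundaries: str = BOUNDARIES) -> Iterator[str]:
    """Yield buffered text up to the last boundary seen, holding back the rest."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Index of the last boundary character in the buffer, or -1 if none.
        cut = max(pending.rfind(ch) for ch in boundaries)
        if cut >= 0:
            yield pending[: cut + 1]
            pending = pending[cut + 1:]
    if pending:
        yield pending  # flush whatever is left when the stream ends


# Simulated LLM deltas that split "今天" and "明天" across chunks.
pieces = list(coalesce_chunks(["今", "天的天气", "真好。明", "天见"]))
print(pieces)
```

Each yielded string would then be written to the TTS text stream wherever the sample currently writes each raw chunk. The trade-off is added latency, since no text is forwarded until a boundary arrives.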

Steps to Reproduce

Start from the text_stream_sample in the repo.
Set speech_synthesis_voice_name to zh-CN-YunxiaNeural.
Send the synthesized audio chunks (using audio_buffer.tobytes()) over the WebSocket and play the PCM audio data on the client side.
Notice that the longer the response, the more frequently unnatural pauses between words occur.
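To narrow down whether the pauses are already present in the synthesized audio or are introduced by the WebSocket transport or client-side playback, the PCM bytes could be scanned for silent runs before they are sent. This is a rough diagnostic sketch, not part of the sample. It assumes 16-bit little-endian mono PCM, and the sample rate, amplitude threshold, and minimum gap length are placeholder values that must be matched to the configured output format. `find_silent_gaps` is a hypothetical helper.

```python
import array
import sys


def find_silent_gaps(pcm: bytes, sample_rate: int = 16000,
                     threshold: int = 200, min_gap_ms: int = 150):
    """Return (start_ms, duration_ms) for each run of near-silent samples."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed to be little-endian

    gaps = []
    run_start = None  # sample index where the current quiet run began

    def close_run(end_index: int) -> None:
        duration_ms = (end_index - run_start) * 1000 // sample_rate
        if duration_ms >= min_gap_ms:
            gaps.append((run_start * 1000 // sample_rate, duration_ms))

    for i, value in enumerate(samples):
        if abs(value) <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            close_run(i)
            run_start = None
    if run_start is not None:
        close_run(len(samples))  # the audio ended during a quiet run
    return gaps


# Synthetic check: 100 ms of signal, 200 ms of silence, 100 ms of signal.
demo = array.array("h", [1000] * 1600 + [0] * 3200 + [1000] * 1600).tobytes()
print(find_silent_gaps(demo))
```

If the gaps reported for the concatenated audio line up with the audible pauses, the pauses originate in synthesis; if the audio is gap-free but playback still stutters, the transport or the client-side player is the more likely cause.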

Expected Behavior

The synthesized speech should flow smoothly, with no unintended pauses between words unless indicated by appropriate punctuation. Each sentence should be delivered naturally and continuously.

Version of the Cognitive Services Speech SDK

SDK Version: azure-cognitiveservices-speech==1.40.0
Programming Language: Python 3.x

Additional Context

No SSML was used, only plain text input.
The issue is most prominent with Chinese voice models. I have tested both zh-CN-YunxiaNeural and zh-CN-YunxiNeural, and the issue seems consistent across them.
The issue does not seem to occur with English voice models, such as en-US-BrianMultilingualNeural.


yulin-li · 2024-09-24T00:17:23Z

Thanks for reporting this issue.

@niuzheng168 could you check?

github-actions · 2024-10-14T02:24:49Z

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

dmingke · 2024-10-16T06:43:32Z

Hi all, could anyone take a look at this problem?

dmingke changed the title ~~Unintended Pauses Between Words using text_stream_sample.py for Chinese TTS~~ Unintended Pauses Between Words When Using text_stream_sample.py for Chinese TTS Sep 19, 2024

dmingke changed the title ~~Unintended Pauses Between Words When Using text_stream_sample.py for Chinese TTS~~ Unintended Pauses Between Words When Using text_stream_sample for Chinese TTS Sep 19, 2024

github-actions bot added the update needed For items that are in progress but have not been updated label Oct 14, 2024

github-actions bot removed the update needed For items that are in progress but have not been updated label Oct 17, 2024
