spaces before SSML tags cause internal server error for Azure TTS Instances #2635

Open
JorySchossau opened this issue Oct 18, 2024 · 0 comments

Log File: log.txt

Describe the bug

Putting spaces before the prosody tag in SSML and synthesizing it through this SDK's Python API causes an internal server error. Speech Studio, however, processes the same SSML correctly.

This fails:
   <prosody contour="(1%, +19%) (45%, -12%) (100%, -36%)">
   TITLE TEXT
   </prosody>
This works:
<prosody contour="(1%, +19%) (45%, -12%) (100%, -36%)">
TITLE TEXT
</prosody>

To Reproduce

Here is the input SSML that causes the problem. Remove leading spaces from tags to fix the problem.

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
    <voice name="en-US-ShimmerMultilingualNeural">
 
    Lorem Ipsum and Dolor Sit
 
Lorem ipsum dolor sit amet! Consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
 
Duis aute irure dolor in reprehenderit, voluptate velit esse cillum dolore, eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
 
In consequat, velit esse cillum, dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.  
 
 
<break time="800ms"/>
    <prosody contour="(1%, +19%) (45%, -12%) (100%, -36%)">
        A TEMPOR INCIDIDUNT
    </prosody>
<break time="800ms"/>
 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
 
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
 
Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua: quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
 
 
    </voice>
</speak>
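
Until the underlying issue is fixed, the workaround described above (removing leading spaces from tags) can be sketched in Python as follows. This is a minimal sketch, not part of the SDK: the helper name strip_leading_whitespace is hypothetical, and it assumes that removing indentation from every line, including text lines, does not change how the SSML should be spoken.

```python
def strip_leading_whitespace(ssml: str) -> str:
    """Remove leading whitespace from every line of an SSML string.

    Hypothetical helper (not an SDK function). It strips indentation from
    all lines, tags and text alike, matching the "works" example above.
    """
    return "\n".join(line.lstrip() for line in ssml.splitlines())


# The indented prosody block from the failing input.
indented = (
    '    <prosody contour="(1%, +19%) (45%, -12%) (100%, -36%)">\n'
    "        A TEMPOR INCIDIDUNT\n"
    "    </prosody>"
)

cleaned = strip_leading_whitespace(indented)
print(cleaned)
```

The cleaned string could then be passed to the synthesizer in place of the original SSML, for example through SpeechSynthesizer.speak_ssml_async. Whether this avoids the error in every case has not been verified here.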

Expected behavior

The audio renders through the prosody tag section.

What happened

The audio renders only up to a point near the prosody tag section.

Version of the Cognitive Services Speech SDK

1.40.0

Platform, Operating System, and Programming Language

  • OS: Windows WSL 2.0
  • Hardware: x64
  • Programming language: Python

Additional context

  • attached log