TTS Response is clipped at the beginning #122

Open
mdvickst opened this issue Feb 22, 2024 · 5 comments

I've got Wyoming Satellite running on an Ubuntu VM (Proxmox) with a USB speakerphone connected for mic/speaker. When it plays back the TTS response, the first 1-2 seconds are cut off. The awake and done WAV sounds play as expected.

Satellite Service:

[Unit]
Description=Wyoming Satellite
After=multi-user.target

[Service]
WorkingDirectory=/home/satellite/wyoming-satellite
ExecStart=/usr/bin/env python3 script/run   --name 'my satellite'   --uri 'tcp://0.0.0.0:10700'   --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw'   --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw'   --wake-uri 'tcp://127.0.0.1:10400'   --wake-word-name 'hey_jarvis' --done-wav 'awake.wav'
Type=simple
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Local Wake word service:

[Unit]
Description=Start OpenWakeWord Service
After=multi-user.target

[Service]
WorkingDirectory=/home/satellite/wyoming-openwakeword
ExecStart=/usr/bin/env python3 script/run --uri 'tcp://0.0.0.0:10400' --preload-model 'hey_jarvis' --threshold .99
Type=simple

[Install]
WantedBy=multi-user.target
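
For scale: the `--snd-command` in the satellite unit above plays raw 16-bit mono PCM at 22050 Hz, so 1-2 seconds of clipped audio corresponds to roughly 44,100 to 88,200 bytes of samples. The sketch below works out that arithmetic and builds a block of leading silence of a chosen length. It assumes, without confirmation from this thread, that the clipping comes from the playback device needing time to start. `bytes_for` and `leading_silence` are hypothetical helpers for illustration, not part of Wyoming Satellite.

```python
# Format taken from the aplay arguments in the satellite unit.
SAMPLE_RATE = 22050  # -r 22050
CHANNELS = 1         # -c 1
SAMPLE_WIDTH = 2     # S16_LE is 2 bytes per sample


def bytes_for(seconds: float) -> int:
    """Number of raw PCM bytes that cover `seconds` of audio."""
    frames = int(seconds * SAMPLE_RATE)
    return frames * CHANNELS * SAMPLE_WIDTH


def leading_silence(seconds: float) -> bytes:
    """Zero-valued samples (hypothetical workaround: play before the real audio)."""
    return b"\x00" * bytes_for(seconds)


if __name__ == "__main__":
    print(bytes_for(1.0))  # 44100
    print(bytes_for(2.0))  # 88200
```

Whether padding the start of playback would actually help here is untested; the numbers only show how much audio is being lost.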

@mdvickst (Author)

Here's a sample where the response was a simple "done" and nothing was played.

stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:30:54.760654+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:30:54.760748+00:00"
  - type: stt-vad-start
    data:
      timestamp: 325
    timestamp: "2024-02-22T13:30:55.459661+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1485
    timestamp: "2024-02-22T13:30:57.765833+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Raise Girls Room shade.
    timestamp: "2024-02-22T13:30:57.926834+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Raise Girls Room shade.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:30:57.926956+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Opened
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Girls Room Shade
                type: entity
                id: cover.girls_room_shade
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:30:57.952652+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Opened
    timestamp: "2024-02-22T13:30:57.952700+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:30:57.953188+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:30:57.953247+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Raise Girls Room shade.
intent:
  engine: homeassistant
  language: en
  intent_input: Raise Girls Room shade.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Opened
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Girls Room Shade
            type: entity
            id: cover.girls_room_shade
        failed: []
    conversation_id: null
tts:
  engine: tts.home_assistant_cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Opened
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
    mime_type: audio/x-wav
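
The event timestamps in the trace above can be used to see how long each pipeline stage took. The sketch below uses only timestamps copied from that trace; `elapsed` is an illustrative helper name. One thing it makes visible is that `tts-end` follows `tts-start` by under a millisecond, so these events alone do not show when audio actually reached the satellite's speaker.

```python
from datetime import datetime

# Timestamps copied from the pipeline trace above.
EVENTS = {
    "run-start": "2024-02-22T13:30:54.760654+00:00",
    "stt-end": "2024-02-22T13:30:57.926834+00:00",
    "intent-end": "2024-02-22T13:30:57.952652+00:00",
    "tts-start": "2024-02-22T13:30:57.952700+00:00",
    "tts-end": "2024-02-22T13:30:57.953188+00:00",
}


def elapsed(start: str, end: str) -> float:
    """Seconds between two named events in EVENTS."""
    t0 = datetime.fromisoformat(EVENTS[start])
    t1 = datetime.fromisoformat(EVENTS[end])
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    print(f"tts-start -> tts-end: {elapsed('tts-start', 'tts-end'):.6f} s")
    print(f"run-start -> tts-end: {elapsed('run-start', 'tts-end'):.6f} s")
```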

@mdvickst (Author)

And here is another with a longer response, where I only heard "rned off the lights":

stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:33:42.117326+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:33:42.117494+00:00"
  - type: stt-vad-start
    data:
      timestamp: 275
    timestamp: "2024-02-22T13:33:42.688250+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1125
    timestamp: "2024-02-22T13:33:44.417176+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Turn off living room lights.
    timestamp: "2024-02-22T13:33:44.591799+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Turn off living room lights.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:33:44.591861+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Turned off the lights
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Living Room
                type: area
                id: 86726e558f304c699f0015d0f229a901
              - name: Living Room Can Lights Basic
                type: entity
                id: light.living_room_can_lights_basic
              - name: "Living Room Can Lights "
                type: entity
                id: light.living_room_can_lights
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:33:44.736401+00:00"
  - type: tts-start
    data:
      engine: cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Turned off the lights
    timestamp: "2024-02-22T13:33:44.736437+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:33:44.736757+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:33:44.736789+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Turn off living room lights.
intent:
  engine: homeassistant
  language: en
  intent_input: Turn off living room lights.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Turned off the lights
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Living Room
            type: area
            id: 86726e558f304c699f0015d0f229a901
          - name: Living Room Can Lights Basic
            type: entity
            id: light.living_room_can_lights_basic
          - name: "Living Room Can Lights "
            type: entity
            id: light.living_room_can_lights
        failed: []
    conversation_id: null
tts:
  engine: cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Turned off the lights
  done: true
  tts_output:
    media_id: >-
      media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
    mime_type: audio/x-wav
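
The `media_id` in the trace above encodes the audio format that was requested from the TTS engine. As a minimal sketch, it can be decoded with the standard library to compare against the playback command. The 16000 Hz requested here differs from the 22050 Hz passed to `aplay` in the satellite unit. Whether that matters depends on whether the audio is resampled before playback, which this thread does not establish. `tts_params` is an illustrative helper name.

```python
from urllib.parse import parse_qs

# media_id copied from the pipeline trace above.
MEDIA_ID = (
    "media-source://tts/cloud?message=Turned+off+the+lights"
    "&language=en-GB&voice=EthanNeural&preferred_format=wav"
    "&preferred_sample_rate=16000&preferred_sample_channels=1"
)


def tts_params(media_id: str) -> dict:
    """Return the query parameters of a TTS media_id as a flat dict."""
    query = media_id.partition("?")[2]
    return {key: values[0] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    params = tts_params(MEDIA_ID)
    print(params["message"])
    print(params["preferred_sample_rate"], params["preferred_sample_channels"])
```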


khalob commented Feb 25, 2024

Check whether lowering or turning off the setting discussed here helps:
#121

I had a similar issue.

@motoridersd

I'm considering doing this on a Proxmox box. Were you able to resolve the issue? Has it been working well for you?


regnighc commented Jul 8, 2024

I'm also having this issue, and the suggestion in #121 didn't resolve it for me.
