TTS Response is clipped at the beginning #122

mdvickst · 2024-02-22T13:29:03Z

I've got Wyoming Satellite running on an Ubuntu VM (Proxmox) with a USB speakerphone connected for mic/speaker. When it plays back the TTS response, the first 1-2 seconds are cut off. The awake and done WAV sounds play as expected.

Satellite Service:

[Unit]
Description=Wyoming Satellite
After=multi-user.target

[Service]
WorkingDirectory=/home/satellite/wyoming-satellite
ExecStart=/usr/bin/env python3 script/run \
  --name 'my satellite' \
  --uri 'tcp://0.0.0.0:10700' \
  --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw' \
  --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw' \
  --wake-uri 'tcp://127.0.0.1:10400' \
  --wake-word-name 'hey_jarvis' \
  --done-wav 'awake.wav'
Type=simple
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Local Wake word service:

[Unit]
Description=Start OpenWakeWord Service
After=multi-user.target

[Service]
WorkingDirectory=/home/satellite/wyoming-openwakeword
ExecStart=/usr/bin/env python3 script/run --uri 'tcp://0.0.0.0:10400' --preload-model 'hey_jarvis' --threshold .99
Type=simple

[Install]
WantedBy=multi-user.target
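
One way to check whether the playback device itself swallows the start of a stream (a guess on my part, nothing in this thread confirms it) is to pad a raw PCM capture with leading silence and play it through the same aplay command. Below is a minimal sketch, assuming the format from --snd-command above (22050 Hz, mono, S16_LE). The function names silence_bytes and pad_stream are hypothetical helpers, not part of wyoming-satellite.

```python
from typing import BinaryIO


def silence_bytes(seconds: float, rate: int = 22050,
                  channels: int = 1, sample_width: int = 2) -> bytes:
    """Return `seconds` of silence as raw PCM (all-zero samples).

    Defaults mirror the aplay arguments in the unit file:
    22050 Hz, 1 channel, 16-bit (2 bytes) per sample.
    """
    frames = int(seconds * rate)
    return b"\x00" * (frames * channels * sample_width)


def pad_stream(src: BinaryIO, dst: BinaryIO,
               seconds: float = 1.5, chunk_size: int = 4096) -> None:
    """Write leading silence to dst, then copy src to dst unchanged."""
    dst.write(silence_bytes(seconds))
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
    dst.flush()
```

Calling pad_stream(sys.stdin.buffer, sys.stdout.buffer) from a small script turns this into a filter that can sit in front of aplay for a manual test. If the padded audio plays in full while the unpadded audio loses its first second or two, the clipping is probably happening at the device or ALSA level and not in the satellite.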

mdvickst · 2024-02-22T13:33:15Z

Here's a sample where the response was a simple "done" and nothing was played.

stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:30:54.760654+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:30:54.760748+00:00"
  - type: stt-vad-start
    data:
      timestamp: 325
    timestamp: "2024-02-22T13:30:55.459661+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1485
    timestamp: "2024-02-22T13:30:57.765833+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Raise Girls Room shade.
    timestamp: "2024-02-22T13:30:57.926834+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Raise Girls Room shade.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:30:57.926956+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Opened
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Girls Room Shade
                type: entity
                id: cover.girls_room_shade
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:30:57.952652+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Opened
    timestamp: "2024-02-22T13:30:57.952700+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:30:57.953188+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:30:57.953247+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Raise Girls Room shade.
intent:
  engine: homeassistant
  language: en
  intent_input: Raise Girls Room shade.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Opened
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Girls Room Shade
            type: entity
            id: cover.girls_room_shade
        failed: []
    conversation_id: null
tts:
  engine: tts.home_assistant_cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Opened
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
    mime_type: audio/x-wav
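
In the trace above, tts-end and run-end arrive less than a millisecond after tts-start. My reading (an assumption, not something the trace states) is that the pipeline only reports that the audio URL is ready, and that playback happens afterwards on the satellite, so the clipping would not show up here. A minimal sketch to compute the gaps between events, with the timestamps copied from the trace; stage_gaps is a hypothetical helper name.

```python
from datetime import datetime

# Event names and timestamps copied from the pipeline trace above.
EVENTS = [
    ("run-start", "2024-02-22T13:30:54.760654+00:00"),
    ("stt-start", "2024-02-22T13:30:54.760748+00:00"),
    ("stt-vad-start", "2024-02-22T13:30:55.459661+00:00"),
    ("stt-vad-end", "2024-02-22T13:30:57.765833+00:00"),
    ("stt-end", "2024-02-22T13:30:57.926834+00:00"),
    ("intent-start", "2024-02-22T13:30:57.926956+00:00"),
    ("intent-end", "2024-02-22T13:30:57.952652+00:00"),
    ("tts-start", "2024-02-22T13:30:57.952700+00:00"),
    ("tts-end", "2024-02-22T13:30:57.953188+00:00"),
    ("run-end", "2024-02-22T13:30:57.953247+00:00"),
]


def stage_gaps(events):
    """Return (from_event, to_event, seconds) for each consecutive pair."""
    parsed = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    return [
        (a, b, (tb - ta).total_seconds())
        for (a, ta), (b, tb) in zip(parsed, parsed[1:])
    ]


for start, end, seconds in stage_gaps(EVENTS):
    print(f"{start} -> {end}: {seconds * 1000:.3f} ms")
```

Nearly all of the elapsed time falls between stt-vad-start and stt-vad-end, which is the spoken command itself. The TTS stage finishes almost immediately, so the trace alone cannot show where the first 1-2 seconds of audio are lost.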

mdvickst · 2024-02-22T13:34:40Z

And here is another with a longer response, where I only heard "rned off the lights":

stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:33:42.117326+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:33:42.117494+00:00"
  - type: stt-vad-start
    data:
      timestamp: 275
    timestamp: "2024-02-22T13:33:42.688250+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1125
    timestamp: "2024-02-22T13:33:44.417176+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Turn off living room lights.
    timestamp: "2024-02-22T13:33:44.591799+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Turn off living room lights.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:33:44.591861+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Turned off the lights
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Living Room
                type: area
                id: 86726e558f304c699f0015d0f229a901
              - name: Living Room Can Lights Basic
                type: entity
                id: light.living_room_can_lights_basic
              - name: "Living Room Can Lights "
                type: entity
                id: light.living_room_can_lights
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:33:44.736401+00:00"
  - type: tts-start
    data:
      engine: cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Turned off the lights
    timestamp: "2024-02-22T13:33:44.736437+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:33:44.736757+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:33:44.736789+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Turn off living room lights.
intent:
  engine: homeassistant
  language: en
  intent_input: Turn off living room lights.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Turned off the lights
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Living Room
            type: area
            id: 86726e558f304c699f0015d0f229a901
          - name: Living Room Can Lights Basic
            type: entity
            id: light.living_room_can_lights_basic
          - name: "Living Room Can Lights "
            type: entity
            id: light.living_room_can_lights
        failed: []
    conversation_id: null
tts:
  engine: cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Turned off the lights
  done: true
  tts_output:
    media_id: >-
      media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
    mime_type: audio/x-wav

khalob · 2024-02-25T17:02:56Z

Check whether lowering or toggling off this setting helps:
#121

I had a similar issue.

motoridersd · 2024-07-03T22:48:23Z

I'm considering doing this on a Proxmox box. Were you able to resolve the issue? Has it been working well for you?

regnighc · 2024-07-08T12:47:29Z

I'm also having this issue, and the suggestion in #121 didn't resolve it for me.
