Run in server mode? #376

Open
flatsiedatsie opened this issue Feb 3, 2024 · 3 comments

Comments

@flatsiedatsie

TLDR: Is there a way to keep the process running long-term, and "instantly" generate output whenever it receives text input?

I've been following the Rhasspy project for a long time, and I'm digging the Piper project. Great work! I'm integrating it into Voco, a voice control plugin for the Candle Controller (and Webthings Gateway).

One thing I'm running into: whenever I generate speech, the model takes a second to load. That's a precious second.

To combat this I tried the JSON input mode, under the (false, it turns out) assumption that it would let me pipe text on demand into Piper running as a Python subprocess. I was hoping the model would stay loaded that way, so that audio generation could start as soon as possible.
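Roughly what I tried (a minimal sketch, assuming Piper's --json-input mode reads one JSON object per line from stdin; the model path is just an example):

import json
import subprocess

# Start Piper once and keep it running, writing one JSON object per line
# to its stdin whenever a sentence needs to be spoken.
piper = subprocess.Popen(
    ["./piper", "--model", "en_US-danny-low.onnx", "--json-input", "--output-raw"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

piper.stdin.write((json.dumps({"text": "Hello from Voco"}) + "\n").encode("utf-8"))
piper.stdin.flush()
# piper.stdout now carries raw audio that can be piped to a player.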

Unfortunately, after piping text into the process it generates the audio and then stops. I then have to restart Piper, which, when multiple sentences need to be spoken in a row, adds a second of delay between each sentence.

Is there a way to keep the process running long-term, and "instantly" generate output whenever it receives text input?

This would have some other small advantages too.

  • Instead of checking beforehand whether there is enough memory to run Piper, which the code does now, the memory would only need to be allocated once. This would make it more predictable whether my plugin can use the nicer Piper voice or has to fall back to nanoTTS for lack of free memory, and it would make it easier for users to predict whether they have enough free memory to install other plugins.
  • Voco has two other machine-learning parts that already operate in such a 'server mode': Whisper for STT, and Llamafile for the actual local chat assistant. Having all three processes run in server mode would make it attractive to build a single system for managing these long-running processes (and restarting them if they crash, for example).

Even faster
A related question: the LLM assistant generates its output word by word. Currently I wait for a full sentence to be complete before I send it to Piper. Since (on a Raspberry Pi 5) the assistant generates text faster than Piper can speak it, would it be possible to have a mode where Piper starts generating speech once it has a buffer of just, say, three words? That might shave another second off the response time.
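Something like this is what I have in mind (just a sketch; word_stream stands for the assistant's word-by-word output and speak() for whatever hands text to Piper):

# Flush partial phrases to the TTS engine as soon as a few words are
# available, instead of waiting for a complete sentence.
BUFFER_WORDS = 3

def stream_to_tts(word_stream, speak):
    buffer = []
    for word in word_stream:
        buffer.append(word)
        # Flush on sentence-ending punctuation or once the buffer is full.
        if word.endswith((".", "!", "?")) or len(buffer) >= BUFFER_WORDS:
            speak(" ".join(buffer))
            buffer = []
    if buffer:
        speak(" ".join(buffer))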

@odurc

odurc commented Mar 7, 2024

TLDR: Is there a way to keep the process running long-term, and "instantly" generate output whenever it receives text input?

If you are running on a Linux machine (it possibly works on macOS too), you can create a FIFO special file and pass it to piper in raw output mode, then open that file from Python and write text to it.

# run once in a terminal
mkfifo /tmp/piper-fifo
./piper --model en_US-danny-low.onnx --output-raw < /tmp/piper-fifo | \
    aplay -q -D "default:USB" -r 16000 -f S16_LE -t raw -

# then, in another terminal, from Python:
import os
fd = os.open('/tmp/piper-fifo', os.O_WRONLY)
os.write(fd, b'hello from python\n')

Note that piper will be blocked until the FIFO is open for writing by another process (i.e. python in this case).
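You can also keep the file descriptor open and write every new sentence to it; as long as at least one writer has the FIFO open, piper never sees end-of-file and stays loaded. A small sketch:

import os

# Open the FIFO once and reuse it for every sentence.
fd = os.open('/tmp/piper-fifo', os.O_WRONLY)
for sentence in [b'first sentence\n', b'second sentence\n']:
    os.write(fd, sentence)
# Closing the descriptor signals end-of-file, and piper will exit.
os.close(fd)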

@flatsiedatsie
Author

Yes, I looked into that as well. Thanks for the suggestion.

In the end I created a modified version of Piper that has the option to run in a loop. It's working great:

#378

@IanEdington

I'm currently using Piper in my "select and speak" workflow on Linux. I moved to a lower-quality voice because the larger models took a long time to start talking. I suspect, although I didn't measure it, that the delay is largely due to loading the model, since there is no lag in speaking once it starts.

Triggered using a hotkey:

#!/bin/bash
# Toggle: if piper is not already running, read the current X primary
# selection aloud; if it is running, stop it.

tts_pid=$(pidof piper)

voice="en_US-libritts_r-medium.onnx"

if [ -z "$tts_pid" ]
then
    # Grab the primary selection, flatten newlines, and stream raw audio
    # from piper straight into aplay (22050 Hz matches the medium voice).
    xclip -out -selection primary | \
        tr "\n" " " | \
        ~/.dotfiles/packages/piper/piper \
            --model "$HOME/.dotfiles/packages/piper-models/$voice" \
            --output-raw \
            --length_scale 0.4 \
            --sentence_silence 0.1 | \
        aplay -r 22050 -f S16_LE -t raw -
else
    kill "$tts_pid"
fi
