
WesleyFister/llm-voice-assistant


Carry a spoken conversation with large language models! This project uses Whisper speech to text to transcribe the user's voice, sends the transcript to the LLM, and pipes the result to Piper text to speech. A socket server receives audio from a client device, performs all of the processing on the server, and sends the final TTS output back to the client.
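The wire protocol between client and server isn't documented here, so purely as an illustration, a minimal sketch of one common way to frame audio over a socket: length-prefix each payload so the receiver knows how many bytes to read. All function names below are hypothetical, not the project's actual API.

```python
import socket
import struct

def send_audio(sock, audio_bytes):
    # Prefix the payload with its length (4-byte big-endian) so the
    # receiver knows exactly how much to read.
    sock.sendall(struct.pack(">I", len(audio_bytes)) + audio_bytes)

def recv_exact(sock, n):
    # TCP recv() may return fewer bytes than asked; loop until n arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_audio(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The same framing works in both directions: the client sends recorded microphone audio, and the server sends back the synthesized TTS audio.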

WIP Warning

This is very much a work in progress. Many basic features have yet to be implemented.

Install

This program only works on Linux (Ubuntu/Debian and Arch-based systems).

Run the 'setup.sh' script on both the client and the server. Then run 'start.sh' on the server first, followed by the client.

Features

  • 100% offline, open source and private
  • Wake word detection: 'Hey Jarvis'
  • Hands-free interaction
  • Client-server model
  • Fully multilingual pipeline
  • Streamed responses

Configuring

On both the client and the server, configuration is done by opening 'start.sh' and passing the corresponding flags to 'main.py', like so: python3 main.py --ip-address 192.168.1.123 --port 5432
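The '--ip-address' and '--port' flags come from the example above; a sketch of how 'main.py' might parse them with argparse (the defaults and parser structure are assumptions, not the project's actual code):

```python
import argparse

def build_parser():
    # Flag names match the README example; the defaults are assumptions.
    parser = argparse.ArgumentParser(description="llm-voice-assistant")
    parser.add_argument("--ip-address", default="127.0.0.1",
                        help="address of the server to connect to")
    parser.add_argument("--port", type=int, default=5432,
                        help="port the socket server listens on")
    return parser
```

Note that argparse exposes a hyphenated flag like '--ip-address' as the attribute 'args.ip_address'.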

Multilinguality

The language has to be supported by the STT (Whisper languages), the LLM (Llama3.1 languages) and the TTS (Piper languages).

This means that by default the application supports the following languages: English, German, French, Italian, Portuguese, and Spanish. However, more languages can easily be added by changing the large language model used, for example to Mistral Small.

The output text generated by the LLM is chunked into sentences and run against Lingua to detect the language. A TTS model for that language is then downloaded and cached for future use. This unfortunately means that the first time a language is used, generating a response back to the user will be extremely slow. This is done to save memory, as loading all of the TTS models at once can take ~2.5 GB.

Todo

  1. Cancel audio playback when saying the wake word 'hey Jarvis'.
  2. Properly close the server with CTRL + C.
  3. Make a proper setup.sh for both the client and server.
  4. Clear LLM chat history by saying some variation of 'hey Jarvis clear chat history'.
  5. Allow multiple clients to connect to the server.
  6. Clean up code.
