ElevenLabs WebSocket API Sample

A Python sample showcasing a complete server-side integration with the ElevenLabs WebSockets API.

Running

Setup

0) clone + cd:

git clone https://github.com/bephrem1/elevenlabs-websockets.git
cd elevenlabs-websockets

1) install dependencies:

pip install -r requirements.txt

2) create .env file:

create a file called .env at the root of the project with one key:

ELEVENLABS_API_KEY = api_key_here
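
For reference, here is a minimal, stdlib-only sketch of how a key in that format could be read. The helper names load_env_file and get_api_key are hypothetical and not taken from this repo, which may well use a dotenv library instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY = value lines from a .env file (hypothetical helper, not from the repo)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("ELEVENLABS_API_KEY") or load_env_file(path).get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set; add it to .env")
    return key
```

Calling get_api_key() from the project root should return the value you placed in .env, or raise if the key is missing.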

Run

3) run in terminal

python3 src/testing/voicebox.py

you should see something like this:

Clocking Times: elapsed time is clocked for a few critical events

initial socket connection: time to open the websocket connection to ElevenLabs (usually 150-250ms). This overhead exists on every TTS generation, since the connection has to be reestablished (& the websocket handshake redone) for each generation.
fgl: stands for "first generation latency". This is the time between when the first speech text chunk is sent & when the first base64 speech chunk is received back from ElevenLabs.
totelap: the total elapsed time between when prepare() was called & when the relevant log was recorded. This is an important metric for tracking the time between when LLM inference may have been fired off & when the first speech chunk is received back.
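
The two latency figures above can be sketched as follows. GenerationClock and its method names are hypothetical illustrations, not the sample's actual logging code; the optional now arguments exist only so the arithmetic is easy to check.

```python
import time
from typing import Optional


class GenerationClock:
    """Hypothetical helper that tracks the timings described above."""

    def __init__(self, speech_generation_start_time: float):
        # The moment prepare() was called (e.g. right before the LLM request).
        self.start_time = speech_generation_start_time
        self.first_text_sent_at: Optional[float] = None
        self.first_audio_received_at: Optional[float] = None

    def mark_first_text_sent(self, now: Optional[float] = None) -> None:
        # Only the first text chunk counts; later calls are ignored.
        if self.first_text_sent_at is None:
            self.first_text_sent_at = time.perf_counter() if now is None else now

    def mark_first_audio_received(self, now: Optional[float] = None) -> None:
        # Only the first audio chunk counts; later calls are ignored.
        if self.first_audio_received_at is None:
            self.first_audio_received_at = time.perf_counter() if now is None else now

    def fgl_ms(self) -> Optional[float]:
        """First generation latency: first text sent -> first audio received."""
        if self.first_text_sent_at is None or self.first_audio_received_at is None:
            return None
        return (self.first_audio_received_at - self.first_text_sent_at) * 1000

    def totelap_ms(self, now: Optional[float] = None) -> float:
        """Total elapsed time since prepare() was called."""
        current = time.perf_counter() if now is None else now
        return (current - self.start_time) * 1000
```

In real use you would construct the clock with time.perf_counter() and call the mark methods without arguments.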

Inspecting

4) inspect files

src/voicebox/Voicebox.py → has all socket-related logic
src/testing/voicebox.py → driver file

Overview

The sample showcases a Voicebox class wrapping around the WebSocket functionality with a simple API:

actions:

prepare(speech_generation_start_time: float) (non-blocking): Prepares & initializes the socket connection asynchronously.
- If you are streaming text from an LLM, you'd fire prepare() before your LLM request so that TTS prep & LLM inference proceed concurrently.
- The voicebox should be ready to ingest speech within 200-300ms (before your LLM would get its first token back to you).
async feed_speech(text: str): Once a connection is open, you can feed speech over the socket. Feed a string of any size (from a single character to a full sentence).
async feeding_finished(): Signals to the voicebox that it has received all speech & transmission is finished.
- This is a mandatory step. Although speech generations may continue to come back (even when you have no more to send), ElevenLabs requires that you let it know that no further speech will be sent & that the transmission is complete.
async reset(): Closes the socket connection & resets the voicebox for the next speech generation.
- This is a required step: socket connections cannot be reused (at the time of this sample's writing) or kept alive indefinitely (the default timeout is 20s).
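
To make the actions above more concrete, here is a rough sketch of the JSON messages that might travel over the socket. The field names ("text", "xi_api_key", "audio") and the empty-string end marker are assumptions based on my understanding of ElevenLabs' stream-input documentation around the time of this sample; check the current docs before relying on them. The function names are hypothetical and are not the ones used in Voicebox.py.

```python
import base64
import json
from typing import Optional


def opening_message(api_key: str) -> str:
    # Assumed: the first message carries a single space plus the API key.
    return json.dumps({"text": " ", "xi_api_key": api_key})


def text_message(text: str) -> str:
    # Assumed: each fed chunk is sent as its own {"text": ...} message.
    return json.dumps({"text": text})


def end_of_input_message() -> str:
    # Assumed: an empty string tells the server that no more text is coming,
    # which is what feeding_finished() would need to send.
    return json.dumps({"text": ""})


def decode_audio(raw_message: str) -> Optional[bytes]:
    """Return the decoded audio bytes from a server message, if it has any."""
    payload = json.loads(raw_message)
    audio = payload.get("audio")
    return base64.b64decode(audio) if audio else None
```

Under these assumptions, a generation is one opening_message, any number of text_message calls, then one end_of_input_message, while decode_audio is applied to every message received back.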

state:

is_ready(): Check if the voicebox is ready for speech transmission.
- You would check this in a loop after firing off prepare() (since prepare() will not block).
generation_complete(): Returns True once all speech has been received back from ElevenLabs.
- You would check this in a loop after calling feeding_finished() to actually know when all speech has been generated & sent back to you.
- You can then safely call reset() to prepare for the next generation.
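
Putting the actions and state checks together, the full lifecycle of one generation might look like the sketch below. FakeVoicebox is a hypothetical stand-in that only mirrors the method names described above and makes no network calls; the real class lives in src/voicebox/Voicebox.py and its constructor arguments are not shown here. The speak() driver and the polling intervals are also assumptions.

```python
import asyncio
import time


class FakeVoicebox:
    """Stand-in with the same method names as described above; no network calls."""

    def __init__(self):
        self._ready = False
        self._done = False
        self._tasks = []       # keep references so tasks are not garbage collected
        self.fed_chunks = []   # kept after reset() for inspection in this sketch

    def prepare(self, speech_generation_start_time: float) -> None:
        # Non-blocking: schedule the simulated connection and return at once.
        self.start_time = speech_generation_start_time
        self._tasks.append(asyncio.get_running_loop().create_task(self._connect()))

    async def _connect(self) -> None:
        await asyncio.sleep(0.01)  # pretend websocket handshake
        self._ready = True

    def is_ready(self) -> bool:
        return self._ready

    async def feed_speech(self, text: str) -> None:
        self.fed_chunks.append(text)

    async def feeding_finished(self) -> None:
        # Simulate the last audio chunk arriving a little later.
        self._tasks.append(asyncio.get_running_loop().create_task(self._finish()))

    async def _finish(self) -> None:
        await asyncio.sleep(0.01)
        self._done = True

    def generation_complete(self) -> bool:
        return self._done

    async def reset(self) -> None:
        self._ready = False
        self._done = False
        self._tasks.clear()


async def speak(voicebox, text_chunks) -> None:
    """Run one full generation: prepare, wait, feed, finish, wait, reset."""
    voicebox.prepare(time.time())
    # An LLM request would be fired here so both proceed concurrently.
    while not voicebox.is_ready():
        await asyncio.sleep(0.005)
    for chunk in text_chunks:
        await voicebox.feed_speech(chunk)
    await voicebox.feeding_finished()
    while not voicebox.generation_complete():
        await asyncio.sleep(0.005)
    await voicebox.reset()
```

For example, asyncio.run(speak(FakeVoicebox(), ["Hello, ", "world."])) walks through every step once. With the real Voicebox you would pass chunks as they stream in from your LLM rather than from a list.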

Just wanted to publish this as a quick sample — forgive any rough edges on code formatting, etc.
