Bug: (alpha/beta) LiveClient closing too early - lost transcript events #198

Closed
ftr-lwilson opened this issue Nov 15, 2023 · 3 comments · Fixed by #201

Comments

@ftr-lwilson
Contributor

What is the current behavior?

I noticed that when streaming transcription through the live client, I was losing events towards the end of the stream. I believe this is because the new alpha/beta LiveClient explicitly closes the websocket on the client side, rather than letting the server terminate the connection, here:
https://github.com/deepgram/deepgram-node-sdk/blob/c0def146e0d480c9b09c5a366c8b21d67609150e/src/packages/LiveClient.ts#L152C7-L152C7

Looking at the original implementation on the main branch, it simply sends a message to the server indicating that no more data will be sent without an explicit client socket.close().
https://github.com/deepgram/deepgram-node-sdk/blob/95f9291f20c92058c013923a357c5a1c6dc7f2de/src/transcription/liveTranscription.ts#L106

I have tried commenting out the socket.close() call and that does seem to fix the issue - all events come through before the connection closes. 🥳

According to the MDN documentation on the WebSocket API:

The process of closing the connection begins with a closing handshake, and the close() method does not discard previously-sent messages before starting that closing handshake; even if the user agent is still busy sending those messages, the handshake will only start after the messages are sent.

It stands to reason that all data is sent to the server over the socket before it is closed, but it's ambiguous whether the server gets the opportunity to continue returning messages.
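To make the difference between the two approaches concrete, here is a minimal sketch of the "signal end of audio, then let the server close" pattern. Everything in it is hypothetical and for illustration only: `SocketLike`, `finishGracefully`, and `FakeSocket` are not SDK types, the zero-length payload is an assumed end-of-stream marker, and the fake socket simply models the behaviour suspected in this issue (a client-side close discarding results the server has not yet delivered) rather than proving how a real websocket behaves.

```typescript
// Hypothetical minimal socket surface for illustration; not the SDK's real types.
interface SocketLike {
  send(data: Uint8Array): void;
  close(): void;
  onmessage: ((text: string) => void) | null;
  onclose: (() => void) | null;
}

// Signal "no more audio" and let the server end the session.
// The zero-length end-of-stream marker is an assumption made for this sketch.
function finishGracefully(
  socket: SocketLike,
  onDone: (messages: string[]) => void
): void {
  const messages: string[] = [];
  socket.onmessage = (text) => {
    messages.push(text);
  };
  socket.onclose = () => {
    onDone(messages);
  };
  socket.send(new Uint8Array(0)); // deliberately no socket.close() here
}

// In-memory stand-in for a server that still has results queued when the audio ends.
class FakeSocket implements SocketLike {
  onmessage: ((text: string) => void) | null = null;
  onclose: (() => void) | null = null;
  private open = true;
  private pending: string[];

  constructor(pending: string[]) {
    this.pending = pending;
  }

  // Audio and the end-of-stream marker are accepted and ignored by the fake.
  send(_data: Uint8Array): void {}

  // Client-side close: results the server had not yet delivered are never seen.
  close(): void {
    if (!this.open) return;
    this.open = false;
    this.pending = [];
    this.onclose?.();
  }

  // Simulates the server delivering its remaining results, then closing.
  serverFlushAndClose(): void {
    if (!this.open) return;
    for (const text of this.pending) this.onmessage?.(text);
    this.pending = [];
    this.open = false;
    this.onclose?.();
  }
}
```

With this fake, calling `finishGracefully(socket, cb)` followed by `socket.serverFlushAndClose()` hands every queued result to `cb`, whereas calling `socket.close()` straight after `finishGracefully` hands `cb` an empty list. That matches the symptom described above, under the stated assumptions.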

Steps to reproduce

import { createClient, LiveTranscriptionEvent, LiveTranscriptionEvents } from '@deepgram/sdk'
import { createReadStream } from 'fs'

const audio = createReadStream('test-audio-file.wav')

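const client = createClient(process.env.DEEPGRAM_API_KEY ?? '') // assumption: key read from the environment; the original snippet used this.apiKey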
const connection = client.listen.live({
  interim_results: true,
})

connection
  .on(LiveTranscriptionEvents.Open, () => {
    audio
      .on('data', (data: Buffer) => {
        connection.send(data)
      })
      .on('end', () => {
        connection.finish() // here we're telling the client there is no more data as soon as the source stream ends
      })
  })
  .on(LiveTranscriptionEvents.Transcript, (event: LiveTranscriptionEvent) => {
    console.log(event)
  })
  .on(LiveTranscriptionEvents.Close, () => {
    console.log('closed!')
  })

Expected behavior

Ideally, the whole audio is transcribed and emitted. An obvious issue is when the last event has is_final set to false - I would definitely expect the last event to be final.

Please tell us about your environment

  • Operating System/Version: macOS Sonoma 14.1 (M1), Node v20.9.0, @deepgram/sdk 3.0.0-beta.2
  • Language: TypeScript
  • Browser: n/a
@lukeocodes
Contributor

lukeocodes commented Nov 15, 2023

is_final refers to our interim results feature, and has nothing to do with the specifics of how the websocket is implemented.

It may be that the client close occurring when a finish is requested is not letting the final transcription events come through. I will need to check this, as I had explicitly tested that this would not be the case.

One thing I hadn't accounted for was an end event on the data/microphone, which is an oversight on my part.

@lukeocodes lukeocodes added bug Something isn't working beta Pending GA release labels Nov 15, 2023
@lukeocodes lukeocodes self-assigned this Nov 15, 2023
@ftr-lwilson
Contributor Author

Thanks for getting back to me Luke!

is_final refers to our interim results feature, and has nothing to do with the specifics of how the websocket is implemented.

Yes yes absolutely. But if this feature is enabled, I would have thought that the last received event would be a finalised result, since there is no more work to do.

For example, for the given audio with speech: "Hello Luke, how are you?", I would expect the events to look something like:

  1. "Hello Luke" is_final=false
  2. "Hello Luke, how" is_final=false
  3. "Hello Luke, how are you?" is_final=true

But in practice, the last (or last few) events tend to be lost, yielding incomplete results and ending on a non-final event:

  1. "Hello Luke" is_final=false
  2. "Hello Luke, how" is_final=false
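As a rough way to check for this symptom in a test, the two sequences above can be compared with a small helper. This is only a sketch: `SimpleTranscriptEvent` and `summarize` are hypothetical names, and the event shape is deliberately reduced to a transcript string and the `is_final` flag rather than the SDK's full `LiveTranscriptionEvent`.

```typescript
// Hypothetical, simplified event shape; the SDK's real event carries more fields.
interface SimpleTranscriptEvent {
  transcript: string;
  is_final: boolean;
}

// Report whether a stream of events ended on a finalised result,
// and what the last transcript received was.
function summarize(events: SimpleTranscriptEvent[]): {
  endedOnFinal: boolean;
  lastTranscript: string;
} {
  const last = events[events.length - 1];
  return {
    endedOnFinal: last !== undefined && last.is_final,
    lastTranscript: last?.transcript ?? "",
  };
}

// The expected sequence from the example above.
const expectedEvents: SimpleTranscriptEvent[] = [
  { transcript: "Hello Luke", is_final: false },
  { transcript: "Hello Luke, how", is_final: false },
  { transcript: "Hello Luke, how are you?", is_final: true },
];

// The sequence observed in practice, with the tail lost.
const observedEvents: SimpleTranscriptEvent[] = expectedEvents.slice(0, 2);

console.log(summarize(expectedEvents).endedOnFinal); // true
console.log(summarize(observedEvents).endedOnFinal); // false
```

A check like `summarize(events).endedOnFinal` after the Close event would flag the truncated case, assuming interim results are enabled and the audio actually contains speech at the end.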

It may be that the client close occurring when a finish is requested is not letting the final transcription events come through. I will need to check this, as I had explicitly tested that this would not be the case.

For sure, this is my theory too. Thanks Luke!

One thing I hadn't accounted for was an end event on the data/microphone, which is an oversight on my part.

Oh okay. Is there an alternative mechanism you might suggest, or a recommended approach on how/when to call LiveClient.finish without depending on the source stream ending?

@lukeocodes lukeocodes linked a pull request Nov 16, 2023 that will close this issue
@lukeocodes
Contributor

fixed in v3.0.0-beta.4!
