Fix Dynamic Sample Rate Detection for Audio Compatibility #7

Open · wants to merge 7 commits into main
6 changes: 3 additions & 3 deletions README.md
@@ -2,7 +2,7 @@

The OpenAI Realtime Console is intended as an inspector and interactive API reference
for the OpenAI Realtime API. It comes packaged with two utility libraries,
-[openai/openai-realtime-api-beta](https://github.com/openai/openai-reatime-api-beta)
+[openai/openai-realtime-api-beta](https://github.com/openai/openai-realtime-api-beta)
that acts as a **Reference Client** (for browser and Node.js) and
[`/src/lib/wavtools`](./src/lib/wavtools) which allows for simple audio
management in the browser.
@@ -98,7 +98,7 @@ You will have to implement these features yourself.
# Realtime API reference client

The latest reference client and documentation are available on GitHub at
-[openai/openai-realtime-api-beta](https://github.com/openai/openai-reatime-api-beta).
+[openai/openai-realtime-api-beta](https://github.com/openai/openai-realtime-api-beta).

You can use this client yourself in any React (front-end) or Node.js project.
For full documentation, refer to the GitHub repository, but you can use the
@@ -212,7 +212,7 @@ anything it has generated that is ahead of where the user's state is.
There are five main client events for application control flow in `RealtimeClient`.
Note that this is only an overview of using the client. The full Realtime API
event specification is considerably larger; if you need more control, check out the GitHub repository:
-[openai/openai-realtime-api-beta](https://github.com/openai/openai-reatime-api-beta).
+[openai/openai-realtime-api-beta](https://github.com/openai/openai-realtime-api-beta).

```javascript
// errors like connection failures
8 changes: 4 additions & 4 deletions package-lock.json


100 changes: 99 additions & 1 deletion src/lib/wavtools/lib/wav_recorder.js
@@ -2,6 +2,43 @@ import { AudioProcessorSrc } from './worklets/audio_processor.js';
import { AudioAnalysis } from './analysis/audio_analysis.js';
import { WavPacker } from './wav_packer.js';

// Helper functions for PCM format conversion and resampling
/**
* Converts Int16Array to AudioBuffer
* @param {Int16Array} pcmData - The raw PCM data
* @param {number} sampleRate - The sample rate of the PCM data
* @returns {Promise<AudioBuffer>} - A promise that resolves to the AudioBuffer
*/
async function int16ArrayToAudioBuffer(pcmData, sampleRate) {
const audioCtx = new AudioContext({ sampleRate });
const audioBuffer = audioCtx.createBuffer(1, pcmData.length, sampleRate);
const bufferData = audioBuffer.getChannelData(0);

// Convert Int16Array to Float32Array, as AudioBuffer expects Float32 data
for (let i = 0; i < pcmData.length; i++) {
bufferData[i] = pcmData[i] / 0x8000; // Convert to [-1, 1] range
}

return audioBuffer;
}

/**
* Converts AudioBuffer to Int16Array
* @param {AudioBuffer} audioBuffer - The audio buffer to convert
* @returns {Int16Array} - The converted Int16Array
*/
function audioBufferToInt16Array(audioBuffer) {
const float32Array = audioBuffer.getChannelData(0);
const int16Array = new Int16Array(float32Array.length);

for (let i = 0; i < float32Array.length; i++) {
// Convert Float32 ([-1, 1]) to Int16 ([-32768, 32767]); clamp first, and
// scale positive values by 0x7fff so that +1.0 does not overflow past 32767
const s = Math.max(-1, Math.min(1, float32Array[i]));
int16Array[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
}

return int16Array;
}
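The scaling in the two helpers above can be sanity-checked without Web Audio. A minimal sketch of the same Int16 ↔ Float32 math on plain arrays (not part of this PR; the asymmetric 0x8000/0x7fff scaling is one common convention for avoiding overflow at +1.0):

```javascript
// Int16 <-> Float32 PCM conversion math, isolated from the
// AudioBuffer plumbing so it runs in any JavaScript environment.

function int16ToFloat(samples) {
  // Map [-32768, 32767] into roughly [-1, 1)
  return samples.map((s) => s / 0x8000);
}

function floatToInt16(samples) {
  return samples.map((f) => {
    // Clamp to [-1, 1], then scale positives by 0x7fff so that
    // +1.0 maps to 32767 instead of overflowing past Int16 max
    const c = Math.max(-1, Math.min(1, f));
    return Math.round(c < 0 ? c * 0x8000 : c * 0x7fff);
  });
}

const pcm = [-32768, -16384, 0, 16384, 32767];
const roundTrip = floatToInt16(int16ToFloat(pcm));
console.log(roundTrip); // each value within 1 LSB of the input
```

The round trip is lossless except at the extreme positive end, where the asymmetric scaling can shift a sample by one least-significant bit.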

/**
* Decodes audio into a wav file
* @typedef {Object} DecodedAudioType
@@ -11,18 +48,45 @@ import { WavPacker } from './wav_packer.js';
* @property {AudioBuffer} audioBuffer
*/

/**
 * Records live stream of user audio as PCM16 "audio/wav" data
 * @class
 */

/**
 * Resample audio from the system sample rate to 24,000 Hz
 * @param {AudioBuffer} audioBuffer - The captured audio buffer at the system sample rate
 * @param {number} targetSampleRate - The target sample rate, e.g., 24000 Hz
 * @returns {Promise<AudioBuffer>} - A promise that resolves with the resampled audio buffer
 */
async function resampleAudioBuffer(audioBuffer, targetSampleRate) {
const numChannels = audioBuffer.numberOfChannels;
const context = new OfflineAudioContext(
numChannels,
Math.ceil(audioBuffer.duration * targetSampleRate), // frame count must be a whole number
targetSampleRate
);

// Create a buffer source and set its buffer to the input buffer
const bufferSource = context.createBufferSource();
bufferSource.buffer = audioBuffer;
bufferSource.connect(context.destination);

// Start processing the audio
bufferSource.start(0);
const renderedBuffer = await context.startRendering();
return renderedBuffer;
}
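`resampleAudioBuffer` delegates the filtering to `OfflineAudioContext`; the essential arithmetic is that the output length equals duration × target rate. As a rough illustration only (real resampling needs the anti-aliasing filter the browser provides), a linear-interpolation version of that index math:

```javascript
// Naive linear-interpolation resampler. Illustrates the
// length = duration * targetRate relationship; OfflineAudioContext
// does this properly with anti-aliasing filters.
function resampleLinear(samples, srcRate, dstRate) {
  const outLength = Math.round((samples.length / srcRate) * dstRate);
  const out = new Float32Array(outLength);
  const ratio = srcRate / dstRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // fractional position in the source
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}

// 100 ms captured at 48 kHz becomes 2,400 samples at the API's 24 kHz
const dst = resampleLinear(new Float32Array(4800), 48000, 24000);
console.log(dst.length); // 2400
```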

export class WavRecorder {
/**
* Create a new WavRecorder instance
* @param {{sampleRate?: number, outputToSpeakers?: boolean, debug?: boolean}} [options]
* @returns {WavRecorder}
*/
constructor({
-sampleRate = 44100,
+sampleRate = new AudioContext().sampleRate, // Capture the system sample rate if none provided
outputToSpeakers = false,
debug = false,
} = {}) {
@@ -431,6 +495,7 @@ export class WavRecorder {
* @param {number} [chunkSize] chunkProcessor will not be triggered until this size threshold met in mono audio
* @returns {Promise<true>}
*/

async record(chunkProcessor = () => {}, chunkSize = 8192) {
if (!this.processor) {
throw new Error('Session ended: please call .begin() first');
@@ -439,15 +504,48 @@
} else if (typeof chunkProcessor !== 'function') {
throw new Error(`chunkProcessor must be a function`);
}

this._chunkProcessor = chunkProcessor;
this._chunkProcessorSize = chunkSize;
this._chunkProcessorBuffer = {
raw: new ArrayBuffer(0),
mono: new ArrayBuffer(0),
};
this.log('Recording ...');

// Start recording and capture audio at the system sample rate
await this._event('start');
this.recording = true;

// Process the buffered audio in smaller chunks, resampling each chunk to 24,000 Hz
const processChunk = async (startIdx, endIdx) => {
// Extract the smaller chunk of data
const chunkBuffer = this._chunkProcessorBuffer.mono.slice(startIdx, endIdx);

// Convert raw PCM data to AudioBuffer
const rawBuffer = new Int16Array(chunkBuffer);
const audioBuffer = await int16ArrayToAudioBuffer(rawBuffer, this.sampleRate);

// Resample captured audio to 24,000 Hz
const resampledBuffer = await resampleAudioBuffer(audioBuffer, 24000);

// Convert the resampled AudioBuffer back to Int16Array
const resampledInt16Array = audioBufferToInt16Array(resampledBuffer);

// Process the resampled buffer (now in Int16Array format)
this._chunkProcessor({ mono: resampledInt16Array });
};

// Now process the full buffer in smaller chunks
const totalSamples = this._chunkProcessorBuffer.mono.byteLength / 2; // Each Int16 is 2 bytes
const chunkSizeInSamples = Math.min(this._chunkProcessorSize, totalSamples);

// Iterate over the buffer in smaller chunks
for (let i = 0; i < totalSamples; i += chunkSizeInSamples) {
const endIdx = Math.min(i + chunkSizeInSamples, totalSamples);
await processChunk(i * 2, endIdx * 2); // Multiply by 2 to get byte offset
}

return true;
}
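The chunk loop in `record()` above walks the mono buffer in samples but slices in bytes (each Int16 sample is 2 bytes). That bookkeeping can be sketched on its own — `chunkInt16Buffer` here is a hypothetical standalone helper, not part of the PR:

```javascript
// Split an ArrayBuffer of Int16 PCM into fixed-size sample chunks,
// mirroring the sample-index / byte-offset arithmetic in record().
function chunkInt16Buffer(arrayBuffer, chunkSizeInSamples) {
  const totalSamples = arrayBuffer.byteLength / 2; // 2 bytes per Int16
  const chunks = [];
  for (let i = 0; i < totalSamples; i += chunkSizeInSamples) {
    const end = Math.min(i + chunkSizeInSamples, totalSamples);
    // ArrayBuffer.slice takes byte offsets, hence the * 2
    chunks.push(new Int16Array(arrayBuffer.slice(i * 2, end * 2)));
  }
  return chunks;
}

const buf = new Int16Array([1, 2, 3, 4, 5]).buffer; // 5 samples, 10 bytes
const chunks = chunkInt16Buffer(buf, 2);
console.log(chunks.map((c) => Array.from(c))); // [[1, 2], [3, 4], [5]]
```

The final chunk is allowed to be shorter than `chunkSizeInSamples`, which is why both loops clamp the end index with `Math.min`.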

32 changes: 29 additions & 3 deletions src/lib/wavtools/lib/wav_stream_player.js
@@ -12,8 +12,8 @@ export class WavStreamPlayer {
* @returns {WavStreamPlayer}
*/
constructor({ sampleRate = 44100 } = {}) {
+this.sampleRate = sampleRate || new AudioContext().sampleRate; // Fallback to system sample rate
this.scriptSrc = StreamProcessorSrc;
-this.sampleRate = sampleRate;
this.context = null;
this.stream = null;
this.analyser = null;
@@ -100,7 +100,8 @@ export class WavStreamPlayer {
* @param {string} [trackId]
* @returns {Int16Array}
*/
-add16BitPCM(arrayBuffer, trackId = 'default') {
+async add16BitPCM(arrayBuffer, trackId = 'default') {
if (typeof trackId !== 'string') {
throw new Error(`trackId must be a string`);
} else if (this.interruptedTrackIds[trackId]) {
@@ -109,6 +110,7 @@
if (!this.stream) {
this._start();
}

let buffer;
if (arrayBuffer instanceof Int16Array) {
buffer = arrayBuffer;
@@ -117,7 +119,31 @@
} else {
throw new Error(`argument must be Int16Array or ArrayBuffer`);
}
-this.stream.port.postMessage({ event: 'write', buffer, trackId });

// Process smaller chunks of the buffer (assumes the int16ArrayToAudioBuffer,
// resampleAudioBuffer, and audioBufferToInt16Array helpers are imported from wav_recorder.js)
const processChunk = async (startIdx, endIdx) => {
const chunkBuffer = buffer.slice(startIdx, endIdx);

// Convert Int16Array to AudioBuffer
const audioBuffer = await int16ArrayToAudioBuffer(chunkBuffer, this.sampleRate);

// Resample to 24,000 Hz
const resampledBuffer = await resampleAudioBuffer(audioBuffer, 24000);

// Convert the resampled AudioBuffer back to Int16Array
const resampledInt16Array = audioBufferToInt16Array(resampledBuffer);

// Send the resampled buffer to the stream for playback
this.stream.port.postMessage({ event: 'write', buffer: resampledInt16Array, trackId });
};

// Process the buffer in smaller chunks
const chunkSize = 1024; // Customize the chunk size as needed
for (let i = 0; i < buffer.length; i += chunkSize) {
const endIdx = Math.min(i + chunkSize, buffer.length);
await processChunk(i, endIdx);
}

return buffer;
}

8 changes: 5 additions & 3 deletions src/pages/ConsolePage.tsx
@@ -71,12 +71,14 @@ export function ConsolePage() {
* - WavStreamPlayer (speech output)
* - RealtimeClient (API client)
*/
const getSystemSampleRate = () => new AudioContext().sampleRate;

const wavRecorderRef = useRef<WavRecorder>(
-new WavRecorder({ sampleRate: 24000 })
+new WavRecorder({ sampleRate: getSystemSampleRate() })
);
const wavStreamPlayerRef = useRef<WavStreamPlayer>(
-new WavStreamPlayer({ sampleRate: 24000 })
-);
+new WavStreamPlayer({ sampleRate: getSystemSampleRate() })
+);
const clientRef = useRef<RealtimeClient>(
new RealtimeClient(
USE_LOCAL_RELAY_SERVER_URL