marisademeglio edited this page May 9, 2012 · 24 revisions

Comparison of two playback mechanisms: HTML5 audio element and web audio.

HTML5 audio

pros:

  • buffers automatically and does a reasonable job
  • can easily set start position
  • built-in playback rate control

cons:

  • some glitching when setting currentTime (used to point the player at the start of a clip)
  • have to use setTimeout/setInterval to monitor for end of file
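The polling mentioned in the last point could look roughly like the sketch below. The helper name `makeClipEndChecker` and its callback are hypothetical, and `player` is assumed to be anything that exposes `currentTime` in seconds, such as an HTML5 audio element. The check itself is kept separate from the timer so it can be driven by either setTimeout or setInterval.

```javascript
// Hypothetical helper: returns a function to call on every timer tick.
// `player` only needs a numeric `currentTime` (seconds), like an HTMLAudioElement.
function makeClipEndChecker(player, clipEnd, onClipEnd) {
  let done = false;
  return function tick() {
    if (!done && player.currentTime >= clipEnd) {
      done = true; // notify only once per clip
      onClipEnd(player.currentTime);
    }
    return done; // true once the clip end has been reached
  };
}

// Possible use in a browser (the 10 ms interval is an arbitrary choice):
//   const tick = makeClipEndChecker(audioElement, 4.5, playNextClip);
//   const id = setInterval(() => { if (tick()) clearInterval(id); }, 10);
```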

If we allow a tolerance of ±11 ms when setting currentTime, the glitching seems to go away.
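One possible reading of this tolerance, as a minimal sketch: skip the seek when the playhead is already within 11 ms of the target, so that consecutive clips do not trigger a needless write to currentTime. The function name `seekWithTolerance` is hypothetical, and this interpretation is an assumption rather than a description of the actual player code.

```javascript
// The ±11 ms figure from the text, expressed in seconds.
const SEEK_TOLERANCE_SECONDS = 0.011;

// Hypothetical helper: only move the playhead if it is meaningfully off target.
// `player` needs a writable `currentTime`, like an HTMLAudioElement.
function seekWithTolerance(player, target, tolerance = SEEK_TOLERANCE_SECONDS) {
  if (Math.abs(player.currentTime - target) <= tolerance) {
    return false; // close enough: leave the playhead alone
  }
  player.currentTime = target;
  return true; // a seek was performed
}
```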

If we run into problems loading files, we can create an asset manager to pre-load audio assets. Ideally it would list every audio asset referenced from SMIL when the book loads, but that can't be done without analyzing all the SMIL files, which could get expensive. One compromise might be to have the MO (Media Overlays) component, which deals with one SMIL file at a time, pre-load only the audio assets of that file.
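A per-file pre-loader along those lines might be sketched as follows. Everything here is hypothetical: the clip objects (with a `src` property) stand in for whatever the SMIL parser produces, and `createAudio` is an injected factory so the logic does not depend on the browser directly.

```javascript
// Collect each distinct audio file referenced by the clips of one SMIL file.
function collectAudioSources(clips) {
  return [...new Set(clips.map((clip) => clip.src))];
}

// Hypothetical asset manager: creates one audio object per distinct source.
function preloadAudioAssets(clips, createAudio) {
  const cache = new Map();
  for (const src of collectAudioSources(clips)) {
    const audio = createAudio(src);
    audio.preload = "auto"; // browsers treat this as a hint, not a guarantee
    cache.set(src, audio);
  }
  return cache;
}

// Possible use in a browser:
//   const cache = preloadAudioAssets(clips, (src) => new Audio(src));
```

Keying the cache by `src` means clips that share an audio file also share one element.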

The remaining issue is that when the tab moves to the background, audio playback is affected because the interval timer fires at most once per second. In this case, the system should perform an integrity check to help playback stay synchronized.

The integrity check works like this:

  • As each audio clip finishes playback, the calling application receives a notification.
  • However, when the tab is in the background, these notifications arrive at most once per second.
  • So, as each notification arrives, if the tab is in the background, compare the audio player's currentTime against the SMIL tree to determine which clip we're in.
  • Back-calculating the current SMIL node from the audio player's position is a fairly expensive search, but its use is limited to (1) once per second while the tab is in the background and (2) one extra time when the tab regains focus. It is not used while the tab is in focus. It is also optimized by only looking ahead in the tree, not behind.
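The steps above can be sketched as below, under some simplifying assumptions: the SMIL tree has already been flattened into an array of clips in playback order, all clips refer to the same audio file, and times are in seconds. The function names are hypothetical. In a browser, `tabHidden` could come from the Page Visibility API (`document.hidden`).

```javascript
// Look ahead only: scan from the current clip forward for the clip whose
// range contains the player's position. Returns -1 if none matches.
function findClipAhead(clips, currentIndex, currentTime) {
  for (let i = currentIndex; i < clips.length; i++) {
    const { clipBegin, clipEnd } = clips[i];
    if (currentTime >= clipBegin && currentTime < clipEnd) return i;
  }
  return -1;
}

// Hypothetical integrity check, run as each end-of-clip notification arrives.
function integrityCheck(clips, currentIndex, currentTime, tabHidden) {
  if (!tabHidden) return currentIndex; // foreground: notifications are timely
  const found = findClipAhead(clips, currentIndex, currentTime);
  return found === -1 ? currentIndex : found;
}
```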

This approach isn't perfect: for out-of-order audio clips (see the numbers example at https://github.com/marisademeglio/media-overlays-js/tree/master/testdata/numbers), we don't get feedback often enough, and clips shorter than one second could fall through the cracks.

Another approach would be to build a timegraph of SMIL audio and determine what we should be playing based on the wall clock time. We would still be limited to checking our position once per second, but it would solve this use case:

<audio clipBegin="0" clipEnd="4.5s"/>
<audio clipBegin="8s" clipEnd="10s"/>
<audio clipBegin="4.5s" clipEnd="8s"/>

Clip 1 plays. If we're in the background, we aren't told that it ended at 4.5s; the notification arrives 5s after starting. By then the audio player's position is 5s, which falls inside the range of clip 3, so we determine that we are in clip 3 and have consequently skipped clip 2 entirely.

If we use wall clock time, we can say that after 5s of playback, we should be in clip 2.
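A minimal sketch of such a timegraph, using the three clips from the example. It assumes a playback rate of 1, no pauses, and clips already flattened into playback order with times in seconds; `buildTimegraph` and `locateByWallClock` are hypothetical names.

```javascript
// Lay the clips end to end on a presentation timeline that starts at 0.
function buildTimegraph(clips) {
  let start = 0;
  return clips.map((clip, index) => {
    const duration = clip.clipEnd - clip.clipBegin;
    const entry = { index, start, end: start + duration, clipBegin: clip.clipBegin };
    start += duration;
    return entry;
  });
}

// Given elapsed wall clock playback time, find the clip that should be
// playing and the position in the audio file we should be at.
function locateByWallClock(timegraph, elapsed) {
  for (const entry of timegraph) {
    if (elapsed >= entry.start && elapsed < entry.end) {
      return { index: entry.index, audioPosition: entry.clipBegin + (elapsed - entry.start) };
    }
  }
  return null; // elapsed time is past the last clip
}

const exampleClips = [
  { clipBegin: 0, clipEnd: 4.5 },
  { clipBegin: 8, clipEnd: 10 },
  { clipBegin: 4.5, clipEnd: 8 },
];
const position = locateByWallClock(buildTimegraph(exampleClips), 5);
// position is { index: 1, audioPosition: 8.5 }: half a second into clip 2
```

A real player would also need to account for pauses and rate changes when computing the elapsed time; that bookkeeping is left out here.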

Web audio

pros:

  • no glitching
  • can specify clip duration up front
  • can accept audio filters, for example to scale the playback rate without affecting the pitch

cons:

  • have to use setTimeout/setInterval to monitor the status of an audio clip
  • have to manage buffer manually, and it can be slow to load

future pros:

  • the need for setTimeout/setInterval monitoring should go away (see this bug)

If we were to use web audio, we would have to progressively buffer windows of data (for example, 1MB at a time). We might not need the full power of web audio.
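To give a rough idea of the bookkeeping involved, the sketch below splits a file of known size into 1MB windows and produces the matching HTTP Range header values. The name `byteWindows` is hypothetical, and the sketch assumes the server supports range requests. Whether each window can be decoded on its own depends on the audio format; that part is not shown.

```javascript
const WINDOW_BYTES = 1024 * 1024; // the 1MB window size from the text

// Hypothetical helper: list the byte windows needed to cover a file.
function byteWindows(totalBytes, windowBytes = WINDOW_BYTES) {
  const windows = [];
  for (let start = 0; start < totalBytes; start += windowBytes) {
    // HTTP byte ranges are inclusive at both ends.
    const end = Math.min(start + windowBytes, totalBytes) - 1;
    windows.push({ start, end, rangeHeader: `bytes=${start}-${end}` });
  }
  return windows;
}
```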
