
Audio PR #2: Move APU frame processing to new thread #250

Closed
mborgerson wants to merge 2 commits

Conversation

mborgerson (Member Author)

Note: This PR is part of a series of PRs to add basic audio.

  • Reduce the timer-based interval to 600us
  • Add a new, thread-based model to allow processing frames on an independent thread (sketched below). Timers are not reliable enough on all platforms for the frequency we need to run at, especially given the work that needs to be done in each frame. We will phase out the timer-based model in a future PR.
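For illustration, a minimal, self-contained sketch of the thread-based shape, using plain pthreads rather than QEMU's threading primitives; ApuState and process_frame here are hypothetical stand-ins for the real MCPXAPUState and se_frame:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <unistd.h>

    /* Hypothetical stand-in for the real MCPXAPUState. */
    typedef struct {
        atomic_bool exiting;
    } ApuState;

    static void process_frame(ApuState *d)
    {
        (void)d; /* process one 32-sample APU frame here */
    }

    static void *apu_frame_thread(void *arg)
    {
        ApuState *d = arg;
        while (!atomic_load(&d->exiting)) {
            process_frame(d);
            /* The real thread paces itself against the sample FIFO /
             * virtual time; a fixed sleep is only a placeholder. */
            usleep(600);
        }
        return NULL;
    }

    int main(void)
    {
        ApuState d = { .exiting = false };
        pthread_t thread;
        pthread_create(&thread, NULL, apu_frame_thread, &d);
        usleep(100 * 1000);               /* let it run briefly */
        atomic_store(&d.exiting, true);   /* request shutdown */
        pthread_join(thread, NULL);
        return 0;
    }

The thread function from the PR itself begins: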

    volatile MCPXAPUState *d = MCPX_APU_DEVICE(arg);

    while (!d->exiting) {
        /* Extract the XCNTMODE field from the SECTL register */
        int should_run = (d->regs[NV_PAPU_SECTL] & NV_PAPU_SECTL_XCNTMODE) >> 3;
JayFoxRox (Member)

nit: This is an unsuitable variable name

mborgerson (Member Author)

Agree

@@ -1478,7 +1498,9 @@ static void se_frame(void *opaque)
     int mixbin;
     int sample;

-    timer_mod(d->se.frame_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 10);
+#if !USE_APU_THREAD
+    timer_mod(d->se.frame_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 600);
JayFoxRox (Member)

Why 600us?

haxar

An optimal frame size would be 20ms (960 samples @ 48kHz). This is the recommended configuration from the Opus audio codec spec (RFC6716).

JayFoxRox (Member)

No, this must run at 1500Hz.

This is because the APU processes 32 samples per frame (a hardware design decision). It is designed for a 48kHz sample rate (also hardware design; it handles other sample rates by modifying the pitch).

So (48000 samples / second) / (32 samples / frame) = 1500 frames / second.

You can even see this when observing memory (which my xboxpy APU streaming code does): there are 1500 discrete updates per second, where each update produces 32 samples.
To my knowledge this cannot be configured (although DSP DMA streaming buffer sizes can be changed).

Some more background:

The APU is an audio processor: it streams back to memory, and with DirectSound it runs synchronously with the ACI, which plays back from a small looping buffer to output the APU results to the speakers.

The APU VP + DSPs are synchronized to this 1500Hz, and the current frame count and timer can be read back. Also, in DirectSound, the DSPs communicate with the CPU using command lists (which the GP DSP code processes), and the CPU expects this 1500Hz; it also synchronizes game events with the DSPs (so this timing is possibly critical).

Also see http://xboxdevwiki.net/APU
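For reference, the arithmetic above as a self-contained check (plain C, not from the PR):

    #include <stdio.h>

    int main(void)
    {
        const double sample_rate = 48000.0; /* fixed by hardware design */
        const double frame_size  = 32.0;    /* samples per APU frame */
        const double frame_hz    = sample_rate / frame_size; /* 1500 Hz */
        const double period_us   = 1e6 / frame_hz;           /* ~666.667 us */
        printf("%.0f frames/s, %.3f us/frame\n", frame_hz, period_us);
        return 0;
    }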

haxar

FTR, 1 second / 1500 Hz = 0.000666667 seconds, or 666.667us per 32-sample frame, as assembled by the APU.

JayFoxRox (Member), Jul 14, 2019

Right, but 600us is not the same as 666.666..us, so where does it come from?

Assuming a flawless FIFO (and a QEMU audio driver sampling at 48kHz, making 32 samples of space at a time), shouldn't it be at least 2*f (of realtime; Nyquist-Shannon theorem), to avoid any drift?

Either way, it should be documented why 600us was chosen.

The reason this isn't 666.666..us already is that the GP emulation doesn't come close to finishing a frame in that time (also see JayFoxRox#22) on some machines (for games using GP). If a DSP frame is not finished by the time the next DSP frame starts, it can lead to games locking up for those users.
It's similar to our dsp_run parameters, which require tweaking for some games (they need more than ~1000 instructions) and which otherwise lead to the same problem.

Also see #150, #151 and #155
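For reference: 2*f of the 1500Hz frame rate is 3000Hz, i.e. a timer period of at most ~333us, a bound the 600us timer does not meet; and a timer that produces one frame per tick every 600us generates ~1666.7 frames/s, roughly 11% fast in virtual time. A minimal sketch of drift-free pacing, deriving each deadline from the start time and an integer frame counter instead of repeatedly adding a rounded period (a hypothetical helper, not code from the PR):

    #include <stdint.h>

    #define FRAME_HZ 1500

    /* Deadline for the next frame, in ns. Deriving it from the start time
     * keeps the rounding error of the non-integer 666666.67ns period
     * bounded (< 1ns) instead of accumulating it every frame.
     * (Overflows int64_t after roughly 71 days at 1500Hz.) */
    static int64_t next_deadline_ns(int64_t start_ns, int64_t frames_done)
    {
        return start_ns + (frames_done + 1) * 1000000000LL / FRAME_HZ;
    }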

haxar, Jul 14, 2019

[...] shouldn't it be at least 2*f (Shannon-Nyquist theorem), to avoid any drift?

Yes. A circular buffer only one frame (32 samples) long may not give smooth playback to a thread-safe reader in my circular buffer mixer (now in my Mangler work). A buffer of >=2 frames (2*f) should avoid playback stutter, provided each frame is mixed on time by an accurate timer (one that tracks granule position).

My timer keeps a starting granule position for the mixer and advances it by measuring writer buffer lengths (the mixer reads each mix buffer at 960 samples and outputs 960 samples, but only does work when the next timeout expires). It also computes the next timeout for the CPU to run the mixer and other work. Code latency is subtracted from the timeout result, giving an accurate timeout so that no drift accumulates.

Maybe code latency is causing issues?

See v3_timer().
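A minimal sketch of the >=2-frame idea as a single-producer/single-consumer ring using C11 atomics; this illustrates the principle only (it is not the Mangler code), and FRAMES = 4 is an arbitrary choice:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define FRAME_SAMPLES 32
    #define FRAMES        4              /* >= 2 frames of headroom */

    typedef struct {
        int16_t buf[FRAMES][FRAME_SAMPLES];
        atomic_uint wr;                  /* frames written (producer only) */
        atomic_uint rd;                  /* frames read (consumer only) */
    } FrameRing;

    /* Producer (APU frame thread): returns false if the ring is full. */
    static bool ring_push(FrameRing *r, const int16_t *frame)
    {
        unsigned wr = atomic_load_explicit(&r->wr, memory_order_relaxed);
        unsigned rd = atomic_load_explicit(&r->rd, memory_order_acquire);
        if (wr - rd == FRAMES) {
            return false;                /* consumer has fallen behind */
        }
        for (int i = 0; i < FRAME_SAMPLES; i++) {
            r->buf[wr % FRAMES][i] = frame[i];
        }
        atomic_store_explicit(&r->wr, wr + 1, memory_order_release);
        return true;
    }

    /* Consumer (mixer / audio callback): returns false on underrun. */
    static bool ring_pop(FrameRing *r, int16_t *frame)
    {
        unsigned rd = atomic_load_explicit(&r->rd, memory_order_relaxed);
        unsigned wr = atomic_load_explicit(&r->wr, memory_order_acquire);
        if (wr == rd) {
            return false;                /* no frame ready: play silence */
        }
        for (int i = 0; i < FRAME_SAMPLES; i++) {
            frame[i] = r->buf[rd % FRAMES][i];
        }
        atomic_store_explicit(&r->rd, rd + 1, memory_order_release);
        return true;
    }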

JayFoxRox (Member), Jul 14, 2019

I think you are missing my point: This APU code must be independent of audio output.

This has to do with game logic, not audio output (things like Opus or AC97), which the APU isn't doing; the APU is incapable of driving a speaker directly. The APU does math at a fixed framerate / fixed bandwidth; it doesn't drive a speaker.

Games expect 1500 events per second, and they synchronize certain events (on the CPU) to this framerate.
If we don't run at that framerate (say, if we synchronize at 1/600us or at the host audio driver's choice), we will get audio/video desynchronization and other problems.

Regardless of what virtual-time is derived from, the APU has to run at 1500Hz in virtual-time.
None of the proposed solutions do this.

The issue is that audio output in XQEMU is supposed to be driven by the host (at 48kHz real-time); see the planned #251. However, the APU processing must be done in virtual-time (to synchronize with the CPU, ACI and all existing timers). The timer modification (which is only temporary according to the PR description) is also not based on hardware behaviour, but on an arbitrarily chosen value.

My solution for this was to use time-stretching in the ACI (the most common solution in emulators) or a hack to assume that virtual-time = real-time in the APU (which leads to audio crackling due to minor drifting).

If you want to assume that virtual-time = real-time, you must basically assume that all deadlines will be met. However, we don't meet them (#155), so it can lead to bad code-flow in the DSP code, which can also crash or hang games (as it will affect the CPU, too).

Hence, I was initially pushing for working on APU / ACI routing and improving our DSPs before integrating the VP: it would then be safer to make these assumptions.

If you just want to get audio from the VP (as a temporary hack / for debugging), you can ignore most of the APU audio framing, because you don't feed the GP DSP MIXBUF (which is fixed size / bandwidth, and the GP DSP synchronizes with the CPU).
So if we output audio from the VP, our frame logic could largely remain untouched (broken as-is, but not introducing more error) and VP frames would run largely independently of the rest of the APU frames (only driving the envelopes). Games would still hang on feedback loops or some GP command lists (as some games already do).


My point is: if we touch framing, it should be done properly, and independently of VP audio output to speakers (it's being made dependent in the planned PR #251, a dependency that doesn't even exist in hardware); at the very least, we should document how our code works and why it diverges from hardware.

Threading the APU frames is a good idea in general, though; I'm only criticizing the arbitrary synchronization, which isn't based on hardware behaviour (host audio driver synchronization / 600us timer: both incorrect). VP, GP and EP could also possibly be split into 3 different threads (which would need further hardware testing, as I'm not sure how VP / GP, DSP DMA, EP and ACI are synchronized; co-routines could be more suitable).
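A minimal sketch of the time-stretching approach mentioned above: real-time output consumes the emulated stream at a rate nudged around 1.0 so that small virtual-time drift is absorbed. Linear interpolation in plain C; the function and its feedback rule are illustrative assumptions, not code from any of these PRs:

    #include <stddef.h>
    #include <stdint.h>

    /* Fill out[] from in[] at the given rate ratio (>1.0 consumes input
     * faster than 1:1). *pos carries the fractional read position across
     * calls. Returns the number of output samples written; stops early if
     * in[] runs out, in which case the caller refills in[] and rebases *pos. */
    static size_t stretch(const int16_t *in, size_t in_len,
                          int16_t *out, size_t out_len,
                          double ratio, double *pos)
    {
        size_t written = 0;
        while (written < out_len && (size_t)*pos + 1 < in_len) {
            size_t idx = (size_t)*pos;
            double frac = *pos - (double)idx;
            /* Linear interpolation between neighbouring input samples. */
            out[written++] = (int16_t)((1.0 - frac) * in[idx]
                                       + frac * in[idx + 1]);
            *pos += ratio;
        }
        return written;
    }

In practice the ratio would be driven by FIFO fill level, e.g. ratio = 1.0 + 0.005 * (fill - 0.5), so a backlog drains slightly faster and a shortfall slightly slower.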

mborgerson (Member Author)

This was months ago now, but if memory serves there was a slight timing issue I was working around. It's better to be conservative here with a smaller interval, as the sample FIFO is throttled to the same rate anyway by system audio sync at 48kHz.

JayFoxRox (Member)

Please add a comment in the code about this (although your comment below says the timer model will be removed anyway; in that case: ignore this).

mborgerson (Member Author)

I'm going to clean this PR up a bit and just remove the timer-based model before this gets merged.

mborgerson closed this on Feb 18, 2020
mborgerson deleted the audio-pr-2 branch on February 19, 2020