
Audio PR #2: Move APU frame processing to new thread #250

Closed
mborgerson wants to merge 2 commits

Conversation

mborgerson (Member Author)

Note: This PR is part of a series of PRs to add basic audio.

  • Reduce the timer-based interval to 600us
  • Add a new, thread-based model to allow processing frames on an independent thread (sketched below). Timers are not reliable enough on all platforms for the frequency we need to run at, especially given the work that needs to be done in each frame. We will phase out the timer-based model in a future PR.
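For illustration, a minimal, self-contained sketch of the thread-based shape, using plain pthreads rather than QEMU's threading primitives; ApuState and process_frame here are hypothetical stand-ins for the real MCPXAPUState and se_frame:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <unistd.h>

    /* Hypothetical stand-in for the real MCPXAPUState. */
    typedef struct {
        atomic_bool exiting;
    } ApuState;

    static void process_frame(ApuState *d)
    {
        (void)d; /* process one 32-sample APU frame here */
    }

    static void *apu_frame_thread(void *arg)
    {
        ApuState *d = arg;
        while (!atomic_load(&d->exiting)) {
            process_frame(d);
            /* The real thread paces itself against the sample FIFO /
             * virtual time; a fixed sleep is only a placeholder. */
            usleep(600);
        }
        return NULL;
    }

    int main(void)
    {
        ApuState d = { .exiting = false };
        pthread_t thread;
        pthread_create(&thread, NULL, apu_frame_thread, &d);
        usleep(100 * 1000);               /* let it run briefly */
        atomic_store(&d.exiting, true);   /* request shutdown */
        pthread_join(thread, NULL);
        return 0;
    }

The thread function from the PR itself begins: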

    volatile MCPXAPUState *d = MCPX_APU_DEVICE(arg);

    while (!d->exiting) {
        /* Extract the XCNTMODE field from the SECTL register */
        int should_run = (d->regs[NV_PAPU_SECTL] & NV_PAPU_SECTL_XCNTMODE) >> 3;
JayFoxRox (Member)

nit: This is an unsuitable variable name

mborgerson (Member Author)

Agree

@@ -1478,7 +1498,9 @@ static void se_frame(void *opaque)
     int mixbin;
     int sample;

-    timer_mod(d->se.frame_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 10);
+#if !USE_APU_THREAD
+    timer_mod(d->se.frame_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 600);
JayFoxRox (Member)

Why 600us?

haxar

An optimal frame size would be 20ms (960 samples @ 48kHz). This is the recommended configuration from the Opus audio codec spec (RFC6716).

JayFoxRox (Member)

No, this must run at 1500Hz.

This is because the APU processes 32 samples per frame (a hardware design decision). It is designed for a 48kHz sample rate (also hardware design; it handles other sample rates by modifying the pitch).

So (48000 samples / second) / (32 samples / frame) = 1500 frames / second.

You can even see this when observing memory (which my xboxpy APU streaming code does): there are 1500 discrete updates per second, where each update produces 32 samples.
To my knowledge this cannot be configured (although DSP DMA streaming buffer sizes can be changed).

Some more background:

The APU is an audio processor: it streams back to memory, and with DirectSound it runs synchronously with the ACI, which plays back from a small looping buffer to output the APU results to the speakers.

The APU VP + DSPs are synchronized to this 1500Hz, and the current frame count and timer can be read back. Also, in DirectSound, the DSPs communicate with the CPU using command lists (which the GP DSP code processes), and the CPU expects this 1500Hz; it also synchronizes game events with the DSPs (so this timing is possibly critical).

Also see http://xboxdevwiki.net/APU
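For reference, the arithmetic above as a self-contained check (plain C, not from the PR):

    #include <stdio.h>

    int main(void)
    {
        const double sample_rate = 48000.0; /* fixed by hardware design */
        const double frame_size  = 32.0;    /* samples per APU frame */
        const double frame_hz    = sample_rate / frame_size; /* 1500 Hz */
        const double period_us   = 1e6 / frame_hz;           /* ~666.667 us */
        printf("%.0f frames/s, %.3f us/frame\n", frame_hz, period_us);
        return 0;
    }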

haxar

FTR, 1 second / 1500 Hz = 0.000666667 seconds, or 666.667us per 32-sample frame, as assembled by the APU.

JayFoxRox (Member), Jul 14, 2019

Right, but 600us is not the same as 666.666..us, so where does it come from?

Assuming a flawless FIFO (and a QEMU audio driver sampling at 48kHz, making 32 samples of space at a time), shouldn't it be at least 2*f (of realtime; Nyquist-Shannon theorem), to avoid any drift?

Either way, it should be documented why 600us was chosen.

The reason this isn't 666.666..us already is that the GP emulation doesn't come close to finishing a frame in that time (also see JayFoxRox#22) on some machines (for games using GP). If a DSP frame is not finished by the time the next DSP frame starts, it can lead to games locking up for those users.
It's similar to our dsp_run parameters, which require tweaking for some games (they need more than ~1000 instructions) and which otherwise lead to the same problem.

Also see #150, #151 and #155
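For reference: 2*f of the 1500Hz frame rate is 3000Hz, i.e. a timer period of at most ~333us, a bound the 600us timer does not meet; and a timer that produces one frame per tick every 600us generates ~1666.7 frames/s, roughly 11% fast in virtual time. A minimal sketch of drift-free pacing, deriving each deadline from the start time and an integer frame counter instead of repeatedly adding a rounded period (a hypothetical helper, not code from the PR):

    #include <stdint.h>

    #define FRAME_HZ 1500

    /* Deadline for the next frame, in ns. Deriving it from the start time
     * keeps the rounding error of the non-integer 666666.67ns period
     * bounded (< 1ns) instead of accumulating it every frame.
     * (Overflows int64_t after roughly 71 days at 1500Hz.) */
    static int64_t next_deadline_ns(int64_t start_ns, int64_t frames_done)
    {
        return start_ns + (frames_done + 1) * 1000000000LL / FRAME_HZ;
    }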

haxar, Jul 14, 2019

[...] shouldn't it be at least 2*f (Shannon-Nyquist theorem), to avoid any drift?

Yes. A circular buffer only one frame (32 samples) long may not give smooth playback to a thread-safe reader in my circular buffer mixer (now in my Mangler work). A buffer of >=2 frames (2*f) should avoid playback stutter, provided each frame is mixed on time by an accurate timer (one that tracks granule position).

My timer keeps a starting granule position for the mixer and advances it by measuring writer buffer lengths (the mixer reads each mix buffer at 960 samples and outputs 960 samples, but only does work when the next timeout expires). It also computes the next timeout for the CPU to run the mixer and other work. Code latency is subtracted from the timeout result, giving an accurate timeout so that no drift accumulates.

Maybe code latency is causing issues?

See v3_timer().
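A minimal sketch of the >=2-frame idea as a single-producer/single-consumer ring using C11 atomics; this illustrates the principle only (it is not the Mangler code), and FRAMES = 4 is an arbitrary choice:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define FRAME_SAMPLES 32
    #define FRAMES        4              /* >= 2 frames of headroom */

    typedef struct {
        int16_t buf[FRAMES][FRAME_SAMPLES];
        atomic_uint wr;                  /* frames written (producer only) */
        atomic_uint rd;                  /* frames read (consumer only) */
    } FrameRing;

    /* Producer (APU frame thread): returns false if the ring is full. */
    static bool ring_push(FrameRing *r, const int16_t *frame)
    {
        unsigned wr = atomic_load_explicit(&r->wr, memory_order_relaxed);
        unsigned rd = atomic_load_explicit(&r->rd, memory_order_acquire);
        if (wr - rd == FRAMES) {
            return false;                /* consumer has fallen behind */
        }
        for (int i = 0; i < FRAME_SAMPLES; i++) {
            r->buf[wr % FRAMES][i] = frame[i];
        }
        atomic_store_explicit(&r->wr, wr + 1, memory_order_release);
        return true;
    }

    /* Consumer (mixer / audio callback): returns false on underrun. */
    static bool ring_pop(FrameRing *r, int16_t *frame)
    {
        unsigned rd = atomic_load_explicit(&r->rd, memory_order_relaxed);
        unsigned wr = atomic_load_explicit(&r->wr, memory_order_acquire);
        if (wr == rd) {
            return false;                /* no frame ready: play silence */
        }
        for (int i = 0; i < FRAME_SAMPLES; i++) {
            frame[i] = r->buf[rd % FRAMES][i];
        }
        atomic_store_explicit(&r->rd, rd + 1, memory_order_release);
        return true;
    }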

JayFoxRox (Member), Jul 14, 2019

I think you are missing my point: This APU code must be independent of audio output.

This has to do with game logic, not audio output (things like Opus or AC97), which the APU isn't doing; the APU is incapable of driving a speaker directly. The APU does math at a fixed framerate / fixed bandwidth; it doesn't drive a speaker.

Games expect 1500 events per second, and they synchronize certain events (on the CPU) to this framerate.
If we don't run at that framerate (say, if we synchronize at 1/600us or at the host audio driver's choice), we will get audio/video desynchronization and other problems.

Regardless of what virtual-time is derived from, the APU has to run at 1500Hz in virtual-time.
None of the proposed solutions do this.

The issue is that audio output in XQEMU is supposed to be driven by the host (at 48kHz real-time); see the planned #251. However, the APU processing must be done in virtual-time (to synchronize with the CPU, ACI and all existing timers). The timer modification (which is only temporary according to the PR description) is also not based on hardware behaviour, but on an arbitrarily chosen value.

My solution for this was to use time-stretching in the ACI (the most common solution in emulators) or a hack to assume that virtual-time = real-time in the APU (which leads to audio crackling due to minor drifting).

If you want to assume that virtual-time = real-time, you must basically assume that all deadlines will be met. However, we don't meet them (#155), so it can lead to bad code-flow in the DSP code, which can also crash or hang games (as it will affect the CPU, too).

Hence, I was initially pushing for working on APU / ACI routing and improving our DSPs before integrating the VP: it would then be safer to make these assumptions.

If you just want to get audio from the VP (as a temporary hack / for debugging), you can ignore most of the APU audio framing, because you don't feed the GP DSP MIXBUF (which is fixed size / bandwidth, and the GP DSP synchronizes with the CPU).
So if we output audio from the VP, our frame logic could largely remain untouched (broken as-is, but not introducing more error) and VP frames would run largely independently of the rest of the APU frames (only driving the envelopes). Games would still hang on feedback loops or some GP command lists (as some games already do).


My point is: if we touch framing, it should be done properly, and independently of VP audio output to speakers (it's being made dependent in the planned PR #251, a dependency that doesn't even exist in hardware); at the very least, we should document how our code works and why it diverges from hardware.

Threading the APU frames is a good idea in general, though; I'm only criticizing the arbitrary synchronization, which isn't based on hardware behaviour (host audio driver synchronization / 600us timer: both incorrect). VP, GP and EP could also possibly be split into 3 different threads (which would need further hardware testing, as I'm not sure how VP / GP, DSP DMA, EP and ACI are synchronized; co-routines could be more suitable).
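A minimal sketch of the time-stretching approach mentioned above: real-time output consumes the emulated stream at a rate nudged around 1.0 so that small virtual-time drift is absorbed. Linear interpolation in plain C; the function and its feedback rule are illustrative assumptions, not code from any of these PRs:

    #include <stddef.h>
    #include <stdint.h>

    /* Fill out[] from in[] at the given rate ratio (>1.0 consumes input
     * faster than 1:1). *pos carries the fractional read position across
     * calls. Returns the number of output samples written; stops early if
     * in[] runs out, in which case the caller refills in[] and rebases *pos. */
    static size_t stretch(const int16_t *in, size_t in_len,
                          int16_t *out, size_t out_len,
                          double ratio, double *pos)
    {
        size_t written = 0;
        while (written < out_len && (size_t)*pos + 1 < in_len) {
            size_t idx = (size_t)*pos;
            double frac = *pos - (double)idx;
            /* Linear interpolation between neighbouring input samples. */
            out[written++] = (int16_t)((1.0 - frac) * in[idx]
                                       + frac * in[idx + 1]);
            *pos += ratio;
        }
        return written;
    }

In practice the ratio would be driven by FIFO fill level, e.g. ratio = 1.0 + 0.005 * (fill - 0.5), so a backlog drains slightly faster and a shortfall slightly slower.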

mborgerson (Member Author)

This was months ago now, but if memory serves there was a slight timing issue I was working around. It's better to be conservative here with a smaller interval, as the sample FIFO is throttled to the same rate anyway by system audio sync at 48kHz.

JayFoxRox (Member)

Please add a comment in the code about this (although your comment below says the timer model will be removed anyway; in that case: ignore this).

mborgerson (Member Author)

I'm going to clean this PR up a bit and just remove the timer-based model before this gets merged.

mborgerson closed this on Feb 18, 2020
mborgerson deleted the audio-pr-2 branch on February 19, 2020