Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vkd3d seems to be ignoring maxTexelBufferElements in some games, leading to visual corruption #2071

Open
pzanoni-intel opened this issue Aug 16, 2024 · 7 comments

Comments

@pzanoni-intel
Copy link

Hello

Original bug report: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9963

On Intel graphics with the Mesa drivers we're observing that some games such as Marvel's Spider-Man Remastered and Assassin's Creed: Valhalla seem to be emitting vkGetDescriptorEXT() calls that ignore VUID-VkDescriptorGetInfoEXT-type-09427, which relates to VkPhysicalDeviceLimits::maxTexelBufferElements.

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDescriptorGetInfoEXT.html#VUID-VkDescriptorGetInfoEXT-type-09427

Since vkGetDescriptorEXT() doesn't have a way to return errors, I tried to change anv to completely disable VK_EXT_descriptor_buffer. With this, we have the same kind of problem, but now with vkCreateBufferView() calls violating VkBufferViewCreateInfo-range-00930, which is basically the same thing.

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkBufferViewCreateInfo.html#VUID-VkBufferViewCreateInfo-range-00930

Now, vkCreateBufferView() is able to return errors, but if I make it return VK_ERROR_VALIDATION_FAILED_EXT on the relevant cases then Spider-Man GPU hangs on the main menu. But an interesting thing is that, when I do this, the white parts of his uniform (the spider in his chest and the back of his hands), which previously showed rendering corruption, are now completely black. So this change really seems to have an effect on the current visual corruption.

Please notice: in Release builds the anv driver doesn't do any checking related to this at all, but as a result we end up programming the hardware with wrong values. In debug builds we catch this with an assertion deep in our call stack:

https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/isl/isl_surface_state.c?ref_type=heads#L997

... as a result, the driver ignores the errors. On the other hand, both games show some rendering corruption. Still, we shouldn't be issuing invalid Vulkan commands.

Is there something that can be done about this issue in the vkd3d-proton layer?

Thanks in advance!

@doitsujin
Copy link
Collaborator

doitsujin commented Aug 16, 2024

What kind of numbers are we talking about here? D3D12 requirements are at least 2^27 texels regardless of format.

There's also the problematic case of games using typed descriptors but passing non-typed buffer descriptors, which is an extremely common app bug that works on all D3D12 drivers that we need to work around by creating typed views. If that's what's happening here then the best we can do is to just not create the typed view and have the game be broken entirely, or create a smaller view and most likely still completely break the game, but at least that way it wouldn't trigger Vulkan validation. Emulating larger views via multiple smaller views is not possible for us.

@pzanoni-intel
Copy link
Author

pzanoni-intel commented Aug 16, 2024

@doitsujin:

What kind of numbers are we talking about here?

For Anv, maxTexelBufferElements is 134217728 (aka 2^27).

On Spider-Man, we see:

error: num_elements 209715200 > 134217728 (size:838860800 stride:4)
error: num_elements 209715200 > 134217728 (size:838860800 stride:4)
error: num_elements 209715200 > 134217728 (size:838860800 stride:4)
error: num_elements 419430400 > 134217728 (size:838860800 stride:2)

Edit:

On Assassin's creed:

(gdb) print num_elements
$1 = 268435456
(gdb) print buffer_size
$2 = 536870912
(gdb) print info->stride_B
$3 = 2

format = ISL_FORMAT_R16_UINT

@doitsujin
Copy link
Collaborator

I'll have a look next week at what the game is doing, but the fact that there's an R16 format involved is a relatively strong indication that this is just coming directly from the app and is just expected to work somehow.

@pzanoni-intel
Copy link
Author

For Spider-Man the formats are:

ISL_FORMAT_R8G8B8A8_UNORM
ISL_FORMAT_R10G10B10A2_UNORM
ISL_FORMAT_R16G16_SINT
ISL_FORMAT_R16_UINT

@doitsujin
Copy link
Collaborator

Confirmed that it's the app itself creating these views.

Also confirmed that these work as expected both on native AMD/Nvidia drivers, as well as the respective Vulkan drivers, despite NV also reporting 2^27 as an upper limit, as in, querying the size and accessing the last element of the view works just fine. I strongly suspect that the real limit for this kind of hardware is something along the lines of 2GB for texel buffers rather than a specific element count, and 2^27 is conservative enough to cover R32G32B32A32 formats.

D3D12 does not have runtime validation for this or any sort of query for the app, so we really can't do much here.

@pzanoni-intel
Copy link
Author

Thank you for your assessment here!

So I'm not sure what to do here other than remove the assert() in the Anv driver, but at least now I have better information. If you have any ideas/suggestions, please let me know.

Chaotic-Temeraire pushed a commit to chaotic-cx/mesa-mirror that referenced this issue Aug 27, 2024
Some games such as Marvel's Spider-Man Remastered and Assassin's
Creed: Valhalla don't work in debug mode because they hit this
assertion. In Release mode, they appear to work (although in some
platforms there may be visual corruption or GPU hangs). There's
nothing we can do about this error (see below), so in this patch we
replace the assertion with an error message, because it allows us to
(i) test the rest of the game in debug mode so we may catch other
issues; and (ii) warn users of release mode that the issue is
happening.

The unsupported num_elements comes from vkGetDescriptorEXT() and
appears to be violating VUID-VkDescriptorGetInfoEXT-type-09427. This
function cannot return errors, but we can disable
VK_EXT_descriptor_buffer.

If we do disable the extension, then vkCreateBufferView() will start
triggering the assertion, and we can see that
VkBufferViewCreateInfo-range-00930 is being violated. If we change Anv
to return errors on these vkCreateBufferView() cases, then the games
won't work at all.

I reported this to vkd3d-proton, but according to the vkd3d-proton
developer Philip Rebohle:

 "There's also the problematic case of games using typed descriptors
  but passing non-typed buffer descriptors, which is an extremely
  common app bug that works on all D3D12 drivers that we need to work
  around by creating typed views. If that's what's happening here then
  the best we can do is to just not create the typed view and have the
  game be broken entirely, or create a smaller view and most likely
  still completely break the game, but at least that way it wouldn't
  trigger Vulkan validation. Emulating larger views via multiple
  smaller views is not possible for us."

 "Confirmed that it's the app itself creating these views."

 "D3D12 does not have runtime validation for this or any sort of query
  for the app, so we really can't do much here."

Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9963
Link: HansKristian-Work/vkd3d-proton#2071
Reviewed-by: Lionel Landwerlin <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30775>
@werman
Copy link
Contributor

werman commented Oct 4, 2024

For the record, the same issue is present on Turnip, it leads to bad UVs on different objects in Spider-Man Remastered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants