
Vulkan AS rebuild-on-replay: Copy AS input buffers on build #3426

Open · wants to merge 2 commits into base: v1.x
Conversation

cmannett85-arm
Collaborator

When vkCmdBuildAccelerationStructuresKHR is called, the input buffers used for the build are copied into host-accessible memory, so that when initial-state serialisation is triggered the data can be copied into the capture file.

To unify the behaviour of the different geometry types, there is only one readback buffer per AS; in the case of triangle data it contains the vertex, index, and transform data concatenated into a single contiguous block.
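To illustrate the single-readback-buffer idea, here is a minimal sketch (names and the alignment value are hypothetical, not RenderDoc's actual code) of how offsets for the concatenated vertex, index, and transform ranges could be computed, with each range aligned up so subsequent copies start at a legal offset:

```cpp
#include <cstdint>

// Illustrative only: layout of one contiguous readback block holding the
// vertex, index and transform data for a triangle-geometry AS build.
struct TriangleReadbackLayout
{
  uint64_t vertexOffset;
  uint64_t indexOffset;
  uint64_t transformOffset;
  uint64_t totalSize;
};

// Round `value` up to a multiple of `alignment` (power of two).
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment)
{
  return (value + alignment - 1) & ~(alignment - 1);
}

// Place the three ranges back-to-back, aligning each range's start.
// The 16-byte alignment here is an assumption for the sketch.
constexpr TriangleReadbackLayout ComputeLayout(uint64_t vertexSize, uint64_t indexSize,
                                               uint64_t transformSize, uint64_t alignment = 16)
{
  TriangleReadbackLayout l = {};
  l.vertexOffset = 0;
  l.indexOffset = AlignUp(vertexSize, alignment);
  l.transformOffset = AlignUp(l.indexOffset + indexSize, alignment);
  l.totalSize = l.transformOffset + transformSize;
  return l;
}
```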

The buffer data, along with the build arguments, is stored in a VkAccelerationStructureInfo object owned by the AS's VkResourceRecord. If this record is freed, then the copied buffer memory is also freed. If this destruction occurs during the active capture phase, the AS info is kept alive until the end of the active capture.

Serialisation of the initial state will follow in later patches.

Caveats:

  • vkCmdBuildAccelerationStructuresIndirectKHR is not handled
  • VkAccelerationStructureGeometryInstancesDataKHR::arrayOfPointers mode is not handled

Owner

@baldurk baldurk left a comment

As a general note since I can't put this on the specific line, remember to disable the indirect build physical device feature.

It's fine not to support indirect builds, as it seems few drivers do either, and I don't think anyone will use it given the feature is not supported in D3D12. But I think it should be technically possible, as it looks like we get bounds on the data size on the CPU? The main challenge will be doing the copy; we might have to fall back to a compute shader or something and do it inefficiently.

Files with review threads (all outdated, resolved):

  • renderdoc/driver/vulkan/wrappers/vk_misc_funcs.cpp
  • renderdoc/driver/vulkan/vk_acceleration_structure.h (2 threads)
  • renderdoc/driver/vulkan/vk_acceleration_structure.cpp (6 threads)
Additionally, the spec does not mandate that the compaction size is the same between capture and replay, so compaction is effectively disabled by always returning the full AS size when the compacted size is queried.
Owner

@baldurk baldurk left a comment

The input buffer copying looks good. I've made some notes about the query pool handling, though: there's one wrinkle I hadn't thought of which makes it more complex, but hopefully there are some ways it can be simplified so it's not too bad overall.

RDResult CopyInputBuffers(VkCommandBuffer commandBuffer,
                          const VkAccelerationStructureBuildGeometryInfoKHR &info,
                          const VkAccelerationStructureBuildRangeInfoKHR *buildRange,
                          CaptureState state);
Owner

This state seems redundant here - we don't do this during replay, and we don't need to differentiate between background and active capturing. From a quick look through, unless I'm missing something, this value is (ultimately) not used.

Same applies below to CopyAccelerationStructure.


void QueryPoolInfo::Reset(uint32_t firstQuery, uint32_t queryCount)
{
  m_Entries.removeIf([&](const auto &entry) {
Owner

Please avoid auto except where necessary for lambdas, or where std iterator types are extremely ugly; you can use Entry here.

Also please use explicit lambda capture rather than unconditional capture to be clear about what is stored in the lambda.

}
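A minimal illustration of both suggestions above - naming the concrete element type and listing lambda captures explicitly - sketched with std::vector standing in for rdcarray, and a hypothetical Entry type:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical entry type for the sketch: a run of queries starting at
// `firstQuery` covering `count` slots.
struct Entry
{
  uint32_t firstQuery;
  uint32_t count;
};

// Remove every entry overlapping [firstQuery, firstQuery + queryCount),
// using the named Entry type and an explicit capture list instead of [&]/auto.
void Reset(std::vector<Entry> &entries, uint32_t firstQuery, uint32_t queryCount)
{
  entries.erase(std::remove_if(entries.begin(), entries.end(),
                               [firstQuery, queryCount](const Entry &entry) {
                                 return entry.firstQuery < firstQuery + queryCount &&
                                        firstQuery < entry.firstQuery + entry.count;
                               }),
                entries.end());
}
```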

void QueryPoolInfo::Replace(uint32_t firstQuery, uint32_t queryCount,
                            const std::function<void(Entry)> &writeEntry) const
Owner

Is there a reason for this function to be separated out? It looks like it's only called from the other Replace() to apply any replacements, so it could be done inside that function. That should allow some simplification and ease of reading, since it took me a couple of passes to understand the code flow with the two functions having the same name.

void Replace(uint32_t firstQuery, uint32_t queryCount,
             const std::function<void(Entry)> &writeEntry) const;

rdcarray<Entry> m_Entries;
Owner

Thinking about this more generally: query pools are single-typed, so we know at creation time whether we're going to replace all of the results or none of them (all, when the query pool type is compacted size).

Do you have any empirical data or intuition on how large those query pools are? I would assume either 1, some small fixed count (UE5 seems to use 64, configurable), or maybe the application's expected max number of BLASs (some thousands?).

Unless that seems unreasonable or you're otherwise worried about having a huge number of possible queries with only a few actually being valid, I think this code could be simplified a good deal by a) only allocating this struct when needed for patching and b) pre-allocating all the entries up front. Then you don't have to worry about intersecting, resizing this array on the fly, etc. We just read all our values out of it.
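The pre-allocation idea above could be sketched like this (all names hypothetical; std::vector stands in for rdcarray, and a zero value stands in for "no replacement recorded"):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: allocate one replacement slot per query up front at
// query pool creation, so there is no intersecting or on-the-fly resizing
// later - we just read values out of it.
struct QueryPoolInfo
{
  explicit QueryPoolInfo(uint32_t queryCount) : replacements(queryCount, 0) {}

  // Record the AS size we want to report for a given query slot.
  void SetReplacement(uint32_t query, uint64_t size) { replacements[query] = size; }

  // Read the replacement for a query; 0 means "no replacement recorded".
  uint64_t GetReplacement(uint32_t query) const { return replacements[query]; }

  std::vector<uint64_t> replacements;  // one entry per query, fixed size
};
```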

for(uint32_t i = 0; i < accelerationStructureCount; ++i)
  sizes.push_back(GetRecord(pAccelerationStructures[i])->memSize);

qpInfo->Add(firstQuery, std::move(sizes));
Owner

Unfortunately, seeing this I've realised that there is an ordering/timeline constraint I hadn't considered with this data. Because the properties write goes to a query and the queries are resolved separately, these commands can be recorded and submitted out of order, so we can't update a single authoritative fake query pool on the CPU timeline and update from there.

There may be a clever solution to this to resolve it on the CPU by deferring query updates and keeping multiple versions around, but I think I'd say we should solve it on the GPU to keep things simpler:

  1. Create an internal buffer backing the query pool (i.e. what the driver probably does on most GPUs) of equal size to its full set of results. It should probably be readback memory so it's CPU-accessible, because of step 4 below. In theory, in future we could keep two buffers up to date in parallel, one for CPU reads and one for GPU reads, but I doubt that will be a problem.
  2. vkCmdWriteAccelerationStructuresPropertiesKHR writes the data via vkCmdUpdateBuffer into that backing buffer - again, effectively in parallel to what the driver is doing.
  3. In vkCmdCopyQueryPoolResults we do a plain vkCmdCopyBuffer from the backing buffer to the user's results buffer.
  4. In vkGetQueryPoolResults we CPU-read from the backing buffer.

It's a little bit uglier/more work than tracking purely on the CPU side but it keeps the timeline neatly in sync without having to mirror it with submission tracking and CPU-side versioning of data.
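The four steps above can be modelled on the CPU as a sketch - plain arrays stand in for the backing buffer and the user's results buffer, and the real implementation would use vkCmdUpdateBuffer / vkCmdCopyBuffer on the GPU timeline; all names here are illustrative:

```cpp
#include <cstdint>
#include <vector>

// CPU model of the proposed scheme: `backing` shadows the query pool,
// mirroring what the driver does internally.
struct ShadowQueryPool
{
  // Step 1: one slot per query, allocated up front, CPU-readable.
  explicit ShadowQueryPool(uint32_t queryCount) : backing(queryCount, 0) {}

  // Step 2: vkCmdWriteAccelerationStructuresPropertiesKHR also records the
  // value we want to report into the backing buffer (vkCmdUpdateBuffer).
  void WriteProperty(uint32_t query, uint64_t value) { backing[query] = value; }

  // Step 3: vkCmdCopyQueryPoolResults becomes a plain copy from the backing
  // buffer into the user's results buffer (vkCmdCopyBuffer).
  void CopyResults(uint32_t firstQuery, uint32_t queryCount, uint64_t *dst) const
  {
    for(uint32_t i = 0; i < queryCount; ++i)
      dst[i] = backing[firstQuery + i];
  }

  // Step 4: vkGetQueryPoolResults reads straight from the backing buffer.
  uint64_t GetResult(uint32_t query) const { return backing[query]; }

  std::vector<uint64_t> backing;
};
```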

@@ -1699,6 +1705,8 @@ void WrappedVulkan::vkResetQueryPool(VkDevice device, VkQueryPool queryPool, uin
  SERIALISE_TIME_CALL(
      ObjDisp(device)->ResetQueryPool(Unwrap(device), Unwrap(queryPool), firstQuery, queryCount));

  GetRecord(queryPool)->queryPoolInfo->Reset(firstQuery, queryCount);
Owner

This also misses handling of vkCmdResetQueryPool, but with the other comments I think handling reset may not be needed at all.


if(instanceInfo.arrayOfPointers)
  return RDResult(ResultCode::InternalError, "AS instance build arrayOfPointers unsupported");
Owner

FYI, there is a helper RETURN_ERROR_RESULT which can do this and also print the RDCERR for you, without the calling code needing to do that.
