-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vulkan AS rebuild-on-replay: Copy AS input buffers on build #3426
base: v1.x
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general note since I can't put this on the specific line, remember to disable the indirect build physical device feature.
It's fine not to support indirect builds as it seems few drivers do as well and I don't think anyone will use it given the feature is not supported in D3D12, but I think it should be technically possible as it looks like we get bounds on the data size on the CPU? The main challenge will be doing the copy, might have to fall back to a compute shader or something and do it inefficiently.
1d1d022
to
eb504cf
Compare
eb504cf
to
8a9cf5a
Compare
When vkCmdBuildAccelerationStructuresKHR is called, the input buffers used for the build are copied into host-accessible memory, so when initial state serialisation is triggers it can be copied into the capture file. To unify the behaviour of the different geometry types, there is only one readback buffer per AS, and in the case of triangle data it can contain the vertex, index, and transform data concatenated into a single contiguous block. The buffer data along with build arguments are stored in a ref-counted VkAccelerationStructureInfo object held by the AS's VkResourceRecord. Once the info object is destroyed, then the copied buffer mem is also freed. Serialisation of the initial state will follow in later patches. Caveats: * vkCmdBuildAccelerationStructuresIndirectKHR is not handled * VkAccelerationStructureGeometryInstancesDataKHR::arrayOfPointers mode is not handled
8a9cf5a
to
6aa2935
Compare
e5a7ce2
to
029c273
Compare
The spec does not mandate that the compaction size is the same between capture and replay, so we effectively disable it by always returning the full AS size when the compaction size is queried.
029c273
to
f53a115
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input buffer copying looks good. I've made some notes about the query pool handling though - there's one wrinkle I hadn't thought of which makes it more complex, but hopefully some ways it can be simplified so it's not too bad overall.
RDResult CopyInputBuffers(VkCommandBuffer commandBuffer, | ||
const VkAccelerationStructureBuildGeometryInfoKHR &info, | ||
const VkAccelerationStructureBuildRangeInfoKHR *buildRange, | ||
CaptureState state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This state seems redundant here - we don't do this during replay, and we don't need to differentiate between background and active capturing. Doing a quick look through unless I'm missing something this value is (ultimately) not used.
Same applies below to CopyAccelerationStructure
.
|
||
void QueryPoolInfo::Reset(uint32_t firstQuery, uint32_t queryCount) | ||
{ | ||
m_Entries.removeIf([&](const auto &entry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid auto
except where necessary for lambdas or extremely ugly for std iterators, you can use Entry
here.
Also please use explicit lambda capture rather than unconditional capture to be clear about what is stored in the lambda.
} | ||
|
||
void QueryPoolInfo::Replace(uint32_t firstQuery, uint32_t queryCount, | ||
const std::function<void(Entry)> &writeEntry) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for this function to be separated out? It looks like it's only called from the other Replace()
to apply any replacements so could be done inside that function? That should allow some simplification and ease of reading since it took me a couple of passes to understand the code flow with the functions being named the same.
void Replace(uint32_t firstQuery, uint32_t queryCount, | ||
const std::function<void(Entry)> &writeEntry) const; | ||
|
||
rdcarray<Entry> m_Entries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more generally, query pools are single-typed so we know at creation time whether we're either going to replace all results or none of them (when the query pool type is compacted size).
Do you have any empirical data or intuition on how large those query pools are? I would assume either 1, some small fixed count (UE5 seems to use 64, configurable), or maybe the application's expected max number of BLASs (some thousands?).
Unless that seems unreasonable or you're otherwise worried about having a huge number of possible queries with only a few actually being valid, I think this code could be simplified a good deal by a) only allocating this struct when needed for patching and b) pre-allocating all the entries up front. Then you don't have to worry about intersecting, resizing this array on the fly, etc. We just read all our values out of it.
for(uint32_t i = 0; i < accelerationStructureCount; ++i) | ||
sizes.push_back(GetRecord(pAccelerationStructures[i])->memSize); | ||
|
||
qpInfo->Add(firstQuery, std::move(sizes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I've realised seeing this that there is an ordering/timeline constraint that I hadn't considered with this data. Because the properties write to a query and then the queries are resolved separately, these commands can be recorded and submitted out of order so we can't update a single authoritative fake query pool on the CPU timeline and update from there.
There may be a clever solution to this to resolve it on the CPU by deferring query updates and keeping multiple versions around, but I think I'd say we should solve it on the GPU to keep things simpler:
- Create an internal buffer backing the query pool (ie. what the driver probably does on most GPUs) of equal size to its full set of results. It should probably be readback memory to be CPU-accessible because of 4 below. In theory in future we could have two buffers kept up to date in parallel, one for CPU reads and one for GPU reads, but I doubt that will be a problem.
- This
vkCmdWriteAccelerationStructuresPropertiesKHR
writes the data viavkUpdateBuffer
into that backing buffer - again effectively in parallel to what the driver is doing. - In
vkCmdCopyQueryPoolResults
we do a plainvkCmdCopyBuffer
from backing buffer to user's results buffer. - In
vkGetQueryPoolResults
we CPU-read from the backing buffer.
It's a little bit uglier/more work than tracking purely on the CPU side but it keeps the timeline neatly in sync without having to mirror it with submission tracking and CPU-side versioning of data.
@@ -1699,6 +1705,8 @@ void WrappedVulkan::vkResetQueryPool(VkDevice device, VkQueryPool queryPool, uin | |||
SERIALISE_TIME_CALL( | |||
ObjDisp(device)->ResetQueryPool(Unwrap(device), Unwrap(queryPool), firstQuery, queryCount)); | |||
|
|||
GetRecord(queryPool)->queryPoolInfo->Reset(firstQuery, queryCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also misses handling of CmdResetQueryPool
but I think with the other comments handling reset may not be needed at all.
|
||
if(instanceInfo.arrayOfPointers) | ||
return RDResult(ResultCode::InternalError, | ||
"AS instance build arrayOfPointers unsupported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI there is a helper RETURN_ERROR_RESULT
which can do this and also print the RDCERR
for you, without the calling code needing to do that.
When
vkCmdBuildAccelerationStructuresKHR
is called, the input buffers used for the build are copied into host-accessible memory, so when initial state serialisation is triggered it can be copied into the capture file.To unify the behaviour of the different geometry types, there is only one readback buffer per AS, and in the case of triangle data it can contain the vertex, index, and transform data concatenated into a single contiguous block.
The buffer data along with build arguments are stored in a
VkAccelerationStructureInfo
object owned by the AS'sVkResourceRecord
. If this record is freed, then the copied buffer mem is also freed. If this destruction occurs during the active capture phase, the AS info is kept alive until the end of the active capture.Serialisation of the initial state will follow in later patches.
Caveats:
vkCmdBuildAccelerationStructuresIndirectKHR
is not handledVkAccelerationStructureGeometryInstancesDataKHR::arrayOfPointers
mode is not handled