Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove all the uniforms #346

Merged
merged 6 commits into from
Jan 23, 2020
Merged

Remove all the uniforms #346

merged 6 commits into from
Jan 23, 2020

Conversation

zakorgy
Copy link

@zakorgy zakorgy commented Jan 12, 2020

This adresses #345
WIP because these changes are not tested on macOS yet.


This change is Reviewable

@zakorgy zakorgy requested a review from kvark January 12, 2020 19:15
@@ -152,13 +151,12 @@ impl Locals {
impl Hash for Locals {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to cache it any more: just upload the new data for every target.

@@ -152,13 +151,12 @@ impl Locals {
impl Hash for Locals {
fn hash<H: Hasher>(&self, state: &mut H) {
self.transform_as_u32_slice().hash(state);
self.uMode.hash(state);
}
}

impl PartialEq for Locals {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be needed either

@@ -1373,9 +1339,7 @@ impl<B: hal::Backend> Device<B> {
if self.bound_locals.uTransform != projection {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this caching of the transform needs to go: instead we'll just need to update the data in a constant buffer that is bound per render target

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the caching in 215e71b

self.bind_uniforms(new_locals);
}
self.bound_locals = new_locals;
info!("Gfx backend has uMode uniform baked into the pipeline");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be too much spam, wouldn't it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to upstream a change where the uMode is passed to program binding, which will help this code

..(SPECIALIZATION_CONSTANT_SIZE * (i + 1)) as _,
.map(|(i, mode)| {
specialization_data.push(vec![0; (SPECIALIZATION_CONSTANT_COUNT - 1) * SPECIALIZATION_CONSTANT_SIZE]);
let mut constants: ArrayVec<[SpecializationConstant; SPECIALIZATION_CONSTANT_COUNT]> = SPECIALIZATION_FEATURES
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can omit the type of a element here, i.e . ArrayVec<[_; SPECIALIZATION_CONSTANT_COUNT]>

@zakorgy zakorgy requested a review from kvark January 16, 2020 15:11
@zakorgy zakorgy changed the title WIP Remove all the uniforms Remove all the uniforms Jan 16, 2020
Copy link

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to do another full review pass after the current concerns are addressed

\t\t// an operation mode for this batch.\n\
\t\tint uMode;\n\
\t} pushConstants;\n",
);
new_data.push_str(&format!(
"\tlayout(set = {}, binding = 0) uniform Locals {{\n\
\t\tuniform mat4 uTransform; // Orthographic projection\n\
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we still having the transform here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a misunderstanding here, looks like I didn't payed attention what you mentioned in #345

WebRender has it associated with the target, so we can safely move it into a constant buffer within the per-target descriptor set.

Unfortunately we don't have descriptor sets on a per target basis, we have the following groups: global (for each rendering loop), per WR pass and per draw. (The per-target variant would fit somewhere between second and third one.) That's the reason why I kept the uniform block around. If you think that getting rid of the uniform block and introducing a new descriptor binding with a new set index which points to a buffer could benefit us, I can make those changes as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can safely just adopt the "per pass" descriptor set and increase its frequency just a bit to bind whenever the projection happens. This is still relatively rare, i.e. dozens per frame, not thousands.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's sounds totally reasonable. Also I will add a more descriptive name to it.

buffer_usage: hal::buffer::Usage,
data_stride: usize,
buffers: ArrayVec<[Buffer<B>; MAX_FRAME_COUNT]>,
offsets: ArrayVec<[usize; MAX_FRAME_COUNT]>,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we combine these to into (Buffer<B>, BufferAddress) pairs?

self.offset = 0;
pub(super) fn buffer(&self, next_id: usize) -> (&B::Buffer, u64) {
let ref buffer = self.buffers[next_id];
(&buffer.buffer, ((self.offsets[next_id] - 1) * buffer.stride) as u64)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we subtracting 1 here?

bound: bool,
}

struct DecsriptorBundle<B: hal::Backend> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we are having this descriptor re-usal infrastructure in this PR.
As we discussed on the call, this is supposed to be a single descriptor with dynamic offset (for the uniform buffer that carries out the projection matrix), which is just re-used constantly for many passes.

@kvark
Copy link

kvark commented Jan 16, 2020 via email

@zakorgy zakorgy force-pushed the remove-uniforms branch 3 times, most recently from 9d539c4 to 58e78c1 Compare January 19, 2020 13:43
Copy link

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need another pass at this :)

\t\tuniform mat4 uTransform; // Orthographic projection\n\
\t\t// A generic uniform that shaders can optionally use to configure\n\
\t\t// an operation mode for this batch.\n\
\t\tuniform int uMode;\n\
\t}};\n",
DESCRIPTOR_SET_LOCALS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this parameter for?

Copy link
Author

@zakorgy zakorgy Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -652,8 +665,8 @@ impl<B: hal::Backend> Program<B> {
.into_iter()
.chain(iter::once(desc_set_per_frame))
.chain(iter::once(desc_set_per_draw))
.chain(desc_set_locals),
&[],
.chain(iter::once(desc_set_locals)),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't right. The descriptor set carrying projection matrix as well as other per-target parameters should have lower index than the per-draw one, since it changes at less frequency

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the ordering in d8716f7

@@ -652,8 +665,8 @@ impl<B: hal::Backend> Program<B> {
.into_iter()
.chain(iter::once(desc_set_per_frame))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we calling this for every draw call? we definitely shouldn't :)

non_coherent_atom_size_mask: u64,
) {
let _ = self.buffer.update(device, data, self.buffer_offset as usize, non_coherent_atom_size_mask);
self.buffer_offset += 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we

// Maximum number of bound projections per frame.
// Projections are bound on a per render target basis, so those should fit in the 96 limit.
// We need this limit to define a fixed size for our uniform buffers.
pub const PROJECTION_PER_FRAME: usize = 96;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we checking if we hit this limit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1309,6 +1328,7 @@ impl<B: hal::Backend> Device<B> {
&[self.bound_projection],
self.next_id
);
self.rebind_descriptors = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need set_uniforms function? We know when we start working on the new target, so we can just unconditionally update the projection uniforms and rebind the descriptor, instead of doing it lazily here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't , I will change according to that.

@@ -1331,6 +1351,7 @@ impl<B: hal::Backend> Device<B> {
bound_sampler[1] = per_draw_bindings.1[1];
bound_sampler[2] = per_draw_bindings.1[2];

self.rebind_descriptors = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this assignment here?

&[dynamic_offset],
);
if rebind_descriptors {
cmd_buffer.bind_graphics_descriptor_sets(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we are still deferring to much work till this function. We need to aim for it to be just pass.draw() basically, reducing as much cruft as possible.

For descriptor binding, can't we just do it higher level? I.e. the start of a frame binds its descriptor set and keeps it forever. The start of a target binds its descriptor sets and not touches it. When we break a batch, we can bind the per-draw descriptor set (even unconditionally).

P.S. also, aren't index buffers always the same? we should be just binding it once for the whole frame, ideally, instead of rebinding here

Copy link
Author

@zakorgy zakorgy Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For descriptor binding, can't we just do it higher level? I.e. the start of a frame binds its descriptor set and keeps it forever. The start of a target binds its descriptor sets and not touches it. When we break a batch, we can bind the per-draw descriptor set (even unconditionally).

I think this would be only applicable if we only had one pipeline layout, but currently we have three. I tried to implement it, but things broke when I try to bind the per frame graphics descriptor sets (for descriptor slot 0), because those are different for each layout, and if we bind three different descriptor sets to the free different layouts first slot only the third one remains active and I get an error for the other two kind of pipeline layouts:

UNASSIGNED-CoreValidation-DrawState-DescriptorSetNotBound(ERROR / SPEC): msgNum: 0 - VkPipeline 0x27d uses set #0 but that set is not bound.
    Objects: 1
        [0] 0x22486f180c8, type: 6, name: NULL

On the other hand having only one pipeline layout would allow us to make this change also reduces the overhead on descriptor set lookup. The drawback of this would be that we would have to touch the shaders again to move those bindings in a shared shader part. Also I don't know the overhead having those in the shaders, even though they are unused.
I can make this change but I think that would be out of the scope of this PR. I will add this to the list in #335.

P.S. also, aren't index buffers always the same? we should be just binding it once for the whole frame, ideally, instead of rebinding here

Those are only related to debug shaders, so those don't have impact on the test benchmarking results. I will create an issue for this to not forget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed clarification!
When exactly do we have a switch of the pipeline layouts? I recall, for example, that you have a pipeline layout for the clip shader kind. I'd expect them to be together in the alpha pass. Same for the brush ones.
If we do switch the pipeline layout from time to time, we only need to rebind the descriptor sets starting from the first incompatible descriptor set layout. I.e. you can have the same descriptor set layout for slot [0] of all the used pipeline layouts and never need to re-bind it on a pipeline change.
So it still seems hazardly wrong to try rebinding all the sets on each batch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the incompatible descriptor sets are int the first two slots. That's why I mentioned to change the shaders to have a common pipeline layout.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'd only change them on the pipeline layout change, which is still not expected to happen often.

@zakorgy
Copy link
Author

zakorgy commented Jan 22, 2020

@kvark I've removed the descriptor binding related changes (I will make that in a different PR, updated #335 for that), added the change to set the uniforms on a per render target basis. I will squash the commits after the review.

@zakorgy zakorgy requested a review from kvark January 22, 2020 09:23
@@ -1300,16 +1300,8 @@ impl<B: hal::Backend> Device<B> {
pub fn set_uniforms(
&mut self,
_program_id: &ProgramId,
projection: &default::Transform3D<f32>,
_projection: &default::Transform3D<f32>,
) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could do some assert_eq!(mCurrentProjection, projection) just to be sure

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvark Added it in 776540a

Copy link

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thanks!
So we are leaving the binding refactor as a follow-up, right?

@zakorgy
Copy link
Author

zakorgy commented Jan 22, 2020

That's right.

@zakorgy zakorgy merged commit 633396f into szeged:master Jan 23, 2020
@kvark kvark mentioned this pull request Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants