
Add JS bindings for clusterizer API #738

Merged · 4 commits into zeux:master · Aug 21, 2024

Conversation

@JolifantoBambla (Contributor)

Hi,

This adds JS bindings for the clusterizer API (meshopt_buildMeshlets, meshopt_optimizeMeshlet, etc.).

Here's how the JS API works (copied from my changes to js/README.md):

Clusterizer

MeshoptClusterizer (meshopt_clusterizer.js) implements meshlet generation and optimization.

To split a triangle mesh into clusters, this library provides two algorithms - buildMeshletsScan, which creates the meshlet data using a vertex cache-optimized index buffer as a starting point by greedily aggregating consecutive triangles until they go over the meshlet limits, and buildMeshlets, which doesn't depend on any other algorithms and tries to balance topological efficiency (by maximizing vertex reuse inside meshlets) with culling efficiency.

buildMeshlets(indices: Uint32Array, vertex_positions: Float32Array, vertex_positions_stride: number, max_vertices?: number, max_triangles?: number, cone_weight?: number, max_meshlets?: number, index_byte_size?: number) => Meshlets;

buildMeshletsScan(indices: Uint32Array, vertex_count: number, max_vertices?: number, max_triangles?: number, cone_weight?: number, max_meshlets?: number, index_byte_size?: number) => Meshlets;

The number of triangles and number of vertices per meshlet can be limited with both algorithms using the optional max_triangles and max_vertices parameters. If not set, they default to the maximum supported number of vertices (255) and triangles (512).

The buildMeshlets algorithm uses position data stored in a strided array; vertex_positions_stride represents the distance between subsequent positions in Float32 units.

Additionally, if cluster cone culling is to be used, buildMeshlets allows specifying a cone_weight as a value between 0 and 1 to balance culling efficiency with other forms of culling. By default, cone_weight is set to 0.
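For example, a call with explicit limits might look like this (a minimal sketch; indices and positions stand in for the mesh's index and position data, and the 64/124 limits are just common choices, not API defaults):

// positions has 3 floats per vertex; limit meshlets to 64 vertices / 124 triangles
// and use a moderate cone weight for cluster cone culling
const meshlets = buildMeshlets(indices, positions, /* vertex_positions_stride */ 3, /* max_vertices */ 64, /* max_triangles */ 124, /* cone_weight */ 0.5);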

Both algorithms return a Meshlets object, a helper object to further process meshlets. At its core, a Meshlets object is just a wrapper around the typed arrays containing the meshlet data:

console.log(meshlets.meshlets); // prints the Uint32Array containing the meshlet data, i.e., the indices into the vertices and triangles arrays
console.log(meshlets.vertices); // prints the Uint32Array containing the indices into the original mesh's vertices
console.log(meshlets.triangles); // prints the Uint8Array containing the indices into the vertices array

To optimize meshlets for better triangle and vertex locality, optimize can be called directly on a Meshlets instance:

meshlets.optimize() => Meshlets;

After generating the meshlet data, it's also possible to generate extra culling data for each meshlet and populate a bounds array within the Meshlets instance:

meshlets.computeBounds(vertex_positions: Float32Array, vertex_positions_stride: number) => Meshlets;

console.log(meshlets.bounds); // prints the Uint8Array containing the meshlet bounds data

Meshlet generation, optimization, and culling data generation can be chained as well:

const meshlets = buildMeshlets(indices, vertex_positions, vertex_positions_stride)
    .optimize()
    .computeBounds(vertex_positions, vertex_positions_stride);

To work with individual meshlets, Meshlets objects expose an iterator and support the iterable protocol. Each meshlet is an instance of Meshlet, a wrapper around the corresponding subarrays within the owning Meshlets instance:

// print all meshlets
for (const m of meshlets) {
    // m is a Meshlet
    console.log(m.vertices, m.triangles);
}

// copy all meshlets into a meshlet array and print them
console.log([...meshlets]);

// print 2 meshlets starting from index 3
for (const m of meshlets.iterator(3, 5)) {
    console.log(m);
}

In environments that support the experimental Iterator prototype, its methods (forEach, map, reduce, etc.) can be used on the iterator returned by meshlets.iterator() as well.
Using Iterator prototype methods in TypeScript requires casting to a Meshlet array:

(meshlets.iterator(3, 5) as unknown as Meshlet[]).forEach(console.log);

However, be aware that while Meshlets is iterable, it is not an actual array and does not support indexing using the [] operator. Instead, use the get method:

console.log(meshlets.get(0));

Instead of optimizing or computing bounds for all meshlets at once, Meshlet objects also support applying both operations to an individual meshlet. Both operations are chainable:

const meshlet = meshlets.get(0).optimize().computeBounds(vertex_positions, vertex_positions_stride);

After populating a meshlet's bounds, they can be inspected through individual MeshletBounds instances, which are again wrappers around the underlying subarray in the owning Meshlets object:

const bounds = meshlets.get(0).bounds;
console.log(bounds.center, bounds.radius);

Alternatively, MeshoptClusterizer also exposes a low level API for each function:

optimizeMeshlet(meshlet_vertices: Uint32Array, meshlet_triangles: Uint8Array);
computeClusterBounds(indices: Uint32Array, vertex_positions: Float32Array, vertex_positions_stride: number): MeshletBounds
computeMeshletBounds(meshlet_vertices: Uint32Array, meshlet_triangles: Uint8Array, vertex_positions: Float32Array, vertex_positions_stride: number): MeshletBounds
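For instance, bounds can be computed for raw cluster data directly (a sketch; the single-triangle index buffer is made up for illustration, and vertex_positions / vertex_positions_stride are the same inputs as above):

// compute bounds for an arbitrary vertex cluster, here a single triangle
const triangleBounds = computeClusterBounds(new Uint32Array([0, 1, 2]), vertex_positions, vertex_positions_stride);
console.log(triangleBounds.center, triangleBounds.radius);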

@zeux (Owner) commented Aug 18, 2024

Thanks for the PR! I like the idea of a new clusterizer JS module. However I'd like to see an attempt to have a very minimal and functional (as in not object oriented) interface that matches the rest of JS bindings more closely. This would be my initial view based on the README changes:

  • I'd start without buildMeshletsScan. It's possible to add in the future if requested, but buildMeshlets is the default & recommended algorithm
  • I'd start with optimizeMeshlet being implicitly called by buildMeshlets for each meshlet. This should have a reasonable cost, it simplifies the interface, and it's the same approach that Rust bindings currently take. Always possible to revisit in the future but that would also make the buildMeshlets API purely functional.
  • I'd try to get by without any class wrappers. No other part of the bindings uses it; I get that it can be helpful / convenient but at least initially we should expose the minimal possible interface that gets the job done. Maybe the iterator wrapper can be part of the test file if that makes tests easier to write.
  • max_vertices/max_triangles should not be optional; these are critical to reason about wrt the rendering data flow, and 255/512 is not a great default (nor would I suggest a different generic default! these are situational)

So in my ideal world we'd shoot for:

  • buildMeshlets that returns packed meshlet buffers plus an array of Meshlet objects
  • computeMeshletBounds that, given the information from buildMeshlets, returns an array of Bounds objects

... as far as the initial API goes. I'm not sure if buildMeshlets should return the actual meshlet data as a packed Uint32 array or as a JS array of objects - it would be interesting to benchmark both on v8, as this is a place where maybe using JS objects is a reasonable efficiency vs usability compromise. The interface would be clean and coherent if both buildMeshlets and computeMeshletBounds return an array of JS objects.
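One rough way to compare the two layouts on v8 (purely illustrative; the 4-uint32-per-meshlet layout and the field names below are assumptions, not the binding's actual format):

const count = 100000;
const packed = new Uint32Array(count * 4); // assumed: 4 uint32 values per meshlet

console.time('packed reads');
let sum = 0;
for (let i = 0; i < count; ++i) sum += packed[i * 4 + 2];
console.timeEnd('packed reads');
console.log(sum); // keep the loop from being optimized away

console.time('object array');
const objects = new Array(count);
for (let i = 0; i < count; ++i)
    objects[i] = { vertexOffset: packed[i * 4], triangleOffset: packed[i * 4 + 1], vertexCount: packed[i * 4 + 2], triangleCount: packed[i * 4 + 3] };
console.timeEnd('object array');
console.log(objects.length);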

I'd also be fine with exposing computeClusterBounds if it is useful.

@JolifantoBambla (Contributor, Author) commented Aug 18, 2024

Thanks for the quick response!

I'd start without buildMeshletsScan.

Yes, makes sense.

I'd start with optimizeMeshlet being implicitly called by buildMeshlets for each meshlet.

Sounds good. I'll change the behavior.

max_vertices/max_triangles should not be optional;

Totally! I'll remove the default values.

I'd try to get by without any class wrappers.

Yeah, that makes sense. I wasn't really sure if I should map the three buffers (meshlets, vertices, triangles) to an array of JS objects or just return the three buffers. I ended up with this hybrid wrapper object that holds the three packed buffers but behaves like an array of JS objects via its accessors. The benefit here is that the data is in one place and can easily be written to WebGPU buffers to inspect the results (my personal use case) while at the same time it's easy to treat the meshlets as JS objects on the host side. Now that I've looked it up, that's actually the same approach the Rust bindings take.

I definitely agree that the optimize & computeBounds methods really don't make sense with the rest of the API though. Also having the bounds as an optional array in the Meshlets object that needs to be initialized and populated by an extra function call on the object is a bit ugly.

How about I remove those methods & the bounds array but keep the Meshlets & Meshlet wrappers for now?

In the meantime, I'll remove the following functions + tests...

  • buildMeshletsBound (called implicitly by buildMeshlets)
  • buildMeshletsScan (maybe reintroduce in the future)
  • optimizeMeshlet (called implicitly by buildMeshlets)
  • Meshlets.optimize
  • Meshlets.computeBounds
  • Meshlets.bounds
  • Meshlet.optimize
  • Meshlet.computeBounds
  • Meshlet.bounds

...and let computeMeshletBounds return a JS object that is not backed by a Uint8Array storing bounds for all meshlets.

@JolifantoBambla (Contributor, Author)

I incorporated the feedback. The JS API now looks like this (copied from js/README.md):

Clusterizer

MeshoptClusterizer (meshopt_clusterizer.js) implements meshlet generation and optimization.

To split a triangle mesh into clusters, call buildMeshlets, which tries to balance topological efficiency (by maximizing vertex reuse inside meshlets) with culling efficiency.

buildMeshlets(indices: Uint32Array, vertex_positions: Float32Array, vertex_positions_stride: number, max_vertices: number, max_triangles: number, cone_weight?: number, index_byte_size?: number) => Meshlet[];

The algorithm uses position data stored in a strided array; vertex_positions_stride represents the distance between subsequent positions in Float32 units.

The maximum number of triangles and number of vertices per meshlet can be controlled via max_triangles and max_vertices parameters. However, max_vertices must not be greater than 255 and max_triangles must not be greater than 512.

Additionally, if cluster cone culling is to be used, buildMeshlets allows specifying a cone_weight as a value between 0 and 1 to balance culling efficiency with other forms of culling. By default, cone_weight is set to 0.

All meshlets are implicitly optimized for better triangle and vertex locality by buildMeshlets.

The algorithm returns an array of Meshlet objects:

const meshlets = MeshoptClusterizer.buildMeshlets(indices, positions, stride, /* args */);
console.log(meshlets[0].vertices);  // prints the packed Uint32Array of the first meshlet's vertex indices, i.e., indices into the original mesh's vertex buffer
console.log(meshlets[0].triangles); // prints the packed Uint8Array of the first meshlet's indices into its own vertices array

A meshlet's vertices and triangles arrays are backed by an internal MeshletBuffers object storing the raw data of all meshlets in packed buffers:

console.log(meshlets[0].buffers.meshlets);      // prints the raw packed Uint32Array containing the meshlet data, i.e., the indices into the vertices and triangles array
console.log(meshlets[0].buffers.vertices);      // prints the raw packed Uint32Array containing the indices into the original mesh's vertices
console.log(meshlets[0].buffers.triangles);     // prints the raw packed Uint8Array containing the indices into the vertices array
console.log(meshlets[0].buffers.meshletCount);  // prints the number of meshlets - this is not the same as meshlets[0].buffers.meshlets.length because each meshlet consists of 4 unsigned 32-bit integers

// all meshlets are also accessible through the packed buffers
console.log(meshlets[0].buffers.getMeshlet(0).vertices[0] === meshlets[0].vertices[0]) // prints true
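For example, all meshlets can be walked through the shared buffers without slicing out new arrays (a sketch, reusing the meshlets result from the call above):

const buffers = meshlets[0].buffers;
for (let i = 0; i < buffers.meshletCount; ++i) {
    const m = buffers.getMeshlet(i);
    console.log(m.vertices.length, m.triangles.length);
}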

After generating the meshlet data, it's also possible to generate extra culling data for one or more meshlets:

computeMeshletBounds(meshlets: Meshlet | Meshlet[], vertex_positions: Float32Array, vertex_positions_stride: number) => Bounds | Bounds[];

If more than one meshlet is passed to computeMeshletBounds, the algorithm returns an array of Bounds. Otherwise, a single Bounds object is returned.

If bounds are to be computed for more than one meshlet, it might be more efficient to call computeMeshletBounds once with an array of Meshlet objects instead of calling it for each Meshlet individually, since vertex data only has to be copied to the WebAssembly heap once.

const meshlets = MeshoptClusterizer.buildMeshlets(indices, positions, stride, /* args */);
const bounds = MeshoptClusterizer.computeMeshletBounds(meshlets, positions, stride);
console.log(bounds[0].center);          // prints the center of the first meshlet's bounding sphere
console.log(bounds[0].radius);          // prints the radius of the first meshlet's bounding sphere
console.log(bounds[0].coneApex);        // prints the apex of the first meshlet's normal cone
console.log(bounds[0].coneAxis);        // prints the axis of the first meshlet's normal cone
console.log(bounds[0].coneCutoff);      // prints the cutoff angle of the first meshlet's normal cone
console.log(bounds[0].coneAxisS8);      // prints the axis of the first meshlet's normal cone in 8-bit SNORM format
console.log(bounds[0].coneCutoffS8);    // prints the cutoff angle of the first meshlet's normal cone in 8-bit SNORM format
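A minimal sketch of how these fields might drive cluster cone culling on the host side (cameraPosition is a hypothetical camera position, and the cone fields are treated as plain arrays of three floats; the test follows the standard meshoptimizer formula dot(normalize(coneApex - cameraPosition), coneAxis) >= coneCutoff):

function isMeshletBackfacing(b, cameraPosition) {
    // vector from the camera to the cone apex
    const dx = b.coneApex[0] - cameraPosition[0];
    const dy = b.coneApex[1] - cameraPosition[1];
    const dz = b.coneApex[2] - cameraPosition[2];
    const len = Math.sqrt(dx * dx + dy * dy + dz * dz) || 1;
    // the meshlet can be culled when the view direction lies inside the inverted normal cone
    return (dx * b.coneAxis[0] + dy * b.coneAxis[1] + dz * b.coneAxis[2]) / len >= b.coneCutoff;
}

console.log(bounds.filter((b) => !isMeshletBackfacing(b, [0, 0, 5])).length); // number of potentially visible meshlets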

It is also possible to compute bounds of a vertex cluster that is not generated by MeshoptClusterizer using computeClusterBounds. Like buildMeshlets, this algorithm takes vertex indices and a strided vertex positions array with a vertex stride in Float32 units as input.

computeClusterBounds: (indices: Uint32Array, vertex_positions: Float32Array, vertex_positions_stride: number, index_byte_size?: number) => Bounds;
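For instance (a sketch; indices, positions, and stride are the same hypothetical inputs as in the buildMeshlets example above):

// compute bounds for a small external cluster, here the first two triangles of the index buffer
const clusterBounds = MeshoptClusterizer.computeClusterBounds(indices.subarray(0, 6), positions, stride);
console.log(clusterBounds.center, clusterBounds.radius);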

@zeux (Owner) commented Aug 21, 2024

Two more interface comments, hopefully last:

  • I don't think we need index_byte_size arguments? Elsewhere in JS code we just detect the type of the index array dynamically.
  • Would it make sense for buildMeshlets to just return MeshletBuffers? Individual meshlets can be extracted in a loop; if the caller is not interested in segmented data they can upload the buffers directly without the cost of slicing the buffers and creating a bunch of smaller ones.

I need to do a final code review pass as well but this is looking close. I might merge this as is after interface changes above, not sure. Before this, can you also rebase this into separate commits, for example 1) implementation, 2) tests, 3) github actions changes, 4) documentation? This is large enough that I don't want to just squash-merge the whole PR; this will also make it easier for me to do a final code pass in the PR itself.

@JolifantoBambla (Contributor, Author)

  • I don't think we need index_byte_size arguments? Elsewhere in JS code we just detect the type of the index array dynamically.

True, I removed them now. I initially added the index size because of the MeshoptEncoder API.

  • Would it make sense for buildMeshlets to just return MeshletBuffers? Individual meshlets can be extracted in a loop; if the caller is not interested in segmented data they can upload the buffers directly without the cost of slicing the buffers and creating a bunch of smaller ones.

I think so, yeah. Changed it now. However, currently, extractMeshlet doesn't copy the data from the buffers but returns subarrays - so still backed by the same memory. Would you prefer copies instead?
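For reference, a minimal sketch of the buffer-returning flow discussed here (argument names follow the README excerpt above; extractMeshlet returns views into the packed buffers rather than copies):

const buffers = MeshoptClusterizer.buildMeshlets(indices, positions, stride, /* max_vertices */ 64, /* max_triangles */ 124);
for (let i = 0; i < buffers.meshletCount; ++i) {
    const m = MeshoptClusterizer.extractMeshlet(buffers, i); // subarrays backed by buffers, not copies
    console.log(m.vertices.length, m.triangles.length);
}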

I need to do a final code review pass as well but this is looking close. I might merge this as is after interface changes above, not sure. Before this, can you also rebase this into separate commits, for example 1) implementation, 2) tests, 3) github actions changes, 4) documentation? This is large enough that I don't want to just squash-merge the whole PR; this will also make it easier for me to do a final code pass in the PR itself.

Awesome! I gotta run now. I'll see if I get to rebasing the commits later today. Otherwise I'll do it tomorrow.

@JolifantoBambla (Contributor, Author)

  • Would it make sense for buildMeshlets to just return MeshletBuffers? Individual meshlets can be extracted in a loop; if the caller is not interested in segmented data they can upload the buffers directly without the cost of slicing the buffers and creating a bunch of smaller ones.

Should computeMeshletBounds then also return a buffer and MeshoptClusterizer expose an extractBounds function? Or should it stay the way it is?

@zeux (Owner) commented Aug 21, 2024

I would probably keep computeMeshletBounds as is: there should not be an expectation that a direct GPU upload is useful for this data, because different renderers will care about different subsets of the data and will likely want to pack this together with some other metadata.

@JolifantoBambla (Contributor, Author)

Before this, can you also rebase this into separate commits, for example 1) implementation, 2) tests, 3) github actions changes, 4) documentation?

done

@zeux (Owner) commented Aug 21, 2024

Thanks! This looks great. Implementation looks good I think, if I discover minor nits I can fix them post-merge. One change that I'd like to see before I merge this though:

During computeMeshletBounds, there's repeated reallocation / copying that I think is redundant. Because you are working with MeshletBuffers, which stores all data contiguously, you can copy the buffers to the Wasm heap and then just address them individually. It's a little more memory, but most of the memory would be the position data, and it means you don't need any sbrk calls per meshlet, or even extractMeshlet. meshopt_computeMeshletBounds does not do its own allocations, which I think is safe to rely on (so the heap doesn't have to be adjusted).

Also the code that creates JS bounds object could be shared between computeMeshletBounds & computeClusterBounds but I can fix that post-merge as well.

Adds WASM bindings and a new JS API for the clusterizer API.
The JS API consists of
 - buildMeshlets: generates meshlet data and implicitly optimizes the generated meshlets. Returns packed buffers containing raw meshlet data.
 - extractMeshlet: given buffers as returned by buildMeshlets and a meshlet index, returns a meshlet object containing a triangles and vertices array.
 - computeClusterBounds: computes bounds for cluster data not generated by buildMeshlets. Returns an object containing the computed bounding data, except for s8 compressed data.
 - computeMeshletBounds: given buffers as returned by buildMeshlets, computes bounds for all meshlets and returns the computed bounding data, except for s8 compressed data.
Bumps the stack size of WASM modules to 36 KB because computeMeshletBounds and computeClusterBounds require more than the previously allocated 24 KB.

Adds tests for the new JS clusterizer API.
All tests work on a cube with normal data for which 6 clusters - one for each face - are created.
Bounds are validated by comparing each cube face's normal to the computed normal cone's axis.

Adds automated ES5 validation for the new JS clusterizer API.
Adds automated tests for the new JS clusterizer API.

Documents the new JS clusterizer API in the JS readme.
@JolifantoBambla (Contributor, Author)

Thanks! This looks great. Implementation looks good I think, if I discover minor nits I can fix them post-merge.

Awesome! Thanks!

During computeMeshletBounds, there's repeated reallocation / copying that I think is redundant. Because you are working with MeshletBuffers that stores all data contiguously, you can copy the buffers to Wasm heap and then just address them individually.

done

Also the code that creates JS bounds object could be shared between computeMeshletBounds & computeClusterBounds

done

@zeux merged commit 5304a88 into zeux:master on Aug 21, 2024
12 checks passed
@zeux (Owner) commented Aug 21, 2024

Thanks for the contribution and for quick iteration!
