Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] WASM support #272

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

CGamesPlay
Copy link

@CGamesPlay CGamesPlay commented Jul 29, 2024

Hello, I've been experimenting with what it would take to add wasm support to py-tree-sitter and wanted to get some feedback. Is there interest in merging this feature with py-tree-sitter?

I've made the top-level API similar to what a user of wasmtime would expect it to look like, basically you just pass the wasm module bytes and the engine into a separate constructor for Language (Language.from_wasm). After that, the usage is identical to normal Language objects. I've modified Parser to automatically manage the wasm store object, since it doesn't have any configurable parameters.

My goal with this change is to keep wasmtime as an optional dependency. Wasmtime's binary wheels are about 5 MiB, whereas tree-sitter's are only 0.5 MiB, so for that reason alone it seems like this niche feature should be optional rather than mandatory. Supporting optional dependencies at compile-time is not supported, so in order to add support I have to create trampoline functions which get populated at runtime by pulling them out of wasmtime's ffi module (wasmtime.c handles this).

There isn't a good way to land this change until wasmtime v24 lands, because it isn't possible to clone a reference to an Engine before I added a method to do so. Without that ability, there can only be a single language per engine, and only a single parser per language.

So, if there is interest in getting this into the main distribution, I can help close out the remaining issues with this PR. If there isn't, I can fork the project and make a separate variant which supports the functionality. Thanks for your consideration!

In order to land this PR:

  • Wait for wasmtime v24 to release (on 2024-08-20)
  • Upgrade tree-sitter to wasmtime v24 (presently blocked, see build: bump deps tree-sitter#3428)
  • Update documentation to reflect the added methods (Language.from_wasm, Language.is_wasm)
  • Update CI to install wasmtime headers, or vendor them (note: only the headers are required, there is no linking modification)

@ObserverOfTime ObserverOfTime marked this pull request as draft July 29, 2024 12:36
@ObserverOfTime
Copy link
Member

I'll consider this one after completing the current TODOs.

To anyone interested in this PR, please react to it with 👍.

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
Add wasm support
@CGamesPlay
Copy link
Author

I've updated the PR to target wasmtime 25, so it is theoretically mergable now. Just for some process documentation, here's how I get the included tests to pass:

  1. Run pip install .[wasm] to get all the normal dependencies.
  2. Clone wasmtime at the same branch that tree_sitter/core/lib/Cargo.toml wants to use.
  3. Build the C API, using cargo build -p wasmtime-c-api, then install the headers to an appropriate place. For me, that's rsync -av target/debug/build/wasmtime-c-api-impl-*/out/include/ ../py-tree-sitter/.direnv/python-3.11/include/
  4. Build the native extension and test: python setup.py build_ext --inplace --force && python -m unittest tests.test_wasm

This isn't great. An even bigger problem that I'm seeing is that wasmtime is a fast-moving target. In the 2 versions released since I initially put up this PR, the API changed enough that the trampoline had to be updated. This means that there will be a potentially high maintenance overhead for this optional dependency if a fully automated solution can't be developed.

@ObserverOfTime
Copy link
Member

Yeah this seems like more trouble than it's worth. Do we really need all the trampoline functions?

@CGamesPlay
Copy link
Author

CGamesPlay commented Oct 23, 2024

So, one of the goals I had was to use wasmtime as an optional dependency, rather than a mandatory one. If we link the native code of py-tree-sitter against wasmtime the normal way, then it isn't going to be an optional dependency because loading the py-tree-sitter native library would fail with missing symbols, even if the user never intends to use the wasm features. So, we cannot link against wasmtime when building the native library, and instead have to use dlsym to populate a vtable with all of the symbols we need. This is what these trampolines are. The problem is that we need to keep the declarations of the trampoline functions in sync with the header files for wasmtime (and also need to ensure that the library loaded at runtime is ABI compatible with the header files we used).

So, in order to properly resolve this, I think we need a fully automatic way of specifying these trampolines. This is easy to do if the library supports it, which wasmtime does not (currently). For example, sqlite extensions work exactly like this, where the sqlite library will populate the trampolines for the extension, and sqlite guarantees ABI compatibility. A second problem: it's clear that wasmtime doesn't care about ABI compatibility between major versions (they changed the width of a pointer in an existing method). I don't know their stance on ABI compatibility within a major version.

What I'm realizing now: the only way that dynamic linking against wasmtime can work is for a version with ABI compatibility. This means that even if py-tree-sitter uses wasmtime as an optional dependency, the version of that dependency is going to be tied to the specific version of wasmtime that py-tree-sitter was built against. This is annoying but manageable. However, a potentially simpler solution is to include a complete copy of wasmtime statically linked into the py-tree-sitter library. This would change my proposed API a bit, but more over, means that it cannot be an optional dependency. It would need to be a separate native library and would weigh ~25 MiB because it completely includes wasmtime. It would not have a runtime dependency on any other wasmtime, though.

Given all of this, would it be interesting to include a second package in this repository, say, py-tree-sitter-wasm, which could be installed separately and included the wasm features? It could be implemented using #ifdef to support two compilation targets from the same source code.

Otherwise, it would be a fork of this repository, basically. Or am I missing a better solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants