Add new proxying modes foo__proxy: 'abort' and foo__proxy: 'abort_debug' #22648

Open. Wants to merge 5 commits into base: main.

@juj (Collaborator) commented Sep 27, 2024

Add new proxying modes foo__proxy: 'abort' and foo__proxy: 'abort_debug', which add compile-time checks to verify that a given JS function is not called in pthreads or Wasm Workers.

@juj (Collaborator Author) commented Sep 27, 2024

How do I fix the lint errors?

C:\emsdk\emscripten\main>npm run lint --write

...

  1:4384  error  Strings must use singlequote   quotes
  1:4417  error  Operator '=' must be spaced    space-infix-ops
  1:4418  error  Strings must use singlequote   quotes
  1:4438  error  Operator '=' must be spaced    space-infix-ops
  1:4487  error  Operator '=' must be spaced    space-infix-ops
  1:4501  error  Operator '=' must be spaced    space-infix-ops
  1:4502  error  Strings must use singlequote   quotes
  1:4516  error  Operator '=' must be spaced    space-infix-ops
  1:4518  error  Operator '+' must be spaced    space-infix-ops
  1:4519  error  Strings must use singlequote   quotes
  1:4539  error  Operator '&&' must be spaced   space-infix-ops
  1:4555  error  Operator '=' must be spaced    space-infix-ops
  1:4556  error  Strings must use singlequote   quotes
  1:4579  error  Operator '=' must be spaced    space-infix-ops
  1:4597  error  Operator '=' must be spaced    space-infix-ops
  1:4612  error  Operator '=' must be spaced    space-infix-ops
  1:4628  error  Operator '=' must be spaced    space-infix-ops
  1:4629  error  Strings must use singlequote   quotes
  1:4682  error  Operator '-' must be spaced    space-infix-ops
  1:4731  error  Operator '-' must be spaced    space-infix-ops
  1:4762  error  Operator '=' must be spaced    space-infix-ops
  1:4783  error  Operator '=' must be spaced    space-infix-ops
  1:4789  error  Operator '=' must be spaced    space-infix-ops
  1:4795  error  Operator '=' must be spaced    space-infix-ops
  1:4805  error  Operator '=' must be spaced    space-infix-ops
  1:4816  error  Operator '=' must be spaced    space-infix-ops
  1:4835  error  Operator '=' must be spaced    space-infix-ops
  1:4855  error  Operator '=' must be spaced    space-infix-ops
  1:4875  error  Operator '=' must be spaced    space-infix-ops
  1:4895  error  Operator '=' must be spaced    space-infix-ops
  1:4916  error  Operator '=' must be spaced    space-infix-ops
  1:4937  error  Operator '=' must be spaced    space-infix-ops

✖ 26511 problems (26511 errors, 0 warnings)
  26507 errors and 0 warnings potentially fixable with the `--fix` option.


C:\emsdk\emscripten\main>git diff

C:\emsdk\emscripten\main>npm run check

> check
> prettier --check src/*.mjs tools/*.mjs

Checking formatting...
[warn] src/compiler.mjs
[warn] src/jsifier.mjs
[warn] src/modules.mjs
[warn] src/parseTools_legacy.mjs
[warn] src/parseTools.mjs
[warn] src/utility.mjs
[warn] tools/acorn-optimizer.mjs
[warn] tools/lz4-compress.mjs
[warn] tools/preprocessor.mjs
[warn] tools/unsafe_optimizations.mjs
[warn] Code style issues found in 10 files. Run Prettier with --write to fix.

C:\emsdk\emscripten\main>npm run check --write

> check
> prettier --check src/*.mjs tools/*.mjs

Checking formatting...
[warn] src/compiler.mjs
[warn] src/jsifier.mjs
[warn] src/modules.mjs
[warn] src/parseTools_legacy.mjs
[warn] src/parseTools.mjs
[warn] src/utility.mjs
[warn] tools/acorn-optimizer.mjs
[warn] tools/lz4-compress.mjs
[warn] tools/preprocessor.mjs
[warn] tools/unsafe_optimizations.mjs
[warn] Code style issues found in 10 files. Run Prettier with --write to fix.

C:\emsdk\emscripten\main>npm run check --write

C:\emsdk\emscripten\main>git diff

C:\emsdk\emscripten\main>prettier --check src/*.mjs tools/*.mjs
'prettier' is not recognized as an internal or external command,
operable program or batch file.

C:\emsdk\emscripten\main>npm prettier --check src/*.mjs tools/*.mjs
Unknown command: "prettier"

To see a list of supported npm commands, run:
  npm help

@sbc100 (Collaborator) left a comment

Seems reasonable if there is a use case for it. Are there existing system library functions that you had in mind when designing this attribute?

test/pthread/proxy_abort.c (outdated, resolved)
src/jsifier.mjs (outdated)
const insideWorker = (PTHREADS && WASM_WORKERS)
? '(ENVIRONMENT_IS_PTHREAD || ENVIRONMENT_IS_WASM_WORKER)'
: (PTHREADS ? 'ENVIRONMENT_IS_PTHREAD' : 'ENVIRONMENT_IS_WASM_WORKER');
prefix = `assert(!${insideWorker}, "Attempted to call function '${mangled}' inside a pthread/Wasm Worker, but this function has been declared to only be callable from the main browser thread");`;
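
The excerpt above picks a worker-detection condition based on which threading features are enabled, and prepends an assertion to the function body. A minimal standalone sketch of that logic follows, assuming the PTHREADS and WASM_WORKERS settings are passed in as plain booleans; the helper name makeMainThreadOnlyPrefix is hypothetical and is not part of jsifier.mjs.

```javascript
// Hypothetical standalone version of the jsifier.mjs excerpt above: build the
// assertion that gets prepended to a function declared main-thread-only.
// Assumes at least one of PTHREADS / WASM_WORKERS is enabled; in a
// single-threaded build there is no worker to guard against.
function makeMainThreadOnlyPrefix(mangled, { PTHREADS, WASM_WORKERS }) {
  // Choose the runtime condition that is true inside a worker, depending on
  // which threading features the build enables.
  const insideWorker = (PTHREADS && WASM_WORKERS)
    ? '(ENVIRONMENT_IS_PTHREAD || ENVIRONMENT_IS_WASM_WORKER)'
    : (PTHREADS ? 'ENVIRONMENT_IS_PTHREAD' : 'ENVIRONMENT_IS_WASM_WORKER');
  return `assert(!${insideWorker}, "Attempted to call function '${mangled}' ` +
    'inside a pthread/Wasm Worker, but this function has been declared to ' +
    'only be callable from the main browser thread");';
}

// Usage: the generated check for a build with pthreads but no Wasm Workers.
console.log(makeMainThreadOnlyPrefix('_foo', { PTHREADS: true, WASM_WORKERS: false }));
```

Emitting the check as generated source (rather than a runtime flag lookup) means the condition only mentions the ENVIRONMENT_IS_* variables that exist in the given build configuration.
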

Collaborator

How about "but this function can only be called from the main thread"?

Using the term "main browser thread" is a little misleading, because an Emscripten application can be run in a worker, where no main browser thread is involved. We sometimes call this the "main application thread", but I think just "main thread" is clear enough here.

@juj (Collaborator Author), Sep 30, 2024

The proxying modes __proxy: 'sync' and __proxy: 'async' definitely mean to run the JS code in the main browser thread, and not in the thread that executes 'main'.

I.e. in PROXY_TO_PTHREAD builds, __proxy: 'sync' and __proxy: 'async' run the proxied JS code on the main browser thread, which is not the same thread as the main application thread (the main runtime thread).

The earlier PROXY_TO_WORKER build mode (which I presume is what you mean by "can be run on workers where there is no main browser thread involved") predates pthreads/SharedArrayBuffer, and does not interact with SAB or the proxying architecture. So these proxying modes are not relevant there when pthreads are not used.

So I think saying "main browser thread" here is the correct thing to do?

Collaborator Author

How about "but this function can only be called from the main thread"?

Changed the message to this form:

"Attempted to call function '${mangled}' inside a pthread/Wasm Worker, but by using the proxying directive "'${proxyingMode}'", this function has been declared to only be callable from the main browser thread"

I would like to keep the mention of the proxying directive that prevents the call. Otherwise a developer might be left wondering, "how does it know this function is not callable from a worker?"

Collaborator

The proxying modes __proxy: 'sync' and __proxy: 'async' definitely mean to run the JS code in the main browser thread, and not in the thread that executes 'main'.

The thread on which the Emscripten-generated code is first run is what I call the main application thread, and it might be running in a worker, in which case there is no main browser thread in the picture at all. When you write MAIN_THREAD_EM_ASM or __proxy: 'sync', it is the main application thread that runs your code. It just happens that this is normally also the main browser thread.

Collaborator Author

Hmm, has there been some kind of change in configuration that I am not aware of?

My expectation:

Building with -sPROXY_TO_WORKER: main() is run in a Worker thread. The __proxy architecture is not available there.
Building with -pthread: main() is run on the main browser thread, so main browser thread == main application thread, and __proxy targets the main browser thread.
Building with -pthread -sPROXY_TO_PTHREAD: main() is run in a separate Worker thread, i.e. the main application thread is not the main browser thread. __proxy: 'sync' and MAIN_THREAD_EM_ASM() proxy to the main browser thread, not to the main application thread.

In each case, proxying does target the main browser thread. Has this changed somehow?
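
To make the three cases above easier to compare, they can be written down as a small lookup table. This is only a sketch of the expectation stated in this comment; it does not cover the configuration discussed in the replies (a pthread-enabled build loaded inside a user-created Worker), and the table and function names are hypothetical, not part of Emscripten.

```javascript
// Hypothetical summary of the three configurations above, as the commenter
// expects them to behave: which thread runs main(), and where a
// __proxy: 'sync' call is expected to land (null: __proxy is not available).
const EXPECTED_THREADING = {
  '-sPROXY_TO_WORKER': { mainRunsOn: 'Worker', proxyTarget: null },
  '-pthread': { mainRunsOn: 'main browser thread', proxyTarget: 'main browser thread' },
  '-pthread -sPROXY_TO_PTHREAD': { mainRunsOn: 'pthread', proxyTarget: 'main browser thread' },
};

// True if, under this expectation, proxied calls land on the same thread that
// runs main(), i.e. the main browser thread is also the main application thread.
function proxyTargetRunsMain(flags) {
  const entry = EXPECTED_THREADING[flags];
  if (entry === undefined) throw new Error(`unknown configuration: ${flags}`);
  return entry.proxyTarget !== null && entry.proxyTarget === entry.mainRunsOn;
}

for (const flags of Object.keys(EXPECTED_THREADING)) {
  console.log(`${flags}: proxy target runs main() = ${proxyTargetRunsMain(flags)}`);
}
```
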

Collaborator Author

We do have several users that do this.

Is this a scenario where the user manually spawns a new Worker(), and inside that Worker, they load up the Emscripten-compiled pthread-enabled program?

Such use case was never supported originally, because of the problem that in such a manual "I loaded the app into a Worker" mode, Emscripten would still need to be able to load some stuff (a lot of stuff) into the JS context of the main application thread, in order for any kind of general proxying to work. And Emscripten runtime JS libraries were not developed with the impression that the target thread that proxying occurs to, would not be the main browser thread.

For example, if I search for __proxy: 'sync' in /src/, there are functions that assume they always run on the main browser thread:

emscripten_set_window_title(): document.title only exists in main browser thread
emscripten_get_screen_size(): screen.width / screen.height exist only in main browser thread
emscripten_hide_mouse(): document.styleSheets only in main browser thread
emscripten_set_canvas_size(): Browser.setCanvasSize() works only in main browser thread
emscripten_get_canvas_size(): Module['canvas'] only works in main browser thread
emscripten_create_worker: requires Subworkers support in order to work (https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers#spawning_subworkers)

library_html5.js: everything here that has __proxy: 'sync' expects that proxy target lands to the main browser thread
library_egl.js, library_glut.js: everything with __proxy: 'sync' expects that proxy target is the main browser thread
library_openal.js, library_html5_webgl.js, library_sdl.js: likewise
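
Each item in the list above depends on globals that only exist on the main browser thread; a Worker's global scope has neither document nor screen. A rough way to see why such functions break when the proxy target is a Worker is to check for those globals in a given scope. The helper below is a hypothetical sketch: it takes the scope as a parameter so that it can be exercised with stand-in objects, and it only checks document and screen, two of the globals named in the list.

```javascript
// Hypothetical check: which of the main-browser-thread-only globals used by
// the functions listed above (document.title, document.styleSheets,
// screen.width / screen.height) are missing from the given global scope?
function missingMainThreadGlobals(scope) {
  const required = ['document', 'screen'];
  return required.filter((name) => typeof scope[name] === 'undefined');
}

// Stand-in scopes: a Worker-like scope lacks both globals, while a
// window-like scope has both.
const workerLikeScope = {};
const windowLikeScope = { document: { title: '' }, screen: { width: 1920, height: 1080 } };
console.log(missingMainThreadGlobals(workerLikeScope)); // [ 'document', 'screen' ]
console.log(missingMainThreadGlobals(windowLikeScope)); // []
```
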

That is why I added the whole PROXY_TO_PTHREAD mode: so that users would not need to figure out how to manually launch Emscripten content main() in a Worker, and could instead reuse the system-provided mechanism for that.

Although now that I think about it, IIRC subworkers are now widely supported, so that is no longer a problem (in 2015, Chrome did not support subworkers).

So maybe the users who do launch their pthread builds in a Worker are careful not to use any of the above APIs, and instead just do generic computation there? I.e. those people are not using Canvases, library_browser.js, library_html5.js, or any other DOM-accessing facilities that Emscripten provides?

Collaborator Author

(I believe it was actually you who added those two functions back in 174a082)

Yeah, I added those, but that is not for supporting the "user manually launches a pthread-enabled build inside their manually created new Worker()".

Collaborator

Is this a scenario where the user manually spawns a new Worker(), and inside that Worker, they load up the Emscripten-compiled pthread-enabled program?

Such use case was never supported originally, because of the problem that in such a manual "I loaded the app into a Worker" mode, Emscripten would still need to be able to load some stuff (a lot of stuff) into the JS context of the main application thread, in order for any kind of general proxying to work. And Emscripten runtime JS libraries were not developed with the impression that the target thread that proxying occurs to, would not be the main browser thread.

Yes, exactly. It turns out that this configuration is supported, at least for some programs. Obviously such programs cannot use the DOM or anything like that, but I believe we do at least some testing of this configuration.

Collaborator

Yeah, I added those, but that is not for supporting the "user manually launches a pthread-enabled build inside their manually created new Worker()".

Can you explain what you intended the difference between those two functions to be, then? It seems that unless we support that configuration, the main runtime thread and the main browser thread would always be the same? Or am I missing something?

@juj (Collaborator Author), Sep 30, 2024

Originally, those functions were intended as a generic set that enables common handling of the single-threaded, -pthread, PROXY_TO_PTHREAD, and PROXY_TO_WORKER build modes, so that users could make multiple build variants from the same codebase, and we could write system library code that is aware of each of those modes.

For example, in a PROXY_TO_PTHREAD build, emscripten_is_main_runtime_thread() == emscripten_is_main_browser_thread() would be expected to hold. But if one made a PROXY_TO_WORKER build of the same code, those functions would return different results. That way, a single codebase could be ready to switch to any of those build modes.

But I did not originally intend for pthread-enabled builds to ever be loaded inside a Worker(), since the __proxy: 'sync' semantics would then proxy to the hosting Worker instead of to the main browser thread, and we have never had any facilities for proxying to the main browser thread in such a mode.

Although now that I think about it, yes, it is probably the case that if a user avoids all the library_x.js files I listed above (the ones that suffer from this problem), things will be pretty safe for them, even if they manually load pthread-enabled builds in their own Workers. And one can build "reasonable" use cases without relying on those library_x.js files.

@juj (Collaborator Author) commented Sep 30, 2024

Seems reasonable if there is a use case for it. Are there existing system library functions that you had in mind when designing this attribute?

I scanned library_browser.js and library_wasm_worker.js to see if there might be functions like this, but none stood out.

The motivation for adding this is that at Unity I am seeing more and more feature developers add their own JS functions, and they generally copy-paste the __proxy: 'sync' directives when they are unsure what they mean.

Many of these JS library functions should never use proxying; we only ever call them on the main browser thread, and if any pthread did call one of them, there would be trouble. That is why I wanted the proxying mechanism to check this for me: by annotating each function that must be strictly main-thread-only, I can run the test suites to confirm that this is indeed the case, and that no surprising, unwanted proxying is taking effect.
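
Before annotating functions this way, it can help to see which proxying directive every function in a library currently carries, so that copy-pasted 'sync' entries stand out. The helper below is a hypothetical sketch, assuming the library is available as a plain JS object that uses the usual name__proxy key convention; the example entries are invented for illustration.

```javascript
// Hypothetical audit helper: group the functions of a JS library object by
// their name__proxy directive. Only function-valued entries are considered,
// so metadata such as name__deps (arrays) or name__sig (strings) is skipped.
function groupByProxyMode(library) {
  const groups = {};
  for (const [name, value] of Object.entries(library)) {
    if (typeof value !== 'function') continue;
    const mode = library[`${name}__proxy`] ?? 'unspecified';
    if (!groups[mode]) groups[mode] = [];
    groups[mode].push(name);
  }
  return groups;
}

// Invented example entries, written in the style of an Emscripten JS library.
const exampleLibrary = {
  app_set_title__proxy: 'sync',
  app_set_title: (titlePtr) => {},
  app_log__proxy: 'none',
  app_log: (msgPtr) => {},
  app_focus_canvas__deps: ['app_log'],
  app_focus_canvas: () => {},
};
console.log(groupByProxyMode(exampleLibrary));
```

Functions listed under 'sync' are the candidates to review: if they are only ever meant to run on the main browser thread, an 'abort'-style directive would document that intent instead of silently proxying.
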

@juj (Collaborator Author) commented Sep 30, 2024

It looks like there is a wholesale __proxy: 'sync' directive applied to the filesystem functions, so I changed https://github.com/emscripten-core/emscripten/pull/22648/files#diff-5de5cdf403acefc1dc87b4446a186049adc104ceca678d993f3d5a007bcafcc5R273 to make the tests pass. fd_write is also called in Wasm Workers.

@@ -270,6 +270,7 @@ var WasiLibrary = {
#else
fd_write__deps: ['$printChar'],
#endif
fd_write__proxy: 'none', // Each Worker can do stdout/stderr printing on its own, without needing to proxy to the main thread.

Collaborator

I'm pretty sure doWritev won't work if you call it on a thread. And I don't think it holds for printChar either, since you would end up with a different printCharBuffers on each thread, which is not correct.

@juj (Collaborator Author), Sep 30, 2024

you would end up with a different printCharBuffers on each thread.. which is not correct.

I looked into this about a decade ago, and back then I could not find any defined semantics for how multiple threads writing to stdout/stderr should synchronize when the user does not call fflush. In most implementations, including ours, printing \n is an implicit fflush.

If multiple threads race to print to stdout/stderr, it seems to me that better semantics would be for each thread to build complete \n-terminated lines and then race to print those, rather than racing on individual character writes, which could interleave data from multiple threads? I could not find anything saying such behavior would be non-compliant.

Each Worker has its own copy of MEMFS/library_tty.js, so wouldn't each of them end up calling default_tty_ops.put_char() and issuing console.log() calls independently?

Wouldn't having each thread block synchronously to perform stdout prints be a significant performance overhead?
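
The line-buffering semantics argued for above can be sketched briefly: each writer keeps its own buffer and emits only complete lines, so concurrent writers interleave at line granularity rather than character by character. This is a simplified, hypothetical model (bytes in, strings out, no real threads), not Emscripten's printChar or library_tty.js code, and it only illustrates the stdout/stderr case; as the replies point out, fd_write serves all file writes.

```javascript
// Hypothetical per-thread line buffer: collects output bytes and emits a
// line only when a newline (byte 10) arrives, modelling '\n' as an implicit
// flush. Each thread would own one instance, so concurrent writers can only
// interleave whole lines, never individual characters.
class LineBuffer {
  constructor(emitLine) {
    this.emitLine = emitLine; // receives one complete line, without the '\n'
    this.bytes = [];
  }

  putChar(byte) {
    if (byte === 10) {
      // Decode the buffered bytes as UTF-8 and hand off the finished line.
      this.emitLine(new TextDecoder().decode(Uint8Array.from(this.bytes)));
      this.bytes = [];
    } else {
      this.bytes.push(byte);
    }
  }
}

// Usage: two "threads" write one byte at a time, alternating.
const output = [];
const threadA = new LineBuffer((line) => output.push(line));
const threadB = new LineBuffer((line) => output.push(line));
const bytesA = new TextEncoder().encode('hello from A\n');
const bytesB = new TextEncoder().encode('hello from B\n');
for (let i = 0; i < Math.max(bytesA.length, bytesB.length); i++) {
  if (i < bytesA.length) threadA.putChar(bytesA[i]);
  if (i < bytesB.length) threadB.putChar(bytesB[i]);
}
console.log(output); // [ 'hello from A', 'hello from B' ]
```

With a single shared character buffer instead, the same alternating writes would produce one garbled line mixing both messages.
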

Collaborator

But the function is not only used for stdout/stderr; it's used for all file writes.

If we do make this change, it would be a very large one (basically making the filesystem multi-threaded), and I would at least expect a separate PR for it.

Collaborator Author

Err, you're right, of course. I was somehow interpreting this as if the function were already used only for writing to stdout/stderr. That would not work. I'll think of something here.


2 participants