Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronous locale fetching? #87

Open
cormacrelf opened this issue Dec 1, 2020 · 7 comments
Open

Synchronous locale fetching? #87

cormacrelf opened this issue Dec 1, 2020 · 7 comments

Comments

@cormacrelf
Copy link
Collaborator

cormacrelf commented Dec 1, 2020

CC @adomasven

One thing I couldn't get going in the Zotero client code at the moment is the ability to fetch all the locales needed. The old locale fetching based on individual references (!) was wrong and please ignore it, but the new way is that:

  1. Driver.new({ style: '<style default-locale="en-GB" ... >' }) wants to fetch ['en-US', 'en-GB']
  2. Driver.new({ style: ..., localeOverride: "de-AT" }) wants to fetch ['en-US', 'de-DE', 'de-AT']
  3. await driver.fetchAll() fetches everything on that list except en-US, which is bundled.

Using the API correctly means always calling and waiting for fetchAll after Driver.new(). If you don't call it, you get everything from en-US, no localisation at all. If you call it but don't await it, then you get an error if you use the driver during the fetch, it wants to have exclusive access. This is a bit of a rust thing that would be pretty painful to work around.

So, fetchAll being async, you can't use it in most of the adapter API because most of those methods are synchronous i.e. _resetDriver can't be async unless you want the "red functions" to take over a lot of the integration code.

The question is: Do you want a synchronous locale fetching option? Just provide fetchLocaleSync(lang: string): string and let it run, no fetchAll required. Last time I read the zotero locale fetching code, it was synchronous, so this would make things easier. Let me know what you think.

@adomasven
Copy link
Member

Using the API correctly means always calling and waiting for fetchAll after Driver.new(). If you don't call it, you get everything from en-US, no localisation at all. If you call it but don't await it, then you get an error if you use the driver during the fetch, it wants to have exclusive access. This is a bit of a rust thing that would be pretty painful to work around.

I think this is fine, although would be better if the Drive threw some sort of error if being called while fetching locales. It is normally assumed in the JS world that if you do not await stateful async methods and try to use the stateful object you will get at the very least unexpected behavior. I don't think there's much reason for a sync version of this function, and in fact the sync nature of citeproc-js has been a problem for us before. If you want things to be sync-lite because the locales are pre-loaded, the async lifecycle function can return immediately.

So, fetchAll being async, you can't use it in most of the adapter API because most of those methods are synchronous i.e. _resetDriver can't be async unless you want the "red functions" to take over a lot of the integration code.

Sorry, I'm not sure what you mean by "red functions". Most of the integration code is async anyway and handling async code these days isn't generally an issue with most programming languages and on most platforms.


The driver method could probably be named more descriptively and the docs should be more explicit about the fact that you must call it and when would be the best time to do so. async init() would be standard to signify that this needs to be run directly after the constructor (and would pair well with free()), or async fetchLocales(), to be more specific about what we're fetching.

@cormacrelf
Copy link
Collaborator Author

cormacrelf commented Dec 1, 2020

better if Driver threw some sort of error if being called while fetching locales

It does. 👍

I don't think there's much reason for a sync version of this function, and in fact the sync nature of citeproc-js has been a problem for us before.

I was mainly asking if it would help while there are two switchable processors, but if it's easy just to make all those methods async for both -rs and -js, then cool.

In general labelling the functions as async is good, but as it stands I don't think we can actually exploit the async and solve those way-too-single-threaded problems in a Firefox legacy extension without some serious hackery. We're still working with browser WebAssembly, i.e. a single threaded platform as it stands. This is something for later but as I understand it if you want working with huge documents to go faster and not block the UI, you'll need either:

  • Electron, each citeproc-rs Driver in its own native process with its own multi-threading, using a json-rpc 2.0 over a normal stdio transport (fully async API) also known as "the dream setup"
  • The same json-rpc in Firefox but with a super hacky shim around stdio using named pipes etc (like you already do but in a long-lived process, not sure if that's possible)
  • Firefox but I write an actual native XPCOM component using the Gecko codebase's Rust XPCOM API (LOL but seriously that might work!)
  • Web Workers and a vastly more mature WebAssembly threading implementation, which FF ESR 60.9 will never get.
  • Edit: Web Workers only, single background worker thread. Maybe this is the lowest hanging fruit if you just want to unblock the UI.

There's definitely potential to process much faster and off the main thread, but maybe next year!

"red functions"

It's a reference to this article http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

I'm updating the docs now, and I'll take the rest of that feedback on board, thanks.

@adomasven
Copy link
Member

In general labelling the functions as async is good, but as it stands I don't think we can actually exploit the async and solve those way-too-single-threaded problems in a Firefox legacy extension without some serious hackery.

That's fine. We have much bigger performance bottlenecks (like orders of magnitude bigger) related to Word Processor APIs than compared to even citeproc-js, which while not being that fast due to its interpeted nature, isn't that slow in the grand scheme of things. Citeproc-rs from initial testing feels to be much faster, which is great (although I haven't done any measuring yet). FWIW we are planning to switch to FX ESR 78 at some not too distant point in the future for other reasons, but I have no idea whether WebAssembly in Web Workers is any better there. XPCOM being deprecated and heavily pushed out of Firefox, and with plans to either update the underlying Firefox or switch to Electron, is probably not a great path to go down.

But as I understand it if you want working with huge documents to go faster and not block the UI

This is mostly not an issue, but also can be alleviated to a certain degree by skipping into the next event loop (e.g. by calling await Zotero.Promise.delay()) when running many sync operations in a row that could potentially lock the UI.

It's a reference to this article http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

That's a great article. I've never heard about the concept before, but it's a good point.

@cormacrelf
Copy link
Collaborator Author

Well, I can get it to run for 5 seconds on an update to a 1000 item disambiguation pessimistic case. But yeah, that other issue with the guy letting it run for an hour is MSWord isn't it! I imagine Zotero.Promise.delay() won't help much if it's only one function call that gets stuck for 5-10s in the wasm code, but I could be wrong.

Good news about ESR 78, at least for general wasm perf but nothing else of note I think.

Back to the main point, if you don't want to paint the integration code red, the locale fetching really is the only thing that is async and it doesn't even have to be. So that's just an option open to you. Anyway!

@cormacrelf
Copy link
Collaborator Author

cormacrelf commented Dec 1, 2020

(Also I'm sure you've already seen https://officespdev.uservoice.com/forums/224641-feature-requests-and-feedback/suggestions/33088741-support-for-search-api-in-headnotes-and-footnotes, but if you haven't, it could be the thing that unblocks a pure JS word plugin.)

@adomasven
Copy link
Member

adomasven commented Dec 1, 2020

Back to the main point, if you don't want to paint the integration code red, the locale fetching really is the only thing that is async and it doesn't even have to be. So that's just an option open to you. Anyway!

Now that you're saying it this way, it's probably a good idea to have a sync method just so that the processor can be used in a completely sync situation, so if it's not hard to add, it can't hurt (and will save some time for me to).

Well, I can get it to run for 5 seconds on an update to a 1000 item disambiguation pessimistic case.

Wouldn't disambiguation be O(log n) in the worst case (if you presort the clusters)?

(Also I'm sure you've already seen https://officespdev.uservoice.com/forums/224641-feature-requests-and-feedback/suggestions/33088741-support-for-search-api-in-headnotes-and-footnotes, but if you haven't, it could be the thing that unblocks a pure JS word plugin.)

I was just recently looking over the Word JS API to see whether we could switch to that so we can have Zotero Word integration on iPads, and was pretty sure I saw at least some APIs to access footnotes. No APIs for field codes is the big showstopper for us right now. EDIT: Oh yeah, this is just searching for text in footnotes, which isn't really relevant to us.

@cormacrelf
Copy link
Collaborator Author

cormacrelf commented Dec 1, 2020

I'll do that then, no worries.

Disambiguation is actually more like O(n * m), n being the number of citations, m being number of references. It also scales somewhat with style branching complexity. With a thousand cites at one reference per cite, it's more like O(n^2). If they're all ibid it's more like O(n). When I say pessimistic I mean one reference per cite, so this case is a million 'ops'.

I thought it might be relevant as having search in footnotes also requires a footnote API at all, and it is one of the highest voted requests on the site. But I haven't looked recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants