Provide interoperability for JS, Java #5945
Replies: 20 comments
-
Sounds like # 4 is the superior choice. It's one con can be avoided, I think:
The plugin can just send path strings to data files/dirs, and the underlying DVC process could open those files directly. |
Beta Was this translation helpful? Give feedback.
-
I think, eventually we will need to do all of the options above (except 2?). The plugin might need to understand ".dvc" and "dvc.yaml" files to provide better completion (eg: params completion) or some "hover" actions with plain JS. The same could be said, for example, to show DAGs, file diffs, metrics/params diff. For hefty tasks, it should just use the DVC command line, but it'll only provide higher-level access. Let's say, if we want to show Repo's file structure and provide few options such as "get_url" and a bunch of other options at the same time, we might need jsonRPC. Or, make multiple small plumbing commands like git (which will be akin to jsonRPC/REST way). |
Beta Was this translation helpful? Give feedback.
-
I'm OK with using JSON RPC. We open a high port on localhost, and talk to it. I don't know about windows, but on linux, this is incredibly efficient. And it has the potential for usage in any app that can talk HTTP. I would also like plain old JSON and REST, which gives us a lot of options in terms of transport. The comms overhead doesn't seem like a big issue to me. We can cross that bridge when we come to it, with multiple techniques (from least to most complex):
|
Beta Was this translation helpful? Give feedback.
-
This is not even required - you could use stdin/stdout as transport channel! Also, JSON RPC is much simpler than REST. There is no content-encoding, method semantic, query parameter parsing, urls etc. involved. You just send some json object with a method, a body and a request id and get back some json object with a matching request id. |
Beta Was this translation helpful? Give feedback.
-
Yup, that seems good as well! Also, regarding helpers for DVC files, a language server can work on other editors as well (like vim with vim-ale). So it might be worth it to separate the two. |
Beta Was this translation helpful? Give feedback.
-
I think RPC is def. more appropriate than REST in the case of |
Beta Was this translation helpful? Give feedback.
-
Does VSCode provide any API to get what files changed in the repository? Or, maybe we could package "watchman" or something similar in the extension itself? This with |
Beta Was this translation helpful? Give feedback.
-
I took a look at some of the existing VSCode source control extensions (git, hg, svn) and it looked to me like they all just wrap their respective source control CLI programs. Is it really necessary for us to maintain another API here? |
Beta Was this translation helpful? Give feedback.
-
@pmrowla but this or this is not nice code to maintain - especially if you plan to support other IDEs in different programming languages later. But of course it depends. If there are only a fews commands and if the output is structured already, using the dvc command line as is might work just fine. In my opinion, CLI tools that are not specifically meant for programmatic consumption are hard to use for such. They are either great for human use (with all the fancy CLI UI) or great for automation, but usually not both. Also, every non-normative output (like a user-friendly way of reporting progress) of the CLI tool will become normative as eventually some code will depend on it.
Yes it has. |
Beta Was this translation helpful? Give feedback.
-
Maybe try an small MVP just wrapping DVC commands and move onto RPC if it's worth it. |
Beta Was this translation helpful? Give feedback.
-
So my take on this. Accessing files directly This is ok with me if we talk about reading. Won't add much complexity to implementation since whatever you read from So advantages:
Things that I don't like here:
Also this can't be used to perform any dvc thing, like pull/push/checkout. DVC cli We already have an interface here, whose compatibilty and stability we maintain. Building command line arguments is no more complex than sending them via JSON. However, parsing output might be tricky, i.e. tables, these ones will need a json option. On it being slow. For most commands like push/pull/retro this doesn't matter. And we only get 1s+ when we screw up loading all the remote dependencies on startup, I think we have some tests for it now. Beside startup speed will be the same as rpc. So pluses:
Minuses:
JSON RPC We can try to make it resemble our exiting non-public State between calls can't be shared unfortunately because dvc files might change under the hood. Unless we lock the repo while server is running, which will make dvc cli not usable in the meantime. Pluses:
Minuses:
Startup cost might play if we use many info only commands, like show stages or outputs. Simply reading files might bridge it. Deps if we use sync: |
Beta Was this translation helpful? Give feedback.
-
I don't see the JSON RPC idea that good. Enforces you to install DVC as a service. Not sure also how it's going to operate with relative paths. Also one should have to write almost the same code than in the parser CLI parser since all the parameters has to be sent via RPC.
If we were definitely going towards rpc I would consider protobuf to alleviate this. But I don't think thats the case since the messages should be like the cli ones.
I think that witting a parser is quite easy especially with a proper parser generator like peg.js or antlr4. I would go that way. |
Beta Was this translation helpful? Give feedback.
-
I don't understand this argument: The VS Code extension would instantiate and maintain the dvc process. DVC must be installed in a way that the extension can access it anyways.
You don't need to parse anything, merely JSON objects need to be converted into your target structure.
Either setting a cwd per request or just using normalized, absolute paths.
I would strongly discourage the use of protobuf. It brings a lot of complexity into all parties.
This sounds like a very fragile approach, especially if the CLI output is optimized for human readabilty. In this case, the output sometimes is not even context free (like indented tables), so good luck parsing that with peg. Also, defining these grammars is a very time consuming task and they are probably not very forward compatible. I think we can agree that JSON RPC is the most comfortable way suggested so far to access DVC functionality from IDE extensions (like for VS Code, Sublime Text or PyCharm). Just compare these code snippets: // If properly done, the args and the result is typed
const list = await dvcRpcChannel.request("status", { showUnchanged: settings.showUnchanged });
for (const item of list) { console.log(item.fullPath, item.status); } vs const args = ["status"];
if (settings.showUnchanged) { args.push("--showUnchanged"); }
const result = await execAsync("dvc", args);
const modifiedFullPaths = result.matchAll(/modified:\t(.*)/);
const unchangedFullPaths = result.matchAll(/unchanged:\t(.*)/); I think we can also agree that JSON RPC would also require the most work for DVC to offer its functionality to consumers. I think we should try to find the sweet spot between these two sides. If you don't go with the json rpc approach, you should at least consider adding a json output flag. If you are going forward with the CLI+JSON option, it would be awesome if you could export or craft a machine-readable documentation of your CLI (including all supported arguments and the JSON output type) so that at least command builders can be generated. Maybe it would be helpful to have a precise document of what the first iteration of the VS Code DVC extension should offer and then to see which APIs from DVC are needed for that and how they can be used. |
Beta Was this translation helpful? Give feedback.
-
Could you please elaborate?
dvc has --json-output as a parameter to receive that, a json output. There is no need to massage the output.
Again this is very poor. Please, not to be compared with a parser generator. I would never do it this way.
I don't think is a time consuming task. Actually in my experience has been pretty fast for my needs.
At the end is a service or a daemon right? If not is a kind of magic I still don't know 😸
I said just in case size would be an issue. What kind if issues did yoou have with gRPC is a RPC also as you like.
What about dvc service crashing like many extensions in XCode (eslint comes to my mind) would the user have to restart the IDE? I never use those extensions because of that actually. |
Beta Was this translation helpful? Give feedback.
-
In this case, if all commands understand
If using
Then the argument that VS Code also just executes CLI commands for git integration is not valid, as they do it this way.
It would be a long running command line process without any visible window. The extension would start a dvc process on startup and kill it on shutdown. If the DVC process crashes, it would be restarted. No user interaction required. |
Beta Was this translation helpful? Give feedback.
-
How do you do RPC? dvc is going to run as a daemon or service listening for TCP connections, right? That sounds to me like a service or daemon and of course without any visible window.
How? it's not that trivial, take into account user permissions to restart services depending on the OS. Might be possible or might be not... we should consider that friction also.
To properly generate the inputs if needed. I give a simple solution VS hand made input generation.
That does not means they are doing it correctly (not incorrectly either). I put the example also of XSLT since many people traverses the XML with code VS a simple XSLT transformation.
Totally agree! DVC has |
Beta Was this translation helpful? Give feedback.
-
I don't see this @Suor. Most DVC commands combine and process info from command line, metafiles, and file system significantly, no? The "files" approach sounds like re-implementing DVC in JS to me.
^ That's contradicting. But if the argument is that the startup time is acceptable (not enough of a reason to implement RPC) then I agree.
+1 @hediet. I also think that a loosely coupled middleware to wrap the CLI (has to be written in JS no?) sounds possibly fragile.
Only 5 commands have that option @DavidGOrtega
If going the CLI+JSON way gets us there (something like http://docopt.org/), then I like that option more 🙂
Good point. But is it really a problem? I'm guessing the same applies to any plugin like Git: you run a command on the IDE and another one via terminal and they don't know about each other, just use the latest working dir state. This is actually a feature in DVC: parallel repros.
It would be included in the regular installation, and the plugin would manage it in the background. Assuming that works well, then no extra work (or awareness even) is required from users. Summarizing, I think that if RPC is the "right" way to go (which isn't clear yet), and helps us refactor core classes (e.g. the
Yup, I also vote for wrapping the CLI for a basic MVP (as mentioned before) + a few RPC endpoints for more complex operations — letting us fully commit later depending on that experience. |
Beta Was this translation helpful? Give feedback.
-
Another matter to consider for either route is how will we keep the plugin version in sync with the core DVC version. DVC evolves rapidly and while in theory no minor changes should break things + efforts are made to keep backward compatibility, in reality it my happen that a DVC update breaks the plugin, an RPC endpoint, or the CLI middleware. Integration tests, I guess? Which alternative makes it easier to manage this aspect? |
Beta Was this translation helpful? Give feedback.
-
Thanks everyone for the discussion! It looks like just regular
@Suor if it's written with some introspection you don't need a separate API- it should be more or less auto-generated layer on top of the same API that we use for CLI
this doesn't sound like a big deal, but what I'm worried about here is that it might complicate some edge cases like delivering updates (e.g. DVC package itself). But also have to agree that it is indeed another moving part in all this integration.
I guess, we will need this. Also, extracting experiments (even the current list) is dependent on running internal git. So, I would not rely on parsing files- we need either RPC/REST or Questions:
|
Beta Was this translation helpful? Give feedback.
-
@shcheklein CLI returns text, no way it may be wrapped to return json. Not sure whether it is even returning text rather than printing it from random nested code. So we will need to add some intermediate level abstraction and then wrap it both with CLI and JSON. The CLI wrapping will be manual, JSON part might be more automatic. Also, some of the functions might be completely different. @jorgeorpinel Beside startup (logical pause) speed will be the same as rpc. I.e. we still need to collect stages do the work. On reading files. DVC might be performing some heavy things, but plugin might require basic info like a list of stages or outputs, reading files for that seems ok. The alternative is asking for the same info in the same format from dvc, so no more understanding needed than necessary and no reimplementing dvc. |
Beta Was this translation helpful? Give feedback.
-
As we came close to implementing our first official plugin for VS Code we need to find a way to call Python from JS code.
Repeating an excellent summary by @hediet:
Option 1: It builds the data and modifies the files itself (everything in JS)
Contra: Would require to reimplement DVC in JS.
Option 2: It calls into DVC internal Python code
Either the VS Code extension ships the python code + runtime, or it expects python libraries in some global paths.
Contra: This is very fragile
Option 3: It uses the DVC command line (like
dvc push
ordvc commit -m
“demo”)Contra: It is not developer friendly to build command line strings manually.
Contra: The DVC process must be started for each data query / action. Can be slow as we know (>1s sometimes).
Option 4: It launches some
dvc --jsonRpc
and uses json RPC to access DVC functionality (https://www.jsonrpc.org/specification).Pro: Offers a standardized API that can be consumed by many programming languages.
Note: This is how language services can be implemented for VS Code in any programming language
Pro: DVC can send notifications to VS Code in real time
Pro: State can be shared across multiple requests for faster response time
Contra: if hundreds of megabytes are sent for a single request or response, this blocks the entire communication channel.
@iterative/engineering WDYT?
@Suor @hediet could you please move your arguments into this ticket?
Beta Was this translation helpful? Give feedback.
All reactions