Functions to manage drake cache location #66
It ballooned up to nearly half a gig with the BBS analysis included, so I think we're going to exhaust piggyback very quickly. I think this needs to be set up as a larger technical issue to:
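For concreteness, here is a minimal sketch of the piggyback route, assuming the cache gets bundled into a single archive and attached to a GitHub release; the repo name and release tag below are placeholders, and GitHub release assets have per-file size limits, which is where a cache approaching half a gigabyte starts to become a problem:

```r
library(piggyback)

# Bundle the local drake cache (default path ".drake") into one archive
utils::tar("drake-cache.tar.gz", files = ".drake", compression = "gzip")

# Attach the archive to a GitHub release; repo and tag here are placeholders,
# and a release with that tag would need to exist already
pb_upload("drake-cache.tar.gz", repo = "weecology/MATSS", tag = "cache")

# Another user pulls and unpacks the same archive
pb_download("drake-cache.tar.gz", repo = "weecology/MATSS", tag = "cache")
utils::untar("drake-cache.tar.gz")
```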
I like the idea of a designated location to download the cache file, with only a limited number of users able to upload an updated version. I did look into connecting directly to an online PostgreSQL database, but since drake doesn't support a read-only approach to the cache, that won't work. It seems like there are three reasonable choices for storing the online cache at the moment:
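For background on why the read-only option is ruled out: drake's `make()` both reads previously built targets from the cache and writes new results back into it, so the cache has to be writable by whoever runs the pipeline. A toy illustration (not MATSS code):

```r
library(drake)

cache <- new_cache(".drake")        # file-backed storr cache on disk
plan <- drake_plan(x = 1, y = x + 1)

make(plan, cache = cache)           # writes build results into the cache
readd(y, cache = cache)             # reads a finished target back out
make(plan, cache = cache)           # re-reads the cache to skip up-to-date targets
```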
I'm happy to set up whatever you all think is best and set up permissions for anyone who needs them.

One complexity that we'll need to deal with in implementation is what to do after an initial download of the cache file. The user will then run the pipeline, potentially with changes, which will update the cache file. On subsequent runs, do we download an updated cache file or keep the user's version? Do we provide a function that lets the user optionally update to the newest remote cache?

Finally, apologies for being negative about this today. I was overthinking things, which I wouldn't have done if I had been paying attention to the issues in this repo, since this one laid out what you were looking for nicely and clearly.
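One way the "keep or re-download" question could be handled is a single pull function with an explicit overwrite switch, so the default never clobbers local work. This is only a sketch: the function name and URL are hypothetical, not part of MATSS.

```r
#' Fetch the remote drake cache archive (hypothetical helper)
#' @param url       location of the remote cache archive (placeholder)
#' @param cache_dir local drake cache directory
#' @param overwrite if FALSE (default), keep an existing local cache untouched
pull_remote_cache <- function(url = "https://example.org/drake-cache.tar.gz",
                              cache_dir = ".drake",
                              overwrite = FALSE) {
  if (dir.exists(cache_dir) && !overwrite) {
    message("Local cache found; keeping it. Use overwrite = TRUE to replace it.")
    return(invisible(FALSE))
  }
  archive <- tempfile(fileext = ".tar.gz")
  utils::download.file(url, archive, mode = "wb")
  if (dir.exists(cache_dir)) unlink(cache_dir, recursive = TRUE)
  utils::untar(archive, exdir = ".")
  invisible(TRUE)
}
```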
Serenity would likely be suitable for our needs, and I'm not sure we need to enable off-UF access if we figure out the right workflow. I still lean towards Zenodo, even though it's a bit of a hacky use of their service. That sounds like it would make the MATSS package more easily portable for other users, since it's a lower barrier for individuals to set up a Zenodo account and link up a cache there if desired. There's even an R package, though the last commit was 4 years ago. 🙀

As for immediate needs... we've done some refactoring of how retriever data gets loaded in, which is keeping the current cache size around 100 MB. Agree on discussing workflow issues -- I need to think more about goals here, as well as how to handle testing.
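If Zenodo ends up being the host, pulling the cache would be little more than fetching the archive from a record's file URL and unpacking it next to the project; the record ID and file name below are made up.

```r
# Hypothetical Zenodo record holding the cache archive; ID and file name are placeholders
zenodo_url <- "https://zenodo.org/record/1234567/files/drake-cache.tar.gz"

archive <- tempfile(fileext = ".tar.gz")
utils::download.file(zenodo_url, archive, mode = "wb")
utils::untar(archive, exdir = ".")   # restores the .drake/ directory
```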
Currently we have a working solution using HiPerGator. We will want a long-term solution for external users.
Long-term, I think we want functions to push and pull a drake cache from a remotely deployed MATSS instance, but this is not a high-priority feature.
Just something to think about?
If it's in the cloud, I think this allows us to pass it around without re-running targets. However, it becomes sensitive to drake version changes (which is how it came to my attention). Right now it is in the .gitignore, but a version of it lives on GitHub, so I'm not sure what our consensus on it is?
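On the version-sensitivity point, one low-tech guard would be to record the drake version next to the shared cache and compare it before reusing a downloaded copy; this is a sketch, and the metadata file name is an assumption, not anything drake itself provides.

```r
# When pushing the cache, record the drake version it was built with
writeLines(as.character(utils::packageVersion("drake")), ".drake-version")

# When pulling, warn if the installed drake differs from the one that built the cache
cache_version <- readLines(".drake-version")
local_version <- as.character(utils::packageVersion("drake"))
if (!identical(cache_version, local_version)) {
  warning("Cache was built with drake ", cache_version,
          " but drake ", local_version, " is installed; targets may be invalidated.")
}
```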