Address drake warning on non-hardcoded file paths #177

Open
ha0ye opened this issue Mar 30, 2020 · 2 comments
Labels
bug Something isn't working infrastructure how MATSS runs

Comments

ha0ye commented Mar 30, 2020

1: Detected file_in(!!file.path(path, "breed-bird-survey-prepped", paste0("route",
route, "region", region, ".RDS"))). File paths in file_in(), file_out(), and knitr_in() must be literal strings, not variables. For example, file_in("file1.csv", "file2.csv") is legal, but file_in(paste0(filename_variable, ".csv")) is not. Details: https://books.ropensci.org/drake/plans.html#static-files

ha0ye added the bug and infrastructure labels Mar 30, 2020

ha0ye commented Mar 30, 2020

Possible need for dynamic branching - maybe combine with #168
https://books.ropensci.org/drake/plans.html#dynamic-files

ha0ye commented Mar 31, 2020

So this pops up because the analysis script uses expose_imports(MATSS) so that the underlying functions in MATSS are appropriately identified as dependencies in the analysis, and so that changes in MATSS can mark an analysis as out of date.

This means that when drake_config() runs to do pre-processing on the plan, it detects these non-static file paths in the dependencies, even if they are not part of the actual targets defined in the analysis script.

It looks like dynamic files are a potential solution:

  • if BBS / BioTIME / GPDD datasets are part of an analysis, the datasets plan can include the preprocessing that turns them into datasets for the analysis. Since this would turn the datasets into dynamic targets (I think?), the downstream analysis plan would also have to switch to dynamic branching (thus subsuming #168, "Refactor drake plan functions to make use of dynamic branching")
  • the intent of using file_in is to include the underlying file as a dependency, so that changes to e.g. the data file mean an analysis has to be re-run. A good question is IF this is an important feature... ideally we want an analysis to detect when datasets are updated, but the current recipe means end users have to be aware of updates and download the data on their own anyway. And we don't currently provide this functionality for other datasets.
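
For reference, a minimal sketch of what the dynamic-file approach might look like in a plan, assuming a drake version that supports target(format = "file"). The helper write_bbs() and the data_dir location are hypothetical stand-ins, not existing MATSS functions; the point is only that the path is computed at runtime and returned by the target, instead of being wrapped in file_in().

```r
library(drake)

# Hypothetical location for prepped data (an assumption for this sketch).
data_dir <- file.path(tempdir(), "breed-bird-survey-prepped")

# Hypothetical helper: writes one prepped dataset and returns its path.
# A dynamic file target must return the path(s) of the file(s) it tracks.
write_bbs <- function(dir, route, region) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  out <- file.path(dir, paste0("route", route, "region", region, ".RDS"))
  saveRDS(data.frame(route = route, region = region), out)
  out
}

plan <- drake_plan(
  # format = "file" marks this as a dynamic file target, so the path
  # does not need to be a literal string as it does with file_in().
  bbs_file = target(
    write_bbs(data_dir, route = 1, region = 11),
    format = "file"
  ),
  # Downstream targets depend on the file through the upstream target.
  bbs_data = readRDS(bbs_file)
)
```

Running make(plan) should then build bbs_file first and rebuild bbs_data if the file changes, though I haven't checked how this interacts with expose_imports(MATSS).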
