Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to disable installing packages with --with-keep.source #1713

Open
sebffischer opened this issue Oct 10, 2023 · 5 comments
Open

Option to disable installing packages with --with-keep.source #1713

sebffischer opened this issue Oct 10, 2023 · 5 comments

Comments

@sebffischer
Copy link

Recently, renv changed the way packages are installed by adding with --with-keep.source installation option. I have created a related (solved) issue here: #1688.

While the problems that arose in this issue are fixed from our side, we are now running into related problems that also arise due to the installation option --with-keep.source.
A side effect of keeping all the sources is that the size of serialized functions becomes much larger.

If I e.g. save the ggplot2::ggplot function when installing ggplot2 without source, the function has 1.76 KB, wheres when I install the package with the sources, the serialized ggplot2::ggplot function has 1.65 MB, a factor of around 1000.

I believe that the culprit here is attributes(attributes(ggplot2::ggplot)$srcref)$srcfile$original$lines, which is a large character vector containing the sources.

It would be great if one could disable the --with-keep.source flag. In light of the above issue, might it also make sense to not set this option by default?

sebffischer added a commit to mlr-org/mlr3book that referenced this issue Oct 10, 2023
The "large-scale benchmarking" chapter always threw because
of a change in renv that started to install packages with --with-keep.source
this had two problems:
1. methods of R6 intances had a srcref object containing a promise with a large environment
2. function parameters (as in the robustify pipeline) were also quite
   large because from there one can also reach enviroments where a lot
   of srcrefs are kept

While this is inherently an issue that needs to be addressed in renv --
i.e. there should be an option to not keep the source -- we
here fix it temporarily by relying on an older renv version and
bumping the mlr3misc version where the problem is solved

rstudio/renv#1713
mlr-org/mlr3misc#88
@sebffischer
Copy link
Author

Note that it is possible to overwrite the --with-keep.source with --without-keep.source

@jemus42
Copy link

jemus42 commented Oct 11, 2023

Am I correct in assuming that via https://github.com/rstudio/renv/blob/36e1b5192fbc9f380dd4fffc6ee837856d5d3e75/R/r.R#L206C38-L206C50 we can override this with the install.opts option, e.g. options("install.opts" = "--without-keep.source")?

(EDIT: Yes, apparently that works)

I have just downgraded to renv v0.17.0 for a project where this became an issue, and I'm trying to see if I can use a recent renv version with this workaround just in case the downgrade caused other issues.

In any case, this issue affected me by causing a serialization step (saveRDS) of many objects containing R6 objects to become
a) unreasonably slow and
b) taking up over 200GB of real memory on our workstation after over 20 hours of running

🫠

@kevinushey
Copy link
Collaborator

b) taking up over 200GB of real memory on our workstation after over 20 hours of running

🤯

Do you know why this is happening? I wonder if there's a fix that needs to happen in R6, or one of the packages using R6...

@jemus42
Copy link

jemus42 commented Oct 16, 2023

I think @sebffischer might be able to elaborate on that a bit more - I just got the 'srcrefs full of stuff' and 'R6 is somehow not great in that regard' bits of that discussion 🙃

@sebffischer
Copy link
Author

sebffischer commented Oct 17, 2023

The problem is essentially the one I have already described in the first post in this issue, i.e. the size of serialized functions changes considerably when they contain srcrefs. While this might not seem horrible in itself, it becomes a major problem in combination with the way the mlr3 team uses R6 classes and how enviornments are (un)-serializeed (see code snippet below).

We already partially mitigate this problem ourselves by taking care that the methods of our R6 objects do not contain srcrefs, essentially overwriting the behaviour of --with-keep.source, but we cannot do this for functions from other packages.

# "Create R6 classes"
e1 = new.env()
e2 = new.env()
# User sets fields of the "R6 classes"
e1$fn = ggplot2::facet_grid
e2$fn = ggplot2::facet_grid

# the srcref attribute is represented only once in memory
identical(attr(e1$fn, "srcref"), attr(e2$fn, "srcref"))
#> [1] TRUE
pryr::object_size(list(e1$fn, e2$fn))
#> 1.76 MB

# User saves the "R6 classes"
p1 = tempfile()
p2 = tempfile()
saveRDS(e1, p1)
saveRDS(e2, p2)

# User reloads the "R6 classes"
l1 = readRDS(p1)
l2 = readRDS(p2)

# reference identities are lost
identical(attr(l1$fn, "srcref"), attr(l2$fn, "srcref"))
#> [1] FALSE
# RAM usage blows up
pryr::object_size(list(l1$fn, l2$fn))
#> 4.17 MB

Created on 2023-10-17 with reprex v2.0.2

May I ask why the decision was made to add the --with-keep.source option?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants