diff --git a/articles/c-using-rix-to-build-project-specific-environments.html b/articles/c-using-rix-to-build-project-specific-environments.html index 9b9e5ed8..6d7a88c6 100644 --- a/articles/c-using-rix-to-build-project-specific-environments.html +++ b/articles/c-using-rix-to-build-project-specific-environments.html @@ -231,12 +231,12 @@
rix is an R package that leverages Nix, a powerful package manager focusing on reproducible builds. With Nix, it is possible to create project-specific environments that contain a project-specific version of R and R packages (as well as other tools or languages, if needed). This project-specific environment will also include all the required system-level dependencies that can be difficult to install, such as GDAL
for packages for geospatial analysis for example. This is how Nix installs software: it installs software as a complete “bundle” that includes all of its dependencies, and all of the dependencies’ dependencies and so on. Nix is an incredibly useful piece of software for ensuring reproducibility of projects, in research or otherwise. For example, it allows you to run web applications like Shiny apps or plumber APIs in a controlled environment, or run targets pipelines with the right version of R and dependencies, and it is also possible to use environments managed by Nix to work interactively using an IDE.
In essence, this means that you can use rix and Nix to replace renv and Docker with one single tool, but the approach is quite different: renv records specific versions of individual packages, while rix provides a complete snapshot of the R ecosystem at a specific point in time, but also snapshots all the required dependencies to make your project-specific R environment work. To ensure complete reproducibility with renv, it must be combined with Docker, in order to include system-level dependencies (like GDAL
, as per the example above).
Nix has a fairly high entry cost though. Nix is a complex piece of software that comes with its own programming language, which is also called Nix. Its purpose is to solve a complex problem: defining instructions on how to build software packages and manage configurations in a declarative way. This makes sure that software gets installed in a fully reproducible manner, on any operating system or hardware.
+rix is an R package that leverages Nix, a powerful package manager focusing on reproducible builds. With Nix, it is possible to create project-specific environments that contain a project-specific version of R and R packages (as well as other tools or languages, if needed). This project-specific environment will also include all the required system-level dependencies that can be difficult to install, such as GDAL
for geospatial analysis packages, for example. Nix installs software as a complete “bundle” that includes all of the software’s dependencies, and all of the dependencies’ dependencies, and so on. Nix is an incredibly useful piece of software for ensuring reproducibility of projects, in research or otherwise.
Other use cases include running web applications like Shiny apps or plumber APIs in a controlled environment, executing targets pipelines with the right version of R and dependencies, or using environments managed by Nix to work interactively in an IDE.
+In essence, this means that you can use rix and Nix to replace renv and Docker with one single tool, but the approach is quite different: renv records specific versions of individual packages, while rix provides a complete snapshot of the R ecosystem at a specific point in time, together with all the system-level dependencies required to make your project-specific R environment work. In contrast, to ensure complete reproducibility with renv, it must be combined with Docker in order to include system-level dependencies (like GDAL
, as per the example above).
Nix has a fairly steep learning curve though. Nix is a complex piece of software that comes with its own programming language, which is also called Nix. Its purpose is to solve a complex problem: defining instructions on how to build software packages and manage configurations in a declarative way, using functional programming principles. This makes sure that software gets installed in a fully reproducible manner, on any operating system or hardware, but with the caveat that users must learn the Nix programming language and get into the “functional programming approach to software management” mindset, which is unusual.
rix provides functions to help you write Nix expressions (that is, code written in the Nix language). These expressions are the inputs for the Nix package manager, which uses them to build sets of software packages and provide them in a reproducible development environment. These environments can be used for interactive data analysis, or reproduced when running pipelines in CI/CD systems. The Nixpkgs collection currently includes more than 100,000 pieces of software available through the Nix package manager.
With rix, you can define development environments, or shells, that contain the tools needed to analyze data using R. These environments are isolated from each other and project-specific: one project can use one version of R and R packages, while another project uses a different version of R and R packages. However, extra care is required if you already have R installed through the usual method for your operating system, as these development environments are not totally isolated from the rest of your system. Unlike Docker, where a running container cannot access anything from the host system unless explicitly configured to do so, Nix development shells are nothing but environments that add more software to the list of already available software (the so-called PATH
). As such, it is possible to access anything (files and software) already present on the system from a running Nix shell. Thus, rix also provides a function called rix_init()
that helps isolate R sessions running inside Nix environments from the rest of your system. This avoids clashes between the Nix-specific library of R packages and the user library of R packages should you already have R installed and managed by the usual method for your operating system.
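To make the kind of clash described above concrete, here is a minimal sketch in base R that lists library paths visible to the current session that do not live under the Nix store. It assumes that packages provided by Nix are found under `/nix/store/`; the helper `outside_nix_store()` is a hypothetical name used for illustration only and is not part of rix.

```r
# Hypothetical helper (not part of rix): return the entries of `paths`
# that are NOT located under the Nix store.
outside_nix_store <- function(paths, store = "/nix/store/") {
  paths[!startsWith(paths, store)]
}

# Library paths seen by the current R session.
leaking_libs <- outside_nix_store(.libPaths())

if (length(leaking_libs) > 0) {
  # Assumption: inside a Nix shell, these are paths such as a user library
  # that come from the host system rather than from the Nix environment.
  message("Library paths outside the Nix store: ",
          paste(leaking_libs, collapse = ", "))
} else {
  message("All library paths are inside the Nix store.")
}
```

Run from an R session started inside a Nix shell, any path reported here points to packages that come from the rest of the system rather than from the Nix environment.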
It is also possible to add any other software package available on Nixpkgs to a Nix environment, for example IDEs such as RStudio or VS Code. The Nix R ecosystem currently includes almost the entirety of CRAN and Bioconductor packages (around a hundred CRAN or Bioconductor packages are unavailable through Nix). As with any other programming language or software, it is also possible to install older releases of R packages, or install packages from GitHub at defined states, as well as local packages in the .tar.gz
format.
Now try to build an expression using rix()
:
-path_default_nix <- "."
+library(rix)
+
+path_default_nix <- "."
rix(r_ver = "4.3.3",
r_pkgs = c("dplyr", "ggplot2"),
@@ -172,6 +176,30 @@ Docker
Let’s start with arguably the most popular combo for reproducibility in the R ecosystem, Docker+renv (it is also possible to add rspm or bspm in combination with renv, which will install the required system-level dependencies automatically).
+{renv} snapshots the state of the library of R packages for a project, nothing more, nothing less. It can then be used to restore the library of packages on another machine, but it is the user’s responsibility to ensure that the right version of R and system-level dependencies are available on that other machine. This is why renv is often coupled with a versioned Docker image, such as the images from the Rocker project. Combining both provides a very robust way to serve applications such as Shiny apps, but it can be awkward to develop interactively with this setup, which is why most of the time, people work on their current setup, and dockerize it once they’re done. However, you need to make sure to keep updating the image, as the underlying operating system will eventually reach end of life. Eventually, you might even have to update the whole stack, as it could become impossible to install the version of R and R packages you used on a recent Docker image. This can actually be a good thing; it could be the opportunity to update your app and make sure that it benefits from the latest security patches. However, for reproducibility in research, this is not something that you should be doing, because it could have an impact on historical results.
+What we suggest instead is to keep using Docker if you are already invested in the ecosystem, and continue to use it to deploy and serve applications and archive research. But instead of using renv to get the right packages, you combine Docker and Nix. This way, you have a nice separation of concerns: Docker will only be used as a platter to serve code, while the environment will be handled by Nix. You could even use an image that gets continuously updated such as ubuntu:latest
as a base: it doesn’t matter that the image is always changing, since the environment that will be doing the heavy lifting inside the container is completely reproducible thanks to Nix.
Exactly the same reasoning can be applied to groundhog, rang or the CRAN snapshots of Posit in combination with Docker instead of renv.
+Anaconda, Miniconda, Mamba, Micromamba… (henceforth we’ll refer to these as Conda) and Nix have much in common: they are multiplatform package managers and both can be used to set up reproducible development environments for many languages, such as R or Python. Using conda-lock one can generate fully reproducible lock files that can then be used by Conda to build the environment as defined in the lock file. The main difference between Conda and Nix is conceptual and might not seem that important for end-users: Conda is a procedural package manager, while Nix is a functional package manager. In practice this means that environments managed by Conda are mutable: users are not prevented from changing their environment interactively and then re-generating the lock file. This is quite comfortable when working interactively, but can lead to situations where dependency management gets borked.
+In the case of Nix however, environments are immutable: you cannot add software into a running Nix environment. You will need to stop working, re-define the environment, rebuild it and then use it. While this might sound more tedious (it is), it forces users to work more “cleanly” and avoids many issues that come from dynamically changing an environment. If it is not possible to build that environment, it fails as early as possible and forces you to deal with the issue. A mutating environment could lull you into a false sense of security.
+Another major difference is that Conda does not include the entirety of CRAN or Bioconductor, whereas Nix includes almost all of both. According to Anaconda’s Documentation, 6000 CRAN packages are available through Conda (as of writing in July 2024, CRAN has 21’000+ packages). Conda includes Bioconductor packages through the Bioconda project; however, we were not able to determine whether Bioconda contains all of Bioconductor. According to Bioconda’s FAQ, Bioconductor data packages are not included.
+Just like Nix, Guix is a functional package manager with a focus on reproducible builds. We won’t go into technical differences/similarities, but only into practical ones for end-users of the R programming language. If you want to know about technical aspects, read this https://news.ycombinator.com/item?id=18910683. The main shortcomings of Guix for R users are that not all CRAN or Bioconductor packages are included, and that Guix is not available on Windows or macOS.
+Refer to Contributing.md
to learn how to contribute to the package.
Thanks to the Nix community for making Nix possible, and thanks to the community of R users on Nix for their work packaging R and CRAN/Bioconductor packages for Nix (in particular Justin Bedő, Rémi Nicole, nviets, Chris Hammill, László Kupcsik, Simon Lackerbauer, MrTarantoga and every other person from the Matrix Nixpkgs R channel).
+Finally, thanks to David Solito for creating rix’s logo!