Move CI setup to pixi #2323

Open · wolfv opened this issue Oct 2, 2024 · 39 comments · May be fixed by conda-forge/conda-smithy#2099

@wolfv (Member) commented Oct 2, 2024

Currently, when the CI starts, it does a number of things:

# Install the build tooling into the base environment...
mamba install --update-specs --yes --quiet --channel conda-forge --strict-channel-priority \
    pip mamba rattler-build conda-forge-ci-setup=4 "conda-build>=24.1"
# ...then update the same specs to the latest available versions
mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority \
    pip mamba rattler-build conda-forge-ci-setup=4 "conda-build>=24.1"

On macOS / Windows, it starts by setting up a fresh Miniforge and then installs the same packages:

macOS script
MINIFORGE_URL="https://github.com/conda-forge/miniforge/releases/latest/download"
MINIFORGE_FILE="Mambaforge-MacOSX-$(uname -m).sh"
curl -L -O "${MINIFORGE_URL}/${MINIFORGE_FILE}"
rm -rf ${MINIFORGE_HOME}
bash $MINIFORGE_FILE -b -p ${MINIFORGE_HOME}

( endgroup "Installing a fresh version of Miniforge" ) 2> /dev/null

( startgroup "Configuring conda" ) 2> /dev/null

source ${MINIFORGE_HOME}/etc/profile.d/conda.sh
conda activate base
export CONDA_SOLVER="libmamba"
export CONDA_LIBMAMBA_SOLVER_NO_CHANNELS_FROM_INSTALLED=1

mamba install --update-specs --quiet --yes --channel conda-forge --strict-channel-priority \
    pip mamba conda-build boa conda-forge-ci-setup=4
mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority \
    pip mamba conda-build boa conda-forge-ci-setup=4
Windows script
:: Activate the base conda environment
call activate base
:: Configure the solver
set "CONDA_SOLVER=libmamba"
if !errorlevel! neq 0 exit /b !errorlevel!
set "CONDA_LIBMAMBA_SOLVER_NO_CHANNELS_FROM_INSTALLED=1"

:: Provision the necessary dependencies to build the recipe later
echo Installing dependencies
mamba.exe install "python=3.10" pip mamba conda-build boa conda-forge-ci-setup=4 -c conda-forge --strict-channel-priority --yes
if !errorlevel! neq 0 exit /b !errorlevel!

This could all be done in one swift step with pixi. Pixi is a single binary that can be dropped anywhere, and it can either resolve + install, or use a lockfile for even faster & more controlled installation.

We could create / maintain the pixi.toml + lockfile externally to the feedstocks.

Pixi + rattler-build have the added benefit that they share the cache (repodata & packages) but that is a minor concern.
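
For illustration, that single step could look roughly like this (the installer URL is pixi's documented one; keeping the manifest under .ci_support/ is an assumption on my part, not an agreed layout):

# Bootstrap the single pixi binary
curl -fsSL https://pixi.sh/install.sh | bash
export PATH="${HOME}/.pixi/bin:${PATH}"
# Resolves and installs, or reuses pixi.lock when present
pixi install --manifest-path .ci_support/pixi.toml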

@jaimergp (Member) commented Oct 2, 2024

We would need to accommodate remote_ci_setup, which allows modifying the base environment. So conda-smithy would need to encode something like this:

{% for pkg in remote_ci_setup %}
pixi add {{ pkg }}
{% endfor %}
pixi install

An argument for keeping Miniforge around is that we were installing a conda + Python distribution anyway, so why not just use that; but that might not be as relevant these days.

conda-smithy has had conda_install_tool to control these things for a while now, so you could even start the PR and let people opt in by changing that in conda-forge.yml.

@beckermr (Member) commented Oct 2, 2024

micromamba is a single binary too; it would be a smaller perturbation of the existing code and would likely get us most of the extra efficiency here.

Note that any of these changes will affect everyone who runs build-locally.py as well, which is an important consideration since the code runs not just in CI but on people's machines.
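
For comparison, a hypothetical micromamba drop-in could be as small as this (the download URL is micromamba's documented release endpoint; reusing MINIFORGE_HOME as the target prefix is my assumption):

# Fetch the single micromamba binary
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xj bin/micromamba
# Provision the same tooling the current scripts install into base
./bin/micromamba create -y -p "${MINIFORGE_HOME}" -c conda-forge \
    pip conda-build conda-forge-ci-setup=4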

@xhochy (Member) commented Oct 2, 2024

I don't think this is a tool question but rather one about the approach to installing versions. In the end, that choice will also lead to different tools, as each is better tailored to one of the approaches. I think our choice is:

  1. We want to use the latest available versions (and use *mamba).
  2. We want to use the exact pinned versions (and use pixi).

Pinning brings speed and reproducibility at the cost of maintaining lockfiles, especially in the case where you have remote_ci_setup. Personally, I would see the overhead as manageable, since updating the lockfiles could be handled as part of a conda-smithy rerender.

@beckermr (Member) commented Oct 2, 2024

micromamba also handles lock files, but preserves the more traditional conda env workflow many people are used to.
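
For example (IIRC micromamba can consume conda-lock-style lockfiles directly; treat this as a sketch, with the lockfile name illustrative):

micromamba create -n build-env -f conda-lock.yml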

@xhochy (Member) commented Oct 2, 2024

An alternative could also be to create an installer that already contains the base installation and use that instead of Miniforge. This is closer to our current approach and also includes some locking.

@jaimergp (Member) commented Oct 2, 2024

An alternative could also be to create an installer that already contains the base installation and use that instead of Miniforge.

I've been thinking about this for a bit, because right now the Miniforge release cycle is coupled to the operational status of ALL feedstocks. In the past we've been blocked by boa not being compatible with the latest conda, etc.

That said, at that point we could also put that effort into switching to a single-binary provider, be it Pixi or micromamba.

We just need to write down which packages are needed in an environment.yml or a pixi.toml, then run that to provision the "build tool environment". Lockfiles are a separate conversation, in a way.
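
As a sketch, such a pixi.toml could be quite small (package set copied from the current CI scripts; the name and exact pins are illustrative):

[project]
name = "build-tool-env"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "osx-arm64", "win-64"]

[dependencies]
pip = "*"
conda-build = ">=24.1"
conda-forge-ci-setup = "4.*"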

The "traditional env management" point I don't get, though. We are only creating a new environment on each CI run, which happens to be base in the Mini(conda|forge) world. Even with build-locally.py it simply calls the build scripts, which means delegating to a Docker image, or even a fresh Miniforge installation on macOS. So we "only" need to point the install tool to the installation location we have used so far.

@beckermr (Member) commented Oct 2, 2024

My comment on the traditional env management is in a way related to how we as developers want to maintain smithy and what precisely our mental model for it is. This interacts with users when they want to build locally and also debug builds using conda debug, IIUC.

@jaimergp (Member) commented Oct 2, 2024

From our core call today, the two items we seem to be tackling here are:

  • Reducing the overhead of provisioning the build tool (conda-build, rattler-build) and the related tools (ci_setup, etc.). For this we will need some timings first (I'll post a table soon). The ideas are:
    • Avoiding Miniforge install times by using a single binary instead (micromamba, pixi). However, micromamba and pixi would need to download packages already included in Miniforge.
    • OR creating an installer just to provision things.
    • On top of that, we can revisit the notion of using lockfiles for feedstocks, while still allowing some flexibility, or integrating it into the rerender process.
  • A task orchestration tool for feedstocks, which could replace build-locally.py as well as provide convenient shortcuts for common operations like linting or rerendering (see the sketch after this list).
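
For the second item, pixi's task runner could be one way to provide those shortcuts. As a rough sketch only (task names and commands are illustrative assumptions, not an agreed design), a feedstock-level pixi.toml might contain:

[tasks]
lint = "conda-smithy recipe-lint ./recipe"
rerender = "conda-smithy rerender"
build = "python ./build-locally.py"

Each would then run as pixi run lint, etc.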

@jaimergp (Member) commented Oct 2, 2024

On staged-recipes, I took the logs from conda-forge/staged-recipes#27748:

  • Linux: 52s = 26s (pull Docker image) + 26s (just install deps)
2024-10-02T15:42:53.8611837Z + docker pull quay.io/condaforge/linux-anvil-cos7-x86_64
2024-10-02T15:43:19.2033102Z + docker run -v /home/vsts/work/1/s:/home/conda/staged-recipes -e HOST_USER_ID=1001 -e AZURE=True -e CONFIG -e CI -e CPU_COUNT -e DEFAULT_LINUX_VERSION quay.io/condaforge/linux-anvil-cos7-x86_64 bash /home/conda/staged-recipes/.scripts/build_steps.sh
2024-10-02T15:43:20.2050451Z + conda install --quiet --file /home/conda/staged-recipes/.ci_support/requirements.txt
2024-10-02T15:43:46.3255091Z + setup_conda_rc /home/conda/staged-recipes /home/conda/staged-recipes-copy/recipes /home/conda/staged-recipes-copy/.ci_support/linux64.yaml
  • macOS: 1m30s = 3 seconds (Download Miniforge) + 20s (Install Miniforge) + 67s (install deps)
2024-10-02T15:43:06.2974070Z + curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh
2024-10-02T15:43:09.8784060Z + bash Miniforge3-MacOSX-x86_64.sh -bp /Users/runner/Miniforge3
2024-10-02T15:43:28.3250310Z + /Users/runner/Miniforge3/bin/conda install --quiet --file .ci_support/requirements.txt
2024-10-02T15:44:35.9780640Z + setup_conda_rc ./ ./recipes ./.ci_support/osx64.yaml
  • Windows: 3min 5s = 3 seconds (Download Miniforge) + 60s (Install Miniforge) + 122s (install deps)
2024-10-02T15:44:08.7940784Z Installing dependencies
2024-10-02T15:46:14.4882501Z Setting up configuration

@jaimergp (Member) commented Oct 2, 2024

On a feedstock, I took the logs from libignition-physics (Unix) and mamba (Windows):

  • Linux: 59s = 29s (pull Docker image) + 30s (install and update deps)
2024-10-02T19:25:18.4151535Z + docker pull quay.io/condaforge/linux-anvil-cos7-x86_64
2024-10-02T19:25:47.8172848Z + docker run -v /home/vsts/work/1/s/recipe:/home/conda/recipe_root:rw,z,delegated -v /home/vsts/work/1/s:/home/conda/feedstock_root:rw,z,delegated -e CONFIG -e HOST_USER_ID -e UPLOAD_PACKAGES -e IS_PR_BUILD -e GIT_BRANCH -e UPLOAD_ON_BRANCH -e CI -e FEEDSTOCK_NAME -e CPU_COUNT -e BUILD_WITH_CONDA_DEBUG -e BUILD_OUTPUT_ID -e flow_run_id -e remote_url -e sha -e BINSTAR_TOKEN -e FEEDSTOCK_TOKEN -e STAGING_BINSTAR_TOKEN quay.io/condaforge/linux-anvil-cos7-x86_64 bash /home/conda/feedstock_root/.scripts/build_steps.sh
2024-10-02T19:25:48.6216131Z + mamba install --update-specs --yes --quiet --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
2024-10-02T19:26:18.8514019Z + mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
2024-10-02T19:26:27.3112445Z + setup_conda_rc /home/conda/feedstock_root /home/conda/recipe_root /home/conda/feedstock_root/.ci_support/linux_64_.yaml
  • macOS: 1m3s = 2s (Download Miniforge) + 14s (install Miniforge) + 47s (install + update deps)
2024-10-02T19:25:17.0039930Z + curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh
2024-10-02T19:25:19.3452300Z + bash Miniforge3-MacOSX-x86_64.sh -b -p /Users/runner/miniforge3
2024-10-02T19:25:33.0164800Z + mamba install --update-specs --quiet --yes --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
2024-10-02T19:26:14.3050000Z + mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
2024-10-02T19:26:20.2779970Z + setup_conda_rc ./ ./recipe ./.ci_support/osx_64_.yaml
  • Windows: 4m22s = 7s (Download Miniforge) + 1m30s (install Miniforge) + 2m45s (install deps)
2024-10-02T18:52:46.6814736Z Installing dependencies
2024-10-02T18:55:00.1965077Z Setting up configuration

@jaimergp (Member) commented Oct 2, 2024

Some timings for a micromamba-only replacement on macOS and Windows: conda-forge/staged-recipes#27753. Not much of a difference on macOS, but it's much faster on Windows!

@isuruf (Member) commented Oct 2, 2024

I'd expect Windows to be much faster because of the parallel download and extraction. We would get the same benefit with mamba 2.0.

@jaimergp (Member) commented Oct 2, 2024

For Pixi, both macOS and Windows take under 30s from scratch:

conda-forge/staged-recipes#27754 (comment)

@beckermr (Member) commented Oct 2, 2024

When you said micromamba was about the same for osx, what was the number? I'm curious.

@jaimergp (Member) commented Oct 3, 2024

On staged-recipes macOS, the Miniforge approach takes 1m10s to 1m30s. With micromamba, that goes down to 1min; with Pixi, 30s.

The differences on Windows are striking: from ~4 minutes down to under a minute (micromamba) or even <30s (pixi).

@jaimergp (Member) commented Oct 3, 2024

We would get the same benefit with mamba 2.0.

On Windows, you get <1m with micromamba v1, <2m with micromamba v2, <30s with Pixi.

@wolfv (Member, Author) commented Oct 3, 2024

We would get the same benefit with mamba 2.0.

@isuruf unfortunately, no. With pixi / rattler the linking is done in a completely parallelized / pipelined way using async Rust (e.g. the whole download -> extraction -> linking per package is done in one go). Clobber issues are resolved after the transaction has executed (vs. ordered installation as in mamba / conda).

So it's not yet possible to reach the same speeds with mamba / conda.

We could build something bespoke with py-rattler though that would reach the same speeds.

@beckermr (Member) commented Oct 3, 2024

Thanks @wolfv @jaimergp and nice work all around!

It appears that micromamba would be an easy win now; we could basically drop it in. We'd need to ensure it uses the same cache as conda/mamba in the Docker container.

It also appears we should eventually either move to pixi or to a micromamba-like tool built on the same components.

@wolfv Is it possible to use pixi to create an env that has a name (and isn't a global env)? That'd make it easier to work with inside of smithy, I think, though I don't think this is a blocker. The macOS and Linux builds share the same env-management commands, and we mount the feedstock dir and recipe dir separately, which could make deciding on a directory for the env a bit tricky.

@ruben-arts commented

@beckermr Here is some information on your question for pixi:

You can activate an environment with run, shell or shell-hook. These environments can't be named (yet), but you can specify a --manifest-path, which is a bit more verbose but doesn't require you to be in the project directory.

After activation, all pixi commands act as if they were run in that project. So you can do:

> pixi shell --manifest-path /path/to/pixi.toml
(env) > pixi run your_command

To activate an environment the way conda activate does, you could use eval "$(pixi shell-hook --manifest-path /path/to/pixi.toml)".

@isuruf (Member) commented Oct 3, 2024

On Windows, you get <1m with micromamba v1, <2m with micromamba v2, <30s with Pixi.

This is suspicious. micromamba v2 takes twice as long as micromamba v1?

@isuruf unfortunately, no.

I was talking about the difference between mamba/conda and micromamba. I don't understand why you are trying to talk about pixi/rattler-build.

@jaimergp (Member) commented Oct 3, 2024

micromamba v2 takes twice that of micromamba v1 ?

Might be related to simdjson parsers not being so optimized on Windows (e.g. simdjson/simdjson#847). 1.x used libsolv parsers, IIRC. I think there's a flag for that, let me check 🤔 Edit: nope, didn't change much.

@jaimergp (Member) commented Oct 3, 2024

Turns out that part of the slowdown on Windows is due to installing to the C: drive. Changing to D: cuts it in half. See conda-forge/conda-smithy#2076 (comment).

@jaimergp (Member) commented Oct 3, 2024

Let's summarize more or less what I've found out today (no lockfiles):

| Platform | Installer | Time to provision |
| --- | --- | --- |
| Windows | Miniforge (C:) [1] | ~3-4 minutes |
| Windows | Miniforge (D:) [2] | ~1.5-2 minutes |
| Windows | Micromamba v1 (C:) [3] | ~1 min |
| Windows | Micromamba v1 (D:) [3] | ~1 min |
| Windows | Micromamba v2 (C:) [3] | <2 min |
| Windows | Micromamba v2 (D:) [3] | ~1 min |
| Windows | Pixi (D:) [4] | 26 s |
| macOS | Miniforge [1] | 1-1.5 min |
| macOS | Micromamba [3] | 50 s |
| macOS | Pixi [4] | 24 s |

I think the key points we can enforce now without too much controversy are:

  • Moving Windows builds to D: for a 2x speedup.
  • Consider using micromamba later to save some more 30s here and there.

Pixi would be awesome too for a sub-30s deploy, but it would require a bigger overhaul of how the infra is set up.


Footnotes

  1. https://github.com/conda-forge/conda-smithy/pull/2076#issuecomment-2391665013

  2. https://github.com/conda-forge/conda-smithy/pull/2076

  3. https://github.com/conda-forge/staged-recipes/pull/27753, https://github.com/conda-forge/conda-smithy/pull/2075

  4. https://github.com/conda-forge/staged-recipes/pull/27754. Note we can't choose the target directory (yet?).

@isuruf (Member) commented Oct 3, 2024

Consider using micromamba later to save some more 30s here and there:

Does it actually work when building packages? I thought that since the caches aren't shared, this is only moving the cost to a later stage.

@jaimergp (Member) commented Oct 3, 2024

I think we can set CONDA_PKGS_DIRS accordingly on Windows. On macOS, we are using ~/.conda, so that should be OK. The caches should be compatible, I hope.

This can be checked with some carefully tuned deps in meta.yaml, followed by inspecting the logs in debug mode.

@ruben-arts commented

Is there a specific set of features you're missing that keeps you from moving to pixi?

Note we can't choose the target directory (yet?).

You could detach environments from their folder with pixi: https://pixi.sh/latest/reference/pixi_configuration/#detached-environments

pixi config set --global detached-environments "/where/ever/you/require"

Would that be enough? The setting can be local to the project or global to the machine.

@beckermr (Member) commented Oct 9, 2024

Well @ruben-arts, it'd be nice to have a drop-in replacement for conda based on the rattler tools. Then transitions would be very easy for us.

@baszalmstra (Member) commented Oct 11, 2024

Btw the latest version of pixi (0.32.1) (and py-rattler) should be a lot faster again. We landed some very significant solver improvements. :)

@jaimergp (Member) commented

You could detach environments from their folder with pixi: https://pixi.sh/latest/reference/pixi_configuration/#detached-environments.

That's only for .pixi/envs, right? The current workflow assumes a single environment for conda-build, which gets installed to e.g. ~/Miniforge3. The idea would be to have pixi install .pixi/envs/default into ~/Miniforge3, but this is not possible right now, correct?

Otherwise, I guess we'd need to craft something with py-rattler, but then we are back in Python bootstrapping land and would need to use something like the CI's Python + pip to provide our installer framework (instead of pixi).

@zooba commented Oct 15, 2024

  • Moving Windows builds to D: for a 2x speedup

If you want to experiment with https://github.com/marketplace/actions/setup-dev-drive as well, you may be able to get an even bigger speed increase (that's not my action, and I don't think I know its creator, but I did help with the underlying functionality, and the action's implementation looks reasonable at a quick glance).

Basically, the Windows OS drive does a lot of processing on every file access that any other drive will (probably) not do, and a Dev Drive is even more optimised for this kind of use. Hopefully, one day Actions will use a Dev Drive by default, but I don't think they've enabled that yet.

@jaimergp (Member) commented

Is the Dev Drive functionality available on Azure too? Maybe we can replicate the action there.

@zooba commented Oct 15, 2024

It needs a recent enough Windows version, that's all. I don't know the exact build number (probably around 10.0.26000). Even without it, that action will give you a similar speed up.

uv is using their own script, which will be more portable than the Action. Again, they're falling back to a VHDX for now, but it only needs a -DevDrive added to the Format-Volume command for the extra boost (it'll fail if it's not available, so it could be handled with an error handler, but I guess they decided not to do that for one reason or another).
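
For reference, the extra boost mentioned above is literally one parameter on a recent enough Windows build (the drive letter is illustrative; as noted, this fails on older builds, hence the error-handler remark):

Format-Volume -DriveLetter E -DevDrive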

@jaimergp (Member) commented

That's amazing, thanks, I'll add an issue to conda-smithy so this is tracked later!

@jaimergp (Member) commented

I think we can set CONDA_PKGS_DIRS accordingly on Windows. On macOS, we are using ~/.conda, so that should be OK. The caches should be compatible, I hope.

This can be checked with some carefully tuned deps in meta.yaml, followed by inspecting the logs in debug mode.

I double-checked and we don't need to re-set CONDA_PKGS_DIRS. It's enough to move pkgs/ into MINIFORGE_HOME once the base env is ready. See conda-forge/dav1d-feedstock#21 (comment) for details. TL;DR: It Just Works ✨
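
In other words, roughly this (a sketch; ${PKGS_CACHE_SOURCE} is a hypothetical stand-in for wherever the install tool left its package cache):

# Hand the existing package cache over to the conda-build prefix
mv "${PKGS_CACHE_SOURCE}/pkgs" "${MINIFORGE_HOME}/pkgs"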

@ruben-arts commented

The idea would be to have pixi install .pixi/envs/default into ~/Miniforge3, but this is not possible right now, correct?

Correct. If a --target-prefix option is all you need, then I would gladly implement that. As we already support detached environments, this shouldn't be too difficult.

@jaimergp (Member) commented

I can't promise that's "everything" we need to implement a provider based on Pixi, but it would definitely get us closer, imo.

@jaimergp (Member) commented

I've written some basic pixi integrations in smithy at conda-forge/conda-smithy#2099. You can see them in action at conda-forge/dav1d-feedstock#21.

Where is the pixi pkgs cache located? Does it use the same format and hashing mechanism for files as conda and mamba do? I'd like to repurpose the /opt/conda/pkgs cache already contained in the Docker container to save some MBs, so I was thinking of maybe moving it in place.

@baszalmstra (Member) commented

I think rattler uses a very similar cache layout/format to mamba and conda, although rattler does have a different mechanism for locking the cache in the case of multi-process access. I haven't tested this in a while, though.

You can configure the cache location with the RATTLER_CACHE_DIR environment variable. See https://pixi.sh/latest/features/environment/#caching

@jaimergp (Member) commented

Oh nice, there's an env var. This is what I tried, but unfortunately it fails to lock:

+ pushd /home/conda/feedstock_root
+ ln -s /opt/conda/pkgs/cache /opt/conda/repodata
~/feedstock_root ~
+ echo 'Creating environment'
+ PIXI_CACHE_DIR=/opt/conda
+ pixi install
Creating environment
ERROR error=failed to acquire a lock on the repodata cache
ERROR error=failed to acquire a lock on the repodata cache
ERROR error=failed to acquire a lock on the repodata cache
ERROR error=failed to acquire a lock on the repodata cache
ERROR error=failed to acquire a lock on the repodata cache
ERROR error=failed to acquire a lock on the repodata cache
  × failed to acquire a lock on the repodata cache
  ├─▶ failed to open: /opt/conda/repodata/3018e552.lock
  ╰─▶ File exists (os error 17)

Note that conda and mamba put the repodata cache under $CONDA_ROOT/pkgs/cache, but pixi expects it under $CONDA_ROOT/repodata, so I had to symlink. Mentioning it in case it makes a difference, @baszalmstra. Is it possible to disable locking since we are running on CI?

@jaimergp linked a pull request on Oct 18, 2024 that will close this issue: conda-forge/conda-smithy#2099