Infrastructure update for Channel Cloning/CDN sync #2272
Thank you for all your work on this! 🙏
The initial clone on the new infrastructure started at 12:34 UTC and is still running. I'll post an update here when it is done.
The initial clone finished uploading and purging the Cloudflare cache at 13:18; we are now enabling the scheduled runs.
We are aware that the update frequency is lower than expected and are working to improve it.
We have now brought the update interval under 20 minutes. We will keep looking into improvements this week, as we already have some ideas.
Neat to be able to check https://conda.anaconda.org/conda-forge/last-updated.json now :)
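For anyone who wants to automate that check, here is a minimal sketch of a freshness test. The original post says the file holds the last update time as both a UNIX and an ISO8601 timestamp, but the JSON key names used here (`timestamp`, `last_updated`) are hypothetical placeholders, and the 20-minute threshold is only borrowed from the update interval mentioned above.

```python
import json
from datetime import datetime, timezone

# Hypothetical staleness threshold, taken from the "under 20 minutes"
# update interval mentioned in this thread.
MAX_AGE_SECONDS = 20 * 60


def parse_last_updated(raw: str) -> tuple[float, datetime]:
    """Return (unix_seconds, aware_datetime) parsed from last-updated.json text.

    The keys "timestamp" and "last_updated" are HYPOTHETICAL placeholders;
    check the real file for the actual field names.
    """
    data = json.loads(raw)
    unix_ts = float(data["timestamp"])
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    iso_ts = datetime.fromisoformat(data["last_updated"].replace("Z", "+00:00"))
    if iso_ts.tzinfo is None:
        iso_ts = iso_ts.replace(tzinfo=timezone.utc)  # assume UTC if no offset is given
    return unix_ts, iso_ts


def is_fresh(raw: str, now: datetime, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if the channel was updated within max_age seconds of `now`."""
    unix_ts, iso_ts = parse_last_updated(raw)
    # Both representations should describe the same instant (allow 1 s of rounding).
    if abs(iso_ts.timestamp() - unix_ts) > 1:
        raise ValueError("UNIX and ISO 8601 timestamps disagree")
    return now.timestamp() - unix_ts <= max_age
```

To check the live channel, the text could come from `urllib.request.urlopen("https://conda.anaconda.org/conda-forge/last-updated.json").read().decode()`, compared against `datetime.now(timezone.utc)`. Cross-checking the two timestamp formats is a cheap way to catch a half-written or inconsistent file.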
Cool, I'll update the status page!
Am happy this went so smoothly! 😄 Thanks to everyone at Anaconda who deployed these updates! 🙏 Looking forward to hearing about additional planned improvements 😉
Yesterday, we improved the performance of the anaconda.org application, which lets the channel cloning put more load on it. This allowed us to parallelize more parts of the cloning process, so each run now takes 13-14 minutes. Some cleanup remains for this migration (removing unneeded cache files from S3), which we'll do early next week after carefully reviewing the affected files.
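As a rough illustration of that kind of parallelization, here is a minimal sketch that fans per-subdir work out over a bounded thread pool. It is not Anaconda's implementation: `clone_subdirs` and the `process_subdir` callable are hypothetical, and the subdir list is only illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Illustrative subdir names; a real run would use the channel's actual subdirs.
SUBDIRS = ["noarch", "linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64"]


def clone_subdirs(
    subdirs: Sequence[str],
    process_subdir: Callable[[str], int],
    max_workers: int = 4,
) -> dict[str, int]:
    """Run process_subdir for every subdir concurrently.

    `process_subdir` is a HYPOTHETICAL callable standing in for the real
    per-subdir work (download repodata, sync archives, ...); here it just
    returns a count. `max_workers` caps concurrent requests, i.e. how much
    load the run puts on the backend.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises a worker's
        # exception when that worker's result is reached.
        results = list(pool.map(process_subdir, subdirs))
    return dict(zip(subdirs, results))
```

The worker cap is the knob that matters here: a faster backend is what makes it safe to raise `max_workers` without overloading the service.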
We have identified some package archives that would be cleaned up because they don't appear in any subdir's repodata. To avoid deleting archives that are still required, I will reach out to the maintainers of the affected packages in the coming days to clarify whether the archives can be deleted or are still needed. I will link those discussions to this issue so they're easy to find.
Can you cc core as well? It'd be good for us to review any decisions. We do not delete packages.
@beckermr Is it
You'd need to join conda forge to ping us. Can you post a list of the packages here?
@beckermr Sure!
General
None of these archives appear in the repodata, so the cloning process considers them unneeded and would delete them once we activate the feature that deletes unneeded files. Note that while these files would be removed from the clone, they would still be available for manual download from the anaconda.org frontend.
Files to be deleted
lightgbm
Notes: Version
halide
Notes:
halide-python
Notes:
matplotlib
Notes: 29 .conda archives for version 3.9.0 would be deleted
matplotlib-base
Notes: 29 .conda archives for version 3.9.0 would be deleted
Other packages labeled broken/corrupted
Notes: The following archives exist, but are labeled as broken or corrupted.
Everything else
Notes: These exist and would be deleted; nothing else about them stands out.
What does this mean precisely? Do you mean the
Other notes: we never delete packages in conda-forge, so even packages with broken labels need to be retained.
The cloning process downloads the repodata.json for each subdir directly from the anaconda.org backend service and then processes it. I reproduced these HTTP requests and parsed the repodata.json files locally, checking for all the package archives listed above; none of them appear. Since these archives are not in the repodata.json that the clone downloads, they are also not contained in the generated repodata_from_packages.json. From the clone's point of view, they simply do not exist, so once the cleanup is enabled, it would remove these archives from the S3 bucket backing the CDN. They would still remain stored on the anaconda.org backend and would continue to be downloadable via the anaconda.org web frontend. However, they would no longer be downloadable via the CDN (conda.anaconda.org). I'm opening an internal issue so that we can investigate why these files are not contained in the repodata.json. In the meantime, we will leave the clone running with the cleanup feature disabled so that conda-forge stays unaffected.
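A minimal sketch of the check described above, assuming a simplified model rather than Anaconda's actual code: an archive stored in the bucket counts as unneeded when its filename is missing from its subdir's repodata.json. The `packages` / `packages.conda` keys are the standard repodata.json layout; the `<subdir>/<filename>` key format and the function names are hypothetical. Deletion is off by default, mirroring the decision to keep the cleanup feature disabled.

```python
from typing import Callable, Iterable

ARCHIVE_SUFFIXES = (".tar.bz2", ".conda")


def listed_archives(repodata: dict) -> set[str]:
    """Filenames listed in one subdir's repodata.json.

    repodata.json keeps .tar.bz2 entries under "packages" and .conda
    entries under "packages.conda".
    """
    return set(repodata.get("packages", {})) | set(repodata.get("packages.conda", {}))


def plan_cleanup(repodata_by_subdir: dict[str, dict], stored_keys: Iterable[str]) -> list[str]:
    """Return stored archive keys whose filename is not in that subdir's repodata.

    Assumes (hypothetically) that bucket keys look like "<subdir>/<filename>".
    """
    listed = {subdir: listed_archives(rd) for subdir, rd in repodata_by_subdir.items()}
    orphans = []
    for key in stored_keys:
        subdir, _, filename = key.partition("/")
        if not filename.endswith(ARCHIVE_SUFFIXES):
            continue  # only package archives are candidates, not repodata files etc.
        if filename not in listed.get(subdir, set()):
            orphans.append(key)
    return sorted(orphans)


def apply_cleanup(
    orphans: list[str],
    delete_object: Callable[[str], None],
    cleanup_enabled: bool = False,
) -> list[str]:
    """Delete orphans only if cleanup is enabled; otherwise do nothing (dry run)."""
    if not cleanup_enabled:
        return []
    for key in orphans:
        delete_object(key)
    return list(orphans)
```

This also shows why the missing archives matter: if an archive is absent from the repodata the clone downloads, this kind of check has no way to tell it apart from a genuinely stale file.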
Your question:
Hey everyone,
we (Anaconda) have been working on upgrades to the channel cloning/CDN sync infrastructure of anaconda.org, which is used for conda-forge.
Time planning
We are planning to switch to the new infrastructure tomorrow, 2024-08-27, at 12:30 UTC.
No downtime is expected for users.
Because of the volume of changes on the conda-forge channel, the initial sync after the switch will be a bit slower than usual. The new infrastructure is already running in "dry run" mode to keep its caches up to date, and I expect the initial sync on it to take ~30 minutes.
We have already implemented this change for the r and anaconda channels to ensure that our maintenance process and the new infrastructure are working as desired.
What's changing?
A last-updated.json file at the channel root, which contains the last update time as both UNIX and ISO8601 timestamps. We use it directly to monitor the updates end-to-end. Check it out for the anaconda channel if you're interested!
Other notes
This topic was discussed in the conda-forge meeting on 2024-08-21 and it was decided to bring it up here to raise awareness.
If there are any questions, please ask!