chunks index caching #8403

Merged: 1 commit merged into borgbackup:master on Sep 24, 2024

@ThomasWaldmann ThomasWaldmann commented Sep 21, 2024

borg compact uses ChunkIndex (a specialized, memory-efficient data structure), so it needs less memory now. Also, it saves that chunks index to cache/chunks in the repository.

When the chunks index is needed, borg first tries to get it from cache/chunks and only falls back to building the chunks index via repository.list() (which can be rather slow) if that fails.

borg check --repair currently just invalidates the chunks cache.

borg create updates the chunks cache.

@ThomasWaldmann
Member Author

Code is a bit less pretty now, but more efficient. Also fewer stats.


codecov bot commented Sep 21, 2024

Codecov Report

Attention: Patch coverage is 89.38053% with 12 lines in your changes missing coverage. Please review.

Project coverage is 81.54%. Comparing base (bd6caf8) to head (36e3d63).
Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/archiver/compact_cmd.py | 81.13% | 5 Missing and 5 partials ⚠️ |
| src/borg/cache.py | 95.91% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8403      +/-   ##
==========================================
+ Coverage   81.44%   81.54%   +0.10%     
==========================================
  Files          70       70              
  Lines       12739    12791      +52     
  Branches     2311     2318       +7     
==========================================
+ Hits        10375    10431      +56     
+ Misses       1707     1703       -4     
  Partials      657      657              


@ThomasWaldmann ThomasWaldmann changed the title from "compact: build and cache a fresh chunks index, see #8397" to "chunks index caching, see #8397" on Sep 23, 2024
@mirko

mirko commented Sep 23, 2024

Is this (also) addressing the issue of listdir() being called for every (sub-)directory in data/ on every borg create run?

@ThomasWaldmann
Member Author

ThomasWaldmann commented Sep 24, 2024 via email

borg compact now uses ChunkIndex (a specialized, memory-efficient data structure),
so it needs less memory now. Also, it saves that chunks index to cache/chunks in
the repository.

When the chunks index is needed, borg first tries to get it from cache/chunks.
If that fails, it falls back to building the chunks index via repository.list(),
which can be rather slow, and immediately caches the resulting ChunkIndex in the
repo.
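
The lookup order described above can be sketched roughly as follows. This is only an illustration, not borg's actual code: a plain dict stands in for the repository's key/value store, a `{chunk_id: size}` dict stands in for ChunkIndex, JSON stands in for the real serialization, and `build_index_slowly` and `get_chunk_index` are hypothetical names.

```python
import json

CACHE_KEY = "cache/chunks"  # key name taken from the PR text


def build_index_slowly(store):
    """Stand-in for the slow repository.list() path: scan every data object."""
    return {
        name[len("data/"):]: len(blob)
        for name, blob in store.items()
        if name.startswith("data/")
    }


def get_chunk_index(store):
    """Return (index, came_from_cache); rebuild and re-cache on any failure."""
    try:
        # Fast path: a previously cached index exists and can be decoded.
        return json.loads(store[CACHE_KEY]), True
    except (KeyError, ValueError):
        # Missing or corrupt cache: fall back to the full scan.
        index = build_index_slowly(store)
        # Cache immediately so the next caller can skip the slow scan.
        store[CACHE_KEY] = json.dumps(index).encode()
        return index, False
```

In this sketch a missing entry and an undecodable entry are treated the same way, so a damaged cache costs one slow rebuild rather than an error.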

borg check --repair currently just deletes the chunks cache, because it might
have deleted some invalid chunks in the repo.

cache.close now saves the chunks index to cache/chunks in the repo if it
was modified.
Thus, borg create will update the cached chunks index with new chunks.
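
A minimal sketch of the "save on close only if modified" idea, under the same assumptions as before (dict as repository store, JSON as serialization). The class `ChunksCache`, its `modified` flag and `add_chunk` are hypothetical names for illustration and are not borg's real API.

```python
import json


class ChunksCache:
    """Hypothetical cache wrapper that tracks whether the index changed."""

    def __init__(self, store, key="cache/chunks"):
        self.store = store
        self.key = key
        # Start from the cached index if there is one, else from empty.
        self.index = json.loads(store[key]) if key in store else {}
        self.modified = False

    def add_chunk(self, chunk_id, size):
        # Only a genuinely new chunk marks the index as modified.
        if chunk_id not in self.index:
            self.index[chunk_id] = size
            self.modified = True

    def close(self):
        # Write back only when something changed, so read-only runs
        # do not rewrite the cached index.
        if self.modified:
            self.store[self.key] = json.dumps(self.index).encode()
            self.modified = False
```

With this shape, a run that adds new chunks (as borg create does) leaves an updated index behind, while a run that adds nothing leaves the stored copy untouched.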

cache/chunks_hash can be used to validate cache/chunks (and also to validate /
invalidate locally cached copies of that).
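
One way such a hash entry could be used is sketched below. The PR text does not say which hash function or storage format is used, so SHA-256 hex digests and the helper names (`write_cache`, `repo_cache_is_valid`, `local_copy_is_current`) are assumptions made purely for illustration.

```python
import hashlib


def _digest(payload: bytes) -> bytes:
    # Assumption: SHA-256 hex digest; the real hash may differ.
    return hashlib.sha256(payload).hexdigest().encode()


def write_cache(store, payload: bytes):
    """Store the serialized index together with its hash."""
    store["cache/chunks"] = payload
    store["cache/chunks_hash"] = _digest(payload)


def repo_cache_is_valid(store) -> bool:
    """Check that cache/chunks matches cache/chunks_hash."""
    payload = store.get("cache/chunks")
    expected = store.get("cache/chunks_hash")
    if payload is None or expected is None:
        return False
    return _digest(payload) == expected


def local_copy_is_current(local_payload: bytes, store) -> bool:
    """Check a locally cached copy against the small hash entry only."""
    expected = store.get("cache/chunks_hash")
    return expected is not None and _digest(local_payload) == expected
```

The point of the second check is that a client only needs to fetch the small hash entry to decide whether its local copy is still usable.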
@ThomasWaldmann ThomasWaldmann changed the title from "chunks index caching, see #8397" to "chunks index caching" on Sep 24, 2024
@ThomasWaldmann ThomasWaldmann merged commit 7d02fe2 into borgbackup:master Sep 24, 2024
16 checks passed
@ThomasWaldmann ThomasWaldmann deleted the cache-chunkindex branch September 24, 2024 21:37
@ThomasWaldmann
Member Author

@mirko merged this.

also found another issue: it was doing one full repo.list too many, PR incoming soon.

so, master branch should be quite a bit faster now.

only check and compact are expected to always do the repository.list(), just to be on the safe side and not rely on caches.
