Add file size check #1491
Remark: I adapted the magpie approach here. Discussion points:
Looking at the large files in the repository
$ git ls-tree -r HEAD --name-only | xargs ls -sh | sort -k 1hr,1 | head -n 20
572K modules/31_fossil/exogenous/input/p31_fix_fuelex.put
216K core/input/grid/EDGAR_CO2_2010_excl_agr_and_intl_transport.mz
216K core/input/grid/EDGAR_CO2.mz
212K core/input/grid/EDGAR_SO2_2005_excl_agr_and_intl_transport.mz
212K core/input/grid/EDGAR_SO2.mz
204K config/scenario_config_21_EU11_Fit_for_55_sensitivity.csv
200K tutorials/figures/git-7-pull-request-github-1.PNG
172K tutorials/figures/git-8-pull-request-github-2.PNG
148K tutorials/figures/appResults_window.png
120K core/sets.gms
112K scripts/output/comparison/notebook_templates/AriadneComparison.Rmd
108K main.gms
I feel it should be easy to move some stuff to `mrremind` and go to maybe 250 KB.
Also, could this not be put under the `.github/` directory? The Remind directory is cluttered as it is …
Yes, happy to do that
True, we can look into moving stuff. For some of the listed files, I have ideas. For others, not.
Having looked into this a bit, I would call this a workflow or something. It is not a git hook, and should not be confused with one.
- Do we want to restrict the check to new files only? In that case, we need to come up with a clever way to define "new files". Maybe new files in comparison to develop?
`git ls-tree -r HEAD --name-only` "lists the contents of a given tree object", i.e. the files currently existing in the repository.
`git diff --cached --name-only` lists "the changes you staged for the next commit", i.e. the new files.
So drop the first command to get just the changed files.
Better yet, use `substring(grep("^[AM]", system("git status --short --porcelain", intern = TRUE), value = TRUE), first = 4)` to get only files that have been either added or modified, and not rely on the fact that deleted files do not exist and do not have a file size (`out <- out[!is.na(out$size), ]`) ;)
- If we want to reduce file size limit, but not restrict it to new files, we must come up with a way to deal with files currently exceeding the limit: either fixing the files or defining hard-coded exceptions comes to mind.
I do not see the point in checking the size of all files for every single merge request. Looks like some sort of code atavism to me.
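For comparison, the status-based filtering suggested above can be sketched in Python. This is only an illustration, not the code of this PR: the function names `read_status` and `added_or_modified` are hypothetical, and the sketch mirrors the R one-liner by keeping entries whose first status column is `A` or `M` and taking the path from the fourth character onwards.

```python
import subprocess


def read_status():
    """Return the lines printed by `git status --short --porcelain`.

    Assumes it is called from inside a git working tree.
    """
    result = subprocess.run(
        ["git", "status", "--short", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def added_or_modified(status_lines):
    """Keep paths whose first status column is 'A' or 'M'.

    Each entry looks like "XY path": two status columns, one space,
    then the path, so the path starts at index 3 (R's `first = 4`).
    """
    paths = []
    for line in status_lines:
        if len(line) > 3 and line[0] in ("A", "M"):
            paths.append(line[3:])
    return paths
```

As with the `^[AM]` pattern, only changes recorded in the first column (the staged ones) are kept; deleted files, unstaged modifications and untracked files are skipped. Renamed entries and paths that git quotes because of special characters are not handled in this sketch.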
From what I can see, the problem with both your approaches is that they only work for the latest commit. One alternative is to make this check part of the checks in pre-commit.ci using the hook "check-added-large-files". If this works as I hope, file size would be checked for all new files when opening a PR.
Looks like the pre-commit check won't work as expected: https://results.pre-commit.ci/run/github/226360184/1701792228.w1y28gzhSL-kQ9aVwps_zw It has no problem with me adding a 14 MB mif on a test branch.
LOL.
Sorry for the confusion, this problem does not concern the code added in this PR. It concerns an alternative approach I tried here and mentioned in the second quote above. It is not used by MAgPIE, and does not seem to work for us, so just ignore this. So to answer your questions:
The approach of this PR works as expected (see below), and @LaviniaBaumstark will check what to do with the files larger than 250 kB. In the best case, we can decrease the limit.
Regarding the biggest file: I would suggest deleting the whole realization https://github.com/remindmodel/remind/tree/develop/modules/31_fossil/exogenous @nicobauer?
Ok, since this is a working solution, I'd say we merge this for now and lower the threshold later as we delete/move files. I am open to more elegant solutions replacing this later, but feel like this simple solution works well for now.
I agree
Oh, I missed that one. Sorry.
Depends on when the workflow is run. Merges in git (other than fast-forwarding) work by applying all changes like a patch, staging them, and then committing them, which is why you only get one merge commit. So the difference between hooks and workflows is not trivial.
I see, the workflow is currently triggered by push and pull request events. See
So I'd say, it would not be a proper commit hook in any case with the current solution.
Purpose of this PR
Following up on this discussion, this PR introduces a simple file size limit check for all files currently under version control. It is part of the workflow test-code that is run for any PRs and commits on main, master and develop branches.
The 600 kB limit is chosen to exceed the current largest file, modules/31_fossil/exogenous/input/p31_fix_fuelex.put (570.67 kB).
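The idea of such a check can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation added by this PR: the names `LIMIT_BYTES` and `oversized` are hypothetical, and reading "600 kB" as 600 * 1000 bytes is an assumption about the unit.

```python
import os

# Assumption: "600 kB" is read as decimal kilobytes; the actual check may differ.
LIMIT_BYTES = 600 * 1000


def oversized(paths, limit_bytes=LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files above the limit, largest first."""
    found = []
    for path in paths:
        if not os.path.isfile(path):
            # Listed but not present as a regular file (e.g. deleted): skip it.
            continue
        size = os.path.getsize(path)
        if size > limit_bytes:
            found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

The list of paths could come from `git ls-tree -r HEAD --name-only`, and a workflow step would fail when the returned list is not empty. Lowering the threshold later then only requires changing the limit once the large files have been moved or deleted.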
Type of change
Checklist: