Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace non-custom flake8 linting with ruff #21037

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

huonw
Copy link
Contributor

@huonw huonw commented Jun 9, 2024

This swaps all flake8 usage to ruff via the built-in Ruff backend, except for the custom PNT20 and PNT30 custom rules.

We still have to run flake8, and it does not run much faster (still ~10s). So, this is doing "strictly" more work than previously: run flake8 and ruff.

However this change does open the door for easily enabling some additional lints that are built-in to ruff, like #21018, rather than requiring searching out and installing a flake8 plugin.

I did some basic experiments running

  • flake8 on main: pants --no-local-cache lint --stats-log --only=flake8 ::
  • ruff on this PR: pants --no-local-cache lint --stats-log --only=ruff-check ::
timing flake8 ruff
building requirements/PEX (A) 0.8 0.9
local_process_total_time_run_ms (B) 27.757 8.643
time spent actually running the linter (B - A) ~27 7.7

Running ruff outside of pants runs much faster (e.g. ruff check . takes like 300ms, so I'd expect a lot of the 7.7 seconds is Pants overhead, e.g. setting up sandboxes or unzipping pexes or whatever. #18570 is potentially related.

Thoughts?

@huonw huonw added the category:internal CI, fixes for not-yet-released features, etc. label Jun 9, 2024
@sureshjoshi
Copy link
Member

If it's the same speed or faster, it's a +1 for me.

Question: What's the perf like once we get rid of flake entirely?

@huonw
Copy link
Contributor Author

huonw commented Jun 11, 2024

I did some basic experiments running

  • flake8 on main: pants --no-local-cache lint --stats-log --only=flake8 ::
  • ruff on this PR: pants --no-local-cache lint --stats-log --only=ruff-check ::
timing flake8 ruff
building requirements/PEX (A) 0.8 0.9
local_process_total_time_run_ms (B) 27.757 8.643
time spent actually running the linter (B - A) ~27 7.7

Running ruff outside of pants runs much faster (e.g. ruff check . takes like 300ms, so I'd expect a lot of the 7.7 seconds is Pants overhead, e.g. setting up sandboxes or unzipping pexes or whatever. #18570 is potentially related.

(Potentially there's some processes like discovering Python interpreters that are being included in the local_process_total_time_run_ms counter, but looking at the [INFO] Completed: Lint with `ruff check` ... lines shows 7 invocations that finish about a second after the [INFO] Completed: Building ruff.pex ... line, so ~7s seems plausible)

@sureshjoshi
Copy link
Member

sureshjoshi commented Jun 11, 2024

That's still a decent improvement, but clearly could be a lot more. There is something to also be said about creating sandboxes and whatnot for lint/check-based operations that won't mutate the environment.

Maybe something like the option to run those in the new workspace environments 🤷🏽

That's a can of worms all on its own, but overall, to answer the original question - it seems like swapping over is a win all around! And will likely become more of a win as we optimize some more steps.

@huonw huonw marked this pull request as ready for review June 20, 2024 04:39
# flake8-2020
"YTT",
# flake8-implicit-str-concat, but only on a single line (includes bytes)
"ISC001",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to add ISC002 to match NIC002 ? If it includes bytes too, I only found one instance of that (in a test), so it might be easier to fix it there

@lilatomic
Copy link
Contributor

That's still a decent improvement, but clearly could be a lot more. There is something to also be said about creating sandboxes and whatnot for lint/check-based operations that won't mutate the environment.

One thing to remember is that some linters are just formatter we run and check if they've modified any files

@sureshjoshi
Copy link
Member

One thing to remember is that some linters are just formatter we run and check if they've modified any files

Yeah, case-by-case - take the optimizations where we can, for those formatters that don't come baked with a "check" option, we can default to comparing sandbox code. As much as I hate splitting up code paths, if there is any non-trivial perf improvement, I generally think build tools should tend towards those options.

@huonw
Copy link
Contributor Author

huonw commented Jun 24, 2024

I think we can get rid of a lot of the sandbox overhead for fast tools (i.e. where the overhead of building sandboxes is a high proportion of the total time) by creating fewer sandboxes/putting more files in each one. See analysis/ideas in #18570 (comment)

In particular, running things in sandboxes has correctness advantages (e.g. handling of edits that happen concurrently with a running processes #21051 (comment)), that'd be nice to avoid losing by default, I think.

@sureshjoshi
Copy link
Member

From your linked comment, what is the cost of sandboxing X files on a modern computer? In my mind, if the cost to sandbox ever exceeds the cost of running the process, we'd probably want an escape hatch to run it in a workspace or skip the sandbox.

With a single format, it's not a huge deal, but a large monorepo with 20 tools across 4 languages - it really adds up.

Your equation from that linked comment I think is the right way, so long as the minimum number of sandboxes could be 0, not 1 (by default, 0 sandboxes would need to be opt-in though, as it's riskier to your concurrent edit points above).

Running fast tools without a sandbox in CI, for example, might be nice. And I can basically guarantee that I would always opt into whatever is fastest :)

@huonw
Copy link
Contributor Author

huonw commented Jun 24, 2024

Can I suggest we move the discussion of tweaking sandboxing behaviour to #18570 or somewhere else, doesn't feel relevant to this PR?

@sureshjoshi
Copy link
Member

Oh yeah, sure. I'm +1 for this change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants