
Rewrite it in Rust: Ruff as alternative Linter #28

Open · wants to merge 13 commits into main
Conversation


@DavidSchischke DavidSchischke commented Nov 7, 2023

Closes #22. EDIT: Also closes #26 now.

  • Option for choosing the linter in the cookiecutter setup
  • Automatic removal of unneeded files belonging to the other linter
  • Ruff is used as a non-local hook, with corresponding changes in pyproject.toml
  • No disabled options for Ruff yet (tbd); see the full list for reference

@@ -10,6 +10,7 @@ repos:
- id: end-of-file-fixer
exclude: '\.ipynb$'
- id: check-toml
exclude: '{{cookiecutter.repo_name}}/pyproject.toml'
Member

I guess you're excluding this because we now use some templating in the file, correct?

To still make sure that the generated files pass our own quality standards, we should extend the tests with one variation that uses Ruff.

Author

From my understanding, the check-toml pre-commit hook tries to parse the TOML within the project directory as-is, which fails because the file still contains template strings. This is not necessarily related to ruff/pylint, but to the way the check itself is executed. Let's discuss how to solve this :)

Member

Following up on our discussion, I have implemented that the tests now execute the pre-commit hooks. On top of that, I've added the option to specify different configurations in which to run the tests. With these changes in place, I'd suggest that we:

  1. Exclude everything under "{{cookiecutter.repo_name}}" from the top-level pre-commit configuration, as these files are now validated in the tests (after cookiecutter has initialized the repo)
  2. Add some configurations with Ruff to our permutations.
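Point 1 might translate to a single top-level exclude pattern, e.g. (a sketch; the exact regex is an assumption):

```yaml
# Sketch of a top-level .pre-commit-config.yaml exclude (regex is an
# assumption): skip everything the template engine renders, since those
# files are validated by the tests after cookiecutter initializes the repo.
exclude: '^{{cookiecutter.repo_name}}/'
```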

Author

@DavidSchischke DavidSchischke Sep 26, 2024

  1. Done in 0ccf8e3
  2. Done in f874c77. Two comments:
  • I took the liberty of adding separate tests that validate the contents of the Python environment, the Jupyter setup, and the linter. I also refactored the validate-gitlab function to be more generic and easily extendable in case we want to integrate GitHub / Azure DevOps in the future (add ci configuration files for azure devops #25)
  • Instead of hardcoding a set of configurations, I implemented a function that selects N random permutations of all possible configurations based on cookiecutter.json. See the function get_all_possible_configuration_permutations() in tests/test_cookiecutter_template.py
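The permutation sampling described above might look roughly like this; the option names, the cookiecutter.json layout, and sample_configurations are illustrative assumptions, not the actual test code:

```python
import itertools
import random

# Assumed shape of the choice options in cookiecutter.json (illustrative only)
COOKIECUTTER_OPTIONS = {
    "linter_name": ["pylint", "ruff"],
    "use_jupyter": ["yes", "no"],
    "ci_provider": ["gitlab", "none"],
}

def get_all_possible_configuration_permutations(options):
    """Cartesian product of all option values, as a list of config dicts."""
    keys = list(options)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(options[key] for key in keys))
    ]

def sample_configurations(options, n, seed=None):
    """Select n permutations; a fixed seed makes the sample reproducible."""
    permutations = get_all_possible_configuration_permutations(options)
    return random.Random(seed).sample(permutations, min(n, len(permutations)))
```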

Member

Looks good!

I'm not sure I like the approach of choosing permutations at random. I think we want reproducible builds and to make sure that we cover each option at least once, right? However, we also don't want to test all permutations, in the interest of keeping build times low... Let's discuss to find a good tradeoff :)
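One possible tradeoff, sketched under the assumption that each cookiecutter option is a flat list of values: a small deterministic set of configurations in which every option value appears at least once, instead of random sampling or the full Cartesian product.

```python
# OPTIONS mirrors the list-valued entries of cookiecutter.json (assumed layout)
OPTIONS = {
    "linter_name": ["pylint", "ruff"],
    "use_jupyter": ["yes", "no"],
    "ci_provider": ["gitlab", "azure", "none"],
}

def covering_configurations(options):
    """Return a fixed set of configurations in which every option value
    appears at least once: max(len(values)) builds instead of the full
    Cartesian product (3 here instead of 2 * 2 * 3 = 12)."""
    n_builds = max(len(values) for values in options.values())
    return [
        # Cycle each option's values so shorter lists simply repeat
        {key: values[i % len(values)] for key, values in options.items()}
        for i in range(n_builds)
    ]
```

This keeps builds reproducible and cheap, at the cost of not exercising interactions between specific option combinations.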

Member

I don't really understand what this is good for. Can you explain? I guess it somehow uses one file or the other, but I don't understand how 🙈

Author

@DavidSchischke DavidSchischke Nov 17, 2023

This change is related to a change in post_gen_project:

def remove_unused_linter_files(self):
    # Read the manifest that records which features the user enabled
    with open(f"{self.temp_files_dir}/.manifest.yaml", "r", encoding="utf-8") as f:
        manifest = yaml.safe_load(f)
    # Delete every resource belonging to a feature that was not selected
    for feature in manifest["features"]:
        if not feature["enabled"]:
            for resource in feature["resources"]:
                os.remove(resource)

Basically, after the project is created, the type of linter is determined from .manifest.yaml and the config for the other linter is deleted prior to committing. The procedure is taken from this cookiecutter issue.
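For illustration, the manifest structure implied by the snippet might look like this once loaded; the feature names and resource paths are assumptions, not the repository's actual file:

```python
# Assumed structure of .manifest.yaml after yaml.safe_load (illustrative only)
manifest = {
    "features": [
        {"name": "pylint", "enabled": False, "resources": [".pylintrc"]},
        {"name": "ruff", "enabled": True, "resources": ["ruff.toml"]},
    ]
}

def unused_resources(manifest):
    """Collect the files that belong to features the user did not select."""
    return [
        resource
        for feature in manifest["features"]
        if not feature["enabled"]
        for resource in feature["resources"]
    ]
```

With the Ruff option selected, only the Pylint-related files would be removed.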

Member

I think the main confusion here is the name of the folder .temp_ci_cd. I'd instead suggest something like .conditional_files and adjust the post generation script accordingly.

Instead of introducing yet another custom configuration file, we should move both configs (Pylint and Ruff) into that folder and adapt our ConditionalFileManager to only use the appropriate file. While we're at it, we might as well add the option to have no linter at all :)

Author

@DavidSchischke DavidSchischke Sep 26, 2024

I like the idea of moving everything to the CI/CD folder and have adjusted the script accordingly in c2a45b0. I also refactored the get_conditional_file_manager() function to use match statements for the cookiecutter options, so extending it in the future (e.g., for GitHub / Azure DevOps pipelines) should be much easier now.

However, I feel that allowing the user to choose no linter at all goes against the spirit of the template. Happy to discuss this :)

Member

As discussed, let's do some testing with some existing projects to see if these settings are reasonable.

environment.yaml (outdated thread, resolved)
"""
cwd = os.getcwd()
os.chdir(env_dir)
PACKAGE_MANAGER.run_subprocess_in_env(env_name, ["pre-commit", "run", "--all-files"])
Author

For some reason, this test fails in my branch but works on main. I am not sure which change caused this, as I don't think I changed env_dir or env_name. I will need to figure this out prior to merging.

/Users/davidschischke/Documents/Code/data-science-project-template/tests/test_cookiecutter_template.py:171: in validate_pre_commit
    PACKAGE_MANAGER.run_subprocess_in_env(str(env_name), ["pre-commit", "run", "--all-files"])
/Users/davidschischke/Documents/Code/data-science-project-template/hooks/post_gen_project.py:26: in run_subprocess_in_env
    subprocess.run(full_cmd, check=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = False, timeout = None, check = True, popenargs = ([PosixPath('/opt/homebrew/Caskroom/mambaforge/base/bin/mamba'), 'run', '-n', 'pytest_59290224-c999-45b9-8a17-8655142c1acd-env', 'pre-commit', 'run', ...],), kwargs = {}
process = <Popen: returncode: 1 args: [PosixPath('/opt/homebrew/Caskroom/mambaforge/ba...>, stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '[PosixPath('/opt/homebrew/Caskroom/mambaforge/base/bin/mamba'), 'run', '-n', 'pytest_59290224-c999-45b9-8a17-8655142c1acd-env', 'pre-commit', 'run', '--all-files']' returned non-zero exit status 1.

/opt/homebrew/Caskroom/mambaforge/base/envs/data-science-project-template/lib/python3.10/subprocess.py:526: CalledProcessError

Member

Well, we are now running pre-commit checks after the project is initialized by the template engine and you seem to have some whitespace issues in the files :)

That said, I think we should only exclude files from the outer pre-commit checks that actually use the template engine. That way, we can retain the automatic fixes for other files without running the CI and investigating manually. Maybe we can also add a diff if the test execution fails, to find issues more easily. Happy to support with that.

Subject: [PATCH] style: fix whitespace issues
---
Index: {{cookiecutter.repo_name}}/.pre-commit-config.yaml
===================================================================
diff --git a/{{cookiecutter.repo_name}}/.pre-commit-config.yaml b/{{cookiecutter.repo_name}}/.pre-commit-config.yaml
--- a/{{cookiecutter.repo_name}}/.pre-commit-config.yaml	(revision e7b7d3b3c99ae9d801b8e10fb16726bdcca2a11f)
+++ b/{{cookiecutter.repo_name}}/.pre-commit-config.yaml	(revision 476f067f48445d9bbfa3fc1188e1a7db4249d91c)
@@ -44,8 +44,8 @@
     {%- elif cookiecutter.linter_name == "ruff" %}
     - repo: https://github.com/astral-sh/ruff-pre-commit
       rev: v0.6.7
-      hooks: 
-      - id: ruff
+      hooks:
+          - id: ruff
     {%- endif %}
     - repo: local
       hooks:
Index: {{cookiecutter.repo_name}}/pyproject.toml
===================================================================
diff --git a/{{cookiecutter.repo_name}}/pyproject.toml b/{{cookiecutter.repo_name}}/pyproject.toml
--- a/{{cookiecutter.repo_name}}/pyproject.toml	(revision e7b7d3b3c99ae9d801b8e10fb16726bdcca2a11f)
+++ b/{{cookiecutter.repo_name}}/pyproject.toml	(revision 476f067f48445d9bbfa3fc1188e1a7db4249d91c)
@@ -17,8 +17,8 @@
 )
 '''
 
-{% if cookiecutter.linter_name == "pylint" %}
+{%- if cookiecutter.linter_name == "pylint" %}
 [tool.isort]
 profile = 'black'
 line_length = 100
-{% endif %}
+{%- endif %}
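The earlier suggestion to surface a diff when the test execution fails could be sketched like this; run_pre_commit_with_diff and format_failure are hypothetical helper names, not part of the repository:

```python
import subprocess

def format_failure(diff_text: str, returncode: int) -> str:
    """Build an informative assertion message from the captured auto-fix diff."""
    return (
        f"pre-commit exited with status {returncode}; "
        f"diff of automatic fixes:\n{diff_text}"
    )

def run_pre_commit_with_diff(repo_dir: str) -> None:
    """Run the hooks; on failure, attach the git diff of any in-place fixes."""
    result = subprocess.run(["pre-commit", "run", "--all-files"], cwd=repo_dir)
    if result.returncode != 0:
        # Hooks like end-of-file-fixer modify files in place; a git diff
        # shows exactly what they would have fixed
        diff = subprocess.run(
            ["git", "diff"], cwd=repo_dir, capture_output=True, text=True
        )
        raise AssertionError(format_failure(diff.stdout, result.returncode))
```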

@@ -17,6 +17,8 @@ exclude = '''
)
'''

{% if cookiecutter.linter_name == "pylint" %}
[tool.isort]
Member

I think isort runs in a separate pre-commit check and is not related to pylint.

Successfully merging this pull request may close these issues:

  • install nbqa and nbqa hooks only if jupyter is selected
  • Add option to use ruff instead of pylint

2 participants