Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Masks secrets in traces. #1797

Merged
merged 10 commits into from
Sep 14, 2024
Merged

Masks secrets in traces. #1797

merged 10 commits into from
Sep 14, 2024

Conversation

willi-mueller
Copy link
Collaborator

@willi-mueller willi-mueller commented Sep 10, 2024

Description

Related Issues

Note

  • Instead of an empty string, which was requested in the issue, I implemented "*****" so that it looks similar to the rest_api source where we also mask secrets

TODO before merging

  • Test trace.asstr(verbosity=1) which is called by the dlt pipeline -v trace command
  • Decide whether we want identical masking logic for the rest_api source and the traces. The rest_api source shows the first and last character if the secret is at least 4 characters long. The traces show always "*****"
  • Agree on where to implement the unified secret masking logic or at least the constant MASKED_SECRET="*****" so that we can use it in both dlt/sources/rest_api/__init__.py as well as dlt/pipeline/trace.py

@willi-mueller willi-mueller linked an issue Sep 10, 2024 that may be closed by this pull request
Copy link

netlify bot commented Sep 10, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 9d34a76
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66e583fd80b0cf0008ba2d52

@willi-mueller willi-mueller marked this pull request as ready for review September 11, 2024 11:35
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what we need is this test

DESTINATION__POSTGRES__CREDENTIALS="postgresql://loader:loader@localhost:5432/dlt_data" pytest tests/cli/ -k test_deploy_command

to pass. it is checking if secrets were printed out
we'll need a version of this test (use parametrize) that deletes secrets.toml and you placeholder string are present

# this time script will run
            venv.run_script("debug_pipeline.py")
            with echo.always_choose(False, always_choose_value=True):
                with io.StringIO() as buf, contextlib.redirect_stdout(buf):
                    deploy_command.deploy_command(
                        "debug_pipeline.py",
                        deployment_method,
                        deploy_command.COMMAND_DEPLOY_REPO_LOCATION,
                        **deployment_args
                    )
                    _out = buf.getvalue()

destroy secrets.toml just after venv.run_script("debug_pipeline.py") is run ie by injecting mock config providers like this fixture does

def toml_providers() -> Iterator[ConfigProvidersContext]:

dlt/cli/deploy_command_helpers.py Outdated Show resolved Hide resolved
dlt/common/configuration/providers/environ.py Outdated Show resolved Hide resolved
@@ -280,8 +284,8 @@ def end_trace_step(
resolved_values = map(
lambda v: SerializableResolvedValueTrace(
v.key,
v.value,
v.default_value,
_mask_secret(v.value) if is_secret_hint(v.hint) else v.value,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not mask it in dump. just set None on secrets

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it could be helpful to be able to distinguish between (i) None, meaning that no value is set, and (ii) "*****" meaning that a secret is present?

dlt/pipeline/trace.py Outdated Show resolved Hide resolved
@@ -298,6 +302,13 @@ def end_trace_step(
return trace


def _mask_secret(trace_item_value: Any) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should definitely have one common method to display masked secrets... but my take is that nothing should be revealed at all. for now we cna remove this function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed it.

I like revealing the first or last or both chars because I often debugged authorization errors by checking if the correct secret is present in the first place. When breakpoints are not practical or possible (e.g. github actions) I often print the first few chars.

Would it be therefore a helpful to reveal the first & last char by default or would it be better to not compromise security and reveal nothing? The developer can then do logging tricks as described above.

Sure, we can change it for the rest_api source too. What do you think, @burnash?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rudolfix I agree with @willi-mueller I think it makes it more practical to display first and last char (when the length of the secret string allows). Could you elaborate why you'd want to mask everything?

rudolfix
rudolfix previously approved these changes Sep 14, 2024
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added deploy test. LGTM!

@rudolfix rudolfix merged commit 1714801 into devel Sep 14, 2024
61 checks passed
@rudolfix rudolfix deleted the fix/1687-mask-password-in-trace branch September 14, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dlt pipeline -v <pipeline> trace source password not redacted
3 participants