Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Speeding up pysigma by using a faster yaml parser #133

Closed
wants to merge 1 commit into from

Conversation

jonathan-s
Copy link

@jonathan-s jonathan-s commented Jul 5, 2023

This is currently a draft. Would close #128

The goal of this PR to make it possible to use a yaml parser that has the most intensive parts written in rust. This can considerable increase the speed of a large amount of yaml documents.

To install the rust parts you would need to install the library in the following way pip install pysigma[rust]. Since there may be slight shift in behaviour this shouldn't be made the standard behaviour. My suggestion is that once major version is introduced we can shift this around so that ryaml becomes the default library used for parsing yaml.

ryaml which is the library written partially in rust does not have safe_load. But it is also not vulnerable to the same attacks that pyyaml was. To exemplify this I present the following poc taken from HackerOne.

Create both of these files in the same directory. You also need to run an older version of pyyaml; say 3.13. pip install ryaml.

config.yaml

!!python/object/apply:subprocess.Popen
- ls

poc.py - need to use an older version of pyyaml 3.13

import os
import yaml
import ryaml


def yaml_config():
    with open('config.yaml') as config_file:
        return yaml.load(config_file)


def ryaml_config():
    with open('config.yaml') as config_file:
        return ryaml.load(config_file)


yaml_config()
print("========== yaml =============")

ryaml_config()
print("========== ryaml =============")

It's not vulnerable to the same attack, so it should be considered safe.

Some parts of the rust library was not fully compliant with the yaml spec, which is why some tests are still failing. Currently I'm tracking those required changes here > ethanhs/ryaml#3

@thomaspatzke
Copy link
Member

Nice, thanks for the cool contribution and the good analysis of the potential vulnerability! I just have some small stuff I will comment here in the PR review, but from my perspective nothing speaks against a merge as this is optional and the bool problem is solved from my perspective. Tests using YAML 1.1 should be fixed to comply with version 1.2. Can you provide the fixed tests as you know which one fail with ryaml?

@@ -56,6 +56,7 @@ def apply(self, val : Union[SigmaType, Sequence[SigmaType]]) -> List[SigmaType]:
])
]
else:
print(val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgotten test code? 😉 Please remove.

@jonathan-s
Copy link
Author

I noticed that there was one bit lacking, which is that duplicate keys shouldn't be accepted. Right now I'm waiting for a PR in ryaml to be accepted / discussed first, so once that is resolved I'll fix the other bits here.

@thomaspatzke thomaspatzke marked this pull request as draft November 18, 2023 20:15
@thomaspatzke
Copy link
Member

Closing for now, we can reopen if continued.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Depending on pyyaml can make things slow.
2 participants