Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correctly stream process json data normalization to utf8 for bot… #764

Merged
merged 6 commits into from
Aug 13, 2024

Conversation

ddlees
Copy link
Contributor

@ddlees ddlees commented Aug 1, 2024

…h metatag validation and writing to disk

Description

Describe your changes in detail

  • correctly stream process json data normalization to utf8
  • feed normalized data to metatag validator
  • feed normalized data to disk writer

Motivation and Context

This PR addresses: [GitHub issue or Jira ticket number]
BED-4463

Why is this change required? What problem does it solve?

The initial pass at normalizing json data that is prefixed with a byte order mark (BOM) was causing the ValidateMetaTag check to incorrectly fail when preparing to write the data to disk and when it was being read from disk for graph ingest. Additionally, we were reading in the whole payload into an uncompressed []byte which is problematic for very large files. This PR addresses those issues by correcting data normalization via stream processing and feeding the data stream to the validator and disk writer.

How Has This Been Tested?

Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran to
see how your change affects other areas of the code, etc.

Updated unit tests to reflect necessary interface changes.

Screenshots (optional):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

@ddlees ddlees added bug Something isn't working api A pull request containing changes affecting the API code. labels Aug 1, 2024
@ddlees ddlees self-assigned this Aug 1, 2024
@ddlees ddlees changed the base branch from main to stage/v5.14.0 August 12, 2024 19:09
Copy link
Contributor

@urangel urangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to work for uploads of JSON files 👍

It may be out of scope for this work but I believe it is still possible to zip JSON files with BOM's and upload the zip into BH's UI. Since there is no normalization done of the files within the zip, data that is uploaded this way will encounter a similar BOM issue and an error will display that the data is not valid JSON.

@StephenHinck
Copy link
Collaborator

This looks to work for uploads of JSON files 👍

It may be out of scope for this work but I believe it is still possible to zip JSON files with BOM's and upload the zip into BH's UI. Since there is no normalization done of the files within the zip, data that is uploaded this way will encounter a similar BOM issue and an error will display that the data is not valid JSON.

This is correct - we should also expect to strip encoding from zipped JSON files.

@ddlees
Copy link
Contributor Author

ddlees commented Aug 13, 2024

This looks to work for uploads of JSON files 👍
It may be out of scope for this work but I believe it is still possible to zip JSON files with BOM's and upload the zip into BH's UI. Since there is no normalization done of the files within the zip, data that is uploaded this way will encounter a similar BOM issue and an error will display that the data is not valid JSON.

This is correct - we should also expect to strip encoding from zipped JSON files.

I looked into what it would take to include normalizing decompressed json files and we would have to do some non-trivial work to accommodate the ReadSeeker interface that we expect when processing the file. SharpHound zips will not include BOM encoded json files but we shouldn't lose sight of this issue because I'm sure someone will bom encode some json files and zip them and try to upload that data to BloodHound at some point. Will attempt to deliver a solution in a separate PR.

@ddlees ddlees merged commit 81c33af into stage/v5.14.0 Aug 13, 2024
3 checks passed
@ddlees ddlees deleted the fix-BED-4463 branch August 13, 2024 18:57
@github-actions github-actions bot locked and limited conversation to collaborators Aug 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api A pull request containing changes affecting the API code. bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants