fix: correctly stream process json data normalization to utf8 for bot… #764
This looks to work for uploads of JSON files 👍
It may be out of scope for this work, but I believe it is still possible to zip JSON files that contain BOMs and upload the zip through BH's UI. Since the files inside the zip are not normalized, data uploaded this way will hit a similar BOM issue, and an error will be displayed saying the data is not valid JSON.
This is correct. We should also expect to strip the BOM from zipped JSON files.
…h metatag validation and writing to disk
I looked into what it would take to also normalize decompressed JSON files, and we would have to do some non-trivial work to accommodate the …
Description
Describe your changes in detail
Motivation and Context
This PR addresses: [GitHub issue or Jira ticket number]
BED-4463
Why is this change required? What problem does it solve?
The initial pass at normalizing JSON data prefixed with a byte order mark (BOM) caused the `ValidateMetaTag` check to fail incorrectly, both when preparing to write the data to disk and when reading it back from disk for graph ingest. Additionally, we were reading the whole payload into an uncompressed `[]byte`, which is problematic for very large files. This PR addresses those issues by normalizing the data via stream processing and feeding the data stream to the validator and the disk writer.
How Has This Been Tested?
Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran to
see how your change affects other areas of the code, etc.
Updated unit tests to reflect necessary interface changes.
Screenshots (optional):
Types of changes
Checklist: