Encoded Data as Values #11

Open
friendofasquid opened this issue Sep 5, 2018 · 0 comments
Hi, I'm trying to write Parquet files to S3 via your plugin. It looks like the values in the data are being encoded in a way that makes the Parquet file unusable with AWS Glue / Athena (Presto). When I use the stdout plugin, the data looks correct.

When I try to open the Parquet file with Sublime (which renders it like JSON), I see data that looks like the following:

{"event_id":"MjI0Yzg4ODc0ZjEzYWJjM2Q4OGI3M2NiYWE5NTcwODQ=","event_timestamp":"MjAxOC0wOS0wNCAxNTozMjoxOC4wMDEwMDAgKzAwMDA="}
{"event_id":"ZjQzNmQxMmNkNmFlNGM5ZmJkMTc3OTExOTJmZGY2MmY=","event_timestamp":"MjAxOC0wOS0wNCAxNTozMjoxNi4xNzIwMDAgKzAwMDA="}
…
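
Those values look like base64 encodings of the strings I'd expect in each column. A quick check (Python sketch, decoding the two sample rows above):

import base64

samples = {
    "event_id": "MjI0Yzg4ODc0ZjEzYWJjM2Q4OGI3M2NiYWE5NTcwODQ=",
    "event_timestamp": "MjAxOC0wOS0wNCAxNTozMjoxOC4wMDEwMDAgKzAwMDA=",
}

# Each value decodes to the plain string I'd expect, e.g.
# "224c88874f13abc3d88b73cbaa957084" and
# "2018-09-04 15:32:18.001000 +0000".
for name, value in samples.items():
    print(name, base64.b64decode(value).decode("utf-8"))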

Here are the relevant parts of the Embulk config file:

in:
  type: command
  command: lib/splunk export …
      
  parser:
    type: jsonl
    columns:
      - {name: "event_id", type: string}
      - {name: "event_timestamp", type: timestamp, format: "%Q"}
      
out:
  type: parquet
  path_prefix: s3a://…
  extra_configurations:
    fs.s3a.access.key: 
    fs.s3a.secret.key: 
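
In case it helps with reproducing: here is how I'd inspect a local copy of the file independently of Sublime (Python / pyarrow sketch; output.parquet is just a placeholder name for the generated file):

import pyarrow.parquet as pq

# Read the Parquet file the plugin wrote and print the first two rows,
# to check whether the columns contain the expected plain values or the
# base64-looking strings shown above.
table = pq.read_table("output.parquet")
print(table.slice(0, 2).to_pydict())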