Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dict_traverser and hstores #2

Open
rotten opened this issue Sep 28, 2016 · 2 comments
Open

dict_traverser and hstores #2

rotten opened this issue Sep 28, 2016 · 2 comments

Comments

@rotten
Copy link

rotten commented Sep 28, 2016

fwiw,

I'd rather have my sub-documents as jsonb column types than hstore column types. I could cast the hstore to jsonb in a view after I set up the column, but that didn't work for all of my sub-documents. Some of them had unicode and other things not-easily digested by the multicorn dictionary-to-hstore library functions.

To be able to use jsonb directly in my environment, I changed your "dict_traverser" lambda function.

Originally, it looked like this:
dict_traverser = partial(reduce, lambda x, y: x.get(y) if type(x) == dict else x)

My version looks like this:
dict_traverser = partial(reduce, lambda x, y: json.dumps(x.get(y)) if type(x) == dict else x.encode('utf8'))

I'm feeling to lazy to follow the usual "fork and pull request" process at this time for such a small change, but I thought I'd let you know of one possible change to your code that some people might find helpful.

As another reference, you can see where I did this in the rethinkdb multicorn fdw a while back too:
https://github.com/rotten/rethinkdb-multicorn-postgresql-fdw/blob/master/rethinkdb_fdw/rethinkdb_fdw.py#L121

@rotten
Copy link
Author

rotten commented Sep 28, 2016

Ok, the approach above doesn't work very well when you have ObjectId columns, which the python json library doesn't know anything about. Additionally, it json.dumps() every column.

This approach is a little more comprehensive:

from bson.json_util import dumps
dict_traverser = partial(reduce, lambda x, y: x.get(y) if type(x) == dict and type(x.get(y)) not in [ObjectId, dict, list]
                                                       else str(x.get(y)) if type(x.get(y)) == ObjectId
                                                       else dumps(x.get(y)) if type(x.get(y)) == dict
                                                       else dumps([dumps(z) for z in x.get(y)]) if type(x.get(y)) == list and type(x.get(y)[0]) == dict
                                                       else [z.encode('utf8') for z in x.get(y)] if type(x.get(y)) == list
                                                       else x.encode('utf8'))


The revised dict_traverser works with these postgresql columns types that I've tested so far:

  • varchar options (type 'ObjectId')
  • integer options (type 'Integer')
  • varchar
  • jsonb
  • boolean
  • varchar[]
  • jsonb[]

Perhaps it isn't the most pythonic or easily readable approach to doing this, but I kept it consistent with the style of the rest of the code in this fdw.

@rotten
Copy link
Author

rotten commented Oct 4, 2016

Actually while the above logic was fine for simple documents, it didn't work very well with many nested layers of subdocuments. I ended up reworking the "dict_traverser" into an equivalent set of recursive functions. I'll make a fork and then send a pull request when I get a chance so you can see what I did.

I'm currently testing it with Python 3.5 on PostgreSQL 9.6 using Multicorn 1.3.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant