dict_traverser and hstores #2

rotten · 2016-09-28T19:43:00Z

fwiw,

I'd rather have my sub-documents as jsonb column types than hstore column types. I could cast the hstore to jsonb in a view after I set up the column, but that didn't work for all of my sub-documents. Some of them had unicode and other things not-easily digested by the multicorn dictionary-to-hstore library functions.

To be able to use jsonb directly in my environment, I changed your "dict_traverser" lambda function.

Originally, it looked like this:
dict_traverser = partial(reduce, lambda x, y: x.get(y) if type(x) == dict else x)

My version looks like this:
dict_traverser = partial(reduce, lambda x, y: json.dumps(x.get(y)) if type(x) == dict else x.encode('utf8'))
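For readers unfamiliar with the idiom, here is a minimal sketch of how a reduce-based traverser of this shape walks a key path. The sample document, the key lists, and the call order (key list first, then the document as reduce's initial value) are assumptions for illustration, not taken from the fdw source.

```python
from functools import partial, reduce

# Same shape as the original lambda: step into x with key y while x is a dict,
# otherwise pass x through unchanged.
dict_traverser = partial(reduce, lambda x, y: x.get(y) if type(x) == dict else x)

# Hypothetical document, for illustration only.
doc = {"address": {"geo": {"lat": 40.7}}, "name": "caf\u00e9"}

# Each call expands to reduce(f, keys, doc): start at doc, then apply one key at a time.
lat = dict_traverser(["address", "geo", "lat"], doc)
name = dict_traverser(["name"], doc)
missing = dict_traverser(["address", "zip"], doc)
```

A missing key yields None rather than raising, because dict.get() returns None and the lambda passes non-dict values through as they are.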

I'm feeling too lazy to follow the usual "fork and pull request" process at this time for such a small change, but I thought I'd let you know of one possible change to your code that some people might find helpful.

As another reference, you can see where I did this in the rethinkdb multicorn fdw a while back too:
https://github.com/rotten/rethinkdb-multicorn-postgresql-fdw/blob/master/rethinkdb_fdw/rethinkdb_fdw.py#L121

rotten · 2016-09-28T22:01:18Z

OK, the approach above doesn't work very well when you have ObjectId columns, which the Python json library doesn't know how to serialize. It also runs json.dumps() on every column, whether or not the column holds a sub-document.

This approach is a little more comprehensive:

from bson.objectid import ObjectId
from bson.json_util import dumps
dict_traverser = partial(reduce, lambda x, y: x.get(y) if type(x) == dict and type(x.get(y)) not in [ObjectId, dict, list]
                                                       else str(x.get(y)) if type(x.get(y)) == ObjectId
                                                       else dumps(x.get(y)) if type(x.get(y)) == dict
                                                       else dumps([dumps(z) for z in x.get(y)]) if type(x.get(y)) == list and type(x.get(y)[0]) == dict
                                                       else [z.encode('utf8') for z in x.get(y)] if type(x.get(y)) == list
                                                       else x.encode('utf8'))

The revised dict_traverser works with these PostgreSQL column types that I've tested so far:

varchar options (type 'ObjectId')
integer options (type 'Integer')
varchar
jsonb
boolean
varchar[]
jsonb[]

Perhaps it isn't the most pythonic or easily readable approach to doing this, but I kept it consistent with the style of the rest of the code in this fdw.
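The same branching may be easier to follow as a named function. The sketch below is not the code from this fdw: `to_pg_value` and `ObjectIdStandIn` are hypothetical names, the stand-in class replaces bson's ObjectId so the example runs without pymongo, and the stdlib `json.dumps(..., default=str)` is swapped in for `bson.json_util.dumps` (which would render an ObjectId as `{"$oid": ...}` rather than a plain string). It checks each list element instead of only element `[0]`, which avoids an IndexError on empty lists. It returns a list of JSON strings for a list of sub-documents; whether Multicorn prefers that or a doubly encoded string for jsonb[] columns is not something this sketch has been tested against.

```python
import json


class ObjectIdStandIn:
    """Hypothetical stand-in for bson's ObjectId so the sketch is stdlib-only."""

    def __init__(self, hex_id):
        self.hex_id = hex_id

    def __str__(self):
        return self.hex_id


def to_pg_value(value, oid_type=ObjectIdStandIn):
    """Convert one document value into something a varchar/jsonb column can take."""
    if isinstance(value, oid_type):
        return str(value)                      # ObjectId -> varchar
    if isinstance(value, dict):
        return json.dumps(value, default=str)  # sub-document -> jsonb text
    if isinstance(value, list):
        # Sub-documents become JSON strings; other elements pass through unchanged.
        return [json.dumps(v, default=str) if isinstance(v, dict) else v
                for v in value]
    return value                               # int, bool, str, None unchanged


oid_col = to_pg_value(ObjectIdStandIn("57ec1a2b"))
jsonb_col = to_pg_value({"owner": ObjectIdStandIn("57ec1a2b"), "n": 1})
jsonb_array_col = to_pg_value([{"a": 1}, {"b": 2}])
varchar_array_col = to_pg_value(["x", "y"])
```

With the real driver installed, passing `oid_type=ObjectId` should select the same branch for genuine ObjectId values.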

rotten · 2016-10-04T22:05:56Z

Actually, while the above logic was fine for simple documents, it didn't work very well with many nested layers of subdocuments. I ended up reworking the "dict_traverser" into an equivalent set of recursive functions. I'll make a fork and then send a pull request when I get a chance so you can see what I did.

I'm currently testing it with Python 3.5 on PostgreSQL 9.6 using Multicorn 1.3.3.
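The reworked functions are not shown in this thread. As a rough sketch of what a recursive approach could look like (an assumption, not the code from the eventual pull request), the example below splits the job in two: `get_path` follows a key list into nested dicts, and `jsonable` walks the result and stringifies anything the stdlib json module cannot encode. Both function names and the `FakeObjectId` stand-in are hypothetical.

```python
import json


class FakeObjectId:
    """Hypothetical stand-in for bson's ObjectId (keeps the sketch stdlib-only)."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


def get_path(doc, keys):
    """Recursively follow keys into nested dicts; return None if the path breaks."""
    if not keys:
        return doc
    if not isinstance(doc, dict):
        return None
    return get_path(doc.get(keys[0]), keys[1:])


def jsonable(value):
    """Recursively replace values json cannot encode with their str() form."""
    if isinstance(value, dict):
        return {str(k): jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [jsonable(v) for v in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)  # e.g. an ObjectId nested several levels down


doc = {"order": {"items": [{"sku": FakeObjectId("57ec1a"), "qty": 2}], "paid": True}}
items_json = json.dumps(jsonable(get_path(doc, ["order", "items"])))
broken = get_path(doc, ["order", "customer", "name"])
```

Because `jsonable` recurses into every dict and list, the nesting depth of the sub-documents no longer matters, which is the case the single lambda struggled with.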
