rest_api: filter, exclude, transform API responses #495
@francescomucio thank you for your contribution and for proposing the enhancement. However, I see a few potential drawbacks with the proposed interface:
To address these issues, I suggest an alternative format that puts all operations in one
{
    "name": "my_nicely_named_resource",
    "endpoint": {
        "path": "endpoint_name"
    },
    "operations": [
        {"filter": lambda x: x["id"] == 3},
        {"map": "delete_fields", "fields": ["id", "another_column"]},
        {
            "map": "rename_fields",
            "fields": {
                "user_id": "my_user_id",
                "timestamp": "my_timestamp"
            }
        },
        {"map": my_function}
    ]
}
Some other possible pre-defined filters:
{"filter": "range", "field": "date", "from": "2021-01-01", "to": "2021-12-31"}
or
{"filter": "in_set", "field": "category", "values": ["tech", "finance", "health"]}
and so on.
Again, thank you for the suggestion and valuable input. What do you think about this? Looking forward to your feedback.
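As a rough illustration of the pre-defined filters suggested above, here is a minimal sketch of how a {"filter": ...} entry could be turned into a predicate. The helper name build_filter and the sample rows are hypothetical and are not part of this PR or of dlt; the "range" case also assumes ISO-formatted date strings, which compare correctly as plain strings.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def build_filter(spec: Dict[str, Any]) -> Callable[[Row], bool]:
    """Hypothetical helper: turn one {"filter": ...} entry into a predicate."""
    kind = spec["filter"]
    if callable(kind):
        # A plain Python function or lambda is used as-is.
        return kind
    if kind == "range":
        field, low, high = spec["field"], spec["from"], spec["to"]
        # Assumes values are comparable, e.g. ISO date strings.
        return lambda row: low <= row[field] <= high
    if kind == "in_set":
        field, allowed = spec["field"], set(spec["values"])
        return lambda row: row[field] in allowed
    raise ValueError(f"unknown filter: {kind!r}")


rows = [
    {"id": 1, "date": "2021-03-01", "category": "tech"},
    {"id": 2, "date": "2022-01-15", "category": "sports"},
]
in_2021 = build_filter(
    {"filter": "range", "field": "date", "from": "2021-01-01", "to": "2021-12-31"}
)
is_listed = build_filter(
    {"filter": "in_set", "field": "category", "values": ["tech", "finance", "health"]}
)
kept = [row for row in rows if in_2021(row) and is_listed(row)]
```

Because callables pass through unchanged, named filters like these would only be sugar on top of the function-based form.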
I like the idea, and I like that it's possible to transform/map the data before or after a filter and to do both operations multiple times. To start, I would keep it simple, with just map or filter
and then extend it with additional functionality, each with its own keyword (nothing that cannot be handled by passing a callable, just sugar for developers):
What do you think?
Overall this looks good
I'm not very firm on "operations", but I was looking for a keyword that could work for both "map" and "filter". In my opinion, "transform" could be a bit too specific, so it's hard to fit "filter" under it. But I may be wrong. So the alternatives to "operations" could be:
Other than that, I also agree with @francescomucio: we can start with Python functions.
I went with @burnash @willi-mueller please take a look at it
sources/rest_api/__init__.py
Outdated
def process(resource, processing_steps) -> Any:
    for step in processing_steps:
        if "filter" in step:
            resource.add_filter(step["filter"])
        if "map" in step:
            resource.add_map(step["map"])
    return resource
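To see how the process function under review behaves end to end, here is a small self-contained sketch. FakeResource is a hypothetical stand-in written only for this example; it mimics add_filter and add_map by applying the steps in the order they were added, and it is not the real resource class. The drop_id helper is likewise made up.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


class FakeResource:
    """Hypothetical stand-in for a resource; applies steps in insertion order."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)
        self._steps: List[Tuple[str, Callable[[Row], Any]]] = []

    def add_filter(self, fn: Callable[[Row], bool]) -> "FakeResource":
        self._steps.append(("filter", fn))
        return self

    def add_map(self, fn: Callable[[Row], Row]) -> "FakeResource":
        self._steps.append(("map", fn))
        return self

    def __iter__(self) -> Iterator[Row]:
        for row in self._rows:
            keep = True
            for kind, fn in self._steps:
                if kind == "filter":
                    if not fn(row):
                        keep = False
                        break
                else:
                    row = fn(row)
            if keep:
                yield row


def process(resource: Any, processing_steps: List[Dict[str, Any]]) -> Any:
    # Same logic as the function in the diff above.
    for step in processing_steps:
        if "filter" in step:
            resource.add_filter(step["filter"])
        if "map" in step:
            resource.add_map(step["map"])
    return resource


def drop_id(row: Row) -> Row:
    # Hypothetical map step: remove the "id" column.
    return {key: value for key, value in row.items() if key != "id"}


resource = process(
    FakeResource([{"id": 1, "name": "a"}, {"id": 3, "name": "b"}]),
    [{"filter": lambda x: x["id"] == 3}, {"map": drop_id}],
)
result = list(resource)
```

The order of the steps matters here: the filter reads "id" before the map step removes it, so swapping the two entries would raise a KeyError.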
awesome!
Looks great, @francescomucio! I've listed a couple of minor improvements.
Also, let's extend the tests to cover "map" and using processing steps in a child resource. A good improvement is to also extend the "offline"/"mocked" tests (https://github.com/dlt-hub/verified-sources/blob/master/tests/rest_api/test_rest_api_source_offline.py). Let me know if you have capacity for this.
sources/rest_api/typing.py
Outdated
# row_filter: Optional[Callable[[Any], bool]]
# transform: Optional[Callable[[Any], Any]]
# exclude_columns: Optional[List[jsonpath.TJsonPath]]
Let's remove the commented code and implement these ready-to-use operations in a follow-up PR.
Thanks for updating the tests. I have two questions about the yield in the comments.
sources/rest_api/__init__.py
Outdated
method=method,
path=path,
params=params,
json=json,
Is it a typo, or is there a reason to remove the json param?
sources/rest_api/__init__.py
Outdated
@@ -278,11 +289,10 @@ def paginate_resource(
incremental_cursor_transform,
)

yield from client.paginate(
I think we need to keep the from statement.
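For context on why the from matters, here is a minimal sketch of the difference between yield and yield from when delegating to another generator. The pages generator is a hypothetical stand-in for a paginating call and does not reproduce the real client.

```python
from typing import Iterator, List


def pages() -> Iterator[List[int]]:
    # Hypothetical stand-in for a paginator that yields one page at a time.
    yield [1, 2]
    yield [3]


def with_yield_from() -> Iterator[List[int]]:
    # Delegates: each page produced by pages() is yielded to the caller.
    yield from pages()


def with_yield():
    # Yields the generator object itself as a single item.
    yield pages()


delegated = list(with_yield_from())
wrapped = list(with_yield())
```

With yield from, the caller receives the pages themselves; with a plain yield, it receives one item that is the inner generator, which downstream code would then have to unwrap.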
c6b3ea2 to c65d26f
Thanks for the update. Please see my comment about the mutable default.
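The referenced comment is not shown here, so as general background only: a mutable default argument in Python is created once, when the function is defined, and is then shared across calls. The function names below are hypothetical and are not taken from the PR.

```python
from typing import List, Optional


def add_step_shared(step: str, steps: List[str] = []) -> List[str]:
    # Pitfall: the default list is created once and reused on every call.
    steps.append(step)
    return steps


def add_step_safe(step: str, steps: Optional[List[str]] = None) -> List[str]:
    # Usual fix: default to None and create a fresh list per call.
    if steps is None:
        steps = []
    steps.append(step)
    return steps


first = add_step_shared("a")
second = add_step_shared("b")  # same list object as `first`
safe_first = add_step_safe("a")
safe_second = add_step_safe("b")
```

In the shared version, state leaks between calls; the None default keeps each call independent.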
621a53f to d376292
Great work, thank you @francescomucio!
The examples from the tests are very good; we'll use them in a follow-up documentation PR.
Tell us what you do here
As per the associated issue, rest_api: Allow the REST API config object to exclude rows, columns, and transform data:
- row_filter, in the form of a function or lambda
- exclude_columns, as jsonpaths, that need to be removed from the output
- transform, in the form of a function or lambda
I have added the additional properties to typing.py and added the code to handle them.
Related Issues
This solves this issue
Additional Context