-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shopify: Incremental and other improvements #222
Merged
Merged
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
4143cc3
Remove shopify SDK dependency and use requests in source
steinitzu f59f347
Use pagination in test, fix next page URL parse
steinitzu b4db267
Add status, end_date and date conversion helper
steinitzu 3ae96e1
Date helper ensure timezone
steinitzu 053210c
Test end date and incremental load
steinitzu da7fd67
Test date helper, test order status
steinitzu 4a7bbfe
Add chunked loading example
steinitzu c2e151f
Include end_value in request params
steinitzu 51e5c50
Test request query params
steinitzu 2108cb7
Use dlt date parse function
steinitzu 28f17ca
Parse Links header with requests, use urljoin to join root URL
steinitzu 92f4de0
Rename per_page -> items_per_page
steinitzu d970e90
Add created_at_min option
steinitzu 71746c6
Update example with min_created_at
steinitzu File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
from typing import Union, Optional | ||
from datetime import datetime, date # noqa: I251 | ||
|
||
from dlt.common import pendulum | ||
from dlt.common.time import parse_iso_like_datetime | ||
|
||
|
||
TAnyDateTime = Union[pendulum.DateTime, pendulum.Date, datetime, date, str] | ||
|
||
|
||
def ensure_pendulum_datetime(value: TAnyDateTime) -> pendulum.DateTime: | ||
"""Coerce a date/time value to a `pendulum.DateTime` object. | ||
|
||
UTC is assumed if the value is not timezone aware. | ||
|
||
Args: | ||
value: The value to coerce. Can be a pendulum.DateTime, pendulum.Date, datetime, date or iso date/time str. | ||
|
||
Returns: | ||
A timezone aware pendulum.DateTime object. | ||
""" | ||
if isinstance(value, datetime): | ||
# both py datetime and pendulum datetime are handled here | ||
ret = pendulum.instance(value) | ||
if ret.tz is None: | ||
return ret.in_tz("UTC") | ||
return ret | ||
elif isinstance(value, date): | ||
return pendulum.datetime(value.year, value.month, value.day) | ||
elif isinstance(value, str): | ||
result = parse_iso_like_datetime(value) | ||
if not isinstance(result, datetime): # TODO: iso date parses to date object | ||
return pendulum.datetime(result.year, result.month, result.day) | ||
return result | ||
raise TypeError(f"Cannot coerce {value} to a pendulum.DateTime object.") |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah interesting question if we should set
created_at_min=start_date
as we did in the previous version.@adrianbr any opinions? if not, I'd set it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better if we use a separate
min_created_at
argument in the source I think.It's a little problematic with end value loading to use the same start date, because an item's created and updated can each be in range of different chunks so they'd never get loaded even if it should be in the total range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right. we need to handle backloading differently.
created_at_min
andcreated_at_max
to select rangecreated_at_min
(set to start_date - which is also initial_value) andupdated_at_min
set to last_valuethere's a grey are here: should we also track updates of records from backloaded range? in case of the above it is not the case. we'd need another parameter to set created_at and updated_at separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah I'd say it makes sense to load updates from backloaded range too.
What is the idea behind the created_at cutoff originally?
I'm struggling with seeing the usefulness in it beyond just filtering by
updated_at
. I'd think old orders you want to ignore aren't updated anyway usually.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are always getting the most recent state of the entity and update/create just filter which entities you want to get. so there's no way IMO to get previous state of it. anyway. I kind of agree with what you say.
updated_at_min
andupdated_at_max
(this one you need to add)created_at_min
to cutoff the entities created before that date. the argument is optional and defaults toFIRST_DAY_OF_MILLENNIUM
the typical usage would be like in the
incremental_load_with_backloading
example: backloading and then incremental load. all based on updated_at.On top of that we give possibility to filter out all records that were created before initial value of
current_start_date
in the demo. so theupdated_at_min
argument is always set to it, during backloading and on incremental load.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes a lot of sense, and I think it's more clear with the separate argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added
created_at_min
arg now and test for it.updated_at_max
is used now too.