Tdl 25859/handle s3 files race condition #67

Open
wants to merge 15 commits into
base: master

rdeshmukh15 commented Aug 9, 2024

Description of change

This PR addresses a race condition in S3 file extraction: when the last_modified timestamp of an S3 file is updated while an extraction is in progress, the bookmark time can advance beyond the current execution time.

Resolution:

  • Record the sync_start_time at the beginning of the extraction.
  • Check if the file's last_modified is greater than the sync_start_time.
    • If so, store the sync_start_time as the bookmark in the state.
    • Otherwise, store the file's last_modified timestamp in the state file.
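The selection rule above can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: `choose_bookmark` is a hypothetical helper name, and the timestamps are made-up, timezone-aware values. The comparison follows the diff in this PR, which uses a strict `<`, so a file modified exactly at the sync start also falls back to `sync_start_time`.

```python
from datetime import datetime, timezone


def choose_bookmark(last_modified: datetime, sync_start_time: datetime) -> str:
    """Return the ISO-8601 bookmark value for one processed file.

    Hypothetical helper for illustration; the PR performs this check inline.
    """
    # A file modified at or after the sync start may have changed mid-run,
    # so the bookmark is capped at the sync start time.
    if last_modified < sync_start_time:
        return last_modified.isoformat()
    return sync_start_time.isoformat()


# Made-up example values (assumptions, not taken from the PR).
sync_start_time = datetime(2024, 8, 9, 12, 0, tzinfo=timezone.utc)
older_file = datetime(2024, 8, 9, 11, 30, tzinfo=timezone.utc)
touched_file = datetime(2024, 8, 9, 12, 5, tzinfo=timezone.utc)

print(choose_bookmark(older_file, sync_start_time))    # 2024-08-09T11:30:00+00:00
print(choose_bookmark(touched_file, sync_start_time))  # 2024-08-09T12:00:00+00:00
```

Capping the bookmark this way means a file touched during the run is picked up again on the next run rather than skipped.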

Manual QA steps

  • Tested on the client connection.

Risks

Rollback steps

  • Revert this branch.

Comment on lines +43 to +46
if s3_file['last_modified'] < sync_start_time:
    state = singer.write_bookmark(state, table_name, 'modified_since', s3_file['last_modified'].isoformat())
else:
    state = singer.write_bookmark(state, table_name, 'modified_since', sync_start_time.isoformat())
Contributor

Please add a unit test for this change.
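One possible shape for such a test, as a rough sketch only: it assumes the branch is first pulled out into a small pure function (here the hypothetical `bookmark_value`), so that it can be exercised without S3 or `singer` state. The real test would need to target wherever the comparison actually lives in the tap.

```python
import unittest
from datetime import datetime, timedelta, timezone


def bookmark_value(last_modified, sync_start_time):
    # Hypothetical extraction of the branch under review; mirrors its
    # strict "<" comparison.
    if last_modified < sync_start_time:
        return last_modified.isoformat()
    return sync_start_time.isoformat()


class BookmarkValueTest(unittest.TestCase):
    def setUp(self):
        # Made-up sync start time used by every case.
        self.sync_start = datetime(2024, 8, 9, 12, 0, tzinfo=timezone.utc)

    def test_file_modified_before_sync_start(self):
        modified = self.sync_start - timedelta(minutes=10)
        self.assertEqual(bookmark_value(modified, self.sync_start),
                         modified.isoformat())

    def test_file_modified_during_sync(self):
        modified = self.sync_start + timedelta(minutes=10)
        self.assertEqual(bookmark_value(modified, self.sync_start),
                         self.sync_start.isoformat())

    def test_file_modified_exactly_at_sync_start(self):
        self.assertEqual(bookmark_value(self.sync_start, self.sync_start),
                         self.sync_start.isoformat())


# Run the cases without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BookmarkValueTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The three cases cover both branches plus the boundary where `last_modified` equals `sync_start_time`.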
