Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieve TTL of cached documents #2676

Merged

Conversation

mayaCostantini
Copy link
Contributor

Related Issues and Dependencies

Related to #2666

This introduces a breaking change

  • No

This should yield a new module release

  • Yes

This Pull Request implements

Implement a method to retrieve the time to live of a cached document, considering S3 bucket timezones for different environments.
This implementation is based on the current documents caching time of 2 hours.

@sesheta sesheta added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 9, 2022
Copy link
Member

@VannTen VannTen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm going against the codebases convention here. feel free to point out 👍

Comment on lines 106 to 109
except botocore.exceptions.ClientError as exc:
if exc.response["Error"]["Code"] in ("404", "NoSuchKey"):
raise NotFoundError("Failed to retrieve object, object {!r} does not exist".format(object_key)) from exc
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to catch exceptions at that level ?

Since we're re-raising anyway (-> so the caller has to handle it itself), I would drop that.

Comment on lines 102 to 105
def retrieve_document_last_modification_date(self, object_key: str) -> datetime:
"""Retrieve the last modification date of a document from S3."""
try:
return self._s3.Object(self.bucket, f"{self.prefix}{object_key}").get()["LastModified"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the previous comment would make this a single line, maybe we could just drop the method and have the caller use ['LastModified'] on the object ?

Comment on lines 46 to 47
# No timezone is indicated to calculate the present datetime on purpose to get the correct time delta
# regardless of the environment where the data is retrieved from (production or stage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# No timezone is indicated to calculate the present datetime on purpose to get the correct time delta
# regardless of the environment where the data is retrieved from (production or stage)
# Uses UTC time to be environment agnostic (no timezone)

I don't think we should point out specifics environments, thoth could be deployed on others.

Comment on lines 102 to 104
def retrieve_document_last_modification_date(self, object_key: str) -> datetime:
"""Retrieve the last modification date of a document from S3."""
return self._s3.Object(self.bucket, f"{self.prefix}{object_key}").get()["LastModified"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should probably not have a method for something so specific.

What about

Suggested change
def retrieve_document_last_modification_date(self, object_key: str) -> datetime:
"""Retrieve the last modification date of a document from S3."""
return self._s3.Object(self.bucket, f"{self.prefix}{object_key}").get()["LastModified"]
def retrieve_document_attr(self, object_key: str, attr: str) -> datetime:
"""Retrieve the given attribute of a document from S3."""
return self._s3.Object(self.bucket, f"{self.prefix}{object_key}").get()[str]

and calling that with "LastModified" ?

Maybe I'm worried over nothing. I'm not sure of the correct ratio specialize/generalize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(leaving the method out completely like I initially suggested does not seem practical)

@mayaCostantini
Copy link
Contributor Author

@VannTen is this ready to be merged?

@VannTen
Copy link
Member

VannTen commented Aug 29, 2022 via email

@mayaCostantini
Copy link
Contributor Author

@VannTen thanks for the reviews, changes added 👍

@sesheta
Copy link
Member

sesheta commented Aug 29, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: VannTen
To complete the pull request process, please ask for approval from mayacostantini after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mayaCostantini mayaCostantini merged commit 3ddad86 into thoth-station:master Aug 29, 2022
@mayaCostantini mayaCostantini deleted the ttl-cached-results branch August 29, 2022 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants