Only log that the scan is complete one time for s3 scan #3168

Merged (1 commit) on Aug 16, 2023

graytaylor0
Member

Description

Track when the single scan is complete so that the "scan has already been completed" message is logged only once.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


if (!(Boolean) globalStateMap.get(SINGLE_SCAN_COMPLETE)) {
LOG.info("Single S3 scan has already been completed");
globalStateMap.put(SINGLE_SCAN_COMPLETE, true);
Collaborator

What's the difference between SINGLE_SCAN_COMPLETE and SCAN_COUNT? Can we just use SCAN_COUNT to determine if the scan is complete?

Member Author

SCAN_COUNT could block scanning, since the global state doesn't go away, so it's better to keep a separate flag for this log.

LOG.info("Skipping scan because the buckets have already been scanned once");

if (!(Boolean) globalStateMap.get(SINGLE_SCAN_COMPLETE)) {
LOG.info("Single S3 scan has already been completed");
Contributor

Why log only during the second attempt at scanning? Why not log once the first scan is complete?

Member Author

We could log after the first scan, but with source coordination this will only get run again after all the scanned objects have been processed, so as far as the timing of the log goes, it makes more sense to log when trying to scan again.
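For readers following the thread, here is a minimal sketch of the log-once pattern under discussion. It is not the code from this PR: a plain `HashMap` stands in for the global state map, a list stands in for the logger, and the class and method names (`SingleScanLogSketch`, `onScanAttemptAfterCompletion`) are hypothetical. Only the `SINGLE_SCAN_COMPLETE` key and the log message are taken from the diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the PR.
class SingleScanLogSketch {
    // Key name mirrors the identifier shown in the diff.
    static final String SINGLE_SCAN_COMPLETE = "SINGLE_SCAN_COMPLETE";

    // Assumption: a local HashMap stands in for the shared global state map.
    private final Map<String, Object> globalStateMap = new HashMap<>();
    // Assumption: messages are collected in a list instead of going to a real logger.
    private final List<String> loggedMessages = new ArrayList<>();

    SingleScanLogSketch() {
        // The flag starts as false, so the first repeated attempt will log.
        globalStateMap.put(SINGLE_SCAN_COMPLETE, false);
    }

    // Called on every scan attempt that happens after the single scan has finished.
    void onScanAttemptAfterCompletion() {
        if (!(Boolean) globalStateMap.get(SINGLE_SCAN_COMPLETE)) {
            loggedMessages.add("Single S3 scan has already been completed");
            // Flip the flag so later attempts stay silent.
            globalStateMap.put(SINGLE_SCAN_COMPLETE, true);
        }
    }

    List<String> loggedMessages() {
        return loggedMessages;
    }

    public static void main(String[] args) {
        SingleScanLogSketch sketch = new SingleScanLogSketch();
        for (int attempt = 0; attempt < 3; attempt++) {
            sketch.onScanAttemptAfterCompletion();
        }
        // Print how many messages were recorded across the three attempts.
        System.out.println(sketch.loggedMessages().size());
    }
}
```

Keeping the flag separate from any scan counter means the log decision does not touch the state that gates scanning. Note that a `get` followed by a `put` is not atomic, so in this sketch two racing callers could each log once; whether that matters in the real source depends on how the global state is coordinated, which the thread does not spell out.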

@graytaylor0 graytaylor0 merged commit b0e5006 into opensearch-project:main Aug 16, 2023
24 checks passed
3 participants