
Compute Index Workflow of Sentinel Reprocesses Data on Rerun with Extended Time Range Instead of Utilizing Cached Data #196

Open
click2cloud-SanchitG opened this issue Aug 29, 2024 · 1 comment
Labels: bug (Something isn't working), local cluster (Issues encountered in local cluster), triage (Issues still not triaged by team), workflows (Issues encountered when running workflows)

Comments


click2cloud-SanchitG commented Aug 29, 2024

In which step did you encounter the bug?

Workflow execution

Are you using a local or a remote (AKS) FarmVibes.AI cluster?

Local cluster

Bug description


Issue: Compute Index Workflow of Sentinel Reprocesses Data on Rerun with Extended Time Range Instead of Utilizing Cached Data

Link to the notebook: Notebook Link


Workflow File: spaceeye_index-Sanchit.zip (attached)

1. Scenario 1:

  • Time Range: (datetime(2021, 1, 1), datetime(2021, 3, 3))
  • Workflow File: spaceeye_index-Sanchit.yaml (attached)
  • Run Name: SpaceEye and NDVI Timelapse 2021
  • Duration: 01:11:54
  • Screenshot: (attached)


2. Scenario 2:

  • Time Range: (datetime(2021, 1, 1), datetime(2021, 3, 20))
  • Workflow File: spaceeye_index-Sanchit.yaml (attached)
  • Run Name: SpaceEye and NDVI Timelapse 2021
  • Duration: 00:58:59
  • Screenshot: (attached)


Observation:

When running the Compute Index workflow for the first time over a specific time range, it processes the data and stores it in the cache. However, when the workflow is run a second time with an extended time range, it starts reprocessing all the data from scratch instead of utilizing the previously cached data.
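
One way to confirm which tasks actually re-execute on the second run is to compare per-task timings between the two runs. This is only a sketch: it assumes the vibe_core client exposes list_runs() / get_run_by_id() and that run objects expose per-task task_details with start and end timestamps; exact attribute names may differ across FarmVibes.AI versions. Tasks served from the cache should show near-zero elapsed time.

```python
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

# Assumption: list_runs() returns the ids of previously submitted runs and
# get_run_by_id() returns a VibeWorkflowRun with per-task details.
for run_id in client.list_runs():
    run = client.get_run_by_id(run_id)
    if run.name != "SpaceEye and NDVI Timelapse 2021":
        continue
    print(f"Run {run.id} ({run.name}):")
    # Assumption: task_details maps task name -> details with start/end times.
    for task_name, details in run.task_details.items():
        if details.start_time and details.end_time:
            elapsed = details.end_time - details.start_time
        else:
            elapsed = None
        # Tasks that hit the cache should complete almost instantly on the rerun.
        print(f"  {task_name}: status={details.status}, elapsed={elapsed}")
```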

Problem:

The Compute Index workflow does not appear to leverage cached data on rerun. Instead of reusing the results from the initial run, it processes all data again, which increases runtime. This is particularly problematic because the workflow runs weekly and is already deployed on the customer's side.


Steps to reproduce the problem

  1. Trigger the SpaceEye and NDVI Timelapse 2021 workflow with the time_range and wf_dict from Scenario 1 (see the client-call sketch after these steps).
  2. Note the duration.
  3. Extend the time_range and rerun the workflow with the wf_dict from Scenario 2.
  4. Compare the durations.
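
For reference, a minimal sketch of how the two runs are triggered from the notebook, assuming the standard vibe_core client API; the geometry below is a placeholder, and the real area of interest and workflow definition come from the attached notebook and spaceeye_index-Sanchit.yaml:

```python
from datetime import datetime

import yaml
from shapely import geometry as shpg
from vibe_core.client import get_default_vibe_client

# Custom workflow definition from the attached spaceeye_index-Sanchit.yaml.
with open("spaceeye_index-Sanchit.yaml") as f:
    wf_dict = yaml.safe_load(f)

# Placeholder area of interest; the actual geometry comes from the notebook.
geometry = shpg.box(-88.1, 42.0, -88.0, 42.1)

client = get_default_vibe_client()

# Scenario 1: initial run (processes the data and populates the cache).
run1 = client.run(
    wf_dict,
    "SpaceEye and NDVI Timelapse 2021",
    geometry=geometry,
    time_range=(datetime(2021, 1, 1), datetime(2021, 3, 3)),
)
run1.block_until_complete()

# Scenario 2: rerun with the time range extended to March 20.
run2 = client.run(
    wf_dict,
    "SpaceEye and NDVI Timelapse 2021",
    geometry=geometry,
    time_range=(datetime(2021, 1, 1), datetime(2021, 3, 20)),
)
run2.block_until_complete()
```

The only difference between the two calls is the end of time_range; the run name, geometry, and workflow definition are identical.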

Expected Behavior:

The workflow in Scenario 2 should complete in significantly less time, roughly proportional to the 17 additional days, by reusing the cached data from the initial run.
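
A rough back-of-the-envelope estimate of what full cache reuse would imply, assuming runtime scales roughly linearly with the number of days processed and ignoring fixed overheads and any window-based recomputation that SpaceEye may require:

```python
from datetime import datetime

# Scenario 1: Jan 1 - Mar 3, 2021 (61 days) took ~71.9 minutes.
days_initial = (datetime(2021, 3, 3) - datetime(2021, 1, 1)).days  # 61
minutes_initial = 71 + 54 / 60                                      # ~71.9

# Scenario 2 extends the range to Mar 20, adding 17 days.
days_added = (datetime(2021, 3, 20) - datetime(2021, 3, 3)).days    # 17

# With full cache reuse, only the added days should require processing.
per_day_minutes = minutes_initial / days_initial
expected_rerun_minutes = days_added * per_day_minutes
print(f"Expected rerun duration with cache reuse: ~{expected_rerun_minutes:.0f} min")
# -> roughly 20 minutes, versus the ~59 minutes observed in Scenario 2.
```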

Environment:

  • FarmVibes.AI
  • Python Version: 3.11
  • Operating System: Ubuntu (cluster environment)

Questions:

  1. Why does the workflow take nearly the same duration for the extended time range as it did for the initial range, despite having cached data from the first run?
  2. Why does the workflow not reuse the cached data and process only the additional days?

Please check this issue as soon as possible, as our customers are expecting a resolution.

Thanks & Regards,

Sanchit


click2cloud-SanchitG added the bug label on Aug 29, 2024
github-actions bot added the local cluster, workflows, and triage labels on Aug 29, 2024