Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple error messages when failing fast #205

Closed
gadomski opened this issue Jul 23, 2024 · 10 comments · Fixed by #206
Closed

Multiple error messages when failing fast #205

gadomski opened this issue Jul 23, 2024 · 10 comments · Fixed by #206
Assignees
Labels
bug Something isn't working

Comments

@gadomski
Copy link
Member

          @gadomski the only concern I have is that, although it prints only one stack trace, the number of ERROR: lines can vary, and it's unclear why they might sometimes be the same. Here is the output I got with the fix:
$ stac-asset -v DEBUG download --fail-fast --path-template '${satl:outcome_id}' --http-timeout 5 --max-concurrent-downloads 5
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
10/13:   0%|▌                                                                                                                                              | 2.02M/468M [00:01<00:54, 8.99MB/s, 4 errors, 23 skips]Traceback (most recent call last):
  File "/home/denis/git/stac-asset/env/bin/stac-asset", line 8, in <module>
    sys.exit(cli())
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/denis/git/stac-asset/src/stac_asset/_cli.py", line 188, in download
    asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/denis/git/stac-asset/src/stac_asset/_cli.py", line 298, in download_async
    output = await download()
  File "/home/denis/git/stac-asset/src/stac_asset/_cli.py", line 281, in download
    return await _functions.download_item_collection(
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 355, in download_item_collection
    await downloads.download(messages)
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 152, in download
    raise error
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 145, in download
    results = await asyncio.gather(*tasks)
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 176, in download_with_lock
    return await download.download(messages=messages)
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 59, in download
    raise error
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 49, in download
    await download_asset(
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 431, in download_asset
    raise error
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 419, in download_asset
    await client.download_href(
  File "/home/denis/git/stac-asset/src/stac_asset/client.py", line 120, in download_href
    raise err
  File "/home/denis/git/stac-asset/src/stac_asset/client.py", line 102, in download_href
    async for chunk in self.open_href(
  File "/home/denis/git/stac-asset/src/stac_asset/client.py", line 78, in open_href
    async for chunk in self.open_url(
  File "/home/denis/git/stac-asset/src/stac_asset/http_client.py", line 165, in open_url
    async for chunk, _ in response.content.iter_chunks():
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/streams.py", line 66, in __anext__
    rv = await self._stream.readchunk()
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/streams.py", line 441, in readchunk
    await self._wait("readchunk")
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
10/13:   0%|▌                                                                                                                                              | 2.02M/468M [00:01<04:01, 2.02MB/s, 4 errors, 23 skips]

Originally posted by @drnextgis in #204 (comment)

@gadomski gadomski self-assigned this Jul 23, 2024
@gadomski gadomski added the bug Something isn't working label Jul 23, 2024
@gadomski
Copy link
Member Author

@drnextgis this one's trickier to reproduce — I was using the following to reproduce #201:

$ stac-asset download --fail-fast --max-concurrent-downloads 5 https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2alb-ta/items/LC09_L2SP_070011_20240717_20240719_02_A1_TOA 
ERROR: LC09_L2SP_070011_20240717_20240719_02_A1_TOA[MTL.json] - ContentTypeError: the actual content type does not match the expected: actual=text/html, expected=application/json 
2/7: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.96k/1.96k [00:00<00:00, 8.46kB/s, 1 errors, 0 skips]Traceback (most recent call last):
-- >8 --

This only produces one error message. Do you have an example that hits a public API that I can use as a reproducer?

@ircwaves
Copy link
Member

Looks like throwing the --http-timeout 5 in there, means that all tasks fail near simultaneously.

@gadomski
Copy link
Member Author

@drnextgis I still can't reproduce, even after using --http-timeout to trigger an error:

$ stac search https://planetarycomputer.microsoft.com/api/stac/v1 -c sentinel-2-l2a --max-items 1 | stac-asset download --http-timeout 1 --max-concurrent-downloads 5 --fail-fast - > /dev/null
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[B01] - TimeoutError:                                                                                                                                                                                                                            
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[AOT] - TimeoutError:                                                                                                                                                                                                                            
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[B02] - TimeoutError:                                                                                                                                                                                                                            
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[B04] - TimeoutError:                                                                                                                                                                                                                            
4/6:   3%|██████▊                                                                                                                                                                                                                                      | 668k/22.7M [00:00<00:12, 1.82MB/s, 4 errors, 0 skips]Traceback (most recent call last):
  File "/Users/gadomski/Code/stac-asset/.venv/bin/stac-asset", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_cli.py", line 188, in download
    asyncio.run(
  File "/usr/local/Cellar/[email protected]/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_cli.py", line 298, in download_async
    output = await download()
             ^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_cli.py", line 281, in download
    return await _functions.download_item_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 355, in download_item_collection
    await downloads.download(messages)
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 152, in download
    raise error
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 145, in download
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 176, in download_with_lock
    return await download.download(messages=messages)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 59, in download
    raise error
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 49, in download
    await download_asset(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 431, in download_asset
    raise error
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 419, in download_asset
    await client.download_href(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/client.py", line 120, in download_href
    raise err
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/client.py", line 102, in download_href
    async for chunk in self.open_href(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/client.py", line 78, in open_href
    async for chunk in self.open_url(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/planetary_computer_client.py", line 96, in open_url
    async for chunk in super().open_url(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/http_client.py", line 165, in open_url
    async for chunk, _ in response.content.iter_chunks():
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 66, in __anext__
    rv = await self._stream.readchunk()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 441, in readchunk
    await self._wait("readchunk")
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None
TimeoutError
4/6:   3%|██████▊

There are a couple of ERROR: lines but I think that's fine, it's not suprising to me that more than one task got to fire off it's error message to the queue before they get cancelled:

if messages:
await messages.put(
ErrorAssetDownload(key=key, href=href, path=path, error=error)
)
raise error

The only piece I'm not sure about is the duplicate asset keys in your error messages — I can't reproduce those, and it's strange to me.

@drnextgis
Copy link
Contributor

I'll take a closer look at this issue today.

@drnextgis
Copy link
Contributor

drnextgis commented Jul 27, 2024

I've added some progress_bar.write statements to _cli.py and config.py:

diff --git a/src/stac_asset/_cli.py b/src/stac_asset/_cli.py
index 69e196d..3e10d04 100644
--- a/src/stac_asset/_cli.py
+++ b/src/stac_asset/_cli.py
@@ -380,6 +380,8 @@ async def report_progress(messages: Optional[MessageQueue]) -> None:
                 name = f"{owners[message.key]}[{message.key}]"
             else:
                 name = f"[{message.key}]"
+            progress_bar.write(json.dumps(owners))
+            progress_bar.write(str(message.path).split("/")[-1])
             progress_bar.write(
                 f"ERROR: {name} - {type(message.error).__name__}: {message.error}",
                 file=sys.stderr,
diff --git a/src/stac_asset/config.py b/src/stac_asset/config.py
index 9a85fae..d19775c 100644
--- a/src/stac_asset/config.py
+++ b/src/stac_asset/config.py
@@ -12,7 +12,7 @@ DEFAULT_S3_REGION_NAME = "us-west-2"
 DEFAULT_S3_RETRY_MODE = "adaptive"
 DEFAULT_S3_MAX_ATTEMPTS = 10
 DEFAULT_HTTP_CLIENT_TIMEOUT = 300
-DEFAULT_HTTP_MAX_ATTEMPTS = 10
+DEFAULT_HTTP_MAX_ATTEMPTS = 1
 
 
 @dataclass

Here's the output:

{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082256950303Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                          
{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082257077301Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                          
{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082257091447Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                          
{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082256964448Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError: 

The assets reported as failing by the CLI do not match the actual assets that are failing. It seems like the issue might be with the owners value, which remains consistent across all failures.

@gadomski does it ring any bells?

Here is the command I use:

$ stac-asset -v DEBUG download --fail-fast --path-template '${satl:outcome_id}' --http-timeout 5 --max-concurrent-downloads 10

@drnextgis
Copy link
Contributor

drnextgis commented Jul 27, 2024

From what I understand, the owners dictionary tracks information about the most recently started downloads (StartAssetDownload). At the time of a failure, owners contains an item with the same key as the failed asset (since all the items I’m downloading have identical set of asset name), but its value points to a different STAC item's ID.

@drnextgis
Copy link
Contributor

One straightforward approach to address the issue would be to:

  • Introduce an owner_id attribute to ErrorAssetDownload
  • Eliminate the owners attribute and use something like:
    name = f"{message.owner_id}[{message.key}]"
    instead of the current implementation found here.

@gadomski
Copy link
Member Author

@drnextgis thanks for digging in, I think I've got a fix in #206 ... can you check on your side?

@drnextgis
Copy link
Contributor

It did the trick, thank you @gadomski! Looking forward for the new release.

@gadomski
Copy link
Member Author

https://github.com/stac-utils/stac-asset/releases/tag/v0.4.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants