S3 slow down and connection refused problems #7319

Open
txau opened this issue Oct 7, 2024 · 3 comments

Comments

@txau
Collaborator

txau commented Oct 7, 2024

I believe the intermittent S3 timeout errors we are seeing are caused by S3's request rate limits.

If there is a fast spike in the request rate for objects in a prefix, Amazon S3 might return 503 Slow Down errors while it scales in the background to handle the increased request rate. To avoid these errors, you can configure your application to gradually increase the request rate and retry failed requests using an exponential backoff algorithm [1].
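As a rough illustration of the exponential backoff mentioned above, here is a minimal TypeScript sketch. The names `backoffDelayMs` and `retryWithBackoff`, the default delays, and the use of random jitter are all assumptions made for this example; none of them come from uwazi or from the AWS SDK.

```typescript
// Hypothetical helpers for illustration only; not uwazi or AWS SDK code.

/**
 * Delay before retry number `attempt` (0-based).
 * The ceiling doubles on each attempt up to `capMs`, and the actual delay
 * is a random fraction of that ceiling so that clients do not retry in lockstep.
 */
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 5_000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

/**
 * Runs `operation`, retrying failed calls with exponential backoff.
 * Gives up after `maxAttempts` calls or when `shouldRetry` returns false,
 * and rethrows the last error in either case.
 */
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  shouldRetry: (error: unknown) => boolean = () => true,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (isLastAttempt || !shouldRetry(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A call to S3 could then be wrapped as `retryWithBackoff(() => fetchFile(key))`, where `fetchFile` stands for whatever function currently reads the object. The AWS SDK for JavaScript v3 also exposes built-in retry settings (`maxAttempts` and `retryMode`) on the client configuration, which may be a simpler first step; this issue does not say whether uwazi already sets them.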

From our latest logs, we see some slowdown errors, in the form of:

7:58 PM: 2024-10-07T19:58:33.712Z [tenant] requestId: 6529 
url: /api/files/1714673803529gdetwwu5oe9.jpg
SlowDown: UnknownError
    at throwDefaultError (/opt/uwazi/cores/core-1.187.3/node_modules/@smithy/smithy-client/dist-cjs/index.js:840:20)
    at /opt/uwazi/cores/core-1.187.3/node_modules/@smithy/smithy-client/dist-cjs/index.js:849:5
    at de_CommandError (/opt/uwazi/cores/core-1.187.3/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4743:14)

Then:

url: /api/files/17216202011795hfogmyqftu.pdf
Error: connect ECONNREFUSED IP_address:443
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1595:16)
    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -111,
  code: 'ECONNREFUSED',

We should either catch and handle these errors or put other mitigation measures in place to cope with the high request rate.
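If we go the route of catching these errors, one possible first step is a predicate that recognises the two failures from the logs above so they can be handled differently from genuine bugs. The sketch below is an assumption about how that could look: `isTransientS3Error` is a hypothetical name, and the check relies on the error `name` (`SlowDown`), the HTTP status in `$metadata` that AWS SDK v3 service errors carry, and the Node.js `code` property (`ECONNREFUSED`).

```typescript
// Hypothetical helper for illustration only; not existing uwazi code.

// Shape of the fields we inspect; real errors carry more properties.
type MaybeS3Error = {
  name?: string;
  code?: string;
  $metadata?: { httpStatusCode?: number };
};

/** Returns true for errors that are likely temporary and worth retrying. */
function isTransientS3Error(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const { name, code, $metadata } = error as MaybeS3Error;

  // S3 throttling, as in the "SlowDown: UnknownError" log entry.
  if (name === 'SlowDown' || $metadata?.httpStatusCode === 503) return true;

  // Socket-level refusal, as in the "connect ECONNREFUSED" log entry.
  return code === 'ECONNREFUSED';
}
```

Anything for which this returns `false` would still propagate as before, so unrelated failures would not be silently swallowed.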

@txau txau added the Bug 🐞 label Oct 7, 2024
@aphilop

aphilop commented Oct 11, 2024

@Jaume
Do we know what is triggering the slow down errors to fully understand the implications of this problem?

@RafaPolit
Member

  • Meet with the backend and infra team to iron out a course of action to prevent this and handle it when it happens within uwazi.


4 participants
@txau @RafaPolit @aphilop and others