Response timeout issues in the AWS SDK for Android #3492

Open
jflmateusnassar opened this issue Dec 1, 2023 · 20 comments
Labels
feature-request Request a new feature pending-community-response Issue is pending response from the issue requestor

Comments

@jflmateusnassar

jflmateusnassar commented Dec 1, 2023

Describe the bug
I ran into a problem in the AWS SDK for Android whose cause took me a long time to understand. Today, after a few years, I found out how to reproduce it.
The problem occurs when the smartphone is connected to a Wi-Fi network that has no internet access, but Android has not detected that the network has a problem.
The bug is that any request I make through the SDK to fetch data (Cognito, IoT Core, DynamoDB, etc.) never returns, and the application simply freezes.

To Reproduce
Connect the Android device to a Wi-Fi network that has no internet connection, in such a way that Android cannot detect that there is a problem with the Wi-Fi. To simulate the problem here I created a rule in my router to block packets. Android takes about 5 minutes to detect that there is a problem with the Wi-Fi network, and that is the window in which all the problems occur.
An example of a request that causes the problem is:

userPool.getCurrentUser().getDetailsInBackground(handler);

When invoking this method, nothing is returned (no error, no success, no failure, no timeout). Debugging inside this method, I found out that it invokes the method:

private CognitoUserDetails getUserDetailsInternal(CognitoUserSession session) {
    if (session != null && session.isValid()) {
        final GetUserRequest getUserRequest = new GetUserRequest();
        getUserRequest.setAccessToken(session.getAccessToken().getJWTToken());
        final GetUserResult userResult = cognitoIdentityProviderClient.getUser(getUserRequest);

        return new CognitoUserDetails(new CognitoUserAttributes(userResult.getUserAttributes()),
                new CognitoUserSettings(userResult.getMFAOptions()));
    } else {
        throw new CognitoNotAuthorizedException("user is not authenticated");
    }
}

The exact point where the application freezes is in:

GetUserResult userResult = cognitoIdentityProviderClient.getUser(getUserRequest);

Which AWS service(s) are affected?
I use Cognito, IoT Core, DynamoDB services and they are all affected. I think any service will be affected.

Expected behavior
The request should return a timeout error, but this does not happen.

Environment Information:

  • AWS Android SDK Version: 2.73.0
  • Device: Pixel, Simulator, Samsung S3 (any device)
  • Android Version: Android 9, 10, 11, 12 (any version)
  • Specific to simulators: No

Additional context
I need to highlight that Android needs to be on a WiFi network that does not have access to the internet, but Android cannot detect that there is a problem with the network.
When Android detects that there is a problem on the network, the SDK returns a failure (which is expected)

An important detail is also that the problem also occurs on networks that do not support IPv6

@ankpshah
Contributor

ankpshah commented Dec 4, 2023

Thanks for reporting the issue. I have the following questions:

  1. How long does the SDK take to report a failure, and how long does the device take to detect the network issue?
  2. Have you tried reducing the connection and socket timeouts and updating the retry policy using the ClientConfiguration class?
  3. You can get the current timeout values with getConnectionTimeout() and getSocketTimeout() in the ClientConfiguration class. Refer to the docs here.

@ankpshah ankpshah added the bug Something isn't working label Dec 4, 2023
@jflmateusnassar
Author

jflmateusnassar commented Dec 5, 2023

  1. I waited more than 10 minutes for a response from the SDK without receiving any return value. Then I closed the application, because it stays frozen until the smartphone detects the network problem. I tested with several devices and Android versions (including the Android Studio simulator) and they all took too long to identify the loss of internet access. In some cases the smartphone could not identify that there was a network problem at all (this happens on networks that do not support IPv6). On networks that do not support IPv6, the SDK returns nothing and freezes forever. This affects some customers who use my application on such networks: my app freezes because of this problem with the SDK.
  2. I was using the default timeout of 15000 ms, but then I discovered the ClientConfiguration class and set it to 10000 ms and then to 5000 ms. This change worked under normal conditions of use, but did not solve the problem scenario. The SDK still gave no feedback.

@ankpshah
Contributor

ankpshah commented Dec 6, 2023

Hello,
We will try to replicate this issue. Meanwhile, to make sure this doesn't block users who hit it, I suggest implementing custom timeout handling in your application. This could involve setting a timer in your app and treating the call as timed out if the SDK does not respond within the expected time frame.
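
A minimal sketch of such an application-side timer in plain Java, assuming the blocking SDK call can be passed in as a `Callable`. `TimeLimitedCall` and `CallTimedOutException` are hypothetical names used for illustration; they are not part of the AWS SDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an SDK class): bounds how long the caller waits. */
final class TimeLimitedCall {
    // Daemon threads, so a call that never returns cannot keep the process alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "time-limited-call");
        t.setDaemon(true);
        return t;
    });

    /** Thrown when the wrapped call does not finish within the deadline. */
    static final class CallTimedOutException extends RuntimeException {
        CallTimedOutException(String message) {
            super(message);
        }
    }

    /** Runs blockingCall on a worker thread and waits at most timeoutMillis for it. */
    static <T> T call(Callable<T> blockingCall, long timeoutMillis) {
        Future<T> future = POOL.submit(blockingCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: this interrupts the worker, which may not
            // unblock a thread that is stuck in socket I/O.
            future.cancel(true);
            throw new CallTimedOutException("no response within " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new CallTimedOutException("interrupted while waiting");
        } catch (ExecutionException e) {
            // The wrapped call itself failed; surface its cause.
            throw new RuntimeException(e.getCause());
        }
    }
}
```

The helper blocks its caller for up to the timeout, so on Android it would have to be invoked off the main thread. It only bounds the wait: the worker thread can stay blocked inside the underlying call after the timeout is reported.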

@jflmateusnassar
Author

Hello,
Ok, I'll be waiting.
I will check about your suggestion of implementing custom timeout handling.
Thanks @ankpshah

@jflmateusnassar
Author

Hello @ankpshah

Regarding the custom timeout in my app, it is not worth implementing. Because in the IPv6 situation, it will solve the problem of freezing the application, but my client will never be able to login to the application.
It is not worth implementing.

@jflmateusnassar
Author

Any news about this bug?

@ankpshah
Contributor

Hello,
I was not able to replicate this behavior. This is the setup I used to try to replicate the scenario:

  1. I used a proxy and throttled the bandwidth to 0 kbps (to replicate Wi-Fi that is not connected to the internet). Note that the device still showed the Wi-Fi connection without the exclamation symbol, indicating that it still assumed the network was available. I also verified that the network was unavailable by browsing to a web page, which did not load. I think this mimics the scenario appropriately.
  2. I tried calling getDetailsInBackground(), which throws an error immediately if no cached user session is available:
    com.amazonaws.mobileconnectors.cognitoidentityprovider.exceptions.CognitoNotAuthorizedException: User-ID is null

Otherwise it returns the cached user details.

At no point does the application freeze or go into an ANR state. I also verified that the HTTP client uses the backoff strategy configured in the ClientConfiguration class.

As I am unable to replicate the issue, I need to understand exactly where the application freezes. It was mentioned earlier that it freezes at cognitoIdentityProviderClient.getUser(), but there are more calls within this method.

  1. @jflmateusnassar can you try to step into the method calls within the getUser() call and let us know which file and line it freezes at?
  2. Also, can you place a breakpoint at AmazonHttpClient.java L295 executeHelper() and step through the lines of this method one by one to see whether it freezes at any point there?

@ankpshah ankpshah added the pending-community-response Issue is pending response from the issue requestor label Dec 28, 2023
@jflmateusnassar
Author

jflmateusnassar commented Jan 8, 2024

I did what you asked me to do. I placed the debug points and the application freezes in the executeHelper() method on the line

httpResponse = httpClient.execute(httpRequest);

Here's the video where I do the debugging.

[Video attachment: 20240108_085035.mp4]

Below is also a photo of what my app looks like when trying to log in.
[Image attachment: 1704715100151]

@jflmateusnassar
Author

jflmateusnassar commented Jan 9, 2024

I tested limiting the network to 0 kbps and everything really does work as it should.
To simulate the problem, internet traffic has to be blocked.
I did some more debugging here and came across a problem in the Android HTTP request classes.
It freezes on the line shown in the image below.
[Image attachment: Capturar]

The application is running, as shown in the image, but it is frozen forever.

@tylerjroach
Member

@jflmateusnassar In reading this:

"To simulate the problem here I created a rule in my router to block packets"

Are you blocking writes out from the device, but possibly not incoming data?

Android/Java's HttpsURLConnection allows connectTimeout and readTimeout values to be set.

  • connect timeout = fires if establishing the connection takes longer than X.
  • read timeout = fires if a read from the input stream of an established connection takes longer than X.

You may have created a scenario where the device is allowed to make a connection to the URL, and potentially read from it, but waits indefinitely when trying to write to the connection (connection.getOutputStream()).
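
To make the two timeouts concrete, here is a minimal sketch in plain Java (not the SDK's own HTTP client code). `UrlConnectionTimeouts` is a hypothetical name, and the local "silent" server is an assumption made only so the example runs without internet access. The class exposes setters for the connect and read phases only; there is no corresponding setter for the write phase.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

/** Hypothetical demo class: shows the connect and read timeouts in isolation. */
final class UrlConnectionTimeouts {

    /** Performs one GET and returns "ok:<status>", "timeout" or "error:<type>". */
    static String get(String url, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(connectTimeoutMs); // bounds connection setup
            connection.setReadTimeout(readTimeoutMs);       // bounds each blocking read
            return "ok:" + connection.getResponseCode();
        } catch (SocketTimeoutException e) {
            return "timeout";
        } catch (IOException e) {
            return "error:" + e.getClass().getSimpleName();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    /** Runs get() against a local server that accepts connections but never answers. */
    static String demoAgainstSilentServer() {
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            return get(url, 1000, 500);
        } catch (IOException e) {
            return "error:" + e.getClass().getSimpleName();
        }
    }
}
```

In this setup the connection is established and the request is sent, so the call should end through the read timeout. A request that blocks before the read phase would not be covered by either setting.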

A custom timer task on our end could potentially help to resolve this, but I want to understand the real-world scenarios in which this occurs, as the forced test case feels like a significant edge case.

Can you provide more info on how often you are seeing this happen in your app? Do you believe that in these scenarios the customer may be behind a captive portal (e.g. a hotel or airplane internet paywall)?

@jflmateusnassar
Author

jflmateusnassar commented Jan 9, 2024

Connect timeout and read timeout are set, but are completely ignored in this scenario.
I'm looking for a solution to this problem, as this same problem occurs when clients use IPv6 and have some type of network blocking.
Most of my app complaints are related to this freezing.
My app has more than 100k downloads.
The idea would be to have a timer in the SDK for this type of situation.

@tylerjroach
Member

Can you elaborate on what you are referring to by "freezing"?

"Regarding the custom timeout in my app, it is not worth implementing. Because in the IPv6 situation, it will solve the problem of freezing the application, but my client will never be able to login to the application."

Any of these calls should run in the background and not freeze the UI. I see in your screenshot that you are just showing a white screen. You should be able to continue updating the UI however you think is best for your application (possibly letting the customer know that they seem to have no internet access).

@jflmateusnassar
Author

jflmateusnassar commented Jan 9, 2024

When I make any request to the SDK (logging in with Cognito, sending data to IoT Core, accessing DynamoDB) the SDK never returns a value.
Everyone is stuck on an infinite loading screen.
The blank screen I sent is the splash screen (when I run the app through Android Studio it does not even appear; when running outside Android Studio the app stays on the splash screen).
When I open the app while already logged in, my credentials are checked against Cognito, which is why it is frozen on a blank screen.

I could implement a custom timeout to close the splash screen and dismiss the loading indicators with a network error, but I opened this issue to see what else could be done or whether you can do something about it.

@tylerjroach
Member

Thanks for the context on the splash screen.

We can continue to look at what a reasonable solution would be here, but from what I'm gathering, this would be a writeTimeout. I do not think it would be safe to use either of the other 2 timeouts in our ClientConfig (especially since the change could alter current expected behavior). The implementation of this config would require additional reviews on our end since it expands the API surface of ClientConfig.

If setting a timeout on your end is reasonable, I would suggest doing so for now. I can update this ticket if any changes are pushed on our end that would help your scenario.

@jflmateusnassar
Author

jflmateusnassar commented Jan 10, 2024

OK, I got it. I will implement a custom timeout.

Now, to close this case, I would like to know why in the Android SDK most of my customers who use IPv6 have this same problem. Those who use IPv4 do not have this problem.
Is the handling in the SDK for IPv4 different from IPv6?

I have the same application for the iOS platform that uses the SDK and works in any situation, both IPv6 and IPv4. Never had a problem.
Why does it work on iOS and not on Android?

The real reason I opened this issue is this IPv6 error that many of my customers have.

Is there any way to force the use of IPv4 only? (if yes, I believe this would solve all my problems)

@tylerjroach
Member

@jflmateusnassar We do not have any special logic in handling IPv4 and IPv6. That is entirely done by Android's default network stack. I want to make sure I understand the scenario you believe is happening, correct me if I am wrong.

You believe that IPv6 + Android SDK are not working correctly for many customers that have full internet access (send/receive)?

We've been looking at this replication scenario where the customer is unable to send packets with speculation their internet may be blocked in some way (ex captive portal).

In the first message you stated "An important detail is also that the problem also occurs on networks that do not support IPv6", but the last message says "Those who use IPv4 do not have this problem".

I'm not sure about the differences between Android and iOS, except that they have entirely different network stacks. We are using HttpsURLConnection on Android, which is the default network stack for Android. Our newer Amplify v2 and the Kotlin SDK use OkHttp instead.

@jflmateusnassar
Author

jflmateusnassar commented Jan 10, 2024

I can't tell you that customers have full access to the internet. This always depends on the customer's internet provider and I don't have access. But I believe there is some blocking, but not a total blocking of the network.

About this statement:

"An important detail is also that the problem also occurs on networks that do not support IPv6"

When I wrote the message I still didn't understand correctly how the problem occurred with IPv6, but now I managed to discover that the problem always occurs when IPv6 is supported and there is probably some type of blocking. This I believe is the correct scenario.
I apologize for not informing you earlier.

Note: When I say that the customer does not have a total internet block, I mean that they can access websites normally on the network using IPv6. But our app freezes.

@jflmateusnassar
Author

jflmateusnassar commented Jan 12, 2024

I did a test with the custom timeout implementation, and it solved the freezing problem.
But other problems arise that I'm not happy with.

Every time I try to log in with Cognito, the call gets stuck in the executeHelper() method. If I try 10 times, it gets stuck 10 times, consuming the smartphone's memory unnecessarily.

For this reason I think it would be better to either:

  1. Create a way to cancel the method call, to avoid this other problem.
  2. Or create a configurable timeout in the SDK that applies to every request in executeHelper(). Anyone who needs it uses this timeout; anyone who doesn't need it doesn't. When the timeout expires, the SDK itself cancels the call, freeing up resources.

I like option 2 more.

I believe it is possible to do this without too many problems.
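
To illustrate what I mean by option 2, here is a rough sketch in plain Java using a raw `java.net.Socket`, not the SDK's HTTP client. `DeadlineSocketRead` is a hypothetical name, and whether the same approach can be applied inside executeHelper() is an assumption I have not verified. It relies on the documented behavior of `Socket.close()`: a thread blocked in I/O on that socket gets a `SocketException`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class: a deadline that releases a thread stuck in socket I/O. */
final class DeadlineSocketRead {
    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "deadline-watchdog");
                t.setDaemon(true);
                return t;
            });

    /**
     * Reads one byte. Returns "data" or "eof" on a normal read, and "aborted"
     * on any I/O failure, including the socket being closed by the deadline.
     */
    static String readWithDeadline(Socket socket, long deadlineMs) {
        ScheduledFuture<?> killer = WATCHDOG.schedule(() -> {
            try {
                socket.close(); // wakes up a thread blocked in I/O on this socket
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
        }, deadlineMs, TimeUnit.MILLISECONDS);
        try {
            InputStream in = socket.getInputStream();
            return in.read() == -1 ? "eof" : "data";
        } catch (IOException e) {
            return "aborted";
        } finally {
            killer.cancel(false); // the read finished first; drop the pending deadline
        }
    }

    /** Connects to a local server that never sends anything, so the read blocks. */
    static String demo() {
        try (ServerSocket silent = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", silent.getLocalPort())) {
            return readWithDeadline(client, 300);
        } catch (IOException e) {
            return "error:" + e.getClass().getSimpleName();
        }
    }
}
```

Unlike a timer that only stops waiting, this releases the blocked thread and the socket itself, so repeated attempts would not pile up stuck calls.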

@tylerjroach
Member

@jflmateusnassar I agree, it is not optimal for these calls to be left hanging in the SDK. I will mark this as a task for our team to look into further and will update this ticket when we have news to share.

Thank you for your help in investigating this.

@jflmateusnassar
Author

@tylerjroach Ok, thank you very much for your help and information.

@tylerjroach tylerjroach added feature-request Request a new feature and removed bug Something isn't working labels Feb 2, 2024

3 participants