GH-5124 configurable http timeouts #5125

Open
wants to merge 1 commit into
base: develop

@hmottestad hmottestad commented Sep 13, 2024

GitHub issue resolved: #5124

Briefly describe the changes proposed in this PR:

  • introduced timeouts for the underlying Apache HttpClient
  • made the timeouts configurable
  • added separate, shorter timeouts for use with SPARQL SERVICE calls

PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@hmottestad
Contributor Author

hmottestad commented Sep 18, 2024

	/**
 	 * Default HTTP connection timeout in milliseconds for general use. Set to 1 hour.
 	 */
 	public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 60 * 1000; // 1 hour

 	/**
 	 * Default HTTP connection request timeout in milliseconds for general use. Set to 10 days.
 	 */
 	public static final int DEFAULT_CONNECTION_REQUEST_TIMEOUT = 10 * 24 * 60 * 60 * 1000; // 10 days

 	/**
 	 * Default HTTP socket timeout in milliseconds for general use. Set to 10 days.
 	 */
 	public static final int DEFAULT_SOCKET_TIMEOUT = 10 * 24 * 60 * 60 * 1000; // 10 days

 	// Default timeout values for SPARQL SERVICE calls

 	/**
 	 * Default HTTP connection timeout in milliseconds for SPARQL SERVICE calls. Set to 10 minutes.
 	 */
 	public static final int DEFAULT_SPARQL_CONNECTION_TIMEOUT = 10 * 60 * 1000; // 10 minutes

 	/**
 	 * Default HTTP connection request timeout in milliseconds for SPARQL SERVICE calls. Set to 6 hours.
 	 */
 	public static final int DEFAULT_SPARQL_CONNECTION_REQUEST_TIMEOUT = 6 * 60 * 60 * 1000; // 6 hours

 	/**
 	 * Default HTTP socket timeout in milliseconds for SPARQL SERVICE calls. Set to 6 hours.
 	 */
 	public static final int DEFAULT_SPARQL_SOCKET_TIMEOUT = 6 * 60 * 60 * 1000; // 6 hours
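
Because the defaults above are `int` millisecond values, it may be worth checking that each product stays below `Integer.MAX_VALUE` (2,147,483,647 ms, a little under 25 days). A minimal sketch of that arithmetic follows; `TimeoutArithmetic` and `toIntMillis` are hypothetical names used only for illustration and are not part of this PR.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the PR: converts a duration to int milliseconds
// and fails loudly instead of silently overflowing.
final class TimeoutArithmetic {

	static int toIntMillis(long amount, TimeUnit unit) {
		// Math.toIntExact throws ArithmeticException if the value does not fit in an int
		return Math.toIntExact(unit.toMillis(amount));
	}

	public static void main(String[] args) {
		System.out.println(toIntMillis(1, TimeUnit.HOURS)); // 3600000
		System.out.println(toIntMillis(10, TimeUnit.DAYS)); // 864000000
		System.out.println(toIntMillis(10, TimeUnit.MINUTES)); // 600000
		System.out.println(toIntMillis(6, TimeUnit.HOURS)); // 21600000
		// 25 days would be 2,160,000,000 ms, which no longer fits in an int,
		// so toIntMillis(25, TimeUnit.DAYS) throws ArithmeticException
	}
}
```

All six defaults fit comfortably, but a plain `int` expression such as `25 * 24 * 60 * 60 * 1000` would overflow without any warning.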

@hmottestad
Contributor Author

  • connection timeout is the amount of time the client will wait for the TCP connection to be established
  • socket timeout is the amount of time the client will wait for data to arrive on an established socket (network level)
  • connection request timeout is the amount of time the client will wait to get a connection from the pool
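
To make the distinction between the three timeouts concrete, here is a rough sketch that uses plain JDK stand-ins rather than Apache HttpClient: a `Semaphore` plays the role of the connection pool, `Socket.connect` carries the connection timeout, and `Socket.setSoTimeout` carries the socket timeout. The class `ThreeTimeoutsDemo` and its `attempt` method are hypothetical and only illustrate the semantics; they are not how the PR wires the timeouts.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustration only: JDK stand-ins for the three timeouts, not the HttpClient code path.
final class ThreeTimeoutsDemo {

	// Returns the stage that timed out: "pool", "connect", "socket", or "none".
	static String attempt(Semaphore pool, InetSocketAddress server, int poolWaitMs, int connectMs, int socketMs)
			throws IOException, InterruptedException {
		// 1. connection request timeout: wait for a free slot in the (toy) pool
		if (!pool.tryAcquire(poolWaitMs, TimeUnit.MILLISECONDS)) {
			return "pool";
		}
		try (Socket socket = new Socket()) {
			// 2. connection timeout: wait for the TCP connection to be established
			try {
				socket.connect(server, connectMs);
			} catch (SocketTimeoutException e) {
				return "connect";
			}
			// 3. socket timeout: maximum wait for data on the established connection
			socket.setSoTimeout(socketMs);
			try {
				socket.getInputStream().read();
			} catch (SocketTimeoutException e) {
				return "socket";
			}
			return "none";
		} finally {
			pool.release();
		}
	}

	public static void main(String[] args) throws Exception {
		InetAddress loopback = InetAddress.getLoopbackAddress();
		// A local server that lets clients connect but never sends any data
		try (ServerSocket silent = new ServerSocket(0, 50, loopback)) {
			InetSocketAddress address = new InetSocketAddress(loopback, silent.getLocalPort());
			System.out.println(attempt(new Semaphore(1), address, 100, 1000, 200)); // socket
			System.out.println(attempt(new Semaphore(0), address, 100, 1000, 200)); // pool
		}
	}
}
```

The first call gets a pool slot and connects, then gives up waiting for data; the second call never gets a pool slot, so it fails before any network activity happens.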

@hmottestad
Contributor Author

The socket timeout should be long enough that a complex query won't time out while waiting for a response. When a query is sent through a SERVICE call, we assume there is a remote server that we don't really control, so we don't want to wait as long as we would for a query to our own workbench/server.

The connection timeout should be long enough not to disrupt background tasks or batch requests, where we would rather wait for the server to respond than error out. We don't want it to be too long though, in case the server really is unresponsive. For SERVICE calls we want to fail a bit faster, but still give the server time to respond if it is overloaded.

Finally, the connection request timeout is how long we are willing to wait for a connection from the pool. Since we own the connection pool, we control how many connections it has and should be patient if the pool is empty. Here it's important not to break things for our users: the timeout was infinite before, so we should be fairly generous now. Since SERVICE calls are performed as part of a SPARQL query, it is more acceptable for them to time out sooner.
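
One way to picture the reasoning above is as two timeout profiles, where the SERVICE profile is stricter on every axis. The enum below is a hypothetical grouping of the six default values quoted earlier in this PR; `TimeoutProfile` and `forRequest` are illustrative names, not part of the actual change.

```java
// Hypothetical grouping of the six defaults from this PR; the enum itself is an assumption.
enum TimeoutProfile {
	GENERAL(60 * 60 * 1000, 10 * 24 * 60 * 60 * 1000, 10 * 24 * 60 * 60 * 1000),
	SPARQL_SERVICE(10 * 60 * 1000, 6 * 60 * 60 * 1000, 6 * 60 * 60 * 1000);

	final int connectionTimeout; // wait for the TCP connection
	final int connectionRequestTimeout; // wait for a connection from the pool
	final int socketTimeout; // wait for data on the socket

	TimeoutProfile(int connectionTimeout, int connectionRequestTimeout, int socketTimeout) {
		this.connectionTimeout = connectionTimeout;
		this.connectionRequestTimeout = connectionRequestTimeout;
		this.socketTimeout = socketTimeout;
	}

	// SERVICE calls target remote servers we do not control, so they get the stricter profile
	static TimeoutProfile forRequest(boolean sparqlServiceCall) {
		return sparqlServiceCall ? SPARQL_SERVICE : GENERAL;
	}

	public static void main(String[] args) {
		TimeoutProfile profile = forRequest(true);
		System.out.println(profile + " socket timeout: " + profile.socketTimeout + " ms");
	}
}
```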

All 6 timeouts are configurable through system properties:

	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.http.connectionTimeout} for specifying the HTTP
 	 * connection timeout in milliseconds for general use. Default is 1 hour.
 	 *
 	 * <p>
 	 * The connection timeout determines the maximum time the client will wait to establish a TCP connection to the
 	 * server. A default of 1 hour is set to allow for potential network delays without causing unnecessary timeouts.
 	 * </p>
 	 */
 	public static final String CONNECTION_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.http.connectionTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.http.connectionRequestTimeout} for specifying the
 	 * HTTP connection request timeout in milliseconds for general use. Default is 10 days.
 	 *
 	 * <p>
 	 * The connection request timeout defines how long the client will wait for a connection from the connection pool. A
 	 * longer timeout is acceptable here since operations like large file uploads may need to wait for an available
 	 * connection.
 	 * </p>
 	 */
 	public static final String CONNECTION_REQUEST_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.http.connectionRequestTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.http.socketTimeout} for specifying the HTTP socket
 	 * timeout in milliseconds for general use. Default is 10 days.
 	 *
 	 * <p>
 	 * The socket timeout controls the maximum period of inactivity between data packets during data transfer. A longer
 	 * timeout is appropriate for large data transfers, ensuring that operations are not interrupted prematurely.
 	 * </p>
 	 */
 	public static final String SOCKET_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.http.socketTimeout";

 	// System property constants for SPARQL SERVICE timeouts

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.sparql.http.connectionTimeout} for specifying the
 	 * HTTP connection timeout in milliseconds when used in SPARQL SERVICE calls. Default is 10 minutes.
 	 *
 	 * <p>
 	 * A shorter connection timeout is set for SPARQL SERVICE calls to quickly detect unresponsive endpoints in
 	 * federated queries, improving overall query performance by avoiding long waits for unreachable servers.
 	 * </p>
 	 */
 	public static final String SPARQL_CONNECTION_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.connectionTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.sparql.http.connectionRequestTimeout} for specifying
 	 * the HTTP connection request timeout in milliseconds when used in SPARQL SERVICE calls. Default is 6 hours.
 	 *
 	 * <p>
 	 * This timeout controls how long the client waits for a connection from the pool when making SPARQL SERVICE calls.
 	 * A shorter timeout than general use ensures that queries fail fast if resources are constrained, maintaining
 	 * responsiveness.
 	 * </p>
 	 */
 	public static final String SPARQL_CONNECTION_REQUEST_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.connectionRequestTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.sparql.http.socketTimeout} for specifying the HTTP
 	 * socket timeout in milliseconds when used in SPARQL SERVICE calls. Default is 6 hours.
 	 *
 	 * <p>
 	 * The socket timeout for SPARQL SERVICE calls is set to a shorter duration to detect unresponsive servers during
 	 * data transfer, ensuring that the client does not wait indefinitely for data that may never arrive.
 	 * </p>
 	 */
 	public static final String SPARQL_SOCKET_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.socketTimeout";
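
As a rough illustration of how a system property with a default fallback can be resolved, the sketch below uses the JDK's `Integer.getInteger(String, int)`. How the PR itself parses the properties is not shown in this conversation, so the `TimeoutProperties` class and its `resolve` method are assumptions made for the example; only the property name and the 6 hour default are taken from the constants above.

```java
// Hypothetical resolver; the parsing code actually used by the PR may differ.
final class TimeoutProperties {

	static final String SPARQL_SOCKET_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.socketTimeout";
	static final int DEFAULT_SPARQL_SOCKET_TIMEOUT = 6 * 60 * 60 * 1000; // 6 hours

	static int resolve(String property, int defaultMillis) {
		// Integer.getInteger falls back to the default when the property is unset
		// or does not hold a valid integer
		return Integer.getInteger(property, defaultMillis);
	}

	public static void main(String[] args) {
		// Prints the default unless the property was set on the command line
		System.out.println(resolve(SPARQL_SOCKET_TIMEOUT_PROPERTY, DEFAULT_SPARQL_SOCKET_TIMEOUT));

		System.setProperty(SPARQL_SOCKET_TIMEOUT_PROPERTY, "30000");
		System.out.println(resolve(SPARQL_SOCKET_TIMEOUT_PROPERTY, DEFAULT_SPARQL_SOCKET_TIMEOUT)); // 30000
	}
}
```

On the command line the same override would be passed as a JVM option, for example `-Dorg.eclipse.rdf4j.client.sparql.http.socketTimeout=30000`.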

@hmottestad hmottestad force-pushed the GH-5124-configurable-http-timeouts branch from 16db967 to 20416e2 Compare September 18, 2024 08:53
@hmottestad hmottestad marked this pull request as ready for review September 18, 2024 08:54