Details
Description
The latest version of Apache immediately sends back a 500 error when requests are received in the midst of failover. A 500 error can also come from the AHS server when it handles server-side exceptions.
This is potentially better than having the connection hang and go unanswered, but the current handling of this error condition shuts down the bridge.
We'd like to adopt the strategy of reloading the page once or twice in the case of these errors. There are a couple of considerations. On graceful shutdown of a node, all of the blocking connections will be released nearly simultaneously, so the resulting reloads should be spread out over an interval. No reload should be attempted before Apache is willing to receive requests again, or the bridge will start looping. In the case of 2.2.10, that wait is greater than 2.5 seconds, but somewhere less than 5. Making the delay configurable by the user might be nice as well.
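As a rough illustration of the bounded retry described above, here is a minimal Python sketch. It is not the bridge's code: `retry_on_500`, `send`, `max_retries`, `min_wait`, and `spread` are hypothetical names, and the defaults (2 retries, a 5 second minimum wait taken from the upper bound observed with 2.2.10, and a 3 second spread) are assumptions that would presumably be user-configurable.

```python
import random
import time
from typing import Callable


def retry_on_500(
    send: Callable[[], int],
    max_retries: int = 2,      # "once or twice" (assumed default)
    min_wait: float = 5.0,     # assumed: upper bound observed for Apache 2.2.10
    spread: float = 3.0,       # assumed: window over which reloads are spread
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> int:
    """Call send() and retry up to max_retries times while it returns 500.

    send is a hypothetical callable that performs the request and
    returns the HTTP status code.
    """
    status = send()
    for _ in range(max_retries):
        if status != 500:
            break
        # Never retry before min_wait has elapsed; the random share of
        # spread keeps clients released together from reloading together.
        sleep(min_wait + rng() * spread)
        status = send()
    return status
```

If the status is still 500 after the last retry, the caller gets the 500 back and can fall through to the existing error handling.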
A typical strategy in cases like this is to use exponential backoff with steadily increasing delays.
In the full solution, it may also be desirable to delay for a small random amount of time in addition to the incremental delay to avoid all clients hitting the redundant node simultaneously during failover.
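One way the backoff-plus-random-delay idea could look is sketched below in Python. The function name `backoff_delays` and all of its parameters and defaults (5 second base, doubling factor, 3 retries, up to 1 second of jitter) are assumptions chosen for illustration, not values from the bridge.

```python
import random
from typing import Callable, List


def backoff_delays(
    base: float = 5.0,         # assumed first delay, in seconds
    factor: float = 2.0,       # assumed growth factor per retry
    retries: int = 3,          # assumed number of retries
    max_jitter: float = 1.0,   # assumed upper bound on the random extra delay
    rng: Callable[[], float] = random.random,
) -> List[float]:
    """Return the delay before each retry: base * factor**attempt plus jitter."""
    return [
        base * factor ** attempt + rng() * max_jitter
        for attempt in range(retries)
    ]


# With jitter disabled, the schedule is purely exponential.
print(backoff_delays(max_jitter=0.0))  # [5.0, 10.0, 20.0]
```

The jitter is added on top of the exponential delay rather than replacing it, so a retry never fires earlier than the minimum wait for that attempt.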
(The initial prototype does not need to account for any of this; simply delaying by a sufficiently long fixed amount before the bridge retries after a 500 error will confirm the approach.)