The problem causing the Push storm for me was code I had added (see PUSH-221) to allow registering notification URIs from the server-side. Under certain circumstances, the code would throw an NPE when an old push id was included with the request. I improved the null check so managed to avoid causing the Push storm in that manner but this exception was being caught and handled in the BlockingConnectionServer$RunningServer.service() method. There is a large try/catch that does this:
try {
...
NullPointerException occurs
...
} catch (RuntimeException e) {
log.log(Level.FINE,"Request does not contain pushIDs.",e);
respondIfPendingRequest(ErrorNoopResponse);
}
So the NPE was being caught and a <noop> response being returned with the header:
The bridge receives this and right away tries to re-establish the blocking connection. However, the NPE would happen again and -> Push storm.
In the past (as per PUSH-135), to prevent these types Push storms we had instituted a mechanism to cap the number of retries when an empty response is returned. It doesn't appear that this logic is handling the situation. In connection.async.js we have:
var nonEmptyResponse = notEmpty(contentAsText(response));
if (reconnect) {
if (nonEmptyResponse) {
....
Perhaps because it's not really empty. So I believe we need to:
- Review the BlockingConnectionServer$RunningServer.service() to see if we are doing the right thing. Any RuntimeException in this block of code will result in the same behaviour. We should ensure that the try/catch around the whole block is what we want and that when a RuntimeException occurs, this response we are sending back is appropriate.
- Look at improving the logging as "request does not contain pushIDs" was a not quite what was happening here. The request did contain a push id but it was not active/valid one.
- Depending on what we do with the service() method, we may need to tweak the bridge so that when it sees this type of response, it doesn't just keep trying repeatedly.
The easiest way to reproduce this is synthetically. Just have it throw an NPE somewhere in the middle of that service() method based on some criteria. For example, I wedged this in towards the bottom of the try block:
System.out.println("BlockingConnectionServer$RunningServer.service: sequenceNumber = " + sequenceNumber);
if(sequenceNumber > 2){
throw new NullPointerException("synthetic NPE to create Push storm");
}
After a few listen.icepush requests, a storm should be triggered.
The problem causing the Push storm for me was code I had added (see
PUSH-221) to allow registering notification URIs from the server-side. Under certain circumstances, the code would throw an NPE when an old push id was included with the request. I improved the null check so managed to avoid causing the Push storm in that manner but this exception was being caught and handled in the BlockingConnectionServer$RunningServer.service() method. There is a large try/catch that does this:So the NPE was being caught and a <noop> response being returned with the header:
The bridge receives this and right away tries to re-establish the blocking connection. However, the NPE would happen again and -> Push storm.
In the past (as per
PUSH-135), to prevent these types Push storms we had instituted a mechanism to cap the number of retries when an empty response is returned. It doesn't appear that this logic is handling the situation. In connection.async.js we have:Perhaps because it's not really empty. So I believe we need to:
The easiest way to reproduce this is synthetically. Just have it throw an NPE somewhere in the middle of that service() method based on some criteria. For example, I wedged this in towards the bottom of the try block:
After a few listen.icepush requests, a storm should be triggered.