[PUSH-23] Auction updates stop after a variable period of time - ICEsoft JIRA Issue Tracker

Hide

Permalink

Deryk Sinotte added a comment - 20/Jan/10 4:36 PM

A number of debug logging statements were added to try and narrow down when and why the updates stopped. Here's a subset of that logging that shows what happens in the case of one browser where the clocks stopped ticking after a few minutes:

//Push group is created
Jan 20, 2010 11:30:45 AM org.icepush.PushContext$Group <init>
FINEST: '3B938F0B693880AC6FDA99BBBCEB9160' push group created.

//Push ID is added to the group
Jan 20, 2010 11:30:45 AM org.icepush.PushContext addGroupMember
FINEST: Added pushId '1264015845649' to group '3B938F0B693880AC6FDA99BBBCEB9160'.

//The final time that the lastAccess time is updated for this id. The lastAccess time should be updated every 3 seconds or so provided //there are incoming requests from the client containing the required id.
Jan 20, 2010 11:32:40 AM org.icepush.PushContext$Group touchIfMatching
FINEST: lastAccess updated for 1264015845649

//The last time a successful notification is sent for this push id.
Jan 20, 2010 11:34:41 AM org.icepush.BlockingConnectionServer$4 writeTo
FINEST: Sending notifications for [1264015845649].

//Then the group is discard due to expiration, exactly 2 minutes after final lastAccess update.
Jan 20, 2010 11:34:41 AM org.icepush.PushContext$Group discardIfExpired
FINEST: '3B938F0B693880AC6FDA99BBBCEB9160' push group expired.

//Continue to get these for that same group every 1 second. However, nothing is sent since the group has expired.
Jan 20, 2010 11:35:59 AM org.icepush.PushContext push
FINEST: Push notification triggered for '3B938F0B693880AC6FDA99BBBCEB9160' pushId.

So it appears that the mechanism used to update the lastAccess time is not working. As more browser instances are added, it seems the more time elapses between successful updates. When the period gets longer than 2 minutes, the group is expired/discarded and the updates are no longer sent. The client, however, appears to be still be sending the appropriate requests so it would seem that there is something on the server side that isn't quite behaving.

Show

Deryk Sinotte added a comment - 20/Jan/10 4:36 PM A number of debug logging statements were added to try and narrow down when and why the updates stopped. Here's a subset of that logging that shows what happens in the case of one browser where the clocks stopped ticking after a few minutes: //Push group is created Jan 20, 2010 11:30:45 AM org.icepush.PushContext$Group <init> FINEST: '3B938F0B693880AC6FDA99BBBCEB9160' push group created. //Push ID is added to the group Jan 20, 2010 11:30:45 AM org.icepush.PushContext addGroupMember FINEST: Added pushId '1264015845649' to group '3B938F0B693880AC6FDA99BBBCEB9160'. //The final time that the lastAccess time is updated for this id. The lastAccess time should be updated every 3 seconds or so provided //there are incoming requests from the client containing the required id. Jan 20, 2010 11:32:40 AM org.icepush.PushContext$Group touchIfMatching FINEST: lastAccess updated for 1264015845649 //The last time a successful notification is sent for this push id. Jan 20, 2010 11:34:41 AM org.icepush.BlockingConnectionServer$4 writeTo FINEST: Sending notifications for [1264015845649] . //Then the group is discard due to expiration, exactly 2 minutes after final lastAccess update. Jan 20, 2010 11:34:41 AM org.icepush.PushContext$Group discardIfExpired FINEST: '3B938F0B693880AC6FDA99BBBCEB9160' push group expired. //Continue to get these for that same group every 1 second. However, nothing is sent since the group has expired. Jan 20, 2010 11:35:59 AM org.icepush.PushContext push FINEST: Push notification triggered for '3B938F0B693880AC6FDA99BBBCEB9160' pushId. So it appears that the mechanism used to update the lastAccess time is not working. As more browser instances are added, it seems the more time elapses between successful updates. When the period gets longer than 2 minutes, the group is expired/discarded and the updates are no longer sent. The client, however, appears to be still be sending the appropriate requests so it would seem that there is something on the server side that isn't quite behaving.

Hide

Permalink

Deryk Sinotte added a comment - 20/Jan/10 4:41 PM

In the BlockingConnectionServer, the participating push ids are captured and logged. These appear to occur every 1 second so the client looks like it's doing its thing correctly:

public void service(final Request request) throws Exception {
resetTimeout();
respondIfPendingRequest(CloseResponse);
participatingPushIDs = Arrays.asList(request.getParameterAsStrings("ice.pushid"));
if (log.isLoggable(Level.FINEST))

{ log.finest("Participating pushIds: " + participatingPushIDs + "."); }

pendingRequest.put(request);
respondIfNotificationsAvailable();
inboundNotifier.notifyObservers(participatingPushIDs);
}

This is the code that should be run when necessary to update the lastAccess time on the appropriate Group:

public void update(Observable observable, Object o) {
long now = System.currentTimeMillis();
//avoid to scan/touch the groups on each notification
if (lastScan + GroupScanningTimeResolution < now) {
lastScan = now;
List pushIDs = (List) o;
Iterator i = new ArrayList(groups.values()).iterator();
while (i.hasNext())

{ Group group = (Group) i.next(); group.touchIfMatching(pushIDs); group.discardIfExpired(); }

}
}

Show

Deryk Sinotte added a comment - 20/Jan/10 4:41 PM In the BlockingConnectionServer, the participating push ids are captured and logged. These appear to occur every 1 second so the client looks like it's doing its thing correctly: public void service(final Request request) throws Exception { resetTimeout(); respondIfPendingRequest(CloseResponse); participatingPushIDs = Arrays.asList(request.getParameterAsStrings("ice.pushid")); if (log.isLoggable(Level.FINEST)) { log.finest("Participating pushIds: " + participatingPushIDs + "."); } pendingRequest.put(request); respondIfNotificationsAvailable(); inboundNotifier.notifyObservers(participatingPushIDs); } This is the code that should be run when necessary to update the lastAccess time on the appropriate Group: public void update(Observable observable, Object o) { long now = System.currentTimeMillis(); //avoid to scan/touch the groups on each notification if (lastScan + GroupScanningTimeResolution < now) { lastScan = now; List pushIDs = (List) o; Iterator i = new ArrayList(groups.values()).iterator(); while (i.hasNext()) { Group group = (Group) i.next(); group.touchIfMatching(pushIDs); group.discardIfExpired(); } } }

Hide

Permalink

Deryk Sinotte added a comment - 20/Jan/10 4:47 PM

To recreate this issue on a local machine, we used variations of the following shell script technique to run 9 separate Firefox instances as more browsers appears to make the problem occur more quickly:

for index in 1 2 3 4 5 6 7 8 9; do firefox -no-remote -P $index http://localhost:8080/auction & done

What I noticed doing this is that the lastAccess time for the first browser seems to be reliably updated every 3 seconds. However, as more browsers are pointed at the server, the lastAccess time is less and less frequently. The last (9th) browser instance in my test was the first to fail and the lastAccess time was only ever updated twice. The failure occurred exactly 2 minutes after the lastAccess time was updated. This is more evidence that the problem is likely on the server side as the client continues to send requests and get notifications and updates right up until the group is expired.

Show

Deryk Sinotte added a comment - 20/Jan/10 4:47 PM To recreate this issue on a local machine, we used variations of the following shell script technique to run 9 separate Firefox instances as more browsers appears to make the problem occur more quickly: for index in 1 2 3 4 5 6 7 8 9; do firefox -no-remote -P $index http://localhost:8080/auction & done What I noticed doing this is that the lastAccess time for the first browser seems to be reliably updated every 3 seconds. However, as more browsers are pointed at the server, the lastAccess time is less and less frequently. The last (9th) browser instance in my test was the first to fail and the lastAccess time was only ever updated twice. The failure occurred exactly 2 minutes after the lastAccess time was updated. This is more evidence that the problem is likely on the server side as the client continues to send requests and get notifications and updates right up until the group is expired.

Hide

Permalink

Ted Goddard added a comment - 20/Jan/10 6:27 PM

To determine if the browser is actually sending the desired requests, I added an alert to snowplow/src/main/javascript/connection.async.js here:

} else

{ info(logger, 'blocking connection stopped at server\'s request...'); +alert("Server X-Connection: close"); }

and server-side logging to src/main/java/org/icepush/BlockingConnectionServer.java here:

response.setHeader("X-Connection", "close");
+ response.setHeader("X-icepush", "closed due to previous blocking connection");
response.setHeader("Content-Length", 0);

+System.out.println("close current connection ");

and

public void service(final Request request) throws Exception {
resetTimeout();
+System.out.println(this + " " + pendingRequest + " pendingRequest for " + request.getHeader("Cookie"));

that logs as follows when the problem occurs:

org.icepush.BlockingConnectionServer$3@5bdcd49d [] pendingRequest for JSESSIONID=043B4D5EECBF6ACFFA917FC451A40990; ice.pushids=1264032693662; ice.connection.lease=1264032696040; ice.connection.running=9e20e%3Aacquired; ice.notified.pushids=

org.icepush.BlockingConnectionServer$3@5bdcd49d [org.icepush.servlet.ThreadBlockingAdaptingServlet$ThreadBlockingRequestResponse@7f0abe88] pendingRequest for JSESSIONID=2DD1B04BE3E0E178FA0657CFA9294E92; ice.pushids=1264032693745; ice.connection.lease=1264032696117; ice.connection.running=9d826%3Aacquired; ice.notified.pushids=

close current connection

Two different JSESSIONID values are seen with the same BlockingConnectionServer. Either, the session is being restarted, or the BlockingConnectionServer objects are being incorrectly shared across multiple sessions. Note that ice.push.browser is not set, so perhaps all empty string identified browsers are sharing a single BlockingConnectionServer.

Show

Ted Goddard added a comment - 20/Jan/10 6:27 PM To determine if the browser is actually sending the desired requests, I added an alert to snowplow/src/main/javascript/connection.async.js here: } else { info(logger, 'blocking connection stopped at server\'s request...'); +alert("Server X-Connection: close"); } and server-side logging to src/main/java/org/icepush/BlockingConnectionServer.java here: response.setHeader("X-Connection", "close"); + response.setHeader("X-icepush", "closed due to previous blocking connection"); response.setHeader("Content-Length", 0); +System.out.println("close current connection "); and public void service(final Request request) throws Exception { resetTimeout(); +System.out.println(this + " " + pendingRequest + " pendingRequest for " + request.getHeader("Cookie")); that logs as follows when the problem occurs: org.icepush.BlockingConnectionServer$3@5bdcd49d [] pendingRequest for JSESSIONID=043B4D5EECBF6ACFFA917FC451A40990; ice.pushids=1264032693662; ice.connection.lease=1264032696040; ice.connection.running=9e20e%3Aacquired; ice.notified.pushids= org.icepush.BlockingConnectionServer$3@5bdcd49d [org.icepush.servlet.ThreadBlockingAdaptingServlet$ThreadBlockingRequestResponse@7f0abe88] pendingRequest for JSESSIONID=2DD1B04BE3E0E178FA0657CFA9294E92; ice.pushids=1264032693745; ice.connection.lease=1264032696117; ice.connection.running=9d826%3Aacquired; ice.notified.pushids= close current connection Two different JSESSIONID values are seen with the same BlockingConnectionServer. Either, the session is being restarted, or the BlockingConnectionServer objects are being incorrectly shared across multiple sessions. Note that ice.push.browser is not set, so perhaps all empty string identified browsers are sharing a single BlockingConnectionServer.

Hide

Permalink

Ted Goddard added a comment - 20/Jan/10 6:34 PM

If the error does not occur quickly, close all browsers and launch another set of 10.

Logging in BrowserDispatcher.java shows multiple:

lookupServer found org.icepush.servlet.BrowserDispatcher$BrowserEntry@27a4f43e for browserID null
lookupServer found org.icepush.servlet.BrowserDispatcher$BrowserEntry@27a4f43e for browserID null

So there is a very good possibility that multiple browsers will share the same "previousRequest" and abort their connection.

Assigning to Mircea for comment on whether the browserID failure could case the reported problem.

Show

Ted Goddard added a comment - 20/Jan/10 6:34 PM If the error does not occur quickly, close all browsers and launch another set of 10. Logging in BrowserDispatcher.java shows multiple: lookupServer found org.icepush.servlet.BrowserDispatcher$BrowserEntry@27a4f43e for browserID null lookupServer found org.icepush.servlet.BrowserDispatcher$BrowserEntry@27a4f43e for browserID null So there is a very good possibility that multiple browsers will share the same "previousRequest" and abort their connection. Assigning to Mircea for comment on whether the browserID failure could case the reported problem.

Hide

Permalink

Ted Goddard added a comment - 21/Jan/10 11:23 AM

Adding

response.addCookie(new Cookie(BrowserIDCookieName, browserID));

to the else path in createPushId appears to fix the problem when multiple browsers are launched (the cookie must be applied to the response no matter what stale value is found in the ThreadLocal currentBrowserID.

The ThreadLocal should be removed unless there is a very good reason for it's use.

Show

Ted Goddard added a comment - 21/Jan/10 11:23 AM Adding response.addCookie(new Cookie(BrowserIDCookieName, browserID)); to the else path in createPushId appears to fix the problem when multiple browsers are launched (the cookie must be applied to the response no matter what stale value is found in the ThreadLocal currentBrowserID. The ThreadLocal should be removed unless there is a very good reason for it's use.

Hide

Permalink

Ted Goddard added a comment - 21/Jan/10 11:52 AM

Fix is checked in using request attributes in place of the thread local.

Show

Ted Goddard added a comment - 21/Jan/10 11:52 AM Fix is checked in using request attributes in place of the thread local.

Hide

Permalink

Ted Goddard added a comment - 22/Jan/10 11:52 AM

Request attribute fix resolves push failure for multiple browsers. Still investigating push failures after extended periods.

Show

Ted Goddard added a comment - 22/Jan/10 11:52 AM Request attribute fix resolves push failure for multiple browsers. Still investigating push failures after extended periods.

Hide

Permalink

Mircea Toma added a comment - 28/Jan/10 4:13 PM

Make sure that all inbound pushIds are accumulated from all the participating browsers before a touch & discard group scanning is executed. The previous algorithm was using randomly only the pushIds of one of the browsers when running the touch & discard group scanning.

Show

Mircea Toma added a comment - 28/Jan/10 4:13 PM Make sure that all inbound pushIds are accumulated from all the participating browsers before a touch & discard group scanning is executed. The previous algorithm was using randomly only the pushIds of one of the browsers when running the touch & discard group scanning.

Auction updates stop after a variable period of time

Details

Description

Issue Links

Activity

People

Dates