Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8.2
Fix Version/s: 1.8.3, 1.8.2-EE-GA_P02
Component/s: Framework
Labels: None
Environment: ICEFaces + Tomcat 6 + clustered failover
Description
One way to generate a failover for testing purposes is to stop Tomcat using shutdown.sh.
This has been observed to cause a <session-expired /> response to the receive-updates request. In general, the blocking request to the original primary server is released with a response saying there's an update available for the client. When the client goes to fetch the update, Apache has generally been fast enough to switch this receive-updates request to the new primary server (leading down that path), while the pending update on the outgoing primary server is left waiting.
This is different from the expireSessionsOnShutdown setting in the Manager Element. That setting correctly avoids expiring the session on the secondary node (say lntest2) when the original primary node is shut down. That always works, and the earlier created session is always valid on the secondary node. The problem here has more to do with our active bridge shutdown protocol.
The stack trace below shows the path of execution: it is the servlet destroy() method that generates the <session-expired/> response:
INFO: Waiting for 1 instance(s) to be deallocated
SHUTTING DOWN SERVER --
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1158)
at com.icesoft.faces.webapp.http.servlet.MainSessionBoundServlet$5.run(MainSessionBoundServlet.java:157)
at com.icesoft.faces.webapp.http.servlet.MainSessionBoundServlet.shutdown(MainSessionBoundServlet.java:199)
at com.icesoft.faces.webapp.http.servlet.SessionDispatcher.shutdown(SessionDispatcher.java:64)
at com.icesoft.faces.webapp.http.servlet.SessionVerifier.shutdown(SessionVerifier.java:38)
at com.icesoft.faces.webapp.http.servlet.PathDispatcher.shutdown(PathDispatcher.java:40)
at com.icesoft.faces.webapp.http.servlet.MainServlet.destroy(MainServlet.java:178)
at org.apache.catalina.core.StandardWrapper.unload(StandardWrapper.java:1393)
at org.apache.catalina.core.StandardWrapper.stop(StandardWrapper.java:1738)
at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4509)
at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:924)
at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1191)
at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1162)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:313)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1086)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1098)
at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:448)
at org.apache.catalina.core.StandardService.stop(StandardService.java:584)
at org.apache.catalina.core.StandardServer.stop(StandardServer.java:744)
at org.apache.catalina.startup.Catalina.stop(Catalina.java:628)
at org.apache.catalina.startup.Catalina.start(Catalina.java:603)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
** 2) Dispose method called DisposableBean
This doesn't occur on other servers. Tomcat's very fast shutdown sequence probably catches out the Apache load balancing, so that in some cases the response from the outgoing server gets back to the client. On the lntest servers, this happens about one time in three.
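The chain in the stack trace can be illustrated with a small model. This is only a sketch: the classes below are hypothetical stand-ins rather than the ICEfaces sources, and the "update available" notification is reduced to a boolean. It shows how a destroy() call that is forwarded through a series of dispatchers ends up queuing a <session-expired/> update and releasing the blocked request, so that a client which fetches from the same node receives that update.

```java
public class ShutdownChainSketch {
    /** Anything that can be told to shut down (hypothetical interface). */
    interface Server {
        void shutdown();
    }

    /** Forwards shutdown to the next link, like the dispatcher classes in the trace. */
    static class Delegating implements Server {
        private final Server next;

        Delegating(Server next) {
            this.next = next;
        }

        public void shutdown() {
            next.shutdown();
        }
    }

    /** Last link: holds the blocked request and the pending update for one session. */
    static class SessionBound implements Server {
        boolean blockingRequestReleased = false; // true once "update available" is sent
        String pendingUpdate = null;             // what a later receive-updates call returns

        public void shutdown() {
            pendingUpdate = "<session-expired/>";
            blockingRequestReleased = true;
        }

        /** Models the client's receive-updates request reaching this same node. */
        String receiveUpdates() {
            return pendingUpdate;
        }
    }

    /** Simulates destroy() followed by a fetch that still reaches the outgoing node. */
    static String destroyThenFetchFromSameNode() {
        SessionBound sessionBound = new SessionBound();
        // Three forwarding links stand in for the dispatchers seen in the trace.
        Server chain = new Delegating(new Delegating(new Delegating(sessionBound)));
        chain.shutdown(); // what the servlet's destroy() triggers in this model
        return sessionBound.blockingRequestReleased ? sessionBound.receiveUpdates() : null;
    }

    public static void main(String[] args) {
        System.out.println(destroyThenFetchFromSameNode());
    }
}
```

In this model the problem only shows up when the fetch reaches the outgoing node before it has gone away, which matches the intermittent behaviour described above.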
This seems hard to reproduce with recent versions, so it would be instructive at the very least to figure out why. The situation occurred with Tomcat and Apache; I think the exact versions were Tomcat 6.0.18 and Apache 2.2.11.
The exact scenario is that the client has a blocking connection open to server 1, which is the primary server. When you shut down the server using the shell script (on the lntest server), the server tries to dispose of the session because of the code path shown above. This results in the blocking connection returning with a response indicating there is an update for the client.
The client then turns around and fetches the update. The update (if it is received) contains a session-expired response, which causes the bridge to shut down and the session-expired dialog to pop up; this is very clearly the wrong thing to do. If the update is not fetched, it doesn't matter, since the node is going down anyway.
If Apache is quick to switch traffic to server 2 (the new primary node), the receive-updates request is routed to server 2, which causes the page to be reloaded, and this is a good thing.
The question is whether there has been any definite change to the Tomcat + Apache failover code that would ensure this situation doesn't occur.
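The three possible outcomes of the scenario can be summarised in a small sketch. The enum and method names are hypothetical and only restate the cases described above; nothing here is taken from the ICEfaces or Apache code.

```java
public class FailoverOutcomeSketch {
    /** Where the client's receive-updates request ends up (hypothetical names). */
    enum Route { OLD_PRIMARY, NEW_PRIMARY, NOT_FETCHED }

    /** What the user sees as a result. */
    enum Outcome { SESSION_EXPIRED_DIALOG, PAGE_RELOAD, NO_EFFECT }

    static Outcome outcomeFor(Route route) {
        switch (route) {
            case OLD_PRIMARY:
                // The outgoing node still answers with its pending session-expired update.
                return Outcome.SESSION_EXPIRED_DIALOG;
            case NEW_PRIMARY:
                // Apache has already switched traffic, so the page is reloaded.
                return Outcome.PAGE_RELOAD;
            default:
                // The update is never fetched; the node is going down anyway.
                return Outcome.NO_EFFECT;
        }
    }

    public static void main(String[] args) {
        for (Route route : Route.values()) {
            System.out.println(route + " -> " + outcomeFor(route));
        }
    }
}
```

Only the first case is the bug; the other two are the behaviour one would want during failover.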