Observations with Tomcat 6.0.14
I can get session-scoped backing bean values to be duplicated from one node to the other, but only under a very prescribed set of circumstances.
Both nodes must be up and in good health when the application creates a session. Tomcat does not appear to be very robust at duplicating sessions to a node that comes back online if that node was down when the session was created. This suggests that the nodes fire session duplication events, which work if all nodes are receiving, but that there is no session duplication strategy (persistent attempts at synchronization, retrograde updates, etc.) to handle the case where something changes while a node is down.
Getting failover to occur is not that easy. Apache will load balance between the nodes, but gracefully shutting down Tomcat on the primary node invalidates the sessions, which kills the sessions on all nodes. The only way to achieve failover while leaving the application in the desired state is to terminate the app server abruptly (kill -9).
Our active protocol between client and server is not helping. If the first interaction between client and server is any of the blocking requests (ping, receive-updated-views, receive-updates), and not a manual page reload, the application gets the session timeout message because the blocking request verifies the request via view number, and the necessary object structure is not set up for the user.
All we ever get is a one-way failover. A session on node A is found on node B. Once A is restarted, there doesn't seem to be any way to fail back. I think this is on account of the lack of a coherent session duplication strategy in Tomcat. Also, I have never been able to get the session duplicated from lntest7 to lntest6 (lntest6 being the first node in Apache's load balancing table, or node A). I can't account for this; the configuration should not have any bias one way or the other.
In the scenario that works, when I kill Tomcat on node A (lntest6), I immediately get a 'service not available' display in the browser. I can then reload the application, and the session-scoped values appear. When I kill Tomcat on node B (lntest7), I don't get this interaction. The application just stays on the same page, and when the reload comes back, it has always started a new session with empty backing bean values.
Sticky session mechanics between Apache and Tomcat
Tomcat appends a string (defined in httpd.conf) to the session id, which is used to identify the intended target node. This approach puts the burden of identifying the target node on the client, since the string is visible to both the client and the server. The client passes this string around rather than Apache keeping a hash of session ids pointing to their current target node. This means that sessions like CDXDC[...].node1 and CDXDC[...].node2 can be created for the same client. Tomcat manages to keep this straight and is able to duplicate the session information between these two independent sessions, but our SessionDispatcher HashMap cannot. This doesn't really affect session failover, but it does mean we will occasionally keep a MainSessionBoundServlet sub-object tree around for no reason. However, we need to see more of the server strategies for session affinity before we start looking for "." characters in the session id.
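For reference, here is a minimal sketch of what "looking for '.' characters" could amount to. It assumes the route is everything from the first "." onward and that the part before it is shared by both ids, which has only been observed with this Apache/Tomcat setup. SessionIds and stripRoute are hypothetical names, not part of Tomcat or ICEfaces.

```java
/**
 * Hypothetical helper (not part of Tomcat or ICEfaces) that reduces a
 * session id to the part before the sticky-session route suffix.
 */
final class SessionIds {

    private SessionIds() {
        // static helper only
    }

    /**
     * Returns the id without its ".route" suffix.
     * Assumption: the route is everything from the first '.' onward.
     * Ids without a '.' are returned unchanged; null stays null.
     */
    static String stripRoute(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.indexOf('.');
        return dot < 0 ? sessionId : sessionId.substring(0, dot);
    }
}
```

If the SessionDispatcher map were keyed on SessionIds.stripRoute(session.getId()), the ids ending in .node1 and .node2 would resolve to the same entry. Other servers may encode affinity differently, so this is not something to rely on yet.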
Failover in Java enterprise clusters is typically implemented through the following: make use of a database that has an internal (clustered) failover mechanism, and support failover in the web-tier through session replication.
Database clustering is outside the scope of this discussion and is provided by existing third party implementations.
The central interest in failover for ICEfaces then becomes support for session replication.
A stock JSF application supports failover through persistence of component saved state in the session. This saved state is replicated to alternate cluster nodes. In the case of node failure, the "sticky session" is transferred to the alternate node and execution proceeds with minimal impact (since the session is the only user-specific state stored in the web tier).
The following is a strategy that should allow an ICEfaces application to failover to an alternate node:
1. Support existing JSF state-saving. This is currently being undertaken to reduce server-side memory requirements (the persistent component tree becomes optional) but will have important benefits in failover. The essential aspect is that it allows the application to continue execution with the component tree in the same state as it was before the failure occurred.
2. Treat the case of a null "old DOM" as a full page refresh. Rather than suffer the performance cost of constantly transferring the DOM to the alternate node, simply refresh the user's pages in the (rare) event of node failure. The important point here is that application functionality is not affected; a full-page refresh has only a usability impact and can occur under other circumstances as well.
3. Analyze the servlet stack for "session-like" state (such as icefacesID and viewNumbers). It may be worthwhile to move some of this state into the session in a controlled manner (such as a single serializable session object).
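The "single serializable session object" from step 3 might look roughly like the sketch below. ReplicatedViewState and its methods are hypothetical and are not existing ICEfaces classes; the fields stand in for whatever "session-like" state the analysis turns up. The roundTrip method only approximates replication by pushing the object through standard Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical holder for "session-like" servlet state. Storing one
 * object like this as a session attribute would let the container
 * replicate the state together with the rest of the session.
 */
final class ReplicatedViewState implements Serializable {

    private static final long serialVersionUID = 1L;

    private final String icefacesId;
    private final List<Integer> viewNumbers = new ArrayList<Integer>();

    ReplicatedViewState(String icefacesId) {
        this.icefacesId = icefacesId;
    }

    String getIcefacesId() {
        return icefacesId;
    }

    /** Records a view number; a number that is already known is ignored. */
    synchronized void addView(int viewNumber) {
        if (!viewNumbers.contains(viewNumber)) {
            viewNumbers.add(viewNumber);
        }
    }

    /** True if a blocking request carrying this view number would be known. */
    synchronized boolean hasView(int viewNumber) {
        return viewNumbers.contains(viewNumber);
    }

    /** Serializes and deserializes the state, roughly what replication does. */
    static ReplicatedViewState roundTrip(ReplicatedViewState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(state);
            out.close();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            try {
                return (ReplicatedViewState) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A copy that comes back from roundTrip with the same icefacesId and view numbers is what the alternate node would need in order to validate a blocking request by view number instead of reporting a session timeout. In a real deployment the object would probably have to be stored again with session.setAttribute after each change so that the container notices the update; whether that is required depends on the replication manager in use.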