Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.8.2.GA
- Fix Version/s: None
- Component/s: Enterprise Push Server
- Labels: None
- Environment: ICEFaces + EPS in WebSphere (but should be in all app server environments)
Description
During failover, the ICEFaces framework is designed to request a full page reload if an XMLHttpRequest arrives for a view that hasn't been created on the new primary node. The reload creates the object structure necessary to process requests. This works as intended.
From our discussion:
I believe this is EPS doing it. It's hard for EPS to know when fail-over has occurred. The Core can rely on a view structure and such, EPS cannot. To know when a fail-over occurred we introduced the EPSID. Whenever a blocking request is first handled by a particular EPS instance, that instance sets its EPSID as a Cookie. There are three scenarios:
1. The blocking request to a particular EPS instance does not have an EPSID
This is the first blocking request to any EPS instance as it doesn't echo back any EPSID. The EPS instance adds its EPSID to the response and handling of the response continues normally.
2. The blocking request to a particular EPS instance does have the EPSID of that particular EPS instance
This is not the first request to this EPS instance as it does echo back the EPSID of this EPS instance. The EPS instance doesn't have to do anything special and handling of the response continues normally.
3. The blocking request to a particular EPS instance does echo back an EPSID of a different EPS instance
This is the first request to this EPS instance as it does echo back the EPSID of a different EPS instance. Fail-over might have occurred! The EPS instance adds its EPSID to the response and responds with a ReloadResponse because of the possible fail-over scenario.
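The three scenarios above can be sketched as a single decision function. This is only an illustration under assumptions: the class, enum, and method names are hypothetical and are not taken from the EPS source, and the EPSID is modelled as a plain string read from the request cookie.

```java
import java.util.Optional;

/** Hypothetical sketch of the EPSID check; not the actual EPS code. */
final class EpsidCheck {

    /** What the EPS instance does with a blocking request. */
    enum Outcome {
        SET_COOKIE_AND_CONTINUE, // scenario 1: no EPSID in the request
        CONTINUE,                // scenario 2: EPSID matches this instance
        SET_COOKIE_AND_RELOAD    // scenario 3: EPSID of a different instance
    }

    /**
     * @param ownEpsid     EPSID of the instance handling the request
     * @param requestEpsid EPSID echoed back by the browser, if any
     */
    static Outcome classify(String ownEpsid, Optional<String> requestEpsid) {
        if (!requestEpsid.isPresent()) {
            return Outcome.SET_COOKIE_AND_CONTINUE;
        }
        if (requestEpsid.get().equals(ownEpsid)) {
            return Outcome.CONTINUE;
        }
        // A foreign EPSID only means fail-over *might* have occurred.
        return Outcome.SET_COOKIE_AND_RELOAD;
    }
}
```

Note that the third branch cannot tell a genuine fail-over apart from any other reason for a mismatching EPSID.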
I'll explain why I put "might" in italics. EPSIDs are generated upon start-up of each EPS instance and thus can differ from one run to the next. Let's say we have a two-node cluster. Node 1 has EPSID id1, generated by chance, and Node 2 has EPSID id2. A browser has accessed the cluster and currently sends EPSID id2 in its blocking requests. Upon restart of the cluster, Node 1 now has EPSID id3, again generated by chance, and Node 2 has EPSID id4. The cookies aren't cleared in the browser, and the browser hits the cluster again. It hits Node 2 with a blocking request, and that request carries EPSID id2. Node 2 now has EPSID id4 associated with it, and as the EPSIDs differ, a ReloadResponse is sent back because Node 2 thinks a fail-over occurred.
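The restart case can be reproduced with a small model of a single node. This is a sketch under assumptions: the `Node` class and its methods are hypothetical, and a random UUID stands in for however EPS actually generates its EPSID "by chance".

```java
import java.util.UUID;

/** Hypothetical model of the false positive after a cluster restart. */
final class RestartFalsePositive {

    /** An EPS node whose EPSID is regenerated on every start-up. */
    static final class Node {
        // Assumption: a random UUID stands in for the real EPSID generator.
        private String epsid = UUID.randomUUID().toString();

        /** Simulates a restart: the node comes back with a fresh EPSID. */
        void restart() {
            epsid = UUID.randomUUID().toString();
        }

        String epsid() {
            return epsid;
        }

        /** True when the cookie carries an EPSID other than the current one. */
        boolean wouldSendReload(String cookieEpsid) {
            return cookieEpsid != null && !cookieEpsid.equals(epsid);
        }
    }
}
```

A browser that keeps the cookie from before `restart()` triggers `wouldSendReload` on the very same node, even though no fail-over between nodes took place.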
There's also a chance of a double reload. If fail-over is detected on a non-blocking request, that request gets a ReloadResponse from the Core and the reload occurs. Now the blocking request comes in, but it still has the other EPSID assigned to it, and another ReloadResponse is sent back.
I agree this is not elegant. There are opportunities to improve this, but we didn't have much time back then:
* The various EPS instances could announce their EPSIDs so that each instance knows which EPSIDs are valid. This should help avoid the additional reload when a cluster has been restarted and the cookies haven't been cleared in the browser.
* Additionally, when the Core has sent a ReloadResponse, it could indicate that to the EPS instance on that node. This should help avoid the double reload scenario.
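One possible shape for the two improvements listed above is sketched here. It is an interpretation, not an existing EPS API: `ReloadGuard`, `announce`, and `coreSentReload` are hypothetical names, the announcement transport between instances is left out, and keying the "Core already reloaded" flag by session ID is an assumption.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-node guard combining both proposed improvements. */
final class ReloadGuard {
    private final String ownEpsid;
    // EPSIDs announced by the currently running EPS instances.
    private final Set<String> knownEpsids = ConcurrentHashMap.newKeySet();
    // Sessions for which the Core on this node already sent a ReloadResponse.
    private final Set<String> reloadedByCore = ConcurrentHashMap.newKeySet();

    ReloadGuard(String ownEpsid) {
        this.ownEpsid = ownEpsid;
        knownEpsids.add(ownEpsid);
    }

    /** Called when a peer EPS instance announces its EPSID. */
    void announce(String peerEpsid) {
        knownEpsids.add(peerEpsid);
    }

    /** Called by the Core after it has sent a ReloadResponse. */
    void coreSentReload(String sessionId) {
        reloadedByCore.add(sessionId);
    }

    /** Decides whether a blocking request should get a ReloadResponse. */
    boolean shouldReload(String sessionId, String cookieEpsid) {
        if (cookieEpsid == null || cookieEpsid.equals(ownEpsid)) {
            return false; // scenarios 1 and 2
        }
        if (reloadedByCore.remove(sessionId)) {
            return false; // the Core already reloaded: avoid the double reload
        }
        // Reload only for an EPSID of a live peer; a stale one is from an earlier run.
        return knownEpsids.contains(cookieEpsid);
    }
}
```

In this sketch the Core's flag is consumed by the first blocking request that sees it, so a later genuine fail-over for the same session is still detected.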
The problem with the double reload is that an additional full component tree, for whatever page the user was on when fail-over occurred, is created and maintained until session expiry. Naturally, it's not possible to know in advance which page this will be or how large the extra memory consumption will be.