Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: EE-4.0.0.GA, EE-3.3.0.GA_P03
Fix Version/s: EE-4.1.0.RC1, EE-4.1.0.GA, EE-3.3.0.GA_P04, 4.2.BETA, 4.2
Component/s: Push Library
Labels: None
Environment: Tomcat 7 (pre-7.0.62) and Tomcat 8 (pre-8.0.22) releases
Assignee Priority: P3
Support Case References: Support Case 13470 - https://icesoft.my.salesforce.com/5007000001GBRTQ
Workaround Description: Suggested workarounds (as of yet unconfirmed): use newer Tomcat versions (7.0.62 or newer, or 8.0.22 or newer) that include the following two Apache Tomcat fixes:
57779 <http://bz.apache.org/bugzilla/show_bug.cgi?id=57779>: When an I/O error occurs on a non-container thread, only dispatch to a container thread to handle the error if using Servlet 3+ asynchronous processing. This avoids potential deadlocks if an application is performing I/O on a non-container thread without using the Servlet 3+ asynchronous API. (markt)
57621 <http://bz.apache.org/bugzilla/show_bug.cgi?id=57621>: When an async request completes, ensure that any remaining request body data is swallowed. (markt)
OR disable using Tomcat ARP with the following context-param:
<context-param>
    <param-name>org.icepush.useAsyncContext</param-name>
    <param-value>false</param-value>
</context-param>
Description
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.coyote.AsyncStateMachine.pauseNonContainerThread(AsyncStateMachine.java:382)
- eliminated <0xfffffffe5f76aae0> (a org.apache.coyote.AsyncStateMachine)
at org.apache.coyote.AsyncStateMachine.asyncComplete(AsyncStateMachine.java:240)
- locked <0xfffffffe5f76aae0> (a org.apache.coyote.AsyncStateMachine)
at org.apache.coyote.http11.Http11Processor.actionInternal(Http11Processor.java:356)
at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:906)
at org.apache.coyote.Request.action(Request.java:344)
at org.apache.catalina.core.AsyncContextImpl.complete(AsyncContextImpl.java:93)
at org.icepush.servlet.AsyncAdaptingServlet$AsyncRequestResponse.respondWith(AsyncAdaptingServlet.java:74)
at org.icepush.SequenceTaggingServer$TaggingRequest.respondWith(SequenceTaggingServer.java:67)
at org.icepush.BlockingConnectionServer.respondIfPendingRequest(BlockingConnectionServer.java:177)
at org.icepush.BlockingConnectionServer.respondIfNotificationsAvailable(BlockingConnectionServer.java:149)
- locked <0xfffffffe69880ba0> (a org.icepush.BlockingConnectionServer)
at org.icepush.BlockingConnectionServer.sendNotifications(BlockingConnectionServer.java:126)
at org.icepush.BlockingConnectionServer.receive(BlockingConnectionServer.java:293)
at org.icepush.LocalNotificationBroadcaster.broadcast(LocalNotificationBroadcaster.java:39)
at org.icepush.LocalPushGroupManager$Notification.run(LocalPushGroupManager.java:394)
at org.icepush.LocalPushGroupManager$QueueConsumerTask.run(LocalPushGroupManager.java:432)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
An incoming request is attempting to establish a new blocking connection, but it simply piles up waiting for the lock.
What's unclear is why the "Notification queue consumer." thread is having trouble completing and/or releasing the lock.
On our end, we should look into the current locking strategy used by BlockingConnectionServer's respondIfBackOffRequested() and respondIfNotificationsAvailable(), and possibly come up with a different locking strategy that avoids future pile-ups of threads in the BLOCKED state.
Attachments
- after-thread-dump.txt (33 kB) - Arran Mccullough
- before-thread-dump.txt (16 kB) - Arran Mccullough
- Case13690Example.war (3.53 MB) - Arran Mccullough
  - META-INF/MANIFEST.MF (0.1 kB)
  - META-INF/context.xml (0.1 kB)
  - WEB-INF/classes/MyPhaseListener.class (0.9 kB)
  - WEB-INF/classes/PushView.class (2 kB)
  - WEB-INF/classes/TestBean.class (0.8 kB)
  - WEB-INF/faces-config.xml (0.4 kB)
  - WEB-INF/lib/icefaces-ee.jar (633 kB)
  - WEB-INF/lib/icepush-ee.jar (376 kB)
  - WEB-INF/lib/javax.faces.jar (2.54 MB)
  - WEB-INF/web.xml (2 kB)
  - index.xhtml (0.8 kB)
  - mainTemplate.xhtml (0.4 kB)
- Case13690Example.zip (21 kB) - Arran Mccullough
Activity
On the BlockingConnectionServer's respondIfBackOffRequested() and respondIfNotificationsAvailable() locking front, I had a closer look at our code. Originally we theorized that having them use different locks might prevent the thread pile-up happening behind the stuck thread. However, it seems only to delay it, not prevent it.
As both methods use the synchronized keyword, and thus use the BlockingConnectionServer instance as the lock, it seemed reasonable, looking at the implementations of these methods, to introduce separate locks for each (both methods implicitly lock on "this", as sketched below). However, when handling an incoming listen.icepush request, the code dealing with this uses both mentioned methods:
if (!respondIfBackOffRequested()) {
    if (!sendNotifications(pushGroupManager.getPendingNotifications())) {
        if (resend) {
            resendLastNotifications();
        } else {
            respondIfNotificationsAvailable();
        }
    }
}
Generally a back-off is not requested and a resend is not necessary, ensuring the code eventually hits the respondIfNotificationsAvailable() method, leading to the eventual thread pile-up.
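To make the contention explicit, here is a minimal sketch (not the actual ICEpush source): synchronized instance methods all lock the same monitor, the instance itself, so callers of either method contend for a single lock.

// Minimal sketch (not the actual ICEpush source): synchronized instance
// methods implicitly lock the same monitor -- the instance itself -- so
// callers of either method contend for one lock.
class MonitorSketch {
    private boolean backOffRequested;
    private boolean notificationsAvailable;

    synchronized boolean respondIfBackOffRequested() {       // locks "this"
        return backOffRequested;
    }

    synchronized boolean respondIfNotificationsAvailable() { // also locks "this"
        return notificationsAvailable;
    }
}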
I can confirm we are hitting the same issue on both Tomcat 7.0.67 and Tomcat 8.0.30, using EE-3.0.0.GA_P03, so it's not fixed in the latest versions of Tomcat.
Disabling the async context works, but it obviously falls back to the thread-blocking server. I'll see if I can isolate the part of our test that consistently fails with this error and submit it.
Attached test case that reproduces this behavior.
Steps:
- Click the "Test AsyncFlood" link repeatedly; the issue typically appears within 10 to 30 tries. Once it happens, no further push updates are shown no matter how many times the link is clicked.
- I was able to reproduce it with a Tomcat 8.0.23 instance.
- I uploaded the thread dump before and after the issue happened.
Detailed Problem Description
When hitting this issue, the thread dump will show a number of threads of interest as shown below:
"Notification queue consumer." daemon prio=10 tid=0x0a895400 nid=0x43ef in Object.wait() [0xa119e000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xa8723010> (a org.apache.coyote.AsyncStateMachine) at java.lang.Object.wait(Object.java:503) at org.apache.coyote.AsyncStateMachine.pauseNonContainerThread(AsyncStateMachine.java:422) - locked <0xa8723010> (a org.apache.coyote.AsyncStateMachine) at org.apache.coyote.AsyncStateMachine.asyncComplete(AsyncStateMachine.java:256) - locked <0xa8723010> (a org.apache.coyote.AsyncStateMachine) at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:849) at org.apache.coyote.Request.action(Request.java:378) at org.apache.catalina.core.AsyncContextImpl.complete(AsyncContextImpl.java:86) at org.icepush.servlet.AsyncAdaptingServlet$AsyncRequestResponse.respondWith(AsyncAdaptingServlet.java:109) at com.icesoft.icepush.AdjustHeartbeatServer$HeartbeatRequest.respondWith(AdjustHeartbeatServer.java:67) at org.icepush.SequenceTaggingServer$TaggingRequest.respondWith(SequenceTaggingServer.java:67) at org.icepush.BlockingConnectionServer.respondIfPendingRequest(BlockingConnectionServer.java:200) at org.icepush.BlockingConnectionServer.respondIfNotificationsAvailable(BlockingConnectionServer.java:163) - locked <0xa201e860> (a org.icepush.BlockingConnectionServer) at org.icepush.BlockingConnectionServer.sendNotifications(BlockingConnectionServer.java:138) at org.icepush.BlockingConnectionServer.receive(BlockingConnectionServer.java:338) at org.icepush.LocalNotificationBroadcaster.broadcast(LocalNotificationBroadcaster.java:39) at org.icepush.LocalPushGroupManager$Notification.run(LocalPushGroupManager.java:502) at org.icepush.LocalPushGroupManager$QueueConsumerTask.run(LocalPushGroupManager.java:611) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505)
and
"http-nio-8080-exec-3" daemon prio=10 tid=0x0a299400 nid=0x43f2 waiting for monitor entry [0xa10ab000] java.lang.Thread.State: BLOCKED (on object monitor) at org.icepush.BlockingConnectionServer.respondIfBackOffRequested(BlockingConnectionServer.java:148) - waiting to lock <0xa201e860> (a org.icepush.BlockingConnectionServer) at org.icepush.BlockingConnectionServer.access$1500(BlockingConnectionServer.java:39) at org.icepush.BlockingConnectionServer$RunningServer.service(BlockingConnectionServer.java:392) at org.icepush.BlockingConnectionServer.service(BlockingConnectionServer.java:99) at org.icepush.SequenceTaggingServer.service(SequenceTaggingServer.java:54) at com.icesoft.icepush.AdjustHeartbeatServer.service(AdjustHeartbeatServer.java:54) at org.icepush.ConfigurationServer.service(ConfigurationServer.java:109) at org.icepush.servlet.AsyncAdaptingServlet.service(AsyncAdaptingServlet.java:52) at org.icepush.servlet.EnvironmentAdaptingServlet.service(EnvironmentAdaptingServlet.java:56) at org.icepush.servlet.PathDispatcher.service(PathDispatcher.java:39) at org.icepush.servlet.BrowserDispatcher$BrowserEntry.service(BrowserDispatcher.java:101) at org.icepush.servlet.BrowserDispatcher.service(BrowserDispatcher.java:45) at org.icepush.servlet.PathDispatcher.service(PathDispatcher.java:39) at org.icepush.servlet.MainServlet.service(MainServlet.java:129) at org.icefaces.impl.push.servlet.ICEpushResourceHandler$ICEpushResourceHandlerImpl.handleResourceRequest(ICEpushResourceHandler.java:235) at org.icefaces.impl.push.servlet.ICEpushResourceHandler.handleResourceRequest(ICEpushResourceHandler.java:128) at org.icefaces.impl.application.SessionTimeoutMonitor.handleSessionAwareResourceRequest(SessionTimeoutMonitor.java:63) at org.icefaces.impl.application.SessionAwareResourceHandlerWrapper.handleResourceRequest(SessionAwareResourceHandlerWrapper.java:40) at javax.faces.application.ResourceHandlerWrapper.handleResourceRequest(ResourceHandlerWrapper.java:125) at org.icefaces.impl.application.AuxUploadResourceHandler.handleResourceRequest(AuxUploadResourceHandler.java:82) at javax.faces.webapp.FacesServlet.service(FacesServlet.java:591) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:617) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1521) at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1478) - locked <0xa1e71ca0> (a org.apache.tomcat.util.net.NioChannel) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:724)
Originally, looking at the thread dump, we suspected that the container threads were piling up behind the non-container thread because Apache Tomcat's AsyncStateMachine invoked Object.wait() on the non-container thread. However, this is only partially true. A better understanding of the AsyncStateMachine's inner workings and state changes tells us the following.
There are two pieces of logic involved here:
- When an incoming listen.icepush request has been received by the BlockingConnectionServer, before obtaining a lock on the BlockingConnectionServer object this request is put on the pendingRequest queue. This always happens on a container thread.
- When a request for a push has been received by the LocalPushGroupManager, the non-container "Notification queue consumer" thread picks it up and if a request is on the pendingRequest queue it will poll it and then obtain a lock on the BlockingConnectionServer object.
Once either thread obtains the lock on the BlockingConnectionServer, it tries to send a response to the listen.icepush request. In this edge case the non-container thread obtains the lock on the BlockingConnectionServer object before the container thread handling the same request has a chance, so the latter ends up in the BLOCKED state waiting for that lock. Once the non-container thread has written the response, it invokes the AsyncContext.complete() method, which eventually causes the AsyncStateMachine to invoke Object.wait() on this thread. Because the container thread handling the same request cannot finish due to its BLOCKED state, the AsyncStateMachine's state never changes from STARTING to STARTED. Any subsequent incoming listen.icepush requests also try to obtain the lock on the BlockingConnectionServer object and end up in the BLOCKED state as well, which makes it look as though the container threads are simply piling up behind the non-container thread. The thread dumps of these BLOCKED container threads look similar, but the container thread handling the same request is in a different situation from the container threads handling other requests. A minimal simulation of this lock-ordering race is sketched below.
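The following is a minimal, self-contained simulation of that race, with plain threads and monitors standing in for the Tomcat and ICEpush classes; all names here are illustrative, not the real APIs.

import java.util.concurrent.CountDownLatch;

// Simulates the hang: the non-container thread holds the "server" lock and
// waits for the request to finish, while the container thread that would
// finish the request is blocked on that same lock.
public class AsyncDeadlockSketch {
    // Stands in for the BlockingConnectionServer instance monitor.
    private static final Object SERVER_LOCK = new Object();
    // Stands in for the AsyncStateMachine monitor that asyncComplete() waits on.
    private static final Object STATE_MACHINE = new Object();
    private static volatile boolean requestFinished = false;

    public static void main(String[] args) {
        CountDownLatch serverLockHeld = new CountDownLatch(1);

        // "Notification queue consumer." (non-container) thread: wins the race
        // for the server lock, writes the response, then complete() parks it
        // in Object.wait() until the container thread finishes the request.
        Thread notifier = new Thread(() -> {
            synchronized (SERVER_LOCK) {
                serverLockHeld.countDown();
                synchronized (STATE_MACHINE) {
                    while (!requestFinished) {
                        try {
                            STATE_MACHINE.wait(); // pauseNonContainerThread()
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }, "notification-queue-consumer");

        // Container thread handling the same listen.icepush request: BLOCKED
        // on the server lock, so it can never finish and wake the notifier.
        Thread container = new Thread(() -> {
            try {
                serverLockHeld.await(); // ensure the notifier wins the race
            } catch (InterruptedException e) {
                return;
            }
            synchronized (SERVER_LOCK) { // never acquired: permanent BLOCKED state
                requestFinished = true;
                synchronized (STATE_MACHINE) {
                    STATE_MACHINE.notifyAll();
                }
            }
        }, "http-nio-8080-exec-3");

        notifier.start();
        container.start();
        // Both threads now hang forever, mirroring the thread dumps above.
    }
}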
The idea of a solution is simple: the container thread should only put the request on the pendingRequest queue after first trying to handle the request itself, and only if it was unable to send a response. However, the logic around the pendingRequest queue is quite fundamental to the current workings of the BlockingConnectionServer, so we need to be careful not to break any related logic. A sketch of the reordering follows.
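A hedged sketch of that reordering, with hypothetical type and method names (not the actual ICEpush source):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the proposed ordering: the container thread tries
// to respond first and only parks the request on the pending queue when it
// could not send a response itself.
class PendingRequestSketch<R> {
    private final Queue<R> pendingRequestQueue = new ConcurrentLinkedQueue<>();

    void handleListenRequest(R request) {
        // Previously the request was queued before the server lock was taken,
        // which let the non-container thread respond to it first.
        if (!tryRespond(request)) {
            // Fallback: leave the request for the notification thread.
            pendingRequestQueue.offer(request);
        }
    }

    private boolean tryRespond(R request) {
        // Placeholder: would attempt to write a response for any pending
        // notifications and report whether it succeeded.
        return false;
    }
}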
Finally, to be clear, this issue is specific to ICEpush in combination with Tomcat ARP. The original logic works fine with non-ARP Tomcat and, as far as we know, other application servers.
ICEfaces 3
Sending core/src/main/java/org/icepush/BlockingConnectionServer.java
Sending core/src/main/java/org/icepush/servlet/AsyncAdaptingServlet.java
Transmitting file data ..
Committed revision 47624.
ICEfaces 4
Sending core/src/main/java/org/icepush/BlockingConnectionServer.java
Sending core/src/main/java/org/icepush/Browser.java
Sending core/src/main/java/org/icepush/ConfirmationTimeout.java
Sending core/src/main/java/org/icepush/LocalNotificationBroadcaster.java
Sending core/src/main/java/org/icepush/LocalPushGroupManager.java
Sending core/src/main/java/org/icepush/NoopPushGroupManager.java
Sending core/src/main/java/org/icepush/NotificationBroadcaster.java
Sending core/src/main/java/org/icepush/PushGroupManager.java
Sending core/src/main/java/org/icepush/servlet/AsyncAdaptingServlet.java
Transmitting file data .........
Committed revision 47948.
Sending eps/src/main/java/com/icesoft/push/DynamicPushGroupManager.java
Sending eps/src/main/java/com/icesoft/push/RemotePushGroupManager.java
Sending eps/src/main/java/com/icesoft/push/messaging/MessagePayload.java
Transmitting file data ...
Committed revision 44450.
ICEfaces 4
Sending icepush-ee/core/src/main/java/org/icepush/BlockingConnectionServer.java
Transmitting file data .
Committed revision 47949.
Emporium app fails on ICEfaces4 trunk r.47948 / Jenkins ICEfaces4 trunk Build #1819 / all browsers / server: Tomcat 7.0.42/8.0.21; not an issue on the previous Jenkins build. The clocks do not work: they tick just once on page load, then stop. Also, the automated bids are not seen to occur. This is not an issue on Jenkins ICEfaces 4.1.1 Build #5.
This failure can no longer be reproduced on ICEfaces4 trunk Build #1820 (r.47949); all tests pass (FF41).
The BlockingConnectionServer and associated classes have been refactored to include the following (see the sketch after this list):
- An incoming listen.icepush request handled by the BlockingConnectionServer is only added to the pendingPushRequestQueue if the handling container thread is unable to send a response in the end.
- The BlockingConnectionServer itself is no longer used as a lock to operate on the backOff or the notifiedPushIDQueue. Instead, two separate locks have been introduced: one for backOff access and one for notifiedPushIDQueue access.
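A minimal sketch of that locking split, assuming hypothetical field names and simplified method bodies (the real implementation may differ):

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: dedicated lock objects replace synchronized(this),
// so back-off handling and notification delivery no longer contend for the
// same monitor.
class BlockingConnectionServerSketch {
    private final Object backOffLock = new Object();
    private final Object notifiedQueueLock = new Object();
    private final Deque<String> notifiedPushIDQueue = new ArrayDeque<>();
    private long backOffDelay;

    boolean respondIfBackOffRequested() {
        synchronized (backOffLock) { // previously: synchronized(this)
            return backOffDelay > 0;
        }
    }

    boolean respondIfNotificationsAvailable() {
        synchronized (notifiedQueueLock) { // independent of the back-off lock
            return !notifiedPushIDQueue.isEmpty();
        }
    }
}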
Marking this one as FIXED.
It looks like the org.apache.coyote.AsyncStateMachine.pauseNonContainerThread method is pausing the ICEpush (timer) thread responsible for emptying (and dispatching) the push notifications (from Mircea's email, July 20th), further pointing to the Tomcat implementation.