Some additional thoughts, as there seem to be two mechanisms involved in the automatic clean-up of Groups and/or Push-IDs.
The first mechanism is as described above and focuses only on automatic Group clean-up. It is driven by receiving listen.icepush requests and invoking the scan(String[]) method for the listening Push-IDs. This leads to possible clean-up of a Group if no listen.icepush requests have been received with listening Push-IDs that are members of that Group.
The second mechanism focuses on automatic Push-ID clean-up and the possible subsequent Group clean-up. When a Push-ID is created, a new Expiry Timeout is started for it. This Expiry Timeout is cancelled upon receiving a listen.icepush request containing the Push-ID as a listening Push-ID, and a new Expiry Timeout is started immediately afterwards. If no listen.icepush request containing the Push-ID as a listening Push-ID is received before the Expiry Timeout elapses, the Expiry Timeout executes: it removes the Push-ID and also removes it as a member from every Group it belonged to. If a Group becomes empty as a result of this removal, the Group is removed as well.
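To make the second mechanism concrete, here is a minimal sketch of it as described above. It is a simplified model, not the ICEpush source: the class name, method names and the use of explicit `now` timestamps instead of real timers are all assumptions made for illustration. Whether adding an existing Push-ID to another Group restarts its timeout is not stated here, so the sketch leaves the running timeout alone.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative model of the second mechanism; hypothetical names, not the ICEpush source. */
class ExpiryTimeoutSketch {
    // Default timeout mentioned in this comment, assumed to be in milliseconds.
    static final long EXPIRY_TIMEOUT_MS = 120_000;

    private final Map<String, Long> expiresAtByPushId = new HashMap<>();
    private final Map<String, Set<String>> membersByGroup = new HashMap<>();

    /** Creates the Group and Push-ID on demand and starts the Push-ID's Expiry Timeout. */
    void addMember(String group, String pushId, long now) {
        membersByGroup.computeIfAbsent(group, g -> new HashSet<>()).add(pushId);
        // Assumption: an already known Push-ID keeps its running timeout.
        expiresAtByPushId.putIfAbsent(pushId, now + EXPIRY_TIMEOUT_MS);
    }

    /** Models a listen.icepush request: cancel and restart the timeout of each known Push-ID. */
    void listen(String[] listeningPushIds, long now) {
        for (String id : listeningPushIds) {
            expiresAtByPushId.computeIfPresent(id, (k, old) -> now + EXPIRY_TIMEOUT_MS);
        }
    }

    /** Executes every timeout that is due: removes the Push-IDs, then any Group left empty. */
    void runDueTimeouts(long now) {
        Set<String> expired = new HashSet<>();
        for (Map.Entry<String, Long> e : expiresAtByPushId.entrySet()) {
            if (e.getValue() <= now) {
                expired.add(e.getKey());
            }
        }
        expiresAtByPushId.keySet().removeAll(expired);
        for (Set<String> members : membersByGroup.values()) {
            members.removeAll(expired);
        }
        membersByGroup.values().removeIf(Set::isEmpty);
    }

    boolean hasPushId(String pushId) { return expiresAtByPushId.containsKey(pushId); }

    boolean hasGroup(String group) { return membersByGroup.containsKey(group); }
}
```

In this model a Push-ID that keeps appearing in listen requests never expires, while one that stops appearing is removed one full timeout after its last listen request, taking any Group it leaves empty with it.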
Non-automatic clean-up of Groups happens through the removeMember(String, String) method. If the specified Group is empty after the removal of the specified Push-ID from it, the Group is also removed.
A Group is never created without a Push-ID. Only the addMember(String, String) method results in the creation of Group and Push-ID objects. (Please note that createPushID(...) doesn't actually result in the creation of a Push-ID object.)
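The Group life cycle described in the last two paragraphs can be sketched as follows. This is only a model under stated assumptions: the class, the field and the `hasGroup` helper are hypothetical, and only the `addMember(String, String)` and `removeMember(String, String)` signatures are taken from the description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative model of Group creation and non-automatic clean-up; not the ICEpush source. */
class GroupMembershipSketch {
    private final Map<String, Set<String>> membersByGroup = new HashMap<>();

    /** The only way a Group comes into existence: it is created together with its first member. */
    void addMember(String groupName, String pushId) {
        membersByGroup.computeIfAbsent(groupName, g -> new HashSet<>()).add(pushId);
    }

    /** Removes the Push-ID from the Group and drops the Group once it is empty. */
    void removeMember(String groupName, String pushId) {
        Set<String> members = membersByGroup.get(groupName);
        if (members == null) {
            return; // unknown Group, nothing to do
        }
        members.remove(pushId);
        if (members.isEmpty()) {
            membersByGroup.remove(groupName);
        }
    }

    // Hypothetical helper, only used to observe the state.
    boolean hasGroup(String groupName) { return membersByGroup.containsKey(groupName); }
}
```

Because a Group only exists while it has at least one member, an empty Group never has to be represented in this model.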
Focusing on just the second mechanism for automatic clean-up of Groups and Push-IDs, here is a list of scenarios:
- An add-group-member.icepush request is handled with a specified Group and Push-ID. However, no subsequent listen.icepush request is received with the Push-ID as a listening Push-ID. The Expiry Timeout executes after the specified timeout (default: 120000 ms) and removes the Push-ID, and the Group as well if that leaves it empty.
- An add-group-member.icepush request is handled with a specified Group and Push-ID. Subsequent listen.icepush requests are received with the Push-ID as a listening Push-ID. Each incoming listen.icepush request cancels the current Expiry Timeout and starts a new one.
- Take the previous scenario, but this time the listen.icepush requests with the Push-ID as a listening Push-ID stop coming in after a while. The Expiry Timeout executes after the specified timeout (default: 120000 ms) and removes the Push-ID, and the Group as well if that leaves it empty.
The second mechanism seems to cover both the healthy scenario (add-group-member.icepush with subsequent listen.icepush requests) and the unhealthy scenario (add-group-member.icepush without subsequent listen.icepush requests) well, cleaning up Groups automatically in each case.
I think we should consider removing the first mechanism, unless I overlooked a scenario.
It was quite tricky to work out why pushes stop working after a while. It only seems to happen when multiple Groups and multiple Push-IDs are involved, and here is why:
Herein lies the problem. If the heartbeat interval is up to 45000 ms, a double miss of the touch/discard logic, caused by the cap built into the scan(String[]) method, can result in a Group being discarded even though listen.icepush requests are still being received containing Push-IDs that are members of that Group. Once this Group has been removed, pushes to that Group no longer have any effect.
We should reconsider the implemented capping mechanism for the touch/discard logic. For now, moving the touch logic outside of the capped mechanism should suffice for an immediate fix of this issue.
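A rough sketch of the problem and of the proposed immediate fix, under loudly stated assumptions: the actual scan(String[]) implementation is not shown in this comment, so the cap interval (60000 ms), the Group timeout (120000 ms), the explicit `now` parameter and all names below are hypothetical and chosen only to make the effect reproducible. The heartbeat of 45000 ms is the value mentioned above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative model of a capped touch/discard scan; not the ICEpush source. */
class CappedScanSketch {
    // Hypothetical values, chosen only for this illustration.
    static final long SCAN_CAP_MS = 60_000;
    static final long GROUP_TIMEOUT_MS = 120_000;

    private final Map<String, Set<String>> membersByGroup = new HashMap<>();
    private final Map<String, Long> lastAccessByGroup = new HashMap<>();
    private final boolean touchOutsideCap;
    private long nextScanAllowedAt = 0;

    CappedScanSketch(boolean touchOutsideCap) {
        this.touchOutsideCap = touchOutsideCap;
    }

    void addMember(String group, String pushId, long now) {
        membersByGroup.computeIfAbsent(group, g -> new HashSet<>()).add(pushId);
        lastAccessByGroup.put(group, now);
    }

    /** Called for the listening Push-IDs of every listen request. */
    void scan(String[] pushIds, long now) {
        if (touchOutsideCap) {
            touch(pushIds, now); // proposed fix: always touch
        }
        if (now < nextScanAllowedAt) {
            return; // capped: the rest is skipped for this request
        }
        nextScanAllowedAt = now + SCAN_CAP_MS;
        if (!touchOutsideCap) {
            touch(pushIds, now); // problematic variant: touch only when not capped
        }
        // Discard every Group that has not been touched within the timeout.
        lastAccessByGroup.entrySet().removeIf(e -> now - e.getValue() > GROUP_TIMEOUT_MS);
        membersByGroup.keySet().retainAll(lastAccessByGroup.keySet());
    }

    private void touch(String[] pushIds, long now) {
        for (Map.Entry<String, Set<String>> e : membersByGroup.entrySet()) {
            for (String id : pushIds) {
                if (e.getValue().contains(id)) {
                    lastAccessByGroup.put(e.getKey(), now);
                    break;
                }
            }
        }
    }

    boolean hasGroup(String group) { return membersByGroup.containsKey(group); }

    /** Replays two clients with a 45000 ms heartbeat; "b" always arrives 1000 ms after "a". */
    static boolean groupBSurvives(boolean touchOutsideCap) {
        CappedScanSketch s = new CappedScanSketch(touchOutsideCap);
        s.addMember("A", "a", 0);
        s.addMember("B", "b", 0);
        for (long t = 0; t <= 180_000; t += 45_000) {
            s.scan(new String[] {"a"}, t);
            s.scan(new String[] {"b"}, t + 1_000);
        }
        return s.hasGroup("B");
    }
}
```

In this model every request carrying "b" falls inside the capped window, so with the touch inside the cap Group "B" is never touched and is eventually discarded although "b" is still listening. With the touch moved outside the cap, "B" is touched on every request and survives, while the discard pass stays capped.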