[PUSH-224] Cloud Push for Cluster Environment with Fail-Over (Continued) - ICEsoft JIRA Issue Tracker

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: EE-3.2.0.GA
Fix Version/s: EE-3.3.0.GA
Component/s: Push Library, Push Server
Labels:
None
Environment:
Cluster environment

Assignee Priority:
P1

Description

This is a follow-up to PUSH-222

REQUIREMENTS:

CLOUD PUSH IN CLUSTER ENVIRONMENT

The first set of requirements for supporting Cloud Push in a cluster environment, with or without fail-over (graceful and non-graceful), is as follows:

- If a Server Push requires a Cloud Push, the Cloud Push must be sent at least once, and preferably only once, regardless of graceful or non-graceful fail-over.
- A Confirmation Timeout timer task for a particular Push ID must be scheduled on only a single EPS instance within the cluster environment.
- A Confirmation Timeout timer task must be cancelled upon receiving a subsequent listen.icepush with a participating Push ID that matches the Push ID for which the Confirmation Timeout timer task was scheduled.
- If no subsequent listen.icepush is received within the Confirmation Timeout, a Cloud Push should be sent, and the Confirmation Timeout should only be cancelled once the Cloud Push has been sent successfully.
- If a fail-over occurs before the Confirmation Timeout is cancelled, the successor EPS instance to the failed EPS instance must resume the Confirmation Timeout and cancel it only if a subsequent listen.icepush is received or the Cloud Push has been sent successfully.
- Due to this initial logic, a particular Cloud Push may be sent more than once. For now this behaviour is accepted, as long as that Cloud Push is sent at least once. (It is better to send a Cloud Push more than once than to risk not sending it at all.)
- Additionally, Cloud Push flood protection must be in place to avoid accidentally flooding a device with Priority Push messages.
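The at-least-once rules for a single Confirmation Timeout can be sketched as follows. This is a minimal, single-threaded sketch under stated assumptions, not the EPS implementation: the class `ConfirmationTimeouts`, its methods and the `cloudPushSender` callback are hypothetical, and time is passed in explicitly instead of using a real timer. Flood protection is not covered.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the Confirmation Timeout lifecycle on the EPS instance
// that owns the timeouts. Names are illustrative; not thread-safe for brevity.
class ConfirmationTimeouts {
    // Push ID -> deadline (epoch millis) of its pending Confirmation Timeout
    private final Map<String, Long> pending = new HashMap<>();
    // Hypothetical Cloud Push sender: returns true only if the push was sent successfully
    private final Predicate<String> cloudPushSender;

    ConfirmationTimeouts(Predicate<String> cloudPushSender) {
        this.cloudPushSender = cloudPushSender;
    }

    // Priority Push requested: start the Confirmation Timeout (once per Push ID)
    void schedule(String pushId, long now, long timeoutMillis) {
        pending.putIfAbsent(pushId, now + timeoutMillis);
    }

    // A subsequent listen.icepush carrying this Push ID confirms delivery: cancel
    void onListen(String pushId) {
        pending.remove(pushId);
    }

    // Called periodically. An expired timeout triggers a Cloud Push and is
    // cancelled only if that send succeeded; otherwise it stays pending and is
    // retried on the next call, so a Cloud Push may go out more than once.
    void tick(long now) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= now && cloudPushSender.test(e.getKey())) {
                it.remove();
            }
        }
    }

    boolean isPending(String pushId) {
        return pending.containsKey(pushId);
    }
}
```

Retrying on the next tick, instead of dropping a timeout after a failed send, is what trades possible duplicates for the "at least once" guarantee accepted above.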

CONFIRMATION TIMEOUT

Before a Cloud Push can be sent, the Confirmation Timeout timer task must be scheduled. This should only occur when a Priority Push is requested. To ensure that the Confirmation Timeout is scheduled on a single EPS instance, the EPS instance that last received the listen.icepush request is responsible for scheduling it. All other EPS instances are only responsible for recording the Confirmation Timeout data, so that they can potentially act as the successor EPS instance to a failed EPS instance. Upon cancellation of the Confirmation Timeout, the EPS instances are responsible for clearing the associated data.
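The split between scheduling and recording could look roughly like this. It is a sketch under assumptions: it presumes the Confirmation Timeout data (Push ID, owning instance, deadline) is announced to every EPS instance over the cluster messaging, and `ConfirmationTimeoutTable`, `onTimeoutAnnounced`, `onTimeoutCancelled` and `ownedBy` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: every EPS instance keeps the same table of Confirmation
// Timeout data, but only the owning instance schedules the actual timer task.
class ConfirmationTimeoutTable {
    static final class Entry {
        final String pushId;
        final UUID owner;    // EPS instance that last received listen.icepush
        final long deadline; // epoch millis at which the Cloud Push becomes due

        Entry(String pushId, UUID owner, long deadline) {
            this.pushId = pushId;
            this.owner = owner;
            this.deadline = deadline;
        }
    }

    private final UUID self;
    private final Map<String, Entry> entries = new HashMap<>();

    ConfirmationTimeoutTable(UUID self) {
        this.self = self;
    }

    // Applied on every instance when a Confirmation Timeout is announced.
    // Returns true only on the owner, which must then schedule the timer task;
    // all other instances just record the data.
    boolean onTimeoutAnnounced(String pushId, UUID owner, long deadline) {
        entries.put(pushId, new Entry(pushId, owner, deadline));
        return owner.equals(self);
    }

    // Applied on every instance once the timeout is cancelled, either by a
    // subsequent listen.icepush or by a successfully sent Cloud Push.
    void onTimeoutCancelled(String pushId) {
        entries.remove(pushId);
    }

    // Uncancelled timeouts owned by a failed instance, for its successor to resume.
    List<Entry> ownedBy(UUID failed) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries.values()) {
            if (e.owner.equals(failed)) {
                result.add(e);
            }
        }
        return result;
    }
}
```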

GRACEFUL SHUTDOWN

During a graceful shutdown, an EPS instance sends a final Shutdown message to let the other EPS instances know it is being shut down gracefully.

NON-GRACEFUL SHUTDOWN DETECTION

Each EPS instance has a UUID generated at start-up. This UUID is part of the Status message that each EPS instance sends every second. Upon receiving a Status message from another EPS instance, the receiving instance stores the sender's UUID, together with the timestamp of receipt, as a Record in a Map. This information is used for non-graceful shutdown detection.
Every 5 seconds a scan runs over the Map of Records to check that each UUID has an associated timestamp no older than 5 seconds. If an older timestamp is detected, the EPS instance with the associated UUID is assumed to have shut down non-gracefully.
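The receiving side of the Status and Shutdown messages might be sketched as below. The 5-second threshold comes from the description; the class `ClusterMembership` and its methods are hypothetical, timestamps are passed in rather than read from a clock, and the message transport is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of non-graceful shutdown detection on one EPS instance.
class ClusterMembership {
    // A Status timestamp older than this marks the sender as failed (5 s per the description)
    static final long STATUS_TIMEOUT_MILLIS = 5_000L;

    // UUID of another EPS instance -> time its last Status message was received
    private final Map<UUID, Long> lastSeen = new ConcurrentHashMap<>();

    // Status message (sent every second by each EPS instance)
    void onStatus(UUID sender, long receivedAt) {
        lastSeen.put(sender, receivedAt);
    }

    // Shutdown message (graceful shutdown): forget the sender immediately
    // instead of waiting for the scan. Returns true if the sender was known.
    boolean onShutdown(UUID sender) {
        return lastSeen.remove(sender) != null;
    }

    // Run every 5 seconds: returns, and forgets, the instances presumed to have
    // shut down non-gracefully. remove(key, value) skips an entry refreshed by
    // a Status message that arrived while the scan was running.
    List<UUID> scan(long now) {
        List<UUID> failed = new ArrayList<>();
        for (Map.Entry<UUID, Long> e : lastSeen.entrySet()) {
            if (now - e.getValue() > STATUS_TIMEOUT_MILLIS
                    && lastSeen.remove(e.getKey(), e.getValue())) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

In a running instance the scan could be driven by a `ScheduledExecutorService`, e.g. `scheduleAtFixedRate(() -> membership.scan(System.currentTimeMillis()), 5, 5, TimeUnit.SECONDS)`, with each returned UUID handed to the successor determination.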

DETERMINING THE SUCCESSOR EPS INSTANCE FOR THE FAILED EPS INSTANCE

Each EPS instance has a Map from UUIDs to Records for all EPS instances within the cluster that it knows of, based on received Status messages, including itself. The UUIDs in this Map are ordered. The successor EPS instance is the next EPS instance in the Map, or the first EPS instance if the failed EPS instance was the last EPS instance in the Map. Because each EPS instance performs this determination itself, each EPS instance knows whether or not it is the successor. The successor EPS instance is responsible for resuming any Confirmation Timeout of the failed EPS instance that has been neither cancelled nor confirmed.
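The wrap-around successor rule can be sketched over a `NavigableSet` of UUIDs, such as the key set of a `TreeMap` holding the Records. `SuccessorResolver` and its methods are hypothetical names. The sketch relies on `UUID.compareTo` giving the same total order on every JVM, and it assumes all instances hold the same membership view when they evaluate it.

```java
import java.util.NavigableSet;
import java.util.UUID;

// Hypothetical sketch of successor selection over the ordered set of known
// UUIDs (all EPS instances seen via Status messages, including this one).
final class SuccessorResolver {
    private SuccessorResolver() {}

    // Next UUID after the failed one in sort order, wrapping around to the
    // first. members must be non-empty (it always contains this instance).
    // Returns null if no instance other than the failed one is known.
    static UUID successorOf(NavigableSet<UUID> members, UUID failed) {
        UUID next = members.higher(failed);
        if (next == null) {
            next = members.first();
        }
        return next.equals(failed) ? null : next;
    }

    // Every instance evaluates this locally to decide whether it takes over
    // the failed instance's uncancelled Confirmation Timeouts.
    static boolean isSuccessor(NavigableSet<UUID> members, UUID self, UUID failed) {
        return self.equals(successorOf(members, failed));
    }
}
```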


People

Assignee:

Jack Van Ooststroom

Reporter:

Jack Van Ooststroom

Votes:

0

Watchers:

3

Dates

Created:

21/Feb/13 9:08 PM

Updated:

17/Nov/14 11:17 AM

Resolved:

30/May/13 3:44 PM