ICEpush / PUSH-224

Cloud Push for Cluster Environment with Fail-Over (Continued)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: EE-3.2.0.GA
    • Fix Version/s: EE-3.3.0.GA
    • Component/s: Push Library, Push Server
    • Labels:
      None
    • Environment:
      Cluster environment
    • Assignee Priority:
      P1

      Description

      This is a follow-up to PUSH-222

      REQUIREMENTS:

      CLOUD PUSH IN CLUSTER ENVIRONMENT

      The initial requirements for supporting Cloud Push in a cluster environment, with or without fail-over (graceful or non-graceful), are as follows:

      - If a Server Push requires a Cloud Push, it must be sent at least once, and preferably only once, regardless of graceful or non-graceful fail-over.
      - A Confirmation Timeout timer task for a particular Push ID must be scheduled on only a single EPS instance within the cluster environment.
      - A Confirmation Timeout timer task must be cancelled upon receiving a subsequent listen.icepush with a participating Push ID that matches the Push ID the timer task was scheduled for.
      - If a subsequent listen.icepush is not received within the Confirmation Timeout, a Cloud Push should be sent, and the Confirmation Timeout should only be cancelled if the Cloud Push has been sent successfully.
      - If a fail-over occurs before the Confirmation Timeout was cancelled, the successor EPS instance to the failed EPS instance must resume the Confirmation Timeout and cancel it only if a subsequent listen.icepush is received or the Cloud Push has been sent successfully.
      - With this initial logic a particular Cloud Push may be sent more than once. This behaviour is currently accepted as long as the Cloud Push is sent at least once. (It is better to send a Cloud Push more than once than to risk it not being sent at all.)
      - Additionally, Cloud Push flood protection must be in place to avoid accidentally flooding a device with Priority Push messages.
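
The at-least-once rule in the list above can be sketched as follows. This is a minimal illustration, not the ICEpush implementation: `ConfirmationTracker`, its method names, and the `Predicate`-based sender are hypothetical, and the timer scheduling itself is left out so that the cancel-or-retry logic stays visible.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch: tracks which Push IDs still await confirmation. */
final class ConfirmationTracker {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Assumed sender abstraction: returns true if the Cloud Push was sent successfully.
    private final Predicate<String> cloudPushSender;

    ConfirmationTracker(Predicate<String> cloudPushSender) {
        this.cloudPushSender = cloudPushSender;
    }

    /** Called when the Confirmation Timeout for a Push ID starts. */
    void start(String pushId) {
        pending.add(pushId);
    }

    /** A subsequent listen.icepush with this Push ID cancels the timeout. */
    void onListen(String pushId) {
        pending.remove(pushId);
    }

    /**
     * Called when the Confirmation Timeout elapses.
     * Returns true if nothing remains pending for this Push ID.
     */
    boolean onTimeout(String pushId) {
        if (!pending.contains(pushId)) {
            return true; // already cancelled by a listen.icepush or an earlier send
        }
        if (cloudPushSender.test(pushId)) {
            pending.remove(pushId); // cancel only after a successful send
            return true;
        }
        return false; // send failed: stay pending so the send can be retried
    }

    boolean isPending(String pushId) {
        return pending.contains(pushId);
    }
}
```

Because the entry is removed only after a successful send, a concurrent or resumed timeout may send the same Cloud Push twice, which matches the accepted "more than once rather than not at all" trade-off.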

      CONFIRMATION TIMEOUT

      Before a Cloud Push can be sent, the Confirmation Timeout timer task must be scheduled. This should only occur when a Priority Push is requested. To ensure the Confirmation Timeout is scheduled on a single EPS instance, the EPS instance that last received the listen.icepush request is responsible for scheduling it. All other EPS instances only record the Confirmation Timeout data so that they can act as the successor EPS instance if that instance fails. When the Confirmation Timeout is cancelled, each EPS instance is responsible for clearing its associated data.
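
The split between the scheduling instance and the recording instances might look like the sketch below. All names (`ConfirmationTimeoutRegistry`, `Entry`, the string UUIDs, the deadline field) are assumptions made for illustration; how the data reaches the other EPS instances is not described in this issue and is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: per-instance record of Confirmation Timeouts in the cluster. */
final class ConfirmationTimeoutRegistry {
    /** Data every instance keeps so that a successor could resume the timeout. */
    static final class Entry {
        final String pushId;
        final String ownerUuid;    // instance that last received listen.icepush
        final long deadlineMillis; // when the Confirmation Timeout elapses

        Entry(String pushId, String ownerUuid, long deadlineMillis) {
            this.pushId = pushId;
            this.ownerUuid = ownerUuid;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final String localUuid;
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    ConfirmationTimeoutRegistry(String localUuid) {
        this.localUuid = localUuid;
    }

    /**
     * Records the timeout on this instance. Returns true only when this
     * instance is the owner and therefore has to schedule the timer task.
     */
    boolean record(String pushId, String ownerUuid, long deadlineMillis) {
        entries.put(pushId, new Entry(pushId, ownerUuid, deadlineMillis));
        return localUuid.equals(ownerUuid);
    }

    /** Clears the recorded data once the timeout has been cancelled. */
    void cancel(String pushId) {
        entries.remove(pushId);
    }

    /** Entries owned by the given instance, for example to resume them after it fails. */
    List<Entry> ownedBy(String ownerUuid) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries.values()) {
            if (e.ownerUuid.equals(ownerUuid)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Every instance calls `record` for the same Push ID, but only the one whose UUID matches the owner gets `true` back and schedules the timer task.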

      GRACEFUL SHUTDOWN

      During a graceful shutdown, an EPS instance sends a final Shutdown message to let the other EPS instances know it is being shut down gracefully.

      NON-GRACEFUL SHUTDOWN DETECTION

      Each EPS instance has a UUID generated at start-up. This UUID is part of the Status message that each EPS instance sends every second. Upon receiving a Status message from another EPS instance, the receiver stores the UUID together with the timestamp of receipt as a Record in a Map. This information is used for non-graceful shutdown detection.
      Every 5 seconds a scan runs over the Map of Records to check that each UUID has an associated timestamp no older than 5 seconds. If an older timestamp is detected, the EPS instance with the associated UUID is assumed to have shut down non-gracefully.
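
A minimal sketch of this detection scheme is shown below, with the clock passed in explicitly so the logic can be followed without real timers. The class and method names are hypothetical, and removing a failed instance from the Map after reporting it is an assumption; the issue does not say what happens to the Record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: detects non-graceful shutdowns from missing Status messages. */
final class FailureDetector {
    /** Maximum Record age from the description: 5 seconds. */
    static final long MAX_AGE_MILLIS = 5_000L;

    // UUID of each known EPS instance mapped to the time its last Status arrived.
    private final Map<UUID, Long> lastSeen = new ConcurrentHashMap<>();

    /** Called for every Status message received (sent every second per instance). */
    void onStatus(UUID sender, long nowMillis) {
        lastSeen.put(sender, nowMillis);
    }

    /**
     * Intended to run every 5 seconds. Returns the instances whose last
     * Status message is older than MAX_AGE_MILLIS.
     */
    List<UUID> scan(long nowMillis) {
        List<UUID> failed = new ArrayList<>();
        for (Map.Entry<UUID, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > MAX_AGE_MILLIS) {
                failed.add(e.getKey());
            }
        }
        // Assumption: drop failed instances so they are reported only once.
        lastSeen.keySet().removeAll(failed);
        return failed;
    }
}
```

In a running instance the scan would typically be driven by a periodic task, with `System.currentTimeMillis()` supplying `nowMillis`.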

      DETERMINING THE SUCCESSOR EPS INSTANCE FOR THE FAILED EPS INSTANCE

      Each EPS instance has a Map of UUIDs to Records covering all EPS instances within the cluster that it knows of based on received Status messages, including itself. The UUIDs in this Map are ordered. The successor EPS instance is the next EPS instance in the Map after the failed one, or the first EPS instance if the failed EPS instance was the last EPS instance in the Map. Because each EPS instance makes this determination itself, each knows whether or not it is the successor. The successor EPS instance is responsible for resuming any Confirmation Timeout of the failed EPS instance that has not been cancelled or confirmed.
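
The wrap-around selection described above can be sketched with a sorted set. This assumes the UUIDs are ordered by `java.util.UUID`'s natural ordering, which the issue does not specify; `SuccessorSelector` and its methods are hypothetical names.

```java
import java.util.NavigableSet;
import java.util.UUID;

/** Hypothetical sketch: picks the successor of a failed EPS instance. */
final class SuccessorSelector {
    private SuccessorSelector() {
    }

    /**
     * Returns the next UUID after the failed one in the ordered set, wrapping
     * around to the first. Returns null if no other instance is known.
     */
    static UUID successorOf(UUID failed, NavigableSet<UUID> known) {
        UUID next = known.higher(failed);
        if (next == null && !known.isEmpty()) {
            next = known.first(); // failed instance was last: wrap around
        }
        return failed.equals(next) ? null : next;
    }

    /** Each instance runs this locally to learn whether it must take over. */
    static boolean isSuccessor(UUID local, UUID failed, NavigableSet<UUID> known) {
        return local.equals(successorOf(failed, known));
    }
}
```

Since every instance evaluates the same ordered set, they agree on the successor without exchanging further messages, provided their views of the cluster membership match.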

        Activity

        Jack Van Ooststroom created issue -
        Jack Van Ooststroom made changes -
        Field Original Value New Value
        Link This issue depends on PUSH-222 [ PUSH-222 ]
        Ken Fyten made changes -
        Fix Version/s EE-3.3.0.GA [ 10575 ]
        Jack Van Ooststroom made changes -
        Description: expanded from "This is a follow-up to PUSH-222" to add the Cloud Push in a Cluster Environment requirements list.
        Jack Van Ooststroom made changes -
        Description: expanded with the Confirmation Timeout, Graceful Shutdown, Non-Graceful Shutdown Detection, and successor determination sections.
        Jack Van Ooststroom made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Ken Fyten made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Assignee Jack Van Ooststroom [ jack.van.ooststroom ]
        Assignee Priority P1 [ 10010 ]
        Jack Van Ooststroom made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Ken Fyten made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Jack Van Ooststroom
          • Reporter:
            Jack Van Ooststroom
          • Votes:
            0
          • Watchers:
            3
