How can I make my ShowRunnerCLC™ system redundant?

Version 1.1 by Mark Kohlmann on 2024/04/05 17:53

ShowRunnerCLC™ itself does not support high availability or clustering at this time.  To understand why, you have to understand the capabilities of the hardware that ShowRunnerCLC™ controls.  Crestron processors run ShowRunnerCLC™ the same as any other program built to run on a Crestron system.

Crestron hardware communicates with processors using one of the following methods: the processor's Cresnet port, an Ethernet network, an RF gateway built into the processor, or any I/O port attached to the processor.  Typical designs for current-generation Crestron hardware use an Ethernet connection.  This includes Zum Wired, Zum Wireless (via RF gateways), and any Cresnet devices that communicate over a Cresnet bridge (DIN-CENCN-2, CAEN-BLOCK-CENCN, ZUMNET, etc.).

Crestron Ethernet devices have their communication settings defined in their IP Table.  Some devices allow multiple IP Table entries (usually a maximum of 2), while others allow only 1.  When multiple entries are allowed, the processor typically claims specific hardware once the connection is established and owns that hardware until the connection is disconnected.  Any device attempting to claim hardware that is already claimed will be denied.  Crestron firmware, to our knowledge, does not support configuring primary/backup IP Table entries.  Devices placed on the Control Subnet of a primary processor add complexity because the processor's internal router controls the IP allocations on the Control Subnet.

Immediate failover is not possible for anything other than Cresnet.  All Ethernet-connected devices will have failover times ranging from tens of seconds to minutes, depending on the approach.
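
The claim behavior described above can be illustrated with a toy model.  This is only a sketch for reasoning about the problem: the Device class, its method names, and the IP addresses are hypothetical and do not correspond to any Crestron API or firmware interface.

```python
class Device:
    """Toy model of an Ethernet device with a small IP Table (hypothetical)."""

    def __init__(self, name, ip_table):
        self.name = name
        self.ip_table = list(ip_table)  # processor addresses listed in the IP Table
        self.owner = None               # processor that currently owns the device

    def claim(self, processor_ip):
        """Return True if the processor now owns the device, False if denied."""
        if processor_ip not in self.ip_table:
            return False  # not listed in the IP Table at all
        if self.owner is not None and self.owner != processor_ip:
            return False  # already claimed by another processor
        self.owner = processor_ip
        return True

    def disconnect(self, processor_ip):
        """Release ownership when the owning connection drops."""
        if self.owner == processor_ip:
            self.owner = None


# Two IP Table entries, but only one processor can own the device at a time.
dev = Device("example-device", ["10.0.0.10", "10.0.0.11"])
print(dev.claim("10.0.0.10"))  # primary connects first and claims the device
print(dev.claim("10.0.0.11"))  # backup is denied while the primary holds the claim
dev.disconnect("10.0.0.10")
print(dev.claim("10.0.0.11"))  # backup can claim only after the primary disconnects
```

The model shows why a second IP Table entry alone does not give failover: the backup only gains the device after the primary's connection is gone, and nothing in the model (or, to our knowledge, in the firmware) marks one entry as preferred.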

Facts

  • ShowRunnerCLC™ does not presently have the ability to synchronize system state to a backup processor
  • Crestron devices do not support redundant connections

How do I make my processor redundant?

  • Cold Spare - Manual Recovery
    • Have an identically configured Crestron processor ready to be connected in place of the primary processor.
    • When the primary fails, all connections will need to be moved to the backup.
    • Power up the backup.
    • Caveats:
      • Any devices paired to the internal RF gateway of the processor (if MC3/DIN-AP3MEX/MC4) cannot be moved.
      • The backup's configuration will only be as current as the last time it was copied from the primary to the backup.  While not presently a feature, it could be possible to have the primary back itself up to a third device.  The backup could then pull the configuration from the third device at startup to run with an up-to-date configuration.
      • The system will attempt to rebuild current state at startup as devices come online, but not all hardware supports transmitting its current state.
  • Hot Spare - Automated recovery
    • Option A - Reconfigure processor only:
      • Have an identically configured Crestron processor connected to the network at a different IP address.
      • A supervisory program will need to monitor the primary (contact Chief Integrations to discuss).  If the primary goes down then the supervisory program will need to physically disconnect the primary processor's Ethernet.
      • If processor Cresnet is used then an A/B physical RS-485 switch will be needed to switch the Cresnet connection from the primary to the backup.
      • The backup processor will change its IP address to the primary's and reboot.
      • Caveats:
        • Any devices paired to the internal RF gateway of the processor (if MC3/DIN-AP3MEX/MC4) cannot be moved.
        • The backup's configuration will only be as current as the last time it was copied from the primary to the backup.  While not presently a feature, it could be possible to have the primary back itself up to a third device.  The backup could then pull the configuration from the third device at startup to run with an up-to-date configuration.
        • The system will attempt to rebuild current state at startup as devices come online, but not all hardware supports transmitting its current state.
        • Devices will come back online as their ARP caches renew and identify the new processor's MAC address.
    • Option B - Reconfigure devices:
      • Have an identically configured Crestron processor connected to the network at a different IP address.
      • A supervisory program will need to monitor the primary (contact Chief Integrations to discuss).
      • If processor Cresnet is used then an A/B physical RS-485 switch will be needed to switch the Cresnet connection from the primary to the backup.
      • The supervisory program will connect to each Ethernet device and modify the device's IP Table entry to point to the new processor.
      • Caveats:
        • Any devices paired to the internal RF gateway of the processor (if MC3/DIN-AP3MEX/MC4) cannot be moved.
        • The backup's configuration will only be as current as the last time it was copied from the primary to the backup.  While not presently a feature, it could be possible to have the primary back itself up to a third device.  The backup could then pull the configuration from the third device at startup to run with an up-to-date configuration.
        • The system will attempt to rebuild current state at startup as devices come online, but not all hardware supports transmitting its current state.
        • Most Crestron Ethernet devices only accept a single console connection.  If this connection is in use or the last connection failed to gracefully exit then the console may not be available.  In this case, reconfiguration would fail and the device would be orphaned until the primary came back online.
  • VC-4 Virtual Machine High Availability
    • VC-4 supports high availability at the hypervisor or operating system level
    • Option A - Hypervisor HA:
      • Run VMware ESXi with vMotion or a similar product
      • VM infrastructure maintains state of VC-4 across multiple hosts.  If a host fails the system automatically fails over to a different host.
      • Caveats:
        • Doesn't protect against a failure of the operating system, VC-4, or ShowRunnerCLC™
    • Option B - OS Level HA:
      • Implement high availability using Linux level capabilities.
      • Caveats:
        • Does not maintain state; it only provides the ability to start an identical VC-4 instance when the primary fails
        • Difficult to configure
        • Doesn't protect against a failure of the operating system, VC-4, or ShowRunnerCLC™
    • Caveats:
      • Expensive
      • Heavy IT requirement
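
The supervisory program mentioned under the Hot Spare options could follow the pattern sketched below: probe the primary on an interval and trigger failover only after several consecutive failures, so a single dropped packet does not cause a switchover.  This is a minimal sketch under stated assumptions, not the Chief Integrations implementation.  The function names, the failure threshold, the interval, and the address and port in the usage comment are all hypothetical placeholders, and the failover action itself (disconnecting the primary, switching Cresnet, re-addressing the backup, or rewriting IP Tables) is left as a callback.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor_primary(probe, on_failover, max_failures=3, interval=5.0,
                    sleep=time.sleep, max_checks=None):
    """Call on_failover() after max_failures consecutive failed probes.

    probe       -- callable returning True while the primary is reachable
    on_failover -- callable that performs the site-specific failover steps
    max_checks  -- optional cap on the number of probes (None runs forever)
    Returns True if failover was triggered, False if max_checks was reached.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any successful probe resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                on_failover()
                return True
        sleep(interval)
    return False


# Example wiring (placeholder address and port, shown as a comment so that
# running this file does not start an endless monitoring loop):
#
# monitor_primary(
#     probe=lambda: tcp_probe("192.0.2.10", 12345),
#     on_failover=lambda: print("primary unreachable, starting failover"),
# )
```

With the defaults shown, failover begins roughly 15 seconds or more after the primary stops responding, which is consistent with the failover times of tens of seconds to minutes noted earlier for Ethernet devices.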