ShowRunnerCLC™ itself does not support high availability or clustering at this time.  To understand why, you have to understand the capabilities of the hardware that ShowRunnerCLC™ controls.  Crestron processors run ShowRunnerCLC™ the same as any other program built to run on a Crestron system.

Crestron hardware communicates with processors using one of the following methods: the processor's Cresnet port, the Ethernet network, the RF gateway built into the processor (if supported), or any I/O port attached to the processor.  Typical designs for current-generation Crestron hardware use an Ethernet connection.  This includes Zum Wired, Zum Wireless (via RF gateway[s]), and any Cresnet devices that communicate over a Cresnet bridge (DIN-CENCN-2/CAEN-BLOCK-CENCN/ZUMNET/etc.).

Crestron Ethernet devices have their communication settings defined in their IP Table.  Some devices allow multiple IP Table entries (usually a maximum of 2) while others allow only 1.  When multiple entries are allowed, the processor typically claims specific hardware once the connection is established and owns that hardware until the connection is dropped.  Any device attempting to claim hardware that is already claimed will be denied.  Crestron firmware, to our knowledge, does not support configuration of primary/backup IP Table entries.  Devices placed on the Control Subnet of a primary processor increase the complexity, as the processor's internal router controls the IP allocations on the Control Subnet.

There is no ability to have an immediate failover for anything other than Cresnet.  All Ethernet-connected devices will have failover times measured in tens of seconds to minutes, depending on the approach.

Facts

  • ShowRunnerCLC™ does not presently have the ability to synchronize system state to a backup processor
  • ShowRunnerCLC™ does not presently have support for automatically reconfiguring hardware or the processor should it be activated as a backup
  • Crestron hardware devices do not support redundant physical or software-based connections.  Activating a backup will require configuration changes, which may be automated or may require manual intervention.
  • Zum Wired supports a fallback to local app mode if the hardware's connection to the processor fails, provided the room is configured in CNET mode.  Please see our Zum guide for more information.
  • The failure rate of the power supply powering the processor is typically higher than that of the processor itself.

How can I detect if ShowRunnerCLC™ failed?

  • SNMP traps on program status from the processor
  • ShowRunnerCLC™'s REST API becomes unavailable or returns an invalid status
  • Configure ShowRunnerCLC™ to close a relay or trigger a digital output when it starts up.  A failure of the program would open/release the output.
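
The REST API check above could be polled from a separate machine.  The following is a minimal Python sketch of that idea, not a documented ShowRunnerCLC™ interface: the URL passed in, the JSON field name "status", and the value "running" are placeholders assumed for illustration and should be replaced with whatever the REST API actually exposes.

```python
import json
import urllib.request


def fetch(url, timeout=5.0):
    """Return (http_status, body_text); raises OSError on any network failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def is_healthy(url, fetcher=fetch):
    """True only if the endpoint answers 200 with the expected status value."""
    try:
        status, body = fetcher(url)
    except OSError:
        # Connection refused, timeout, DNS failure, or HTTP error response:
        # treat the API as unavailable.
        return False
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # The API answered, but not with valid JSON: treat as an invalid status.
        return False
    # ASSUMPTION: "status" and "running" are hypothetical names.
    return isinstance(payload, dict) and payload.get("status") == "running"
```

The fetcher argument exists so the decision logic can be exercised without a live processor.  A single failed poll is weak evidence on its own; it is best combined with the SNMP or relay signals listed above.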

How do I make my processor redundant?

  • Cold Spare - Manual Recovery
    • Have an identically configured Crestron processor ready to be connected in place of the primary processor.
    • When the primary fails, all connections will need to be moved to the backup.
    • Power up the backup.
    • Caveats:
      • Any devices paired to the internal RF gateway of the processor (if MC3/DIN-AP3MEX/MC4) cannot be moved.
      • The backup's configuration will only be as current as the last time it was copied from the primary to the backup.  While not presently a feature, it could be possible to have the primary back itself up to a third device.  The backup could then pull the configuration from that third device at startup to run with an up-to-date configuration.
      • The system will attempt to rebuild its current state at startup as devices come online, but not all pieces of hardware support transmitting current state.
  • Hot Spare - Automated recovery
    • Option A - Reconfigure processor only:
      • Have an identically configured Crestron processor connected to the network at a different IP address.
      • A supervisory program will need to monitor the primary (contact Chief Integrations to discuss).  If the primary goes down then the supervisory program will need to physically disconnect the primary processor's Ethernet so there are no conflicts.
      • If processor Cresnet is used then an A/B physical RS-485 switch will be needed to switch the Cresnet connection from the primary to the backup.
      • The backup processor will change its IP address to the primary's and reboot.
      • Caveats:
        • Devices will come back online as their ARP caches renew and identify the new processor's MAC address.
    • Option B - Reconfigure devices:
      • Have an identically configured Crestron processor connected to the network at a different IP address.
      • A supervisory program will need to monitor the primary (contact Chief Integrations to discuss).
      • If processor Cresnet is used then an A/B physical RS-485 switch will be needed to switch the Cresnet connection from the primary to the backup.
      • The supervisory program will connect to each Ethernet device and modify the device's IP table entry to point to the new 
      • Caveats:
        • Most Crestron Ethernet devices only accept a single console connection.  If this connection is in use or the last connection failed to gracefully exit then the console may not be available.  In this case, reconfiguration would fail and the device would be orphaned until the primary came back online.
    • Caveats:
      • Using the Control Subnet complicates this significantly.  If deploying either of these 2 approaches, it is recommended that a proper network is used rather than the Control Subnet.
      • Any devices paired to the internal RF gateway of the processor (if MC3/DIN-AP3MEX/MC4) cannot be moved (trust center backup/restore still needs to be investigated, but it is better to use an external RF gateway).
      • The backup's configuration will only be as current as the last time it was copied from the primary to the backup.  While not presently a feature, it could be possible to have the primary push its configuration to the backup whenever the backup is online and reachable.
      • The system will attempt to rebuild its current state at startup as devices come online, but not all pieces of hardware support transmitting current state.
  • VC-4 Virtual Machine High Availability
    • VC-4 supports high availability at the hypervisor or operating system level
    • Option A - Hypervisor HA:
      • Run VMware ESXi with vMotion or similar product
      • VM infrastructure maintains state of VC-4 across multiple hosts.  If a host fails the system automatically fails over to a different host.
      • Caveats:
        • Doesn't protect against a failure of the operating system, VC-4, or ShowRunnerCLC™
    • Option B - OS Level HA:
      • Implement high availability using Linux level capabilities.
      • Caveats:
        • Will not maintain state, just ability to fire up an identical VC-4 instance when the primary fails
        • Difficult to configure
        • Doesn't protect against a failure of the operating system, VC-4, or ShowRunnerCLC™
    • Caveats:
      • Expensive
      • Heavy IT requirement
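
Both Hot Spare options above depend on a supervisory program deciding that the primary is really down before it takes any disruptive action.  A minimal Python sketch of that decision follows, assuming a simple "N consecutive failed checks" rule.  It is not part of ShowRunnerCLC™; the class name, the threshold of 3, and the choice to never fail back automatically are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Declares the primary failed after N consecutive failed health checks."""

    failures_required: int = 3  # ASSUMPTION: illustrative threshold
    consecutive_failures: int = 0
    failed_over: bool = False

    def record(self, healthy: bool) -> bool:
        """Feed one health-check result.

        Returns True exactly once, on the check that should start failover.
        """
        if self.failed_over:
            # Already running on the backup; failback is left as a manual step.
            return False
        if healthy:
            # Any good check resets the count, so brief blips are ignored.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_required:
            self.failed_over = True
            return True
        return False
```

In use, the supervisory loop would call record() once per polling interval and, when it returns True, carry out the chosen option (isolate the primary and re-address the backup, or repoint each device).  The threshold trades detection time against false failovers, and it adds directly to the failover times described above.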

How do I make Cresnet/Zumlink redundant?

  • Processor Cresnet Port:
    • Must use an RS-485 A/B switch to physically switch between the primary and the backup.  Cresnet does not support multiple masters active simultaneously.  Only one master may poll the Cresnet network.
    • Once the line is cut over, if the backup is online, the devices will be polled and back online within a few seconds.
  • Cresnet Bridge
    • Cresnet devices may be redirected to a new processor via a simple IP Table change and reboot.
    • Bridges generally support fewer devices, provide isolation between NETs, and allow alternate power options, which offers better protection than a processor-connected Cresnet network.
    • Caveats:
      • Typically the bridge must be rebooted after the change.  Downtime is typically 30 seconds to a minute before everything comes back online.
  • Caveats:
    • Cresnet cannot be run in a loop
    • Any physical damage, loose connections, shorts, or power faults will cause a failure that cannot be recovered from

How do I make Ethernet redundant?

  • Touchpanels
    • Connect to touchpanel and remove the failed processor's entry and add the now active processor's entry
    • No reboot required
  • Cresnet Bridges/RF Gateways/Other Ethernet Devices
    • Connect and remove the failed processor's entry and add the now active processor's entry
    • Reboot required, downtime about 30 seconds to a minute
    • Caveats:
      • If the device console has an active connection, or the old connection failed and blocked the port, then this will fail.
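
The per-device procedure above can be sketched as a small planner.  This is a hedged Python illustration only: the step names are abstract placeholders rather than real Crestron console commands, and Device, plan_steps, reconfigure_all, and the apply_steps callback are hypothetical names introduced here.  The caller would supply the code that actually talks to each device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # "touchpanel", or anything else (bridge, RF gateway, ...)


def plan_steps(device, failed_ip, active_ip):
    """Return the ordered, abstract steps needed to repoint one device."""
    steps = [
        ("remove_ip_table_entry", failed_ip),
        ("add_ip_table_entry", active_ip),
    ]
    if device.kind != "touchpanel":
        # Per the list above, only touchpanels take the change without a reboot.
        steps.append(("reboot", None))
    return steps


def reconfigure_all(devices, failed_ip, active_ip, apply_steps):
    """Apply the plan to every device; return names of devices left orphaned."""
    orphaned = []
    for device in devices:
        try:
            apply_steps(device, plan_steps(device, failed_ip, active_ip))
        except ConnectionError:
            # Console busy or blocked by a stale session: record and move on.
            orphaned.append(device.name)
    return orphaned
```

Returning the orphaned list rather than aborting on the first failure lets the rest of the system recover while the unreachable devices are flagged for manual attention.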

How do I make RF redundant?

  • Facts:
    • RF devices are paired to a gateway and require a complete handshake
    • Crestron RF gateways support backing up the trust center (the certificates and pairing data)
    • Backup would need to be reachable on the network
    • Primary and Backup Gateways would need to be the same model
    • Supervisory program would need to have a backup copy of the trust center from the primary gateway
    • The thoughts below are theoretical and have not been tested
  • Connect to the backup gateway and load the trust center from the primary gateway
  • Configure the IP table on the backup to point to the correct processor
  • Connect to primary gateway and wipe its trust center and IP table if reachable
  • Devices should join the now active gateway.  How long this takes would need to be tested.
  • It is not known if the trust center backup/restore process must be done through Toolbox or if it's something that could be embedded in a program without the need for Toolbox.  This will require significant development effort.

Best Practices

As you can see, there are many things to consider when trying to build a redundant lighting control system.  The key points are:

  • Avoid processor Cresnet except in specific scenarios
  • Use Cresnet Bridges to allow software-based reconfiguration of connections
  • Use IT-grade infrastructure with its own redundant capabilities
  • Provide high-quality power to the processor and infrastructure with UPS capabilities
  • Leverage Zum Wired's app mode to control the rooms if it is compatible with the site's sequence of operations.  This way a failure of the processor does not impact local lighting control.
Created by Mark Kohlmann on 2024/04/05 17:53