This draft documentation may be incomplete or inaccurate, and is subject to change until this release is generally available (GA).

High Availability (HA) and resilience

Follow these guidelines to ensure high availability of the ThoughtSpot application and resilience to node failure.

Requirements for node resilience

  • The cluster must have at least 3 nodes.

  • The cluster must have spare capacity; if one node fails, the remaining nodes must be able to host and serve all loaded data.
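The two requirements above can be expressed as a simple pre-flight check. The following is a minimal sketch, not a ThoughtSpot API: the function name, the per-node capacity figures, and the use of gigabytes are all assumptions made for illustration. It treats the loss of the largest node as the worst case.

```python
from typing import Sequence


def can_survive_single_node_failure(
    node_capacities_gb: Sequence[float],
    loaded_data_gb: float,
) -> bool:
    """Hypothetical check: can the cluster lose any one node and still serve all data?

    node_capacities_gb: assumed data capacity of each node, in GB.
    loaded_data_gb: assumed total size of the loaded data, in GB.
    """
    # Requirement 1: the cluster must have at least 3 nodes.
    if len(node_capacities_gb) < 3:
        return False

    # Requirement 2: spare capacity. Worst case is losing the largest node,
    # so the remaining nodes must still hold all loaded data.
    remaining_capacity = sum(node_capacities_gb) - max(node_capacities_gb)
    return remaining_capacity >= loaded_data_gb
```

For example, under these assumptions a cluster of three 100 GB nodes passes the check with 180 GB of loaded data but fails it with 250 GB, because only 200 GB of capacity remains after one node fails.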

What happens during node failure

  • When a node loses its connection to the main service manager process, ThoughtSpot considers the node unhealthy.

  • ThoughtSpot migrates all migratable services that run on the failed node to other (healthy) nodes. For all practical purposes, ThoughtSpot ignores the failed node until it reports itself as healthy.

  • ThoughtSpot rebalances and redistributes the data served from the failed node onto healthy nodes. Healthy nodes read the data from the HDFS storage layer into the in-memory database processes.
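The redistribution step described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the page does not describe ThoughtSpot's actual placement algorithm, so the "least-loaded healthy node first" rule, the function name, and the node and shard names are all hypothetical.

```python
from typing import Dict, List


def redistribute(
    assignments: Dict[str, List[str]],
    failed_node: str,
) -> Dict[str, List[str]]:
    """Hypothetical model: move the failed node's data shards to healthy nodes.

    assignments maps a node name to the shards it serves. The failed node is
    dropped from the result, mirroring how it is ignored until healthy again.
    """
    # Copy the healthy nodes' assignments so the input is not mutated.
    healthy = {
        node: list(shards)
        for node, shards in assignments.items()
        if node != failed_node
    }
    if not healthy:
        raise ValueError("no healthy nodes remain to serve the data")

    # Assumed policy: give each orphaned shard to the least-loaded healthy
    # node, breaking ties by node name so the outcome is deterministic.
    for shard in assignments.get(failed_node, []):
        target = min(healthy, key=lambda node: (len(healthy[node]), node))
        healthy[target].append(shard)  # in reality, re-read from HDFS
    return healthy
```

In this model, if node3 fails while serving two shards, those shards end up spread across node1 and node2, which then each carry more data than before. That extra load is why the spare-capacity requirement matters.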

Disruption: impact on users

Redistributing the affected tables and loading their data from the HDFS layer into the remaining healthy nodes is not instantaneous. Until this process completes, the failover may affect the user experience.

