High Availability (HA) and resilience
Follow these guidelines to ensure high availability (HA) of the ThoughtSpot application and resilience to node failure.
Requirements for node resilience
- The cluster must have at least 3 nodes.
- The cluster must have spare capacity: if one node fails, the remaining nodes must be able to host and serve all loaded data.
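The two requirements can be expressed as a quick sizing check. This is a minimal sketch, not a ThoughtSpot tool. It assumes that each node's capacity is a single usable-memory figure in GB, that loaded data can be spread freely across nodes, and that the worst case is losing the largest node. The function name `is_node_resilient` and its inputs are hypothetical.

```python
from typing import Sequence

MIN_NODES = 3  # minimum cluster size from the requirements above


def is_node_resilient(node_capacity_gb: Sequence[float], loaded_data_gb: float) -> bool:
    """Return True if the cluster meets both node-resilience requirements.

    node_capacity_gb: usable in-memory capacity of each node (hypothetical input).
    loaded_data_gb: total size of all loaded data (hypothetical input).
    """
    if len(node_capacity_gb) < MIN_NODES:
        return False
    total = sum(node_capacity_gb)
    # Worst case: the largest node fails, leaving the least remaining capacity.
    remaining_after_worst_failure = total - max(node_capacity_gb)
    return remaining_after_worst_failure >= loaded_data_gb


if __name__ == "__main__":
    print(is_node_resilient([256, 256, 256], 400))  # 512 GB remain after one failure
    print(is_node_resilient([256, 256, 256], 600))  # not enough spare capacity
    print(is_node_resilient([512, 512], 100))       # fewer than 3 nodes
```

Checking against the largest node rather than an average node is the conservative choice: if the cluster survives losing its biggest node, it survives losing any single node.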
What happens during node failure
- When a node loses its connection to the main service manager process, it is considered unhealthy.
- ThoughtSpot migrates all migratable services running on the failed node to other, healthy nodes. For all practical purposes, ThoughtSpot ignores the failed node until it reports itself as healthy again.
- ThoughtSpot rebalances the data that the failed node served and redistributes it across the healthy nodes, which read that data from the HDFS storage layer into their in-memory database processes.
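The failover steps listed above can be illustrated with a small toy model. This is a hedged sketch, not ThoughtSpot's implementation. It assumes that "losing connection" means no heartbeat within a fixed timeout, that every service is migratable, and that data is split into shards placed on the least-loaded healthy node. The names `Node`, `handle_failures`, `timeout`, and the shard/service labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (s) of last contact with the service manager
    services: list[str] = field(default_factory=list)
    shards: list[str] = field(default_factory=list)


def handle_failures(nodes: list[Node], now: float, timeout: float) -> list[str]:
    """Treat silent nodes as unhealthy and move their work to healthy nodes.

    A node counts as healthy again on a later call once a fresh heartbeat
    updates its last_heartbeat. Returns the names of nodes treated as failed.
    """
    failed = [n for n in nodes if now - n.last_heartbeat > timeout]
    healthy = [n for n in nodes if now - n.last_heartbeat <= timeout]
    if failed and not healthy:
        raise RuntimeError("no healthy nodes left to take over")
    for dead in failed:
        # Migrate services (this toy model treats every service as migratable).
        for svc in dead.services:
            target = min(healthy, key=lambda n: len(n.services))
            target.services.append(svc)
        # Rebalance shards onto the least-loaded healthy node; in a real
        # cluster the target would reload the data from HDFS into memory.
        for shard in dead.shards:
            target = min(healthy, key=lambda n: len(n.shards))
            target.shards.append(shard)
        dead.services.clear()
        dead.shards.clear()
    return [n.name for n in failed]


if __name__ == "__main__":
    cluster = [
        Node("a", 100.0, ["svc1"], ["s1", "s2"]),
        Node("b", 100.0, [], ["s3"]),
        Node("c", 40.0, ["svc2"], ["s4", "s5"]),  # silent for 60 s
    ]
    print(handle_failures(cluster, now=100.0, timeout=30.0))
    for node in cluster:
        print(node.name, node.services, node.shards)
```

Least-loaded placement keeps the healthy nodes roughly balanced after failover; this is one simple strategy chosen for illustration, not necessarily the one ThoughtSpot uses.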
Disruption: impact on users
Redistributing the affected tables from a failed node and loading them from the HDFS layer into the remaining healthy nodes is not instantaneous. Users may notice a degraded experience while failover is in progress.