Optimizing ThoughtSpot Workloads with Databricks
Improving the performance and cost-efficiency of BI workloads calls for a combined approach: serverless clusters, Databricks' Photon engine, and Delta Cache. This document outlines the key recommendations for optimizing BI workloads.
Recommendations
Utilize Serverless SQL Warehouse Clusters
- Benefits
  - Instant Start: Serverless SQL Warehouse clusters have a startup time of seconds compared to minutes for general-purpose, non-serverless clusters.
  - Elastic Scaling: Automatically adjusts to the workload with options for minimum and maximum workers.
  - Fully Managed Service: Simplifies operations with no need for manual cluster management or software updates.
 
- Strategy
  - Auto Stop: Set clusters to auto-stop after n minutes of inactivity to prevent unnecessary costs.
  - Concurrency Tuning: Scale between a minimum of 2 and a maximum of 10 workers, depending on the workload. Monitor and tune accordingly.
  - Engage Databricks Team: Collaborate with the Databricks account team to fine-tune SQL Warehouses for optimal performance and cost.
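The auto-stop and scaling settings above can be captured in a warehouse definition. The following is a minimal sketch, not a definitive configuration: it builds a plain dictionary shaped like the request body of the Databricks SQL Warehouses REST API (`name`, `cluster_size`, `min_num_clusters`, `max_num_clusters`, `auto_stop_mins`, `enable_serverless_compute`, `warehouse_type`). The warehouse name, the "Small" size, the 10-minute auto-stop, and the mapping of the 2-to-10 "workers" range onto cluster counts are illustrative assumptions; verify the field names against your workspace's API version before use.

```python
import json


def build_warehouse_spec(name, cluster_size="Small", min_clusters=2,
                         max_clusters=10, auto_stop_mins=10):
    """Return a dict shaped like a SQL warehouse create request.

    Field names follow the Databricks SQL Warehouses REST API as assumed
    here; the default values are placeholders to tune per workload.
    """
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    if auto_stop_mins < 0:
        raise ValueError("auto_stop_mins must be non-negative")
    return {
        "name": name,
        "cluster_size": cluster_size,
        "min_num_clusters": min_clusters,   # lower bound for scaling
        "max_num_clusters": max_clusters,   # upper bound for scaling
        "auto_stop_mins": auto_stop_mins,   # the "n minutes" of inactivity
        "enable_serverless_compute": True,
        "warehouse_type": "PRO",            # assumed requirement for serverless
    }


# "thoughtspot-bi" is a hypothetical warehouse name.
spec = build_warehouse_spec("thoughtspot-bi")
print(json.dumps(spec, indent=2))
```

Keeping the definition in code makes it easy to review changes to the scaling bounds and auto-stop interval as you monitor and tune the workload.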
 
Leverage Photon
- Benefits
  - High Performance: Utilizes CPU-level optimization and effective memory management for increased speed.
  - Optimized Parquet Writing: With a C++ Parquet writer, operations involving Parquet and Delta files are expedited.
  - Serverless Integration: Available by default with serverless clusters, enhancing performance without additional configuration.
 
Implement Delta Cache
- Benefits
  - Faster Access: Keeps frequently accessed data on worker SSDs, significantly reducing query times.
  - Automatic Inclusion: Standard with SQL Serverless warehouses, requiring no extra setup.
 
- Usage Tip
  - Preload Data: Run CACHE SELECT * FROM table when an endpoint starts to preload "hot" tables, ensuring rapid access.
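The preload tip can be sketched as a small helper that produces one CACHE SELECT statement per hot table, for a warm-up job to run when the endpoint starts. The table names below are hypothetical, and how the statements are submitted (SQL editor, scheduled query, or a connector) is left open; this is only an illustration of the idea.

```python
import re

# Accept plain identifiers with up to three dot-separated parts
# (catalog.schema.table), so the generated SQL stays predictable.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def cache_statements(tables):
    """Build one CACHE SELECT statement per table name."""
    statements = []
    for table in tables:
        if not _TABLE_NAME.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"CACHE SELECT * FROM {table}")
    return statements


# Hypothetical "hot" tables queried most often by ThoughtSpot.
hot_tables = ["sales.orders", "sales.customers"]
for statement in cache_statements(hot_tables):
    print(statement)
```

Listing the hot tables in one place keeps the warm-up step short and makes it clear which data is expected to sit in the cache.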
 
Be Cognizant of Other Tunables
- Lazy Evaluation: Important for data engineering and writing pipelines, although it does not directly impact ThoughtSpot workloads.
- Z-Order Optimize: Regularly apply Z-Ordering to co-locate related data in the same files. Queries that filter on the Z-Ordered columns then read less data, which accelerates them and lowers the cost of cloud storage reads.
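As a sketch of the Z-Order recommendation, the helper below builds the Delta Lake OPTIMIZE ... ZORDER BY statement for a table and a list of columns. The table and column names are hypothetical; in practice the columns would be the ones ThoughtSpot queries filter on most often, and the statement would be run on a schedule.

```python
def zorder_statement(table, columns):
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not columns:
        raise ValueError("at least one Z-Order column is required")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# Hypothetical table and filter columns.
print(zorder_statement("sales.orders", ["customer_id", "order_date"]))
```

Z-Ordering tends to be most effective on a small number of frequently filtered columns, so it is worth keeping the column list short.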