Performance Tuning Guide
Latest revision as of 23:38, 18 October 2023
This guide describes how to get the best performance out of the QPR ProcessAnalyzer system while taking the incurred infrastructure costs into account. Performance optimization works entirely differently in Snowflake models and in-memory models, so each has its own chapter.
Snowflake models
In Snowflake, calculations are performed in virtual warehouses (https://docs.snowflake.com/en/user-guide/warehouses). There are two main ways to affect performance: warehouse size and multiclustering.
The larger the warehouse size, the faster individual queries run. However, each step to a larger size usually brings a smaller incremental performance improvement, while the costs keep growing. Thus, aim for a balance where the performance improvement is still notable but the cost increase is not too high.
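A minimal Python sketch of this size/cost balance. It assumes standard warehouses, where each step up in size doubles the credits consumed per hour, and uses hypothetical runtimes of one representative query measured on consecutive sizes. The stopping rule (stop scaling up once the next size is less than 1.5 times faster) is an arbitrary assumption to tune, not a Snowflake recommendation.

```python
# Credits per hour for standard warehouses: each size step doubles the rate.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}


def query_credits(size: str, runtime_seconds: float) -> float:
    """Credits used by one query running for runtime_seconds.

    Simplification: ignores the minimum billing applied when a warehouse resumes.
    """
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600.0


def pick_size(runtimes: dict[str, float], min_speedup: float = 1.5) -> str:
    """Walk up the measured sizes and stop when the next step is not notably faster.

    Assumes the runtimes were measured on consecutive sizes.
    """
    sizes = [s for s in CREDITS_PER_HOUR if s in runtimes]  # keep size order
    chosen = sizes[0]
    for prev, nxt in zip(sizes, sizes[1:]):
        speedup = runtimes[prev] / runtimes[nxt]
        if speedup < min_speedup:
            break  # the improvement is no longer worth the doubled rate
        chosen = nxt
    return chosen


# Hypothetical runtimes (seconds) of the same query on each size.
example_runtimes = {"X-Small": 120.0, "Small": 62.0, "Medium": 33.0,
                    "Large": 24.0, "X-Large": 20.0}
for size, seconds in example_runtimes.items():
    print(f"{size:>8}: {seconds:5.1f} s, {query_credits(size, seconds):.4f} credits")
print(pick_size(example_runtimes))  # prints "Medium"
```

With these numbers, going from Medium to Large is only 1.375 times faster, so Medium is chosen. Because the hourly rate doubles at every step, any step that is less than 2 times faster also makes each query consume more credits.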
If there are multiple simultaneous queries (e.g., multiple users or dashboards with many charts), multiclustering is the right solution (https://docs.snowflake.com/en/user-guide/warehouses-multicluster). In multiclustering, several warehouse clusters run in parallel, which allows more queries to run at the same time. Each cluster can process only a limited number of queries simultaneously, and excess queries wait in a queue until a cluster becomes available. If the queue starts to build up, increase the number of clusters. For instructions on monitoring the load, see https://docs.snowflake.com/en/user-guide/warehouses-load-monitoring.
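The queueing behaviour can be illustrated with a small Python sketch. The number of queries one cluster runs at once (8 here) is an assumed average, since the real figure depends on query complexity, and the warehouse name `PA_WH` is hypothetical. `MAX_CLUSTER_COUNT` is the Snowflake warehouse parameter that limits how many clusters a multi-cluster warehouse may start.

```python
import math


def clusters_needed(concurrent_queries: int, queries_per_cluster: int,
                    max_clusters: int) -> tuple[int, int]:
    """Return (clusters running, queries waiting in the queue).

    queries_per_cluster is an assumed average, not a fixed Snowflake limit.
    """
    # Enough clusters to run every query at once, but always at least one.
    wanted = max(1, math.ceil(concurrent_queries / queries_per_cluster))
    # Never exceed the configured maximum number of clusters.
    clusters = min(wanted, max_clusters)
    # Whatever does not fit into the running clusters is queued.
    queued = max(0, concurrent_queries - clusters * queries_per_cluster)
    return clusters, queued


def max_cluster_sql(warehouse: str, max_clusters: int) -> str:
    """Build the statement that raises the cluster limit of a warehouse."""
    return f"ALTER WAREHOUSE {warehouse} SET MAX_CLUSTER_COUNT = {max_clusters};"


# Hypothetical load: 20 simultaneous dashboard queries.
print(clusters_needed(20, 8, max_clusters=2))  # (2, 4): 4 queries queue up
print(clusters_needed(20, 8, max_clusters=3))  # (3, 0): no queue
print(max_cluster_sql("PA_WH", 3))
```

In practice the queue length comes from the load monitoring described above rather than from a calculation; the sketch only shows why raising the cluster limit removes the queue.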
The warehouse suspension time affects both performance and costs. When a warehouse is suspended after idling for some time, it stops incurring costs, but its caches are also lost. When the warehouse starts again, the caches need to be rebuilt, which slows down the first queries. Thus, a shorter suspension time saves costs but causes a performance decrease whenever a suspended warehouse needs to start again.
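The trade-off can be made concrete with a Python sketch that replays hypothetical idle gaps between queries against an auto-suspend timeout (Snowflake's `AUTO_SUSPEND` warehouse parameter is given in seconds). It simplifies by treating suspension as happening exactly at the timeout and by ignoring the minimum billing applied each time a warehouse resumes.

```python
def idle_cost_and_cold_starts(idle_gaps_s: list[float],
                              auto_suspend_s: float) -> tuple[float, int]:
    """Return (billed idle seconds, cold starts) for idle gaps between queries."""
    billed_idle = 0.0
    cold_starts = 0
    for gap in idle_gaps_s:
        if gap <= auto_suspend_s:
            # Warehouse stays running through the gap: billed, cache kept.
            billed_idle += gap
        else:
            # Warehouse idles until the timeout, then suspends and loses its cache.
            billed_idle += auto_suspend_s
            cold_starts += 1  # the next query starts with empty caches
    return billed_idle, cold_starts


# Hypothetical idle gaps (seconds) between consecutive queries.
gaps = [30, 120, 900, 45, 3600]
print(idle_cost_and_cold_starts(gaps, auto_suspend_s=60))   # (255.0, 3)
print(idle_cost_and_cold_starts(gaps, auto_suspend_s=600))  # (1395.0, 2)
```

With a 60-second timeout, the warehouse bills far fewer idle seconds but has one more cold start than with a 600-second timeout; which is preferable depends on how much the slower first queries matter to users.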
If individual queries take considerably longer than other queries, the query acceleration service might help (https://docs.snowflake.com/en/user-guide/query-acceleration-service).
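One way to spot such queries is to compare each query's elapsed time with the typical one. The Python sketch below flags queries that ran more than an assumed factor (5, chosen arbitrarily) times the median; the query IDs and timings are hypothetical. Flagged queries can then be checked with Snowflake's `SYSTEM$ESTIMATE_QUERY_ACCELERATION` function before enabling the service on the warehouse with `ENABLE_QUERY_ACCELERATION`.

```python
from statistics import median


def slow_outliers(elapsed_s: dict[str, float], factor: float = 5.0) -> list[str]:
    """Return IDs of queries that ran more than `factor` times the median elapsed time."""
    typical = median(elapsed_s.values())
    return sorted(qid for qid, seconds in elapsed_s.items()
                  if seconds > factor * typical)


# Hypothetical elapsed times (seconds) keyed by query ID.
sample = {"q1": 2.0, "q2": 3.0, "q3": 2.5, "q4": 40.0, "q5": 3.5}
print(slow_outliers(sample))  # ['q4']
```

The median is used instead of the mean so that the slow queries themselves do not inflate the baseline they are compared against.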
When running machine learning with large datasets, warehouses might run out of memory. If that occurs, consider using a Snowpark-optimized warehouse, which has more memory (https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized).
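Before switching warehouse types, a rough estimate of the training data's in-memory size can show whether memory is the likely bottleneck. This Python sketch assumes dense float64 features (8 bytes per value), an arbitrary overhead factor of 3 for intermediate copies, and a hypothetical memory budget; none of these are Snowflake figures. The SQL string uses Snowflake's documented `WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'` setting with a hypothetical warehouse name `ml_wh`.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Size of a dense numeric matrix, e.g. float64 features (8 bytes per value)."""
    return rows * cols * bytes_per_value


def likely_fits(rows: int, cols: int, memory_budget_bytes: float,
                overhead_factor: float = 3.0) -> bool:
    """Rough check: scale the raw size by an assumed overhead factor for
    intermediate copies made during training, then compare to the budget."""
    return dense_matrix_bytes(rows, cols) * overhead_factor <= memory_budget_bytes


# Statement for creating a Snowpark-optimized warehouse (name is hypothetical).
SNOWPARK_WH_SQL = (
    "CREATE WAREHOUSE ml_wh WITH WAREHOUSE_SIZE = 'MEDIUM' "
    "WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED';"
)

rows, cols = 50_000_000, 40  # hypothetical training set
print(dense_matrix_bytes(rows, cols) / 1e9, "GB raw")        # 16.0 GB raw
print(likely_fits(rows, cols, memory_budget_bytes=32e9))     # False
print(SNOWPARK_WH_SQL)
```

The estimate is deliberately crude; it only indicates whether the dataset is in a range where out-of-memory errors are plausible.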
In-memory models
Server resources
- Memory
- Processors
- Database