Snowflake Connection Configuration: Difference between revisions
Line 20: | Line 20: | ||
# Select the created database, and click the '''Schema''' button to create a new schema. Define '''Name''' ''QPRPA'', and click the '''Create''' button. | # Select the created database, and click the '''Schema''' button to create a new schema. Define '''Name''' ''QPRPA'', and click the '''Create''' button. | ||
# (Optional security hardening) Select the created schema, and in the '''Privileges''' section, click the '''Privilege''' button. For '''Role''', select the ''QPRPA'' role. For '''Privileges''', select '''CREATE VIEW''', '''CREATE TABLE''', '''CREATE FILE FORMAT''' and '''USAGE'''. Click the '''Grant Privileges''' button. | # (Optional security hardening) Select the created schema, and in the '''Privileges''' section, click the '''Privilege''' button. For '''Role''', select the ''QPRPA'' role. For '''Privileges''', select '''CREATE VIEW''', '''CREATE TABLE''', '''CREATE FILE FORMAT''' and '''USAGE'''. Click the '''Grant Privileges''' button. | ||
# (Optional) If the connection to Snowflake is lost for some reason, QPR ProcessAnalyzer may not be able to cancel pending queries. To get pending queries automatically cancelled, it's advisable to change the ABORT_DETACHED_QUERY session setting: https://docs.snowflake.com/en/sql-reference/parameters.html#abort-detached-query. | # (Optional) If the connection to Snowflake is lost for some reason, QPR ProcessAnalyzer may not be able to cancel pending queries. To get pending queries automatically cancelled to save costs, it's advisable to change the ABORT_DETACHED_QUERY session setting to ''true'': https://docs.snowflake.com/en/sql-reference/parameters.html#abort-detached-query. | ||
# (Optional) Running a Snowflake warehouse consumes credits in the Snowflake account, and thus it's a good practice to set resource monitoring to control the credit usage: In '''Compute''' > '''Resource Monitors''', click the '''Resource Monitor''' button to create a new resources monitor. Choose suitable monitor settings for your needs, and click the '''Create Resource Monitor''' button. | # (Optional) Running a Snowflake warehouse consumes credits in the Snowflake account, and thus it's a good practice to set resource monitoring to control the credit usage: In '''Compute''' > '''Resource Monitors''', click the '''Resource Monitor''' button to create a new resources monitor. Choose suitable monitor settings for your needs, and click the '''Create Resource Monitor''' button. | ||
Revision as of 19:33, 10 November 2022
This page describes the native method to use Snowflake where the process mining queries are run in the Snowflake and the eventlog data stays in the Snowflake. There are two steps to configure: setup a Snowflake account and configure QPR ProcessAnalyzer to use the account through an ODBC connection.
Alternatively, the traditional method can be used to import data from Snowflake to QPR ProcessAnalyzer and use the QPR ProcessAnalyzer's in-memory calculation engine. More information about importing data from an ODBC datasource:
- Load models directly from ODBC
- Import data from ODBC using expression scripting
- Import data from ODBC using SQL scripting
Snowflake account configuration
These instructions provide one possible way to configure the Snowflake account for QPR ProcessAnalyzer. Configuration can also be done differently based on requirements, e.g., if there is other use for the same Snowflake account. These instructions optionally set the least possible permissions for the Snowflake user account (minimum privilege principle).
- Create a new Snowflake account in the Snowflake site: https://www.snowflake.com/ (or use an existing account). The account is created to the selected cloud platform and site, so consider a location that is close to the QPR ProcessAnalyzer hosting site.
- Go to https://app.snowflake.com and login in to your Snowflake account.
- In Account > Roles, click the Role button to create a new role. Define Name QPRPA, and click Create Role.
- In Account > Users, click the User button to create a new user. Define User Name QPRPA and define a strong password. Check also that the Force user to change password on first time login is disabled. Click the Create User button.
- Select the created user, and click the Grant Role button. In the Role to grant list, select QPRPA, and click Grant.
- In Compute > Warehouses, click the Warehouse button to create a new warehouse. Define Name QPRPA, and select a suitable Size for the warehouse. In the Advanced Warehouse Options, check that Auto Resume and Auto Suspend are enabled. Set the Suspend After time based on your performance requirements (for example 5 minutes is a good initial value). Click Create Warehouse.
- (Optional security hardening) Select the created warehouse and in the Privileges section and click the Privilege button. For Role, select the QPRPA role. For Privileges, select MONITOR, OPERATE and USAGE. Click the Grant Privileges button.
- In Data > Databases, click the Database button to create a new database. Define Name QPRPA, and click the Create button.
- (Optional security hardening) Select the created database, and in the Privileges section, click the Privilege button. For Role, select the QPRPA role. For Privileges, select USAGE. Click Grant Privileges.
- Select the created database, and click the Schema button to create a new schema. Define Name QPRPA, and click the Create button.
- (Optional security hardening) Select the created schema, and in the Privileges section, click the Privilege button. For Role, select the QPRPA role. For Privileges, select CREATE VIEW, CREATE TABLE, CREATE FILE FORMAT and USAGE. Click the Grant Privileges button.
- (Optional) If the connection to Snowflake is lost for some reason, QPR ProcessAnalyzer may not be able to cancel pending queries. To get pending queries automatically cancelled to save costs, it's advisable to change the ABORT_DETACHED_QUERY session setting to true: https://docs.snowflake.com/en/sql-reference/parameters.html#abort-detached-query.
- (Optional) Running a Snowflake warehouse consumes credits in the Snowflake account, and thus it's a good practice to set resource monitoring to control the credit usage: In Compute > Resource Monitors, click the Resource Monitor button to create a new resources monitor. Choose suitable monitor settings for your needs, and click the Create Resource Monitor button.
Set Snowflake ODBC connection
To add the Snowflake ODBC connection to QPR ProcessAnalyzer, following steps are required:
- Install Snowflake ODBC driver in the machine running QPR ProcessAnalyzer Server. More information about the ODBC driver installation: https://docs.snowflake.com/en/user-guide/odbc.html.
- Configure Snowflake ODBC connection string to the QPR ProcessAnalyzer configuration table. When configuring Snowflake as instructed above, the following connection string can be used:
Driver={SnowflakeDSIIDriver};Application=QPR_ProcessAnalyzer;Server=<account_identifier>.snowflakecomputing.com;Database=QPRPA;Schema=QPRPA;Warehouse=QPRPA;Role=QPRPA;uid=QPRPA;pwd=<password>
Replace <password> with the password of the QPRPA user. In addition, replace <account_identifier> with the account identifier of your Snowflake account. More information about finding your Snowflake account identifier: https://docs.snowflake.com/en/user-guide/admin-account-identifier.html.
Please also include the Application tag to the connection string, identifying QPR ProcessAnalyzer related usage for collecting usage statistics.
More information about Snowflake ODBC connection strings: https://docs.snowflake.com/en/user-guide/odbc-parameters.html
Using Snowflake tables managed by QPR ProcessAnalyzer
When a datatable is created in QPR ProcessAnalyzer, corresponding table is created in the Snowflake storing the actual data. The table is not yet created when a new datatable is created, but when there is at least one column in the datatable. The Snowflake tables created by QPR ProcessAnalyzer are named as follows: qprpa_dt_<databaseId>_<datatableId>, where <databaseId> identifies the QPR ProcessAnalyzer environment and <datatableId> identifies the datatable within the environment. The database id is set in the QPR ProcessAnalyzer configuration table. Due to the naming, it's possible to configure multiple QPR ProcessAnalyzer environments to use the same Snowflake database and schema. Alternatively, there can be different schemas or databases for each environment.
Datatables can managed both in the QPR ProcessAnalyzer web UI and in the Snowflake. When datatables are modified in the Snowflake, a synchronization (see see synchronize function) needs to be called in QPR ProcessAnalyzer to inform that the underlying data in Snowflake has changed. Creating and deleting datatables are operations that need to be done in the QPR ProcessAnalyzer web UI. Deleting the underlying table in the Snowflake corresponds to removing all rows and columns from the datatable.
Using Snowflake views as datatable source
QPR ProcessAnalyzer can be connected to any data in Snowflake by using Snowflake views (https://docs.snowflake.com/en/user-guide/views-introduction.html) that are linked to the QPR ProcessAnalyzer datatables. When a process mining query is made to the cases and events datatables, the queries in the linked Snowflake views are executed as part of the process mining queries.
The views need to be created to the same Snowflake database and schema as the tables used by QPR ProcessAnalyzer (defined by the ODBC connection string). If QPR ProcessAnalyzer has already created a table for the datatable, the table needs to be removed before the view can be created (as there cannot be both a table and a view with the same name). If the datatable doesn't have columns yet, the table doesn't exist, and it's enough to create the view. Note that the Snowflake tables behind the QPR ProcessAnalyzer datatables are managed by QPR ProcessAnalyzer automatically, whereas the Snowflake views need to be managed by users directly in the Snowflake.
The view linked to a datatable needs to have a specific name which can be queried using the Expression Designer with following query (define your datatable id):
DatatableById(123).NameInDatasource
New view can be created in Snowflake based on the following example (replace <ViewNameInDatasource> with the result from the previous expression):
CREATE OR REPLACE VIEW "<ViewNameInDatasource>" AS SELECT * FROM VALUES (1, 'red'), (2, 'orange'), (3, 'yellow'), (4, 'green') AS Colors(Id, Value)
Note that table and view names used by QPR ProcessAnalyzer are in lower case, and thus the object names need to be written in quotation marks, because without the marks Snowflake will create the view in upper case (which would be an incorrect name).
The SELECT part in the above statement can be any query allowed by the Snowflake. The query can also contain data transformations which means that those transformation will be performed when the dashboards are used in QPR ProcessAnalyzer. This is following the ELT (extract, load, transform) principle, where the non-transformed data is loaded into the system and needed transformations are performed when the data is used by QPR ProcessAnalyzer. Note that there is a negative impact on the performance when performing transformations on demand.
When the view has been created, the Synchronize needs to be called to update the status to QPR ProcessAnalyzer (define your datatable id):
DatatableById(123).Synchronize()
When a datatable with a Snowflake view is deleted, the view in the datasource is not deleted (this is unlike to tables which are deleted). For datatables with Snowflake view, following data modification operation are not allowed: Import, Persist, AddColumn, Merge, RemoveColumns, RenameColumns, and Truncate.
Note that the query in the view is executed using permissions of the user account defined in the ODBC connection string, and thus the permissions are managed in Snowflake. Still, the normal QPR ProcessAnalyzer datatable permissions are applied for datatable with Snowflake views, so if user cannot see the datatable, user cannot access the data in the view either.
Automatic query cancellation
If the result of a pending query running in Snowflake is not needed anymore, the query is automatically cancelled by QPR ProcessAnalyzer, to save costs and reserve computing capacity for new queries. Queries are cancelled, e.g., when user makes filtering in a dashboard or changes chart settings, because the new filter or settings will make an updated query and the old query (if still pending) is cancelled.