Snowflake Connection Configuration

From QPR ProcessAnalyzer Wiki

This page describes the native method of using Snowflake, where the process mining queries are run in Snowflake and the eventlog data stays in Snowflake. There are two steps: set up a Snowflake account, and configure QPR ProcessAnalyzer to use the account through an ODBC connection.

Snowflake account configuration

These instructions describe one possible way to configure the Snowflake account for QPR ProcessAnalyzer. The configuration can also be done differently based on requirements, e.g., if the same Snowflake account is also used for other purposes. The steps below aim to give the Snowflake user account only the permissions it needs (principle of least privilege).

  1. Create a new Snowflake account on the Snowflake site: https://www.snowflake.com/ (or use an existing account). The account is created on the selected cloud platform and in the selected region, so choose a location that is close to the QPR ProcessAnalyzer hosting site.
  2. Go to https://app.snowflake.com and log in to your Snowflake account. Change the role to ACCOUNTADMIN.
  3. In Account > Roles, click the Role button to create a new role. Define Name QPRPA, and click Create Role.
  4. In Account > Users, click the User button to create a new user. Define User Name QPRPA and define a strong password. Also check that the Force user to change password on first time login option is disabled. Click the Create User button.
  5. Select the created user, and click the Grant Role button. In the Role to grant list, select QPRPA, and click Grant.
  6. In Compute > Warehouses, click the Warehouse button to create a new warehouse. Define Name QPRPA, and select a suitable Size for the warehouse. In the Advanced Warehouse Options, check that Auto Resume and Auto Suspend are enabled. Set the Suspend After time based on your performance requirements (for example, 3 minutes is a suitable value). Click Create Warehouse.
  7. Select the created warehouse and, in the Privileges section, click the Privilege button. For Role, select the QPRPA role. For Privileges, select MONITOR, OPERATE and USAGE. Click the Grant Privileges button.
  8. In Data > Databases, click the Database button to create a new database. Define Name QPRPA, and click the Create button.
  9. Set the role as the owner of the database by selecting Transfer Ownership for the database in the menu and selecting the QPRPA role.
  10. Select the created database, and click the Schema button to create a new schema. Define Name QPRPA, and click the Create button.
  11. Set the role as the owner of the schema by selecting Transfer Ownership for the schema in the menu and selecting the QPRPA role.
  12. (Optional) If the connection to Snowflake is lost for some reason, QPR ProcessAnalyzer may not be able to cancel pending queries. To have pending queries cancelled automatically and save costs, it's advisable to set the ABORT_DETACHED_QUERY session parameter to true: https://docs.snowflake.com/en/sql-reference/parameters.html#abort-detached-query.
  13. (Optional) Running a Snowflake warehouse consumes credits in the Snowflake account, so it's good practice to set up resource monitoring to control the credit usage: In Compute > Resource Monitors, click the Resource Monitor button to create a new resource monitor. Choose suitable monitor settings for your needs, and click the Create Resource Monitor button.
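
Steps 3 to 13 above can also be carried out as SQL statements, for example in a Snowflake worksheet. The script below is a minimal sketch under assumptions that are not part of these instructions: the warehouse size XSMALL, the resource monitor name QPRPA_MONITOR, and its credit quota and thresholds are illustrative values, and <password> is a placeholder. Unlike the UI steps, the script creates the schema before transferring ownership of the database.

```sql
-- Run with a role that can create roles, users, warehouses and databases.
USE ROLE ACCOUNTADMIN;

-- Steps 3-5: role and user. The user is not forced to change the password.
CREATE ROLE IF NOT EXISTS QPRPA;
CREATE USER IF NOT EXISTS QPRPA
  PASSWORD = '<password>'
  MUST_CHANGE_PASSWORD = FALSE;
GRANT ROLE QPRPA TO USER QPRPA;

-- Steps 6-7: warehouse. AUTO_SUSPEND is given in seconds (180 = 3 minutes).
-- The size XSMALL is an assumption; pick a size that fits your workload.
CREATE WAREHOUSE IF NOT EXISTS QPRPA
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 180
  AUTO_RESUME = TRUE;
GRANT MONITOR, OPERATE, USAGE ON WAREHOUSE QPRPA TO ROLE QPRPA;

-- Steps 8-11: database and schema, both owned by the QPRPA role.
CREATE DATABASE IF NOT EXISTS QPRPA;
CREATE SCHEMA IF NOT EXISTS QPRPA.QPRPA;
GRANT OWNERSHIP ON SCHEMA QPRPA.QPRPA TO ROLE QPRPA;
GRANT OWNERSHIP ON DATABASE QPRPA TO ROLE QPRPA;

-- Step 12 (optional): cancel queries whose connection has been lost.
ALTER USER QPRPA SET ABORT_DETACHED_QUERY = TRUE;

-- Step 13 (optional): resource monitor with illustrative limits.
CREATE RESOURCE MONITOR QPRPA_MONITOR
  WITH CREDIT_QUOTA = 100
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE QPRPA SET RESOURCE_MONITOR = QPRPA_MONITOR;
```

Setting ABORT_DETACHED_QUERY at the user level is one option; the parameter can also be set for the whole account if that suits the other uses of the account.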

Set Snowflake ODBC connection

To add the Snowflake ODBC connection to QPR ProcessAnalyzer, the following steps are required:

  1. Install the Snowflake ODBC driver on the machine running QPR ProcessAnalyzer Server. More information about the ODBC driver installation: https://docs.snowflake.com/en/user-guide/odbc.html.
  2. Add the Snowflake ODBC connection string to the QPR ProcessAnalyzer configuration table. When Snowflake has been configured as instructed above, the following connection string can be used:
Driver={SnowflakeDSIIDriver};Application=QPR_ProcessAnalyzer;Server=<account_identifier>.snowflakecomputing.com;Database=QPRPA;Schema=QPRPA;Warehouse=QPRPA;Role=QPRPA;uid=QPRPA;pwd=<password>

Replace <password> with the password of the QPRPA user. In addition, replace <account_identifier> with the account identifier of your Snowflake account. More information about finding your Snowflake account identifier: https://docs.snowflake.com/en/user-guide/admin-account-identifier.html.

Please also include the Application tag in the connection string. It identifies usage related to QPR ProcessAnalyzer when usage statistics are collected.

More information about Snowflake ODBC connection strings: https://docs.snowflake.com/en/user-guide/odbc-parameters.html
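
The values used in the connection string can be cross-checked with a couple of queries in Snowflake. This is only a sketch: the first query assumes that your account uses the organization-account form of the account identifier (see the account identifier link above for the alternatives), and the second one is meant to be run in a session opened with the same settings as the connection string.

```sql
-- Account identifier in the <organization>-<account> form.
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS account_identifier;

-- Context of the current session. With the connection string above,
-- each column would be expected to show QPRPA.
SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA();
```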

Hardened security with key-pair authentication

Instead of using a password for the ODBC connection, it's more secure to use the key-pair authentication available in Snowflake. A key pair is generated (a public key and a private key), the public key is stored in the user settings in Snowflake, and the private key is stored on the QPR ProcessAnalyzer server. QPR ProcessAnalyzer authenticates to Snowflake by proving that it holds the private key corresponding to the public key. To set up the key-pair authentication, follow the steps here: https://docs.snowflake.com/en/user-guide/key-pair-auth.

When the key-pair authentication is used, the following parameters are added to the connection string (and the pwd parameter is no longer used):

  • Authenticator: Use value "SNOWFLAKE_JWT"
  • PRIV_KEY_FILE: The location of the private key file on the local disk.
  • PRIV_KEY_FILE_PWD: Password for the private key file.
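
On the Snowflake side, storing the public key in the user settings comes down to one statement. The sketch below assumes that the key pair has already been generated as described in the Snowflake documentation linked above; <public_key> is a placeholder for the key content without the PEM header and footer lines.

```sql
-- Requires a role that is allowed to modify the user (for example SECURITYADMIN).
ALTER USER QPRPA SET RSA_PUBLIC_KEY = '<public_key>';

-- The RSA_PUBLIC_KEY_FP property in the output shows the fingerprint
-- of the stored key, which can be compared with the local key.
DESC USER QPRPA;
```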

Using Snowflake tables managed by QPR ProcessAnalyzer

When a datatable is created in QPR ProcessAnalyzer, a corresponding table that stores the actual data is created in Snowflake. The Snowflake table is not created immediately with the new datatable, but only once the datatable has at least one column. The Snowflake tables created by QPR ProcessAnalyzer are named as follows: qprpa_dt_<databaseId>_<datatableId>, where <databaseId> identifies the QPR ProcessAnalyzer environment and <datatableId> identifies the datatable within the environment. The database id is set in the QPR ProcessAnalyzer configuration table. Thanks to this naming, it's possible to configure multiple QPR ProcessAnalyzer environments to use the same Snowflake database and schema. Alternatively, there can be different schemas or databases for each environment.
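
To see which objects QPR ProcessAnalyzer currently has in Snowflake, the naming convention can be used as a filter. The query below is a sketch that assumes the QPRPA database and schema from the setup instructions; STARTSWITH is used instead of LIKE because the underscore is a wildcard character in LIKE patterns.

```sql
-- Lists the tables (and views) that follow the qprpa_dt_ naming convention.
-- Names are stored in lower case, so the prefix is matched case-sensitively.
SELECT table_name, table_type, row_count
FROM QPRPA.INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'QPRPA'
  AND STARTSWITH(table_name, 'qprpa_dt_')
ORDER BY table_name;
```

The table_type column distinguishes tables managed by QPR ProcessAnalyzer (BASE TABLE) from user-managed views (VIEW).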

Datatables can be managed both in the QPR ProcessAnalyzer web UI and in Snowflake. When datatables are modified in Snowflake, a synchronization (see the Synchronize function) needs to be called in QPR ProcessAnalyzer to inform it that the underlying data in Snowflake has changed. Creating and deleting datatables are operations that need to be done in the QPR ProcessAnalyzer web UI. Deleting the underlying table in Snowflake corresponds to removing all rows and columns from the datatable.

Using Snowflake views as datatable source

QPR ProcessAnalyzer can be connected to any data in Snowflake by using Snowflake views (https://docs.snowflake.com/en/user-guide/views-introduction.html) that are linked to the QPR ProcessAnalyzer datatables. When a process mining query is made to the cases and events datatables, the queries in the linked Snowflake views are executed as part of the process mining queries.

The views need to be created in the same Snowflake database and schema as the tables used by QPR ProcessAnalyzer (defined by the ODBC connection string). If QPR ProcessAnalyzer has already created a table for the datatable, the table needs to be removed before the view can be created (as there cannot be both a table and a view with the same name). If the datatable doesn't have columns yet, the table doesn't exist, and it's enough to create the view. Note that the Snowflake tables behind the QPR ProcessAnalyzer datatables are managed by QPR ProcessAnalyzer automatically, whereas the Snowflake views need to be managed by users directly in Snowflake.

The view linked to a datatable needs to have a specific name, which can be queried in the Expression Designer with the following expression (define your datatable id):

DatatableById(<datatableId>).NameInDatasource

A new view can be created in Snowflake based on the following example (replace <ViewNameInDatasource> with the result of the previous expression):

CREATE OR REPLACE VIEW "<ViewNameInDatasource>" AS
SELECT * FROM VALUES (1, 'red'), (2, 'orange'), (3, 'yellow'), (4, 'green') AS Colors(Id, Value)

Note that table and view names used by QPR ProcessAnalyzer are in lower case, and thus the object names need to be written in quotation marks, because without the marks Snowflake will create the view in upper case (which would be an incorrect name).

The SELECT part in the above statement can be any query allowed by Snowflake. The query can also contain data transformations, which means that those transformations are performed when the dashboards are used in QPR ProcessAnalyzer. This follows the ELT (extract, load, transform) principle, where the non-transformed data is loaded into the system and the needed transformations are performed when the data is used by QPR ProcessAnalyzer. Note that performing transformations on demand has a negative impact on performance.
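
As an illustration of a view that transforms data on demand, the sketch below reads from a hypothetical source table RAW_DATA.PUBLIC.EVENTS with columns CASE_ID, ACTIVITY and EVENT_TIME. None of these names come from QPR ProcessAnalyzer or Snowflake; they, and the output column names, are assumptions to be replaced with your own.

```sql
-- The view name must be the lower-case name returned by NameInDatasource,
-- written in quotation marks.
CREATE OR REPLACE VIEW "<ViewNameInDatasource>" AS
SELECT
    TRIM(e.CASE_ID)                AS "CaseId",     -- remove stray whitespace
    UPPER(e.ACTIVITY)              AS "EventType",  -- normalize activity names
    TO_TIMESTAMP_NTZ(e.EVENT_TIME) AS "Timestamp"   -- convert text to a timestamp
FROM RAW_DATA.PUBLIC.EVENTS AS e
WHERE e.EVENT_TIME IS NOT NULL;
```

Because the transformations run every time the view is queried, heavier logic in the SELECT part translates directly into longer dashboard response times.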

When the view has been created, the Synchronize function needs to be called to update the status in QPR ProcessAnalyzer (define your datatable id):

DatatableById(<datatableId>).Synchronize()

When a datatable with a Snowflake view is deleted, the view in the datasource is not deleted (unlike tables, which are deleted). For datatables with a Snowflake view, the following data modification operations are not allowed: Import, Persist, AddColumn, Merge, RemoveColumns, RenameColumns, and Truncate.
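
Since the view stays in Snowflake after the datatable has been deleted, it can be removed manually if it is no longer needed. A minimal sketch, assuming the view name was noted down (from NameInDatasource) before the datatable was deleted:

```sql
-- The quotation marks are needed because the name is in lower case.
DROP VIEW IF EXISTS "<ViewNameInDatasource>";
```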

Note that the query in the view is executed using the permissions of the user account defined in the ODBC connection string, and thus the permissions are managed in Snowflake. Still, the normal QPR ProcessAnalyzer datatable permissions are applied to datatables with Snowflake views, so if a user cannot see the datatable, the user cannot access the data in the view either.
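
If a view reads data outside the QPRPA database, the needed permissions have to be granted in Snowflake. The sketch below assumes that the view is created with (and therefore owned by) the QPRPA role, and that the source data is in a hypothetical table RAW_DATA.PUBLIC.EVENTS; the database, schema and table names are placeholders.

```sql
-- Run with a role that is allowed to grant privileges on the source objects.
GRANT USAGE ON DATABASE RAW_DATA TO ROLE QPRPA;
GRANT USAGE ON SCHEMA RAW_DATA.PUBLIC TO ROLE QPRPA;
GRANT SELECT ON TABLE RAW_DATA.PUBLIC.EVENTS TO ROLE QPRPA;
```

Granting SELECT only on the individual tables that the views need keeps the setup in line with the minimum privilege principle.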

Automatic query cancellation

If the result of a pending query running in Snowflake is no longer needed, QPR ProcessAnalyzer cancels the query automatically to save costs and free computing capacity for new queries. Queries are cancelled, e.g., when a user changes filtering in a dashboard or changes chart settings, because the new filter or settings trigger an updated query and the old query (if still pending) is cancelled.
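
Cancellation normally needs no manual action. For troubleshooting, running queries of the QPRPA user can be inspected, and cancelled if necessary, directly in Snowflake. This is a sketch of a manual check using standard Snowflake functions, not a description of how QPR ProcessAnalyzer performs the cancellation; <query_id> is a placeholder.

```sql
-- Recent queries of the QPRPA user that are still running.
SELECT query_id, start_time, execution_status, query_text
FROM TABLE(QPRPA.INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(
       USER_NAME => 'QPRPA', RESULT_LIMIT => 100))
WHERE UPPER(execution_status) = 'RUNNING'
ORDER BY start_time;

-- Cancel one query manually using a query_id from the result above.
SELECT SYSTEM$CANCEL_QUERY('<query_id>');
```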

Alternative traditional ODBC Import method

Alternatively, the traditional method can be used: data is imported from Snowflake into QPR ProcessAnalyzer and analyzed with QPR ProcessAnalyzer's in-memory calculation engine. More information about importing data from an ODBC datasource: