Snowflake Connection Configuration

From QPR ProcessAnalyzer Wiki
Revision as of 09:36, 11 October 2024 by Ollvihe (talk | contribs) (→‎Set Snowflake ODBC connection)

This page describes the native method to use Snowflake, where the process mining queries are run in Snowflake and the event log data stays in Snowflake. There are two configuration steps: set up a Snowflake account and configure QPR ProcessAnalyzer to use the account through an ODBC connection.

QPR ProcessAnalyzer needs one Snowflake schema where it can create functions and store cache tables; write permissions are needed for that schema. QPR ProcessAnalyzer can use any tables in Snowflake as event log source data, requiring at least read permission to those tables.

Snowflake account configuration

These instructions describe one scenario for configuring Snowflake for QPR ProcessAnalyzer. The configuration can also be done differently, e.g., if the same Snowflake account has other uses.

For permissions, these instructions rely on database and schema ownership, which grants all permissions to the objects. The Minimal Snowflake permissions section below provides instructions for the minimum permissions.

  1. Create a new Snowflake account on the Snowflake site: https://www.snowflake.com/ (or use an existing account). The account is created on the selected cloud platform and region, so choose a location close to the QPR ProcessAnalyzer hosting site.
  2. Go to https://app.snowflake.com and log in to your Snowflake account. Change the role to ACCOUNTADMIN.
  3. In Account > Roles, click the Role button to create a new role. Define Name QPRPA, and click Create Role.
  4. In Account > Users, click the User button to create a new user. Define User Name QPRPA and define a strong password. Also check that Force user to change password on first time login is disabled. Click the Create User button.
  5. Select the created user, and click the Grant Role button. In the Role to grant list, select QPRPA, and click Grant.
  6. In Compute > Warehouses, click the Warehouse button to create a new warehouse. Define Name QPRPA, and select a suitable Size for the warehouse. In the Advanced Warehouse Options, check that Auto Resume and Auto Suspend are enabled. Set the Suspend After time based on your performance requirements (for example 3 minutes is a suitable value). Click Create Warehouse.
  7. Select the created warehouse, and in the Privileges section, click the Privilege button. For Role, select the QPRPA role. For Privileges, select MONITOR, OPERATE and USAGE. Click the Grant Privileges button.
  8. In Data > Databases, click the Database button to create a new database. Define Name QPRPA, and click the Create button.
  9. Set the role as the owner of the database by selecting Transfer Ownership for the database in the menu and selecting the QPRPA role.
  10. Select the created database, and click the Schema button to create a new schema. Define Name QPRPA, and click the Create button.
  11. Set the role as the owner of the schema by selecting Transfer Ownership for the schema in the menu and selecting the QPRPA role.
  12. (Optional) If the connection to Snowflake is lost for any reason, QPR ProcessAnalyzer may not be able to cancel pending queries. To get pending queries automatically cancelled to save costs, it's advisable to change the ABORT_DETACHED_QUERY session setting to true: https://docs.snowflake.com/en/sql-reference/parameters.html#abort-detached-query.
  13. (Optional) Running a Snowflake warehouse consumes credits in the Snowflake account, so it's good practice to set up resource monitoring to control credit usage: In Compute > Resource Monitors, click the Resource Monitor button to create a new resource monitor. Choose suitable monitor settings for your needs, and click the Create Resource Monitor button.
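
The UI steps above can also be sketched as a SQL script run in a Snowflake worksheet with the ACCOUNTADMIN role. This is a sketch following the example values from the steps (QPRPA object names, 3-minute auto-suspend); adjust the warehouse size and password to your needs:

```sql
-- Run using the ACCOUNTADMIN role (steps 3-13 above).
CREATE ROLE QPRPA;
CREATE USER QPRPA PASSWORD = '<strong_password>' MUST_CHANGE_PASSWORD = FALSE;
GRANT ROLE QPRPA TO USER QPRPA;
CREATE WAREHOUSE QPRPA WITH WAREHOUSE_SIZE = 'XSMALL'
  AUTO_RESUME = TRUE AUTO_SUSPEND = 180;  -- suspend after 3 minutes
GRANT MONITOR, OPERATE, USAGE ON WAREHOUSE QPRPA TO ROLE QPRPA;
CREATE DATABASE QPRPA;
CREATE SCHEMA QPRPA.QPRPA;
GRANT OWNERSHIP ON SCHEMA QPRPA.QPRPA TO ROLE QPRPA;
GRANT OWNERSHIP ON DATABASE QPRPA TO ROLE QPRPA;
-- Optional (step 12): cancel pending queries if the connection is lost.
ALTER USER QPRPA SET ABORT_DETACHED_QUERY = TRUE;
```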

Set Snowflake ODBC connection

To add the Snowflake ODBC connection to QPR ProcessAnalyzer, the following steps are required:

  1. Install Snowflake ODBC driver in the machine running QPR ProcessAnalyzer Server. More information about the ODBC driver installation: https://docs.snowflake.com/en/user-guide/odbc.html.
  2. Configure the Snowflake ODBC connection string in the QPR ProcessAnalyzer configuration table. When Snowflake is configured as instructed above, the following connection string can be used:
Driver={SnowflakeDSIIDriver};Application=QPR_ProcessAnalyzer;Server=<account_identifier>.snowflakecomputing.com;Database=QPRPA;Schema=QPRPA;Warehouse=QPRPA;Role=QPRPA;uid=QPRPA;pwd=<password>

Replace <password> with the password of the QPRPA user. In addition, replace <account_identifier> with the account identifier of your Snowflake account. More information about finding your Snowflake account identifier: https://docs.snowflake.com/en/user-guide/admin-account-identifier.html.

Also include the Application parameter in the connection string: it identifies the usage as QPR ProcessAnalyzer related for collecting usage statistics.

More information about Snowflake ODBC connection strings: https://docs.snowflake.com/en/user-guide/odbc-parameters.html

Note: By default, the Snowflake ODBC driver generates a large amount of logging to files, which might unnecessarily fill up the disk space. It's recommended to set the LogLevel parameter to 0. The path where the logs are created can be seen in the Windows registry (key LogPath). More information about configuring Snowflake ODBC parameters: https://docs.snowflake.com/en/developer-guide/odbc/odbc-parameters.

Test connection string

The connection string can be tested in the QPR ProcessAnalyzer UI to make sure it works before configuring it. The following expression tests a connection string, and if the connection works, it returns the Snowflake user name:

ImportOdbc(
  `...`,
  "SELECT CURRENT_USER()"
).toCsv();

If an error dialog is shown, the connection to Snowflake could not be established, and thus the connection string doesn't work.

Hardened security with key-pair authentication

Instead of using a password for the ODBC connection, it's more secure to use the key-pair authentication available in Snowflake. The idea is that a key pair is generated (public key and private key), the public key is stored in the user settings in the Snowflake cloud, and the private key is stored on the QPR ProcessAnalyzer server. QPR ProcessAnalyzer authenticates to Snowflake by proving that it has the private key corresponding to the public key. To set up key-pair authentication, follow the steps here: https://docs.snowflake.com/en/user-guide/key-pair-auth.

When key-pair authentication is used, the following parameters need to be added to the connection string (and the pwd parameter is no longer used):

  • Authenticator: Use value "SNOWFLAKE_JWT".
  • PRIV_KEY_FILE: The private key file location on the local disk.
  • PRIV_KEY_FILE_PWD: Password for the private key file.
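
With these parameters, the connection string could look like the following sketch (the private key path and key file password are placeholders; the other values follow the earlier example):

```
Driver={SnowflakeDSIIDriver};Application=QPR_ProcessAnalyzer;Server=<account_identifier>.snowflakecomputing.com;Database=QPRPA;Schema=QPRPA;Warehouse=QPRPA;Role=QPRPA;uid=QPRPA;Authenticator=SNOWFLAKE_JWT;PRIV_KEY_FILE=<path_to_private_key_file>;PRIV_KEY_FILE_PWD=<key_file_password>
```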

Minimal Snowflake permissions

The above instructions configure the Snowflake user as the database and schema owner, which grants all permissions to the database and schema. If required, the Snowflake user can be configured with fewer permissions. The minimal permissions for the database and schema configured in the Snowflake connection string are as follows (applies to both the global and project-level connection strings).

The database needs the following permission:

  • USAGE

The schema needs the following permissions:

  • USAGE
  • CREATE TABLE (for creating cache tables and being able to ingest data through QPR ProcessAnalyzer UI)
  • CREATE VIEW (temporary views are used to determine column data types)
  • CREATE FUNCTION (Python functions are needed for business calendar, conformance calculation, and clustering)
  • CREATE STAGE (internal table stages used when ingesting data)
  • CREATE FILE FORMAT (for reading data from stage when ingesting data)

If data is not imported using QPR ProcessAnalyzer but using Snowflake's own tools, the CREATE STAGE and CREATE FILE FORMAT permissions are not required.

When QPR ProcessAnalyzer creates Snowflake objects in the schema (e.g., tables, functions and file formats), the Snowflake user gets ownership of them (and thus full permissions).

QPR ProcessAnalyzer can use any table or view in Snowflake, and the following minimal permissions are required:

  • Database: USAGE
  • Schema: USAGE
  • Table/view: SELECT
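
Assuming the object names from the earlier instructions, the minimal permissions above could be granted with SQL along these lines (a sketch; SOURCEDB, SOURCESCHEMA, and EVENTS are hypothetical names for the source data, so adjust them to your environment):

```sql
-- Schema used for cache tables and functions (the connection string schema):
GRANT USAGE ON DATABASE QPRPA TO ROLE QPRPA;
GRANT USAGE, CREATE TABLE, CREATE VIEW, CREATE FUNCTION,
      CREATE STAGE, CREATE FILE FORMAT ON SCHEMA QPRPA.QPRPA TO ROLE QPRPA;
-- Read-only access to a source table or view:
GRANT USAGE ON DATABASE SOURCEDB TO ROLE QPRPA;
GRANT USAGE ON SCHEMA SOURCEDB.SOURCESCHEMA TO ROLE QPRPA;
GRANT SELECT ON TABLE SOURCEDB.SOURCESCHEMA.EVENTS TO ROLE QPRPA;
```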

Datatable key for events datatable

Analytics queries executed by QPR ProcessAnalyzer in Snowflake need to know the key column(s) for each table. Key columns mean that the combination of values in the key columns is unique for each row in the table. For the cases datatable, the case id column is assumed to be the key. For the events datatable, there is no key information available by default, so the key column(s) need to be defined in the Datatable properties dialog. If the key columns are not specified, all columns are assumed to form the key (which is the slowest option from a performance viewpoint). Note that defining the key columns incorrectly may lead to incorrect calculation results.

If there are no suitable columns for keys, a new column containing a sequence number can be added. A sequence can be created as follows:

CREATE OR REPLACE SEQUENCE seq1;

A Snowflake table can be set to use the sequence as follows:

CREATE OR REPLACE TABLE foo (a NUMBER DEFAULT seq1.NEXTVAL, b NUMBER);

When new data is imported to the table, new rows automatically get a unique sequence number.
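
For illustration, inserting rows without giving a value for column a lets the column default pick the next sequence values (a minimal sketch using the table created above):

```sql
INSERT INTO foo (b) VALUES (10), (20);
-- Column a now contains unique, increasing sequence numbers.
SELECT a, b FROM foo;
```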

Connecting to Snowflake tables

When a datatable is created in QPR ProcessAnalyzer, a corresponding table is created in Snowflake to store the actual data. The table is not created when the new datatable is added, but only when the datatable has at least one column. When using default names, Snowflake tables created by QPR ProcessAnalyzer are named as follows: qprpa_dt_<databaseId>_<datatableId>, where <databaseId> identifies the QPR ProcessAnalyzer environment and <datatableId> identifies the datatable within the environment. The database id is set in the QPR ProcessAnalyzer configuration table. Datatables can also be configured to use a user-defined table, schema, or database. A custom-named table behaves similarly to the default table (for example, columns can be modified and data imported), but when the datatable is deleted, the custom-named table is not dropped in Snowflake.

Datatables can be managed both in the QPR ProcessAnalyzer web UI and in Snowflake. When datatables are modified in Snowflake, a synchronization (see the synchronize function) is needed in QPR ProcessAnalyzer to check whether the underlying data in Snowflake has changed. Note that while the tables can be modified in Snowflake, new datatables don't appear in QPR ProcessAnalyzer until they have been created in the QPR ProcessAnalyzer UI. When the underlying table in Snowflake is deleted, all rows and columns appear to be removed from the datatable, but the datatable itself is not deleted. Schemas and databases are not created or modified by QPR ProcessAnalyzer (only tables).

Connecting to Snowflake views

In addition to tables, datatables can be connected to Snowflake views (https://docs.snowflake.com/en/user-guide/views-introduction.html). When a process mining query is made to the cases and events datatables, the queries in the linked Snowflake views are executed as part of the process mining queries. Note that a table and a view with the same name cannot exist in the same schema, so the table may need to be removed before the view can be created.

A new view can be created in Snowflake as follows (where ViewName is the name of the view):

CREATE OR REPLACE VIEW "ViewName" AS
SELECT * FROM VALUES (1, 'red'), (2, 'orange'), (3, 'yellow'), (4, 'green') AS Colors(Id, Value)

The SELECT part in the above statement can be any query allowed by Snowflake. The query can also contain data transformations, which means that those transformations are performed when the dashboards are used in QPR ProcessAnalyzer. This follows the ELT (extract, load, transform) principle, where the non-transformed data is loaded into the system and the needed transformations are performed when the data is used by QPR ProcessAnalyzer. Note that performing transformations on demand has a negative impact on performance.
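
As an example of such an on-demand transformation, a view could parse string timestamps while reading the source table (a sketch; the RawEvents table and its column names are hypothetical):

```sql
CREATE OR REPLACE VIEW "EventsView" AS
SELECT "CaseId", "EventType",
       -- Parse the text timestamp each time the view is queried (ELT).
       TO_TIMESTAMP("TimestampText", 'YYYY-MM-DD HH24:MI:SS') AS "Timestamp"
FROM "RawEvents";
```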

When a datatable with a Snowflake view is deleted, the view in the datasource is not deleted. For datatables with Snowflake views, the following data modification operations are not allowed: Import, Persist, AddColumn, Merge, RemoveColumns, RenameColumns, and Truncate.

Note that the query in the view is executed using the permissions of the user account defined in the ODBC connection string, and thus the permissions are managed in Snowflake. Still, the normal QPR ProcessAnalyzer datatable permissions apply to datatables with Snowflake views, so if a user cannot see the datatable, the user cannot access the data in the view either.

Automatic data update detection

When data in Snowflake tables changes (data is added, modified or removed), QPR ProcessAnalyzer automatically detects the change and performs calculations using the latest data. Data changes are checked every 5 minutes, so it takes at most that time for new data to appear in the dashboards. QPR ProcessAnalyzer has several levels of caches (to improve performance), which is why new data might not appear in the dashboards immediately. When a data change is detected, the caches are cleared, and calculations are performed using the latest data in Snowflake.

Automatic query cancellation

If the result of a pending query running in Snowflake is no longer needed, the query is automatically cancelled by QPR ProcessAnalyzer to save costs and free computing capacity for new queries. Queries are cancelled, e.g., when a user filters in a dashboard or changes chart settings, because the new filter or settings trigger an updated query and the old query (if still pending) is cancelled.

Alternative traditional ODBC Import method

Alternatively, the traditional method can be used to import data from Snowflake to QPR ProcessAnalyzer and use QPR ProcessAnalyzer's in-memory calculation engine. More information about importing data from an ODBC datasource: