Snowflake Connection Configuration: Difference between revisions

From QPR ProcessAnalyzer Wiki
Jump to navigation Jump to search
No edit summary
Line 36: Line 36:
More information about Snowflake ODBC connection strings: https://docs.snowflake.com/en/user-guide/odbc-parameters.html
More information about Snowflake ODBC connection strings: https://docs.snowflake.com/en/user-guide/odbc-parameters.html


== Snowflake database tables ==
== Using Snowflake tables managed by QPR ProcessAnalyzer ==
When a datatable is created in QPR ProcessAnalyzer, corresponding table is created in the Snowflake storing the actual data. The table is not yet created when a new datatable is created, but instead when there is at least one column in the datatable. The Snowflake tables created by QPR ProcessAnalyzer are named as follows: qprpa_dt_<databaseId>_<datatableId>, where <databaseId> identifies the QPR ProcessAnalyzer environment and <datatableId> identify the datatable within the environment. The database id is set in the [[PA_Configuration_database_table|QPR ProcessAnalyzer configuration table]]. Due to the naming, it's possible to configure multiple QPR ProcessAnalyzer environments to use the same Snowflake database and schema. Alternatively, there can be different schemas or databases for each environment.
When a datatable is created in QPR ProcessAnalyzer, corresponding table is created in the Snowflake storing the actual data. The table is not yet created when a new datatable is created, but instead when there is at least one column in the datatable. The Snowflake tables created by QPR ProcessAnalyzer are named as follows: qprpa_dt_<databaseId>_<datatableId>, where <databaseId> identifies the QPR ProcessAnalyzer environment and <datatableId> identify the datatable within the environment. The database id is set in the [[PA_Configuration_database_table|QPR ProcessAnalyzer configuration table]]. Due to the naming, it's possible to configure multiple QPR ProcessAnalyzer environments to use the same Snowflake database and schema. Alternatively, there can be different schemas or databases for each environment.


Datatables can managed both in the QPR ProcessAnalyzer web UI and in the Snowflake. When datatables are modified in the Snowflake, a synchronization (see [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Datatable|see synchronize function]]) needs to be called in QPR ProcessAnalyzer to inform that the underlying data in Snowflake has changed. Still, creating and deleting datatables, are operations that are only possible in QPR ProcessAnalyzer web UI.
Datatables can managed both in the QPR ProcessAnalyzer web UI and in the Snowflake. When datatables are modified in the Snowflake, a synchronization (see [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Datatable|see synchronize function]]) needs to be called in QPR ProcessAnalyzer to inform that the underlying data in Snowflake has changed. Still, creating and deleting datatables, are operations that are only possible in QPR ProcessAnalyzer web UI.


== Using database views ==
== Using Snowflake views as datatables source ==
It's possible to use any data available in Snowflake as source for cases and events. This is done by using database views that are linked to the QPR ProcessAnalyzer datatables. When the datatable is used, the query in the database view is executed as part of the usual process mining queries. The views need to be created to the same Snowflake database and schema as the tables. The table (that has been created automatically) needs to be removed before the view can be created. Note that if the datatable doesn't have columns, the database table doesn't exist, and thus it's enough to create the database view. The expected name of the view can be queried using the [[Navigation_Menu#Expression_Designer|Expression Designer]] with following query (define your datatable id):
It's possible to use any data available in Snowflake as source for cases and events. This is achieved by using Snowflake ''views'' (https://docs.snowflake.com/en/user-guide/views-introduction.html) linked to the QPR ProcessAnalyzer datatables. When the datatable is used by a model, the query in the Snowflake view is executed as part of the usual process mining queries. The views need to be created to the same Snowflake database and schema as the tables used by QPR ProcessAnalyzer (defined by the [[Set Snowflake ODBC connection|ODBC connection string]]). If QPR ProcessAnalyzer has already created a table for the datatable, the table needs to be removed before the view can be created (as there cannot be both a table and a view with the same name). If the datatable doesn't have columns yet, the table doesn't exist, and it's enough to create the view.
 
The view linked to a datatable needs to have a specific name which can be queried using the [[Navigation_Menu#Expression_Designer|Expression Designer]] with following query (define your datatable id):
<pre>
<pre>
DatatableById(123).NameInDatasource
DatatableById(123).NameInDatasource
</pre>
</pre>


The view can be created in Snowflake as follows (replace <ViewNameInDatasource> with the result of the previous expression query):
New view can be created in Snowflake as follows (replace <ViewNameInDatasource> with the result of the previous expression query):
<pre>
<pre>
CREATE OR REPLACE VIEW <ViewNameInDatasource> AS
CREATE OR REPLACE VIEW "<ViewNameInDatasource>" AS
SELECT * FROM VALUES (1, 'red'), (2, 'orange'), (3, 'yellow'), (4, 'green') AS Colors(Id, Value)
SELECT * FROM VALUES (1, 'red'), (2, 'orange'), (3, 'yellow'), (4, 'green') AS Colors(Id, Value)
</pre>
</pre>


When the view has been created, the [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Datatable|Synchronize]] needs to be called update the status (define your datatable id):
Note that table and view names used by QPR ProcessAnalyzer are in lower case, and thus the object names need to be written in quotation marks, because without the marks Snowflake will create the view in upper case (which would be an incorrect name).
 
The SELECT part in the above statement can be anything to get any data in Snowflake.
 
When the view has been created, the [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Datatable|Synchronize]] needs to be called to update the status to QPR ProcessAnalyzer (define your datatable id):
<pre>
<pre>
DatatableById(123).Synchronize()
DatatableById(123).Synchronize()
</pre>
</pre>


Type of the datasource of a datatable can be queried as follows (define your datatable id):
When a datatable with a Snowflake view is deleted, the view in the datasource is not deleted (this is unlike to tables which are deleted). For datatables with Snowflake view, following data modification operation are not allowed: Import, Persist, AddColumn, Merge, RemoveColumns, RenameColumns, and Truncate. Also for datatables with Snowflake view, the LastImportDate and LastImportBy settings are not available.
<pre>
DatatableById(123).DataSourceObjectType
</pre>


When a datatable having a datatable view is deleted, the view in the datasource is not deleted (this is unlike to the database table which is deleted). For datatables having a datatable view, following data modification operation are not allowed: Import, Persist, AddColumn, Merge, RemoveColumns, RenameColumns, and Truncate. If the data for a datatable comes from a view, the LastImportDate and LastImportBy settings are not available.
Note that the query in the view is executed using permissions of the user account defined in the [[Set Snowflake ODBC connection|ODBC connection string]], and thus the permissions are managed in Snowflake. Still, the normal QPR ProcessAnalyzer datatable permissions are applied for datatable with Snowflake views, so if user cannot see the datatable, user cannot access the data in the view either.

Revision as of 22:52, 26 October 2022

This page describes the native method to use Snowflake where the process mining queries are run in the Snowflake and the eventlog data stays in the Snowflake. There are two steps to configure: setup a Snowflake account and configure QPR ProcessAnalyzer to use the account through an ODBC connection.

Alternatively, the traditional method can be used to import data from Snowflake to QPR ProcessAnalyzer and use the QPR ProcessAnalyzer's in-memory calculation engine. More information about importing data from an ODBC datasource:

Snowflake account configuration

These instructions provide one possible way to configure the Snowflake account for QPR ProcessAnalyzer. Configuration can also be done differently based on requirements, e.g., if there is other use for the same Snowflake account. These instructions optionally set the least possible permissions for the Snowflake user account (minimum privilege principle).

  1. Create a new Snowflake account in the Snowflake site: https://www.snowflake.com/ (or use an existing account). The account is created to the selected cloud platform and site, so consider a location that is close to the QPR ProcessAnalyzer hosting site.
  2. Go to https://app.snowflake.com and login in to your Snowflake account.
  3. In Account > Roles, click the Role button to create a new role. Define Name QPRPA, and click Create Role.
  4. In Account > Users, click the User button to create a new user. Define User Name QPRPA and define a strong password. Check also that the Force user to change password on first time login is disabled. Click the Create User button.
  5. Select the created user, and click the Grant Role button. In the Role to grant list, select QPRPA, and click Grant.
  6. In Compute > Warehouses, click the Warehouse button to create a new warehouse. Define Name QPRPA, and select a suitable Size for the warehouse. In the Advanced Warehouse Options, check that Auto Resume and Auto Suspend are enabled. Set the Suspend After time based on your performance requirements (for example 5 minutes is a good initial value). Click Create Warehouse.
  7. (Optional security hardening) Select the created warehouse and in the Privileges section and click the Privilege button. For Role, select the QPRPA role. For Privileges, select MONITOR, OPERATE and USAGE. Click the Grant Privileges button.
  8. In Data > Databases, click the Database button to create a new database. Define Name QPRPA, and click the Create button.
  9. (Optional security hardening) Select the created database, and in the Privileges section, click the Privilege button. For Role, select the QPRPA role. For Privileges, select USAGE. Click Grant Privileges.
  10. Select the created database, and click the Schema button to create a new schema. Define Name QPRPA, and click the Create button.
  11. (Optional security hardening) Select the created schema, and in the Privileges section, click the Privilege button. For Role, select the QPRPA role. For Privileges, select CREATE VIEW, CREATE TABLE, CREATE FILE FORMAT and USAGE. Click the Grant Privileges button.
  12. (Optional) Running a Snowflake warehouse consumes credits in the Snowflake account, and thus it's a good practice to set resource monitoring to control the credit usage: In Compute > Resource Monitors, click the Resource Monitor button to create a new resources monitor. Choose suitable monitor settings for your needs, and click the Create Resource Monitor button.

Set Snowflake ODBC connection

To add the Snowflake ODBC connection to QPR ProcessAnalyzer, following steps are required:

  1. Install Snowflake ODBC driver in the machine running QPR ProcessAnalyzer Server. More information about the ODBC driver installation: https://docs.snowflake.com/en/user-guide/odbc.html.
  2. Configure Snowflake ODBC connection string to the QPR ProcessAnalyzer configuration table. When configuring Snowflake as instructed above, the following connection string can be used:
Driver={SnowflakeDSIIDriver};Application=QPR_ProcessAnalyzer;Server=<account_identifier>.snowflakecomputing.com;Database=QPRPA;Schema=QPRPA;Warehouse=QPRPA;Role=QPRPA;uid=QPRPA;pwd=<password>

Replace <password> with the password of the QPRPA user. In addition, replace <account_identifier> with the account identifier of your Snowflake account. More information about finding your Snowflake account identifier: https://docs.snowflake.com/en/user-guide/admin-account-identifier.html.

Please also include the Application tag to the connection string, identifying QPR ProcessAnalyzer related usage for collecting usage statistics.

More information about Snowflake ODBC connection strings: https://docs.snowflake.com/en/user-guide/odbc-parameters.html

Using Snowflake tables managed by QPR ProcessAnalyzer

When a datatable is created in QPR ProcessAnalyzer, corresponding table is created in the Snowflake storing the actual data. The table is not yet created when a new datatable is created, but instead when there is at least one column in the datatable. The Snowflake tables created by QPR ProcessAnalyzer are named as follows: qprpa_dt_<databaseId>_<datatableId>, where <databaseId> identifies the QPR ProcessAnalyzer environment and <datatableId> identify the datatable within the environment. The database id is set in the QPR ProcessAnalyzer configuration table. Due to the naming, it's possible to configure multiple QPR ProcessAnalyzer environments to use the same Snowflake database and schema. Alternatively, there can be different schemas or databases for each environment.

Datatables can managed both in the QPR ProcessAnalyzer web UI and in the Snowflake. When datatables are modified in the Snowflake, a synchronization (see see synchronize function) needs to be called in QPR ProcessAnalyzer to inform that the underlying data in Snowflake has changed. Still, creating and deleting datatables, are operations that are only possible in QPR ProcessAnalyzer web UI.

Using Snowflake views as datatables source

It's possible to use any data available in Snowflake as source for cases and events. This is achieved by using Snowflake views (https://docs.snowflake.com/en/user-guide/views-introduction.html) linked to the QPR ProcessAnalyzer datatables. When the datatable is used by a model, the query in the Snowflake view is executed as part of the usual process mining queries. The views need to be created to the same Snowflake database and schema as the tables used by QPR ProcessAnalyzer (defined by the ODBC connection string). If QPR ProcessAnalyzer has already created a table for the datatable, the table needs to be removed before the view can be created (as there cannot be both a table and a view with the same name). If the datatable doesn't have columns yet, the table doesn't exist, and it's enough to create the view.

The view linked to a datatable needs to have a specific name which can be queried using the Expression Designer with following query (define your datatable id):

DatatableById(123).NameInDatasource

New view can be created in Snowflake as follows (replace <ViewNameInDatasource> with the result of the previous expression query):

CREATE OR REPLACE VIEW "<ViewNameInDatasource>" AS
SELECT * FROM VALUES (1, 'red'), (2, 'orange'), (3, 'yellow'), (4, 'green') AS Colors(Id, Value)

Note that table and view names used by QPR ProcessAnalyzer are in lower case, and thus the object names need to be written in quotation marks, because without the marks Snowflake will create the view in upper case (which would be an incorrect name).

The SELECT part in the above statement can be anything to get any data in Snowflake.

When the view has been created, the Synchronize needs to be called to update the status to QPR ProcessAnalyzer (define your datatable id):

DatatableById(123).Synchronize()

When a datatable with a Snowflake view is deleted, the view in the datasource is not deleted (this is unlike to tables which are deleted). For datatables with Snowflake view, following data modification operation are not allowed: Import, Persist, AddColumn, Merge, RemoveColumns, RenameColumns, and Truncate. Also for datatables with Snowflake view, the LastImportDate and LastImportBy settings are not available.

Note that the query in the view is executed using permissions of the user account defined in the ODBC connection string, and thus the permissions are managed in Snowflake. Still, the normal QPR ProcessAnalyzer datatable permissions are applied for datatable with Snowflake views, so if user cannot see the datatable, user cannot access the data in the view either.