QPR ProcessAnalyzer Model Datasources: Difference between revisions
No edit summary |
|||
Line 8: | Line 8: | ||
Following settings are available: | Following settings are available: | ||
* '''Datasource''': | * '''Datasource''': Can be ''Datatable'' (local or Snowflake), ''ODBC'' (data is fetched by executing an ODBC query to an OCBC datasource) or ''Loading Script'' (data is generated by running an expression language query). | ||
* '''Datatable''': When ''datasource'' is a ''datatable'', this settings defines the datatable to use. | * '''Datatable''': When ''datasource'' is a ''datatable'', this settings defines the datatable to use. | ||
* '''Case ID''': Defines column containing case id. This mapping | * '''Case ID''': Defines column containing case id. This mapping is needed for both cases and events data. The column datatype can be either string or integer. | ||
* '''Event Type''': Defines column containing event type name. This mapping | * '''Event Type''': Defines column containing event type name. This mapping is needed for events data. This column datatype must be string. | ||
* '''Event Time''': Defines column containing event timestamp. This mapping | * '''Event Time''': Defines column containing event timestamp. This mapping is needed for events data. The column datatype must be date. | ||
* '''Connection String''': ODBC | * '''Connection String''': ODBC connection string used to fetch cases/events data. Connection string depends on the data source type and suitable connection strings can be found e.g. in https://www.connectionstrings.com. | ||
* '''Connection String Key''': [[Secure Strings|Secure string key]] of the ODBC connection string | * '''Connection String Key''': [[Secure Strings|Secure string key]] of the ODBC connection string. Alternative to the ''Connection String''. | ||
* '''ODBC Query''': ODBC query to run | * '''ODBC Query''': ODBC query to run in the datasource to fetch cases/events data. Query syntax depends on the source system where the data is fetched from. | ||
* '''Loading Script''': Expression language script that provides | * '''Loading Script''': Expression language script that provides cases/events data. For more information, see the examples below. | ||
Note that when changing datasource settings, the model is dropped from the memory, so that it can be reloaded using the changed settings. | Note that when changing datasource settings, the model is dropped from the memory, so that it can be reloaded using the changed settings. |
Revision as of 14:10, 30 October 2023
QPR Processanalyzer in-memory models can load data from variety of sources:
- Datatables: Data can be loaded from QPR ProcessAnalyzer datatables, located either in the local data storage or in Snowflake.
- Loading Script: Data can be loaded using a loading script written in QPR ProcessAnalyzer expression language. The loading script can extract the data and perform needed transformations to the suitable eventlog format during model is loaded. The loading script is started when the model loading into memory starts (pull logic). With the expression language, it's also possible to fetch data from other QPR ProcessAnalyzer models.
- ODBC datasources: Data can be loaded from any ODBC compliant source system, such as databases, SAP HANA, ServiceNow, Excel files or CSV files.
Model Datasource Settings
Datasource needs to be defined both for cases and events. Model datasource settings are in the Model Properties dialog in Cases datasource and Events datasource tabs. It's possible that cases and events are loaded from different types of datasources, e.g. cases from an ODBC datasource and events from a datatable. Events datasource is optional and cases datasource is mandatory.
Following settings are available:
- Datasource: Can be Datatable (local or Snowflake), ODBC (data is fetched by executing an ODBC query to an OCBC datasource) or Loading Script (data is generated by running an expression language query).
- Datatable: When datasource is a datatable, this settings defines the datatable to use.
- Case ID: Defines column containing case id. This mapping is needed for both cases and events data. The column datatype can be either string or integer.
- Event Type: Defines column containing event type name. This mapping is needed for events data. This column datatype must be string.
- Event Time: Defines column containing event timestamp. This mapping is needed for events data. The column datatype must be date.
- Connection String: ODBC connection string used to fetch cases/events data. Connection string depends on the data source type and suitable connection strings can be found e.g. in https://www.connectionstrings.com.
- Connection String Key: Secure string key of the ODBC connection string. Alternative to the Connection String.
- ODBC Query: ODBC query to run in the datasource to fetch cases/events data. Query syntax depends on the source system where the data is fetched from.
- Loading Script: Expression language script that provides cases/events data. For more information, see the examples below.
Note that when changing datasource settings, the model is dropped from the memory, so that it can be reloaded using the changed settings.
Datatable Models
When the DataSource is Datatable, data is loaded from a QPR ProcessAnalyzer datatable. Datatables can be created by importing data to them from CSV files, or as a result of a script.
When a model using datatables as a datasource, is moved to another project, also the linked datatables are moved with the model. Note that there may be several models using a same datatable, and in that case the linkages to other models may be broken when moving a model to other project. Note also that if another datatable with the same name already exist in the target project, the datatable cannot be moved, because there cannot be two datatables having the same name in the same project.
Loading Script
Data loaded to a QPR ProcessAnalyzer model can be defined using a loading script, which is written using the expression language. The loading script needs to create a DataFrame containing the cases or events data. In addition, mappings from the DataFrame column names defining which column is case id, event type name and timestamp needs to be defined. Note that in the loading script, it's not possible to refer to the same model's eventlog, because the loading script is run to generate that eventlog (it's a circular reference).
The loading script is run in a security context where there are GenericRead and GenericWrite permissions for the project, where the loaded model is located. This means that the loading script has access to all the data in the same project, and also a write access e.g. to the datatables. User permissions are not used by the loading script, so that the user who initiated the loading, wouldn't affect the result. In addition, the model may be loaded automatically on server startup, and in that case there is no user who initiated the loading.
When loading data from an ODBC datasource, note also the AllowExternalDatasources setting, which may prevent the loading.
Examples
Loading script for events that takes the eventlog from another model:
ToDataFrame( EventLogById(1).Events.[Case.Name, TypeName, Timestamp], ["CaseId", "EventType", "Timestamp"] )
Loading script for events:
let maxEventsPerModel = 1000; let ms = ["SAP_OrderToCash", "BPIC13_incidents"].( Let("modelName", _), Models.Where(Name == modelName))[0]); Let("eventAttributes", ToDictionary()); ms.EventAttributes.( Let("ea", _), eventAttributes.Set(ea.Name, ea)); Let("orderedEventAttributes", OrderByValue(eventAttributes.Keys)); ConcatTop([[Flatten( ["Case", "EventType", "TimeStamp", orderedEventAttributes])], ConcatTop(ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Flatten( [Case.Name, Type.Name, TimeStamp, (Let("evt", _), orderedEventAttributes.( Let("att", _), evt.Attribute(att) ) )] ) )] )
Loading script for cases:
Let("maxEventsPerModel", 1000);Let("ms", ["SAP_OrderToCash", "BPIC13_incidents"].(Let("modelName", _), Models.Where(Name == modelName))[0]);Let("caseAttributes", ToDictionary());ms.CaseAttributes.(Let("ca", _), caseAttributes.Set(ca.Name, ca));Let("orderedCaseAttributes", OrderByValue(caseAttributes.Keys));Let("cases", ToDictionary());ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Case.(Let("c", _), cases.Set(c, true));ConcatTop([[Flatten(["Case", "Model", orderedCaseAttributes])],cases.Keys.Flatten([Name, EventLog.Name, (Let("c", _), orderedCaseAttributes.(Let("att", _), c.Attribute(att)))])])
Create a model with eventlog data embedded into the expression.
ToDataFrame([["a", "et1", DateTime(2019,1)],["a", "et2", DateTime(2019,2)],["a", "et3", DateTime(2019,3)],["b", "et1", DateTime(2019,2)],["c", "et3", DateTime(2019,2)],["c", "et2", DateTime(2019,3)],["c", "et2", DateTime(2019,4)],["c", "et3", DateTime(2019,5)],["c", "et4", DateTime(2019,6)]], ["Case","Timestamp","EventType"])
This loading script loads events using several odbc queries (each event type separately).
let eventTypes = ["Invoice Created", "Requested Delivery", "Inventory Date", "Invoice Payment", "Actual Delivery"]; let data = eventTypes.( ImportODBC( "Driver={ODBC Driver 17 for SQL Server};Server=SERVER\\MSSQLSERVER2017;DataBase=Process Mining Data;Uid=user1;Pwd=pw1;", "SELECT [Case] AS \"CaseId\", Event_Type AS \"EventType\", Start_Time AS TimeStamp, Cost, Variable_cost, booleanAttribute1 FROM [OtC Model events] Where [Event_Type]=\'" + _ +"\'" ) ); let combinedData = data[0]; For("i", 1, i < CountTop(data), i + 1, let("combinedData", combinedData.Append(data[i]) ) ); combinedData;
ODBC Datasource
When Datasource is ODBC, data is loaded directly from an ODBC datasource when the model is loaded. The data is kept only in memory and not stored permanently in QPR ProcessAnalyzer. There needs to be a suitable ODBC driver installed in QPR ProcessAnalyzer server, and ODBC connection string and ODBC query needs to be defined (examples below). In addition, AllowExternalDatasources setting needs to be set to True to use the ODBC datasource.
ODBC Driver Installation
Microsoft Access Database Engine 2016 Redistributable is a recommended software package containing ODBC drivers for
- Microsoft SQL Server
- Microsoft Access (*.mdb and *.accdb) files
- Microsoft Excel (*.xls, *.xlsx, and *.xlsb) files
- CSV text files
The package is installed in the computer where the QPR ProcessAnalyzer Server is running. Installation instructions:
- Go to https://www.microsoft.com/en-us/download/details.aspx?id=54920 and click Download.
- Select the 64-bit (x64) version.
- Double-click the executable file on your hard disk to start the setup program.
- Follow the instructions on the screen to complete the installation.
Read data from CSV file
Data can be loaded from CSV files located in the file system using Microsoft Access Text Driver (*.txt, *.csv) driver. The following example loads files C:\ProcessMiningData\ModelCaseAttributes.csv and C:\ProcessMiningData\ModelEventData.csv.
- Cases: Connection String: Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt
- Cases: ODBC Query: SELECT * FROM [ModelCaseAttributes.csv]
- Events: Connection String: Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt
- Events: ODBC Query: SELECT * FROM [ModelEventData.csv]
When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the Schema.ini file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017). ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data. ODBC text driver support only one data type per column mixed data types is not supported.
Read data from Excel file
In this example, data is loaded from a Excel file that is accessible from the file system. In this example, the loaded file is C:\ProcessMiningData\ModelData.xlsx and cases are in sheet MyCases and events in sheet MyEvents.
- Cases: Connection String: Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx
- Cases: ODBC Query: SELECT * FROM [MyCases$]
- Events: Connection String: Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx
- Events: ODBC Query: SELECT * FROM [MyEvents$]
Read data from SQL Server
Recommended driver: https://www.microsoft.com/en-us/download/details.aspx?id=56567
In this example, data is loaded from an SQL Server table. SQL Server hostname is myServerName, instance name is myInstanceName and database name is MyDatabase. The Trusted_Connection is for Windows authentication.
- Cases: Connection String: Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;
- Cases: ODBC Query: SELECT CASENAME, COST, COUNTRY, PRODUCT FROM CASES_TABLE
- Events: Connection String: Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;
- Events: ODBC Query: SELECT CASENAME, CREATED_DATE, CREATED_BY FROM EVENTS_TABLE
If SQL Server authentication is used, the connection string is Driver={ODBC Driver 17 for SQL Server};Server=myServerName\myInstanceName;DataBase=MyDatabase;Trusted_Connection=No;Uid=myUsername;PWD=myPassword;
More about SQL Server connection strings: https://www.connectionstrings.com/sql-server/
Troubleshooting for ODBC Loading
The following error message may be encountered: System.Data.Odbc.OdbcException (0x80131937): ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified. Probable reason is that the ODBC driver is missing or driver name is not correct. To solve the issue:
- Check ODBC drivers is installed by running C:\Windows\System32\odbcad32.exe.
- Check the connection string.
The following error message may be encountered: ERROR [HY024] [Microsoft][ODBC Text Driver] '(unknown)' is not a valid path. Make sure that the path name is spelled correctly and that you are connected to the server on which the file resides.. Example error when specified directory is missing:
Error when CSV file is missing: System.Data.Odbc.OdbcException (0x80131937): ERROR [42S02] [Microsoft][ODBC Text Driver] The Microsoft Access database engine could not find the object 'CaseAttributes.csv'. Make sure the object exists and that you spell its name and the path name correctly. If 'CaseAttributes.csv' is not a local object, check your network connection or contact the server administrator.
Error codes for troubleshooting: ODBC Error Codes.
If the error message mention about Case or Event attributes named as F<column number> (e.g. F10), the data might be inconsistent, as there might be different amount of columns in different rows.