QPR ProcessAnalyzer Model Datasources: Difference between revisions
Line 129: | Line 129: | ||
</pre> | </pre> | ||
When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the '''Schema.ini''' file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017). Example: ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data. | When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the '''Schema.ini''' file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017). | ||
* Example: ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data. | |||
* Note: ODBC text driver support only one data type per column mixed data types is not supported. | |||
==== Read data from Excel file ==== | ==== Read data from Excel file ==== |
Revision as of 07:17, 15 October 2019
Data to QPR Processanalyzer models can be loaded from many different sources using different techniques. The following type of datasources can be used:
- Datatables: Data can be loaded from QPR ProcessAnalyzer's own datatables.
- ODBC datasources: Data can be loaded from any ODBC compliant source system, such as many database systems, SAP HANA, ServiceNow, Excel files and CSV files.
- Expression: Data can be loaded from a dataset that is produced by an expression written in QPR ProcessAnalyzer expression language. With the expression language, it's possible to e.g. fetch data from other QPR ProcessAnalyzer models, transform the data and then load them to a model using the expression datasource.
- Models stored to QPR ProcessAnalyze database (legacy method)
For the datatables, ODBC and expression datasources two datasource queries needs to be defined: one for cases and one for events. It's also possible that cases and event are loaded from different types of datasources, e.g. cases from an ODBC datasource and events from a datatable.
Mappings for the columns in the datasets can be defined freely, so the order of columns is not relevant for QPR ProcessAnalyzer.
Loading Data from Datatables
In the DataSource section, when the DataSourceType is defined as datatable, data is loaded from a QPR ProcessAnalyzer datatable. Datatables can be created by importing data to them from a CSV file, or as a result of an ETL script.
Property | Description |
---|---|
DataSource/Cases/DataSourceType | Datasource type to use, when fetching the Cases data. Supported values are database (default), datatable, expression and odbc. |
DataSource/Cases/DataTableName | Name of the datatable where to fetch the cases data. |
DataSource/Cases/Columns | Column name mappings for QPR ProcessAnalyzer data model. Only supported mapping is CaseId defining the case id (case name). Example:
{ "CaseId": "SalesOrderHeaderId" } All unmapped columns are taken to the model as case attributes. If the fetched data doesn't contain a mapped column, the model loading fails and an error message is given. |
DataSource/Events/DataSourceType | Datasource type to use, when fetching the Events data. Supported values are database (default), datatable, expression and odbc. |
DataSource/Events/DataTableName | Name of the datatable where to fetch the events data. |
DataSource/Events/Columns |
Column name mappings for QPR ProcessAnalyzer data model. Supported mappings for Events are CaseId, EventType and Timestamp. Example: { "CaseId": "SalesOrderHeaderId", "EventType": "EventType", "Timestamp": "CreatedDate" } All unmapped columns are taken to the model as event attributes. If the fetched data doesn't contain a mapped column, the model loading fails and an error message is given. |
Loading Data from ODBC Datasource
In the DataSource section, when the DataSourceType is defined as odbc (for cases or events), data is loaded from an ODBC datasource. There needs to be a suitable ODBC driver installed in QPR ProcesAnalyzer server, and ODBC connection string and query needs to be defined (more below).
Property | Description |
---|---|
DataSource/Cases/DataSourceType | Use value odbc when loading data from an ODBC datasource. More information about DataSourceType, see Loading Data from Datatables. |
DataSource/Cases/OdbcConnectionString | ODBC connection string to use to fetch the Cases data. Connection strings can be found in https://www.connectionstrings.com. |
DataSource/Cases/OdbcQuery | ODBC query to use to fetch the Cases data. Query syntax depends on the source system where the data is fetched. |
DataSource/Cases/Columns | For more information, see Loading Data from Datatables. |
DataSource/Events/DataSourceType | Use value odbc when loading data from an ODBC datasource. More information about DataSourceType, see Loading Data from Datatables. |
DataSource/Events/OdbcConnectionString | ODBC connection string to use to fetch the Events data. Connection strings can be found in https://www.connectionstrings.com. |
DataSource/Events/OdbcQuery | ODBC query to use to fetch the Events data. Query syntax depends on the source system where the data is fetched. |
DataSource/Events/Columns | For more information, see Loading Data from Datatables. |
Configurations examples
ODBC Driver Installation
Microsoft Access Database Engine 2016 Redistributable is software package that contains ODBC drivers for
- Microsoft SQL Server
- Microsoft Access (*.mdb and *.accdb) files
- Microsoft Excel (*.xls, *.xlsx, and *.xlsb) files
- CSV text files
The package needs to be installed in the same computer where the QPR ProcessAnalyzer Server is running. Installation instructions:
- Go to https://www.microsoft.com/en-us/download/details.aspx?id=54920 and click Download.
- Select whether to use the 32bit or 64bit (x64) version (usually it's the 64bit version).
- Double-click the executable file on your hard disk to start the setup program.
- Follow the instructions on the screen to complete the installation.
If 32-bit office is installed in machine then 64-bit ODBC driver need install using /passive parameter. More information
- https://stackoverflow.com/questions/7116019/hand-install-of-64-bit-ms-access-odbc-drivers-when-32-bit-office-is-present
- https://techblog.aimms.com/2014/10/27/installing-32-bit-and-64-bit-microsoft-access-drivers-next-to-each-other/
Read data from CSV file
In this example, data is loaded from CSV files located in the file system using Microsoft Access Text Driver (*.txt, *.csv) driver. Loaded files are C:\ProcessMiningData\ModelCaseAttributes.csv and C:\ProcessMiningData\ModelEventData.csv.
{ "DataSource": { "Cases": { "DataSourceType": "odbc", "OdbcConnectionString": "Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt", "OdbcQuery": "SELECT * FROM [ModelCaseAttributes.csv]", "Columns": { "CaseId": "Name" } }, "Events": { "DataSourceType": "odbc", "OdbcConnectionString": "Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt", "OdbcQuery": "SELECT * FROM [ModelEventData.csv]", "Columns": { "CaseId": "Case", "EventType": "Event Type", "Timestamp": "Start Time" } } } }
When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the Schema.ini file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017).
- Example: ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data.
- Note: ODBC text driver support only one data type per column mixed data types is not supported.
Read data from Excel file
In this example, data is loaded from a Excel file that is accessible in the file system. In this example, the loaded file is C:\ProcessMiningData\ModelData.xlsx and cases are in sheet MyCases and events in sheet MyEvents.
{ "DataSource": { "Cases": { "DataSourceType": "odbc", "OdbcConnectionString": "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx", "OdbcQuery": "SELECT * FROM [MyCases$]", "Columns": { "CaseId": "Case ID" } }, "Events": { "DataSourceType": "odbc", "OdbcConnectionString": "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx", "OdbcQuery": "SELECT * FROM [MyEvents$]", "Columns": { "CaseId": "Case ID", "EventType": "Activity", "Timestamp": "Start Time" } } } }
Read data from SQL Server table
Recommended driver: https://www.microsoft.com/en-us/download/details.aspx?id=56567
In this example, data is loaded from an SQL Server table. SQL Server hostname is myServerName, instance name is myInstanceName and database name is MyDatabase. The Trusted_Connection is for Windows authentication.
{ "DataSource": { "Cases": { "DataSourceType": "odbc", "OdbcConnectionString": "Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;", "OdbcQuery": "SELECT CASENAME, COST, COUNTRY, PRODUCT FROM CASES_TABLE", "Columns": { "CaseId": "CASENAME" } }, "Events": { "DataSourceType": "odbc", "OdbcConnectionString": "Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;", "OdbcQuery": "SELECT CASENAME, CREATED_DATE, CREATED_BY FROM EVENTS_TABLE", "Columns": { "CaseId": "CASENAME", "Timestamp": "CREATED_DATE", "EventType": "CREATED_BY" } } } }
If SQL Server authentication is used, the connection string is
Driver={ODBC Driver 17 for SQL Server};Server=myServerName\myInstanceName;DataBase=MyDatabase;Trusted_Connection=No;Uid=myUsername;PWD=myPassword;
More about SQL Server connection strings: https://www.connectionstrings.com/sql-server/
Troubleshooting
The following error message may be encountered: System.Data.Odbc.OdbcException (0x80131937): ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified. Probable reason is that the ODBC driver is missing or driver name is not correct. To solve the issue:
- Check ODBC drivers is installed by running C:\Windows\System32\odbcad32.exe.
- Check the connection string.
The following error message may be encountered: ERROR [HY024] [Microsoft][ODBC Text Driver] '(unknown)' is not a valid path. Make sure that the path name is spelled correctly and that you are connected to the server on which the file resides.. Example error when specified directory is missing:
Error when CSV file is missing: System.Data.Odbc.OdbcException (0x80131937): ERROR [42S02] [Microsoft][ODBC Text Driver] The Microsoft Access database engine could not find the object 'CaseAttributes.csv'. Make sure the object exists and that you spell its name and the path name correctly. If 'CaseAttributes.csv' is not a local object, check your network connection or contact the server administrator.
Error codes for troubleshooting: ODBC Error Codes.
If the error message mention about Case or Event attributes named as F<column number> (e.g. F10), the data might be inconsistent, as there might be different amount of columns in different rows.
Loading Data from Expression Datasource
Data (cases and events) loaded to a QPR ProcessAnalyzer model can be defined using an expression. The principle with column mappings is similar than in the odbc datasource.
Property | Description |
---|---|
DataSource/Cases/DataSourceType | Use value expression when loading data from an expression. More information about DataSourceType, see Loading Data from Datatables. |
DataSource/Cases/Expression | Expression to get cases. The data structure is an array of arrays, where the outer array contains rows and inner array contains data cells in a row. The first row must contain headers. |
DataSource/Cases/Columns | For more information, see Loading Data from Datatables. |
DataSource/Events/DataSourceType | Use value expression when loading data from an expression. More information about DataSourceType, see Loading Data from Datatables. |
DataSource/Events/Expression | Expression to get events. The data structure is an array of arrays, where the outer array contains rows and inner array contains data cells in a row. The first row must contain headers. |
DataSource/Events/Columns | For more information, see Loading Data from Datatables. |
Examples:
Create model from another model:
{ "DataSource": { "Events": { "DataSourceType": "expression", "Expression": "ConcatTop([[["CaseId", "EventType", "Timestamp"]], EventLogById(1).Events.[Case.Name, TypeName, Timestamp]])", "Columns": { "CaseId": "Case", "Timestamp": "Timestamp", "EventType": "EventType" } } } }
{ "DataSource": { "Events": { "DataSourceType": "expression", "Expression": "Let(\"maxEventsPerModel\", 1000);Let(\"ms\", [\"SAP_OrderToCash\", \"BPIC13_incidents\"].(Let(\"modelName\", _), Models.Where(Name == modelName))[0]);Let(\"eventAttributes\", ToDictionary());ms.EventAttributes.(Let(\"ea\", _), eventAttributes.Set(ea.Name, ea));Let(\"orderedEventAttributes\", OrderByValue(eventAttributes.Keys));ConcatTop([[Flatten([\"Case\", \"EventType\", \"TimeStamp\", orderedEventAttributes])],ConcatTop(ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Flatten([Case.Name, Type.Name, TimeStamp, (Let(\"evt\", _), orderedEventAttributes.(Let(\"att\", _), evt.Attribute(att)))]))])", "Columns": { "CaseId": "Case", "Timestamp": "TimeStamp", "EventType": "EventType" } }, "Cases": { "DataSourceType": "expression", "Expression": "Let(\"maxEventsPerModel\", 1000);Let(\"ms\", [\"SAP_OrderToCash\", \"BPIC13_incidents\"].(Let(\"modelName\", _), Models.Where(Name == modelName))[0]);Let(\"caseAttributes\", ToDictionary());ms.CaseAttributes.(Let(\"ca\", _), caseAttributes.Set(ca.Name, ca));Let(\"orderedCaseAttributes\", OrderByValue(caseAttributes.Keys));Let(\"cases\", ToDictionary());ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Case.(Let(\"c\", _), cases.Set(c, true));ConcatTop([[Flatten([\"Case\", \"Model\", orderedCaseAttributes])],cases.Keys.Flatten([Name, EventLog.Name, (Let(\"c\", _), orderedCaseAttributes.(Let(\"att\", _), c.Attribute(att)))])])", "Columns": { "CaseId": "Case" } } } }
{ "DataSource": { "Events": { "DataSourceType": "expression", "Expression": "[[\"Case\", \"EventType\", \"TimeStamp\"],[\"a\", \"et1\", DateTime(2019,1)],[\"a\", \"et2\", DateTime(2019,2)],[\"a\", \"et3\", DateTime(2019,3)],[\"b\", \"et1\", DateTime(2019,2)],[\"c\", \"et3\", DateTime(2019,2)],[\"c\", \"et2\", DateTime(2019,3)],[\"c\", \"et2\", DateTime(2019,4)],[\"c\", \"et3\", DateTime(2019,5)],[\"c\", \"et4\", DateTime(2019,6)]]", "Columns": { "CaseId": "Case", "Timestamp": "TimeStamp", "EventType": "EventType" } } } }