QPR ProcessAnalyzer Model Datasources: Difference between revisions

From QPR ProcessAnalyzer Wiki
Jump to navigation Jump to search
(TK-54573)
 
(79 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Data to QPR Processanalyzer models can be loaded from many different sources using different techniques. The following type of datasources can be used:
QPR Processanalyzer in-memory models can load data from variety of sources:
* [[#Loading Data from Datatables|Datatables]]: Data can be loaded from QPR ProcessAnalyzer's own datatables.
* [[#Datatable Models|Datatables]]: Data can be loaded from QPR ProcessAnalyzer [[QPR_ProcessAnalyzer_Project_Workspace#Managing_Datatables|datatables]], located either in SQL Server or in Snowflake.
* [[#Loading Data from ODBC Datasource|ODBC datasources]]: Data can be loaded from any ODBC compliant source system, such as many database systems, SAP HANA, ServiceNow, Excel files and CSV files.
* [[#Loading Script|Loading Script]]: Data can be loaded using a loading script written in QPR ProcessAnalyzer [[QPR_ProcessAnalyzer_Expressions|expression language]]. The loading script can extract the data and perform needed transformations to the suitable eventlog format during model is loaded. The loading script is started when the model loading into memory starts (pull logic). With the expression language, it's also possible to fetch data from other QPR ProcessAnalyzer models.
* [[#Loading Data from Expression Datasource|Expression]]: Data can be loaded from a dataset that is produced by an expression written in QPR ProcessAnalyzer [[QPR_ProcessAnalyzer_Expressions|expression language]]. With the expression language, it's possible to e.g. fetch data from other QPR ProcessAnalyzer models, transform the data and then load them to a model using the expression datasource.
* [[#ODBC Datasource|ODBC datasources]]: Data can be loaded from any ODBC compliant source system, such as databases, SAP HANA, ServiceNow, Excel files or CSV files.
* Models stored to QPR ProcessAnalyze database (legacy method)


For the datatables, ODBC and expression datasources two datasource queries needs to be defined: one for '''cases''' and one for '''events'''. It's also possible that cases and event are loaded from different types of datasources, e.g. cases from an ODBC datasource and events from a datatable.
== Model Datasource Settings ==
Datasource needs to be defined both for cases and events. Model datasource settings are in the ''Model Properties'' dialog in '''Cases datasource''' and '''Events datasource''' tabs. It's possible that cases and events are loaded from different types of datasources, e.g. cases from an ODBC datasource and events from a datatable. Events datasource is optional and cases datasource is mandatory.


Mappings for the columns in the datasets can be defined freely, so the order of columns is not relevant for QPR ProcessAnalyzer.
Following settings are available:
* '''Datasource''': Can be ''Datatable'' (SQL Server or Snowflake), ''ODBC'' (data is fetched by executing an ODBC query to an OCBC datasource) or ''Loading Script'' (data is generated by running an expression language query).
* '''Datatable''': When ''datasource'' is a ''datatable'', this settings defines the datatable to use.
* '''Case ID''': Defines column containing case id. This mapping is needed for both cases and events data. The column datatype can be either string or integer.
* '''Event Type''': Defines column containing event type name. This mapping is needed for events data. This column datatype must be string.
* '''Event Time''': Defines column containing event timestamp. This mapping is needed for events data. The column datatype must be date.
* '''Connection String''': ODBC connection string used to fetch cases/events data. Connection string depends on the data source type and suitable connection strings can be found e.g. in https://www.connectionstrings.com.
* '''Connection String Key''': [[Storing_Secrets_for_Scripts|Secret name]] of the ODBC connection string. Alternative to the ''Connection String''.
* '''ODBC Query''': ODBC query to run in the datasource to fetch cases/events data. Query syntax depends on the source system where the data is fetched from.
* '''Loading Script''': Expression language script that provides cases/events data. For more information, see the examples below.


== Loading Data from Datatables ==
Note that when changing datasource settings, the model is dropped from the memory, so that it can be reloaded using the changed settings.
In the ''DataSource'' section, when the ''DataSourceType'' is defined as ''datatable'', data is loaded from a QPR ProcessAnalyzer [[Data_Tables_in_QPR_ProcessAnalyzer|datatable]]. Datatables can be created by importing data to them from a CSV file, or as a result of an [[Data_Extraction,_Transformation,_and_Loading|ETL script]].


{| class="wikitable"
== Datatable  Models ==
!'''Property'''
When the ''DataSource'' is ''Datatable'', data is loaded from [[QPR_ProcessAnalyzer_Project_Workspace#Datatables|datatables]]. Datatables can be created by importing data to them from CSV files, or they can be created by ETL scripts. Datatables can be SQL Server or Snowflake datatables. Note that when using Snowflake datatables, processing is done in Snowflake data cloud.
! '''Description'''
|-
||DataSource/Cases/DataSourceType
||Datasource type to use, when fetching the Cases data. Supported values are '''database''' (default), '''datatable''', '''expression''' and '''odbc'''.
|-
||DataSource/Cases/DataTableName
||Name of the datatable where to fetch the cases data.
|-
||DataSource/Cases/Columns
||Column name mappings for QPR ProcessAnalyzer data model. Only supported mapping is '''CaseId''' defining the case id (case name). Example:
<pre>
{
  "CaseId": "SalesOrderHeaderId"
}
</pre>
All unmapped columns are taken to the model as case attributes. If the fetched data doesn't contain a mapped column, the model loading fails and an error message is given.
|-
||DataSource/Events/DataSourceType
||Datasource type to use, when fetching the Events data. Supported values are '''database''' (default), '''datatable''', '''expression''' and '''odbc'''.
|-
||DataSource/Events/DataTableName
||Name of the datatable where to fetch the events data.
|-
||DataSource/Events/Columns
||
Column name mappings for QPR ProcessAnalyzer data model. Supported mappings for Events are '''CaseId''', '''EventType''' and '''Timestamp'''. Example:
<pre>
{
  "CaseId": "SalesOrderHeaderId",
  "EventType": "EventType",
  "Timestamp": "CreatedDate"
}
</pre>
All unmapped columns are taken to the model as event attributes. If the fetched data doesn't contain a mapped column, the model loading fails and an error message is given.
|}


== Loading Data from ODBC Datasource ==
== Loading Script ==
In the ''DataSource'' section, when the ''DataSourceType'' is defined as ''odbc'' (for cases or events), data is loaded from an ODBC datasource. There needs to be a suitable ODBC driver installed in QPR ProcesAnalyzer server, and ODBC connection string and query needs to be defined (more below).  
Data loaded into a model can be defined using a loading script written with the expression language. Suitable script creates a DataFrame containing the cases/events data. In addition, mapping are needed from the DataFrame column names defining which column is case id, event type name and timestamp. Note that in the loading script, it's not possible to refer to the same model's eventlog to avoid a circular reference.


{| class="wikitable"
The loading script is run in a security context where there are [[Roles_and_Permissions#Global_and_Project_Roles|GenericRead and GenericWrite]] permissions for the project, where the loaded model is located. This means that the loading script has access to all data in the same project and also a write access e.g. to the datatables. Note that user's own permissions are not applied by the loading script. This is also when the model is loaded [[Automatic Model Loading on Server Startup|automatically on server startup]], as in that case there is no user who initiated the loading.
!'''Property'''
! '''Description'''
|-
||DataSource/Cases/DataSourceType
||Use value '''odbc''' when loading data from an ODBC datasource. More information about DataSourceType, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
||DataSource/Cases/OdbcConnectionString
||ODBC connection string to use to fetch the Cases data. Connection strings can be found in https://www.connectionstrings.com.
|-
||DataSource/Cases/OdbcQuery
||ODBC query to use to fetch the Cases data. Query syntax depends on the source system where the data is fetched.
|-
||DataSource/Cases/Columns
||For more information, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
||DataSource/Events/DataSourceType
||Use value '''odbc''' when loading data from an ODBC datasource. More information about DataSourceType, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
||DataSource/Events/OdbcConnectionString
||ODBC connection string to use to fetch the Events data. Connection strings can be found in https://www.connectionstrings.com.
|-
||DataSource/Events/OdbcQuery
||ODBC query to use to fetch the Events data. Query syntax depends on the source system where the data is fetched.
|-
||DataSource/Events/Columns
||For more information, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|}


=== Configurations examples ===
When loading data from an ODBC datasource, the [[PA_Configuration_database_table|AllowExternalDatasources]] setting needs to be enabled.


==== ODBC Driver Installation ====
=== Examples ===
'''Microsoft Access Database Engine 2016 Redistributable''' is software package that contains ODBC drivers for
==== Load from Snowflake datatable ====
* Microsoft SQL Server
This example loading script loads data from datatable ''MyDatatable'' located in the same project as the model. It can be used to load data from a Snowflake datatable to an in-memory model.
* Microsoft Access (*.mdb and *.accdb) files
<pre>
* Microsoft Excel (*.xls, *.xlsx, and *.xlsb) files
Project.DatatableByName("MyDatatable").SqlDataFrame.Collect();
* CSV text files
</pre>


The package needs to be installed in the same computer where the QPR ProcessAnalyzer Server is running. Installation instructions:
==== Load from other model ====
# Go to https://www.microsoft.com/en-us/download/details.aspx?id=54920 and click Download.
Loading script for events that takes the eventlog from another model:
# Select whether to use the 32bit or 64bit (x64) version (usually it's the 64bit version).
<pre>
# Double-click the executable file on your hard disk to start the setup program.
ToDataFrame(
# Follow the instructions on the screen to complete the installation.
  EventLogById(1).Events.[Case.Name, TypeName, Timestamp],
  ["CaseId", "EventType", "Timestamp"]
)
</pre>


If 32-bit office is installed in machine then 64-bit ODBC driver need install using /passive parameter.
==== Load sample from other model ====
More information
Loading script for events:
* https://stackoverflow.com/questions/7116019/hand-install-of-64-bit-ms-access-odbc-drivers-when-32-bit-office-is-present
<pre>
* https://techblog.aimms.com/2014/10/27/installing-32-bit-and-64-bit-microsoft-access-drivers-next-to-each-other/
let maxEventsPerModel = 1000;
let ms = ["SAP_OrderToCash", "BPIC13_incidents"].(
  Let("modelName", _), Models.Where(Name == modelName))[0]);
  Let("eventAttributes", ToDictionary());
  ms.EventAttributes.(
    Let("ea", _), eventAttributes.Set(ea.Name, ea));
    Let("orderedEventAttributes", OrderByValue(eventAttributes.Keys));
    ConcatTop([[Flatten(
      ["Case", "EventType", "TimeStamp", orderedEventAttributes])],
      ConcatTop(ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Flatten(
        [Case.Name, Type.Name, TimeStamp, (Let("evt", _), orderedEventAttributes.(
          Let("att", _), evt.Attribute(att)
        )
      )]
    )
  )]
)
</pre>


==== Read data from CSV file ====
Loading script for cases:
In this example, data is loaded from CSV files located in the file system using ''Microsoft Access Text Driver (*.txt, *.csv)'' driver. Loaded files are C:\ProcessMiningData\ModelCaseAttributes.csv and C:\ProcessMiningData\ModelEventData.csv.
<pre>
<pre>
{
let maxEventsPerModel = 1000;
  "DataSource": {
let ms = ["SAP_OrderToCash", "BPIC13_incidents"];
    "Cases": {   
let modelName =_;
      "DataSourceType": "odbc",
Models.Where(Name == modelName))[0]);
      "OdbcConnectionString": "Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt",
let caseAttributes = ToDictionary();
      "OdbcQuery": "SELECT * FROM [ModelCaseAttributes.csv]",
ms.CaseAttributes.{
      "Columns": {  
  let ca = _;
        "CaseId": "Name"
  caseAttributes.Set(ca.Name, ca);
      }
};
    },
let orderedCaseAttributes = OrderByValue(caseAttributes.Keys);
    "Events": {
let cases = ToDictionary());
      "DataSourceType": "odbc",
ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Case.{
      "OdbcConnectionString": "Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt",
  let c= _;
      "OdbcQuery": "SELECT * FROM [ModelEventData.csv]",
  cases.Set(c, true);
      "Columns": {  
}
        "CaseId": "Case",
ConcatTop([[
        "EventType": "Event Type",
  Flatten(["Case", "Model", orderedCaseAttributes]),
        "Timestamp": "Start Time"
  cases.Keys.Flatten([Name, EventLog.Name, {
       }
    let c = _;
    orderedCaseAttributes.{
      let att = _;
       c.Attribute(att);
     }
     }
   }
   }]);
}
]]);
</pre>
</pre>


When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the '''Schema.ini''' file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017).
==== Load data embedded into expression ====
* Example: ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data.
Create a model with eventlog data embedded into the expression.
* Note: ODBC text driver support only one data type per column mixed data types is not supported.
<pre>
 
ToDataFrame(
==== Read data from Excel file ====
  [
In this example, data is loaded from a Excel file that is accessible in the file system. In this example, the loaded file is C:\ProcessMiningData\ModelData.xlsx and cases are in sheet ''MyCases'' and events in sheet ''MyEvents''.
    ["a", "et1", DateTime(2019,1)],
    ["a", "et2", DateTime(2019,2)],
    ["a", "et3", DateTime(2019,3)],
    ["b", "et1", DateTime(2019,2)],
    ["c", "et3", DateTime(2019,2)],
    ["c", "et2", DateTime(2019,3)],
    ["c", "et2", DateTime(2019,4)],
    ["c", "et3", DateTime(2019,5)],
    ["c", "et4", DateTime(2019,6)]
  ],
  ["Case","Timestamp","EventType"]
)
</pre>


==== Load data with several ODBC queries ====
This loading script loads events using several ODBC queries (each event type separately).
<pre>
<pre>
{
let eventTypes = ["Invoice Created", "Requested Delivery", "Inventory Date", "Invoice Payment", "Actual Delivery"];
  "DataSource": {
let data = eventTypes.(
    "Cases": {
  ImportODBC(
      "DataSourceType": "odbc",
    "Driver={ODBC Driver 17 for SQL Server};Server=SERVER\\MSSQLSERVER2017;DataBase=Process Mining Data;Uid=user1;Pwd=pw1;",
      "OdbcConnectionString": "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx",
    "SELECT [Case] AS \"CaseId\", Event_Type AS \"EventType\", Start_Time AS TimeStamp, Cost, Variable_cost, booleanAttribute1 FROM [OtC Model events] Where [Event_Type]=\'" + _ +"\'"
      "OdbcQuery": "SELECT * FROM [MyCases$]",
  )
      "Columns": {
);
        "CaseId": "Case ID"
let combinedData = data[0];
      }
For("i", 1, i < CountTop(data), i + 1,
    },
  let("combinedData",
    "Events": {
     combinedData.Append(data[i])
      "DataSourceType": "odbc",
   )
      "OdbcConnectionString": "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx",
);
      "OdbcQuery": "SELECT * FROM [MyEvents$]",
combinedData;
      "Columns": {
        "CaseId": "Case ID",
        "EventType": "Activity",
        "Timestamp": "Start Time"
      }
     }
   }
}
</pre>
</pre>


==== Read data from SQL Server table ====
== ODBC Datasource ==
Recommended driver: https://www.microsoft.com/en-us/download/details.aspx?id=56567
When getting model data from an ODBC datasource, data is loaded directly from the source when the model is loaded. As improved security, the data is kept only in memory and not stored permanently in QPR ProcessAnalyzer. Suitable ODBC driver needs to be installed in QPR ProcessAnalyzer server, and ''ODBC connection string'' and ''ODBC query'' are to be defined (examples below). In addition, [[PA_Configuration_database_table|AllowExternalDatasources]] setting needs to ''True'' to use the ODBC datasource.


=== Read data from CSV file ===
Data can be loaded from CSV files located in the file system using ''Microsoft Access Text Driver (*.txt, *.csv)'' driver. The following example loads files C:\ProcessMiningData\ModelCaseAttributes.csv and C:\ProcessMiningData\ModelEventData.csv.
* Cases: Connection String: Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt
* Cases: ODBC Query: SELECT * FROM [ModelCaseAttributes.csv]
* Events: Connection String: Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt
* Events: ODBC Query: SELECT * FROM [ModelEventData.csv]
When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the '''Schema.ini''' file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017). ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data. ODBC text driver support only one data type per column mixed data types is not supported.
=== Read data from Excel file ===
In this example, data is loaded from a Excel file that is accessible from the file system. In this example, the loaded file is C:\ProcessMiningData\ModelData.xlsx and cases are in sheet ''MyCases'' and events in sheet ''MyEvents''.
* Cases: Connection String: Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx
* Cases: ODBC Query: SELECT * FROM [MyCases$]
* Events: Connection String: Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx
* Events: ODBC Query: SELECT * FROM [MyEvents$]
=== Read data from SQL Server ===
In this example, data is loaded from an SQL Server table. SQL Server hostname is ''myServerName'', instance name is ''myInstanceName'' and database name is ''MyDatabase''. The ''Trusted_Connection'' is for Windows authentication.
In this example, data is loaded from an SQL Server table. SQL Server hostname is ''myServerName'', instance name is ''myInstanceName'' and database name is ''MyDatabase''. The ''Trusted_Connection'' is for Windows authentication.
* Cases: Connection String: Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;
* Cases: ODBC Query: SELECT CASENAME, COST, COUNTRY, PRODUCT FROM CASES_TABLE
* Events: Connection String: Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;
* Events: ODBC Query: SELECT CASENAME, CREATED_DATE, CREATED_BY FROM EVENTS_TABLE


<pre>
If SQL Server authentication is used, the connection string is Driver={ODBC Driver 17 for SQL Server};Server=myServerName\myInstanceName;DataBase=MyDatabase;Trusted_Connection=No;Uid=myUsername;PWD=myPassword;
{
  "DataSource": {
    "Cases": {
      "DataSourceType": "odbc",
      "OdbcConnectionString": "Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;",
      "OdbcQuery": "SELECT CASENAME, COST, COUNTRY, PRODUCT FROM CASES_TABLE",
      "Columns": {
        "CaseId": "CASENAME"
      }
    },
    "Events": {
      "DataSourceType": "odbc",
      "OdbcConnectionString": "Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;",
      "OdbcQuery": "SELECT CASENAME, CREATED_DATE, CREATED_BY FROM EVENTS_TABLE",
      "Columns": {
        "CaseId": "CASENAME",
        "Timestamp": "CREATED_DATE",
        "EventType": "CREATED_BY"
      }
    }
  }
}
</pre>
 
If SQL Server authentication is used, the connection string is  
<pre>
Driver={ODBC Driver 17 for SQL Server};Server=myServerName\myInstanceName;DataBase=MyDatabase;Trusted_Connection=No;Uid=myUsername;PWD=myPassword;
</pre>


More about SQL Server connection strings: https://www.connectionstrings.com/sql-server/
More about SQL Server connection strings: https://www.connectionstrings.com/sql-server/


=== Troubleshooting ===
=== Troubleshooting for ODBC Loading ===


The following error message may be encountered: ''System.Data.Odbc.OdbcException (0x80131937): ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified''. Probable reason is that the ODBC driver is missing or driver name is not correct. To solve the issue:
The following error message may be encountered: ''System.Data.Odbc.OdbcException (0x80131937): ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified''. Probable reason is that the ODBC driver is missing or driver name is not correct. To solve the issue:
* Check ODBC drivers is installed by running '''C:\Windows\System32\odbcad32.exe'''.
* Check ODBC drivers is installed by running ''C:\Windows\System32\odbcad32.exe''.
* Check the connection string.
* Check the connection string.


The following error message may be encountered: '''ERROR [HY024] [Microsoft][ODBC Text Driver] '(unknown)' is not a valid path. Make sure that the path name is spelled correctly and that you are connected to the server on which the file resides.'''.
The following error message may be encountered: ''ERROR [HY024] [Microsoft][ODBC Text Driver] '(unknown)' is not a valid path. Make sure that the path name is spelled correctly and that you are connected to the server on which the file resides.''.
Example error when specified directory is missing:
Example error when specified directory is missing:


Error when CSV file is missing: '''System.Data.Odbc.OdbcException (0x80131937): ERROR [42S02] [Microsoft][ODBC Text Driver] The Microsoft Access database engine could not find the object 'CaseAttributes.csv'. Make sure the object exists and that you spell its name and the path name correctly. If 'CaseAttributes.csv' is not a local object, check your network connection or contact the server administrator.'''
Error when CSV file is missing: ''System.Data.Odbc.OdbcException (0x80131937): ERROR [42S02] [Microsoft][ODBC Text Driver] The Microsoft Access database engine could not find the object 'CaseAttributes.csv'. Make sure the object exists and that you spell its name and the path name correctly. If 'CaseAttributes.csv' is not a local object, check your network connection or contact the server administrator.''


Error codes for troubleshooting: [https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/appendix-a-odbc-error-codes?view=sql-server-2017| ODBC Error Codes].
Error codes for troubleshooting: [https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/appendix-a-odbc-error-codes?view=sql-server-2017| ODBC Error Codes].


If the error message mention about Case or Event attributes named as ''F<column number>'' (e.g. F10), the data might be inconsistent, as there might be different amount of columns in different rows.
If the error message mention about Case or Event attributes named as ''F<column number>'' (e.g. F10), the data might be inconsistent, as there might be different amount of columns in different rows.
== Loading Data from Expression Datasource ==
Data (cases and events) loaded to a QPR ProcessAnalyzer model can be defined using an expression. The principle with column mappings is similar than in the odbc datasource.
{| class="wikitable"
!'''Property'''
! '''Description'''
|-
||DataSource/Cases/DataSourceType
||Use value '''expression''' when loading data from an expression. More information about DataSourceType, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
||DataSource/Cases/Expression
||Expression to get cases. The data structure is an array of arrays, where the outer array contains rows and inner array contains data cells in a row. The first row must contain headers.
|-
||DataSource/Cases/Columns
||For more information, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
||DataSource/Events/DataSourceType
||Use value '''expression''' when loading data from an expression. More information about DataSourceType, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
||DataSource/Events/Expression
||Expression to get events. The data structure is an array of arrays, where the outer array contains rows and inner array contains data cells in a row. The first row must contain headers.
|-
||DataSource/Events/Columns
||For more information, see [[QPR_ProcessAnalyzer_Model_JSON_Settings#Loading_Data_from_Datatables|Loading Data from Datatables]].
|-
|}
Examples:
Create model from another model:
<pre>
{
  "DataSource": {
    "Events": {
      "DataSourceType": "expression",
      "Expression": "ConcatTop([[["CaseId", "EventType", "Timestamp"]], EventLogById(1).Events.[Case.Name, TypeName, Timestamp]])",
      "Columns": {
        "CaseId": "Case",
        "Timestamp": "Timestamp",
        "EventType": "EventType"
      }
    }
  }
}
</pre>
<pre>
{
  "DataSource": {
    "Events": {
      "DataSourceType": "expression",
      "Expression": "Let(\"maxEventsPerModel\", 1000);Let(\"ms\", [\"SAP_OrderToCash\", \"BPIC13_incidents\"].(Let(\"modelName\", _), Models.Where(Name == modelName))[0]);Let(\"eventAttributes\", ToDictionary());ms.EventAttributes.(Let(\"ea\", _), eventAttributes.Set(ea.Name, ea));Let(\"orderedEventAttributes\", OrderByValue(eventAttributes.Keys));ConcatTop([[Flatten([\"Case\", \"EventType\", \"TimeStamp\", orderedEventAttributes])],ConcatTop(ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Flatten([Case.Name, Type.Name, TimeStamp, (Let(\"evt\", _), orderedEventAttributes.(Let(\"att\", _), evt.Attribute(att)))]))])",
      "Columns": {
        "CaseId": "Case",
        "Timestamp": "TimeStamp",
        "EventType": "EventType"
      }
    },
    "Cases": {
      "DataSourceType": "expression",
      "Expression": "Let(\"maxEventsPerModel\", 1000);Let(\"ms\", [\"SAP_OrderToCash\", \"BPIC13_incidents\"].(Let(\"modelName\", _), Models.Where(Name == modelName))[0]);Let(\"caseAttributes\", ToDictionary());ms.CaseAttributes.(Let(\"ca\", _), caseAttributes.Set(ca.Name, ca));Let(\"orderedCaseAttributes\", OrderByValue(caseAttributes.Keys));Let(\"cases\", ToDictionary());ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Case.(Let(\"c\", _), cases.Set(c, true));ConcatTop([[Flatten([\"Case\", \"Model\", orderedCaseAttributes])],cases.Keys.Flatten([Name, EventLog.Name, (Let(\"c\", _), orderedCaseAttributes.(Let(\"att\", _), c.Attribute(att)))])])",
      "Columns": {
        "CaseId": "Case"
      }
    }
  }
}
</pre>
<pre>
{
  "DataSource": {
    "Events": {
      "DataSourceType": "expression",
      "Expression": "[[\"Case\", \"EventType\", \"TimeStamp\"],[\"a\", \"et1\", DateTime(2019,1)],[\"a\", \"et2\", DateTime(2019,2)],[\"a\", \"et3\", DateTime(2019,3)],[\"b\", \"et1\", DateTime(2019,2)],[\"c\", \"et3\", DateTime(2019,2)],[\"c\", \"et2\", DateTime(2019,3)],[\"c\", \"et2\", DateTime(2019,4)],[\"c\", \"et3\", DateTime(2019,5)],[\"c\", \"et4\", DateTime(2019,6)]]",
      "Columns": {
        "CaseId": "Case",
        "Timestamp": "TimeStamp",
        "EventType": "EventType"
      }
    }
  }
}
</pre>


[[Category: QPR ProcessAnalyzer]]
[[Category: QPR ProcessAnalyzer]]

Latest revision as of 09:14, 30 September 2024

QPR Processanalyzer in-memory models can load data from variety of sources:

  • Datatables: Data can be loaded from QPR ProcessAnalyzer datatables, located either in SQL Server or in Snowflake.
  • Loading Script: Data can be loaded using a loading script written in QPR ProcessAnalyzer expression language. The loading script can extract the data and perform needed transformations to the suitable eventlog format during model is loaded. The loading script is started when the model loading into memory starts (pull logic). With the expression language, it's also possible to fetch data from other QPR ProcessAnalyzer models.
  • ODBC datasources: Data can be loaded from any ODBC compliant source system, such as databases, SAP HANA, ServiceNow, Excel files or CSV files.

Model Datasource Settings

Datasource needs to be defined both for cases and events. Model datasource settings are in the Model Properties dialog in Cases datasource and Events datasource tabs. It's possible that cases and events are loaded from different types of datasources, e.g. cases from an ODBC datasource and events from a datatable. Events datasource is optional and cases datasource is mandatory.

Following settings are available:

  • Datasource: Can be Datatable (SQL Server or Snowflake), ODBC (data is fetched by executing an ODBC query to an OCBC datasource) or Loading Script (data is generated by running an expression language query).
  • Datatable: When datasource is a datatable, this settings defines the datatable to use.
  • Case ID: Defines column containing case id. This mapping is needed for both cases and events data. The column datatype can be either string or integer.
  • Event Type: Defines column containing event type name. This mapping is needed for events data. This column datatype must be string.
  • Event Time: Defines column containing event timestamp. This mapping is needed for events data. The column datatype must be date.
  • Connection String: ODBC connection string used to fetch cases/events data. Connection string depends on the data source type and suitable connection strings can be found e.g. in https://www.connectionstrings.com.
  • Connection String Key: Secret name of the ODBC connection string. Alternative to the Connection String.
  • ODBC Query: ODBC query to run in the datasource to fetch cases/events data. Query syntax depends on the source system where the data is fetched from.
  • Loading Script: Expression language script that provides cases/events data. For more information, see the examples below.

Note that when changing datasource settings, the model is dropped from the memory, so that it can be reloaded using the changed settings.

Datatable Models

When the DataSource is Datatable, data is loaded from datatables. Datatables can be created by importing data to them from CSV files, or they can be created by ETL scripts. Datatables can be SQL Server or Snowflake datatables. Note that when using Snowflake datatables, processing is done in Snowflake data cloud.

Loading Script

Data loaded into a model can be defined using a loading script written with the expression language. Suitable script creates a DataFrame containing the cases/events data. In addition, mapping are needed from the DataFrame column names defining which column is case id, event type name and timestamp. Note that in the loading script, it's not possible to refer to the same model's eventlog to avoid a circular reference.

The loading script is run in a security context where there are GenericRead and GenericWrite permissions for the project, where the loaded model is located. This means that the loading script has access to all data in the same project and also a write access e.g. to the datatables. Note that user's own permissions are not applied by the loading script. This is also when the model is loaded automatically on server startup, as in that case there is no user who initiated the loading.

When loading data from an ODBC datasource, the AllowExternalDatasources setting needs to be enabled.

Examples

Load from Snowflake datatable

This example loading script loads data from datatable MyDatatable located in the same project as the model. It can be used to load data from a Snowflake datatable to an in-memory model.

Project.DatatableByName("MyDatatable").SqlDataFrame.Collect();

Load from other model

Loading script for events that takes the eventlog from another model:

ToDataFrame(
  EventLogById(1).Events.[Case.Name, TypeName, Timestamp],
  ["CaseId", "EventType", "Timestamp"]
)

Load sample from other model

Loading script for events:

let maxEventsPerModel = 1000;
let ms = ["SAP_OrderToCash", "BPIC13_incidents"].(
  Let("modelName", _), Models.Where(Name == modelName))[0]);
  Let("eventAttributes", ToDictionary());
  ms.EventAttributes.(
    Let("ea", _), eventAttributes.Set(ea.Name, ea));
    Let("orderedEventAttributes", OrderByValue(eventAttributes.Keys));
    ConcatTop([[Flatten(
      ["Case", "EventType", "TimeStamp", orderedEventAttributes])],
      ConcatTop(ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Flatten(
        [Case.Name, Type.Name, TimeStamp, (Let("evt", _), orderedEventAttributes.(
          Let("att", _), evt.Attribute(att)
        )
      )]
    )
  )]
)

Loading script for cases:

let maxEventsPerModel = 1000;
let ms = ["SAP_OrderToCash", "BPIC13_incidents"];
let modelName =_;
Models.Where(Name == modelName))[0]);
let caseAttributes = ToDictionary();
ms.CaseAttributes.{
  let ca = _;
  caseAttributes.Set(ca.Name, ca);
};
let orderedCaseAttributes = OrderByValue(caseAttributes.Keys);
let cases = ToDictionary());
ms.EventLog.Events[NumberRange(0, Min(CountTop(_), maxEventsPerModel) - 1)].Case.{
  let c= _;
  cases.Set(c, true);
}
ConcatTop([[
  Flatten(["Case", "Model", orderedCaseAttributes]),
  cases.Keys.Flatten([Name, EventLog.Name, {
    let c = _;
    orderedCaseAttributes.{
      let att = _;
      c.Attribute(att);
    }
  }]);
]]);

Load data embedded into expression

Create a model with eventlog data embedded into the expression.

ToDataFrame(
  [
    ["a", "et1", DateTime(2019,1)],
    ["a", "et2", DateTime(2019,2)],
    ["a", "et3", DateTime(2019,3)],
    ["b", "et1", DateTime(2019,2)],
    ["c", "et3", DateTime(2019,2)],
    ["c", "et2", DateTime(2019,3)],
    ["c", "et2", DateTime(2019,4)],
    ["c", "et3", DateTime(2019,5)],
    ["c", "et4", DateTime(2019,6)]
  ],
  ["Case","Timestamp","EventType"]
)

Load data with several ODBC queries

This loading script loads events using several ODBC queries (each event type separately).

let eventTypes = ["Invoice Created", "Requested Delivery", "Inventory Date", "Invoice Payment", "Actual Delivery"];
let data = eventTypes.(
  ImportODBC(
    "Driver={ODBC Driver 17 for SQL Server};Server=SERVER\\MSSQLSERVER2017;DataBase=Process Mining Data;Uid=user1;Pwd=pw1;",
    "SELECT [Case] AS \"CaseId\", Event_Type AS \"EventType\", Start_Time AS TimeStamp, Cost, Variable_cost, booleanAttribute1 FROM [OtC Model events] Where [Event_Type]=\'" + _ +"\'"
  )
);
let combinedData = data[0];
For("i", 1, i < CountTop(data), i + 1,
  let("combinedData",
    combinedData.Append(data[i])
  )
);
combinedData;

ODBC Datasource

When getting model data from an ODBC datasource, data is loaded directly from the source when the model is loaded. As improved security, the data is kept only in memory and not stored permanently in QPR ProcessAnalyzer. Suitable ODBC driver needs to be installed in QPR ProcessAnalyzer server, and ODBC connection string and ODBC query are to be defined (examples below). In addition, AllowExternalDatasources setting needs to True to use the ODBC datasource.

Read data from CSV file

Data can be loaded from CSV files located in the file system using Microsoft Access Text Driver (*.txt, *.csv) driver. The following example loads files C:\ProcessMiningData\ModelCaseAttributes.csv and C:\ProcessMiningData\ModelEventData.csv.

  • Cases: Connection String: Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt
  • Cases: ODBC Query: SELECT * FROM [ModelCaseAttributes.csv]
  • Events: Connection String: Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\\ProcessMiningData\\;Extensions=asc,csv,tab,txt
  • Events: ODBC Query: SELECT * FROM [ModelEventData.csv]

When reading from CSV files, you may need to set the CSV file format for the ODBC driver using the Schema.ini file (more information: https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017). ODBC text driver may guess column data type and return null values in case when guessed column type not match. Guessing based top rows of data. ODBC text driver support only one data type per column mixed data types is not supported.

Read data from Excel file

In this example, data is loaded from a Excel file that is accessible from the file system. In this example, the loaded file is C:\ProcessMiningData\ModelData.xlsx and cases are in sheet MyCases and events in sheet MyEvents.

  • Cases: Connection String: Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx
  • Cases: ODBC Query: SELECT * FROM [MyCases$]
  • Events: Connection String: Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\\ProcessMiningData\\ModelData.xlsx
  • Events: ODBC Query: SELECT * FROM [MyEvents$]

Read data from SQL Server

In this example, data is loaded from an SQL Server table. SQL Server hostname is myServerName, instance name is myInstanceName and database name is MyDatabase. The Trusted_Connection is for Windows authentication.

  • Cases: Connection String: Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;
  • Cases: ODBC Query: SELECT CASENAME, COST, COUNTRY, PRODUCT FROM CASES_TABLE
  • Events: Connection String: Driver={ODBC Driver 17 for SQL Server};Server=myServerName\\myInstanceName;DataBase=MyDatabase;Trusted_Connection=Yes;
  • Events: ODBC Query: SELECT CASENAME, CREATED_DATE, CREATED_BY FROM EVENTS_TABLE

If SQL Server authentication is used, the connection string is Driver={ODBC Driver 17 for SQL Server};Server=myServerName\myInstanceName;DataBase=MyDatabase;Trusted_Connection=No;Uid=myUsername;PWD=myPassword;

More about SQL Server connection strings: https://www.connectionstrings.com/sql-server/

Troubleshooting for ODBC Loading

The following error message may be encountered: System.Data.Odbc.OdbcException (0x80131937): ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified. Probable reason is that the ODBC driver is missing or driver name is not correct. To solve the issue:

  • Check ODBC drivers is installed by running C:\Windows\System32\odbcad32.exe.
  • Check the connection string.

The following error message may be encountered: ERROR [HY024] [Microsoft][ODBC Text Driver] '(unknown)' is not a valid path. Make sure that the path name is spelled correctly and that you are connected to the server on which the file resides.. Example error when specified directory is missing:

Error when CSV file is missing: System.Data.Odbc.OdbcException (0x80131937): ERROR [42S02] [Microsoft][ODBC Text Driver] The Microsoft Access database engine could not find the object 'CaseAttributes.csv'. Make sure the object exists and that you spell its name and the path name correctly. If 'CaseAttributes.csv' is not a local object, check your network connection or contact the server administrator.

Error codes for troubleshooting: ODBC Error Codes.

If the error message mention about Case or Event attributes named as F<column number> (e.g. F10), the data might be inconsistent, as there might be different amount of columns in different rows.