Event Ordering for Identical Timestamps

From QPR ProcessAnalyzer Wiki

Revision as of 19:28, 6 September 2022

This article explains how to define the order of occurrence of events when several events within a case have identical timestamps.

Introduction

The order in which events occur within cases is important in process mining, because the order is visualized, for example, in flowcharts. The order also determines which variation a case belongs to. Usually the order of events comes from the event timestamps in the eventlog data. An exception arises when several events within the same case have the same timestamp, because then the eventlog data does not define the order in which the events occurred. In many eventlogs, events with exactly identical timestamps are quite rare, but the situation may occur quite often in the following circumstances:

  • The origin system creates several events during a single processing step and records all of them with the same timestamp.
  • Accurate timestamps (e.g. at millisecond level) are not available, and for example only the date of occurrence is known. In this situation, all events that occurred within the same day have the same recorded timestamp.

If the order of events is known, it's possible to adjust the order in QPR ProcessAnalyzer as explained below.

Event order for Snowflake models

For Snowflake models, the ordering of events that have identical timestamps within the same case is based on the alphabetical order of the event type names. Thus, if two events have the same timestamp, the one whose name is alphabetically first is considered to occur first. Note that if events within the same case have both the same timestamp and the same name, the order of those events is undetermined and may change between analyses.
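
The tie-breaking rule described above can be illustrated with a small Python sketch. It runs outside QPR ProcessAnalyzer, the sample events and the function name are hypothetical, and Python compares strings by code point, which may differ from the collation a Snowflake model actually uses.

```python
from datetime import datetime

# Hypothetical sample events: (case id, timestamp, event type name).
events = [
    ("case-1", datetime(2022, 9, 6, 10, 0), "Payment received"),
    ("case-1", datetime(2022, 9, 6, 10, 0), "Invoice sent"),
    ("case-1", datetime(2022, 9, 6, 9, 0), "Order created"),
]

def order_by_name_on_ties(rows):
    """Sort by case, then timestamp, then event type name as the tie-breaker."""
    return sorted(rows, key=lambda row: (row[0], row[1], row[2]))

ordered_names = [row[2] for row in order_by_name_on_ties(events)]
print(ordered_names)
```

Here the two events recorded at 10:00 are ordered by name, so "Invoice sent" is placed before "Payment received".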

Event order for in-memory models

For in-memory models, the ordering of events that have identical timestamps within the same case is based on the order of rows in the eventlog data. In practice, if the model uses a datatable for events, the order of rows in the datatable determines the order of events. Thus, if the ordering is a concern, make sure the rows are in the desired order when new data is imported to the datatable.

The same logic also applies to models that get their data from an ODBC datasource or from a loading script.
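
As a rough illustration of row-order tie-breaking (not the actual implementation in QPR ProcessAnalyzer), the Python sketch below sorts hypothetical rows by case and timestamp only. Python's sorted() is stable, so rows with identical keys keep the order in which they appear in the input.

```python
from datetime import datetime

# Hypothetical datatable rows in import order: (case id, timestamp, event type).
rows = [
    ("case-1", datetime(2022, 9, 6), "Invoice sent"),
    ("case-1", datetime(2022, 9, 6), "Delivery planned"),
    ("case-1", datetime(2022, 9, 5), "Order created"),
]

def order_by_row_position_on_ties(datatable_rows):
    """Sort by case and timestamp; ties keep their original row order."""
    return sorted(datatable_rows, key=lambda row: (row[0], row[1]))

in_memory_names = [row[2] for row in order_by_row_position_on_ties(rows)]
print(in_memory_names)
```

Because "Invoice sent" comes before "Delivery planned" in the input rows, it also comes first in the result, even though both events share the same timestamp.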

Reordering events by adjusting timestamps

One method to define the order of events is to slightly adjust their timestamps. For example, if two events have the same timestamp, the timestamp of the event that should occur last can be offset by one millisecond. The difference is enough to determine the order, but in practice it is usually too small to affect the analyses.
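
A minimal Python sketch of this idea follows. It assumes the rows are already listed in the desired order of occurrence, and the function name, sample data, and default step are hypothetical. It does not check whether an adjusted timestamp collides with another event that genuinely occurred one millisecond later.

```python
from datetime import datetime, timedelta

def separate_identical_timestamps(rows, step=timedelta(milliseconds=1)):
    """Offset repeated (case, timestamp) pairs by multiples of `step`.

    rows: list of (case id, timestamp, event type) in the desired order.
    The first event of each pair keeps its timestamp, the second gets
    +1 step, the third +2 steps, and so on.
    """
    seen = {}
    adjusted = []
    for case_id, timestamp, event_type in rows:
        count = seen.get((case_id, timestamp), 0)
        seen[(case_id, timestamp)] = count + 1
        adjusted.append((case_id, timestamp + count * step, event_type))
    return adjusted

# Hypothetical sample: two events in case-1 share a timestamp.
same_time = datetime(2022, 9, 6, 12, 0, 0)
adjusted_rows = separate_identical_timestamps([
    ("case-1", same_time, "Invoice sent"),
    ("case-1", same_time, "Payment received"),
    ("case-2", same_time, "Invoice sent"),
])
```

After the adjustment, "Payment received" in case-1 is one millisecond later than "Invoice sent", while the event in case-2 is unchanged because it has no duplicate within its own case.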

Eventlog sorting example: SQL script

This example script sorts rows in an events datatable based on the defined rules: event order starting from the first is Delivery planned, Invoice sent, Payment received. Other event types are equal and thus they are not reordered. The script reads the events datatable to a temporary table and writes the data back to the datatable after sorting it. Each event type name with a defined order is mapped to an integer value, which is used as the basis for sorting. The data is sorted primarily by the case id and secondarily by the event timestamp, and only if both are equal does the defined event ordering take effect as the third sorting criterion.

(SELECT 'AnalysisType', '18') UNION ALL 
(SELECT 'ProjectName', 'MyProject') UNION ALL
(SELECT 'DataTableName', 'MyModel events') UNION ALL 
(SELECT 'TargetTable', '#EventsTemp') UNION ALL
(SELECT 'MaximumCount', '0')
--#GetAnalysis

(SELECT 'ProjectName', 'MyProject') UNION ALL
(SELECT 'DataTableName', 'MyModel events') UNION ALL
(SELECT 'Append', '0');

SELECT *
FROM #EventsTemp
ORDER BY
[CaseId],
[EventTimestamp],
(CASE [EventType]   
WHEN 'Delivery planned' THEN 1
WHEN 'Invoice sent' THEN 2
WHEN 'Payment received' THEN 3
ELSE 4 END)
--#ImportDataTable
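
The sorting rule used by the script can also be tried out on sample data outside QPR ProcessAnalyzer. The following Python sketch mirrors the CASE expression by mapping each listed event type to a rank and giving all other event types a shared last rank. The sample rows and names are hypothetical.

```python
from datetime import datetime

EVENT_ORDERING = ["Delivery planned", "Invoice sent", "Payment received"]
# Rank 1..3 for listed event types; every other type shares rank 4.
RANK = {name: position for position, name in enumerate(EVENT_ORDERING, start=1)}
DEFAULT_RANK = len(EVENT_ORDERING) + 1

def sort_events(rows):
    """Sort by case id, then timestamp, then the defined event ordering."""
    return sorted(
        rows,
        key=lambda row: (row[0], row[1], RANK.get(row[2], DEFAULT_RANK)),
    )

# Hypothetical sample: four events in one case with the same timestamp.
moment = datetime(2022, 9, 6)
sorted_names = [row[2] for row in sort_events([
    ("case-1", moment, "Payment received"),
    ("case-1", moment, "Quality check"),
    ("case-1", moment, "Delivery planned"),
    ("case-1", moment, "Invoice sent"),
])]
print(sorted_names)
```

Python's sort is stable, so event types that are not listed keep their relative input order after the listed ones.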

Eventlog sorting example: Expression script

This example is implemented using the expression language, and it achieves the same result as the previous SQL script example. The event types whose order is defined are listed in the first variable. Note that only those event types that have a defined order need to be listed.

let eventOrdering = ["Delivery planned", "Invoice sent", "Payment received"];
let projectName = "MyProject";
let datatableName = "MyModel events";
let project = (Projects.Where(Name==projectName))[0];
let datatable = (project.Datatables.Where(Name==datatableName))[0];
let eventsData = DatatableById(datatable.Id).DataFrame;
eventsData = eventsData.OrderBy(
	Column("Case Name"),
	Column("Start Time"),
	If(
		Column("Event Type").In(eventOrdering),
		IndexOfSubArray([Column("Event Type")], eventOrdering)[0],
		null
	)
);
eventsData.Persist(datatable.Name, ["ProjectId": project.Id, "Append": false]);