SQL Expressions
Revision as of 16:37, 20 February 2022
SQL expressions are special expressions that are converted into SQL and evaluated in an external system that supports SQL (e.g., Snowflake, SQL Server, Databricks, or Redshift). SQL expressions support only a subset of the expression language functionality, and that subset is described on this page. SQL expressions are used, for example, in the Where and WithExpressionColumn functions.
Operators
The following operators are supported in SQL expressions:
- Arithmetic operators: +, -, *, /, %
- Comparison operators: ==, <, <=, >, >=, !=
- Logical operators: &&, or, !
- Data types: strings ("this is a string"), integers (123), decimal numbers (123.45), booleans (true, false), null value (null)
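To show what "converted into SQL" can mean for these operators, the Python sketch below maps each listed operator to a standard SQL counterpart and evaluates a few literal expressions with SQLite. The mapping table `OPERATOR_TO_SQL` and the helpers `to_sql` and `unary_to_sql` are hypothetical. The SQL that is actually generated depends on the product and on the target system.

```python
import sqlite3

# Hypothetical mapping from expression-language operators to standard SQL.
# The real translation is product- and database-specific.
OPERATOR_TO_SQL = {
    "+": "+", "-": "-", "*": "*", "/": "/", "%": "%",
    "==": "=", "!=": "<>", "<": "<", "<=": "<=", ">": ">", ">=": ">=",
    "&&": "AND", "or": "OR", "!": "NOT",
}


def to_sql(left: str, op: str, right: str) -> str:
    """Render a binary expression, e.g. ("7", "%", "3") -> "(7 % 3)"."""
    return f"({left} {OPERATOR_TO_SQL[op]} {right})"


def unary_to_sql(op: str, operand: str) -> str:
    """Render a unary expression, e.g. ("!", "(1 = 2)") -> "(NOT (1 = 2))"."""
    return f"({OPERATOR_TO_SQL[op]} {operand})"


query = "SELECT " + ", ".join([
    to_sql("7", "%", "3"),
    to_sql("1", "!=", "2"),
    to_sql(to_sql("1", "==", "1"), "&&", unary_to_sql("!", to_sql("1", "==", "2"))),
])
conn = sqlite3.connect(":memory:")
row = conn.execute(query).fetchone()
print(query)  # SELECT (7 % 3), (1 <> 2), ((1 = 1) AND (NOT (1 = 2)))
print(row)    # (1, 1, 1): SQLite represents true as 1
```

Null handling is one SQL detail to keep in mind. Comparing a value with NULL using = yields NULL rather than true, so an expression such as Column("a") == null has to become a IS NULL in SQL. This page does not say how the product handles that case.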
Mathematical functions
Function | Description |
---|---|
Ceiling | Returns the given value rounded to the nearest integer that is equal to or larger than the value. The value must be of a numeric data type. If the value is null, the result is also null. |
Floor | Returns the given value rounded to the nearest integer that is equal to or smaller than the value. The value must be of a numeric data type. If the value is null, the result is also null. |
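The Python sketch below restates the documented Ceiling and Floor behavior, including null propagation, with Python's None standing in for SQL null. The function names are illustrative only. In the target database these functions usually correspond to CEILING (or CEIL) and FLOOR.

```python
import math
from typing import Optional, Union

Number = Union[int, float]


def ceiling(value: Optional[Number]) -> Optional[int]:
    """Round to the nearest equal or larger integer; null stays null."""
    return None if value is None else math.ceil(value)


def floor(value: Optional[Number]) -> Optional[int]:
    """Round to the nearest equal or smaller integer; null stays null."""
    return None if value is None else math.floor(value)


print(ceiling(2.1), ceiling(-2.1), ceiling(None))  # 3 -2 None
print(floor(2.9), floor(-2.9), floor(None))        # 2 -3 None
```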
Date functions
Function | Description |
---|---|
DateDiff | Calculates how many boundaries of the specified date part lie between the specified dates. Parameters: the date part to count (for example, "Seconds"), the start date, and the end date. |
Day | Returns the day of the month (1-31) of the given timestamp. Example: Day(Column("DateColumn")) |
Hour | Returns the hour part (0-23) of the given timestamp. |
Millisecond | Returns the millisecond part (0-999) of the given timestamp. |
Minute | Returns the minute part (0-59) of the given timestamp. |
Month | Returns the month part (1-12) of the given timestamp. |
Second | Returns the second part (0-59) of the given timestamp. |
Year | Returns the year of the given timestamp. |
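DateDiff counts date part boundaries rather than elapsed time. Two timestamps one second apart therefore differ by one day if midnight falls between them. The Python sketch below illustrates that counting rule. The name `date_diff` and the date part strings ("Years", "Months", "Days", "Hours", "Minutes", "Seconds") are assumptions modeled on the "Seconds" argument in the AggregateFrom examples. They are not a list of values the product is known to accept.

```python
from datetime import datetime

# Seconds per date part for the fixed-length parts handled below.
_UNIT_SECONDS = {"Days": 86400, "Hours": 3600, "Minutes": 60, "Seconds": 1}


def _truncate(ts: datetime, part: str) -> datetime:
    """Drop every field smaller than the given date part."""
    if part == "Days":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if part == "Hours":
        return ts.replace(minute=0, second=0, microsecond=0)
    if part == "Minutes":
        return ts.replace(second=0, microsecond=0)
    return ts.replace(microsecond=0)  # "Seconds"


def date_diff(part: str, start: datetime, end: datetime) -> int:
    """Count how many `part` boundaries lie between start and end (sketch)."""
    if part == "Years":
        return end.year - start.year
    if part == "Months":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if part not in _UNIT_SECONDS:
        raise ValueError(f"unsupported date part: {part}")
    delta = _truncate(end, part) - _truncate(start, part)
    return int(delta.total_seconds()) // _UNIT_SECONDS[part]


start = datetime(2022, 2, 19, 23, 59, 59)
end = datetime(2022, 2, 20, 0, 0, 1)
print(date_diff("Seconds", start, end))  # 2
print(date_diff("Days", start, end))     # 1: midnight is crossed
print(date_diff("Years", start, end))    # 0
# Day, Hour, Month and Year each extract a single field of the timestamp:
print(end.day, end.hour, end.month, end.year)  # 20 0 2 2022
```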
AggregateFrom function
The AggregateFrom function aggregates values from an aggregation level with a smaller grain size than the current level used in the analysis (for example, from events to cases) up to the current level. Parameters:
- Aggregation level: The aggregation level to aggregate from, optionally followed by additional data frame functions (such as Where) that prepare the aggregation level.
- Aggregation function: The name of an aggregation function (such as "Count", "Min", or "Max") or an object that defines the aggregation (such as #{"Function": "List", ...}).
- Expression (optional, default = null): The value expression that is evaluated in the external system before aggregation. Its results are the values being aggregated.
- Filter (optional): A filter to apply before the values are aggregated. The filter is given as a JSON selection configuration expressed with expression language dictionaries, arrays, and scalar values.
Example for EventTypes:
AggregateFrom(Events, "Count")
Returns the number of events for each event type.
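Conceptually, aggregating from a finer level to a coarser one is a GROUP BY. The SQLite sketch below shows the kind of SQL that AggregateFrom(Events, "Count") at the EventTypes level corresponds to. The table and column names (events, case_id, event_type, ts) and the data are hypothetical, and the SQL that the product actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (case_id TEXT, event_type TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("c1", "Sales Order", "2022-01-01 10:00"),
        ("c1", "Invoice", "2022-01-03 09:00"),
        ("c2", "Sales Order", "2022-01-02 08:00"),
    ],
)

# One result row per event type (the current level), counting the events
# (the finer level) that belong to it.
counts = dict(
    conn.execute(
        "SELECT event_type, COUNT(*) FROM events GROUP BY event_type ORDER BY event_type"
    ).fetchall()
)
print(counts)  # {'Invoice': 1, 'Sales Order': 2}
```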
Example for Cases:
AggregateFrom(Events, #{ "Function": "List", "Ordering": ["TimeStamp"], "Separator": "#,#" }, Column("EventType"))
Returns the variation (the event type path string) of each case.
GetValueFrom(Variations, AggregateFrom(Cases, "Count"))
Returns, for each case, the number of cases that have the same variation.
Cast(DateDiff("Seconds", AggregateFrom(Events.Where(Column("EventType") == "Sales Order"), "Min", Column("TimeStamp")), AggregateFrom(Events.Where(Column("EventType") == "Invoice"), "Max", Column("TimeStamp"))), "Float")
Returns, for each case, the duration in seconds between the first occurrence of the "Sales Order" event type and the last occurrence of the "Invoice" event type.
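The Python sketch below applies the logic of the three case-level examples to a small, made-up event log:

- A "List" aggregation ordered by timestamp and joined with "#,#" yields each case's variation.
- Counting cases per variation and reading the count back for every case mimics GetValueFrom(Variations, AggregateFrom(Cases, "Count")).
- The last step measures the time from the first "Sales Order" to the last "Invoice".

The data and names are illustrative. Elapsed seconds equal the DateDiff boundary count only when timestamps have no fractional seconds, as here. Returning None for a case without one of the event types is an assumption about null handling.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event log: (case id, event type, timestamp).
events = [
    ("c1", "Sales Order", datetime(2022, 1, 1, 10, 0)),
    ("c1", "Invoice", datetime(2022, 1, 3, 9, 0)),
    ("c2", "Sales Order", datetime(2022, 1, 2, 8, 0)),
    ("c2", "Invoice", datetime(2022, 1, 2, 8, 30)),
    ("c3", "Sales Order", datetime(2022, 1, 4, 12, 0)),
]

# "List" aggregation ordered by timestamp, joined with "#,#": one variation per case.
paths = {}
for case_id, event_type, ts in sorted(events, key=lambda e: e[2]):
    paths.setdefault(case_id, []).append(event_type)
variation = {case_id: "#,#".join(types) for case_id, types in paths.items()}

# Count cases per variation, then read the count back for every case.
cases_per_variation = Counter(variation.values())
same_variation_count = {c: cases_per_variation[v] for c, v in variation.items()}


def first_to_last_seconds(case_id):
    """Seconds from the first "Sales Order" to the last "Invoice"; None if either is missing."""
    orders = [ts for c, t, ts in events if c == case_id and t == "Sales Order"]
    invoices = [ts for c, t, ts in events if c == case_id and t == "Invoice"]
    if not orders or not invoices:
        return None
    return (max(invoices) - min(orders)).total_seconds()


print(variation)
print(same_variation_count)         # {'c1': 2, 'c2': 2, 'c3': 1}
print(first_to_last_seconds("c1"))  # 169200.0 (47 hours)
```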
Example for Model:
AggregateFrom(Cases, "Count", null, #{"Items":[#{"Type":"IncludeCases","Items":[#{"Type":"CaseAttributeValue","Values":["Dallas"], "Attribute":"Region"}]}]})
Returns the total number of cases in the model that have "Dallas" as the value of the "Region" case attribute.
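The filter argument is a nested dictionary. The sketch below illustrates how the Model example could be evaluated. It interprets only the two item types that appear in the example (IncludeCases and CaseAttributeValue) against a hypothetical list of cases, and it assumes that all conditions must hold. The helper `case_matches` is not part of the product. The real filter engine supports more item types and runs as SQL.

```python
# The filter from the Model example, written as a Python dict.
dallas_filter = {
    "Items": [
        {
            "Type": "IncludeCases",
            "Items": [
                {"Type": "CaseAttributeValue", "Values": ["Dallas"], "Attribute": "Region"}
            ],
        }
    ]
}

# Hypothetical case data.
cases = [
    {"CaseId": "c1", "Region": "Dallas"},
    {"CaseId": "c2", "Region": "Austin"},
    {"CaseId": "c3", "Region": "Dallas"},
]


def case_matches(case: dict, selection: dict) -> bool:
    """Keep a case only if every IncludeCases condition matches (sketch, two item types only)."""
    for rule in selection["Items"]:
        if rule["Type"] != "IncludeCases":
            raise NotImplementedError(rule["Type"])
        for condition in rule["Items"]:
            if condition["Type"] != "CaseAttributeValue":
                raise NotImplementedError(condition["Type"])
            if case.get(condition["Attribute"]) not in condition["Values"]:
                return False
    return True


# Counting the matching cases mirrors AggregateFrom(Cases, "Count", null, <filter>) at the Model level.
dallas_count = sum(1 for case in cases if case_matches(case, dallas_filter))
print(dallas_count)  # 2
```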
GetValueFrom function
The GetValueFrom function retrieves a value from an aggregation level whose grain size is larger than or the same as that of the current level (for example, from cases to events) and returns it at the current level. Parameters:
- Aggregation level: The aggregation level to get the value from, optionally followed by additional data frame functions that prepare the aggregation level.
- Expression: The expression to evaluate at the given aggregation level to get the returned value.
- Filter (optional): A filter to apply before the expression is evaluated. The filter is given as a dictionary following the JSON filter syntax.
Examples as measure expressions for events:
GetValueFrom(Cases, Column("Account Manager"))
Returns, for each event, the value of the Account Manager case attribute.
GetValueFrom(Variations, Column("Variation"))
Returns, for each event, the variation (event type path string) of its case.
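Moving from a coarser level to a finer one amounts to a join, so every event picks up the attribute of its case. The SQLite sketch below shows this for the Account Manager example. The table names, column names, and data are hypothetical, and a LEFT JOIN is only one plausible way to express it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cases (case_id TEXT PRIMARY KEY, account_manager TEXT);
    CREATE TABLE events (case_id TEXT, event_type TEXT);
    INSERT INTO cases VALUES ('c1', 'Alice'), ('c2', 'Bob');
    INSERT INTO events VALUES ('c1', 'Sales Order'), ('c1', 'Invoice'), ('c2', 'Sales Order');
    """
)

# One output row per event (the current level), carrying the value of its case.
rows = conn.execute(
    """
    SELECT e.case_id, e.event_type, c.account_manager
    FROM events AS e
    LEFT JOIN cases AS c ON c.case_id = e.case_id
    ORDER BY e.rowid
    """
).fetchall()
for row in rows:
    print(row)
```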
Examples as measure expressions for cases:
GetValueFrom(Variations, AggregateFrom(Cases, "Count"))
Returns, for each case, the number of cases that have the same variation.
GetValueFrom(Cases, Column("Variation"), #{"Items":[#{"Type":"IncludeEventTypes","Items":[#{"Type":"EventType","Values":["Shipment","Invoice"]}]}]})
Returns the variation of each case, taking only the "Shipment" and "Invoice" event types into account.
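One reading of the filtered example is that events of unlisted types are dropped before each case's variation is built. The Python sketch below follows that reading on made-up data. Note that a case with no remaining events gets no variation at all in this sketch, and how the product treats such cases is not stated here.

```python
from datetime import datetime

# Hypothetical event log: (case id, event type, timestamp).
events = [
    ("c1", "Sales Order", datetime(2022, 3, 1, 9, 0)),
    ("c1", "Shipment", datetime(2022, 3, 2, 9, 0)),
    ("c1", "Invoice", datetime(2022, 3, 3, 9, 0)),
    ("c2", "Sales Order", datetime(2022, 3, 1, 10, 0)),
    ("c2", "Invoice", datetime(2022, 3, 4, 10, 0)),
]

# Event types kept by the IncludeEventTypes filter in the example.
included_types = {"Shipment", "Invoice"}

paths = {}
for case_id, event_type, ts in sorted(events, key=lambda e: e[2]):
    if event_type in included_types:  # apply the filter before building the path
        paths.setdefault(case_id, []).append(event_type)
filtered_variation = {c: "#,#".join(types) for c, types in paths.items()}
print(filtered_variation)  # {'c1': 'Shipment#,#Invoice', 'c2': 'Invoice'}
```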
Other functions
Function | Description |
---|---|
CaseWhen | Goes through the conditions and returns a value when the first condition is met, similar to an if-then-else statement. Once a condition is true, evaluation stops and the corresponding value is returned. The parameters are any number of condition and value pairs, followed by an optional else value: the odd parameters are the conditions and the even parameters are the return values. If no condition is true, the last parameter (the "else" parameter) is returned. If the "else" parameter is omitted (i.e., there is an even number of parameters), null is returned. Example: CaseWhen(Column("a") == null, 1, Column("a") < 1.0, 2, 3) returns 1 if the value of column "a" is null, 2 if the value of column "a" is less than 1.0, and 3 otherwise. |
Coalesce | Returns the first non-null parameter. There can be any number of parameters. If all parameters are null, returns null. Examples: Coalesce(null, 3, 2) returns 3. Coalesce(Column("column1"), "N/A") returns the value of column "column1", with nulls replaced by "N/A". |
Column | Returns the value of the given column. Examples: Column("column1"), Column("My Column 2") |
Concat | Returns the given values concatenated into a string. Examples: Concat("part 1", "part 2") returns "part 1part 2". Concat(Column("column1"), " ", Column("column2")) returns the values of column1 and column2 concatenated and separated by a space. |
Variable | Returns the value of the given variable. Number, string, and boolean values are supported. Example: let myRegion = "Dallas"; DatatableById(123).SqlDataFrame.Where(Column("Region") == Variable("myRegion")).Collect() filters the datatable to the rows where Region is "Dallas". |
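Several of these functions have close counterparts in standard SQL. CaseWhen resembles a searched CASE expression, Coalesce matches COALESCE, and in SQLite Concat can be written with the || operator. The sketch below checks the documented CaseWhen and Coalesce examples in SQLite. A direct one-to-one translation is an assumption, since this page does not show the generated SQL. In SQLite, || returns NULL when any operand is NULL, and this page does not say how Concat treats nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a REAL, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, None, None, "x"), (2, 0.5, "foo", "y"), (3, 2.0, "bar", "z")],
)

rows = conn.execute(
    """
    SELECT
        -- CaseWhen(Column("a") == null, 1, Column("a") < 1.0, 2, 3)
        CASE WHEN a IS NULL THEN 1 WHEN a < 1.0 THEN 2 ELSE 3 END,
        -- Coalesce(Column("column1"), "N/A")
        COALESCE(column1, 'N/A'),
        -- Concat(Column("column1"), " ", Column("column2")) via SQLite's ||
        column1 || ' ' || column2
    FROM t
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 'N/A', None), (2, 'foo', 'foo y'), (3, 'bar', 'bar z')]

# Coalesce(null, 3, 2)
first_non_null = conn.execute("SELECT COALESCE(NULL, 3, 2)").fetchone()[0]
print(first_non_null)  # 3
```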
Process mining objects
The following variable names are supported at the beginning of a root expression and in the AggregateFrom and GetValueFrom functions:
- Cases: Returns a SqlDataFrame for cases, with the following properties:
  - CaseId: Case id.
  - All columns in the cases data (can be referred to using Column("<column name>")).
- Events: Returns a SqlDataFrame for events, with the following properties:
  - CaseId: Case id.
  - EventType: Event type name.
  - Timestamp: Event timestamp.
  - All columns in the events data (can be referred to using Column("<column name>")).
- EventTypes: Returns a SqlDataFrame for event types, with the following properties:
  - EventType: Event type name.
- Variations: Returns a SqlDataFrame for variations, with the following properties:
  - Variation: Variation identifier, which consists of the event type names concatenated with the separator "#,#".
- Flows: Returns a SqlDataFrame for flows, with the following properties:
  - FromEventType: Event type name of the flow start.
  - ToEventType: Event type name of the flow end.
- FlowOccurrences: Returns a SqlDataFrame for flow occurrences, with the following properties:
  - CaseId: Case id.
  - FromEventType: Event type name of the flow start.
  - FromTimeStamp: Timestamp of the flow start event.
  - From<event attribute name>: Event attribute value of the flow start event (<event attribute name> is replaced by the actual attribute name).
  - ToEventType: Event type name of the flow end.
  - ToTimeStamp: Timestamp of the flow end event.
  - To<event attribute name>: Event attribute value of the flow end event (<event attribute name> is replaced by the actual attribute name).
- Model: Returns a SqlDataFrame containing one row that represents the model, with the following property:
  - ModelId: Model id.
After these variable names (aliases), all functions supported by SqlDataFrame can be used.
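The sketch below derives FlowOccurrences rows from a small, hypothetical event log and produces the From/To columns listed above. It includes one made-up event attribute, `Cost`, which yields FromCost and ToCost. The sketch makes two assumptions, stated here because the list above does not define them. It treats a flow occurrence as a pair of directly consecutive events within the same case. It also derives Flows as the distinct (FromEventType, ToEventType) pairs.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical event log with one extra event attribute, "Cost".
events = [
    {"CaseId": "c1", "EventType": "Sales Order", "TimeStamp": datetime(2022, 1, 1, 10), "Cost": 5},
    {"CaseId": "c1", "EventType": "Shipment", "TimeStamp": datetime(2022, 1, 2, 10), "Cost": 7},
    {"CaseId": "c1", "EventType": "Invoice", "TimeStamp": datetime(2022, 1, 3, 10), "Cost": 1},
    {"CaseId": "c2", "EventType": "Sales Order", "TimeStamp": datetime(2022, 1, 5, 10), "Cost": 4},
]

ATTRIBUTES = ["Cost"]  # event attributes exposed as From<name> / To<name>


def flow_occurrences(log):
    """Pair each event with the next event of the same case (sketch)."""
    ordered = sorted(log, key=lambda e: (e["CaseId"], e["TimeStamp"]))
    for case_id, group in groupby(ordered, key=lambda e: e["CaseId"]):
        case_events = list(group)
        for src, dst in zip(case_events, case_events[1:]):
            row = {
                "CaseId": case_id,
                "FromEventType": src["EventType"],
                "FromTimeStamp": src["TimeStamp"],
                "ToEventType": dst["EventType"],
                "ToTimeStamp": dst["TimeStamp"],
            }
            for name in ATTRIBUTES:
                row["From" + name] = src[name]
                row["To" + name] = dst[name]
            yield row


occurrences = list(flow_occurrences(events))
flows = sorted({(o["FromEventType"], o["ToEventType"]) for o in occurrences})
for o in occurrences:
    print(o["CaseId"], o["FromEventType"], "->", o["ToEventType"], o["FromCost"], o["ToCost"])
print(flows)  # [('Sales Order', 'Shipment'), ('Shipment', 'Invoice')]
```

Case c2 has a single event, so it contributes no flow occurrences in this sketch.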
Examples:
For cases (and also events), the case id can be referred to using CaseId:
Cases.Where(CaseId == "Case_123")
If there is an Order Id column that is mapped to CaseId, the original column name can also be used:
Cases.Where(Column("Order Id") == "Case_123")
For events, the event type can be referred to using EventType:
Events.Where(EventType == "Order created")
If there is a Process step column that is mapped to EventType, the original column name can also be used:
Events.Where(Column("Process step") == "Order created")
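In the event examples above, the property name and the original column name refer to the same underlying column. The sketch below shows one way such names could be resolved when a WHERE clause is generated. It uses a hypothetical mapping from standard property names to original column names and runs the query in SQLite. The mapping, table, and helper names are illustrative only.

```python
import sqlite3

# Hypothetical mapping from standard property names to the original column names.
EVENT_COLUMN_MAPPING = {"CaseId": "Order Id", "EventType": "Process step"}


def resolve(name: str) -> str:
    """Return a quoted SQL identifier for a property name or an original column name."""
    column = EVENT_COLUMN_MAPPING.get(name, name)
    return '"' + column.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events ("Order Id" TEXT, "Process step" TEXT)')
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("Case_123", "Order created"), ("Case_123", "Order shipped"), ("Case_456", "Order created")],
)


def where_equals(name: str, value: str):
    """Rows whose resolved column equals value, similar to Events.Where(<name> == value)."""
    sql = f"SELECT * FROM events WHERE {resolve(name)} = ? ORDER BY rowid"
    return conn.execute(sql, (value,)).fetchall()


# Both spellings select the same rows.
print(where_equals("EventType", "Order created"))
print(where_equals("Process step", "Order created"))
```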