SQL Expressions: Difference between revisions

From QPR ProcessAnalyzer Wiki

Revision as of 11:48, 22 February 2022

SQL expressions are special expressions that are converted into SQL and run in an external system that supports SQL (e.g., SQL Server, Snowflake, Databricks, Redshift). SQL expressions support only a subset of the QPR ProcessAnalyzer expression language; the supported functionality is described on this page. SQL expressions are used, for example, in the Where and WithExpressionColumn functions of SqlDataFrames.

Operators

The following operators and data types are supported by SQL expressions:

  • Arithmetic operators: +, -, *, /, %
  • Comparison operators: ==, <, <=, >, >=, !=
  • Logical operators: &&, or, !
  • Data types: strings ("this is a string"), integers (123), decimal numbers (123.45), booleans (true, false), null value (null)

Mathematical functions

Function Description
Ceiling

Returns the given value rounded to the nearest equal or larger integer. The value must have a numeric data type. If the value is null, the result is also null.

Floor

Returns the given value rounded to the nearest equal or smaller integer. The value must have a numeric data type. If the value is null, the result is also null.
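
The rounding and null handling described for Ceiling and Floor can be modelled with a short Python sketch. This is not QPR ProcessAnalyzer code: the function names ceiling and floor are illustrative, and None stands in for null.

```python
import math
from typing import Optional


def ceiling(value: Optional[float]) -> Optional[int]:
    """Nearest integer equal to or larger than value; null (None) stays null."""
    return None if value is None else math.ceil(value)


def floor(value: Optional[float]) -> Optional[int]:
    """Nearest integer equal to or smaller than value; null (None) stays null."""
    return None if value is None else math.floor(value)


print(ceiling(2.1), ceiling(-2.1), ceiling(3), ceiling(None))  # 3 -2 3 None
print(floor(2.9), floor(-2.1), floor(3), floor(None))          # 2 -3 3 None
```

Note that for negative values Ceiling moves towards zero and Floor moves away from it.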

Date functions

Function Description
DateDiff

Calculates how many of the specified date part boundaries there are between the specified dates. Parameters:

  1. date period: Date period in which the difference between the dates is calculated. Supported date periods: second, minute, hour, day, week, quarter, month, year.
  2. start date: Starting timestamp.
  3. end date: Ending timestamp.
Day

Returns the day of the month (1-31) of the given timestamp.

Day(Column("DateColumn"))
Hour Returns the hours part (0-23) of the given timestamp.
Millisecond Returns the milliseconds part (0-999) of the given timestamp.
Minute Returns the minutes part (0-59) of the given timestamp.
Month Returns the months part (1-12) of the given timestamp.
Second Returns the seconds part (0-59) of the given timestamp.
Year Returns the year of the given timestamp.
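
DateDiff counts period boundaries rather than elapsed time, so two timestamps one second apart can still differ by one year. The Python sketch below models that reading of the description. It is an illustration only: the names period_index and date_diff are hypothetical, the week period is left out because the first day of the week depends on the external system, and the exact behaviour is determined by the SQL system that runs the expression.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def period_index(period: str, ts: datetime) -> int:
    """Number the periods so that crossing one boundary raises the index by 1."""
    if period == "year":
        return ts.year
    if period == "quarter":
        return ts.year * 4 + (ts.month - 1) // 3
    if period == "month":
        return ts.year * 12 + (ts.month - 1)
    delta = ts - EPOCH
    if period == "day":
        return delta.days
    seconds = delta.days * 86400 + delta.seconds
    if period == "hour":
        return seconds // 3600
    if period == "minute":
        return seconds // 60
    if period == "second":
        return seconds
    raise ValueError(f"unsupported date period in this sketch: {period}")


def date_diff(period: str, start: datetime, end: datetime) -> int:
    """Count the boundaries of the given period between start and end."""
    return period_index(period, end) - period_index(period, start)


# One second apart, but a year boundary lies in between.
print(date_diff("year", datetime(2021, 12, 31, 23, 59, 59), datetime(2022, 1, 1)))  # 1
# Almost 24 hours apart, but no day boundary is crossed.
print(date_diff("day", datetime(2022, 2, 22, 0, 0), datetime(2022, 2, 22, 23, 59)))  # 0
```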

AggregateFrom function

Aggregates a value from objects of which there may be several for a single source object, for example when going from cases to events. Parameters:

  1. Aggregation level: Aggregation level to aggregate from. This includes possible additional data frame expressions to prepare the aggregation level.
  2. Aggregation function: Aggregation function or object definition.
  3. Expression: Value expression used to generate the expression evaluated in the external system prior to aggregation. Default value is null.
  4. Filter: Optional filter to apply prior to performing value aggregation. Filter is given as JSON selection configuration transformed into expression language dictionaries, arrays and scalar values.

Example for EventTypes:

AggregateFrom(Events, "Count")
Returns the number of events having each event type.

Example for Cases:

AggregateFrom(Events, #{ "Function": "List", "Ordering": ["TimeStamp"], "Separator": "#,#" }, Column("EventType"))
Returns the variation (event type path) string for each case.

GetValueFrom(Variations, AggregateFrom(Cases, "Count"))
Returns the number of cases having the same variation for every case.

Cast(DateDiff("Seconds", AggregateFrom(Events.Where(Column("EventType") == "Sales Order"), "Min", Column("TimeStamp")), AggregateFrom(Events.Where(Column("EventType") == "Invoice"), "Max", Column("TimeStamp"))), "Float")
Returns the duration in seconds between the first occurrence of the "Sales Order" event type and the last occurrence of the "Invoice" event type for each case.

Example for Model:

AggregateFrom(Cases, "Count", null, #{"Items":[#{"Type":"IncludeCases","Items":[#{"Type":"CaseAttributeValue","Values":["Dallas"], "Attribute":"Region"}]}]})
Returns the total number of cases in the model having "Dallas" as the value of "Region" case attribute.
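
To make the many-to-one direction concrete, the following Python sketch models what the "Count" and "List" examples above compute when events are aggregated to cases. It is not QPR ProcessAnalyzer code: the sample events and the helper names aggregate_from_events and event_type_path are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event data: several events may belong to one case.
events = [
    {"CaseId": "A", "EventType": "Sales Order", "TimeStamp": datetime(2022, 1, 1, 8, 0)},
    {"CaseId": "A", "EventType": "Invoice", "TimeStamp": datetime(2022, 1, 3, 9, 0)},
    {"CaseId": "A", "EventType": "Shipment", "TimeStamp": datetime(2022, 1, 2, 12, 0)},
    {"CaseId": "B", "EventType": "Sales Order", "TimeStamp": datetime(2022, 1, 5, 10, 0)},
]


def aggregate_from_events(rows, aggregate):
    """Group event rows by CaseId and reduce each group to a single value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["CaseId"]].append(row)
    return {case_id: aggregate(group) for case_id, group in groups.items()}


def event_type_path(group):
    """Event type names ordered by timestamp and joined with the "#,#" separator."""
    ordered = sorted(group, key=lambda row: row["TimeStamp"])
    return "#,#".join(row["EventType"] for row in ordered)


event_counts = aggregate_from_events(events, len)
paths = aggregate_from_events(events, event_type_path)

print(event_counts)  # {'A': 3, 'B': 1}
print(paths)         # {'A': 'Sales Order#,#Shipment#,#Invoice', 'B': 'Sales Order'}
```

Each case receives exactly one value, however many events it has.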

GetValueFrom function

Retrieves a value from an object of which there can be only one for a single source object, e.g., when going from events to cases. Parameters:

  1. Aggregation level: Aggregation level to retrieve the value from. This includes possible additional data frame expressions to prepare the aggregation level.
  2. Expression: Expression to evaluate in given aggregation level to get the returned value.
  3. Filter: Optional filter to apply prior to performing expression evaluation. Filter is given as dictionary following the JSON filter syntax.

Examples as measure expression for events:

GetValueFrom(Cases, Column("Account Manager"))
Returns for each event the value of the Account Manager case attribute.

GetValueFrom(Variations, Column("Variation"))
Returns for each event the variation (event type path) string of its case.

Examples as measure expression for cases:

GetValueFrom(Variations, AggregateFrom(Cases, "Count"))
Returns the number of cases having the same variation for every case.

GetValueFrom(Cases, Column("Variation"), #{"Items":[#{"Type":"IncludeEventTypes","Items":[#{"Type":"EventType","Values":["Shipment","Invoice"]}]}]})
Returns for each case its variation, where only the "Shipment" and "Invoice" event types are taken into account.
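
The one-to-one direction can be modelled in the same way. The Python sketch below is an illustration with made-up data, not QPR ProcessAnalyzer code. It mimics looking up a case attribute for each event, and the combined GetValueFrom(Variations, AggregateFrom(Cases, "Count")) example, where each case receives the number of cases sharing its variation. The helper name get_value_from_cases is hypothetical.

```python
from collections import Counter

# Hypothetical data: every event belongs to exactly one case.
cases = [
    {"CaseId": "A", "Account Manager": "Mary", "Variation": "Sales Order#,#Invoice"},
    {"CaseId": "B", "Account Manager": "John", "Variation": "Sales Order#,#Invoice"},
    {"CaseId": "C", "Account Manager": "Mary", "Variation": "Sales Order"},
]
events = [
    {"CaseId": "A", "EventType": "Sales Order"},
    {"CaseId": "A", "EventType": "Invoice"},
    {"CaseId": "B", "EventType": "Sales Order"},
]

cases_by_id = {case["CaseId"]: case for case in cases}


def get_value_from_cases(event, column):
    """Look up a column of the single case that the event belongs to."""
    return cases_by_id[event["CaseId"]][column]


managers = [get_value_from_cases(event, "Account Manager") for event in events]
print(managers)  # ['Mary', 'Mary', 'John']

# Aggregate cases to variations (count), then hand the count back to each case.
cases_per_variation = Counter(case["Variation"] for case in cases)
same_variation_counts = {
    case["CaseId"]: cases_per_variation[case["Variation"]] for case in cases
}
print(same_variation_counts)  # {'A': 2, 'B': 2, 'C': 1}
```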

Other functions

Function Description
CaseWhen

Goes through the conditions in order and returns the value paired with the first condition that is true, similar to an if-then-else structure. Once a condition is true, evaluation stops and the result is returned. If no condition is true, the value of the else expression is returned. If the else expression is not defined (i.e., there is an even number of parameters), null is returned.

Consists of any number of pairs of condition and value expressions followed by an optional else expression. The odd parameters are the conditions and the even parameters are the return values.

CaseWhen(Column("a") == null, 1, Column("a") < 1.0, 2, 3)
Returns 1 if the value of column "a" is null.
Returns 2 if the value of column "a" is less than 1.0.
Returns 3 otherwise.

Coalesce

Returns the first non-null parameter. There can be any number of parameters. If all parameters are null, returns null.

Coalesce(null, 3, 2)
Returns 3.

Coalesce(Column("column1"), "N/A")
Returns column "column1" value, except replaces nulls with "N/A".
Column

Returns the value of the given column.

Column("column1")

Column("My Column 2")
Concat

Returns the given values concatenated into a single string.

Concat("part 1", "part 2")
Returns "part 1part 2"

Concat(Column("column1"), " ", Column("column2"))
Returns the values of column1 and column2 concatenated and separated by a space.
Variable

Returns the value of the given variable. Supports number, string and boolean values.

Examples:

let myRegion = "Dallas";
DatatableById(123).SqlDataFrame.Where(Column("Region") == Variable("myRegion")).Collect()
Filters the datatable to the rows where Region is Dallas.
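
The parameter handling of CaseWhen and Coalesce described in the table above can be modelled in Python as follows. This is a sketch of the documented semantics, not QPR ProcessAnalyzer code: the names case_when, coalesce and classify are illustrative and None stands in for null. Unlike the real functions, Python evaluates all arguments before the call, so the sketch only shows which value is selected.

```python
def case_when(*params):
    """Odd parameters are conditions, even parameters are values.

    An odd total number of parameters means the last one is the else value;
    with an even number there is no else value and None (null) is returned.
    """
    has_else = len(params) % 2 == 1
    else_value = params[-1] if has_else else None
    pairs = params[:-1] if has_else else params
    for i in range(0, len(pairs), 2):
        if pairs[i]:             # first true condition wins
            return pairs[i + 1]
    return else_value


def coalesce(*values):
    """First parameter that is not None (null), or None if all are."""
    return next((value for value in values if value is not None), None)


def classify(a):
    # Mirrors CaseWhen(Column("a") == null, 1, Column("a") < 1.0, 2, 3).
    # The "a is not None" guard is needed only because Python cannot compare None.
    return case_when(a is None, 1, a is not None and a < 1.0, 2, 3)


print(classify(None), classify(0.5), classify(7))  # 1 2 3
print(case_when(False, "x"))                       # None (no else expression)
print(coalesce(None, 3, 2))                        # 3
print(coalesce(None, "N/A"))                       # N/A
```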

Process mining objects

The following variable names are supported in the beginning of a root expression and in the AggregateFrom and GetValueFrom functions:

  • Cases: Returns SqlDataFrame for cases. There are the following properties:
    • CaseId: Case id.
    • All columns in the cases data (can be referred to using Column("<column name>"))
  • Events: Returns SqlDataFrame for events. There are the following properties:
    • CaseId: Case id.
    • EventType: Event type name.
    • Timestamp: Event timestamp.
    • All columns in the events data (can be referred to using Column("<column name>"))
  • EventTypes: Returns SqlDataFrame for event types. There are the following properties:
    • EventType: Event type name.
  • Variations: Returns SqlDataFrame for variations. There are the following properties:
    • Variation: Variation identifier, which is concatenated event type names separated by separator "#,#".
  • Flows: Returns SqlDataFrame for flows. There are the following properties:
    • FromEventType: Event type name of the flow start.
    • ToEventType: Event type name of the flow end.
  • FlowOccurrences: Returns SqlDataFrame for flow occurrences. There are the following properties:
    • CaseId: Case id.
    • FromEventType: Event type name of the flow start.
    • FromTimeStamp: Time stamp of the flow start event.
    • From<event attribute name>: Event attribute value of the flow start event. (<event attribute name> is replaced by the actual attribute name.)
    • ToEventType: Event type name of the flow end.
    • ToTimeStamp: Time stamp of the flow end event.
    • To<event attribute name>: Event attribute value of the flow end event. (<event attribute name> is replaced by the actual attribute name.)
  • Model: Returns SqlDataFrame containing one row representing the model. There are the following properties:
    • ModelId: Model id.

After these variables, all functions supported by the SqlDataFrame can be used.
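
How the Flows and FlowOccurrences properties relate to events can be illustrated with a short Python sketch. It assumes that a flow occurrence is the transition between two consecutive events of the same case, ordered by timestamp; this is the usual process mining definition, but it is not stated on this page. The sample events, the integer timestamps and the helper name flow_occurrences are hypothetical.

```python
from collections import defaultdict

# Hypothetical events; integer timestamps keep the example short.
events = [
    {"CaseId": "A", "EventType": "Order created", "TimeStamp": 1},
    {"CaseId": "A", "EventType": "Shipment", "TimeStamp": 2},
    {"CaseId": "A", "EventType": "Invoice", "TimeStamp": 3},
    {"CaseId": "B", "EventType": "Order created", "TimeStamp": 4},
    {"CaseId": "B", "EventType": "Invoice", "TimeStamp": 5},
]


def flow_occurrences(rows):
    """Pair each event with the next event of the same case (assumed definition)."""
    by_case = defaultdict(list)
    for row in rows:
        by_case[row["CaseId"]].append(row)
    result = []
    for case_id, group in by_case.items():
        ordered = sorted(group, key=lambda row: row["TimeStamp"])
        for start, end in zip(ordered, ordered[1:]):
            result.append({
                "CaseId": case_id,
                "FromEventType": start["EventType"],
                "FromTimeStamp": start["TimeStamp"],
                "ToEventType": end["EventType"],
                "ToTimeStamp": end["TimeStamp"],
            })
    return result


occurrences = flow_occurrences(events)
# A flow is a distinct (FromEventType, ToEventType) pair over all occurrences.
flows = sorted({(o["FromEventType"], o["ToEventType"]) for o in occurrences})

print(len(occurrences))  # 3
print(flows)
# [('Order created', 'Invoice'), ('Order created', 'Shipment'), ('Shipment', 'Invoice')]
```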

Examples: For cases (and also events), the case id can be referred to using CaseId:

Cases.Where(CaseId == "Case_123")

Assuming that there is an Order Id column that is mapped to the CaseId, the original column name can also be used:

Cases.Where(Column("Order Id") == "Case_123")

For events, the event type can be referred to using EventType:

Events.Where(EventType == "Order created")

Assuming that there is a Process step column that is mapped to the EventType, the original column name can also be used:

Events.Where(Column("Process step") == "Order created")