SQL Expressions: Difference between revisions
(216 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
SQL expressions are special expressions, which are | SQL expressions are special expressions, which are run in Snowflake by converting them into SQL. Note that SQL expressions are different than the in-memory expressions. In addition to the [[Big_Data_Chart|Big data chart]], SQL expressions are used in the Where and WithColumn functions in [[SqlDataFrame in Expression Language|SQLDataFrames]]. | ||
It's possible to run in-memory expressions that are embedded into SQL expressions using [[QPR_ProcessAnalyzer_Expressions#In-memory_expression_blocks_in_SQL_expressions|#expr{}]] syntax. For examples, see the [[SQL_Expressions#Date_functions|DurationBetweenDates]] function. | |||
== Operators == | == Operators == | ||
Following operators are supported by | Following operators are supported by SQL expressions: | ||
* Arithmetic operators | * Arithmetic operators <pre>+ - * / %</pre> | ||
* Comparison operators | * Comparison operators <pre>== < <= > >= !=</pre> | ||
* Logical operators | * Logical operators <pre>&& || !</pre> | ||
Following data types are supported by SQL expressions: | |||
* String <pre>"this is a string"</pre> | |||
* Integer <pre>123</pre> | |||
* Decimal number <pre>123.45</pre> | |||
* Boolean <pre>true false</pre> | |||
* Null value <pre>null</pre> | |||
* Arrays <pre>[1, 2, 3]</pre> | |||
* Objects <pre>#{ "key1": "value1", "key2": 3.7, "key3": false }</pre> | |||
To get an item from an array, use following syntax:<pre>Column("arraydata")[2]</pre> | |||
To get a value from an object, use following syntax:<pre>Column("objectdata")["key1"]</pre> | |||
Expressions resulting in boolean value need to be enclosed to the CaseWhen function. For example, expression ''Column("MyColumn") < 5'' doesn't work, but following does: | |||
<pre> | |||
CaseWhen(Column("MyColumn") < 5, true, false) | |||
</pre> | |||
== Mathematical functions == | == Mathematical functions == | ||
Line 12: | Line 30: | ||
!'''Function''' | !'''Function''' | ||
! '''Description''' | ! '''Description''' | ||
|- | |||
||Abs | |||
||Returns absolute value of given number. Example: | |||
<pre> | |||
Abs(-5) | |||
Returns: 5 | |||
</pre> | |||
|- | |- | ||
||Ceiling | ||Ceiling | ||
Line 20: | Line 45: | ||
|| | || | ||
Returns given value rounded to the nearest equal or smaller integer. The data type should be one of the numeric data types. If the value is null, then the result is also null. | Returns given value rounded to the nearest equal or smaller integer. The data type should be one of the numeric data types. If the value is null, then the result is also null. | ||
|- | |||
||Log | |||
||Returns given logarithm of given number. The first parameter is the base number and the second parameter is number to take the logarithm from. Example: | |||
<pre> | |||
Log(10, 1000) | |||
Returns: 3 | |||
Log(2, 16) | |||
Returns: 4 | |||
</pre> | |||
|- | |||
||Mod | |||
||Returns remainder of division operation of given numbers. The first parameter is the dividend and the second parameter is the divisor. Example: | |||
<pre> | |||
Mod(10, 4) | |||
Returns: 2 | |||
</pre> | |||
|- | |||
||Power | |||
||Returns given power of given number. First parameter is the base number and second is the exponent: Example: | |||
<pre> | |||
Power(2, 4) | |||
Returns: 16 | |||
</pre> | |||
|- | |||
||Rand | |||
||Returns random number between 0 and 1, where 0 is included and 1 is excluded. It's possible that when getting random for multiple columns, the same row may have the same random number. Example: | |||
<pre> | |||
Rand() | |||
Returns: 0.78452600 (for example) | |||
</pre> | |||
|- | |||
||Round | |||
|| | |||
Rounds given number to given precision. Parameters: | |||
* '''number''': Number to round. | |||
* '''precision''' (optional): Number of digits to round the given number to. When positive value, the given number is rounded to the corresponding number of decimals. When zero or omitted, the given number is rounded to the nearest integer. When negative value, the given number is rounded to the corresponding number of digits left of the decimal point (e.g. tens, hundreds, etc.). | |||
Examples: | |||
<pre> | |||
Round(123.456) | |||
Returns 123 | |||
Round(123.456, 2) | |||
Returns 123.46 | |||
Round(123.456, -1) | |||
Returns 120 | |||
Round(123.456, 0) | |||
Returns 123 | |||
</pre> | |||
|- | |||
||Sign | |||
||Return sign of given number, i.e., returns 1 (when number is a positive number) or 0 (when number is a negative number or zero). Example: | |||
<pre> | |||
Sign(-2) | |||
Returns: -1 | |||
</pre> | |||
|- | |||
||Sqrt | |||
||Return square root of given number. Example: | |||
<pre> | |||
Sqrt(9) | |||
Returns: 3 | |||
</pre> | |||
|} | |} | ||
Line 26: | Line 117: | ||
{| class="wikitable" | {| class="wikitable" | ||
!'''Function''' | !'''Function''' | ||
! '''Description''' | ! '''Description''' | ||
|- | |||
||Year | |||
||Returns the year of given timestamp. | |||
|- | |- | ||
|| | ||Quarter | ||
|| | ||Returns the quarter (1-4) of given timestamp. | ||
|- | |||
||Month | |||
||Returns the months part (1-12) of given timestamp. | |||
|- | |- | ||
||Day | ||Day | ||
Line 44: | Line 137: | ||
||Hour | ||Hour | ||
||Returns the hours part (0-59) of given timestamp. | ||Returns the hours part (0-59) of given timestamp. | ||
|- | |||
||Minute | |||
||Returns the minutes part (0-59) of given timestamp. | |||
|- | |||
||Second | |||
||Returns the seconds part (0-59) of given timestamp. | |||
|- | |- | ||
||Millisecond | ||Millisecond | ||
||Returns the milliseconds part (0-999) of given timestamp. | ||Returns the milliseconds part (0-999) of given timestamp. | ||
|- | |- | ||
|| | ||DayOfWeek | ||
||Returns the | ||Returns the day of week for given timestamp as number. Monday is 1, Tuesday is 2, ... and Sunday is 7. Note that Snowflake's WEEK_OF_YEAR_POLICY and WEEK_START session parameters affect the behavior. | ||
<pre> | |||
DayOfWeek("2023-07-10 00:00:00.000") | |||
Returns: 1 | |||
</pre> | |||
|- | |- | ||
|| | ||DayOfYear | ||
||Returns the | ||Returns the day of year for given timestamp. The 1st of January is 1, the 2nd of January is 2 and so on. The value can be between 1 and 366. | ||
<pre> | |||
DayOfYear("2023-01-01 00:00:00.000") | |||
Returns: 1 | |||
DayOfYear("2023-12-31 00:00:00.000") | |||
Returns: 365 | |||
</pre> | |||
|- | |- | ||
|| | ||WeekOfYear | ||
||Returns the | ||Returns the ISO week number for given timestamp. The value can be between 1 and 53. The 1st of January is in the week 1 if it's Monday, Tuesday, Wednesday or Thursday. If the 1st of January is Friday, Saturday or Sunday, week 1 is the next week. | ||
<pre> | |||
WeekOfYear("2023-01-01 00:00:00.000") | |||
Returns: 52 | |||
WeekOfYear("2023-01-02 00:00:00.000") | |||
Returns: 1 | |||
</pre> | |||
|- | |- | ||
|| | ||DateAdd | ||
|| | || | ||
Adds a duration to a date, i.e., moves the date back or forth in time. The duration is specified as integer and the unit can be chosen from variety of options. The duration can also be a negative number for moving to an earlier time. | |||
Parameters: | |||
# '''time unit''': Unit of time that the duration number represents. Supported values are '''year''', '''quarter''', '''month''', '''week''', '''day''', '''hour''', '''minute''', '''second''' and '''millisecond'''. | |||
# '''duration''': Duration as number to add to the date. This needs to be integer, so decimal numbers cannot be used. | |||
# '''date''': Date to move. | |||
<pre> | |||
DateAdd("month", 1, "01/01/2022") | |||
Returns 1st of February 2022 at midnight. | |||
DateAdd("year", -2, "01/01/2022") | |||
Returns 1st of January 2020 at midnight. | |||
DateAdd("hour", 12, "01/01/2022") | |||
Returns 1st of January 2022 at 12 o'clock. | |||
</pre> | |||
|- | |- | ||
|| | ||DateDiff | ||
|| | || | ||
Calculates how many of the specified date part boundaries there are between the specified dates. | |||
Parameters: | |||
# '''time unit''': Time unit in which the difference is calculated between the dates. Supported values are '''year''', '''quarter''', '''month''', '''week''', '''day''', '''hour''', '''minute''', '''second''' and '''millisecond''' | |||
# '''start date''': Starting timestamp. | |||
# '''end date''': Ending timestamp. | |||
|- | |||
||DurationBetweenDates | |||
||Calculates duration between the specified dates and returns the duration as float. This function gives more precise result than the DateDiff function that complies with the traditional logic in the SQL language. Parameters: | |||
# '''time unit''': Time unit in which the duration is returned. Supported values are '''year''', '''quarter''', '''month''', '''week''', '''day''', '''hour''', '''minute''', '''second''' and '''millisecond'''. | |||
# '''start date''': Starting timestamp. | |||
# '''end date''': Ending timestamp. | |||
# '''business calendar''': When a business calendar is provided as this parameter, the calendar is used to calculate the duration. If ''null'' is provided, no business calendar is used in the calculation. Business calendars are not supported by SQL Server calculation. More information how to use [[Business_Calendar#Business_Calendars_in_SQL_Expressions|business calendars in SQL expressions]]. | |||
Example: Calculated duration between two dates: | |||
<pre> | |||
DurationBetweenDates("hour", Column("date1"), Column("date2")) | |||
</pre> | |||
Example: Calculated duration between two dates using model's business calendar: | |||
<pre> | <pre> | ||
DurationBetweenDates("hour", Column("date1"), Column("date2"), #expr{_model.DefaultCalendar()}) | |||
</pre> | </pre> | ||
|- | |- | ||
|| | ||TruncateDate | ||
|| | || | ||
Truncates given date to given time unit. The truncation gives the timestamp of the beginning of the period defined by the time unit (logic thus resembles the ''floor'' function for numbers). | |||
Parameters: | |||
# '''date''': Date to be truncated. | |||
# '''time unit''': Time unit to which the date is truncated. Supported values are '''year''', '''halfyear''', '''quarter''', '''month''', '''week''', '''day''', '''hour''', '''minute''', '''second''', and '''millisecond'''. For the ''week'' unit, SQL Server or Snowflake settings (depending where the processing is done) determine the first day of the week (usually Sunday or Monday). | |||
Examples: | |||
<pre> | <pre> | ||
TruncateDate(Column("MyDateColumn"), "year") | |||
Returns | Returns timestamp representing beginning of a year, e.g. 2022-01-01 00:00:00. | ||
TruncateDate(Column("MyDateColumn"), "day") | |||
Returns timestamp representing beginning of a day, e.g. 2022-04-16 00:00:00. | |||
</pre> | |||
|- | |||
||ToTimestamp | |||
||Converts string into corresponding date type of value. If the conversion fails, an exception is thrown. Parameters: | |||
# '''expression''': Expression (evaluating to a string value) converted into date. | |||
# '''format''': Specifies the format of the string (more information about formats: https://docs.snowflake.com/en/sql-reference/functions-conversion#date-and-time-formats-in-conversion-functions) | |||
Examples: | |||
<pre> | |||
ToTimeStamp("2023-09-15 16:31:01", "YYYY-MM-DD HH24:MI:SS") | |||
ToTimeStamp("01.12.2023 09:14:58.849 PM", "DD.MM.YYYY HH12:MI:SS.FF3 AM") | |||
</pre> | </pre> | ||
|- | |- | ||
|| | ||TryToTimestamp | ||
|| | || | ||
Tries to converts string into corresponding date type of value. If the conversion fails, null is returned. Parameters: | |||
# '''expression''': Expression (evaluating to a string value) converted into date. | |||
# '''format''': Specifies the format of the string (more information about formats: https://docs.snowflake.com/en/sql-reference/functions-conversion#date-and-time-formats-in-conversion-functions) | |||
Example: Successful date conversion: | |||
<pre> | |||
TryToTimeStamp("2023-09-15 16:31:01", "YYYY-MM-DD HH24:MI:SS") | |||
</pre> | |||
Example: Returns null because formatting doesn't match with the data: | |||
<pre> | |||
TryToTimeStamp("2023-09-15", "DD.MM.YYYY") | |||
</pre> | |||
|- | |||
||UnixMilliseconds | |||
||Returns the time as Unix compatible milliseconds (i.e., milliseconds from the first of January 1970 midnight UTC). | |||
<pre> | |||
UnixMilliseconds("2024-06-19 21:10:00.123") | |||
Returns: 1718831400123 | |||
</pre> | |||
|} | |||
== String functions == | |||
{| class="wikitable" | |||
!'''Function''' | |||
! '''Description''' | |||
|- | |||
||Char | |||
||Converts Unicode code into corresponding character. If an invalid code point is specified, an error is returned. | |||
Examples: | |||
<pre> | |||
Char(65) | |||
Returns: A | |||
Concat(Char(52), Char(53)) | |||
Returns: 45 | |||
</pre> | |||
|- | |||
||CharIndex | |||
||Returns the starting position of the first occurrence of the first argument in the second argument. The first character has the starting position 1. Returns 0 if the search doesn't find any results. | |||
Parameters: | |||
* '''Target expression''': String where to search from. | |||
* '''Searched expression''': String that is searched from the other string. | |||
* '''Start index''' (optional): Number for the position from where to start the search. Value '''1''' is the start position. | |||
Examples: | |||
<pre> | <pre> | ||
CharIndex("c", "abcdefg") | |||
Returns: 3 | |||
CharIndex("ef", "abcdefg") | |||
Returns: 5 | |||
CharIndex("fe", "abcdefg") | |||
Returns: 0 | |||
CharIndex("c", "abcdefgc", 4) | |||
Returns: 8 | |||
</pre> | </pre> | ||
|- | |- | ||
Line 112: | Line 323: | ||
</pre> | </pre> | ||
|- | |- | ||
|| | ||Length | ||
|| | |||
Returns length of a given string, i.e., the number of characters in a string. | |||
Examples: | |||
<pre> | |||
Length("test") | |||
Returns: 4 | |||
Length("long text ending to space ") | |||
Returns: 26 | |||
Length(Column("MyData")) | |||
</pre> | |||
|- | |||
||Like | |||
|| | |||
Determines whether given string matches given pattern, and returns either ''true'' (for match) or ''false'' (for no match). Following special characters can be used in the pattern: | |||
* '''% (percentage)''': for matching any sequence of characters | |||
* '''_ (underscore)''': for matching any single character | |||
Parameters: | |||
* String to find matches from | |||
* Search pattern | |||
* Escape character (optional) | |||
When the percentage and underscore characters need to be used as normal characters in the search pattern (not as special characters), escaping for them is needed. By default, there is no escape character available, so it needs to be defined using the 3rd parameter. Any character can be used as the escape character, but it might be a good practice to use the backslash (\). See below an example of using the escape character. | |||
The comparison is case-sensitive for Snowflake, and for SQL Server it is dependent on the server collation. | |||
The Like function can be used with Where and CaseWhen functions which accept boolean values. | |||
Examples: | |||
<pre> | |||
Like(Column("TextData"), "a%") | |||
Returns true for all rows starting with "a". | |||
Like(Column("TextData"), "%aa%") | |||
Returns true for all rows containing "aa". | |||
Like(Column("TextData"), "abc_") | |||
Returns true for all rows starting with "abc" and having four characters. | |||
Like(Column("TextData"), "__a") | |||
Returns true for all rows ending to "a" and having three characters. | |||
dataFrame.Where(Like(Column("Region"), "t%")) | |||
Filters dataframe rows where column Region starts with "t". | |||
Like("a%b", "%\\%%", "\\") | |||
Searches for a percentage character. | |||
</pre> | |||
|- | |||
||Regexp | |||
||Returns true if the subject matches the specified regular expression pattern. First parameter is the subject and second parameter is the regular expression pattern. | |||
Examples: | |||
<pre> | |||
Regexp("abcd", "(ab).*") | |||
Returns: true | |||
Regexp("abcd", ".*(bd).*") | |||
Returns: false | |||
Regexp("1234", "\d") | |||
Returns: false | |||
Regexp("1234", "\\d*") | |||
Returns: true | |||
Regexp("123A4", "\\d*") | |||
Returns: false | |||
</pre> | |||
|- | |||
||Substring | |||
|| | |||
Returns substring of given string, based on given start index and length. Function has following parameters: | |||
* '''expression''': Expression that evaluates to the string where the substring is taken. | |||
* '''start''': Starting index of the substring. Start index 1 is the beginning of the string. | |||
* '''length''': Length of the substring. This parameter is optional, and if left out, the substring spans until the end. | |||
<pre> | |||
Substring("123456789", 5) | |||
Returns: 56789 | |||
Substring("123456789", 5, 3) | |||
Returns: 567 | |||
Substring(Column("MyColumn"), 1, 4) | |||
Returns the first 4 characters of the MyColumn data. | |||
</pre> | |||
|- | |||
||ToLower | |||
||Return string where all the characters of the input string have been converted into lower case characters. | |||
Examples: | |||
<pre> | |||
ToLower("Test") | |||
Returns: test | |||
</pre> | |||
|- | |||
||ToUpper | |||
||Return string where all the characters of the input string have been converted into upper case characters. | |||
Example: | |||
<pre> | |||
ToUpper("Test") | |||
Returns: TEST | |||
</pre> | |||
|- | |||
||Trim | |||
|| | |||
Trims a string, i.e., removes spaces from the beginning and end of a string. Only characters with code 32 are removed, which is the most common space character. | |||
Example: | |||
<pre> | |||
Trim(Column("MyColumn")) | |||
Returns column "MyColumn" where leading and trailing spaces have been removed. | |||
Trim(" test test ") | |||
Returns "test test". | |||
</pre> | |||
|- | |||
||Unicode | |||
||Returns Unicode code for the first character in the string. If the string is empty, a value of 0 is returned. | |||
Examples: | |||
<pre> | |||
Unicode("A") | |||
Returns: 65 | |||
Unicode("45") | |||
Returns: 52 | |||
</pre> | |||
|} | |||
== Rank functions == | |||
{| class="wikitable" | |||
!'''Function''' | |||
! '''Description''' | |||
|- | |||
||FirstValue | |||
||Returns the first value generated by an expression within an ordered group of values. Parameters: | |||
# '''expression''': Expression that is evaluated for each row. | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''order by''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
# '''window frame''': See [[SQL_Expressions#Window_functions|window frame configuration]]. | |||
Example: Get the case start time: | |||
<pre> | |||
FirstValue(Timestamp, [TimeStamp, EventType], [true, true], [CaseId]) | |||
</pre> | |||
|- | |||
||LastValue | |||
||Returns the last value generated by an expression within an ordered group of values. Parameters: | |||
# '''expression''': Expression that is evaluated for each row. | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
# '''window frame''': See [[SQL_Expressions#Window_functions|window frame configuration]]. | |||
Example: Get the case end time: | |||
<pre> | |||
LastValue(TimeStamp, [TimeStamp, EventType], [true, true], [CaseId]) | |||
</pre> | |||
|- | |||
||NthValue | |||
||Returns the nth value generated by an expression within an ordered group of values. Parameters: | |||
# '''expression''': Expression that is evaluated for each row. | |||
# '''n''': Defines the position where to get the value. Must be between 1 and 1000. | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
# '''window frame''': See [[SQL_Expressions#Window_functions|window frame configuration]]. | |||
Example: Get the timestamp of the 3rd event in a case: | |||
<pre> | |||
NthValue(TimeStamp, 2, [TimeStamp, EventType], [true, true], [CaseId]) | |||
</pre> | |||
|- | |||
||Lag | |||
||Similar to the Lead function, except refers to the previous row(s) instead of the next row(s). | |||
Example: For events list, get the ''MyAttribute'' attribute value from the two steps previous event: | |||
<pre> | |||
Lag(Column("MyAttribute"), [TimeStamp, EventType], [true, true], [CaseId], 2, "N/A")) | |||
</pre> | |||
|- | |||
||Lead | |||
|| | |||
Returns value that is evaluated by accessing data in the next row. If a column with given columnId already exists in the DataFrame, the new column will overwrite the existing column. | |||
Parameters: | |||
# '''expression''': Expression that is evaluated in the context of the next row(s). | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
# '''offset''' (optional): Number of rows to go forward. Setting a negative offset will refer to the previous row(s) instead of next row(s). Default value is 1. | |||
# '''default''' (optional): Expression to return when the offset goes out of bounds. | |||
Example: For events list, get the next event timestamp: | |||
<pre> | |||
Lead(TimeStamp, [TimeStamp, EventType], [true, true], [CaseId])) | |||
</pre> | |||
|- | |||
||RowNumber | |||
||Returns a unique row number for each row within an ordered group of values. The row number starts at 1 and continues up sequentially. Parameters: | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
Example: Returns the row number of the row when the rows are sorted in an ascending order based on the Revenue column: | |||
<pre> | |||
RowNumber([Column("Revenue")]) | |||
</pre> | |||
Same as the previous, except that the row numbers are also partitioned by CaseId column: | |||
<pre> | |||
RowNumber([Column("Revenue")], null, [Column("CaseId")]) | |||
</pre> | |||
|- | |||
||Rank | |||
||Returns the rank of a value within an ordered group of values. The rank value starts at 1 and continues up sequentially. If two values are the same, they have the same rank. Parameters: | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
Example: Returns the rank of the row when the rows are sorted in an ascending order based on the value of Revenue column: | |||
<pre> | |||
Rank([Column("Revenue")]) | |||
</pre> | |||
Same as the previous, except that the ranks are also partitioned by CaseId column. | |||
<pre> | |||
Rank([Column("Revenue")], null, [Column("CaseId")]) | |||
</pre> | |||
|- | |||
||DenseRank | |||
||Returns the rank of a value within a group of values, without gaps in the ranks. The rank value starts at 1 and continues up sequentially. If two values are the same, they have the same rank. Parameters: | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
# '''window frame''': See [[SQL_Expressions#Window_functions|window frame configuration]]. | |||
Example: Returns the dense rank of the row when the rows are sorted in an ascending order based on the value of "Revenue"-column: | |||
<pre> | |||
DenseRank([Column("Revenue")]) | |||
</pre> | |||
Same as the previous, except that the dense ranks are also partitioned by column "CaseId". | |||
<pre> | |||
DenseRank([Column("Revenue")], null, [Column("CaseId")]) | |||
</pre> | |||
|- | |||
||NTile | |||
||Divides an ordered dataset equally into defined number of buckets. Buckets are sequentially numbered 1 through the constant value. Parameters: | |||
# '''buckets count''': Number of buckets. Must be a positive integer. | |||
# '''order by''': Array of column(s) to order the rows within each partition. | |||
# '''ascending/descending order''': Array of boolean values, where each value indicates whether to sort the corresponding column in ascending (true) or descending (false) order. By default, it's the ascending order. The length of the array must be equal to the length of previous parameter (''orderBy''). | |||
# '''partition by''': Array of column(s) to partition the rows. Partitioning divides the dataset into distinct groups which are each sorted separately. When referring to the next row(s), partitions cannot see each others. | |||
Example: Returns the bucket number of the row when the rows are sorted in an ascending order based on the value of Revenue column: | |||
<pre> | |||
NTile([Column("Revenue")]) | |||
</pre> | |||
Same as the previous, except that the buckets are also partitioned by column CaseId column: | |||
<pre> | |||
NTile([Column("Revenue")], null, [Column("CaseId")]) | |||
</pre> | |||
|} | |||
== Window functions == | |||
The window functions are based on functionality provided by Snowflake (https://docs.snowflake.com/en/user-guide/functions-window-using) to aggregations for sliding or cumulative window of rows. To use the window functions, typically the following parameters need to be defined: | |||
# '''expression''': Expression that is calculated for each row and the window operation is done for the results. | |||
# '''order by''': Array of expressions to order the rows for the window operation. | |||
# '''ascending''': Array of boolean values to determine whether the ordering is done in ascending (''true'') or descending (''false'') order. | |||
# '''partition by''': Array of expressions to divide the data into different parts. The windowing operation is done separately for each part. | |||
# '''window frame''': The ''window frame'' parameter is a dictionary with properties described below. | |||
Window frame dictionary parameters: | |||
* '''Sliding''': Used for the sliding window frame, i.e., a fixed number or rows in relation to the current row. | |||
** '''Preceding''': Number of preceding rows included into the window. If null, all the preceding rows in the partition are included. Default value is null. | |||
** '''Following''': Number of following rows included into the window. If null, all the following rows in the partition are included. Default value is null. | |||
* '''Cumulative''': Used for the cumulative window frame, i.e., calculate all rows either from start or end. | |||
** '''UseRange''' (boolean): Should range (''true'') be used instead of rows (''false''). When the range is used, the window consists of all rows having same value in the sorting order. Default value is false. | |||
** '''Following''' (boolean): Whether to calculate the cumulative values for preceding rows (''true'') or following rows (''false''). Default value is false. | |||
{| class="wikitable" | |||
!'''Function''' | |||
! '''Description''' | |||
|- | |||
||Average | |||
||Calculate average of values within given window. Parameters: | |||
# '''expression''': Expression (evaluated for each row) for which average is calculated. | |||
# '''order by''': See above. | |||
# '''ascending''': See above. | |||
# '''partition by''': See above. | |||
# '''window frame''': See above. | |||
Example: Calculates average of values of Revenue column within a sliding window of 1 rows preceding and following the current row where window is partitioned by Id column and rows ordered by Year column: | |||
<pre> | |||
Average(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Sliding": #{ "Preceding": 1, "Following": 1 }}) | |||
</pre> | |||
|- | |||
||Max | |||
|| | |||
Calculate maximum value within given window. Parameters: | |||
# '''expression''': Expression (evaluated for each row) for which maximum is calculated. | |||
# '''order by''': See above. | |||
# '''ascending''': See above. | |||
# '''partition by''':See above. | |||
# '''window frame''': See above. | |||
Example: Calculates the maximum value of Revenue column within a sliding window of 3 rows preceding and the current row where window is partitioned by Id column and rows ordered by Year column: | |||
<pre> | |||
Max(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Sliding": #{ "Preceding": 3 }}) | |||
</pre> | |||
|- | |||
||Median | |||
||Calculate median value within given window. Parameters: | |||
# '''expression''': Expression (evaluated for each row) for which median is calculated. | |||
# '''partition by''': Window configuration parameter. | |||
Examples: Calculates median value of Revenue column partitioned by Id column: | |||
<pre> | |||
Median(Column("Revenue"), [Column("Id")]) | |||
</pre> | |||
|- | |||
||Min | |||
||Calculate minimum value within given window. Parameters: | |||
# '''expression''': Expression (evaluated for each row) for which minimum is calculated. | |||
# '''order by''': See above. | |||
# '''ascending''': See above. | |||
# '''partition by''': See above. | |||
# '''window frame''': See above. | |||
Example: Calculates the minimum value of Revenue column within a cumulative window from the beginning where window is partitioned by Id column and rows ordered by Year column: | |||
<pre> | |||
Min(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Cumulative": #{ "UseRange": false, "Following": false }}) | |||
</pre> | |||
|- | |||
||Sum | |||
||Calculate sum of values within given window. Returns the minimum value within an ordered group of values configured using window parameters. | |||
Parameters: | |||
# '''expression''': Expression (evaluated for each row) for which sum is calculated. | |||
# '''order by''': See above. | |||
# '''ascending''': See above. | |||
# '''partition by''': See above. | |||
# '''window frame''': See above. | |||
Example: Calculates sum of the values of Revenue column within a cumulative window from the current row to the end where window is partitioned by Id column and rows ordered by Year column: | |||
<pre> | |||
Sum(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Sliding": #{ "UseRange": false, "Following": true }}) | |||
</pre> | |||
|} | |||
== Array functions == | |||
{| class="wikitable" | |||
!'''Function''' | |||
! '''Description''' | |||
|- | |||
||ArrayConcat | |||
||Combines two arrays together. Example: | |||
<pre> | |||
ArrayConcat(["PO Item Created"], ["PR Created", "Invoice Payment"]) | |||
Returns: ["PO Item Created", "PR Created", "Invoice Payment"] | |||
</pre> | |||
|- | |||
||ArrayCompact | |||
||Removes nulls from an array. Example: | |||
<pre> | |||
ArrayCompact([null, "a", null, "b"]) | |||
Returns: ["a", "b"] | |||
</pre> | |||
|- | |||
||ArrayContains | |||
||Checks whether given array contains given item. Returns true if item is found, or false if not found. Example: | |||
<pre> | |||
ArrayContains("PR Created", ["PO Item Created", "PR Created", "Invoice Payment"]) | |||
Returns: true | |||
</pre> | |||
|- | |||
||ArrayDistinct | |||
||Returns an array containing all distinct items of given array. The order of items in the returned array is not defined and may even vary between calls. Example: | |||
<pre> | |||
ArrayDistinct([null, "a", "b", "b", "c", null]) | |||
Returns: ["a", "b", "c", null] | |||
</pre> | |||
|- | |||
||ArrayExcept | |||
||Returns an array containing items of the first array, except those that are present in the second array. The order of items in the returned array is not defined and may even vary between calls. Example: | |||
<pre> | |||
ArrayIntersect(["A", "B", "C"], ["B", "D"]) | |||
Returns: ["A", "C"] | |||
</pre> | |||
|- | |||
||ArrayIntersect | |||
||Returns an array containing all items that appear in both parameter arrays. The order of items in the returned array is not defined and may even vary between calls. Example: | |||
<pre> | |||
ArrayIntersect(["Invoice Created", "PO Item Created"], ["PO Item Created", "PR Created", "Invoice Payment"]) | |||
Returns: ["PO Item Created"] | |||
</pre> | |||
|- | |||
||ArraysOverlap | |||
||Returna ''true'' if both arrays have at least one common item, or otherwise returns ''false''. | |||
Examples: | |||
<pre> | |||
ArraysOverlap(["PR Created"], ["PO Created", "PR Created", "Invoice Payment"]) | |||
Returns: true | |||
ArraysOverlap(["Invoice Created", "Invoice Payment"], ["PO Created", "PR Created"]) | |||
Returns: false | |||
</pre> | |||
|- | |||
||ArrayPosition | |||
||Finds index of first occurrence of given item in given array. Returns null if the item doesn't exist in the array. Parameters: | |||
# '''item''': Item to search for. | |||
# '''array''': Array to search the item from. | |||
Example: | |||
<pre> | |||
ArrayPosition("PR Created", ["PO Item Created", "PR Created", "Invoice Payment"]) | |||
Returns: 1 | |||
</pre> | |||
|- | |||
||ArraySize | |||
||Returns number of items in given array. Example: | |||
<pre> | |||
ArraySize(["PO Item Created", "PR Created", "Invoice Payment"]) | |||
Returns: 3 | |||
</pre> | |||
|- | |||
||ArraySlice | |||
||Returns a specified range (or a slice) of given array. Parameters: | |||
# '''array''': Array to be sliced. | |||
# '''from index''': Starting position in the array. First item position is 0. If the value is negative, it's relative to the end of the array, e.g., -2 refers to the second last item. | |||
# '''to index''': Ending position in the array. The item in the ending position is not included. If the value is negative, it's relative to the end of the array, e.g., -2 refers to the second last item. | |||
Example: | |||
<pre> | |||
ArraySlice([0, 1, 2, 3, 4, 5], 2, 4) | |||
Returns: [2, 3] | |||
</pre> | |||
|- | |||
||ArrayToString | |||
||Converts array into string by converting all individual items into strings and concatenating them using the separator character. Parameters: | |||
* '''array''': Array to convert into a string. | |||
* '''separator''': String to put between each item. | |||
Example: | |||
<pre> | |||
ArrayToString(["PO Item Created", "PR Created","Invoice Payment"], ", ") | |||
Returns: PO Item Created, PR Created, Invoice Payment | |||
</pre> | |||
|- | |||
||Split | |||
||Splits given string with given separator and returns the result in an array of strings. Parameters: | |||
# '''string''': String to split. | |||
# '''separator''': String separator to split the string by. | |||
Example: | |||
<pre> | |||
Split("127.0.0.1", ".") | |||
Returns: ["127", "0", "0", "1"] | |||
</pre> | |||
|} | |||
== Conformance checking functions == | |||
{| class="wikitable" | |||
!'''Function''' | |||
! '''Description''' | |||
|- | |||
||IsConformant | |||
|| | || | ||
Checks conformance of a variation against a BPMN model (design model) and returns true if the variation conform to the model. | |||
Parameters: | |||
# Event type names (string array): Array of event type names describing the variation to be checked. | |||
# BPMN model (string): BPMN model as BPMN XML 2.0 format. | |||
# ignoreMissingInModel (boolean): When ''true'', the nonconformance is ignored where the nonconformant event type doesn't exist in the BPMN model. Default value is ''false''. | |||
# ignoreIncompleteCases (boolean): When ''true'', variations that don't reach the end of the BPMN model, are considered nonconformant. Default value is ''false''. | |||
Examples: | Examples: | ||
Using the Array aggregation to evaluate variation: | |||
<pre> | |||
IsConformant(AggregateFrom(Events, #{{ "Function": "Array", "Ordering": [TimeStamp, EventType] }}, EventType), "<?xml...") | |||
</pre> | |||
Using constant array as variation: | |||
<pre> | |||
IsConformant(["Sales order received", "Delivery planned", "Confirmed delivery date"], "<?xml...") | |||
</pre> | |||
|- | |||
||ConformanceViolationsFlows | |||
|| | |||
Checks conformance of variation against BPMN model (design model) and returns the first flow related violation if the variation was invalid. Returns null, if the variation is conformant. If the variation is nonconformant, returns a json string with properties '''s''' ("start") and '''e''' ("end") for the flow start and end event type names. When the flow is a starting flow, the ''s'' property is null, and when the flow is an ending flow, the ''e'' property is null. | |||
Parameters: | |||
# Event type names (string array): Array of event type names describing the variation to be checked. | |||
# BPMN model (string): BPMN model as BPMN XML 2.0 format, working as the design model in the conformance checking. | |||
# Parameters object with following parameters supported: | |||
#* ignoreMissingInModel (boolean): When ''true'', those nonconformances are ignored where the nonconformant event type doesn't exist in the BPMN model at all. Default value is ''false''. | |||
#* ignoreIncompleteCases (boolean): When ''false'', variations that don't reach the end of the BPMN model, are considered nonconformant. Default value is ''false''. | |||
#* includeEventIndex: Whether to include property '''i''' containing the zero-based index of the violating flow (index zero is the starting flow). Default value is ''false''. | |||
Example: | |||
<pre> | |||
ConformanceViolationsFlows(["Sales order received", "Delivery planned", "Confirmed delivery date"], "<?xml...", #{ "IncludeEventIndex": true} ) | |||
</pre> | |||
Example result: | |||
<pre> | <pre> | ||
{ | |||
"s": "Invoice created", | |||
"e": "Goods receipt", | |||
"i": 3 | |||
} | |||
</pre> | </pre> | ||
|} | |} | ||
== AggregateFrom function == | == AggregateFrom function == | ||
Aggregates a value from related objects, where there can be several of them a source object (e.g., when going from cases to events). See the type of relations between objects in the [[Process_Mining_Concepts|process mining concepts]]. The diagram shows that when going to direction where the target objects has count of N, AggregateFrom function needs to be used. | |||
Parameters: | |||
# '''Aggregation level''': Aggregation level to aggregate from | # '''Aggregation level''': Aggregation level to aggregate from. Also an expression can be used to produce the dataset. Available aggregations are listed in [[#Process_mining_objects,_properties_and_aggregations|table below]] (cardinality N). | ||
# '''Aggregation function''': Aggregation function or object definition. | # '''Aggregation function''': Aggregation function or object definition ([[#Aggregation_operations|see more]]). | ||
# '''Expression''' | # '''Expression''': Expression to generate the values to be aggregated. Default value is ''null''. | ||
# '''Filter''': Optional filter to apply prior to performing | # '''Filter''': Optional filter to apply prior to performing the aggregation. Filter is given as JSON like syntax using dictionaries, arrays and scalar values. | ||
Example for EventTypes: | Example for EventTypes: | ||
Line 146: | Line 870: | ||
Returns the number of cases having the same variation for every case. | Returns the number of cases having the same variation for every case. | ||
DateDiff("Seconds", AggregateFrom(Events.Where(Column("EventType") == "Sales Order"), "Min", Column("TimeStamp")), AggregateFrom(Events.Where(Column("EventType") == "Invoice"), "Max", Column("Timestamp"))) | |||
Returns the duration in seconds between the first occurrence of "Sales Order"-event type and the last occurrence of "Invoice" event type for each case. | Returns the duration in seconds between the first occurrence of "Sales Order"-event type and the last occurrence of "Invoice" event type for each case. | ||
</pre> | </pre> | ||
Line 158: | Line 882: | ||
== GetValueFrom function == | == GetValueFrom function == | ||
Retrieves a value from a related object that there may be only one for a source object (e.g., when going from events to cases). See the type of relations between objects in the [[Process_Mining_Concepts|process mining concepts]]. The diagram shows that when going to direction where the target objects has count of 1, GetValueFrom function needs to be used. | |||
# ''' | |||
# ''' | Parameters: | ||
# ''' | # '''Aggregation level''': Aggregation level to aggregate from. This includes possible additional data frame expressions to prepare the aggregation level. Available aggregations are listed in [[#Process_mining_objects,_properties_and_aggregations|table below]] (cardinality 1). | ||
# '''Expression''': Expression to evaluate in given aggregation level to get the returned value. | |||
# '''Filter''': Optional filter to apply prior to performing expression evaluation. Filter is given as dictionary following the [[Filtering_in_QPR_ProcessAnalyzer_Queries|JSON filter syntax]]. | |||
Examples as measure expression for events: | Examples as measure expression for events: | ||
Line 180: | Line 906: | ||
Returns cases with their variations where only "Shipment" and "Invoice" event types are taken into account. | Returns cases with their variations where only "Shipment" and "Invoice" event types are taken into account. | ||
</pre> | </pre> | ||
== Other functions == | |||
{| class="wikitable" | |||
!'''Function''' | |||
! '''Description''' | |||
|- | |||
||CaseWhen | |||
|| | |||
Goes through conditions and returns a value when the first condition is true, similar to an if-then-else structure. Once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the else expression. If the else expression is not defined (i.e. there are even number of parameters), null value is returned. | |||
Consists of any number of pairs of condition and value expressions followed by an optional else expression. The odd parameters are the conditions and the even parameters are the return values. | |||
<pre> | |||
CaseWhen(Column("a") == null, 1, Column("a") < 1.0, 2, 3) | |||
Returns 1 if the value of column "a" is null. | |||
Returns 2 if the value of column "a" is less than 1.0. | |||
Returns 3 otherwise. | |||
</pre> | |||
Returns given value rounded to the nearest equal or larger integer. The data type should be one of the numeric data types. If the value is null, then the result is also null. | |||
|- | |||
||Coalesce | |||
|| | |||
Returns the first non-null parameter. There can be any number of parameters. If all parameters are null, returns null. | |||
<pre> | |||
Coalesce(null, 3, 2) | |||
Returns 3. | |||
Coalesce(Column("column1"), "N/A") | |||
Returns column "column1" value, except replaces nulls with "N/A". | |||
</pre> | |||
|- | |||
||Column | |||
|| | |||
Return the value of given column. | |||
<pre> | |||
Column("column1") | |||
Column("My Column 2") | |||
</pre> | |||
|- | |||
||GetObjectProperty | |||
|| | |||
Returns a sub-object in a key from an object. The first parameter is the object and the second parameter is the key path. This function returns an object (that's why string values contain quotation marks). This function doesn't work in SQL Server. | |||
Examples: | |||
<pre> | |||
GetObjectProperty(#{"key1": "value 1", "key2": 3 }, "key1") | |||
Returns: "value 1" | |||
GetObjectProperty(#{"key1": ["value 1", "value 2"] }, "key1[1]") | |||
Returns: "value 2" | |||
GetObjectProperty(#{"key1": [ #{"key2": 3 }, #{"key2": 4 } ] }, "key1[0].key2") | |||
Returns: 3 | |||
GetObjectProperty(#{"key1": [ #{"key2": 3 }, #{"key2": 4 } ] }, "key1[0]") | |||
Returns: { "key2": 3 } | |||
</pre> | |||
|- | |||
||GetJsonValue | |||
|| | |||
Returns value of key from a json string. The first parameter is the json string and the second parameter is the key path. This function returns a string. This function works both in Snowflake and in SQL Server, but in SQL Server the returned value cannot be a json object. | |||
Examples: | |||
<pre> | |||
GetJsonValue("{\"key1\": \"value 1\", \"key2\": 3 }", "key1") | |||
Returns: value 1 | |||
GetJsonValue("{\"key1\": [\"value 1\", \"value 2\"] }", "key1[1]") | |||
Returns: value 2 | |||
GetJsonValue("{\"key1\": [ {\"key2\": 3 }, {\"key2\": 4 } ] }", "key1[0].key2") | |||
Returns: 3 | |||
GetJsonValue("{\"key1\": [ {\"key2\": 3 }, {\"key2\": 4 } ] }", "key1[0]") | |||
Returns: {"key2":3} (this only works in Snowflake) | |||
</pre> | |||
|- | |||
||In | |||
|| | |||
Returns true, if the test expression (given as the first parameter) matches with any of the other expressions given as parameters (starting from the seconds parameter). | |||
Examples: | |||
<pre> | |||
In(Column("Country"), "Germany", "France", "UK") | |||
Return true if column "Country" is Germany, France or UK. | |||
In("Germany", Column("SourceCountry"), Column("DestinationCountry")) | |||
Returns true, if column SourceCountry or DestinationCountry is Germany (or both columns). | |||
</pre> | |||
|- | |||
||Variable | |||
|| | |||
Gets variable value available in the in-memory expression context where the SQL expression is run. Supports string, number, boolean and date values. In addition, the variable can contain SQL expression which is added as such to the contained SQL expression. | |||
Example of an in-memory expression having SQL expression in the ''Where'' function: | |||
<pre> | |||
let myRegion = "Dallas"; | |||
DatatableById(123) | |||
.SqlDataFrame | |||
.Where(Column("Region") == Variable("myRegion")) | |||
.Collect() | |||
Filters datatable by Region is Dallas. | |||
</pre> | |||
Note that in the above example ''Column("Region") == Variable("myRegion")'' is the SQL expression. | |||
|} | |||
== Process mining objects, properties and aggregations == | |||
The following table summarizes the available process mining objects and which properties and aggregations are available for them. For aggregations, the ''AggregateFrom'' function is used with cardinality N aggregations and the ''GetValueFrom'' function with the cardinality 1 aggregations. | |||
{| class="wikitable" | |||
!'''Object type''' | |||
! '''Properties''' | |||
! '''Aggregations''' (cardinality in parenthesis) | |||
|- | |||
||Cases | |||
|| | |||
* '''CaseId''': Case id. | |||
* All case attributes which can be referred with Column("<AttributeName>") | |||
|| | |||
* Events (N) | |||
* EventTypes (N) | |||
* Variations (1) | |||
* Flows (N) | |||
* FlowOccurrences (N) | |||
* Model (1) | |||
|- | |||
||Events | |||
|| | |||
* '''CaseId''': Case id. | |||
* '''EventType''': Event type name. | |||
* '''Timestamp''': Event timestamp. | |||
* All event attributes which can be referred with Column("<AttributeName>") | |||
|| | |||
* Cases (1) | |||
* EventTypes (N) | |||
* Model (1) | |||
|- | |||
||EventTypes | |||
|| | |||
* '''EventType''': Event type name. | |||
|| | |||
* Cases (N) | |||
* Events (N) | |||
|- | |||
||Variations | |||
|| | |||
* '''Variation''': Variation identifier, which is concatenated event type names separated by separator "#,#". | |||
|| | |||
* Cases (N) | |||
* Events (N) | |||
* EventTypes (N) | |||
* Model (1) | |||
|- | |||
||Flows | |||
|| | |||
* '''FromEventType''': Event type name of the flow start. | |||
* '''ToEventType''': Event type name of the flow end. | |||
|| | |||
* Cases (N) | |||
* FromEventType (1) | |||
* ToEventType (1) | |||
* Model (1) | |||
|- | |||
||FlowOccurrences | |||
|| | |||
* '''CaseId''': Case id. | |||
* '''FromEventType''': Event type name of the flow start. | |||
* '''FromTimeStamp''': Time stamp of the flow start event. | |||
* Starting event attributes can be used when adding text '''From''' to the beginning of the attribute name, e.g., Column("From<AttributeName>") | |||
* '''ToEventType''': Event type name of the flow end. | |||
* '''ToTimeStamp''': Time stamp of the flow end event. | |||
* Ending event attributes when adding text '''To''' to the beginning of the attribute name, e.g., Column("To<AttributeName>") | |||
|| | |||
* Cases (N) | |||
* FromEvent (1) | |||
* ToEvent (1) | |||
* Model (1) | |||
|- | |||
||Model | |||
|| | |||
* '''ModelId''': Model id. | |||
|| | |||
* Cases (N) | |||
* Events (N) | |||
* EventTypes (N) | |||
* Variations (N) | |||
* Flows (N) | |||
* FlowOccurrences (N) | |||
|} | |||
Root expression examples: | |||
For cases (and also events), the case id can be referred using ''CaseId'': | |||
<pre> | |||
Cases.Where(CaseId == "Case_123") | |||
</pre> | |||
Assuming that there is an ''Order Id'' column that is mapped to the CaseId, also the original column name can be used: | |||
<pre> | |||
Cases.Where(Column("Order Id") == "Case_123") | |||
</pre> | |||
For events, the event type can be referred using ''EventType'': | |||
<pre> | |||
Events.Where(EventType == "Order created") | |||
</pre> | |||
Assuming that there is an ''Process step'' column that is mapped to the EventType, also the original column name can be used: | |||
<pre> | |||
Cases.Where(Column("Process step") == "Order created") | |||
</pre> | |||
== Aggregation operations == | |||
Aggregations are used when multiple items or rows are combined into a single value. The following aggregations are available: | |||
* '''Sum''': Sum of the items. Can be used for numbers. | |||
* '''Average''': Average of the items (sum of items divided by count of items). Can be used for numbers. | |||
* '''Median''': The middle value of the sorted data. If there are even number of items, the average of the two middle items. Can be used for numbers. | |||
* '''Min''': Lowest value for numbers, alphabetically first value for texts and earliest value for dates. | |||
* '''Max''': Highest value for numbers, alphabetically last value for texts and last value for dates. | |||
* '''Count''': Number of items. Can be used for any type of data. | |||
* '''CountDistinct''': Number of unique items (i.e., duplicates are removed). Can be used for any type of data. | |||
* '''Percentile''': Percentile value. Can be used for numbers. Can be defined as an object as follows: <pre>#{ "Function": "Percentile", "Percentile": 0.75 }</pre>Percentile aggregation has following parameter: | |||
** '''Percentile''': Percentile value between 0 and 1. | |||
* '''Stdev''': Sample standard deviation. Can be calculated for numbers. | |||
* '''Stdevp''': Population standard deviation. Can be calculated for numbers. | |||
* '''Var''': Sample variance. Can be used for numbers. | |||
* '''Varp''': Population variance. Can be used for numbers. | |||
* '''List''': Combines textual values into a single list in defined order and separated by defined characters. Can be defined as an object as follows: <pre>#{ "Function": "List", "Ordering": [EventType], "Separator": ", ", "Distinct": true }</pre>List aggregation has following parameters: | |||
** '''Ordering''': Array of columns defining how items are ordered. Ordering is supported only when ''Distinct'' is ''false''. | |||
** '''Separator''': Characters separating the combined items. | |||
** '''Distinct''': Whether duplicate items are removed. Default value is ''false''. When showing distinct items, the items are sorted by their values. | |||
* '''Any''': Returns any value of the aggregated values. This is the most efficient aggregation when it's known that there is maximum of one value among the aggregated values. Note that there is no guarantee that the returned value will be the same every time. | |||
* '''Array''': Returns the aggregated values as a single array object. This aggregation has following parameters: | |||
** '''Ordering''': Array of columns defining how items are ordered. | |||
** '''Distinct''': Whether duplicate items are removed. Default value is ''false''. When showing distinct items, the items are sorted by their values. |
Revision as of 14:42, 7 February 2024
SQL expressions are special expressions, which are run in Snowflake by converting them into SQL. Note that SQL expressions are different than the in-memory expressions. In addition to the Big data chart, SQL expressions are used in the Where and WithColumn functions in SQLDataFrames.
It's possible to run in-memory expressions that are embedded into SQL expressions using #expr{} syntax. For examples, see the DurationBetweenDates function.
Operators
Following operators are supported by SQL expressions:
- Arithmetic operators
+ - * / %
- Comparison operators
== < <= > >= !=
- Logical operators
&& || !
Following data types are supported by SQL expressions:
- String
"this is a string"
- Integer
123
- Decimal number
123.45
- Boolean
true false
- Null value
null
- Arrays
[1, 2, 3]
- Objects
#{ "key1": "value1", "key2": 3.7, "key3": false }
To get an item from an array, use following syntax:
Column("arraydata")[2]
To get a value from an object, use following syntax:
Column("objectdata")["key1"]
Expressions resulting in boolean value need to be enclosed to the CaseWhen function. For example, expression Column("MyColumn") < 5 doesn't work, but following does:
CaseWhen(Column("MyColumn") < 5, true, false)
Mathematical functions
Function | Description |
---|---|
Abs | Returns absolute value of given number. Example:
Abs(-5) Returns: 5 |
Ceiling |
Returns given value rounded to the nearest equal or larger integer. The data type should be one of the numeric data types. If the value is null, then the result is also null. |
Floor |
Returns given value rounded to the nearest equal or smaller integer. The data type should be one of the numeric data types. If the value is null, then the result is also null. |
Log | Returns given logarithm of given number. The first parameter is the base number and the second parameter is number to take the logarithm from. Example:
Log(10, 1000) Returns: 3 Log(2, 16) Returns: 4 |
Mod | Returns remainder of division operation of given numbers. The first parameter is the dividend and the second parameter is the divisor. Example:
Mod(10, 4) Returns: 2 |
Power | Returns given power of given number. First parameter is the base number and second is the exponent: Example:
Power(2, 4) Returns: 16 |
Rand | Returns random number between 0 and 1, where 0 is included and 1 is excluded. It's possible that when getting random for multiple columns, the same row may have the same random number. Example:
Rand() Returns: 0.78452600 (for example) |
Round |
Rounds given number to given precision. Parameters:
Examples: Round(123.456) Returns 123 Round(123.456, 2) Returns 123.46 Round(123.456, -1) Returns 120 Round(123.456, 0) Returns 123 |
Sign | Return sign of given number, i.e., returns 1 (when number is a positive number) or 0 (when number is a negative number or zero). Example:
Sign(-2) Returns: -1 |
Sqrt | Return square root of given number. Example:
Sqrt(9) Returns: 3 |
Date functions
Function | Description |
---|---|
Year | Returns the year of given timestamp. |
Quarter | Returns the quarter (1-4) of given timestamp. |
Month | Returns the months part (1-12) of given timestamp. |
Day |
Returns the days of the month (1-31) of given timestamp. Day(Column("DateColumn")) |
Hour | Returns the hours part (0-59) of given timestamp. |
Minute | Returns the minutes part (0-59) of given timestamp. |
Second | Returns the seconds part (0-59) of given timestamp. |
Millisecond | Returns the milliseconds part (0-999) of given timestamp. |
DayOfWeek | Returns the day of week for given timestamp as number. Monday is 1, Tuesday is 2, ... and Sunday is 7. Note that Snowflake's WEEK_OF_YEAR_POLICY and WEEK_START session parameters affect the behavior.
DayOfWeek("2023-07-10 00:00:00.000") Returns: 1 |
DayOfYear | Returns the day of year for given timestamp. The 1st of January is 1, the 2nd of January is 2 and so on. The value can be between 1 and 366.
DayOfYear("2023-01-01 00:00:00.000") Returns: 1 DayOfYear("2023-12-31 00:00:00.000") Returns: 365 |
WeekOfYear | Returns the ISO week number for given timestamp. The value can be between 1 and 53. The 1st of January is in the week 1 if it's Monday, Tuesday, Wednesday or Thursday. If the 1st of January is Friday, Saturday or Sunday, week 1 is the next week.
WeekOfYear("2023-01-01 00:00:00.000") Returns: 52 WeekOfYear("2023-01-02 00:00:00.000") Returns: 1 |
DateAdd |
Adds a duration to a date, i.e., moves the date back or forth in time. The duration is specified as integer and the unit can be chosen from variety of options. The duration can also be a negative number for moving to an earlier time. Parameters:
DateAdd("month", 1, "01/01/2022") Returns 1st of February 2022 at midnight. DateAdd("year", -2, "01/01/2022") Returns 1st of January 2020 at midnight. DateAdd("hour", 12, "01/01/2022") Returns 1st of January 2022 at 12 o'clock. |
DateDiff |
Calculates how many of the specified date part boundaries there are between the specified dates. Parameters:
|
DurationBetweenDates | Calculates duration between the specified dates and returns the duration as float. This function gives more precise result than the DateDiff function that complies with the traditional logic in the SQL language. Parameters:
Example: Calculated duration between two dates: DurationBetweenDates("hour", Column("date1"), Column("date2")) Example: Calculated duration between two dates using model's business calendar: DurationBetweenDates("hour", Column("date1"), Column("date2"), #expr{_model.DefaultCalendar()}) |
TruncateDate |
Truncates given date to given time unit. The truncation gives the timestamp of the beginning of the period defined by the time unit (logic thus resembles the floor function for numbers). Parameters:
Examples: TruncateDate(Column("MyDateColumn"), "year") Returns timestamp representing beginning of a year, e.g. 2022-01-01 00:00:00. TruncateDate(Column("MyDateColumn"), "day") Returns timestamp representing beginning of a day, e.g. 2022-04-16 00:00:00. |
ToTimestamp | Converts string into corresponding date type of value. If the conversion fails, an exception is thrown. Parameters:
Examples: ToTimeStamp("2023-09-15 16:31:01", "YYYY-MM-DD HH24:MI:SS") ToTimeStamp("01.12.2023 09:14:58.849 PM", "DD.MM.YYYY HH12:MI:SS.FF3 AM") |
TryToTimestamp |
Tries to converts string into corresponding date type of value. If the conversion fails, null is returned. Parameters:
Example: Successful date conversion: TryToTimeStamp("2023-09-15 16:31:01", "YYYY-MM-DD HH24:MI:SS") Example: Returns null because formatting doesn't match with the data: TryToTimeStamp("2023-09-15", "DD.MM.YYYY") |
UnixMilliseconds | Returns the time as Unix compatible milliseconds (i.e., milliseconds from the first of January 1970 midnight UTC).
UnixMilliseconds("2024-06-19 21:10:00.123") Returns: 1718831400123 |
String functions
Function | Description |
---|---|
Char | Converts Unicode code into corresponding character. If an invalid code point is specified, an error is returned.
Examples: Char(65) Returns: A Concat(Char(52), Char(53)) Returns: 45 |
CharIndex | Returns the starting position of the first occurrence of the first argument in the second argument. The first character has the starting position 1. Returns 0 if the search doesn't find any results.
Parameters:
Examples: CharIndex("c", "abcdefg") Returns: 3 CharIndex("ef", "abcdefg") Returns: 5 CharIndex("fe", "abcdefg") Returns: 0 CharIndex("c", "abcdefgc", 4) Returns: 8 |
Concat |
Return the concatenated string value of given values. Concat("part 1", "part 2") Returns "part 1part 2" Concat(Column("column1"), " ", Column("column2")) Returns column1 and column2 value concatenated separated by space. |
Length |
Returns length of a given string, i.e., the number of characters in a string. Examples: Length("test") Returns: 4 Length("long text ending to space ") Returns: 26 Length(Column("MyData")) |
Like |
Determines whether given string matches given pattern, and returns either true (for match) or false (for no match). Following special characters can be used in the pattern:
Parameters:
When the percentage and underscore characters need to be used as normal characters in the search pattern (not as special characters), escaping for them is needed. By default, there is no escape character available, so it needs to be defined using the 3rd parameter. Any character can be used as the escape character, but it might be a good practice to use the backslash (\). See below an example of using the escape character. The comparison is case-sensitive for Snowflake, and for SQL Server it is dependent on the server collation. The Like function can be used with Where and CaseWhen functions which accept boolean values. Examples: Like(Column("TextData"), "a%") Returns true for all rows starting with "a". Like(Column("TextData"), "%aa%") Returns true for all rows containing "aa". Like(Column("TextData"), "abc_") Returns true for all rows starting with "abc" and having four characters. Like(Column("TextData"), "__a") Returns true for all rows ending to "a" and having three characters. dataFrame.Where(Like(Column("Region"), "t%")) Filters dataframe rows where column Region starts with "t". Like("a%b", "%\\%%", "\\") Searches for a percentage character. |
Regexp | Returns true if the subject matches the specified regular expression pattern. First parameter is the subject and second parameter is the regular expression pattern.
Examples: Regexp("abcd", "(ab).*") Returns: true Regexp("abcd", ".*(bd).*") Returns: false Regexp("1234", "\d") Returns: false Regexp("1234", "\\d*") Returns: true Regexp("123A4", "\\d*") Returns: false |
Substring |
Returns substring of given string, based on given start index and length. Function has following parameters:
Substring("123456789", 5) Returns: 56789 Substring("123456789", 5, 3) Returns: 567 Substring(Column("MyColumn"), 1, 4) Returns the first 4 characters of the MyColumn data. |
ToLower | Return string where all the characters of the input string have been converted into lower case characters.
Examples: ToLower("Test") Returns: test |
ToUpper | Return string where all the characters of the input string have been converted into upper case characters.
Example: ToUpper("Test") Returns: TEST |
Trim |
Trims a string, i.e., removes spaces from the beginning and end of a string. Only characters with code 32 are removed, which is the most common space character. Example: Trim(Column("MyColumn")) Returns column "MyColumn" where leading and trailing spaces have been removed. Trim(" test test ") Returns "test test". |
Unicode | Returns Unicode code for the first character in the string. If the string is empty, a value of 0 is returned.
Examples: Unicode("A") Returns: 65 Unicode("45") Returns: 52 |
Rank functions
Function | Description |
---|---|
FirstValue | Returns the first value generated by an expression within an ordered group of values. Parameters:
Example: Get the case start time: FirstValue(Timestamp, [TimeStamp, EventType], [true, true], [CaseId]) |
LastValue | Returns the last value generated by an expression within an ordered group of values. Parameters:
Example: Get the case end time: LastValue(TimeStamp, [TimeStamp, EventType], [true, true], [CaseId]) |
NthValue | Returns the nth value generated by an expression within an ordered group of values. Parameters:
Example: Get the timestamp of the 3rd event in a case: NthValue(TimeStamp, 2, [TimeStamp, EventType], [true, true], [CaseId]) |
Lag | Similar to the Lead function, except refers to the previous row(s) instead of the next row(s).
Example: For events list, get the MyAttribute attribute value from the two steps previous event: Lag(Column("MyAttribute"), [TimeStamp, EventType], [true, true], [CaseId], 2, "N/A")) |
Lead |
Returns value that is evaluated by accessing data in the next row. If a column with given columnId already exists in the DataFrame, the new column will overwrite the existing column. Parameters:
Example: For events list, get the next event timestamp: Lead(TimeStamp, [TimeStamp, EventType], [true, true], [CaseId])) |
RowNumber | Returns a unique row number for each row within an ordered group of values. The row number starts at 1 and continues up sequentially. Parameters:
Example: Returns the row number of the row when the rows are sorted in an ascending order based on the Revenue column: RowNumber([Column("Revenue")]) Same as the previous, except that the row numbers are also partitioned by CaseId column: RowNumber([Column("Revenue")], null, [Column("CaseId")]) |
Rank | Returns the rank of a value within an ordered group of values. The rank value starts at 1 and continues up sequentially. If two values are the same, they have the same rank. Parameters:
Example: Returns the rank of the row when the rows are sorted in an ascending order based on the value of Revenue column: Rank([Column("Revenue")]) Same as the previous, except that the ranks are also partitioned by CaseId column. Rank([Column("Revenue")], null, [Column("CaseId")]) |
DenseRank | Returns the rank of a value within a group of values, without gaps in the ranks. The rank value starts at 1 and continues up sequentially. If two values are the same, they have the same rank. Parameters:
Example: Returns the dense rank of the row when the rows are sorted in an ascending order based on the value of "Revenue"-column: DenseRank([Column("Revenue")]) Same as the previous, except that the dense ranks are also partitioned by column "CaseId". DenseRank([Column("Revenue")], null, [Column("CaseId")]) |
NTile | Divides an ordered dataset equally into defined number of buckets. Buckets are sequentially numbered 1 through the constant value. Parameters:
Example: Returns the bucket number of the row when the rows are sorted in an ascending order based on the value of Revenue column: NTile([Column("Revenue")]) Same as the previous, except that the buckets are also partitioned by column CaseId column: NTile([Column("Revenue")], null, [Column("CaseId")]) |
Window functions
The window functions are based on functionality provided by Snowflake (https://docs.snowflake.com/en/user-guide/functions-window-using) to aggregations for sliding or cumulative window of rows. To use the window functions, typically the following parameters need to be defined:
- expression: Expression that is calculated for each row and the window operation is done for the results.
- order by: Array of expressions to order the rows for the window operation.
- ascending: Array of boolean values to determine whether the ordering is done in ascending (true) or descending (false) order.
- partition by: Array of expressions to divide the data into different parts. The windowing operation is done separately for each part.
- window frame: The window frame parameter is a dictionary with properties described below.
Window frame dictionary parameters:
- Sliding: Used for the sliding window frame, i.e., a fixed number or rows in relation to the current row.
- Preceding: Number of preceding rows included into the window. If null, all the preceding rows in the partition are included. Default value is null.
- Following: Number of following rows included into the window. If null, all the following rows in the partition are included. Default value is null.
- Cumulative: Used for the cumulative window frame, i.e., calculate all rows either from start or end.
- UseRange (boolean): Should range (true) be used instead of rows (false). When the range is used, the window consists of all rows having same value in the sorting order. Default value is false.
- Following (boolean): Whether to calculate the cumulative values for preceding rows (true) or following rows (false). Default value is false.
Function | Description |
---|---|
Average | Calculate average of values within given window. Parameters:
Example: Calculates average of values of Revenue column within a sliding window of 1 rows preceding and following the current row where window is partitioned by Id column and rows ordered by Year column: Average(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Sliding": #{ "Preceding": 1, "Following": 1 }}) |
Max |
Calculate maximum value within given window. Parameters:
Example: Calculates the maximum value of Revenue column within a sliding window of 3 rows preceding and the current row where window is partitioned by Id column and rows ordered by Year column: Max(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Sliding": #{ "Preceding": 3 }}) |
Median | Calculate median value within given window. Parameters:
Examples: Calculates median value of Revenue column partitioned by Id column: Median(Column("Revenue"), [Column("Id")]) |
Min | Calculate minimum value within given window. Parameters:
Example: Calculates the minimum value of Revenue column within a cumulative window from the beginning where window is partitioned by Id column and rows ordered by Year column: Min(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Cumulative": #{ "UseRange": false, "Following": false }}) |
Sum | Calculate sum of values within given window. Returns the minimum value within an ordered group of values configured using window parameters.
Parameters:
Example: Calculates sum of the values of Revenue column within a cumulative window from the current row to the end where window is partitioned by Id column and rows ordered by Year column: Sum(Column("Revenue"), [Column("Year")], null, [Column("Id")], #{"Sliding": #{ "UseRange": false, "Following": true }}) |
Array functions
Function | Description |
---|---|
ArrayConcat | Combines two arrays together. Example:
ArrayConcat(["PO Item Created"], ["PR Created", "Invoice Payment"]) Returns: ["PO Item Created", "PR Created", "Invoice Payment"] |
ArrayCompact | Removes nulls from an array. Example:
ArrayCompact([null, "a", null, "b"]) Returns: ["a", "b"] |
ArrayContains | Checks whether given array contains given item. Returns true if item is found, or false if not found. Example:
ArrayContains("PR Created", ["PO Item Created", "PR Created", "Invoice Payment"]) Returns: true |
ArrayDistinct | Returns an array containing all distinct items of given array. The order of items in the returned array is not defined and may even vary between calls. Example:
ArrayDistinct([null, "a", "b", "b", "c", null]) Returns: ["a", "b", "c", null] |
ArrayExcept | Returns an array containing items of the first array, except those that are present in the second array. The order of items in the returned array is not defined and may even vary between calls. Example:
ArrayIntersect(["A", "B", "C"], ["B", "D"]) Returns: ["A", "C"] |
ArrayIntersect | Returns an array containing all items that appear in both parameter arrays. The order of items in the returned array is not defined and may even vary between calls. Example:
ArrayIntersect(["Invoice Created", "PO Item Created"], ["PO Item Created", "PR Created", "Invoice Payment"]) Returns: ["PO Item Created"] |
ArraysOverlap | Returna true if both arrays have at least one common item, or otherwise returns false.
Examples: ArraysOverlap(["PR Created"], ["PO Created", "PR Created", "Invoice Payment"]) Returns: true ArraysOverlap(["Invoice Created", "Invoice Payment"], ["PO Created", "PR Created"]) Returns: false |
ArrayPosition | Finds index of first occurrence of given item in given array. Returns null if the item doesn't exist in the array. Parameters:
Example: ArrayPosition("PR Created", ["PO Item Created", "PR Created", "Invoice Payment"]) Returns: 1 |
ArraySize | Returns number of items in given array. Example:
ArraySize(["PO Item Created", "PR Created", "Invoice Payment"]) Returns: 3 |
ArraySlice | Returns a specified range (or a slice) of given array. Parameters:
Example: ArraySlice([0, 1, 2, 3, 4, 5], 2, 4) Returns: [2, 3] |
ArrayToString | Converts array into string by converting all individual items into strings and concatenating them using the separator character. Parameters:
Example: ArrayToString(["PO Item Created", "PR Created","Invoice Payment"], ", ") Returns: PO Item Created, PR Created, Invoice Payment |
Split | Splits given string with given separator and returns the result in an array of strings. Parameters:
Example: Split("127.0.0.1", ".") Returns: ["127", "0", "0", "1"] |
Conformance checking functions
Function | Description |
---|---|
IsConformant |
Checks conformance of a variation against a BPMN model (design model) and returns true if the variation conform to the model. Parameters:
Examples: Using the Array aggregation to evaluate variation: IsConformant(AggregateFrom(Events, #{{ "Function": "Array", "Ordering": [TimeStamp, EventType] }}, EventType), "<?xml...") Using constant array as variation: IsConformant(["Sales order received", "Delivery planned", "Confirmed delivery date"], "<?xml...") |
ConformanceViolationsFlows |
Checks conformance of variation against BPMN model (design model) and returns the first flow related violation if the variation was invalid. Returns null, if the variation is conformant. If the variation is nonconformant, returns a json string with properties s ("start") and e ("end") for the flow start and end event type names. When the flow is a starting flow, the s property is null, and when the flow is an ending flow, the e property is null. Parameters:
Example: ConformanceViolationsFlows(["Sales order received", "Delivery planned", "Confirmed delivery date"], "<?xml...", #{ "IncludeEventIndex": true} ) Example result: { "s": "Invoice created", "e": "Goods receipt", "i": 3 } |
AggregateFrom function
Aggregates a value from related objects, where there can be several of them a source object (e.g., when going from cases to events). See the type of relations between objects in the process mining concepts. The diagram shows that when going to direction where the target objects has count of N, AggregateFrom function needs to be used.
Parameters:
- Aggregation level: Aggregation level to aggregate from. Also an expression can be used to produce the dataset. Available aggregations are listed in table below (cardinality N).
- Aggregation function: Aggregation function or object definition (see more).
- Expression: Expression to generate the values to be aggregated. Default value is null.
- Filter: Optional filter to apply prior to performing the aggregation. Filter is given as JSON like syntax using dictionaries, arrays and scalar values.
Example for EventTypes:
AggregateFrom(Events, "Count") Returns the number of events having each event type.
Example for Cases:
AggregateFrom(Events, #{ "Function": "List", "Ordering": ["TimeStamp"], "Separator": "#,#" }, Column("EventType")) Returns variation/event type path string for all the cases. GetValueFrom(Variations, AggregateFrom(Cases, "Count")) Returns the number of cases having the same variation for every case. DateDiff("Seconds", AggregateFrom(Events.Where(Column("EventType") == "Sales Order"), "Min", Column("TimeStamp")), AggregateFrom(Events.Where(Column("EventType") == "Invoice"), "Max", Column("Timestamp"))) Returns the duration in seconds between the first occurrence of "Sales Order"-event type and the last occurrence of "Invoice" event type for each case.
Example for Model:
AggregateFrom(Cases, "Count", null, #{"Items":[#{"Type":"IncludeCases","Items":[#{"Type":"CaseAttributeValue","Values":["Dallas"], "Attribute":"Region"}]}]}) Returns the total number of cases in the model having "Dallas" as the value of "Region" case attribute.
GetValueFrom function
Retrieves a value from a related object that there may be only one for a source object (e.g., when going from events to cases). See the type of relations between objects in the process mining concepts. The diagram shows that when going to direction where the target objects has count of 1, GetValueFrom function needs to be used.
Parameters:
- Aggregation level: Aggregation level to aggregate from. This includes possible additional data frame expressions to prepare the aggregation level. Available aggregations are listed in table below (cardinality 1).
- Expression: Expression to evaluate in given aggregation level to get the returned value.
- Filter: Optional filter to apply prior to performing expression evaluation. Filter is given as dictionary following the JSON filter syntax.
Examples as measure expression for events:
GetValueFrom(Cases, Column("Account Manager\")) Returns for each event the value of Account Manager case attribute. GetValueFrom(Variations, Column("Variation")) Returns for each event variation/event type path string of its case.
Examples as measure expression for events:
GetValueFrom(Variations, AggregateFrom(Cases, "Count")) Returns the number of cases having the same variation for every case. GetValueFrom(Cases, Column("Variation"), #{"Items":[#{"Type":"IncludeEventTypes","Items":[#{"Type":"EventType","Values":["Shipment","Invoice"]}]}]}) Returns cases with their variations where only "Shipment" and "Invoice" event types are taken into account.
Other functions
Function | Description |
---|---|
CaseWhen |
Goes through conditions and returns a value when the first condition is true, similar to an if-then-else structure. Once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the else expression. If the else expression is not defined (i.e. there are even number of parameters), null value is returned. Consists of any number of pairs of condition and value expressions followed by an optional else expression. The odd parameters are the conditions and the even parameters are the return values. CaseWhen(Column("a") == null, 1, Column("a") < 1.0, 2, 3) Returns 1 if the value of column "a" is null. Returns 2 if the value of column "a" is less than 1.0. Returns 3 otherwise. Returns given value rounded to the nearest equal or larger integer. The data type should be one of the numeric data types. If the value is null, then the result is also null. |
Coalesce |
Returns the first non-null parameter. There can be any number of parameters. If all parameters are null, returns null. Coalesce(null, 3, 2) Returns 3. Coalesce(Column("column1"), "N/A") Returns column "column1" value, except replaces nulls with "N/A". |
Column |
Return the value of given column. Column("column1") Column("My Column 2") |
GetObjectProperty |
Returns a sub-object in a key from an object. The first parameter is the object and the second parameter is the key path. This function returns an object (that's why string values contain quotation marks). This function doesn't work in SQL Server. Examples: GetObjectProperty(#{"key1": "value 1", "key2": 3 }, "key1") Returns: "value 1" GetObjectProperty(#{"key1": ["value 1", "value 2"] }, "key1[1]") Returns: "value 2" GetObjectProperty(#{"key1": [ #{"key2": 3 }, #{"key2": 4 } ] }, "key1[0].key2") Returns: 3 GetObjectProperty(#{"key1": [ #{"key2": 3 }, #{"key2": 4 } ] }, "key1[0]") Returns: { "key2": 3 } |
GetJsonValue |
Returns value of key from a json string. The first parameter is the json string and the second parameter is the key path. This function returns a string. This function works both in Snowflake and in SQL Server, but in SQL Server the returned value cannot be a json object. Examples: GetJsonValue("{\"key1\": \"value 1\", \"key2\": 3 }", "key1") Returns: value 1 GetJsonValue("{\"key1\": [\"value 1\", \"value 2\"] }", "key1[1]") Returns: value 2 GetJsonValue("{\"key1\": [ {\"key2\": 3 }, {\"key2\": 4 } ] }", "key1[0].key2") Returns: 3 GetJsonValue("{\"key1\": [ {\"key2\": 3 }, {\"key2\": 4 } ] }", "key1[0]") Returns: {"key2":3} (this only works in Snowflake) |
In |
Returns true, if the test expression (given as the first parameter) matches with any of the other expressions given as parameters (starting from the seconds parameter). Examples: In(Column("Country"), "Germany", "France", "UK") Return true if column "Country" is Germany, France or UK. In("Germany", Column("SourceCountry"), Column("DestinationCountry")) Returns true, if column SourceCountry or DestinationCountry is Germany (or both columns). |
Variable |
Gets variable value available in the in-memory expression context where the SQL expression is run. Supports string, number, boolean and date values. In addition, the variable can contain SQL expression which is added as such to the contained SQL expression. Example of an in-memory expression having SQL expression in the Where function: let myRegion = "Dallas"; DatatableById(123) .SqlDataFrame .Where(Column("Region") == Variable("myRegion")) .Collect() Filters datatable by Region is Dallas. Note that in the above example Column("Region") == Variable("myRegion") is the SQL expression. |
Process mining objects, properties and aggregations
The following table summarizes the available process mining objects and which properties and aggregations are available for them. For aggregations, the AggregateFrom function is used with cardinality N aggregations and the GetValueFrom function with the cardinality 1 aggregations.
Object type | Properties | Aggregations (cardinality in parenthesis) |
---|---|---|
Cases |
|
|
Events |
|
|
EventTypes |
|
|
Variations |
|
|
Flows |
|
|
FlowOccurrences |
|
|
Model |
|
|
Root expression examples:
For cases (and also events), the case id can be referred using CaseId:
Cases.Where(CaseId == "Case_123")
Assuming that there is an Order Id column that is mapped to the CaseId, also the original column name can be used:
Cases.Where(Column("Order Id") == "Case_123")
For events, the event type can be referred using EventType:
Events.Where(EventType == "Order created")
Assuming that there is an Process step column that is mapped to the EventType, also the original column name can be used:
Cases.Where(Column("Process step") == "Order created")
Aggregation operations
Aggregations are used when multiple items or rows are combined into a single value. The following aggregations are available:
- Sum: Sum of the items. Can be used for numbers.
- Average: Average of the items (sum of items divided by count of items). Can be used for numbers.
- Median: The middle value of the sorted data. If there are even number of items, the average of the two middle items. Can be used for numbers.
- Min: Lowest value for numbers, alphabetically first value for texts and earliest value for dates.
- Max: Highest value for numbers, alphabetically last value for texts and last value for dates.
- Count: Number of items. Can be used for any type of data.
- CountDistinct: Number of unique items (i.e., duplicates are removed). Can be used for any type of data.
- Percentile: Percentile value. Can be used for numbers. Can be defined as an object as follows:
#{ "Function": "Percentile", "Percentile": 0.75 }
Percentile aggregation has following parameter:- Percentile: Percentile value between 0 and 1.
- Stdev: Sample standard deviation. Can be calculated for numbers.
- Stdevp: Population standard deviation. Can be calculated for numbers.
- Var: Sample variance. Can be used for numbers.
- Varp: Population variance. Can be used for numbers.
- List: Combines textual values into a single list in defined order and separated by defined characters. Can be defined as an object as follows:
#{ "Function": "List", "Ordering": [EventType], "Separator": ", ", "Distinct": true }
List aggregation has following parameters:- Ordering: Array of columns defining how items are ordered. Ordering is supported only when Distinct is false.
- Separator: Characters separating the combined items.
- Distinct: Whether duplicate items are removed. Default value is false. When showing distinct items, the items are sorted by their values.
- Any: Returns any value of the aggregated values. This is the most efficient aggregation when it's known that there is maximum of one value among the aggregated values. Note that there is no guarantee that the returned value will be the same every time.
- Array: Returns the aggregated values as a single array object. This aggregation has following parameters:
- Ordering: Array of columns defining how items are ordered.
- Distinct: Whether duplicate items are removed. Default value is false. When showing distinct items, the items are sorted by their values.