Web API: Expression/query: Difference between revisions

From QPR ProcessAnalyzer Wiki

Revision as of 08:45, 22 August 2022

The Expression query runs a query written in the expression language on the server and returns the query results. The expression language can be used to query both the eventlog data and the metadata in the system (e.g. datatables, projects, users).

See also, Expression Query examples.

Request

The request endpoint URL is qprpa/api/expression/query, and there are optional ResponseType and StringifyValues URL parameters (see more in the Response chapter). The payload of the request is a JSON object whose properties are described in the Query chapter. Example query:

Url: POST /api/expression/query?ResponseType=object&StringifyValues=false
Content-Type: application/json;charset=UTF-8
Body:
{
  "ModelId": 1234,
  "Filter": {
    "Items": [
      {
        "Type": "IncludeCases",
        "Items": [
          {
            "Type": "EventType",
            "Values": [
              "Change Price"
            ]
          }
        ]
      }
    ]
  },
  "Root": "Cases",
  "Dimensions": [
    {
      "Name": "Start Month",
      "Expression": "StartTime.Month"
    },
    {
      "Name": "Region",
      "Expression": "Attribute(\"Region\")"
    }
  ],
  "Values": [
    {
      "Name": "Average Case Duration in Days",
      "Expression": "Average(_.Duration).TotalDays"
    }
  ],
  "Ordering": [
    {
      "Name": "Start Month",
      "Direction": "Ascending"
    },
    {
      "Name": "Average Case Duration in Days",
      "Direction": "Descending"
    }
  ]
}

HTTP request header Authorization with value Bearer <access token> needs to be in place to identify the session.
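As a rough illustration of the request described above, the following Python sketch builds (but does not send) such a request using only the standard library. The server address in BASE_URL and the helper name build_expression_query_request are assumptions made for this example, and the access token is left as a placeholder.

```python
import json
import urllib.request

# Hypothetical values: replace with your own server address and access token.
BASE_URL = "https://example.com/qprpa"
ACCESS_TOKEN = "<access token>"


def build_expression_query_request(payload, response_type="array", stringify_values=False):
    """Build (but do not send) a POST request for the expression query endpoint."""
    url = (
        f"{BASE_URL}/api/expression/query"
        f"?ResponseType={response_type}"
        f"&StringifyValues={'true' if stringify_values else 'false'}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            # The Authorization header identifies the session.
            "Authorization": f"Bearer {ACCESS_TOKEN}",
        },
        method="POST",
    )


# A shortened version of the example query above.
payload = {
    "ModelId": 1234,
    "Root": "Cases",
    "Dimensions": [{"Name": "Region", "Expression": "Attribute(\"Region\")"}],
    "Values": [
        {
            "Name": "Average Case Duration in Days",
            "Expression": "Average(_.Duration).TotalDays",
        }
    ],
}
request = build_expression_query_request(payload, response_type="object")
# To actually send the request: urllib.request.urlopen(request)
```

The request is only constructed here so that the example can be run without a server; sending it requires a valid access token.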

Response

The response contains data in a tabular format (one-to-many columns with names and zero-to-many rows). The following url parameters are available to define the results format:

  • ResponseType: Determines the format of the response, either array (default) or object. See below how the structure of the JSON differs.
  • StringifyValues: Determines whether the data items are stringified (either true or false, by default false). When no stringification is used, values use JSON standard formatting for strings, numbers, booleans and nulls. Dates use format yyyy-MM-ddTHH:mm:ss.fff, and other data types use default stringification (more information about stringified values).
  • Timeout: Timeout in seconds for the query. If the timeout is exceeded in a long-running query, the query execution is stopped and an error is returned. When no timeout is specified for the query, the server level global timeout is applied.

Example array type of response (without stringification):

[
  {"A": "DataA", "B": "DataB", "C": "DataC"},
  {"A": "DataA", "B": "DataB", "C": "DataC"},
  {"A": "DataA", "B": "DataB", "C": "DataC"}
]

Example object type of response (without stringification):

{
  "Columns": [
    {"Name": "A"},
    {"Name": "B"},
    {"Name": "C"}
  ],
  "Rows": [
    ["DataA", "DataB", "DataC"],
    ["DataA", "DataB", "DataC"],
    ["DataA", "DataB", "DataC"]
  ]
}
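The two response structures carry the same tabular data, so a client can convert between them. The sketch below shows one possible conversion in Python; the function names object_to_array and array_to_object are made up for this example and are not part of the API.

```python
def object_to_array(response):
    """Convert an object-type response into the array-type structure."""
    names = [column["Name"] for column in response["Columns"]]
    return [dict(zip(names, row)) for row in response["Rows"]]


def array_to_object(rows):
    """Convert an array-type response into the object-type structure.

    Column names are taken from the first row, so an empty array yields
    no columns (the array format carries no column names without rows).
    """
    names = list(rows[0].keys()) if rows else []
    return {
        "Columns": [{"Name": name} for name in names],
        "Rows": [[row.get(name) for name in names] for row in rows],
    }


object_response = {
    "Columns": [{"Name": "A"}, {"Name": "B"}, {"Name": "C"}],
    "Rows": [
        ["DataA", "DataB", "DataC"],
        ["DataA", "DataB", "DataC"],
    ],
}
array_response = object_to_array(object_response)
```

One practical difference visible in the sketch: when the result has no rows, the object type still lists the columns, whereas the array type is just an empty array.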

Query

When making a query to a model, the ModelId property is mandatory, and the Filter property can additionally be used to filter the data. The Root parameter is mandatory, and Dimensions and Values are optional. When making a query that is not targeted at a specific model, the ContextType needs to be generic, and then the ModelId and Filter properties are not used.
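The rules in the paragraph above can be checked on the client before a query is sent. The following Python sketch is one literal reading of that paragraph only; the function name check_query is hypothetical, and the server may accept combinations that this check flags (for example queries that rely on FilterId).

```python
def check_query(payload):
    """Return a list of problems found in a query payload (empty if none).

    Client-side sketch only; it does not reproduce the server's validation.
    """
    problems = []
    if "Root" not in payload:
        problems.append("Root is mandatory")
    context = str(payload.get("ContextType", "")).lower()
    if context == "generic":
        # Generic queries are not related to any model.
        for name in ("ModelId", "Filter"):
            if name in payload:
                problems.append(f"{name} is not used with the generic context")
    elif "ModelId" not in payload:
        problems.append("ModelId is mandatory when querying a model")
    return problems


model_query = {"ModelId": 1234, "Root": "Cases"}
generic_query = {"ContextType": "generic", "Root": "...", "ModelId": 1234}

print(check_query(model_query))    # no problems
print(check_query(generic_query))  # ModelId is flagged
```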

Properties

Property Description
ProcessingMethod

Defines the processing method to perform calculations:

  • inmemory : Calculation is done using QPR ProcessAnalyzer in-memory core based on the EventLog object and related entities. (default)
  • dataframe: Calculation is done using SqlDataFrames. The root is assumed to produce an SqlDataFrame to which the defined dimensions, measures, sorting etc. are applied. All calculations are performed in the datasource where the data is stored (for example: in QPR ProcessAnalyzer SQL Server database, Snowflake, AWS Redshift or Azure Databricks).
PreferSqlDataFrame When set to true (default), dataframe mode calculations (i.e. ProcessingMethod=dataframe) are done in the original datasource using SQL language queries. When false, dataframe mode calculations are performed in-memory (which also requires loading the data into memory). Usually calculations are done in the original datasource.
ContextType

Determines in which context the root expression is run. Following contexts can be used:

  • eventlog: root expression is run in the EventLog context (and thus EventLog functions and properties can be used). When EventLog context is used, the model is always loaded into memory already before the root expression is run. ProcessingMethod dataframe cannot be used with this context type.
  • model: root expression is run in the Model context. The model is not automatically loaded into memory (although using the EventLog property will load it into memory).
  • generic: root expression is run in the generic context (i.e. context that is not related to any model)

If not specified, eventlog is used if FilterId or ModelId parameter is specified.

ModelId Model for which the calculation is run. This parameter is mandatory, if the calculation is run in an eventlog context.
SourceData

Defines expressions and column mappings for the events and cases data for which the query is performed. If SourceData is not defined and ContextType is Model or EventLog, cases and events are read from the model (defined by the ModelId and Filter parameters). It's possible to override individual model level settings by defining them in the SourceData. For example, when the Expressions are not defined, the model defined datatables are used, but it's possible to change the column mappings by defining them in the SourceData.

SourceData has the following structure:

  • Events:
    • Expression: Expression to return events data as SqlDataFrame (one row for each event). There needs to be columns for the case id, event type and timestamp.
    • Columns: Section to specify mappings for the events data.
      • CaseId: Mapping to the case ID column.
      • EventType: Mapping to the event type name column.
      • Timestamp: Mapping to the event timestamp column.
  • Cases:
    • Expression: Expression to return cases data as SqlDataFrame (one row for each case). There needs to be a column for the case id.
    • Columns: Section to specify mappings for the cases data.
      • CaseId: Mapping to the case ID column.

Example where all settings are defined:

{
	"SourceData": {
		"Events": {
			"Expression": "...",
			"Columns": {
				"CaseId": "CaseIdColumn",
				"EventType": "EventTypeColumn",
				"Timestamp": "TimestampColumn"
			}
		},
		"Cases": {
			"Expression": "...",
			"Columns": {
				"CaseId": "CaseIdColumn"
			}
		}
	}
}

Example where only the event type mapping is overridden for this query:

{
	"SourceData": {
		"Events": {
			"Columns": {
				"EventType": "EventTypeColumn"
			}
		}
	}
}
Filter Filter definition as JSON. If Filter is defined, please do not use the FilterId parameter.
FilterId Stored filter for which the query is run. Alternative to the Filter parameter. If neither FilterId nor Filter is specified, calculation is run from the entire model.
Comparison Comparison definition, using a JSON structure similar to that of Filter.
Root The root expression is evaluated first, and objects it returns are used in the next calculation step (which is dimensioning if it's enabled). The root expression is run in the context defined by the ContextType parameter. If ProcessingMethod is dataframe, also expression aliases can be used in the root expression.
Ordering Array of Ordering objects. Ordering defines how the result table is sorted.
Property Description
Name Dimension or Value name to sort the data.
Direction Sorting direction, either Ascending or Descending.
AggregateOthers

When true, all rows that are left out due to the MaximumRowCount limit are aggregated and shown as the last row. Expressions to calculate the aggregations can be defined for each dimension and measure using the AggregationExpression property. When the AggregationExpression is not defined, the default aggregated value is null. When AggregateOthers is used, the maximum number of rows is still the MaximumRowCount, so with the aggregation there is one data row less shown. Default for AggregateOthers is false.

MaximumRowCount

Defines the maximum number of rows returned by the query. Defined as an integer. The default is null, meaning that all rows are returned. If sorting is used, the query results are sorted before the limit is applied.

FirstRow

Defines the starting row number from the query results to return. Defined as an integer, and row numbering starts from 0. The default is null, meaning that no rows are skipped. If sorting is used, the query results are sorted before this setting is applied. If MaximumRowCount is also defined, the FirstRow is applied before the MaximumRowCount.

Criteria Expression returning a boolean value to filter the resulting dataset row by row after dimensions and values are calculated. The criteria expression is calculated for each row of the dataset, and rows for which it returns false are removed. The respective row in the dataset is used as the context for the criteria expression, similar to the DataFrame's Where function.
RowInitExpression Specifies an expression that is evaluated for every generated row before evaluating the value expressions. RowInitExpression is calculated after the dimension expressions have been calculated and dimensions generated. RowInitExpression can be used to make common calculations and define variables that are needed in several value expressions.
ColumnOrdering

An array of column names that defines the order in which the columns are returned. If the array contains a name that does not exist in the results, that column is returned with null values. If the array does not contain a name that is in the original result, that column is not returned at all. If ColumnOrdering is not defined, all the result columns are returned in the defined order so that all the dimensions are returned before the values. Example:

"ColumnOrdering": ["Case Count", "Account Manager", "Average Duration"]
EnableResultCaching When set to true, expression query results are cached both in the client and server side. When false, query results are not cached. The default value is true when ContextType is eventlog or model. When ContextType is generic, no caching is used regardless of this setting.
QueryIdentifier Identifier of the query. Can be used to refer to the query, when cancelling pending queries.
CancelEarlierQueriesWithIdentifier Boolean value defining whether possible previous query with the same QueryIdentifier is cancelled when a new query with the same identifier is received by the server. The default value is false.
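The interplay of FirstRow, MaximumRowCount and AggregateOthers described in the table above can be illustrated with a small local simulation. This is only a sketch of the documented order of operations on already sorted rows, not the server implementation; the function apply_paging and the Python callables standing in for AggregationExpression (here sum in place of Sum(_)) are assumptions made for the example. How AggregateOthers treats rows skipped by FirstRow is not stated above, so the sketch aggregates only the rows cut off by the limit.

```python
def apply_paging(rows, first_row=None, maximum_row_count=None,
                 aggregate_others=False, aggregators=None):
    """Apply FirstRow, then MaximumRowCount, to rows that are already sorted.

    rows: list of dicts (column name -> value).
    aggregators: column name -> function(list of values), a stand-in for
    each column's AggregationExpression. Columns without one get None (null).
    """
    rows = rows[first_row or 0:]  # FirstRow is applied before MaximumRowCount
    if maximum_row_count is None or len(rows) <= maximum_row_count:
        return rows
    if maximum_row_count < 1:
        return []
    if not aggregate_others:
        return rows[:maximum_row_count]
    # The aggregated row counts towards the limit: one data row less is shown.
    kept = rows[:maximum_row_count - 1]
    rest = rows[maximum_row_count - 1:]
    aggregators = aggregators or {}
    others = {}
    for column in rows[0]:
        aggregate = aggregators.get(column)
        others[column] = aggregate([r[column] for r in rest]) if aggregate else None
    return kept + [others]


sorted_rows = [
    {"Region": "A", "Case Count": 50},
    {"Region": "B", "Case Count": 30},
    {"Region": "C", "Case Count": 15},
    {"Region": "D", "Case Count": 5},
]
limited = apply_paging(sorted_rows, maximum_row_count=3)
with_others = apply_paging(sorted_rows, maximum_row_count=3,
                           aggregate_others=True,
                           aggregators={"Case Count": sum})
second_page = apply_paging(sorted_rows, first_row=1, maximum_row_count=2)
```

In with_others the result still has three rows: regions A and B, followed by one row that aggregates C and D, with a null Region because no aggregation was defined for that column.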

Expression allows configurable objects, such as cases, events, event types or flows, to be divided into configurable dimensions and calculable values (KPIs). In a basic form, the result of this query is a table with the following columns: (1) One column for each specified dimension, and (2) at least one column for each specified value. The result of this query is a table with one row for each unique dimension value combination.

When building expression query, the individual expressions are run in following contexts:

  • Root expression is run in the EventLog, Model or generic context, depending on the ContextType setting.
  • Dimension expressions are run in the context of the individual object returned by the root expression.
  • Value (measure) expressions are run in the context of the array of items that are part of the dimensioned slice. If dimensioning is disabled (i.e. dimensions parameter is null), the value expression is run in the context of the individual object returned by the root expression.
  • Row initialization expression is run in the same context as the measures. RowInitExpression is run before the measures, so variables initialized in the RowInitExpression are available in the measures.
  • Aggregation expression is run in the context of an array of the rest of the items, i.e. items to be aggregated as the last row.

Dimensions

Using dimensions, the root rows are sliced into different groups based on how many unique values the dimension expressions produce. There can be zero to many dimensions. The result data contains a column for each dimension.

If dimensions are an empty array (i.e. Dimensions: []), all root objects are aggregated into a single group. If the dimensions array is not defined (i.e. Dimensions: null), each object returned by the root expression becomes a separate row in the result. The context for the value expression is then a single object instead of an array of objects.
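The three cases (dimensions defined, an empty array, or null) can be sketched with a local simulation in Python. Plain Python functions stand in for the dimension and value expressions here; the function run_query and the sample data are made up for this illustration and do not reproduce how the server evaluates expressions.

```python
def run_query(root_objects, dimensions, values):
    """Simulate dimensioning.

    dimensions: list of (name, function(object)) pairs, [] or None.
    values: list of (name, function(context)) pairs; the context is a list
    of objects when dimensions is a list, and a single object when it is None.
    """
    if dimensions is None:
        # No dimensioning: one row per root object.
        return [{name: fn(obj) for name, fn in values} for obj in root_objects]
    groups = {}
    for obj in root_objects:
        key = tuple(fn(obj) for _, fn in dimensions)  # () when dimensions == []
        groups.setdefault(key, []).append(obj)
    rows = []
    for key, members in groups.items():
        row = {name: part for (name, _), part in zip(dimensions, key)}
        row.update({name: fn(members) for name, fn in values})
        rows.append(row)
    return rows


cases = [
    {"Region": "North", "DurationDays": 2.0},
    {"Region": "North", "DurationDays": 4.0},
    {"Region": "South", "DurationDays": 6.0},
]


def average_duration(group):
    return sum(c["DurationDays"] for c in group) / len(group)


by_region = run_query(cases, [("Region", lambda c: c["Region"])],
                      [("Average Duration", average_duration)])
single_group = run_query(cases, [], [("Average Duration", average_duration)])
per_object = run_query(cases, None, [("Duration", lambda c: c["DurationDays"])])
```

Note that the value function receives a list in the first two calls and a single case in the last one, which mirrors the change of context described above.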

Each dimension may have the following properties:

Property Description
Name Name of the dimension. Dimension name works as the title for the dimension column. In addition, the dimension name is used when referring to this dimension in the ordering.
Expression Expression to calculate the dimension value for each root object, i.e. each of the root objects are used as the context for the dimension expression.
ValueExpression Specifies an expression that is used as the final returned value of the dimension. The ValueExpression is calculated in the context of an unordered array containing all the objects in the dimension represented by the row (i.e. the context is different than in the Expression). The ValueExpression can be used to improve performance as follows: the Expression provides a simplified expression that is sufficient for the dimensioning, and the ValueExpression defines a more complex expression that provides the final returned value. The benefit is that the ValueExpression is not calculated for rows that are left out by the MaximumRowCount setting.
NumberPrecision Specifies the number of decimals for rounding numerical dimension values. If not defined, no rounding is done. A negative number can be used to round to the nearest tens (-1), hundreds (-2), etc. Rounding is done before dividing values into distinct dimension slots. If any value in this dimension is other than numerical, an error is given. Works for both the inmemory and dataframe processing methods.
DatetimeTruncation Dimension values that are of the datetime type, are truncated (i.e. rounded downwards) using the defined time period (granularity). Available values for the inmemory processing are year, halfyear, quarter, month, week, day, hour, minute, second and millisecond. Truncation is done before dividing values into distinct dimension slots. If any value of this dimension is other than datetime, error is given. Works for both the inmemory and dataframe processing methods.

For the week truncation, in the in-memory processing the locale settings of the QPR ProcessAnalyzer server determine the first day of the week, and in the dataframe processing, SQL Server or Snowflake settings determine the first day of the week.

TimespanPrecision Dimension values that are of the timespan type, are rounded using the defined granularity. Available values are year, halfyear, quarter, month, fortnight, week, day, hour, minute, second and millisecond. Rounding is done before dividing values into distinct dimension slots. If any value of this dimension is other than timespan, error is given.

This setting does not work for the dataframe processing method.

IsHidden Is the dimension column hidden (true or false). Hiding a dimension doesn't affect the calculation, but the hidden columns are not returned to the client.
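To make the effect of NumberPrecision and DatetimeTruncation on dimension values more concrete, here is a rough Python approximation. The function names are invented for this example. Python's round uses round-half-to-even, which may differ from the server's rounding in tie cases; week truncation is left out because it depends on locale settings, and naive datetimes are assumed.

```python
from datetime import datetime


def round_dimension(value, number_precision):
    """Round to a number of decimals; negative values round to tens, hundreds, etc."""
    return round(value, number_precision)


def truncate_datetime(value, period):
    """Truncate (round downwards) a naive datetime to the start of the period."""
    if period == "millisecond":
        return value.replace(microsecond=value.microsecond // 1000 * 1000)
    start_months = {
        "year": 1,
        "halfyear": 1 if value.month <= 6 else 7,
        "quarter": 3 * ((value.month - 1) // 3) + 1,
        "month": value.month,
    }
    if period in start_months:
        return datetime(value.year, start_months[period], 1)
    field_counts = {"day": 3, "hour": 4, "minute": 5, "second": 6}
    if period in field_counts:
        # Keep only the leading (year, month, day, hour, minute, second) fields.
        return datetime(*value.timetuple()[:field_counts[period]])
    raise ValueError(f"Unsupported period in this sketch: {period}")


timestamp = datetime(2022, 8, 22, 8, 45, 30, 123456)
print(round_dimension(1234.5678, 2))
print(round_dimension(1234.5678, -2))
print(truncate_datetime(timestamp, "quarter"))
print(truncate_datetime(timestamp, "hour"))
```

Because rounding and truncation happen before values are divided into dimension slots, values such as 1234.5678 and 1210.0 would end up in the same slot with a precision of -2.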

Values

When there are dimensions defined, the Values are aggregations from the dimensioned root objects; usually these are the measures or KPIs. Value expressions are calculated for each dimensioned data group, i.e. for each unique dimension value combination. The context of the expression to calculate the value is an array of objects. If dimensions are not defined, the context for the value expression is a single root object. Each value may have the following properties:

Property Description
Name Name for the value. Used only when ValueType is Single.
Expression Behavior depends on the ValueType property:
  • Single: Expression for calculating the value for the single column. The expression is calculated for each row (i.e. the unique dimension combinations), and the expression usually contains aggregations from the group of objects into a single value.
  • Dynamic: Expression for calculating the value for each dynamically generated column specified by the ValueDimensionExpression property. The result of the ValueDimensionExpression is accessible via the ValueDimension variable.
  • Pivot: Expression to produce an array, where each item represents a dynamically generated column. The result shown by the Expression query is number of items belonging to each column.
AggregationExpression Expression to calculate the aggregated value for the rest of the rows for this column when using the AggregateOthers setting. All the aggregated values are provided as the context (_) for the aggregation expression. If no aggregation expression is defined, null value is used. Example aggregation expressions:

Row count:

"AggregationExpression": "Count(_)"

Sum of values (for numerical columns):

"AggregationExpression": "Sum(_)"

Showing count, minimum and maximum values:

"AggregationExpression": "\"(\" + Count(_) + \" others) \" + Min(_) + \" - \" + Max(_)"
ValueType

Type of the value, which is one of the following:

  • Single (default): Produces a single column calculated by the expression.
  • Dynamic: There is an expression for defining the actual produced columns dynamically (ValueDimensionExpression) and another expression (Expression property) for specifying the actual values for the columns.
  • Pivot: Produces as many columns as there are unique values in the expression. Columns are ordered according to the order defined by DimensionOrderExpression.
ValueDimensionExpression

(ValueType: Dynamic, Pivot)

Expression used to generate columns in Dynamic and Pivot value types. The expression should return an array where each item represents a column. This expression is executed once when calculating the Expression query. For Dynamic type value ValueDimensionExpression field is mandatory, and for Pivot type it's not mandatory.

The result of this expression will be given as the value of variable ValueDimension when evaluating the Expression. For Pivot type value, this can be used to override the default set of columns created automatically based on the actual cell evaluation results. An array of all the actual dimension values created by the actual cell evaluation result is given as context object for the evaluation.

NameExpression

(ValueType: Dynamic, Pivot)

Expression used to generate column names in Dynamic and Pivot value types. This expression is run once for each column, and the expression has as a context the column generated in the ValueDimensionExpression, and as a result the expression should give the column name.
DimensionOrderExpression

(ValueType: Pivot)

Expression used to order the dimensions when pivot value type value is used. If not defined, the following default expression will be used: OrderByValue(_).

PivotAggregationExpression

(ValueType: Pivot)

Expression used to aggregate all the values within one cell of pivot type values. If the expression returns an array of arrays, the first item in the inner array is used as the ValueDimension and the second item is used as the root object when evaluating PivotAggregationExpression. By default, if expression returns only an array of atomic objects, the root object of the evaluation of PivotAggregationExpression is the value of the current ValueDimension.

DefaultValueExpression

(ValueType: Pivot)

Expression whose result is used to fill the "gaps" of the matrix created when pivot value type is used.

EvaluateAfterAggregation If set to true, makes the value calculation occur only after dimension aggregations have been performed, thus making it possible to calculate values based on the already aggregated value or dimension columns. Available only in the DataFrame processing method.
IsHidden Is the value column hidden (true or false). Hiding a value doesn't affect the calculation, but the hidden columns are not returned to the client.
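The Pivot value type described in the table above can be pictured with a small local simulation: for each row, the expression produces an array of items, each unique item becomes a column, and the cell shows how many items belong to that column. The Python sketch below follows that reading only. The function pivot_counts and the sample data are hypothetical, sorted() stands in for the default DimensionOrderExpression (OrderByValue(_)), and the default_value argument plays the role of DefaultValueExpression.

```python
from collections import Counter


def pivot_counts(groups, expression, default_value=None):
    """Simulate a Pivot value.

    groups: dimension value -> list of root objects in that row.
    expression: function(list of objects) -> list of items; each unique
    item becomes a column and each cell is the number of matching items.
    Returns (column names, dimension value -> list of cell values).
    """
    counts = {key: Counter(expression(members)) for key, members in groups.items()}
    columns = sorted({item for counter in counts.values() for item in counter})
    cells = {
        key: [counter.get(column, default_value) for column in columns]
        for key, counter in counts.items()
    }
    return columns, cells


groups = {
    "North": [{"Events": ["Order", "Invoice"]}, {"Events": ["Order"]}],
    "South": [{"Events": ["Order", "Change Price"]}],
}


def event_types(cases):
    return [event for case in cases for event in case["Events"]]


columns, cells = pivot_counts(groups, event_types, default_value=0)
```

Here the "gaps" of the matrix (for example the Invoice column for South) are filled with 0; without a default value they would be null.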