SqlDataFrame in Expression Language

From QPR ProcessAnalyzer Wiki
Revision as of 15:07, 1 December 2021 by Ollvihe (talk | contribs)
Jump to navigation Jump to search

SqlDataFrame represents tabular data similar to an SQL query result. Data in the SqlDataFrames are processed in the original datasource where the data is located (not in the QPR ProcessAnalyzer server memory). For each SqlDataFrame, there is an SQL query generated that is run in the datasource.

SqlDataFrame operations don't yet execute the SQL in the original datasource, but it will happen only when the Collect function is called for an SqlDataFrame, which executes the SQL query of the SqlDataFrames and loads the data into QPR ProcessAnalyzer memory DataFrame (where it can be presented in a dashboard).

Each SqlDataFrame contain information about the datasource where the SQL query will be executed. When writing queries with several SqlDataFrames, the SqlDataFrames need to be located in the same datasource, because otherwise the queries cannot be executed. If needed, data can be moved between datasources by using the Import or Persist functions. Alternatively, processing can be done in the memory by calling Collect for the SqlDataFrames and continuing queries using the in-memory DataFrames.

There is a similar API for the SqlDataFrames as there is for the DataFrames. Note that merging is not possible between SqlDataFrames like it's for DataFrames, but SqlDataFrame can be merged into a datatable.

SqlDataFrame properties Description
ColumnTypes (Dictionary)

Returns information about the columns of the SqlDataFrame as an array of dictionaries with keys Name and DataType. Columns are returned in the same order as the columns exist in the data table. Column types are calculated based on the type of the column in the relational database management system (i.e. type of the column in an SQL table).

Examples:

table.SqlDataFrame.ColumnTypes
Returns: [ #{ "Name": "name1", "DataType": "Integer" },  #{ "Name": "name2", "DataType": "Any" } ]

table.SqlDataFrame.ColumnTypes.Name
Returns: ["name1", "name2"]

table.SqlDataFrame.ColumnTypes.DataType
Returns: ["Integer", "Any"]
NColumns (Integer) Returns number of columns in the dataset represented by the SqlDataFrame.
NRows (Integer) Returns number of rows in the dataset represented by the SqlDataFrame.


SqlDataFrame functions Parameters Description
Aggregate (SqlDataFrame)
  1. Aggregated columns (string array or key-value pairs)
  2. Aggregation methods (string array)
Same functionality as in the DataFrame.
Append DataFrame to append Same functionality as in the DataFrame.
Collect (SqlDataFrame) (none)

Executes the SQL query for the SqlDataFrame in the datasource and returns results as an in-memory DataFrame. Then processing of the data can be continued as the in-memory DataFrame. In addition to the Persist function, Collect function is the only way to get the actual SQL query executed to see the results (or store them to a table).

Examples:

DataTableById(123).SqlDataFrame.Head(100).Collect().ToCsv()
Returns the top 100 rows from datatable id 123.
ExcludeValues (SqlDataFrame)
  1. Column name (string)
  2. Value (single item) or values (array) to exclude

Same functionality as in the DataFrame.

GroupBy (GroupedDataFrame)

Grouped columns (string array)

Same functionality as in the DataFrame.

Head (SqlDataFrame) Number of top rows

Same functionality as in the DataFrame.

IncludeOnlyValues (SqlDataFrame)
  1. Column name (string)
  2. Value (single item) or values (array) to include

Same functionality as in the DataFrame.

Join (SqlDataFrame)
  1. DataFrame
  2. Columns to match (String or key-value pairs)
  3. Join type (String)
Same functionality as in the DataFrame.
OrderByColumns (SqlDataFrame)
  1. Columns to be ordered (String array)
  2. Sorting order (boolean array)

Same functionality as in the DataFrame.

Persist (SqlDataFrame)
  1. DataTable name
  2. Additional parameters

Same functionality as in the DataFrame. If the SQL query for the SqlDataFrame is run in the same system as the target Datatable, for efficient operation, all data processing and storage is done within the system.

Select (SqlDataFrame)

Column names (string array, or key-value pairs)

Same functionality as in the DataFrame.

TakeSample (SqlDataFrame) Number of rows (Integer) Same functionality as in the DataFrame.
WithDenseRankNumberColumn (SqlDataFrame)
  1. New column name (String)
  2. Order by columns (String array)
  3. Partition by columns (String array)

Same functionality as in the DataFrame.

WithExpressionColumn (SqlDataFrame)
  1. New column name (String)
  2. Calculation expression

Same functionality as in the DataFrame. Note that not all expression language functionality are available in the expression, as the condition is converted into SQL having differences and limitations comparing to the expression language functionality.

WithRankColumn (SqlDataFrame)
  1. New column name (String)
  2. Order by columns (String array)
  3. Partition by columns (String array)

Same functionality as in the DataFrame.

WithRowNumberColumn (SqlDataFrame)
  1. New column name (String)
  2. Order by columns (String array)
  3. Partition by columns (String array)

Same functionality as in the DataFrame.

Where (SqlDataFrame) Condition expression Same functionality as in the DataFrame. Note that not all expression language functionality are available in the condition expression, as the condition is converted into SQL having differences and limitations comparing to the expression language functionality.

Condition also supports CaseWhen function, which goes through conditions and returns a value when the first condition is met (like an if-then-else statement). The odd parameters are the conditions and the even parameters are the return values. If no conditions are true, it returns the value in the last parameter which is the "else" parameter. If the "else" parameter is not defined (i.e. there are even number of parameters), null value is used as default.