SqlDataFrame in Expression Language: Difference between revisions

From QPR ProcessAnalyzer Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 11: Line 11:
! '''Parameters'''
! '''Parameters'''
! '''Description'''
! '''Description'''
|-
||Aggregate (DataFrame)
||
# Aggregated columns (string array or key-value pairs)
# Aggregation methods (string array)
||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]].
|-
|-
||Collect (DataFrame)
||Collect (DataFrame)
Line 32: Line 38:
||Head (SqlDataFrame)
||Head (SqlDataFrame)
||Number of top rows
||Number of top rows
||
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]].
|-
||GroupBy (GroupedDataFrame)
||
Grouped columns (string array)
||
||
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]].
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]].

Revision as of 15:00, 5 May 2021

SqlDataFrame represents tabular data similar to a result of an SQL query. Data in the SqlDataFrames are processed in the original datasource where the data is located (not in the QPR ProcessAnalyzer server in-memory). For each SqlDataFrame, there is an SQL query that processes and generates the actual data in the datasource.

Running SqlDataFrames and operations between them don't yet execute the SQL in the original datasource. This happens when the Collect function is called for an SqlDataFrame, which executes the SQL query of the SqlDataFrames and loads the data into QPR ProcessAnalyzer server in-memory dataframe (where it can be presented in a dashboard).

Each SqlDataFrames knows the datasource where the SQL query is executed.

There is a similar API for the SqlDataFrames as there is for the DataFrames.

SqlDataFrame functions Parameters Description
Aggregate (DataFrame)
  1. Aggregated columns (string array or key-value pairs)
  2. Aggregation methods (string array)
Same functionality as in the DataFrame.
Collect (DataFrame) (none)

Executes the SQL query for the SqlDataFrame in the datasource and returns results as an in-memory DataFrame. Then processing of the data can be continued as the in-memory DataFrame. In addition to the Persist function, Collect function is the only way to get the actual SQL query executed to see the results (or store them to a table).

Examples:

DataTableById(123).SqlDataFrame.Head(100).Collect().ToCsv()
Returns the top 100 rows from datatable id 123.
ExcludeValues (SqlDataFrame)
  1. Column name (string)
  2. Value (single item) or values (array) to exclude

Same functionality as in the DataFrame.

Head (SqlDataFrame) Number of top rows

Same functionality as in the DataFrame.

GroupBy (GroupedDataFrame)

Grouped columns (string array)

Same functionality as in the DataFrame.

IncludeOnlyValues (SqlDataFrame)
  1. Column name (string)
  2. Value (single item) or values (array) to include

Same functionality as in the DataFrame.

Join (SqlDataFrame)
  1. DataFrame
  2. Columns to match (String or key-value pairs)
  3. Join type (String)
Same functionality as in the DataFrame.
OrderByColumns (SqlDataFrame)
  1. Columns names to be ordered (String array)
  2. Sorting order (boolean array)

Same functionality as in the DataFrame.

Select (SqlDataFrame)

Column names (string array, or key-value pairs)

Same functionality as in the DataFrame.

TakeSample (SqlDataFrame) Number of rows (Integer)

Takes a random sample of given number of rows of the original rows.

Examples:

DataTableById(123).SqlDataFrame.TakeSample(100).Collect()
Returns 100 random rows out of the datatable.