SqlDataFrame in Expression Language: Difference between revisions
No edit summary |
No edit summary |
||
(13 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
SqlDataFrame represents tabular data similar to | SqlDataFrame represents tabular data similar to an SQL query result. Data in the SqlDataFrames are processed in the original datasource where the data is located (not in the QPR ProcessAnalyzer server memory). For each SqlDataFrame, there is an SQL query generated that is run in the datasource where the referenced datatables are located. | ||
SqlDataFrame operations itself don't cause the SQL to execute in the datasource, but it will happen when the '''Collect''' function is called for an SqlDataFrame, which generates and executes the SQL query representing the SqlDataFrame and loads the data into QPR ProcessAnalyzer memory as a DataFrame (where it can be presented in a dashboard). | |||
Each | Each SqlDataFrame contain information about the datasource where the SQL query will be executed. When writing queries with several SqlDataFrames, the SqlDataFrames need to be located in the same datasource, to be able to execute the queries. If needed, data can be moved between datasources by using the ''Import'' or ''Persist'' functions. Alternatively, processing can be done in-memory by calling ''Collect'' for an SqlDataFrame and continuing calculation as in-memory DataFrame. | ||
There is a similar API for the SqlDataFrames as there is for the DataFrames. | There is a similar API for the SqlDataFrames as there is for the DataFrames. Note that merging is not possible between SqlDataFrames like it's for DataFrames, but SqlDataFrame can be merged into a [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Datatable|datatable]]. | ||
{| class="wikitable" | |||
!'''SqlDataFrame properties''' | |||
! '''Description''' | |||
|- | |||
||ColumnTypes (Dictionary) | |||
|| | |||
Returns information about the columns of the SqlDataFrame as an array of dictionaries with keys Name and DataType. Columns are returned in the same order as the columns exist in the data table. Column types are calculated based on the type of the column in the relational database management system (i.e. type of the column in an SQL table). | |||
Examples: | |||
<pre> | |||
table.SqlDataFrame.ColumnTypes | |||
Returns: [ #{ "Name": "name1", "DataType": "Integer" }, #{ "Name": "name2", "DataType": "String" } ] | |||
table.SqlDataFrame.ColumnTypes.Name | |||
Returns: ["name1", "name2"] | |||
table.SqlDataFrame.ColumnTypes.DataType | |||
Returns: ["Integer", "String"] | |||
</pre> | |||
|- | |||
||NColumns (Integer) | |||
||Returns number of columns in the dataset represented by the SqlDataFrame. | |||
|- | |||
||NRows (Integer) | |||
||Returns number of rows in the dataset represented by the SqlDataFrame. | |||
|} | |||
{| class="wikitable" | {| class="wikitable" | ||
Line 18: | Line 44: | ||
# Aggregated columns (string array or key-value pairs) | # Aggregated columns (string array or key-value pairs) | ||
# Aggregation methods (string array) | # Aggregation methods (string array) | ||
||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | |||
|- | |||
||Append | |||
||DataFrame to append | |||
||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | ||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | ||
|- | |- | ||
Line 23: | Line 53: | ||
||(none) | ||(none) | ||
|| | || | ||
Executes the SQL query for the SqlDataFrame in the datasource and returns results as an in-memory DataFrame. Then processing of the data can be continued as the in-memory DataFrame | Executes the SQL query for the SqlDataFrame in the datasource and returns results as an in-memory DataFrame. Then processing of the data can be continued as the in-memory DataFrame. | ||
Examples: | Examples: | ||
Line 65: | Line 95: | ||
||OrderByColumns (SqlDataFrame) | ||OrderByColumns (SqlDataFrame) | ||
|| | || | ||
# | # Ordered columns (String array) | ||
# Sorting order (boolean array) | # Sorting order (boolean array) | ||
|| | || | ||
Line 75: | Line 105: | ||
# Additional parameters | # Additional parameters | ||
|| | || | ||
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. Additionally, if the SQL query for the SqlDataFrame is run in the same system as the target datatable, all data processing and storage is done within the system to achieve efficient operation. | ||
|- | |||
||RemoveColumns (SqlDataFrame) | |||
||Column names (string array) | |||
||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | |||
|- | |- | ||
||Select (SqlDataFrame) | ||Select (SqlDataFrame) | ||
Line 82: | Line 116: | ||
|| | || | ||
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | ||
|- | |||
||Skip (SqlDataFrame) | |||
||Number of rows to skip | |||
||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | |||
|- | |- | ||
||TakeSample (SqlDataFrame) | ||TakeSample (SqlDataFrame) | ||
Line 95: | Line 133: | ||
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]]. | ||
|- | |- | ||
|| | ||WithColumn (SqlDataFrame) | ||
|| | || | ||
# New column name (String) | # New column name (String) | ||
# | # New column expression | ||
|| | || | ||
Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]] | Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]], except instead of in-memory expressions, SqlDataFrame use [[SQL Expressions]]. | ||
|- | |- | ||
||WithRankColumn (SqlDataFrame) | ||WithRankColumn (SqlDataFrame) | ||
Line 120: | Line 158: | ||
||Where (SqlDataFrame) | ||Where (SqlDataFrame) | ||
||Condition expression | ||Condition expression | ||
||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]] | ||Same functionality as in the [[DataFrame_in_Expression_Language|DataFrame]], except instead of in-memory expressions, SqlDataFrame use [[SQL Expressions]]. | ||
|} | |} |
Revision as of 14:56, 1 June 2022
SqlDataFrame represents tabular data similar to an SQL query result. Data in the SqlDataFrames are processed in the original datasource where the data is located (not in the QPR ProcessAnalyzer server memory). For each SqlDataFrame, there is an SQL query generated that is run in the datasource where the referenced datatables are located.
SqlDataFrame operations itself don't cause the SQL to execute in the datasource, but it will happen when the Collect function is called for an SqlDataFrame, which generates and executes the SQL query representing the SqlDataFrame and loads the data into QPR ProcessAnalyzer memory as a DataFrame (where it can be presented in a dashboard).
Each SqlDataFrame contain information about the datasource where the SQL query will be executed. When writing queries with several SqlDataFrames, the SqlDataFrames need to be located in the same datasource, to be able to execute the queries. If needed, data can be moved between datasources by using the Import or Persist functions. Alternatively, processing can be done in-memory by calling Collect for an SqlDataFrame and continuing calculation as in-memory DataFrame.
There is a similar API for the SqlDataFrames as there is for the DataFrames. Note that merging is not possible between SqlDataFrames like it's for DataFrames, but SqlDataFrame can be merged into a datatable.
SqlDataFrame properties | Description |
---|---|
ColumnTypes (Dictionary) |
Returns information about the columns of the SqlDataFrame as an array of dictionaries with keys Name and DataType. Columns are returned in the same order as the columns exist in the data table. Column types are calculated based on the type of the column in the relational database management system (i.e. type of the column in an SQL table). Examples: table.SqlDataFrame.ColumnTypes Returns: [ #{ "Name": "name1", "DataType": "Integer" }, #{ "Name": "name2", "DataType": "String" } ] table.SqlDataFrame.ColumnTypes.Name Returns: ["name1", "name2"] table.SqlDataFrame.ColumnTypes.DataType Returns: ["Integer", "String"] |
NColumns (Integer) | Returns number of columns in the dataset represented by the SqlDataFrame. |
NRows (Integer) | Returns number of rows in the dataset represented by the SqlDataFrame. |
SqlDataFrame functions | Parameters | Description |
---|---|---|
Aggregate (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
Append | DataFrame to append | Same functionality as in the DataFrame. |
Collect (SqlDataFrame) | (none) |
Executes the SQL query for the SqlDataFrame in the datasource and returns results as an in-memory DataFrame. Then processing of the data can be continued as the in-memory DataFrame. Examples: DataTableById(123).SqlDataFrame.Head(100).Collect().ToCsv() Returns the top 100 rows from datatable id 123. |
ExcludeValues (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
GroupBy (GroupedDataFrame) |
Grouped columns (string array) |
Same functionality as in the DataFrame. |
Head (SqlDataFrame) | Number of top rows |
Same functionality as in the DataFrame. |
IncludeOnlyValues (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
Join (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
OrderByColumns (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
Persist (SqlDataFrame) |
|
Same functionality as in the DataFrame. Additionally, if the SQL query for the SqlDataFrame is run in the same system as the target datatable, all data processing and storage is done within the system to achieve efficient operation. |
RemoveColumns (SqlDataFrame) | Column names (string array) | Same functionality as in the DataFrame. |
Select (SqlDataFrame) |
Column names (string array, or key-value pairs) |
Same functionality as in the DataFrame. |
Skip (SqlDataFrame) | Number of rows to skip | Same functionality as in the DataFrame. |
TakeSample (SqlDataFrame) | Number of rows (Integer) | Same functionality as in the DataFrame. |
WithDenseRankNumberColumn (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
WithColumn (SqlDataFrame) |
|
Same functionality as in the DataFrame, except instead of in-memory expressions, SqlDataFrame use SQL Expressions. |
WithRankColumn (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
WithRowNumberColumn (SqlDataFrame) |
|
Same functionality as in the DataFrame. |
Where (SqlDataFrame) | Condition expression | Same functionality as in the DataFrame, except instead of in-memory expressions, SqlDataFrame use SQL Expressions. |