Datatable in Expression Language: Difference between revisions

From QPR ProcessAnalyzer Wiki
Jump to navigation Jump to search
Line 26: Line 26:
||ColumnTypes (Dictionary*)
||ColumnTypes (Dictionary*)
||
||
Array of Datatable column metadata in dictionaries. Each dictionary has '''Name''' and '''Datatype''' properties.
Returns Datatable column metadata as array of dictionaries. Each column dictionary has following properties:
* '''Name''' (string): Column name.
* '''Datatype''' (string); Column data type
* '''PrimaryKey''' (boolean): Whether this column is part of the datatable primary key.


Example: 3rd column data type:
<pre>
<pre>
DatatableById(123).ColumnTypes[2].Datatype
DatatableById(123).Columns[2].Datatype
</pre>
</pre>
|-
|-

Revision as of 18:01, 1 December 2022

Datatables are used to store data in QPR ProcessAnalyzer persistently. Datatables consist of one-to-many named columns and zero-to-many rows. The datatable entity in the expression language is used to access Datatables metadata, such as name, description and audit information. The actual data contents can be accessed by using the DataFrame property, which causes the actual data to be loaded from the database into memory for calculations.

Datatables have many usecases:

  • Models can load their data from Datatables.
  • Datatables can be used to store data extracted from external systems
  • Intermediate ETL calculation results can be stored to Datatables
  • It's possible to visualize data from Datatables directly in the dashboards without building a QPR ProcessAnalyzer model

Properties

Datatables are stored to QPR ProcessAnalyzer database as database tables. Each datatable belongs to a project.

Datatable properties Description
ColumnNames (String*) Array of Datatable column names in the order they are in the DataTable. The DataTable doesn't need to be loaded into memory to get the column names.
ColumnMappings (Dictionary) Gives column mappings configured for this Datatable. Returned data is a dictionary, where keys are mapping names (CaseId, EventType, Timestamp) and value is the column name. The ColumnMappings property returns null if column mappings have not been defined for the Datatable.

Example:

let caseIdColumnName = DatatableById(123).ColumnMappings("CaseId");
ColumnTypes (Dictionary*)

Returns Datatable column metadata as array of dictionaries. Each column dictionary has following properties:

  • Name (string): Column name.
  • Datatype (string); Column data type
  • PrimaryKey (boolean): Whether this column is part of the datatable primary key.
DatatableById(123).Columns[2].Datatype
Configuration (Dictionary) Returns the datatable configuration as dictionary. Example:
DatatableById(123).Configuration.Columns[0].Datatype
ConfigurationJson (String) Returns the datatable configuration as json string.
CreatedBy (User) User who created the datatable.
CreatedDate (DateTime) Timestamp when the datatable was created.
DataFrame (DataFrame) Contents of the datatable as a DataFrame.
DataSourceObjectType

Returns table if the source data comes from a database table, and view if the data comes from a database view. When datasource type is view, any operations to modify the datatable data are not possible. When a datatable with the datasource type view is deleted, the underlying database view is not deleted, because the views are managed by users directly in the Snowflake. When a datatable with the datasource type table is deleted, the underlying database table is deleted, because the tables are managed by QPR ProcessAnalyzer automatically.

DataSourceType Physical storage location of the datatable data. Options:
  • Local: data is stored to QPR ProcessAnalyzer's own database in SQL Server (defined by the database connection string in the appsettings.json file).
  • SqlServer: data is stored to an SQL Server database (defined by the SqlServerConnectionString setting).
  • Snowflake: data is stored to Snowflake.
  • Spark: data is stored to Databricks.
  • Redshift: data is stored to AWS Redshift.
Description (String) Datatable description. The datatable description may contain line breaks.
Id (Integer) Datatable Id. Datatable Id is generated by QPR ProcessAnalyzer when the datatable is created.
LastImportBy (User) User who the last time imported or modified the data in the datatable. Modification covers adding new rows, modifying rows and deleting rows. This attribute contains correct information only if data is imported using QPR ProcessAnalyzer UI. Data can also be imported to Snowflake tables directly using Snowflake, and in that case this attribute doesn't track the user correctly.
LastImportDate (DateTime) Timestamp when data was last time imported or modified in the datatable. Shows the correct import time when data is imported using QPR ProcessAnalyzer UI. When the Synchronize function is used, the LastImportDate is set to the current time (regardless of when data was actually imported to the datatable).
LastModifiedBy (User) User who last time modified the datatable properties. This does not cover the data contained by the datatable (for that, use the LastImportBy by property). If DataSourceObjectType is view, gives null.
LastModifiedDate (DateTime) Timestamp when the datatable properties were last time modified (such as name, description or containing project). This does not cover the data contained by the datatable (for that, use the LastImportDate by property).
Name (String) Name of the datatable.
NameInDataSource (String) Name of the table in the underlying datasource (e.g., Snowflake or SQL Server) containing the datatable data. This property is handy to identify the correct table when loading data directly to the table in the datasource.
NColumns (Integer) Number of columns in the datatable.
NRows (Integer) Number of data rows in the datatable.
Project (Project) Project the datatable belongs to.
ProjectId (Number) Id of the project the datatable belongs to.
SqlDataFrame (SqlDataFrame) Returns SqlDataFrame that can be used to access the contents of the DataTable.

Functions

Datatable functions Parameters Description
AddColumn
  • Column name (String)
  • Data type (String)

Adds a new column to the Datatable. Existing rows in the Datatable will get null values for the created column. The function returns the updated Datatable entity.

Available data types are String, Integer, Float, DateTime, Boolean, Duration (Timespan) and Any (can contain any type of data).

Example: Add three columns to the Datatable with id 123.

DatatableById(123)
  .AddColumn("MyColumn1", "String")
  .AddColumn("MyColumn2", "Float")
  .AddColumn("MyColumn3", "DateTime")
DeletePermanently (none)

Deletes the Datatable permanently. This will lead to a lost of all data in the Datatable. Datatables cannot be moved to the recycle bin, so they can only be deleted permanently.

Example: Delete permanently Datatable with id 123.

DatatableById(123).DeletePermanently()
Import Data to import (DataFrame)

Imports data to the datatable from given DataFrame. Columns between datatable andDataFrame are matched with names (the column order does not matter). Columns where there are no values in the DataFrame are imported as null values.

Following parameters affect the behavior of the import:

  • Append: When true (default), existing data in the datatable is preserved and the imported data is created as new rows. When false, existing rows in the datatable are removed (i.e. import overwrites the existing data). Values of not imported columns will be set to null. Note that when Append=false, and the imported data doesn't have any rows, the import just removes all rows from the datatable.
  • ImportExistingColumnOnly: When true (default), the appending import does not create any new columns. When false and there are columns in the imported data that are not in the datatable, the appending import creates the new columns to the datatable. Note that this setting does not have effect, when using non-appending import.
  • MatchByColumns: Array of column names, which are used to match the imported data with the existing data in the datatable. When matching rows are found, they are updated instead of creating new rows. When no match is found, a new row is created for the imported row. By default, the matching is not used.

If the imported DataFrame and the target datatable are located in the same system, the import is performed within the system, so no data is transferred from the system to in-memory. Otherwise, the DataFrame is first loaded into memory and imported into the target datatable.

The import function returns the Datatable object itself (to allow chaining).

Example:

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"Append": true}
)

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"MatchByColumns": ["Case"]}
)
Merge Data to merge (DataFrame/SqlDataFrame)

Modifies the datatable by merging its contents with contents of given DataFrame. Merging works with the same principle and parameters as merging between in-memory DataFrames

Examples:

let target = ToDataFrame([[0, "zero", "target"], [1, "", "target"]], ["id", "text", "frame"])
  .Persist("test_data");
let source = ToDataFrame([[1, "one", "source"], [2, "two", "source"], [3, "three", "source"]], ["id", "text", "frame"]);

target.Merge(source, "id").DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns):
id;text;frame
0;zero;target
1;one;source
2;two;source
3;three;source

target.Merge(source, "id", ["text"]).DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns, copy only text column from source):
id;text;frame
0;zero;target
1;one;target
2;two;
3;three;

target.Merge(source, "id", ["text"], _remove).DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns, copy only text column from source, remove rows found only in source):
id;text;frame
0;zero;target
1;one;target

target.Merge(source, "id", ["text"], _remove, false).DataFrame.ToCsv()

Returns string (one key, identical dataframe columns, copy only text column from source, remove rows found only in source or only in target):
id;text;frame
1;one;target

target.Merge(source, "id", _remove, _remove, false).DataFrame.ToCsv()

Returns string (one key, identical dataframe columns, remove all rows):
id;text;frame

target.Merge(source, "id", [], _remove, false).DataFrame.ToCsv()

Returns string (one key, filter out all the non-matching rows):
id;text;frame
1;;target
RemoveColumns

Column names (String*)

Removes the given columns from the datatable. Data in the removed columns is also lost permanently. The function returns the updated datatable entity.

Example: Remove three columns from the datatable with id 123.

DatatableById(123).RemoveColumns(["MyColumn1", "MyColumn2", "MyColumn3"])
RenameColumns

Column name mappings (Dictionary)

Changes names of one or several columns in the datatable. The function returns the updated datatable entity.

Examples: Rename one/two columns in the datatable with id 123.

DatatableById(123).RenameColumns(#{"MyColumnNewName": "MyColumnOldName"})

DatatableById(123).RenameColumns(#{
  "MyColumnNewName1": "MyColumnOldName1",
  "MyColumnNewName2": "MyColumnOldName2"
})
SetPrimaryKey Key columns (String array)

Sets the primary key for the datatable. The primary key can consist of one or several columns of this datatable. When providing an empty array, the primary key is removed.

The primary key is used to optimize queries made to this datatable.

Examples:

DatatableById(1).SetPrimaryKey(["id"]);
DatatableById(1).SetPrimaryKey(["id","price"]);
DatatableById(1).SetPrimaryKey([]);
Synchronize (none)

Refreshes the datatable information from the datasource (e.g., Snowflake) to QPR ProcessAnalyzer. This function needs to be called when the underlying table in the datasource is modified for QPR ProcessAnalyzer to be aware of the modifications. This is typically done right after the ETL run (to add data to the tables) has completed. Note that when data is imported to datatables using QPR ProcessAnalyzer user interface or scripts, calling the synchronize function is not needed.

Calling synchronize will perform the following actions:

  • Clears datatable metadata cache, such as the row count and datatable contents.
  • Changes the last imported date to current time.
  • Refresh columns metadata from datasource. Columns with unsupported data types are treated as strings.
  • If columns are changed, the last modified date is updated to current time.

Calling synchronize requires the same permissions as for importing data, i.e., GenericWrite for the project and global CreateModel permissions.

Example:

DatatableById(123).Synchronize();
Truncate (none)

Truncates the datatable, i.e., removes all rows from it. Truncate is a very efficient operation to remove all rows comparing using the delete operation. Example:

DatatableById(123).Truncate();

Function to get a datatable by id:

Function Parameters Description
DataTableById

datatable id (Integer)

Return datatable with given id. An exception is given if a datatable with the given id does not exist or user doesn't have permissions to it.

Examples:

DataTableById(27)
Returns: DataTable for the data table having id 27.