Datatable in Expression Language

From QPR ProcessAnalyzer Wiki

Datatables are used to store data persistently in QPR ProcessAnalyzer. A datatable consists of one or more named columns and zero or more rows. The datatable entity in the expression language is used to access datatable metadata, such as name, description and audit information. The actual data contents can be accessed through the DataFrame property, which loads the data from the database into memory for calculations.
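For example, metadata can be read without loading any data, whereas accessing the DataFrame property loads the contents into memory (a sketch using the properties documented below; the id 123 is a placeholder):

let table = DatatableById(123);
table.Name;
table.DataFrame;

Reading Name touches only the metadata stored in the database, while reading DataFrame loads the datatable contents into memory as a DataFrame.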

Datatables have many use cases:

  • Models can load their data from Datatables.
  • Datatables can be used to store data extracted from external systems.
  • Intermediate ETL calculation results can be stored to Datatables.
  • It's possible to visualize data from Datatables directly in dashboards without building a QPR ProcessAnalyzer model.

Datatable Properties

Datatables are stored in the QPR ProcessAnalyzer database as database tables. Each datatable belongs to a project.

Property Description
ColumnNames (String*) Returns the Datatable column names as a string array in the order they appear in the Datatable.
ColumnMappings (Dictionary) Returns the column mappings configured for this Datatable. The returned data is a dictionary where keys are mapping names (CaseId, EventType, Timestamp) and values are the corresponding column names. The ColumnMappings property returns null if column mappings have not been defined for the Datatable.

Example:

let caseIdColumnName = DatatableById(123).ColumnMappings("CaseId");
ColumnTypes (Dictionary*)

Returns Datatable column metadata as an array of dictionaries. Each column dictionary has the following properties:

  • Name (string): Column name.
  • Datatype (string): Column data type.
  • PrimaryKey (boolean): Whether this column is part of the Datatable primary key.

Example:

DatatableById(123).ColumnTypes[2].Datatype
Configuration (Dictionary) Returns the datatable configuration as dictionary. Example:
DatatableById(123).Configuration.Columns[0].Datatype
ConfigurationJson (String) Returns the datatable configuration as a JSON string.
CreatedBy (User) User who created the datatable.
CreatedDate (DateTime) Timestamp when the datatable was created.
DataFrame (DataFrame) Contents of the datatable as a DataFrame.
DataSourceObjectType

Returns "table" if the source data comes from a database table, and "view" if the data comes from a database view. When the datasource object type is "view", operations that modify the datatable data are not possible. When a datatable with the datasource object type "view" is deleted, the underlying database view is not deleted, because views are managed by users directly in Snowflake. When a datatable with the datasource object type "table" is deleted, the underlying database table is also deleted, because tables are managed by QPR ProcessAnalyzer automatically.
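For example, the property can be read to check whether modifying operations are possible (a sketch; the id 123 is a placeholder, and the returned value is either "table" or "view"):

DatatableById(123).DataSourceObjectType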

DataSourceType Physical storage location of the datatable data. Options:
  • Local: data is stored to QPR ProcessAnalyzer's own database in SQL Server (defined by the database connection string in the appsettings.json file).
  • SqlServer: data is stored to an SQL Server database (defined by the SqlServerConnectionString setting).
  • Snowflake: data is stored to Snowflake.
  • Spark: data is stored to Databricks.
  • Redshift: data is stored to AWS Redshift.
Description (String) Datatable description. The datatable description may contain line breaks.
Id (Integer) Datatable Id. Datatable Id is generated by QPR ProcessAnalyzer when the datatable is created.
LastImportBy (User) User who last imported or modified the data in the datatable. Modification covers adding new rows, modifying rows and deleting rows. This attribute contains correct information only if data is imported using the QPR ProcessAnalyzer UI. Data can also be imported to Snowflake tables directly using Snowflake, in which case this attribute doesn't track the user correctly.
LastImportDate (DateTime) Timestamp when data was last imported or modified in the datatable. Shows the correct import time when data is imported using the QPR ProcessAnalyzer UI. When the Synchronize function is used, the LastImportDate is set to the current time (regardless of when data was actually imported to the datatable).
LastModifiedBy (User) User who last modified the datatable properties. This does not cover the data contained in the datatable (for that, use the LastImportBy property). Returns null if DataSourceObjectType is "view".
LastModifiedDate (DateTime) Timestamp when the datatable properties (such as name, description or containing project) were last modified. This does not cover the data contained in the datatable (for that, use the LastImportDate property).
Name (String) Name of the datatable.
NameInDataSource (String) Name of the table in the underlying datasource (e.g., Snowflake or SQL Server) containing the datatable data. This property helps to identify the correct table when loading data directly into the table in the datasource.
NColumns (Integer) Number of columns in the datatable.
NRows (Integer) Number of data rows in the datatable.
Project (Project) Project the datatable belongs to.
ProjectId (Number) Id of the project the datatable belongs to.
SqlDataFrame (SqlDataFrame) Returns an SqlDataFrame that can be used to access the contents of the Datatable.
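For example, the metadata properties above can be combined to inspect a datatable without loading its data (a sketch; the id 123 is a placeholder):

let table = DatatableById(123);
table.Name;
table.NRows;
table.ColumnNames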

Datatable Functions

Function Parameters Description
AddColumn
  • Column name (String)
  • Data type (String)

Adds a new column to the Datatable. Existing rows in the Datatable will get null values for the created column. The function returns the updated Datatable entity.

Available data types are String, Integer, Float, DateTime, Boolean, Duration (Timespan) and Any (can contain any type of data).

Example: Add three columns to the Datatable with id 123.

DatatableById(123)
  .AddColumn("MyColumn1", "String")
  .AddColumn("MyColumn2", "Float")
  .AddColumn("MyColumn3", "DateTime")
AddColumns
  • Column definitions (Object Array)

Adds one or several columns to the Datatable in the same operation. Takes as a parameter an array of objects with the following properties:

  • Name: Name of a new column.
  • Datatype: Datatype of a new column. Available data types are String, Integer, Float, DateTime, Boolean, Duration (Timespan) and Any (can contain any type of data).

Function returns the Datatable object itself. Requires the GenericWrite permission for the project.

Example: Add three columns (MyColumn1, MyColumn2, MyColumn3) with different data types to datatable having id 123:

DatatableById(123)
  .AddColumns([
    #{"Name": "MyColumn1", "Datatype": "String"},
    #{"Name": "MyColumn2", "Datatype": "Float"},
    #{"Name": "MyColumn3", "Datatype": "Integer"}
  ]);
DeletePermanently (none)

Deletes the Datatable permanently. This will lead to a loss of all data in the Datatable. Datatables cannot be moved to the recycle bin, so they can only be deleted permanently.

Example: Permanently delete the Datatable with id 123.

DatatableById(123).DeletePermanently()
Import Data to import (DataFrame/DataFlow)

Imports data to the datatable from the given DataFrame or DataFlow. Columns in the datatable and the DataFrame/DataFlow are matched by name (the column order does not matter). Columns that have no values in the DataFrame/DataFlow are imported as null values.

The following parameters affect the behavior of the import:

  • Append: When true (default), existing data in the datatable is preserved and the imported data is created as new rows. When false, existing rows in the datatable are removed (i.e., the import overwrites the existing data). Values of columns that are not imported will be set to null. Note that when Append is false and the imported data doesn't have any rows, the import just removes all rows from the datatable.
  • ImportExistingColumnOnly: When true (default), the appending import does not create any new columns. When false and the imported data contains columns that are not in the datatable, the appending import creates the new columns in the datatable. Note that this setting has no effect when using a non-appending import.
  • MatchByColumns: Array of column names, which are used to match the imported data with the existing data in the datatable. When matching rows are found, they are updated instead of creating new rows. When no match is found, a new row is created for the imported row. By default, the matching is not used.

If using DataFrames, and the imported DataFrame and the target datatable are located in the same system, the import is performed within that system, so no data is transferred out of the system into memory. Otherwise, the DataFrame is first loaded into memory and then imported into the target datatable. For lower memory consumption, prefer DataFlows instead.

The import function returns the Datatable object itself (to allow chaining).

Example:

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"Append": true}
)

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"MatchByColumns": ["Case"]}
)
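The ImportExistingColumnOnly parameter works the same way. As a sketch (the column names are illustrative), the following appending import creates the column "Region" in the datatable with id 123 if it does not exist there yet:

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "North"]
    ],
    ["Case", "Region"]
  ),
  #{"Append": true, "ImportExistingColumnOnly": false}
)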
Merge Data to merge (DataFrame/SqlDataFrame)

Modifies the datatable by merging its contents with the contents of the given DataFrame or SqlDataFrame. Merging works with the same principles and parameters as merging between in-memory DataFrames.

Examples:

let target = ToDataFrame([[0, "zero", "target"], [1, "", "target"]], ["id", "text", "frame"])
  .Persist("test_data");
let source = ToDataFrame([[1, "one", "source"], [2, "two", "source"], [3, "three", "source"]], ["id", "text", "frame"]);

target.Merge(source, "id").DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns):
id;text;frame
0;zero;target
1;one;source
2;two;source
3;three;source

target.Merge(source, "id", ["text"]).DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns, copy only text column from source):
id;text;frame
0;zero;target
1;one;target
2;two;
3;three;

target.Merge(source, "id", ["text"], _remove).DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns, copy only text column from source, remove rows found only in source):
id;text;frame
0;zero;target
1;one;target

target.Merge(source, "id", ["text"], _remove, false).DataFrame.ToCsv()

Returns string (one key, identical dataframe columns, copy only text column from source, remove rows found only in source or only in target):
id;text;frame
1;one;target

target.Merge(source, "id", _remove, _remove, false).DataFrame.ToCsv()

Returns string (one key, identical dataframe columns, remove all rows):
id;text;frame

target.Merge(source, "id", [], _remove, false).DataFrame.ToCsv()

Returns string (one key, filter out all the non-matching rows):
id;text;frame
1;;target
RemoveColumns

Column names (String*)

Removes the given columns from the datatable. Data in the removed columns is also lost permanently. The function returns the updated datatable entity.

Example: Remove three columns from the datatable with id 123.

DatatableById(123).RemoveColumns(["MyColumn1", "MyColumn2", "MyColumn3"])
RenameColumns

Column name mappings (Dictionary)

Changes names of one or several columns in the datatable. The function returns the updated datatable entity.

Examples: Rename one/two columns in the datatable with id 123.

DatatableById(123).RenameColumns(#{"MyColumnNewName": "MyColumnOldName"})

DatatableById(123).RenameColumns(#{
  "MyColumnNewName1": "MyColumnOldName1",
  "MyColumnNewName2": "MyColumnOldName2"
})
SetPrimaryKey Key columns (String array)

Sets the primary key for the datatable. The primary key consists of one or several columns of the Datatable. When an empty array is provided, the primary key is removed.

The primary key is used to optimize queries made to this datatable.

Examples:

DatatableById(1).SetPrimaryKey(["Id"]);
DatatableById(1).SetPrimaryKey(["CaseId", "EventType"]);
DatatableById(1).SetPrimaryKey([]);
Synchronize (none)

Refreshes the datatable information from the datasource (e.g., Snowflake) to QPR ProcessAnalyzer. This function needs to be called when the underlying table in the datasource is modified, so that QPR ProcessAnalyzer becomes aware of the modifications. This is typically done right after an ETL run (adding data to the tables) has completed. Note that when data is imported to datatables using the QPR ProcessAnalyzer user interface or scripts, calling the Synchronize function is not needed.

Calling Synchronize performs the following actions:

  • Clears the datatable metadata cache, such as the row count and datatable contents.
  • Sets the last imported date to the current time.
  • Refreshes column metadata from the datasource. Columns with unsupported data types are treated as strings.
  • If columns have changed, the last modified date is updated to the current time.

Calling Synchronize requires the GenericWrite permission for the project.

Example:

DatatableById(123).Synchronize();
Truncate (none)

Truncates the datatable, i.e., removes all rows from it. Truncate is a very efficient way to remove all rows compared to using the delete operation. At least the GenericWrite permission for the project is needed for this operation.

Example:

DatatableById(123).Truncate();

Function to get a datatable by id:

Function Parameters Description
DataTableById

datatable id (Integer)

Returns the datatable with the given id. An exception is thrown if a datatable with the given id does not exist or the user doesn't have permissions to it.

Examples:

DataTableById(27)
Returns: The datatable having id 27.