Datatable in Expression Language

From QPR ProcessAnalyzer Wiki

Datatables store data persistently in QPR ProcessAnalyzer. A datatable contains one or more named columns and zero or more rows. Datatables are stored in the QPR ProcessAnalyzer database as database tables. Each datatable belongs to a project.

The datatable entity in the expression language is used to access datatable metadata, such as the name, description and audit information. The data contents can be accessed using the DataFrame property, which loads the actual data from the database into memory for calculations. Alternatively, the data contents can be accessed using the SqlDataFrame property, which performs the calculations in the datasource where the datatable is located, so only the result of the calculation is transferred.
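For example, assuming a datatable with id 123 exists, the same datatable contents can be opened either for in-memory calculation or for in-datasource calculation:

let inMemory = DatatableById(123).DataFrame;
let inDataSource = DatatableById(123).SqlDataFrame;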

Datatables have many use cases:

  • Models can load their data from Datatables.
  • Datatables can be used to store data extracted from external systems.
  • Intermediate ETL calculation results can be stored to Datatables.
  • Data from Datatables can be visualized directly in dashboards without building a QPR ProcessAnalyzer model.

Datatable Properties

Property Description
ColumnNames (String*) Returns Datatable column names as a string array in the order they appear in the Datatable.
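Example: Get the name of the first column of the datatable with id 123 (using array indexing, as in the other examples on this page):

DatatableById(123).ColumnNames[0]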
ColumnMappings (Dictionary) Returns the column mappings configured for this Datatable. The returned data is a dictionary where the keys are mapping names (CaseId, EventType, Timestamp) and the values are column names. The ColumnMappings property returns null if column mappings have not been defined for the Datatable.

Example:

let caseIdColumnName = DatatableById(123).ColumnMappings("CaseId");
ColumnTypes (Dictionary*)

Returns Datatable column metadata as an array of dictionaries. Each column dictionary has the following properties:

  • Name (string): Column name.
  • Datatype (string): Column data type.
  • PrimaryKey (boolean): Whether this column is part of the Datatable primary key.

Example:

DatatableById(123).ColumnTypes[2].Datatype
Configuration (Dictionary) Returns the datatable JSON configuration as a dictionary. The datatable configuration has the following documented properties:
  • TableName: User-defined table name in the data source. The table name cannot start with "qprpa_dt_" to avoid conflicts with system-created tables.
  • SchemaName: User-defined schema name where the datatable is located. If not specified, the schema name from the ODBC connection string is used.
  • DatabaseName: User-defined database name where the datatable is located. If not specified, the database name from the ODBC connection string is used.

Example:

DatatableById(123).Configuration.TableName
ConfigurationJson (String) Returns the datatable configuration as a JSON string.
CreatedBy (User) User who created the datatable.
CreatedDate (DateTime) Timestamp when the datatable was created.
DatabaseNameInDataSource (String) Returns the user-defined database name in the data source. If no database has been defined for the datatable (i.e., database of the connection string is used), null is returned.
DataFrame (DataFrame) Contents of the datatable as a DataFrame.
DataSourceObjectType (String)

Returns table if the source data comes from a database table, and view if the data comes from a database view. When the datasource type is view, the datatable data cannot be modified. When a datatable with the datasource type view is deleted, the underlying database view is not deleted, because views are managed by users directly in Snowflake. When a datatable with the datasource type table is deleted, the underlying database table is also deleted, because tables are managed automatically by QPR ProcessAnalyzer.
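Example: Read the datasource object type of the datatable with id 123; the returned value is either "table" or "view":

DatatableById(123).DataSourceObjectType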

DataSourceType (String) Physical storage location of the datatable data. Options:
  • Local: data is stored in QPR ProcessAnalyzer's own SQL Server database (defined by the database connection string in the appsettings.json file).
  • SqlServer: data is stored in an SQL Server database (defined by the SqlServerConnectionString setting).
  • Snowflake: data is stored in Snowflake.
  • Spark: data is stored in Databricks.
  • Redshift: data is stored in AWS Redshift.
Description (String) Datatable description. The datatable description may contain line breaks.
ForeignKeys (Dictionary*) Returns the foreign keys of the datatable. The data structure is an array of dictionaries with the following keys:
  • TargetDatatable (string): Name of the datatable the foreign key is pointing to.
  • Columns (array): Columns that are part of the foreign key.
    • Source (string): Column name in the source datatable (this datatable) that belongs to the foreign key.
    • Target (string): Column name in the target datatable the foreign key is pointing to.

Examples:

DatatableById(1).ForeignKeys
Returns: 
[
  {
    "Columns": [
      { "Source": "Case Name", "Target": "Case Id" }
    ],
    "TargetDatatable": "Cases"
  },
  {
    "Columns": [
      { "Source": "Employee Id", "Target": "Employee Id"},
      { "Source": "Record version", "Target": "Record version"}
    ],
    "TargetDatatable": "Employees"
  }
]
Id (Integer) Datatable Id. Datatable Id is generated by QPR ProcessAnalyzer when the datatable is created.
LastImportBy (User) User who last imported or modified the data in the datatable. Modification covers adding new rows, modifying rows and deleting rows. This attribute contains correct information only if the data is imported using the QPR ProcessAnalyzer UI. Data can also be imported to Snowflake tables directly using Snowflake, in which case this attribute doesn't track the user correctly.
LastImportDate (DateTime) Timestamp when data was last imported or modified in the datatable. Shows the correct import time when data is imported using the QPR ProcessAnalyzer UI. When the Synchronize function is used, the LastImportDate is set to the current time (regardless of when data was actually imported to the datatable).
LastModifiedBy (User) User who last modified the datatable properties. This does not cover the data contained in the datatable (for that, use the LastImportBy property). If DataSourceObjectType is view, null is returned.
LastModifiedDate (DateTime) Timestamp when the datatable properties were last modified (such as name, description or containing project). This does not cover the data contained in the datatable (for that, use the LastImportDate property).
Name (String) Name of the datatable.
NameInDataSource (String) Name of the table in the datasource (e.g., Snowflake or SQL Server) containing the datatable data. If the table name is specified by a user, that table name is returned. Otherwise the automatically determined table name is returned. This property identifies the correct table when loading data directly to the table in the datasource.
NColumns (Integer) Number of columns in the datatable.
NRows (Integer) Number of data rows in the datatable.
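Examples: Read the dimensions of the datatable with id 123 using the NRows and NColumns properties:

DatatableById(123).NRows
DatatableById(123).NColumns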
Project (Project) Project the datatable belongs to.
ProjectId (Number) Id of the project the datatable belongs to.
SchemaNameInDataSource (String) Returns the user-defined schema name in the data source. If no schema has been defined for the datatable (i.e., schema of the connection string is used), null is returned.
SqlDataFrame (SqlDataFrame) Returns an SqlDataFrame that can be used to access the contents of the datatable.

Datatable Functions

Function Parameters Description
AddColumn
  • Column name (String)
  • Data type (String)

Adds a new column to the Datatable. Existing rows in the Datatable will get null values for the created column. The function returns the updated Datatable entity.

Available data types are String, Integer, Float, DateTime, Boolean, Duration (Timespan) and Any (can contain any type of data).

Example: Add three columns to the Datatable with id 123.

DatatableById(123)
  .AddColumn("MyColumn1", "String")
  .AddColumn("MyColumn2", "Float")
  .AddColumn("MyColumn3", "DateTime")
AddColumns
  • Column definitions (Object Array)

Adds one or several columns to the Datatable in the same operation. Takes as a parameter an array of objects with the following properties:

  • Name: Name of a new column.
  • DataType: Data type of a new column. Available data types are String, Integer, Float, DateTime, Boolean, Duration (Timespan) and Any (can contain any type of data).

Function returns the Datatable object itself. Requires the GenericWrite permission for the project.

Example: Add three columns (MyColumn1, MyColumn2, MyColumn3) with different data types to datatable having id 123:

DatatableById(123)
  .AddColumns([
    #{"Name": "MyColumn1", "DataType": "String"},
    #{"Name": "MyColumn2", "DataType": "Float"},
    #{"Name": "MyColumn3", "DataType": "Integer"}
  ]);
DeletePermanently (none)

Deletes the Datatable permanently. This leads to the loss of all data in the Datatable. Datatables cannot be moved to the recycle bin, so they can only be deleted permanently.

Example: Permanently delete the Datatable with id 123.

DatatableById(123).DeletePermanently()
DeleteRows Expression to select rows to delete Deletes the selected rows from the datatable. The parameter is an expression (evaluated for each row) that selects which rows are deleted. If the parameter is not specified, all rows are deleted.

Example: Delete rows from model's events datatable where Organization column value is Finance:

ModelById(1)
  .EventsDataTable
  .DeleteRows(Column("Organization") == "Finance");

Example: Delete rows where Revenue column has lower value than Cost column:

DatatableById(1)
  .DeleteRows(Column("Revenue") < Column("Cost"));

Example: Delete rows where PredictedEvent column value is true:

DatatableById(1)
  .DeleteRows(Column("PredictedEvent"));

Example: Delete all rows from datatable:

DatatableById(1).DeleteRows();
Import Data to import (DataFrame/DataFlow)

Imports data into the datatable from the given DataFrame or DataFlow. Columns between the datatable and the DataFrame/DataFlow are matched by name (the column order does not matter). Columns that have no values in the DataFrame/DataFlow are imported as null values.

The following parameters affect the behavior of the import:

  • Append: When true (default), existing data in the datatable is preserved and the imported data is created as new rows. When false, existing rows in the datatable are removed (i.e., the import overwrites the existing data). Values of columns that are not imported will be set to null. Note that when Append=false and the imported data doesn't have any rows, the import simply removes all rows from the datatable.
  • ImportExistingColumnOnly: When true (default), the appending import does not create any new columns. When false and the imported data contains columns that are not in the datatable, the appending import creates the new columns in the datatable. Note that this setting has no effect when using a non-appending import.
  • MatchByColumns: Array of column names used to match the imported data with the existing data in the datatable. When matching rows are found, they are updated instead of creating new rows. When no match is found, a new row is created for the imported row. By default, matching is not used.

If using DataFrames, and the imported DataFrame and the target datatable are located in the same system, the import is performed within that system, so no data is transferred into memory. Otherwise, the DataFrame is first loaded into memory and then imported into the target datatable. For lower memory consumption, prefer DataFlows.

The import function returns the Datatable object itself (to allow chaining).

Example:

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"Append": true}
)

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"MatchByColumns": ["Case"]}
)
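Example (using the Append parameter described above): overwrite the existing data, so that after the import the datatable contains only the three imported rows:

DatatableById(123).Import(
  ToDataFrame(
    [
      ["Case 1", "A", DateTime(2021,1,2)],
      ["Case 2", "B", DateTime(2021,1,5)],
      ["Case 2", "A", DateTime(2021,1,9)]
    ],
    ["Case", "Event type", "Time"]
  ),
  #{"Append": false}
)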
Merge Data to merge (DataFrame/SqlDataFrame)

Modifies the datatable by merging its contents with the contents of the given DataFrame. Merging works with the same principle and parameters as merging between in-memory DataFrames.

Examples:

let target = ToDataFrame([[0, "zero", "target"], [1, "", "target"]], ["id", "text", "frame"])
  .Persist("test_data");
let source = ToDataFrame([[1, "one", "source"], [2, "two", "source"], [3, "three", "source"]], ["id", "text", "frame"]);

target.Merge(source, "id").DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns):
id;text;frame
0;zero;target
1;one;source
2;two;source
3;three;source

target.Merge(source, "id", ["text"]).DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns, copy only text column from source):
id;text;frame
0;zero;target
1;one;target
2;two;
3;three;

target.Merge(source, "id", ["text"], _remove).DataFrame.ToCsv()

Returns string (one key, default parameters, identical dataframe columns, copy only text column from source, remove rows found only in source):
id;text;frame
0;zero;target
1;one;target

target.Merge(source, "id", ["text"], _remove, false).DataFrame.ToCsv()

Returns string (one key, identical dataframe columns, copy only text column from source, remove rows found only in source or only in target):
id;text;frame
1;one;target

target.Merge(source, "id", _remove, _remove, false).DataFrame.ToCsv()

Returns string (one key, identical dataframe columns, remove all rows):
id;text;frame

target.Merge(source, "id", [], _remove, false).DataFrame.ToCsv()

Returns string (one key, filter out all the non-matching rows):
id;text;frame
1;;target
Modify Parameters dictionary

Modifies the datatable by changing its properties. The following properties can be changed:

  • Name (string): Name of the datatable.
  • Description (string): Description for the datatable.
  • ProjectId (integer): Id of the project where the datatable is moved to.
  • NameInDataSource (string): Table name in the datasource (e.g., in Snowflake) which this datatable is linked to.
  • SchemaNameInDataSource (string): Schema name in the datasource (e.g., in Snowflake) which this datatable is linked to.
  • DatabaseNameInDataSource (string): Database name in the datasource (e.g., in Snowflake) which this datatable is linked to.

Example: Change datatable name:

DatatableById(1)
  .Modify(#{
    "Name": "My datatable"
  });

Example: Move datatable into other project:

DatatableById(1)
  .Modify(#{
    "ProjectId": ProjectById(2).Id
  });

Example: Change the connected table in the datasource:

DatatableById(1)
  .Modify(#{
    "NameInDataSource": "MyTable",
    "SchemaNameInDataSource": "MySchema",
    "DatabaseNameInDataSource": "MyDatabase"
  });
RemoveColumns

Column names (String*)

Removes the given columns from the datatable. Data in the removed columns is also lost permanently. The function returns the updated datatable entity.

Example: Remove three columns from the datatable with id 123.

DatatableById(123).RemoveColumns(["MyColumn1", "MyColumn2", "MyColumn3"])
RenameColumns

Column name mappings (Dictionary)

Changes names of one or several columns in the datatable. The function returns the updated datatable entity.

Examples: Rename one/two columns in the datatable with id 123.

DatatableById(123).RenameColumns(#{"MyColumnNewName": "MyColumnOldName"})

DatatableById(123).RenameColumns(#{
  "MyColumnNewName1": "MyColumnOldName1",
  "MyColumnNewName2": "MyColumnOldName2"
})
SetForeignKeys Foreign keys (Dictionary array) Sets foreign keys for the datatable. A foreign key means that certain column(s) in the datatable form a key that refers to another datatable.

The function call replaces previously defined foreign keys, so all foreign keys of the datatable need to be mentioned in the call. The parameter has the same structure as in the ForeignKeys property (see above). Providing an empty array removes all foreign keys. Note that the column(s) and target datatable(s) mentioned in the foreign keys don't actually need to exist.

Examples:

DatatableById(1).SetForeignKeys([
  #{
    "Columns": [
      #{ "Source": "Case Name", "Target": "Case Id" }
    ],
    "TargetDatatable": "Cases"
  },
  #{
    "Columns": [
      #{ "Source": "Employee Id", "Target": "Employee Id"},
      #{ "Source": "Record version", "Target": "Record version"}
    ],
    "TargetDatatable": "Employees"
  }
]);

Remove all foreign keys:

DatatableById(1).SetForeignKeys([]);
SetPrimaryKey Key columns (String array)

Sets primary key for the datatable. The primary key consists of one or several columns of the Datatable. When providing an empty array, the primary key is removed.

The primary key is used to optimize queries made to this datatable.

Examples:

DatatableById(1).SetPrimaryKey(["Id"]);
DatatableById(1).SetPrimaryKey(["CaseId", "EventType"]);
DatatableById(1).SetPrimaryKey([]);
Synchronize (none)

Refreshes the datatable information from the datasource (e.g., Snowflake). This function needs to be called when the underlying table in the datasource is modified, so that QPR ProcessAnalyzer becomes aware of the changes; for example, right after an ETL run that adds data to the tables has completed. Note that when data is imported to datatables using the QPR ProcessAnalyzer user interface or scripts, calling the Synchronize function is not needed.

Calling Synchronize requires the GenericWrite permission for the project.

Calling Synchronize performs the following actions:

  • Clears the datatable metadata cache, such as the row count and datatable contents.
  • Sets the last imported date to the current time.
  • Refreshes the column metadata from the datasource. Columns with unsupported data types are treated as strings. If the underlying database table doesn't exist, all columns in the datatable are removed.
  • If columns changed, updates the last modified date to the current time.

Example:

DatatableById(1).Synchronize();
Truncate (none)

Truncates the datatable, i.e., removes all rows from it. Truncate is a very efficient operation for removing all rows compared to using the delete operation. At least the GenericWrite permission for the project is needed for this operation.

Example:

DatatableById(123).Truncate();
UpdateRows
  1. Where expression
  2. Column name
  3. Column expression
  4. Additional column name (optional)
  5. Additional column expression (optional)

Updates column(s) in the datatable for the selected rows with the given expression values. The function has the following parameters:

  1. SQL expression to select the rows to be updated (the "where" expression).
  2. Name of the column whose values are changed.
  3. SQL expression that evaluates to the new value.

It's possible to change several columns in one call by providing more column name and expression pairs as parameters.

Example: For all rows where column "Organization" value is "Finance", set column "Role" value to "Accountant":

DatatableById(1)
  .UpdateRows(
    Column("Organization") == "Finance",
    "Role",
    "Accountant"
  );

Example: For all rows where "Timestamp" is earlier than 01/01/2024, update "Cost" and "Profit" columns (note that the Profit will be calculated with the original Cost value):

DatatableById(1)
  .UpdateRows(
    Column("Timestamp") < "01/01/2024",
    "Cost", Column("Cost") * 2,
    "Profit", Column("Revenue") - Column("Cost")
  );

Example: Set column "Cost" to null for all rows:

DatatableById(1)
  .UpdateRows(true, "Cost", null);

Function to get a datatable by id:

Function Parameters Description
DatatableById

datatable id (Integer)

Returns the datatable with the given id. An exception is thrown if a datatable with the given id does not exist or the user doesn't have permissions to it.

Examples:

DatatableById(27)
Returns: Datatable having id 27.