DataFrame in Expression Language: Difference between revisions
Line 12: | Line 12: | ||
! '''Parameters''' | ! '''Parameters''' | ||
! '''Description''' | ! '''Description''' | ||
|- | |||
||ExtractSap (DataFrame) | |||
||# parameters (Dictionary) | |||
|| | |||
Extracts data from an SAP system and returns it as a DataFrame. | |||
If a column name contains illegal characters for table names, the illegal characters are converted to be underscore characters (e.g. "sap:Owner" -> "sap_Owner"). Columns are extracted as text data. Note that using this command requires installing [[Installing_QPR_ProcessAnalyzer_Server#Installing_SAP_NetWeaver_RFC_Library|SAP NetWeaver RFC Library]]. | |||
Parameters: | |||
* '''ProjectId''': The id or the name of the project in which the target data table exists. | |||
* '''User''': SAP username used to connect to SAP. Mandatory. Corresponds to the "USER" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''Password''': Password of the SAP user used to connect to SAP. Mandatory. Corresponds to the "PASSWD" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''PWKey''': Secure string key for the stored SAP password. Alternative for the SapPW property. | |||
* '''Client''': The SAP backend client. Mandatory. Corresponds to the "CLIENT" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''AppServerHost''': The hostname or IP of the specific SAP application server, to which all connections shall be opened. Mandatory if SapMessageServerHost is not defined. Corresponds to the "ASHOST" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''MessageServerHost''': The hostname or IP of the SAP system’s message server (central instance). Mandatory if SapAppServerHost is not defined. Corresponds to the "MSHOST" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''SystemNumber''': The SAP system’s system number. Mandatory if SapSystemID is not defined. Corresponds to the "SYSNR" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''SystemID''': The SAP system’s three-letter system ID. Mandatory if SapSystemNumber is not defined. Corresponds to the "SYSID" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''Language''': SAP language used. Default value is "EN". Optional. Corresponds to the "LANG" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''PoolSize''': The maximum number of RFC connections that this destination will keep in its pool. Default value is "5". Optional. Corresponds to the "POOL_SIZE" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''PeakConnectionsLimit''': In order to prevent an unlimited number of connections to be opened, you can use this parameter. Default value is "10". Optional. Corresponds to the "MAX_POOL_SIZE" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''ConnectionIdleTimeout''': If a connection has been idle for more than SapIdleTimeout seconds, it will be closed and removed from the connection pool upon checking for idle connections or pools. Default value is "600". Optional. Corresponds to the "IDLE_TIMEOUT" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''Router''': A list of host names and service names / port numbers for the SAPRouter in the following format: /H/hostname/S/portnumber. Optional. Corresponds to the "SAPROUTER" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''LogonGroup''': The logon group from which the message server shall select an application server. Optional. Corresponds to the "GROUP" constant on SAP side. See the SAP .NET Connector documentation for more info. | |||
* '''QueryMode''': If this number is set to "1", then the query result will have the SAP Table field names as data table column names and actual data rows as rows. If this is set to "3", the query result will get the field descriptions from the SAP query using NO_DATA parameter, i.e. the returned columns are the following (in this order): Field, Type, Description, Length, Offset. Default value is "1". Optional. See the SAP .NET Connector documentation for more info. | |||
* '''QueryTable''': Name of the SAP table to be extracted. Specifies the value for the parameter QUERY_TABLE in tab: 'Import' or function module 'rfc_read_table' in SAP. Mandatory. See the SAP .NET Connector documentation for more info. Note that if the query doesn't return any data, the target data table or temporary table is not created. | |||
* '''Rowcount''': The maximum amount of rows to fetch. Specifies the value for parameter ROWCOUNT in tab: 'Import' or function module 'rfc_read_table' in SAP. Optional. See the SAP .NET Connector documentation for more info. | |||
* '''Rowskips''': The number of rows to skip. Specifies the value for parameter ROWSKIPS in tab: 'Import' or function module 'rfc_read_table'. in SAP. Optional. See the SAP .NET Connector documentation for more info. | |||
* '''WhereClause''': A comma separated list of WHERE clause elements passed for the SapQueryTable. Can be used with or without the SapWhereClauseSelect parameter. If used together with the SapWhereClauseSelect parameter, use the SapWhereClause parameter first. NOTE: The default maximum length for the Where Clause string is 72 characters in SAP, so the recommended maximum length of the SapWhereClause value is also 72 characters. In effect, specifies the value for parameter OPTIONS in tab: 'Import' or function module 'rfc_read_table' in SAP. Optional. See the SAP .NET Connector documentation for more info. | |||
* '''WhereClauseSelect''': The SELECT query to be executed in QPR ProcessAnalyzer sandbox. Used with or without the SapWhereClause parameter to pass WHERE clauses to SapQueryTable. If used together with the SapWhereClause parameter, use the SapWhereClause parameter first. The query is expected to return a table with at least one column, as the contents from the rows in the first column of the table are concatenated together to form the WHERE clause in SAP RFC_ReadTable. Therefore, it's recommended to first create the table with the WHERE clauses into a temporary table. In addition, it's recommended to have an order number column in the table and use that in the SELECT query to make sure the WHERE clause elements are concatenated in the correct order. The default maximum length for Where Clause string is 72 characters in SAP, so the recommended maximum length for the WHERE clause string in each row of the table is also 72. In effect, specifies the value for parameter OPTIONS in tab: 'Import' or function module 'rfc_read_table' in SAP. Optional. The contents up to the first 10 rows in the first column of the SELECT query are shown in the QPR ProcessAnalyzer Script Log. See the SAP .NET Connector documentation for more info. | |||
* '''FieldNames''': A comma separated list of field names for columns to be imported. Default value is empty, resulting in all columns being imported. Specifies the value for parameter FIELDNAME in tab: 'Tables' for table 'FIELDS' for function module 'rfc_read_table' in SAP. Optional. See the SAP .NET Connector documentation for more info. | |||
* '''Function''': If you define a value for this parameter, then the new value specifies the SAP function that is called inside the #ImportSapQuery command. Optional. The default value is RFC_READ_TABLE. Another possible value is BBP_RFC_READ_TABLE. See the SAP .NET Connector documentation for more info. | |||
* '''ConvertDataTypes''': List of SAP data types that are converted into respective data types. Defined by listing the data type identifier characters in any order. Available data type identifying characters are IFPCDTNX. If not defined, all data types are converted. Example: IFP (convert only numeric data types: Integer, Float, Packed number) (more information). | |||
Project is needed for the secure strings. | |||
If there is an error in the function, exception is given. | |||
|- | |- | ||
||ImportOdbc (DataFrame) | ||ImportOdbc (DataFrame) |
Revision as of 15:00, 29 November 2022
DataFrame represents a two dimensional array of data with one-to-many columns and zero-to-many rows, like a relational database table, an Excel sheet or a CSV data file. Each column in the DataFrame has a name, and there must not be more than one column with the same name. DataFrame is the generic data structure used to manage all kinds data in QPR ProcessAnalyzer expression engine that run in-memory. DataFrames as linked to other entities in QPR ProcessAnalyzer as follows:
- Datatable contents is fetched into the memory as a DataFrame object
- DataFrame can be stored (persisted) to a Datatable
- ETL operations, such as joining, unions, filtering and grouping are based on the DataFrames
- Data extracted from an external data source, e.g. using ODBC, is fetched to the in-memory calculation as a DataFrame.
- When using a loading scripts, cases and events data is fed to the model using the DataFrame.
Extract Data to DataFrame
DataFrame functions | Parameters | Description |
---|---|---|
ExtractSap (DataFrame) | # parameters (Dictionary) |
Extracts data from an SAP system and returns it as a DataFrame. If a column name contains illegal characters for table names, the illegal characters are converted to be underscore characters (e.g. "sap:Owner" -> "sap_Owner"). Columns are extracted as text data. Note that using this command requires installing SAP NetWeaver RFC Library. Parameters:
Project is needed for the secure strings. If there is an error in the function, exception is given. |
ImportOdbc (DataFrame) |
|
Runs given query in given ODBC datasource and returns data as DataFrame. AllowExternalDatasources setting needs to be true to use the ImportODBC function. Note also that the ODBC connection requires an ODBC driver to the datasource to be installed in the QPR ProcessAnalyzer Server machine. Examples: ImportOdbc("Driver={SQL Server};Server=localhost;DataBase=QPR_PA1;Trusted_Connection=yes", "SELECT * FROM OdbcTest") Returns: Contents of OdbcTest table in given ODBC data source inside a DataFrame. |
ImportOdbcSecure (DataFrame) |
|
Similar command as ImportOdbc, except instead of the plaintext connection string, a secure string key is provided. Also a project id, from where to fetch the connection string key, needs to be provided. Examples: ImportOdbcSecure(12, "MySecureConnectionString", "SELECT * FROM OdbcTest") Returns: Contents of OdbcTest table in given ODBC data source inside a DataFrame. |
DataFrame Properties
DataFrame properties | Description |
---|---|
Columns (String*) | DataFrame columns names as an array in the order the columns are in the DataFrame. |
ColumnNames (Dictionary*) | DataFrame columns metadata (names and data types) in an array of dictionaries, where each dictionary has the Name and Datatype properties. For in-memory DataFrames, the precise data types are available only if the dataframe originates directly from a datatable. In other cases, for example in DataFrames originating from modification operations, the data type Any is returns for all columns.
Example: 3rd column data type for a variable stored DataFrame: myDataframe.Columns[2].Datatype |
ColumnMappings (Dictionary) | Gives column mappings configured for this DataFrame. Returned data is a dictionary, where keys are mapping names (CaseId, EventType, Timestamp) and value is the column name. The ColumnMappings property returns null if column mappings have not been defined.
Example: let caseIdColumnName = myDataFrame.ColumnMappings("CaseId"); |
Rows (Object**) | Returns the data content of the DataFrame as a two-dimensional array (matrix). The column names are not part of the data content.
Examples: DatatableById(5).DataFrame.Rows[0][0] Returns: the value in the first row and first column in a datatable with id 5. |
<column name> (Object*) |
Returns an array of values of given column in the datatable. If the column name contains spaces, the Column function needs to be used to refer to a column. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).right Returns: [zero, two, three] ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).right[2] Returns: three |
DataFrame Functions
DataFrame functions | Parameters | Description |
---|---|---|
Aggregate (DataFrame) |
|
Create a new DataFrame from a GroupedDataFrame by performing aggregations to all groups separately and returning one row for each group. Also returns all the columns used in the grouping of the values. Parameters:
Supported aggregations are:
Examples: ToDataFrame([[0, "zero"], [10, "zero"], [2, "two"], [12, "two"], [22, "two"], [3, "three"]], ["id", "text"]) .GroupBy(["text"]) .Aggregate(["ids": "id"], ["sum"]).ToCsv() Returns string: text;id three;3 two;36 zero;10 ToDataFrame([[0, "zero"], [10, "zero"], [2, "two"], [12, "two"], [22, "two"], [3, "three"]], ["id", "text"]) .GroupBy(["text"]) .Aggregate( ["average": "id", "sum": "id", "min": "id", "max": "id", "median": "id"], ["average", "sum", "min", "max", "median"] ).ToCsv() Returns string: text;average;sum;min;max;median three;3;3;3;3;3 two;12;36;2;22;12 zero;5;10;0;10;5 ToDataFrame([[0, DateTime(2020, 1)], [0, DateTime(2020, 4)], [0, DateTime(2020, 2)], [1, DateTime(2019, 1)], [1, DateTime(2009, 1)]], ["id", "timestamp"]) .GroupBy(["id"]) .Aggregate( ["duration": "timestamp", "count": "id"], ["DateTimeRange", "Count"] ).ToCsv() Returns string: id;duration;count 0;7862400;3 1;315532800;2 ToDataFrame([[1, "zero"], [1, "one"], [1, "two"], [2, "two"], [3, "two"], [3, "three"]], ["id", "text"]) .GroupBy(["id"]) .Aggregate( ["text"], [#{"Function": "List", "Ordering": ["text"], "Separator": ", "}] ).ToCsv() Returns: id;text 1;one, two, zero 2;two 3;three, two |
Append (DataFrame) |
DataFrame to append |
Creates a new DataFrame that has the contents of given DataFrame added to the end of this DataFrame. When the data is combined, the order of columns matters, not the names of the columns. The resulting DataFrame gets column names from this DataFrame. If the number of columns is different between this DataFrame and the other DataFrame, an exception is thrown. Examples: let dataframe1 = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]); let dataframe2 = ToDataFrame([[1, "one"], [4, "four"]], ["id", "text"]); dataframe1.Append(dataframe2); Returns: id;text 0;zero 2;two 3;three 1;one 4;four |
Clone (DataFrame) | DataFrame to clone |
Returns a new DataFrame that is an exact copy of the data frame this method was called for. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).Clone() Returns:A copy of the original DataFrame object. |
ColumnIndexes (Integer*) |
Column names (String*) |
Convert DataFrame column names into column indexes. The indexes are starting from zero. If a column is not found, an exception is given. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).ColumnIndexes(["right", "id"]) Returns: [1, 0] |
Column (Object*) |
Column name |
Returns an array of values of given column in the order rows are in the datatable. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).Column("right") Returns: [zero, two, three] |
Columns (DataFrame) |
Array of column names |
Creates a new DataFrame having only the defined columns of the original DataFrame. Note that Columns function is different than Columns property (difference is that the function has parameters). Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).Columns(["right"]).ToCsv() Returns: right zero two three |
ExcludeValues (DataFrame) |
|
Creates a new DataFrame having only rows for which given column does not have any of the specified values. Examples: ToDataFrame([[0, "zero"], [1, "one"], [2, "two"]], ["id", "left"]).ExcludeValues("id", 1).ToCsv() Returns: id;left 0;zero 2;two ToDataFrame([[0, "zero"], [1, "one"], [2, "two"]], ["id", "left"]).ExcludeValues("left", ["one", "two", "three"]).ToCsv() Returns: id;left 0;zero |
Head (DataFrame) |
Number of top rows |
Creates a new DataFrame that only contains the defined top number of rows of this DataFrame. If the DataFrame has less than the defined top rows, all rows are returned. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).Head(2).ToCsv() Results string: id;right 0;zero 2;two |
IncludeOnlyValues |
|
Create a new DataFrame containing only those rows for which the given column has any of the given values. Values can be provided as a single object (if there is only one value) or an array of objects (if multiple values). Examples: ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]).IncludeOnlyValues("id", 1).ToCsv() Returns: id;left 1;one ToDataFrame([[0, "zero"], [1, "one"], [3, "three"]], ["id", "left"]).IncludeOnlyValues("id", [0, 1, 2]).ToCsv() Returns: id;left 0;zero 1;one ToDataFrame([[0, "zero"], [1, "one"], [3, "three"]], ["id", "left"]).IncludeOnlyValues("left", ["zero", "three"]).ToCsv() Returns: id;left 0;zero 3;three df.IncludeOnlyValues("EventType", ["start", "end"]) Returns a data frame containing all the rows in DataFrame df that have either "start" or "end" value in the column "EventType". |
Join (DataFrame) |
|
See Joining DataFrames. |
GroupBy (GroupedDataFrame) |
Grouped columns (string array) |
Creates a GroupedDataFrame object based based on given columns. Takes as a parameter an array of column names, based on which to group the rows. For examples, see the Aggregate function. |
GroupBy (DataFrame) |
|
Creates a new DataFrame based on the current DataFrame. The resulting DataFrame has rows grouped by given columns and values aggregated using given functions. In the resulting DataFrame one row in the end result corresponds with one group. Parameters:
Examples: ToDataFrame([[0, "zero"], [0, "zero2"], [2, "two"], [2, "two"], [2, "two3"], [3, "three"]], ["id", "text"]).GroupBy(["id"], [ "ids": () => Sum(id), "texts": () => StringJoin(",", text), "constant": 123 ]).ToCsv() Returns: ids;texts;constant 0;zero,zero2;123 6;two,two,two3;123 3;three;123 ToDataFrame([[0, "zero"], [0, "zero2"], [2, "two"], [2, "two"], [2, "two3"], [3, "three"]], ["id", "text"]).GroupBy(["id", "text"], [ "ids": () => Sum(id), "texts": () => StringJoin(",", text), "constant": 123 ]).ToCsv() Returns: ids;texts;constant 0;zero;123 0;zero2;123 4;two,two;123 2;two3;123 3;three;123 Analysis("OperationLog") .GroupBy( ["User Name"], [ "User Name": () => Column("User Name")[0], "Count": () => CountTop(Rows), "Avg. Duration": () => Average(Duration), "Max. Duration": () => Max(Duration) ] ).ToCsv() Returns (similar to this): User Name;Count;Avg. Duration;Max. Duration ;207;0.617434782608696;20.556 Administrator;665;16.3750631578947;4225.497 qpr;128;2.158765625;20.346 |
Merge (DataFrame) |
See Merging DataFrames. | |
OrderBy (DataFrame) |
|
Creates a new DataFrame having rows ordered in an ascending order using the given expression(s) evaluated on each row. Rows that have same ordering value in the primary ordering exprssion, are sorted based on the secondary ordering expression. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]).OrderBy(text).ToCsv() Returns: id;right 3;three 2;two 0;zero Analysis("OperationLog").OrderBy(Duration).Head(1) Results string: id;right 0;zero 2;two 3;three el.Analysis("EventTypes").OrderBy(Count, Name) Returns event types in eventlog ordered primarily by Count and secondarily by Name. If there is a need to sort by some column ascending and some column descending, the sortings can be chained. Example: el.Analysis("EventTypes").OrderByDescending(Name).OrderBy(Count) Returns event types in eventlog ordered primarily by Count ascending and secondarily by Name descending. |
OrderByColumns (DataFrame) |
|
Creates a new DataFrame having rows ordered by given columns in given directions. Note that the ordered columns need to contain same type of data, because ordering is not possible between different data types. Parameters:
Null values are always first in the order (both in ascending and descending order). Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]).OrderByColumns(["id"], [false]).ToCsv() Returns string: id;text 3;three 2;two 0;zero ToDataFrame([[0, "zero"], [0, "nolla"], [2, "two"], [3, "three"]], ["id", "text"]).OrderByColumns(["id", "text"], [true, false]).ToCsv() Returns string: id;text 0;zero 0;nolla 2;two 3;three |
OrderByDescending (DataFrame) |
|
Creates a new DataFrame having rows ordered in an descending order using the given expression(s) evaluated on each row. See OrderBy above for examples. |
Persist (DataTable) |
|
Writes a DataFrame into a datatable. If a datatable with that name does not exist in the project, a new datatable is created. If a datatable with that name already exists, the DataFrame will be stored into that DataTable. The function returns the written datatable object. The following parameters are supported:
Examples: let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]); right.Persist("RightDataTable", ["ProjectName": "TestData"]) Results: Id of the new data table named "RightDataTable" created into project named TestData (which is created if it doesn't already exist). If the table already existed, its contents will be overwritten by the new content. let newData = ToDataFrame([[4, "four"]], ["id", "right"]); newData.Persist("RightDataTable", ["ProjectName": "TestData", "Append": true]) Results: Id of the new data table named "RightDataTable" created into project named TestData (which is created if it doesn't already exist). If the table already existed, new content will be appended into the end of the table. The following script reads data from a datatable (MyProject -> MyDatatable) convert one column (MyColumn) into floats and writes the data back to the same datatable. let project = (Projects.Where(Name=="MyProject"))[0]; let datatable = (project.Datatables.Where(Name=="MyDatatable"))[0]; DatatableById(datatable.Id).DataFrame .SetColumns([ "MyColumn": () => ToFloat(Column("MyColumn")) ]) .Persist(datatable.Name, ["ProjectId": project.Id, "Append": false]); |
RemoveColumns (DataFrame) |
Column names (string array) |
Creates new DataFrame where the defined columns have been removed. Throws an exception if the original DataFrame doesn't contain any of the defined columns. Examples: ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]).RemoveColumns(["id"]).ToCsv() Returns string: left zero one |
RenameColumns (DataFrame) | Key-value pairs of name mappings |
Renames DataFrame columns. Takes a parameter of key-value pairs describing how old names (value) are changed to new names (key). Throws an exception if any of the (old) column names don't exist. Renaming DataFrame columns doesn't copy the data, so it's a quick operation for even large datasets. Examples: ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]).RenameColumns(["newId": "id", "newLeft": "left"]).ToCsv() Returns string: newId;newLeft 0;zero 1;one |
Select | Column names (string array, or key-value pairs) |
Creates a new DataFrame where only the selected columns are included. Allows also to change column names when the parameter contains key-value pairs where the original column names are as values and new columns names as keys (see the examples). Throws an exception if any of the columns specified does not exist. Examples: ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]).Select(["left"]).ToCsv() Returns string: left zero one ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]).Select(["newLeft": "left"]).ToCsv() Returns string: newLeft zero one ToDataFrame([[0, "zero", "nolla"], [1, "one", "yksi"]], ["id", "left", "right"]).Select(["left": "newLeft", "right"]).ToCsv() Returns string: newLeft;right zero;nolla one;yksi |
SelectDistinct | Column names (string array, or key-value pairs) |
Returns only distinct values within given columns or all columns (if no columns are specified). Takes as parameter the column names defining for which columns the distinct value combinations are returned. When no parameter is provided, returns all columns in the dataframe. Supports also renaming columns similar to the Select function. Examples: ToDataFrame([[0, "Apple"], [1, "Orange"], [1, "Orange"], [1, "Apple"], [2, "Apple"]], ["number", "text"]) .SelectDistinct() Returns: number;text 0;Apple 1;Orange 1;Apple 2;Apple ToDataFrame([[0, "Apple"], [1, "Orange"], [1, "Orange"], [1, "Apple"], [2, "Apple"]], ["number", "text"]) .SelectDistinct(["text"]) Returns: text Apple Orange |
SetColumns (DataFrame) |
New/modified columns as array |
Creates a new DataFrame based on the current DataFrame, where new columns have been created and/or existing columns have been modified. New and modified columns are defined using an array, where the column name is as a key and as a value there is the expression to calculate the new or modified column. When specifying a column that already exists, the column values are modified. When specifying a new column name, that column is created as a new column to the resulting DataFrame. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]).SetColumns([ "both": () => text + "=" + id ]).ToCsv() Returns: id;text;both 0;zero;zero=0 2;two;two=2 3;three;three=3 ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]).SetColumns([ "text": () => text + "=" + id ]).ToCsv() Returns: id;text 0;zero=0 2;two=2 3;three=3 ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]).SetColumns([ "both": () => text + "=" + id, "both+1": () => both + 1, "text": () => "Done: " + Column("both+1"), "constant": 1234 ]).ToCsv() Returns: id;text;both;both+1;constant 0;Done: zero=01;zero=0;zero=01;1234 2;Done: two=21;two=2;two=21;1234 3;Done: three=31;three=3;three=31;1234 |
Skip (DataFrame) | Number of rows to skip | Returns DataFrame where the defined number of rows is removed ("skipped"). Returns an empty DataFrame if skipping more rows than there are rows in the source DataFrame.
Examples: ToDataFrame([[0, "zero"], [1, "one"], [2, "two"], [3, "three"]], ["id", "right"]).Skip(1).ToCsv() Results string: id;right 1;one 2;two 3;three |
Tail (DataFrame) |
Number of rows |
Creates a new DataFrame that has only the bottom number of rows of this DataFrame. If the DataFrame has less than n rows, all its rows are returned. Example: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).Tail(2).ToCsv() Results string: id;right 2;two 3;three |
ToCsv (String) | includeHeaders (boolean) |
Converts a DataFrame into a CSV data (i.e. string). The CSV data has the following formatting:
Parameter includeHeaders defines whether the header (column names) is returned (true, default) or not (false) as the first row. Example: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).ToCsv() Returns: id;right 0;zero 2;two 3;three |
WithDenseRankNumberColumn (DataFrame) |
|
Similar to the WithRankColumn function, except rank numbers doesn't contain gaps when there are rows with same rank values. let data = ToDataFrame([ ["A", "Dallas", 8], ["B", "Dallas", 5], ["C", "Dallas", 5], ["D", "Dallas", 4], ["B", "New York", 6], ["C", "New York", 2] ], ["Customer", "Region", "Revenue"]); data.WithDenseRankColumn("Rank", ["Revenue"]).OrderByColumns(["Revenue"], [true]).ToCsv(); Returns: Customer;Region;Revenue;Rank C;New York;2;1 D;Dallas;4;2 B;Dallas;5;3 C;Dallas;5;3 B;New York;6;4 A;Dallas;8;5 |
WithColumn (DataFrame) |
|
Creates new DataFrame with new column having value evaluated using given expression. If the column name already exist in the original DataFrame, it gets replaced with the evaluated expression. ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "text"]).WithColumn("combinedtext", text + "=" + (id * 2)).ToCsv() Returns: id;text;combinedtext 0;zero;zero=0 2;two;two=4 3;three;three=6 |
WithRankColumn (DataFrame) |
|
Similar to the WithRowNumberColumn function, except the produced numbering is based on the ranking logic. The difference to the row number is that the rows with an equal sorting value, gets the same rank number. The numbering continues being same as the row number, i.e. there are gaps in the assigned ranks if there are rows same rank values (see the example). let data = ToDataFrame([ ["A", "Dallas", 8], ["B", "Dallas", 5], ["C", "Dallas", 5], ["D", "Dallas", 4], ["B", "New York", 6], ["C", "New York", 2] ], ["Customer", "Region", "Revenue"]); data.WithRankColumn("Rank", ["Revenue"]).OrderByColumns(["Revenue"], [true]).ToCsv(); Returns: Customer;Region;Revenue;Rank C;New York;2;1 D;Dallas;4;2 B;Dallas;5;3 C;Dallas;5;3 B;New York;6;5 A;Dallas;8;6 |
WithRowNumberColumn (DataFrame) |
|
Creates new DataFrame with a new column containing row numbers based on defined partitions and ordering. Partitions are defined as one or several columns, and each partition will contain own row numbering starting from one. Rows are ordered within each partition and the row numbering is based in the ordering (which is separate than the ordering of the result data). Parameters:
Examples: let data = ToDataFrame([ ["A", "Dallas", 8], ["B", "Dallas", 5], ["C", "Dallas", 4], ["B", "New York", 6], ["C", "New York", 2] ], ["Customer", "Region", "Revenue"]); data.WithRowNumberColumn("Order", ["Revenue"]).OrderByColumns(["Revenue"], [true]).ToCsv(); Returns: Customer;Region;Revenue;Order C;New York;2;1 C;Dallas;4;2 B;Dallas;5;3 B;New York;6;4 A;Dallas;8;5 data.WithRowNumberColumn("Order", ["Revenue"], ["Region"]).OrderByColumns(["Region", "Revenue"], [true, true]).ToCsv() Returns: Customer;Region;Revenue;Order C;Dallas;4;1 B;Dallas;5;2 A;Dallas;8;3 C;New York;2;1 B;New York;6;2 data.WithRowNumberColumn("Order", ["Revenue"], ["Region"], ["false"]).OrderByColumns(["Region", "Revenue"], [true, false]).ToCsv() Returns: Customer;Region;Revenue;Order A;Dallas;8;1 B;Dallas;5;2 C;Dallas;4;3 B;New York;6;1 C;New York;2;2 |
Where (DataFrame) |
Condition expression |
Creates a new DataFrame having only rows for which the given condition expression returns true. The condition expression can refer to the columns of the DataFrame (see the example below). Examples: let df = ToDataFrame([[0, "zero"], [2, "two", true], [3, "three"]], ["id", "string"]); All the following expression return the same: df.Where(id < 3); df.Where(Column("id") < 3); df.Where(_[0] < 3); Returns: id;string 0;zero 2;two |
Zip (DataFrame) |
DataFrame |
Creates a new DataFrame that has the contents of given DataFrame appended as new columns into the end of this DataFrame. Returns a new DataFrame that has the colums from both the data frames so that the columns from the other DataFrame are appended to the end of the columns of this DataFrame. If the number of rows is different between this DataFrame and the other DataFrame, an exception is thrown. There must not be duplicate column names in the DataFrames - otherwise an exception is thrown. Examples: let df1 = ToDataFrame([[0, "zero"], [1, "one"], [4, "four"]], ["id", "text"]); let df2 = ToDataFrame([[1, "one"], [2, "two"], [3, "three"]], ["id2", "text2"]); df1.Zip(df2).ToCsv(); Returns: id;text;id2;text2 0;zero;1;one 1;one;2;two 4;four;3;three |
The following functions can be used to initialize DataFrame objects:
Function | Parameters | Description |
---|---|---|
ToDataFrame |
|
Creates a DataFrame object containing the given data (in two dimensional array) and the array of column names. Number of column names must be the same as the number of columns in the two dimensional array. To be able to define data types for the columns, the second parameter can also be an array of dictionaries (one for each column), where each dictionary contains Name and DataType properties (see an example below). Available data types are String, Integer, Float, DateTime, Boolean, Duration (Timespan) and Any (can contain any type of data). The first parameter can also be an existing DataFrame which will create a copy of the DataFrame. Examples: ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]).ToCsv() Returns: A string containing: id;right 0;zero 2;two 3;three ToDataFrame( [[0, "zero"], [2, "two"], [3, "three"]], [ #{"Name": "id", "DataType": "Integer"}, #{"Name": "right", "DataType": "String"} ] ).ToCsv(); Returns: id;right 0;zero 2;two 3;three |
Joining DataFrames
Performs a joining operation between two DataFrames.
Parameters:
- DataFrame: The other DataFrame to join.
- Columns to match: Columns which the joining is based on, can be defined as follows:
- If joining using one column having the same name in both DataFrames, the column name is specified as as string.
- If joining using several columns having the same names in both DataFrames, the column names are specified as a string array.
- If joining using columns having different names between the DataFrames, columns are specified as an array of key-value pairs, where the key is the column name in the left side DataFrame, and value is the column name in the right side DataFrame.
- Join type which can be
- inner (default): row is generated if both DataFrames have the key.
- leftouter: at least one row is generated for each left side DataFrame row, even if there is no matching other row (in that case null is given as value for the other columns).
- rightouter: at least one row is generated for each right side DataFrame row, even if there is no matching other row (in that case null is given as value for the other columns).
- outer: at least one row is generated both for the left and right side DataFrames even if there is no matching other row (in that case null is given as value for the other columns).
Examples:
let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]); let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]); left.join(right, "id").ToCsv() Returns: id;left;right 0;zero;zero let left = ToDataFrame([[0, "zero"], [1, "one"]], ["id", "left"]); let right = ToDataFrame([[0, "zero"], [2, "two"], [3, "three"]], ["id", "right"]); left.join(right, "id", "leftouter").ToCsv() Returns: id;left;right 0;zero;zero 1;one; let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]); let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]); left.join(right, ["idleft1": "idright1"], "inner"); Returns: idleft1;idleft2;left;idright2;right 0;0;zerozeroleft;0;zerozeroright 0;0;zerozeroleft;1;zeroright 0;1;zeroleft;0;zerozeroright 0;1;zeroleft;1;zeroright let left = ToDataFrame([[0, 0, "zerozeroleft"], [0, 1, "zeroleft"], [1, 2, "oneleft"]], ["idleft1", "idleft2", "left"]); let right = ToDataFrame([[0, 0, "zerozeroright"], [0, 1, "zeroright"], [2, 3, "tworight"], [3, 4, "threeright"]], ["idright1", "idright2", "right"]); left.join(right, ["idleft1": "idright1", "idleft2": "idright2"], "inner"); Returns: idleft1;idleft2;left;right 0;0;zerozeroleft;zerozeroright 0;1;zeroleft;zeroright
Merging DataFrames
Creates a new DataFrame that has the contents of a DataFrame (target) merged with another DataFrame (source). The merging works in the same principle as in the SQL language. Note that the merging does not create new columns but the result DataFrame has the same columns as the target DataFrame. Merging has the following principle:
Parameters:
- Source DataFrame: DataFrame to be merged with the target DataFrame.
- Columns to used for matching: Columns used to match the source and target DataFrames. This parameter has similar syntax as the 2. parameter in the join function.
- Columns to UPDATE for matching rows: Columns to update from the source DataFrame to the target DataFrame for the rows that match between the DataFrames. Note that the matching columns cannot be updated because they always have same values due to the matching logic. This parameter has similar syntax as the 2. parameter in the join function. Following special values can also be used:
- If the value is _remove, all matching rows are be deleted from the resulting DataFrame.
- If the value is [], no columns are updated to the resulting DataFrame.
- If the value is null (default), all columns are copied from the source to target (the ones having matching names).
- Columns to CREATE for non-match rows in source: Columns to create to the result DataFrame, if a matching row in the source DataFrame is not found in the target DataFrame. This parameter has similar syntax as the 2. parameter in the join function. Following special values can also be used:
- If the value is _remove, no new rows are created to the resulting DataFrame.
- If the value is [], new rows are created to the resulting DataFrame but all columns of the created rows get empty values.
- If the value is null (default), when creating rows, all columns are copied from the source to target (the ones having matching names).
- Keep or DELETE non-matching rows in target: Boolean value that defines whether those rows in the target DataFrame where no matching row in the source DataFrame is found, are included in the result DataFrame (true) or deleted (false). The default value is true.
Examples:
Update case attribute values for some cases and create non-existing:
let CaseData = ToDataFrame([ ["0", "New York", 6], ["1", "Dallas", 3], ["2", "Dallas", 8], ["3", "Chicago", 4], ["4", "New York", 2] ], ["Case id", "Region", "Cost"]); let UpdatedData = ToDataFrame([ ["0", 7], ["1", 5], ["4", 2], ["5", 2] ], ["Case id", "Cost"]); CaseData.Merge(UpdatedData, "Case id").toCsv(); Returns: Case id;Region;Cost 0;New York;7 1;Dallas;5 2;Dallas;8 3;Chicago;4 4;New York;2 5;;2
Update case attribute values for some cases and don't create non-existing:
let CaseData = ToDataFrame([ ["0", "New York", 6], ["1", "Dallas", 3], ["2", "Dallas", 8], ["3", "Chicago", 4], ["4", "New York", 2] ], ["Case id", "Region", "Cost"]); let UpdatedData = ToDataFrame([ ["0", 7], ["1", 5], ["4", 2], ["5", 2] ], ["Case id", "Cost"]); CaseData.Merge(UpdatedData, "Case id", null, _remove).toCsv(); Returns: Case id;Region;Cost 0;New York;7 1;Dallas;5 2;Dallas;8 3;Chicago;4 4;New York;2
Update case attribute values for some cases (create non-existing) where columns to match have different names in the source and target DataFrames ("Case id" in target and "Case" in source are used to match, and "Cost" in target is updated from "Variable Cost" in source):
let CaseData = ToDataFrame([ ["0", "New York", 6], ["1", "Dallas", 3], ["2", "Dallas", 8], ["3", "Chicago", 4], ["4", "New York", 2] ], ["Case id", "Region", "Cost"]); let UpdatedData = ToDataFrame([ ["0", 7, 4], ["1", 5, 2], ["4", 2, 0], ["5", 2, 1] ], ["Case", "Cost", "Variable Cost"]); CaseData.Merge(UpdatedData, ["Case id": "Case"], ["Cost": "Variable Cost"]).toCsv(); Returns: Case id;Region;Cost 0;New York;4 1;Dallas;2 2;Dallas;8 3;Chicago;4 4;New York;0 5;;1
Delete matching cases (don't create non-matching):
let CaseData = ToDataFrame([ ["0", "New York", 6], ["1", "Dallas", 3], ["2", "Dallas", 8], ["3", "Chicago", 4], ["4", "New York", 2] ], ["Case id", "Region", "Cost"]); let UpdatedData = ToDataFrame([ ["0"], ["1"], ["4"], ["5"] ], ["Case id"]); CaseData.Merge(UpdatedData, "Case id", _remove, _remove).toCsv(); Returns: Case id;Region;Cost 2;Dallas;8 3;Chicago;4
Update matching cases, create non-matching by source as new, and delete non-matching by target:
let CaseData = ToDataFrame([ ["0", "New York", 6], ["1", "Dallas", 3], ["2", "Dallas", 8], ["3", "Chicago", 4], ["4", "New York", 2] ], ["Case id", "Region", "Cost"]); let UpdatedData = ToDataFrame([ ["0", 7], ["1", 5], ["4", 2], ["5", 2] ], ["Case id", "Cost"]); CaseData.Merge(UpdatedData, "Case id", null, null, false).toCsv(); Returns: Case id;Region;Cost 0;New York;7 1;Dallas;5 4;New York;2 5;;2
let target = ToDataFrame([[0, "zero", "target"], [1, "", "target"]], ["id", "text", "frame"]); let source = ToDataFrame([[1, "one", "source"], [2, "two", "source"], [3, "three", "source"]], ["id", "text", "frame"]); target.Merge(source, "id").ToCsv() Returns (one key, default parameters, identical dataframe columns): id;text;frame 0;zero;target 1;one;source 2;two;source 3;three;source target.Merge(source, "id", ["text"]).ToCsv() Returns (one key, default parameters, identical dataframe columns, copy only text column from source): id;text;frame 0;zero;target 1;one;target 2;two; 3;three; target.Merge(source, "id", ["text"], _remove).ToCsv() Returns (one key, default parameters, identical dataframe columns, copy only text column from source, remove rows found only in source): id;text;frame 0;zero;target 1;one;target target.Merge(source, "id", ["text"], _remove, false).ToCsv() Returns (one key, identical dataframe columns, copy only text column from source, remove rows found only in source or only in target): id;text;frame 1;one;target target.Merge(source, "id", _remove, _remove, false).ToCsv() Returns (one key, identical dataframe columns, remove all rows): id;text;frame let target = ToDataFrame([[0, 0, "zerozeroleft", "target"], [0, 1, "zeroleft", "target"], [1, 2, "left", "target"], [4, 5, "fourleft", "target"]], ["idleft1", "idleft2", "textleft", "frame"]); let source = ToDataFrame([[0, 0, "zerozeroright", "source"], [0, 1, "zeroright", "target"], [1, 2, "oneright", "source"], [2, 3, "tworight", "source"], [3, 4, "threeright", "source"]], ["idright1", "idright2", "textright", "frame"]); target.Merge(source, ["idleft1": "idright1"]).ToCsv() Returns (one key, default parameters, different dataframe columns, copy all matching columns): idleft1;idleft2;textleft;frame 0;0;zerozeroleft;source 0;0;zerozeroleft;target 0;1;zeroleft;source 0;1;zeroleft;target 1;2;left;source 4;5;fourleft;target 2;;;source 3;;;source target.Merge(source, ["idleft1": "idright1"], []).ToCsv() Returns (one key, default parameters, different dataframe columns, copy only key column): idleft1;idleft2;textleft;frame 0;0;zerozeroleft;target 0;0;zerozeroleft;target 0;1;zeroleft;target 0;1;zeroleft;target 1;2;left;target 4;5;fourleft;target 2;;; 3;;; target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"]).ToCsv() Returns (two keys, default parameters, different dataframe columns, copy all matching columns): idleft1;idleft2;textleft;frame 0;0;zerozeroleft;source 0;1;zeroleft;target 1;2;left;source 4;5;fourleft;target 2;3;;source 3;4;;source target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], ["textleft": "textright"]).ToCsv() Returns (two keys, default parameters, different dataframe columns, copy only textright -column): idleft1;idleft2;textleft;frame 0;0;zerozeroright;target 0;1;zeroright;target 1;2;oneright;target 4;5;fourleft;target 2;3;tworight; 3;4;threeright; target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], ["textleft": "textright", "frame"]).ToCsv() Returns (two keys, default parameters, different dataframe columns, copy textright and frame -columns): idleft1;idleft2;textleft;frame 0;0;zerozeroright;source 0;1;zeroright;target 1;2;oneright;source 4;5;fourleft;target 2;3;tworight;source 3;4;threeright;source target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], _remove).ToCsv() Returns (two keys, default parameters, different dataframe columns, remove all matching rows, copy only matching columns): idleft1;idleft2;textleft;frame 4;5;fourleft;target 2;3;; 3;4;; target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], ["textleft": "textright"], ["idleft1": "idright1", "idleft2": "idright2", "textleft": "textright"]).ToCsv() Returns (two keys, default parameters, different dataframe columns, copy only textright-column for matching columns, copy id columns and textright-column for rows not found in target): idleft1;idleft2;textleft;frame 0;0;zerozeroright;target 0;1;zeroright;target 1;2;oneright;target 4;5;fourleft;target 2;3;tworight; 3;4;threeright; target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], ["textleft": "textright"], ["idleft1": "idright1", "idleft2": "idright2", "textleft": "textright", "frame"]).ToCsv() Returns (two keys, default parameters, different dataframe columns, copy only textright-column for matching columns, copy id, frame and textright-column for rows not found in target): idleft1;idleft2;textleft;frame 0;0;zerozeroright;target 0;1;zeroright;target 1;2;oneright;target 4;5;fourleft;target 2;3;tworight;source 3;4;threeright;source target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], null, ["idleft1": "idright1", "idleft2": "idright2", "textleft": "textright"]).ToCsv() Returns (two keys, default parameters, different dataframe columns, don't copy any columns from source for matching columns, copy id columns and textright-column for rows not found in target): idleft1;idleft2;textleft;frame 0;0;zerozeroleft;source 0;1;zeroleft;target 1;2;left;source 4;5;fourleft;target 2;3;tworight; 3;4;threeright; target.Merge(source, ["idleft1": "idright1", "idleft2": "idright2"], ["textleft": "textright"], ["idleft1": "idright1", "idleft2": "idright2", "textleft": "textright", "frame"], false).ToCsv() Returns (two keys, default parameters, different dataframe columns, copy only textright-column for matching columns, copy id, frame and textright-column for rows not found in target, remove all rows not found in source): idleft1;idleft2;textleft;frame 0;0;zerozeroright;target 0;1;zeroright;target 1;2;oneright;target 2;3;tworight;source 3;4;threeright;source