DataFlow in Expression Language

Revision as of 17:10, 8 December 2022

DataFlow is an object representing a stream of tabular data. A DataFlow holds data with a structure similar to a DataFrame, but a DataFrame stores all of its contents in system memory, so large data volumes require a lot of memory. In a DataFlow, the contents "flow" from the source to the destination, and the data can be manipulated while only a small portion of it is in memory at any one time. DataFlows are therefore suitable for ETL with high data volumes.

A DataFlow continues to run until it completes. It completes automatically when all queried items have been returned, and it can also be completed explicitly by calling the Complete function. Once a DataFlow has been completed, no new items can be added to it. When collecting a DataFlow into an in-memory DataFrame, the Collect call waits until the DataFlow completes, to make sure all items are included in the DataFrame.

Property Description
IsCompleted (boolean) Returns true when the Complete function has been called for the DataFlow and there are no more unread items in it.
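
As a minimal sketch of how the IsCompleted property relates to Complete and Collect, the expression below builds a small DataFlow, completes it, reads its contents and then reads the property. It uses only the functions documented on this page. The `let` binding, the `//` comments and the `flow.IsCompleted` property access are assumed syntax, and the name `flow` is purely illustrative.

```
// Assumed syntax: "let" binds the DataFlow to a name so it can be reused
let flow = ToDataFlow(ToDataFrame([], ["id", "color"]));
flow.Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]));
flow.Complete();
// Reading all items leaves the completed DataFlow without unread items
flow.Collect();
// Property read; the description above suggests this is true at this point
flow.IsCompleted
```

Reading the property before the Collect call would, by the same description, be expected to give false, because the completed DataFlow would still hold unread items.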
Function Parameters Description
Append DataFrame to append

Appends the given DataFrame to the DataFlow. Returns the DataFlow object.

Examples:

ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
Collect Parameters (Dictionary)

Returns the data extracted from the DataFlow as an in-memory DataFrame, or null if either the timeout has been exceeded or the flow has been completed and is empty. Parameters:

  1. CollectChunk: Takes the next IDataFrame object found in the DataFlow and converts it into an in-memory DataFrame.
  2. Timeout: Maximum number of milliseconds to wait for data to appear in the DataFlow before returning null.

Examples:

ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Append(ToDataFrame([[3, "blue"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Append(ToDataFrame([[3, "blue"]], ["id", "color"]))
  .Complete()
  .Collect(#{"CollectChunk": true})
  .ToCsv()
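
The Timeout parameter is not covered by the examples above, so here is a hedged sketch of how it might be combined with CollectChunk. The DataFlow is deliberately left uncompleted, and the value 1000 (one second) is an arbitrary choice for illustration. Passing both keys in the same dictionary is an assumption based on the parameter list.

```
// Take the next available chunk, waiting at most 1000 ms for data
ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Collect(#{"CollectChunk": true, "Timeout": 1000})
```

Because Collect can return null when the timeout is exceeded, the result is not chained directly into ToCsv here. A caller would normally check for null first.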
Complete (none)

Declares that the DataFlow is completed, i.e., no new items will be added to it.

Examples:

ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
Persist (Datatable)
  1. Datatable name (String)
  2. Additional parameters (Dictionary)
Writes the DataFlow into a datatable. Works similarly to the Persist function of the DataFrame.
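
A minimal sketch of a Persist call, assuming the additional parameters dictionary can be omitted as in the DataFrame version of the function. The datatable name "ExampleColors" is hypothetical, and completing the DataFlow before persisting is a cautious choice rather than a documented requirement.

```
// Write the contents of a small DataFlow into a datatable
ToDataFlow(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Complete()
  .Persist("ExampleColors")
```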

Create new DataFlow:

Function Parameters Description
ToDataFlow (DataFlow)
  1. DataFrame

Creates a new DataFlow and optionally initializes it with the given DataFrame.

Examples:

ToDataFlow(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Append(ToDataFrame([[3, "blue"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()