DataFlow in Expression Language: Difference between revisions

From QPR ProcessAnalyzer Wiki

Revision as of 09:38, 8 December 2022

A DataFlow is an object representing a stream of tabular data. Its data has a structure similar to that of a DataFrame. The difference is that a DataFrame stores all of its contents in system memory, so large data volumes require correspondingly large amounts of memory. In a DataFlow, by contrast, the contents "flow" from the source to the destination and can be manipulated along the way, with only a small portion of the data in memory at any one time. DataFlows are therefore suitable for ETL where data volumes are high.
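
The memory difference can be illustrated with a small conceptual sketch in Python. This is not QPR ProcessAnalyzer code: `read_chunks`, `load_all`, and `stream_count` are hypothetical names, and plain lists of rows stand in for DataFrames. The sketch only assumes that a streaming source delivers data in small chunks.

```python
from typing import Iterator, List, Tuple

Row = List[object]


def read_chunks(total_rows: int, chunk_size: int) -> Iterator[List[Row]]:
    """Yield rows in small chunks, standing in for a streaming source."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        yield [[i, f"item-{i}"] for i in range(start, stop)]


def load_all(total_rows: int, chunk_size: int) -> List[Row]:
    """DataFrame-like approach: materialize every row before processing."""
    rows: List[Row] = []
    for chunk in read_chunks(total_rows, chunk_size):
        rows.extend(chunk)
    return rows


def stream_count(total_rows: int, chunk_size: int) -> Tuple[int, int]:
    """DataFlow-like approach: handle one chunk at a time.

    Returns (rows processed, largest number of rows held at once).
    """
    processed = 0
    peak = 0
    for chunk in read_chunks(total_rows, chunk_size):
        peak = max(peak, len(chunk))
        processed += len(chunk)
    return processed, peak
```

With 1000 rows and a chunk size of 100, `load_all` holds all 1000 rows at once, while `stream_count` processes the same 1000 rows with at most 100 in memory at a time.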

{| class="wikitable"
!'''Property'''
! '''Description'''
|-
||IsCompleted
||Returns true only if the Complete function has been called for the DataFlow and there are no more unread items in it.
|}
{| class="wikitable"
!'''Function'''
!'''Parameters'''
! '''Description'''
|-
||Append
||
# dataFrame: DataFrame to add to the DataFlow
||
Appends the given IDataFrame object to the DataFlow. Returns the DataFlow object itself.

Examples:
<pre>
ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
</pre>
|-
||Collect
||
# parameters (optional): Parameters for the collect operation. The value should be convertible to a StringDictionary.
||
Returns the data extracted from the DataFlow as an in-memory DataFrame, or null if either the timeout has been exceeded or the flow has been completed and is empty. All returned objects are removed from the DataFlow.

The parameters dictionary supports the following keys:
* CollectChunk: Takes the next IDataFrame object found in the DataFlow and converts it into an in-memory DataFrame.
* Timeout: The maximum number of milliseconds to wait for data to appear in the DataFlow before exiting with null as the result.

Examples:
<pre>
ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Append(ToDataFrame([[3, "blue"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
</pre>
<pre>
ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Append(ToDataFrame([[3, "blue"]], ["id", "color"]))
  .Complete()
  .Collect(#{"CollectChunk": true})
  .ToCsv()
</pre>
|-
||Complete
||(none)
||
Completes the DataFlow. After a DataFlow has been completed, no new items can be added to it; attempting to do so throws an exception. Returns the DataFlow.

Examples:
<pre>
ToDataFlow(ToDataFrame([], ["id", "color"]))
  .Append(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
</pre>
|-
||Persist (Datatable)
||
# Datatable name (String)
# Additional parameters (Dictionary)
||Writes the DataFlow into a datatable. Works similarly to the function of the same name in [[DataFrame_in_Expression_Language#DataFrame_Functions|DataFrame]].
|}
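
The interplay of Append, Complete, Collect, and IsCompleted described above can be modelled with a short Python sketch. This is a toy model under stated assumptions, not the QPR ProcessAnalyzer implementation: the class `DataFlowModel` and its method names are hypothetical, lists of rows stand in for DataFrames, and the behaviour of Collect without CollectChunk (merging every chunk currently in the flow) is an assumption rather than something the text above confirms.

```python
import queue
from typing import List, Optional

Row = List[object]
Chunk = List[Row]


class DataFlowModel:
    """Toy model of the documented semantics; not the QPR implementation."""

    def __init__(self, initial: Optional[Chunk] = None) -> None:
        self._chunks: "queue.Queue[Chunk]" = queue.Queue()
        self._complete_called = False
        if initial is not None:
            self._chunks.put(initial)

    def append(self, chunk: Chunk) -> "DataFlowModel":
        # Appending after completion is an error, as described for Complete.
        if self._complete_called:
            raise RuntimeError("cannot append to a completed flow")
        self._chunks.put(chunk)
        return self

    def complete(self) -> "DataFlowModel":
        self._complete_called = True
        return self

    @property
    def is_completed(self) -> bool:
        # True only when complete() was called AND nothing is left unread.
        return self._complete_called and self._chunks.empty()

    def collect(self, collect_chunk: bool = False,
                timeout_ms: Optional[int] = None) -> Optional[Chunk]:
        # Completed and empty: nothing will ever arrive, so return None.
        if self.is_completed:
            return None
        timeout_s = None if timeout_ms is None else timeout_ms / 1000.0
        try:
            # Waits up to the timeout; with no timeout this call blocks.
            rows = list(self._chunks.get(timeout=timeout_s))
        except queue.Empty:
            return None
        if not collect_chunk:
            # Assumption: a plain collect merges all chunks now available.
            while True:
                try:
                    rows.extend(self._chunks.get_nowait())
                except queue.Empty:
                    break
        return rows  # returned chunks have been removed from the flow
```

In this model, a flow holding two chunks returns only the first one when `collect(collect_chunk=True)` is called, `is_completed` stays false until the remaining chunk has been read, and a further `collect()` on the drained, completed flow returns `None`.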

Create new DataFlow:

{| class="wikitable"
!'''Function'''
!'''Parameters'''
! '''Description'''
|-
||ToDataFlow
||
# DataFrame (optional)
||
Creates a new DataFlow and optionally initializes it with the given DataFrame.

Examples:
<pre>
ToDataFlow(ToDataFrame([[1, "red"], [2, "green"]], ["id", "color"]))
  .Append(ToDataFrame([[3, "blue"]], ["id", "color"]))
  .Complete()
  .Collect()
  .ToCsv()
</pre>
|}