Big Data Chart

From QPR ProcessAnalyzer Wiki
'''Big Data Chart''' is a type of chart visualization that performs backend calculations in the datasource where the eventlog data is stored, whereas the [[QPR_ProcessAnalyzer_Chart|in-memory chart]] uses the in-memory calculation engine. Depending on the [[QPR_ProcessAnalyzer_Project_Workspace#Models|model]], Big Data Chart processing is performed in Snowflake (for models using Snowflake datatables) or in SQL Server (for models using ''Local'' datatables). [[QPR_ProcessAnalyzer_System_Architecture#Snowflake_Powered_Calculation|Snowflake-powered calculation]] enables practically unlimited scaling when the amount of data and the number of users increase. When creating dashboards, Big Data Chart needs to be chosen when using Snowflake models. Big Data Chart can be added to a dashboard by selecting the second item from the tool palette (labelled ''Big Data Chart'').


Big Data Chart can also be used for models with local datatables, in which case processing is performed in SQL Server. The benefit is that the model doesn't need to be loaded into memory, which consumes less memory in the application server, and no time needs to be spent on model loading. The disadvantage is that SQL Server is not optimal for analytical queries, which in practice means insufficient performance for large datasets. Despite this limitation, there are use cases where Big Data Chart is suitable for models with local datatables:
* Eventlogs are filtered heavily so that the number of remaining cases and events is low (usually a maximum of some thousands), keeping performance at a sufficient level.
* If the model is currently not loaded into memory, it's faster to use Big Data Chart compared to the in-memory chart, when the time required to load the model into memory is taken into account.


== Differences to in-memory chart ==
Visualization settings are the same between the Big Data Chart and the in-memory chart, but data selection settings, measures, and dimensions work differently. The differences are as follows:
* A different set of analyzed objects, measures, and dimensions is available.
* Filtering cases and events can be done for each measure and dimension separately. This allows most KPIs to be built flexibly without using custom expressions.
* Measures and dimensions have equal lists of available items; the only difference is that an aggregation needs to be selected for measures. As a result, measures can be moved to dimensions and vice versa by clicking the '''Move to dimensions''' and '''Move to measures''' buttons.
* Custom expressions are written as [[SQL_Expressions|SQL expressions]], which differ from the [[Process_Mining_Objects_in_Expression_Language|eventlog objects]] available in the in-memory chart. Note also that measure expressions in the Big Data Chart don't include the aggregation logic, and thus the custom measure and dimension expressions are equal.
* The event attribute used as the event type can be set for each Big Data Chart separately, to visualize the process flow from different angles. For more information, see [[QPR_ProcessAnalyzer_Chart#Analyzed_Data|chart settings]].
* The ''Any'' [[Importing Data to Datatable from CSV File#Data types|datatype]] is not supported by the Big Data Chart for case and event attributes. Thus, when importing data, specific datatypes need to be set for each column for the case and event attributes to be available.
* Big Data Chart supports filtering similarly to the in-memory chart, i.e., visualizations can be clicked to create filters for the shown data. Big Data Chart does not support [[Filtering_in_QPR_ProcessAnalyzer_Queries#Expression|expression-based filter rules]], and thus there are some dimensions where filtering is not available. Other types of filter rules are the same for Big Data and in-memory charts, so the same dashboard can contain both types of chart components, and filtering between them works. Note that when an expression-based filter is created from an in-memory chart, the Big Data Chart cannot be shown, as the filter cannot be calculated.
* The following functionalities supported by the in-memory chart are not available in the Big Data Chart: Group rows exceeding maximum, Analyzed objects sample size, Find root causes, and Business calendar.
* The following measure/dimension settings are not available: Calculate measure for, Variable name, Custom aggregation expression, and Adjustment expression.
* Big Data Chart cannot be used with models using ODBC or expression datasources.
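To illustrate the point above about custom expressions: since a measure expression in the Big Data Chart carries no aggregation of its own, the same SQL expression can act as a dimension, or as a measure once an aggregation is selected. The sketch below mimics this with SQLite via Python; the table and column names are illustrative assumptions, not QPR ProcessAnalyzer's actual schema (which runs in Snowflake or SQL Server).

```python
import sqlite3

# Hypothetical eventlog table; names are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (case_id TEXT, event_type TEXT, ts TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("c1", "Create Order", "2022-06-01 09:00"),
    ("c1", "Ship Order",   "2022-07-02 12:00"),
    ("c2", "Create Order", "2022-07-03 08:00"),
])

# One SQL expression with no aggregation inside it:
expr = "substr(ts, 1, 7)"  # month of the event timestamp

# Used as a dimension: events are grouped by the expression.
dim = con.execute(
    f"SELECT {expr} AS month, COUNT(*) FROM events "
    f"GROUP BY {expr} ORDER BY month"
).fetchall()
print(dim)      # [('2022-06', 1), ('2022-07', 2)]

# Used as a measure: the same expression wrapped in a chosen aggregation.
measure = con.execute(f"SELECT MAX({expr}) FROM events").fetchone()[0]
print(measure)  # '2022-07'
```

Because the expression itself is aggregation-free, moving it between measures and dimensions only adds or removes the surrounding aggregation, which is what the '''Move to dimensions''' / '''Move to measures''' buttons rely on.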


Calculation results are mostly the same between the Big Data Chart and the in-memory chart, with the following exceptions:
* If there are cases with events having exactly the same timestamps, in the Big Data Chart the order of events is based on the alphabetical order of event type names. In the in-memory chart, the order is based on the order of the loaded data rows in the events datatable. The order of events affects, for example, which variations and flows the cases belong to.
* There are slight differences in how durations between dates are calculated. Big Data Chart is based on the commonly used datediff function in SQL, which calculates the number of period boundaries between the dates. The in-memory chart calculates the precise duration between the dates as a decimal number. More information about datediff: https://www.w3schools.com/sql/func_sqlserver_datediff.asp.
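The two duration semantics above can be sketched in plain Python; the helper function names are mine, chosen for illustration, and only the day datepart is modelled:

```python
from datetime import datetime

def datediff_days(start, end):
    """SQL-style DATEDIFF(day, start, end): counts the day
    boundaries crossed, ignoring the time-of-day parts
    (the behavior Big Data Chart builds on)."""
    return (end.date() - start.date()).days

def precise_days(start, end):
    """Precise duration in days as a decimal number
    (the behavior of the in-memory chart)."""
    return (end - start).total_seconds() / 86400

# One hour apart, but across midnight:
start = datetime(2022, 7, 3, 23, 30)
end = datetime(2022, 7, 4, 0, 30)

print(datediff_days(start, end))  # 1 -> one day boundary crossed
print(precise_days(start, end))   # ~0.0417 -> only one hour elapsed
```

As the example shows, the boundary-counting style can report a whole day of duration for events only an hour apart, which explains why the two chart types can show slightly different duration KPIs for the same data.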

Revision as of 22:05, 3 July 2022
