Best Practices for Designing Models: Difference between revisions

From QPR ProcessAnalyzer Wiki

Revision as of 00:19, 25 March 2022

Best practices for designing dashboards

Visualization and usability best practices

  • Use conditional formatting to improve KPI visualization.
  • Use on-screen settings for settings that users often want to change, as they are easier to use than opening the settings. They also guide users to change parameters that might be relevant from the analysis viewpoint.
  • Check that each measure and dimension has descriptive units. The general terms "cases" and "events" might not describe the counts best. E.g. cases might be orders.
  • Use custom labels if they describe the measures and dimensions better. Still, for many measures and dimensions, the automatically generated title is suitable.
  • Note the special values, such as null and empty strings, and set a descriptive label for them. E.g. ...
  • Disable creating filters from a chart if no meaningful filters can be created from it.
  • Limit the number of shown attributes or event types if some of them are not needed. This doesn't have a performance impact, though.
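
The point about labeling special values can be illustrated with a small sketch. It is plain Python, not QPR ProcessAnalyzer configuration, and the label texts and the function name display_label are hypothetical choices made only for this illustration.

```python
from typing import Optional

# Hypothetical label texts; choose wording that fits your own dashboard.
NULL_LABEL = "(not available)"
EMPTY_LABEL = "(empty)"


def display_label(value: Optional[str]) -> str:
    """Return a readable label for a dimension value, including special values."""
    if value is None:
        return NULL_LABEL
    if value == "":
        return EMPTY_LABEL
    return value


# Null and empty string are kept apart, because they may mean different things.
labels = [display_label(v) for v in ["Europe", None, "", "Asia"]]
```

Keeping null and the empty string as separate labels avoids merging two different situations into one chart row.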

Performance optimization best practices

  • As Analyzed objects, prefer Cases over Events, as usually there are a lot more events than cases. Some KPIs can be calculated from the cases point of view. Variations, Event types and Flows are also generally fast. On the other hand, Flow Occurrences is slow, as there are even more of them than there are events.
  • Prefer ready-made measures and dimensions over custom ones. For some simple calculations, Statistical calculations may be used. The Adjustment expression is also useful.
  • Limit the number of returned rows: the Max rows setting determines the number of returned rows. The fewer rows there are, the better the performance. Usually in dashboards, limiting the amount of shown data is also desirable for usability, e.g. showing only the top 20 items.
  • Group rows exceeding maximum affects performance, so use it only when the information is useful for the analysis.
  • Sorting affects performance, so use it only when it's relevant for the analysis.
  • Note the number of charts in a dashboard. The more charts there are, the slower the dashboard is.
  • An alternative to a chart filter is to use Analyzed objects that contain filtering, e.g. ..., which might improve performance.
  • Don't use dimensioning when it's not needed. When there is already a row for each root object, dimensioning is unnecessary. For example, using Cases as Analyzed objects and dimensioning by case id leads to a row for each case, but the same result can be achieved by disabling dimensioning.
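
As a rough illustration of what limiting rows and grouping the exceeding rows mean for the shown data, the following plain Python sketch keeps the largest rows and optionally sums the rest into one row. It is not how QPR ProcessAnalyzer implements these settings; the function name top_rows, the "Others" label and the sample counts are assumptions made for the example.

```python
def top_rows(counts, max_rows, group_rest=False):
    """Keep the max_rows largest items; optionally sum the rest into one row."""
    # Sort by count (largest first), then by name to keep the order stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    shown = ranked[:max_rows]
    if group_rest and len(ranked) > max_rows:
        # The extra row requires going through all the remaining items.
        rest_total = sum(count for _, count in ranked[max_rows:])
        shown.append(("Others", rest_total))
    return shown


# Hypothetical case counts per product group.
counts = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 5}

limited = top_rows(counts, max_rows=2)
grouped = top_rows(counts, max_rows=2, group_rest=True)
```

In this sketch, the grouped variant has to visit every remaining item to build the extra row, which gives an intuition for why the grouping option is worth enabling only when the total of the remaining rows is useful.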

Advanced performance optimization best practices

  • For slow charts, use the Benchmark Performance to find the fastest settings. Usually setting up a working chart is the first thing to do, and if the chart appears too slow, you can try to find another, faster way to calculate the same chart.
  • Avoid calculating the same things multiple times in different measures. If there are repeating expressions, create a separate measure for the expression and define it as a variable, which can be referenced from other measures.
  • The same dashboard can easily use different models, and filtering still works. A model optimized for a chart might improve performance.
  • Try sampling. It improves performance, but in most cases it cannot be used, as it affects the analysis results, for example object counts. When sampling can be used, it's very useful in improving calculation performance for large models.
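
The effect of sampling on results can be seen in a small plain Python sketch. The durations are made-up data, and taking every 10th case is an assumed sampling scheme chosen for the illustration; the source does not describe how QPR ProcessAnalyzer selects its sample.

```python
from statistics import mean

# Hypothetical case durations in days, one value per case.
durations = [(i % 7) + 1 for i in range(1000)]

# Assumed sampling scheme: keep every 10th case (a 10 % sample).
sample = durations[::10]

full_count, sample_count = len(durations), len(sample)
full_mean, sample_mean = mean(durations), mean(sample)

# Object counts shrink with the sample, so they are only comparable
# to the full model after scaling by the sampling ratio.
scaled_count = sample_count * 10
```

Here the average duration stays close to the full-model value, while the case count drops to a tenth, which is the kind of change in object counts that the sampling bullet warns about.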

Other best practices

  • Use presets as examples. In many cases, you find what you are looking for in the presets.
  • Mappings can be done freely, so dimensions don't always need to go to the X-axis and measures to the Y-axis.
  • Avoid Custom layout settings, as their compatibility with future QPR ProcessAnalyzer versions might not be maintained. Use Custom layout only when it's absolutely necessary for the visualization.
  • Exporting data: for large amounts of data, prefer CSV export over Excel export.

Best practices for creating models

  • Use the most suitable datatypes for case and event attributes. If there are only two possible values, boolean is the best choice. The true and false values can be mapped to a textual presentation, so strings are not needed to get the desired texts in visualizations. If numerical data cannot contain decimals, or decimal precision is not required for the analysis, use integer rather than float. If the attribute value contains a numerical score (such as a number between 1 and 5), integer is better than string. Usually string is the slowest.
  • All datatypes support null values to mark missing values or some kind of special values. The null value can be freely used to mark anything; its meaning is a modeling decision.
  • Include only the case and event attributes that are needed by the dashboards. For analysis, more attributes may be useful, but they are not needed for dashboards. Loading a model is slower when there are more attributes.
  • Note the Load Model on Startup setting. When to use it correctly.
  • Include only events that are needed by the dashboards.
  • Shorter event type names are easier to read in the UI and provide slightly better performance. This is also true for case and event attribute values.
  • Use calculated attributes to pre-calculate case-level KPIs from measures. They cannot be used when event type filtering is applied. On the other hand, don't use calculated attributes unnecessarily, because they are stored in memory and thus consume memory like normal attributes. Don't calculate anything at the entire model level in a calculated attribute expression, because it leads to very slow model loading.
  • Use the model description to document the necessary details regarding the model for other users.
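
The datatype guidance above can be sketched as a small check run over raw attribute values before loading them into a model. This is plain Python written for illustration, not part of QPR ProcessAnalyzer; the function name suggest_datatype is hypothetical, and treating exactly two distinct observed values as a hint for boolean is an assumption, since whether only two values are possible is a modeling decision.

```python
def _parses(value, convert):
    """Return True if convert(value) succeeds, e.g. int("5") or float("1.5")."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


def suggest_datatype(values):
    """Suggest the narrowest datatype for a column of raw string values."""
    # Null values are allowed in every datatype, so they are ignored here.
    present = [v for v in values if v is not None]
    if not present:
        return "string"
    if len(set(present)) == 2:
        return "boolean"  # assumption: two observed values means two possible values
    if all(_parses(v, int) for v in present):
        return "integer"
    if all(_parses(v, float) for v in present):
        return "float"
    return "string"


# Hypothetical attribute columns.
approved = suggest_datatype(["yes", "no", "yes", None])
score = suggest_datatype(["1", "3", "5", "2"])
amount = suggest_datatype(["1.5", "2", "3"])
region = suggest_datatype(["low", "mid", "high"])
```

The checks run from the narrowest type to the widest, so string is suggested only when nothing narrower fits, matching the advice that string is usually the slowest option.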

Best practices for writing ETL scripts

Best practices for hosting QPR ProcessAnalyzer