FindRootCauses Function

From QPR ProcessAnalyzer Wiki
Revision as of 19:24, 1 December 2022 by Ollvihe

FindRootCauses performs root cause analysis for an eventlog based on the given settings and returns the results as a DataFrame. Root causes are features that correlate with the investigated phenomenon in the selected set of cases.

The following parameters can be used:

  • Selection: Filter that defines the investigated feature or phenomenon for which root causes are searched.
  • CaseAttributeTypes: Restricts the search to the listed case attributes. If null, all case attributes are searched. Applicable only when Types contains CaseAttributeValue.
  • MaximumRowCount: Maximum number of rows (i.e. root causes) to return. Root causes are always returned starting from the most important one.
  • Types: Types of root causes to search for in the eventlog. Currently, only CaseAttributeValue is supported.
  • WeightingExpression: Expression that defines a weight for each case, used in the root cause calculation. When not defined, all cases have equal weight. If the weight of a case is null, the case is filtered out of the calculation. Weighting is useful when some cases are more important than others; for example, the Cost attribute can be used as the weight to emphasize cases that have higher costs or value. The basic principle is that the calculation is based on the sum of weights instead of the case count. The non-weighting mode can be considered as weighting where each case has a weight of one.
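
The weighting principle described above can be illustrated with a minimal Python sketch. This is not QPR ProcessAnalyzer code; the function name `weighted_total` and the Cost values are hypothetical and only show how a sum of weights replaces a case count and how cases with a null weight drop out.

```python
from typing import Optional, Sequence

def weighted_total(weights: Sequence[Optional[float]]) -> float:
    """Sum the case weights, skipping cases whose weight is null (None)."""
    return sum(w for w in weights if w is not None)

# Hypothetical Cost values for four cases; the third case has no Cost,
# so it is filtered out of the calculation.
costs = [100.0, 250.0, None, 50.0]
print(weighted_total(costs))  # 400.0

# Non-weighting mode: every case has a weight of one,
# so the weighted total equals the plain case count.
unit_weights = [1.0] * 4
print(weighted_total(unit_weights))  # 4.0
```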

The result is a DataFrame where each row represents an individual root cause. Identified root causes are features that are common among a set of cases (e.g. all cases where Region is Dallas). Root causes are returned in order of importance, and they are either positively correlated, negatively correlated, or both (determined by the correlation setting).

The columns in the DataFrame describe the type of the root cause and different kinds of volume data for that set of cases. The columns are as follows:

  • Type: Type of the found root cause. Currently all are of type CaseAttributeValue, i.e. cases with a specific case attribute value.
  • Name: When type is CaseAttributeValue, the case attribute name.
  • Value: When type is CaseAttributeValue, the case attribute value.
  • Total: Number of cases having the found root cause.
  • Selected: Number of cases in the selected cases (Selection parameter) having the found root cause, i.e. the cases common to Total and TotalSelected.
  • Compared: Number of cases having the found root cause that do not belong to the selected cases (Selection parameter). Formula: Total - Selected.
  • Contribution: Describes in how many cases this root cause contributes to the investigated phenomenon, i.e. how many more selected cases there are among the root cause cases than the average across all cases in the eventlog would predict. Formula: Selected - Total * TotalSelected / TotalEventLog. When weighting is in use, the calculation is based on the weights instead of the number of cases.
  • ContributionPercentage: Describes how large a portion of the investigated phenomenon this root cause explains. Formula: Contribution / TotalSelected = (Selected - Total * TotalSelected / TotalEventLog) / TotalSelected = Selected / TotalSelected - Total / TotalEventLog. When weighting is in use, the calculation is based on the weights instead of the number of cases.
  • SelectedPercentage: Percentage of the cases having the found root cause that belong to the selected cases. Formula: Selected / Total. When weighting is in use, the calculation is based on the weights instead of the number of cases.
  • DifferencePercentage: Difference between the percentage of selected cases among the root cause cases and among all cases in the eventlog. Formula: Selected / Total - TotalSelected / TotalEventLog. When weighting is in use, the calculation is based on the weights instead of the number of cases.
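
The column formulas above can be sketched in Python for the non-weighted case. This is an illustration only, not the QPR ProcessAnalyzer implementation: the function `root_cause_rows`, its arguments and the sample data are hypothetical, TotalSelected and TotalEventLog are taken to be the number of selected cases and the number of all cases, and sorting by Contribution is an assumption standing in for the unspecified importance ordering.

```python
from collections import defaultdict

def root_cause_rows(cases, selected_ids, attribute_names, maximum_row_count=None):
    """cases: dict of case id -> dict of case attribute values.
    selected_ids: set of case ids matched by the Selection filter."""
    total_event_log = len(cases)
    total_selected = sum(1 for case_id in cases if case_id in selected_ids)
    if total_selected == 0:
        raise ValueError("the selection matches no cases")

    # Group case ids by (attribute name, attribute value).
    groups = defaultdict(set)
    for case_id, attrs in cases.items():
        for name in attribute_names:
            if name in attrs:
                groups[(name, attrs[name])].add(case_id)

    rows = []
    for (name, value), ids in groups.items():
        total = len(ids)
        selected = len(ids & selected_ids)
        contribution = selected - total * total_selected / total_event_log
        rows.append({
            "Type": "CaseAttributeValue",
            "Name": name,
            "Value": value,
            "Total": total,
            "Selected": selected,
            "Compared": total - selected,
            "Contribution": contribution,
            "ContributionPercentage": contribution / total_selected,
            "SelectedPercentage": selected / total,
            "DifferencePercentage": selected / total - total_selected / total_event_log,
        })

    # Assumption: Contribution (descending) is used as the importance ordering.
    rows.sort(key=lambda row: row["Contribution"], reverse=True)
    return rows[:maximum_row_count]

# Hypothetical eventlog: cases 1-4 are in Dallas, cases 5-10 in Austin.
cases = {i: {"Region": "Dallas" if i <= 4 else "Austin"} for i in range(1, 11)}
selected_ids = {1, 2, 3, 5, 6}
rows = root_cause_rows(cases, selected_ids, ["Region"])
for row in rows:
    print(row["Value"], row["Total"], row["Selected"], row["Contribution"])
# Dallas 4 3 1.0
# Austin 6 2 -1.0
```

In this sample, half of all cases are selected, so a group of four cases would be expected to contain two selected cases; Dallas contains three, which gives a Contribution of one case.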

When weighting is in use (i.e. WeightingExpression is defined), the following additional columns are included:

  • SelectedWeight: Sum of the weights of the selected cases.
  • ComparedWeight: Sum of the weights of the compared cases.
  • TotalWeight: Sum of the weights of all cases.

In the above formulas, TotalSelected is the number of cases in the eventlog that belong to the selected cases (i.e. the Selection parameter). TotalEventLog is the number of cases in the eventlog from which the root causes are searched.
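
As a cross-check, the column formulas can be related to each other algebraically. The derivation below uses only the definitions given on this page; the numbers in the worked example (Total = 4, Selected = 3, TotalSelected = 5, TotalEventLog = 10) are hypothetical.

```latex
\begin{align*}
\text{ContributionPercentage}
  &= \frac{\text{Selected} - \text{Total}\cdot\dfrac{\text{TotalSelected}}{\text{TotalEventLog}}}{\text{TotalSelected}}
   = \frac{\text{Selected}}{\text{TotalSelected}} - \frac{\text{Total}}{\text{TotalEventLog}} \\[6pt]
\text{Contribution}
  &= \text{Total}\cdot\left(\frac{\text{Selected}}{\text{Total}} - \frac{\text{TotalSelected}}{\text{TotalEventLog}}\right)
   = \text{Total}\cdot\text{DifferencePercentage} \\[6pt]
\text{Worked example:}\quad
\text{Contribution} &= 3 - 4\cdot\frac{5}{10} = 1, \qquad
\text{ContributionPercentage} = \frac{3}{5} - \frac{4}{10} = 0.2, \\
\text{SelectedPercentage} &= \frac{3}{4} = 0.75, \qquad
\text{DifferencePercentage} = \frac{3}{4} - \frac{5}{10} = 0.25
\end{align*}
```

The second identity shows that Contribution is DifferencePercentage scaled by the size of the root cause group, which is why a large group with a modest difference can rank above a small group with a large one.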

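The following example searches for root causes of cases that contain the Shopping Cart Rejected event type, limited to two case attributes and weighted by the Cost attribute:

EventLogById(123).FindRootCauses(#{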
  "Selection": #{
    "Items": [
      #{
        "Type": "IncludeCases",
        "Items": [
          #{
            "Type": "EventType",
            "Values": [
              "Shopping Cart Rejected"
            ]
          }
        ]
      }
    ]
  },
  "CaseAttributeTypes": ["Account Manager", "Case start month"],
  "MaximumRowCount": 20,
  "WeightingExpression": "Attribute(\"Cost\")"
});