Process Mining Concepts: Difference between revisions

From QPR ProcessAnalyzer Wiki
 
(33 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Concepts ==
The process mining concepts used by QPR ProcessAnalyzer are defined on this page, and their relations are shown in the diagram.


[[File:ProcessMiningConcept.png|frame|Figure 1: Basic elements of Process Mining]]
[[File:Process mining concepts.png|right|800px]]
=== Business object ===
A business object is an instance of structured information, or an entity, representing data about the real world that is meaningful in a business context. Business objects are discovered and identified via various problem domain models as part of the information architecture of an organization, for example:
* Customer
* Supplier
* Product
* Transaction
* Shipment
* Person
* Organization unit
A business object has a collection of attributes that are relevant in a particular context. Business objects may also have various types of relationships with each other. Execution of software functions may change the attribute values as well as the relationships.
Certain unique combinations of attribute values can be referred to as constituting the state of a business object. The business object can be associated with a state model describing the set of legitimate states and transitions between them. When a software function operates on a business object, it changes the state according to the programmed business logic and the constraints of the state model.
Among different business object categories, transactions of various sorts are of particular interest in process mining.
=== Process ===
In an organizational context, a process is the way that people work together and apply methods and tools to accomplish predefined common objectives. The objectives include the value that the output of the process is expected to produce for its stakeholders.
A process model is an abstraction representing a process. It is the organization of work in terms of a functionally cohesive or causally connected set of activities triggered by a common event, executed by work roles, and creating or altering the state of some business object(s).
A process case is a particular execution instance or an execution trace of a process. In a process modeling context, a simple case contains a linear sequence of activities. A more complex case may contain parallel execution paths. In a process mining context, the case contains only the end events of activities, not the activities themselves.
Relevant attributes of a process case depend on the context. They apply to a case and the contained events as a whole. For example, in insurance claims processing, relevant case attributes could include:
* Customer
* Insurance policy
* Vehicle registration id
* Location of accident
* Claims handler
Every case attribute has a type, which defines what kind of data can be represented by the attribute, for example:
* Integer
* Logical truth (TRUE or FALSE)
* Date and time
Among different process categories, transactional processes are of particular interest, for example:
* Purchase-to-pay
* Quote-to-cash
* Request-to-resolution
* Idea-to-product
=== Activity ===
[[File:FlowDurationBusinessProcessModel.png|300px|frame|Figure 2: Flow and duration in typical business process models]]
[[File:FlowDurationProcessMining.png|200px|frame|Figure 3: Flow and duration in event log based process mining]]


An activity is the basic building block of a process model. All human work is modeled as being performed within activities. An activity is executed by a work role. The purpose of an activity is to produce value adding output. Input from preceding activities and output to succeeding activities are provided in the form of business objects in well-defined states. In a digitalized business process, execution of an activity results in changing the state of business objects.
== Case ==
An activity has a duration, and there may be waiting time or slack between the end of an activity and the start of the next activity.


In computerized or digital transaction systems, event logs typically contain the end events only and the start events of activities are not necessarily recorded. Durations are calculated from the time between the end events. Flows are used to represent both the causality and the duration.
A case is an instance of the process whose flow is analyzed. What constitutes a case can be chosen differently depending on the analysis. A case consists of a sequence of events describing how the case flows through the process.


=== Event ===
A case has a start time, which is the timestamp of its first event, and an end time, which is the timestamp of its last event. The duration of a case is the difference between the last and first timestamps. Each case belongs to a specific variation.
An event is a change in the state of matters that has happened or is expected to happen, having a meaning in a particular context. The word event is also used to mean a software object inside a computer program that represents such an occurrence in a computing system, typically causing a state transition.
 
An event does not have a duration. Durations are expressed in terms of the time between two events. An event typically signifies the beginning or completion of an activity or the state change of a business object. Process mining typically deals with these events.
== Event ==
Relevant attributes of an event can be derived from the context and corresponding activity. Mandatory attributes for an event include a timestamp and an identifier associating it with a process case. Several optional attributes can be identified, depending on the context, for example:
An event describes that something happened at a specific time. In QPR ProcessAnalyzer, an event represents a specific timestamp and thus does not have a duration. It is still possible to store duration information as an event attribute to be used in analyses. Each event belongs to a specific case. Each event has an event type, which describes what happened in the event. Events may have event attributes, for example:
* Person and work role that triggered the event
* Location where it was done
Line 53: Line 17:
* Environmental conditions
* Any activity specific factor that might affect the outcome
A process case may contain different types of events. Each event type has a special meaning related to the particular activity and business object state transition it is associated with.
 
=== Flow ===
== Event type ==
In a process mining context, a flow represents an identified causal relationship between state changes in a process model or business object. Typically, it signifies the transition from one state to the next state of a transaction. The duration of a flow is defined as the time between the events it connects. Note that in process mining, the analyst may choose the events included in the analysis using the selection and filtering features of the tool, and the flows will be identified based on the included events only. In terms of a state model, this is equivalent to skipping uninteresting intermediate states in favor of a simpler or more focused analysis model.
An event type describes the type of activity that an event represents. All events have a specific type, e.g. ''Order sent'', ''Goods received'' or ''Invoice paid''. In a flowchart, the process is shown as flows from one event type to another. Each event has a specific number of starting and ending flows, i.e. transitions to and from other events (which are visualized in the flowchart).
Flow occurrence is a historical fact or actual instance of a flow. A timestamp and identifiers of the related events are needed for distinct identification. Note that the flow occurrence is an artificial concept. It is the events that actually occur, and the flow is just a relationship between the events.
 
=== Variation ===
== Variation ==
Process variations (or variants) divide a set of process cases into non-overlapping partitions based on their unique configuration of events, flows, and execution path they have taken. All cases in a variation have the same sequence of events and flows. The durations between the events may be different, though.
A variation (or variant) is the sequence of events that a case goes through. If another case has different events, or the same events in a different order, it belongs to a different variation. Durations between the events and occurrence timestamps do not affect which variation a case belongs to. In theory, an eventlog has a minimum of one variation, and the maximum number of variations equals the case count.
=== Transaction ===
 
[[File:TransactionStateModel.png|frame|Figure 4: A transaction business object and the corresponding state model.]]
Usually, the more event types there are, the more variations there are compared to the case count. Analyzing variations can be challenging if there are a lot of them. One method to reduce the number of variations is to exclude event types that are less important from the analysis point of view.
A transaction is a protocol between participating entities intended to transfer value between the participants. Purchasing a product from a retail shop is an example of a simple transaction, where a product of value is transferred from the seller to the buyer and money of equal value is transferred from the buyer to the seller.
 
Transactions can be far more complex than buying a soda can. However, all parties of a transaction should have the same understanding of the protocol and the current state of the transaction. It is important to know whether or not an irrevocable commitment has been achieved by the parties. Transaction state models are a convenient way of explicating the protocol and communicating the current state.
Variations are important from the [[Conformance_Analysis|conformance analysis]] point of view, because conformance analysis classifies each variation as either conformant or nonconformant against a certain BPMN model.
When a transaction proceeds through its state model, it leaves an execution trace. The trace consists of events signifying the transitions of the transaction from one state to another. It is these events that are typically used in process mining work, processed by the various analysis algorithms.
 
Transaction processing systems (like ERP) are usually state oriented, meaning that they do not reflect the way processes are modeled. They do not record the time it takes to perform an activity, but only the significant events i.e. state transitions of the transaction objects. However, the state transitions are usually the result of someone performing an activity, which then causes an event in the transaction processing system.
== Flow ==
A flow is a transition from one event type to another, and it describes all such transitions that have occurred in all cases in the eventlog. There are also start and end flows, representing case start and case end: a start flow has no starting point and an end flow has no ending point. A flow means that no other events occurred between the start and end events, i.e. the ending event occurred directly after the starting event. Note also that a flow always has a direction: a flow from A to B is different from a flow from B to A. The starting and ending event types can be the same.
 
== Flow occurrence ==
A flow occurrence is a transition from one event to another within a certain case. Flow occurrences are thus related to a unique case and have unique starting and ending events. The first and last flow occurrences of a case are special, as the first flow occurrence does not have a starting event and the last flow occurrence does not have an ending event. There is thus always one more flow occurrence in a case than there are events. A flow occurrence has a duration, which is the difference between the starting and ending event timestamps (the first and last flow occurrences are again exceptions). Note that the type of the starting and ending events can be the same, in which case there is a direct loop in the process. Note also that flow occurrences with the same starting and ending event types may occur several times in a case; this is called an indirect loop.
 
== Path ==
A path is a sequence of events of specific types that a case goes through. In contrast to a variation (which contains all events in a case), a path contains only part of the event sequence. For example, a case with the variation A->B->C->D->E contains, among others, the paths A->B->C, B->C->D and D->E.
 
== Case and Event attribute ==
In addition to the eventlog, cases and events can have additional information related to them. For example, a case in a sales process might have information about the customer, region, sales revenue and sold items, and an event might contain information about who performed the event and where it took place geographically. These are examples of case and event attributes.
 
In QPR ProcessAnalyzer, case attributes are stored in a datatable with a row for each case and a column for each case attribute. One column is needed for the case id to identify the row. Case attributes have a certain data type, such as string, integer, decimal number, date, boolean or duration. Event attributes are stored as additional columns in the events (eventlog) datatable. Note that the event type and event timestamp can also be perceived as event attributes, but in QPR ProcessAnalyzer this basic eventlog information is not part of the event attributes.
 
== Project ==
A project is a collection of related dashboards, models, datatables, scripts and secure strings that, for example, belong to the same analyzed process. Projects can also be organized hierarchically by adding child projects to a project. Projects are also important for access control, because user rights are set at the project level.
[[File:ProcessAnalyzer concepts.png|right|700px]]
 
== Model ==
A model consists of a list of events and cases, forming an eventlog. A separate model should be created for each analyzed process. Processes can be e.g. Purchase-to-pay, Quote-to-cash, Request-to-resolution or Idea-to-product. Models belong to projects in QPR ProcessAnalyzer, which helps to organize content.
 
== Filter and Filter Rule ==
Filters are used to choose specific parts of a model for more detailed analysis, i.e. to slice the model into a smaller part. A filter consists of one or more filter rules.
 
There are two kinds of filter rules: case filter rules and event type filter rules. Case filter rules filter cases, and thus the number of cases in the eventlog decreases. Event type filter rules filter events of specific types; the number of cases does not decrease, but the process flow in individual cases may change as some events are left out. Note that event type filter rules may filter out all events of a case, which results in cases that have no events at all. Cases that have no events are not shown as cases in the analysis.
 
Details about filter rules:
* For a filter to match, all the filter rules need to match, i.e. there is an AND logic between the filter rules.
* There are include-type and exclude-type filter rules, i.e. a filter rule can either include the matched cases/events in the resulting eventlog or exclude them from it.
* Filter rules are applied in the order they are defined.
* Each filter rule modifies the eventlog, which affects the filter rules that follow.
 
Note that filters are only for filtering eventlogs. The expression language provides more generic filtering capabilities for any objects.
 
== Script ==
Scripts are programmed routines that perform, for example, ETL tasks such as fetching data from source systems and storing it in datatables.
 
== Datatable ==
Datatables are generic storage for tabular data. Datatables can be used to store events (eventlog) and cases (case attributes), as well as any intermediate results needed by ETL. Additional data needed by dashboards can also be stored in datatables.
 
== Secret ==
Secrets are texts that need to be kept secret; for example, passwords used to log in to data sources for data extraction can be stored as secure strings. Secure strings can be added, changed and used in scripts, but their contents cannot be viewed, which keeps the data secure.

Latest revision as of 08:53, 3 October 2024

The process mining concepts used by QPR ProcessAnalyzer are defined on this page, and their relations are shown in the diagram.

Process mining concepts.png

Case

A case is an instance of the process whose flow is analyzed. What constitutes a case can be chosen differently depending on the analysis. A case consists of a sequence of events describing how the case flows through the process.

A case has a start time, which is the timestamp of its first event, and an end time, which is the timestamp of its last event. The duration of a case is the difference between the last and first timestamps. Each case belongs to a specific variation.
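
The start time, end time and duration of a case can be derived directly from its event timestamps. The following is a minimal Python sketch, not QPR ProcessAnalyzer code: the eventlog layout (case id, event type, timestamp) and the function name summarize_cases are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical eventlog rows: (case id, event type, timestamp).
events = [
    ("case-1", "Order sent", datetime(2024, 1, 1, 9, 0)),
    ("case-1", "Goods received", datetime(2024, 1, 3, 9, 0)),
    ("case-1", "Invoice paid", datetime(2024, 1, 10, 12, 0)),
    ("case-2", "Order sent", datetime(2024, 1, 2, 8, 0)),
    ("case-2", "Invoice paid", datetime(2024, 1, 2, 20, 0)),
]


def summarize_cases(events):
    """Map each case id to (start time, end time, duration)."""
    timestamps_by_case = {}
    for case_id, _event_type, timestamp in events:
        timestamps_by_case.setdefault(case_id, []).append(timestamp)
    summary = {}
    for case_id, timestamps in timestamps_by_case.items():
        # Start is the first event timestamp, end is the last one.
        start, end = min(timestamps), max(timestamps)
        summary[case_id] = (start, end, end - start)
    return summary


case_summary = summarize_cases(events)
for case_id, (start, end, duration) in case_summary.items():
    print(case_id, start, end, duration)
```

A case with a single event gets a duration of zero in this sketch, because its first and last timestamps coincide.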

Event

An event describes that something happened at a specific time. In QPR ProcessAnalyzer, an event represents a specific timestamp and thus does not have a duration. It is still possible to store duration information as an event attribute to be used in analyses. Each event belongs to a specific case. Each event has an event type, which describes what happened in the event. Events may have event attributes, for example:

  • Person and work role that triggered the event
  • Location where it was done
  • Cost incurred by the event
  • Work effort expended
  • Environmental conditions
  • Any activity specific factor that might affect the outcome

Event type

An event type describes the type of activity that an event represents. All events have a specific type, e.g. Order sent, Goods received or Invoice paid. In a flowchart, the process is shown as flows from one event type to another. Each event has a specific number of starting and ending flows, i.e. transitions to and from other events (which are visualized in the flowchart).

Variation

A variation (or variant) is the sequence of events that a case goes through. If another case has different events, or the same events in a different order, it belongs to a different variation. Durations between the events and occurrence timestamps do not affect which variation a case belongs to. In theory, an eventlog has a minimum of one variation, and the maximum number of variations equals the case count.

Usually, the more event types there are, the more variations there are compared to the case count. Analyzing variations can be challenging if there are a lot of them. One method to reduce the number of variations is to exclude event types that are less important from the analysis point of view.
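
Grouping cases into variations, and the effect of excluding an event type, can be illustrated with a short Python sketch. The data and the function name variations are hypothetical, and the events are assumed to be already sorted by timestamp within each case.

```python
from collections import defaultdict

# Hypothetical eventlog rows: (case id, event type), sorted by time per case.
events = [
    ("c1", "Order sent"), ("c1", "Goods received"), ("c1", "Invoice paid"),
    ("c2", "Order sent"), ("c2", "Goods received"), ("c2", "Invoice paid"),
    ("c3", "Order sent"), ("c3", "Reminder sent"),
    ("c3", "Goods received"), ("c3", "Invoice paid"),
]


def variations(events, excluded_types=frozenset()):
    """Group case ids by their sequence of event types (the variation)."""
    sequences = defaultdict(list)
    for case_id, event_type in events:
        if event_type not in excluded_types:
            sequences[case_id].append(event_type)
    groups = defaultdict(list)
    for case_id, sequence in sequences.items():
        # Only the order of event types matters, not the timestamps.
        groups[tuple(sequence)].append(case_id)
    return dict(groups)


all_variations = variations(events)
reduced_variations = variations(events, excluded_types={"Reminder sent"})
print(len(all_variations), len(reduced_variations))
```

With the sample data, excluding the less important event type "Reminder sent" merges the three cases into a single variation.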

Variations are important from the conformance analysis point of view, because conformance analysis classifies each variation as either conformant or nonconformant against a certain BPMN model.

Flow

A flow is a transition from one event type to another, and it describes all such transitions that have occurred in all cases in the eventlog. There are also start and end flows, representing case start and case end: a start flow has no starting point and an end flow has no ending point. A flow means that no other events occurred between the start and end events, i.e. the ending event occurred directly after the starting event. Note also that a flow always has a direction: a flow from A to B is different from a flow from B to A. The starting and ending event types can be the same.
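
As an illustration, flows can be counted by walking through the event type sequence of every case. This Python sketch is an assumption about one possible representation, not the product's implementation: a flow is a (source type, target type) pair, and None stands for the missing starting point of a start flow and the missing ending point of an end flow.

```python
from collections import Counter

# Hypothetical cases: case id -> event types in order of occurrence.
cases = {
    "c1": ["A", "B", "C"],
    "c2": ["A", "C"],
    "c3": ["A", "B", "B", "C"],
}


def count_flows(cases):
    """Count how often each directed flow occurs across all cases."""
    flows = Counter()
    for sequence in cases.values():
        if not sequence:
            continue  # a case without events contributes no flows
        # None marks the case start and the case end.
        padded = [None] + list(sequence) + [None]
        for source, target in zip(padded, padded[1:]):
            flows[(source, target)] += 1
    return flows


flows = count_flows(cases)
for (source, target), count in flows.items():
    print(source, "->", target, count)
```

Because the pairs are ordered, the flow from A to B is counted separately from a flow from B to A, and the pair ("B", "B") represents a flow whose starting and ending event types are the same.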

Flow occurrence

A flow occurrence is a transition from one event to another within a certain case. Flow occurrences are thus related to a unique case and have unique starting and ending events. The first and last flow occurrences of a case are special, as the first flow occurrence does not have a starting event and the last flow occurrence does not have an ending event. There is thus always one more flow occurrence in a case than there are events. A flow occurrence has a duration, which is the difference between the starting and ending event timestamps (the first and last flow occurrences are again exceptions). Note that the type of the starting and ending events can be the same, in which case there is a direct loop in the process. Note also that flow occurrences with the same starting and ending event types may occur several times in a case; this is called an indirect loop.
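
The flow occurrences of a single case can be sketched in Python as follows. The event type "Order changed", the tuple layout and the function name flow_occurrences are assumptions for illustration; None marks the missing starting or ending event and the undefined duration of the first and last flow occurrence.

```python
from datetime import datetime

# Hypothetical events of one case: (event type, timestamp), sorted by time.
case_events = [
    ("Order sent", datetime(2024, 1, 1, 9, 0)),
    ("Order changed", datetime(2024, 1, 1, 12, 0)),
    ("Order changed", datetime(2024, 1, 2, 12, 0)),
    ("Invoice paid", datetime(2024, 1, 5, 12, 0)),
]


def flow_occurrences(case_events):
    """Return (source type, target type, duration) for each flow occurrence."""
    if not case_events:
        return []
    padded = [None] + list(case_events) + [None]
    occurrences = []
    for source, target in zip(padded, padded[1:]):
        # Duration is defined only when both ends are real events.
        duration = target[1] - source[1] if source and target else None
        occurrences.append((
            source[0] if source else None,
            target[0] if target else None,
            duration,
        ))
    return occurrences


occurrences = flow_occurrences(case_events)
# A direct loop: starting and ending events have the same type.
direct_loops = [o for o in occurrences if o[0] is not None and o[0] == o[1]]
print(len(case_events), len(occurrences), len(direct_loops))
```

The sample case has four events and therefore five flow occurrences, one of which is a direct loop.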

Path

A path is a sequence of events of specific types that a case goes through. In contrast to a variation (which contains all events in a case), a path contains only part of the event sequence. For example, a case with the variation A->B->C->D->E contains, among others, the paths A->B->C, B->C->D and D->E.
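
Checking whether a variation contains a given path can be sketched in Python as below. The sketch assumes that a path is a contiguous part of the event sequence, which matches the examples above but is an interpretation; the function name contains_path is hypothetical.

```python
def contains_path(variation, path):
    """Return True if path occurs as a contiguous run inside variation."""
    variation, path = tuple(variation), tuple(path)
    length = len(path)
    return any(
        variation[start:start + length] == path
        for start in range(len(variation) - length + 1)
    )


variation = ["A", "B", "C", "D", "E"]
for path in (["A", "B", "C"], ["B", "C", "D"], ["D", "E"], ["A", "C"]):
    print("->".join(path), contains_path(variation, path))
```

Under the contiguity assumption, A->C is not a path of this variation, because B occurs between A and C.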

Case and Event attribute

In addition to the eventlog, cases and events can have additional information related to them. For example, a case in a sales process might have information about the customer, region, sales revenue and sold items, and an event might contain information about who performed the event and where it took place geographically. These are examples of case and event attributes.

In QPR ProcessAnalyzer, case attributes are stored in a datatable with a row for each case and a column for each case attribute. One column is needed for the case id to identify the row. Case attributes have a certain data type, such as string, integer, decimal number, date, boolean or duration. Event attributes are stored as additional columns in the events (eventlog) datatable. Note that the event type and event timestamp can also be perceived as event attributes, but in QPR ProcessAnalyzer this basic eventlog information is not part of the event attributes.
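
The two-table layout can be pictured with plain Python data structures. This is only an illustrative sketch: the column names (case_id, event_type, timestamp, performed_by and so on) and the helper functions are hypothetical and do not reflect the actual QPR ProcessAnalyzer datatable schema.

```python
# Columns that make up the basic eventlog and are not event attributes.
BASIC_EVENT_COLUMNS = {"case_id", "event_type", "timestamp"}

# Hypothetical cases datatable: one row per case, one column per attribute.
cases_table = [
    {"case_id": "c1", "customer": "Acme", "region": "North", "revenue": 1200.0},
    {"case_id": "c2", "customer": "Globex", "region": "South", "revenue": 450.0},
]

# Hypothetical events datatable: basic columns plus event attribute columns.
events_table = [
    {"case_id": "c1", "event_type": "Order sent",
     "timestamp": "2024-01-01T09:00:00", "performed_by": "alice"},
    {"case_id": "c1", "event_type": "Invoice paid",
     "timestamp": "2024-01-10T12:00:00", "performed_by": "bob"},
]


def case_attributes(cases_table, case_id):
    """Return the attributes of one case, identified by its case id column."""
    for row in cases_table:
        if row["case_id"] == case_id:
            return {key: value for key, value in row.items() if key != "case_id"}
    return None


def event_attributes(event_row):
    """Return only the additional columns of an event row."""
    return {
        key: value for key, value in event_row.items()
        if key not in BASIC_EVENT_COLUMNS
    }


print(case_attributes(cases_table, "c1"))
print(event_attributes(events_table[0]))
```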

Project

A project is a collection of related dashboards, models, datatables, scripts and secure strings that, for example, belong to the same analyzed process. Projects can also be organized hierarchically by adding child projects to a project. Projects are also important for access control, because user rights are set at the project level.

ProcessAnalyzer concepts.png

Model

A model consists of a list of events and cases, forming an eventlog. A separate model should be created for each analyzed process. Processes can be e.g. Purchase-to-pay, Quote-to-cash, Request-to-resolution or Idea-to-product. Models belong to projects in QPR ProcessAnalyzer, which helps to organize content.

Filter and Filter Rule

Filters are used to choose specific parts of a model for more detailed analysis, i.e. to slice the model into a smaller part. A filter consists of one or more filter rules.

There are two kinds of filter rules: case filter rules and event type filter rules. Case filter rules filter cases, and thus the number of cases in the eventlog decreases. Event type filter rules filter events of specific types; the number of cases does not decrease, but the process flow in individual cases may change as some events are left out. Note that event type filter rules may filter out all events of a case, which results in cases that have no events at all. Cases that have no events are not shown as cases in the analysis.

Details about filter rules:

  • For a filter to match, all the filter rules need to match, i.e. there is an AND logic between the filter rules.
  • There are include-type and exclude-type filter rules, i.e. a filter rule can either include the matched cases/events in the resulting eventlog or exclude them from it.
  • Filter rules are applied in the order they are defined.
  • Each filter rule modifies the eventlog, which affects the filter rules that follow.
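
The behavior described above can be sketched in Python as a chain of rules, each applied to the result of the previous one. The rule representation (dictionaries with target, mode, matches and event_types keys) is a hypothetical simplification and not the QPR ProcessAnalyzer filter format.

```python
def apply_rule(cases, rule):
    """Apply one include/exclude rule to {case id: [event types]}."""
    include = rule["mode"] == "include"
    if rule["target"] == "case":
        # Case filter rule: keeps or drops whole cases.
        return {
            case_id: events for case_id, events in cases.items()
            if bool(rule["matches"](events)) == include
        }
    # Event type filter rule: keeps or drops events inside each case.
    return {
        case_id: [e for e in events if (e in rule["event_types"]) == include]
        for case_id, events in cases.items()
    }


def apply_filter(cases, rules):
    """Apply the rules in order, then hide cases left without events."""
    for rule in rules:
        cases = apply_rule(cases, rule)
    return {case_id: events for case_id, events in cases.items() if events}


cases = {"c1": ["A", "B", "C"], "c2": ["A", "C"], "c3": ["B"]}
exclude_b = {"target": "event_type", "mode": "exclude", "event_types": {"B"}}
has_c = {"target": "case", "mode": "include", "matches": lambda ev: "C" in ev}
has_b = {"target": "case", "mode": "include", "matches": lambda ev: "B" in ev}

first = apply_filter(cases, [exclude_b, has_c])
second = apply_filter(cases, [has_b, exclude_b])
print(first)
print(second)
```

In the second filter, case c3 passes the case rule but loses its only event to the event type rule, so it is not shown in the result. Swapping the two rules would give an empty result, because no case contains B once the B events have been excluded, which shows why the order of the rules matters.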

Note that filters are only for filtering eventlogs. The expression language provides more generic filtering capabilities for any objects.

Script

Scripts are programmed routines that perform, for example, ETL tasks such as fetching data from source systems and storing it in datatables.

Datatable

Datatables are generic storage for tabular data. Datatables can be used to store events (eventlog) and cases (case attributes), as well as any intermediate results needed by ETL. Additional data needed by dashboards can also be stored in datatables.

Secret

Secrets are texts that need to be kept secret; for example, passwords used to log in to data sources for data extraction can be stored as secure strings. Secure strings can be added, changed and used in scripts, but their contents cannot be viewed, which keeps the data secure.