Clustering Analysis
Revision as of 12:51, 9 September 2020
The Clustering Analysis view groups the cases in the model so that the cases within a group are similar to each other (e.g. cases that have the same case attribute values end up in the same group). Clustering is based on advanced Machine Learning and Artificial Intelligence algorithms. By default, Clustering Analysis uses a built-in, in-memory k-modes algorithm with categorized values for Event Type occurrences and Case Attribute values. The algorithm does not guarantee convergence to the global optimum, which means that subsequent Clustering Analysis runs may produce slightly different clusters. See this Wikipedia article for more about the idea behind clustering.
You can use the Clustering Analysis View, for example, to check data integrity. That is, the Clustering Analysis might reveal that the model actually contains data from two different processes.
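To illustrate the k-modes idea mentioned above, here is a minimal Python sketch. It is not the product's built-in implementation: the function names `hamming` and `k_modes`, the tuple representation of cases, and the sample data are all assumptions made for illustration. Each case is a tuple of categorical values, distance is the number of mismatching values, and each cluster is summarized by its per-column most frequent value (the mode).

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, seed=0, max_iter=20):
    """Illustrative k-modes: returns (labels, modes) for tuple records."""
    rng = random.Random(seed)
    # Initial modes: k distinct records chosen at random (hypothetical choice).
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop iterating
        labels = new_labels
        # Update step: per column, take the most frequent value in the cluster.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


# Hypothetical cases: (Region, Customer Group, Rework event occurred)
records = [("EU", "Gold", "yes")] * 3 + [("US", "Basic", "no")] * 3
labels, modes = k_modes(records, k=2)
```

Because the initial modes are picked at random, a different `seed` can lead to a different local optimum on less clean data, which is the reason repeated runs may give slightly different clusters.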
Left Panel
You can use the left panel to filter cases. Note that you are not limited to the Flowchart analysis: right-click the analysis to select a different type of analysis for the panel.
Right Panel
The right panel contains the clustering analysis. The table shows the clusters, how many cases are in each cluster, and the following details for each cluster:
- Feature and Value: These two columns list the case attribute and other values that are common to the cases in the cluster.
- Cluster Density %: Share of cases in the cluster that have this feature value, i.e. (number of cases in this cluster having the value shown on the row / number of cases in the cluster) * 100.
- Total Density %: Share of cases in the whole data set that have this feature value, i.e. (total number of cases having the value shown on the row / total number of cases) * 100.
- Contribution %: Share of cases whose membership in this cluster can be explained by this feature value. On this scale, 0% means that the feature value isn't specific to this cluster, and 100% means that all cases belonging to this cluster can be explained by this feature value.
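The two density columns can be reproduced with simple counting. The sketch below is only an illustration: the function name `feature_densities`, the dictionary representation of cases, and the sample data are assumptions, not part of the product. Contribution % is left out because the text above does not give its exact formula.

```python
def feature_densities(cases, labels, cluster, feature, value):
    """Return (cluster_density_pct, total_density_pct) for one feature value.

    cases  : list of dicts mapping feature name -> value (hypothetical format)
    labels : cluster label of each case, in the same order as cases
    """
    in_cluster = [c for c, lab in zip(cases, labels) if lab == cluster]
    cluster_hits = sum(1 for c in in_cluster if c.get(feature) == value)
    total_hits = sum(1 for c in cases if c.get(feature) == value)
    # Share within the cluster, and share within the whole data set.
    cluster_density = 100.0 * cluster_hits / len(in_cluster)
    total_density = 100.0 * total_hits / len(cases)
    return cluster_density, total_density


# Hypothetical data: four cases, two clusters.
cases = [
    {"Region": "EU"},
    {"Region": "EU"},
    {"Region": "US"},
    {"Region": "EU"},
]
labels = [0, 0, 1, 1]
densities = feature_densities(cases, labels, cluster=0, feature="Region", value="EU")
```

Here both cases in cluster 0 have Region = EU, while three of the four cases overall do, so a feature value with a cluster density well above its total density is one that characterizes the cluster.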