Machine Learning Functions in Expression Language
Revision as of 07:16, 24 June 2019
This page describes functions and properties that are related to the machine learning functionality in the QPR ProcessAnalyzer expression language.
MLModel (Machine Learning Model)
| MLModel properties | Description |
|---|---|
| Type | Returns the exact type of the MLModel. |
Machine Learning Functions
'''BalancedKMeans'''

Performs BalancedKMeans clustering for the given numeric matrix. Based on: http://accord-framework.net/docs/html/T_Accord_MachineLearning_BalancedKMeans.htm
* Parameters: jsonData (String)
* Parameters and return value structure are identical to the KMeans function (#48341#).
'''Codify'''

Uses Accord's Codify functionality to encode all unique column values into unique numeric integer values. Based on: http://accord-framework.net/docs/html/T_Accord_Statistics_Filters_Codification.htm
* Parameters:
** matrix: Matrix to codify.
* Returns a codified matrix of exactly the same dimensions as the input matrix.
Examples:
<pre>
Codify([[1,2], [3,4], [1,4]])
Returns: [[0, 0], [1, 1], [0, 1]]
Codify([[123, "foo"], [456, "bar"], [456, "foo"]])
Returns: [[0, 0], [1, 1], [1, 0]]
</pre>
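The encoding can be sketched in plain Python. This is an illustrative analogy, not the actual implementation (which runs Accord.NET's Codification filter); it assumes codes are assigned per column in order of first appearance, which matches both examples above.

```python
def codify(matrix):
    """Encode each column's unique values as integers, numbered per column
    in order of first appearance. The output has exactly the same
    dimensions as the input matrix, mirroring the documented behavior."""
    n_cols = len(matrix[0])
    # One value -> code mapping per column.
    codes = [{} for _ in range(n_cols)]
    result = []
    for row in matrix:
        encoded = []
        for col, value in enumerate(row):
            if value not in codes[col]:
                codes[col][value] = len(codes[col])
            encoded.append(codes[col][value])
        result.append(encoded)
    return result

print(codify([[1, 2], [3, 4], [1, 4]]))                    # [[0, 0], [1, 1], [0, 1]]
print(codify([[123, "foo"], [456, "bar"], [456, "foo"]]))  # [[0, 0], [1, 1], [1, 0]]
```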
'''KMeans'''

Performs KMeans clustering for the given numeric matrix. Based on: http://accord-framework.net/docs/html/T_Accord_MachineLearning_KMeans.htm
* Parameters:
** matrix: Matrix to cluster. Rows (1st dimension) represent data points and columns (2nd dimension) represent feature values.
** k: Target number of clusters.
** distanceFunction: Distance function to be used in the clustering process (#48347#).
** parameters: Optional key-value pair collection as described in #34201#. Supported keys and values:
*** ComputeCovariance: If true, the result will include covariance matrices. Default = false.
* Returns an array having the following elements:
** Element 0: An array of the cluster labels for all the rows in the input matrix, in the same order as they were given in the matrix parameter.
** Element 1: An array of length 2 having the following elements:
*** Element 0: Computed final error of the clustering.
*** Element 1: Number of iterations performed in the clustering.
** Element 2: Covariance matrices; only returned if ComputeCovariance is true.
Examples:
<pre>
KMeans([[1, 2], [2, 3], [2, 2]], 2)
Returns (e.g.): [[0, 1, 0], [0.16667, 2]]
KMeans([[1, 2], [2, 3], [2, 2]], 3)
Returns (e.g.): [[2, 1, 0], [0, 1]]
KMeans([[1, 2], [2, 3], [2, 2]], 2, "manhattan", true)
Returns (e.g.): [[0, 1, 0], [0.33333, 2], <covariance matrices (k * columns * columns)>]
KMeans(OneHot(Codify([[123, "foo"], [456, "bar"], [456, "foo"]])), 2)
Returns (e.g.): [[0, 1, 0], [0.33333, 2]]
</pre>
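For intuition, the documented return structure can be reproduced with a minimal pure-Python sketch of Lloyd's algorithm. This is not the Accord.NET implementation: labels, errors and iteration counts will generally differ from the examples above, and the `seed` and `max_iter` parameters are inventions of this sketch.

```python
import math
import random

def kmeans(matrix, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm returning [labels, [final_error, iterations]]
    to mirror the documented KMeans return structure (without covariances)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(matrix, k)]
    labels = None
    iterations = 0
    while iterations < max_iter:
        iterations += 1
        # Assign each row to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in matrix]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Recompute each centroid as the mean of its assigned rows.
        for c in range(k):
            members = [p for p, l in zip(matrix, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Error: mean squared distance to the assigned centroid.
    error = sum(math.dist(p, centroids[l]) ** 2
                for p, l in zip(matrix, labels)) / len(matrix)
    return [labels, [error, iterations]]

print(kmeans([[1, 2], [2, 3], [2, 2]], 2))
```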
'''KModes'''

Performs KModes clustering for the given numeric matrix. Based on: http://accord-framework.net/docs/html/T_Accord_MachineLearning_KModes.htm
* Parameters:
** matrix: Matrix to cluster. Rows (1st dimension) represent data points and columns (2nd dimension) represent feature values.
** k: Target number of clusters.
** distanceFunction: Distance function to be used in the clustering process (#48347#).
* Returns an array having the following elements:
** Element 0: An array of the cluster labels for all the rows in the input matrix, in the same order as they were given in the matrix parameter.
** Element 1: An array of length 2 having the following elements:
*** Element 0: Computed final error of the clustering.
*** Element 1: Number of iterations performed in the clustering.
Examples:
<pre>
KModes([[1, 2], [2, 3], [2, 2]], 2)
Returns (e.g.): [[0, 1, 0], [0, 2]]
KModes([[1, 2], [2, 3], [2, 2]], 3)
Returns (e.g.): [[2, 1, 0], [0, 1]]
</pre>
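KModes differs from KMeans in that distance is the number of mismatching attribute values (Hamming distance) and cluster centers are per-column modes rather than means. A minimal pure-Python sketch, again only an analogy for the Accord.NET implementation (the deterministic first-k-rows initialization is an assumption of this sketch):

```python
from collections import Counter

def kmodes(matrix, k, max_iter=100):
    """K-modes clustering sketch: Hamming distance for assignment,
    per-column modes as cluster centers. Returns [labels, [error, iterations]]
    to mirror the documented return structure."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    centroids = [list(matrix[i]) for i in range(k)]  # init: first k rows
    labels = None
    iterations = 0
    while iterations < max_iter:
        iterations += 1
        new_labels = [min(range(k), key=lambda c: hamming(row, centroids[c]))
                      for row in matrix]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # New center for each cluster: the most common value in every column.
        for c in range(k):
            members = [row for row, l in zip(matrix, labels) if l == c]
            if members:
                centroids[c] = [Counter(col).most_common(1)[0][0]
                                for col in zip(*members)]
    error = sum(hamming(row, centroids[l])
                for row, l in zip(matrix, labels)) / len(matrix)
    return [labels, [error, iterations]]

print(kmodes([[1, 2], [2, 3], [2, 2]], 2))
```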
'''MLModel''' (MLModel)

Creates a new machine learning model (a binary classification model of the given type) for predictions.
* Parameters:
** Type (string): Type of the prediction/classification model to create. Currently the only supported value is "randomforest", which uses Accord.NET's RandomForest algorithm.
'''OneHot'''

One-hot encodes all matrix columns. Based on: http://accord-framework.net/docs/html/M_Accord_Math_Jagged_OneHot_1.htm
* Parameters:
** matrix: Numeric matrix to one-hot encode.
* Returns a matrix consisting of a concatenation of the one-hot encodings of each of the input matrix columns.
** The number of columns in the returned matrix is at least the same as in the input matrix.
** For each input column, the corresponding one-hot vector has the value 0 everywhere except for one position, which is 1.
Examples:
<pre>
OneHot([[0], [2], [1], [3]])
Returns: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
OneHot(Codify([[123, "foo"], [456, "bar"], [456, "foo"]]))
Returns: [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]]
</pre>
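The concatenation behavior can be sketched in plain Python. The sketch assumes the input contains non-negative integer codes (as produced by Codify) and that each column contributes max(column) + 1 output columns, which matches both examples above; it is an analogy, not the Accord.NET implementation.

```python
def one_hot(matrix):
    """Concatenate the one-hot encodings of each column. Column i of the
    input contributes max(column i) + 1 columns to the output."""
    columns = list(zip(*matrix))
    widths = [max(col) + 1 for col in columns]
    result = []
    for row in matrix:
        encoded = []
        for value, width in zip(row, widths):
            # One-hot vector for this cell: all zeros except position `value`.
            encoded.extend(1 if i == value else 0 for i in range(width))
        result.append(encoded)
    return result

print(one_hot([[0], [2], [1], [3]]))
# [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
print(one_hot([[0, 0], [1, 1], [1, 0]]))  # Codify output from the example above
# [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]]
```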
'''Train''' (MLModel)

Trains the given MLModel using the given input data and expected outcomes.
* Parameters:
* Returns the trained MLModel object.
'''Transform''' (array)

Transforms the given input data using the MLModel to generate predictions.
* Parameters:
** Input data: A two-dimensional array where the first dimension (rows) specifies the data points and the second dimension (columns) specifies the feature values.
* Returns an array of predictions; the prediction for each row of the input data is found at the same index of the returned array.
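The MLModel → Train → Transform flow can be illustrated with a Python stand-in. The classifier here is a deliberately trivial 1-nearest-neighbor substitute for the actual Accord.NET RandomForest; only the chaining shape (Train returns the trained model, Transform returns per-row predictions) mirrors the functions above.

```python
class MLModel:
    """Illustrative stand-in for the MLModel -> Train -> Transform flow.
    Uses 1-nearest-neighbor in place of the actual RandomForest."""
    def __init__(self, model_type):
        self.model_type = model_type
        self.rows, self.outcomes = [], []

    def train(self, rows, outcomes):
        self.rows, self.outcomes = rows, outcomes
        return self  # Train returns the trained model, enabling chaining

    def transform(self, rows):
        # Predict the outcome of the nearest training row (squared distance).
        def nearest(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in self.rows]
            return self.outcomes[dists.index(min(dists))]
        return [nearest(row) for row in rows]

train_data = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]]  # e.g. one-hot rows
outcomes = [True, False, True]
predictions = MLModel("randomforest").train(train_data, outcomes).transform(train_data)
print(predictions)  # predicting on the training data reproduces the outcomes here
```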
Examples
Example #1: Train a model using an event log and test its performance by replaying the training data itself.
<pre>
Def("GetOneHotColumnInformation", (
  Let("el", _),
  ToDictionary([
    "et": OrderByValue(el.EventTypes),
    "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
  ])
));
Def("GenerateOneHot", "cases", (
  Let("columnInformation", _),
  cases.(
    Let("cas", _),
    Flatten([
      columnInformation.Get("et").(
        Let("et", _),
        If(Count(cas.EventsByType(et)) > 0, 1, 0)
      ),
      (
        Let("atColumns", columnInformation.Get("at")),
        OrderByValue(atColumns.Keys).(
          Let("key", _),
          Let("values", atColumns.Get(key)),
          Let("caseValue", cas.Attribute(key)),
          values.(If(_ == caseValue, 1, 0))
        )
      )
    ])
  )
));
Let("el", EventLogById(1));
Let("columnInformation", el.GetOneHotColumnInformation());
Let("allCases", el.Cases);
Let("allCasesOH", columnInformation.GenerateOneHot(el.Cases));
Let("trainDataOH", allCasesOH);
Let("outcomes", allCases.(Duration > TimeSpan(24)));
Let("testDataOH", allCasesOH);
Let("predictions",
  MLModel("randomforest")
    .Train(trainDataOH, outcomes)
    .Transform(trainDataOH));
Sum(Zip(outcomes, predictions).(_[0] == _[1] != 0)) / Count(outcomes)
</pre>
Example #2: Train a model using a 75% sample of an event log and test its performance using the remaining 25% of the event log.
<pre>
Def("GetOneHotColumnInformation", (
  Let("el", _),
  ToDictionary([
    "et": OrderByValue(el.EventTypes),
    "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
  ])
));
Def("GenerateOneHot", "cases", (
  Let("columnInformation", _),
  cases.(
    Let("cas", _),
    Flatten([
      columnInformation.Get("et").(
        Let("et", _),
        If(Count(cas.EventsByType(et)) > 0, 1, 0)
      ),
      (
        Let("atColumns", columnInformation.Get("at")),
        OrderByValue(atColumns.Keys).(
          Let("key", _),
          Let("values", atColumns.Get(key)),
          Let("caseValue", cas.Attribute(key)),
          values.(If(_ == caseValue, 1, 0))
        )
      )
    ])
  )
));
Let("el", EventLogById(1));
Let("columnInformation", el.GetOneHotColumnInformation());
Let("allCases", Shuffle(el.Cases));
Let("lastTrainCaseIndex", 0.75 * CountTop(el.Cases));
Let("trainCases", allCases[NumberRange(0, lastTrainCaseIndex)]);
Let("testCases", allCases[NumberRange(lastTrainCaseIndex + 1, CountTop(el.Cases) - 1)]);
Let("trainDataOH", columnInformation.GenerateOneHot(trainCases));
Let("testDataOH", columnInformation.GenerateOneHot(testCases));
Let("trainOutcomes", trainCases.(Duration > TimeSpan(24)));
Let("testOutcomes", testCases.(Duration > TimeSpan(24)));
Let("predictions",
  MLModel("randomforest")
    .Train(trainDataOH, trainOutcomes)
    .Transform(testDataOH));
Sum(Zip(testOutcomes, predictions).(_[0] == _[1] != 0)) / Count(testOutcomes)
</pre>
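The shuffle/split/score logic of Example #2 can be sketched in Python with hypothetical toy data. `split_and_score` and the duration-threshold rule are inventions of this sketch; it only mirrors the 75/25 split and the final Sum(Zip(...)) / Count(...) accuracy computation.

```python
import random

def split_and_score(cases, outcomes, predict, train_fraction=0.75, seed=42):
    """Shuffle the cases, hold out the last (1 - train_fraction) share as a
    test set, and return the fraction of test outcomes that `predict`
    (an already trained predictor) gets right."""
    indices = list(range(len(cases)))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * len(indices))
    test_idx = indices[cut:]
    correct = sum(1 for i in test_idx if predict(cases[i]) == outcomes[i])
    return correct / len(test_idx)

# Hypothetical toy data: predict "long case" from a single duration feature.
cases = [[5], [30], [2], [48], [10], [26], [1], [40]]
outcomes = [c[0] > 24 for c in cases]
accuracy = split_and_score(cases, outcomes, predict=lambda c: c[0] > 24)
print(accuracy)  # 1.0 -- the predictor is the same rule that made the labels
```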
Example #3: Three sets of cases are used: training cases, target cases (a subset of the training cases) and test cases (an independent set of cases). The model tries to predict which cases in the test set will eventually end up in the target cases.
<pre>
Def("GetOneHotColumnInformation", (
  Let("el", _),
  ToDictionary([
    "et": OrderByValue(el.EventTypes),
    "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
  ])
));
Def("GenerateOneHot", "cases", (
  Let("columnInformation", _),
  cases.(
    Let("cas", _),
    Flatten([
      columnInformation.Get("et").(
        Let("et", _),
        If(Count(cas.EventsByType(et)) > 0, 1, 0)
      ),
      (
        Let("atColumns", columnInformation.Get("at")),
        OrderByValue(atColumns.Keys).(
          Let("key", _),
          Let("values", atColumns.Get(key)),
          Let("caseValue", cas.Attribute(key)),
          values.(If(_ == caseValue, 1, 0))
        )
      )
    ])
  )
));
Let("el", <event log to use>);
Let("trainCases", <cases to use for training>);
Let("targetCases", <cases representing the properties we want to try to predict (subset of trainCases)>);
Let("testCases", <cases to use for testing>);
Let("targetCasesDict", ToDictionary(targetCases:true));
Let("outcomes", trainCases.(Let("c", _), targetCasesDict.ContainsKey(c) ? 1 : 0));
Let("columnInformation", el.GetOneHotColumnInformation());
Let("mlModel", MLModel("randomforest"));
mlModel.Train(columnInformation.GenerateOneHot(trainCases), outcomes);
mlModel.Transform(columnInformation.GenerateOneHot(testCases));
</pre>