Machine Learning Functions in Expression Language

From QPR ProcessAnalyzer Wiki
Revision as of 12:55, 6 May 2019

This page describes the functions and properties related to the machine learning functionality in the QPR ProcessAnalyzer expression language.

MLModel

MLModel properties:

  • Type: Returns the exact type of the MLModel.

Machine Learning Functions

Each function is listed with its return type in parentheses, followed by its parameters and a description.
MLModel (MLModel)

  • Type (string)

Creates a new binary classification model of the given type. The Type parameter specifies the kind of prediction/classification model to create. Currently the only supported value is randomforest, which uses Accord.NET's RandomForest algorithm.

Returns the created MLModel object.

Train (MLModel)
  • Input data
  • Expected outcomes
  • Parameters

Trains the given MLModel using the given input data and expected outcomes.

Parameters:

  • input data: A two-dimensional array where:
    • The first dimension (rows) specifies different data points.
    • The second dimension (columns) specifies the feature values.
  • expected outcomes/labels:
    • An array of expected outcomes for each row in the input data.
    • Must be in the same order as the rows in the input data.
  • parameters: Additional parameters for the MLModel. Optional.
    • NumberOfTrees: the number of trees in the random forest, default value is 10.
    • SampleRatio: the proportion of samples used to train each of the trees in the decision forest, default value is 0.632.

Returns the trained MLModel object itself.
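The input layout Train expects, and the two documented defaults, can be illustrated with a small Python sketch. This is neither QPR ProcessAnalyzer expression language nor the Accord.NET implementation: check_training_inputs and per_tree_sample_size are hypothetical helpers, and treating SampleRatio as a plain proportion of the training rows (rounded to the nearest whole row) is an assumption about how the library applies it.

```python
# Python sketch (not QPR expression language, not Accord.NET) of the input
# layout Train expects and of the documented random forest defaults.
# check_training_inputs and per_tree_sample_size are hypothetical helpers.

DEFAULT_NUMBER_OF_TREES = 10   # documented default of NumberOfTrees
DEFAULT_SAMPLE_RATIO = 0.632   # documented default of SampleRatio


def check_training_inputs(input_data, outcomes):
    """Raise ValueError unless input_data is a non-empty rectangular 2D list
    (rows = data points, columns = features) with one outcome per row."""
    if not input_data:
        raise ValueError("input data must contain at least one row")
    width = len(input_data[0])
    for i, row in enumerate(input_data):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} features, expected {width}")
    if len(outcomes) != len(input_data):
        raise ValueError("expected exactly one outcome per input row")


def per_tree_sample_size(n_rows, sample_ratio=DEFAULT_SAMPLE_RATIO):
    """Rows each tree would see, ASSUMING SampleRatio is applied as a plain
    proportion of the training rows, rounded to the nearest whole row."""
    return max(1, round(n_rows * sample_ratio))


rows = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]  # 3 data points, 4 features
labels = [True, False, True]                      # outcome i belongs to row i
check_training_inputs(rows, labels)
print(per_tree_sample_size(1000))  # 632
```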

Transform (array)

  • Input data

Transforms the given input data using the machine learning model, generating predictions/classifications.

Takes the input data as a parameter: a two-dimensional array where the first dimension (rows) specifies different data points and the second dimension (columns) specifies the feature values.

Returns an array of predictions/classifications. The prediction for each row in the input data is found at the same index in the returned array.
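The create, train, transform chain used in the examples, MLModel("randomforest").Train(...).Transform(...), works because Train returns the model itself. The Python sketch below mimics only that call pattern and the index alignment of the Transform result. MajorityClassModel is a hypothetical toy that always predicts the most common training label; it is not a random forest and not the QPR API.

```python
# Python stand-in (NOT a random forest, NOT the QPR API) illustrating the
# call pattern MLModel("randomforest").Train(data, outcomes).Transform(data):
# train returns the model itself, and transform returns one prediction per
# input row, at the same index as that row.
from collections import Counter


class MajorityClassModel:
    """Hypothetical toy model: always predicts the most common training label."""

    def __init__(self):
        self.prediction = None

    def train(self, input_data, outcomes):
        if len(input_data) != len(outcomes):
            raise ValueError("need one outcome per input row")
        self.prediction = Counter(outcomes).most_common(1)[0][0]
        return self  # returning self is what allows chaining .transform(...)

    def transform(self, input_data):
        if self.prediction is None:
            raise RuntimeError("model must be trained before transform")
        # predictions[i] belongs to input_data[i]
        return [self.prediction for _ in input_data]


train_rows = [[1, 0], [0, 1], [1, 1]]
predictions = (
    MajorityClassModel()
    .train(train_rows, [True, True, False])
    .transform([[0, 0], [1, 0]])
)
print(predictions)  # [True, True]
```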

Examples

Example #1: Train a model using an event log and test its performance by replaying the training data itself.


Def("GetOneHotColumnInformation", (
  Let("el", _),
  ToDictionary([
    "et": OrderByValue(el.EventTypes),
    "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
  ])
));

Def("GenerateOneHot", "cases", (
  Let("columnInformation", _),
  cases.(
    Let("cas", _),
    Flatten(
      [
        columnInformation.Get("et").(Let("et", _), If(Count(cas.EventsByType(et)) > 0, 1, 0)),
        (
          Let("atColumns", columnInformation.Get("at")),
          OrderByValue(atColumns.Keys).(
            Let("key", _),
            Let("values", atColumns.Get(key)),
            Let("caseValue", cas.Attribute(key)),
            values.(If(_ == caseValue, 1, 0))
          )
        )
      ]
    )
  )
));

Let("el", EventLogById(1));
Let("columnInformation", el.GetOneHotColumnInformation());
Let("allCases", el.Cases);
Let("allCasesOH", columnInformation.GenerateOneHot(el.Cases));
Let("trainDataOH", allCasesOH);
Let("outcomes", allCases.(Duration > TimeSpan(24)));
Let("testDataOH", allCasesOH);
Let("predictions", 
  MLModel("randomforest")
    .Train(trainDataOH, outcomes)
    .Transform(trainDataOH));
Sum(Zip(outcomes, predictions).(_[0] == _[1] != 0)) / Count(outcomes)
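For readers unfamiliar with the expression syntax, the following Python sketch reproduces the feature layout that GetOneHotColumnInformation and GenerateOneHot build: one presence flag per event type (sorted), followed by one indicator column per distinct value of each case attribute (attributes in name order). The case dictionaries are a hypothetical stand-in for QPR cases, the columns are collected from a list of cases rather than from an event log, and attribute values are kept in first-seen order, which may differ from the order of Values in QPR. As in the examples, the same column information should be reused for every set of cases so that the feature columns line up.

```python
# Python sketch of the one-hot feature layout built by GetOneHotColumnInformation
# and GenerateOneHot. The case representation (dicts with "events" and
# "attributes") is a hypothetical stand-in for QPR ProcessAnalyzer cases.


def one_hot_column_information(cases):
    """Collect the sorted event types and, per case attribute, its distinct values."""
    event_types = sorted({et for case in cases for et in case["events"]})
    attribute_values = {}
    for case in cases:
        for name, value in case["attributes"].items():
            values = attribute_values.setdefault(name, [])
            if value not in values:
                values.append(value)
    return {"et": event_types, "at": attribute_values}


def generate_one_hot(column_information, cases):
    """Return one feature row per case: event-type presence flags followed by
    one indicator column per (attribute, value) pair, attributes in name order."""
    rows = []
    for case in cases:
        row = [1 if et in case["events"] else 0 for et in column_information["et"]]
        attributes = column_information["at"]
        for name in sorted(attributes):
            case_value = case["attributes"].get(name)
            row.extend(1 if value == case_value else 0 for value in attributes[name])
        rows.append(row)
    return rows


cases = [
    {"events": ["Create", "Approve"], "attributes": {"Region": "North"}},
    {"events": ["Create", "Reject"], "attributes": {"Region": "South"}},
]
info = one_hot_column_information(cases)
print(info["et"])                     # ['Approve', 'Create', 'Reject']
print(generate_one_hot(info, cases))  # [[1, 1, 0, 1, 0], [0, 1, 1, 0, 1]]
```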

Example #2: Train a model using a random 75% sample of an event log and test its performance on the remaining 25% of the event log.

Def("GetOneHotColumnInformation", (
  Let("el", _),
  ToDictionary([
    "et": OrderByValue(el.EventTypes),
    "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
  ])
));

Def("GenerateOneHot", "cases", (
  Let("columnInformation", _),
  cases.(
    Let("cas", _),
    Flatten(
      [
        columnInformation.Get("et").(Let("et", _), If(Count(cas.EventsByType(et)) > 0, 1, 0)),
        (
          Let("atColumns", columnInformation.Get("at")),
          OrderByValue(atColumns.Keys).(
            Let("key", _),
            Let("values", atColumns.Get(key)),
            Let("caseValue", cas.Attribute(key)),
            values.(If(_ == caseValue, 1, 0))
          )
        )
      ]
    )
  )
));

Let("el", EventLogById(1));
Let("columnInformation", el.GetOneHotColumnInformation());
Let("allCases", Shuffle(el.Cases));
Let("lastTrainCaseIndex", 0.75 * CountTop(el.Cases));
Let("trainCases", allCases[NumberRange(0, lastTrainCaseIndex)]);
Let("testCases", allCases[NumberRange(lastTrainCaseIndex + 1, CountTop(el.Cases) - 1)]);
Let("trainDataOH", columnInformation.GenerateOneHot(trainCases));
Let("testDataOH", columnInformation.GenerateOneHot(testCases));
Let("trainOutcomes", trainCases.(Duration > TimeSpan(24)));
Let("testOutcomes", testCases.(Duration > TimeSpan(24)));
Let("predictions", 
  MLModel("randomforest")
    .Train(trainDataOH, trainOutcomes)
    .Transform(testDataOH));
Sum(Zip(testOutcomes, predictions).(_[0] == _[1] != 0)) / Count(testOutcomes)
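The shuffle, split, and scoring steps of Example #2 can be sketched in Python as follows. The final expression of the example appears to compute the share of test cases whose prediction matches the expected outcome, which accuracy mirrors. split_train_test and accuracy are hypothetical names; the split point is rounded down here, whereas the example passes 0.75 * CountTop(el.Cases) directly to NumberRange, so the exact boundary may differ by one case.

```python
# Python sketch of the evaluation pattern in Example #2: shuffle the cases,
# train on the first 75% and measure accuracy (fraction of matching
# predictions) on the remaining 25%. Names are hypothetical.
import random


def split_train_test(cases, train_fraction=0.75, seed=None):
    """Return (train, test) lists after shuffling a copy of cases.
    The split index is rounded down to a whole number of cases."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    split = int(len(shuffled) * train_fraction)
    return shuffled[:split], shuffled[split:]


def accuracy(outcomes, predictions):
    """Fraction of positions where the prediction equals the expected outcome."""
    if len(outcomes) != len(predictions) or not outcomes:
        raise ValueError("need equally long, non-empty outcome and prediction lists")
    matches = sum(1 for expected, predicted in zip(outcomes, predictions)
                  if expected == predicted)
    return matches / len(outcomes)


train, test = split_train_test(range(100), seed=42)
print(len(train), len(test))                                           # 75 25
print(accuracy([True, False, True, True], [True, True, True, False]))  # 0.5
```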

Example #3: Use three sets of cases: training cases, target cases (a subset of the training cases) and test cases (an independent set of cases). Try to predict which cases in the test set will eventually have the properties represented by the target cases.

Def("GetOneHotColumnInformation", (
  Let("el", _),
  ToDictionary([
    "et": OrderByValue(el.EventTypes),
    "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
  ])
));

Def("GenerateOneHot", "cases", (
  Let("columnInformation", _),
  cases.(
    Let("cas", _),
    Flatten(
      [
        columnInformation.Get("et").(Let("et", _), If(Count(cas.EventsByType(et)) > 0, 1, 0)),
        (
          Let("atColumns", columnInformation.Get("at")),
          OrderByValue(atColumns.Keys).(
            Let("key", _),
            Let("values", atColumns.Get(key)),
            Let("caseValue", cas.Attribute(key)),
            values.(If(_ == caseValue, 1, 0))
          )
        )
      ]
    )
  )
));

Let("el", <event log to use>);
Let("trainCases", <cases to use for training>);
Let("targetCases", <cases representing the properties we want to try to predict (subset of trainCases)>);
Let("testCases", <cases to use for testing>);
Let("targetCasesDict", ToDictionary(targetCases:true));
Let("outcomes", trainCases.(Let("c", _), targetCasesDict.ContainsKey(c) ? 1 : 0));
Let("columnInformation", el.GetOneHotColumnInformation());

Let("mlModel", MLModel("randomforest"));
mlModel.Train(columnInformation.GenerateOneHot(trainCases), outcomes);
mlModel.Transform(columnInformation.GenerateOneHot(testCases));
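The labeling step of Example #3 gives each training case the outcome 1 if it is also a target case and 0 otherwise, with targetCasesDict serving as a fast membership lookup. A Python sketch of the same idea follows, using a set in place of the dictionary and hypothetical string case identifiers. It additionally rejects target cases outside the training set, since the example defines them as a subset; that check is an illustrative addition, not part of the original expression.

```python
# Python sketch of how Example #3 derives training labels: a training case
# gets outcome 1 if it is also one of the target cases, otherwise 0. A set
# plays the role of targetCasesDict for constant-time membership checks.
# Case identifiers are hypothetical strings.


def label_training_cases(train_cases, target_cases):
    targets = set(target_cases)
    unknown = targets.difference(train_cases)
    if unknown:
        # the page defines target cases as a subset of the training cases
        raise ValueError(f"target cases not in training set: {sorted(unknown)}")
    return [1 if case in targets else 0 for case in train_cases]


print(label_training_cases(["c1", "c2", "c3", "c4"], ["c2", "c4"]))  # [0, 1, 0, 1]
```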