Machine Learning Functions in Expression Language
This page describes the functions and properties that implement the machine learning functionality of the expression language, such as clustering and prediction. For prediction, the random forest algorithm is supported. For clustering, the following algorithms are supported: KModes, KMeans and BalancedKMeans.
Clustering functions
Function | Parameters | Description |
---|---|---|
KModes | Matrix to cluster | Performs KModes clustering for a numeric matrix. The implementation uses the Accord.NET KModes class (http://accord-framework.net/docs/html/T_Accord_MachineLearning_KModes.htm). Parameters: Returns an array having the following elements: Examples: KModes([[1, 2], [2, 3], [2, 2]], 2) returns (e.g.): [[0, 1, 0], [0, 2]]; KModes([[1, 2], [2, 3], [2, 2]], 3) returns (e.g.): [[2, 1, 0], [0, 1]] |
KMeans | | Performs KMeans clustering for a numeric matrix. The implementation uses the Accord.NET KMeans class (http://accord-framework.net/docs/html/T_Accord_MachineLearning_KMeans.htm). Parameters: Returns an array having the following elements: Examples: KMeans([[1, 2], [2, 3], [2, 2]], 2) returns (e.g.): [[0, 1, 0], [0.16667, 2]]; KMeans([[1, 2], [2, 3], [2, 2]], 3) returns (e.g.): [[2, 1, 0], [0, 1]]; KMeans([[1, 2], [2, 3], [2, 2]], 2, "manhattan", true) returns (e.g.): [[0, 1, 0], [0.33333, 2], <covariance matrices (k * columns * columns)>]; KMeans(OneHot(Codify([[123, "foo"], [456, "bar"], [456, "foo"]])), 2) returns (e.g.): [[0, 1, 0], [0.33333, 2]] |
BalancedKMeans | | Performs Balanced KMeans clustering for a given numeric matrix. The algorithm is based on the Accord.NET BalancedKMeans class (http://accord-framework.net/docs/html/T_Accord_MachineLearning_BalancedKMeans.htm). The parameters and the structure of the return value are identical to those of the KMeans function. |
Codify | Matrix to codify | Encodes the distinct values of each column as integer codes starting from 0. Codes are assigned separately for each column. Based on the Accord.NET Codification filter (http://accord-framework.net/docs/html/T_Accord_Statistics_Filters_Codification.htm). Returns a codified matrix with exactly the same dimensions as the input matrix. Examples: Codify([[1,2], [3,4], [1,4]]) returns: [[0, 0], [1, 1], [0, 1]]; Codify([[123, "foo"], [456, "bar"], [456, "foo"]]) returns: [[0, 0], [1, 1], [1, 0]] |
OneHot | Numeric matrix | One-hot encodes all matrix columns. The implementation uses the Accord.NET OneHot method (http://accord-framework.net/docs/html/M_Accord_Math_Jagged_OneHot_1.htm). Returns a matrix formed by concatenating the one-hot encodings of the input matrix columns, so the returned matrix has at least as many columns as the input matrix. In each column's one-hot vector, all values are 0 except one, which is 1. Examples: OneHot([[0], [2], [1], [3]]) returns: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]; OneHot(Codify([[123, "foo"], [456, "bar"], [456, "foo"]])) returns: [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]] |
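The encoding and clustering functions above can be approximated in a few lines of Python. This helps when checking what a call like KMeans(OneHot(Codify(...)), 2) does to its input. It is a sketch under stated assumptions, not the Accord.NET code:

- codify, one_hot and kmeans are hypothetical names.
- Codes are numbered per column in order of first appearance.
- Each one-hot block is (largest code + 1) columns wide, which matches the examples in the table.
- The clustering is plain Lloyd's k-means seeded with the first k rows, so cluster numbering may differ from the real function.

The error is computed as the mean squared Euclidean distance from each row to its assigned centroid. With the sample data, this gives 0.16667 for KMeans([[1, 2], [2, 3], [2, 2]], 2) and 0.33333 for the OneHot(Codify(...)) example. Both match the first number of the second returned element in those examples. That correspondence is an observation, not documented behavior.

```python
# Minimal pure-Python sketch of the Codify -> OneHot -> KMeans chain.
# Assumptions (not taken from Accord.NET): per-column codes follow first
# appearance, one-hot width is (max code + 1), and k-means is plain Lloyd's
# algorithm seeded with the first k rows.


def codify(matrix):
    """Replace every value with an integer code, numbered separately per column."""
    codes = [{} for _ in matrix[0]]  # one value -> code table per column
    result = []
    for row in matrix:
        coded_row = []
        for column, value in enumerate(row):
            table = codes[column]
            if value not in table:
                table[value] = len(table)  # next free code in this column
            coded_row.append(table[value])
        result.append(coded_row)
    return result


def one_hot(matrix):
    """Concatenate the one-hot encodings of all integer-coded columns."""
    widths = [max(row[j] for row in matrix) + 1 for j in range(len(matrix[0]))]
    result = []
    for row in matrix:
        encoded = []
        for value, width in zip(row, widths):
            encoded.extend(1 if k == value else 0 for k in range(width))
        result.append(encoded)
    return result


def kmeans(points, k, max_iter=100):
    """Return (labels, error); error = mean squared distance to the assigned centroid."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(p) for p in points[:k]]  # naive seeding: first k rows
    labels = []
    for _ in range(max_iter):
        # Assignment step; ties go to the lowest cluster index.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    error = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels)) / len(points)
    return labels, error


labels, error = kmeans(one_hot(codify([[123, "foo"], [456, "bar"], [456, "foo"]])), 2)
print(labels, round(error, 5))  # [0, 1, 0] 0.33333
```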
Prediction functions
Function | Parameters | Description |
---|---|---|
MLModel | | Creates a new machine learning model for predictions. Takes the type of the prediction/classification model to create as a parameter. Currently, the only supported value is "randomforest", which uses the Accord.NET RandomForest algorithm. Parameters: |
Train (MLModel) | | Trains the given MLModel using the given input data and expected outcomes. Parameters: Returns the trained MLModel object. |
Transform (array) | Input data | Transforms the given input data using the MLModel to generate predictions. The input data is a two-dimensional array in which the first dimension (rows) contains the data points and the second dimension (columns) contains the feature values. Returns an array of predictions: the prediction for each input row is at the same index in the returned array. |
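The Train/Transform contract can be illustrated with a small Python stand-in. This is a hypothetical sketch: the class NearestNeighbourModel and its train/transform methods are invented for illustration. It uses 1-nearest-neighbour matching on Hamming distance, not a random forest. It mirrors three things from the table above:

- The data layout: one row per data point, one column per feature.
- train returns the model, so the calls can be chained like MLModel("randomforest").Train(...).Transform(...).
- Each prediction is returned at the same index as its input row.

```python
# Hypothetical stand-in for the MLModel Train/Transform contract.
# It uses 1-nearest-neighbour on Hamming distance, NOT a random forest.


class NearestNeighbourModel:
    def __init__(self):
        self._rows = []
        self._outcomes = []

    def train(self, data, outcomes):
        """data: one row of feature values per data point; outcomes: one label per row."""
        if len(data) != len(outcomes):
            raise ValueError("data and outcomes must have the same number of rows")
        self._rows = [list(row) for row in data]
        self._outcomes = list(outcomes)
        return self  # returning the model allows train(...).transform(...) chaining

    def transform(self, data):
        """Return one prediction per input row, at the same index as that row."""

        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))

        predictions = []
        for row in data:
            # Closest training row wins; ties go to the earliest training row.
            best = min(range(len(self._rows)), key=lambda i: hamming(row, self._rows[i]))
            predictions.append(self._outcomes[best])
        return predictions


train_data = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 0]]
outcomes = [1, 0, 0]
predictions = NearestNeighbourModel().train(train_data, outcomes).transform([[1, 0, 1, 1], [0, 1, 0, 0]])
print(predictions)  # [1, 0]
```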
MLModel (Machine Learning Model)
These properties are available for the MLModel object.
MLModel properties | Description |
---|---|
Type | Returns the exact type of the MLModel. |
Examples
Example #1: Train a model using an event log and test its performance by making predictions for the same cases it was trained on.

    // Column layout for the one-hot encoding: event types ("et") and the values of each case attribute ("at").
    Def("GetOneHotColumnInformation", (
      Let("el", _),
      ToDictionary([
        "et": OrderByValue(el.EventTypes),
        "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
      ])
    ));
    // Encodes each case as a row of 0/1 features using the column layout.
    Def("GenerateOneHot", "cases", (
      Let("columnInformation", _),
      cases.(
        Let("cas", _),
        Flatten([
          columnInformation.Get("et").(Let("et", _), If(Count(cas.EventsByType(et)) > 0, 1, 0)),
          (
            Let("atColumns", columnInformation.Get("at")),
            OrderByValue(atColumns.Keys).(
              Let("key", _),
              Let("values", atColumns.Get(key)),
              Let("caseValue", cas.Attribute(key)),
              values.(If(_ == caseValue, 1, 0))
            )
          )
        ])
      )
    ));
    Let("el", EventLogById(1));
    Let("columnInformation", el.GetOneHotColumnInformation());
    Let("allCases", el.Cases);
    Let("allCasesOH", columnInformation.GenerateOneHot(el.Cases));
    Let("trainDataOH", allCasesOH);
    // Outcome to predict: whether the case duration exceeds TimeSpan(24).
    Let("outcomes", allCases.(Duration > TimeSpan(24)));
    Let("testDataOH", allCasesOH);
    Let("predictions", MLModel("randomforest")
      .Train(trainDataOH, outcomes)
      .Transform(trainDataOH));
    // Share of cases whose prediction matches the actual outcome.
    Sum(Zip(outcomes, predictions).(_[0] == _[1] != 0)) / Count(outcomes)
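The two helper functions defined with Def build a fixed column layout from the event log and then turn every case into a row of 0/1 features. A rough Python equivalent is sketched below. It rests on these assumptions:

- Cases use a hypothetical in-memory format: a dict with a list of event type names and a dict of case attributes.
- The layout is derived from a list of cases that stands in for the event log.
- The function names column_information, generate_one_hot and accuracy are illustrative.
- Sorting the event types and attribute values is a simplification; what matters is that training and test rows share the same column order.

The accuracy helper corresponds to what the last line of the example is meant to compute: the share of cases whose prediction equals the actual outcome.

```python
# Sketch of what GetOneHotColumnInformation and GenerateOneHot compute, using a
# hypothetical case format (not the ProcessAnalyzer object model):
#   {"events": [event type names], "attributes": {attribute name: value}}


def column_information(cases):
    """Column layout: sorted event types and, per attribute, its sorted distinct values."""
    event_types = sorted({et for case in cases for et in case["events"]})
    attribute_values = {}
    for case in cases:
        for name, value in case["attributes"].items():
            attribute_values.setdefault(name, set()).add(value)
    return {
        "et": event_types,
        "at": {name: sorted(values) for name, values in attribute_values.items()},
    }


def generate_one_hot(info, cases):
    """One 0/1 row per case: event-type presence, then one indicator per attribute value."""
    rows = []
    for case in cases:
        row = [1 if et in case["events"] else 0 for et in info["et"]]
        for name in sorted(info["at"]):  # attribute blocks ordered by attribute name
            case_value = case["attributes"].get(name)
            row.extend(1 if v == case_value else 0 for v in info["at"][name])
        rows.append(row)
    return rows


def accuracy(outcomes, predictions):
    """Share of rows where the prediction equals the actual outcome (True == 1 in Python)."""
    matches = sum(1 for actual, predicted in zip(outcomes, predictions) if actual == predicted)
    return matches / len(outcomes)


cases = [
    {"events": ["Order", "Ship"], "attributes": {"Region": "EU"}},
    {"events": ["Order"], "attributes": {"Region": "US"}},
]
info = column_information(cases)
print(generate_one_hot(info, cases))  # [[1, 1, 1, 0], [1, 0, 0, 1]]
```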
Example #2: Train a model using a random 75% sample of the cases in an event log and test its performance on the remaining 25%.

    Def("GetOneHotColumnInformation", (
      Let("el", _),
      ToDictionary([
        "et": OrderByValue(el.EventTypes),
        "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
      ])
    ));
    Def("GenerateOneHot", "cases", (
      Let("columnInformation", _),
      cases.(
        Let("cas", _),
        Flatten([
          columnInformation.Get("et").(Let("et", _), If(Count(cas.EventsByType(et)) > 0, 1, 0)),
          (
            Let("atColumns", columnInformation.Get("at")),
            OrderByValue(atColumns.Keys).(
              Let("key", _),
              Let("values", atColumns.Get(key)),
              Let("caseValue", cas.Attribute(key)),
              values.(If(_ == caseValue, 1, 0))
            )
          )
        ])
      )
    ));
    Let("el", EventLogById(1));
    Let("columnInformation", el.GetOneHotColumnInformation());
    // Shuffle the cases and use the first 75% for training and the rest for testing.
    Let("allCases", Shuffle(el.Cases));
    Let("lastTrainCaseIndex", 0.75 * CountTop(el.Cases));
    Let("trainCases", allCases[NumberRange(0, lastTrainCaseIndex)]);
    Let("testCases", allCases[NumberRange(lastTrainCaseIndex + 1, CountTop(el.Cases) - 1)]);
    Let("trainDataOH", columnInformation.GenerateOneHot(trainCases));
    Let("testDataOH", columnInformation.GenerateOneHot(testCases));
    Let("trainOutcomes", trainCases.(Duration > TimeSpan(24)));
    Let("testOutcomes", testCases.(Duration > TimeSpan(24)));
    Let("predictions", MLModel("randomforest")
      .Train(trainDataOH, trainOutcomes)
      .Transform(testDataOH));
    // Share of test cases whose prediction matches the actual outcome.
    Sum(Zip(testOutcomes, predictions).(_[0] == _[1] != 0)) / Count(testOutcomes)
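The expression above computes the split point as 0.75 * CountTop(el.Cases), which is not a whole number unless the case count is divisible by 4. This page does not describe how NumberRange and array indexing handle a fractional bound. For comparison, here is a Python sketch of the same shuffle-and-split idea that truncates the split point to an integer, so the two parts are disjoint and together cover every case. The function name train_test_split and its seed parameter are illustrative, not part of the expression language.

```python
import random


def train_test_split(cases, train_share=0.75, seed=None):
    """Shuffle a copy of the cases and split it into disjoint training and test lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded generator makes the split reproducible
    cut = int(train_share * len(shuffled))  # truncate to a whole number of training cases
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(10), seed=1)
print(len(train), len(test))  # 7 3
```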
Example #3: Uses three sets of cases: training cases, target cases (a subset of the training cases) and test cases (an independent set of cases). The model tries to predict which cases in the test set will eventually acquire the properties that define the target cases.

    Def("GetOneHotColumnInformation", (
      Let("el", _),
      ToDictionary([
        "et": OrderByValue(el.EventTypes),
        "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes, Name).[_: Values]))
      ])
    ));
    Def("GenerateOneHot", "cases", (
      Let("columnInformation", _),
      cases.(
        Let("cas", _),
        Flatten([
          columnInformation.Get("et").(Let("et", _), If(Count(cas.EventsByType(et)) > 0, 1, 0)),
          (
            Let("atColumns", columnInformation.Get("at")),
            OrderByValue(atColumns.Keys).(
              Let("key", _),
              Let("values", atColumns.Get(key)),
              Let("caseValue", cas.Attribute(key)),
              values.(If(_ == caseValue, 1, 0))
            )
          )
        ])
      )
    ));
    Let("el", <event log to use>);
    Let("trainCases", <cases to use for training>);
    Let("targetCases", <cases representing the properties we want to try to predict (subset of trainCases)>);
    Let("testCases", <cases to use for testing>);
    // Label each training case 1 if it is also a target case, otherwise 0.
    Let("targetCasesDict", ToDictionary(targetCases:true));
    Let("outcomes", trainCases.(Let("c", _), targetCasesDict.ContainsKey(c) ? 1 : 0));
    Let("columnInformation", el.GetOneHotColumnInformation());
    Let("mlModel", MLModel("randomforest"));
    mlModel.Train(columnInformation.GenerateOneHot(trainCases), outcomes);
    mlModel.Transform(columnInformation.GenerateOneHot(testCases));
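In this example, the dictionary built with ToDictionary(targetCases:true) is used only as a set: ContainsKey tells whether a training case is also a target case. The same labelling step is shown in Python below, with a set standing in for the dictionary. membership_outcomes is a hypothetical name, and cases are represented by plain IDs.

```python
def membership_outcomes(train_cases, target_cases):
    """Outcome per training case: 1 if it also belongs to the target cases, else 0."""
    targets = set(target_cases)  # set plays the role of ToDictionary(targetCases:true)
    return [1 if case in targets else 0 for case in train_cases]


print(membership_outcomes(["c1", "c2", "c3", "c4"], ["c2", "c4"]))  # [0, 1, 0, 1]
```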
Example #4: A customized version of example #3 that uses actual event type and attribute names. As in example #3, there are three sets of cases: training cases, target cases (a subset of the training cases) and test cases (an independent set of cases), and the model tries to predict which test cases will eventually acquire the properties that define the target cases. The result is generated as HTML, ready to be sent in an email message.

    Def("GenerateOneHot", "cases", {
      let columnInformation = _;
      cases.{
        let cas = _;
        Flatten([
          {
            let etColumns = columnInformation.Get("et");
            etColumns.{
              let et = _;
              If(Count(cas.EventsByType(et)) > 0, 1, 0)
            }
          },
          {
            let atColumns = columnInformation.Get("at");
            OrderByValue(atColumns.Keys).{
              let key = _;
              let values = atColumns.Get(key);
              let caseValue = cas.Attribute(key);
              values.(If(_ == caseValue, 1, 0))
            }
          }
        ])
      }
    });

    // Make predictions for the whole model:
    // let el = ModelById(39694).EventLog;
    // Make predictions for cases in a particular filter:
    let el = EventLogById(109773);
    let currenttime = now;
    // Training cases: last visit older than TimeSpan(30).
    let trainCases = el.Cases.Where(Catch(currenttime - EventTimeStampsByType("hs_analytics_last_visit_timestamp")[0] > TimeSpan(30), true));
    // Target cases: training cases that have reached one of these lifecycle stages.
    let targetCases = trainCases.Where(_.Attribute("Lifecycle Stage").In(["opportunity", "marketingqualifiedlead", "customer", "salesqualifiedlead"]));
    // Test cases: last visit within TimeSpan(30).
    let testCases = el.Cases.Where(Catch(currenttime - EventTimeStampsByType("hs_analytics_last_visit_timestamp")[0] < TimeSpan(30), false));
    let targetCasesDict = ToDictionary(targetCases:true);
    let outcomes = trainCases.{
      let c = _;
      targetCasesDict.ContainsKey(c) ? 1 : 0
    };
    // Features: three event types and four case attributes.
    let columnInformation = ToDictionary([
      "et": OrderByValue(el.EventTypes).Where(_.Name.In(["hs_email_last_click_date", "first_conversion_date", "hs_analytics_last_visit_timestamp"])),
      "at": ToDictionary(ConcatTop(OrderByTop(el.CaseAttributes.Where(_.Name.In(["Lifecycle Stage", "Original Source", "QPR Digest", "Unsubscribed from all email"])), Name).[_: Values]))
    ]);
    let mlModel = MLModel("randomforest");
    mlModel.Train(columnInformation.GenerateOneHot(trainCases), outcomes);
    let predictions = mlModel.Transform(columnInformation.GenerateOneHot(testCases));
    // HTML table of the test cases predicted to become target cases.
    let predictedCases = ToDataFrame(Zip(testCases, predictions).Where(_[1] == 1), ["id", "pred"]).id;
    let body = "<html><body><table><tr><td>Last visited</td><td>Name</td></tr><tr>" +
      StringJoin("</tr><tr>", "<td>" + predictedCases.EventTimeStampsByType("hs_analytics_last_visit_timestamp")[0] + "</td><td>" + predictedCases.Name + "</td>") +
      "</tr></table></body></html>";
    body;
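The report step at the end of example #4 keeps the test cases predicted as 1 and joins them into an HTML table. A minimal Python sketch of the same idea follows. predicted_cases_html and the (name, last visited) tuple format are hypothetical, and the sample values are made up for illustration. As a design choice, the sketch escapes cell values with html.escape, a precaution the expression above does not take, so that characters such as & or < in case names cannot break the markup.

```python
from html import escape


def predicted_cases_html(test_cases, predictions):
    """HTML table of the test cases whose prediction is 1.

    test_cases: hypothetical (name, last_visited) tuples, aligned by index with predictions.
    """
    rows = [
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(str(last_visited)), escape(str(name)))
        for (name, last_visited), prediction in zip(test_cases, predictions)
        if prediction == 1
    ]
    header = "<tr><td>Last visited</td><td>Name</td></tr>"
    return "<html><body><table>" + header + "".join(rows) + "</table></body></html>"


# Made-up sample input, for illustration only.
sample = [("Alice & Co", "2024-05-01 10:00"), ("Bob", "2024-05-03 09:30")]
print(predicted_cases_html(sample, [1, 0]))
```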