SQL Scripting for ETL: Difference between revisions

From QPR ProcessAnalyzer Wiki
Jump to navigation Jump to search
 
(87 intermediate revisions by 3 users not shown)
Line 1: Line 1:
This page gives general information related to extracting, transforming and loading data in QPR ProcessAnalyzer. The supported commands and their descriptions are listed in [[Supported QPR ProcessAnalyzer Commands in Scripts]].  
This page describes how to use the SQL-based scripting language to extract data from source systems and load data into QPR ProcessAnalyzer, i.e. how to do ETL (extract, transform and load). The supported commands are described in [[SQL Scripting Commands]]. Scripts can be written in the [[Managing_Scripts|Manage Scripts]] dialog. The SQL scripts consists of SQL statements and QPR ProcessAnalyzer commands that take the preceding SQL statements as parameters.


__TOC__
__TOC__


It is possible to load raw data into QPR ProcessAnalyzer and do the data transformation and loading into QPR ProcessAnalyzer Service via scripts using temporary database tables, so that the resulting transformed data can be used for analyses in QPR ProcessAnalyzer.
== Introduction to SQL Scripting ==


A user with the ManageIntegrations and RunScripts permissions can define a script via the [[Script Management#Script Manager|Manage Scripts]] dialog. The script consists of SQL statements and QPR ProcessAnalyzer commands that take the preceding SQL statements as parameters.
The QPR ProcessAnalyzer ETL system enables data extraction, transformation and loading based on SQL queries implemented by scripts. Scripts are written in SQL and consist of standard SQL commands, QPR ProcessAnalyzer commands and special parameters related to QPR ProcessAnalyzer. Scripts are useful for performing several database related operations. More specifically, with the QPR ProcessAnalyzer ETL system it is possible to, for example:
* extract data from a source system to data tables
* load data from data tables and create events and case attributes to new or existing models based on this data
* read data from existing models and create reports
* import data from existing models and create new models extending the current features of QPR ProcessAnalyzer


== Script Variables ==
== Script Variables ==
The scripts have the following variables available:
SQL scripts have the below listed variables available.
* '''@_ProjectId''': the Id of the project in whose context the script is being run. BIGINT. Undefined if project context has not been specified in script execution parameters.
 
* '''@_ModelId''': the Id of the model in whose context the script is being run. BIGINT. Undefined if model context has not been specified in script execution parameters.
* '''@_FilterId''': the Id of the filter in whose context the script is being run. BIGINT. Undefined if filter context has not been specified in script execution parameters.
* '''@_QPRProcessAnalyzerVersion''': the QPR Process Analyzer core dll version as string in the format: <major>.<minor>.<build>.<revision>. NVARCHAR(64).
* '''@_ScriptingVersion''': The scripting version number that identifies the version which was used when script was saved. Can be used for indicating, for example, the version for which the script was originally planned. A script created for newer (bigger) scripting version doesn't necessarily work on a PA server supporting older (smaller) version. INT.
* '''@_UserId''': the Id of the user running the script. INT.
* '''@_ScriptId''': Id of the script being run. BIGINT.
* '''@_EngineScriptingVersion''': The scripting version that identifies the functionalities available from QPR ProcessAnalyzer server. A script created for newer (bigger) scripting version doesn't necessarily work on a QPR ProcessAnalyzer server supporting older (smaller) version. INT.
* '''@_DatabaseId''': the Id of the database in use. GUID.
* '''@_ExceptionOccurred''': if there was an exception when running the script, then this value is 1, otherwise 0. INT
* '''@_ExceptionType''': if there was an exception when running the script, shows the C# class name for the exception, NVARCHAR(MAX), otherwise NULL.
* '''@_ExceptionMessage''': if there was an exception when running the script, contains a message that would have been displayed, NVARCHAR(MAX), otherwise NULL.
* '''@_ExceptionDetails''': if there was an exception when running the script, contains the details that would have been displayed, including the system stack trace, NVARCHAR(MAX), otherwise NULL.
* '''#_ViewSettings''': The Settings shown in the right-hand pane in Excel used to generate the selected view. Note that this doesn't include analysis results or QPR ProcessAnalyzer Server related information, such as FilterId, TotalEventCount, or DatabaseId. Represented as a table two column table variable. The columns are:
** Name: the name of the parameter. NVARCHAR(440).
** Value: the value of the parameter. NVARCHAR(MAX).
* '''#_Selection''': The objects the user has currently selected on the analysis. Supported objects are the event types and transitions in the Flowchart or Path analyses and the Excel cells in the Duration, Profiling, Cases, Variations, Event Types, or Flows analyses. Any filters already applied are included in the selection. The maximum supported amount for the selection is 1000 separate objects and 10000 cells and their values. Represented as a four column table variable. <div id="ObjectSelectionTypes"></div>The columns are:
:* Type: the type of the selected object. INT.
:{|
|STYLE="vertical-align: top; width: 50%;"|
{| class="wikitable"
{| class="wikitable"
! Type
! Variable name
! Description
! Description
|-
|-
|0
|@_ProjectId (BIGINT)
|Common. Values for the Common Type are represented as a Value-IntValue pair. See the table on the right for descriptions.
|Id of the project in which context the script is run. Undefined if project context has not been specified in script execution parameters.
|-
|2
|Activity. Has the following names and values:
*SelectedActivities (IntValue): One activity/event type id in every row.
|-
|-
|6
|@_ModelId (BIGINT)
|Activity Path. Has the following names and values:
|Id of the model in which context the script is run. Undefined if model context has not been specified in script execution parameters.
*SelectedActivityPaths (Value): One activity path in every row.
|-
|-
|8
|@_FilterId (BIGINT)
|Attribute Value. Has the following names and values:
|Id of the filter in which context the script is run. Undefined if filter context has not been specified in script execution parameters.
*SelectedAttributeValues (Value): One attribute value in every row.
|-
|-
|10
|@_UserId (INT)
|Case. Has the following names and values:
|Id of the user running the script.
*SelectedCases (IntValue): One case id in every row.
|-
|-
|12
|@_ScriptId (BIGINT)
|Count. Has the following names and values:
|Id of the script that originally started the script run.
*SelectedCounts (IntValue): One variation event type count in every row.
|-
|-
|14
|@_CurrentScriptId (BIGINT)
|Duration. Has the following names and values:
|Id of the script where the execution currently is. If scripts call other scripts, @_ScriptId doesn't change, whereas @_CurrentScriptId changes when the parent script calls other script.
*SelectedDurations (IntValue): One duration in every row.
|-
|-
|16
|@_ExceptionOccurred (INT)
|Transition. Has the following names and values:
|If there was an exception when running the script, the value is '''1''', otherwise '''0'''. INT
*SelectedTransition (Value): One transition/flow in every row.
|-
|-
|18
|@_ExceptionMessage (NVARCHAR(MAX))
|Variation. Has the following names and values:
|If there was an exception when running the script, contains the message of the exception.
*SelectedVariations (IntValue): One variation in every row.
|-
|-
|20
|@_Parameter_<ParameterName>&nbsp;(SQL_VARIANT)
|Trends. Has the following names and values:
|All passed parameters are available is variables with name @_Parameter_<ParameterName> where ''ParameterName'' is the name of the variable. For example, variable ''myVariable1'' can be used with name ''@_Parameter_myVariable1''.
*SelectedEventTypeTrends (Value): One selected event type trend with all its selected period identifiers in every row.
*SelectedTrendPeriodLevel: Selected period level for trend selection, possible values are 0 = Day, 1 = Week, 2 = Month, 3 = Quarter, 4 = Year.
*SelectedTrendStartDate: Selected start date in ISO format (e.g. 2016-01-28) for the trend selection.
|}
|STYLE="vertical-align: top; padding-left: 20px"|
{| class="wikitable"
! Common Type Value
! Description
|-
|NumberOfActivities
|The amount of activities in the selection.
|-
|NumberOfActivityPaths
|The amount of activity paths in the selection.
|-
|NumberOfAttributeValues
|The amount of attribute values in the selection.
|-
|NumberOfCases
|The amount of cases in the selection.
|-
|NumberOfCounts
|See the "Group By Event Count" entry in the [[Variations]] page.
|-
|NumberOfDurations
|The amount of durations in the selection.
|-
|NumberOfTransitions
|The amount of transitions in the selection.
|-
|NumberOfVariations
|The amount of variations in the selection.
|-
|DurationGranularity
|The duration time unit used in the selection. See the "Group By" entry in the [[Duration Analysis]] page for reference.
|-
|DurationMaximum
|The duration limit in the selection. See the "Duration Limit" entry in the [[Duration Analysis]] page for reference.
|-
|ReversedPaths
|The value is "1" if the "Predecessors" Direction was selected in the [[Path Analysis]], otherwise "0".
|-
|SelectionType
|What was selected. The value for this parameter is shown on the "Value" column instead of the IntValue column.
|}
|}
|}


* IntValue: the integer representation of the selection value (used for object IDs etc). BIGINT.
== Script Variable Examples ==
* Value: the string representation of the selected value. Used for selections that require more complex representation than single integer value. NVARCHAR(MAX).
The following script command defines ProjectId, ModelId, and FilterId variables by using the script variables:
* Name: the unique name for the selection value. See the table above for the list of name used for each type. Note that there can be multiple rows with the same name if the selection consists of multiple "objects". NVARCHAR(MAX).
<br/>
 
Note that even when a script context hasn't been selected, the variables get their values from a default filter. The default filter is the filter that was last used by the user in the model the user has last opened. If the user hasn't opened any model before, the default filter is taken from the last created model to which the user has access rights.
 
: '''Example'''
: The following script fragment defines the ProjectId, ModelId, and FilterId parameters by using the script variables:
<pre>SELECT @_ProjectId as ProjectId, @_ModelId as ModelId, @_FilterId as FilterId;</pre>
<pre>SELECT @_ProjectId as ProjectId, @_ModelId as ModelId, @_FilterId as FilterId;</pre>


: '''Example'''
The following script gets various information about the environment:
: The following script gets the View Settings currently in use in the Settings pane in Excel and shows them as a table in a new Excel sheet:
<pre>SELECT @_QPRProcessAnalyzerVersion as QPRProcessAnalyzerVersion, @_UserId as Userid
<pre>(SELECT * FROM #_ViewSettings)
(SELECT 'SheetName', 'Info')
(SELECT 'SheetName', 'ViewSettings')
--#ShowReport</pre>
--#ShowReport</pre>


: '''Example'''
== Calling Expression Script from SQL Script ==
: The following script gets the current selection on the analysis and shows the data as a table in a new Excel sheet:
Expression scripts can be called from SQL scripts by using the [[SQL_Scripting_Commands#--.23GetAnalysis|--#GetAnalysis]] command. There you need to call the ''AnalysisType=33'' to run an expression to use the [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Script|Run]] function. Here is an example (to call an expression script with id 123 and pass parameters values):
<pre>(SELECT * FROM #_Selection)
<pre>
(SELECT 'SheetName' , 'Selection')
(SELECT 'AnalysisType', '33') UNION ALL
--#ShowReport</pre>
(SELECT 'TargetTable', '#ExampleTable') UNION ALL
(SELECT 'Configuration', '{
"Root": "ScriptById(123).Run(#{
\"parameter1\": \"value1\",
\"parameter2\": 321
})"
}')
--#GetAnalysis
</pre>
 
In the above example, if the expression script returns a DataFrame, it is stored to the ''#ExampleTable'' temporary table in the scripting sandbox database.


: '''Example'''
== Calling SQL Script from SQL Script ==
: The following script gets various information about the environment and shows them in a new Excel sheet:
In an SQL script, it's possible to call other SQL script using the [[SQL_Scripting_Commands#--.23Run|--#Run]] command. For more information, see its [[SQL_Scripting_Commands#--.23Run|documentation]] and [[Run_Script_Examples|examples]].
<pre>SELECT  @_QPRProcessAnalyzerVersion as QPRProcessAnalyzerVersion, @_ScriptingVersion as ScriptingVersion, @_EngineScriptingVersion as EngineScriptingVersion, @_UserId as Userid, @_DatabaseId as DatabaseId
(SELECT 'SheetName', 'Info')
--#ShowReport</pre>


== Exception Handling ==
== Exception Handling ==
Line 151: Line 79:
Note that the CatchOperationExceptions parameter is in effect only for the command it is used with, i.e. it isn't in effect in child scripts or scripts that are run via the --#Exit command. In addition, when there are multiple ProcessAnalyzer script commands in the script, the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables get updated each time, i.e. the variable values are available only until the next ProcessAnalyzer command is executed. To use the same variable values in multiple ProcessAnalyzer commands in the script, place the values into a temporary table:<br/>
Note that the CatchOperationExceptions parameter is in effect only for the command it is used with, i.e. it isn't in effect in child scripts or scripts that are run via the --#Exit command. In addition, when there are multiple ProcessAnalyzer script commands in the script, the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables get updated each time, i.e. the variable values are available only until the next ProcessAnalyzer command is executed. To use the same variable values in multiple ProcessAnalyzer commands in the script, place the values into a temporary table:<br/>
<pre>
<pre>
SELECT
SELECT @_ExceptionOccurred 'ExceptionOccurred', @_ExceptionMessage 'ExceptionMessage'
@_ExceptionOccurred 'ExceptionOccurred',
INTO #PACommandExceptions
@_ExceptionType 'ExceptionType',
@_ExceptionMessage 'ExceptionMessage',
@_ExceptionDetails 'ExceptionDetails'
INTO #PACommandExceptions</pre>
 
== SQL Command Support ==
When transforming data in QPR ProcessAnalyzer Pro (i.e. when connected to the QPR ProcessAnalyzer Service), only temporary tables (#) should be used. Note that '''global''' temporary tables (##) should never be used in the SQL scripts, and using them is not allowed. When using QPR ProcessAnalyzer Xpress, these limitations do not apply.
 
==== Print ====
The [http://msdn.microsoft.com/en-us/library/ms176047.aspx Print SQL statement] can be used to generate log entries into the script execution log.
 
== Running Scripts ==
A script can be run in the following ways:
 
=== Running a Script from the Ribbon ===
# On the QPR tab of the ribbon, click '''Run'''.
# Select the script to be run from the Script Gallery that opens.
 
=== Running a Script from the Manage Scripts Dialog ===
# On the QPR tab of the ribbon, click '''Manage Scripts'''.
# From the dialog, select the context in which the script you wish to run exists.
# Select the script you wish to run.
# Click '''Run'''.
 
== Terminating Scripts ==
A script can be terminated by the following ways:
*The user running the script can click the '''Cancel''' button when the script is running.
*The script can use the "--#Exit" command to stop script execution.
*A QPR ProcessAnalyzer Administrator user can terminate the script via the Operation Log.
*The SQL Server System Administrator can kill the session using the script by using e.g. SQL Server Management Studio.
*The Internet Information Services Administrator can recycle the application pool if the script has caused it to hang. Note however, that this may also cause other requests by other users being processed at the same time to be aborted.
*The Windows Administrator can kill the w3wp.exe-process processing a problematic script. Note however, that this may also cause other requests by other users being processed at the same time to be aborted.
 
'''NOTE!'''<br/>
Terminating the script will not revert any changes the script has already done in the database before the Cancel button is clicked.
 
== Things to Note About Scripts ==
When writing and using scripts, take the following things into account:
* Only those lines in the script that start with "--#" (without the quotes) are treated as QPR ProcessAnalyzer Commands, i.e. if there are leading whitespaces before the command, the line is treated as a comment.
* If you don't define a value for the MaximumCount parameter, 1000 will be used as default, i.e. only the 1000 first rows from a given table or model will be used.
* When doing more advanced operations with scripts, you may run into the error messages such as: "'''The data types sql_variant and varchar are incompatible in the add operation.'''", "'''Argument data type sql_variant is invalid for argument 1 of like function.'''", "'''Argument data type sql_variant is invalid for argument 1 of left function.'''". This is because [[Case Attribute|case attributes]], [[Event Attribute|event attributes]], and data inside [[Data Table|data tables]] are '''sql_variant''' type data. In order to use them with more advanced operations (e.g. Add), you need to CONVERT or CAST them into some other data type before the operations. See [[#ConvertExample|this example]].
* For certain characters in attribute values, you need to use escaping in order to have them interpreted correctly in the script. For more information, see [[Attribute Values|Escaping for Attribute Values]].
 
== Troubleshooting Scripts ==
After a script has been run, the [[Script Log]] dialog is shown. You can use it to troubleshoot your script.
 
=== Invalid Object Name ===
* '''"Invalid object name" exception is thrown when running a script with #GetAnalysis and #ImportEvents commands. What causes this?
: The issue may arise when there are no events to import, i.e. the #GetAnalysis command doesn't create any table. In this case, the missing event table needs to be created in the script between the #GetAnalysis and #ImportEvents commands:
<pre>
/*This is the GetAnalysis part */
(SELECT 'AnalysisType', '6') UNION ALL
(SELECT 'MaximumCount', '0') UNION ALL
(SELECT 'FilterId', '10') UNION ALL
(SELECT 'SelectedEventAttributes', '*') UNION ALL
(SELECT 'TargetTable', '#AnalysisResult')
--#GetAnalysis
 
/*Here we create the event table, if it is missing */
IF object_id('tempdb..#AnalysisResult') is  null
BEGIN
CREATE TABLE #AnalysisResult (
CaseName NVARCHAR(MAX),
EventTypeName NVARCHAR(MAX),
TimeStamp DATETIME,
Cost FLOAT,
TotalCost FLOAT
);
END
 
/*Then we use the ImportEvents */
(SELECT 'ProjectName', 'ExampleProject') UNION ALL
(SELECT 'ModelName', 'ExampleModel')
(SELECT * FROM [#AnalysisResult])
--#ImportEvents
</pre>
</pre>


=== Incorrect Syntax When Using ' Characters ===
== Example Script Walkthough==
Here is an example of a simple ETL script that prints data taken from the table #Customers:
* '''SQL error: Incorrect syntax near ' '''
:If you get this error message, check the ' characters in the parameter definitions. Note that you cannot use the ' character in the parameter value.
=== Incorrect Syntax When Running Scripts Copied from QPR ProcessAnalyzer Wiki ===
 
* '''SQL error: Incorrect syntax'''
: You may get this kind of error message when trying to run a script that you have copied and pasted directly from, for example, the examples given in [[Data Extraction, Transformation, and Loading#Supported QPR ProcessAnalyzer Commands in Scripts|Supported QPR ProcessAnalyzer Commands in Scripts]]. This happens because the scripts are then likely to contain non-breaking spaces (encoded as 0xA0 but not visible when pasted to the '''Script Code''' field) which cause the running of the scripts to fail as they are not accepted in the SQL syntax.
 
: The solution is to manually remove the non-breaking spaces after copying and pasting the script, so there are no extra spaces at the beginning of each line, for example.
 
=== Using "##" in a Script ===
* '''Using "##" in a script is not allowed. Did you mistype temporary table name?'''
: This error message is displayed if the name of the project or model name contains two consecutive # characters (i.e. ##) and you try creating a new project or a model using scripts. Note that you cannot use the ## character in the name of the project or the model.
 
=== The ? Character in Project and Model Names ===
* '''A project or model name contains question marks.'''
: The problem arises if you try to, for example, create a project or a model with a name containing multibyte characters using scripts. The created project or model will then include question mark characters instead of multibyte characters. The fix is to add a prefix N' to the name containing multibyte characters (for example, N'【隱藏▲】【純文字檢視】【複製到剪貼簿】'). Another workaround is not to use multibyte characters at all in scripts but instead ProjectId and ModelId, if possible. To do that, you can view the Id column in the Models tab of the Workspace dialog in the same way as described for data tables in [[Workspace#Adding_an_Id_Column_to_the_Data_Tables_List|Adding an Id Column to the Data Tables List]].
 
=== Multiple Projects with the Same Name ===
 
* '''"There are multiple projects with name '<ProjectName>'. Please, specify ProjectId in sandbox script operation: #<Operation>"'''
: This error message is displayed when trying to run a script if you have access to two or several projects with the same name and you refer to the project by name in the script. The fix for resolving this ambiguity is to refer to the project by its ID using the 'ProjectId' parameter instead of 'ProjectName'. An ID is always unique, whereas a name is not.
 
=== Multiple Models with the Same Name ===
* '''"There are multiple models with name '<ModelName>'. Please, specify ModelId in sandbox script operation: #<Operation>"'''
: This error message is displayed when trying to run a script if you have access to two or several models with the same name and you refer to the model by name in the script. The fix for resolving this ambiguity is to refer to the model by its ID using the 'ModelId' parameter instead of 'ModelName'. An ID is always unique, whereas a name is not.
 
{{UnicodeComparison}}
* For example, the following script will show characters that are treated similar as zero "0":
<pre>
<pre>
DECLARE @referenceAscii int = 0
SELECT * FROM #Customers;
 
(SELECT 'Title', 'Customers') UNION ALL
IF object_id('tempdb..#ids') is null
(SELECT 'MaximumCount', '0');
CREATE TABLE #ids ( id int)
--#ShowReport  
 
DECLARE @i int = 0
WHILE @i < 70000 BEGIN
    SET @i = @i + 1
    INSERT #ids VALUES (@i)
END
 
select * FROM (
  select
    sys.fn_varbintohexstr(CONVERT(varBINARY(8), id)) as CharacterAsHex,
    id as ASCIICode,
    nchar(id) as Character,
  (CASE WHEN Nchar(@referenceAscii) = NCHAR(id) THEN 1 ELSE 0 END) as ShownAsSimilar
  FROM #ids ) as x
where x.ShownAsSimilar = 1
--#ShowReport
</pre>
</pre>


=== Client Exception When Using #ImportSapQuery Command ===
This script consists of SQL statements (lines 1-3) and a QPR ProcessAnalyzer command (line 4) that take the results of the preceding SQL SELECT statements as parameters.


* '''Limitations in size of SAP query result''' 
Let’s go through this example in more detail (line 1):
:The SAP system has limitations as to how much data can be queried. The workaround for the error is to reduce the amount of data to be queried using 'SapRowcount', 'SapRowskips', 'SapWhereClauseSelect', 'SapFieldNames' and 'SapFunction' parameters. See [[Supported_QPR_ProcessAnalyzer_Commands_in_Scripts#--.23ImportSapQuery|--#ImportSapQuery]] for parameter explanations.
<pre>SELECT * FROM #Customers; </pre>


* '''"Client exception" is thrown when running a script with #ImportSapQuery command using the 'SapWhereClauseSelect' parameter.
SELECT is one of the most basic SQL commands you can use. The SELECT statement is used to extract data from a table. You can define the criteria for what data is selected for extraction. On line 1 of the example above, the SQL statement selects all the columns from the #Customers table. In addition, this produces the first input argument for the QPR ProcessAnalyzer command —#ShowReport by defining what to print.
The preceding hash symbol (#) indicates that results are stored to a temporary table. They exist only during the execution of the script so they are not stored permanently.


:For example, the following error text can be displayed:
Lines 2-3:
<pre>
<pre>(SELECT 'Title', 'Customers') UNION ALL
#ImportSapQuery: SAP.Middleware.Connector.RfcAbapRuntimeException: SAP.Middleware.Connector.RfcAbapRuntimeException:
(SELECT 'MaximumCount', '0'); </pre>
Error in module RSQL of the database interface
  at SAP.Middleware.Connector.RfcConnection.ThrowRfcErrorMsg()
  at SAP.Middleware.Connector.RfcConnection.ReadBytes(Byte* buffer, Int32 count)
  at SAP.Middleware.Connector.RfcConnection.ReadRfcIDBegin(Int32& length)
  at SAP.Middleware.Connector.RfcConnection.ReadUpTo(RFCGET readState, RfcFunction function, RFCID toRid)
  at SAP.Middleware.Connector.RfcConnection.RfcReceive(RfcFunction function)
  at SAP.Middleware.Connector.RfcFunction.RfcDeserialize(RfcConnection conn, IRfcIOStream stream)
  at SAP.Middleware.Connector.RfcFunction.RfcCallReceive(RfcConnection conn, IRfcIOStream stream, RFCID rid)
  at SAP.Middleware.Connector.RfcFunction.Invoke(RfcDestination destination)
  at Qpr.ProcessAnalyzer.Common.Integration.Sap.SapQuery(RfcDestination destination, Action`2 getSapWhereClauseSelect, Boolean enableStreamCompression, IDictionary`2 parameters)
  System.ServiceModel.FaultException`1[Qpr.Utils.SapError]: SAP.Middleware.Connector.RfcAbapRuntimeException: Error in module RSQL of the database interface
  (Fault Detail is equal to SAP.Middleware.Connector.RfcAbapRuntimeException: Error in module RSQL of the database interface</pre>


:This issue may arise when you are extracting data from SAP and the size of 'SapWhereClauseSelect' parameter value is too large. Reduce the size of the 'SapWhereClauseSelect' value.
These lines create the second input argument for the command --#ShowReport by giving a label for the sheet and defining how many lines to print. ‘0’ means that all rows will be printed.


== [[Supported QPR ProcessAnalyzer Commands in Scripts]] ==
SQL uses the UNION operator to combine the result-set of two or more SELECT statements. UNION ALL is used to select all the values (including duplicate values) whereas UNION is used to select only the distinct values.
For a list of supported commands and their descriptions, see page [[Supported QPR ProcessAnalyzer Commands in Scripts]].


== [[QPR ProcessAnalyzer ETL Tutorial]] ==
Line 4:
There is also an [[QPR ProcessAnalyzer ETL Tutorial | ETL tutorial]] for learning the basics of creating ETL scripts in SQL and using them in QPR ProcessAnalyzer.
<pre>--#ShowReport </pre>


== [[ETL Script Examples]] ==
When writing scripts with the QPR ProcessAnalyzer ETL system, the lines in the script that start with "--#" (without the quotes and without any preceding blank spaces at the beginning of the line) are treated as QPR ProcessAnalyzer Commands and SQL ignores these lines as comments. The supported QPR ProcessAnalyzer commands and some examples of how to use them in scripts are listed in [[SQL_Scripting_Commands|here]].
For more examples, see [[ETL Script Examples]].


[[Category:ETL]]
== Tips for Writing SQL Scripts ==
[[Category:API]]
* It is important to follow the T-SQL syntax when writing SQL queries.
* SQL is not case sensitive, so you can write either ''SELECT'' or ''select''. However, the parameter values and the QPR ProcessAnalyzer commands given as comments are case sensitive.
* Semicolon is used to end SQL statements. It is not always required, but in some cases the missing semicolon may cause SQL errors due to ambiguity. We recommend using a semicolon especially at the end of SELECT statements preceding the QPR ProcessAnalyzer ETL commands to avoid SQL errors.
* Table field names may contain white spaces, special characters, or reserved words but then they must be referred to by using brackets ([]) around them. This situation happens frequently in QPR ProcessAnalyzer ETL scripts since the case and event attribute data contain fields that have extra spaces.
* When transforming data, ''temporary tables'' can be used (temporary tables names start with ''#''). Note that ''global temporary tables'' (tables names start with ''##'') is not allowed.
* The [http://msdn.microsoft.com/en-us/library/ms176047.aspx Print SQL statement] can be used to generate log entries into the script execution log.
* Only those lines in the script that start with "--#" (without the quotes) are treated as QPR ProcessAnalyzer commands, i.e. if there are leading whitespaces before the command, the line is treated as a comment.
* If you don't define a value for the MaximumCount parameter, 1000 will be used as default, i.e. only the 1000 first rows from a given table or model will be used.
* When doing more advanced operations with scripts, you may run into the error messages such as: "The data types sql_variant and varchar are incompatible in the add operation.", "Argument data type sql_variant is invalid for argument 1 of like function.", "Argument data type sql_variant is invalid for argument 1 of left function.". This is because case attributes, event attributes, and data inside datatables are ''sql_variant'' type data. In order to use them with more advanced operations (e.g. Add), you need to CONVERT or CAST them into some other data type before the operations. See [[#ConvertExample|this example]].
* For certain characters in attribute values, you need to use escaping in order to have them interpreted correctly in the script.

Latest revision as of 13:39, 31 August 2023

This page describes how to use the SQL-based scripting language to extract data from source systems and load data into QPR ProcessAnalyzer, i.e. how to do ETL (extract, transform and load). The supported commands are described in SQL Scripting Commands. Scripts can be written in the Manage Scripts dialog. The SQL scripts consists of SQL statements and QPR ProcessAnalyzer commands that take the preceding SQL statements as parameters.

Introduction to SQL Scripting

The QPR ProcessAnalyzer ETL system enables data extraction, transformation and loading based on SQL queries implemented by scripts. Scripts are written in SQL and consist of standard SQL commands, QPR ProcessAnalyzer commands and special parameters related to QPR ProcessAnalyzer. Scripts are useful for performing several database related operations. More specifically, with the QPR ProcessAnalyzer ETL system it is possible to, for example:

  • extract data from a source system to data tables
  • load data from data tables and create events and case attributes to new or existing models based on this data
  • read data from existing models and create reports
  • import data from existing models and create new models extending the current features of QPR ProcessAnalyzer

Script Variables

SQL scripts have the below listed variables available.

Variable name Description
@_ProjectId (BIGINT) Id of the project in which context the script is run. Undefined if project context has not been specified in script execution parameters.
@_ModelId (BIGINT) Id of the model in which context the script is run. Undefined if model context has not been specified in script execution parameters.
@_FilterId (BIGINT) Id of the filter in which context the script is run. Undefined if filter context has not been specified in script execution parameters.
@_UserId (INT) Id of the user running the script.
@_ScriptId (BIGINT) Id of the script that originally started the script run.
@_CurrentScriptId (BIGINT) Id of the script where the execution currently is. If scripts call other scripts, @_ScriptId doesn't change, whereas @_CurrentScriptId changes when the parent script calls other script.
@_ExceptionOccurred (INT) If there was an exception when running the script, the value is 1, otherwise 0. INT
@_ExceptionMessage (NVARCHAR(MAX)) If there was an exception when running the script, contains the message of the exception.
@_Parameter_<ParameterName> (SQL_VARIANT) All passed parameters are available is variables with name @_Parameter_<ParameterName> where ParameterName is the name of the variable. For example, variable myVariable1 can be used with name @_Parameter_myVariable1.

Script Variable Examples

The following script command defines ProjectId, ModelId, and FilterId variables by using the script variables:

SELECT @_ProjectId as ProjectId, @_ModelId as ModelId, @_FilterId as FilterId;

The following script gets various information about the environment:

SELECT  @_QPRProcessAnalyzerVersion as QPRProcessAnalyzerVersion, @_UserId as Userid
(SELECT 'SheetName', 'Info')
--#ShowReport

Calling Expression Script from SQL Script

Expression scripts can be called from SQL scripts by using the --#GetAnalysis command. There you need to call the AnalysisType=33 to run an expression to use the Run function. Here is an example (to call an expression script with id 123 and pass parameters values):

(SELECT 'AnalysisType', '33') UNION ALL
(SELECT 'TargetTable', '#ExampleTable') UNION ALL
(SELECT 'Configuration', '{
	"Root": "ScriptById(123).Run(#{
		\"parameter1\": \"value1\",
		\"parameter2\": 321
	})"
}')
--#GetAnalysis

In the above example, if the expression script returns a DataFrame, it is stored to the #ExampleTable temporary table in the scripting sandbox database.

Calling SQL Script from SQL Script

In an SQL script, it's possible to call other SQL script using the --#Run command. For more information, see its documentation and examples.

Exception Handling

In general, scripts are meant to be developed in such a way that in the end you can run the scripts without any errors. However, sometimes there may be some system issues (timeouts SAP etc.) that can cause for example data extraction scripts to fail. For these kind of situations and for development time and troubleshooting purposes, you can use the CatchOperationExceptions parameter and the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables with the QPR ProcessAnalyzer script commands to handle exceptions in ProcessAnalyzer. Out of these, the @_ExceptionOccurred is handy for example in defining some other command to be run in case there was an exception. For SQL errors, the TRY-CATCH error handling should be used.

Note that the CatchOperationExceptions parameter is in effect only for the command it is used with, i.e. it isn't in effect in child scripts or scripts that are run via the --#Exit command. In addition, when there are multiple ProcessAnalyzer script commands in the script, the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables get updated each time, i.e. the variable values are available only until the next ProcessAnalyzer command is executed. To use the same variable values in multiple ProcessAnalyzer commands in the script, place the values into a temporary table:

SELECT @_ExceptionOccurred 'ExceptionOccurred', @_ExceptionMessage 'ExceptionMessage'
INTO #PACommandExceptions

Example Script Walkthough

Here is an example of a simple ETL script that prints data taken from the table #Customers:

SELECT * FROM #Customers; 
(SELECT 'Title', 'Customers') UNION ALL 
(SELECT 'MaximumCount', '0'); 
--#ShowReport 

This script consists of SQL statements (lines 1-3) and a QPR ProcessAnalyzer command (line 4) that take the results of the preceding SQL SELECT statements as parameters.

Let’s go through this example in more detail (line 1):

SELECT * FROM #Customers; 

SELECT is one of the most basic SQL commands you can use. The SELECT statement is used to extract data from a table. You can define the criteria for what data is selected for extraction. On line 1 of the example above, the SQL statement selects all the columns from the #Customers table. In addition, this produces the first input argument for the QPR ProcessAnalyzer command —#ShowReport by defining what to print.

The preceding hash symbol (#) indicates that results are stored to a temporary table. They exist only during the execution of the script so they are not stored permanently.

Lines 2-3:

(SELECT 'Title', 'Customers') UNION ALL 
(SELECT 'MaximumCount', '0'); 

These lines create the second input argument for the command --#ShowReport by giving a label for the sheet and defining how many lines to print. ‘0’ means that all rows will be printed.

SQL uses the UNION operator to combine the result-set of two or more SELECT statements. UNION ALL is used to select all the values (including duplicate values) whereas UNION is used to select only the distinct values.

Line 4:

--#ShowReport 

When writing scripts with the QPR ProcessAnalyzer ETL system, the lines in the script that start with "--#" (without the quotes and without any preceding blank spaces at the beginning of the line) are treated as QPR ProcessAnalyzer Commands and SQL ignores these lines as comments. The supported QPR ProcessAnalyzer commands and some examples of how to use them in scripts are listed in here.

Tips for Writing SQL Scripts

  • It is important to follow the T-SQL syntax when writing SQL queries.
  • SQL is not case sensitive, so you can write either SELECT or select. However, the parameter values and the QPR ProcessAnalyzer commands given as comments are case sensitive.
  • Semicolon is used to end SQL statements. It is not always required, but in some cases the missing semicolon may cause SQL errors due to ambiguity. We recommend using a semicolon especially at the end of SELECT statements preceding the QPR ProcessAnalyzer ETL commands to avoid SQL errors.
  • Table field names may contain white spaces, special characters, or reserved words but then they must be referred to by using brackets ([]) around them. This situation happens frequently in QPR ProcessAnalyzer ETL scripts since the case and event attribute data contain fields that have extra spaces.
  • When transforming data, temporary tables can be used (temporary tables names start with #). Note that global temporary tables (tables names start with ##) is not allowed.
  • The Print SQL statement can be used to generate log entries into the script execution log.
  • Only those lines in the script that start with "--#" (without the quotes) are treated as QPR ProcessAnalyzer commands, i.e. if there are leading whitespaces before the command, the line is treated as a comment.
  • If you don't define a value for the MaximumCount parameter, 1000 will be used as default, i.e. only the 1000 first rows from a given table or model will be used.
  • When doing more advanced operations with scripts, you may run into the error messages such as: "The data types sql_variant and varchar are incompatible in the add operation.", "Argument data type sql_variant is invalid for argument 1 of like function.", "Argument data type sql_variant is invalid for argument 1 of left function.". This is because case attributes, event attributes, and data inside datatables are sql_variant type data. In order to use them with more advanced operations (e.g. Add), you need to CONVERT or CAST them into some other data type before the operations. See this example.
  • For certain characters in attribute values, you need to use escaping in order to have them interpreted correctly in the script.