SQL Scripting for ETL: Difference between revisions

 
(37 intermediate revisions by the same user not shown)
Line 1: Line 1:
This page describes how to extract data from source systems and load data into QPR ProcessAnalyzer, i.e. how to do ETL (extract, transform and load). The supported commands and their descriptions are listed in [[QPR ProcessAnalyzer Scripting Commands]]. It is possible to load raw data into QPR ProcessAnalyzer, and do the data transformation and loading into QPR ProcessAnalyzer Service via scripts using temporary database tables, so that the resulting transformed data can be used for analyses in QPR ProcessAnalyzer. Scripts can be written in the [[Manage Scripts in QPR ProcessAnalyzer#Script Manager|Manage Scripts]] dialog in the Excel Client. The script consists of SQL statements and QPR ProcessAnalyzer commands that take the preceding SQL statements as parameters.
This page describes how to use the SQL-based scripting language to extract data from source systems and load data into QPR ProcessAnalyzer, i.e. how to do ETL (extract, transform and load). The supported commands are described in [[SQL Scripting Commands]]. Scripts can be written in the [[Managing_Scripts|Manage Scripts]] dialog. SQL scripts consist of SQL statements and QPR ProcessAnalyzer commands that take the results of the preceding SQL statements as parameters.


__TOC__
__TOC__
== Introduction to SQL Scripting ==
The QPR ProcessAnalyzer ETL system enables data extraction, transformation and loading based on SQL queries implemented by scripts. Scripts are written in SQL and consist of standard SQL commands, QPR ProcessAnalyzer commands and special parameters related to QPR ProcessAnalyzer. Scripts are useful for performing several database related operations. More specifically, with the QPR ProcessAnalyzer ETL system it is possible to, for example:
* extract data from a source system to data tables
* load data from data tables and create events and case attributes to new or existing models based on this data
* read data from existing models and create reports
* import data from existing models and create new models extending the current features of QPR ProcessAnalyzer


== Script Variables ==
== Script Variables ==
Scripts have the below listed variables available.
SQL scripts have the below listed variables available.


{| class="wikitable"
{| class="wikitable"
Line 21: Line 29:
|@_UserId (INT)
|@_UserId (INT)
|Id of the user running the script.
|Id of the user running the script.
|-
|@_ScriptId (BIGINT)
|Id of the script that originally started the script run.
|-
|@_CurrentScriptId (BIGINT)
|Id of the script where the execution currently is. If scripts call other scripts, @_ScriptId doesn't change, whereas @_CurrentScriptId changes when the parent script calls other script.
|-
|-
|@_ExceptionOccurred (INT)
|@_ExceptionOccurred (INT)
Line 28: Line 42:
|If there was an exception when running the script, contains the message of the exception.
|If there was an exception when running the script, contains the message of the exception.
|-
|-
|@_ExceptionDetails (NVARCHAR(MAX))
|@_Parameter_<ParameterName>&nbsp;(SQL_VARIANT)
|If there was an exception when running the script, contains the details of the exception, including the system stack trace.
|All passed parameters are available as variables named @_Parameter_<ParameterName>, where ''ParameterName'' is the name of the parameter. For example, the parameter ''myVariable1'' is available as ''@_Parameter_myVariable1''.
|-
|@_ExceptionType (NVARCHAR(MAX))
|If there was an exception when running the script, shows the C# class name for the exception.
|-
|@_ScriptId (BIGINT)
|Id of the script that originally called to start the script run.
|-
|@_CurrentScriptId (BIGINT)
|Id of the script where the execution currently is. So, if there are nested scripts, @_ScriptId doesn't change, whereas @_CurrentScriptId is updated when the parent script calls some other script.
|-
|@_QPRProcessAnalyzerVersion (NVARCHAR(64))
|QPR ProcessAnalyzer version.
|-
|}
|}
== Script Operation Parameters ==
It's possible to pass parameters to scripts. If not passed, variables get their values from a default filter. The default filter is the filter that was last used by the user in the model the user has last opened. If the user hasn't opened any model before, the default filter is taken from the last created model to which the user has access rights.
Parameters of an operation are the following:
* FilterId: Specifies the id of the filter in which to operate.
* ModelId: Specifies the id of the model in which to operate.
* ProjectId: Specifies the id of the project in which to operate.
* ProjectName: Specifies the name of the project in which to operate (ignored if ProjectId is defined). Cannot be used when importing data, since in import operations the ProjectName parameter gives the name of the possible new project resulting from the import.
* ModelName: Specifies the name of the model in which to operate (ignored if ModelId is defined). Cannot be used when importing any model data, such as events, since in import operations the ModelName parameter gives the name of the possible new model resulting from the import.
* FilterName: Specifies the name of the filter in which to operate (ignored if FilterId is defined). If multiple context parameters have been defined, they all must point to the same context (i.e. the intersection of all the defined contexts must not be empty).
'''Model import parameters''' are parameters in which the whole script is being run and the parameters change when import operations are made during the operation of the script. When a script run is started, script parameters are copied as model import parameters. When running import operations during the script execution (e.g. ImportEvents or ImportDataTable) the target model or datatable of the import replace the earlier value in the model import parameters.
When running a script, parameters can be given either for the entire script run or explicitly for each operation. When running any operation, the used parameter values are determined primarily from the parameters given to the operation, or (if not given) from the model import parameters.


== Script Variable Examples ==
== Script Variable Examples ==
Line 68: Line 54:
(SELECT 'SheetName', 'Info')
(SELECT 'SheetName', 'Info')
--#ShowReport</pre>
--#ShowReport</pre>
== Calling Expression Script from SQL Script ==
Expression scripts can be called from SQL scripts by using the [[SQL_Scripting_Commands#--.23GetAnalysis|--#GetAnalysis]] command. Set ''AnalysisType'' to ''33'' to run an expression, and in that expression call the [[QPR_ProcessAnalyzer_Objects_in_Expression_Language#Script|Run]] function of the script. The following example calls the expression script with id 123 and passes parameter values to it:
<pre>
(SELECT 'AnalysisType', '33') UNION ALL
(SELECT 'TargetTable', '#ExampleTable') UNION ALL
(SELECT 'Configuration', '{
"Root": "ScriptById(123).Run(#{
\"parameter1\": \"value1\",
\"parameter2\": 321
})"
}')
--#GetAnalysis
</pre>
In the above example, if the expression script returns a DataFrame, it is stored to the ''#ExampleTable'' temporary table in the scripting sandbox database.
== Calling SQL Script from SQL Script ==
In an SQL script, it's possible to call another SQL script using the [[SQL_Scripting_Commands#--.23Run|--#Run]] command. For more information, see its [[SQL_Scripting_Commands#--.23Run|documentation]] and [[Run_Script_Examples|examples]].


== Exception Handling ==
== Exception Handling ==
Line 74: Line 79:
Note that the CatchOperationExceptions parameter is in effect only for the command it is used with, i.e. it isn't in effect in child scripts or scripts that are run via the --#Exit command. In addition, when there are multiple ProcessAnalyzer script commands in the script, the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables get updated each time, i.e. the variable values are available only until the next ProcessAnalyzer command is executed. To use the same variable values in multiple ProcessAnalyzer commands in the script, place the values into a temporary table:<br/>
Note that the CatchOperationExceptions parameter is in effect only for the command it is used with, i.e. it isn't in effect in child scripts or scripts that are run via the --#Exit command. In addition, when there are multiple ProcessAnalyzer script commands in the script, the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables get updated each time, i.e. the variable values are available only until the next ProcessAnalyzer command is executed. To use the same variable values in multiple ProcessAnalyzer commands in the script, place the values into a temporary table:<br/>
<pre>
<pre>
SELECT
SELECT @_ExceptionOccurred 'ExceptionOccurred', @_ExceptionMessage 'ExceptionMessage'
@_ExceptionOccurred 'ExceptionOccurred',
INTO #PACommandExceptions
@_ExceptionType 'ExceptionType',
</pre>
@_ExceptionMessage 'ExceptionMessage',
 
@_ExceptionDetails 'ExceptionDetails'
== Example Script Walkthrough ==
INTO #PACommandExceptions</pre>
Here is an example of a simple ETL script that prints data taken from the table #Customers:
<pre>
SELECT * FROM #Customers;
(SELECT 'Title', 'Customers') UNION ALL
(SELECT 'MaximumCount', '0');
--#ShowReport
</pre>
 
This script consists of SQL statements (lines 1-3) and a QPR ProcessAnalyzer command (line 4) that takes the results of the preceding SQL SELECT statements as parameters.
 
Let’s go through this example in more detail (line 1):
<pre>SELECT * FROM #Customers; </pre>
 
SELECT is one of the most basic SQL commands you can use. The SELECT statement is used to extract data from a table. You can define the criteria for what data is selected for extraction. On line 1 of the example above, the SQL statement selects all the columns from the #Customers table. In addition, this produces the first input argument for the QPR ProcessAnalyzer command --#ShowReport by defining what to print.
The leading hash symbol (#) in the table name indicates that #Customers is a temporary table. Temporary tables exist only during the execution of the script, so their contents are not stored permanently.
 
Lines 2-3:
<pre>(SELECT 'Title', 'Customers') UNION ALL
(SELECT 'MaximumCount', '0'); </pre>
 
These lines create the second input argument for the command --#ShowReport by giving a label for the sheet and defining how many lines to print. ‘0’ means that all rows will be printed.
 
SQL uses the UNION operator to combine the result-set of two or more SELECT statements. UNION ALL is used to select all the values (including duplicate values) whereas UNION is used to select only the distinct values.


== Terminating Scripts ==
Line 4:
Scripts can be terminated in the following ways:
<pre>--#ShowReport </pre>
* The user running the script can click the '''Cancel''' button when the script is running.
* The script can use the '''--#Exit''' command to stop script execution.
* A QPR ProcessAnalyzer Administrator user can terminate scripts in the '''Operation Log'''.
* The SQL Server System Administrator can kill the session that uses the script by using e.g. '''SQL Server Management Studio'''.
* The '''Internet Information Services Administrator''' can recycle the application pool if the script has caused it to hang. This may also cause other requests by other users being processed at the same time to be aborted.
* The Windows Administrator can kill the '''w3wp.exe''' process processing a problematic script. This may also cause other requests by other users being processed at the same time to be aborted.


Note that terminating a script will not revert any changes the script has already done in the database before the '''Cancel''' button is clicked.
When writing scripts with the QPR ProcessAnalyzer ETL system, the lines in the script that start with "--#" (without the quotes and without any preceding blank spaces at the beginning of the line) are treated as QPR ProcessAnalyzer Commands and SQL ignores these lines as comments. The supported QPR ProcessAnalyzer commands and some examples of how to use them in scripts are listed in [[SQL_Scripting_Commands|here]].


== Notes for Scripting ==
== Tips for Writing SQL Scripts ==
When working with scripts, note the following points:
* It is important to follow the T-SQL syntax when writing SQL queries.
* When transforming data, '''temporary tables''' can be used (temporary table names start with '''#'''). Note that '''global temporary tables''' (table names start with '''##''') are not allowed.
* SQL is not case sensitive, so you can write either ''SELECT'' or ''select''. However, the parameter values and the QPR ProcessAnalyzer commands given as comments are case sensitive.
* Semicolon is used to end SQL statements. It is not always required, but in some cases the missing semicolon may cause SQL errors due to ambiguity. We recommend using a semicolon especially at the end of SELECT statements preceding the QPR ProcessAnalyzer ETL commands to avoid SQL errors.
* Table field names may contain white spaces, special characters, or reserved words but then they must be referred to by using brackets ([]) around them. This situation happens frequently in QPR ProcessAnalyzer ETL scripts since the case and event attribute data contain fields that have extra spaces.
* When transforming data, ''temporary tables'' can be used (temporary table names start with ''#''). Note that ''global temporary tables'' (table names start with ''##'') are not allowed.
* The [http://msdn.microsoft.com/en-us/library/ms176047.aspx Print SQL statement] can be used to generate log entries into the script execution log.
* The [http://msdn.microsoft.com/en-us/library/ms176047.aspx Print SQL statement] can be used to generate log entries into the script execution log.
* Only those lines in the script that start with "--#" (without the quotes) are treated as QPR ProcessAnalyzer commands, i.e. if there are leading whitespaces before the command, the line is treated as a comment.
* Only those lines in the script that start with "--#" (without the quotes) are treated as QPR ProcessAnalyzer commands, i.e. if there are leading whitespaces before the command, the line is treated as a comment.
* If you don't define a value for the MaximumCount parameter, 1000 will be used as default, i.e. only the 1000 first rows from a given table or model will be used.
* If you don't define a value for the MaximumCount parameter, 1000 will be used as default, i.e. only the 1000 first rows from a given table or model will be used.
* When doing more advanced operations with scripts, you may run into the error messages such as: "'''The data types sql_variant and varchar are incompatible in the add operation.'''", "'''Argument data type sql_variant is invalid for argument 1 of like function.'''", "'''Argument data type sql_variant is invalid for argument 1 of left function.'''". This is because case attributes, event attributes, and data inside datatables are '''sql_variant''' type data. In order to use them with more advanced operations (e.g. Add), you need to CONVERT or CAST them into some other data type before the operations. See [[#ConvertExample|this example]].
* When doing more advanced operations with scripts, you may run into the error messages such as: "The data types sql_variant and varchar are incompatible in the add operation.", "Argument data type sql_variant is invalid for argument 1 of like function.", "Argument data type sql_variant is invalid for argument 1 of left function.". This is because case attributes, event attributes, and data inside datatables are ''sql_variant'' type data. In order to use them with more advanced operations (e.g. Add), you need to CONVERT or CAST them into some other data type before the operations. See [[#ConvertExample|this example]].
* For certain characters in attribute values, you need to use escaping in order to have them interpreted correctly in the script. For more information, see [[Case and Event Attributes in QPR ProcessAnalyzer|Escaping for Attribute Values]].
* For certain characters in attribute values, you need to use escaping in order to have them interpreted correctly in the script.
 
== Supported Transport Layer Security Protocols ==
The following transport layer security protocols are supported when using for example the [[QPR ProcessAnalyzer Scripting Commands#--.23CallWebService|CallWebService]], [[QPR ProcessAnalyzer Scripting Commands#--.23ImportOdbcQuery|ImportOdbcQuery]], [[QPR ProcessAnalyzer Scripting Commands#--.23ImportOleDbQuery|ImportOleDbQuery]], [[QPR ProcessAnalyzer Scripting Commands#--.23ImportSapQuery|ImportSapQuery]], [[QPR ProcessAnalyzer Scripting Commands#--.23ImportSqlQuery|ImportSqlQuery]], and [[QPR ProcessAnalyzer Scripting Commands#--.23SendEmail|SendEmail]] script commands:
* TLS 1.0
* TLS 1.1
* TLS 1.2
 
SSL2 and SSL3 are not supported.
 
== See Also ==
* [[Troubleshooting QPR ProcessAnalyzer Scripts]]
* For a list of supported commands and their descriptions, see page [[QPR ProcessAnalyzer Scripting Commands]]
* There is also an [[QPR ProcessAnalyzer ETL Tutorial | ETL tutorial]] for learning the basics of creating ETL scripts in SQL and using them in QPR ProcessAnalyzer
* For more examples, see [[ETL Script Examples]]

Latest revision as of 13:39, 31 August 2023

This page describes how to use the SQL-based scripting language to extract data from source systems and load data into QPR ProcessAnalyzer, i.e. how to do ETL (extract, transform and load). The supported commands are described in SQL Scripting Commands. Scripts can be written in the Manage Scripts dialog. SQL scripts consist of SQL statements and QPR ProcessAnalyzer commands that take the results of the preceding SQL statements as parameters.

Introduction to SQL Scripting

The QPR ProcessAnalyzer ETL system enables data extraction, transformation and loading based on SQL queries implemented by scripts. Scripts are written in SQL and consist of standard SQL commands, QPR ProcessAnalyzer commands and special parameters related to QPR ProcessAnalyzer. Scripts are useful for performing several database related operations. More specifically, with the QPR ProcessAnalyzer ETL system it is possible to, for example:

  • extract data from a source system to data tables
  • load data from data tables and create events and case attributes to new or existing models based on this data
  • read data from existing models and create reports
  • import data from existing models and create new models extending the current features of QPR ProcessAnalyzer

Script Variables

The following variables are available in SQL scripts.

{| class="wikitable"
!Variable name
!Description
|-
|@_ProjectId (BIGINT)
|Id of the project in whose context the script is run. Undefined if the project context has not been specified in the script execution parameters.
|-
|@_ModelId (BIGINT)
|Id of the model in whose context the script is run. Undefined if the model context has not been specified in the script execution parameters.
|-
|@_FilterId (BIGINT)
|Id of the filter in whose context the script is run. Undefined if the filter context has not been specified in the script execution parameters.
|-
|@_UserId (INT)
|Id of the user running the script.
|-
|@_ScriptId (BIGINT)
|Id of the script that originally started the script run.
|-
|@_CurrentScriptId (BIGINT)
|Id of the script where the execution currently is. If scripts call other scripts, @_ScriptId doesn't change, whereas @_CurrentScriptId changes when the parent script calls another script.
|-
|@_ExceptionOccurred (INT)
|If there was an exception when running the script, the value is 1, otherwise 0.
|-
|@_ExceptionMessage (NVARCHAR(MAX))
|If there was an exception when running the script, contains the message of the exception.
|-
|@_Parameter_<ParameterName> (SQL_VARIANT)
|All passed parameters are available as variables named @_Parameter_<ParameterName>, where ParameterName is the name of the parameter. For example, the parameter myVariable1 is available as @_Parameter_myVariable1.
|}
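As a minimal sketch of reading a passed parameter, the following statement assumes the script was started with a parameter named myVariable1 (the name used in the table above). Because the variable is of the SQL_VARIANT type, it is converted before being used as text. The column aliases are illustrative only.

```sql
-- Assumption: the script run was given a parameter called myVariable1.
SELECT
    -- Convert the sql_variant value to text before using it in string operations.
    CONVERT(NVARCHAR(255), @_Parameter_myVariable1) AS MyVariable1Text,
    -- SQL_VARIANT_PROPERTY is a standard T-SQL function; 'BaseType' returns the underlying data type.
    SQL_VARIANT_PROPERTY(@_Parameter_myVariable1, 'BaseType') AS MyVariable1BaseType;
```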

Script Variable Examples

The following script command defines ProjectId, ModelId, and FilterId variables by using the script variables:

SELECT @_ProjectId as ProjectId, @_ModelId as ModelId, @_FilterId as FilterId;

The following script gets various information about the environment:

SELECT @_QPRProcessAnalyzerVersion as QPRProcessAnalyzerVersion, @_UserId as UserId;
(SELECT 'SheetName', 'Info')
--#ShowReport

Calling Expression Script from SQL Script

Expression scripts can be called from SQL scripts by using the --#GetAnalysis command. Set AnalysisType to 33 to run an expression, and in that expression call the Run function of the script. The following example calls the expression script with id 123 and passes parameter values to it:

(SELECT 'AnalysisType', '33') UNION ALL
(SELECT 'TargetTable', '#ExampleTable') UNION ALL
(SELECT 'Configuration', '{
	"Root": "ScriptById(123).Run(#{
		\"parameter1\": \"value1\",
		\"parameter2\": 321
	})"
}')
--#GetAnalysis

In the above example, if the expression script returns a DataFrame, it is stored in the #ExampleTable temporary table in the scripting sandbox database.
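Once the result is in the temporary table, it can be used like any other table in the rest of the script. The following sketch prints it with --#ShowReport, using only the Title and MaximumCount parameters shown elsewhere on this page. It assumes that the expression script did return a DataFrame, so that #ExampleTable exists; the title text is illustrative.

```sql
-- Assumption: the preceding --#GetAnalysis command created #ExampleTable.
SELECT * FROM #ExampleTable;
(SELECT 'Title', 'Expression script result') UNION ALL
(SELECT 'MaximumCount', '0');
--#ShowReport
```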

Calling SQL Script from SQL Script

In an SQL script, it's possible to call another SQL script using the --#Run command. For more information, see its documentation and examples.

Exception Handling

In general, scripts should be developed so that in the end they run without any errors. However, system issues (SAP timeouts etc.) can sometimes cause, for example, data extraction scripts to fail. For these kinds of situations, and for development-time troubleshooting, you can use the CatchOperationExceptions parameter and the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables with the QPR ProcessAnalyzer script commands to handle exceptions in ProcessAnalyzer. Of these, @_ExceptionOccurred is handy, for example, for running some other command only when there was an exception. For SQL errors, use TRY-CATCH error handling instead.
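For the SQL side, a minimal sketch of T-SQL TRY-CATCH handling could look like the following. The table names #RawOrders and #CleanOrders and their columns are hypothetical and exist only for this illustration. The failing conversion is caught, and the message is written with PRINT instead of stopping the script.

```sql
-- Hypothetical staging data: the second row cannot be converted to a number.
CREATE TABLE #RawOrders (CustomerId INT, Amount NVARCHAR(50));
INSERT INTO #RawOrders VALUES (1, '10.50'), (2, 'not a number');

BEGIN TRY
    -- The CAST raises a conversion error for the second row.
    SELECT CustomerId, CAST(Amount AS DECIMAL(18, 2)) AS Amount
    INTO #CleanOrders
    FROM #RawOrders;
END TRY
BEGIN CATCH
    -- ERROR_MESSAGE() is a standard T-SQL function available inside a CATCH block.
    PRINT 'Transformation failed: ' + ERROR_MESSAGE();
END CATCH;
```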

Note that the CatchOperationExceptions parameter is in effect only for the command it is used with, i.e. it isn't in effect in child scripts or scripts that are run via the --#Exit command. In addition, when there are multiple ProcessAnalyzer script commands in the script, the @_ExceptionOccurred, @_ExceptionType, @_ExceptionMessage, and @_ExceptionDetails script variables get updated each time, i.e. the variable values are available only until the next ProcessAnalyzer command is executed. To use the same variable values in multiple ProcessAnalyzer commands in the script, place the values into a temporary table:

SELECT @_ExceptionOccurred 'ExceptionOccurred', @_ExceptionMessage 'ExceptionMessage'
INTO #PACommandExceptions
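Put together, the pattern could be sketched as follows. The value '1' for CatchOperationExceptions is an assumption about the value format (see SQL Scripting Commands for the exact syntax). The point of the sketch is the ordering: the variables are copied right after the command, before any later ProcessAnalyzer command overwrites them.

```sql
SELECT * FROM #Customers;
(SELECT 'Title', 'Customers') UNION ALL
(SELECT 'CatchOperationExceptions', '1'); -- assumed value format
--#ShowReport

-- Copy the variables immediately after the command they describe.
SELECT @_ExceptionOccurred 'ExceptionOccurred', @_ExceptionMessage 'ExceptionMessage'
INTO #PACommandExceptions;

-- The stored values stay available even after later ProcessAnalyzer commands run.
IF EXISTS (SELECT 1 FROM #PACommandExceptions WHERE ExceptionOccurred = 1)
    PRINT 'ShowReport failed; the message is stored in #PACommandExceptions.';
```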

Example Script Walkthrough

Here is an example of a simple ETL script that prints data taken from the table #Customers:

SELECT * FROM #Customers; 
(SELECT 'Title', 'Customers') UNION ALL 
(SELECT 'MaximumCount', '0'); 
--#ShowReport 

This script consists of SQL statements (lines 1-3) and a QPR ProcessAnalyzer command (line 4) that takes the results of the preceding SQL SELECT statements as parameters.

Let’s go through this example in more detail (line 1):

SELECT * FROM #Customers; 

SELECT is one of the most basic SQL commands you can use. The SELECT statement is used to extract data from a table. You can define the criteria for what data is selected for extraction. On line 1 of the example above, the SQL statement selects all the columns from the #Customers table. In addition, this produces the first input argument for the QPR ProcessAnalyzer command --#ShowReport by defining what to print.

The leading hash symbol (#) in the table name indicates that #Customers is a temporary table. Temporary tables exist only during the execution of the script, so their contents are not stored permanently.

Lines 2-3:

(SELECT 'Title', 'Customers') UNION ALL 
(SELECT 'MaximumCount', '0'); 

These lines create the second input argument for the command --#ShowReport by giving a label for the sheet and defining how many lines to print. ‘0’ means that all rows will be printed.

SQL uses the UNION operator to combine the result-set of two or more SELECT statements. UNION ALL is used to select all the values (including duplicate values) whereas UNION is used to select only the distinct values.
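The difference can be seen with two identical parameter rows. This is a plain T-SQL illustration, not something a real script would need:

```sql
-- UNION ALL keeps both identical rows: the result has two rows.
(SELECT 'Title', 'Customers') UNION ALL
(SELECT 'Title', 'Customers');

-- UNION removes the duplicate: the result has one row.
(SELECT 'Title', 'Customers') UNION
(SELECT 'Title', 'Customers');
```

For parameter lists, UNION ALL is the natural choice because each parameter row is meant to be passed through as written.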

Line 4:

--#ShowReport 

When writing scripts with the QPR ProcessAnalyzer ETL system, the lines in the script that start with "--#" (without the quotes and without any preceding whitespace at the beginning of the line) are treated as QPR ProcessAnalyzer commands, and SQL ignores these lines as comments. The supported QPR ProcessAnalyzer commands and some examples of how to use them in scripts are listed in SQL Scripting Commands.

Tips for Writing SQL Scripts

  • It is important to follow the T-SQL syntax when writing SQL queries.
  • SQL is not case sensitive, so you can write either SELECT or select. However, the parameter values and the QPR ProcessAnalyzer commands given as comments are case sensitive.
  • Semicolon is used to end SQL statements. It is not always required, but in some cases the missing semicolon may cause SQL errors due to ambiguity. We recommend using a semicolon especially at the end of SELECT statements preceding the QPR ProcessAnalyzer ETL commands to avoid SQL errors.
  • Table field names may contain white spaces, special characters, or reserved words but then they must be referred to by using brackets ([]) around them. This situation happens frequently in QPR ProcessAnalyzer ETL scripts since the case and event attribute data contain fields that have extra spaces.
  • When transforming data, temporary tables can be used (temporary table names start with #). Note that global temporary tables (table names start with ##) are not allowed.
  • The Print SQL statement can be used to generate log entries into the script execution log.
  • Only those lines in the script that start with "--#" (without the quotes) are treated as QPR ProcessAnalyzer commands, i.e. if there are leading whitespaces before the command, the line is treated as a comment.
  • If you don't define a value for the MaximumCount parameter, 1000 is used as the default, i.e. only the first 1000 rows from a given table or model are used.
  • When doing more advanced operations with scripts, you may run into error messages such as: "The data types sql_variant and varchar are incompatible in the add operation.", "Argument data type sql_variant is invalid for argument 1 of like function.", "Argument data type sql_variant is invalid for argument 1 of left function.". This is because case attributes, event attributes, and data inside datatables are of the sql_variant data type. In order to use them with more advanced operations (e.g. Add), you need to CONVERT or CAST them into some other data type before the operations. See this example.
  • For certain characters in attribute values, you need to use escaping in order to have them interpreted correctly in the script.
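Several of the tips above can be combined in one short sketch: bracketed column names, converting sql_variant values before string operations, and PRINT for logging. The #Cases table and its columns are hypothetical and only stand in for attribute data.

```sql
-- Hypothetical data: column names with spaces and a sql_variant column.
CREATE TABLE #Cases ([Case Id] INT, [Order Value] SQL_VARIANT);
INSERT INTO #Cases VALUES (1, '100'), (2, '250');

-- Brackets are needed because the column names contain spaces.
-- CONVERT is needed because sql_variant cannot be used directly with + or LEFT.
SELECT
    [Case Id],
    LEFT(CONVERT(NVARCHAR(50), [Order Value]), 1) AS FirstCharacter,
    CONVERT(NVARCHAR(50), [Order Value]) + ' EUR' AS OrderValueText
FROM #Cases;

-- PRINT writes an entry to the script execution log.
PRINT 'Converted rows: ' + CAST(@@ROWCOUNT AS NVARCHAR(10));
```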