Some index maintenance scripts or dynamic SQL scripts use SQL object names, e.g. a database name, table name, or index name, as values for variables or columns.
Another technically correct practice is to use the QUOTENAME function to wrap the object name in [ and ] brackets. This handles cases where the name contains special characters or is a reserved keyword.
Example –
set @dbname = quotename(db_name(db_id()))
set @tablename = quotename(object_name(object_id))
set @indexname = quotename(name) -- index name taken from sys.indexes
The Issue
The brackets will add two more characters to the value.
So for really long object names that are 128 characters in length, the value would be 128 + 2 = 130 characters.
The two extra characters can break variable assignments or row inserts with a truncation error –
String or binary data would be truncated. [SQLSTATE 22001] (Error 8152).
In my shop, we do have some long index names and I have had to debug some scripts for this issue. And this is not the first time!
The Solution
I suggest that you use 130 as the width for the variables or columns that will store a quoted object name. SQL Server object names can be at most 128 characters long (the sysname type is nvarchar(128)), so a width of 130 accommodates the two extra bracket characters.
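A minimal sketch of the idea (the variable and table names below are just for illustration):

-- Variables sized to hold a quoted object name: 128 characters plus the 2 brackets
declare @dbname    nvarchar(130) = quotename(db_name())
declare @tablename nvarchar(130)
declare @indexname nvarchar(130)

-- The same width applies to columns that store quoted object names
declare @worklist table
(
    dbname    nvarchar(130),
    tablename nvarchar(130),
    indexname nvarchar(130)
)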
Some things are meant to be learnt the hard way. And that is how I learnt about today’s gotcha.
Overview of [sysdiagrams]
I have been working on an ERD (Entity Relationship Diagram) recently and used the Database Diagram feature in SSMS for this purpose. When you try to create a diagram for the first time in a database, a message box asks you if you would like to create diagramming objects.
Image 1
On clicking Yes, a system table by the name of [sysdiagrams] is created in the same database that you are creating the diagram in. The diagrams are stored in this table. SSMS shows the diagram in the Database Diagrams node of the Object Explorer tree view.
Image 2
The Scenario
I was creating my diagram in a test environment. At some point I had to refresh all data from the production environment. The easiest way for me to do a full refresh is to use the Import and Export Wizard. The wizard can be launched either via the SSMS context menu or in Business Intelligence Development Studio as an SSIS project. As usual for quick data refreshes, I selected all tables using the top-left Source checkbox in the wizard, and chose the options to delete and reinsert all rows with identity values.
Image 3
When the wizard ran successfully, my database diagram at the destination test system was missing!
Gotcha
Upon some research and trials I found that if the diagramming capabilities are enabled in both the source and destination servers, the wizard automatically includes the [sysdiagrams] table in the list of tables to refresh. As you can see, there are no other system tables in the wizard except the [sysdiagrams] table, so it is easy to miss in a long list.
Image 4
So in my case, all data in the destination [sysdiagrams] table was deleted. There were no diagrams at the source so nothing was imported for this table. This outcome would have been the same with the drop and recreate option too because the destination table would have been recreated.
Conclusion
One needs to be careful while using the Import and Export Wizard. Uncheck this table in the selection list to preserve the diagrams at the destination.
Event logging in SSIS gives a lot of valuable information about the run-time behavior and execution status of the SSIS package. Having a common minimum set of events is good for consistency in reports and general analysis. Let's say your team wants to ensure that all packages log at least the OnError, OnWarning, OnPreExecute and OnPostExecute events. If the package has a Data Flow Task, then the BufferSizeTuning event should also be logged. The developer can include more events as required, but those mentioned previously are the minimum that must be included.
You can create pre-deployment checklists or documentation to ensure this minimum logging. Probably you already have that. As the number of developers and/or packages increases, it becomes difficult to ensure consistency in anything, not just event logging. Therefore documentation, checklists and training are helpful only to an extent. Your requirement could be more complex than my five-event example above and thus more prone to oversight.
The easiest way to ensure a common logging implementation is a logging template that has all the minimum events pre-selected. The developer then just needs to apply it to the package.
Event Logging in SSIS with a Template
I assume that you are already familiar with the concepts of event logging in SSIS, so this post is not going to be a beginner-level introduction to event logging. I will rather discuss options to have a minimum standard of event logging across SSIS packages and teams with minimal effort. I'll also mention some traps to avoid.
I am using a demo SSIS package with two Data Flow Tasks and an Execute SQL Task. I have enabled event logging in SSIS for the first few events at the package level for the sake of demonstration. The logging configuration options for the package node (which is the top node) are shown in the first image.
Image 1 – Event logging configuration window for the package node
The logging options at the child container node Data Flow Task 1 are shown in the second image. The configuration for the other Data Flow Task and the Execute SQL Task looks the same.
Image 2 – Event logging configuration window at the child container node
The check marks for the tasks are grayed out which means they are inheriting the logging options from their parent, i.e. the package. To disable logging for a task, remove its check mark in the left tree view window.
TIP: Logging can also be disabled by going to the Control Flow canvas and changing the LoggingMode property of the task to Disabled.
The Trick
Now look at the bottom of the images again. Notice the Load… and Save… buttons? They do exactly what they say. You can set your logging options and save them as an XML template. Later, this XML template can be loaded into other packages to enable the same logging options.
The XML template file has nodes for each event. For example, the logging options for the OnError event are saved like this –
<EventsFilter Name="OnError">
  <Filter>
    <Computer>true</Computer>
    <Operator>true</Operator>
    <SourceName>true</SourceName>
    <SourceID>true</SourceID>
    <ExecutionID>true</ExecutionID>
    <MessageText>true</MessageText>
    <DataBytes>true</DataBytes>
  </Filter>
</EventsFilter>
Notice that the XML just mentions the event name, not the name of any task. This means that when the template file is loaded, this logging option will be set for any task where the event is applicable. More on this later.
The Traps
The OnError event is a generic event applicable to all tasks. Let's talk about events that are specific to particular tasks. For example, the BufferSizeTuning event is applicable only to Data Flow Tasks, not Execute SQL Tasks.
When I proceed to set logging for the BufferSizeTuning event, I have to set it individually at the Data Flow Task tree node. Notice the message at the bottom of the second image that says –
To enable unique logging options for this container, enable logging for it in the tree view.
This message is important in the context of saving and loading a template file too. When I save a template file, the logging options of just that tree view node are saved. For example, the BufferSizeTuning event will be saved in the template only if I am at the Data Flow task in the tree view. It will not be saved if I am at the Package or the Execute SQL task in the tree view.
The reverse is also true. When I load a template, its logging options are applied to just that node which I select in the tree view. For example, if I load a template at the Data Flow Task 1, the options will not be applied to the Data Flow Task 2 or the Execute SQL Task. If the template has an event that is not applicable to the task then that event’s settings will be ignored. For example, the BufferSizeTuning event logging option is meant for Data Flow Tasks so it will be ignored for the Execute SQL Task. The fact that non-relevant options are ignored can be helpful for us to consolidate all logging options in a single template file.
Conclusion
A package-level Save and Load of a logging template is straightforward. But if you need logging for events that are specific to a task type, then consider creating a logging template for each type of task. Also, if your logging configuration requires anything beyond the package-level settings, remember to load the template for each task in the tree view.
Option 1: Individual file per task
How: Create one template file for each type of task. The file has the events applicable to that task.
Pros: Easier to know which task types have a template and which do not.
Cons: More files to manage.

Option 2: Single file for all tasks
How: Create a template file for each task, then copy all event options into a single XML file.
Pros: One file is easier to manage.
Cons: Not obvious which tasks are included; comments need to be added in the XML file.
Do you trust your users to always pass range search parameters in the correct order? Or do the users trust that the code will take care of a small thing like parameter ordering?
If you have a stored procedure or a script that accepts two parameters to do a range search using the BETWEEN keyword (or maybe with >= and <=), then it is important to verify that the start expression and end expression are in the correct order. The correct order for numeric values is the smaller value before the larger value. The correct order for character values is dictionary order. A user might pass the expression values the wrong way around and see no results returned.
The following demonstration has a small check for the correct order of the values and does a quick swap to fix any issue. To help things a bit more, instead of using generic variable names like @param1 and @param2, the parameters can be made self-documenting with descriptive names like @begin_param and @end_param.
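The original demonstration script is not reproduced here, so the following is a minimal sketch of the idea; the procedure, table, and column names (GetOrdersInRange, dbo.orders, orderdate) are made up for illustration.

-- Reorder the range parameters before using them in a BETWEEN search
create procedure GetOrdersInRange
    @begin_orderdate date,
    @end_orderdate date
as
begin
    -- If the caller passed the values the wrong way around, swap them
    if @begin_orderdate > @end_orderdate
    begin
        declare @swap date = @begin_orderdate
        set @begin_orderdate = @end_orderdate
        set @end_orderdate = @swap
    end

    select id, customerid, orderdate
    from dbo.orders
    where orderdate between @begin_orderdate and @end_orderdate
end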
For an INNER JOIN, the location of the filter criteria makes no difference. For OUTER joins, on the other hand, yes, it could affect the output.
This happens because when a filter criterion is made part of the JOIN condition, it is applied at the time of reading the tables to identify the rows for the JOIN. The rows that do not satisfy the filter criterion are not included for the purpose of the JOIN, and that causes NULLs in the OUTER JOINs.
However, with the filter criterion as part of the WHERE clause, it is applied after the query has been processed and the ad hoc dataset has been created internally. The final dataset is reduced to just those rows which meet the filter criterion.
I will use a criterion of c.isactiveYN = 'Y' for the demonstration below. Rows with a value of 'Y' are used for the join. The rows with 'N' or NULL don't match the criterion, so they are left out.
As the examples show below, this seemingly minor difference can cause substantial difference in the query output of OUTER JOINs. This effect could cascade on to other subsequent tables in bigger queries that are joined to the original tables.
-- Create demo tables and insert some data
declare @customer table
(id int,
 cname varchar(20),
 isactiveYN char(1))

declare @order table
(id int,
 customerid int,
 orderdate date)

insert into @customer
values
(1, 'Bob', 'Y'),
(2, 'Harry', 'N'),
(3, 'Max', 'Y')

insert into @order
values
(1, 1, '2014-1-1'),
(2, 1, '2014-2-2'),
(3, 2, '2014-3-3'),
(4, 2, '2014-4-4'),
(5, 3, '2014-5-5'),
(6, 3, '2014-6-6')

select * from @customer
select * from @order
Results:
Demonstration: The location of the criteria has no effect on the results of an INNER JOIN.
-- A simple INNER JOIN without any
-- conditions to display all data
select *
from
    @customer as c
    inner join
    @order as o
        on c.id = o.customerid

-- The following two queries differ
-- in the location of the criteria
-- but still show the same results

-- INNER JOIN with filter in WHERE clause
select *
from
    @customer as c
    inner join
    @order as o
        on c.id = o.customerid
where
    c.isactiveYN = 'Y'

-- INNER JOIN with filter in ON clause
select *
from
    @customer as c
    inner join
    @order as o
        on c.id = o.customerid
        and c.isactiveYN = 'Y'
Results: The second and third result sets are the same.
Demonstration: LEFT OUTER joins with the condition in the WHERE clause and the ON clause
-- LEFT JOIN: Filter in WHERE clause
select *
from
    @customer as c
    left outer join
    @order as o
        on c.id = o.customerid
where
    c.isactiveYN = 'Y'

-- LEFT JOIN: Filter in ON clause
select *
from
    @customer as c
    left outer join
    @order as o
        on c.id = o.customerid
        and c.isactiveYN = 'Y'
Results: “Harry” is not in the first result set because the row was filtered out by the WHERE clause. “Harry” is in the second result set because it comes from the preserved (LEFT OUTER) table, but its order columns are NULL because the row was not used for the JOIN.
Demonstration: RIGHT OUTER joins with the condition in the WHERE clause and the ON clause
-- RIGHT JOIN: Filter in WHERE clause
select *
from
    @customer as c
    right outer join
    @order as o
        on c.id = o.customerid
where
    c.isactiveYN = 'Y'

-- RIGHT JOIN: Filter in ON clause
select *
from
    @customer as c
    right outer join
    @order as o
        on c.id = o.customerid
        and c.isactiveYN = 'Y'
Results: “Harry” is not in the first result set because the row was filtered out by the WHERE clause. “Harry” is not in the second result set either, because the customer row was filtered out of the JOIN and the customer table is not the preserved (OUTER) side; his orders still appear, but with NULLs in the customer columns.
Summary: When the criteria is in the WHERE clause, the results of the INNER, LEFT OUTER and RIGHT OUTER queries are the same. Differences show up in the results of the LEFT and RIGHT OUTER queries when the criteria is in the ON clause.
If you want to assign some default value to a parameter of type VARCHAR in a stored procedure and the parameter value does not have any spaces then the quotes around the value are optional.
create procedure TestProc
(
    @a varchar(20) = 'spaces in string',  -- must use quotes for a string with spaces
    @b varchar(20) = 'nospacesinstring',
    @c varchar(20) = nospacesinstring     -- quotes are optional if there are no spaces in the string
)
as
begin
    select @a as a, @b as b, @c as c  -- minimal body so the procedure compiles
end
The ISNUMERIC function (MSDN documentation) is used to check whether a string is a valid number. In the demonstration that follows, you would expect the results for Ids 1 to 5, but look closely at the results for Ids 6, 7 and 8. There are multiple commas in those strings, and the last one is not even a number!
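The original demonstration table is not reproduced here; the following is a minimal sketch along the same lines, with illustrative values for Ids 1 to 8.

declare @strings table (id int, val varchar(10))

insert into @strings
values
(1, '1234'),      -- plain integer: returns 1
(2, '12.50'),     -- decimal: returns 1
(3, '-99'),       -- signed number: returns 1
(4, 'abc'),       -- not a number: returns 0
(5, ''),          -- empty string: returns 0
(6, '1,2,3'),     -- multiple commas: still returns 1
(7, '12,34,5'),   -- multiple commas: still returns 1
(8, '$')          -- currency symbol alone: returns 1

select id, val, isnumeric(val) as is_valid_number
from @strings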
Well, a note at the MSDN documentation explains the True result for the ‘$’ sign at Id 8 –
ISNUMERIC returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($). For a complete list of currency symbols, see Using Monetary Data.
For the strings at Ids 6 and 7 with commas, the function returns True because the strings are still convertible to a number (a money conversion ignores the commas). Something to remember while writing those data validation routines.
Some minor omissions can go unnoticed and produce completely unexpected results.
The following code example has two seemingly similar SELECT statements, but they produce different results. All because I missed a comma in the second SELECT statement! SQL Server did not raise any error and quietly assumed that I was trying to use an alias name for the First Name column!
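The original script is not shown here; a minimal sketch of the same mistake (the table and column names are illustrative) could look like this:

declare @person table (FirstName varchar(20), LastName varchar(20))
insert into @person values ('Bob', 'Smith')

-- Intended query: returns two columns, FirstName and LastName
select FirstName, LastName from @person

-- Missing comma: LastName is silently treated as a column alias for FirstName,
-- so only one column (named LastName, containing 'Bob') is returned
select FirstName LastName from @person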
If you try to put an empty string into an INTEGER type column (or one of its cousins), SQL Server implicitly converts it into a zero. This might not be the expected behavior for the application, so it is good to be aware of it.
declare @v char(1) = ''  -- this could have been a varchar too with the same results

declare @t table
(
    colbigint bigint,
    colint int,
    colsmallint smallint,
    coltinyint tinyint
)

/* The wrong way that will change all blank strings to zero */