Friday 26 February 2010

Quindecillion

Am currently using a SHA-1 hash generating function to generate unique indexes over large text columns.
The NVARCHAR(1024) columns I am encoding could cost as much as 2048 bytes each to store before compression (as they’re UNICODE).
The idea is to reduce my storage and to be able to index large text columns using HASHBYTES. The function allows me to use VARBINARY(20) to hold the Hash calculation. 20 bytes being a lot more friendly than 2048.

Links :
1) SHA Hash functions
2) MSSQLTips.com : Unique constraints for large text columns (using hashbytes)

To quote that article,

The odds of a duplicate hash value being generated are 1 in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,00
1 in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 is ‘one in a quindecillion’.

Here’s how I found out this amazing fact...
Really Big Numbers

(you can all go back to sleep now...)

Tuesday 23 February 2010

SSIS Contol Flow - For Loop with Counter

CXPACKET delays & the MAXDOP query hint

Running a large aggregate query today , Activity Monitor filled with suspended sessions with a WAITTYPE of CXPACKET.

CXPACKET means parallelism issues i.e the load has been shared between processors/cores and they're waiting on each other to finish.

The query in question takes a results set of 500,000 rows and aggregates it by four grouping columns into a smaller set of 46,000 records.

To solve this I added the query hint OPTION (MAXDOP n) to the end of a query (where n is number of processors).
Trying the query on it's own generated the CXPACKET wait types, as did any value of MAXDOP over 1.
Specifiying MAXDOP 1 let the query finish in 3 minutes without interuption.

SELECT columnlist...
FROM table1
INNER JOIN table2 ON table1.fk = table2.pk
WHERE column1 = criteria1
AND column2 = criteria2
GROUP BY
[group by column list]
ORDER BY
[order by column list]
OPTION (MAXDOP 1)

NB : MAXDOP is NOT a magic setting to be applied everywhere. Most queries will happily generate a parallel execution plan. Only use this to reduce the number of cores used if you have a query that will not scale or deliberately want to prevent an operation taking resources on the server.

Links :
Server level Max Degree of Parallelism setting
MSDN : Query Hints

Friday 19 February 2010

Remove Splash screen from Visual Studio Startup

If you want a few seconds back every day, add this /NOSPLASH switch to your shortcut...

TRY / CATCH with rollback

A simple template for a TRY/CATCH block implementing a rollback if the statements fail.
BEGIN TRY
  BEGIN TRANSACTION     
  -- DELETE STATEMENT  
  -- INSERT STATEMENT
  COMMIT     
END TRY

BEGIN CATCH
  IF @@TRANCOUNT > 0
     ROLLBACK
END CATCH

Links :
Four guys from rolla
Solace : Try/Catch Error Detection

Thursday 18 February 2010

SQL 2008 : Adding DATE and TIME back to DATETIME

SQL 2008 has DATE and TIME as separate data types at last.
But what if you need to fit them back together as a DATETIME?

SELECT CAST(CAST(mydate AS DATE) AS DATETIME) + CAST(mytime AS TIME)
FROM  [dbo].[mytable]

I'm sure there are other ways, I'd be interested to hear them, and any performance differences they provide!

Ref :
http://stackoverflow.com/questions/700619/combine-the-date-in-one-field-with-time-of-another-field-ms-sql-server

Stress / Life Balance

Some interesting SqlServerCentral articles on the DBA role.
They require registration, but if you're checking this blog you're a database guy/girl and are already registered right?

r

Monday 15 February 2010

OUTPUT clause on an UPDATE statement

Using the OUTPUT clause on an UPDATE statement -

The virtual DELETED table brings back the data values from before the update.
The virtual INSERTED table brings back the data from after the update.

Example :
UPDATE TOP (1) [Import].[ProcessQueue] SET ProcessId = 999
OUTPUT Inserted.QueueId, inserted.QueueName
WHERE QueueBusy = 0

References :
http://blogs.msdn.com/sqltips/archive/2005/06/13/OUTPUT_clause.aspx
http://blogs.mssqltips.com/blogs/chadboyd/archive/2007/10/31/katmai-sql-2008-consume-output-from-output-directly.aspx

Friday 12 February 2010

Adding an 'All' option to a SSRS parameter dropdown

Adding 'All' to a query to use in a parameter list -
SELECT * FROM (SELECT NULL AS DepartmentID, '' as Name) AllDepartments
UNION
SELECT DepartmentID, Name FROM HumanResources.Department ORDER BY Name

Use OR in the WHERE clause as follows to use te passed parameter -
ELECT DepartmentID, Name FROM HumanResources.Department 
WHERE (DepartmentID IS NULL) 
OR (DepartmentID = @DepartmentID)

Sunday 7 February 2010

Hadoop Video

I can't pretend it's a direction I can see myself taking, but I was sent this video about Hadoop by a Microsoft hating (linux loving) colleague today...

Hadoop video from Cloudera

Saturday 6 February 2010

SSIS : ADO.NET Stored Procedure Performance

Andy Leonard's article 'SSIS 101: Object Variables, ResultSets, and Foreach Loop Containers' has been updated. One of the new parts shows a considerable performance improvement when using an ADO.NET connection and setting IsQueryStoredProcedure to true (when using SPs).

Friday 5 February 2010

Notes : Non-yielding IOCP Listener error

Following a colleague running a heavy query via a linked server and killing the process on the remote machine, we've had SQL becoming non reponsive and I had to restart the server (the services individually failed to restart).

The error is 'Non-yielding IOCP Listener error' and followed some memory issues.

I'm following recommendations in these links and am implementing the changing both the Min and Max SQL memory settings to the same (2GB less than the machine capacity).

http://blogs.msdn.com/grahamk/archive/2008/09/11/non-yielding-iocp-listener-stack-dump-analysis.aspx

http://204.9.76.233/Community/forums/p/24993/139079.aspx#139079

(Admittedly I should have done this years ago but with 32GB to play with on the box, I've never come close to investigating any memory problems.)

Tuesday 2 February 2010

Bookmark : 13 Things you should know about statistics and the query optimizer

13 Things you should know about statistics and the query optimizer

This article covers >

1. Selectivity
2. Density
3. Cardinality
4. SARG
5. Foldable Expressions
6. Non-foldable Expressions
7. What Happens When the Query Optimizer Can’t Estimate the Cardinality of a SARG?
8. Contradiction Detections
9. Foreign Keys
10. Check Constraints
11. Non-Updating Updates
12. RowModCtr
13. When is the Auto-Update to Statistics Triggered?

I know I'll be reading it several times! (when I get round to it...)

Comments in Profiler

I've been having fun debugging a large web application.
Because I havent touched PHP in 8 years (and it's not strictly my job), I need some help from the developers to see what is going on. It's also a large application that has grown into a monster to maintain codewise.

I've proposed therefore, the following change to code.

Each call to SQL is prefixed by a SQL comment block, encasing some common features >

/*
Page : index.php
Function : Menu Drop Downs
Logged In : richard.doering
*/
SELECT MenuItem, MenuLink FROM application.userMenus ORDER BY MenuOrder

The comment will be ignored by the query parser but will still be fully visible in Profiler :)

Hopefully we can get this into a library so a single function passes SQL statements through, but that may be a little way off....