Wednesday, October 10, 2012

SSIS log reporting and benchmarking


We've been using SSIS packages for quite some time and have always setup the packages to use the embedded SSIS Logging.  If you haven't noticed, SSIS Logging can get pretty verbose, so I decided to write some queries and a sproc to display package component execution times over time.  It is by no means the end all solution to SSIS Log reporting, and it does have some short-comings which I will mention during my documentation.  But overall, I've found it to be helpful, and I would encourage any readers to submit their own SSIS log reporting methodologies.

Before I start, I will mention a tool I happened upon a couple of days ago, SSIS Log Analyzer at codeplex.com.  I downloaded it on Monday, installed it, and then tried running it.  It ran on my laptop, but it seemed slow and unresponsive.  The reporting capability it provides is worth further investigation.  It may have been my network connection or the database I was trying to report off of, but it certainly deserves an honorable mention when discussing SSIS log reporting.

Our Setup:
We log to the default SSIS Log table: dbo.sysssislog.  I've listed the structure below in Figure 1.0.
















Figure 1.0


To minimize SSIS Logging's verbosity(is that a word), we only log certain events:
OnError, OnPostExecute, OnPreExecute, OnTaskFailed, and OnWarning.  See Figure 1.1 for a screenshot of the SSIS Logs Event Details.








































Figure 1.1
Every package has the same events logged to the same table in the same database, so we can easily, I suppose that is a relative term, see what our packages are doing at any given time.  If you've ever looked at the table mentioned in Figure 1.0, you will probably agree that it is not very helpful in and of itself.
Hence, this article...

Basic Querying of the sysssislog table:
If you have a lot of packages, or a lot of components, or a lot of both, I recommend you trim down your result set a bit with dates.  A common query I run if I'm watching a specific package or viewing what is current is as follows:

SELECT event,computer,operator,source,executionid,starttime,endtime,message
FROM dbo.sysssislog
 WHERE starttime >= Dateadd(d, -1, Getdate())
 ORDER BY starttime DESC




I'm not going to explain the query, if you don't know what it is doing, you are probably on the wrong blog.

Querying the sysssislog table for durations:
We use the PreExecute and PostExecute events as a way of determining execution time.  The following query displays duration in seconds for all packages that have started since "DATEADD(d,-1,GETDATE())" or more specifically "Yesterday 00:00:01 and later".  NOTE:  This also differentiates between multiple executions of the same package.  So if a package runs more than once, you will see multiple entries in the resultset of this query.

SELECT pre.starttime
      ,Cast(Datepart(YY, pre.starttime) AS VARCHAR(4)) + '-' + Cast(Datepart(MM, pre.starttime) AS VARCHAR(2)) + '-' + Cast(Datepart(DD, pre.starttime) AS VARCHAR(2)) AS dtm
      ,pre.source
      ,Datediff(s,pre.starttime,post.endtime) AS duration
FROM       (SELECT starttime
                   ,endtime
                   ,executionid
                   ,source
            FROM   dbo.sysssislog
            WHERE  [event] = 'OnPreExecute') pre
INNER JOIN (SELECT starttime
                   ,endtime
                   ,executionid
                   ,source
            FROM   dbo.sysssislog
            WHERE  [event] = 'OnPostExecute') post
ON pre.executionid = post.executionid
  AND pre.source = post.source
WHERE pre.starttime > Dateadd(d, -1, Getdate())

This is great, but I'm really interested in seeing total execution times regardless of the number of times they were executed.  Emphasis on that previous statement, keep in mind the next query is duration regardless of the number of times executed.

SELECT dtm,source,Sum(duration) duration
FROM  (SELECT pre.starttime
             ,Cast(Datepart(YY, pre.starttime) AS VARCHAR(4)) + '-' + Cast(Datepart(MM, pre.starttime) AS VARCHAR(2)) + '-' + Cast(Datepart(DD, pre.starttime) AS VARCHAR(2)) AS dtm
             ,pre.source
             ,Datediff(s, pre.starttime, post.endtime) AS duration
       FROM       (SELECT starttime
                          ,endtime
                          ,executionid
                          ,source
                   FROM   dbo.sysssislog
                   WHERE  [event] = 'OnPreExecute') pre
       INNER JOIN (SELECT starttime
                          ,endtime
                          ,executionid
                          ,source
                   FROM   dbo.sysssislog
                   WHERE  [event] = 'OnPostExecute') post
          ON pre.executionid = post.executionid
             AND pre.source = post.source
       WHERE      pre.starttime > DATEADD(d,-1,GETDATE())) details
GROUP  BY dtm ,source
ORDER  BY dtm ,source


Notice I'm ordering by start date and source, so I'll have an entry per day of each package.

That was all well and good, but it will return a very tall resultset if there are a lot of packages and a lot of days to report on.  Without pulling it into excel, modifying it into a pivot query, or doing some CASE WHEN statements, I still can't see what my packages are doing over time.

Querying the sysssislog table for durations over time:
I created a stored procedure for this which I have included below.  The sproc accepts a start date and end date, and basically stores the results of the above query to a temporary table.  It then builds dynamic SQL to create a column for each day between my start and end date.  Finally it executes the dynamic SQL and provides a nice display of my SSIS package durations over time.

CREATE PROCEDURE [audit].[get_ssis_durations]
   @debug     CHAR(1) = 'N',
   @start_dtm SMALLDATETIME,
   @end_dtm   SMALLDATETIME
AS
   BEGIN
      SET XACT_ABORT ON

      IF @debug = 'Y'
         SET NOCOUNT OFF ELSE

      SET NOCOUNT ON
      IF @debug = 'Y'
         BEGIN
            EXEC debug.Logmsg 'BEGIN PROCEDURE audit.get_ssis_durations'

            EXEC debug.Logval
              
'@start_dtm',
               @start_dtm

            EXEC debug.Logval
              
'@end_dtm',
               @end_dtm

            EXEC debug.Logmsg 'CREATING TABLE #SSIS_DETAILS...'
         END

      CREATE TABLE #SSIS_DETAILS
         (
             dtm       DATETIME
             ,source   VARCHAR(200)
             ,duration BIGINT
         )

      IF @debug = 'Y'
         BEGIN
            EXEC debug.Logmsg 'Inserting into #SSIS_DETAILS...'
         END

      --Get the details into a temporary table
      INSERT INTO #SSIS_DETAILS
      SELECT dtm
             ,source
             ,CAST(Sum(duration) AS BIGINT) duration
      FROM   (SELECT Cast(Datepart(YY, pre.starttime) AS VARCHAR(4)) + '-' + Cast(Datepart(MM, pre.starttime) AS VARCHAR(2)) + '-' + Cast(Datepart(DD, pre.starttime) AS VARCHAR(2)) AS dtm
                    ,pre.source
                    ,CAST(Datediff(s, pre.starttime, post.endtime) AS BIGINT) AS duration
              FROM       (SELECT starttime
                                 ,endtime
                                 ,executionid
                                 ,source
                          FROM   dbo.sysssislog
                          WHERE  [event] = 'OnPreExecute') pre
              INNER JOIN (SELECT starttime
                                 ,endtime
                                 ,executionid
                                 ,source
                          FROM   dbo.sysssislog
                          WHERE  [event] = 'OnPostExecute') post
                 ON pre.executionid = post.executionid
                    AND pre.source = post.source
              WHERE      pre.starttime >= @start_dtm) details
      GROUP  BY
         dtm
         ,source
      ORDER  BY
         dtm

      IF @debug = 'Y'
         BEGIN
            EXEC debug.Logmsg 'GENERATING dynamic SQL...'
         END

      --Setup my dynamic SQL
      DECLARE @sql VARCHAR(MAX) = 'SELECT DISTINCT source AS component'
      DECLARE @dayAdd INT = 0;
      DECLARE @strDate VARCHAR(20);

      --Loop through each day and build my sql statement.
      WHILE Dateadd(d, @dayAdd, @start_dtm) <= @end_dtm
         BEGIN
            SELECT @strDate = Cast(Datepart(m, Dateadd(d, @dayAdd, @start_dtm)) AS VARCHAR(2))

            SELECT @strDate = @strDate + '/' + Cast(Datepart(d, Dateadd(d, @dayAdd, @start_dtm)) AS VARCHAR(2))
            SELECT @strDate = @strDate + '/' + Cast(Datepart(yyyy, Dateadd(d, @dayAdd, @start_dtm)) AS VARCHAR(4))
            SELECT @sql = @sql + ',(SELECT duration  FROM #SSIS_DETAILS i WHERE i.dtm = ''' + @strDate + ''' and i.source = o.source) [' + @strDate + ']' + Char(13)
            SELECT @dayAdd = @dayAdd + 1;
         END

      SELECT @sql = @sql + ' FROM #SSIS_DETAILS o ORDER BY component'
      --Display the sql I'll be executing
      PRINT @sql

      IF @debug = 'Y'
         BEGIN
            EXEC debug.Logmsg 'EXECUTING dynamic SQL...'
         END

      --Execute the query
      EXEC (@sql)

      IF @debug = 'Y'
         BEGIN
            EXEC debug.Logmsg 'DROPPING #SSIS_DETAILS...'
         END

      --Drop the temp table
      DROP TABLE #SSIS_DETAILS

      IF @debug = 'Y'
         BEGIN
            EXEC debug.Logmsg 'END PROCEDURE audit.get_ssis_durations'
         END
   END


Apologies for the poor formatting.  Unfortunately Blogger doesn't handle formatted SQL directly from SSMS 2012 very well.  Pasting the above queries into SSMS and running the Beautifier formatting tool against them is a small price to pay.  These queries are relatively new, and I certainly invite any feedback or criticism you may have.
over time I'll have benchmark data on all of our packages so I can see how changes to the database or to SSIS packages affect performance.

Regards,
Josh