Last week David and I spent about 2 and a half hours discussing how we wanted to set up the infrastructure of our Data warehouse. We are only on our second iteration, and figured the earlier we could get the structure in place, the better. Part of that infrastructure will be the capturing of row counts and transfer speeds in order to determine our baselines. For example, after this iteration, we should know exactly how fast our data warehouse processes will ever be, because right now, those processes are at their simplest and smallest forms.
The way our old system is setup is as follows:
SSIS parent packages are called by SQL Server Agent Jobs. Any configuration settings we want applied to those packages are configured through the SQL Job's dialogue box which allows command line modifications, setting values, overriding connection strings, etc. Most of the child packages, however, have hard coded connection strings and little if any inherited values from their parents. That was fine when we were developing and deploying onto the same production server. Allowing those packages to be transferred to other servers, however, was a bit of a fiasco.
Dave Rodabaugh suggested using xml configuration files, so David and I did a bit of research. Microsoft's "Understanding Integration Services Package Configurations" page shows up on Google as the number 2 page to look at. It provides a good bit of information, but unfortunately, it didn't seem comprehensive enough, in my opinion. I would have liked to see more limitations, comparisons, etc. of the different types. We aren't against the use of xml configurations, however, we have limited access to the actual server and it's file system, and thought it would be difficult to deploy changes via a third party. I know. At this point you are probably thinking, "Wait a second. You are building a data warehouse, but you have limited access to the machines you are building it on? What does that even mean?". We can access SQL Server, SSAS, and SSIS through SSMS, but we can't RDP into the box itself. This makes troubleshooting hardware a bit of a problem, but we work with what we have. It doesn't make sense, but It is what it is.
So the only good alternative to xml configurations is the SQL Server configuration option. We wanted one location to manage all of the configurations for every SSIS package that would run. We thought this option would cater to that, until we were configuring our first SSIS package and realized we had to specify a Configuration Filter at Design Time. It doesn't appear to be an option that can inherit from its parent, so we were beginning to lose our optimism. We found out if we wanted to deploy to development, or to production, we would have to specify that filter to be applied at design time, and that just doesn't work. This is the type of thing I would have expected Microsoft to include in their documentation. Perhaps if we were going to have 1 or 2 SSIS packages we could manage, but with the 10-20 we expect, it's a no-go. By the way, below is the Syntax for the configuration table. You can also find it at the link mentioned above.
CREATE TABLE [dbo].[SSIS Configurations]
ConfigurationFilter NVARCHAR(255) NOT NULL,
ConfiguredValue NVARCHAR(255) NULL,
PackagePath NVARCHAR(255) NOT NULL,
ConfiguredValueType NVARCHAR(20) NOT NULL
So we decided to take the 2nd best path and have one of these configuration tables on our production server, and one on our development server. The configuration filter would actually be the same for both(we don't want to have to touch our ssis packages when deploying to different servers), and the connection string for the table's Database can be inherited from parent packages. This table and other configuration settings will be included and segregated into their own etl databases, which we are managing with Visual Studio's Database Project. You can see David's post on Database Projects if you are interested in reading a bit more on it. We've found this VS project type exceptionally useful. Exceptional may be used a bit strongly here. It has it's pros and cons, and perhaps one of us will write a bit about our experiences with it later on down the road.
Once we've implemented our iteration 2, I'll write more on our findings, what worked and what didn't, and perhaps help you if you stumble upon this blog early in your datawarehouse design.
Below are some links you may find helpful:
I've included these links after the initial posting of this blog to assist you in determining order configurations are applied, etc..
Defining a Configuration Approach for Integration Services Packages
Best Practices for Integration Services Configurations