Monday, July 2, 2012

You're going to do what?!?

Today I spent most of my time on documentation and preparation for the new data warehouse.  It's hard to not skip right over the planning and commence building it.  However, I'm certain the outcome will be remarkably better if we spend a lot of time up front on design.

The consultant we had on-site last week, Dave Rodabaugh with Atlas Analytics, was an excellent resource and really helped us redefine the way we viewed the data warehouse.  Our original methodology was, what we thought, a great way to approach the situation.  We were planning on building the OLTP database in a very relational and normalized form.  After we were satisfied with its design, and had it fully populated, we were going to move the data into a de-normalized schema for use with SSAS and to create and populate the cube.  Sounds great, but as it turns out, that rarely works the first time.  Dave's suggestion?  Build it "Long and Thin" and "Reduce".

So what do those terms mean?  For the short amount of time we were able to spend with Mr. Rodabaugh, it was clear that he had an almost religious conviction about his philosophies on data warehousing.  Not to the extent that he was dictating exactly what must be done, but really hammering in some best practices that he found very successful in DW. I'll probably speak of more of his philosophies later, but I think it would help at this stage to expound on what he meant by "Long and Thin" and "Reduce".

"Long and Thin" refers to end to end data development.  If you are familiar with ETL and DW, then it's pretty obvious we're pulling data from multiple sources, massaging the data or, as I've heard it called, "Munging", and serving it up via SSAS or some other multi-dimensional reporting tool.  The idea is to choose a small number ("Thin") of measures and dimensions from the source and move them all of the way ("Long") to the reporting tool.  By building the staging structures, the OLTP, which we are calling the ODS(not pronounced "Odious"), the UDM, and the cube itself, we are able to guage performance, maintainability, and complexity without investing an overwhelming amount of time in any one area.  We'll continue to lay "Long and Thin" layers on top of one another, encountering things we like and dis-like, and fixing previous erroneous methodologies and layers along the way.

"Reduce" refers to looking at and working with the data in smaller pieces.  If you can break a process down into smaller pieces, do it.  Not in excess.  We don't need a pile of crumbs to maintain.  But if an SSIS package is doing many different or potentially unrelated tasks, break it into more than one package.  Create smaller, more maintainable packages that have predictable performance.  More on predictable performance later.

It turns out it is difficult to do "Long and Thin" and "Reduce" if you are used to trying to merge SSIS packages into each other.  It's easy to think "Big Picture" when you are facing an overwhelmingly large DW project.  We found ourselves adding more and more to our first iteration of "Long and Thin" and having to rein ourselves in.  I think we're starting to get it, though.  Hopefully at some point we'll stop having to remind each other, but we're not there yet.

I get my Ralph Kimball book tomorrow.  I'm looking forward to diving into it tomorrow night.  It will be a good read during my July 4th holiday.

ETL,
Josh



No comments:

Post a Comment