Webinar Q&A: SSIS Design Patterns for Loading a Data Warehouse
I’m not going to sugar coat it. I’ve been a bit of a slacker lately. Unfortunately I didn’t get the Q&A for my two webinars up the week I did the presentations. So here is part 1 and the SSAS post will be separate. To start with, thank you for joining if you did attend the webinar. All of us at Pragmatic Works that do the webinars really do it because we enjoy giving back to the community. If you are looking for the recording you can find it here.
[important]The code I used during the webinar including the completed SSIS packages can be downloaded here.[/important]
A few notes about the files included:
- In order to demo the solution a little better I removed some columns from the DimEmployee table inside AdventureWorksDW. There is a new copy of the table in the zip file. See note number 4 below.
- SSIS Project is in the “DW Loading” folder. One package has the dimension load and the other is the fact table load.
- AdventureWorksAssets.sql has all the source data I used in the session. This script will create the source data tables as well as populate them with the necessary data.
- AdventureWorksDW2012Assets.sql has the new schema for the DimEmployee and FactSalesQuota tables. You are going to need to drop or rename the existing versions of the tables. They will be created empty as the packages will load them. You can always restore the AdventureWorksDW2012 database if you want to get the original schema back. The other thing that is in this script is the definition for the necessary stage tables, again created empty.
Below are the questions I got during the webinar.
Q: If the SCD wizard is accessing the data warehouse why do we need an OLEDB source?
A: The SCD wizard compares and writes data to the data warehouse only it does not pull data from your source system. The OLEDB source is used to pull in all the data from the transactional systems that feed the warehouse.
Q: Can we use CDC instead of the Slowly Changing Dimension (SCD) wizard?
A: Absolutely! I prefer to use CDC when possible as it already identifies the updates and inserts for me and makes the SSIS package simplified. If you are using SQL Server 2012 there are some really awesome new CDC components that will help with this. I may try to do this as my next webinar topic in January, if not I will certainly add it to the list for next year.
Q: What happens when a row has both type 1 and type 2 updates?
A: If a record has both it should be processed as a type 2 otherwise we will lose the history of the change. The SCD wizard will take this into account on its own but this is important to consider when setting up your own components. We used the conditional split to identify if the record was unchanged, type 1 or type 2. The key to remember with the conditional split is the record will go through the first output it matches. Therefore be sure to make the Type 2 condition listed before the Type 1 condition in the conditional split transform. There are arrows on the right side of the screen to move the conditions up and down.
Q: What about using T-SQL Merge?
A: I have use this method in the past and I like it as an alternative to SSIS. Some places just aren’t SSIS shops and can’t support a large warehouse load process that is heavy in SSIS development. As with everything be sure to test the performance and make sure it meets your needs. In order for this to work all source data will need to be staged into a table on the same server as the warehouse. That is an extra step to land the data but it’s usually not a big deal. I’ve found that Merge works well on some fairly large datasets but depending on the amount of memory on the server it will hit a brick wall in performance, so as I mentioned: test, test, test. I have used it on the past with tables in the hundreds of millions of records on very large servers and it works just fine but the same query on a development server with only a few gigs of memory has been painfully slow.
Q: What is the difference between fast load and non-fast load in the OLEDB destination?
A: The regular Table or View method does row by row inserts. Table or View Fast Load does inserts in bulk.
Q: How do you remove the inferred member from the dimension when the record shows up?
A: You won’t remove the inferred member record that was created, you will simply update it with all the actual values when the real record comes through. If you use a column that indicates the record is inferred be sure to switch that to false when making the update.
Be sure to check the follow up about building your first SSAS cube.