Search Suggest

Slowly Changing Dimension Alternatives

Case
The Slowly Changing Dimension component works fine for small dimensions, but performance is very poor for slightly bigger dimensions. Is there an alternative?

Solution
There is a request on the Microsoft site to improve SCD in SQL 2012 which you can support, but intil then... some alternatives:

A) There is an open source project named SSIS Dimension Merge CSD Component which performs better (100x faster according the developers).

B) There is a third party table difference component that does the same trick, but faster. Downside is the yearly $400,-. An other (free) third party component is the Konesans Checksum Transformation which you can use to accomplish a SCD. And we made our own Checksum Transformation.

C) There is the TSQL Merge statement which has been added to Sql Server 2008. It Performs insert, update, or delete operations on a target table based on the results of a join with a source table.

D) And you can use the standard ssis components (lookups) to accomplish a faster SCD.

Lets elaborate option D to see the difference between the standard Slowly Changing Dimension Component and the lookups. Case: 15481 clients in a dimension table. Process same clients again with 100 new clients and 128 changed.

Slowly Changing Dimension Component solution
The standard solution with the Slowly Changing Dimension component looks something like this and takes about 25 seconds on average to complete on my laptop.
Slowly Changing Dimension component













Lookups solution
The solution with two lookups looks something like this and takes less than 1,5 seconds on average to complete. The first lookup only compares the business key. If the business key from the source doesn't exist in the dimension table, then it's a new record/client. All other records go to the second lookup which compares all attributes (incl the business key) to the dimension table. When no match is found, the client has been updated.
Two lookups



















This performs very well if there are not too many updates. If I change the number of updates from 128 to 15422 records, the whole proces takes around two minutes with this method, which is only slightly faster than the standard SCD component.

Alternative for lot's of updates
An other alternative is to change the Update Clients to an insert statement which inserts all updated records into a second table. A lot of single update statements slow down the process. A fastload insert takes only a fragment of the time.
Change update to insert (into an other table)






















After that you can use a batch-update to update all changed clients.
Execute SQL Statement




















This alternative takes only six seconds in total. The batch-update query looks something like this.
--Batch update
UPDATE [Dim_Client]
SET [Dim_Client].Title = [Dim_Client2].Title,
[Dim_Client].FirstName = [Dim_Client2].FirstName,
[Dim_Client].LastName = [Dim_Client2].LastName
FROM [Dim_Client], [Dim_Client2]
WHERE [Dim_Client].BusinessKey = [Dim_Client2].BusinessKey

So there are alternatives for the standard SCD component. Which is best? That depends on the situation (are third party/open source dll's alowed on the production servers, how many updates do you expect, etc.). Let me know if you have an other alternative...

* Update: TSQL Merge added (suggested by Koen)*

Post a Comment