A client of mine had a fact table with a lot of unknown dimension values because the source had a lot of garbage. He built a construction to log all facts with unknown dimensions (id = -1) to a log table with the original lookup value so he could make reports of all garbage values to correct the source. This is a simplified version of the construction:
Conditional Split, Multicast and Union All |
This works fine, but there is a more elegant solution with a script component. Let's call it the Conditional Multicast script. I stole this idea from SSIS Junkie.
Solution
1) Remove the Conditional Split, Multicast and Union All. Add a Script Component instead.
Script component (transformation) |
2) Select all dimensionkeys as ReadOnly input columns.
ReadOnly Input Columns |
3) Rename the default Output port to "All" and change the ExclusionGroup to a non-zero value (let's say: 1).
Change default output port |
4) Now add a second Output port with the name "Unknown". Change the ExclusionGroup to the same value as in the previous step ("1") and synchronise it with the Input port.
Second output port |
5) Now the script (I used C# in this example). SSIS will generate some default methods but we only need Input0_ProcessInputRow for this solution. Create an If-statement to check if there are any unknown dimensions and redirect them to the second port (Unknown). After the If-statement redirect all facts to de default output port (All).
// This script redirects all fact with unknown dimensions to an other target
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
// Method that will be started for each record in you dataflow
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Check if there are unknown dimensions
if (Row.TimeKey == -1 || Row.LocationKey == -1 || Row.ColorKey == -1)
{
// Redirect all facts with unknown dimensions
Row.DirectRowToUnknown();
}
// Direct all facts (including the unknown) to the fact table
Row.DirectRowToAll();
}
}
6) Now reconnect the targets and run the package to see the result:
More elegant solution |
Note: you don't have to do this for the performance improvement (because there hardly is any), but do it for the simplicity.