Case
I have a CSV file with numeric values that use a dot "." as decimal separator instead of the comma "," we use locally. When I try to import it in SSIS with a Flat File Source it gives me an error. I don't want to/can't change the regional settings on the server. How do I import this flat file without errors?
The value could not be converted because of a potential loss of data
10.5 should be 10,5 (or vice versa)

Error: 0xC02020A1 at DFT - Process Data, FF_SRC - myCsvFile [2]: Data conversion failed. The data conversion for column "myColumn" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
Error: 0xC0209029 at DFT - Process Data, FF_SRC - myCsvFile [2]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR.  The "FF_SRC - myCsvFile.Outputs[Flat File Source Output].Columns[myColumn]" failed because error code 0xC0209084 occurred, and the error row disposition on "FF_SRC - myCsvFile.Outputs[Flat File Source Output].Columns[myColumn]" specifies failure on error. An error occurred on the specified object of the specified component.  There may be error messages posted before this with more information about the failure.
Error: 0xC0202092 at DFT - Process Data, FF_SRC - myCsvFile [2]: An error occurred while processing file "D:\myFolder\2016-12-27.csv" on data row 2.
Error: 0xC0047038 at DFT - Process Data, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on FF_SRC - myCsvFile returned error code 0xC0202092.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.


Solution
You can change the LocaleID of the connection manager to import this file. Right click the flat file connection manager and choose Properties...
Go to properties of flat file connection manager

Then locate the LocaleID property and change it to English (United States), English (United Kingdom) or another locale that uses a dot "." as decimal separator. Or change it to, for example, Dutch (Netherlands) if you have the opposite problem.
Change LocaleID

Now run the package again to see the result.
Success

In 2006 Jamie Thomson came up with naming conventions for SSIS tasks and data flow components. These naming conventions make your packages and logs more readable. Five SQL Server versions and a decade later, a couple of tasks and components have been deprecated, but Microsoft has also introduced a lot of new tasks and components.

Together with Koen Verbeeck (B|T) and André Kamman (B|T), I extended the existing list with almost 40 tasks/components (marked with an asterisk in the New column below) and created a PowerShell script that should make it easier to check/enforce the naming conventions. This PowerShell script will soon be published on GitHub as a PowerShell module, but for now you can download and test the fully working proof-of-concept script. Download both ps1 files and the CSV file, then open "naming conventions v4.ps1" and change the parameters before executing it. The script works with local packages, because you can't read individual packages from the catalog, but you can use a PowerShell script to download your packages from the catalog.
PowerShell Naming Conventions Checker

Task name Prefix Type New
For Loop Container FLC Container
Foreach Loop Container FELC Container
Sequence Container SEQC Container
ActiveX Script AXS Task
Analysis Services Execute DDL Task ASE Task
Analysis Services Processing Task ASP Task
Azure Blob Download Task ADT Task *
Azure Blob Upload Task AUT Task *
Azure HDInsight Create Cluster Task ACCT Task *
Azure HDInsight Delete Cluster Task ACDT Task *
Azure HDInsight Hive Task AHT Task *
Azure HDInsight Pig Task APT Task *
Back Up Database Task BACKUP Task *
Bulk Insert Task BLK Task
CDC Control Task CDC Task *
Check Database Integrity Task CHECKDB Task *
Data Flow Task DFT Task
Data Mining Query Task DMQ Task
Data Profiling Task DPT Task *
Execute Package Task EPT Task
Execute Process Task EPR Task
Execute SQL Server Agent Job Task AGENT Task *
Execute SQL Task SQL Task
Execute T-SQL Statement Task TSQL Task *
Expression Task EXPR Task
File System Task FSYS Task
FTP Task FTP Task
Hadoop File System Task HFSYS Task *
Hadoop Hive Task HIVE Task *
Hadoop Pig Task PIG Task *
History Cleanup Task HISTCT Task *
Maintenance Cleanup Task MAINCT Task *
Message Queue Task MSMQ Task
Notify Operator Task NOT Task *
Rebuild Index Task REBIT Task *
Reorganize Index Task REOIT Task *
Script Task SCR Task
Send Mail Task SMT Task
Shrink Database Task SHRINKDB Task *
Transfer Database Task TDB Task
Transfer Error Messages Task TEM Task
Transfer Jobs Task TJT Task
Transfer Logins Task TLT Task
Transfer Master Stored Procedures Task TSP Task
Transfer SQL Server Objects Task TSO Task
Update Statistics Task STAT Task *
Web Service Task WST Task
WMI Data Reader Task WMID Task
WMI Event Watcher Task WMIE Task
XML Task XML Task
Transformation name Prefix Type New
ADO NET Source ADO_SRC Source *
Azure Blob Source AB_SRC Source *
CDC Source CDC_SRC Source *
DataReader Source DR_SRC Source
Excel Source EX_SRC Source
Flat File Source FF_SRC Source
HDFS File Source HDFS_SRC Source *
OData Source ODATA_SRC Source *
ODBC Source ODBC_SRC Source *
OLE DB Source OLE_SRC Source
Raw File Source RF_SRC Source
SharePoint List Source SPL_SRC Source
XML Source XML_SRC Source
Aggregate AGG Transformation
Audit AUD Transformation
Balanced Data Distributor BDD Transformation *
Cache Transform CCH Transformation *
CDC Splitter CDCS Transformation *
Character Map CHM Transformation
Conditional Split CSPL Transformation
Copy Column CPYC Transformation
Data Conversion DCNV Transformation
Data Mining Query DMQ Transformation
Derived Column DER Transformation
DQS Cleansing DQSC Transformation *
Export Column EXPC Transformation
Fuzzy Grouping FZG Transformation
Fuzzy Lookup FZL Transformation
Import Column IMPC Transformation
Lookup LKP Transformation
Merge MRG Transformation
Merge Join MRGJ Transformation
Multicast MLT Transformation
OLE DB Command CMD Transformation
Percentage Sampling PSMP Transformation
Pivot PVT Transformation
Row Count CNT Transformation
Row Sampling RSMP Transformation
Script Component SCR Transformation
Slowly Changing Dimension SCD Transformation
Sort SRT Transformation
Term Extraction TEX Transformation
Term Lookup TEL Transformation
Union All ALL Transformation
Unpivot UPVT Transformation
ADO NET Destination ADO_DST Destination *
Azure Blob Destination AB_DST Destination *
Data Mining Model Training DMMT_DST Destination
Data Streaming Destination DS_DST Destination *
DataReaderDest DR_DST Destination
Dimension Processing DP_DST Destination
Excel Destination EX_DST Destination
Flat File Destination FF_DST Destination
HDFS File Destination HDFS_DST Destination *
ODBC Destination ODBC_DST Destination *
OLE DB Destination OLE_DST Destination
Partition Processing PP_DST Destination
Raw File Destination RF_DST Destination
Recordset Destination RS_DST Destination
SharePoint List Destination SPL_DST Destination
SQL Server Compact Destination SSC_DST Destination *
SQL Server Destination SS_DST Destination


Example of the prefixes

SSIS Appetizer
I'm not sure I have a purpose for this, but did you know that you can use the cache file of the Cache Transformation (introduced in SSIS 2008) as a source file in the Raw File Source?

Demo
For this demo I use two Data Flow Tasks. The first creates the cache file and the second one uses it as a source.
Two Data Flow Tasks

1) Create Cache
The first Data Flow Task has an arbitrary source (a flat file in this case) and a Cache Transformation named "CTR - Create Cache" as a destination. When you create the Cache Connection Manager, make sure to check "Use file cache" and provide a file path for the cache file. Copy that path for the next step.
The Cache Transformation and Connection Manager

2) Read Cache
The second Data Flow Task uses a Raw File Source. In its editor you can specify the location of the raw file: paste the path from the Cache Connection Manager (a .caw file). For demonstration purposes I added a dummy Derived Column behind it with a Data Viewer on the path between them. Now run the package and see the result. You will get some hash columns 'for free'.
Raw File Source

Please let me know in the comments if you found a good purpose for this.

Note: you can't use a raw file as a cache file unless you're able to add the extra hash columns as well.
Case
What is a fast way to load a Slowly Changing Dimension or a Persistent Staging Area in SSIS? When using a Data Flow Task for this, the process can become very slow when there are a lot of updates.

Solution
Instead of using the OLE DB Command to update records one by one, you could load all changed records into a temporary table and then use one batch update statement to update all records in the target table with the values from the temporary table.
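
A minimal sketch of that batch update, assuming the changed source rows were first loaded into a temporary table called #Updates (the table and column names are only illustrative):

-- Set-based batch update instead of row-by-row OLE DB Commands
UPDATE T
SET T.FirstName = U.FirstName
, T.LastName = U.LastName
, T.Salary = U.Salary
FROM dbo.DimEmployee as T
JOIN #Updates as U
ON T.EmployeeNumber = U.EmployeeNumber
WHERE T.Active = 1; -- only touch the open records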

An even fancier way is to use the T-SQL MERGE statement. This statement has one downside: when there is a match it can only update the existing record, it can't also close the old record and insert a new version at the same time. You can overcome this by using the output of the MERGE statement.

The MERGE statement will update the existing record in the destination table, but it can output the old version of the updated record. You can then use this output to do an insert on the destination table.


First create a source and destination table for testing purposes:
-- Drop if exist
IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL
DROP TABLE dbo.Employees;

-- Create source table
CREATE TABLE [dbo].[Employees](
[EmployeeNumber] [varchar](5) NULL,
[FirstName] [varchar](50) NULL,
[LastName] [varchar](50) NULL,
[DateOfBirth] [date] NULL,
[Salary] [money] NULL
);

-- Insert test records in source
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00001', N'John', N'Williams', CAST(N'1972-02-15' AS Date), 5100.00);
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00002', N'Jane', N'Smith', CAST(N'1965-09-02' AS Date), 4900.00);
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00003', N'Marc', N'Brown', CAST(N'1981-12-01' AS Date), 3300.00);
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00004', N'David', N'Garcia', CAST(N'1975-01-01' AS Date), 3700.00);

-- Drop if exist
IF OBJECT_ID('dbo.DimEmployee', 'U') IS NOT NULL
DROP TABLE dbo.DimEmployee;

-- Create destination table
CREATE TABLE [dbo].[DimEmployee](
[EmployeeID] [int] IDENTITY(1,1) NOT NULL,
[EmployeeNumber] [varchar](5) NULL,
[FirstName] [varchar](50) NULL,
[LastName] [varchar](50) NULL,
[DateOfBirth] [date] NULL,
[Salary] [money] NULL,
[Active] [bit] NULL,
[DateFrom] [datetime] NULL,
[DateEnd] [datetime] NULL
);


The destination table is a Slowly Changing Dimension, but it could also be a Persistent Staging Area. It has the same columns as the source plus a few extra columns:
  1. EmployeeID: the dimension ID, populated by the identity setting. You could skip this column for a Persistent Staging Area.
  2. Active: a Boolean to quickly filter all active records. It's a bit redundant, but also easy.
  3. DateFrom: the datetime that indicates when this record was inserted. It never changes.
  4. DateEnd: the datetime that indicates when this record was inactivated. I use NULL for active records, but you could also use a future date like '9999-12-31'.

And now the MERGE script. Below I will try to describe each section that starts with a comment and number: /***** 1 *****/.
--Merge script
/***** 6b *****/
INSERT INTO DimEmployee
(EmployeeNumber
, FirstName
, LastName
, DateOfBirth
, Salary
, Active
, DateFrom
, DateEnd)
SELECT MergeOutput.EmployeeNumber
, MergeOutput.FirstName
, MergeOutput.LastName
, MergeOutput.DateOfBirth
, MergeOutput.Salary
, 0 -- InActivate the record
, MergeOutput.DateFrom -- Keep the old from date
, GETDATE() -- Close the record
FROM (
/***** 1 *****/
MERGE DimEmployee as T -- Target
USING Employees as S -- Source
ON T.EmployeeNumber = S.EmployeeNumber -- Compare key
AND T.Active = 1 -- Only compare open records
/***** 2 *****/
WHEN NOT MATCHED BY TARGET THEN -- Not found in destination
INSERT
( EmployeeNumber
, FirstName
, LastName
, DateOfBirth
, Salary
, Active
, DateFrom)
VALUES
( S.EmployeeNumber
, S.FirstName
, S.LastName
, S.DateOfBirth
, S.Salary
, 1 -- Activate the record
, GETDATE()) -- Open the record
/***** 3 *****/
WHEN NOT MATCHED BY SOURCE -- Not found in source
AND T.Active = 1 THEN -- Only compare open records
UPDATE
SET T.Active = 0 -- Inactivate record
, T.DateEnd = GETDATE() -- Close date
/***** 4 *****/
WHEN MATCHED -- Found in source and destination
AND T.Active = 1 -- Only compare open records
AND EXISTS (SELECT S.FirstName
, S.LastName
, S.DateOfBirth
, S.Salary
EXCEPT
SELECT T.FirstName
, T.LastName
, T.DateOfBirth
, T.Salary) THEN
UPDATE
SET T.FirstName = S.FirstName
, T.LastName = S.LastName
, T.DateOfBirth = S.DateOfBirth
, T.Salary = S.Salary
, T.Active = 1 -- Make record active
, T.DateFrom = GETDATE() -- Open record with current datetime
, T.DateEnd = null -- Keep record open
/***** 5 *****/
OUTPUT $action as MergeAction, Deleted.*, Inserted.Active as NewActiveCheck
/***** 6a *****/
) as MergeOutput WHERE MergeAction = 'UPDATE' and NewActiveCheck = 1;


  1. In this section you provide the names of the target and source tables and the key used to compare their records. I also added a filter on Active to only compare open records. You could replace it with T.DateEnd IS NULL.
  2. This section is for new records. Source records that are not found in the destination are inserted with Active set to 1 (true), DateFrom set to now and DateEnd set to null.
  3. This section is for deleted records. Active destination records that are not found in the source are deactivated and closed by setting DateEnd. The other columns remain unchanged.
  4. This section is for updating active records with new values. To prevent unnecessary updates I added an EXISTS-EXCEPT construction (see the small example after this list). This is a very handy way to compare all non-key columns for changes and, above all, it can even compare NULL values.
  5. This is the last part of the MERGE statement and it can output the old and new values of deleted, updated and inserted records. In this case I'm interested in the old values of the changed records (Deleted.* or Deleted.column1, Deleted.column2, etc.). I also output the Active column of the new record to filter out inactivated records (records deleted from the source shouldn't be inserted again). The $action column indicates whether the row comes from an 'INSERT', 'UPDATE' or 'DELETE'.
  6. In 6a I filter on the action to only keep the old values of the updated records. In 6b I insert a new record with the old values of the changed records. I inactivate that new record and set DateEnd to close it. The other columns remain unchanged.
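
As a small standalone illustration of that EXISTS-EXCEPT trick (not part of the dimension load itself), the snippet below shows that EXCEPT treats two NULL values as equal, where a normal comparison would not:

-- EXISTS-EXCEPT only reports a difference when at least one value really differs;
-- two NULLs are considered equal, unlike with the <> operator
DECLARE @OldSalary money = NULL;
DECLARE @NewSalary money = NULL;
IF EXISTS (SELECT @NewSalary EXCEPT SELECT @OldSalary)
PRINT 'Changed';
ELSE
PRINT 'Unchanged'; -- this is printed, because both values are NULL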


Testing the script:
One update

Second test:
One update, one delete and one insert

I use this script primarily for the Persistent Staging Area. When you want to use it for an SCD you have to reload the fact table, because the dimension ID changes: the fact rows pointing to ID 4 with David's old salary now point to the record with David's new salary.

The alternative script below could be a solution for that. Instead of comparing the key columns, I compare the CHECKSUM (or HASHBYTES) of all columns and remove the WHEN MATCHED part (if the checksum matches, there is nothing to do). The benefit of this is that the dimension ID never changes. A second benefit is that you don't need to know the key columns. One downside is that CHECKSUM may not be unique, and HASHBYTES can only handle 8000 bytes and can't handle NULL values. So the script below is NOT yet foolproof! I will work on that, but let me know if you have a solution.
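
A possible partial mitigation for the NULL problem (only a sketch; it doesn't solve the 8000 bytes limit of HASHBYTES or the collision risk of CHECKSUM) is to wrap every column in ISNULL before concatenating, so a single NULL value no longer makes the whole hash input NULL:

-- Sketch: NULL-safe input for the checksum by wrapping each column in ISNULL
SELECT CHECKSUM(ISNULL(S.EmployeeNumber, '') + '|'
+ ISNULL(S.FirstName, '') + '|'
+ ISNULL(S.LastName, '') + '|'
+ ISNULL(CAST(S.DateOfBirth as varchar(10)), '') + '|'
+ ISNULL(CAST(S.Salary as varchar(20)), '')) as RowChecksum
FROM dbo.Employees as S;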


--Alternative Merge script with checksum or hashbytes
/***** 6b *****/
INSERT INTO DimEmployee
( EmployeeNumber
, FirstName
, LastName
, DateOfBirth
, Salary
, Active
, DateFrom
, DateEnd)
SELECT MergeOutput.EmployeeNumber
, MergeOutput.FirstName
, MergeOutput.LastName
, MergeOutput.DateOfBirth
, MergeOutput.Salary
, 0 -- InActivate the record
, MergeOutput.DateFrom -- Keep the old from date
, GETDATE() -- Close the record
FROM (
/***** 1 *****/
MERGE DimEmployee as T -- Target
USING Employees as S -- Source
ON CHECKSUM(S.EmployeeNumber + '|' + S.FirstName + '|' + S.LastName + '|' + CAST(S.DateOfBirth as varchar(10)) + '|' + CAST(S.Salary as varchar(20))) =
CHECKSUM(T.EmployeeNumber + '|' + T.FirstName + '|' + T.LastName + '|' + CAST(T.DateOfBirth as varchar(10)) + '|' + CAST(T.Salary as varchar(20)))
--ON HASHBYTES('MD5', S.EmployeeNumber + '|' + S.FirstName + '|' + S.LastName + '|' + CAST(S.DateOfBirth as varchar(10)) + '|' + CAST(S.Salary as varchar(20))) =
-- HASHBYTES('MD5', T.EmployeeNumber + '|' + T.FirstName + '|' + T.LastName + '|' + CAST(T.DateOfBirth as varchar(10)) + '|' + CAST(T.Salary as varchar(20)))

AND T.Active = 1 -- Only compare open records
/***** 2 *****/
WHEN NOT MATCHED BY TARGET THEN -- Not found in destination
INSERT
( EmployeeNumber
, FirstName
, LastName
, DateOfBirth
, Salary
, Active
, DateFrom)
VALUES
( S.EmployeeNumber
, S.FirstName
, S.LastName
, S.DateOfBirth
, S.Salary
, 1 -- Activate the record
, GETDATE()) -- Open the record
/***** 3 *****/
WHEN NOT MATCHED BY SOURCE -- Not found in source
AND T.Active = 1 THEN -- Only compare open records
UPDATE
SET T.Active = 0 -- Inactivate record
, T.DateEnd = GETDATE() -- Close date
/***** 4 *****/
/***** REMOVED *****/
/***** 5 *****/
OUTPUT $action as MergeAction, Deleted.*, Inserted.Active as NewActiveCheck
/***** 6a *****/
) as MergeOutput WHERE MergeAction = 'UPDATE' and NewActiveCheck = 1;


Testing the script:
One update

Second test:
One update, one delete and one insert

Case
A while ago I did a post on validating XML files in SSIS with nested XSD files. The out-of-the-box XML Task doesn't honor nested XSD files (with an include or import). The Script Task workaround is simple, but since I use it a lot I decided to turn it into a custom task.

Solution
The XML Validation Task allows you to specify the XML and XSD file paths via a Connection Manager or a string variable. After that it will either succeed, or fail with an error message describing what's wrong with the XML file.
XML Validation Task V0.1

Please email me (the address is under the Help button) with bugs and feature requests for this task.
Items to address:
  • More validation to make it really monkey proof
  • Different icon
Thinking about:
  • Providing multiple XSD's (but how many?)
  • Providing XML and/or XSD content (not path) via string variable
  • Option to throw warning instead of error

Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download SSIS 2008, 2012, 2014 and 2016 versions on the download page.

V0.1 Initial version for 2008 to 2016
V0.2 ? (Leave a comment)

Installation
The installer registers the DLL in the GAC and copies it to the task folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\Tasks\). After that you have to restart BIDS or SQL Server Data Tools, because the designer caches the GAC on startup.

Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom task.

Case
If you declare a connection manager in BIML, but don't use it in one of the tasks or transformations, it won't be created. Can you force BIML to create the connection managers nevertheless?


No connection managers were created

Solution
In some cases you want to override this behavior and just create the connection managers anyway, for example when using custom tasks/transformations where BIML doesn't recognize a connection manager attribute.



To force BIML to create the connection managers you need to add a second <Connections> tag, but this time within the <Package> tag. Within this tag you can add <Connection> tags with a ConnectionName attribute, whose value is the name of the connection manager that you declared in the first <Connections> tag.
Force creating connection managers

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<AdoNetConnection ConnectionString="Data Source=.;Initial Catalog=tempdb;Integrated Security=True;"
Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
Name="myStage"
CreateInProject="true"
/>
<FileConnection FileUsageType="ExistingFolder"
FilePath="d:\"
Name="myFolder"
CreateInProject="false"
/>
</Connections>

<Packages>
<Package Name="myPackage" ProtectionLevel="DontSaveSensitive">
<Tasks>
<Container Name="myContainer">

</Container>
</Tasks>
<Connections>
<!-- Force creating connection managers -->
<Connection ConnectionName="myStage" />
<Connection ConnectionName="myFolder" />
</Connections>
</Package>
</Packages>
</Biml>


You can even determine the GUID of each connection manager.
<Connections>
<!-- Force creating connection managers -->
<Connection ConnectionName="myStage"
Id="{365878DA-0DE4-4F93-825D-D8985E2765FA}"/>
<Connection ConnectionName="myFolder"
Id="{365878DA-0DE4-4F93-825D-D8985E2765FB}"/>
</Connections>


And if you need the same GUID in multiple places within your script, but you want it to be a random GUID, you can fill a string variable with a new GUID and then use that variable in multiple places.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<AdoNetConnection ConnectionString="Data Source=.;Initial Catalog=tempdb;Integrated Security=True;"
Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
Name="myStage"
CreateInProject="true"
/>
<FileConnection FileUsageType="ExistingFolder"
FilePath="d:\"
Name="myFolder"
CreateInProject="false"
/>
</Connections>

<#
// Create Random Guid but use it in multiple places
string myGuid = System.Guid.NewGuid().ToString();
#>

<Packages>
<Package Name="myPackage" ProtectionLevel="DontSaveSensitive">
<Tasks>
<Container Name="myContainer">

</Container>
</Tasks>
<Connections>
<!-- Force creating connection managers -->
<Connection ConnectionName="myStage"
Id="<#=myGuid#>"/>
<Connection ConnectionName="myFolder"
Id="{365878DA-0DE4-4F93-825D-D8985E2765FB}"/>
</Connections>
</Package>
</Packages>
</Biml>
It's not ready yet, but I'm proud to announce the first SSIS book by me and fellow MVP Régis Baccaro (B|T).
Extending SSIS with .NET Scripting


Extending SSIS with .NET Scripting will be a timeless and comprehensive scripting toolkit for SQL Server Integration Services to solve a wide array of everyday problems that SSIS developers encounter. The detailed explanation of the Script Task and Script Component foundations will help you to develop your own scripting solutions, but this book will also show a broad arsenal of ready-made and well-documented scripting solutions for all common problems.

Feel free to contact us for ideas and suggestions. We will post status updates on twitter and our blogs.

It could be that the number of blog posts will drop slightly in the coming months due to writing obligations (but only temporarily).
Case
I have a lot of dimension packages in SSIS that all insert a default record for unknown dimension values. It's a lot of repetitive and boring work. Is there an alternative to creating an insert query manually for each dimension?
A typical dimension package

Solution
Instead of manually creating an insert query for each dimension table, you could create a Stored Procedure that generates and executes it for you. The Execute SQL Task then executes this Stored Procedure instead of a hand-written insert query.
-- TSQL code
USE [datamart]
GO

/****** datamart: StoredProcedure [dbo].[InsertUnknownDimensionRow] ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[InsertUnknownDimensionRow](@TableName nvarchar(128))
AS
BEGIN

-- This Stored Procedure inserts a record in the dimension table
-- for unknown dimension values. It generates an insert statement
-- based on the column datatypes and executes it.
-- The integer column with identity enabled gets the value -1 and
-- all other columns get a default value based on their datatype.
-- Columns with a default value are ignored.

-- Create temporary table for column specs of dimension table
DECLARE @TableSpecs TABLE (
COLUMN_ID int identity,
COLUMN_NAME nvarchar(128),
DATA_TYPE nvarchar(128),
CHARACTER_MAXIMUM_LENGTH int,
COLUMN_IS_IDENTITY bit
)

-- Use the information schema to get column info and insert it
-- to the temporary table.
INSERT @TableSpecs
SELECT C.COLUMN_NAME
, C.DATA_TYPE
, C.CHARACTER_MAXIMUM_LENGTH
, columnproperty(object_id(C.TABLE_SCHEMA + '.' + C.TABLE_NAME)
, C.COLUMN_NAME, 'IsIdentity') AS COLUMN_IS_IDENTITY
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE QUOTENAME(C.TABLE_NAME) = QUOTENAME(@TableName)
AND C.COLUMN_DEFAULT IS NULL
ORDER BY C.ORDINAL_POSITION

-- Variables to keep track of the number of columns
DECLARE @ColumnId INT
SET @ColumnId = -1

DECLARE @ColumnCount INT
SET @ColumnCount = 0

-- Variables to create the insert query
DECLARE @INSERTSTATEMENT_START nvarchar(max)
DECLARE @INSERTSTATEMENT_END nvarchar(max)

SET @INSERTSTATEMENT_START = 'INSERT INTO ' + QUOTENAME(@TableName) + ' ('
SET @INSERTSTATEMENT_END = 'VALUES ('

-- Variables to complete the insert query with
-- extra enable and disable identity statements
-- You could add an extra check in the loop to
-- make sure there is an identity column in the
-- table. Otherwise the SET IDENTITY_INSERT
-- statement will fail.
DECLARE @IDENTITYSTATEMENT_ON nvarchar(255)
DECLARE @IDENTITYSTATEMENT_OFF nvarchar(255)

SET @IDENTITYSTATEMENT_ON = 'SET IDENTITY_INSERT ' + QUOTENAME(@TableName) + ' ON;'
SET @IDENTITYSTATEMENT_OFF = 'SET IDENTITY_INSERT ' + QUOTENAME(@TableName) + ' OFF;'

-- Variables filled and use the WHILE loop
DECLARE @COLUMN_NAME VARCHAR(50)
DECLARE @DATA_TYPE VARCHAR(50)
DECLARE @CHARACTER_MAXIMUM_LENGTH INT
DECLARE @COLUMN_IS_IDENTITY BIT

-- WHILE loop to loop through all columns and
-- create an insert query with the columns
WHILE @ColumnId IS NOT NULL
BEGIN
-- Keep track of the number of columns
SELECT @ColumnId = MIN(COLUMN_ID)
, @ColumnCount = @ColumnCount + 1
FROM @TableSpecs
WHERE COLUMN_ID > @ColumnCount

-- Check if there are any columns left
IF @ColumnId IS NULL
BEGIN
-- No columns left, break loop
BREAK
END
ELSE
BEGIN
-- Get info for column number x
SELECT @COLUMN_NAME = COLUMN_NAME
, @DATA_TYPE = DATA_TYPE
, @CHARACTER_MAXIMUM_LENGTH = CHARACTER_MAXIMUM_LENGTH
, @COLUMN_IS_IDENTITY = COLUMN_IS_IDENTITY
FROM @TableSpecs
WHERE COLUMN_ID = @ColumnCount
END

-- Start building the begin of the statement (same for each column)
SET @INSERTSTATEMENT_START = @INSERTSTATEMENT_START + @COLUMN_NAME + ','

-- Start building the end of the statement (the default values)
IF @COLUMN_IS_IDENTITY = 1
BEGIN
-- Default value if the current column is the identity column
SET @INSERTSTATEMENT_END = @INSERTSTATEMENT_END + '-1,'
END

IF @DATA_TYPE IN ('int', 'numeric', 'decimal', 'money', 'float', 'real', 'bigint', 'smallint', 'tinyint', 'smallmoney') AND (@COLUMN_IS_IDENTITY = 0)
BEGIN
-- Default value if the current column is a numeric column,
-- but not an identity: zero
SET @INSERTSTATEMENT_END = @INSERTSTATEMENT_END + '0,'
END

IF @DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar')
BEGIN
-- Default value if the current column is a text column
-- Part of the text "unknown" depending on the length
SET @INSERTSTATEMENT_END = @INSERTSTATEMENT_END + '''' + LEFT('Unknown', @CHARACTER_MAXIMUM_LENGTH) + ''','
END

IF @DATA_TYPE IN ('datetime', 'date', 'timestamp', 'datetime2', 'datetimeoffset', 'smalldatetime', 'time')
BEGIN
-- Default value if the current column is a datetime column
-- First of january 1900
SET @INSERTSTATEMENT_END = @INSERTSTATEMENT_END + '''' + CONVERT(varchar, CONVERT(date, 'Jan 1 1900')) + ''','
END

IF @DATA_TYPE = 'bit'
BEGIN
-- Default value if the current column is a boolean
SET @INSERTSTATEMENT_END = @INSERTSTATEMENT_END + '0,'
END
END

-- Remove last comma from start and end part of the insert statement
SET @INSERTSTATEMENT_START = LEFT(@INSERTSTATEMENT_START, LEN(@INSERTSTATEMENT_START) - 1) + ')'
SET @INSERTSTATEMENT_END = LEFT(@INSERTSTATEMENT_END, LEN(@INSERTSTATEMENT_END) - 1) + ');'

-- Execute the complete statement
EXEC (@IDENTITYSTATEMENT_ON + ' ' + @INSERTSTATEMENT_START + ' ' + @INSERTSTATEMENT_END + ' ' + @IDENTITYSTATEMENT_OFF)

END

GO
-- Tweak the code for your own needs and standards
-- Optional extra check if you don't want to truncate
-- your dimensions: is there already a default/unknown
-- record available

Execute Stored Procedure

Note: only the most common data types are handled. Add more if-statements if you expect data types like varbinary, xml, image or sql_variant.
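
As an example of such an extension (just a sketch; pick a default that fits your own standards), an extra branch for uniqueidentifier columns could look like the block below. The procedure itself is then called from the Execute SQL Task with the name of the dimension table, for example:

-- Example extension: default value for uniqueidentifier columns (the zero GUID)
IF @DATA_TYPE = 'uniqueidentifier'
BEGIN
SET @INSERTSTATEMENT_END = @INSERTSTATEMENT_END + '''00000000-0000-0000-0000-000000000000'','
END

-- Calling the Stored Procedure for a dimension table
EXEC [dbo].[InsertUnknownDimensionRow] @TableName = 'DimEmployee';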

Had a nice day at SQL Saturday #336 in Utrecht! The PowerPoint slides of my SSIS Development Best Practices session are available for download. I added some screens, text and URLs for additional information (see the notes in the PowerPoint).
Case
I have a Foreach Loop Container with a file enumerator. The wildcard is *.xls, but it also returns *.xlsx files. How do I prevent that?

Loop through *.xls also includes xlsx files

My xls loop includes xlsx and xlsm files

Solution
This is actually similar to the DIR command in a DOS/Command Prompt.
All xls files? (/b is to remove my Dutch header/footer)

The workaround is simple, and if you don't like it you could use my Sorted File Enumerator, which also supports regular expression wildcards.

1) Dummy
Add an empty/dummy task or Sequence Container in your Foreach Loop Container and connect it to your first task.

Empty/collapsed Sequence Container added

2) Precedence Constraint Expression
Add an expression on the Precedence Constraint between the dummy and your first task. It should look something like LOWER(RIGHT(@[User::FilePath], 4)) == ".xls" (replace the variable name and/or file extension).

Expression with LOWER and RIGHT to check the file extension

3) The result
Now test the package (Replace my example Script Task with your own tasks).
The result: only two xls files and no xlsx or xlsm files

Case
The path of the XSD file in the XML Source component is hardcoded, but the path on my production environment is different from the one on my development environment. The XML Source doesn't have expressions. How do I make this XSD path configurable?
XML Source Connection Manager Page
without Connection Managers

Solution
Unfortunately the XML Source component lacks some very basic functionality, like the use of a Connection Manager (although ironically its editor page is called Connection Manager). The source can use a variable as input, but there is no such option for the XSD file.
XML Task can use Connection Managers

1) Find Expressions
The XML Source component doesn't support expressions, but the Data Flow Task itself does. Go to the properties of the Data Flow Task, locate the Expressions property and click on the ... button.

Select Data Flow Task and press F4 to get properties

2) Select Property
Locate the XMLSchemaDefinition property of your XML Source in the Property Expression Editor and add an expression on it by clicking on the ... button.
Property Expression Editor

3) Add Expression
Now you can replace its value with a variable, or with a parameter if you use SSIS 2012 or above.
Expression Builder

That's it. Now you have a workaround for the absence of a real Connection Manager. An alternative could be to use a Script Component as XML source or an XML file with an inline XSD schema.
Case
I want to use nested includes in a BIML Script (an include in an include), but the second level isn't working. It seems to skip it without giving an error.
No second Sequence Container

Solution
First, be careful with (too many) nested includes; they can make your BIML script hard to follow. There are two tricks to solve this problem, which came to me via Twitter from @cathrinew and @AndreKamman.

Solution A:
Use a full path in the include tag instead of only the name:
Using the fullpath

The big downside is of course the full path in your BIML script. In a multi-user environment with, for example, TFS that could be an issue, because everybody needs the same project path.

Solution B:
A better option is to use CallBimlScript instead of include:
Using CallBimlScript

And you can also pass parameters to the included file, use a relative path, and then reuse the file in multiple projects.
Case
Is the Row Sampling Transformation a (fully) blocking component or a non-blocking component? Various blogs and forums disagree with each other. Is it possible to see this in SSIS?

Solution
By adding some logging to the package you can see the Pipeline Execution Trees of your Data Flow. These will tell you whether a transformation creates a new buffer. When it creates a new buffer, the component is asynchronous/blocking.


1) Add logging
Open your package and go to the SSIS menu and choose Logging...
SSIS Menu - Logging...

2) SQL logging
For this example I selected SQL Server logging, pressed the add button and then selected a Connection Manager.
SQL Server logging

Next, check the Data Flow Task (1) and enable the SQL Server log (2) for it. After that, go to the Details tab (3).
Enable logging for the Data Flow Task

In the Details tab, scroll down and select PipelineExecutionTrees. Then click on the Advanced button.
Logging Details

In the Advanced section make sure that MessageText is checked; this column will contain the execution trees text. After that, click OK to close the logging window.
Check MessageText

3) Run and check log
Run the package and then check the log. In the column Message you can find the Execution Trees.
SQL Server Logging. Check message column.

Now check these examples and see which ones are blocking. Multiple paths mean that there is a transformation that creates new buffers. The last example is the Row Sampling:
Plain Data Flow: 1 path - No blocking

Data Flow with Sort: 2 paths - Sort is blocking

Data Flow with Percentage Sampling: 1 path - No blocking

Data Flow with Row Sampling: 3 paths - Row Sampling is blocking

4) Partial Blocking or Fully Blocking?
Unfortunately you can't see in the log whether the transformation is partially blocking (like Union All or Merge Join) or fully blocking (like Sort and Aggregate), but you can see it in the Data Flow Task when running your package in Visual Studio (BIDS/SSDT). The Row Sampling Transformation is fully blocking, because it apparently needs all data before it sends any rows to its output. So try not to use it unnecessarily.

Row Sampling is Fully Blocking

The Row Sampling isn't just doing a TOP X; it spreads the sampled rows over all buffers. Because you don't know how many records or buffers are coming, you need all rows before you can randomly pick X records from the whole set.

An alternative (with less randomly chosen rows) could be to use a Script Component that adds a row number and then a Conditional Split to select the first X rows (perhaps combined with a modulo expression like Rownumber % 3 == 0 && Rownumber <= 3000). The Conditional Split is a non-blocking component. Note: this isn't necessarily faster; check it first for your case!

So why is the Percentage Sampling not blocking? It just takes the same percentage of each buffer and can therefore be synchronous.


Confirmed by Mister SSIS himself!