(2020-June-07) It is a very simple concept: you have a database query result, either direct database table output or a result set returned by a stored procedure, that needs to be sent as an email notification. This email could serve as a status report of your daily data processing job or an alert notification with some metrics that require immediate attention, and you want to be as user-friendly as possible with this message, aka HTML format for your dataset is required.


I built this solution approach over a year ago and wanted to document and share it after recently using it in one of my projects. Again, I agree, there are other more or less effective ways to achieve the very same goal; this blog post is just a recollection of my efforts to send HTML formatted SQL query results using Azure Data Factory and Logic App.

(A) Database Query
Let's say I want to send an email message with the Top 10 Selling Products using the Microsoft AdventureWorksLT sample database:
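The query itself could look like this (a sketch, assuming the standard SalesLT.Product and SalesLT.SalesOrderDetail tables):

--Top 10 Selling Products (sketch)
SELECT TOP (10)
p.[Name] AS ProductName
, SUM(sod.LineTotal) AS TotalSales
FROM SalesLT.SalesOrderDetail AS sod
INNER JOIN SalesLT.Product AS p
ON p.ProductID = sod.ProductID
GROUP BY p.[Name]
ORDER BY SUM(sod.LineTotal) DESC;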

I also created a stored procedure to prepare an HTML output result. This is where the whole trick happens (or the purpose of this blog post = spoiler alert :-). Knowing that my Azure Logic App that transmits the email message takes one variable for the email body, I had to convert a SQL result with multiple records into a single string data structure with HTML formatting options applied.

1) First, I need to use the <li> HTML tag to list all of my dataset records.
2) Then, I need to wrap this list of items in the <ol> HTML tag to add numbers to each of the lines (your case might be different: a table definition or text coloring might be necessary).

Achieving my first requirement was easy: I combined all the data elements into one column with the "li" name. The second requirement was accomplished by converting all my records into one data cell with the FOR XML RAW ('') option and adding "ol" as my ROOT.
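Put together, the core of the stored procedure could look like this (a sketch, again assuming the SalesLT tables; the ELEMENTS keyword makes the output element-centric, which is what allows the empty row tag):

--HTML-formatted Top 10 Selling Products (sketch)
SELECT TOP (10)
CONCAT(p.[Name], ' - ', CONVERT(varchar(20), SUM(sod.LineTotal))) AS li
FROM SalesLT.SalesOrderDetail AS sod
INNER JOIN SalesLT.Product AS p
ON p.ProductID = sod.ProductID
GROUP BY p.[Name]
ORDER BY SUM(sod.LineTotal) DESC
--each row becomes an <li> element and ROOT('ol') wraps the list in a single <ol>
FOR XML RAW (''), ROOT('ol'), ELEMENTS;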

The output of this query is not very pretty, but you need to see how my ugly duckling converts into a beautiful HTML swan :-)


The rest is easy.  

(B) Azure Logic App
My Logic App has an HTTP trigger and one "Send an email" action.


(C) Azure Data Factory
The main job of my Azure Data Factory (ADF) pipeline is done by the Lookup activity, the only ADF activity that can return the output result of a SQL Server stored procedure; the regular Stored Procedure activity in ADF won't help me with this.



The output of my Lookup activity looks ugly as expected, but we only care that it comes out as one single string data element:


(D) Received HTML formatted email message
After passing the output of my stored procedure via the ADF Lookup activity to my Logic App, I received an email message just as I expected: all lines formatted and nothing more.


Little did I know how much could be done with the FOR XML T-SQL clause and HTML tags embedded within.
My data adventure continues! :-) 

(2018-May-31) Support for spatial geometry functions was initially introduced in SQL Server 2008 and then greatly improved in Denali (SQL Server 2012) - https://docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-data-sql-server

Back then, when I was reading about the spatial features in SQL Server, I wondered where I could use this functionality for creating geo points, lines and polygons; however, working with geometry objects a bit more helped me to see some real use cases where this could be applicable.

Let's say you have the London Tower Bridge and you want to know when a taxi cab with your very important guest crosses this bridge. By creating a geo polygon for the bridge, you can use spatial functions in SQL Server to check whether the coordinates of the taxi cab's GPS tracker intersect with the bridge polygon geometry object - STIntersects (geometry Data Type).
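Such a check could look like this (a sketch; the polygon below is just a small hypothetical rectangle standing in for the real bridge geometry):

--Intersection check with STIntersects (sketch)
DECLARE @bridge geometry = geometry::STPolyFromText(
'POLYGON((-0.0760 51.5049, -0.0745 51.5049, -0.0745 51.5061, -0.0760 51.5061, -0.0760 51.5049))', 4326);
DECLARE @taxi geometry = geometry::STPointFromText('POINT(-0.0752 51.5055)', 4326);

SELECT CASE WHEN @bridge.STIntersects(@taxi) = 1
THEN 'The taxi cab is crossing the bridge'
ELSE 'Not on the bridge yet'
END AS IntersectionCheck;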





First, I've located my map objects in the QGIS tool using the Google Satellite layer:




Then I created a polygon that would resemble the bridge area over the Thames River:




Then, by extracting the nodes of my polygon, I can see each individual geo point's coordinates:





And then, after migrating this dataset into my dbo.LondonBridgeCoordinates SQL Server table, I can use this script to create the very same geometry object in my database:


-- POLYGON definition
DECLARE @coords nvarchar(max);

-- POLYGON CREATION FROM LON/LAT COORDINATES
WITH polygon
AS (SELECT
id,
latitude,
longitude,
CONCAT(LTRIM(STR(longitude, 22, 6)), ' ', LTRIM(STR(latitude, 22, 6))) AS coords
FROM [dbo].[LondonBridgeCoordinates]),
-- LON/LAT CONCATENATION
polygon_coordinates
AS (SELECT
STUFF((SELECT
', ' + coords
FROM polygon p
ORDER BY p.id
FOR xml PATH (''), TYPE)
.value('(./text())[1]', 'VARCHAR(MAX)'), 1, 2, '') AS polygon_coords,
(SELECT
CONCAT(LTRIM(STR(longitude, 22, 6)), ' ', LTRIM(STR(latitude, 22, 6))) AS coords
FROM [dbo].[LondonBridgeCoordinates]
WHERE id = 1)
AS first_coordinate)

-- repeat the first coordinate at the end to close the polygon ring
SELECT @coords = polygon_coords + ', ' + first_coordinate FROM polygon_coordinates
-- LET'S SEE WHAT THE POLYGON WOULD LOOK LIKE
SELECT geometry::STPolyFromText('POLYGON((' + @coords + '))', 4326).MakeValid()






I can also save the output of the STPolyFromText function as a database object and use it with all the other spatial functions available in SQL Server.
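For example, the polygon could be persisted in a table with a geometry column and queried later (a sketch with a hypothetical dbo.GeoObjects table, reusing the @coords variable from the script above):

--Persisting the polygon for later use (sketch)
IF OBJECT_ID('dbo.GeoObjects', 'U') IS NULL
CREATE TABLE dbo.GeoObjects ([Name] nvarchar(100), [Shape] geometry);

INSERT INTO dbo.GeoObjects ([Name], [Shape])
VALUES ('London Tower Bridge', geometry::STPolyFromText('POLYGON((' + @coords + '))', 4326).MakeValid());

--and reuse it with any other spatial function, e.g. the polygon area
SELECT [Name], [Shape].STArea() AS PolygonArea
FROM dbo.GeoObjects;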

Please let me know if you can find other uses besides checking geo objects intersection. There are so many possibilities!

Case

How can we simplify the process of loading database tables and reduce the time needed to create SSIS packages?

Solution

There are several steps that need to be taken prior to creating such a solution.
These steps are:
  1. Create the databases 'Repository' and 'Staging' and the required schemas
  2. Create a meta table called 'Layer' and a table called 'TableList' in the Repository database
  3. Create the Sales tables in the Staging database
  4. Fill the meta tables with the required metadata
  5. Create a BIML script that will create the Extract and Load SSIS package
  6. Generate the package using BIML Express in Visual Studio 2015 to create the SSIS package
For this solution the following prerequisites need to be met:
  • SQL Server will be used as source and destination platform
  • The Adventure Works 2014 database will be used as source
  • The selected tables from the Adventure Works database are present in the destination database, including all required schemas and specific data types, if applicable (we will be using a few tables from the Sales schema that do not use custom data types)
  • Visual Studio 2015 is installed (Community/Professional/Enterprise)
  • BIML Express is installed

    1) - Create the databases and schemas

    In this step the databases 'Repository' and 'Staging' are created, along with the required schemas.
    --Create the databases Repository and Staging and the required schemas
    CREATE DATABASE [Repository] CONTAINMENT = NONE ON PRIMARY
    ( NAME = N'Repository', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Repository.mdf' , SIZE = 7168KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
    LOG ON
    ( NAME = N'Repository_log', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Repository_log.ldf' , SIZE = 5184KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
    GO
    ALTER DATABASE [Repository] SET RECOVERY SIMPLE;
    GO
    USE Repository
    go
    CREATE SCHEMA rep
    GO
    CREATE DATABASE [Staging] CONTAINMENT = NONE ON PRIMARY
    ( NAME = N'Staging', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Staging.mdf' , SIZE = 7168KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
    LOG ON
    ( NAME = N'Staging_log', FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\Staging_log.ldf' , SIZE = 5184KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
    GO
    ALTER DATABASE [Staging] SET RECOVERY SIMPLE;
    GO
    USE Staging
    go
    CREATE SCHEMA Sales
    GO

    2) - Create the Meta tables

    During this step we will be creating the meta tables 'Layer' and 'TableList'. The first table contains the id, name and prefix of the layers present in the data warehouse environment. In this blog the only entry present is the Staging area id and name. Normally this table also holds, for example, the name of the Data Warehouse and Data Mart layer or any other layers present in a Business Intelligence environment.
    The column 'LayerId' is used in the other table, named 'TableList', to make the distinction between the table names present in the respective layer. In this blog only Staging Area tables are described.
    The table 'TableList' contains the following columns:
    • LayerId - The id of the layer the table belongs to
    • TableName - The name of the table
    • SchemaName - The name of the schema of the table
    • TableType - The type of the table (e.g. user table)
    • LoadOrderNr - The order in which the tables are loaded (or created by other processes)
    • WhereClause - Any optional where clause that is used during the load process (by default this column must be set to '1 = 1')
    • ActiveInd - Indicates if the table is active or inactive during the BIML creation process
    • InsertDate - The date when the table entry was inserted into the 'TableList' table
    Note: Some of the above columns are not applicable, or less applicable, to the BIML script in this post, but they are used in other generic scripts for creating and loading data warehouse packages (more about this in future posts)
    --Create the meta tables
    USE [Repository]
    GO
    IF OBJECT_ID('[rep].[Layer]', 'U') IS NOT NULL
    BEGIN
    DROP TABLE [rep].[Layer]
    END
    GO
    CREATE TABLE [rep].[Layer](
    [LayerId] [smallint] NOT NULL,
    [LayerName] [nvarchar](50) NOT NULL,
    [LayerPrefix] [nvarchar](10) NOT NULL
    ) ON [PRIMARY]
    GO
    IF OBJECT_ID('[rep].[TableList]', 'U') IS NOT NULL
    BEGIN
    DROP TABLE [rep].[TableList]
    END
    GO
    CREATE TABLE [rep].[TableList](
    [LayerId] [smallint] NULL,
    [TableName] [nvarchar](100) NULL,
    [SchemaName] [nvarchar](100) NULL,
    [ServerNr] [smallint] NULL,
    [TableType] [nvarchar](100) NULL,
    [LoadOrderNr] [int] NULL,
    [WhereClause] [nvarchar](250) NULL,
    [PrimaryKey] [nvarchar](250) NULL,
    [ActiveInd] [nchar](1) NULL,
    [InsertDate] [datetime] NULL
    ) ON [PRIMARY]
    GO

    3) - Create the Sales tables

    During this step we will be creating the Sales tables in the target database Staging.
    --Create the Sales tables
    USE Staging
    GO
    CREATE TABLE [Sales].[ShoppingCartItem](
    [ShoppingCartItemID] [int] IDENTITY(1,1) NOT NULL,
    [ShoppingCartID] [nvarchar](50) NOT NULL,
    [Quantity] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [DateCreated] [datetime] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
    ) ON [PRIMARY]
    GO
    CREATE TABLE [Sales].[SpecialOffer](
    [SpecialOfferID] [int] IDENTITY(1,1) NOT NULL,
    [Description] [nvarchar](255) NOT NULL,
    [DiscountPct] [smallmoney] NOT NULL,
    [Type] [nvarchar](50) NOT NULL,
    [Category] [nvarchar](50) NOT NULL,
    [StartDate] [datetime] NOT NULL,
    [EndDate] [datetime] NOT NULL,
    [MinQty] [int] NOT NULL,
    [MaxQty] [int] NULL,
    [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
    ) ON [PRIMARY]
    GO
    CREATE TABLE [Sales].[SpecialOfferProduct](
    [SpecialOfferID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
    ) ON [PRIMARY]
    GO

    4) - Fill the meta tables with the required meta data

    After creating the database and meta tables, they need to be filled with the metadata that will be used by the BIML script in the next step ('BIML Load_STG_Tables_From_Microsoft.biml').
    The script provided below inserts the layer information used in this blog and the meta information of those tables for which the SSIS load process will be created.

    If you want to test the Where Clause functionality, you can replace the value '1 = 1' with '1 = 1 AND ShoppingCartItemID = 2' in the column 'WhereClause' of the table 'TableList' for the table name 'Sales.ShoppingCartItem'. This will place a filter on the table, as the sketch below shows.
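    That change boils down to a simple update of the meta table (a sketch of the optional filter described above):

    --Optional: place a filter on the ShoppingCartItem load (sketch)
    UPDATE [rep].[TableList]
    SET [WhereClause] = N'1 = 1 AND ShoppingCartItemID = 2'
    WHERE [TableName] = N'ShoppingCartItem'
    AND [SchemaName] = N'Sales';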
    The BIML script will use the meta table information to create one SSIS package with the name 'SSIS STG Load STG Tables SQL.dtsx'.

    --Insert the meta information in the meta tables
    USE [Repository]
    GO
    TRUNCATE TABLE [rep].[Layer];
    TRUNCATE TABLE [rep].[TableList];
    INSERT [rep].[Layer] ([LayerId], [LayerName], [LayerPrefix]) VALUES (1, N'staging', N'stg');
    INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'SpecialOffer', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
    INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'SpecialOfferProduct', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
    INSERT [rep].[TableList] ([LayerId], [TableName], [SchemaName], [TableType], [LoadOrderNr], [WhereClause], [ActiveInd], [InsertDate]) VALUES (1, N'ShoppingCartItem', N'Sales', N'user_table', 1, N'1 = 1', N'1', CAST(GETDATE() AS DateTime));
    GO

    5) - Create the BIML script

    Once the previous steps have been executed it is time to create the BIML script. The BIML script starts with declaring the information needed to create the connection strings to the different databases and the server(s) they reside on. For this example all the databases are SQL Server 2016 databases. It would also be possible to store that information in a meta table, but for this post the information is placed inside the BIML script. The BIML script will create one package with the name 'SSIS STG Load STG Tables SQL' and, for each table in the 'TableList' table, a sequence container with two SSIS components. The first component is an Execute SQL Task that uses a T-SQL command to truncate the target table. The second component is a Data Flow Task containing a source and a destination component, which loads the data from the source to the target table. All the sequence containers are executed in parallel.

    --The BIML code that can be placed inside a BIML file.

    <Biml xmlns="http://schemas.varigence.com/biml.xsd">

    <#
    string pRepServerName = "localhost";
    string pRepDatabaseName = "Repository";
    string pRepProvider = "SQLNCLI11.1;Integrated Security=SSPI";
    string pRepSchema = "rep";

    string pSourceServerName = "localhost";
    string pSourceDBName = "AdventureWorks2014";
    string pSourceProvider = "SQLNCLI11.1;Integrated Security=SSPI";
    string pSourceSchema = "Sales";

    string pTargetServerName = "localhost";
    string pTargetDBName = "Staging";
    string pTargetProvider = "SQLNCLI11.1;Integrated Security=SSPI";
    string pTargetSchema = "Sales";
    #>

    <#
    string pLayer = "Staging";
    #>

    <#
    string csRepository = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
    ,pRepServerName, pRepDatabaseName, pRepProvider);

    string csSource = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
    ,pSourceServerName, pSourceDBName, pSourceProvider);

    string csTarget = String.Format("Data Source={0};Initial Catalog={1};Provider={2};Auto Translate=False;"
    ,pTargetServerName, pTargetDBName, pTargetProvider);
    #>

    <Connections>
    <OleDbConnection Name="OLEDB Repository" ConnectionString="<#=csRepository#>" />
    <OleDbConnection Name="OLEDB Source" ConnectionString="<#=csSource#>" />
    <OleDbConnection Name="OLEDB Target" ConnectionString="<#=csTarget#>" />
    </Connections>

    <Packages>
    <Package Name="SSIS STG Load STG Tables SQL" ConstraintMode="Parallel" AutoCreateConfigurationsType="None">
    <Tasks>
    <#
    StringBuilder sMETAGetTableName = new System.Text.StringBuilder();

    sMETAGetTableName.Append("SELECT ");
    sMETAGetTableName.Append(" TableName ");
    sMETAGetTableName.Append(" , SchemaName ");
    sMETAGetTableName.Append(" , WhereClause ");
    sMETAGetTableName.Append("FROM ");
    sMETAGetTableName.Append(pRepSchema);
    sMETAGetTableName.Append(".TableList AS TAB ");
    sMETAGetTableName.Append("INNER JOIN ");
    sMETAGetTableName.Append(pRepSchema);
    sMETAGetTableName.Append(".Layer AS LYR ");
    sMETAGetTableName.Append(" ON ( TAB.LayerId = LYR.LayerId) ");
    sMETAGetTableName.Append("WHERE 1 = 1 ");
    sMETAGetTableName.Append(" AND TAB.ActiveInd = 1 ");
    sMETAGetTableName.Append(" AND LYR.LayerName = '");
    sMETAGetTableName.Append(pLayer);
    sMETAGetTableName.Append("' ");
    sMETAGetTableName.Append("ORDER BY ");
    sMETAGetTableName.Append(" TAB.LoadOrderNr");

    DataTable tblMETATableNames = ExternalDataAccess.GetDataTable(csRepository, sMETAGetTableName.ToString());
    foreach (DataRow METATableNameRow in tblMETATableNames.Rows) {
    #>
    <!-- one sequence container per table from the meta table -->
    <Container Name="SEQC - <#=METATableNameRow["TableName"] #>" ConstraintMode="Linear">
    <Tasks>
    <!-- truncate the target table before loading -->
    <ExecuteSQL Name="SQL - Truncate <#=METATableNameRow["TableName"] #>"
    ResultSet="None"
    ConnectionName="OLEDB Target">
    <DirectInput>
    TRUNCATE TABLE <#=pTargetSchema #>.<#=METATableNameRow["TableName"] #>
    </DirectInput>
    </ExecuteSQL>
    <!-- load the data from the source to the target table -->
    <Dataflow Name="DFT - Load <#=METATableNameRow["TableName"] #>">
    <Transformations>
    <OleDbSource Name="OLEDB - Source <#=METATableNameRow["TableName"] #>"
    ConnectionName="OLEDB Source"
    ValidateExternalMetadata="false">
    <DirectInput>
    SELECT
    CAST(1 AS INTEGER) AS DUMMY_COLUMN
    <# StringBuilder sGETSelectColumn = new System.Text.StringBuilder();
    sGETSelectColumn.Append("SELECT " );
    sGETSelectColumn.Append(" col.name AS column_name " );
    sGETSelectColumn.Append("FROM sys.columns AS col " );
    sGETSelectColumn.Append("INNER JOIN sys.objects AS obj " );
    sGETSelectColumn.Append(" ON(col.object_id = obj.object_id) " );
    sGETSelectColumn.Append("INNER JOIN sys.types AS typ " );
    sGETSelectColumn.Append(" ON(col.user_type_id = typ.user_type_id)" );
    sGETSelectColumn.Append("WHERE 1 = 1 " );
    sGETSelectColumn.Append(" AND obj.name = '"+ METATableNameRow[0].ToString() +"'");
    DataTable tblSelectColumn = ExternalDataAccess.GetDataTable(csSource, sGETSelectColumn.ToString());
    foreach (DataRow SelectColumn in tblSelectColumn.Rows) {
    #>
    , [<#=SelectColumn["COLUMN_NAME"] #>]
    <# } #>FROM <#=METATableNameRow["SchemaName"] #>.<#=METATableNameRow["TableName"] #>
    WHERE <#=METATableNameRow["WhereClause"] #>
    </DirectInput>
    </OleDbSource>
    <OleDbDestination Name="OLEDB - Target <#=METATableNameRow["TableName"] #>"
    ConnectionName="OLEDB Target">
    <ExternalTableOutput Table="<#=pTargetSchema #>.<#=METATableNameRow["TableName"] #>" />
    </OleDbDestination>
    </Transformations>
    </Dataflow>
    </Tasks>
    </Container>
    <# } #>
    </Tasks>
    </Package>
    </Packages>
    </Biml>

    <#@ template language="C#" hostspecific="true"#>
    <#@ import namespace="System.Data"#>
    <#@ import namespace="System.Data.SqlClient"#>
    <#@ import namespace="System.Text"#>

    6) - Generate the package using BIML Express

    Once the BIML file has been created it is time to generate the SSIS package using BIML Express in Visual Studio 2015.
    The package can simply be generated by right-clicking the BIML file and selecting 'Generate SSIS Packages'.
    Using BIML Express to generate the package

    The package has been generated by BIML

    Summary

    In this post we created the following components:
    • Repository database and one schema
    • Staging database and one schema
    • Two meta tables to be used by the BIML script from this post (and possibly future posts)
    • Three Staging tables
    • A BIML file called 'BIML Load_STG_Tables_From_Microsoft.biml'
    • A generated SSIS package named 'SSIS STG Load STG Tables SQL.dtsx'

      Case
      The case is about importing flat files (CSVs) without the necessity of metadata. Because BIML always checks whether the tables are accessible before creating the packages, the first step is to create the tables with BIML and the second step is to create the SSIS packages for transporting the data.

      1. Creating tables in the database 
      2. Packages to fill this database

      Because of the size of the solution I've created two separate solutions: one for creating the tables and a second one for creating the SSIS packages. You can click on the link to go to the other solution (which I will publish later this month).


      Solution - Creating tables with BIML
      In this solution we create a BIML script that is going to create the tables, with all the columns defined as strings.
      We have to create two BIML scripts: the first script defines the table definitions and the second one creates the actual package.

      1) CSV files
      For this example we are using two CSV files (age and sickleave) which are comma separated and have the column names on the first row. These column names will be used in the create table statement.

      The drop folder








      Content of a CSV file










      2) Table definitions
      The first biml is called "1_TableDefinitions.biml".
      In this biml we define the path where the CSV files are located, an array with the names of the CSV files, and also some strings which we are going to use further on in the code.
      We use two foreach loops: the first one loops through the array of files and the second one loops through the actual file (to extract the column names).

      Normally (without the loop) the code would look like this:

      <Tables>
      <Table Name="Man_Age" SchemaName="dummy_STG.dbo">
      <Columns>
      <Column DataType="Int32" IdentityIncrement="1" IdentitySeed="1" Name="AgeID" />
      <Column DataType="Int32" Name="AgeFrom" />
      <Column Name="AgeTo" />
      <Column DataType="String" Length="255" Name="AgeCategoryEmployee" />
      </Columns>
      </Table>
      </Tables>

      By default BIML uses Int32 as the datatype; in this case we use a string. Now we add the loops and the complete code looks like this:
      <Biml xmlns="http://schemas.varigence.com/biml.xsd">

      <#
      string Prefix="Man";

      // the location of the csv files
      string path = @"D:\Drop\Man";
      // put all the filenames with the extension csv in a string array
      string[] myFiles = Directory.GetFiles(path, "*.csv");
      // string that will be filled with the filename
      string fileName;
      // string array for column names extracted from the CSV
      string[] myColumns;
      #>
      <Connections>
      <OleDbConnection
      Name="OLEDB_STG_<#=Prefix#>"
      ConnectionString="Data Source=APPL43;Initial Catalog=dummy_STG;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
      </OleDbConnection>
      </Connections>
      <Databases>
      <Database ConnectionName="OLEDB_STG_<#=Prefix#>" Name="dummy_STG"/>
      </Databases>
      <Schemas>
      <Schema Name="dbo" DatabaseName="dummy_STG" Owner="dbo"/>
      </Schemas>
      <Tables>
      <!-- loop through the array of files-->
      <# foreach(string filePath in myFiles)
      {
      // extract the filename from the path to use as tablename
      fileName = Path.GetFileNameWithoutExtension(filePath);

      #>
      <Table Name="<#=Prefix#>_<#=fileName#>" SchemaName="dummy_STG.dbo">
      <Columns>
      <!-- loop through the file looking for the column names-->
      <#
      // read the first row of the csv to extract the column names
      // and split on comma to create an array
      StreamReader myFile = new StreamReader(filePath);

      myColumns = myFile.ReadLine().Split(',');
      // close file after reading first line
      myFile.Close();

      // Loop through column array
      foreach(string myColumn in myColumns)
      {
      #>
      <Column Name="<#=myColumn#>" DataType="String" Length="255"></Column>
      <# } #>
      </Columns>
      </Table>
      <# }#>
      </Tables>
      </Biml>

      <#@ template language="C#" hostspecific="true"#>
      <#@ import namespace="System.IO"#>


      3) CreateTables
      Secondly we create the biml called "2_CreateTables.biml", which creates the actual package that contains the create statements to generate the tables.
      BIML has a method to script SQL tables: "RootNode.Tables.First().GetTableSql();".
      We use this method to create the 'SQL create statement' of each table.

      The code looks like this

      <Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Packages>
      <Package Name="CreateTables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
      <Tasks>
      <#
      // Loop through the table definitions of the first biml
      foreach(var table in RootNode.Tables) {#>
      <ExecuteSQL Name="SQL - Drop_Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
      <DirectInput>
      <#=table.GetTableSql()#>
      </DirectInput>
      </ExecuteSQL>
      <# } #>
      </Tasks>
      </Package>
      </Packages>
      </Biml>

      <!--Includes/Imports for C#-->
      <#@ template language="C#" hostspecific="true"#>
      <#@ import namespace="System.Data"#>
      <#@ import namespace="System.Data.SqlClient"#>


      We've created two bimls, 1_TableDefinitions.biml and 2_CreateTables.biml. Now comes the important part (I'm using BIML Express): generating the package. First click on 1_TableDefinitions.biml and then also select 2_CreateTables.biml; with the two biml scripts selected, right-click 1_TableDefinitions.biml and choose 'Generate SSIS Packages'. If you do this otherwise, you will get an empty SSIS package.

      Generate SSIS package







      Below you can see the result of your BIML scripts: a package with an Execute SQL Task for each table you need to create.
      Visual studio











      The actual create statement looks like this:

      SET ANSI_NULLS ON
      SET QUOTED_IDENTIFIER ON
      GO
      -------------------------------------------------------------------
      IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Man_Age]') AND type IN (N'U'))
      DROP TABLE [dbo].[Man_Age]
      GO

      CREATE TABLE [dbo].[Man_Age]
      (
      -- Columns Definition
      [AgeID] nvarchar(255) NOT NULL
      , [AgeFrom] nvarchar(255) NOT NULL
      , [AgeTo] nvarchar(255) NOT NULL
      , [AgeCategoryEmployee] nvarchar(255) NOT NULL
      , [AgeCategoryClient] nvarchar(255) NOT NULL

      -- Constraints

      )
      ON "default"
      WITH (DATA_COMPRESSION = NONE)
      GO
      -------------------------------------------------------------------


      Summary
      We created two biml scripts: one for creating the table definitions and one which creates the actual packages. The result in Management Studio looks like this.

      Management studio result














      In the next blog I'm going to explain how to create the SSIS packages that transport the data from the CSV files.







      Case
      We have sensors in our building that output data every minute which we want to use in a dimensional model. Therefore we need a Time Dimension to aggregate data on parts of the day. How do you create and populate a Time Dimension?
      International Time Dimension

      









      Solution
      Since creating and populating a Time Dimension is a one-time job, T-SQL is probably the best way to accomplish this. For this example we need a Time Dimension at the minute level and therefore we have created a loop that executes an insert query 1440 times (24 hours * 60 minutes).

      Besides the hours and minutes we also want to add the name of the day part in two languages and a couple of time strings in both 12- and 24-hour notation to make this dimension usable for various countries.


      -- Drop the table if it already exists
      IF OBJECT_ID('dbo.DimTime', 'U') IS NOT NULL
      BEGIN
      DROP TABLE dbo.DimTime;
      END

      -- Then create a new table
      CREATE TABLE [dbo].[DimTime](
      [TimeId] [int] IDENTITY(1,1) NOT NULL,
      [Time] [time](0) NULL,
      [Hour] [int] NULL,
      [Minute] [int] NULL,
      [MilitaryHour] int NOT null,
      [MilitaryMinute] int NOT null,
      [AMPM] [varchar](2) NOT NULL,
      [DayPartEN] [varchar](10) NULL,
      [DayPartNL] [varchar](10) NULL,
      [HourFromTo12] [varchar](17) NULL,
      [HourFromTo24] [varchar](13) NULL,
      [Notation12] [varchar](10) NULL,
      [Notation24] [varchar](10) NULL
      );

      -- Needed if the dimension already existed
      -- with other columns, otherwise the validation
      -- of the insert could fail.
      GO

      -- Create a time and a counter variable for the loop
      DECLARE @Time as time;
      SET @Time = '0:00';

      DECLARE @counter as int;
      SET @counter = 0;


      -- Two variables to store the day part for two languages
      DECLARE @daypartEN as varchar(20);
      set @daypartEN = '';

      DECLARE @daypartNL as varchar(20);
      SET @daypartNL = '';


      -- Loop 1440 times (24hours * 60minutes)
      WHILE @counter < 1440
      BEGIN

      -- Determine datepart
      SELECT @daypartEN = CASE
      WHEN (@Time >= '0:00' and @Time < '6:00') THEN 'Night'
      WHEN (@Time >= '6:00' and @Time < '12:00') THEN 'Morning'
      WHEN (@Time >= '12:00' and @Time < '18:00') THEN 'Afternoon'
      ELSE 'Evening'
      END
      , @daypartNL = CASE
      WHEN (@Time >= '0:00' and @Time < '6:00') THEN 'Nacht'
      WHEN (@Time >= '6:00' and @Time < '12:00') THEN 'Ochtend'
      WHEN (@Time >= '12:00' and @Time < '18:00') THEN 'Middag'
      ELSE 'Avond'
      END;

      INSERT INTO DimTime ([Time]
      , [Hour]
      , [Minute]
      , [MilitaryHour]
      , [MilitaryMinute]
      , [AMPM]
      , [DayPartEN]
      , [DayPartNL]
      , [HourFromTo12]
      , [HourFromTo24]
      , [Notation12]
      , [Notation24])
      VALUES (@Time
      , DATEPART(Hour, @Time) + 1
      , DATEPART(Minute, @Time) + 1
      , DATEPART(Hour, @Time)
      , DATEPART(Minute, @Time)
      , CASE WHEN (DATEPART(Hour, @Time) < 12) THEN 'AM' ELSE 'PM' END
      , @daypartEN
      , @daypartNL
      , CONVERT(varchar(10), DATEADD(Minute, -DATEPART(Minute,@Time), @Time),100) + ' - ' + CONVERT(varchar(10), DATEADD(Hour, 1, DATEADD(Minute, -DATEPART(Minute,@Time), @Time)),100)
      , CAST(DATEADD(Minute, -DATEPART(Minute,@Time), @Time) as varchar(5)) + ' - ' + CAST(DATEADD(Hour, 1, DATEADD(Minute, -DATEPART(Minute,@Time), @Time)) as varchar(5))
      , CONVERT(varchar(10), @Time,100)
      , CAST(@Time as varchar(5))
      );

      -- Raise time with one minute
      SET @Time = DATEADD(minute, 1, @Time);

      -- Raise counter by one
      set @counter = @counter + 1;
      END

      Below is a very basic example of the usage of this Time Dimension. The LightsAttrium value is 0 or 1, indicating whether the lights in the atrium are on or off. By taking the average of that during a day part (night, morning, etc.) it shows the percentage of time the lights are on. At night it's almost always on and during the afternoon it's off most of the time.
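      Such an overview could be produced with a query like this (a sketch, assuming a hypothetical fact table dbo.FactSensor with a TimeId reference and a 0/1 LightsAttrium measure):

      --Percentage of time the lights are on, per day part (sketch)
      SELECT t.[DayPartEN]
      , AVG(CAST(f.[LightsAttrium] AS decimal(5, 2))) * 100 AS [PctLightsOn]
      FROM dbo.FactSensor as f
      INNER JOIN dbo.DimTime as t
      ON (t.[TimeId] = f.[TimeId])
      GROUP BY t.[DayPartEN];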
      Basic example of usage.












      Conclusion
      Above is a fairly complete, but still basic, example of a Time Dimension. You can add or remove columns for your own specific purposes (let us know your additions in the comments below). Extending this to, for example, seconds is very easy: just change the 1440 in the WHILE to 86400 (24*60*60) and change the DATEADD at the bottom to "second" instead of "minute".



      Case
      I have a list of teacher and student combinations, with one combination per record, and I want to create a comma-delimited list of students per teacher.

      Solution
      If your source is a database then the easiest solution is a T-SQL query like this:
      --TSQL Query
      WITH UniqueTeachers AS
      (
      SELECT DISTINCT Teacher
      FROM TeacherStudentTable
      )
      SELECT Teacher
      , Students = STUFF((
      SELECT ',' + Student
      FROM TeacherStudentTable
      WHERE Teacher = UniqueTeachers.Teacher
      ORDER BY Student
      FOR XML PATH(''), TYPE).value('.','varchar(100)'), 1, 1, '')
      FROM UniqueTeachers
      ORDER BY Teacher
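      On SQL Server 2017 and later, the same result can also be produced more compactly with STRING_AGG (a sketch, assuming the same table and columns):

      --TSQL Query (SQL Server 2017+ variant)
      SELECT Teacher
      , STRING_AGG(Student, ',') WITHIN GROUP (ORDER BY Student) as Students
      FROM TeacherStudentTable
      GROUP BY Teacher
      ORDER BY Teacher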
      The query in SSIS as source



























      If your source is, for example, a flat file or a database that doesn't support a query like this, then there are also options within SSIS. For this solution I use an asynchronous Script Component.

      1) Sorted source
      We need a sorted source because we are comparing records with each other. In this case make sure the source is sorted on teacher first (and optionally on student as a second sort column).
      Add Sort transformation if source isn't sorted




















      2) Script Component
      Add a Script Component (type transformation) and select the Teacher and Student columns as ReadOnly input columns.
      Input columns: Teacher and Student




















      3) Asynchronous
      We need to make the Script Component asynchronous because it throws out a different number of rows than are coming in. Go to the Inputs and Outputs pane, click on Output 0 and change the SynchronousInputID to None.
      Asynchronous



















      4) Output
      We now need to create an output for the Script Component. Expand the Output 0 and add two columns:
      Teacher (same data type and size as the input column teacher)
      Students (same data type as the input column student, but larger to fit multiple student names)
      Output columns




















      5) The Script
      Copy the three variables and the two methods to your Script Component (and remove any other existing methods).
      // C# Code
      using System;
      using System.Data;
      using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
      using Microsoft.SqlServer.Dts.Runtime.Wrapper;

      [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
      public class ScriptMain : UserComponent
      {
      bool initialRow = true; // Indicator for the first row
      string teacher = ""; // Name of the teacher, to track teacher changes between rows
      string students = ""; // The comma delimited list of students

      public override void Input0_ProcessInput(Input0Buffer Buffer)
      {
      // Loop through buffer
      while (Buffer.NextRow())
      {
      // Process an input row
      Input0_ProcessInputRow(Buffer);

      // Change the indicator after the first row has been processed
      initialRow = false;
      }

      // Check if this is the last row
      if (Buffer.EndOfRowset())
      {
      // Fill the columns of the existing output row with values
      // from the variable before closing this Script Component
      Output0Buffer.Teacher = teacher;
      Output0Buffer.Students = students;
      }
      }

      public override void Input0_ProcessInputRow(Input0Buffer Row)
      {
      if (initialRow)
      {
      // This is for the first input row only

      // Create a new output row
      Output0Buffer.AddRow();

      // Now fill the variables with the values from the input row
      teacher = Row.Teacher;
      students = Row.Student;
      }
      else if ((!initialRow) & (teacher != Row.Teacher))
      {
      // This isn't the first row, but the teacher did change

      // Fill the columns of the existing output row with values
      // from the variable before creating a new output row
      Output0Buffer.Teacher = teacher;
      Output0Buffer.Students = students;

      // Create a new output row
      Output0Buffer.AddRow();

      // Now fill the variables with the values from the input row
      teacher = Row.Teacher;
      students = Row.Student;
      }
      else if ((!initialRow) & (teacher == Row.Teacher))
      {
      // This isn't the first row, and the teacher did not change

      // Concatenate the student's name to the variable
      students += "," + Row.Student;
      }
      }

      // Little explanation:
      // Rows are created in memory with .AddRow()
      // and will be submitted to the output when a
      // new / subsequent row is created or when
      // the last buffer has been finished.
      }

      or in VB.Net

      'VB.Net code
      Imports System
      Imports System.Data
      Imports System.Math
      Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
      Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

      <Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
      <CLSCompliant(False)> _
      Public Class ScriptMain
      Inherits UserComponent

      Private initialRow As Boolean = True ' Indicator for the first row
      Private teacher As String = "" ' Name of the teacher, to track teacher changes between rows
      Private students As String = "" ' The comma delimited list of students

      Public Overrides Sub Input0_ProcessInput(Buffer As Input0Buffer)
      ' Loop through buffer
      While Buffer.NextRow()
      ' Process an input row
      Input0_ProcessInputRow(Buffer)

      ' Change the indicator after the first row has been processed
      initialRow = False
      End While

      ' Check if this is the last row
      If Buffer.EndOfRowset() Then
      ' Fill the columns of the existing output row with values
      ' from the variable before closing this Script Component
      Output0Buffer.Teacher = teacher
      Output0Buffer.Students = students
      End If
      End Sub

      Public Overrides Sub Input0_ProcessInputRow(Row As Input0Buffer)
      If initialRow Then
      ' This is for the first input row only

      ' Create a new output row
      Output0Buffer.AddRow()

      ' Now fill the variables with the values from the input row
      teacher = Row.Teacher
      students = Row.Student
      ElseIf (Not initialRow) And (teacher <> Row.Teacher) Then
      ' This isn't the first row, but the teacher did change

      ' Fill the columns of the existing output row with values
      ' from the variable before creating a new output row
      Output0Buffer.Teacher = teacher
      Output0Buffer.Students = students

      ' Create a new output row
      Output0Buffer.AddRow()

      ' Now fill the variables with the values from the input row
      teacher = Row.Teacher
      students = Row.Student
      ElseIf (Not initialRow) And (teacher = Row.Teacher) Then
      ' This isn't the first row, and the teacher did not change

      ' Concatenate the student's name to the variable
      students += "," & Convert.ToString(Row.Student)
      End If
      End Sub

      ' Little explanation:
      ' Rows are created in memory with .AddRow()
      ' and will be submitted to the output when a
      ' new / subsequent row is created or when
      ' the last buffer has been finished.
      End Class

      Note: You can change the delimiter in the line that concatenates the students to the list (students += "," + Row.Student; in C# and students += "," & Convert.ToString(Row.Student) in VB.Net).


      6) Testing
      For testing purposes I added a derived column and a couple of data viewers.
      The result

















      

      Conclusion: both methods give the same result. For a SQL database source, the T-SQL method is probably a little faster. If you can do the sort in the source, the performance differences will diminish.

      In one of my next blog posts I will show you how to split a comma separated list into records.
      Case
      When I join two flows in SSIS, the flows need to be sorted. According to the Performance Best Practices I try to sort them in the source components because that's better for performance. Of course that's not always possible, for instance when your source is a flat file; then you need the SSIS Sort component.

      But watch out when combining these two sort methods, because SSIS and SQL Server don't sort the same way! The SSIS sort is case-sensitive and the SQL Server sort is case-insensitive (by default). See the result when you combine these two: it will lead to unwanted situations and missing joins. Is there a solution?





















      Solution
      This has to do with the different sorting collations. SSIS uses a Windows collation (case-sensitive) and SQL Server uses a SQL collation (case-insensitive, by default). You can either adjust the SQL sort to SSIS or the SSIS sort to SQL.
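      A quick way to see the difference is to sort the same values with both collation flavors (a sketch with made-up values):

      --Case-insensitive vs case-sensitive sorting of the same values
      SELECT v FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(v)
      ORDER BY v COLLATE Latin1_General_CI_AS; -- case-insensitive order

      SELECT v FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(v)
      ORDER BY v COLLATE Latin1_General_CS_AS; -- case-sensitive order differs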

      Adjust SQL Sort to SSIS: Case-Sensitive ORDER BY
      You can solve this by changing the source query of the sorted source:
      -- Notice the extra COLLATE
      SELECT myStringColumn
      FROM myTable
      ORDER BY myStringColumn
      COLLATE Latin1_General_CS_AS_WS


      The suffix CS_AS_WS stands for case-sensitive, accent-sensitive, kana-insensitive, width-sensitive. See the complete suffix list on MSDN. This new query will result in a correct join in SSIS.
      Both the same sort






















      Adjust SSIS Sort to SQL: Case-Insensitive Sort Transformation
      Important: This solution has one side effect: the merge join will also be case-insensitive!

      1) Source
      In the sorted OLE DB source you already used the advanced editor to tell SSIS this source is sorted with an ORDER BY in the query, but you now also need to tell SSIS that it is sorted case-insensitively by setting the ComparisonFlags property to Ignore case. Note: this doesn't change the actual sorting; you are just telling SSIS how it is sorted.
      Indicate that source is sorted Case-Insensitive
























      2) Sort
      In the SORT Transformation you also need to set the Comparison Flags property to Ignore case. Note: This will change the actual sorting.
      
      Setting the Comparison Flags property to Ignore case

























      3) Result
      Now both flows are sorted the same and the Merge Join works as a Case-Insensitive Inner join (similar to a T-SQL INNER JOIN).
      Like a T-SQL Inner Join


















      Case
      I'm searching for the right column in a large Microsoft SQL database. Is there a way to find all columns with a certain name without manually browsing through all tables?

      Solution
      A colleague once gave me this handy query. It searches for column names in an entire Microsoft SQL database. An indispensable query when you have to do a little research while building SSIS packages.

      --Search for column name in database
      SELECT tab.name as [Table/View]
      , CASE
      WHEN tab.xtype = 'V' THEN 'view'
      ELSE 'table'
      END as [Type]
      , col.name as [ColumnName]
      , typ.name as [Datatype]
      FROM dbo.syscolumns as col
      , dbo.sysobjects as tab
      , dbo.systypes as typ
      WHERE col.id = tab.id
      AND col.xtype = typ.xtype
      AND tab.xtype in ('V','U')
      -- Pick one of these possibilities
      --AND col.name = 'version'
      --AND col.name like '%version%'
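      On more recent SQL Server versions you can write the same search against the newer catalog views (a sketch that produces the same output):

      --Search for column name in database (newer catalog views)
      SELECT tab.name as [Table/View]
      , CASE tab.type
      WHEN 'V' THEN 'view'
      ELSE 'table'
      END as [Type]
      , col.name as [ColumnName]
      , typ.name as [Datatype]
      FROM sys.columns as col
      INNER JOIN sys.objects as tab
      ON (col.object_id = tab.object_id)
      INNER JOIN sys.types as typ
      ON (col.user_type_id = typ.user_type_id)
      WHERE tab.type in ('V','U')
      --AND col.name = 'version'
      --AND col.name like '%version%'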

      Select the right database and uncomment and adjust one of the commented lines.
      The result