Case
I want to create an Azure Function with C# code. How do I create and deploy one in Azure (and use it in Azure Data Factory)?
Write C# in Visual Studio Code to create an Azure Function

Solution
In this blogpost we will create and deploy a very simple 'hello world' Azure Function with an HTTP trigger, which you can extend to your own needs. After that, we can call it from an Azure Data Factory pipeline with an Azure Function activity.

As an alternative you could also create an Azure Function with a Blob Storage trigger that executes when a new file arrives, but we would rather use that same trigger type to start an Azure Data Factory pipeline that then starts this Function followed by other pipeline activities. This way we have one place that does the triggering/orchestration.

1) Create new Azure Function project
Please first follow the steps of our previous post on how to prepare Visual Studio Code for creating Azure Functions with C#. After that open Visual Studio Code and perform the steps below to create your first hello world Function.
  • In Visual Studio Code click on the Azure icon in the left menu.
  • In the newly opened pane click on the folder with the lightning icon on it to create a new project. (An additional function can later on be added to the project with the lightning-plus icon.)
  • Select the folder of the new project (or use the Browse... option).
  • Next select C# as the coding language.
  • Select HTTP trigger as the template for this example.
  • Enter the Function name. This is the name of the function within the project (a project can contain multiple functions). Example: myCSharpFunction
  • Provide a namespace: Bitools.Function
  • For this test example use Anonymous as the Authorization level.
  • The project has now been created, but there could be an additional action in step 2.
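As a side note: if you would rather scaffold the project from the command line, the Azure Functions Core Tools (installed in our previous post) can do roughly the same. A sketch, assuming the same names as above:

func init Bitools.Function --worker-runtime dotnet
cd Bitools.Function
func new --template "HTTP trigger" --name myCSharpFunction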
Create new Azure Function project

2) Unresolved dependencies
This extra step seems to be a bug in the Azure Function extension for C# in Visual Studio Code. After the project has been generated, Visual Studio Code will show the following error in the lower right corner. This seems to refer to some missing references.
There are unresolved dependencies.
Please execute the restore command to continue.

If you don't get this error then Microsoft probably solved the bug. When you do get it, the only thing you have to do is click the Restore button. After that some extra files will be added in the obj folder of your project. (See the previous post to compare extension versions.)
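Behind the scenes the Restore button performs a NuGet package restore. If the prompt does not appear but you still see reference errors, you can most likely achieve the same from the integrated terminal in the project folder:

dotnet restore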
Execute the restore command

Spot the differences

3) Code in myCSharpFunction.cs
The file 'myCSharpFunction.cs' contains your C# code. The filename will differ if you gave your function a different name. Below you see the standard generated code with some extra comment lines for those who are new to C#. For this example we do not extend the code.
// This section lists the namespaces that this function will be using frequently,
// and saves the programmer from specifying a fully qualified name every time that
// a method that is contained within is used
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace Bitools.Function
{
    public static class myCSharpFunction
    {
        // Main function and entry point of this Azure Function
        [FunctionName("myCSharpFunction")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            // Log information
            log.LogInformation("C# HTTP trigger function processed a request.");

            // Retrieve parameter 'name' from the querystring
            string name = req.Query["name"];

            // Also try to retrieve the same parameter from the request body
            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            // If not found in the querystring then use the request body
            name = name ?? data?.name;

            // If name is still empty return a message that a name parameter
            // is expected, else respond with a greeting
            string responseMessage = string.IsNullOrEmpty(name)
                ? "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response."
                : $"Hello, {name}. This HTTP triggered function executed successfully.";

            return new OkObjectResult(responseMessage);
        }
    }
}

4) Debug locally
Now we are going to test the Azure Function locally on our Windows device. There are multiple ways to start debugging; pressing F5 is probably the easiest. See the animated GIF for more options.
  • In the Run menu at the top of the screen you will find the Start Debugging option.
  • The terminal pane at the bottom will show a lot of details. Wait a few seconds for it to finish and Ctrl-click the green URL.
  • A new browser window will open showing a message that it cannot find the name parameter.
  • In the browser add a querystring after the URL: ?name=Joost (or your own name of course). Now it will respond with a greeting.
  • Close the browser and then hit the disconnect icon at the top to stop debugging.
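You can also test the local function without a browser. A sketch using curl; the Azure Functions host typically listens on port 7071, but check the exact URL in your terminal output:

curl "http://localhost:7071/api/myCSharpFunction?name=Joost"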
Debugging your function locally

5) Create Azure Function in Azure Portal
Before you can deploy your newly created function you first need to create a Function App in the Azure portal.
  • Go to the Azure Portal and click on Create a resource
  • Search for function and select Function App
  • Click on the Create button
  • On the Basics tab you find the most important settings
  • Select your Subscription and Resource Group
  • Enter a unique Function App name
  • Select .NET Core as Runtime stack
  • Select 3.1 as Version
  • Select the Region (probably the same as your Resource Group)
  • Optionally go to the Hosting tab for extra settings
  • Choose a new or existing Storage account
  • Change the Plan type (default: Serverless)
  • Optionally go to the Monitoring tab for extra settings
  • Disable or enable Application Insights and change its name
  • Click the Review + create button
  • Review the settings and click on the Create button
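For those who prefer scripting over clicking: the same Function App can most likely be created with the Azure CLI. A sketch where the resource group, app name, storage account and region are placeholders:

az functionapp create ^
  --resource-group myResourceGroup ^
  --name mybitoolsfunctions ^
  --storage-account mystorageaccount ^
  --consumption-plan-location westeurope ^
  --runtime dotnet ^
  --functions-version 3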
Create new Azure Function (app) on Azure portal

Note 1: you cannot create an Azure Function App with a Windows worker (.NET Core) if there is already a Linux worker (Python) in that same resource group and region.

Note 2: you could also perform these steps within Visual Studio Code during deployment.

6) Deploy Azure Function to Azure Portal
Now that we have an (empty) Azure Function App in the Azure portal, we can deploy our newly created Azure Function to this resource.
  • In Visual Studio Code click on the Azure icon in the left menu.
  • In the newly opened pane click on the blue arrow (deploy) icon
  • In the drop down select your Azure Functions App from the previous step
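Alternatively, deployment should also work from the terminal with the Azure Functions Core Tools, assuming you are signed in to Azure and reuse the Function App name from the previous step:

func azure functionapp publish mybitoolsfunctions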
Deploy Azure Functions from Visual Studio Code

7) Testing in portal
Now that we have deployed our project to Azure Functions we can test it in the Azure Portal. For this example we will use the POST method.
  • Go to the Azure Portal and then open your Azure Functions App
  • In the left menu click on Functions
  • In the list of functions click on your function (only one in this example)
  • In the left menu click on Code + Test
  • Click on the test button (top center)
  • Change the HTTP method to post
  • Select one of the keys
  • Enter a JSON message in the body: {"name":"Joost"} (the name key is case-sensitive)
  • Click on the Run button and see the result
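Outside the portal you can call the deployed function with any HTTP client. A sketch with curl; the app name and function key are placeholders:

curl -X POST "https://mybitoolsfunctions.azurewebsites.net/api/myCSharpFunction?code=<function key>" -H "Content-Type: application/json" -d "{\"name\":\"Joost\"}"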
Testing in the Azure Portal

8) Executing from Azure Data Factory
Now if you want to execute this new Azure Function in Azure Data Factory with the Azure Function activity, you can follow the steps in this previous post. However, without code changes it will return an error stating that the response is invalid: 3603 - Response Content is not a valid JObject
3603 - Response Content is not a valid JObject

At the moment the function returns a plain string, but Azure Data Factory expects a JObject (J = JSON). Any return type other than a JObject will throw the error above. To overcome this we need a minor code change by adjusting the return construction at the end of the method.
// This section lists the namespaces that this function will be using frequently,
// and saves the programmer from specifying a fully qualified name every time that
// a method that is contained within is used
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace Bitools.Function
{
    public static class myCSharpFunction
    {
        // Main function and entry point of this Azure Function
        [FunctionName("myCSharpFunction")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            // Log information
            log.LogInformation("C# HTTP trigger function processed a request.");

            // Retrieve parameter 'name' from the querystring
            string name = req.Query["name"];

            // Also try to retrieve the same parameter from the request body
            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            name = name ?? data?.name;

            // If name is still empty return a BadRequest, else respond
            // with a greeting wrapped in a JSON object (a JObject)
            return name != null
                ? (ActionResult)new OkObjectResult(new { message = "Hello " + name })
                : new BadRequestObjectResult("Pass a name in the query string or in the request body for a personalized response.");

            /*
            string responseMessage = string.IsNullOrEmpty(name)
                ? "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response."
                : $"Hello, {name}. This HTTP triggered function executed successfully.";

            return new OkObjectResult(responseMessage);
            */
        }
    }
}
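With this change the function returns a JSON object such as {"message":"Hello Joost"}. In a later pipeline activity you can then reference that property via the activity output; a sketch, assuming the activity is named 'Azure Function1':

@activity('Azure Function1').output.message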

Below are a couple of screenshots on how to configure and test this in Azure Data Factory. You might want to store the Function key in Azure Key Vault to avoid keys in your ETL/ELT code.
Set up the Azure Function Activity in Azure Data Factory

After configuring the Azure Function activity you can hit the debug button and see the result. This output could then be used as input for successive pipeline activities.
Successfully executing and getting the response

Conclusion
First a big thank you to colleague Walter ter Maten for always helping me out with the C# stuff. In this blog post you learned how to create, test and deploy your first (very basic) Azure Function App with C# code. Then we also showed you how to execute it from Azure Data Factory. In a couple of follow-up posts we will show you how to build some useful functions for DWH projects and cover some technical stuff like adding Azure Key Vault to the game. Also check out the Python version of this blogpost.


Case
I want to create Azure Functions on my Windows device, but which tools and extensions do I need to install?
Write Python or C# in Visual Studio Code to create Azure Functions

Solution
In this blogpost we will show you which tools you need to install to create an Azure Function with either .NET or Python code. The screenshots are of the current versions at the time of writing, but you might just want to take the latest stable version when downloading. In the upcoming Azure Functions posts we will create some basic Hello World functions to show the basics of creating and deploying your first Function. After that we will show some more useful functions for Data Warehousing developers, for example to convert Excel or XML files to a format that is easier to read for Azure Data Factory or Synapse PolyBase: CSV.

1) Download and install Visual Studio Code
For this blog post we will be using Visual Studio Code instead of the regular Visual Studio. Where the 'regular' Visual Studio is a so-called Integrated Development Environment (IDE), the newer Visual Studio Code is more of a lightweight source code editor. Ideal for some coding with PowerShell, C# or Python. Use the link below to download Visual Studio Code and then install it.
https://code.visualstudio.com/download
Installing Visual Studio Code

2) Install extensions for Python
If you want to use Python for your Azure Functions you need to install Python for Windows and the Python extension for Visual Studio Code. First download and install Python for Windows 64-bit. The default is a 32-bit version, but you can also find the 64-bit version slightly down the page (search for Windows x86-64 executable installer).
https://www.python.org/downloads/windows/
Install Python for Windows (64-bit)

Then install the Python extension for Visual Studio Code. When clicking on the install button on the website it will ask to open it with Visual Studio Code. Within Visual Studio Code you have to click on the install button again. After installation it will ask you to point to the previously installed Python interpreter.
https://marketplace.visualstudio.com/items?itemName=ms-python.python
Install Python extension for Visual Studio Code

Note: You can also use the Extensions icon in the left menu of Visual Studio Code to search for this specific extension.

There is one last installation required: the linter Pylint. Visual Studio Code will ask for it when creating your first Azure Function with Python code: Linter pylint is not installed.
Install Linter pylint for Visual Studio Code

Installing Linter pylint within Visual Studio Code

3) Install extensions for C#
If you want to create Azure Functions with C# then you first need to install the .NET Core SDK. Make sure to install the version to Build apps (Run apps is not sufficient). The minimum version is .NET Framework 4.7.2 or .NET Core 2.2, but try the most recent version depending on your needs and the Runtime version of Azure Functions. If you forget this then you will receive an error while trying to create a C# Function.
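You can verify the installation from a command prompt; this should list at least one SDK version:

dotnet --list-sdks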
Receiving an error when the .NET Core SDK is not installed

Furthermore you should install the C# extension from Microsoft. This extension is not mandatory, but it will be recommended when creating your first C# Azure Function.
Install C# extension for Visual Studio Code

Note: You can also use the Extensions icon in the left menu of Visual Studio Code to search for this specific extension.

4) Install extensions for Azure Functions
The next extension is Azure Functions for Visual Studio Code. When clicking on the install button on the website it will ask to open it in Visual Studio Code. Within Visual Studio Code you have to click on the install button again.
https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions
Azure Functions for Visual Studio Code

Note: You can also use the Extensions icon in the left menu of Visual Studio Code to search for this specific extension.

5) Install Azure Functions Core Tools
To be able to debug the Azure Function code locally we need the Azure Functions Core Tools, but to install those we first need to install the Node Package Manager (NPM), which is included in Node.js (more detailed info here).
https://nodejs.org/en/download/
Install Node.js with NPM

The last step of this installation is to open a Command Prompt (or PowerShell prompt) in Administrator mode to install the Azure Functions Core Tools. With the command npm -v you can check your npm version. Now use the following command for the installation (more detailed info here):
npm i -g azure-functions-core-tools@3 --unsafe-perm true
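When the installation has finished you can verify it from the same prompt; this should print a version number:

func --version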
Install Azure Functions Core Tools via command prompt

Conclusion
In this introduction post you read which tools and extensions to install to create Azure Functions. Quite a lot of installations, but manageable when following the steps above. We focused on the most popular languages (in the DWH scene), C# and Python, but there are way more languages to choose from like Java(Script) or PowerShell. Each with its own extensions.

As mentioned before the next post about Azure Functions will be about deploying your first simple function with Python or C#. After that we will focus on the more functional Azure Functions solutions, but with a focus on the DWH scene. Also bringing Azure Key Vault to the game is a must for Azure Functions.

Case
I want to calculate a hash value for a couple of columns in SSIS. In T-SQL you can use HASHBYTES, but that doesn't work for other sources like flat files, and for SQL 2012/2014 the input is limited to only 8000 bytes. Is there an alternative for HASHBYTES?
Calculating a hash value over multiple columns

Solution
There are several alternatives for the T-SQL HASHBYTES. First of all there are various custom components available for SSIS, like the SSIS Multiple Hash on CodePlex, but if you don't want to (or cannot) use custom components, you can accomplish the same result with a little .NET scripting. If you really want to stick to T-SQL, then you could also first stage your files in a table and then calculate the hash with T-SQL afterwards. This blog will show you the scripting solution.

But first, why do you need a hash? When you want to keep track of history with a Persistent Stage, Data Vault or Data Warehouse, you want to know whether the record from the stage layer is different from the one you have in your historical data layer. You could check each column one by one, but when you have a whole bunch of columns that could be a lot of work and a bit slow.

A hash in ETL is used to generate a single, corresponding (but shorter) value for a whole bunch of columns. It is stored in the stage table as a new column. If one character changes in one of those columns then the hash value will also be different. When comparing the two records (one from the stage layer and one from the historical layer) you now only have to compare the hash value. If it did not change you know you don't have to process the record in your historical layer. Since you only want to calculate the hash once (in the stage package) you will also store it in the historical layer.
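To make that comparison concrete, a hypothetical T-SQL check between a stage table and a history table could look like this (table and column names are made up for illustration):

-- Find records that exist in both layers but have a changed hash
SELECT  s.BusinessKey
FROM    stg.Customer AS s
JOIN    his.Customer AS h
  ON    h.BusinessKey = s.BusinessKey
WHERE   h.Hash <> s.Hash;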

Now it is time to explain the scripting solution.

1) Starting point
The starting point of this example is a Data Flow Task with a Flat File source component.
Flat File Source

2) Script Component - Input Columns
Add a new Script Component (transformation) to the Data Flow Task. Give it a suitable name and connect it to your flow. Then edit it and select all columns you want to hash on the Input Columns pane. Since we are not changing the existing columns you can keep the default Usage Type 'ReadOnly'.
Script Component Input Columns

Which columns do you want to hash? The three most common options:
  1. If you do not know the Primary Key: select all columns to calculate the hash.
  2. If you do know the Primary Key: select all columns except the Primary Key to calculate the hash.
  3. If the Primary Key consists of multiple columns you could even calculate a separate hash for the key columns only.
3) Script Component - Output Column
We need to store the calculated hash in a new column. Go to the Inputs and Outputs pane and add a new column in Output 0. The data type is string and the size depends on which hash algorithm you want to use. For this example we use the MD5 algorithm, which returns a 128-bit hash. Converted to an ASCII string that is a 32-character string (containing only hexadecimal digits).
Script Component Inputs and Outputs

4) Script Component - The script preparation
Now we are almost ready to add the actual script. Go to the Script pane and select your scripting language; this example will be in C#. Then hit the Edit Script... button to start the VSTA environment. This is a new instance of Visual Studio and will take a few moments to start.
Edit Script...

Optional:
I always start by removing all unnecessary methods and comments to keep the code clean. For this example we do not need the PreExecute and PostExecute methods, and I do not want to keep the default help comments.
Clean up before start

5) Script Component - The code
First we need to add two extra usings to shorten the code. Unfold the Namespaces region at the top and add the following usings:
using System.Security.Cryptography;
using System.Text;

Then locate the method called Input0_ProcessInputRow and add a new GetMd5Hash method below this existing method (below its closing }). The new method is copied from this MSDN page; I only changed the encoding to Unicode (see note 1):
static string GetMd5Hash(MD5 md5Hash, string input)
{
    // Convert the input string to a byte array and compute the hash.
    byte[] data = md5Hash.ComputeHash(Encoding.Unicode.GetBytes(input));

    // Create a new StringBuilder to collect the bytes
    // and create a string.
    StringBuilder sBuilder = new StringBuilder();

    // Loop through each byte of the hashed data
    // and format each one as a hexadecimal string.
    for (int i = 0; i < data.Length; i++)
    {
        sBuilder.Append(data[i].ToString("x2"));
    }

    // Return the hexadecimal string.
    return sBuilder.ToString();
}


And at last change the code of the existing method Input0_ProcessInputRow to:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string Separator = "|";
    string RowData = "";

    using (MD5 md5Hash = MD5.Create())
    {
        Row.Hash = GetMd5Hash(md5Hash, RowData =
            (Row.Title_IsNull ? "" : Row.Title.ToString()) + Separator +
            (Row.FirstName_IsNull ? "" : Row.FirstName.ToString()) + Separator +
            (Row.MiddleName_IsNull ? "" : Row.MiddleName.ToString()) + Separator +
            (Row.LastName_IsNull ? "" : Row.LastName.ToString()));
    }
}


The code above first concatenates all columns with a separator between them (see note 2) and checks whether each value isn't NULL, because we cannot add NULL to a string (see note 3). You will see that it repeats this piece of code for each column before calling the hash method:
(Row.Title_IsNull ? "" : Row.Title.ToString()) + Separator
For the first record in our example it will hash the following text: Mr.|Syed|E|Abbas
And for the third row, which contains a null value, it will hash the following text: Ms.|Kim||Abercrombie

6) Testing the code
After closing the VSTA editor and clicking OK in the Script Component to close it, add a dummy Derived Column behind it and add a Data Viewer to see the result.
Similar result to T-SQL HASHBYTES

Note 1:
When you want the exact same result as with T-SQL HASHBYTES then you have to make sure you use the same encoding, otherwise you get a different hash. In the method GetMd5Hash, on the first line of code, you see Encoding.Unicode.GetBytes(. There are more options besides Unicode, for example: ASCII, UTF7, UTF8, UTF32, etc. However, as long as you don't have to compare hashes generated by two different methods (T-SQL and .NET) it doesn't matter. In this Stack Overflow post you will find more examples.
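For example, to get the same hash in T-SQL as this script (which uses Encoding.Unicode) you would feed HASHBYTES an NVARCHAR value. A sketch for the first record of our example; LOWER and style 2 make the output match the lowercase hexadecimal string of the script:

-- N'...' (NVARCHAR) matches Encoding.Unicode in the .NET script
SELECT LOWER(CONVERT(CHAR(32), HASHBYTES('MD5', N'Mr.|Syed|E|Abbas'), 2));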

Note 2:
The column separator is added to prevent unwanted matches. Suppose you have these two records with two columns:

Column1   Column2
123       456
12        3456

Without the separator both records would be concatenated to 123456 and therefore generate the same hash. With the separator you have two different values to hash: 123|456 and 12|3456. Choose your separator wisely: the number 3 would not be a wise choice in this case.

Note 3:
In the code you see that the columns are checked for null values because you cannot add null to a string. The null values are replaced with an empty string. However, this shows a bit of an imperfection of this method, because a string with a null value isn't the same as an empty string. To overcome this you could use a different replacement string, one that is unlikely to occur in your data. For Numeric and Date data types you could just add an empty string, something like:
(Row.MyNumberColumn_IsNull ? "" : Row.MyNumberColumn.ToString()) + Separator
(Row.MyDateColumn_IsNull ? "" : Row.MyDateColumn.ToString()) + Separator


Note 4:
MD5 only uses 128 bits and there are better, safer (but also a bit slower) methods to calculate hashes:
SHA and SHA1 - 160 bits
SHA2_256 - 256 bits
SHA2_512 - 512 bits

Safer? As long as you don't use it to hash passwords you are still OK with MD5.
Better? In rare cases two different strings could return the same MD5 hash, but you have a higher chance of winning the galaxy lottery.

Rather use SHA2_512? Just use this code instead:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string Separator = "|";
    string RowData = "";

    using (SHA512 shaHash = new SHA512Managed())
    {
        Row.hash2 = GetShaHash(shaHash, RowData =
            (Row.Title_IsNull ? "" : Row.Title.ToString()) + Separator +
            (Row.FirstName_IsNull ? "" : Row.FirstName.ToString()) + Separator +
            (Row.MiddleName_IsNull ? "" : Row.MiddleName.ToString()) + Separator +
            (Row.LastName_IsNull ? "" : Row.LastName.ToString()));
    }
}

static string GetShaHash(SHA512 shaHash, string input)
{
    // Convert the input string to a byte array and compute the hash.
    byte[] data = shaHash.ComputeHash(Encoding.Unicode.GetBytes(input));

    // Create a new StringBuilder to collect the bytes
    // and create a string.
    StringBuilder sBuilder = new StringBuilder();

    // Loop through each byte of the hashed data
    // and format each one as a hexadecimal string.
    for (int i = 0; i < data.Length; i++)
    {
        sBuilder.Append(data[i].ToString("x2"));
    }

    // Return the hexadecimal string.
    return sBuilder.ToString();
}

Too many columns => too much coding?
In my book Extending SSIS with .NET Scripting you will find a Script Component example that simply loops through all columns to generate the hash. No money to buy it? I used this code as the base for that script.
Another alternative is to generate the Script Component and its code with BIML. Here is an example of a Script Component in BIML; getting the hash to work is a bit of a challenge, but doable.
Case
I want to use the users from my Active Directory in my Data Warehouse for an Employee Dimension. How can I accomplish that in SSIS?

Solution
With a little .NET scripting in a Script Component you can accomplish an Active Directory source in your Data Flow Task. Before you start you need to figure out which fields are available in your Active Directory. If you don't know the field names you could set a breakpoint on the foreach loop and add a watch on the result variable. Then you can browse through the properties of that variable to find all available fields.

1) Script Component Source
Create a Data Flow Task and add a Script Component, choosing "Source" as the type. After that give your source a suitable name like "SRC_SCR - Active Directory Users".
Script Component - Source

2) Output columns
Next edit the Script Component and go to the Inputs and Outputs pane. This is where we need to specify all the new output columns and data types. For my Active Directory example I have added 5 string (DT_STR) columns: Fullname, Firstname, Surename, Department and Manager. You have to figure out the required length. You could start with the default 50 or change it to a safer 255.
Add new output columns

3) Edit script
Go back to the first page, choose C# as your scripting language and then hit the Edit Script button to open the VSTA environment.
Edit script

4) Reference
To tell our script about Active Directory, we first need to add a new reference to the .NET assembly System.DirectoryServices. In the Solution Explorer right-click References and choose Add Reference... Then scroll down, select System.DirectoryServices and click OK to confirm.
Add reference

5) The Script - Namespaces
Under Namespaces add a new using for our newly referenced assembly: using System.DirectoryServices;. This allows you to shorten the code.
Add using

6) The Script - CreateNewOutputRows
Now replace your CreateNewOutputRows method with the one below and add the GetPropertyValue method within your class, below the existing methods. You can remove the PostExecute and PreExecute methods since we don't need them in this example.

The CreateNewOutputRows method adds new rows to the buffer by calling Output0Buffer.AddRow(); within a loop construction and then fills all columns with data from the Active Directory. Before the foreach loop you find the code to connect to the Active Directory with an optional filter.
public override void CreateNewOutputRows()
{
    // Specify the connection string of your domain
    // @mycompany.com => LDAP://DC=mycompany,dc=com
    // Consider using a variable or parameter instead
    // of this hardcoded value. On the other hand,
    // how often does your domain change?
    string domainConnectionString = "LDAP://DC=ilionx,dc=com";

    using (DirectorySearcher ds = new DirectorySearcher(new DirectoryEntry(domainConnectionString)))
    {
        ds.Filter = "(&" +
                    "(objectClass=user)" + // Only users and not groups
                    "(department=*)" +     // All departments
                    "(givenname=j*)" +     // Firstname starts with j
                    ")";
        // See ds. for more options like PageSize.
        //ds.PageSize = 1000;

        // Find all persons matching your filter
        using (SearchResultCollection results = ds.FindAll())
        {
            // Loop through all rows of the search results
            foreach (SearchResult result in results)
            {
                // Add a new row to the buffer
                Output0Buffer.AddRow();
                // Fill all columns with the value from the Active Directory
                Output0Buffer.Fullname = GetPropertyValue(result, "cn");
                Output0Buffer.Surename = GetPropertyValue(result, "sn");
                Output0Buffer.Department = GetPropertyValue(result, "department");
                Output0Buffer.Manager = GetPropertyValue(result, "manager");
                Output0Buffer.Firstname = GetPropertyValue(result, "givenname");
            }
        }
    }
}

// Extra method to avoid having an if construction around each column.
// It checks whether it can find a value. If not it returns an empty string.
private static string GetPropertyValue(SearchResult Results, string Property)
{
    // A null value results in count zero
    if (Results.Properties[Property].Count > 0)
    {
        return Results.Properties[Property][0].ToString();
    }
    else
    {
        return "";
    }
}


7) The result
To test the result add a dummy Derived Column and a Data Viewer to see the values. If it works you can add the rest of the components to the Data Flow Task to fill your dimension table.
The result of my script

Summary
A very simple and short script to get data from your Active Directory. For more filter examples visit MSDN. For very large Active Directories you have to play with the PageSize property.
Note that there are also scriptless options, like an ADO.NET source or third-party components. Each solution has its own pros and cons.




Case
I am filling an Object variable with an Execute SQL Task and I want to use it in a Foreach Loop Container (Foreach ADO Enumerator), but the Foreach Loop stays empty. So I want to check the value of my Object variable. However, debugging the package does not show me the value of Object variables. How can I see the content of my Object variable?

No (readable) value for Object variables

Solution
A solution could be to use a Script Task after the Execute SQL Task to show the content of the Object variable. The script below shows the top (x) records in a MessageBox. The code doesn't need any changes; the only thing you could consider changing is the number of records to show in the MessageBox (see the C# variable maxRows).
Getting content of Object variable

1) Add a Script Task
Add a new Script Task to the surface of your Control Flow and connect it to your Execute SQL Task. Then edit the Script Task and provide one Object variable in the property ReadOnlyVariables or ReadWriteVariables. This should of course be the same Object variable as in your Execute SQL Task.
Provide one Object variable

2) Edit Script
Make sure to select Microsoft Visual C# as the Script Language and then hit the Edit Script button to open the VSTA environment. Then first locate the Namespaces region to add a using for System.Data.OleDb.
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.OleDb; // Added
#endregion

Then scroll down, locate the Main method ("public void Main()") and replace it with the code below.
public void Main()
{
    // Show max number of data rows in a single messagebox
    int maxRows = 3;

    /////////////////////////////////////////////////////////////////////
    // No need to change lines below
    /////////////////////////////////////////////////////////////////////

    // Create a table object to store the content of the object variable
    OleDbDataAdapter dataAdapter = new OleDbDataAdapter();
    DataTable myTable = new DataTable();

    // Create message string to show the content of the object variable
    string message = "";
    string header = "Error";

    // Five checks before looping through the records in the object variable
    ////////////////////////////////////
    // 1) Is a variable provided?
    ////////////////////////////////////
    if (Dts.Variables.Count.Equals(0))
    {
        message = "No read-only or read-write variables found";
    }
    ////////////////////////////////////
    // 2) Multiple variables provided
    ////////////////////////////////////
    else if (Dts.Variables.Count > 1)
    {
        message = "Please provide only 1 read-only or read-write variable";
    }
    ////////////////////////////////////
    // 3) Is it an object variable?
    ////////////////////////////////////
    else if (!Dts.Variables[0].DataType.ToString().Equals("Object"))
    {
        message = Dts.Variables[0].Name + " is not an Object variable";
    }
    ////////////////////////////////////
    // 4) Is it null or not a table?
    ////////////////////////////////////
    else
    {
        try
        {
            // Try to fill the datatable with the content of the object variable.
            // It will fail when it is null or not containing a table object.
            dataAdapter.Fill(myTable, Dts.Variables[0].Value);
        }
        catch
        {
            // Failing the fourth check
            message = Dts.Variables[0].Name + " doesn't contain a usable value";
        }
    }

    ////////////////////////////////////
    // 5) Does it contain records?
    ////////////////////////////////////
    if (myTable.Rows.Count > 0)
    {
        int j = 0;
        // Loop through all rows in the dataset but don't exceed maxRows
        for (j = 0; j < myTable.Rows.Count && j < maxRows; j++)
        {
            // Get all values from a single row into an array
            object[] valuesArray = myTable.Rows[j].ItemArray;

            // Loop through the value array and columnname collection
            for (int i = 0; i < valuesArray.Length; i++)
            {
                message += myTable.Rows[j].Table.Columns[i].ColumnName + " : " + valuesArray[i].ToString() + Environment.NewLine;
            }
            // Add an empty row between each data row
            message += Environment.NewLine;
        }

        // Create header
        header = "Showing " + j.ToString() + " rows out of " + myTable.Rows.Count.ToString();
    }
    else if (!message.Equals(""))
    {
        // Don't do anything.
        // Record count is 0, but another validation already failed.
    }
    else
    {
        // Record count is 0
        message = Dts.Variables[0].Name + " doesn't contain any rows";
    }

    // Show message with custom header
    MessageBox.Show(message, header);

    Dts.TaskResult = (int)ScriptResults.Success;
}
Now close the VSTA environment and click on OK in the Script Task editor to finish it.


3) The result
Now run the package to see the result. I tried to make it a bit monkey-proof by adding some checks in the code. If you provide a good and filled variable then it will show the data. Otherwise it will show an error telling you what's wrong.
The result

Case
For a client I need to read hundreds of bus route matrices and they all vary in size. This makes it hard to read them dynamically with a Foreach Loop Container because the number of columns differs per file. And I don't want to create hundreds of Data Flow Tasks by hand. Even BIML won't help this time, because the routes change regularly and I don't want to generate and deploy packages every day.
I need to dynamically unpivot data within the Data Flow Task. How do I solve this within SSIS?
Dynamically unpivot data

Solution
The trick for this case is to read everything as one big column and then dynamically split and unpivot the column in a Script Component Transformation. The unpivot output will always have three columns: Start Station, End Station and Distance. And the good news is that it has only a few lines of relatively easy code.
The solution

1) Source with one big column
Change your Flat File Connection Manager so that it will read everything as one big column. Make sure the column is big enough to fit all data. For this example I called the column 'ColumnOne'.
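To make the file format concrete: a hypothetical (shortened) distance matrix could look like this, where the first row contains the stations and every following row starts with the start station (the names and distances are made up):

;Amsterdam;Utrecht;Rotterdam
Amsterdam;0;41;78
Utrecht;41;0;57
Rotterdam;78;57;0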
Flat File with one column only

2) Script Component Transformation Input
Drag a Script Component on the surface and choose Transformation. Connect it to your source. Then edit the Script Component and go to the 'Input Columns' page. On that page select the column with all the matrix data as ReadOnly.
Input Columns

3) Script Component Transformation Output
On the 'Inputs and Outputs' page we need to add the new output columns. For this example I need a StartStation (string), EndStation (string) and the Distance (int).
Another important step is setting the SynchronousInputID property (of Output 0) to 'None'. This makes the transformation asynchronous, which means the number of rows in could be unequal to the number of rows out. And that means the input buffer with records isn't reused in this component; a new output buffer will be created.
Inputs and Outputs

4) The script
Go to the Script page, choose C# as the scripting language and hit the Edit Script button. Now copy the contents of my Input0_ProcessInputRow method to your Input0_ProcessInputRow method. There are also two variables called Stations and Distances, declared above this method. Copy those to your code and put them in the same place.
I also removed the unused methods PreExecute, PostExecute and CreateNewOutputRows to keep the code clean and mean.
The C# code:
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion

/// <summary>
/// Split and unpivot data
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Define two arrays for distances and stations.
    // The Stations array will be filled only once.
    // The Distances array will change for each row.
    string[] Stations;
    string[] Distances;

    /// <summary>
    /// This method is called once for every row that passes through the component from Input0.
    /// </summary>
    /// <param name="Row">The row that is currently passing through the component</param>
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // The first time this method executes the Stations array
        // is still empty (null). In the true clause of the if-
        // statement we will fill the Stations array.
        // Therefore the second, third, etc. time this method
        // executes we will go to the false clause of the if-
        // statement.
        if (Stations == null)
        {
            // We know that the first row contains the stations.
            // We will add those to the Stations array and use
            // it to determine the end station later on.

            // Split the string from ColumnOne on ; (or your own
            // column separator). The Split returns an array.
            Stations = Row.ColumnOne.Split(';');
        }
        else
        {
            // Now the rows will contain distances (and the StartStation).
            // Split the distances on ; (or your own column separator)
            Distances = Row.ColumnOne.Split(';');

            // Now loop through the distances array, but start at 1 (not at 0)
            // because 0 contains the StartStation in the distances array
            for (int counter = 1; counter < Distances.Length; counter++)
            {
                // Add a new row and then fill the columns
                Output0Buffer.AddRow();
                // Get the distance from the Distances array and convert it to int
                Output0Buffer.Distance = Convert.ToInt32(Distances[counter]);
                // Get the start station from the Distances array (the first item)
                Output0Buffer.StartStation = Distances[0];
                // Get the end station from the Stations array
                Output0Buffer.EndStation = Stations[counter];
            }
        }
    }
}

5) The result
Now close the Script Component and add more transformations or a destination and see what the Script Component does with your data. I added a dummy Derived Column and Data Viewer to see the data before and after the Script Component. For this file I had 27 rows and columns as input and 676 rows as output (26 * 26).



Case
The case is about importing flat files (CSVs) without the necessity of metadata. Because BIML always checks whether the tables are accessible before creating the packages, the first step is to create the tables with BIML and the second step is to create the SSIS packages for transporting the data.

  1. Creating tables in the database 
  2. Packages to fill this database

Because of the size of the solution I've created two separate solutions: one for creating the tables and a second for creating the SSIS packages. You can click on the link to go to the other solution (which I will publish later this month).


Solution - Creating tables with BIML
In this solution we create a BIML script that is going to create the tables; all the columns are defined as strings.
We have to create two BIML scripts: the first script for defining the table definitions and the second for creating the actual package.

1) CSV files
For this example we are using two CSV files (age and sickleave) which are comma separated and have column names on the first row. These column names will be used in the create table statement.
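To illustrate, a hypothetical first few lines of the age CSV file could look like this (the values are made up; the column names match the table definition further down):

AgeID,AgeFrom,AgeTo,AgeCategoryEmployee,AgeCategoryClient
1,0,17,Youth,Minor
2,18,66,Adult,Adult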

The drop folder

Content of a CSV file

2) Table definitions
The first BIML script is called "1_TableDefinitions.biml".
In this script we define the path where the CSV files are located, an array with the names of the CSV files and also some strings which we are going to use further on in the code.
We use two foreach loops: the first one loops through the array with files and the second one loops through the actual file (to extract the column names).

Normally (without the loop) the code should look like this:

<Tables>
    <Columns>
        <Column DataType="Int32" IdentityIncrement="1" IdentitySeed="1" Name="AgeID" />
        <Column DataType="Int32" Name="AgeFrom" />
        <Column Name="AgeTo" />
        <Column DataType="String" Length="255" Name="AgeCategoryEmployee" />
    </Columns>
</Tables>

By default BIML uses INT as the datatype; in this case we use a string.
Now we add the loop in place and the complete code looks like this:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">

<#
string Prefix="Man";

// the location of the csv files
string path = @"D:\Drop\Man";
// Put all the filenames with the extension csv in a string array
string[] myFiles = Directory.GetFiles(path, "*.csv");
// string that will be filled with the filename
string fileName;
// string array for columnnames extracted from CSV
string[] myColumns;
#>
    <Connections>
        <OleDbConnection
            Name="OLEDB_STG_<#=Prefix#>"
            ConnectionString="Data Source=APPL43;Initial Catalog=dummy_STG;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
        </OleDbConnection>
    </Connections>
    <Databases>
        <Database ConnectionName="OLEDB_STG_<#=Prefix#>" Name="dummy_STG"/>
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="dummy_STG" Owner="dbo"/>
    </Schemas>
    <Tables>
        <!-- loop through the array of files -->
        <# foreach(string filePath in myFiles)
        {
            // extract the filename from the path to use as tablename
            fileName = Path.GetFileNameWithoutExtension(filePath);
        #>
        <Table Name="<#=Prefix#>_<#=fileName#>" SchemaName="dummy_STG.dbo">
            <Columns>
                <!-- loop through the file looking for the columnnames -->
                <#
                // read the first row of the csv to extract the columnnames
                // and split on comma to create an array
                StreamReader myFile = new StreamReader(filePath);
                myColumns = myFile.ReadLine().Split(',');
                // close file after reading first line
                myFile.Close();

                // Loop through column array
                foreach(string myColumn in myColumns)
                {
                #>
                <Column Name="<#=myColumn#>" DataType="String" Length="255"></Column>
                <# } #>
            </Columns>
        </Table>
        <# } #>
    </Tables>
</Biml>

<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.IO"#>


3) CreateTables
Secondly we are going to create the BIML script called 2_CreateTables.biml, which creates the actual package containing the create statements to generate the tables.
BIML has a method to create SQL tables: RootNode.Tables.First().GetTableSql();
We use this method to create the SQL create statement of each table.

The code looks like this:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="CreateTables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <#
                // Loop through the table definitions of the first biml
                foreach(var table in RootNode.Tables) {#>
                <ExecuteSQL Name="SQL - Drop_Create <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
                    <DirectInput>
                        <#=table.GetTableSql()#>
                    </DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

<!--Includes/Imports for C#-->
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.Data.SqlClient"#>


We have created two BIML scripts, 1_TableDefinitions.biml and 2_CreateTables.biml. Now comes the important part (I'm using BimlExpress): generating the package. First click on 1_TableDefinitions.biml and then Ctrl-click 2_CreateTables.biml. With both BIML scripts selected, right-click 1_TableDefinitions.biml and choose Generate SSIS Packages. If you do this in a different order, you will get an empty SSIS package.

Generate SSIS package

Below you can see the result of your BIML scripts: a package with an Execute SQL Task for each table you need to create.
Visual Studio

The actual create statement looks like this:

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------
IF EXISTS (SELECT * from sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Man_Age]') AND type IN (N'U'))
DROP TABLE [dbo].[Man_Age]
GO

CREATE TABLE [dbo].[Man_Age]
(
-- Columns Definition
[AgeID] nvarchar(255) NOT NULL
, [AgeFrom] nvarchar(255) NOT NULL
, [AgeTo] nvarchar(255) NOT NULL
, [AgeCategoryEmployee] nvarchar(255) NOT NULL
, [AgeCategoryClient] nvarchar(255) NOT NULL

-- Constraints

)
ON "default"
WITH (DATA_COMPRESSION = NONE)
GO
-------------------------------------------------------------------


Summary
We created two biml scripts one for creating to table definition and one which creates the actuale packages.The result in Management Studio looks like this.

Management Studio result

In the next blog I'm going to explain how to create SSIS packages that transport the data from the CSV files.