Search Suggest

Regular Expressions in SSIS

Case
I want to use a Regular Expression to filter incorrect emailaddresses in SSIS.

Solution
There are a couple of options:
Let's elaborate the last option and use this file as a source:
"Name";"emailaddress"
"Joost";"joost@blogger.com"
"John";"john@.nl"
"Max";"max@gmail.com"
"William";"william@hotmail,com"
"Bill";"b.gates@microsoft.com"

1) Add source
Add a Flat File Source Component in your dataflow for the file example above.
Flat File mailinglist
















2) Add Script Component
Add a Script Component, type transformation, after your Flat File Source Component.
Script Component (transformation)


















3) Input Columns
 Edit the Script Component and select the email address column as an input column (readonly).
Input Columns tab



















4) Outputs
Change the name of the default output port to CorrectEmail and set the ExclusionGroup to 1. Add an extra output port for the IncorrectEmail and set the ExclusionGroup to 1. Also select the input port as the SynchronousInputID. Now you have two output ports. One for the correct and one for the incorrect email address.
Inputs and Outputs tab













4) The Script
Hit the Edit Script button to open the VSTA editor. SSIS creates some standard methods, but we only use Input0_ProcessInputRow. You can remove the rest of the methods. We are also adding a new method that validates the email address. It uses a .Net Regex method. You can search the web for your own regex pattern and change it in the extra method.
// C# code
using System;
using System.Data;
using System.Text.RegularExpressions; // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

// Method that will be executed for each row.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Validate the email address
if (IsCorrectEmail(Row.Email))
{
// Redirect the correct email address
Row.DirectRowToCorrectEmail();
}
else
{
// Redirect the incorrect email address
Row.DirectRowToInCorrectEmail();
}
}

// A boolean method that validates an email address
// with a regex pattern.
public bool IsCorrectEmail(String emailAddress)
{
// The pattern for email
string emailAddressPattern = @"^(([^<>()[\]\\.,;:\s@\""]+"
+ @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
+ @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
+ @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
+ @"[a-zA-Z]{2,}))$";
// Create a regex object with the pattern
Regex emailAddressRegex = new Regex(emailAddressPattern);
// Check if it is match and return that value (boolean)
return emailAddressRegex.IsMatch(emailAddress);
}
}

' VB.Net code (SSIS 2008)
Imports System
Imports System.Data
Imports System.Math
Imports System.Text.RegularExpressions ' Added
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

_
_
Public Class ScriptMain
Inherits UserComponent

' Method that will be executed for each row.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
' Validate the email address
If (IsCorrectEmail(Row.Email)) Then
' Redirect the correct email address
Row.DirectRowToCorrectEmail()
Else
' Redirect the incorrect email address
Row.DirectRowToIncorrectEmail()
End If
End Sub

' A boolean method that validates an email address
' with a regex pattern.
Public Function IsCorrectEmail(ByVal emailAddress As String) As Boolean
' The pattern for email
Dim emailAddressPattern As String = "^(([^<>()[\]\\.,;:\s@\""]+" + _
"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@" + _
"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" + _
"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+" + _
"[a-zA-Z]{2,}))$"

' Create a regex object with the pattern
Dim emailAddressRegex As Regex = New Regex(emailAddressPattern)

' Check if it is match and return that value (boolean)
Return emailAddressRegex.IsMatch(emailAddress)
End Function

End Class


5) The result
For testing purposes I added two Derived Columns with data viewes on the Data Flow Paths.
The result














Jamie Thomson, alias SSIS Junkie, has an other example in VB.net available. And you can also use Regular Expression to clean column values. More about that is described in this article.

Post a Comment