Search Suggest

Regex filter for Foreach Loop

Case
The standard Foreach Loop File Enumerator only has a wildcard filter, but in some cases a regular expression filter would be more useful. See/vote this request at Microsoft Connect.

Solution
At the moment I'm working on a custom File Enumerator with regular expression support, but for those who don't want to wait or don't want to use custom components... You could achieve the same result with a Script Task.

*update: Now also available as Custom Foreach Enumerator. *

1) Foreach Loop
For this case I will use a standard Foreach File Enumerator that fills a variable FilePath. The filter is *.* so all files will be returned.
Standard foreach loop


















2) Variables
I will use two extra variables in this example: RegexFilter (string) for storing the regular expression and PassesRegexFilter (boolean) for indicating whether the filename passes the regular expression filter.
Variables













3) Script Task
Add a Script Task in front of the Data Flow Task and give it a suitable name.
Script Task






















4) Add variables
Edit the Script Task and add the FilePath and RegexFilter as ReadOnlyVariables and the PassesRegexFilter as ReadWriteVariable.
Variables






















5) The Script
Copy the following script to the Script Task.
// C# Code for filtering filenames with Regex
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Text.RegularExpressions; // Added
using System.IO; // Added

namespace ST_02b6595da2274d7182409fb43af929ae.csproj
{
[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
#region VSTA generated code
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion

public void Main()
{
// Get the filename from the complete filepath
String FileName = Path.GetFileName(Dts.Variables["User::FilePath"].Value.ToString());

// Create a regex object with the pattern from the SSIS variable
Regex FileFilterRegex = new Regex(Dts.Variables["User::RegexFilter"].Value.ToString());

// Check if it is match and return that value (boolean) to the SSIS variable
Dts.Variables["User::PassesRegexFilter"].Value = FileFilterRegex.IsMatch(FileName);

Dts.TaskResult = (int)ScriptResults.Success;
}
}
}
or with VB.Net
' VB.Net Code for filtering filenames with Regex
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Text.RegularExpressions ' Added
Imports System.IO ' Added

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

Enum ScriptResults
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
End Enum

Public Sub Main()
' Get the filename from the complete filepath
Dim FileName As String = Path.GetFileName(Dts.Variables("User::FilePath").Value.ToString())

' Create a regex object with the pattern from the SSIS variable
Dim FileFilterRegex As Regex = New Regex(Dts.Variables("User::RegexFilter").Value.ToString())

' Check if it is match and return that value (boolean) to the SSIS variable
Dts.Variables("User::PassesRegexFilter").Value = FileFilterRegex.IsMatch(FileName)

Dts.TaskResult = ScriptResults.Success
End Sub

End Class


6) Precedence Constraints Expression
Connect the Script Task to the Data Flow Task and add an expression that does the real filtering:
@[User::PassesRegexFilter] == true
Precedence Constraint with expression





















7) The result
For testing the result, I added an other Script Task below the Data Flow Task that logs all files. My regular expression in this test case: .*[0-9]\.csv$    (all csv files that end with a number in the filename)
The result























Note 1: to gain some performance I should have changed the Foreach Loop File Enumerator wildcard from *.* to *.csv

Note 2: this method could be a bit inefficient if your file collection contains thousands of files and you only need two of them.

Post a Comment