Search Suggest

Script Component as source 1

Case
My CSV file has some rows where not all columns at the end of the row are filled:

"test1";"abc";"xyz";"123"
"test2";"cba";"zyx";"321"
"test3";"abc"
"test4";"efg";"zyx"
"test5";"cba";"zyx";"321"

If you use a flat file connection manager, you will get strange things:
Some rows will 'merge'




















And if the last row is incomplete you will get:








Is there a workaround for this poorly formatted file?

Solution
You can use a Script Component as a source to solve this.
* UPDATE: This has been solved in SQL Server 2012 RC0 *

1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source




















2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add four string columns at the output section (this example only uses strings).
Add output columns



















3) Connection Manager
Go to the Connection Managers tab and add a connection (1) to avoid a hardcoded connection string in your Script. The Connection Manager Type (2) should be FILE and the Usage type (3) should be Existing file. After that change the default name from "Connection" to "myFile".
Add FILE connection to avoid hardcoded connection strings


























4) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code
using System;
using System.Data;
using System.IO; // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void CreateNewOutputRows()
{
// Read file (use the FILE connection added in the script component, named myFile)
using (StreamReader sr = new StreamReader(this.Connections.myFile.ConnectionString, System.Text.Encoding.UTF7)) // Encoding is optional
{
String line;
// Read lines from the file until the end of the file is reached.
while ((line = sr.ReadLine()) != null)
{
// Split the line into columns
string[] columns = line.Split(';');

// Add one new row
this.Output0Buffer.AddRow();

// Fill columns, but check if they exist
if (columns.Length > 0)
{
// Remove the " at the start and end of the string
// with a trim or use a substring.
Output0Buffer.MyFirstColumn = columns[0].TrimStart('"').TrimEnd('"');
}
if (columns.Length > 1)
{
Output0Buffer.MySecondColumn = columns[1].TrimStart('"').TrimEnd('"');
}
if (columns.Length > 2)
{
Output0Buffer.MyThirdColumn = columns[2].TrimStart('"').TrimEnd('"');
}
if (columns.Length > 3)
{
Output0Buffer.MyFourthColumn = columns[3].TrimStart('"').TrimEnd('"');
}
}
}
}
}
or VB.Net
// C# code
' VB.Net code
Imports System
Imports System.Data
Imports System.IO 'Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
Inherits UserComponent

Public Overrides Sub CreateNewOutputRows()
' Read file (use the FILE connection added in the script component, named myFile)
Using sr As New StreamReader(Me.Connections.myFile.ConnectionString, System.Text.Encoding.UTF7) ' Encoding is optional

' Create variable to store a line of text
Dim line As String = sr.ReadLine

' Read lines from the file until the end of the file is reached.
Do While (Not line Is Nothing)

' Split the line into columns
Dim columns As String() = line.Split(";")

' Add one new row
Me.Output0Buffer.AddRow()

' Fill columns, but check if they exist
If columns.Length > 0 Then
' Remove the " at the start and end of the string
' with a trim or use a substring.
Output0Buffer.MyFirstColumn = columns(0).TrimStart("""").TrimEnd("""")
End If
If columns.Length > 1 Then
Output0Buffer.MySecondColumn = columns(1).TrimStart("""").TrimEnd("""")
End If
If columns.Length > 2 Then
Output0Buffer.MyThirdColumn = columns(2).TrimStart("""").TrimEnd("""")
End If
If columns.Length > 3 Then
Output0Buffer.MyFourthColumn = columns(3).TrimStart("""").TrimEnd("""")
End If

' Goto next line
line = sr.ReadLine
Loop
End Using
End Sub
End Class

5) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
Notice the NULL values

















Note: You should add some error handling to your script (Try Catch) to avoid unexpected errors (no file, other delimiter, empty rows, etc.).

Note: If you are using this in a foreach loop, you don't need the connection manager. The variable from the foreach loop already contains the filepath.

Post a Comment