Search Suggest

Script Component as source 2

Case
This is an follow up on another strangely formatted file article: My CSV file has splitted rows. There is only one attribute on each row:
id,1
name,Joost
address,Street 2
id,2
name,Jack
address,Avenue 4b
id,3
name,William
address,Plaza 5

I want to convert it to:
id,name,address
1,Joost,Street 1
2,Jack,Avenue 4b
3,William,Plaza 5

The pivot isn't usable because the records aren't grouped with a key. What's the solution?

Solution
You can use a Script Component as a source to solve this.

1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source




















2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add three columns at the output section:
  1. ID (four-byte signed integer)
  2. Name (string)
  3. Address (string)
Add output columns



















3) Connection Manager
Go to the Connection Managers tab and add a connection (1) to avoid a hardcoded connection string in your Script. The Connection Manager Type (2) should be FILE and the Usage type (3) should be Existing file. After that change the default name from "Connection" to "myFile".
Add FILE connection to avoid hardcoded connection strings


























4) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code
using System;
using System.Data;
using System.IO; // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void CreateNewOutputRows()
{
// Variables to store column values
int ID = 0; // Column 1
string Name = ""; // Column 2
string Address = ""; // Column 3
int NumberOfColumns = 3;

// Counter to keep track of the current column
int counter = 0;

// Read file (use the FILE connection added in the script component, named myFile)
using (StreamReader sr = new StreamReader(this.Connections.myFile.ConnectionString))
{
String line;
// Read lines from the file until the end of the file is reached.
while ((line = sr.ReadLine()) != null)
{
// Raising the counter
counter++;

// Split the line into columns
string[] columns = line.Split(',');

// Fill the right variable
if (counter.Equals(1))
{
// Column 1
ID = System.Convert.ToInt32(columns[1]);
}
else if (counter.Equals(2))
{
// Column 2
Name = columns[1];
}
else if (counter.Equals(3))
{
// Column 3
Address = columns[1];
}

// Add a row if the last column has been reached
if (counter.Equals(NumberOfColumns))
{
// Add one new row and fill columns
this.Output0Buffer.AddRow();
Output0Buffer.ID = ID;
Output0Buffer.Name = Name;
Output0Buffer.Address = Address;

// Last column, reset counter
counter = 0;
}
}
}
}
}

5) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
The result




















Note: You should add some error handling to your script (Try Catch) to avoid unexpected errors (no file, other delimiter, empty rows, etc.).

Let me know if you have another strangely formatted file. Maybe I can find a solution to process it.

Post a Comment