Search Suggest

Working with Arrays in Azure Data Factory

(2019-June-06) I remember that I had a professor at my university who would often encourage his students to learn and get more experienced with simple things first. Learn the basics in and out and then move forward to more complicated concepts and practices; that was his motto, and he really tried to share this idea with us.

Previously published blog posts about using Variables in Azure Data Factory:
Setting Variables in Azure Data Factory Pipelines
Append Variable activity in Azure Data Factory: Story of combining things together  
System Variables in Azure Data Factory: Your Everyday Toolbox 
Azure Data Factory: Extracting array first element

Simple things sometimes can be overlooked as well. With the addition of Variables in Azure Data Factory Control Flow (there were not available there at the beginning), Arrays have become one of those simple things to me. 

Image by Magnascan from Pixabay

Currently, there are 3 data types supported in ADF variables: String, Boolean, and Array. The first two are pretty easy to use: Boolean for logical binary results and String for everything else, including the numbers (no wonder there are so many conversion functions in Azure Data Factory that we can use).

Going back to my memory flashback of the professor guidance for learning and using simple things, I've finally realized that they are worth to get more experienced with! Why? Because arrays are everywhere in the Control Flow of Azure Data Factory:
(1) JSON output most of the activity tasks in ADF can be treated as multiple level arrays
(2) Collections that are required for the "ForEach" activity can be outsourced from the preceding (1) activity outputs
(3)  "Set Variable" and "Append Variable" activity could be used to store receding (1) activity outputs for further data transformation
(4) You can create Arrays manually by transforming existing linear values or setting them with hard-coded values (fixed collections).

Don't forget about various functions and expressions to support your work with Arrays in Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions):
intersection Returns a single array or object with the common elements between the arrays or objects passed to it.
union Returns a single array or object with all of the elements that are in either array or object passed to it.
first Returns the first element in the array or string passed in.
last Returns the last element in the array or string passed in.
skip Returns the elements in the array starting at index Count.
length Returns the number of elements in an array or string.
jsonConvert the parameter to a JSON type value.
arrayConvert the parameter to an array.
createArrayCreates an array from the parameters. 
rangeGenerates an array of integers starting from a certain number, and you define the length of the returned array.

Just to show a quick example of some simple operations with arrays, I had created this ADF pipeline with 4 main components:



(1) Lookup task to read a CSV file with 2 columns of Syllabic music notes:

Where the JSON output of this activity task contains 7 elements:


(2) Set Variable task converts a text string of "C-D-E-F-G-A-B"

into an array variable Notes_Alpabet using this expression:
@split(variables('Notes_Alphabet_String'),'-')



(3) Then looping through a collection of array elements of the (1) activity task output:
@activity('Lookup Notes_Syllabic').output.value

I then append a combination of Syllabic and Alphabet music notes into the Notes_Combined array variable

using this expression
@concat(item().ID,'-',item().Note,'-',variables('Notes_Alphabet')[add(int(item().ID),-1)])
important part is that I can locate the Notes_Alphabet variable element with the index of the Lookup Notes_Syllabic collection index: 
variables('Notes_Alphabet')[add(int(item().ID),-1)] where item().ID value comes from the first column of my sourcing file.

(4) As a result, I'm copying the content of the array Notes_Combined variable into another array Notes_Combined_View variable for Debug purpose:


And this helped me to see how both Syllabic and Alphabet music notes correspond to each other.

The code of this ADF pipeline can be found here:
https://github.com/NrgFly/Azure-DataFactory/blob/master/Samples/pipeline/adf_arrays_sample_pl.json

And I hope you will find this blog post helpful in your journey to explore simple things of the Azure Data Factory!

Post a Comment