
(2020-Oct-14) Ok, here is my problem: I have an Azure Data Factory (ADF) workflow that includes an Azure Function call to perform external operations and return an output result, which in turn is used further down my ADF pipeline. My ADF workflow (1) depends on the output result of the Azure Function call; and (2) the execution time of the Azure Function call is another factor to consider: if it hits 230 seconds or more, the ADF Azure Function activity fails with a time-out error message and my workflow is screwed.

Image by Ichigo121212 from Pixabay 

I either have to hope that my Azure Function calls in a Data Factory pipeline will stay within 230 seconds, or I need to make a change and replace the generic Azure Function call with something else, something more stable and reliable.

230 seconds is the maximum amount of time that an HTTP-triggered function can take to respond to a request, and Microsoft recommends either refactoring your serverless code execution or using Durable Functions, which are an extension of Azure Functions - https://docs.microsoft.com/en-us/azure/data-factory/control-flow-azure-function-activity#timeout-and-long-running-functions

Back in April 2020, I blogged about the use of Azure Functions in Data Factory pipelines - https://server.hoit.asia/2020/04/using-azure-functions-in-azure-data.html. There I described possible variations of using Web, Webhook, and Azure Function activities to execute your Function App code, along with my frustration about the 230-second time limit.

So, I decided to check if a Durable Function could be a remedy for a long-running process that Azure Data Factory tries to govern. The official documentation describes Durable Functions as “stateful functions in a serverless compute environment… they let you define stateful workflows by writing orchestrator functions and stateful entities by writing entity functions using the Azure Functions programming model”. I’m still a bit confused by this definition, and I doubt I’m the only one. But for me, the term “durable” means that such a function should provide stable execution of long-running processes and support reliable orchestration of my serverless Function App code.

The first thing I did was search online to see if anyone else had already shared their pain points and possible solutions for using Durable Functions in Azure Data Factory:

The first two ADF posts gave me some confidence that Durable Functions could be used in ADF; however, they only provided screenshots, with no code examples and no pattern for passing input to a Durable Function and processing its output at the end, which was critical to my real project use case. I still give credit to both authors for sharing this information. The third post is one of many very detailed and well-written articles about Durable Functions, but it didn’t contain the information about ADF and the PowerShell code for my Function App that I was looking for. So, it was a leap of faith to explore further and try to create the ADF solution with Durable Functions that I needed.

Initial Information and Tutorial for Azure Durable Functions
Microsoft provides some very good examples and tutorials to start working with Durable Functions in the Azure Portal - https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=powershell. You can create three types of Durable Functions, or components; all of them are necessary to build a single durable Function App workflow:
- Starter: to “start” a durable function “orchestrator”
- Orchestrator: to “orchestrate” the execution of an “activity” function
- Activity: the actual serverless code of your Function App that you want to run

Then you can create sample durable functions in your Azure Function App:

The sample code is a simple solution that writes out the names of different cities:
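Below is a minimal sketch of what those three components look like in PowerShell, modelled on the durable function templates that the Azure Portal generates; the function names (DurableFunctionsHttpStart, DurableFunctionsOrchestrator, Hello) and the city values are just the template defaults, and newer versions of the Durable Functions PowerShell SDK expose the same cmdlets under slightly different names (e.g. Start-DurableOrchestration, Invoke-DurableActivity).

```powershell
### Starter: DurableFunctionsHttpStart/run.ps1 (HTTP-triggered)
using namespace System.Net

param($Request, $TriggerMetadata)

# Start the orchestrator function whose name is passed in the HTTP route
$FunctionName = $Request.Params.FunctionName
$InstanceId = Start-NewOrchestration -FunctionName $FunctionName -InputObject $Request.Body
Write-Host "Started orchestration with ID = '$InstanceId'"

# Return the standard set of management URLs (statusQueryGetUri, etc.)
$Response = New-OrchestrationCheckStatusResponse -Request $Request -InstanceId $InstanceId
Push-OutputBinding -Name Response -Value $Response

### Orchestrator: DurableFunctionsOrchestrator/run.ps1
param($Context)

$output = @()

# Call the 'Hello' activity function for each city and collect the results
$output += Invoke-ActivityFunction -FunctionName 'Hello' -Input 'Tokyo'
$output += Invoke-ActivityFunction -FunctionName 'Hello' -Input 'Seattle'
$output += Invoke-ActivityFunction -FunctionName 'Hello' -Input 'London'

# The combined list becomes the orchestration output
$output

### Activity: Hello/run.ps1
param($name)

"Hello $name!"
```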

(2020-Sep-26) Last week one of my Azure Data Factory (ADF) deployment pipelines failed with an error that it couldn’t find some of the deployment parameters that I tried to override, and I wondered what might have caused the issue. Usually, after configuring and testing all the deployment steps, there are very few things that can topple a normal and successful deployment process of your data factory, and it’s got to be something big.

My first step was to remove the parameter-overriding instances from the deployment pipeline. There is no need to reference and override a deployment pipeline parameter if it didn’t come out of the process of creating my ADF Azure Resource Manager (ARM) template. That quick fix worked: my ADF deployment process successfully continued, and the new code was deployed to the testing environment.

Photo by Karolina Grabowska from Pexels

However, when I checked one of the web-activity tasks in my testing Data Factory after the deployment was finished, that activity still had a URL reference to a resource from the Development environment and not the Testing one. That URL reference correction (overriding) used to happen properly during the deployment process. You can change the default parameterization template and prescribe your Data Factory to include some specific properties as additional custom parameters to deploy - https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment#use-custom-parameters-with-the-resource-manager-template. I also blogged about it two months ago - https://datanrg.blogspot.com/2020/06/raking-custom-parametersvariables-for.html

My initial custom parameterization to collect all the web activities’ URLs from an ADF pipeline was configured in the following way, with the Pipelines/Activities path to my URL properties.
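For reference, a minimal sketch of that fragment of the custom parameters file (arm-template-parameters-definition.json) could look like the one below; the "-::string" notation follows the custom-parameter syntax from the Microsoft documentation linked above, and it assumes the web activity’s url property sits directly under an activity’s typeProperties at the main pipeline level.

```json
{
    "Microsoft.DataFactory/factories/pipelines": {
        "properties": {
            "activities": [
                {
                    "typeProperties": {
                        "url": "-::string"
                    }
                }
            ]
        }
    }
}
```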



If those settings were still correct, then why wasn’t I getting any of the web activities’ URL properties extracted into my ADF ARM template? And then I realized: after moving my web-activity task from the main pipeline level into a Switch activity, the original custom parameterization process could no longer find it. I needed to make further adjustments to my custom parameters template.



By going one level deeper into my ADF activities’ properties and including an additional “Cases” level (Pipelines/Activities/Cases), I was able to capture my web activities’ URL attributes.
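Here is a sketch of the adjusted fragment; the exact nesting is my assumption that the parameterization template simply mirrors the ARM JSON of the Switch activity, where the cases collection (with its own nested activities) lives under the Switch activity’s typeProperties.

```json
{
    "Microsoft.DataFactory/factories/pipelines": {
        "properties": {
            "activities": [
                {
                    "typeProperties": {
                        "url": "-::string",
                        "cases": [
                            {
                                "activities": [
                                    {
                                        "typeProperties": {
                                            "url": "-::string"
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            ]
        }
    }
}
```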

This aha moment showed me that modifying my ADF custom parameters template is a very good way to capture very specific attributes for deployment; however, it’s also very important to review this parameter template file after making major modifications to your ADF code and check whether some of its settings are no longer relevant.

That’s why I have tagged this post as “Forget Me Not” because I’m a regular person and tend to forget things :-)


(2020-Sep-13) Array of arrays: in a JSON world, it’s a very common concept to have a set of sets of other elements in a dataset. However strange it may sound, our real life is filled with similar analogies: books arranged by category in a library, plates stacked up in a cupboard, or a closet with many sections filled with different clothing items. Arrays are part of our daily life too :-)

Photo by Lovefood Art from Pexels

In my previous blog post - Setting default values for Array parameters/variables in Azure Data Factory - I helped myself remember that arrays can be passed as parameters to my Azure Data Factory (ADF) pipelines. This time I’m helping myself remember that an array of other arrays can also exist as an ADF pipeline parameter’s value.

(1) Simple array of values:

This array will be passed into my par_meal_array parameter.

Then the Azure Data Factory ForEach activity will loop through this set of elements, and each individual value will be referenced with the @item() expression.
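As an illustration, the pipeline JSON for such a parameter and the ForEach activity that consumes it might look like the sketch below; the parameter name par_meal_array comes from this post, while the default meal values are made up.

```json
{
    "parameters": {
        "par_meal_array": {
            "type": "Array",
            "defaultValue": ["Breakfast", "Lunch", "Dinner"]
        }
    },
    "activities": [
        {
            "name": "ForEach Meal",
            "type": "ForEach",
            "typeProperties": {
                "items": {
                    "value": "@pipeline().parameters.par_meal_array",
                    "type": "Expression"
                },
                "activities": []
            }
        }
    ]
}
```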


(2) Simple array with sub-elements:


Sub-elements’ values during each iteration of this array can be referenced with the expressions below (a sample array value follows the list):
- @item().meal_type
- @item().meal_time
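A sample value for such an array might look like this (meal_type and meal_time are the sub-element names referenced above; the actual values are only illustrative):

```json
[
    { "meal_type": "Breakfast", "meal_time": "08:00" },
    { "meal_type": "Lunch",     "meal_time": "12:30" },
    { "meal_type": "Dinner",    "meal_time": "19:00" }
]
```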

(3) Array of arrays:


My array of meal items can easily be referenced as @item().meal_items (e.g., the first element’s meal_items would have this set of values: ["Egg","Greek Yogurt","Coffee"]).
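An array-of-arrays value along these lines might look like the sketch below; the first element’s meal_items comes from the example above, and the remaining elements are made up for illustration.

```json
[
    { "meal_type": "Breakfast", "meal_items": ["Egg", "Greek Yogurt", "Coffee"] },
    { "meal_type": "Lunch",     "meal_items": ["Soup", "Sandwich", "Tea"] },
    { "meal_type": "Dinner",    "meal_items": ["Salmon", "Rice", "Wine"] }
]
```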

Since I’m already in a ForEach loop container, I could pass this array into another ForEach loop container. However, nested ForEach activities are currently not possible in Azure Data Factory, unless the lower-level ForEach loop container exists in another pipeline; then I can simply execute that pipeline from the top ForEach loop container and pass the array as another pipeline array parameter.
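A sketch of that workaround: inside the outer ForEach, an Execute Pipeline activity passes @item().meal_items into an array parameter of a child pipeline, and the child pipeline hosts the inner ForEach. The child pipeline name (pl_process_meal_items) and its parameter name (par_meal_items) are hypothetical.

```json
{
    "name": "ForEach Meal",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.par_meal_array",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "Execute Child Pipeline",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "pl_process_meal_items",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                        "par_meal_items": {
                            "value": "@item().meal_items",
                            "type": "Expression"
                        }
                    }
                }
            }
        ]
    }
}
```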


Closing thoughts:
I just wish that Azure Data Factory would allow having nested ForEach loop containers in a single pipeline.