
(2020-Oct-14) OK, here is my problem: I have an Azure Data Factory (ADF) workflow that includes an Azure Function call to perform external operations and return an output result, which in turn is used further down my ADF pipeline. My ADF workflow (1) depends on the output result of the Azure Function call; and (2) the execution time of the Azure Function call is another factor to consider: if it hits 230 seconds or more, the ADF Azure Function activity fails with a time-out error message and my workflow is broken.

Image by Ichigo121212 from Pixabay 

I either have to hope that my Azure Function calls in a Data Factory pipeline will stay within 230 seconds, or I need to make a change and replace a generic Azure Function call with something else, something more stable and reliable.

230 seconds is the maximum amount of time that an HTTP-triggered function can take to respond to a request, and Microsoft recommends either refactoring your serverless code execution or using Durable Functions, which are an extension of Azure Functions - https://docs.microsoft.com/en-us/azure/data-factory/control-flow-azure-function-activity#timeout-and-long-running-functions

Back in April 2020 I blogged about the use of Azure Functions in Data Factory pipelines - https://server.hoit.asia/2020/04/using-azure-functions-in-azure-data.html. There I described possible variations of using Web, Webhook, and Azure Function activities to execute your Function App code, and my frustration with the 230-second time limit.

So, I decided to check if a Durable Function could be a remedy for a long-running process that Azure Data Factory tries to govern. The official documentation describes Durable Functions as “stateful functions in a serverless compute environment… they let you define stateful workflows by writing orchestrator functions and stateful entities by writing entity functions using the Azure Functions programming model”. I’m still a bit confused by this definition, and hopefully I’m not the only one. But for me, the term “durable” for a function means that it should provide stable execution of long-running processes and support reliable orchestration of my serverless Function App code.

The first thing I did was search online to see whether anyone else had already shared their pain points and possible solutions for using Durable Functions in Azure Data Factory:

The first two ADF posts gave me some confidence that Durable Functions could be used in ADF; however, they only provided some screenshots, no code examples, and no pattern for passing input to a Durable Function and processing its output at the end, which was critical to my real project use case. I still give credit to both authors for sharing this information. The third post is one of many very detailed and well-written posts about Durable Functions, but it didn't contain information about ADF or the PowerShell code for my Function App that I was looking for. So, this was my leap of faith to do further exploration and possibly create an ADF solution with the Durable Functions that I needed.

Initial Information and Tutorial for Azure Durable Functions
Microsoft provides some very good examples and tutorials to start working with Durable Functions in the Azure Portal - https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=powershell. There are three types of Durable Functions, or components, and all of them are necessary to build a single durable Function App workflow:
- Starter: to “start” a durable function “orchestrator”
- Orchestrator: to “orchestrate” the execution of an “activity” function
- Activity: the actual serverless code of your Function App that you want to perform

Then you can create sample durable functions in your Azure Function App:

The sample code is a simple solution that writes the output of different city names:
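My own project uses PowerShell, but purely as an illustration of how the three components fit together, here is a minimal sketch of the same “city names” sample using the Python programming model for Durable Functions (the function names and the omitted function.json bindings are the template defaults, not anything from my project):

# HttpStart/__init__.py - Starter: receives the HTTP request and starts an orchestration
import azure.functions as func
import azure.durable_functions as df

async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    client = df.DurableOrchestrationClient(starter)
    instance_id = await client.start_new(req.route_params["functionName"], None, None)
    # Returns the set of URLs the caller can poll to check the orchestration status
    return client.create_check_status_response(req, instance_id)

# Orchestrator/__init__.py - Orchestrator: calls the activity function for each city
import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    result1 = yield context.call_activity("Hello", "Tokyo")
    result2 = yield context.call_activity("Hello", "Seattle")
    result3 = yield context.call_activity("Hello", "London")
    return [result1, result2, result3]

main = df.Orchestrator.create(orchestrator_function)

# Hello/__init__.py - Activity: the actual work, here just producing a greeting
def main(name: str) -> str:
    return f"Hello {name}!"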









(2020-Oct-05) Adding a row number to your dataset could be a trivial task. Both ANSI SQL and Spark SQL have the row_number() window function that can enrich your data with a unique number for your whole or partitioned data recordset.

Recently I had a case of creating a data flow in Azure Data Factory (ADF) where there was a need to add a row number.

Photo by Micah Boerma from Pexels

My instant reaction was to add this new row number column using a Derived Column transformation - https://docs.microsoft.com/en-us/azure//data-factory/data-flow-derived-column. However, this was my mistake, and ADF notified me that the rowNumber() data flow function is only available in the Window transformation - https://docs.microsoft.com/en-us/azure/data-factory/data-flow-window.


OK, I moved on and added a Window Transformation task, which does require at least one sorting column.

However, this was a bit of an issue in my case: I didn’t need to sort and change the order of my recordset. The transformed dataset needed to stay in the same order as its original source data, so that a transformed record could be matched to its source sibling record with the help of the new Row Number column.

So I thought that it wouldn’t hurt to add just a static constant-value column (let’s say with value = 1) and then use this new column in my rowNumber() window transformation as the sorting attribute.

1) First I added a Derived Column transformation with the Column_Value_1 column value set to 1 (or any other constant value of your preference).
2) Then I added a Window transformation with the rowNumber() function and Column_Value_1 as a sorting column.
3) Then I was able to see my output result with preserved order of records and additional Row Number column.
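Since ADF data flows run on Spark under the hood, the same trick can be sketched in PySpark (purely illustrative; the sample data and column names are made up and not part of my pipeline):

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Toronto",), ("Ottawa",), ("Montreal",)], ["City"])

# Step 1: Derived Column equivalent - add a constant-valued column
df = df.withColumn("Column_Value_1", lit(1))

# Step 2: Window equivalent - rowNumber() "sorted" by the constant column,
# so the incoming order of the records is effectively preserved
df = df.withColumn("Row_Number", row_number().over(Window.orderBy("Column_Value_1")))

df.show()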



Update:
Joseph Edwards has just pointed out that using the Surrogate Key transformation would achieve the very same result of adding a row number column, and this step doesn't even require sorting columns. I've just tested, and it's working, amazing! I have written this whole blog post just to realize that there is a better way to number rows in the Azure Data Factory flows :-)



Case
I want to start and stop my Azure Analysis Services from within Azure Data Factory, but I don't want to write code or use other Azure services like Azure Automation or Azure Logic Apps to do this. Is there an Azure Data Factory-only solution where we only use the standard pipeline activities from ADF?
Save some money on your Azure Bill by pausing AAS
Solution
Yes, you can use the Web Activity to call the Rest API of Azure Analysis Services (AAS), but that requires you to give ADF permissions in AAS via its Managed Service Identity (MSI). If you already followed our Process Model example, then this is slightly different (and easier).


1) Add ADF as contributor to AAS
Unlike processing one of the AAS models, we don't need SSMS to add ADF as a Server Administrator. Instead we will use Access control (IAM) in the Azure portal to make our ADF a contributor for the AAS that we want to pause or resume.
  • Go to your AAS in the Azure portal
  • In the left menu click on Access control (IAM)
  • Click on + Add and choose Add role assignment
  • In the new Add role assignment pane select Contributor as Role
  • In the Assign access to dropdown select Data Factory
  • Select the right Subscription
  • Now Select your Data Factory and click on the Save button
Add ADF as Contributor to AAS
2) Add Web Activity
In your ADF pipeline you need to add a Web Activity to call the Rest API of Analysis Services. The first step is to determine the Rest API URL. In the string below, replace the <xxx> values with the subscription id, resource group and server name of your Analysis Services. The Rest API method we will be using is 'Suspend', but you can replace that word with 'Resume' to start up the AAS:
https://management.azure.com/subscriptions/<xxx>/resourceGroups/<xxx>/providers/Microsoft.AnalysisServices/servers/<xxx>/Suspend?api-version=2017-08-01

Example:
https://management.azure.com/subscriptions/a74a173e-4d8a-48d9-9ab7-a0b85abb98fb/resourceGroups/bitools/providers/Microsoft.AnalysisServices/servers/bitools2/Suspend?api-version=2017-08-01

The second step is to create a JSON message for the Rest API. The Rest API doesn't actually use it, but it is required by the Web Activity when you use POST as the method, so you just need to create a dummy JSON message:
{
"Dummy": "Dummy"
}
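If you first want to verify the Rest API call outside ADF, here is a minimal Python sketch (assuming the azure-identity and requests packages and an identity that holds the Contributor role from step 1); the Web Activity with MSI authentication performs essentially the same call:

# Sketch only: call the AAS management Rest API to suspend (or resume) the server
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<xxx>"
resource_group = "<xxx>"
server_name = "<xxx>"

# Token for the Azure management endpoint (the same resource used in the Web Activity)
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (f"https://management.azure.com/subscriptions/{subscription_id}"
       f"/resourceGroups/{resource_group}/providers/Microsoft.AnalysisServices"
       f"/servers/{server_name}/Suspend?api-version=2017-08-01")

# The Rest API ignores the body, so the dummy JSON message is enough
response = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json={"Dummy": "Dummy"})
print(response.status_code)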
  • Add the Web activity to your pipeline
  • Give it a descriptive name like Pause AAS (or Resume AAS)
  • Go to the Settings tab
  • Use the Rest API URL from above in the URL property
  • Choose POST as Method
  • Add the dummy JSON message from above in the Body property
  • Under advanced choose MSI as Authentication method
  • Add 'https://management.azure.com/' in the Resource property (different than the process example)
Web Activity calling the AAS Rest API

Then Debug the Pipeline to check the suspend/resume action
3) Retrieve info
By changing the method type from POST to GET (the Body property will disappear) and removing the method (Suspend or Resume) from the URL, you can retrieve information about the AAS, like its status and pricing tier. You could, for example, use that to first check the status before changing it.
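Continuing the Python sketch from step 2 (same placeholders and token), such a status check is just a GET on the server resource itself; note that the property names below are assumptions based on the management API response, so verify them against your own call:

# Sketch only: read the AAS server definition and check its state before pausing/resuming
info_url = (f"https://management.azure.com/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}/providers/Microsoft.AnalysisServices"
            f"/servers/{server_name}?api-version=2017-08-01")
info = requests.get(info_url, headers={"Authorization": f"Bearer {token}"}).json()
print(info.get("properties", {}).get("state"), info.get("sku"))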
Retrieve service info via GET


Summary
In this post you learned how to pause and resume your Analysis Services to save some money on your Azure bill. The big advantage of this method is that you don't need other Azure services, which makes maintenance a little easier. In a next post we will also show you how to change the pricing tier.

Case
There are several methods to process Azure Analysis Services models, such as Logic Apps, Azure Automation runbooks and even SSIS, but is there an Azure Data Factory-only solution where we only use the pipeline activities from ADF?
Process Azure Analysis Services
Solution
Yes you can use the Web Activity to call the Rest API of Azure Analysis Services (AAS), but that requires you to give ADF permissions in AAS via its Managed Service Identity (MSI).

1) Create ADF service principal
For the next step we need a user that we can add as a Server Administrator of AAS. Since we will not find the managed identity of ADF when we search for a user account, we will have to create one. This 'user' is called a service principal.
  • Go to ADF in the Azure portal (not the Author & Monitor environment)
  • In the left menu click on Properties which you can find under General
  • Copy the 'Managed Identity Application ID' and the 'Managed Identity Tenant' properties to a notepad and construct the following string for the next step:
    app:<Application ID>@<Tenant> (replace the placeholders with the properties you just copied)
    app:653ca9f9-855c-45df-bfff-3e7718159295@d903b4cb-ac8c-4e31-964c-e630a3a0c05e

Create app user from ADF for AAS
2) Add user as Server Administrator
Now we need to connect to your Azure Analysis Services via SQL Server Management Studio (SSMS) to add the user from the previous step as a Server Administrator. This cannot be done via the Azure portal.
  • Login to your AAS with SSMS
  • Right click your server and choose Properties
  • Go to the Security pane
  • Click on the Add... button
  • Add the service principal from the previous step via the Manual Entry textbox and click on the Add button
  • Click on OK to close the property window

Add Server Administrator via Manual Entry
After this step the 'user' will appear in the portal as well, but you cannot add it via the portal.
Analysis Services Admins
3) Add Web Activity
In your ADF pipeline you need to add a Web Activity to call the Rest API of Analysis Services. The first step is to determine the Rest API URL. In the string below, replace the placeholders with the region, server name and model name of your Analysis Services. The Rest API method we will be using is 'refreshes':
https://<region>.asazure.windows.net/servers/<servername>/models/<modelname>/refreshes

Example:
https://westeurope.asazure.windows.net/servers/bitoolsserver/models/bitools/refreshes

The second step is to create a JSON message for the Rest API to give the process order to AAS. To fully process the entire model you can use this message:
{
    "Type": "Full",
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Objects": []
}
Or you can process particular tables within the model with a message like this:
{
    "Type": "Full",
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Objects": [
        {
            "table": "DimProduct",
            "partition": "CurrentYear"
        },
        {
            "table": "DimDepartment"
        }
    ]
}
See the documentation for all the parameters that you can use.
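To make the call concrete, here is a minimal Python sketch of the same request made outside ADF (assuming the azure-identity and requests packages and an identity that was added as Server Administrator in step 2); the Web Activity with MSI authentication performs the equivalent call:

# Sketch only: post a full-process command to the AAS refreshes Rest API
import requests
from azure.identity import DefaultAzureCredential

region = "westeurope"
server_name = "<servername>"
model_name = "<modelname>"

# Token audience for Azure Analysis Services (the Resource property used in the Web Activity)
token = DefaultAzureCredential().get_token("https://*.asazure.windows.net/.default").token

url = f"https://{region}.asazure.windows.net/servers/{server_name}/models/{model_name}/refreshes"
body = {"Type": "Full", "CommitMode": "transactional", "MaxParallelism": 2, "RetryCount": 2, "Objects": []}

response = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json=body)
print(response.status_code)               # 202 (Accepted) when the refresh is queued
print(response.headers.get("Location"))   # URL that can be polled for the refresh status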
  • Add the Web activity to your pipeline
  • Give it a descriptive name like Process Model
  • Go to the Settings tab
  • Use the Rest API URL from above in the URL property
  • Choose POST as Method
  • Add the JSON message from above in the Body property
  • Under advanced choose MSI as Authentication method
  • Add 'https://*.asazure.windows.net' in the Resource property (note this URL is different for suspending and resuming AAS)
Web Activity calling the AAS Rest API

Then Debug the Pipeline to check the process result
4) Retrieve refreshes
By only changing the method type from POST to GET (the Body property will disappear) you can retrieve information about the processing status and use that information in the next pipeline activities.
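Continuing the Python sketch from step 3 (same url and token), this status check is simply a GET on the same refreshes endpoint:

# Sketch only: list the recent refresh operations and their status
refreshes = requests.get(url, headers={"Authorization": f"Bearer {token}"}).json()
print(refreshes)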
Retrieve process status via GET


Summary
In this post you learned how to process your Analysis Services models with only Azure Data Factory. No other services are needed, which makes maintenance a little easier. In a next post we will also show you how to pause or resume your Analysis Services with the Rest API. With a few extra steps you can also use this method to refresh a Power BI dataset, but we will show that in a future post.







Case
I want to create an Azure Function with C# code. How do I create and deploy one in Azure (and use it in Azure Data Factory)?
Write C# in Visual Studio Code to create an Azure Function
Solution
In this blogpost we will create and deploy a very simple 'hello world' Azure Function with an HTTP trigger which you can extend to your own needs. After that we can use the Azure Data Factory pipeline with an Azure Function activity to execute it.

As an alternative you could also create an Azure Function with a Blob Storage trigger that executes when a new file arrives, but we would rather use that same trigger type to start an Azure Data Factory pipeline that then starts this Function, followed by other pipeline activities. This way we have one place that does the triggering/orchestration.

1) Create new Azure Function project
Please first follow the steps of our previous post on how to prepare Visual Studio Code for creating Azure Functions with C#. After that open Visual Studio code and perform the steps below to create your first hello world Function.
  • In Visual Studio code click on the Azure icon in the left menu.
  • In the newly opened pane click on the folder with the lightning icon on it to create a new project. (An additional function can later-on be added to the project with the Lightning-plus icon)
  • Select the folder of the new project (or use the Browse... option)
  • Next select C# as coding language
  • Select HTTP trigger as the template for this example
  • Enter the Function name. This is the name of the function within the project (that can contain multiple functions). Example: myCSharpFunction
  • Provide a namespace: Bitools.Function
  • For this test example use anonymous as Authorization level
  • The project has been created, but there could be an additional action in step 2
Create new Azure Function project
2) Unresolved dependencies
This extra step seems to be a bug in the Azure Function extension for C# in Visual Studio code. After the project has been generated Visual Studio Code will show the following error in the lower right corner. This seems to refer to some missing references.
There are unresolved dependencies.
Please execute the restore command to continue.
If you don't get this error then Microsoft probably solved the bug. When you do get it, the only thing you have to do is click the Restore button. After that some extra files will be added in the obj folder of your project. (See the previous post to compare extension versions.)
Execute the restore command

Spot the differences
3) Code in myCSharpFunction.cs
The file 'myCSharpFunction.cs' contains your C# code. The file name will be different if you gave your function a different name. Below you see the standard / generated code with some extra comment lines in case you are new to C#. For this example we do not extend the code.
// This section lists the namespaces that this function will be using frequently,
// and saves the programmer from specifying a fully qualified name every time that
// a method that is contained within is used
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace Bitools.Function
{
    public static class myCSharpFunction
    {
        // Main function and entry point of this Azure Function
        [FunctionName("myCSharpFunction")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            // Log information
            log.LogInformation("C# HTTP trigger function processed a request.");

            // Retrieve parameter 'name' from querystring
            string name = req.Query["name"];

            // Also try to retrieve the same parameter from the request body
            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            // If not found in querystring then use requestbody
            name = name ?? data?.name;

            // If name is still empty throw an error that a name parameter
            // is expected else response with a greeting
            string responseMessage = string.IsNullOrEmpty(name)
                ? "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response."
                : $"Hello, {name}. This HTTP triggered function executed successfully.";

            return new OkObjectResult(responseMessage);
        }
    }
}

4) Debug locally
Now we are going to test the Azure Function locally on our Windows device. There are multiple ways to start debugging. Pressing F5 is probably the easiest. See animated gif for more options.
  • In the Run menu on the top of the screen you will find the Start Debugging option. 
  • The terminal pane on the bottom will show a lot of details. Wait a few seconds for it to finish and click (while also pressing CTRL) on the green URL.
  • A new browser window will open and it shows the error output that it cannot find the name parameter.
  • In the browser add a querystring after the URL: ?name=Joost (or your own name of course). Now it will respond with a greeting
  • Close the browser and then hit the disconnect icon on top to stop debugging
Debugging your function locally
5) Create Azure Function in Azure Portal
Before you can deploy your newly created function you first need to create an Azure Function in the Azure portal.
  • Go to the Azure Portal and click on Create a resource
  • Search for function and select Function App
  • Click on the Create button
  • On the Basics tab you find the most important settings
  • Select your Subscription and Resource Group
  • Enter a unique Function App name
  • Select .NET Core as Runtime stack
  • Select 3.1 as Version 
  • Select the Region (probably the same as your Resource Group)
  • Optionally go to the Hosting tab for extra settings
  • Choose a new or existing Storage account
  • Change the Plan type (default: Serverless)
  • Optionally go to the Monitoring tab for extra settings
  • Disable or enable Application insights and change its name
  • Click the Review + create button
  • Review the settings and click on the Create button
Create new Azure Function (app) on Azure portal
Note 1: you cannot create an Azure Function with a Windows worker (.NET Core) if there is already a Linux worker (Python) in that same resource group and with the same region.

Note 2: you could also perform these steps within Visual Studio Code during deployment.

6) Deploy Azure Function to Azure Portal
Now that we have an (empty) Azure Functions app in the Azure portal we can deploy our newly created Azure Function to this resource.
  • In Visual Studio code click on the Azure icon in the left menu.
  • In the newly opened pane click on the blue arrow (deploy) icon
  • In the drop down select your Azure Functions App from the previous step
Deploy Azure Functions from Visual Studio Code
7) Testing in portal
Now that we have deployed our project to Azure Functions we can test it in the Azure Portal. For this example we will use the post method.
  • Go to the Azure Portal and then open your Azure Functions App
  • In the left menu click on Functions
  • In the list of functions click on your function (only one in this example)
  • In the left menu click on Code + Test
  • Click on the test button (top center)
  • Change the HTTP method to post
  • Select one of the keys
  • Enter a JSON message in the body: {"name":"Joost"} (name=case-sensitive)
  • Click on the Run button and see the result
Testing in the Azure Portal
8) Executing from Azure Data Factory
Now if you want to execute this new Azure Function in Azure Data Factory with the Azure Function Activity, you can follow the steps in this previous post. However, without code changes it will return an error stating that the response is invalid: 3603 - Response Content is not a valid JObject.
3603 - Response Content is not a valid JObject
At the moment the function is returning a plain string instead of a JSON object, but ADF is expecting a JObject (J = JSON). Any return type other than a JObject will throw the error above. To overcome this we need a minor code change at the end, by changing the return construction at line 37.
// This section lists the namespaces that this function will be using frequently,
// and saves the programmer from specifying a fully qualified name every time that
// a method that is contained within is used
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace Bitools.Function
{
    public static class myCSharpFunction
    {
        // Main function and entry point of this Azure Function
        [FunctionName("myCSharpFunction")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            // Log information
            log.LogInformation("C# HTTP trigger function processed a request.");

            // Retrieve parameter 'name' from querystring
            string name = req.Query["name"];

            // Also try to retrieve the same parameter from the request body
            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            name = name ?? data?.name;

            // If name is still empty throw an error that a name parameter
            // is expected else response with a greeting
            return name != null
                ? (ActionResult)new OkObjectResult(new { message = "Hello " + name })
                : new BadRequestObjectResult("Pass a name in the query string or in the request body for a personalized response.");

            /*
            string responseMessage = string.IsNullOrEmpty(name)
                ? "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response."
                : $"Hello, {name}. This HTTP triggered function executed successfully.";

            return new OkObjectResult(responseMessage);
            */
        }
    }
}

Below are a couple of screenshots on how to configure and test this in Azure Data Factory. You might want to store the Function key in Azure Key Vault to avoid keys in your ETL/ELT code.
Set up the Azure Function Activity in Azure Data Factory
After configuring the Azure Function activity you can hit the debug button and see the result. This output can then be used as input for successive pipeline activities.
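For example, assuming the activity is named 'Azure Function1' (your name will differ), a following activity or variable could pick up the greeting with an expression along these lines:
@activity('Azure Function1').output.message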
Successfully executing and getting the response
Conclusion
First a big thank you to colleague Walter ter Maten for always helping me out with the C# stuff. In this blog post you learned how to create, test and deploy your first (very basic) Azure Function App with C# code. Then we also showed you how to execute this from Azure Data Factory. In a couple of follow up posts we will show you how to build some useful functions for DWH projects and show you some technical stuff like adding Azure Key Vault to the game. Also check out the Python version of this blogpost.


(2020-May-24) It has never been my plan to write a series of articles about how I can work with JSON files in Azure Data Factory (ADF). But while working with one particular ADF component, I discovered other possible options to use the rich and less constrained JSON file format, which in a nutshell is just a text file with one or more ("key" : "value") pair elements.

Sometimes you could have an array of those pair elements or an array of other arrays; now you see where I'm going with the name of my current blog post :-) 




Previous blog posts on using JSON files in Azure Data Factory:

There are several ways to explore the JSON way of doing things in Azure Data Factory. The first two that come right to my mind are:
(1) ADF activities' output - they are JSON formatted
(2) Reading JSON files - the task itself produces JSON formatted results too 

It's nothing new that we can reference nested elements of ADF activities' output, since it's represented in JSON format, or pass the JSON file content to other tasks/components that can process this format.

But what if you need to pass the complete output of your ADF activity further down your pipeline? Or pass JSON array elements to another ADF activity or sub-pipeline as a parameter value? Let's explore what other options are available in Azure Data Factory for this very interesting use case.

(1) ADF activities' output
Let's say I have a "Get Metadata" task in my ADF pipeline that reads the content of a particular folder in my storage account.


The main portion of the output for this activity may look this way:


I already know that I can get access to the list of files/folders using this ADF expression:
@activity('List of files and folders').output.childitems

Or getting to the first file by referencing a very first element [0] of this JSON array:
@activity('List of files and folders').output.childitems[0]


However, if I want to store the complete output of this activity in a separate Array variable while preserving the option of referencing all nested elements, then this can only be done by wrapping the output into an array using the function with the same name:
@array(activity('List of files and folders').output)

Otherwise you will get a data type mismatch error message. This becomes very helpful if you want to pass the JSON output into the next activity of your ADF pipeline to process. It works and it's fun! :-)
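For example (the activity and variable names here are just illustrative), a Set Variable activity on an Array variable called var_output could use:
@array(activity('List of files and folders').output)

and a later activity could still reach the nested elements with something like:
@variables('var_output')[0].childitems[1].name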


(2) Reading JSON files 
Let's get more creative and read a real JSON file where you have more flexibility and control over its content.

Let's say I want to read the following JSON file of various baking lists of ingredients:
https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html
[
    {
        "id": "0001",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" },
                { "id": "1002", "type": "Chocolate" },
                { "id": "1003", "type": "Blueberry" },
                { "id": "1004", "type": "Devil's Food" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    },
    {
        "id": "0002",
        "type": "donut",
        "name": "Raised",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    },
    {
        "id": "0003",
        "type": "donut",
        "name": "Old Fashioned",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" },
                { "id": "1002", "type": "Chocolate" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    }
]

The output of my ADF Lookup activity that references this file

will correctly show me 3 elements of this JSON array:



Where a list of items for my "For Each" loop activity can be simply set with the following expression:
@activity('Reading JSON file').output.value

Here is the interesting part, let's say I want to execute another ADF pipeline within my "For Each" loop activity and pass one baking recipe (or list of ingredients) as a parameter. Azure Data Factory is flexible enough and I can accomplish this with the following expression:
@array(item())


My sub-pipeline accepts this array parameter value and does further JSON data elements referencing jobs:


I can save the incoming parameter value into the var_baking_payload variable with this expression:
@pipeline().parameters.param_baking_payload 

List of toppings (var_toppings variable) can be set with this expression:
@variables('var_baking_payload')[0].topping

List of batters (var_batters variable) can be set with this expression:
@variables('var_baking_payload')[0].batters.batter



The thing that excited me the most while working with JSON file outputs in Data Factory pipelines was that I could still pass JSON arrays between different tasks/activities or pass those arrays to another sub-pipeline as a parameter, and nested element referencing still worked!

If I'm the last one to learn this, I'm still excited :-)

Happy Data Adventures!

Case
I want to create an Azure Function with Python code. How do I create and deploy one in Azure?
Write Python in Visual Studio Code to create an Azure Function
Solution
In this blogpost we will create and deploy a very simple 'hello world' Azure Function with an HTTP trigger which you can extend to your own needs. After that we can use the Azure Data Factory pipeline with an Azure Function activity to execute it.

As an alternative you could also create an Azure Function with a Blob Storage trigger that executes when a new file arrives, but we would rather use that same trigger type to start an Azure Data Factory pipeline that then starts this Function, followed by other pipeline activities. This way we have one place that does the triggering/orchestration.

1) Create new Azure Function project
Please first follow the steps of our previous post on how to prepare Visual Studio Code for creating Azure Functions with Python. After that open Visual Studio code and perform the steps below to create your first hello world Function.
  • In Visual Studio code click on the Azure icon in the left menu.
  • In the newly opened pane click on the folder with the lightning icon on it to create a new project. (An additional function can later-on be added to the project with the Lightning-plus icon)
  • Select the folder of the new project (or use the Browse... option)
  • Next select Python as coding language
  • Select the Python interpreter. In our example it is py 3.8.3 (if it's not in the list you need to browse to python.exe on your device)
  • Select HTTP trigger as the template for this example
  • Enter the Function name. This is the name of the function within the project (that can contain multiple functions)
  • For this test example use anonymous as Authorization level
  • The project has been created, but continue with the next step below for some additional actions before you can start coding
Create new Azure Function project
2) Select interpreter and install Linter pylint
To finish the creation of the project we need to select the location of the interpreter and install a linter.
  • Click on the popup in the bottom left corner to select the interpreter. 
  • On the top in the middle you can now select the interpreter. Select the one in the .venv folder which is a subfolder of your project.
  • Next step is to install Linter pylint by clicking on the install button on the new popup in the bottom left corner.
  • Now wait a few seconds for the installation to finish
Select interpreter and install Linter pylint
3) Code in __init__.py
The file '__init__.py' contains your Python code. Below you see the standard / generated code with some extra comment lines in case you are new to Python. For this example we do not extend the code.
# Import module for logging purposes
import logging

# Import module for Azure Functions and give it an alias
import azure.functions as func

# Main function and entry point of this Azure Function
def main(req: func.HttpRequest) -> func.HttpResponse:
    # Log information
    logging.info('Python HTTP trigger function processed a request.')

    # Retrieve parameter 'name' from querystring
    name = req.params.get('name')
    # If not found try to retrieve it from the request body
    if not name:
        try:
            # Check if there is a request body
            req_body = req.get_json()
        except ValueError:
            # On failure do nothing
            pass
        else:
            # On success try to retrieve name from request body
            name = req_body.get('name')

    # If a name was found then response with 'Hello [name]'
    if name:
        return func.HttpResponse(f"Hello {name}!")
    else:
        # If a name was not found response with an error message
        return func.HttpResponse(
            "Please pass a name on the query string or in the request body",
            status_code=400
        )

4) Debug locally
Now we are going to test the Azure Function locally on our Windows device. There are multiple ways to start debugging. Pressing F5 is probably the easiest. See animated gif for more options.
  • In the Run menu on the top of the screen you will find the Start Debugging option. 
  • The terminal pane on the bottom will show a lot of details. Wait a few seconds for it to finish and click (while also pressing CTRL) on the green URL.
  • A new browser window will open and it shows the error output that it cannot find the name parameter.
  • In the browser add a querystring after the URL: ?name=Joost (or your own name of course). Now it will respond with a greeting
  • Close the browser and then hit the disconnect icon on top to stop debugging
Debugging your function locally
5) Create Azure Function in Azure Portal
Before you can deploy your newly created function you first need to create an Azure Function in the Azure portal.

  • Go to the Azure Portal and click on Create a resource
  • Search for function and select Function App
  • Click on the Create button
  • On the Basics tab you find the most important settings
  • Select your Subscription and Resource Group
  • Enter a unique Function App name
  • Select Python as Runtime stack
  • Select the Python Version (3.8 in our example)
  • Select the Region (probably the same as your Resource Group)
  • Optionally go to the Hosting tab for extra settings
  • Choose a new or existing Storage account
  • Change the Plan type (default: Serverless)
  • Optionally go to the Monitoring tab for extra settings
  • Disable or enable Application insights and change its name
  • Click the Review + create button
  • Review the settings and click on the Create button

Create new Azure Function (app) on Azure portal
Note 1: you cannot create an Azure Function with a Linux worker (Python) if there is already a Windows worker (C#) in that same resource group and region. You will then get an error: LinuxWorkersNotAllowedInResourceGroup - Linux workers are not available in resource group bitools. Use this link to learn more https://go.microsoft.com/fwlink/?linkid=831180. Also see the Azure Function documentation. Summary: don't mix C# and Python functions within the same resource group.

Note 2: you could also perform these steps within Visual Studio Code during deployment.

6) Deploy Azure Function to Azure Portal
Now that we have an (empty) Azure Functions app in the Azure portal we can deploy our newly created Azure Function to this resource.
  • In Visual Studio code click on the Azure icon in the left menu.
  • In the newly opened pane click on the blue arrow (deploy) icon
  • In the drop down select your Azure Functions App from the previous step
Deploy Azure Functions from Visual Studio Code
7) Testing in portal
Now that we have deployed our project to Azure Functions we can test it in the Azure Portal. For this example we will use the post method.
  • Go to the Azure Portal and then open your Azure Functions App
  • In the left menu click on Functions
  • In the list of functions click on your function (only one in this example)
  • In the left menu click on Code + Test
  • Click on the test button (top center)
  • Change the HTTP method to post
  • Select one of the keys
  • Enter a JSON message in the body: {"name":"Joost"} (name=case-sensitive)
  • Click on the Run button and see the result
Testing in the Azure Portal
8) Executing from Azure Data Factory
Now if you want to execute this new Azure Function in Azure Data Factory with the Azure Function Activity, you can follow the steps in this previous post. However, without code changes it will return an error stating that the response is invalid: 3603 - Response Content is not a valid JObject.
3603 - Response Content is not a valid JObject
At the moment the function is returning a plain string instead of a JSON object, but ADF is expecting a JObject (J = JSON). Any return type other than a JObject will throw the error above. To overcome this we need two code changes. First we need to import the JSON module by adding import json at the top of the code with the other imports. Then we need to adjust the return of the Hello world message. See the changes after lines 4 and 30.
# Import module for logging purposes
import logging

# import json module to return a json message
import json

# Import module for Azure Functions and give it an alias
import azure.functions as func

# Main function and entry point of this Azure Function
def main(req: func.HttpRequest) -> func.HttpResponse:
    # Log information
    logging.info('Python HTTP trigger function processed a request.')

    # Retrieve parameter 'name' from querystring
    name = req.params.get('name')
    # If not found try to retrieve it from the request body
    if not name:
        try:
            # Check if there is a request body
            req_body = req.get_json()
        except ValueError:
            # On failure do nothing
            pass
        else:
            # On success try to retrieve name from request body
            name = req_body.get('name')

    # If a name was found then response with 'Hello [name]'
    if name:
        #return func.HttpResponse(f"Hello {name}!")
        message = {'message': f"Hello {name}!"}
        return json.dumps(message)
    else:
        # If a name was not found response with an error message
        return func.HttpResponse(
            "Please pass a name on the query string or in the request body",
            status_code=400
        )

Below are a couple of screenshots on how to configure and test this in Azure Data Factory. You might want to store the Function key in Azure Key Vault to avoid keys in your ETL/ELT code.
Set up the Azure Function Activity in Azure Data Factory
After configuring the Azure Function activity you can hit the debug button and see the result. This output can then be used as input for successive pipeline activities.
Successfully executing and getting the response
Conclusion
First a big thank you to colleague Jasper Diefenbach for helping me out with the Python stuff. In this blog post you learned how to create, test and deploy your first (very basic) Azure Function App with Python code. Then we also showed you how to execute this from Azure Data Factory. In a couple of follow up posts we will show you how to build some useful functions for DWH projects and show you some technical stuff like adding Azure Key Vault to the game. Also check out the C# version of this blogpost.