
Using Azure Functions in Azure Data Factory

(2020-Apr-19) Creating a data solution with Azure Data Factory (ADF) may look like a straightforward process: you have incoming datasets, business rules for how to connect and change them, and a final destination environment to save the transformed data. Very often, however, your data transformation requires more complex business logic that can only be developed externally (scripts, functions, web services, Databricks notebooks, etc.).

In this blog post, I will try to share my experience of using Azure Functions in my Data Factory workflows: my highs and lows of using them, my victories and struggles to make them work. If you share the same pain points, find any mistakes, or feel that I have totally misrepresented the facts, please leave your comments; there is no better opportunity to learn than from positive critiques :-)

Azure Functions gives you the freedom to create and execute small or moderately sized pieces of code in C#, Java, JavaScript, Python, or PowerShell. This freedom releases you from the need to set up special infrastructure to host this code; however, you still need to provision an Azure storage account and Application Insights to store your Azure Function code and collect metrics of its execution.


Azure Data Factory provides you with several ways to execute Azure Functions and integrate them into your data solution (this list is not complete):
- Web Activity
- Webhook Activity 
- Azure Function Activity

Web Activity
Connection and setting details:
- URL: you need to specify your Azure Function REST API endpoint
- Method: REST API method of your endpoint: "GET", "POST", "PUT", "DELETE", "PATCH".
- Body: JSON request details
- Dataset: Linked Service and dataset that you want to pass to your Azure Function
- Integration runtime
- Authentication: None (anonymous), Basic (with user name and password), MSI and Client Certificate
Pros:
- Support of MSI authentication to use password-less connectivity within your Azure environment
- Easy way to execute your Azure Function with just URL, Method and Body for your endpoint call. 
- Use of Retry & Retry interval settings could help to restart failed ADF activities to call your function code
Cons:
- Web Activity can only call publicly exposed URLs. It doesn't support URLs that are hosted in a private virtual network.
- The Web Activity will time out after 1 minute with an error if it does not receive a response from the REST API endpoint (this timeout has nothing to do with the timeout configured for your Azure Function).
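
To make the Web Activity settings above more concrete, here is a minimal Python sketch of the kind of HTTP request such an activity sends to an HTTP-triggered function. The URL, the function key placeholder and the helper names `build_function_request` and `call_function` are my own assumptions for illustration, and the 60-second default only mirrors the 1-minute limit described above; this is not how ADF is implemented internally.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your own Function App URL and function key.
FUNCTION_URL = "https://example-func-app.azurewebsites.net/api/GetTime?code=YOUR_FUNCTION_KEY"


def build_function_request(url, method, body):
    """Build the HTTP request for an HTTP-triggered function (nothing is sent yet)."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def call_function(request, timeout_seconds=60):
    """Send the request; raise if the endpoint stays silent longer than the timeout."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status, response.read().decode("utf-8")


# URL + Method + Body are all that is needed to describe the call.
request = build_function_request(FUNCTION_URL, "POST", {"timezone": "Eastern Standard Time"})
print(request.get_method(), request.full_url)
```

Only `build_function_request` runs in this example; `call_function` would need a real endpoint behind the placeholder URL.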

Webhook Activity
Connection and setting details:
- URL: you need to specify your Azure Function REST API endpoint
- Method: REST API method of your endpoint: "POST".
- Body: JSON request details
- Timeout: The timeout within which the webhook should be called back (default value is 10 minutes).
- Authentication: None (anonymous), Basic (with user name and password), MSI and Client Certificate
Pros:
- Support of MSI authentication to use password-less connectivity within your Azure environment
- Easy way to execute your Azure Function with just URL, Method and Body for your endpoint call. 
Cons:
- The Webhook Activity will time out after 1 minute with an error if it does not receive a response from the REST API endpoint (this timeout has nothing to do with the timeout configured for your Azure Function).
- The concept of using callbackUri property to return a response from your Azure Function is not well explained in the official documentation and may confuse some developers like me :-)

Azure Function Activity
Connection and setting details:
- Azure Function Linked Service: reference point to your Azure Function App
- Azure Function Name: name of the function that you can access via the Linked Service based Function App
- Method: REST API method of your endpoint: "GET", "POST", "PUT"
- Body: JSON request details
- Authentication: Function App URL and access key for your Azure Function (configured in a Linked Service)
Pros:
- The Azure Function Linked Service function key can be sourced from a Key Vault, which simplifies both storing and accessing this secret key as well as seamless deployment to other environments.
- Your Azure Function can run for more than 1 minute; however, it is still limited to 230 seconds regardless of your Azure Function timeout setting.
- Use of Retry & Retry interval settings could help to restart failed ADF activities to call your function code
Cons:
- The previously mentioned 230-second timeout of the Azure Function activity may force you to use HTTP polling if your external code needs more time to complete.
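
The HTTP polling idea mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `get_status` stands in for a call to some status endpoint of your long-running code, and the status names "Running", "Completed" and "Failed" are hypothetical, not an ADF or Azure Functions contract.

```python
import time


def poll_until_done(get_status, interval_seconds=15, max_wait_seconds=1800, sleep=time.sleep):
    """Call get_status() repeatedly until it reports a final state or time runs out."""
    waited = 0
    while True:
        status = get_status()
        if status in ("Completed", "Failed"):
            return status
        if waited >= max_wait_seconds:
            raise TimeoutError(f"still '{status}' after {waited} seconds")
        sleep(interval_seconds)
        waited += interval_seconds


# Stand-in for a real status check: reports "Running" twice, then "Completed".
responses = iter(["Running", "Running", "Completed"])
print(poll_until_done(lambda: next(responses), sleep=lambda seconds: None))
```

Each individual status call stays short, so no single request runs into the 230-second limit. In ADF itself, a comparable loop would typically be assembled from an Until activity that wraps a status call and a Wait activity.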

Sample Code of Azure Function and its use in Azure Data Factory
Azure Function
I've created a very simple Azure PowerShell function to return a current time based on the time zone name provided as an input parameter:
using namespace System.Net

# Input bindings are passed in via param block.
param($Request, $TriggerMetadata)

# Write to the Azure Functions log stream.
Write-Host "PowerShell HTTP trigger function processed a request."

# Interact with body of the request.
$timezone = $Request.Body.timezone


if ($timezone) {
    $status = [HttpStatusCode]::OK
    $timelocal = Get-Date
    # Convert the server's local time to the requested time zone.
    $body = [System.TimeZoneInfo]::ConvertTimeBySystemTimeZoneId($timelocal, [System.TimeZoneInfo]::Local.Id, $timezone)
    Write-Host "Local Time: $timelocal"
    Write-Host "Requested TimeZone: $timezone"
    Write-Host "Converted Time: $body"
}
else {
    $status = [HttpStatusCode]::BadRequest
    $body = "Please pass a timezone in the request body."
}

#Start-Sleep -Seconds 100

# Associate values to output bindings by calling 'Push-OutputBinding'.
Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = $status
    Body = $body
})

Web Activity execution in ADF
I pass the URL of my function, POST method and JSON Body request for the "Eastern Standard Time" zone:


Output Result
Successful execution of my ADF web activity returned a time value in the Output.Response attribute: 
 

Additional notes
1) After artificially making my function run for more than 1 minute with the PowerShell "Start-Sleep" command, I received error 2108, "A task was canceled. No response from the endpoint", which confirmed this Web Activity limitation in Azure Data Factory:
{ "errorCode": "2108", "message": "Error calling the endpoint ''. Response status code: ''. More details:Exception message: 'A task was canceled.'.\r\nNo response from the endpoint. Possible causes: network connectivity, DNS failure, server certificate validation or timeout.", "failureType": "UserError", "target": "Web Activity", "details": [] } 


Webhook Activity execution in ADF
I pass the URL of my function, POST method and JSON Body request for the "Eastern Standard Time" zone and set the timeout to 9 minutes, just to test:


Output Result
Successful execution of my ADF Webhook activity returned a time value in the Output.Response attribute along with the Status code: 

Additional notes
1) The Webhook activity requires your Azure Function to return its response through the callBackUri property. Data Factory adds this property to your JSON Body request automatically at run time. You don't see it in the activity settings, but you have to read it explicitly in your code:
if ($Request.Body.callBackUri)
{
    $callBackUri = $Request.Body.callBackUri
    Write-Host "Received callBackUri: $callBackUri"
}

You then need to post your response back to that URI from your function code:
# Call back to Azure Data Factory
if ($callBackUri)
{
    # Build the JSON payload with ConvertTo-Json so special characters are escaped.
    $OutputBody = @{ Response = "$body"; Status = "$status" } | ConvertTo-Json
    Invoke-RestMethod -Method 'Post' -Uri $callBackUri -Body $OutputBody -ContentType 'application/json'
}

2) After artificially making my function run for more than 1 minute with the PowerShell "Start-Sleep" command, I received a BadRequest error, "The request failed with status code '"BadRequest"'", which confirmed this Webhook Activity limitation in Azure Data Factory:
{ "errorCode": "BadRequest", "message": "The request failed with status code '\"BadRequest\"'.", "failureType": "UserError", "target": "WebHook", "details": "" }
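
Since the callBackUri handshake from note 1 is easy to get wrong, here is the same idea as a small Python sketch. It only prepares the callback (the URI to post to and the JSON payload) and does not send anything; the helper name `build_callback`, the example URI and the sample time value are hypothetical, while the "Response" and "Status" keys follow the PowerShell snippet above.

```python
import json


def build_callback(request_body, result, status="OK"):
    """Return (callback_uri, json_payload) for the POST back to Data Factory.

    Returns None when the request carries no callBackUri, i.e. the function
    was not invoked by a Webhook Activity.
    """
    callback_uri = request_body.get("callBackUri")
    if not callback_uri:
        return None
    payload = json.dumps({"Response": str(result), "Status": status})
    return callback_uri, payload


# Hypothetical incoming body; Data Factory appends callBackUri at run time.
incoming = {
    "timezone": "Eastern Standard Time",
    "callBackUri": "https://example.invalid/callback/123",
}
uri, payload = build_callback(incoming, "04/19/2020 10:15:00")
print(uri)
print(payload)
```

The function's work is only reported to the pipeline once this payload has been posted to the callback URI, which is why the URI has to be read from the request body first.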


Azure Function Activity execution in ADF
I connect to my Azure Function via Linked Service connection along with the POST method and JSON Body request for the "Eastern Standard Time" zone:



Output Result
Successful execution of my ADF Azure Function activity returned a time value in the Output.Response attribute:



Additional notes
1) The Azure Function activity requires the response of your function to be a valid JSON object (JObject); otherwise, the activity will fail and return the error message that Response Content is not a valid JObject.
# Associate values to output bindings by calling 'Push-OutputBinding'.
Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = $status
    # Wrap the value in a JSON object so that ADF can parse it as a JObject.
    Body = (@{ Response = "$body" } | ConvertTo-Json)
})

2) After artificially making my function run for more than 230 seconds (just for testing) with the PowerShell "Start-Sleep" command, I received a 3608 error message, "Call to provided Azure function '' failed with status-'BadGateway'", which confirmed this Azure Function Activity limitation in Azure Data Factory.
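
The JObject requirement from note 1 boils down to one rule: the response body must parse as a JSON object, not as a bare string, number or array. A minimal Python sketch of that rule follows; the helper names `to_jobject_body` and `is_json_object` are my own, and the "Response" key simply mirrors the PowerShell example above.

```python
import json


def to_jobject_body(value):
    """Serialize value so the response body is always a JSON object."""
    if isinstance(value, dict):
        return json.dumps(value)
    # Wrap scalars, lists and other values under a "Response" key.
    return json.dumps({"Response": value}, default=str)


def is_json_object(text):
    """True only if text parses as a JSON object (not a string, number or array)."""
    try:
        return isinstance(json.loads(text), dict)
    except (TypeError, ValueError):
        return False


raw_time = "04/19/2020 10:15:00"
print(is_json_object(raw_time))                   # a bare time string is not a JSON object
print(is_json_object(to_jobject_body(raw_time)))  # the wrapped value is
```

Returning the bare time string, as in the first version of my function, is exactly the case that the first check rejects.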

Conclusion: I'm being positive :-)
1) Don't use the ADF Web Activity to execute your Azure Function, because of the limitations and drawbacks highlighted above.
2) Use the ADF Webhook activity only for very short-running code (less than 1 minute) if you require an explicit request and a synchronous workflow.
3) The Azure Function activity in ADF is the most favorable approach to execute the code of your Azure Function:
- More time to execute (still limited to 230 seconds)
- The ADF pipeline code is cleaner (no hard-coded URLs), which makes this approach the best candidate for CI/CD pipelines in Azure DevOps.

However, if you really want to run very long Azure Functions (longer than 10, 30 or 60 minutes) and use Data Factory for this, you can: (1) create a "flag-file" A in your ADF pipeline; (2) use this "flag-file" A as the triggering event for your Azure Function; (3) let your Azure Function run and, at the end, create another "flag-file" B; (4) use this "flag-file" B as the triggering event for another pipeline in your Azure Data Factory. I have also written a blog post about this approach: "Event-driven architecture (EDA) with Azure Data Factory - Triggers made easy".
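
The four flag-file steps can be simulated locally to see how the hand-off works. In this Python sketch a temporary folder stands in for the storage location, and plain function calls stand in for the event triggers; the file names `flag_A.txt` and `flag_B.txt` and all function names are hypothetical.

```python
import tempfile
from pathlib import Path


def pipeline_step_one(folder):
    """(1) The first ADF pipeline drops flag-file A."""
    (folder / "flag_A.txt").write_text("start")


def long_running_function(folder):
    """(2)-(3) Triggered by flag-file A: do the work, then drop flag-file B."""
    if not (folder / "flag_A.txt").exists():
        return False
    # ... the long-running work would happen here ...
    (folder / "flag_B.txt").write_text("done")
    return True


def second_pipeline_should_start(folder):
    """(4) Flag-file B is the triggering event for the follow-up pipeline."""
    return (folder / "flag_B.txt").exists()


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    pipeline_step_one(folder)
    long_running_function(folder)
    print(second_pipeline_should_start(folder))
```

Neither pipeline waits for the function here, so none of the activity timeouts discussed in this post apply; the price is that the workflow is split across two pipelines.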

Happy Data Adventures!
