Showing posts with label STREAM_ANALYTICS. Show all posts
Showing posts with label STREAM_ANALYTICS. Show all posts
Case
How do you use Azure Data Lake Store to store Stream Analytics data and why would you do that?

Solution
In an earlier post we have set up a Stream Analytics Job where sensor data is send to a Blob file. In this case we create a new Stream Analytics Job with Data Lake Store as output and were we use the sensor data as input. Despite the fact that the Input and Output is different, the query stays the same as other Stream Analytics Jobs. You can find the configuration of the Stream Analytics Job query here.

A Data Lake Store is a scalable repository optimized for storing IoT data, log files and other large datasets for Big Data scenarios. It contains folders and these folders contain the files (data). As said before, you can also store (sensor) data using Blob storage. These two storage options are similar, but there are some important differences, which we will explain later in this post.

Overview Data Lake Store with different types of data




















1) Create a Data Lake Store
First we have to create the Data Lake Store. Click on New (+ icon) in your portal and under the Storage category you will find the Data Lake Store. Give it a suitable name. Next we choose the Resource Group which we created earlier by setting up the IoT Hub. The benefit of choosing the same group is that among other things, the rights are the same. At this time there are not many options available in both Location and Pricing, so it's default.

Azure Portal - Create Data Lake Store














2) Create the Stream Analytics Job
Before creating this new job, we already added a new consumer group to our IoT Hub, called 'DataLake'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s).

Note:
The Azure Portal is still in development, so adding a new consumer group is now at a different place than in an earlier post. See screenshot below.

Azure Portal - Adding Consumer Groups














Now we can create the job. We choose the same Resource Group as the IoT Hub and Storage account, just like the creation of the Data Lake. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create Stream Analytics Job














3) Configure the Stream Analytics Job
We start with defining the Input. We choose 'Data stream' as Source Type because the sensor data is an ongoing stream and is derived from the IoT Hub. Under Source we choose 'IoT hub'. In our case we have one IoT Hub, but when you have multiple IoT Hubs you can choose one from the list. Now our IoT Hub appears automatically. Next we choose 'datalake' as Consumer group. This is the one we have created earlier. Finally we choose 'JSON' as Event serialization format.

Now we can specify the Output. Give it a suitable name and choose 'Data Lake Store' as Sink. The corresponding (in thise case the one we have created earlier) Account Name will be selected automatically. Next we enter a file path to store the files in our Data Lake Store account. Optionally, you can specify this path with date and time. Now the data will be stored in multiple instances per day and per hour and this makes the storage more clearly. At last you will choose the 'JSON' format.

In this post we only configuring the Input and Output of the job, as mentioned earlier. Now we can run the job with a valid query.

Azure Portal - Configure Stream Analytics Job














Result
After running the Stream Analytics Job, the data is now stored in the Data Lake Store. You can find this in the Azure portal. Open the Data Lake Store and go to Data Explorer. Now you see one folder named 'sensordata'. This folder contains multiple subfolders: year, month, day and hour. Now we have only one file with data in the subfolder 'hour', but each next hour there will be a new file (as long as your Stream Analytics Job is running). This is exactly what we have configured earlier. In each file, the data is stored per 10 seconds. It works!

Azure Portal - Result Data Lake Store















Differences between Blog Storage and Data Lake Store
The first big difference is the size limits. There are no limitations in account/file size or number of files at a Data Lake Store, while Blob storage has such restrictions. In addition Data Lake Store has built-in Hadoop integration. Therefore (along with the unlimited storage) this makes it suitable for storing Big Data and then analyze this data. Another difference is the authentication. Blob storage works with generated storage access keys, while a Data Lake Store use Azure Active Directory for this. 

Overall a Data Lake Store has more possibilities then Blob storage and is optimized for Big Data purposes. The general purpose of Blob Storage is storing data in different scenarios like backups or media files (for streaming). The starting prices are lower for Blob storage, but you have different storage prices for your Blob Storage account. This means: you can make it as expensive as you want and in some cases the monthly charges per GB will be higher than a Data Lake Store. Click here for more details about the differences and prices between a Data Lake Store and Blob storage.

Conclusion
Azure Data Lake Store is very useful with Big Data scenarios because it can combine storage (to more then 1 petabyte) with the ability to analyze this data. This can be done with Hadoop (built-in integration) or Azure Data Lake Analytics, which is specially optimized to work with Azure Data Lake Store. From this perspective it offers more than, for example Blob storage.
Case
How do you use Azure Blob Storage to store stream analytics data and why would you do that?

Solution
In earlier posts we have set up an IoT environment with an IoT Hub and a couple of Stream Analytics Jobs where sensor data is sent to different destinations: Power BI for a real-time dashboard and Azure SQL Database to store the data. In this case we create a new Stream Analytics Job with Blob Storage as output and we use the sensor data as input. The Input and Output are different, but the query is the same as other Stream Analytics Jobs. Configuring the query of the Stream Analytics Job can you find here.

Reasons to use Blob Storage is the diversity of storing in the cloud of text or binary files. The unstructured data can include documents, social data (photos, videos, music and blogs), Big Data (logs, IoT and large datasets) or images and text for web applications. Click here for more pricing details about Blob Storage. You can also store (sensor) data in an Azure Data Lake. Click here for more information about this, along with the major differences between Blob Storage and Data Lake.

For the use of Azure Blob storage you need a Storage account. Then you can add containers to this account, which include the Blob files. In the case of an image, you also find a metadata file. In our case the Blob files are JSON files and contain sensor data.

Overview Azure Blob Storage with images











1) Create a Storage account
To make use of Blob Storage you have to create a Storage account first. After we give it a suitable name, we choose Blob storage and 'RA-GRS' as Replication. This is the default and it contains the most options. Click here for more information about this. Next we choose 'Hot' by Access tier, because we want access the sensor data frequently. 


Azure Portal - Create a Storage account















Note:
You can also create a Storage account when setting up the Output of the Stream Analytics Job, but you have less options so it is not recommended. 

2) Create the Stream Analytics Job
Before creating the new job, we had already add a new consumer group to our IoT Hub. We called it 'blob'. Using multiple consumer groups makes it possible for several consumer applications to read data from this IoT Hub independently. Click here to see where you can add/manage consumer group(s). 

Now we can create the job. We choose the same Resource group as the IoT Hub and Storage account, because these are in the same life cycle. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create Stream Analytics Job














3) Configure the Stream Analytics Job
First we must add a new Input to the job. The default Source Type is 'Data stream'. We choose this because  the sensor data is an ongoing stream and is derived from the IoT Hub. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more then one IoT Hub, you can choose one from the drop-down list. After this you must choose the right Consumer group. This is the new group (blob) we have created earlier. Finally you choose 'JSON' as Event serialization format

Next we add a new Output. First choose 'Blob storage' in Sink and 'Use blob storage from current subscription' as Subscription, because you have configured the storage account earlier. Otherwise you can edit the storage account settings here by choosing 'Provide blob storage settings manually'. Then you create a new container. Optionally, you can define one or more instances (subfolders) within the container. With this option you can change the date and time format so you can have multiple instances including different dates and/or time folders. This makes it more clear (just like your own local File Explorer) and you have the choice to select data from a specific day/time. We made a instance called 'sensor'. At last you will choose the 'JSON' format. 

As we said, we discuss only the configuration of the Input and Output of the job in this post. After configured the job we will run the job with a valid query. 

Azure Portal - Configure the Stream Analytics Job for Blob














Result
Now the data is stored, we want to see what's in our Blob. Therefore you have to go to your Azure Storage account in the portal. Every object, in our case a Blob, that you store in Azure Storage has a unique URL address. For the Blob service with the storage account name (bitoolsblobstorage) you have created and the container name (sensordata) with the instance (sensor) the URL/endpoint is: 
http://bitoolsblobstorage.blob.core.windows.net/sensordata/sensor
More information about the Azure Storage endpoints here.

In the portal go to the Storage account you have made earlier. Click on the container URL and then you see the instance (subfolder). Now you can drill down further on the specific month, day and hour of the incoming sensor data.

Azure Portal - Your Blob file














Conclusion
It looks a lot like the other Stream Analytics posts, but in this case the data is stored in a Blob file. From this point you have several options to do something with this data. First you can do nothing off course and in that case you use Blob purely for storage (backup). You will find the other options in the Cortana Intelligence Suite. For example process the data with Azure Data Factory (this can also be done with traditional SSIS, which is off course not a part of the CIS), analyse the data with Machine Learning or visualize the data in Power BI. 










Case
I want to aggregate data in a stream. How does that work in Stream Analytics?

Example data: count people in front of booth    

















Solution
Because it's a stream you aggregate data in a certain time window instead of the whole dataset. The Stream Analytics Query language is very similar to TSQL and it it has some extensions like the Windowing functions to aggregate data in time windows. At the moment there are three Windowing extensions (Hopping, Sliding and Tumbling), but it is not inconceivable that more windowing functions will be added in the near future.

General rules:
  • The length of each window is fixed
  • Windowing only work in combination with the GROUP BY clause
  • The time units can be day, hour, minute, second, millisecond or microsecond, but the maximum size of the window in all cases is 7 days.


1) Tumbling Window
The Tumbling Window is the easiest to explain. It aggregates data within a X second/minute/etc. time window and does that every X seconds/minutes/etc.

For example: Tell me the average number of visitors per booth over the last 10 seconds every 10 seconds:
SELECT    Booth, avg(HeadCount) as AvgHeadCount
FROM HeadCountStream TIMESTAMP BY MeasurementTime
GROUP BY Booth, TumblingWindow(second, 10)

Tumbling Window














Tumbling Window


















2) Hopping Window
The Hopping Window is very similar to the Tumbling Window, but here the windows have a overlap. It aggregates data within a X second/minute/etc. time window and does that every Y seconds/minutes/etc.
For example: Tell me the average number of visitors per booth over the last 10 seconds every 5 seconds:
SELECT    Booth, avg(HeadCount) as AvgHeadCount
FROM HeadCountStream TIMESTAMP BY MeasurementTime
GROUP BY Booth, HoppingWindow(second, 10, 5)

Hopping Window



















Hopping Window

















3) Sliding Window
The Sliding Window is the most difficult to explain. It aggregates the values in the time window every time a new event/measurement occurs or an existing event/measurement falls out of the time window.
So when using the Sliding Window you are interested in aggregating values when ever an event occurs. This is in contrast to the Hopping and Tumbling windows which have a fixed interval.

For example: Tell me the average number of visitors per booth in the last 10 seconds:
SELECT    Booth, avg(HeadCount) as AvgHeadCount
FROM HeadCountStream TIMESTAMP BY MeasurementTime
GROUP BY Booth, SlidingWindow(second, 10)

Sliding Window                                                                              

















  • The first aggregation occurs when the first measurement value (1) is streamed.
  • The second aggregation occurs when a new measurement value (3) is streamed.
  • The third, fourth, fifth, etc. is equal to the second aggregation because each time a new measurement value (1) is streamed.
  • The last aggregation occurs when no more new measurement values are streamed and the second last measurement value (2) falls out of the time window.

Sliding Window



















Compared to the Hopping Window (with the same data) you only get an extra result row at the start (1) and one at the end (1) because in this example the events happen in a fixed interval. It gets more interesting when the events are coming in more randomly like tweets about a certain subject.

Timestamp by
You probably noticed the TIMESTAMP BY measurementTime clause after the FROM. This tag lets you set the exact timestamp that an event occurred, rather than the arrival time in the IoT Hub. This timestamp is used by the windowing functions.

Testing query with Windowing functions
In the old portal you can test the query and study the result (at the moment of writing, testing is not yet supported in the new portal). For this you need a json file with some messages in it. These messages should look identical like the messages you send via the IoT Hub. For this example I created a text file with the following text in it:
{"headCount":1,"measurementTime":"2016-09-17T18:25:43.511Z","sensorName":"A"}
{"headCount":3,"measurementTime":"2016-09-17T18:25:47.511Z","sensorName":"A"}
{"headCount":2,"measurementTime":"2016-09-17T18:25:53.511Z","sensorName":"A"}
{"headCount":3,"measurementTime":"2016-09-17T18:25:57.511Z","sensorName":"A"}
{"headCount":4,"measurementTime":"2016-09-17T18:26:03.511Z","sensorName":"A"}
{"headCount":2,"measurementTime":"2016-09-17T18:26:07.511Z","sensorName":"A"}
{"headCount":2,"measurementTime":"2016-09-17T18:26:13.511Z","sensorName":"A"}
{"headCount":1,"measurementTime":"2016-09-17T18:26:17.511Z","sensorName":"A"}

When you hit the test button in the query editor you need to upload a json file for testing. Then it will use that data to test your Stream Analytics query and show the result. In the pictures below you will see the test data in the upper right corner and the query result at the bottom. The red numbers show how the average was calculated.
Sliding Window


















When you leave out one measurement, two rows will change in the result.
Sliding Window, leaving out 1 measurement


















When you leave out two successive measurements, two rows will change in the result and one row will disappear.
Sliding Window, leaving out 2 successive measurements



















Conclusion
Tumbling and hopping window are easy to understand. Sliding window is a little harder to understand, but writing the query is very easy. Using the windowing functions is something you would probably want to do in the hot path to stream to PowerBI. This way you don't get to much data in the stream.
Case
Your sensors are connected with an IoT Hub and is generating data. In our previous post we send the real-time data to an Power BI dashboard. What are the other possibilities in Azure with this data?

Solution
In our previous post we distinguish two streams for our data: Cold path and Hot path. In this case we store the data in a Azure SQL Database, which is a form of the Cold path. See here for the full list of Azure SQL Databases (size and prices) where you can choose from. The reason to store the data may be, for example, to analyze the data or to prepare a dataset as input for your Machine Learning experiment/model. Just like the Hot path, we are setting up a (separate) Stream Analytics Job for this. You can have multiple Outputs in one Job, for example real-time Power BI and SQL Database, but when you want to edit the query for the data to the database, you must stop the Job and also your real-time data is not sent. Before we create the job, we first set up the Azure SQL Server and after that the SQL Azure Database. The reason for this is that we want to select the database by the Output of the Stream Analytics Job and therefore we need to create it first (along with the server).

Cold path with Stream Analytics









1) Create the Azure SQL Server
Go to the Azure portal and click on 'More services' on the bottom of the menu (left-hand side of the screen) and search for SQL server. When you have opened it, click on 'Add' to create a new Stream Analytics Job. Perhaps you noticed that in our previous post instead of 'Browse' now 'More service' stands. The portal is still in development so there are regular updates.

Azure Portal - Create SQL Server















Now you can fill in your Server name (the name cannot contain spaces). Next we create a SQL Server login and this is our Server admin. You can also use Azure Active Directory (user or group) for this. Click here for more information. The Subscription is filled automatically. After this we choose a Resource group. For the convenience we choose the Resource group we have created earlier by setting up the IoT Hub, but you can also create a general Resource group so the server can be used for purposes other than IoT. Otherwise the server has the same lifecycles, permissions and policies as the IoT Resource group. Our Location is the Netherlands, so we choose West-Europe.

Azure Portal - Create SQL Server (continuation)















Tip:
When the deployment of the server succeeded, the server must appear in the list of SQL Servers. If not, you must click on the 'Refresh' button at the top under SQL Servers.

2) Create the Azure SQL Database
Next we create the database. In your Azure portal click on 'More services' and search for SQL Database. When you have opened it, click on 'Add' to create a new database.

Azure Portal - Create SQL database














Choose a name for your database. The Subscription is filled automatically and next we choose for the same Resource group as earlier by setting up the SQL Server. Select 'Blank database' (new database) and choose the SQL Server that you have created earlier. If you don't choose a server, Azure creates automatically one. That is the reason why we set up the server first, because maybe you want to create one server (with a general name) and to attach here multiple databases. Otherwise you have a separate server for every database. That can also be a conscious choice off course, but that is not what we want in this case. At last we choose the 'Basic' database, but here you can choose the size that fits your needs. The Collation is default.

Azure Portal - Create SQL database (continuation)














Tip:
If you decide to add in Management Studio (once you have connected) a new database, be aware of the fact that Azure creates default the S3 (Standard) version of a database. Therefore, you should always create a new database in the Azure portal, so that you can choose the right size and price.

3) Connect to SQL Server and create a table
Once the database is created, you can connect to the SQL Server in Management Studio. In this case our Server name is 'bitools.database.windows.net,1433'. As you can see the name includes the default port of 1433 (this is the only port on which the service is available). Next you choose 'SQL Server Authentication' and fill in the login and password that you have created earlier by setting up the SQL Server. The first time you must Sign In with your Azure account. This is also the case when you have not made connection to the server for a while. At last you must add your client IP for access to the server. Your IP is now added to the firewall.

SQL Server Management Studio - Connect to Azure














Before we create the Stream Analytics Job, we must do one last thing and that is create a table where we can store the sensor data. I have created the following table in the database:

CREATE TABLE [dbo].[sensorData](
[SensorName] [nvarchar](max) NULL,
[MeasurementCount] [bigint] NULL,
[MeasurementTime] [datetime] NULL,
[Temperature] [float] NULL,
[Humidity] [float] NULL,
[Pressure] [float] NULL,
[Altitude] [float] NULL,
[Decibel] [float] NULL,
[DoorOpen] [bigint] NULL,
[Motion] [bigint] NULL,
[Vibration] [bigint] NULL,
[Illumination] [float] NULL
)

As you can see I choose for float as datatype, because the standard data types in Azure are floats. This means that input datatypes such as decimal and numeric are converted to floats.

Tip:
You can manage the firewall in the Azure portal. Go in the portal to your server, click on it and under settings you will find 'Firewall'. Here you can add Client IP's to allow connection to the server. Note that this can only be done by an user who have the role of 'Owner'. In this case I created the server so I'm automatically owner of the server.


4) Create the Stream Analytics Job
In your Azure portal click on 'New'. Type in 'Stream Analytics Job' and click on it. Next you click on the result, in this case only Stream Analytics Job. After that, you can click on the 'Create' button.

Azure Portal - Create Stream Analytics Job (extensive)














Tip:
In our previous post we create a new Stream Analytics Job on a faster and different way. This way is extensive and gives some general information about, in this case, a Stream Analytics Job. So if you want more information about a feature in Azure before you create it, this is a useful way. 

Now you can fill in your Job name (the name cannot contain spaces) and the Subscription is filled automatically. Next choose a Resource group. These groups are made by setting up the IoT Hub. Click here for more information and how you create it. When you have created this, it appears in the list of 'use existing' and you can choose this one. Our Location is the Netherlands, so we choose West-Europe. At last you can pin your Job right away to your dashboard, with the checkbox at the bottom. You may have noticed at the first screenshot that I have already pin the previous Stream Analytics Job to my dashboard.

Azure Portal - Create Stream Analytics Job (continuation)














5) Define the Input
Once the Job is created, you must add a new Input. Because I have pinned the Job (screenshot 6 of 'Create the Stream Analytics Job'), you can select it from the dashboard. The default Source Type is 'Data stream', because  the sensor data is an ongoing stream and is derived from the IoT Hub. Optionally, you can add 'Reference data' as type. This data is like static metadata next to your sensor data, it gives your sensor data more meaning. Here you can find more information about this kind of data. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more then one IoT Hub, you can choose one from the drop-down list. The next thing is to choose the right Consumer group. These groups are made by setting up the IoT Hub. Click here for more information and how you create it. In this case we want to store the sensor data in a Azure database, so you choose 'azuredb'. Finally you choose 'JSON' as Event serialization format. Click here for more information and how you create such a JSON message.
  1. Click on the job
  2. Click on 'Inputs'
  3. Click on 'Add'
  4. Fill in a name, select 'Data stream' as Source Type  and select your IoT Hub as Source
  5. Select 'azuredb' as Consumer group and choose 'JSON' as Event serialization format
Azure Portal - Define Input














6) Define the Output
After the Input, you create the Output. When you have given the Output a suitable name, you choose 'SQL database' in Sink. Then select your Database that you have created, in our case 'IoT_Sensor'. After that the Server name, that you have created too, is filled in automatically. Next you must connect to the database with the SQL Server login you made earlier. Now you can choose a table, in our case 'sensorData'
  1. Click on the job and click on 'Outputs'
  2. Click on 'Add'
  3. Fill in a name and select 'SQL database' as Sink
  4. Select your Azure Database that you have created
  5. Log in with the SQL login Username and Password
  6. Choose the created Table
Azure Portal - Define Output














7) Define the Query
Now the Output is defined, you can build up the query. Compared to our previous post you see that there are already some updates have been made. For example, on the left you see your Input and Output. The query needs always an Input and an Output, so that's why we have created the Output first. It is good to know that the language is SQL, but there are certain differences with a normal SQL query. In addition, the standard data types are floats. 
Besides a FROM clause, there is an INTO clause. For the FROM you will use your defined Input and the Output is used for the INTO. Additionally, there are various new windowing functions available. This will be discussed in another blog. You will find more details about the Stream Analytics Query Language here. For now we use a simple query without those functions. 

The query:

SELECT   CAST(sensorName as nvarchar(max)) as SensorName
, CAST(1 as bigint) as MeasurementCount
, CAST(measurementTime as datetime) as MeasurementTime
, Temperature
, Humidity
, Pressure
, Altitude
, Decibel
, CAST(doorOpen as bigint) as DoorOpen
, CAST(motion as bigint) as Motion
, CAST(vibration as bigint) as Vibration
, Illumination
INTO [saj-bitools-DB-Output]
FROM [saj-bitools-DB-Input]

Unfortunately, the testing of the query is not supported in the new Azure Portal. They are working on it.

Azure Portal - Define Query














7) Start the Job
At last you must start the Stream Analytics Job. You can choose between ad-hoc (now) or a scheduled day and time (custom).

Azure Portal - Start the Stream Analytics Job














Result
We have started the Stream Analytics Job and now we want to see the result. Go back to your SQL Server Management Studio and connect to the Azure Database. When you look at the table, you must see the results. In our case we have sent 100 messages to our database. The messages are sent every 10 seconds.

SQL Server Management Studio - Table results











Conclusion
The steps are logical, but the sequence of the execution is very important. Also be careful about which database you buy, because there are big differences between the prices. 
Case
Your sensors are connected with an IoT Hub and is generating data. What are the possibilities with this data and more important, how does this works in Azure?

Solution
In Azure, we can use Stream Analytics to do this. We distinguish two streams for our data:
  • Cold path
  • Hot path
Possibilities with Stream Analytics








There is also the possibility for Machine Learning. You can use this for both Cold path and Hot path. We are skipping this for now. This will be discussed in another blog.

Cold path
The Cold path means that the data will be stored for further processing, before presenting it to the end users. Examples of a Cold path are loading the data into a Azure Data Lake or an Azure SQL Database. For the latter, you have to create first the Azure SQL Database. You also have to organize the authentication, make some custom tables where the data can be stored in and finally make the Stream Analytics Job. Setting up all this will be explained in another blog. 

Hot path
The other path is the Hot path and this means that the data is real-time and will be displayed into Power BI. First you have to create a Stream Analytics Job and then do the configuration.


1) Create the Stream Analytics Job
You can make the Job in both old portal and the new portal. We prefer to make as much as possible in the new portal, because this will be the standard in the future.

Go to 'Browse' on the bottom of the menu (left-hand side of the screen) and search for Stream Analytics jobs. When you have opened it, click on 'Add' to create a new Stream Analytics Job. 

Azure Portal - Create Stream Analytics Job













Tip: 
As you can see, on the left in the Azure menu I have some standard resources. When you click on a blank star (favorite) button, it will append on the left in your menu (see screenshot 2). You can also select 'All recourses' in the menu. This contains a list of all features and here you can add any feature, also a Stream Analytics Job.

Now you can fill in your Job name (the name cannot contain spaces) and the Subscription is filled automatically. Next choose a Resource group. These groups are made by setting up the IoT Hub. Click here for more information and how you create it. When you have created this, it appears in the list of 'use existing' and you can choose this one. Our Location is the Netherlands, so we choose West-Europe. At last you can pin your Job right away to your dashboard, with the checkbox at the bottom. You may have noticed at the first screenshot that I have already pin some some Jobs to my dashboard

Azure Portal - Create Stream Analytics Job (continuation)














Once the Job is created, it will appears in the list of Stream Analytics Jobs with the status 'Created'.

2) Define the Input
Once the Job is created, you must add a new Input. The default Source Type is 'Data stream', because  the sensor data is an ongoing stream and is derived from the IoT Hub. Optionally, you can add 'Reference data' as type. This data is like static metadata next to your sensor data, it gives your sensor data more meaning. Here you can find more information about this kind of data. The Source is 'IoT hub' and then the IoT Hub that you have created automatically appears. If you have more then one IoT Hub, you can choose one from the drop-down list. The next thing is to choose the right Consumer group. These groups are made by setting up the IoT Hub. Click here for more information and how you create it. In our case we want to present the sensor data in a Power BI dashboard, so you choose 'powerbi'. Finally you choose 'JSON' as Event serialization format. Click here for more information and how you create such a JSON message.
  1. Click on the job
  2. Click on 'Inputs'
  3. Click on 'Add'
  4. Fill in a name, select 'Data stream' as Source Type  and select your IoT Hub as Source
  5. Select 'powerbi' as Consumer group and choose 'JSON' as Event serialization format
Azure Portal - Define Input















3) Define the Output
After the Input, you create the Output. When you have given the Output a suitable name, you choose 'Power BI' in Sink. Then you should authorize with your Power BI account. In most cases, this will be the same as your Azure account. Please note that this only can be done with an organization account. When you are successfully logged in, you must choose a Group Workspace. The default is 'My Workspace'. The downside of 'My Workspace' is that when you want to share your Power BI dashboard with other persons and such persons enable to edit this dashboard, this doesn't work in a local workspace. You must choose a workspace where several people are part of, for example a SharePoint group. Everybody in this group can see and edit Power BI dashboard(s) in this workspace. In our example is this the 'IoT' group.
  1. Click on the job and click on 'Outputs'
  2. Click on 'Add'
  3. Fill in a name and select 'Power BI' as Sink
  4. Authorize with your Azure account
  5. Select your Group Workspace
  6. Fill in a Dataset name and Table name
Azure Portal - Define Output














4) Define the Query
Now the Output is defined, you can build up the query. Because the query needs always an Input and an Output, we created the Output first. It is good to know that the language is SQL, but there are certain differences with a normal SQL query. In addition, the standard data types are floats. 
Besides a FROM clause, there is an INTO clause. For the FROM you will use your defined Input and the Output is used for the INTO. Additionally, there are various new windowing functions available. We use one of them. You will find more details about the Stream Analytics Query Language here. For now we use the Tumbling Window function and the Timestamp function. 

Tumbling Window function
With the Tumbling Window function you can define your own time intervals, that will not overlap each other. Our data is send per second to our IoT Hub, but here we make an interval of one minute. Based on that we use several aggregate functions, that will calculate the data for one minute. See this post for more details.

Timestamp By function
The other function is the Timestamp. A query in Stream Analytics contains nearly always a datetime field, because the data is send at a specific time. This field is passed to 'TIMESTAMP BY' function. This ensures that the data is coming in at the time when it is created, instead of the time when the data has been sent. Perhaps the transfer of the data is delayed and causes the data in a different order by entry. 

The query:

SELECT   sensorName
, Max(measurementTime) as measurementTime
, CAST(1 as bigint) as measurementCount
, Avg(decibel) as decibel
, Sum(doorOpen) as doorOpen
, Avg(humidity) as humidity
, Avg(illumination) as illumination
, Sum(motion) as motion
, Avg(pressure) as pressure
, Avg(temperature) as temperature
, Sum(vibration) as vibration
INTO [saj-bitools-Output]
FROM [saj-bitools-Input] TIMESTAMP by measurementTime
GROUP BY sensorName
, TumblingWindow(minute, 1)

Unfortunately, the testing of the query is not supported in the new Azure Portal. They are working on it.

Azure Portal - Define Query
















5) Start the Job
At last you must start the Stream Analytics Job. You can choose between ad-hoc (now) or a scheduled day and time (custom).


Azure Portal - Start the Stream Analytics Job
















Power BI
We have started the Stream Analytics Job and now we want to present the data in Power BI. Log in here with your Power BI account. Select on the top left your Group Workspace. This is the same group that you have chosen earlier when defining the Output. When you have selected the right workspace, the dataset (defined in the Output) appears automatically. Select the dataset en now you can build a report. In 10-15 minutes we have created a simple report. It contains the amount of input events, averages of the measurements and a timeline with these measurements.

When you click the refresh button, the data will be refreshed. Next you can pin the live page to a dashboard. If the pin doesn't work the first time by adding at to a new dashboard, just pin the report again to the existing dashboard (you created previously). Now you have your first Power BI dashboard with live sensor data.
  1. Click on 'Power BI'
  2. Click on the menu
  3. Select your Group Workspace
  4. The dataset appears automatically
  5. Click on the dataset
  6. Make the report and click on 'Pin Live Page'
  7. Pin the report to a new dashboard en give it a suitable name
  8. Select the dashboard and you will see your report
Create the Power BI dashboard


Power BI example





















Note
In another blog we will show you how to send your sensor data to an Azure SQL Database. Because you have these different options for your output, we choose to create a separate job for our livestream sensor data to Power BI. You can create multiple Inputs, Query's and Outputs in one job. The disadvantage is that you want to edit a single query (for example the livestream query), you must stop the entire job. This means that other Outputs (like Azure database) are stopped too and do not receive any data anymore through this job. 

Conclusion
A lot of steps to do before you can make a Power BI dashboard. If you can build once, you can pretty quickly put together other dashboards.