Tracking My Custom Processing Metrics with Application Insights

I have a lot of back-end applications that run processing jobs for users. These jobs run all day and for the most part they are pretty quick (less than a few seconds), but sometimes everything bogs down and all the jobs come to a halt. Users are left with a loading spinner on their client app as they wonder what is happening.

What I want is an easy to use dashboard to show me how many of these jobs are running, which user is running them, and how long they are taking. Then I can look at the dashboard and see if there is a spike of total jobs, a spike in job execution time, or a spike in a single user’s jobs.

I have some custom data like userId, accountId (for which account is being processed). I need those fields so I can filter and group the data.

I could do this by writing a record to the database at the end of each execution, but then I have to connect it to PowerBI and write my own reporting queries. I also need to do all this work everytime I add a new process to report on. I want something easy!

App Insights, maybe?

I’ve used App Insights in the past. I plugged in a set of service to service HTTP calls as request / dependency tracing, but it created a fire hose of data that was difficult to extract any useful information from.

This time instead of a fire hose, I’m going to push in the exact data I want and attempt to build a dashboard to show me the data I want.

Prototype

My prototype solution is going to simulate a process running for different users for various lengths of time. I’m going to push that data to App Insights, and generate reports and a dashboard that I can glance at to see how the system is behaving.

I created a console C# app with the .NET Framework (4.7.2). The only Nuget package I need is Microsoft.ApplicationInsights.

Here’s the app without the metrics code. Simple, just processing jobs that take between 1 and 5 seconds for users that range from 1 to 10.

internal class Program
{
    private static readonly Random _random = new Random();

    public static async Task Main(string[] args)
    {
        while (true)
        {
            var userId = _random.Next(1, 10);

            await Process(userId);
            await Task.Delay(1000);
        }
    }

    private static async Task Process(int userId)
    {
        var processingTime = _random.Next(1, 5);
        await Task.Delay(processingTime * 1000);
    }
}

To log to App Insights, I need a reference to a TelemetryClient. TelemetryClient handles the communication to App Insights and I can create a single instance of the class to share for this entire operation.

    private static readonly TelemetryClient _telemetryClient = new TelemetryClient();

I also need to set the InstrumentationKey on the TelemetryClient. The InstrumentationKey comes from your Azure App Insights portal. Without this key, it won’t know where to store your data. You can copy it from the front page of your App Insights in Azure.

    _telemetryClient.InstrumentationKey = "-- ENTER YOUR KEY HERE --";

I need to track the length of the operation on my own using a Stopwatch. There are other ways to let App Insights track the length for me, but I want to do this simple for now.

    var stopWatch = new Stopwatch();
    stopWatch.Start();

    var processingTime = _random.Next(1, 5);
    await Task.Delay(processingTime * 1000);

    stopWatch.Stop();

Now, I’m going to use the TelemetryClient’s TrackEvent method to store the custom Event to AppInsights. TrackEvent takes in an EventName string, which I’ll use to identify this operation at App Insights. I’m calling it “ProcessExecution”.

You can also add two other sets of data to your event. Properties and Metrics.

Properties are the things that you may want to filter and group by. So, in my case UserId is a property. Properties are always strings.

Metrics are the things that you want to report on. In this case, ProcessingTime is the metric. Metrics are always numeric.

    var properties = new Dictionary<string, string> { { "userId", userId.ToString() } };
    var metrics = new Dictionary<string, double> { { "processingTime", stopWatch.ElapsedMilliseconds } };

    _telemetryClient.TrackEvent("ProcessExecution", properties, metrics);

And that’s it! This should give me everything I need to report on how long this processing is taking.

I’ll run the app and let it go for a few minutes, then login to Azure to find my data.

Azure Reporting

In Azure, I go to Application Insights, and choose Analytics from the top menu. This takes me to a query tool that looks SQL-y but it’s not SQL. My immediate reaction is ugh… I don’t want to learn a new query language.

But this is actually pretty easy. I read through this Get Started document on the Kusto language and it taught me everything I needed to know for this:
https://docs.microsoft.com/en-us/azure/azure-monitor/log-query/get-started-queries. Take the time to read through it, it will save you so much time in the long run.

The table we’re interested in is customEvents and we want the records that are named ProcessExecution.

customEvents
| where name == "ProcessExecution"

And there’s our data! (I can’t believe it worked on the first try…)



Now, what I want to see is the average processing time for each minute of the day. Easy with my new Kusto skills!

customEvents
| where name == "ProcessExecution"
| extend processingTime = toint(customMeasurements.processingTime)
| summarize avg(processingTime) by bin(timestamp, 1m)
| order by timestamp desc

This filters the custom events to our ProcessExecution events. Extend grabs the processingTime out of the metrics data that we added (notice it is a string, so we need to cast it to an integer). Summarize lets us average the processingTime over 1 minute intervals by using the bin(timestamp, 1m). That gives me this awesome report.

And now I can even display it as a Chart by clicking on the Chart button above the results.

I can even quickly add this chart to a Dashboard by using the Pin button at the top right of the screen.

This was pretty easy, much easier than I anticipated. I got this done in about an hour, and then got my actual application using it in about 2 more hours. I now have a running dashboard that shows me average execution time, executions per user, and execution failure rates over time.

I’m going to let this run in Production for a while and see how much data it consumes because with App Insights you pay for the data you use. There is also the ability to add Alerts, so I could get an Alert anytime a threshold is hit. So if ExecutionTime goes over 30 seconds, I could alert my development team. I may play with this in the future, but the costs rise when you add Alerts.

I’m pretty excited that this was so easy. Writing this from scratch with a database would have been pretty quick, too, but I wouldn’t have been able to get the quick reporting data and I’d have had to manage all the storage of this data on my own. If this goes well over the next few weeks I’ll add it to even more of my services.