ASP.NET Core has built-in, customizable health check middleware. A utility can periodically probe the health endpoint to determine the health of the web app, which is useful if your web app is load-balanced or hosted in Kubernetes. This post is about implementing a custom health check that you can configure to track any Event Counter.
Goodbye, Windows Performance Counters
Windows performance counters are useful, and the .NET Framework provides all kinds of them out of the box. But they all went away with cross-platform .NET Core. Microsoft made ETW-style event tracing (Event Tracing for Windows) work on other platforms through EventSource, so why not perf counters?
Azure Application Insights documents that support for performance counters is limited (emphasis added) and that EventCounters are the way to go:
Support for performance counters in ASP.NET Core is limited:
- SDK versions 2.4.1 and later collect performance counters if the application is running in Azure Web Apps (Windows).
- SDK versions 2.7.1 and later collect performance counters if the application is running in Windows and targets NETSTANDARD2.0 or later.
- For applications targeting the .NET Framework, all versions of the SDK support performance counters.
- SDK Versions 2.8.0 and later support cpu/memory counter in Linux. No other counter is supported in Linux. The recommended way to get system counters in Linux (and other non-Windows environments) is by using EventCounters
Hello, Event Counters
Event Counters are a cross-platform replacement for Windows performance counters. EventCounters are emitted through EventSource, the cross-platform counterpart to ETW. Starting with .NET Core 3.0, some EventCounters are available out of the box, and more counters are available in .NET 5. You can also create your own EventCounters.
Reporting on the health of your ASP.NET Core application
If you’re on Azure, Application Insights has a lot of built-in metrics that you can build alerts on. If you want to alert on an EventCounter, the quickest solution is to use Application Insights’ EventCounterCollectionModule.
If your ASP.NET Core web application runs in places other than Azure or if an outside application doesn’t have access to AppInsights data, you may want to expose a health endpoint. This is part of ASP.NET Core in the form of health checks.
Adding a health check is simple. ASP.NET Core has a built-in reference to the Microsoft.Extensions.Diagnostics.HealthChecks package. Add the AddHealthChecks() and MapHealthChecks("/health") lines to your startup class:
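A minimal sketch of those two lines, assuming the Startup class layout of the ASP.NET Core 3.1 web template (the rest of the class is trimmed for brevity):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Registers the health check services.
        services.AddHealthChecks();
    }

    public void Configure(IApplicationBuilder app)
    {
        app.UseRouting();
        app.UseEndpoints(endpoints =>
        {
            // Exposes the aggregated health status at /health.
            endpoints.MapHealthChecks("/health");
        });
    }
}
```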
There are also health checks available for Entity Framework. For instance, the DbContext health check:
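As a rough sketch, the DbContext check comes from the Microsoft.Extensions.Diagnostics.HealthChecks.EntityFrameworkCore package. AppDbContext and the AddDbHealthCheck helper are hypothetical names used only for illustration, and the context itself still has to be registered elsewhere with AddDbContext.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical context used only for illustration.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
}

public static class DbHealthCheckRegistration
{
    // Hypothetical helper; the same line can go straight into ConfigureServices.
    public static IServiceCollection AddDbHealthCheck(this IServiceCollection services)
    {
        // Adds a check that verifies the application can reach the database.
        services.AddHealthChecks().AddDbContextCheck<AppDbContext>();
        return services;
    }
}
```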
Detailed health reporting
By default, the /health endpoint reports “Healthy”, “Degraded”, or “Unhealthy” but provides no further details. Microsoft’s documentation provides examples of how to write a custom JSON response with Newtonsoft.Json:
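A sketch of such a response writer, modeled on the pattern in Microsoft’s documentation rather than copied from it. The HealthResponseWriter class name is my own, and the JSON shape (a status plus one entry per check) is an assumption.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class HealthResponseWriter
{
    // Matches the signature expected by HealthCheckOptions.ResponseWriter.
    public static Task WriteJson(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";

        // One property for the overall status, one object per registered check.
        var json = new JObject(
            new JProperty("status", report.Status.ToString()),
            new JProperty("results", new JObject(report.Entries.Select(entry =>
                new JProperty(entry.Key, new JObject(
                    new JProperty("status", entry.Value.Status.ToString()),
                    new JProperty("description", entry.Value.Description)))))));

        return context.Response.WriteAsync(json.ToString(Formatting.Indented));
    }
}
```

To use it, pass a HealthCheckOptions object (from Microsoft.AspNetCore.Diagnostics.HealthChecks) with ResponseWriter set to HealthResponseWriter.WriteJson as the second argument to MapHealthChecks.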
The resulting response would look like this (including the EF DbContext health check):
There isn’t an accepted standard for the content of a health check. A proposal was submitted some time ago but not adopted as an official RFC. See https://tools.ietf.org/id/draft-inadarei-api-health-check-01.html
Kubernetes, and perhaps other software, checks only the HTTP status code and ignores the response content. ASP.NET Core health checks return a 200 status code for the “Healthy” and “Degraded” statuses and 503 for “Unhealthy”.
Custom health check based on EventCounters
This post shows how to listen to EventCounters to create a custom health check. EventCounters are emitted through EventSource. To capture events from an EventSource, start by creating a subclass of EventListener.
The OnEventSourceCreated method is called whenever any code in the process creates an event source. This generally happens very early during startup. In my experience, this method is also called before the subclass constructor body runs, because the EventListener base constructor reports the event sources that already exist. This creates a slight problem: if I need something passed into the constructor to determine which event sources to listen to, I won’t have that information yet when OnEventSourceCreated is called.
An easy solution to this is to hold on to a list of the EventSources until you’re ready:
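A minimal sketch of that idea. The class and member names are placeholders of mine. It relies on C# running field initializers before the base class constructor, so the list already exists when the first callback arrives.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

public class BufferingEventListener : EventListener
{
    // Field initializers run before the EventListener constructor, so both
    // fields are usable when OnEventSourceCreated is first called.
    private readonly object _gate = new object();
    private readonly List<EventSource> _pendingSources = new List<EventSource>();

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Don't enable anything yet; just remember the source for later.
        lock (_gate)
        {
            _pendingSources.Add(eventSource);
        }
    }

    // Returns the sources seen so far and clears the buffer.
    public IReadOnlyList<EventSource> TakePendingSources()
    {
        lock (_gate)
        {
            var snapshot = _pendingSources.ToArray();
            _pendingSources.Clear();
            return snapshot;
        }
    }
}
```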
We’ll need a way to filter down to just the event sources we need later on. The only interesting thing to check on the event source is the name. This interface will allow a developer to register a filter with dependency injection.
Implementations of this interface will be passed into the constructor:
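A sketch of the filter interface and the constructor that receives the filters. The IsInterestedIn member name is hypothetical, and the FilterCount property is there only to make the example observable.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using System.Linq;

public interface IEventCounterFilter
{
    // Hypothetical member: true when the filter wants events from this source.
    bool IsInterestedIn(string eventSourceName);
}

public class EventCounterHealthCheck : EventListener
{
    private readonly IReadOnlyList<IEventCounterFilter> _filters;

    // With dependency injection, every registered IEventCounterFilter
    // arrives here as a single IEnumerable.
    public EventCounterHealthCheck(IEnumerable<IEventCounterFilter> filters)
    {
        _filters = filters.ToList();
    }

    // Only here so the example has something to inspect.
    public int FilterCount => _filters.Count;
}
```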
As an example, let’s use the thread count of the thread pool, which is reported by the System.Runtime event source. We’ll fill out the rest of this filter later in this post.
Add this filter in the Startup.ConfigureServices method:
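A sketch of the initial filter and its registration. The IsInterestedIn member and the AddThreadPoolFilter helper are hypothetical names. The AddSingleton call is the line that would go in ConfigureServices.

```csharp
using Microsoft.Extensions.DependencyInjection;

public interface IEventCounterFilter
{
    // Hypothetical member: true when the filter wants events from this source.
    bool IsInterestedIn(string eventSourceName);
}

public class ThreadPoolThreadCountFilter : IEventCounterFilter
{
    // The thread pool counters are published by the System.Runtime source.
    public bool IsInterestedIn(string eventSourceName)
        => eventSourceName == "System.Runtime";
}

public static class FilterRegistration
{
    // Hypothetical helper wrapping the registration line.
    public static IServiceCollection AddThreadPoolFilter(this IServiceCollection services)
        => services.AddSingleton<IEventCounterFilter, ThreadPoolThreadCountFilter>();
}
```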
The System.Runtime event source is built into .NET Core 3.1 and later. The built-in EventCounters are listed in Microsoft’s documentation.
Making the health check non-blocking
When a request is made against the health check endpoint, each registered health check can reactively probe a system to determine whether it’s healthy. EventCounters, however, are emitted at a regular interval (every second in this post). It’s best not to wait for the next event after the request arrives, because a slow response to a health check could trigger an alert from the monitoring infrastructure.
If the latest value from a counter is cached, then the health check can respond instantly. My approach is to use an ASP.NET Core hosted service. I chose this so that the event listener can correctly unregister from the event sources when the hosted service is shut down.
Here’s the code for the hosted service:
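A sketch of what such a hosted service could look like. EventCounterHostedService is a name I made up, and the EventCounterHealthCheck class here is only a stub with the two methods the service needs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Stub standing in for the health check class built in this post.
public class EventCounterHealthCheck
{
    private readonly CancellationTokenSource _stop = new CancellationTokenSource();

    public async Task RunHealthCheckAsync()
    {
        try { await Task.Delay(Timeout.Infinite, _stop.Token); }
        catch (OperationCanceledException) { /* stop was requested */ }
    }

    public void StopUpdates() => _stop.Cancel();
}

public class EventCounterHostedService : IHostedService
{
    private readonly EventCounterHealthCheck _healthCheck;
    private Task _runTask = Task.CompletedTask;

    public EventCounterHostedService(IServiceProvider services)
    {
        _healthCheck = services.GetRequiredService<EventCounterHealthCheck>();
    }

    public Task StartAsync(CancellationToken cancellationToken)
    {
        // Start the loop without awaiting it so host startup is not blocked.
        _runTask = _healthCheck.RunHealthCheckAsync();
        return Task.CompletedTask;
    }

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        // Ask the loop to stop, then wait for it to finish.
        _healthCheck.StopUpdates();
        await _runTask;
    }
}
```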
The hosted service gets the EventCounterHealthCheck object from dependency injection using the IServiceProvider. We added two methods to the health check class: RunHealthCheckAsync, which runs continuously, and StopUpdates, which ends that loop. We’ll implement these methods later.
EventCounterHealthCheck will need to implement the IHealthCheck interface. This has a method to return the HealthCheckResult, which can be built in the background and returned on demand.
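A sketch of the cached-result approach. The SetResult method and the lock are my own choices. HealthCheckResult is a struct, so the lock guards against a reader seeing a half-written value.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class EventCounterHealthCheck : IHealthCheck
{
    private readonly object _gate = new object();

    // Unhealthy until the first update arrives.
    private HealthCheckResult _latest =
        HealthCheckResult.Unhealthy("No event counter data received yet.");

    // Called by the health check middleware; returns the cached result
    // without waiting for the next counter event.
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        lock (_gate)
        {
            return Task.FromResult(_latest);
        }
    }

    // Hypothetical helper the background loop would call with each new result.
    public void SetResult(HealthCheckResult result)
    {
        lock (_gate)
        {
            _latest = result;
        }
    }
}
```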
The default health check result is Unhealthy. This is helpful if the health check is used for a readiness probe, since the first real result will signal when our instance (process, pod, etc.) is ready to receive traffic.
Updating the HealthCheckResult
Now that the IHealthCheck interface is implemented, we need to update the internal HealthCheckResult. Let’s start with the RunHealthCheckAsync method, which is started by the hosted service and runs continuously until the hosted service’s StopAsync method is called.
The first step is to review all the EventSources to see if any match the sources we want to listen to. This assumes that all the event sources have been registered already. That is true for the built-in .NET event sources but may not be true for a custom event source. One could re-evaluate the event sources periodically, but we’ll do it once to keep the code simple.
The while loop keeps running until the boolean flag is set. Since event counter data is only emitted once per second (configurable), the loop sleeps between iterations. Passing a CancellationToken to the delay timer allows us to stop the health check loop immediately. This is done in the StopUpdates method:
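The loop and its stop method can be sketched together as follows. The HealthCheckLoop class name and the Iterations counter are placeholders, and the two private methods stand in for the real work described in this post.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class HealthCheckLoop
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private volatile bool _stopRequested;
    private int _iterations;

    public int Iterations => Volatile.Read(ref _iterations);

    public async Task RunHealthCheckAsync()
    {
        CheckAllEventSources();
        while (!_stopRequested)
        {
            UpdateHealthCheckResult();
            try
            {
                // Counters arrive about once per second, so poll at that rate.
                await Task.Delay(TimeSpan.FromSeconds(1), _cts.Token);
            }
            catch (OperationCanceledException)
            {
                // StopUpdates cancelled the delay; the loop condition ends the loop.
            }
        }
    }

    public void StopUpdates()
    {
        // Set the flag first so the loop exits as soon as the delay is cancelled.
        _stopRequested = true;
        _cts.Cancel();
    }

    private void CheckAllEventSources() { /* placeholder for the real method */ }

    private void UpdateHealthCheckResult() => Interlocked.Increment(ref _iterations);
}
```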
Next, let’s look at how CheckAllEventSources works.
The general idea is that when an event is received, we can look up the list of filters interested in that event source. To receive the events, we have to enable our EventListener as a listener for the source. This is done in the EnableEventSource method.
Note that we keep a list of weak references to all the EventSource objects we listen to. This allows the garbage collector to clean them up if nothing else is using them. When the hosted service is shut down, we stop listening to those event sources by calling ReleaseEventSources.
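The enable and release steps can be sketched together. The CounterListener class name is a placeholder. Passing EventCounterIntervalSec as an argument to EnableEvents is what asks the source to publish its counters, here once per second.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

public class CounterListener : EventListener
{
    private readonly object _gate = new object();
    private readonly List<WeakReference<EventSource>> _enabled =
        new List<WeakReference<EventSource>>();

    public void EnableEventSource(EventSource source)
    {
        // Request counter updates from this source every second.
        var arguments = new Dictionary<string, string>
        {
            ["EventCounterIntervalSec"] = "1"
        };
        EnableEvents(source, EventLevel.Verbose, EventKeywords.All, arguments);

        // A weak reference lets the source be collected if nothing else uses it.
        lock (_gate)
        {
            _enabled.Add(new WeakReference<EventSource>(source));
        }
    }

    public void ReleaseEventSources()
    {
        lock (_gate)
        {
            foreach (var weak in _enabled)
            {
                // Skip sources that have already been garbage collected.
                if (weak.TryGetTarget(out var source))
                {
                    DisableEvents(source);
                }
            }
            _enabled.Clear();
        }
    }
}
```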
Handling events
Now that we’re listening to event sources, we’ll get events via the OnEventWritten method. Event counter data is written in a particular way into the payload of the event data. This method grabs the counter name and max value from the event payload.
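A sketch of the payload parsing. The CounterPayload class and TryReadMax methods are hypothetical names. It assumes counter events are named EventCounters and carry a dictionary with Name and Max keys as their first payload item.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

public static class CounterPayload
{
    // Reads the counter name and Max value from an event, if it is a counter event.
    public static bool TryReadMax(
        EventWrittenEventArgs eventData, out string counterName, out double max)
    {
        counterName = null;
        max = 0;
        if (eventData.EventName != "EventCounters"
            || eventData.Payload == null
            || eventData.Payload.Count == 0)
        {
            return false;
        }
        return eventData.Payload[0] is IDictionary<string, object> payload
            && TryReadMax(payload, out counterName, out max);
    }

    // Same logic against the raw dictionary, which is easier to exercise directly.
    public static bool TryReadMax(
        IDictionary<string, object> payload, out string counterName, out double max)
    {
        counterName = null;
        max = 0;
        if (!payload.TryGetValue("Name", out var nameValue) || !(nameValue is string name))
        {
            return false;
        }
        if (!payload.TryGetValue("Max", out var maxValue))
        {
            return false;
        }
        if (maxValue is double d) max = d;
        else if (maxValue is float f) max = f;
        else return false;

        counterName = name;
        return true;
    }
}
```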
If this is an event counter, the payload should contain a dictionary with the name of the counter. There are a few other fields in this dictionary, depending on the counter. For the purposes of this example, we grab the Max value.
Now we can fill out the EventCounterHealthCheck.OnEventWritten method to call all the filters.
This means the IEventCounterFilter interface changes:
Let’s implement that in ThreadPoolThreadCountFilter:
We’re only interested in the thread pool’s thread count counter. We’ll get the updated value for this counter every second (as specified in EnableEventSource). The part that’s still missing is updating the health check result based on the value of this counter.
Getting health status from each filter
Every second, the main loop in RunHealthCheckAsync will update the health check result by calling UpdateHealthCheckResult. This method asks each filter for an updated status. If any status is degraded or unhealthy, the worst of them becomes the overall status for the health check.
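The aggregation step can be sketched as a small helper. The HealthAggregation class, the Combine method, and the description text are my own. It assumes each filter has already reported a HealthStatus.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class HealthAggregation
{
    // Folds the per-filter statuses into one overall result.
    public static HealthCheckResult Combine(IEnumerable<HealthStatus> filterStatuses)
    {
        var worst = HealthStatus.Healthy;
        foreach (var status in filterStatuses)
        {
            // HealthStatus orders Unhealthy (0) < Degraded (1) < Healthy (2),
            // so the smaller value is the worse status.
            if (status < worst)
            {
                worst = status;
            }
        }
        return new HealthCheckResult(worst, $"Worst filter status: {worst}");
    }
}
```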
The filter interface gets a new method, UpdateHealthStatus:
Here is an example implementation for our thread pool thread count filter.
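A sketch of the threshold logic, left off the filter interface to keep it short. The OnCounterValue method name is hypothetical, as is the choice of strict greater-than comparisons. The counter name threadpool-thread-count is the one published by System.Runtime.

```csharp
using System.Threading;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class ThreadPoolThreadCountFilter
{
    // Arbitrary thresholds for illustration.
    private const double DegradedAbove = 180;
    private const double UnhealthyAbove = 200;

    private double _latestMax;

    // Hypothetical callback invoked by the listener for each counter event.
    public void OnCounterValue(string counterName, double max)
    {
        if (counterName == "threadpool-thread-count")
        {
            Volatile.Write(ref _latestMax, max);
        }
    }

    // Maps the most recent thread count to a health status.
    public HealthStatus UpdateHealthStatus()
    {
        var threads = Volatile.Read(ref _latestMax);
        if (threads > UnhealthyAbove) return HealthStatus.Unhealthy;
        if (threads > DegradedAbove) return HealthStatus.Degraded;
        return HealthStatus.Healthy;
    }
}
```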
This filter arbitrarily picks 200 threads as the unhealthy threshold. Getting above 180 threads indicates a “degraded” state. Degraded doesn’t change the status code returned by the “/health” endpoint.
Adding the custom health check
The AddHealthChecks method returns an IHealthChecksBuilder object that we can use to add our health check. We’ll need an extension method in its own static class.
This grabs the EventCounterHealthCheck object from dependency injection. That will need to be registered in the Startup class.
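The extension method and the registration can be sketched together. The AddEventCounterHealthCheck name, the default check name event-counters, and the ConfigureHealthServices helper are all my own, and EventCounterHealthCheck is reduced to a stub.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Stub standing in for the health check class built in this post.
public class EventCounterHealthCheck : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
        => Task.FromResult(HealthCheckResult.Healthy());
}

public static class EventCounterHealthCheckExtensions
{
    public static IHealthChecksBuilder AddEventCounterHealthCheck(
        this IHealthChecksBuilder builder, string name = "event-counters")
    {
        // Resolve the shared instance from dependency injection on each check.
        return builder.Add(new HealthCheckRegistration(
            name,
            provider => provider.GetRequiredService<EventCounterHealthCheck>(),
            failureStatus: null,
            tags: null));
    }
}

public static class StartupRegistration
{
    // These lines would go in Startup.ConfigureServices.
    public static void ConfigureHealthServices(IServiceCollection services)
    {
        // A singleton, so the hosted service and the endpoint share one instance.
        services.AddSingleton<EventCounterHealthCheck>();
        services.AddHealthChecks().AddEventCounterHealthCheck();
    }
}
```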
Testing it out
Assuming you’ve been following along by starting with the ASP.NET Core template in Visual Studio and adding the code above, you should be able to hit F5 and start testing. Add “/health” to the URL in the browser (e.g. “https://localhost:44360/health”). You should see a response like this:
Try setting the threshold lower, clicking around on the website and refreshing the health check to verify that other statuses and status codes appear.