Health check statistics

Nexus stores the result of a health check whenever the check goes from one health status to another or the description returned from the check changes. This means that if you click on a health check in the Admin UI, you get statistics about when its status changed and how long each unhealthy or degraded period lasted.

For example, the built-in checks for Nexus queues and jobs show a history of when jobs failed or when queues held messages with an error status, and for how long.

The health check dashboard also shows, for each check, indicators of the status of its previous results. This makes it easier to spot a check that is currently healthy but recently had an unhealthy period.

Importance of the result description

Nexus uses the description text of a health check result to detect that something about the check has changed. For the health check statistics, the result is stored every time the description changes while the status is unhealthy or degraded. If the status is healthy and the description changes, a counter is incremented instead, to signal how many different healthy results there have been in a period.

This means that you should think carefully about what you include in the description. Do not include values such as the current date and time, because they change every time the check is called.

Avoid doing this:

public async Task<HealthCheckResult> CheckHealthAsync(NexusHealthCheckResult? previousHealthCheckResult, ScheduledNexusHealthCheckInitiator initiator, HealthCheckContext context, CancellationToken cancellationToken)
{
    return HealthCheckResult.Healthy($"Everything was fine on {DateTime.UtcNow}");
}

Instead, do something like this:

public async Task<HealthCheckResult> CheckHealthAsync(NexusHealthCheckResult? previousHealthCheckResult, ScheduledNexusHealthCheckInitiator initiator, HealthCheckContext context, CancellationToken cancellationToken)
{
    var errorCount = GetErrorCount();
    if (errorCount == 0)
    {
        // Constant description: repeated healthy results look identical.
        return HealthCheckResult.Healthy("No errors found");
    }
    else
    {
        // The description only changes when the error count changes.
        return HealthCheckResult.Unhealthy($"{errorCount} errors found");
    }
}

Because the error count is part of the description, Nexus stores a new result whenever the count changes, and you get more detail when you look at the unhealthy period.
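The rule described in this section can be summarized as a small decision function. This is only a sketch of the documented behaviour, not the actual Nexus implementation: StatisticsAction, StatisticsRuleSketch and Decide are hypothetical names made up for the illustration, and the case where no previous result exists is left out. HealthStatus is the standard enum from Microsoft.Extensions.Diagnostics.HealthChecks.

```csharp
using System;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical type, used only to illustrate the rule. Not part of Nexus.
public enum StatisticsAction
{
    None,
    StoreResult,
    IncrementHealthyCounter
}

// Hypothetical helper, an assumption about how the documented rule fits together.
public static class StatisticsRuleSketch
{
    public static StatisticsAction Decide(
        HealthStatus previousStatus,
        string? previousDescription,
        HealthStatus currentStatus,
        string? currentDescription)
    {
        // A change from one status to another is always stored.
        if (currentStatus != previousStatus)
        {
            return StatisticsAction.StoreResult;
        }

        // Same status and same description: nothing new to record.
        if (string.Equals(previousDescription, currentDescription, StringComparison.Ordinal))
        {
            return StatisticsAction.None;
        }

        // Same status but a new description: healthy results are only counted,
        // unhealthy or degraded results are stored.
        return currentStatus == HealthStatus.Healthy
            ? StatisticsAction.IncrementHealthyCounter
            : StatisticsAction.StoreResult;
    }
}
```

Under this reading, a description that contains the current date and time falls into the last branch on every call, while "3 errors found" followed by "3 errors found" records nothing new.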

If you want to include even more details, use the data dictionary on HealthCheckResult like this:

var data = new Dictionary<string, object>();
data.Add("Something", "Interesting");
return HealthCheckResult.Unhealthy($"{errorCount} errors found", null, data);

The data dictionary is serialized, stored, and displayed in the health check details.

Statistics retention

By default, Nexus keeps health check statistics for 30 days before deleting them. You can set a different value on the options object when initializing Nexus health checks:

builder.Services.AddNexusHealthChecks(options =>
{
    options.StatisticsRetention = TimeSpan.FromDays(7);
});

If you set this to TimeSpan.Zero, Nexus never stores any statistics for health checks.
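As a configuration fragment, turning statistics off would look roughly like this. It reuses the AddNexusHealthChecks call and the StatisticsRetention option shown above, and assumes builder is the application builder from your startup code:

```csharp
builder.Services.AddNexusHealthChecks(options =>
{
    // TimeSpan.Zero means no health check statistics are stored at all.
    options.StatisticsRetention = TimeSpan.Zero;
});
```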