Unlocking the secrets of network load balancer metrics: A guide for sysadmins who’ve seen a few logs in their day

Written by Sascha Neumeier | Jan 21, 2025

Hello, fellow sysadmins! Whether you’re sipping your seventh coffee or recovering from the latest “urgent” Slack ping, let’s talk about something near and dear to our uptime-loving hearts: network load balancer metrics. Specifically, the ones that save our bacon when the network decides it’s that time of day.

This guide is for sysadmins with a few years under their belts (and maybe some gray hairs from rogue deployments). We’ll explore how to tame the chaos of monitoring NLBs and other load balancers—whether you’re dealing with AWS, Oracle, or some other cloud giant. Let’s dive in.

Why network load balancer metrics matter (no, really)

A Network Load Balancer (NLB) might seem like just another cog in the machine until it starts acting up. Metrics are your best friends when things go sideways or, better yet, when you’re preventing them from going sideways in the first place. They give you visibility into key aspects like latency, routing, TCP connections, and even your UnhealthyHostCount (which, fun fact, doesn’t just describe your work-life balance 😉).

Whether you're using Elastic Load Balancing (ELB) or tinkering with a gateway load balancer, these numbers help you understand what’s happening under the hood. More importantly, they help you save time, resources, and possibly your sanity when you’re trying to troubleshoot a stubborn backend server.

The key network load balancer metrics to watch (your checklist)

Here’s your survival kit of metrics, broken down into digestible pieces:

Traffic metrics

ActiveFlowCount: How many flows are currently running? Keep an eye on this to spot bottlenecks.
Number of connections: Tracks connection requests for both TCP and UDP protocols.
Ingress and egress: The number of bytes going in and out—critical for billing and understanding traffic patterns.
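
As a rough sketch of how these traffic numbers might be pulled on AWS, the snippet below builds the arguments for a CloudWatch GetMetricStatistics request. It assumes the AWS/NetworkELB namespace and the LoadBalancer dimension that AWS uses for NLBs. The load balancer identifier and the helper name build_nlb_metric_query are made up for illustration. Nothing is sent to AWS here; the resulting dictionary could be passed to boto3.client("cloudwatch").get_metric_statistics(**params) if boto3 and credentials are in place.

```python
from datetime import datetime, timedelta, timezone


def build_nlb_metric_query(load_balancer, metric_name, statistic,
                           minutes=60, period=60, now=None):
    """Return keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/NetworkELB",
        "MetricName": metric_name,
        # Hypothetical identifier; use the "net/<name>/<id>" part of your NLB ARN.
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per datapoint
        "Statistics": [statistic],
    }


params = build_nlb_metric_query(
    "net/my-nlb/0123456789abcdef", "ActiveFlowCount", "Average"
)
print(params["MetricName"], params["Period"])
```

For byte counts, the same helper should work with ProcessedBytes and the Sum statistic instead of an average.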

Performance metrics

Latency: The time it takes for a request to travel from the frontend to the backend. Spikes here might mean it’s time to revisit your target group health checks.
TLS negotiation time: For secure connections, monitor the time taken for SSL/TLS handshakes.
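
Averages tend to hide latency spikes, so a percentile is often the more honest number. Here is a minimal sketch of a nearest-rank p95 check. The sample values, the baseline, and the "twice the baseline" rule are assumptions for illustration, not recommended thresholds.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def has_latency_spike(samples_ms, baseline_ms, factor=2.0):
    """Flag a spike when p95 exceeds the baseline by the given factor."""
    return percentile(samples_ms, 95) > baseline_ms * factor


# Made-up latency samples in milliseconds, with one slow outlier.
latencies_ms = [12, 14, 11, 13, 15, 12, 240, 13, 14, 12]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
print(has_latency_spike(latencies_ms, baseline_ms=15))
```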

Error metrics

UnhealthyHostCount: Tracks how many backend servers are failing health checks. (Yes, the dreaded red in your console.) On AWS, CloudWatch spells this metric UnHealthyHostCount.
Connection errors: Missed or dropped TCP connections can indicate deeper issues.

Availability and capacity metrics

Backend availability: How many targets in the target group are available in each availability zone?
Aggregate utilization: Monitor how close you are to hitting limits in your VPC or security groups.
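
To make the per-zone availability question concrete, here is a small sketch that counts healthy targets per availability zone and flags zones that fall below a minimum share. The record shape (the "zone" and "state" keys) is a simplified assumption, not the exact format any provider returns, and the zone names are examples.

```python
from collections import Counter


def healthy_by_zone(targets):
    """Return {zone: (healthy_count, total_count)} for a list of target records."""
    totals = Counter()
    healthy = Counter()
    for target in targets:
        totals[target["zone"]] += 1
        if target["state"] == "healthy":
            healthy[target["zone"]] += 1
    return {zone: (healthy[zone], totals[zone]) for zone in sorted(totals)}


def zones_below(summary, min_fraction):
    """List zones whose healthy share is below min_fraction."""
    return [zone for zone, (ok, total) in summary.items() if ok / total < min_fraction]


# Hypothetical target records for illustration.
targets = [
    {"zone": "eu-central-1a", "state": "healthy"},
    {"zone": "eu-central-1a", "state": "healthy"},
    {"zone": "eu-central-1b", "state": "healthy"},
    {"zone": "eu-central-1b", "state": "unhealthy"},
    {"zone": "eu-central-1b", "state": "unhealthy"},
]
summary = healthy_by_zone(targets)
print(summary)
print(zones_below(summary, 0.5))
```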

Observability: The art of seeing everything, all the time

Observability isn’t just a buzzword; it’s the practice of knowing what’s going on in your systems without needing to guess. This is where tools like CloudWatch Metrics, access logs, and time series dashboards come into play.

Tip: If your metric names look like something from ancient docs, don’t panic. Focus on the following metrics: latency, ActiveFlowCount, and number of connections. They give you the clearest picture of system health and performance.

Automate, aggregate, and notify: Work smarter, not harder

Let’s be honest: We’re all trying to save time. Setting up real-time notifications for key thresholds (like UnhealthyHostCount or latency) can mean the difference between an easy fix and a 2 a.m. wake-up call.
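
One way such a notification could look on AWS is a CloudWatch alarm on the unhealthy host count. The sketch below only builds the arguments for boto3's put_metric_alarm call; it does not contact AWS. The alarm name, load balancer and target group identifiers, SNS topic ARN, and the "three bad minutes in a row" rule are all assumptions for illustration. Note that CloudWatch spells the metric UnHealthyHostCount.

```python
def build_unhealthy_host_alarm(alarm_name, load_balancer, target_group,
                               sns_topic_arn, threshold=1.0):
    """Return keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/NetworkELB",
        "MetricName": "UnHealthyHostCount",  # CloudWatch's spelling, capital H
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group},
            {"Name": "LoadBalancer", "Value": load_balancer},
        ],
        "Statistic": "Maximum",
        "Period": 60,             # evaluate one-minute datapoints
        "EvaluationPeriods": 3,   # assumed rule: three breaching minutes in a row
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# All identifiers below are hypothetical placeholders.
alarm = build_unhealthy_host_alarm(
    alarm_name="nlb-unhealthy-hosts",
    load_balancer="net/my-nlb/0123456789abcdef",
    target_group="targetgroup/my-targets/0123456789abcdef",
    sns_topic_arn="arn:aws:sns:eu-central-1:123456789012:nlb-alerts",
)
print(alarm["MetricName"], alarm["Threshold"])
```

With boto3 configured, the dictionary could be applied via boto3.client("cloudwatch").put_metric_alarm(**alarm).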

Automation tools

Use the AWS CLI to script health checks or fetch metadata about your endpoints.
Implement Lambda functions to respond to anomalies or route traffic dynamically.
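
As a hedged illustration of the Lambda idea, the handler below reacts to CloudWatch alarm notifications delivered through SNS. It assumes the alarm message is JSON carrying the AlarmName, NewStateValue, and NewStateReason fields. The "investigate" action is a placeholder for whatever response fits your setup, and the test event is hand-built, not captured from AWS.

```python
import json


def summarize_alarm(sns_message):
    """Extract the fields we care about from an alarm notification (a JSON string)."""
    alarm = json.loads(sns_message)
    return {
        "name": alarm["AlarmName"],
        "state": alarm["NewStateValue"],
        "reason": alarm.get("NewStateReason", ""),
    }


def handler(event, context):
    """Lambda entry point: collect an action for every alarm that is firing."""
    actions = []
    for record in event.get("Records", []):
        alarm = summarize_alarm(record["Sns"]["Message"])
        if alarm["state"] == "ALARM":
            # Placeholder: a real function might page someone or adjust routing here.
            actions.append(f"investigate {alarm['name']}")
    return {"actions": actions}


# Hand-built event for local testing; values are made up.
test_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({
            "AlarmName": "nlb-unhealthy-hosts",
            "NewStateValue": "ALARM",
            "NewStateReason": "Threshold crossed",
        })}},
        {"Sns": {"Message": json.dumps({
            "AlarmName": "nlb-latency",
            "NewStateValue": "OK",
        })}},
    ]
}
result = handler(test_event, None)
print(result)
```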

Aggregation and dashboards

Metrics like total number of requests or time period averages are easier to digest in aggregate. Whether you’re using Amazon Web Services, Oracle, or another provider, a good dashboard can make you feel like a hero in a compute storm.
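
A minimal sketch of what "time period averages" means in practice: raw datapoints are grouped into fixed-size windows and averaged per window. The datapoints here are invented, with timestamps given as plain seconds.

```python
from collections import defaultdict


def average_by_period(points, period_seconds):
    """Average (timestamp, value) pairs into fixed windows keyed by window start."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - timestamp % period_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Made-up request counts sampled every minute or so.
points = [(0, 10), (60, 20), (120, 30), (300, 40), (360, 60)]
five_minute_averages = average_by_period(points, 300)
print(five_minute_averages)
```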

Pricing and metrics: Why it pays to pay attention

Don’t forget the pricing side of things. Metrics like the number of bytes transferred or the health status of backend servers can directly affect your costs. If you’re not monitoring Elastic Load Balancing usage or setting budgets, you’re probably overspending somewhere.
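
A simple way to keep an eye on this is to project the month’s traffic from what you have seen so far and compare it with a budget. The sketch below uses a plain linear projection; the byte counts and the budget are invented numbers, and no actual pricing is assumed.

```python
def projected_monthly_bytes(bytes_so_far, days_elapsed, days_in_month=30):
    """Linearly extrapolate month-to-date traffic to a full month."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return bytes_so_far / days_elapsed * days_in_month


def over_budget(bytes_so_far, days_elapsed, budget_bytes, days_in_month=30):
    """True when the projected month-end traffic exceeds the byte budget."""
    return projected_monthly_bytes(bytes_so_far, days_elapsed, days_in_month) > budget_bytes


# Invented example: 4 TB processed in the first 10 days, 10 TB monthly budget.
projection = projected_monthly_bytes(4 * 10**12, days_elapsed=10)
print(projection, over_budget(4 * 10**12, 10, budget_bytes=10**13))
```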

My famous last words

If there’s one thing I’ve learned in this field, it’s that metrics are like sysadmin horoscopes: sometimes cryptic, but always worth checking. So, whether you’re wrestling with NLBs, API gateways, or plain old security groups, keep these metrics in mind. And don't forget: logs don’t judge, but they do laugh when you forget a semicolon. 😜

Happy monitoring, friends!

Oh, and if you’re ready to monitor some NLB metrics, try PRTG Network Monitor free for 30 days and see how hassle-free monitoring can be.
