When users complain of dropped video calls, stuttering applications, or files that won't upload properly, packet loss is very often to blame. It's one of those network performance issues that makes you feel like the whole network is shot, even when your equipment is fine.
Packet loss occurs when data packets transmitted through a network fail to reach their destination. Imagine ordering a desk that ships in five boxes. Boxes 1, 2, 4, and 5 arrive undamaged, but box 3—containing every last screw, bolt, and connector, of course—has gone missing in logistics-land. You're stuck with a pile of would-be furniture and a useless instruction manual that keeps mentioning pieces you'll never see.
Protocols like UDP, which don't retransmit lost data automatically, are particularly sensitive to packet loss, so real-time applications like voice and video stutter badly when even 1% of packets are lost. TCP-based applications do retransmit lost packets, but that makes everything run slower and feel less responsive.
The good news is that packet loss is almost never random or permanent. It's caused by identifiable and fixable issues with your network, once you know what to look for. In this blog, we'll explore what causes packet loss and how to track it down with network monitoring before it drives you nuts.
The most common and easiest-to-diagnose cause of packet loss is network congestion. When more data tries to travel down a network link than the link can handle, packets have to be dropped. Routers and switches have buffers that temporarily store packets during short-lived traffic spikes, but once those buffers are full, any new packets that arrive get dropped.
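The buffer behavior described above can be sketched as a small simulation. This is a minimal model, assuming a single FIFO buffer with tail drop and a fixed link rate per time tick. The function name and all the numbers are made up for illustration, and real devices use more elaborate queueing.

```python
def simulate_tail_drop(arrivals, link_rate, buffer_size):
    """Simulate a FIFO buffer that drops new packets when it is full.

    arrivals:    number of packets arriving in each time tick
    link_rate:   packets the link can transmit per tick
    buffer_size: maximum number of packets the buffer can hold
    Returns a (sent, dropped) tuple of packet totals.
    """
    queued = 0
    sent = 0
    dropped = 0
    for arriving in arrivals:
        # Admit as many arrivals as fit in the buffer; the rest are dropped.
        space = buffer_size - queued
        accepted = min(arriving, space)
        dropped += arriving - accepted
        queued += accepted
        # Drain the buffer at the link rate.
        transmitted = min(queued, link_rate)
        queued -= transmitted
        sent += transmitted
    return sent, dropped


# Hypothetical traffic: one short spike versus sustained overload.
short_burst = [5, 20, 5, 5, 5, 5]
sustained = [20, 20, 20, 20, 20, 20]
print(simulate_tail_drop(short_burst, link_rate=10, buffer_size=20))
print(simulate_tail_drop(sustained, link_rate=10, buffer_size=20))
```

With these numbers the short spike is absorbed by the buffer with no drops, while the sustained overload fills the buffer after the first tick and loses packets on every tick after that.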
Congestion occurs when link bandwidth is exceeded due to bursty traffic patterns or when insufficient capacity exists during high-usage periods. It typically happens during busy hours when many users are on the network simultaneously, during large file transfers or backups, or when network capacity hasn't been increased to match organizational growth. You'll often see it during daily backup windows, the first hour of the workday when everyone is logging in at once, or when someone is watching Netflix or streaming 4K video over your corporate network.
Not-so-fun fact: nothing lasts forever, and that includes the switch you bought during the last budget surplus. When it starts to fail, you get packet loss. Routers, switches, and network interface cards all develop physical faults that cause packet loss. So do Ethernet cables that have been chewed on by dogs and cats, bent around too many corners, or inadvertently stepped on, leading to intermittent connectivity issues or packet loss. Faulty or poorly seated ports on switches and routers can also drop packets at random.
Heat is another common culprit. In network closets with little to no cooling airflow, devices overheat, components fail, and packets get dropped. Wireless networks have their own unique issues: radio interference from other nearby wireless networks or your microwave oven, and building materials such as concrete walls, floors, and ceilings that block or degrade signals enough to cause packet loss. I could go on...
Network hardware can be configured to drop packets, intentionally or by accident. Firewalls are designed to inspect and filter traffic for security, but overly aggressive firewall rules can drop legitimate packets. VPNs can also cause packet loss when encapsulation pushes packets over the Maximum Transmission Unit (MTU) size, requiring fragmentation that then fails. An overloaded or incorrectly configured VPN server will drop packets too.
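The MTU problem comes down to simple arithmetic, sketched below. The 1500-byte default is the standard Ethernet MTU. The 60-byte tunnel overhead is only an assumed figure for illustration, since real overhead depends on the VPN protocol and cipher in use, and the function names are hypothetical.

```python
def fits_in_mtu(inner_packet_bytes, tunnel_overhead_bytes, path_mtu=1500):
    """Return True if the encapsulated packet fits without fragmentation."""
    return inner_packet_bytes + tunnel_overhead_bytes <= path_mtu


def max_inner_packet(tunnel_overhead_bytes, path_mtu=1500):
    """Largest inner packet that can be tunnelled without fragmentation."""
    return path_mtu - tunnel_overhead_bytes


# Assumed overhead; check your own VPN's documentation for the real value.
ASSUMED_OVERHEAD = 60

print(fits_in_mtu(1500, ASSUMED_OVERHEAD))  # full-size packet plus overhead
print(fits_in_mtu(1400, ASSUMED_OVERHEAD))  # smaller packet plus overhead
print(max_inner_packet(ASSUMED_OVERHEAD))   # largest safe inner packet
```

Under that assumption, a full-size 1500-byte packet no longer fits once the tunnel headers are added, which is why lowering the MTU on the tunnel interface is a common remedy.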
Routing problems are another common source of packet loss. Errors in routing tables, issues with routing protocol convergence, or misconfigured route announcements can cause packets to be sent in the wrong direction, leading to packet loss. Sometimes the operating system is the problem. If a server's CPU is overloaded, it can't process incoming packets quickly enough and the network interface buffer overflows and drops packets.
Sometimes the problem isn't your network at all, but originates with your ISP, an upstream provider, or somewhere along the internet path between your network and the destination. This is frustrating because you can't do much about it directly, but it's important to identify it so you're not chasing problems on your own equipment when the issue is actually your ISP's problem (and so you have the proof you need when you call your ISP and they blame your router).
ISP-related packet loss can be due to congestion on their side of the network, routing misconfigurations, or problems with their physical connections. You can often identify this by running traceroute to see where the packet loss starts. If internal network testing shows no packet loss but you see issues at your ISP's first hop or beyond, that's the data you need to give your ISP to help them debug the issue on their end.
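One way to reason about "where the loss starts" is sketched below. It assumes you have already collected a loss percentage for each hop along the path (for example by probing each hop repeatedly); the function name, the 1% threshold, and the sample data are all hypothetical. It only counts loss that carries through to the destination, because an intermediate router that shows loss on its own may simply be deprioritizing replies to probes.

```python
def first_lossy_hop(hop_loss_percent, threshold=1.0):
    """Return the 1-based hop number where loss starts and persists
    through to the final hop, or None if there is no such hop.
    """
    start = None
    for index, loss in enumerate(hop_loss_percent, start=1):
        if loss >= threshold:
            if start is None:
                start = index  # candidate starting point
        else:
            start = None  # loss did not carry forward, so reset
    return start


# Hypothetical path: hops 1-3 are internal, hop 4 is the ISP's first hop.
path = [0.0, 0.0, 0.0, 12.0, 10.0, 11.0]
print(first_lossy_hop(path))

# Loss at a single middle hop that does not persist is ignored.
print(first_lossy_hop([0.0, 40.0, 0.0, 0.0, 0.0]))
```

In the first example the loss begins at hop 4 and continues to the destination, which is the kind of evidence worth handing to your ISP.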
PRTG Network Monitor gives you several sensor types designed to detect and diagnose packet loss in your network. Here are the ones you should use:
Ping Sensors: The most basic way to detect packet loss. Set up ping sensors to continuously ping critical devices and monitor packet loss percentages. Use different ping intervals and packet sizes to represent different traffic types.
Quality of Service (QoS) Sensors: Specifically monitor packet loss, jitter, and latency. Ideal for connections carrying VoIP or video conferencing traffic where packet loss impacts quality the most.
Bandwidth Sensors: Keep an eye on bandwidth consumption to spot network congestion before buffers overflow and start dropping packets.
SNMP Sensors: Monitor CPU, memory, and other performance metrics on network devices to preempt hardware issues before they lead to packet loss.
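To make the packet loss percentage that a ping-based check reports more concrete, here is a minimal sketch of how the figure can be derived from the summary line of a command-line ping. It is not how PRTG works internally. It assumes the summary format printed by common Linux and macOS ping utilities, and the function name is hypothetical.

```python
import re

# Matches e.g. "20 packets transmitted, 19 received" (Linux)
# and "20 packets transmitted, 19 packets received" (macOS).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent_from_ping(output):
    """Compute packet loss in percent from ping's summary line."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


sample = "20 packets transmitted, 19 received, 5% packet loss, time 19028ms"
print(loss_percent_from_ping(sample))
```

Keep in mind that a single run like this is only a snapshot of one moment.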
The most useful part of PRTG's packet loss monitoring is that PRTG tracks all these metrics over time. A one-time ping test shows you whether packet loss is occurring at that moment, but PRTG's historical data lets you see patterns across days and weeks. Does packet loss only happen during backups? Only on Thursday afternoons when everyone joins the all-hands video conference? Context makes troubleshooting infinitely less stressful than staring at an inert command prompt while your career choices are questioned.
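The value of historical data can be illustrated with a small sketch that groups loss measurements by hour of day. It assumes you have exported samples as pairs of ISO timestamp and loss percentage; the function name and the sample values are invented for illustration and do not reflect any particular PRTG export format.

```python
from collections import defaultdict
from datetime import datetime


def average_loss_by_hour(samples):
    """Average loss percentage per hour of day.

    samples: iterable of (ISO 8601 timestamp string, loss percent) pairs.
    Returns a dict mapping hour (0-23) to the mean loss for that hour.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of loss, count]
    for timestamp, loss in samples:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour][0] += loss
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


# Hypothetical measurements taken over two days.
samples = [
    ("2024-03-07T09:00:00", 0.0),
    ("2024-03-07T14:00:00", 2.0),
    ("2024-03-07T14:30:00", 3.0),
    ("2024-03-08T14:10:00", 4.0),
]
print(average_loss_by_hour(samples))
```

In this made-up data the loss clusters in the 14:00 hour on both days, which points toward something scheduled rather than a random fault.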
Imagine a company that has implemented IP phones for their entire workforce. Users are soon complaining of poor call quality, with calls being choppy and words getting dropped mid-sentence. Running some of PRTG's QoS sensors reveals there's 2-3% packet loss on the segment of the network that's serving the IP phones during normal business hours.
Continuing the investigation with bandwidth monitoring shows network traffic levels spike every time the packet loss is observed. The culprit? An automated backup system that someone had mistakenly scheduled to run during the day instead of overnight, consuming precious bandwidth that would otherwise be available to voice traffic. The solution was simple: reschedule the backup for an off-peak time and set up some QoS policies to prioritize voice traffic to ensure those UDP packets for VoIP make it through even when the network is busy.
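The idea behind prioritizing voice traffic can be sketched with a toy scheduler. This assumes strict-priority queueing, which is only one of several scheduling schemes a real switch or router might use, and the function name and packet labels are hypothetical. Actual QoS policies are configured on the network devices themselves.

```python
from collections import deque


def strict_priority_send(voice, bulk, link_capacity):
    """Send up to link_capacity packets, always serving voice first.

    Returns (sent, voice_left, bulk_left) as lists of packets.
    """
    voice_q, bulk_q = deque(voice), deque(bulk)
    sent = []
    while len(sent) < link_capacity and (voice_q or bulk_q):
        # Bulk traffic is only served when no voice packet is waiting.
        queue = voice_q if voice_q else bulk_q
        sent.append(queue.popleft())
    return sent, list(voice_q), list(bulk_q)


# Hypothetical tick: the link can carry 3 packets but 5 are waiting.
sent, voice_left, bulk_left = strict_priority_send(
    voice=["v1", "v2"], bulk=["b1", "b2", "b3"], link_capacity=3
)
print(sent)        # both voice packets go out before any backup traffic
print(bulk_left)   # backup packets that have to wait
```

When the link is oversubscribed, the backup traffic absorbs the delay while the voice packets still get through.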
After you've resolved some immediate packet loss issues, take steps to prevent it with these tips:
Monitor and plan capacity: Keep an eye on network traffic patterns and capacity. When a link consistently runs at 70-80% or more of its bandwidth capacity, plan an upgrade before congestion sets in. You should also investigate any packet loss above 0.1% and try to reduce it. For VoIP and video conferencing, packet loss becomes noticeable to end users at around 0.5%, while regular data applications can usually tolerate up to 1% packet loss before users notice.
Implement QoS policies: Prioritize critical traffic, especially UDP-based real-time applications like voice and video conferencing that can't recover from packet loss. TCP-based applications can recover from packet loss by retransmitting missing packets, but real-time applications have to be treated specially.
Maintain your hardware: Firmware updates, health checks, and a replacement schedule for aging hardware prevent problems before they start. Don't wait until your hardware fails catastrophically to replace it.
Optimize wireless: If you run WiFi in your environment, do a WiFi survey to identify interference sources and dead zones. Place access points in optimal locations and, where possible, hardwire critical systems that can't tolerate packet loss.
Set up continuous monitoring: Configure PRTG to catch packet loss early with alerts when it exceeds your defined thresholds. Don't assume packet loss will fix itself: a brief congestion spike may clear on its own, but hardware failures, misconfigurations, and capacity issues are only going to keep happening (or get worse) until you fix them. Continuous monitoring is the only way to know for sure if the problem is still there.
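The thresholds mentioned in these tips can be pulled together in a small sketch. The numbers come from the guidance above (70% utilization, 0.1% loss worth investigating, 0.5% noticeable for real-time traffic, 1% for regular data); the function name and the wording of the findings are hypothetical, and this is not a PRTG configuration.

```python
def assess_link(utilization_percent, loss_percent, carries_realtime=False):
    """Compare a link's utilization and loss against rule-of-thumb thresholds."""
    findings = []
    if utilization_percent >= 70:
        findings.append("plan capacity upgrade")
    if loss_percent > 0.1:
        findings.append("investigate packet loss")
    # Real-time traffic such as VoIP suffers at lower loss than bulk data.
    user_visible = 0.5 if carries_realtime else 1.0
    if loss_percent >= user_visible:
        findings.append("users likely affected")
    return findings


# Hypothetical links: a busy voice segment, a quiet data link, a lossy data link.
print(assess_link(85, 0.6, carries_realtime=True))
print(assess_link(40, 0.05))
print(assess_link(50, 0.6))
```

Treat the cutoffs as starting points and tune them to what your own users and applications can tolerate.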
Packet loss isn't a mysterious force. It has identifiable causes, and with the right monitoring in place and a systematic approach to troubleshooting, you can usually tell within minutes whether you're dealing with congestion, hardware issues, software problems, or an ISP problem.
PRTG's packet loss monitoring gives you the insight you need to quickly know if packet loss is happening and where, so you can fix it before it has a chance to cause problems for your users. Give PRTG's free 30-day trial a try and set up some ping sensors and QoS monitoring on your critical network links so you'll be armed with data next time your users notice packet loss.