The following article is the last in our 3-part series about CDNs (Content Delivery Networks). Our guest author, Matt Conran from Network Insight, discusses CDN performance metrics and monitoring, what you can do to optimize your website for CDNs, and which strategy is better for CDNs: build or buy?
CDN Performance Metrics
Throughput
While CDN providers measure their performance using many different metrics, throughput is the mother of all performance measures. On its own, throughput covers a large number of use cases, unlike, for example, latency used by itself. When streaming a video, latency tells you very little about whether the video is going to buffer, because there is no direct correlation between buffering and latency. One can infer that lower latency will lead to less buffering and better performance, but in a world where you can measure throughput, throughput tells you realistically how fast an object will load.
Would you prefer to be 10ms away with 5% packet loss, or 20ms away with no packet loss? Latency and Time To First Byte (TTFB) on their own won't tell you about packet loss, yet it is the relationship between latency and loss that determines real-world performance. Low latency is fantastic, and many strive to lower latency by employing optimization techniques, but there is little point in having low latency if there is even a little bit of packet loss.
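To make that trade-off concrete, the classic Mathis et al. approximation for steady-state TCP throughput, roughly (MSS / RTT) x (1.22 / sqrt(loss)), shows why even a little loss outweighs a latency advantage. The sketch below plugs in illustrative numbers (the MSS and loss rates are assumed example values, not figures from this article):

```python
# Illustrative only: the Mathis et al. approximation for steady-state TCP
# throughput, T ~ (MSS / RTT) * (C / sqrt(p)), with C ~ 1.22.
# The MSS and loss rates below are assumed example values.
from math import sqrt

def mathis_throughput(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bytes per second."""
    return (mss_bytes / rtt_s) * (1.22 / sqrt(loss_rate))

MSS = 1460  # typical Ethernet MSS in bytes

# 10 ms away with 5% packet loss
near_but_lossy = mathis_throughput(MSS, 0.010, 0.05)
# 20 ms away with (almost) no loss -- use 0.01% to keep the formula finite
far_but_clean = mathis_throughput(MSS, 0.020, 0.0001)

print(f"10 ms, 5% loss:    {near_but_lossy / 1e6:.1f} MB/s")
print(f"20 ms, 0.01% loss: {far_but_clean / 1e6:.1f} MB/s")
```

Despite being twice as far away in latency terms, the clean path comes out roughly an order of magnitude faster, which is exactly why latency alone is a misleading metric.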
Initially, Time To First Byte was used because there was no Real User Monitoring (RUM) available. RUM is a passive technique that records the user's interactions on a website. TTFB was a good approximation of how fast a page was going to load, but we no longer need to approximate, because we can now measure page load accurately with RUM.
In a world where you can measure how long it takes to send a request and receive the reply, you should always measure the full reply. 99% of Web applications do nothing until the full reply is received.
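As a rough sketch of the difference between the two measurements, the snippet below (using the third-party requests library; the URL is a placeholder) times both the first byte and the full reply, and derives an effective throughput from the latter:

```python
# Rough sketch: compare time-to-first-byte with the time for the full reply,
# and derive effective throughput from the latter. The URL is a placeholder.
import time
import requests

URL = "https://example.com/large-asset.bin"  # hypothetical object

start = time.monotonic()
resp = requests.get(URL, stream=True)

chunks = []
ttfb = None
for chunk in resp.iter_content(chunk_size=8192):
    if ttfb is None:
        ttfb = time.monotonic() - start   # roughly TTFB, incl. connection setup
    chunks.append(chunk)
total = time.monotonic() - start          # time until the full reply arrived

body = b"".join(chunks)
throughput = len(body) / total            # effective throughput in bytes/second
print(f"TTFB: {ttfb*1000:.0f} ms, full reply: {total*1000:.0f} ms, "
      f"throughput: {throughput/1e6:.2f} MB/s")
```

On a large object over a lossy path, the TTFB figure can look perfectly healthy while the full-reply time and throughput tell a very different story.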
Optimized TCP Stack
When a regular TCP client starts a connection, it starts on the premise that the host is back in the 1970s and that we are all on one cable with no packet loss. The map of the Internet is clearly different now than it was back then, and the default TCP behaviour is suboptimal for application performance. Individual TCP stacks need to be tuned and optimized to align with current network conditions.
When a connection is received, TCP should instead start with the assumption that it's 2017, and that there is high packet loss, high latency and jitter. This is a more realistic view of today's networks and allows you to make the right decisions, for example, correctly sizing the TCP window.
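The article doesn't prescribe specific settings, and most real tuning happens in the kernel (congestion control, initial congestion window, sysctls). As one small illustration of the window-sizing idea, the sketch below sizes a socket's buffers to the bandwidth-delay product of an assumed high-latency path; the bandwidth and RTT figures are example assumptions.

```python
# Illustration only: size TCP socket buffers for an assumed high-latency path.
# Real CDN stacks tune this (and congestion control, initial cwnd, etc.) at the
# kernel level; the bandwidth and RTT figures here are example assumptions.
import socket

ASSUMED_BANDWIDTH_BPS = 100e6   # 100 Mbit/s path
ASSUMED_RTT_S = 0.150           # 150 ms round trip

# Bandwidth-delay product: how many bytes can be "in flight" on this path.
bdp_bytes = int(ASSUMED_BANDWIDTH_BPS / 8 * ASSUMED_RTT_S)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask the OS for send/receive buffers large enough to keep the pipe full.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)

print(f"Bandwidth-delay product: {bdp_bytes} bytes "
      f"({bdp_bytes / 1460:.0f} segments of 1460 bytes)")
```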
The number you need to be looking at is the 90th percentile, not the average. How fast is it when something bad is going on? How well is the CDN delivering for users on poor connections? This is where an optimised TCP stack comes into play for poorly performing connections: it recognises that the connection is poor and makes the right decisions, so pages load faster than they would over a conventional TCP stack.
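For instance, here is a minimal sketch of reporting the 90th percentile of page-load samples instead of the mean (the sample values are made up):

```python
# Sketch: report the 90th percentile of load times rather than the mean.
# The sample values are invented for illustration.
import statistics

load_times_ms = [120, 135, 140, 150, 160, 175, 180, 210, 650, 900]

mean = statistics.fmean(load_times_ms)
# quantiles() with n=10 returns the 9 cut points between deciles;
# the last one is the 90th percentile.
p90 = statistics.quantiles(load_times_ms, n=10)[-1]

print(f"mean: {mean:.0f} ms, 90th percentile: {p90:.0f} ms")
```

The mean hides the slow tail; the 90th percentile is what your users on bad connections actually experience.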
Website Optimizations
The goal of website optimization is to decrease latency in order to improve page load speed and the user experience. There is a range of tools and hacks available to help with this. In addition to reducing RTT by shortening the distance, there are techniques such as automatic content caching, image optimization, accelerated mobile links, automatic HTTPS rewrites and many more.
HTTP/2
Challenges surface as websites get larger with more assets to download. You might presume this is not too much of a problem because bandwidth is increasing, but we are still at the mercy of the speed of light and RTT.
There have been many improvements to HTTP/1.1, such as:

- Persistent connections, which reuse the same connection
- Sharding, which splits assets over multiple hosts
- Image grouping (spriting), which places images together into one file
But these are just hacks to get around the problem; adding additional kludges to protocols helps no one. Adopting HTTP/2 across the board brings huge improvements for everyone: server, end user and network.
One of the most effective ways to improve page loads is the move to HTTP/2. It is a significant leap from the previous HTTP/1.1 protocol, and most modern browsers have supported it since 2015. As far as Web applications are concerned, nothing has changed: the methods, status codes, header fields and URIs are the same. The major optimizations come in the form of how data is framed and transported between client and server. This new approach opens new avenues for improving web application performance.
HTTP/2 is a cleaner approach to optimizing website performance. One of its biggest features is its new binary framing format, which enables multiplexing. Multiplexing lets a client request many files over a single connection and receive them concurrently. HTTP/2 also offers server push: the server recognises what the client will need and starts sending files before they are requested.

On top of this, there are further optimization features such as flow control and header compression.
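As a minimal sketch of what multiplexing looks like from the client side, the snippet below uses the third-party httpx library (installed with its HTTP/2 extra) to fetch several assets concurrently over a single HTTP/2 connection; the URLs are placeholders.

```python
# Minimal sketch: fetch several assets concurrently over one HTTP/2 connection.
# Requires the third-party httpx library with HTTP/2 support
# (pip install "httpx[http2]"). The URLs below are placeholders.
import asyncio
import httpx

ASSETS = [
    "https://example.com/css/site.css",
    "https://example.com/js/app.js",
    "https://example.com/img/hero.jpg",
]

async def fetch_all():
    async with httpx.AsyncClient(http2=True) as client:
        # All requests share one connection; HTTP/2 multiplexes the streams.
        responses = await asyncio.gather(*(client.get(url) for url in ASSETS))
        for resp in responses:
            print(resp.url, resp.http_version, resp.status_code, len(resp.content))

asyncio.run(fetch_all())
```

Under HTTP/1.1 the same three requests would either queue on one connection or force the browser to open several; with HTTP/2 they travel as parallel streams on a single connection.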
Image Optimization
Image optimization features such as Polish are useful for image-heavy websites. Polish reduces the size of images by employing compression and metadata reduction techniques. Images retain their visual quality, while smaller file sizes result in faster page load times.

Another nice technique is Mirage for mobile image optimization. It detects the device and the available network bandwidth, so devices with small screens on slower connections receive lower-resolution images.
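Polish and Mirage are features of specific CDN products; as a rough, generic sketch of the same two ideas (recompress and strip metadata, and serve smaller variants to small screens), here is what it might look like with the Pillow library. The file names, quality settings and size thresholds are assumptions.

```python
# Rough sketch of the two ideas above using the Pillow library:
# recompress an image without its metadata, and produce a smaller
# variant for small screens. File names and sizes are example assumptions.
from PIL import Image

def polish_like(src: str, dst: str, quality: int = 80) -> None:
    """Recompress the image and drop metadata (EXIF etc.) to shrink the file."""
    with Image.open(src) as img:
        # Re-saving without passing exif= discards the original metadata.
        img.save(dst, "JPEG", quality=quality, optimize=True)

def mirage_like(src: str, dst: str, max_width: int = 480) -> None:
    """Produce a lower-resolution variant for small screens / slow connections."""
    with Image.open(src) as img:
        img.thumbnail((max_width, max_width * 10))  # preserves aspect ratio
        img.save(dst, "JPEG", quality=70, optimize=True)

polish_like("hero.jpg", "hero.polished.jpg")
mirage_like("hero.jpg", "hero.mobile.jpg")
```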
Building Versus Buying a CDN
There is a lot of debate about whether it is better to buy or build a CDN. We have come a long way from the physical world of appliances, where building a CDN was a costly and time-consuming exercise. Back then, the only valid approach was to deploy traditional physical appliances and roll out physical PoP locations.
But now, with the birth of the cloud, Network Function Virtualisation (NFV) and a plethora of open source tools, there are fewer barriers to entry when building a DIY CDN. Engineers can utilize the public cloud, which offers a pre-built, ready-to-go global footprint. Install a few VMs in different availability zones, employ open source monitoring and load balancing tools, advertise a couple of addresses and we are done, right?!
It depends on how complicated you want to get, but building a CDN is not a difficult task, and it's getting easier by the day. A very basic CDN might take only 30 minutes to build. But building a CDN is not the hard part!
CDNs are serious technology investments. The monitoring and operational activities of running a CDN are the real challenge, not the building. What happens when something goes wrong? How will you know, and who will fix it? Not only do these questions need answers; the entire process should be automated.
What makes a CDN stand out is the ability to run and operate it efficiently; a well-run CDN separates itself from the rest. That efficiency derives from years of experience and lessons learned. Design and configuration skills can be acquired, but day-to-day knowledge of CDN operations is priceless. A well-managed CDN has automated and streamlined processes between departments, and all backend systems are integrated with each other, so that billing starts on the correct day, for example.
This is not something that comes with a DIY CDN based on off-the-shelf open source software. Only when you have full automation of service and internal processes between departments can you consider yourself a full-fledged CDN provider.
Monitoring a CDN
We all know that CDNs can behave badly at times. Some of the problems are outside their control, caused by upstream provider changes, while others arise from inadequate monitoring of internal outages and performance-related issues. If you have decided to buy service from a CDN, or run dual CDNs for redundancy and high availability, how do you know when something is wrong, or when the right threshold for failing over has been reached?
Dual CDN providers are useful for enhanced redundancy and are increasing in importance as managed services get hit by aggressive DDoS attacks from IoT botnets. However, monitoring for failover between two different providers is not a standard product, and it is probably best to employ an external company that specialises in this service.
Some CDNs may have multiple hidden endpoints behind a single front end, making it difficult to monitor and collect performance metrics with traditional monitoring tools. In these cases, you are either stuck with simple round robin or need to move to a more advanced traffic management system, such as NS1 Pulsar. Round robin is a primitive form of redundancy and does not take any performance metrics into consideration.
This is where Internet traffic management services come into play. Their platforms use RUM metrics to inform users which CDN is performing best, based on selected metrics such as throughput, latency and jitter. Their systems take CDN monitoring to a new level, accurately reporting what users are actually seeing. Single metrics can be used, but the real power of commercial traffic management comes from combining multiple parameters. This combination of metrics offers a complete CDN performance map, giving a full picture of the Internet and which areas are performing poorly. For RUM, non-intrusive JavaScript is deployed against hosts on the various CDNs, allowing individual users to test which CDN performs best on those metrics.
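The exact scoring is proprietary to each traffic management platform; as a hedged sketch of the general idea of combining several RUM metrics into a single decision, something like the following could be used (the weights, CDN names and sample measurements are invented for illustration):

```python
# Sketch: combine RUM metrics into one score per CDN and pick the best one.
# The weights, CDN names and sample measurements are invented; real traffic
# management platforms use their own (proprietary) scoring.
from statistics import fmean, pstdev

# Hypothetical RUM samples per CDN: (latency ms, throughput MB/s) pairs.
rum_samples = {
    "cdn-a": [(35, 4.1), (40, 3.8), (38, 4.0), (90, 1.2)],
    "cdn-b": [(55, 3.5), (52, 3.6), (58, 3.4), (60, 3.3)],
}

def score(samples):
    latencies = [s[0] for s in samples]
    throughputs = [s[1] for s in samples]
    jitter = pstdev(latencies)  # latency spread as a simple jitter proxy
    # Higher throughput is good; latency and jitter count as penalties.
    return fmean(throughputs) * 10 - fmean(latencies) * 0.1 - jitter * 0.2

best = max(rum_samples, key=lambda cdn: score(rum_samples[cdn]))
print({cdn: round(score(s), 1) for cdn, s in rum_samples.items()}, "->", best)
```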
The correct monitoring gives you the right information at the right time to make accurate decisions. It creates a custom application performance map that accurately fits customer requirements.
And that brings us to the end of this blog series! If you missed them, be sure to check out the previous articles in this series.
WANT TO KNOW MORE?
Here are some additional links you might find interesting:
- Matt Conran's Network Insight Blog
- Paessler's Cloud Ping and Cloud HTTP Sensors