Do you remember the good old days when you could march right up to the server and give it a good whack on the side? You know, the old one-two, fingers across the top. It didn't work a lot of the time, but when it did, it worked like magic.
Troubleshooting and performance monitoring are never that easy with your infrastructure hosted in Microsoft's azure-tinted cloud kingdom. When your Azure virtual machines get weird and start acting funny, you can't just pop into a Microsoft data center and give their hardware the Fonzie.
Azure metrics, that's the closest you're gonna get to being able to put your ear up against the server and listen for that telling clicking sound on the hard drive.
Azure Monitor metrics are the pulse check of your cloud environment. You can use metrics to determine if your Azure resources are the cat's meow or if they're crying in pain.
But if you've ever spent any time in the Azure portal with the built-in monitoring solution trying to track down a problem, you've probably noticed that while Azure makes it easy to collect metric data, it's tough to see the forest for the trees when you can't connect that data back to the rest of your infrastructure. That's where a unified monitoring solution like PRTG comes in and changes the whole Azure monitoring game from chaos to clarity.
Before we jump into how PRTG can help with Azure monitoring, let's get a firm handle on what Azure metrics are and what role they play.
Azure Monitor metrics are basically just time-series data points (numbers measured at specific time intervals) that tell you how your Azure resources are performing. Azure automatically collects and provides platform metrics for every type of resource in Azure, like virtual machines, storage accounts, SQL databases, networking resources, Kubernetes services, and more.
These metrics represent standard measurements like CPU usage percentages, available memory, IOPS, request counts, response time, queue length, throughput, error rates, etc. Azure defines a specific set of these metrics for each Azure resource type that you can monitor. Metrics are reported in near real-time and are a primary source of insights into the status and performance of your Azure resources.
By default, most Azure resources report platform metrics at a one-minute frequency, and the data is stored in a scalable time-series database optimized for performance monitoring. The data becomes available for query shortly after collection, which makes Azure metrics an ideal data source for near-real-time dashboards, alerts on current conditions, and immediate diagnosis of ongoing problems.
Metrics are the standard way to retrieve and view time-series performance data in Azure Monitor. The primary tool is the Metrics Explorer in the Azure portal, a basic analytics tool that lets you create simple visualizations and charts and examine how your Azure resources are performing. You can also query, retrieve, and work with metric data in other tools like Power BI and Excel.
Azure metrics are absolutely critical to effective Azure monitoring, but if you're managing a complex Azure cloud with hundreds or thousands of resources, spread across multiple resource groups and Azure subscriptions, sifting through metrics manually in the portal quickly becomes unmanageable. That's why a single, unified monitoring system that spans the full breadth of your entire hybrid IT environment is an absolute necessity.
One source of confusion that Azure admins encounter is telling Azure Monitor metrics apart from Azure Monitor logs. While both contain performance data and other valuable insights, they have different purposes and use cases.
Azure Monitor metrics are automatically collected lightweight numerical values that describe the state of a system at a specific time. Metrics are regularly sampled and stored in a dedicated, structured time-series database, making them highly performant and ideal for real-time monitoring and alerting, particularly for known performance counters that you want to track continuously.
Metrics can be described as concise, standardized data points that tell you "how much" or "how many" of something at regular intervals.
Azure Monitor logs (formerly known as Log Analytics) are rich, detailed records of events that you configure to collect and store. Logs contain more context and detail about events that require some level of human interpretation or analysis. Logs in Azure Monitor are stored in a log data store, which is optimized for complex querying and deep investigation. Logs can be leveraged for deep analysis using Kusto queries and are often used for the detection of unknown or anomalous issues that metrics might not reveal.
Logs are better described as detailed, often semi-structured records that contain human-readable, contextual information about when and where an event happened.
Think of metrics as your wristwatch that gives you a quick glance at heart rate, step count, and other basic data you want to track every second, while logs are more like your personal medical record, which you consult when something needs further investigation.
The problem for most organizations isn't that it's difficult to collect metrics from Azure. It's that those metrics have to be integrated with everything else in your environment.
PRTG Network Monitor has specialized capabilities for monitoring Azure and the Azure cloud, including dedicated Azure sensors that integrate with your Azure cloud accounts via the REST API and pull in critical Azure metrics from across your Azure cloud environment.
PRTG integrates with the Azure Monitor API to collect metrics from Azure Monitor resources and incorporate that data into the PRTG unified monitoring system. This integration allows you to visualize virtual machine performance metrics, storage account metrics, Azure SQL metrics, and any other Azure resource metrics alongside all of your other on-premises and cloud monitoring data.
Metrics integration works by querying the Azure API and retrieving the latest values from the supported resource types at regular intervals. PRTG then processes that metric data so you can set thresholds, create notifications, and build full-stack dashboards that provide visibility into your entire IT environment in a single view.
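To make the "querying the Azure API" step a bit more concrete, here is a minimal sketch of how a request URL for the Azure Monitor metrics REST endpoint can be assembled. This is not PRTG's internal implementation; it only illustrates the kind of call any polling tool makes. The function name `build_metrics_url`, the resource ID, and the timespan are illustrative assumptions, and a real request would also need an authenticated bearer token, which is left out here.

```python
from urllib.parse import quote, urlencode

# Azure Resource Manager endpoint for the public Azure cloud.
ARM_ENDPOINT = "https://management.azure.com"


def build_metrics_url(resource_id, metric_names, timespan,
                      interval="PT1M", aggregation="Average",
                      api_version="2018-01-01"):
    """Return the URL that lists metric values for a single Azure resource.

    resource_id  -- full ARM resource ID, starting with "/subscriptions/..."
    metric_names -- list of platform metric names, e.g. ["Percentage CPU"]
    timespan     -- "start/end" in ISO 8601, e.g. "2024-01-10T10:00:00Z/2024-01-10T11:00:00Z"
    """
    query = urlencode(
        {
            "api-version": api_version,
            "metricnames": ",".join(metric_names),
            "timespan": timespan,
            "interval": interval,        # ISO 8601 duration, one-minute grain by default
            "aggregation": aggregation,
        },
        quote_via=quote,  # encode spaces as %20 instead of "+"
    )
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"


# Hypothetical resource ID, used only for illustration.
EXAMPLE_VM = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg/providers/Microsoft.Compute"
    "/virtualMachines/example-vm"
)

if __name__ == "__main__":
    print(build_metrics_url(
        EXAMPLE_VM,
        ["Percentage CPU"],
        "2024-01-10T10:00:00Z/2024-01-10T11:00:00Z",
    ))
```

A monitoring tool would send a GET request to a URL like this on a schedule, read the latest value from the JSON response, and compare it against its thresholds.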
You can easily build a dashboard in PRTG that shows Azure virtual machine performance metrics and storage account performance metrics while also tracking on-premises server load, web app response time, cloud container metrics, and internal and external network traffic - all from a single location. No more tabbing back and forth between Azure Monitor, Azure DevOps, network monitoring, and log management tools when you need to troubleshoot performance issues.
Azure Monitor provides a wealth of platform metric data for each of the different Azure resource types in its cloud services catalog. But PRTG enhances your monitoring capabilities with additional Azure sensors and features that bring your Azure cloud monitoring to the next level.
The killer feature of PRTG is that you can correlate data across platforms. You can easily compare metrics between Azure virtual machines and on-premises physical servers, Azure SQL databases and on-premises databases, or cloud storage and local storage systems.
That cross-platform visibility is critical for troubleshooting complex, multistep problems that span your hybrid cloud and on-premises environments. It's easy to spot a latency issue between Azure and on-premises systems when all of that data is flowing through a single monitoring tool.
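As a rough illustration of what "correlating" two metrics means in practice, the sketch below computes a Pearson correlation coefficient between two equally sampled series, for example VPN latency and application response time. The function name and the sample values are made up for illustration; this is a generic statistic, not a description of how PRTG calculates anything.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series.

    A value near +1 means the series rise and fall together, a value near -1
    means they move in opposite directions, and a value near 0 means there is
    no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no meaningful correlation
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical samples taken at the same timestamps.
    vpn_latency_ms = [20, 22, 21, 60, 65, 23]
    app_response_ms = [180, 190, 185, 420, 450, 200]
    print(round(pearson(vpn_latency_ms, app_response_ms), 3))
```

A high coefficient does not prove that one metric causes the other, but it tells you which pair of graphs deserves a closer look first.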
Another critical PRTG advantage is the storage and retention of historical data. PRTG retains historical metric data longer and with more flexible retention options than Azure Monitor, which keeps platform metrics for 93 days. That gives you the ability to correlate long-term trends and compare current performance against historical baselines or previous issue periods to make data-driven capacity planning and provisioning decisions based on months or even years of metric history.
Instead of defining multiple alerting rules in Azure Monitor and other monitoring tools, PRTG provides a single alert system that reduces alert fatigue, enables complex multi-condition alerts and thresholds, provides flexible notification options and escalation paths, and allows you to assign urgency based on whether issues are being resolved in a timely manner.
Beyond the platform metrics you can track in Azure Monitor, PRTG also provides the ability to create your own custom sensors to monitor specific Azure services, track Azure billing and cost, watch application-specific metrics that are critical to your business, and check compliance of Azure resources against your organization's compliance standards.
Now that you understand PRTG's Azure monitoring capabilities in the abstract, let's see how they apply in some real-world scenarios:
One of the classic use cases for PRTG is the situation where your users report that an Azure-hosted application seems to be performing poorly or less responsively, but it's not clear where the bottleneck is coming from. Is it an application issue or the database? Could the virtual machine be maxed out? Maybe there's increased network latency?
If you're managing all of this with disjointed tools like Azure Monitor, Application Insights, Log Analytics, and others, you'd be bouncing around dozens of dashboards and reporting tools when you really just need to see all of the relevant data in one place.
In PRTG, you can build a single dashboard that shows Azure virtual machine performance metrics next to Azure SQL database metrics, application response time data from Application Insights, network latency between on-premises users and Azure, and even bring in related on-premises infrastructure metrics.
You might notice that the performance problems align with the timing of a backup process running on your on-premises systems, which is also causing an increase in latency on your VPN connection to Azure. Not something you'd ever discover just by looking at Azure virtual machine metrics or the application performance metrics.
Another common scenario where PRTG's Azure cloud monitoring is the hero is when an organization wants to optimize costs in their Azure cloud environment. PRTG is great at uncovering opportunities for cost savings by identifying underutilized or overprovisioned resources based on actual performance data and historical usage patterns.
Azure can be sneaky about how quickly those bills can balloon if you're not on top of it. PRTG can help identify underutilized virtual machines with consistently low CPU usage, overprovisioned storage accounts with high unused capacity, idle or unused services that still accrue charges, or right-sizing opportunities for resources based on actual performance data.
By correlating resource usage patterns against performance requirements, PRTG makes it possible to scale down or up as needed, switch to lower-cost tiers, or enable auto-scaling to match real demand and uncover waste without degrading performance.
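The "consistently low CPU usage" check described above can be sketched in a few lines. This is a simplified illustration under assumed limits (10% average, 40% peak), not a PRTG feature or an official sizing rule; the function name and VM names are hypothetical.

```python
from statistics import mean


def find_underutilized(cpu_by_vm, avg_limit=10.0, peak_limit=40.0):
    """Return the names of VMs whose average AND peak CPU stay below the limits.

    cpu_by_vm maps a VM name to a list of CPU percentage samples collected
    over the review period. Checking the peak as well as the average avoids
    flagging machines that idle most of the time but occasionally spike.
    """
    flagged = []
    for name, samples in cpu_by_vm.items():
        if not samples:
            continue  # no data, so make no recommendation
        if mean(samples) < avg_limit and max(samples) < peak_limit:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical samples for three VMs.
    usage = {
        "vm-web": [55, 70, 62, 58],
        "vm-batch": [3, 4, 2, 5],
        "vm-spiky": [2, 3, 2, 2, 2, 2, 2, 2, 2, 60],
    }
    print(find_underutilized(usage))
```

Here only the VM that is quiet on both counts is flagged; the spiky one has a low average but still needs its headroom.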
Everyone has been there – you wait until your Azure resources are completely maxed out before doing something about it. No one likes that panicked scramble to scale something up when your users are screaming about performance. PRTG's historical analysis and trend monitoring can make that a thing of the past.
PRTG can help identify resource utilization trends and growth patterns and use that to project when you'll need to add additional capacity, enabling you to plan infrastructure changes in advance and scale out ahead of demand, as well as detect abnormal usage patterns that might indicate a problem.
That helps to ensure smooth operations and also smooths the budgeting process because you can proactively estimate infrastructure needs instead of reacting to them. Catching trends early can prevent the crisis mode that sets in when an unexpected capacity shortage occurs.
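One simple way to turn a growth trend into a date is a straight-line fit. The sketch below fits a least-squares line through daily utilization values and estimates how many days remain until a limit is reached. It assumes roughly linear growth, which real workloads often do not follow, and the function name and sample numbers are illustrative only.

```python
def days_until_limit(daily_values, limit):
    """Estimate the days remaining until a linear trend reaches `limit`.

    daily_values -- one utilization value per day, oldest first
    limit        -- the value at which capacity is considered exhausted

    Returns None when there are fewer than two points or when the fitted
    trend is flat or falling, because no crossing can be projected.
    """
    n = len(daily_values)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (limit - intercept) / slope
    # Measure from the most recent sample; never report a negative number.
    return max(0.0, crossing_day - (n - 1))


if __name__ == "__main__":
    disk_used_percent = [50, 52, 54, 56, 58]  # hypothetical daily readings
    print(days_until_limit(disk_used_percent, limit=80))
```

With longer history you would typically fit over weeks or months rather than five days, and treat the result as a planning hint rather than a deadline.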
To truly make your Azure cloud monitoring strategy shine, there are a few best practices to keep in mind:
Pay special attention to metrics that directly affect your users. Response time and availability are obvious ones, but also keep an eye on resource contention that could cause bottlenecks, like CPU, memory, and disk I/O performance. Don't forget the metrics that drive costs so you can keep spending under control, and keep tracking service-specific and application-specific metrics for deeper visibility into application health.
Set graduated warning and error thresholds. It's not enough to just set one threshold to detect when something is wrong. Add in graduated warning thresholds to spot developing problems early and error thresholds for critical conditions. Maybe even consider setting different thresholds for peak and off-peak hours or weekdays vs. weekends to account for different expected usage patterns. If you run a business with noticeable seasonal usage patterns, consider using seasonally adjusted thresholds to prevent false positives.
Connect related metrics across services. In-depth insights come from connecting related metrics across your infrastructure. Connect the frontend app performance with the backend database to understand how they affect each other. Examine the relationship between network throughput and app response time, or storage latency and app transaction rates. Watch how virtual machine performance relates to user load to help identify potential bottlenecks before your users notice anything is wrong.
Automate routine responses with PRTG. For common scenarios that you can automate a response for, let PRTG do the work. You can configure PRTG to automatically restart services when health checks fail, scale resources when utilization hits a threshold, run Azure runbooks or scripts for more complex remediation tasks, or clear temp files when disk space gets low, among other actions. This automation can often resolve the issue quickly without users ever noticing a problem occurred.
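The graduated thresholds mentioned in the best practices above can be sketched as a small lookup plus a classifier. The limits, the business-hours window, and the names `THRESHOLDS`, `profile_for`, and `classify` are all assumptions for illustration; in PRTG you would configure equivalent limits in the sensor channel settings rather than in code.

```python
from datetime import datetime

# Hypothetical limits in percent. Off-peak limits are lower because the same
# load outside business hours is more likely to indicate a problem.
THRESHOLDS = {
    "peak": {"warning": 75.0, "error": 90.0},
    "off_peak": {"warning": 50.0, "error": 70.0},
}


def profile_for(ts):
    """Return "peak" on weekdays between 08:00 and 18:00, otherwise "off_peak"."""
    is_weekday = ts.weekday() < 5  # Monday is 0, Sunday is 6
    return "peak" if is_weekday and 8 <= ts.hour < 18 else "off_peak"


def classify(value, ts):
    """Return "ok", "warning", or "error" for a reading taken at time ts."""
    limits = THRESHOLDS[profile_for(ts)]
    if value >= limits["error"]:
        return "error"
    if value >= limits["warning"]:
        return "warning"
    return "ok"


if __name__ == "__main__":
    weekday_morning = datetime(2024, 1, 10, 10, 0)   # a Wednesday
    saturday_morning = datetime(2024, 1, 13, 10, 0)  # a Saturday
    print(classify(80.0, weekday_morning))
    print(classify(80.0, saturday_morning))
```

The same 80% reading is only a warning during business hours but an error on the weekend, which is the point of time-aware thresholds.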
A: Azure metrics differ from traditional on-premises monitoring in several significant ways that require a shift in your monitoring approach.
First, accessibility is dramatically different. With on-premises servers, you typically need to install agents or additional software directly on the servers to collect detailed metrics. In Azure, these platform metrics are automatically collected and exposed through APIs, giving you immediate access to a wealth of information without any extra configuration. This access allows you to monitor more extensively but also demands more discipline in selecting what to monitor.
Second, the metrics themselves are different because Azure resources are abstracted from the underlying hardware. You don't monitor hardware-level metrics like fan speed or power supply voltage in Azure the same way you would on a physical server. Instead, metrics are focused on the virtual resources, like a VM's CPU performance, which represents a share of physical resources. This abstraction requires a greater focus on metrics indicating contention or resource constraints, such as CPU queue length or throttling events, which can indicate that your virtual resources are competing with other workloads on the same physical hardware.
Third, cloud resources are more dynamic than on-premises servers. On-premises infrastructure tends to be static, but in Azure, resources can be spun up or down, resized, or deleted automatically based on demand or deployment automation. This dynamic nature requires your monitoring solution to be more adaptive, automatically discovering new resources and adjusting thresholds based on the current state of the environment.
A: Azure Monitor provides two types of metrics – platform metrics and custom metrics. Each has distinct characteristics and use cases.
Platform metrics are the default metrics that Azure automatically collects for all Azure services with no additional configuration or setup required. They are the standard, pre-defined metrics provided by Azure, such as CPU usage percentage, available memory, or disk operations. Platform metrics are the "free" data (included in the cost of your Azure resource) and form the backbone of your monitoring efforts in Azure.
Custom metrics, by contrast, are metrics that you define and send to Azure Monitor yourself. Custom metrics are typically reported from your applications or custom services and allow you to track business-specific or application-specific metrics that are not covered by the platform metrics. Examples of custom metrics include business transactions per second, user login counts, or custom application queue lengths.
You should use platform metrics for the majority of your monitoring requirements, since they are automatically collected and generally cover all of the standard performance counters you would want to track for any given resource. But you should consider investing the time and effort into custom metrics when you need to monitor application-specific behaviors not covered by platform metrics, want to track business metrics in conjunction with technical metrics, need to tie user activities to infrastructure performance, or are implementing custom health checks that use application-specific business logic to determine the health of your services.
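For a feel of what "sending a custom metric yourself" involves, here is a sketch that builds the JSON body for one pre-aggregated data point. The payload shape follows my understanding of the Azure Monitor custom metrics REST API (a `time` field plus `data.baseData` with `metric`, `namespace`, `dimNames`, and `series`); treat it as an assumption and check it against the current Microsoft documentation before relying on it. The function name and the sample metric are hypothetical, and the authenticated POST to the regional ingestion endpoint is deliberately left out.

```python
import json


def build_custom_metric(metric, namespace, values, dimensions, time_iso):
    """Build the request body for one custom metric data point.

    values     -- raw measurements collected during the reporting interval;
                  they are summarized as min, max, sum, and count
    dimensions -- mapping of dimension name to dimension value
    time_iso   -- ISO 8601 timestamp of the data point
    """
    if not values:
        raise ValueError("at least one value is required")
    return {
        "time": time_iso,
        "data": {
            "baseData": {
                "metric": metric,
                "namespace": namespace,
                "dimNames": list(dimensions.keys()),
                "series": [
                    {
                        "dimValues": list(dimensions.values()),
                        "min": min(values),
                        "max": max(values),
                        "sum": sum(values),
                        "count": len(values),
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    body = build_custom_metric(
        metric="QueueDepth",               # hypothetical application metric
        namespace="QueueProcessing",       # hypothetical namespace
        values=[3, 5, 20],
        dimensions={"QueueName": "ImportantQueue"},
        time_iso="2024-01-10T10:00:00",
    )
    print(json.dumps(body, indent=2))
```

Summarizing several raw values into one data point keeps the number of ingested points down, which matters once custom metrics are billed.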
A: Azure platform metrics are free with your resources but it's worth taking a step back to think about costs and performance if you're architecting an end-to-end Azure monitoring solution.
Respect API throttling limits
Azure APIs have rate limits. If you poll Azure metrics too frequently, you can trigger throttling, which may leave gaps in your data. PRTG batches API requests to keep overhead low and uses adaptive intervals, polling frequently changing metrics at a faster rate and lower-priority metrics that change less often at a slower one. That gives you the freshest data possible without spiking your API consumption.
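If you write your own polling scripts, the usual way to respect throttling is to back off and retry. The sketch below shows that pattern in isolation: when a call is throttled (Azure typically signals this with an HTTP 429 response and a Retry-After header), it waits for the suggested time or, if none is given, for an exponentially growing delay. The `Throttled` exception and the `fetch` callable are hypothetical stand-ins for your own HTTP code, and this is not a description of how PRTG handles throttling internally.

```python
import time


class Throttled(Exception):
    """Raised by the caller's fetch function when the API answers with HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("request was throttled")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry with increasing delays while it is throttled.

    The server's Retry-After value wins when present; otherwise the delay
    doubles on every attempt (base_delay, 2x, 4x, ...). The last Throttled
    exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_fetch():
        # Simulated API call: throttled twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] <= 2:
            raise Throttled()
        return {"value": 42.0}

    print(call_with_backoff(flaky_fetch, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the function easy to test without actually waiting.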
Factor in metric data ingestion charges for custom metrics and logs
Azure platform metrics are free, but if you enable collection of custom metrics, these are billed per ingested metric data point. Logs are billed on ingestion in a similar way, so if you are correlating metrics with logs to perform advanced analysis or debugging, those costs will apply as well. To reduce them, filter noisy metrics as close to the source as possible, and make sure you're not over-collecting metrics that change very little. Configure higher-frequency collection for volatile, business-critical metrics and lower-frequency collection for stable metrics. You can aggregate metrics at a coarser interval before ingestion if fine resolution is not required, and use dimension filtering to collect only the most important metric dimensions.
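Aggregating before ingestion can be as simple as averaging one-minute samples into larger buckets. The sketch below shows one way to do that; the function name and sample data are illustrative, and it assumes the bucket size divides evenly into 60 minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downsample(points, bucket_minutes=5):
    """Average (timestamp, value) points into fixed-size time buckets.

    points         -- iterable of (datetime, float) pairs in any order
    bucket_minutes -- bucket width; assumed to divide 60 evenly (5, 10, 15, ...)

    Returns a list of (bucket_start, average) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Floor the timestamp to the start of its bucket.
        floored_minute = ts.minute - ts.minute % bucket_minutes
        start = ts.replace(minute=floored_minute, second=0, microsecond=0)
        buckets[start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    first = datetime(2024, 1, 10, 10, 0)
    # Ten hypothetical one-minute samples with values 0..9.
    samples = [(first + timedelta(minutes=i), float(i)) for i in range(10)]
    for start, average in downsample(samples):
        print(start.isoformat(), average)
```

Ten one-minute points become two five-minute points here, which cuts ingestion volume by 80% at the price of losing short spikes inside each bucket.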
Use retention periods wisely
Azure Monitor applies different default retention periods to metrics and to logs. If you need to keep log data longer for compliance, analysis, or troubleshooting, you can (and probably will) configure longer retention periods. Longer retention in Azure Monitor comes at additional cost, so if you keep your long-term historical metrics in PRTG's own database, you may be able to shorten your Azure Monitor retention periods and reduce costs. And of course, you can set the data retention period you want in PRTG.
Azure metrics are critical data points, but as with any monitoring data, they don't always make sense in isolation. A CPU metric by itself won't tell you much. Whether it matters depends on how critical the monitored instance is to your application.
Azure metrics give you visibility into performance, but what's causing the performance change? Without that context, you could spend a lot of time digging through metrics correlations before deciding what the problem is and/or where to begin troubleshooting.
The beauty of using a unified monitoring platform is that you can more easily see those correlations and how different infrastructure components in Azure are related. Is the increased latency in your Azure SQL database the reason for slower response times on your web application? How does network latency between your on-premises site and Azure impact customer experience? With visibility across your IT, it's easier to troubleshoot such complex issues, optimize performance across your hybrid IT landscape, plan for future capacity needs rather than react to them, and more.
Monitoring your Azure resources with PRTG means your Azure metrics are no longer siloed in the Azure portal, but are integrated into your broader IT monitoring for context and correlation that give you the big picture needed to truly master your monitoring and your environment.
Ready to see your Azure monitoring in a new light? TRY PRTG FREE FOR 30 DAYS or watch an in-depth demo of how PRTG can take your Azure monitoring to the next level.