Paessler Blog - All about IT, Monitoring, and PRTG

Monitoring VMware vSphere Performance: A Guide to Virtual Infrastructure Optimization

Written by Sascha Neumeier | Sep 10, 2025

Performance problems in a virtualized environment can snowball quickly. One virtual machine running slowly is one thing, but soon an entire application stack may begin to crawl as problems cascade through all of the virtual resources it touches. You find yourself looking through reams of metrics trying to figure out where things have gone awry. You pore over vCenter Server performance graphs in the middle of the night wondering why the whole virtual infrastructure is grinding to a halt.

Proactive performance monitoring of vSphere with an eye towards root cause optimization is much more effective than constant fire-fighting. In this guide to vSphere performance monitoring, I'll take you through everything you need to know to identify and optimize your VMware virtual infrastructure.

The Basics of vSphere Performance Monitoring

Before we can dive into monitoring our virtual infrastructure, it's critical that we have a common understanding of the basic building blocks of the vSphere performance stack.

vSphere Architecture

A typical VMware vSphere virtual infrastructure is made up of several key building blocks. At the physical layer are one or more ESXi hypervisor hosts running on x86 hardware. These hosts are centrally managed through vCenter Server, which in current releases is deployed as the purpose-built vCenter Server Appliance (releases up to vSphere 6.7 could also run vCenter Server as a Windows application). Running on the ESXi hosts are individual virtual machines with various guest operating systems. Datastores serve as repositories for storing virtual machine files and virtual disks. Resource pools provide a convenient way to group virtual machines under a common set of resource controls and policies.

Each layer of the virtual infrastructure must effectively utilize underlying resources such as CPU, memory, storage I/O, and network bandwidth. These resources are shared across all of the virtual objects in each layer, which creates a natural bottleneck as competition for resources increases. Application performance is also closely linked with performance at lower layers and is frequently the ultimate performance metric of interest.

In vSphere, these layers also correspond to levels of abstraction from physical hardware up to the guest operating system running inside of virtual machines.

Key Performance Metrics to Track

The vSphere performance stack is highly granular, so it's important to know what key metrics to track in each category.

CPU utilization alone is not a good indicator of performance health; in fact, sometimes low CPU usage is a potential indicator of performance issues. You want to track basic CPU metrics at both the host and VM level along with more detailed metrics such as CPU ready time or CPU contention ratios to identify virtual machines competing for physical CPU cycles.

Memory performance is another critical component of virtual infrastructure optimization. In addition to tracking memory usage in GB or as a percentage of capacity, you should also watch for memory ballooning (which the hypervisor uses to reclaim memory from guest OSes), swap activity when physical memory is under pressure, and compressed memory use.

Storage is the most common bottleneck in vSphere environments, so ensure you monitor datastore latency, IOPS, disk throughput, virtual disk performance, and queue depth.

Networking is the final major area for performance consideration. Bandwidth usage, throughput, dropped packets, latency, and network utilization metrics provide a complete picture of your virtual infrastructure's network health.

vSphere Performance Monitoring Tools

VMware provides several tools for basic performance monitoring, but each has strengths and weaknesses. One of the easiest ways to get started with vSphere performance monitoring is to use the vSphere Client.

vSphere Client Performance Charts

VMware provides both HTML5 and legacy clients for accessing vCenter Server. The performance charts within the vSphere Client are particularly useful for real-time and historical performance analysis. By navigating through the tree structure of vSphere objects such as ESXi hosts, resource pools, and virtual machines, you can access intuitive graphical representations of performance metrics over time. These charts can compare up to three different performance metrics at a time and provide different chart views (stacked, overlaid, etc.). You can also create customized chart views and pin them for easy reference and quick performance problem diagnosis.

esxtop for Command-Line Monitoring

The esxtop tool is a command-line utility for advanced performance monitoring of ESXi hosts. It's designed for users who are familiar with Linux-style text-based command-line tools and it provides a high level of detail for real-time performance. The esxtop tool reveals much more than most of the graphical options when it comes to the lowest layer of the virtual infrastructure. The output can be a bit overwhelming at first glance, but you get both precision and granularity in the metrics that other tools simply do not offer.
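One way to tame that output is esxtop's batch mode, which writes samples as CSV instead of drawing an interactive screen (for example `esxtop -b -d 5 -n 12 > capture.csv` for twelve samples at five-second intervals). The Python sketch below averages selected columns from such a capture. The sample data and host/VM names are invented for illustration, and the header format shown is an assumption that may differ between ESXi versions, so check the headers in your own capture first.

```python
import csv
import io

# Illustrative capture only: a real esxtop batch file has far more columns,
# and the exact header text is an assumption that may vary by ESXi version.
SAMPLE_CAPTURE = (
    '"(PDH-CSV 4.0) (UTC)(0)",'
    '"\\\\esx01\\Group Cpu(1001:sql01)\\% Ready",'
    '"\\\\esx01\\Group Cpu(1002:web01)\\% Ready",'
    '"\\\\esx01\\Group Cpu(1001:sql01)\\% Used"\n'
    '"09/10/2025 08:00:00","7.5","1.2","55.0"\n'
    '"09/10/2025 08:00:05","8.5","0.8","60.0"\n'
)


def average_by_column(capture_text, counter_suffix):
    """Average every column whose header ends with counter_suffix."""
    reader = csv.reader(io.StringIO(capture_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    totals = {header[i]: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        rows += 1
        for i in wanted:
            totals[header[i]] += float(row[i])
    if rows == 0:
        return {}
    return {name: total / rows for name, total in totals.items()}


# Print the average ready percentage for each VM group in the capture.
for name, value in average_by_column(SAMPLE_CAPTURE, "\\% Ready").items():
    print(f"{name}: {value:.2f}")
```

Filtering by header suffix keeps the script independent of column order, which matters because the set of columns changes as VMs are powered on and off.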

vCenter Server Performance Monitoring

vCenter Server includes a more advanced monitoring feature set with additional capabilities for the whole virtual infrastructure. These features include threshold-based alarms and notifications so you can set performance rules and get alerted automatically when metrics exceed the thresholds you specify. With these capabilities, you can monitor multiple vSphere objects at once to compare performance and more easily spot outliers or areas of concern. All of these features for performance monitoring exist in both vCenter Server and the purpose-built vCenter Server Appliance.

You can accomplish a lot with just the performance monitoring tools included in vCenter Server. However, they have some major limitations. They don't persist much historical data, provide little automation or remediation, don't correlate well to guest OS metrics, and are focused solely on the virtualization layer without integration into the wider IT infrastructure.

Common Performance Bottlenecks in vSphere

Let's look at some common vSphere performance bottlenecks you will likely encounter along with guidance for troubleshooting and optimization.

CPU Bottlenecks

CPU bottlenecks are not always obvious, but a few key indicators stand out. CPU ready times that are consistently above 5%, or significant co-stop values on multi-vCPU (vSMP) VMs, are strong signs of CPU contention among virtual machines. You may also see VM performance suffering at the guest OS level even though the hypervisor reports that resources are available.
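vCenter performance charts report CPU ready as a summation in milliseconds per sample interval rather than as a percentage, so the 5% guideline needs a conversion. The sketch below applies the commonly used formula, ready % = ready ms / (interval in seconds × 1000) × 100, with the 20-second real-time chart interval as the default. Dividing by the vCPU count to get a per-vCPU average assumes the VM-level value is summed across vCPUs; the function name and sample values are illustrative.

```python
READY_THRESHOLD_PERCENT = 5.0  # guideline value used in this article


def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into an average percentage per vCPU.

    interval_s defaults to 20, the sample interval of vCenter real-time charts.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000 * vcpus) * 100


# 2000 ms of ready time in a 20 s sample on a single-vCPU VM.
single = cpu_ready_percent(2000)
# The same summation spread across four vCPUs.
quad = cpu_ready_percent(2000, vcpus=4)

print(f"1 vCPU: {single:.1f}% (contention: {single > READY_THRESHOLD_PERCENT})")
print(f"4 vCPU: {quad:.1f}% (contention: {quad > READY_THRESHOLD_PERCENT})")
```

Historical charts use longer rollup intervals, so pass the matching `interval_s` when converting values from daily or weekly views.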

Remediation might include changing VM resource allocation and shares to prioritize workloads or using the Distributed Resource Scheduler (DRS) to automatically rebalance workloads. Spend some time right-sizing VMs since many are often over-provisioned with too many vCPUs and rarely use them. Finally, consult with your application owners to see if there are any ways to optimize CPU-intensive applications that are putting too much strain on the system.

Memory Bottlenecks

Memory issues can also be difficult to spot, but there are several telltale signs. Watch for high balloon driver activity reclaiming memory from VMs, regular host swapping to disk when memory is full, or VMs reporting memory compression. You may also notice that applications are running slowly or unresponsively despite CPU being free.
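These signs are not equally serious. ESXi generally escalates from ballooning to compression to host swapping as memory pressure increases, so a simple triage can rank a host by the most severe technique currently in use. The following is a minimal sketch of that idea; the function name and severity labels are hypothetical, and a production check would use thresholds rather than "any non-zero value".

```python
def memory_pressure_level(ballooned_mb, compressed_mb, swapped_mb):
    """Classify memory pressure by the most severe reclamation technique seen.

    The labels are illustrative, not VMware terminology.
    """
    if swapped_mb > 0:
        return "severe"      # host is swapping VM memory to disk
    if compressed_mb > 0:
        return "elevated"    # host is compressing memory pages
    if ballooned_mb > 0:
        return "moderate"    # balloon driver is reclaiming guest memory
    return "none"


# Hypothetical readings for three hosts: (ballooned, compressed, swapped) in MB.
hosts = {
    "esx01": (0, 0, 0),
    "esx02": (2048, 0, 0),
    "esx03": (4096, 512, 128),
}

for host, readings in hosts.items():
    print(host, memory_pressure_level(*readings))
```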

Fixing memory bottlenecks can start with identifying the VMs that are suffering the most and increasing their memory allocation. Work with application owners to see if there are any memory leaks in the applications running on guest OSes which could be wasting resources. In some cases, you may also need to consider adding more physical memory to your ESXi hosts. A final option is tuning VM memory reservations and limits to better allocate memory between workloads.

Storage Performance Bottlenecks

Storage is typically the source of most performance problems in vSphere, and those problems are some of the hardest to resolve. Datastore latency that stays consistently over 10ms for non-optimized workloads is a common bottleneck indicator. Queues building up on the storage adapter, or slow performance from the virtual disks backing your VMs, also point to a storage bottleneck. You may also notice vMotion failures or timeouts when a migration depends on the same storage.
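The word "consistently" matters here: a single latency spike is rarely a problem, while several consecutive samples above the threshold usually are. The sketch below separates the two cases. The 10 ms threshold comes from the text above, while the requirement of three consecutive samples and the sample values are assumptions chosen for illustration.

```python
def longest_breach(samples_ms, threshold_ms=10.0):
    """Return the longest run of consecutive samples above threshold_ms."""
    longest = current = 0
    for sample in samples_ms:
        current = current + 1 if sample > threshold_ms else 0
        longest = max(longest, current)
    return longest


def is_sustained(samples_ms, threshold_ms=10.0, min_consecutive=3):
    """True when latency stays above the threshold for min_consecutive samples."""
    return longest_breach(samples_ms, threshold_ms) >= min_consecutive


# Hypothetical datastore latency samples in milliseconds.
sustained = [4, 12, 15, 11, 6]   # three samples in a row above 10 ms
spiky = [4, 25, 5, 30, 6]        # isolated spikes only

print(is_sustained(sustained))
print(is_sustained(spiky))
```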

Improving storage performance can include configuring storage tiering with vSAN or another tiering solution to provide workloads with appropriate levels of performance. Spread your VMs across multiple datastores to help alleviate the I/O load. If possible, optimize your virtual disks by choosing between thin and thick provisioning based on your typical workload patterns. For more persistent performance problems, you may also need to upgrade to faster storage media such as SSD or even NVMe to keep up with demand.

Network Performance Bottlenecks

Network bottlenecks in vSphere are typically less common, but here are a few ways to spot them. Dropped packets at the virtual or physical switch level indicate network congestion. High latency between VMs or between virtual machines and external resources can also cause application performance problems. Bandwidth saturation can be an issue during busy periods or times of heavy network activity. Finally, you may notice vMotion failures or slow migrations that are related to the network layer.

Remediation starts with network segregation so different types of traffic (iSCSI, NFS, and general VM traffic) cannot negatively impact one another. Optimize vSwitch or distributed switch configuration to ensure it's handling packets as efficiently as possible. You may need to upgrade physical network switches and adapters to provide more bandwidth for some use cases. Jumbo frame configuration for storage networks can also be a quick performance win for iSCSI or NFS traffic.

Sample vSphere Performance Monitoring Scenarios

Scenario 1: Troubleshooting a VM's Performance Problems

You've got a virtual machine running a critical Microsoft SQL Server which is the core component of your application stack. Suddenly, latency for queries to this SQL Server increases noticeably and end users are starting to see poor performance from applications that rely on data from this server. Follow these steps to troubleshoot and resolve performance issues quickly and efficiently.

🪛 First, check the VM-level performance metrics in vCenter Server to see what's happening within the virtual machine with the tools you already have available. Examine host-level resource contention to see if any other VMs are impacting the SQL Server's ability to get the resources it needs. Look at datastore performance metrics for the virtual disks attached to this VM since storage is often a bottleneck for database workloads. Finally, correlate this information with guest OS performance metrics pulled in via VMware Tools for a complete picture of what is happening at all levels.

Now you can move onto remediation based on the root cause of the problem. This could include moving the VM to a less contended datastore or storage host, adjusting resource allocations and shares, or tuning the database configuration itself.

Scenario 2: Optimizing an Existing vSphere Environment for Scale

After some growth in your user base, the infrastructure needs to grow to support 30% more workloads without any new physical hardware. This is the perfect opportunity to take a methodical approach to optimization so that the vSphere environment can effectively support all of the additional demands.

🪛 The first step is to perform an audit of current resource usage patterns across all of the ESXi hosts in your virtual infrastructure. This provides a sense of where there is still capacity and where constraints are going to occur in the future. You'll also want to look at which resources are both underutilized and overutilized to determine where there are balancing opportunities across different workloads.
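A quick back-of-the-envelope calculation can turn that audit into a go/no-go answer for the 30% growth target. The sketch below projects memory utilization per host and for the cluster as a whole. The host names and figures are hypothetical, and a real audit would repeat the calculation for CPU, storage, and network as well.

```python
def projected_utilization(used, capacity, growth=0.30):
    """Fraction of capacity consumed after applying the growth factor."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used * (1 + growth) / capacity


# Hypothetical audit results: (memory used in GB, memory capacity in GB).
hosts = {"esx01": (410, 512), "esx02": (256, 512)}

for name, (used, capacity) in hosts.items():
    print(f"{name}: {projected_utilization(used, capacity):.0%} after growth")

# The cluster view shows whether rebalancing alone could absorb the growth.
total_used = sum(used for used, _ in hosts.values())
total_capacity = sum(capacity for _, capacity in hosts.values())
print(f"cluster: {projected_utilization(total_used, total_capacity):.0%} after growth")
```

In this example one host would exceed its capacity while the cluster as a whole would not, which is exactly the kind of balancing opportunity the audit is meant to surface.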

You can implement resource pools with the appropriate shares and limits so that critical workloads receive priority without a single workload consuming all of the resources. Templates are an excellent way to standardize new VM deployments with optimized configurations from the get-go. DRS and Storage DRS can do a lot of the balancing for you automatically when workloads shift. It's important throughout this process to establish performance baselines and thresholds for ongoing monitoring to ensure the environment is staying in an optimized state as it scales.
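One simple way to derive a threshold from a baseline is to take the mean of recent history and add a multiple of its standard deviation. This is a minimal sketch of that approach, not the method used by any particular monitoring product; the three-sigma multiplier and the sample values are assumptions.

```python
from statistics import mean, stdev


def baseline_threshold(history, sigmas=3.0):
    """Alert threshold: mean of the history plus `sigmas` standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return mean(history) + sigmas * stdev(history)


def is_anomalous(value, history, sigmas=3.0):
    """True when value exceeds the baseline-derived threshold."""
    return value > baseline_threshold(history, sigmas)


# Hypothetical host CPU usage (%) collected during normal operation.
history = [40, 42, 38, 41, 39]

print(f"threshold: {baseline_threshold(history):.1f}%")
print(is_anomalous(50, history))
print(is_anomalous(43, history))
```

Recomputing the baseline periodically keeps the threshold meaningful as the environment scales and normal usage shifts upward.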

Scenario 3: Monitoring VDI Performance During Busy Periods

VDI (virtual desktop infrastructure) environments tend to experience performance problems during morning login storms when large numbers of users attempt to access their desktops simultaneously. This presents a good opportunity to create custom performance charts focused on the timeframe when issues are occurring.

🪛 Monitor host CPU, memory, and storage during these peak periods to see what resources become constrained first. It is also important to measure login times and desktop performance metrics to understand the end user experience during the busy periods. Once you identify the likely resource bottlenecks, which are frequently storage IOPS limitations, you can implement some focused solutions such as staggering login times or tweaking storage performance.
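If storage IOPS turns out to be the constraint, a rough estimate shows how many login waves you would need to stay under the datastore's limit. The sketch below divides the peak demand by the available IOPS and rounds up. All figures, including the per-login IOPS value, are hypothetical and should be replaced with measurements from your own login storms.

```python
import math


def login_batches_needed(users, iops_per_login, datastore_max_iops):
    """Number of staggered login waves needed to keep peak IOPS under the limit."""
    if datastore_max_iops <= 0:
        raise ValueError("datastore_max_iops must be positive")
    peak_demand = users * iops_per_login
    return max(1, math.ceil(peak_demand / datastore_max_iops))


# Hypothetical figures: 500 users, 50 IOPS per login, 10,000 IOPS available.
batches = login_batches_needed(500, 50, 10_000)
print(f"Stagger logins into {batches} waves")
```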

Use real-time monitoring during subsequent peak periods to verify that your changes had a positive impact and that the performance problem has been remediated.

FAQs: VMware vSphere Performance Monitoring Questions Answered

Q: How can I correlate guest OS performance with hypervisor metrics?

A: To view the entire picture of performance, it's important to have visibility into both the hypervisor layer and guest OS level performance. Install VMware Tools on guest operating systems to collect additional performance data from inside each virtual machine.

The most effective monitoring approach combines hypervisor metrics via the VMware API with guest OS performance data via WMI for Windows-based systems or SNMP/SSH for Linux-based systems. This dual-layer performance monitoring provides correlated views between performance at the hypervisor level and performance within guest OSes.

This correlation is what allows you to see a complete picture of performance across the virtualization stack. Identifying patterns in guest OS performance in parallel with vSphere metrics helps you understand whether an issue originates at the hypervisor level or within the guest OS itself.

Q: How is monitoring ESXi hosts directly different from using vCenter Server?

A: Monitoring performance through vCenter Server allows you to view your whole virtual infrastructure as a unified entity. Cross-host metrics such as vMotion activity, DRS activity, and cluster-level statistics can be seen only by using this centralized approach, which makes the environment much easier to manage and understand at a high level. Monitoring directly at the ESXi host level can provide lower-latency data collection and greater detail for host-level performance and resource utilization metrics. This direct approach also has the added advantage of not being dependent on vCenter Server. If vCenter Server goes down for some reason or is undergoing maintenance, you will still be able to maintain performance visibility.

The ideal situation is to monitor both simultaneously since the performance data collected using both approaches provides complementary insights. A comprehensive monitoring solution should read the VMware vSphere API for the higher-level metrics from the vCenter Server but also directly connect to each ESXi host for more detailed metrics at the hypervisor layer. This provides a good balance of redundancy and detail that is important for large-scale deployments.

Q: How do I identify "noisy neighbor" virtual machines which impact the performance of other VMs?

A: Noisy neighbors in virtual environments occur when one virtual machine is consuming an inordinate amount of resources, which in turn impacts other VMs running on the same host. Look for VMs which are consistently in the high ranges of resource usage patterns, but more importantly look at resource contention instead of just overall usage.

High CPU ready time, excessive memory ballooning, and datastore latency spikes are all indicative of a VM that may be causing problems for other virtual machines on the same physical host. Use visualization tools that make it easy to spot correlated performance across multiple VMs with comparative charts and dashboards. When you've identified a potential noisy neighbor, consider isolating it on dedicated storage or CPU resources if possible. You could also adjust its resource limits or work with application owners to ensure the workload running on the VM is properly optimized.
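One way to shortlist suspects is to correlate each VM's I/O activity with the shared datastore's latency over the same time window. The sketch below ranks VMs by Pearson correlation using only the standard library. The VM names and sample series are invented, and a high correlation only flags a candidate for closer inspection; it does not prove the VM is the cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (spread_x * spread_y)


def rank_suspects(vm_iops, datastore_latency_ms):
    """Sort VMs by how closely their IOPS track datastore latency."""
    scores = {vm: pearson(series, datastore_latency_ms)
              for vm, series in vm_iops.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples taken at the same five points in time.
latency_ms = [5, 6, 20, 22, 6]
vm_iops = {
    "web01": [300, 310, 295, 305, 300],
    "backup01": [100, 120, 900, 950, 110],
}

for vm, score in rank_suspects(vm_iops, latency_ms):
    print(f"{vm}: {score:+.2f}")
```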

Q: What's the ideal polling interval for vSphere performance metrics?

A: This depends on your unique environment and what you're trying to do. For real-time troubleshooting, shorter polling intervals of 20-60 seconds are fine-grained enough to deliver timely data and help you catch developing issues. General monitoring and trend analysis are usually best served by longer intervals of 5-15 minutes, which strike a good balance between capturing enough data and not wasting storage space.

A good monitoring solution allows you to specify different polling intervals for different metrics and systems so you can balance detail against storage capacity. You could have critical infrastructure monitored more frequently while less critical systems are polled at longer intervals. The more often you poll, the greater the overhead on your monitoring platform as well as the vSphere environment. This overhead matters most at large scale, so make sure you strike the right balance for your own environment and monitoring needs.
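The trade-off is easy to quantify. The sketch below compares how many samples per day different polling intervals produce and estimates the resulting raw storage. The 16 bytes per sample is a hypothetical figure, since actual storage depends entirely on how your monitoring platform stores and compresses data.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(interval_s):
    """Number of samples one metric produces per day at the given interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return SECONDS_PER_DAY // interval_s


def daily_storage_bytes(metrics, interval_s, bytes_per_sample=16):
    """Rough raw storage per day; bytes_per_sample is an assumed figure."""
    return metrics * samples_per_day(interval_s) * bytes_per_sample


# Compare a 20-second troubleshooting interval with a 5-minute trend interval
# for a hypothetical environment with 1,000 monitored metrics.
for interval in (20, 300):
    size_mb = daily_storage_bytes(1000, interval) / 1_000_000
    print(f"{interval:>3} s: {samples_per_day(interval)} samples/metric/day, "
          f"about {size_mb:.1f} MB/day")
```

Moving from 20-second to 5-minute polling cuts the sample count by a factor of 15, which is why reserving short intervals for critical systems keeps overhead manageable.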

Conclusion: vSphere Performance Monitoring Done Right

As you can see, vSphere performance monitoring is a multi-layered process that requires visibility into all aspects of your virtual infrastructure, from physical hardware up through the hypervisor layer, virtual machine performance, guest OS metrics, and even application performance indicators at the top of the stack. Only by monitoring all of the layers of the virtual infrastructure stack can you get a full understanding of performance.

By augmenting VMware's built-in tools with additional features in PRTG Network Monitor you can create a comprehensive performance monitoring strategy. It's one that not only alerts you to problems, but can also help you to optimize your infrastructure for maximum scalability and utilization. When you approach virtualization from this holistic perspective you can be sure to reduce downtime, improve end user experiences, and maximize your virtualization investment.

Want to get better at vSphere performance monitoring? TRY PRTG NETWORK MONITOR FREE and see for yourself how easy it is to get it all under control with extensive VMware vSphere monitoring.