In this, the first in a series of articles looking at how PRTG can help Exchange Admins to manage their systems, we look at the continued popularity of email as a corporate communications tool. We’ll also see how PRTG’s pre-defined Exchange sensors can provide a great overview of system health and performance. In subsequent articles, we’ll see how custom sensors can provide even deeper insight into the many components and sub-systems that make up an Exchange infrastructure.
The death of email has been predicted many times in recent years, with various justifications: instant messaging is better, spam makes email insecure and inefficient, social media is cooler, people change addresses too often. However, as is often the case, the facts do not agree with the pundits’ opinions, as the Email Statistics Report 2015-2019 by The Radicati Group shows:
| | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|
| Global Email Accounts (millions) | 4353 | 4626 | 4920 | 5243 | 5594 |
| % Growth | | 6% | 6% | 7% | 7% |

| | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|
| Global Daily Email Messages (billions) | 206 | 215 | 225 | 236 | 247 |
| % Growth | | 5% | 5% | 5% | 5% |
So, far from becoming extinct, email continues to grow in both prevalence and popularity. The move towards cloud-based services is starting to change the way organisations provision their email services. Research from Gartner shows that around 13 percent of publicly listed companies have already moved their email into the cloud.
But this means that most organisations are still using on-premises email services, and while surveys differ on the precise market share, they all agree that Microsoft Exchange is still the clear market leader when it comes to business email systems.
Since its initial release in 1996, Exchange has evolved from a relatively simple X.400-based messaging system into a complex application that provides many features.
In turn, these features rely on many aspects of the IT infrastructure – servers, network, storage and the rest must all be performing optimally for Exchange to fulfil its function as the primary communications tool for most organisations. This is where PRTG can make an Exchange Admin’s life easier, by ensuring that all the supporting infrastructure, and the Exchange system itself, is healthy and performant.
In subsequent articles, we’ll look at how PRTG’s Custom Sensors can be used to “deep-dive” into the health of the Exchange system, what metrics we should be monitoring, and some of the performance danger signs to look out for. But to start with, let’s take a look at PRTG’s out-of-the-box Exchange sensors.
Before we get into the specifics of the individual sensors, a word about thresholds and limits. Where possible, I’ve tried to give guidance about the “healthy” values you should look for from these sensors, but for many of them the performance figures will vary across deployments: an Exchange system serving a 10-person company will perform very differently to one in a global corporation or a multi-tenant MSP environment.
This is why baselining is important when setting up a new monitoring system or adding new systems & devices. After installation, it’s a good idea to leave PRTG gathering data for a week or two, before setting thresholds & notifications. That way, you can get a feel for what is “normal” performance in your environment, and you can use this data to set your thresholds accordingly. One of the most common reasons for monitoring projects failing is the lack of baselining before activating notifications. Support teams quickly learn to ignore floods of false-positive notifications coming from a badly tuned monitoring system and this inevitably leads to real, critical alerts being missed.
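As a rough illustration of turning baseline data into limits, the sketch below derives warning and error thresholds from a list of samples. The function name `suggest_limits`, the choice of the 95th and 99th percentiles, and the sample values are all assumptions made for this example; they are not PRTG defaults. In practice you would work from the history PRTG has actually gathered and enter the resulting figures as channel limits.

```python
import statistics

def suggest_limits(samples, warn_pct=95, error_pct=99):
    """Suggest upper warning/error limits from baseline samples.

    Hypothetical helper: the percentile choices are illustrative,
    not values recommended by PRTG or Microsoft.
    """
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "median": statistics.median(samples),
        "warning": cuts[warn_pct - 1],
        "error": cuts[error_pct - 1],
    }

# Made-up latency readings (ms) standing in for a baseline period
baseline = [12, 14, 11, 13, 15, 12, 40, 13, 14, 12, 11, 16, 13, 12]
limits = suggest_limits(baseline)
print(limits)
```

Percentiles are used here rather than a simple maximum so that a single outlier (the 40 ms reading) does not dictate the warning limit on its own; whether that trade-off suits your environment is a judgment call.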
So, with that advice in mind, let’s take a look at the Exchange-specific sensors that are included with every PRTG license, including the 100-sensor freeware version.
This is a great general-purpose Exchange sensor. Adding this to a server will allow you to choose from over a dozen different metrics that report on the health and performance of many of the key components of Exchange. The specific sensors available will vary based on the roles and configuration of the server it is assigned to.
| Metric | Description | Recommendation |
|---|---|---|
| Queue size | The number of messages waiting to be processed in the message queues | The lower the better, ideally 0 |
| Average delivery time | The average time in seconds between the submission of a message to the public folder store and submission to other storage providers for the last 10 messages. | The lower the better |
| Logon operations per second | Shows the number, per second, of mailbox store logon operations. | N/A |
| Sent, delivered, and submitted messages per second | This shows the number of messages processed by Exchange per second. Large changes to this number, either up or down, could indicate problems. | N/A |
| Messages queued for submission | Shows the current number of submitted messages not yet processed by the transport system | Should not exceed 50. Queue should not persist for more than 15 mins. |
| Remote Procedure Call (RPC) packets operations per second | Shows the rate at which RPC operations occur, per second. | N/A |
| RPC latency, requests, and slow packets | This measures the overall performance of the RPC subsystem | Fast is better |
| RPC sent, slow, outstanding, and failed requests (store interface) | Another indicator of RPC performance | Fast is better |
| Read and write bytes RPC clients per second | Another indicator of RPC performance. Large changes to this number, either up or down, could indicate problems | N/A |
| Number of active and anonymous users | Number of users connected to the mailbox store | N/A |
| Database page faults per second | The rate that database file page requests require the database cache manager to allocate a new page from the database cache. | The lower the better |
| Log record stalls per second | This shows the number of log records that cannot be added to the log buffers per second because the log buffers are full | The lower the better. Log stalls will lead to increases in RPC latency. Could be caused by disk I/O bottlenecks. |
| Log threads waiting | The number of threads waiting for their data to be written to the log in order to complete an update of the database. | The lower the better |
| Database cache size in bytes and miss in percent | The amount of system memory used by the database cache manager to hold commonly used information from the database file(s) to prevent file operations. | % miss should be as low as possible |
| Current unique users (OWA) | Shows the number of active users currently logged into the Outlook Web Application | N/A |
| Average response time (OWA) | The average time (in milliseconds) that elapsed between the beginning and end of an OEH or ASPX request. | The lower the better |
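The "Messages queued for submission" recommendation combines a value limit (50) with a duration (15 minutes), which is easy to misread as a plain threshold. The sketch below shows the logic under stated assumptions: `sustained_breach` and the sample readings are hypothetical and exist only for illustration. In PRTG itself you would normally express this through channel limits and notification settings rather than a script.

```python
from datetime import datetime, timedelta

QUEUE_LIMIT = 50                      # from the recommendation in the table
MAX_DURATION = timedelta(minutes=15)  # from the recommendation in the table

def sustained_breach(readings, limit=QUEUE_LIMIT, max_duration=MAX_DURATION):
    """Return True if the queue stayed above `limit` for longer than `max_duration`.

    `readings` is a time-ordered list of (timestamp, queue_length) tuples.
    """
    breach_start = None
    for ts, length in readings:
        if length > limit:
            if breach_start is None:
                breach_start = ts          # first reading over the limit
            if ts - breach_start > max_duration:
                return True
        else:
            breach_start = None            # queue drained, reset the clock
    return False

# Made-up readings taken every five minutes
start = datetime(2024, 1, 1, 9, 0)
spike = [(start + timedelta(minutes=5 * i), n) for i, n in enumerate([10, 80, 20, 5])]
backlog = [(start + timedelta(minutes=5 * i), n) for i, n in enumerate([60, 70, 75, 90, 95])]

print(sustained_breach(spike))    # False: the queue drained within one interval
print(sustained_breach(backlog))  # True: over 50 for 20 minutes
```

A brief spike after a large mailing is therefore ignored, while a queue that stays above the limit is flagged.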
Some of the features of this sensor are also available in separate PowerShell-based sensors. As with all WMI-based sensors, this one has a relatively high impact on PRTG’s system performance. We recommend using fewer than 200 WMI-based sensors per Probe.
Another general-purpose sensor that will also create individual sensors for the objects selected. This sensor provides statistics for over 30 of the various message queues used to transport email from the sender, through the Exchange system, and on to the recipient. You can choose different sensors for each of the “high”, “normal”, “low” and “none” message priorities that Exchange uses, as well as “total” to see an overall summary. Setting thresholds or limits on some of the more important queues is a great way to ensure that mail delivery is taking place, as any increase in the number of messages being held in a queue would indicate a message delivery problem. In general, look for low values in the “queue length” channels and high values in the “items completed” channels.
Find more information in the manual.
The rest of the pre-defined sensors are all PowerShell-based, so there are some prerequisites that must be taken care of before they can be used.
Both Remote PowerShell and Remote Exchange Management Shell must be enabled on the target system, and PowerShell 2.0, or later, must be installed on the server running the Probe on which the sensor is running. This page provides details of how to use PowerShell based sensors. In particular, make sure the execution policy is set to "unrestricted" to allow scripts to run. This needs to be done for the version of PowerShell that is invoked by PRTG, which is not necessarily the version that appears on the Start Menu. If this isn't done you will probably get an "Unauthorised Access" error on the sensor.
To fix this, on the Core Server or Remote Probe, where the sensor is to be created, open a CMD prompt as an Administrator (not a PowerShell session) and type the following:
%systemroot%\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
When the command prompt changes to "PS", enter the following command:
Set-ExecutionPolicy Unrestricted
This PowerShell-based sensor must be assigned to a server holding the Mailbox server role (rather than a CAS or Transport Server). It returns details of the backup status of the mail database(s) held on the server. It contains channels for:
Use this to keep a check on the history of your Exchange backups. Setting a limit on the “Time Since” channels will notify you if the system goes too long without a backup being taken.
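To make the idea of a limit on a “Time Since” channel concrete, here is a minimal sketch of the comparison involved. The function `backup_status` and the 26-hour and 50-hour limits are assumptions chosen for a nightly backup schedule; they are not PRTG defaults and should be adjusted to your own backup window.

```python
from datetime import datetime, timedelta

def backup_status(last_backup, now,
                  warn_after=timedelta(hours=26),
                  error_after=timedelta(hours=50)):
    """Classify backup age as 'ok', 'warning' or 'error'.

    The 26 h / 50 h defaults are illustrative assumptions for a
    nightly backup schedule, not PRTG defaults.
    """
    age = now - last_backup
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warning"
    return "ok"

now = datetime(2024, 1, 10, 8, 0)
print(backup_status(datetime(2024, 1, 9, 23, 0), now))  # 9 hours old  -> ok
print(backup_status(datetime(2024, 1, 8, 23, 0), now))  # 33 hours old -> warning
print(backup_status(datetime(2024, 1, 7, 1, 0), now))   # 79 hours old -> error
```

Setting the warning limit slightly above one backup interval gives a late-running job some slack before anyone is notified.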
Find more information in the manual.
Another PowerShell-based sensor, this one checks the operational state of the database that holds the individual mailboxes and specifically reports on:
Limits on the “Validity” and “Mount State” channels will notify you if the database goes offline or experiences corruption.
Find more information in the manual.
Introduced in Exchange 2010, Database Availability Groups (DAGs) form the basis of Exchange’s high-availability and resilience features. A DAG is a group of up to 16 Mailbox servers (in Exchange 2016) that host a set of mail store databases and provide automatic database-level recovery if individual servers or databases fail. This sensor provides detailed information on the status of an Exchange DAG:
Limits assigned to the various queue lengths will notify the administrator of problems with DAG replication.
Find more information in the manual.
The Mail Queue Sensor monitors the number of items in the outgoing mail queue of an Exchange Server. Like the WMI-based Transport Queue Sensor mentioned above, this is a great sensor for checking that outbound email is leaving your mail system. Assigning limits to the various channels in this sensor will allow the administrator to see immediately if messages are backing up. For all channels, lower values are better.
Find more information in the manual.
The Mailbox Sensor returns metrics for individual user and system mailboxes. The data returned includes
Assigning limits on this sensor is a great way for Admins to be warned when individual mailboxes are approaching policy size limits, and to identify unused or orphaned mailboxes.
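The two checks described here can be sketched as follows, assuming you have each mailbox’s size and last logon date to hand. Everything in the example is hypothetical: the `review_mailboxes` helper, the field names, the 2048 MB quota, the 90% warning ratio and the 90-day idle period are illustrative choices, not values taken from PRTG or Exchange.

```python
from datetime import date, timedelta

def review_mailboxes(mailboxes, quota_mb, today, warn_ratio=0.9, idle_days=90):
    """Return (near_limit, idle) lists of mailbox names.

    near_limit: size is at or above warn_ratio of the quota.
    idle: no logon for more than idle_days.
    """
    near_limit, idle = [], []
    for box in mailboxes:
        if box["size_mb"] >= quota_mb * warn_ratio:
            near_limit.append(box["name"])
        if today - box["last_logon"] > timedelta(days=idle_days):
            idle.append(box["name"])
    return near_limit, idle

# Made-up mailbox data for illustration only
mailboxes = [
    {"name": "alice", "size_mb": 1900, "last_logon": date(2024, 1, 8)},
    {"name": "bob", "size_mb": 400, "last_logon": date(2023, 6, 1)},
    {"name": "carol", "size_mb": 1200, "last_logon": date(2024, 1, 9)},
]
near_limit, idle = review_mailboxes(mailboxes, quota_mb=2048, today=date(2024, 1, 10))
print(near_limit)  # ['alice']
print(idle)        # ['bob']
```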
Find more information in the manual.
Microsoft have been talking about deprecating Public Folders in Exchange for several years, but they’re still available in Exchange 2016. This sensor returns the same statistics for Public Folders as the Mailbox Sensor does for individual mailboxes (see above):
Find more information in the manual.
These out-of-the-box sensors will provide Exchange Admins with a good overview of the health of their systems. But by using PRTG’s Custom Sensors, we can get an even deeper insight into how well our Exchange servers are performing, and we’ll look into how to do this in the next part of this series.
Part 2: Your Secret Weapon for Monitoring Exchange: Custom WMI, PerfMon and Script Sensors
Part 3: Metrics That Matter: Processor and Process Metrics for MS Exchange
Part 4: PRTG & The Exchange Admin - Metrics That Matter: Memory