4 steps to a successful IT infrastructure monitoring concept
Originally published on June 21, 2021 by Sascha Neumeier
Last updated on August 31, 2021 • 13 minute read
IT is the foundation of your business and monitoring is the insurance for your IT. Especially for large IT environments, monitoring is a vital but complex task.
In this article, you’ll learn how to optimize and organize your monitoring to save time and money while delivering better IT services to your business. I will show you how to create a successful concept for monitoring your IT infrastructure in only 4 steps – regardless of whether you want to monitor 50 devices or 5000!
Here are the 4 steps at a glance:
- Define points of measurement, thresholds, and alerts
- Segment your network
- Build a centralized overview
- Define response teams and set up notifications
Define points of measurement, thresholds, and alerts
Before you can plan your monitoring architecture, you need to understand your environment. And at the core of that, you will need to know how many points of measurement you have.
For everything you want to monitor, there will be several points of measurement. If you want to monitor devices themselves, you will need to monitor things like device temperature, fan speed, storage remaining, CPU power, or other metrics that might be relevant.
Obviously, the more points of measurement you have, the more processing power and planning you will require for your monitoring concept.
To give each point of measurement a meaning, you need to define thresholds. So not only do you need to know what you want to measure, but you need to define an accepted range of operation for each component you are monitoring.
Examples of thresholds: a device shouldn’t get hotter than a specific temperature, available storage should not drop below 10%, and so on. When a value leaves its accepted range, an alert is triggered and the relevant teams are notified.
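To make the idea of thresholds more concrete, here is a minimal sketch in Python. It is not tied to any particular monitoring tool: the metric names, the `Threshold` class and the 70 degree temperature limit are assumptions made up for illustration; only the 10% storage floor comes from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Accepted operating range for one point of measurement."""
    metric: str
    minimum: Optional[float] = None  # alert if the value drops below this
    maximum: Optional[float] = None  # alert if the value rises above this

    def is_breached(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return True
        if self.maximum is not None and value > self.maximum:
            return True
        return False


# Illustrative limits only: the 10% storage floor is from the text,
# the 70 degree ceiling is an assumed placeholder.
THRESHOLDS = {
    "temperature_celsius": Threshold("temperature_celsius", maximum=70.0),
    "free_storage_percent": Threshold("free_storage_percent", minimum=10.0),
}


def evaluate(readings: dict[str, float]) -> list[str]:
    """Return the names of all metrics whose readings are out of range."""
    return [
        name
        for name, value in readings.items()
        if name in THRESHOLDS and THRESHOLDS[name].is_breached(value)
    ]


alerts = evaluate({"temperature_celsius": 75.5, "free_storage_percent": 42.0})
print(alerts)  # ['temperature_celsius']
```

Keeping the accepted range separate from the measurement itself means you can review and adjust all thresholds in one place as your environment changes.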
Segment your network
In large networks, it’s not feasible to simply have potentially thousands (or even tens of thousands) of polling engines all over your network sending data back to one central monitoring server. Rather, you will need to logically segment your infrastructure.
In a segmented monitoring architecture, each segment of the network is monitored separately, with the monitoring data sent to a central monitoring server. How you segment your infrastructure depends on your specific environment. One way is to segment the monitoring by region.
Another option is to segment the monitoring by functionality. For example, one instance monitors the servers, while another instance monitors applications.
Remember, these are just examples intended to give you an idea of how you need to think about the monitoring architecture. For your specific situation, you may need the guidance and advice of a certified integrator.
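If you keep an inventory of your devices, you can try out both approaches on paper before you commit to one. The following Python sketch groups a small, made-up inventory by region and by functionality; the device names, regions and the `segment` helper are hypothetical and only illustrate the idea that each group would be handled by its own monitoring instance.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Device(NamedTuple):
    name: str
    region: str
    function: str


def segment(
    devices: list[Device], key: Callable[[Device], str]
) -> dict[str, list[str]]:
    """Group device names by a segment key (one group per monitoring instance)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for device in devices:
        groups[key(device)].append(device.name)
    return dict(groups)


# Made-up inventory for illustration.
inventory = [
    Device("mail-01", "emea", "server"),
    Device("web-01", "emea", "application"),
    Device("db-01", "apac", "server"),
]

by_region = segment(inventory, key=lambda d: d.region)
by_function = segment(inventory, key=lambda d: d.function)
print(by_region)    # {'emea': ['mail-01', 'web-01'], 'apac': ['db-01']}
print(by_function)  # {'server': ['mail-01', 'db-01'], 'application': ['web-01']}
```

Comparing the group sizes gives you a first impression of how evenly the load would be spread across the monitoring instances.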
Build a centralized overview
Regardless of how you set your monitoring up, you will probably have several monitoring servers collecting data from different parts of your infrastructure. Now you must put it all together so that it can help you manage your entire IT, all from one central point. The way to do this is to create dashboards with an overview of the infrastructure, so that you can tell immediately if there are potential or current issues.
Depending on how you segment your network, you might manage everything from one location, in which case one central dashboard providing an overall summary would make sense. Or, you might have sites administered separately, each with their own dashboards.
Get an overview of IT services
A centralized view should be very high level. But what does this mean? Again, it depends on how you segment your network, but a good way to do this is to relate components of your infrastructure to IT services. For example: your company’s E-mail service, the licensing system, or software build processes are all IT services provided by several connected bits of hardware and connectivity.
Once you’ve defined your IT services, you can map the relevant parts of the infrastructure to them. Let’s take the E-mail service example: the mail server, storage servers and the internet connection are the components of your network and infrastructure that you map to the “E-mail service” IT service. On your centralized dashboard, you would only see the health of the E-mail service.
If a minor issue occurs – let’s say a redundant mail server has performance problems – the E-mail service itself would not be endangered since there are failover mail servers. A notification would be sent to a team member, but there wouldn’t be an alert for the entire team – and on the centralized dashboard, the service would be green.
However, if there is a service-critical problem – maybe a crash of the core switch all mail data passes through – then the dashboard sends an alert to the whole team and the E-mail service turns red on the centralized dashboard. At this point, you can drill down to the underlying components to see what part of the infrastructure is causing the problem.
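The roll-up logic behind such a service view can be sketched in a few lines of Python. This is a simplified model, not how any specific product implements it: the component names, the grouping into redundant sets and the rule "red when a group has no healthy component left" are assumptions chosen to mirror the E-mail example.

```python
from enum import Enum


class Status(Enum):
    GREEN = "green"
    RED = "red"


# Hypothetical mapping for the "E-mail service": each group lists redundant
# components and whether each one is currently healthy (True) or not (False).
EMAIL_SERVICE = {
    "mail_servers": {"mail-01": False, "mail-02": True},
    "core_switch": {"core-sw-01": True},
}


def service_status(groups: dict[str, dict[str, bool]]) -> Status:
    """Red as soon as any group has no healthy component left, else green."""
    for members in groups.values():
        if not any(members.values()):
            return Status.RED
    return Status.GREEN


def failed_components(groups: dict[str, dict[str, bool]]) -> list[str]:
    """Components to drill down to, or to notify a single team member about."""
    return sorted(
        name
        for members in groups.values()
        for name, healthy in members.items()
        if not healthy
    )


print(service_status(EMAIL_SERVICE))     # Status.GREEN
print(failed_components(EMAIL_SERVICE))  # ['mail-01']

# Simulate the core switch crashing: the whole service turns red.
EMAIL_SERVICE["core_switch"]["core-sw-01"] = False
print(service_status(EMAIL_SERVICE))     # Status.RED
```

The list of failed components is what you would drill down into once the service-level status has told you that something needs attention.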
SLA Monitoring and reporting
Arranging your infrastructure as business services not only makes it easy to get an overview, but also makes it easier to manage service level agreements (SLAs).
Large enterprises often have many SLAs in place. There are internal SLAs to ensure that the IT teams are meeting certain requirements. Then there are external, or customer-facing, SLAs for organizations that provide services to external stakeholders. For example: you might have an uptime agreement for a certain service; in this case, you need to constantly check connectivity to that service and raise an alert when it is not available. Or if you have an agreement that a certain amount of bandwidth will be available, you need to constantly measure available bandwidth and raise an alert if it becomes too low.
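For an uptime agreement, the check boils down to simple arithmetic: the share of successful connectivity checks compared with the agreed target. Here is a minimal Python sketch, assuming each check result is recorded as True (service reachable) or False (not reachable); the function names and the 99.9% target are illustrative, not taken from a real agreement.

```python
def availability_percent(check_results: list[bool]) -> float:
    """Share of successful checks, in percent."""
    if not check_results:
        raise ValueError("no checks recorded")
    return 100.0 * sum(check_results) / len(check_results)


def sla_violated(check_results: list[bool], target_percent: float) -> bool:
    """True when measured availability is below the agreed target."""
    return availability_percent(check_results) < target_percent


# 1000 checks, 2 of which failed.
results = [True] * 998 + [False] * 2
print(availability_percent(results))  # 99.8
print(sla_violated(results, 99.9))    # True
print(sla_violated(results, 99.5))    # False
```

In practice you would raise a warning well before the target is reached, so that there is still time to react.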
Structuring your business services according to the SLAs you need to track will give you a better overview of the status of the services you are providing and – if there is an issue – let you drill down to the root cause of the problem and solve it before SLAs are violated.
Define response teams and set up notifications
To manage a large IT infrastructure, the IT department is often divided into areas of competence, so you have separate teams for different functions. For example: one team might be responsible for the online storefront, another team for the E-mail services, and so on. These teams would of course be responsible for monitoring their respective areas, too.
For your monitoring concept, define the user groups according to the areas that they focus on. Then, define notifications for failures in those areas so that they go to the specific teams that need to know.
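Routing notifications by area can be as simple as a lookup table. The Python sketch below assumes that every alert carries the area it belongs to; the area names, team names and the fallback team are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical areas and team names; the fallback catches unmapped areas.
TEAM_FOR_AREA = {
    "storefront": "storefront-team",
    "email": "email-team",
}
FALLBACK_TEAM = "it-operations"


def route(alerts: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group alert messages by the team responsible for each alert's area."""
    outbox: dict[str, list[str]] = defaultdict(list)
    for area, message in alerts:
        outbox[TEAM_FOR_AREA.get(area, FALLBACK_TEAM)].append(message)
    return dict(outbox)


outbox = route([
    ("email", "mail-01: free storage below 10%"),
    ("storefront", "shop-web-02: not reachable"),
    ("printing", "printer-07: toner low"),
])
for team, messages in outbox.items():
    print(team, messages)
```

The fallback team makes sure that an alert from an area nobody has claimed yet still reaches someone.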
Paessler PRTG Enterprise Monitor: up to the big challenge
Paessler PRTG Enterprise Monitor is a scalable solution for monitoring large IT infrastructure. It keeps monitoring simple, and everything is included – no need to get additional add-ons or modules. And it is based on a subscription license model.
This means that the focus is on OpEx as opposed to CapEx, and the tool can scale up or down according to your infrastructure needs.
PRTG Enterprise Monitor helps address the challenges of monitoring large networks:
- Vendor agnostic
- Support of all major protocols and monitoring technologies
- Horizontal scaling through unlimited server installations
- Centralized overview focused on IT services provided by ITOps Board
- Advanced alerts management to reduce alert noise
- Detailed overview provided by PRTG Desktop and web interface
- Roles and rights system, individual dashboards for specific user groups
- Apps (iOS and Android) for maximum flexibility
- Integration with other tools, such as Flowmon and Plixer, to bring together monitoring information and reduce alert noise
- API for transferring data to analysis tools and integrating with other monitoring and management tools
Here, my colleague Shaun explains the features and benefits of PRTG Enterprise Monitor in an easy-to-understand video.
For more information on how our products solve your challenges, simply contact us. As large monitoring setups require planning, experience and a lot of monitoring know-how, I recommend talking to one of our PRTG experts around the globe.
Interested in more details? Get our guide!
We have prepared a guide to successful enterprise IT monitoring for you. In it, you will find a lot more information on how to successfully monitor large IT infrastructures. Just click here and download the guide.