Four steps for developing a successful IT infrastructure monitoring concept
-Sebastian Krueger, Vice President for Asia Pacific at Paessler AG
IT and traditional engineering environments are rarely bundled together; rather, one tends to be a support extension that allows the other to work more productively.
However, as more New Zealand companies embark on their digital transformation, they are increasingly relying on multi-cloud strategies, diverse applications and complex infrastructures to drive business outcomes and enhance outputs.
With so many layers and complexities, analysing performance and reliability is increasingly difficult. IT monitoring should be at the core of IT environments of any size, but observability becomes even more crucial in larger environments that feature more than 1,000 monitored devices and applications, such as those found in manufacturing.
If large enterprises optimise and organise their monitoring better, they will save time and money while delivering better IT services to their employees, partners, suppliers and customers.
A good way of looking at it: if IT is the foundation of large organisations, then monitoring is the insurance for their IT.
To ensure business continuity, IT teams have had to adapt several IT strategies, from expanding VPN capacity to finding different ways of doing unified communications (UC). Gartner says that SD-WAN solutions will serve two to three per cent of the global remote workforce by the end of 2021, driven by the need to improve and secure work-from-home connectivity.
Monitoring is a vital but complex task in large enterprises. To create a successful concept for monitoring a large IT infrastructure, these four steps are crucial:
- Define points of measurement, thresholds, and alerts
- Segment the network
- Build a centralised overview
- Define response teams and set up notifications
Define points of measurement
Before planning a monitoring architecture, it is important to understand the entire environment. At the core of that, organisations need to know how many points of measurement they have. The more points of measurement there are, the more processing power and planning their monitoring concept will require.
Everything they want to monitor will have several points of measurement. To monitor the devices themselves, for example, they will need to track metrics such as device temperature, fan speed, remaining storage, CPU load and any others that might be relevant.
Define thresholds
To give each point of measurement a meaning, organisations need to define thresholds. They need to know not only what they want to measure, but also the accepted range of operation for each component they are monitoring.
Examples of thresholds: a device shouldn’t get hotter than a specific temperature, available storage should not fall below 10% and so on. When a reading moves outside its accepted range, an alert is triggered and the relevant teams are notified.
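As a rough illustration of how a threshold check could look in practice, the following Python sketch compares readings against an accepted range. The metric names and limit values are assumptions made up for this example, not recommendations or any product's actual configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Accepted operating range for one point of measurement (illustrative)."""
    metric: str
    minimum: Optional[float] = None  # alert if the reading falls below this
    maximum: Optional[float] = None  # alert if the reading rises above this

    def is_breached(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return True
        if self.maximum is not None and value > self.maximum:
            return True
        return False


# Hypothetical limits; real values depend on the hardware and vendor guidance.
THRESHOLDS = [
    Threshold("temperature_celsius", maximum=70.0),
    Threshold("storage_free_percent", minimum=10.0),
]


def find_alerts(readings: dict[str, float]) -> list[str]:
    """Return the names of metrics whose readings are outside their range."""
    return [
        t.metric
        for t in THRESHOLDS
        if t.metric in readings and t.is_breached(readings[t.metric])
    ]
```

Calling `find_alerts` with a temperature of 82 degrees and 25% free storage would flag only the temperature, which is the point at which a notification would be sent.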
Segment the network
In large networks, it is not feasible to have thousands or even tens of thousands of polling engines spread across the network, all sending data back to one central monitoring server. Rather, enterprises will need to logically segment their infrastructure.
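One possible way to segment logically is by subnet, so that each segment can be polled by its own local probe. The sketch below uses Python's standard `ipaddress` module; the segment names and address ranges are hypothetical, and real segmentation might equally follow sites, departments or device types.

```python
import ipaddress
from collections import defaultdict

# Hypothetical segments: each subnet would be polled by its own local probe.
SEGMENTS = {
    "auckland-office": ipaddress.ip_network("10.1.0.0/16"),
    "factory-floor": ipaddress.ip_network("10.2.0.0/16"),
}


def assign_segments(device_ips):
    """Group device addresses by the segment whose subnet contains them."""
    groups = defaultdict(list)
    for ip in device_ips:
        address = ipaddress.ip_address(ip)
        # Pick the first segment containing the address, else "unassigned".
        segment = next(
            (name for name, net in SEGMENTS.items() if address in net),
            "unassigned",
        )
        groups[segment].append(ip)
    return dict(groups)
```

Devices that land in the "unassigned" group are a useful signal that the segmentation plan does not yet cover the whole environment.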
Build a centralised overview
Regardless of how the monitoring is set up, enterprises will probably have several monitoring servers collecting data from different parts of their infrastructure.
It is imperative that they bring all of this data together so that they can manage their entire IT infrastructure from one central point. The way to do this is to create dashboards with an overview of the infrastructure, so they can see immediately if there are potential or current issues.
Depending on how they segment the network, they might be able to manage everything from one location, in which case one central dashboard providing an overall summary would make sense. Alternatively, they might have sites administered separately, each with their own separate dashboards.
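To make the idea of a central summary concrete, here is a minimal Python sketch that rolls up sensor states reported by several monitoring servers. The status names, their ordering and the input format are assumptions for illustration only; a real dashboard would pull this data from each monitoring server's own interface.

```python
from collections import Counter

# Severity order, worst last; the status names are illustrative.
SEVERITY = ["ok", "warning", "down"]


def summarise(server_reports):
    """Combine per-server sensor states into one overview.

    server_reports maps a monitoring server name to a list of sensor states.
    """
    totals = Counter()
    worst_by_server = {}
    for server, states in server_reports.items():
        totals.update(states)
        # The worst state seen on a server decides how it shows on the overview.
        worst_by_server[server] = max(states, key=SEVERITY.index, default="ok")
    return {"totals": dict(totals), "worst_by_server": worst_by_server}
```

The same function could feed either a single overall dashboard or separate per-site dashboards, depending on which servers are passed in.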
Define response teams and set up notifications
To manage a large IT infrastructure, the IT department is often divided into areas of competence, with separate teams for different functions.
For example, one team might be responsible for the online storefront, another team for the email services, and so on. These teams would of course be responsible for monitoring their respective areas, too.
Developing a successful IT monitoring concept
To develop a successful IT infrastructure monitoring concept, enterprises should define the user groups according to the areas that they focus on.
Then, they need to define notifications for failures in those areas to go to the specific teams that need to know.
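A simple way to picture this routing is a lookup from the affected area to the responsible team. In the Python sketch below, the area names, email addresses and fallback contact are all hypothetical placeholders rather than part of any specific monitoring tool.

```python
# Hypothetical mapping from service area to the responsible team's contact.
TEAM_FOR_AREA = {
    "storefront": "web-team@example.com",
    "email": "messaging-team@example.com",
}
# Assumed catch-all so that no failure goes unnoticed.
FALLBACK_CONTACT = "it-operations@example.com"


def route_alert(area: str, message: str) -> dict:
    """Build a notification addressed to the team that owns the failing area."""
    recipient = TEAM_FOR_AREA.get(area, FALLBACK_CONTACT)
    return {"to": recipient, "subject": f"[{area}] {message}"}
```

Sending failures in unmapped areas to a fallback contact, instead of dropping them, is a design choice in this sketch; some organisations may prefer to reject unmapped areas outright so that gaps in ownership are fixed early.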
This sophisticated level of proactive monitoring gives enterprises full observability of their IT infrastructure, allowing them to predict or identify performance problems before they become urgent and helping to ensure that network resources always operate as intended.