Just about every kind of company or organization has a reliance on data for daily operations. Because of this, databases and data warehouses are central to many environments. And most data architectures look very similar: there is a number of data sources, such as external systems or apps, which deliver data into the central system. This incoming data is in different formats and structures, and so it needs to be consolidated and loaded into the database in a way that matches how the data is stored there. This is where Extract, Transform, and Load (referred to as “ETL”) processes come into play.
These ETL processes are the barrier for entry for the data coming into the data mart or warehouse, and that means that this is a big point of failure. It’s also a complex one, too, because many of the ETL processes deal with different data input formats, different data volumes, and so on. And if things go wrong with the ETL step, then you have a situation where the data is either not loaded into the database, or it’s loaded incorrectly. Say, for example, there’s a problem with an external source system and the data doesn’t get delivered. Or, perhaps data is imported twice. These scenarios result in incomplete and inaccurate data respectively. And so, while ETL issues may not result in downtime for operations, it does have an impact when data is used to make critical business decisions.
Many Database Management Systems offer ways to monitor the ETL processes, such as mechanisms to check on individual ETL steps. But this covers only one aspect of it. What about problems with the delivery of the data from external systems? What about data that is incorrect? It might be a good idea to use a monitoring tool to expand the visibility of your ETL processes.
Before going into some ideas, it's worth mentioning that on a general level, there are two things that can help you check on your ETL processes: checking the data flow between two databases or tables, and doing a validation check on data. These two principles can be executed in a wide variety of ways, depending on your data architecture or environment.
That said, let's look at some other ideas for expanding the visibility of your ETL processes.
Because the data is coming from external sources, it makes sense to monitor those external sources. This includes the execution time of the jobs that deliver the data, as well as whether those jobs are running at all. This way, you can pick up problems that might be happening in the source systems.
Of course, it makes sense that you monitor the ETL jobs too. Just like for the external jobs, you can measure execution time, and whether the jobs ran or not at a specific time.
There are some trends that you can use to predict future issues. It is worthwhile to spend some time analyzing your architecture and models and identify potential harbingers of problems. Here are some examples:
If you monitor these kinds of trends over time, you can spot potential issues before they occur and proactively fix them.
When management suddenly gets a business intelligence report that shows a massive unexplained bump in sales, it might be time to celebrate…or maybe, time to check that the data is correct. More often than not, big changes in reports that do not have an explanation indicate that something probably went wrong with the data. This is something that might only become apparent if someone notices that figures seem to be incorrect, but it's better not to leave this to chance.
Some monitoring tools (like our very own PRTG Network Monitor, for example) provide a way to monitor counts of records or data sets. You can then set threshold alerts for minimum or maximum values based on historic data. If the count of records is above this threshold, then an alert is generated. The data can then be looked at more closely to ensure that the data is correct. And hey, fingers crossed that it was really more sales and not some data glitch!
In addition to the ETL processes, it's still important to keep an eye on the traditional database metrics. Things like database response time, connectivity, and so on are all important indicators of the big picture.
In a future blog article, I will discuss some of the sensors that PRTG Network Monitor provides for monitoring databases, and how you can cover some of the ideas mentioned above using them. Until then, I'd love to hear from you. Do you monitor ETL processes? Any tips of your own? Leave them in the comments below!