Understanding MQTT architecture: a deep dive
Originally published on September 06, 2023 by Dr. Andreas Schiffler
Last updated on January 23, 2024 • 16 minute read
Monitoring industrial information networks is essential to ensure performance, reduce unplanned downtime, and optimize energy use in daily production. An important aspect of monitoring these networks is understanding the protocols in use, the most common of which include HTTP, OPC UA, and MQTT. In this article, we focus specifically on monitoring MQTT, which has become more common in Operational Technology (OT) networks with the rise of the IIoT (Industrial Internet of Things).
So how do you approach monitoring MQTT? One requirement is having convenient and reliable tools at your disposal, such as the MQTT round-trip sensor. The other crucial point is knowing the MQTT architecture in use, which helps you understand data flows, troubleshoot, and take effective action when issues arise. This means knowing factors such as which MQTT broker is in use, what the high availability requirements are, and more.
In the following sections, we will briefly examine MQTT architectures used in reliable, highly available, and performant environments. But first, let's take a look at what MQTT brokers and topics are.
MQTT: Why, how, what?
Communication is key at the shop-floor level in industrial production networks. Workers are accustomed to using messenger applications throughout their workday to easily send, receive, or share information in groups or between individuals, ensuring tasks get done. MQTT is a lightweight, standardized protocol typically used by machines and controllers to send, receive, or share information in groups or between specific entities, a concept quite close to that of personal messenger applications. As a result, MQTT is widely used in many smart applications for factories, cities, homes, and vehicles.
Clients publish information on a specific topic, and other clients subscribe to that topic to receive the published information instantly. The instance that manages the distribution of published messages is the so-called message broker. This software can run on small embedded devices such as a Raspberry Pi or in cloud services such as Cedalo’s Pro Mosquitto MQTT Broker.
At this stage, the main things to know about MQTT clients (such as a PLC or smart sensor device) are:
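To make the broker's role concrete, here is a minimal in-process sketch of what a broker does: it keeps a table of subscriptions and forwards each published message to every client subscribed to that exact topic. The `ToyBroker` class and its methods are hypothetical illustrations, not the API of Mosquitto or any MQTT library; a real broker also handles networking, QoS levels, retained messages, and wildcard subscriptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical subscriber callback type: receives (topic, payload)
Handler = Callable[[str, bytes], None]


class ToyBroker:
    """Illustrative in-memory broker: exact-match topics only, no networking."""

    def __init__(self) -> None:
        # topic -> list of subscriber callbacks
        self._subscriptions: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscriptions[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver payload to all subscribers of topic; return the delivery count."""
        handlers = self._subscriptions.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


if __name__ == "__main__":
    broker = ToyBroker()
    received = []
    # Client B subscribes to sensor A's temperature topic
    broker.subscribe("temperature/from/sensorA", lambda t, p: received.append((t, p)))
    # Client A publishes; only subscribers of that exact topic receive the message
    broker.publish("temperature/from/sensorA", b"21.5")
    broker.publish("temperature/from/sensorB", b"19.0")  # no subscriber, nothing delivered
    print(received)  # [('temperature/from/sensorA', b'21.5')]
```

Publishers and subscribers never address each other directly; they only share a topic name, which is what lets the broker decouple them.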
- they can publish messages on topics (e.g., “temperature/from/sensorA”)
- they receive messages from subscribed topics (e.g., “temperature/from/sensorB”) via the MQTT broker
- they keep the connection to the broker established or reconnect in case of connection loss
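The third point, reconnecting after a connection loss, is commonly implemented with exponential backoff so that many clients do not hammer a recovering broker at the same moment. The sketch below assumes a hypothetical `connect` callable standing in for your MQTT client's connect call; many client libraries ship their own reconnect logic, so treat this only as an illustration of the idea.

```python
import time
from typing import Callable, List


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 8) -> List[float]:
    """Delay before each retry: base * factor**n, capped at `cap` seconds."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


def reconnect(connect: Callable[[], bool],
              sleep: Callable[[float], None] = time.sleep,
              **backoff_kwargs) -> int:
    """Call `connect` until it returns True; return the number of attempts used.

    `connect` is a hypothetical stand-in for an MQTT client's connect call.
    Raises ConnectionError if every retry fails.
    """
    if connect():
        return 1
    for attempt, delay in enumerate(backoff_delays(**backoff_kwargs), start=2):
        sleep(delay)  # wait before the next attempt
        if connect():
            return attempt
    raise ConnectionError("broker unreachable after all retries")


if __name__ == "__main__":
    print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Passing `sleep` as a parameter keeps the retry logic testable without actually waiting.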
For further details on MQTT, check out IT explained: MQTT.
Because this communication, and therefore the transport of information, depends on the MQTT broker being reachable and fully functional, it is a good idea to monitor the broker's state continuously and to establish a high availability architecture.
As mentioned before, using MQTT from the client’s perspective is straightforward: publish and subscribe to topics. However, the architecture needed to transport the information can differ depending on requirements for information security, availability, and the locations where data is produced and consumed. We'll look at the basic MQTT architecture first, then move on to high availability brokers, a real-world scenario for MQTT architecture, and static versus dynamic message mapping, all of which are important to understand when setting up your monitoring.
The basic MQTT architecture
In the basic architecture, there are multiple clients and one broker in the same location. Essentially, "same location" means that the underlying network communication does not need to cross network boundaries via an internet connection.
In this setup, information is transported from one client to another through the MQTT broker over a TCP connection on the local Ethernet network (see Figure 1).
Figure 1 - Basic MQTT Architecture: one broker and multiple clients in the same location
This architecture is easy to set up for local deployments with a small number of clients (fewer than 500) and can be operated without transport encryption (TLS), provided other mechanisms protect the network against intrusion. Its advantages are the simple setup process and the wide range of available client software packages.
One drawback of this architecture in production environments is managing access rights, that is, controlling which clients can access which topics. Performance and availability issues may also arise as the number of clients or the message publish rate grows. A scalable, performant architecture therefore requires a high availability broker with an access management API, which is introduced in the next section.
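Access rights in MQTT are usually expressed per client as topic filters with read or write permission. Brokers such as Mosquitto provide their own ACL mechanisms; the dictionary below is a hypothetical stand-in, not Mosquitto's configuration syntax. The sketch includes a small MQTT topic filter matcher (`+` matches exactly one level, `#` matches the current level and everything below) and denies unknown clients by default.

```python
from typing import Dict, List, Tuple

# Hypothetical ACL: client id -> list of (topic filter, access) entries,
# where access is "read", "write", or "readwrite".
ACL: Dict[str, List[Tuple[str, str]]] = {
    "plc-line1": [("line1/#", "write")],
    "dashboard": [("line1/#", "read"), ("line2/#", "read")],
}


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style filter match: '+' = one level, '#' = this level and all below.

    Simplified: ignores the special handling of topics that start with '$'.
    """
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)


def is_allowed(client_id: str, topic: str, action: str) -> bool:
    """Return True if client_id may perform action ("read" or "write") on topic."""
    for topic_filter, access in ACL.get(client_id, []):  # unknown clients: deny
        if topic_matches(topic_filter, topic) and access in (action, "readwrite"):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("plc-line1", "line1/press/temp", "write"))  # True
    print(is_allowed("dashboard", "line1/press/temp", "write"))  # False
```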
The concept of high availability MQTT brokers
The basic architecture uses one broker, although clients in different locations can also connect to it over the internet. When the broker must meet high availability requirements, the concept of a high availability cluster (HA broker) comes into play.
This HA broker is realized using three, five, or even more instances (nodes) of the same broker. These nodes can be located in different availability zones, typically on the servers of infrastructure providers. For an example of managing multiple nodes of a broker (and different clustering modes), refer to Mosquitto MQTT High Availability and its different clustering modes.
Figure 2 - HA Broker with three cluster nodes and zone redundant endpoint
Different concepts exist to realize these HA brokers (for German readers, this heise online article outlines it nicely).
One approach is to elect one cluster node to serve as the master and manage traffic flow, while the other nodes remain in standby mode and can take over as master within milliseconds.
Viewed at a high level, an HA broker provides an endpoint that is reachable over the internet, the local network, or virtual private networks (VPN) and that guarantees a technical availability of 99.9% or higher, independent of the number of messages or connected clients. It is possible to connect thousands or even tens of thousands of clients without noticeable performance degradation. Node failures are detected and handled automatically by the underlying software, such as a cluster extension plugin for the broker instances. For technical details, see Mosquitto High Availability Cluster.
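A common reason clusters use an odd number of nodes (three, five, and so on) is majority voting: with a majority quorum, a cluster of 2f+1 nodes tolerates f failed nodes, and a minority partition never elects a second master. The sketch below assumes such a majority-quorum design, which we do not claim for any specific product, and replaces the real consensus protocol with a deliberately simple rule (the healthy node with the lowest ID wins). `elect_master` and the node names are hypothetical.

```python
from typing import Dict, Optional


def elect_master(node_health: Dict[str, bool]) -> Optional[str]:
    """Pick a master among healthy nodes, but only if a majority is healthy.

    Simplified illustration: the healthy node with the lowest ID wins.
    Real HA clusters use a consensus protocol instead of this rule.
    """
    healthy = sorted(node for node, ok in node_health.items() if ok)
    quorum = len(node_health) // 2 + 1  # strict majority of all nodes
    if len(healthy) < quorum:
        return None  # no majority: refuse to elect, avoiding split-brain
    return healthy[0]


if __name__ == "__main__":
    cluster = {"node1": True, "node2": True, "node3": True}
    print(elect_master(cluster))  # node1
    cluster["node1"] = False      # the master fails
    print(elect_master(cluster))  # node2 takes over
    cluster["node2"] = False      # only 1 of 3 nodes left: no quorum
    print(elect_master(cluster))  # None
```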
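An availability figure such as 99.9% is easier to reason about as a downtime budget. The helper below is a small sketch (the function name is hypothetical) that converts an availability value into the maximum allowed downtime for a period, ignoring leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum allowed downtime in a period for a given availability (0..1)."""
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for a in (0.999, 0.9999):
        per_year = downtime_budget_hours(a)
        per_month = downtime_budget_hours(a, 30 * 24) * 60  # minutes in a 30-day month
        print(f"{a:.2%}: {per_year:.2f} h/year, {per_month:.1f} min/month")
```

For 99.9%, this gives about 8.76 hours of downtime per year, or 43.2 minutes in a 30-day month; each additional "nine" shrinks the budget by a factor of ten.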
Possible real-world scenario for MQTT architecture
In practice, data flows and streams in production information networks use a variety of protocols. For example, one common approach is to use webhooks (HTTP POST) for data sinks or streaming analytics ingestion. Data sources in a production environment, on the other hand, are PLCs with real-time capabilities or embedded PCs, and this is where the MQTT protocol is often used. Following the concept of the automation pyramid, the entities in the information network can be arranged in information levels (see Figure 3).
Figure 3 - Conceptual view of different information levels for a possible production environment scenario in the context of data consumers and data sinks.
You also have to decide where the data handling instances, in other words the MQTT brokers, are located: on premises or in the cloud. Cloud hosting can be further divided into shared or dedicated infrastructure.
This approach will result in the following high-level requirements:
- Multiple clients in different locations must communicate through multiple brokers that are connected via bridging and high availability clusters.
- Reliable company-level site-to-site connections need to be established via VPN, private cable connection, or HTTPS data transport.
- A consistent, agreed-upon concept for topic naming and for arranging metadata values is needed.
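An agreed topic naming concept is easiest to enforce when topics are built by one shared helper rather than by hand. The sketch below assumes a hypothetical hierarchy such as site/level/line/device/metric. It enforces the MQTT rules that topic names must not contain the wildcards `+` or `#` or the NUL character and must fit in 65535 bytes of UTF-8, plus convention-only rules (no empty levels, lowercase) that are our own illustrative choices.

```python
from typing import Sequence

MAX_TOPIC_BYTES = 65535  # MQTT limit for UTF-8 encoded topic names


def build_topic(levels: Sequence[str]) -> str:
    """Join hierarchy levels into a topic name following a hypothetical convention.

    MQTT rules enforced: no wildcards or NUL characters, bounded length.
    Convention-only rules: no empty levels, no '/' inside a level, lowercase.
    """
    if not levels:
        raise ValueError("at least one topic level is required")
    for level in levels:
        if not level:
            raise ValueError("empty topic level")
        if any(ch in level for ch in "+#/\x00"):
            raise ValueError(f"invalid character in level {level!r}")
    topic = "/".join(level.lower() for level in levels)
    if len(topic.encode("utf-8")) > MAX_TOPIC_BYTES:
        raise ValueError("topic too long")
    return topic


if __name__ == "__main__":
    print(build_topic(["site1", "level0", "Line1", "agv07", "distance_km"]))
    # site1/level0/line1/agv07/distance_km
```

Keeping the hierarchy fixed also makes bridge rules and monitoring filters predictable, since every site and level sits at the same position in the topic.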
Let’s have a detailed look at the first high-level requirement. The basic MQTT architecture can be used to offer instant communication and data flow between clients, as shown above. Figure 3 demonstrates that the brokers can be implemented on-premises as an HA broker with multiple nodes. Brokers 1, 2, and 3 provide message distribution services for different production lines or logistic infrastructure.
The key functionality for interconnecting these brokers is called bridging. Bridging MQTT brokers makes it possible to map messages between different levels or sites; for example, topics could be named along the lines of “from_level_0” or “energy_metrics_for_facility_mmgt”. For a more detailed explanation of bridging, refer to the Mosquitto Bridge Configuration Explained blog.
Ultimately, the brokers’ bridging capabilities ensure that clients located on different information levels or in different locations can exchange information across multiple bridged brokers. In this case, it does not matter whether the brokers are hosted on premises or in the cloud.
The major challenge, including for monitoring, is to define and align topic naming conventions and topic mappings. The risk of creating message loops increases with the number of brokers involved, and detecting message loops and redundant traffic is a further challenge that requires traffic analytics capabilities.
The scenario illustrated in Figure 3 also includes bridging from MQTT to HTTP. This makes sense for selected topics, that is, specific information that is to be aggregated and stored in a central data lake fed by a webhook at the company level.
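A bridge typically forwards only topics that match a filter and can rewrite a prefix on the way, which is how a local topic ends up under a name like “from_level_0” on the next broker. Mosquitto's bridge `topic` setting accepts a pattern plus optional local and remote prefixes; the sketch below only mimics that idea in plain Python and is not Mosquitto's implementation. `BridgeRule`, `map_outgoing`, and the example topics are hypothetical, and the filter matcher (`+` = one level, `#` = remaining levels) is simplified.

```python
from dataclasses import dataclass
from typing import Optional


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT filter match: '+' = one level, '#' = this level and below."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)


@dataclass(frozen=True)
class BridgeRule:
    """Hypothetical outgoing bridge rule: forward matching topics with a prefix swap."""
    pattern: str             # topic filter, relative to local_prefix
    local_prefix: str = ""
    remote_prefix: str = ""


def map_outgoing(rule: BridgeRule, local_topic: str) -> Optional[str]:
    """Return the topic used on the remote broker, or None if not forwarded."""
    if not local_topic.startswith(rule.local_prefix):
        return None
    relative = local_topic[len(rule.local_prefix):]
    if not topic_matches(rule.pattern, relative):
        return None
    return rule.remote_prefix + relative


if __name__ == "__main__":
    rule = BridgeRule("energy/#", local_prefix="line1/", remote_prefix="from_level_0/line1/")
    print(map_outgoing(rule, "line1/energy/kwh"))     # from_level_0/line1/energy/kwh
    print(map_outgoing(rule, "line1/quality/scrap"))  # None
```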
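One part of loop detection can be done statically, before any traffic flows: model each bridge as a directed edge that forwards a topic filter, then follow the edges for a given topic and check whether the path returns to a broker it already visited. The sketch below assumes bridges do not rewrite topics (no prefix remapping) and uses hypothetical broker names; it complements, rather than replaces, runtime traffic analytics.

```python
from typing import List, Optional, Tuple

# (source broker, destination broker, topic filter forwarded over that bridge)
Bridge = Tuple[str, str, str]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT filter match: '+' = one level, '#' = this level and below."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)


def find_loop(bridges: List[Bridge], start: str, topic: str) -> Optional[List[str]]:
    """Follow bridges that forward `topic` from `start`; return a broker path
    that comes back to a broker already on the path (a loop), or None.

    Assumes bridges do not rewrite topics. Reaching a broker twice via
    different branches is duplicated traffic, not a loop, and is not reported.
    """
    path = [start]

    def dfs(node: str) -> Optional[List[str]]:
        for src, dst, topic_filter in bridges:
            if src != node or not topic_matches(topic_filter, topic):
                continue
            if dst in path:  # the message would return to this broker
                return path + [dst]
            path.append(dst)
            found = dfs(dst)
            if found:
                return found
            path.pop()
        return None

    return dfs(start)


if __name__ == "__main__":
    bridges = [
        ("broker1", "ha_broker", "line1/#"),
        ("ha_broker", "cloud", "line1/#"),
        ("cloud", "broker1", "#"),  # overly broad return bridge
    ]
    print(find_loop(bridges, "broker1", "line1/temp"))
    # ['broker1', 'ha_broker', 'cloud', 'broker1']
```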
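An MQTT-to-HTTP bridge essentially selects messages by topic and wraps each one in an HTTP POST to the webhook. The sketch below only builds the request with Python's standard `urllib.request.Request` and does not send it; the webhook URL, the topic filter selection, the JSON body layout, and the function name are all hypothetical assumptions, and the payload is assumed to be UTF-8 text.

```python
import json
import urllib.request
from typing import Optional

WEBHOOK_URL = "https://datalake.example.com/ingest"  # hypothetical endpoint
SELECTED_FILTERS = ["+/energy/#"]                     # hypothetical topic selection


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT filter match: '+' = one level, '#' = this level and below."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)


def to_webhook_request(topic: str, payload: bytes) -> Optional[urllib.request.Request]:
    """Wrap a selected MQTT message as an HTTP POST request (not sent here)."""
    if not any(topic_matches(f, topic) for f in SELECTED_FILTERS):
        return None  # topic not selected for the data lake
    body = json.dumps({"topic": topic, "payload": payload.decode("utf-8")}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

In a running bridge, the returned request would be sent with something like `urllib.request.urlopen(req, timeout=5)`, ideally with retries and batching so that a slow data lake does not stall MQTT message handling.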
Static vs. dynamic message mapping
The architecture described above (refer to Figure 3) can be realized through a static MQTT bridging configuration. However, as more use cases are put into practice, the need for smart dynamic bridging may grow.
Imagine that, at the company level, you realize a metric or piece of information is missing from a highly aggregated BI report. You could then post a request to an AI bot with the prompt, “For the next evaluation period of the monthly turnover, please include a bar chart for the sum of kilometers the AGVs run per production line.” This could well become a future functionality requirement.
The executing functions would then have to automatically redefine the bridging configuration across all levels to ensure the data is transported from the AGVs to the central data lake. Technically, one building block is already available: remote interfaces such as the MQTT API and REST API can update MQTT broker configurations. This can be called smart dynamic bridging, and it is the starting point for more advanced MQTT architectures.
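The remote configuration APIs mentioned above are product-specific, so the sketch below stops short of calling them. It only computes which bridge routes must be added or removed to move from the current configuration to a desired one, which is the planning step a dynamic bridging function would need before pushing changes. `Route`, `plan_changes`, and the broker and topic names are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Route(NamedTuple):
    """Hypothetical bridge route: forward `topic_filter` from broker `src` to `dst`."""
    src: str
    dst: str
    topic_filter: str


def plan_changes(current: Set[Route], desired: Set[Route]) -> Tuple[List[Route], List[Route]]:
    """Return (routes to add, routes to remove), each sorted for stable output."""
    return sorted(desired - current), sorted(current - desired)


if __name__ == "__main__":
    current = {Route("broker1", "ha_broker", "line1/energy/#")}
    # New requirement: AGV distance metrics must reach the cloud data lake
    desired = current | {
        Route("broker2", "ha_broker", "line2/agv/+/distance_km"),
        Route("ha_broker", "cloud", "+/agv/+/distance_km"),
    }
    to_add, to_remove = plan_changes(current, desired)
    for route in to_add:
        print("add", route)  # would be pushed via the broker's remote API
    for route in to_remove:
        print("remove", route)
```

Computing a diff instead of replacing the whole configuration leaves unaffected bridges untouched, which matters when other production lines depend on them.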
About the author: Dr. Andreas Schiffler is a research professor at the Technical University of Wuerzburg-Schweinfurt in the field of production and data technology in mechanical engineering. In addition to research topics related to 3D metal printing, Dr. Schiffler developed a Kubernetes cluster for teaching the practice-oriented basics of IoT and Industry 4.0 as part of student training.