There is a certain irony in the situation facing industrial companies today. On the one hand, sensors, PLCs, and connected equipment generate unprecedented volumes of data. On the other, research from Forrester, cited across multiple industry reports, suggests that roughly 70% of that data goes unused. Not because it lacks value, but because it never reaches the people or systems that could make use of it.
The issue, then, is not data generation. It is data flow.
This is where industrial data flow management becomes critical, not as an additional technical layer, but as the underlying infrastructure that enables real-time visibility and decision-making. Without it, even the most valuable data remains out of reach.
Before diving into architectures and communication protocols, it is worth understanding why this challenge persists despite the fact that industrial organizations have relied on monitoring and control systems for decades.
The answer is simple: silos.
Historically, industrial environments were built for performance, reliability, and operational stability, not for connectivity. A Siemens PLC installed twenty years ago was never designed to communicate with an SAP ERP system. A SCADA platform running a chemical plant was not intended to stream data into a cloud-based data lake. These systems operated in isolation by design. Fewer connections meant fewer points of failure and a lower risk of security breaches.
Today, that model is increasingly at odds with the realities of connected manufacturing. Production teams need real-time performance metrics. Management wants a consolidated view of operations across multiple sites. Maintenance teams need to identify anomalies before they result in costly downtime. None of this is possible unless data can move reliably, consistently, and securely from the shop floor to the applications and stakeholders that depend on it.
According to the 2024 IT/OT Observatory study conducted by NXO and Cisco, which surveyed 135 industrial decision-makers, 64% of respondents identified data mobility as the top priority in their IT/OT convergence efforts. Yet only 11% had fully completed that convergence at the time of the survey, while 44% had begun the process. Progress is clearly being made, but the journey is far from complete.
Before a pressure reading can make its way from a sensor to a monitoring platform or a cloud data pipeline, every system involved must be able to communicate using the same language. That is the role of industrial communication protocols, and despite appearances, the landscape is far from standardized.
Originally developed by Modicon in 1979, Modbus remains one of the most widely deployed protocols in industrial environments. It uses a straightforward master-slave architecture (the Modbus Organization now uses the terms client and server): a controller requests data from a device, and the device responds. The later introduction of Modbus TCP allowed the protocol to run over Ethernet networks and remain relevant well into the digital era.
Its enduring popularity comes down to two things: simplicity and compatibility. Almost every piece of industrial equipment supports Modbus in one form or another.
The downside is that Modbus was built for a very different generation of industrial systems. It has no native security mechanisms, offers no standardized data model, and relies on a polling architecture in which the master must repeatedly query every connected device. As the number of assets increases, this approach can become a significant bottleneck.
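To make the request/response model concrete, here is a minimal Python sketch that builds a Modbus TCP "Read Holding Registers" request (function code 0x03) and decodes the reply. It follows the public Modbus TCP framing, a 7-byte MBAP header followed by the PDU; the function names are hypothetical and no network I/O is performed. Note what the frame does not carry: no credentials, no units, and no description of what the registers mean.

```python
import struct

READ_HOLDING_REGISTERS = 0x03

def build_read_request(transaction_id: int, unit_id: int, start: int, count: int) -> bytes:
    """Build a Modbus TCP request for function 0x03 (Read Holding Registers)."""
    if not 1 <= count <= 125:
        raise ValueError("Modbus allows 1-125 registers per read")
    # PDU: function code, starting address, quantity of registers (big-endian)
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start, count)
    # MBAP header: transaction id, protocol id (always 0), byte count of what follows
    # (unit id + PDU), unit id. There is no field for authentication of any kind.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

def parse_read_response(frame: bytes) -> list[int]:
    """Decode the raw 16-bit register values from a 0x03 response."""
    function, byte_count = frame[7], frame[8]
    if function & 0x80:
        # Exception responses set the high bit; byte 8 then holds the exception code
        raise RuntimeError(f"Modbus exception code {frame[8]}")
    if function != READ_HOLDING_REGISTERS:
        raise ValueError(f"unexpected function code {function:#x}")
    data = frame[9:9 + byte_count]
    # The values arrive as bare integers: no unit, no tag name, no context.
    return list(struct.unpack(f">{byte_count // 2}H", data))
```

A poller repeats this exchange for every device and every register block on each cycle, which is why the cost of polling grows with the number of assets.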
For existing machine fleets, Modbus remains indispensable. But when it comes to feeding high-volume, real-time data streams into cloud platforms and analytics environments, it quickly begins to show its age.
OPC UA (Open Platform Communications Unified Architecture) takes a fundamentally different approach. Developed by the OPC Foundation, it is built around a hierarchical object model that allows systems to exchange not only data, but also context.
A temperature reading, for example, can be transmitted together with information about the asset it belongs to, the unit of measurement being used, and the expected operating range.
That semantic layer is what makes OPC UA so powerful. Unlike Modbus, which primarily moves raw values from one point to another, OPC UA delivers structured information that can be understood and interpreted consistently across systems.
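The difference between a raw value and a contextualized one can be sketched as a data structure. The Python below is not an OPC UA client; it loosely mirrors two properties that OPC UA's AnalogItemType defines (EngineeringUnits and EURange) and adds a browse path to locate the parent asset. The class names and the example path are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EURange:
    low: float
    high: float

@dataclass(frozen=True)
class AnalogItem:
    """A value plus the kind of context an OPC UA AnalogItem-style node can carry."""
    browse_path: str          # hypothetical hierarchy, e.g. "Plant/Line3/Press/Temperature"
    value: float
    engineering_units: str    # e.g. "degC"
    eu_range: EURange         # range expected during normal operation

    def in_range(self) -> bool:
        return self.eu_range.low <= self.value <= self.eu_range.high

    def asset(self) -> str:
        # The parent node in the hierarchy identifies the asset the value belongs to
        return self.browse_path.rsplit("/", 1)[0]

reading = AnalogItem("Plant/Line3/Press/Temperature", 87.5, "degC", EURange(20.0, 80.0))
print(reading.asset(), reading.in_range())
```

A consumer can tell from the object alone that 87.5 °C on the Line 3 press is above its expected range; a Modbus register would deliver only the number (often as a scaled integer) and leave the rest to external documentation.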
Another major advantage is that OPC UA does not depend on constant polling. Clients can subscribe to values and receive notifications when they change, and the publish/subscribe (PubSub) model added in Part 14 of the specification goes further: publishers send data to any number of subscribers, over transports such as UDP or MQTT, without a dedicated client-server session. For real-time industrial applications, this represents a major shift in architecture.
OPC UA also includes built-in security features such as encryption, authentication, and message signing. This makes it particularly attractive in industries facing increasingly strict cybersecurity and compliance requirements, including those affected by regulations such as NIS 2.
The trade-off, however, is complexity.
Establishing an OPC UA connection requires several preliminary exchanges: a transport-level handshake (Hello/Acknowledge), opening a secure channel with its security negotiation, and then creating and activating a session. Compared with protocols such as Modbus or MQTT, the connection overhead is significantly higher.
In environments where bandwidth is limited or devices have constrained computing resources, that additional complexity can become a genuine obstacle.
MQTT (Message Queuing Telemetry Transport) was designed with a very different set of priorities in mind. From the outset, it was intended for environments where connectivity is unreliable, bandwidth is scarce, and devices have limited processing power and memory.
As a result, the protocol is exceptionally lightweight. Its binary format minimizes network traffic, connection overhead is minimal, and its publish/subscribe architecture, centered around a broker, makes it ideal for distributed industrial systems.
In a typical deployment, industrial gateways or PLCs publish measurements to MQTT topics such as factory/line3/press/temperature. A broker such as Eclipse Mosquitto or EMQX receives those messages and distributes them to every subscribed application, whether that is a SCADA platform, a data pipeline, or a predictive maintenance solution.
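Subscribers usually register topic filters rather than exact topics, and the broker routes each message by matching its topic against those filters. The sketch below implements the wildcard rules from the MQTT specification: "+" matches exactly one level, and "#" matches the current level and everything below it and must be the last level. The function name is hypothetical, and a production broker handles more edge cases than this.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    # Filters that start with a wildcard must not match $-prefixed system topics
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            # Multi-level wildcard: matches this level and all below, including the
            # parent level itself ("a/#" matches "a"). It is only valid as the last level,
            # so an invalid filter such as "a/#/b" simply never matches here.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)

print(topic_matches("factory/+/press/temperature", "factory/line3/press/temperature"))
```

A SCADA screen for one press might subscribe to factory/line3/press/#, while a plant-wide temperature dashboard subscribes to factory/+/+/temperature; both receive the same published message without the publisher knowing either exists.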
One of MQTT’s greatest strengths is its resilience. With QoS 1 or 2 and a persistent session, messages can be queued during a network outage and delivered once connectivity is restored, helping ensure that critical operational data is not lost.
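Gateways often complement MQTT's own delivery guarantees with a local store-and-forward buffer, so that readings taken while the uplink is down are not discarded. The sketch below shows the idea with a bounded in-memory queue. The `send` callable is a hypothetical stand-in for a real publish call, and a production gateway would typically persist the buffer to disk so that it survives a restart.

```python
from collections import deque
from typing import Callable

class StoreAndForward:
    """Buffer messages while the uplink is down and flush them in order later."""

    def __init__(self, send: Callable[[str, bytes], bool], max_messages: int = 10_000):
        self._send = send   # hypothetical publish function; returns True if delivered
        # Bounded buffer: when full, the oldest message is discarded first
        self._buffer: deque[tuple[str, bytes]] = deque(maxlen=max_messages)

    def publish(self, topic: str, payload: bytes) -> None:
        self._buffer.append((topic, payload))
        self.flush()

    def flush(self) -> int:
        """Deliver buffered messages in FIFO order, stopping at the first failure."""
        delivered = 0
        while self._buffer:
            topic, payload = self._buffer[0]
            if not self._send(topic, payload):
                break   # link still down: keep this message and the rest queued
            self._buffer.popleft()
            delivered += 1
        return delivered

    def pending(self) -> int:
        return len(self._buffer)
```

The bounded queue is a deliberate trade-off: during a very long outage the gateway keeps the most recent data rather than exhausting its memory.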
Its weakness lies elsewhere.
MQTT provides a highly efficient transport mechanism, but it says nothing about the meaning of the data being transported. A message can contain a value associated with a specific topic, yet the protocol itself does not define what that value represents, which sensor produced it, or what unit of measurement is being used.
To address this limitation, the industrial sector has developed standards such as Sparkplug B, which adds a structured and standardized payload format on top of MQTT, bringing the semantic context that MQTT lacks by design.
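Sparkplug B also fixes the topic structure, which is part of how it restores context. Its topic namespace follows the pattern spBv1.0/group_id/message_type/edge_node_id/[device_id], with message types such as NBIRTH, DBIRTH, NDATA, DDATA and NDEATH. The payload itself is encoded with Protocol Buffers and is not reproduced here; this sketch only parses and checks topics, excludes host-application STATE topics, and uses hypothetical group, node and device names.

```python
from typing import NamedTuple, Optional

NODE_TYPES = {"NBIRTH", "NDEATH", "NDATA", "NCMD"}     # edge-node level messages
DEVICE_TYPES = {"DBIRTH", "DDEATH", "DDATA", "DCMD"}   # device level messages

class SparkplugTopic(NamedTuple):
    group_id: str
    message_type: str
    edge_node_id: str
    device_id: Optional[str]

def parse_sparkplug_topic(topic: str) -> SparkplugTopic:
    """Split a Sparkplug B node or device topic into its namespace elements."""
    parts = topic.split("/")
    if len(parts) not in (4, 5) or parts[0] != "spBv1.0":
        raise ValueError(f"not a Sparkplug B node/device topic: {topic!r}")
    _, group_id, message_type, edge_node_id, *rest = parts
    device_id = rest[0] if rest else None
    # Node-level messages carry no device id; device-level messages require one
    if message_type in NODE_TYPES and device_id is None:
        return SparkplugTopic(group_id, message_type, edge_node_id, None)
    if message_type in DEVICE_TYPES and device_id is not None:
        return SparkplugTopic(group_id, message_type, edge_node_id, device_id)
    raise ValueError(f"message type {message_type!r} does not fit topic {topic!r}")

print(parse_sparkplug_topic("spBv1.0/Line3/DDATA/Gateway01/Press"))
```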
In reality, OPC UA and MQTT are rarely rivals. More often, they serve different purposes within the same architecture.
OPC UA is typically used inside the plant network, where rich data models, interoperability, and security are essential. MQTT then acts as the transport layer for moving information to cloud platforms, enterprise applications, or remote sites, where efficiency, scalability, and resilience matter most.
Rather than choosing between the two, many industrial organizations are combining them to create end-to-end data pipelines that are both intelligent and efficient.
Understanding communication protocols is only part of the story. To see how industrial data management works in practice, you need to look at the architecture those protocols operate within.
In a modern industrial environment, data typically passes through three distinct layers before it reaches the applications that consume it.
Everything starts on the shop floor.
Sensors measuring temperature, vibration, pressure, flow, current, and countless other variables continuously generate analog or digital signals. These signals are collected by control systems such as programmable logic controllers (PLCs), data acquisition units, or distributed control systems (DCSs) used in continuous-process industries. The data is then exposed through protocols such as Modbus, Profinet, or OPC UA.
A well-instrumented production line can generate thousands of measurements, with sampling rates ranging from a few milliseconds to several seconds.
In theory, all of that raw data could be sent directly to the cloud. In practice, that would be both expensive and largely unnecessary. A temperature reading that changes very slowly does not become more valuable simply because it is sampled ten times per second. At a certain point, you are collecting noise rather than insight.
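A back-of-the-envelope calculation shows why. The figures below are assumptions chosen for illustration, not measurements: 2,000 tags sampled every 100 ms, and roughly 50 bytes per message once a topic name, timestamp and value are included.

```python
def daily_volume_bytes(tags: int, period_s: float, bytes_per_message: int) -> float:
    """Approximate bytes per day if every sample of every tag is forwarded."""
    messages_per_day = tags * (86_400 / period_s)
    return messages_per_day * bytes_per_message

# Illustrative assumptions only: 2,000 tags, 50-byte messages
raw = daily_volume_bytes(tags=2_000, period_s=0.1, bytes_per_message=50)
per_minute = daily_volume_bytes(tags=2_000, period_s=60, bytes_per_message=50)
print(f"raw: {raw / 1e9:.1f} GB/day, 1-minute aggregates: {per_minute / 1e6:.0f} MB/day")
# → raw: 86.4 GB/day, 1-minute aggregates: 144 MB/day
```

Under these assumptions, forwarding one aggregate per minute instead of every raw sample cuts the volume by a factor of 600, which is the kind of reduction the edge layer exists to make.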
This is where the architecture becomes truly interesting.
Industrial edge gateways, whether ruggedized industrial PCs or dedicated devices from vendors such as Siemens, Moxa, or Advantech, serve as the bridge between operational equipment and higher-level systems.
Their first role is protocol normalization. A single gateway may ingest data from Modbus RTU devices, Profinet networks, OPC UA servers, and traditional 4-20 mA sensors, then convert everything into a common format before forwarding it upstream. In many environments, the edge layer is what allows decades-old equipment to coexist with modern cloud platforms.
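Normalization often involves a little arithmetic. A 4-20 mA loop maps linearly onto the transmitter's configured range, and many Modbus devices expose a 32-bit float as two consecutive 16-bit registers. The sketch below shows both conversions and wraps the result in a common record. The record fields, tag names and fault limits are hypothetical, and the register word order is an assumption, since some devices send the low word first.

```python
import struct
from dataclasses import dataclass

@dataclass
class Measurement:
    """Hypothetical common record that every source is normalized into."""
    tag: str
    value: float
    unit: str
    source: str

def scale_4_20ma(current_ma: float, range_low: float, range_high: float) -> float:
    """Linearly map a 4-20 mA loop current onto the transmitter's range."""
    # Readings well outside the 4-20 mA band usually indicate a wiring or sensor
    # fault rather than a real value (the 3.8/20.5 mA limits here are an assumption)
    if not 3.8 <= current_ma <= 20.5:
        raise ValueError(f"loop current {current_ma} mA out of range")
    return range_low + (current_ma - 4.0) / 16.0 * (range_high - range_low)

def registers_to_float(high_word: int, low_word: int) -> float:
    """Combine two 16-bit Modbus registers into an IEEE-754 float32 (high word first)."""
    return struct.unpack(">f", struct.pack(">HH", high_word, low_word))[0]

pressure = Measurement("line3/press-07/pressure", scale_4_20ma(12.0, 0.0, 10.0), "bar", "analog-input-1")
temperature = Measurement("line3/press-07/temperature", registers_to_float(0x4148, 0x0000), "degC", "modbus-rtu:unit-7")
print(pressure, temperature)
```

Once both readings share the same record shape, everything upstream can treat them identically, regardless of whether they came from a current loop or a serial register map.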
Its second role is data reduction.
Rather than forwarding every measurement generated by every device, the gateway can transmit only meaningful changes, aggregates computed over predefined time windows, or alerts triggered by specific thresholds. This dramatically reduces the amount of data that needs to travel across the network while preserving the information that actually matters.
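Two common reduction techniques are a deadband (report by exception) and fixed time-window aggregation. The Python sketch below applies both to a plain list of (timestamp, value) samples. The function names are hypothetical, and the deadband width and window length are arbitrary example parameters that would normally be tuned per tag.

```python
from statistics import mean

def deadband_filter(samples: list[tuple[int, float]], deadband: float) -> list[tuple[int, float]]:
    """Keep a sample only if it differs from the last *reported* value by more than deadband."""
    reported = []
    last = None
    for ts, value in samples:
        if last is None or abs(value - last) > deadband:
            reported.append((ts, value))
            last = value
    return reported

def window_aggregate(samples: list[tuple[int, float]], window_s: int):
    """Summarize samples into tumbling windows as (start, min, max, mean, count)."""
    windows: dict[int, list[float]] = {}
    for ts, value in samples:
        start = ts - (ts % window_s)   # align each sample to its window start
        windows.setdefault(start, []).append(value)
    return [
        (start, min(vals), max(vals), mean(vals), len(vals))
        for start, vals in sorted(windows.items())
    ]

samples = [(0, 20.0), (1, 20.1), (2, 20.05), (3, 21.0), (4, 21.2), (5, 19.0)]
print(deadband_filter(samples, deadband=0.5))
print(window_aggregate(samples, window_s=3))
```

Comparing against the last reported value, rather than the previous sample, prevents a slow drift from slipping through as a series of individually small changes.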
Increasingly, edge devices are also responsible for local analytics.
Simple anomaly-detection rules, data quality checks, and even lightweight machine learning models can now run directly on embedded ARM or x86 hardware using frameworks such as TensorFlow Lite or OpenVINO.
The benefit is straightforward: decisions can be made immediately, without waiting for data to travel to a remote cloud environment and back. In applications where milliseconds matter, that difference can be critical.
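A "simple anomaly-detection rule" can be as small as a rolling z-score: flag a value that sits more than k standard deviations from the recent mean. The sketch below is a generic statistical rule, not a TensorFlow Lite or OpenVINO model; the class name, window size, threshold and warm-up length are illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev

class RollingZScore:
    """Flag values that sit more than `threshold` standard deviations from a rolling mean."""

    def __init__(self, window: int = 50, threshold: float = 3.0, min_samples: int = 10):
        self.values = deque(maxlen=window)   # most recent samples only
        self.threshold = threshold
        self.min_samples = min_samples       # no verdict until the window has some history

    def update(self, value: float) -> bool:
        """Score value against the current window, then add it to the window."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma == 0:
                anomalous = value != mu      # any change from a perfectly flat signal
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        self.values.append(value)            # anomalies are kept too, a simplification
        return anomalous
```

Because it needs only a small buffer and a few arithmetic operations per sample, a rule like this fits comfortably on modest embedded hardware and can raise an alert locally, before any data leaves the site.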
Once data has been standardized and filtered at the edge, it moves into the integration layer.
This is where data pipelines, event-streaming platforms, and message brokers take over.
Technologies such as Apache Kafka have become the backbone of many industrial data architectures because they provide a scalable and resilient way to move large volumes of real-time data. In a typical setup, edge gateways publish events to Kafka topics, while downstream applications, whether ERP systems, maintenance platforms, data lakes, analytics tools, or supervisory systems, subscribe only to the streams they need.
This decouples data producers from data consumers, making the entire architecture far more flexible and easier to scale.
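The decoupling comes from Kafka's storage model: producers append events to a durable, ordered log, and each consumer group tracks its own read position (offset) in that log. The in-memory Python model below illustrates only that idea. It is not the Kafka client API, the class and group names are hypothetical, and it ignores partitions, replication and retention.

```python
from collections import defaultdict

class TopicLog:
    """Toy model of an append-only topic with independent consumer-group offsets."""

    def __init__(self):
        self._events = []
        self._offsets = defaultdict(int)   # consumer group -> next offset to read

    def append(self, event) -> int:
        """Producer side: append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group: str, max_events: int = 100) -> list:
        """Consumer side: read from the group's offset, then advance (commit) it."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

log = TopicLog()
log.append({"tag": "line3/press/temperature", "value": 87.5})
log.append({"tag": "line3/press/temperature", "value": 88.0})
maintenance = log.poll("maintenance")              # reads both events
data_lake = log.poll("data-lake", max_events=1)    # reads only the first, at its own pace
```

Adding a new consumer amounts to choosing a new group name; the producer and every existing consumer are unaffected, which is what makes this style of architecture easy to extend.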
Real-time processing engines such as Apache Flink and Spark Structured Streaming can then enrich, filter, aggregate, or transform the incoming data before it reaches business applications.
For smaller sites, Apache NiFi often offers a more approachable alternative. Its visual interface allows teams to design, monitor, and manage data flows without having to build everything through code. Routing logic, encryption, and error handling can all be configured within the platform itself.
One of the most overlooked challenges at this stage is data governance.
When an edge gateway sends a temperature reading, the receiving system needs more than just the number itself. It needs context. Is the value a temperature, pressure, or flow measurement? Is it expressed in degrees Celsius or Fahrenheit? Which asset generated it? Which production line does it belong to?
Without that context, even perfectly transmitted data quickly loses its value.
This is why metadata management has become such an important part of modern industrial architectures. In the Kafka ecosystem, this often takes the form of a schema registry that defines how data should be interpreted across systems.
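In the Kafka ecosystem, a schema registry stores versioned schemas (commonly Avro, Protobuf or JSON Schema) that producers and consumers agree on. The sketch below does not use a registry client; it hand-rolls a tiny schema in plain Python to show the kind of contract such a schema enforces: required fields, their types, and an allowed set of units per quantity. The field names and unit lists are hypothetical.

```python
# Hypothetical schema: every measurement event must carry this context
SCHEMA_V1 = {
    "asset_id": str,
    "line": str,
    "quantity": str,      # e.g. "temperature", "pressure"
    "unit": str,
    "value": float,
    "timestamp_ms": int,
}
ALLOWED_UNITS = {"temperature": {"degC"}, "pressure": {"bar", "kPa"}}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms to SCHEMA_V1."""
    errors = []
    for field, expected_type in SCHEMA_V1.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    quantity, unit = record.get("quantity"), record.get("unit")
    if quantity in ALLOWED_UNITS and unit not in ALLOWED_UNITS[quantity]:
        errors.append(f"unit {unit!r} not allowed for {quantity}")
    return errors

good = {"asset_id": "press-07", "line": "line3", "quantity": "temperature",
        "unit": "degC", "value": 87.5, "timestamp_ms": 1_700_000_000_000}
print(validate(good))
```

Rejecting a record at the pipeline entrance is far cheaper than discovering months later that a historical dataset mixes Celsius and Fahrenheit.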
Without a shared understanding of data structures and meanings, pipelines gradually become opaque and difficult to maintain. Teams can no longer confidently trace where information originated, how it has been transformed, or whether it can be trusted.
The result is a problem that many industrial organizations know all too well: the data swamp.
Unlike a well-governed data lake, a data swamp is full of information but short on understanding. Data accumulates faster than it can be organized, documented, or contextualized. Eventually, the challenge is no longer collecting data. It is figuring out what any of it actually means.
Layered architectures look elegant on paper. Real-world industrial projects are rarely that neat.
The biggest obstacles are usually not technical. They stem from legacy infrastructure, organizational silos, cybersecurity requirements, and data quality issues that have accumulated over years, sometimes decades.
Virtually every factory operates a mix of technologies from different eras.
It is not unusual to find PLCs installed in the 1990s still running reliably alongside drives commissioned in 2012 and IoT sensors deployed just a few years ago. Many of these older systems support nothing more than Modbus RTU over RS-485, with no Ethernet connectivity whatsoever.
Integrating them into a modern data architecture often requires protocol converters, intermediary gateways, and occasionally a fair amount of creative engineering. It also tends to cost more and take longer than initially expected.
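A typical protocol converter in this situation is a Modbus RTU-to-TCP gateway. On the serial side, each frame ends with a CRC-16 checksum; on the Ethernet side, the CRC is dropped and an MBAP header is prepended instead. The Python sketch below performs that translation for a single request frame. The function names are hypothetical, and serial timing, I/O and response handling are left out.

```python
import struct

def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def rtu_to_tcp(rtu_frame: bytes, transaction_id: int) -> bytes:
    """Convert a Modbus RTU request (address + PDU + CRC) into a Modbus TCP frame."""
    if len(rtu_frame) < 4:
        raise ValueError("frame too short")
    body, crc_bytes = rtu_frame[:-2], rtu_frame[-2:]
    # On the serial line the CRC is transmitted low byte first
    if crc16_modbus(body) != int.from_bytes(crc_bytes, "little"):
        raise ValueError("CRC mismatch: corrupted serial frame")
    unit_id, pdu = body[0], body[1:]
    # MBAP header: transaction id, protocol id 0, length of unit id + PDU, unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

# Read 10 holding registers from slave address 1, starting at register 0
body = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x0A])
rtu_frame = body + crc16_modbus(body).to_bytes(2, "little")
print(rtu_to_tcp(rtu_frame, transaction_id=1).hex())
```

Multiply this by every vendor quirk (non-standard word orders, custom function codes, odd timeouts) and it becomes clear why each legacy exception adds to the overall complexity.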
The challenge is not that legacy equipment cannot be connected. It is that every exception adds another layer of complexity to the overall architecture.
A surprisingly simple question often reveals a deeper problem:
Who owns the data generated by a production line?
Is it the maintenance team responsible for the PLCs? The IT department managing the network infrastructure? The process engineering team using the data to improve quality and performance?
In many facilities, there is no clear answer.
As a result, data models go undocumented, measurement quality is rarely monitored systematically, and integration projects evolve through a succession of workarounds rather than a coherent long-term strategy.
Over time, these small compromises accumulate until nobody fully understands how the data ecosystem actually works.
As operational technology (OT) networks become increasingly connected to IT systems, the attack surface expands significantly.
Legacy protocols such as Modbus were never designed with cybersecurity in mind. They provide no built-in authentication, encryption, or access control mechanisms. If network segmentation is inadequate, any device connected to the network may be able to communicate directly with a Modbus-enabled controller.
This was once considered an acceptable risk in isolated industrial environments. It is no longer.
Regulations such as NIS 2 are forcing many industrial organizations to take a far more rigorous approach to cybersecurity. That means understanding how data flows through the organization, documenting those flows, and assessing the risks associated with every connection between OT and IT systems.
For many companies, the challenge is not implementing security controls. It is first gaining visibility into an architecture that has evolved organically over many years.
Even the most sophisticated data pipeline cannot fix bad data.
Poorly calibrated sensors, incorrect timestamps, undocumented variables, and inconsistent naming conventions all create problems that no amount of downstream analytics can fully solve.
Yet data quality at the field level is rarely audited with the same rigor applied to production processes themselves.
The consequences are familiar: sensor drift that goes unnoticed, outliers that contaminate historical datasets, and cryptic Modbus tags that make it impossible to determine what a particular measurement actually represents.
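Several of these problems can be caught with simple, automated checks at or near the source. The sketch below flags three of them: values outside a plausible physical range, timestamps that fail to increase, and a "frozen" signal that repeats the same value for too long, a common symptom of a failed sensor or a stale tag. It does not detect slow drift, which usually requires comparison against a reference instrument or calibration records. The function name and thresholds are illustrative assumptions.

```python
def quality_issues(samples, low, high, max_repeats=10):
    """Return (index, issue) pairs for a list of (timestamp, value) samples."""
    issues = []
    repeats = 1
    for i, (ts, value) in enumerate(samples):
        if not low <= value <= high:
            issues.append((i, "out of plausible range"))
        if i > 0:
            prev_ts, prev_value = samples[i - 1]
            if ts <= prev_ts:
                issues.append((i, "non-increasing timestamp"))
            repeats = repeats + 1 if value == prev_value else 1
            if repeats == max_repeats:
                # Reported once, when the run of identical values reaches the limit
                issues.append((i, "frozen value"))
    return issues

samples = [(0, 20.0), (1, 20.5), (1, 21.0), (3, 950.0), (4, 21.0), (5, 21.0), (6, 21.0)]
print(quality_issues(samples, low=-40.0, high=200.0, max_repeats=3))
```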
Addressing these issues is rarely glamorous work. In fact, it is often one of the least visible parts of an industrial data initiative. It is also one of the most valuable.
Despite the complexity, the projects that succeed tend to follow a few common principles.
Map first, connect second
Before installing a single gateway or deploying a new platform, organizations need a clear picture of their existing environment.
Which assets are connected? Which protocols do they support? Which variables are available? How frequently are measurements generated?
This inventory often forms the foundation of the entire project. Without it, integration efforts become little more than educated guesswork.
Many organizations now use industrial network discovery tools to automate part of this process, but the goal remains the same: understand the landscape before attempting to modernize it.
Standardize on open protocols whenever possible
Every new equipment purchase is an opportunity to reduce future integration challenges.
Specifying OPC UA compatibility as a procurement requirement, for example, represents a relatively small investment that can deliver significant flexibility over the lifespan of the asset.
For existing equipment, protocol gateways often provide a practical alternative. Translating Modbus data into MQTT or OPC UA is typically far less expensive than replacing functional machinery simply to achieve connectivity.
Treat data governance as a core project, not an afterthought
Many industrial organizations focus heavily on infrastructure while underestimating the importance of governance.
Yet decisions such as how MQTT topics are named, how OPC UA namespaces are structured, which metadata must be recorded, and how normal operating ranges are documented have a direct impact on the long-term maintainability of the system.
These tasks may seem administrative, but they determine whether a data architecture remains usable five years from now or gradually becomes impossible to manage.
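One practical way to make naming conventions stick is to enforce them in code at the point of publication. The sketch below validates MQTT topics against a hypothetical ISA-95-inspired hierarchy (enterprise/site/area/line/asset/measurement) with lowercase, hyphen-separated segments. The levels, character rules and example names are one possible convention, not a standard.

```python
import re

# Hypothetical convention: enterprise/site/area/line/asset/measurement
LEVELS = ("enterprise", "site", "area", "line", "asset", "measurement")
SEGMENT = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_topic(topic: str) -> list[str]:
    """Return a list of convention violations for a topic (empty if compliant)."""
    parts = topic.split("/")
    if len(parts) != len(LEVELS):
        return [f"expected {len(LEVELS)} levels ({'/'.join(LEVELS)}), got {len(parts)}"]
    return [
        f"{level} segment {part!r} must be lowercase alphanumerics separated by hyphens"
        for level, part in zip(LEVELS, parts)
        if not SEGMENT.match(part)
    ]

print(check_topic("acme/lyon/stamping/line3/press-07/temperature"))
```

Under this stricter convention, a shorter topic such as factory/line3/press/temperature would be rejected for missing levels. Whichever convention is chosen, checking it automatically is what keeps it consistent across teams and years.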
Separate OT and IT networks properly
Modern industrial architectures require data to move between production systems and enterprise applications. That does not mean the two environments should be directly connected.
A properly designed industrial DMZ provides a controlled boundary between OT and IT networks. Data can flow from production systems to business applications through carefully managed gateways, filtering mechanisms, or one-way communication channels, while preventing direct access to critical control systems.
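A filtering mechanism can be as simple as a rule that lets data out of the OT zone but never lets commands in. The sketch below inspects a Modbus TCP frame and allows only the standard read function codes (0x01 to 0x04) from an allowlist of source addresses; the address and names are hypothetical. Real deployments rely on industrial firewalls or data diodes rather than hand-written code; this only illustrates the kind of rule such devices apply.

```python
import struct

READ_FUNCTIONS = {0x01, 0x02, 0x03, 0x04}   # read coils, discrete inputs, holding and input registers
ALLOWED_SOURCES = {"10.20.0.15"}             # hypothetical historian in the industrial DMZ

def allow_modbus_frame(source_ip: str, frame: bytes) -> bool:
    """Permit only well-formed, read-only Modbus TCP requests from approved hosts."""
    if source_ip not in ALLOWED_SOURCES or len(frame) < 8:
        return False
    _tid, protocol_id, length, _unit = struct.unpack(">HHHB", frame[:7])
    # The MBAP length field counts the unit id plus the PDU, i.e. everything after byte 6
    if protocol_id != 0 or length != len(frame) - 6:
        return False                         # malformed or suspicious header
    return frame[7] in READ_FUNCTIONS        # writes (e.g. 0x06, 0x10) are dropped

read_request = bytes.fromhex("00010000000601030000000a")    # read 10 holding registers
write_request = bytes.fromhex("000200000006010600010064")   # write 100 to register 1
print(allow_modbus_frame("10.20.0.15", read_request), allow_modbus_frame("10.20.0.15", write_request))
```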
The objective is simple: enable visibility without compromising operational security.
Start small, document everything, then scale
The most successful industrial data projects rarely begin with an ambitious effort to connect an entire organization at once.
Instead, they start with a single production line, a specific workshop, or a limited group of assets. The architecture is documented thoroughly, processes are tested, and lessons are learned before expanding to additional sites or systems.
This approach may seem slower initially, but it almost always proves faster in the long run.
In industrial environments, complexity grows much faster than scope: every additional asset, protocol, and consumer adds interfaces that must be secured, documented, and maintained. Starting with a manageable scope allows organizations to build repeatable patterns before scaling them across the business, rather than becoming overwhelmed by the challenge of connecting everything at once.

Copyright © 2026. MOTILDE. All rights reserved.