Industrial Data Flow Management

There is a certain irony in the situation facing industrial companies today. On the one hand, sensors, PLCs, and connected equipment generate unprecedented volumes of data. On the other, research from Forrester, cited across multiple industry reports, suggests that roughly 70% of that data goes unused. Not because it lacks value, but because it never reaches the people or systems that could make use of it.

The issue, then, is not data generation. It is data flow.

This is where industrial data flow management becomes critical, not as an additional technical layer, but as the underlying infrastructure that enables real-time visibility and decision-making. Without it, even the most valuable data remains out of reach.

Why industrial data does not flow effectively

Before diving into architectures and communication protocols, it is worth understanding why this challenge persists despite the fact that industrial organizations have relied on monitoring and control systems for decades.

The answer is simple: silos.

Historically, industrial environments were built for performance, reliability, and operational stability, not for connectivity. A Siemens PLC installed twenty years ago was never designed to communicate with an SAP ERP system. A SCADA platform running a chemical plant was not intended to stream data into a cloud-based data lake. These systems operated in isolation by design. Fewer connections meant fewer points of failure and a lower risk of security breaches.

Today, that model is increasingly at odds with the realities of connected manufacturing. Production teams need real-time performance metrics. Management wants a consolidated view of operations across multiple sites. Maintenance teams need to identify anomalies before they result in costly downtime. None of this is possible unless data can move seamlessly from the shop floor to the applications and stakeholders that depend on it, reliably, consistently, and securely.

According to the 2024 IT/OT Observatory study conducted by NXO and Cisco, which surveyed 135 industrial decision-makers, 64% of respondents identified data mobility as the top priority in their IT/OT convergence efforts. Yet only 11% had fully completed that convergence at the time of the survey, while 44% had begun the process. Progress is clearly being made, but the journey is far from complete.

Protocols: the hidden complexity behind industrial data flows

Before a pressure reading can make its way from a sensor to a monitoring platform or a cloud data pipeline, every system involved must be able to communicate using the same language. That is the role of industrial communication protocols, and despite appearances, the landscape is far from standardized.

Modbus: the legacy protocol still powering the factory floor

Originally developed by Modicon in 1979, Modbus remains one of the most widely deployed protocols in industrial environments. It uses a straightforward master-slave architecture: a controller requests data from a device, and the device responds. The introduction of Modbus TCP later allowed the protocol to adapt to Ethernet networks and remain relevant well into the digital era.

Its enduring popularity comes down to two things: simplicity and compatibility. Almost every piece of industrial equipment supports Modbus in one form or another.

The downside is that Modbus was built for a very different generation of industrial systems. It has no native security mechanisms, offers no standardized data model, and relies on a polling architecture in which the master must repeatedly query every connected device. As the number of assets increases, this approach can become a significant bottleneck.
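To make the polling model concrete, here is a minimal sketch of how a master could build Modbus TCP "Read Holding Registers" requests, one per device, for a single polling cycle. The function names, unit IDs, and register addresses are illustrative assumptions; only the frame layout (MBAP header followed by the function 0x03 request) comes from the Modbus specification. No network I/O is performed.

```python
import struct


def build_read_holding_registers(transaction_id: int, unit_id: int,
                                 start_address: int, quantity: int) -> bytes:
    """Build a Modbus TCP 'Read Holding Registers' (function 0x03) request."""
    if not 1 <= quantity <= 125:
        raise ValueError("quantity must be between 1 and 125 registers")
    # PDU: function code (1 byte), start address (2 bytes), quantity (2 bytes)
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0), length, unit id.
    # The length field counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def poll_cycle_requests(unit_ids, start_address=0, quantity=10):
    """One polling cycle: the master issues one request per device (hypothetical helper)."""
    return [
        build_read_holding_registers(tid, unit, start_address, quantity)
        for tid, unit in enumerate(unit_ids, start=1)
    ]
```

Note that the frame carries only addresses and counts: nothing in it says what the registers mean or who is allowed to read them, and the number of requests per cycle grows with the number of devices.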

For existing machine fleets, Modbus remains indispensable. But when it comes to feeding high-volume, real-time data streams into cloud platforms and analytics environments, it quickly begins to show its age.

OPC UA: the backbone of modern interoperability

OPC UA (Open Platform Communications Unified Architecture) takes a fundamentally different approach. Developed by the OPC Foundation, it is built around a hierarchical object model that allows systems to exchange not only data, but also context.

A temperature reading, for example, can be transmitted together with information about the asset it belongs to, the unit of measurement being used, and the expected operating range.

That semantic layer is what makes OPC UA so powerful. Unlike Modbus, which primarily moves raw values from one point to another, OPC UA delivers structured information that can be understood and interpreted consistently across systems.

Another major advantage is its support for publish/subscribe (Pub/Sub) communication. Instead of waiting for a controller to request data, devices can publish updates automatically whenever a value changes. Subscribers receive the information immediately, without having to poll for it. For real-time industrial applications, this represents a major shift in architecture.

OPC UA also includes built-in security features such as encryption, authentication, and message signing. This makes it particularly attractive in industries facing increasingly strict cybersecurity and compliance requirements, including those affected by regulations such as NIS 2.

The trade-off, however, is complexity.

Establishing an OPC UA connection requires several preliminary exchanges, including handshakes, security negotiations, and session creation. Compared with protocols such as Modbus or MQTT, the connection overhead is significantly higher.

In environments where bandwidth is limited or devices have constrained computing resources, that additional complexity can become a genuine obstacle.

MQTT: built for industrial IoT at scale

MQTT (Message Queuing Telemetry Transport) was designed with a very different set of priorities in mind. From the outset, it was intended for environments where connectivity is unreliable, bandwidth is scarce, and devices have limited processing power and memory.

As a result, the protocol is exceptionally lightweight. Its binary format minimizes network traffic, connection overhead is minimal, and its publish/subscribe architecture, centered around a broker, makes it ideal for distributed industrial systems.

In a typical deployment, industrial gateways or PLCs publish measurements to MQTT topics such as factory/line3/press/temperature. A broker, often running Eclipse Mosquitto or EMQX, receives those messages and distributes them to any subscribed applications, whether that is a SCADA platform, a data pipeline, or a predictive maintenance solution.
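Subscribers usually select topics with filters rather than exact names. The sketch below shows how MQTT's two wildcards behave: `+` matches exactly one topic level and `#` matches all remaining levels. It is a simplified illustration, not a broker implementation; the function name is an assumption, and special handling of topics beginning with `$` is deliberately left out.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level,
            # and it matches everything that remains (including nothing).
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without a trailing '#', both sides must have the same depth.
    return len(filter_levels) == len(topic_levels)
```

With this logic, a predictive maintenance application could subscribe to `factory/+/press/temperature` to receive press temperatures from every line, while a data pipeline subscribes to `factory/#` to receive everything.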

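One of MQTT’s greatest strengths is its resilience. If a network outage occurs, messages can be queued and delivered once connectivity is restored, helping ensure that critical operational data is not lost. In practice this depends on configuration: brokers typically hold messages published with QoS 1 or 2 for subscribers that use a persistent session, and publishing clients need their own local buffer while the link is down.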
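The queue-and-deliver behaviour can be sketched as a small store-and-forward buffer on the publishing side. This is an illustrative stand-in, not the API of any MQTT client library: the class name and the `send` callable (which is assumed to return `True` on successful delivery) are hypothetical.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Hold messages while the link is down and deliver them in order later."""

    def __init__(self, send, max_messages=1000):
        self._send = send  # assumed signature: send(topic, payload) -> bool
        # Bounded queue: when full, the oldest pending message is dropped.
        self._pending = deque(maxlen=max_messages)

    def publish(self, topic, payload):
        """Queue a message, then try to deliver everything that is pending."""
        self._pending.append((topic, payload))
        return self.flush()

    def flush(self):
        """Deliver pending messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            topic, payload = self._pending[0]
            if not self._send(topic, payload):
                break  # link still down, keep the message for the next attempt
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self):
        return len(self._pending)
```

The bounded queue is a design choice worth making explicit: on a constrained gateway, an outage that lasts long enough will eventually force a decision between dropping old data and running out of memory.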

Its weakness lies elsewhere.

MQTT provides a highly efficient transport mechanism, but it says nothing about the meaning of the data being transported. A message can contain a value associated with a specific topic, yet the protocol itself does not define what that value represents, which sensor produced it, or what unit of measurement is being used.

To address this limitation, the industrial sector has developed standards such as Sparkplug B, which adds a structured and standardized payload format on top of MQTT, bringing the semantic context that MQTT lacks by design.

In reality, OPC UA and MQTT are rarely rivals. More often, they serve different purposes within the same architecture.

OPC UA is typically used inside the plant network, where rich data models, interoperability, and security are essential. MQTT then acts as the transport layer for moving information to cloud platforms, enterprise applications, or remote sites, where efficiency, scalability, and resilience matter most.

Rather than choosing between the two, many industrial organizations are combining them to create end-to-end data pipelines that are both intelligent and efficient.

From the shop floor to the cloud: how industrial data flows are built

Understanding communication protocols is only part of the story. To see how industrial data management works in practice, you need to look at the architecture those protocols operate within.

In a modern industrial environment, data typically passes through three distinct layers before it reaches the applications that consume it.

The field layer: where data is generated

Everything starts on the shop floor.

Sensors measuring temperature, vibration, pressure, flow, current, and countless other variables continuously generate analog or digital signals. These signals are collected by control systems such as programmable logic controllers (PLCs), data acquisition units, or distributed control systems (DCSs) used in continuous-process industries. The data is then exposed through protocols such as Modbus, Profinet, or OPC UA.

A well-instrumented production line can generate thousands of measurements, with sampling rates ranging from a few milliseconds to several seconds.

In theory, all of that raw data could be sent directly to the cloud. In practice, that would be both expensive and largely unnecessary. A temperature reading that changes very slowly does not become more valuable simply because it is sampled ten times per second. At a certain point, you are collecting noise rather than insight.

The edge layer: the real control center

This is where the architecture becomes truly interesting.

Industrial edge gateways, whether ruggedized industrial PCs or dedicated devices from vendors such as Siemens, Moxa, or Advantech, serve as the bridge between operational equipment and higher-level systems.

Their first role is protocol normalization. A single gateway may ingest data from Modbus RTU devices, Profinet networks, OPC UA servers, and traditional 4-20 mA sensors, then convert everything into a common format before forwarding it upstream. In many environments, the edge layer is what allows decades-old equipment to coexist with modern cloud platforms.

Its second role is data reduction.

Rather than forwarding every measurement generated by every device, the gateway can transmit only meaningful changes, values aggregated over predefined time windows, or alerts triggered by specific thresholds. This dramatically reduces the amount of data that needs to travel across the network while preserving the information that actually matters.
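Two of the reduction strategies described above can be sketched in a few lines: a deadband filter that forwards a sample only when it differs enough from the last forwarded value, and a fixed-window aggregation. Function names, the `(timestamp, value)` sample format, and the thresholds are assumptions chosen for illustration.

```python
from statistics import mean


def deadband_filter(samples, deadband):
    """Yield only samples that differ from the last reported value by more than `deadband`."""
    last_reported = None
    for timestamp, value in samples:
        if last_reported is None or abs(value - last_reported) > deadband:
            last_reported = value
            yield timestamp, value


def window_aggregate(samples, window_seconds):
    """Group samples into fixed time windows; return (window_start, min, mean, max) tuples."""
    windows = {}
    for timestamp, value in samples:
        start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(start, []).append(value)
    return [
        (start, min(values), mean(values), max(values))
        for start, values in sorted(windows.items())
    ]
```

For a slowly changing temperature, a deadband of half a degree can turn thousands of raw samples into a handful of reported changes, while the windowed minimum, mean, and maximum keep enough shape for trend analysis.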

Increasingly, edge devices are also responsible for local analytics.

Simple anomaly-detection rules, data quality checks, and even lightweight machine learning models can now run directly on embedded ARM or x86 hardware using frameworks such as TensorFlow Lite or OpenVINO.

The benefit is straightforward: decisions can be made immediately, without waiting for data to travel to a remote cloud environment and back. In applications where milliseconds matter, that difference can be critical.
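As an example of the kind of simple rule that can run locally, the sketch below flags a reading as anomalous when it deviates from the recent rolling mean by more than a set number of standard deviations. It uses only the standard library and is not tied to TensorFlow Lite or OpenVINO; the class name, window size, and threshold are illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag values that deviate strongly from a rolling window of recent readings."""

    def __init__(self, window_size=20, threshold=3.0):
        self._window = deque(maxlen=window_size)
        self._threshold = threshold

    def update(self, value):
        """Return True if `value` looks anomalous compared with the recent window."""
        is_anomaly = False
        # Only judge once the window is full, so the baseline is meaningful.
        if len(self._window) == self._window.maxlen:
            mu = mean(self._window)
            sigma = pstdev(self._window)
            # A perfectly flat signal (sigma == 0) is never flagged by this rule.
            if sigma > 0 and abs(value - mu) > self._threshold * sigma:
                is_anomaly = True
        # Anomalous values are kept out of the window so they do not shift the baseline.
        if not is_anomaly:
            self._window.append(value)
        return is_anomaly
```

A rule like this is cheap enough for embedded hardware, and the gateway can raise an alert immediately while still forwarding the raw event upstream for deeper analysis.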

The integration layer: where data becomes usable

Once data has been standardized and filtered at the edge, it moves into the integration layer.

This is where data pipelines, event-streaming platforms, and message brokers take over.

Technologies such as Apache Kafka have become the backbone of many industrial data architectures because they provide a scalable and resilient way to move large volumes of real-time data. In a typical setup, edge gateways publish events to Kafka topics, while downstream applications, whether ERP systems, maintenance platforms, data lakes, analytics tools, or supervisory systems, subscribe only to the streams they need.

This decouples data producers from data consumers, making the entire architecture far more flexible and easier to scale.
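The decoupling idea can be illustrated with a small in-memory stand-in: producers append events to a per-topic log, and each consumer tracks its own read position. This is a conceptual sketch only. It is not the Kafka client API, and the class, method, topic, and consumer names are all hypothetical.

```python
from collections import defaultdict


class InMemoryEventLog:
    """Toy append-only log where every consumer keeps its own offset per topic."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> ordered list of events
        self._offsets = defaultdict(int)  # (consumer, topic) -> next index to read

    def publish(self, topic, event):
        """Producers only append; they know nothing about who reads the data."""
        self._topics[topic].append(event)

    def poll(self, consumer, topic):
        """Return the events this consumer has not seen yet and advance its offset."""
        start = self._offsets[(consumer, topic)]
        events = self._topics[topic][start:]
        self._offsets[(consumer, topic)] = start + len(events)
        return events
```

Because offsets belong to consumers, a new application can be added later and read the stream from the beginning without any change on the gateway side.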

Real-time processing engines such as Apache Flink and Spark Streaming can then enrich, filter, aggregate, or transform the incoming data before it reaches business applications.

For smaller sites, Apache NiFi often offers a more approachable alternative. Its visual interface allows teams to design, monitor, and manage data flows without having to build everything through code. Routing logic, encryption, and error handling can all be configured within the platform itself.

One of the most overlooked challenges at this stage is data governance.

When an edge gateway sends a temperature reading, the receiving system needs more than just the number itself. It needs context. Is the value a temperature, pressure, or flow measurement? Is it expressed in degrees Celsius or Fahrenheit? Which asset generated it? Which production line does it belong to?

Without that context, even perfectly transmitted data quickly loses its value.

This is why metadata management has become such an important part of modern industrial architectures. In the Kafka ecosystem, this often takes the form of a schema registry that defines how data should be interpreted across systems.
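What a shared schema buys can be shown with a minimal sketch: a message is accepted only if it carries the agreed fields with the agreed types. The field names and the `validate` helper are assumptions made for illustration; a real schema registry typically works with formats such as Avro, JSON Schema, or Protobuf and also handles versioning, which this sketch does not.

```python
# Hypothetical schema for a temperature reading: field name -> expected Python type.
TEMPERATURE_SCHEMA_V1 = {
    "asset_id": str,
    "line": str,
    "quantity": str,   # e.g. "temperature"
    "unit": str,       # e.g. "degC"
    "value": float,
    "timestamp": str,  # ISO 8601 string, by convention in this sketch
}


def validate(message: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the message conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            actual = type(message[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in message:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

A bare `{"value": 72.4}` fails this check on five missing fields, which is exactly the point: the number alone is not enough for a downstream system to interpret it.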

Without a shared understanding of data structures and meanings, pipelines gradually become opaque and difficult to maintain. Teams can no longer confidently trace where information originated, how it has been transformed, or whether it can be trusted.

The result is a problem that many industrial organizations know all too well: the data swamp.

Unlike a well-governed data lake, a data swamp is full of information but short on understanding. Data accumulates faster than it can be organized, documented, or contextualized. Eventually, the challenge is no longer collecting data. It is figuring out what any of it actually means.

The real challenges teams face on the ground

Layered architectures look elegant on paper. Real-world industrial projects are rarely that neat.

The biggest obstacles are rarely purely technical. They stem from legacy infrastructure, organizational silos, cybersecurity requirements, and data quality issues that have accumulated over years, sometimes decades.

A patchwork of old and new equipment

Virtually every factory operates a mix of technologies from different eras.

It is not unusual to find PLCs installed in the 1990s still running reliably alongside drives commissioned in 2012 and IoT sensors deployed just a few years ago. Many of these older systems support nothing more than Modbus RTU over RS-485, with no Ethernet connectivity whatsoever.

Integrating them into a modern data architecture often requires protocol converters, intermediary gateways, and occasionally a fair amount of creative engineering. It also tends to cost more and take longer than initially expected.

The challenge is not that legacy equipment cannot be connected. It is that every exception adds another layer of complexity to the overall architecture.

Fragmented ownership of industrial data

A surprisingly simple question often reveals a deeper problem:

Who owns the data generated by a production line?

Is it the maintenance team responsible for the PLCs? The IT department managing the network infrastructure? The process engineering team using the data to improve quality and performance?

In many facilities, there is no clear answer.

As a result, data models go undocumented, measurement quality is rarely monitored systematically, and integration projects evolve through a succession of workarounds rather than a coherent long-term strategy.

Over time, these small compromises accumulate until nobody fully understands how the data ecosystem actually works.

Cybersecurity is often addressed too late

As operational technology (OT) networks become increasingly connected to IT systems, the attack surface expands significantly.

Legacy protocols such as Modbus were never designed with cybersecurity in mind. They provide no built-in authentication, encryption, or access control mechanisms. If network segmentation is inadequate, any device connected to the network may be able to communicate directly with a Modbus-enabled controller.

This was once considered an acceptable risk in isolated industrial environments. It is no longer.

Regulations such as NIS 2 are forcing many industrial organizations to take a far more rigorous approach to cybersecurity. That means understanding how data flows through the organization, documenting those flows, and assessing the risks associated with every connection between OT and IT systems.

For many companies, the challenge is not implementing security controls. It is first gaining visibility into an architecture that has evolved organically over many years.

Data quality issues start at the source

Even the most sophisticated data pipeline cannot fix bad data.

Poorly calibrated sensors, incorrect timestamps, undocumented variables, and inconsistent naming conventions all create problems that no amount of downstream analytics can fully solve.

Yet data quality at the field level is rarely audited with the same rigor applied to production processes themselves.

The consequences are familiar: sensor drift that goes unnoticed, outliers that contaminate historical datasets, and cryptic Modbus tags that make it impossible to determine what a particular measurement actually represents.

Addressing these issues is rarely glamorous work. In fact, it is often one of the least visible parts of an industrial data initiative. It is also one of the most valuable.
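A few of these checks are simple enough to automate close to the source. The sketch below flags out-of-range values, timestamps that do not increase, and values that repeat suspiciously often. The function name, flag labels, and thresholds are assumptions; real limits would come from the documented operating range of each sensor.

```python
def quality_flags(readings, low, high, max_repeats=5):
    """Scan (timestamp, value) readings and return a list of (index, flag) findings."""
    flags = []
    previous_ts = None
    previous_value = None
    repeats = 0
    for index, (ts, value) in enumerate(readings):
        # Physically implausible values usually point to wiring or scaling errors.
        if not low <= value <= high:
            flags.append((index, "out_of_range"))
        # Timestamps that stall or go backwards often indicate clock problems.
        if previous_ts is not None and ts <= previous_ts:
            flags.append((index, "timestamp_not_increasing"))
        # An exactly repeated value, many times in a row, may be a stuck sensor.
        repeats = repeats + 1 if value == previous_value else 0
        if repeats == max_repeats:
            flags.append((index, "possible_stuck_sensor"))
        previous_ts, previous_value = ts, value
    return flags
```

Checks like these do not repair bad data, but they make problems visible before the readings are written into historical datasets.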

What organizations should do in practice

Despite the complexity, the projects that succeed tend to follow a few common principles.

Map first, connect second

Before installing a single gateway or deploying a new platform, organizations need a clear picture of their existing environment.

Which assets are connected? Which protocols do they support? Which variables are available? How frequently are measurements generated?

This inventory often forms the foundation of the entire project. Without it, integration efforts become little more than educated guesswork.

Many organizations now use industrial network discovery tools to automate part of this process, but the goal remains the same: understand the landscape before attempting to modernize it.
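The inventory itself does not need sophisticated tooling to be useful. As a minimal sketch, each asset can be recorded with the answers to the questions above and then summarized; the `Asset` fields and the helper names below are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    protocols: list   # e.g. ["Modbus RTU"] or ["OPC UA", "Modbus TCP"]
    connected: bool   # is the asset reachable from the data infrastructure today?
    variables: dict = field(default_factory=dict)  # variable name -> sampling period in seconds


def protocol_summary(assets):
    """Count how many assets support each protocol."""
    return Counter(protocol for asset in assets for protocol in asset.protocols)


def unconnected(assets):
    """List the assets that still need a gateway or converter."""
    return [asset.name for asset in assets if not asset.connected]
```

Even a summary this simple answers two planning questions early: which protocols the gateways must support, and which machines will need extra integration work.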

Standardize on open protocols whenever possible

Every new equipment purchase is an opportunity to reduce future integration challenges.

Specifying OPC UA compatibility as a procurement requirement, for example, represents a relatively small investment that can deliver significant flexibility over the lifespan of the asset.

For existing equipment, protocol gateways often provide a practical alternative. Translating Modbus data into MQTT or OPC UA is typically far less expensive than replacing functional machinery simply to achieve connectivity.

Treat data governance as a core project, not an afterthought

Many industrial organizations focus heavily on infrastructure while underestimating the importance of governance.

Yet decisions such as how MQTT topics are named, how OPC UA namespaces are structured, which metadata must be recorded, and how normal operating ranges are documented have a direct impact on the long-term maintainability of the system.

These tasks may seem administrative, but they determine whether a data architecture remains usable five years from now or gradually becomes impossible to manage.
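A naming convention only helps if it is enforced. As one possible sketch, the check below accepts MQTT topic names made of exactly four lowercase levels, in the style of factory/line3/press/temperature. The four-level structure and allowed characters are assumptions for illustration, not a standard.

```python
import re

# Assumed convention: site/line/asset/measurement, lowercase letters, digits, '_' or '-'.
TOPIC_PATTERN = re.compile(r"[a-z0-9_-]+(?:/[a-z0-9_-]+){3}")


def is_valid_topic(topic: str) -> bool:
    """Return True if the topic name follows the assumed four-level convention."""
    return TOPIC_PATTERN.fullmatch(topic) is not None


def invalid_topics(topics):
    """Return the topic names that break the convention, in their original order."""
    return [topic for topic in topics if not is_valid_topic(topic)]
```

Running a check like this when gateways are configured is far cheaper than renaming topics once dashboards, pipelines, and alerts depend on them.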

Separate OT and IT networks properly

Modern industrial architectures require data to move between production systems and enterprise applications. That does not mean the two environments should be directly connected.

A properly designed industrial DMZ provides a controlled boundary between OT and IT networks. Data can flow from production systems to business applications through carefully managed gateways, filtering mechanisms, or one-way communication channels, while preventing direct access to critical control systems.

The objective is simple: enable visibility without compromising operational security.

Start small, document everything, then scale

The most successful industrial data projects rarely begin with an ambitious effort to connect an entire organization at once.

Instead, they start with a single production line, a specific workshop, or a limited group of assets. The architecture is documented thoroughly, processes are tested, and lessons are learned before expanding to additional sites or systems.

This approach may seem slower initially, but it almost always proves faster in the long run.

Complexity compounds quickly in industrial environments. Starting with a manageable scope allows organizations to build repeatable patterns before scaling them across the business, rather than becoming overwhelmed by the challenge of connecting everything at once.

Key takeaways

Industrial data flow management encompasses all the mechanisms used to collect, transport, filter, normalize, and distribute data from production equipment to the systems and applications that need it.
According to Forrester, around 70% of industrial data goes unused, largely due to inadequate architectures and weak data governance.
Modbus, OPC UA, and MQTT serve different purposes and are often complementary rather than interchangeable. Modbus remains essential for legacy systems, OPC UA provides rich semantics and strong security for local industrial networks, while MQTT is ideally suited for moving data efficiently to cloud and remote platforms.
The edge layer is the cornerstone of any serious industrial data architecture. It handles protocol translation, data filtering, volume reduction, and increasingly, local processing through embedded analytics and machine learning.
Apache Kafka has become the backbone of many high-volume industrial data pipelines, often combined with Apache Flink or Spark Streaming for real-time stream processing, while Apache NiFi offers a more approachable option for smaller sites.
The biggest barriers to success are usually organizational rather than technical. Common challenges include weak data governance, unclear ownership between IT and OT teams, and poor visibility into sensor and data quality.
According to the 2024 NXO/Cisco IT/OT Observatory, 64% of industrial decision-makers consider data mobility the top priority in their IT/OT convergence initiatives. Yet only 11% have fully completed that transformation.
Security should be designed into the architecture from the outset, not added later. Network segmentation, controlled data flows, and protocols with built-in encryption and authentication are now essential components of modern industrial environments.

Sources

NXO & Cisco, IT/OT Observatory 2024 (135 industrial decision-makers surveyed)
Forrester Research, industrial data utilization studies
Linux Embedded, edge computing and industrial data pipeline architectures (2025)
OPC Foundation, OPC UA specifications and technical documentation
