Part 2: Why ROS Switched From TCPROS to DDS
A deep-dive into DDS for ROS. Did Open Robotics make the right call?
Introduction: Enter DDS
In Part 1, we explored how ROS 1’s transport layer, TCPROS, became a bottleneck for real-time, production robots. It lacked support for multicast and fine-grained Quality of Service (QoS), and it introduced failure risks that worried commercial customers.
Open Robotics knew they had to rebuild their transport layer, but this time, they didn’t want to build it themselves.
The new middleware needed to be:
Built for real-time performance
Efficient enough for embedded systems
Scalable to distributed systems with many nodes
Reliable and fault-tolerant
Widely adopted and proven in production environments
They adopted DDS, a commercially proven pub/sub middleware used in aviation, defense, and industrial automation.
In this post, we’ll break down what DDS is, how it works, and why it was the right backbone for ROS 2.
What is DDS?
DDS, short for Data Distribution Service, is an open standard for pub/sub communication. It was developed in the early 2000s by the Object Management Group (OMG), the same organization that developed CORBA.
DDS wasn’t initially designed for robotics, but its early use cases shared many requirements with robots. From avionics to air traffic control to defense, the systems it served were time-sensitive and mission-critical.
Unlike a niche academic protocol, DDS was already battle-tested. Companies like RTI and PrismTech (now ADLINK) had been commercializing and scaling DDS implementations in production for a decade. If the world’s biggest governments and enterprises could trust DDS, so could the fledgling ROS!
That maturity and track record made DDS an attractive choice for ROS 2.
How DDS Works
DDS is fully decentralized. There is no broker, no master, and no other centralized service. Every participant handles discovery, negotiation, and communication on its own.
Let’s walk through what happens when a ROS 2 node wants to publish or subscribe to a topic using DDS.
DomainParticipants
When a node wants to publish or subscribe to a topic, it creates a DomainParticipant. This is the core object in DDS. It represents a node joining a specific domain, which can be thought of as a namespace.
From this participant, the node can create:
DataWriters, publishing messages to a topic
DataReaders, subscribing to messages from the topic
Each participant describes its communication preferences: the topic name, the QoS policy, and the message type. In ROS 2, message types are written in .msg files, which the build tooling translates into an Interface Definition Language (IDL) that DDS understands.
If a DataWriter and a DataReader agree on the topic name and message type, and their QoS policies are compatible, they connect and begin transferring data.
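To make the matching step concrete, here is a minimal sketch in plain Python. It is not the DDS API: the Endpoint class and endpoints_match function are hypothetical names, and QoS is reduced to a single reliability setting. The one rule it borrows from DDS is that QoS is matched as "offered versus requested", so a writer must offer at least what the reader asks for.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    # Ordered so that a higher value means a stronger guarantee.
    BEST_EFFORT = 0
    RELIABLE = 1


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for what a DataWriter or DataReader advertises."""
    topic: str
    type_name: str
    reliability: Reliability


def endpoints_match(writer: Endpoint, reader: Endpoint) -> bool:
    """Return True if the writer and reader would be paired.

    Simplified rule: same topic, same type, and the writer offers at
    least the reliability the reader requests.
    """
    return (
        writer.topic == reader.topic
        and writer.type_name == reader.type_name
        and writer.reliability >= reader.reliability
    )


writer = Endpoint("/scan", "sensor_msgs/msg/LaserScan", Reliability.BEST_EFFORT)
relaxed_reader = Endpoint("/scan", "sensor_msgs/msg/LaserScan", Reliability.BEST_EFFORT)
strict_reader = Endpoint("/scan", "sensor_msgs/msg/LaserScan", Reliability.RELIABLE)

print(endpoints_match(writer, relaxed_reader))  # True
print(endpoints_match(writer, strict_reader))   # False
```

The second case is the classic gotcha: a best-effort publisher and a reliable subscriber share a topic and a type, yet never connect, because the reader requests more than the writer offers.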
Discovery via Multicast
When a participant spins up, it announces itself over the network using multicast. Multicast allows a message to be delivered to many listeners simultaneously without opening separate connections for each one.
This mechanism powers Simple Discovery, which DDS vendors implement out of the box.
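Where do those announcements go? As a rough sketch, the snippet below computes the well-known discovery ports from a domain ID, using what I understand to be the default constants in the RTPS specification (port base 7400, domain gain 250, participant gain 2) and the default multicast group 239.255.0.1. The function names are my own, and vendors let you override these values, so treat the numbers as defaults rather than guarantees.

```python
# Default port-mapping constants, as given in the RTPS specification.
PORT_BASE = 7400         # PB
DOMAIN_GAIN = 250        # DG: port gap between consecutive domain IDs
PARTICIPANT_GAIN = 2     # PG: port gap between participants on one host
D0_MULTICAST_OFFSET = 0  # d0: offset for discovery multicast traffic
D1_UNICAST_OFFSET = 10   # d1: offset for discovery unicast traffic

DISCOVERY_MULTICAST_GROUP = "239.255.0.1"


def discovery_multicast_port(domain_id: int) -> int:
    """Port on which every participant in a domain listens for announcements."""
    return PORT_BASE + DOMAIN_GAIN * domain_id + D0_MULTICAST_OFFSET


def discovery_unicast_port(domain_id: int, participant_id: int) -> int:
    """Per-participant port used for directed discovery traffic."""
    return (
        PORT_BASE
        + DOMAIN_GAIN * domain_id
        + D1_UNICAST_OFFSET
        + PARTICIPANT_GAIN * participant_id
    )


for domain in (0, 1):
    print(domain, DISCOVERY_MULTICAST_GROUP, discovery_multicast_port(domain))
# 0 239.255.0.1 7400
# 1 239.255.0.1 7650
```

This is also why two robots on different domain IDs ignore each other: their participants announce themselves on different ports.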
What Transport Does DDS Use?
Most DDS implementations (including ROS 2’s) default to UDP as their transport, running a wire protocol called RTPS (Real-Time Publish-Subscribe) on top of it.
UDP (User Datagram Protocol) typically has lower latency than TCP because it carries minimal overhead:
No connection handshake
No guaranteed delivery
No in-order packet enforcement
This is why it’s widely used in live video streaming, where it’s acceptable to drop a frame here and there as long as the stream stays smooth.
As we discussed, TCP suffers from head-of-line (HOL) blocking. In UDP, packets can arrive out of order, and that’s okay. It trades perfect reliability for speed. And in robotics, this is the tradeoff we prefer!
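Here is a toy model of that difference, assuming a single packet gets lost and arrives late after a retransmission. The delivery_times function is hypothetical and only compares when the application sees each packet under in-order delivery (TCP-like) versus as-it-arrives delivery (UDP-like). It is not a network simulation.

```python
from typing import Dict


def delivery_times(arrivals: Dict[int, float], in_order: bool) -> Dict[int, float]:
    """Map each sequence number to the time the application sees it.

    arrivals: sequence number -> time (ms) the packet reaches the receiver.
    in_order: if True, a packet is held until every earlier one is delivered.
    """
    if not in_order:
        # Each packet is handed over the moment it arrives.
        return dict(arrivals)

    delivered = {}
    released_at = 0.0
    for seq in sorted(arrivals):
        # A packet cannot be released before its predecessors.
        released_at = max(released_at, arrivals[seq])
        delivered[seq] = released_at
    return delivered


# Packet 2 is lost once and only shows up at 250 ms after a retransmission.
arrivals = {1: 10.0, 2: 250.0, 3: 30.0, 4: 40.0}

print(delivery_times(arrivals, in_order=True))
# {1: 10.0, 2: 250.0, 3: 250.0, 4: 250.0}
print(delivery_times(arrivals, in_order=False))
# {1: 10.0, 2: 250.0, 3: 30.0, 4: 40.0}
```

With in-order delivery, packets 3 and 4 sit in a buffer for more than 200 ms behind packet 2. For a stream of sensor readings, where only the freshest sample matters, that wait is pure loss.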
The awesome part about DDS is that you can control the knob between speed and reliability. You can configure retries, ordering, and other settings via QoS Policies.
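The knob can be illustrated with another small simulation. The send_all function below is a hypothetical helper that pushes messages through a channel dropping every Nth transmission. In "reliable" mode it retries until a message gets through, and in "best effort" mode it moves on. Real DDS reliability works through heartbeats and acknowledgements rather than a blocking retry loop, so this only shows the shape of the tradeoff.

```python
from typing import List, Tuple


def send_all(messages: List[int], drop_every: int, reliable: bool) -> Tuple[List[int], int]:
    """Send messages over a lossy channel.

    The channel deterministically drops every `drop_every`-th transmission.
    Returns (messages that got through, total transmissions used).
    """
    if drop_every < 2:
        raise ValueError("drop_every must be at least 2, or nothing gets through")

    delivered = []
    transmissions = 0
    for msg in messages:
        while True:
            transmissions += 1
            lost = transmissions % drop_every == 0
            if not lost:
                delivered.append(msg)
                break
            if not reliable:
                break  # best effort: no retry, move on to the next message
    return delivered, transmissions


messages = list(range(10))

print(send_all(messages, drop_every=4, reliable=False))
# ([0, 1, 2, 4, 5, 6, 8, 9], 10)
print(send_all(messages, drop_every=4, reliable=True))
# ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 13)
```

Best effort uses exactly one transmission per message and loses two of them. Reliable delivers everything at the cost of three extra transmissions and the waiting that comes with them. In ROS 2, this choice corresponds to the reliability QoS policy.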
How Do ROS 2 and DDS Interact?
ROS 2 doesn’t talk to DDS directly. Instead, it uses an abstraction layer called the ROS Middleware Interface (rmw).
This layer acts as a bridge between ROS 2’s client libraries and the underlying DDS implementation. It lets ROS developers write portable code without being locked into a certain DDS vendor.
Several DDS implementations are available for ROS 2, including:
eProsima Fast DDS (ROS 2 default): open source, widely used, good community support
open source, low-latency, well-suited for embedded and real-time use cases
commercial, mature, great tools and support, often used in safety-critical systems
All of these vendors implement the same OMG DDS spec, but they differ in performance, licensing, tooling, and support.
At the top of the stack, ROS 2 client libraries, like rclcpp for C++ and rclpy for Python, plug into rmw, which in turn delegates to the configured DDS backend.
This gives us a clear separation of concerns.
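A stripped-down sketch of that layering is below. The Middleware class and its two backends are hypothetical and only mimic the pattern. The real pieces are the RMW_IMPLEMENTATION environment variable, which ROS 2 reads to pick a backend, and the package names rmw_fastrtps_cpp and rmw_cyclonedds_cpp.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Mapping, Optional, Type


class Middleware(ABC):
    """Hypothetical stand-in for the interface rmw exposes to client libraries."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> str:
        ...


class FastDdsBackend(Middleware):
    def publish(self, topic: str, payload: bytes) -> str:
        return f"fastdds -> {topic} ({len(payload)} bytes)"


class CycloneDdsBackend(Middleware):
    def publish(self, topic: str, payload: bytes) -> str:
        return f"cyclonedds -> {topic} ({len(payload)} bytes)"


# Keys are real rmw package names; the classes they map to are illustrative.
BACKENDS: Dict[str, Type[Middleware]] = {
    "rmw_fastrtps_cpp": FastDdsBackend,
    "rmw_cyclonedds_cpp": CycloneDdsBackend,
}
DEFAULT_BACKEND = "rmw_fastrtps_cpp"


def load_middleware(env: Optional[Mapping[str, str]] = None) -> Middleware:
    """Pick a backend from RMW_IMPLEMENTATION, falling back to the default."""
    env = os.environ if env is None else env
    name = env.get("RMW_IMPLEMENTATION", DEFAULT_BACKEND)
    if name not in BACKENDS:
        raise ValueError(f"unknown middleware: {name}")
    return BACKENDS[name]()


def client_code(middleware: Middleware) -> str:
    """Application code: identical no matter which backend is loaded."""
    return middleware.publish("/chatter", b"hello")


print(client_code(load_middleware({})))
# fastdds -> /chatter (5 bytes)
print(client_code(load_middleware({"RMW_IMPLEMENTATION": "rmw_cyclonedds_cpp"})))
# cyclonedds -> /chatter (5 bytes)
```

Note that client_code never mentions a vendor. Swapping the backend is a configuration change, not a code change.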
Comparing DDS and TCPROS
So, how do DDS and TCPROS stack up?
DDS Advantages:
Real-time capable: No HOL blocking and low latency
Tunable QoS: Can prioritize critical messages, tune delivery guarantees, or set liveliness expectations
Multicast support: One message to many recipients without duplication
No ROS Master: Discovery is decentralized and removes a single point of failure
DDS Downsides:
Third-party vendor: You have to choose a vendor, which can mean jumping through integration hoops and, for commercial options, paying for a license
Increased complexity: QoS profiles and discovery settings have a steep learning curve
Harder to debug: Decentralized systems with UDP can be tricky to trace
Still, this tradeoff was more than worth it.
ROS 2 needed a middleware built for real-time, distributed robotics. Moreover, DDS was the right choice for commercial customers with custom, high-performance setups.
The Bet Paid Off
The switch to DDS wasn’t a small one. It broke compatibility with ROS 1. It introduced new complexity. It reset the learning curve for developers and integrators. But it was the right call.
ROS 2 continues to be the default robotics framework in the industry (though it now faces a crop of Rust-based competitors)! From warehouse fleets to surgical robots to defense companies, ROS 2 has scaled well to the most stringent use cases.
The Magic 8 Ball says that OSRF made the right decision back in 2014. And for that, they deserve their kudos!