Communication over the kinds of Data-Links used for unmanned vehicles presents important challenges due to the low bandwidth, intermittent connectivity, and lower reliability of these links. Classic network protocols such as TCP do not operate well in this environment, forcing application developers to implement their own reliability and session management. This presentation describes the issues and alternatives.
3. UAVs part of larger integrated network
Vehicle LAN
Data Link
Ground Station
LAN
Avionics
Net Centric GIG
Tactical
Backbone
Real-Time
Ground Station
Backend
WAN
4. Characteristics of UAV Communications
In-Vehicle comm.
Data Link comm.
Ground Station comm.
Net-Centric Backbone comm.
5. Inside Vehicle Communications
Deeply-embedded, low-power
– Limited CPU speed
– General Purpose Processors and FPGAs
Memory constrained devices
– Limited RAM
– Flash filesystem or none
Dedicated IPC transports
– Back plane
Certification requirements
– DO-178B Operating Systems
Challenging
Environment to
operate on!
6. Data-Link Communications
Multiple traffic types:
– Sensor data streams
– Command & Control data
– Status, Intelligence, Mission, Supervisory
Different traffic requirements for each type:
– Urgency, Priority, Reliability, Volume
– Stealth operations
Challenging communications channel:
– Large latency, low throughput channel
– Lossy links
– Disconnections
– Asymmetric bandwidth (downlink vs uplink)
7. Data Link Types & Requirements
High Reliability (HRDL)
– Use: C2 Data, Status Data (position, attitude)
– Reqs: Moderate Throughput, High Avail., High Integrity
High Capacity (HCDL)
– Use: Sensor Data
– Reqs: High Throughput (streaming data)
Beyond Line of Sight
– Use: Relay xfer: High Altitude Platform or UAV
– Reqs: Aggregate Performance for HRDL + HCDL
Back Up
– Use: C2 and Status data transfer in emergency
– Reqs: Low Throughput, High reliability, High integrity
9. Ground Station Characteristics
Heterogeneous system
– RT storage & processing of sensor data
– Integration with display
– Integration with C2 / supervision systems
– Integration with net-centric back end
Multiple programming languages:
C/C++/Java/.NET
Multi-Platform: Linux/Windows/Embedded
Modular, reconfigurable
Varying assignments of Ground-Station to UAV
10. Ground Station Requirements
Be able to handle and adapt a variety of:
– CPUs / Computer platforms
– Traffic flows
– Programming Languages
– Operating Systems
Provide a modular framework
– Support reconfiguration
– Support evolution, extensibility
– Support SOA tenets
Support operational use-cases
– Link fallback
– Multi-station handoff
13. Outline
UAV Communication Requirements
Why TCP-based solutions do not work
Implementing your own data-link protocol
Using middleware (DDS) for the Data-Link
Conclusions
14. TCP-based solutions do not work
for the Data Link
TCP has fundamental problems in the Data-Link…
– Un-tunable timers & congestion control algorithm
– Bad behavior on lossy networks & networks with dropouts
– Bad behavior on large latency links
The consequences are:
– Protocol Problems
Head of line blocking
Brittle connection-oriented model
Byte-oriented. Lacks prioritization
Inflexible reliability model. Not stealth
– Performance problems
Slow connect
Low link utilization
15. TCP problem: Head-of-line blocking
TCP funnels all traffic over single reliable stream
A Byte cannot be delivered until all previous Bytes
have been received
– A lost packet will “block” all future traffic until that
packet is repaired
– A large message will “block” all future traffic until it is
completely delivered
– IMAGE: Broken Bicycle blocking race car
– IMAGE: Large Tractor blocking race car
TCP’s “stream-oriented” reliability model not
suitable for Data Link
16. TCP issues: Brittle connections
TCP relies on hard-coded timers to establish connections
– SYN messages must be answered before timeout
– SYN timer is 3 secs with doubling exponential backoff: 3s, 6s, 12s,…
– Implementations give up after fixed number of attempts
– Large latencies (> 60 sec) cause every TCP connection attempt to fail
– Some TCP implementations fail sooner: e.g., 9 sec for Windows
TCP is bad at detecting disconnections
– To detect connection liveliness must use KEEP_ALIVE option
System-wide timeout defaults to 2 hours
– Common solution is periodic application messaging
Detection time non-deterministic. Order of minutes
TCP connection failure is drastic
– All state is lost
– No knowledge of what messages were delivered or not
– Applications must do their own message framing, sequence numbering and
acknowledgment to be able to continue upon re-connect
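The connection-timer arithmetic above can be sketched as a small model. This is a simplified illustration, not the behavior of any particular TCP stack: the function names, the default of five attempts, and the assumption that a SYN-ACK is accepted whenever it arrives before the last timer expires are all assumptions made for the example.

```python
def syn_retry_schedule(initial_timeout_s=3.0, max_attempts=5):
    """Return (send_time, deadline) pairs for each SYN attempt.

    Each attempt is sent when the previous timer expires, and its timer
    is twice as long as the previous one (3 s, 6 s, 12 s, ...).
    """
    schedule = []
    send_time = 0.0
    timeout = initial_timeout_s
    for _ in range(max_attempts):
        schedule.append((send_time, send_time + timeout))
        send_time += timeout
        timeout *= 2
    return schedule


def give_up_time(initial_timeout_s=3.0, max_attempts=5):
    """Time after the first SYN at which the implementation gives up."""
    return syn_retry_schedule(initial_timeout_s, max_attempts)[-1][1]


def connect_succeeds(rtt_s, initial_timeout_s=3.0, max_attempts=5):
    """Simplified model: the reply to the first SYN must arrive before give-up."""
    return rtt_s <= give_up_time(initial_timeout_s, max_attempts)


if __name__ == "__main__":
    print(syn_retry_schedule(max_attempts=3))
    print(give_up_time(max_attempts=2))
    print(connect_succeeds(rtt_s=120.0))
```

With two attempts the model gives up after 3 + 6 = 9 seconds, which matches the 9 sec figure quoted above; the number of attempts a real stack makes is implementation specific.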
17. TCP issue: Low bandwidth use
‘Perfect storm’ for TCP protocol:
– TCP slow start
Ramp-up time ~ RTT*log(BW)
– RTT – roundtrip time (2 x latency)
– BW -- bandwidth
– Insufficient TCP buffersize given large RTT
To utilize a given BW TCP needs buffersize ~ BW*RTT
For 10 Mbps and 500msec buffsize ~ 640KB!
Typical Available/Configured TCP buffsize far smaller!
– TCP congestion-control algorithm misinterprets packet loss as
a sign of congestion
End result: long ramp-up times, low and/or unstable
bandwidth use
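To make the ramp-up estimate concrete, the sketch below counts how many round trips a window that doubles every RTT needs before it covers the bandwidth-delay product. The 1460-byte segment size and the one-segment initial window are assumptions for illustration; real stacks differ.

```python
import math


def slow_start_rtts(bandwidth_bps, rtt_s, mss_bytes=1460, initial_segments=1):
    """Round trips needed for a doubling window to cover BW * RTT."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    target_segments = math.ceil(bdp_bytes / mss_bytes)
    window = initial_segments
    rtts = 0
    while window < target_segments:
        window *= 2  # slow start doubles the window once per round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    rtts = slow_start_rtts(10e6, 0.5)
    print(rtts, "round trips,", rtts * 0.5, "seconds")
```

Under these assumptions the 10 Mbps, 500 msec link needs 9 round trips, about 4.5 seconds, before the sender can fill the link, and any loss during that time restarts part of the ramp.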
18. Details: Insufficient buffersize
TCP flow control is based on a send “window size”
– Send Window determines how much data can be outstanding (i.e.,
unacknowledged) in the network.
Long-delay networks require large send windows to hold a
large amount of “in flight data” without blocking the sender
– DataInTransit ~ bandwidth X delay
Operating Systems limit/hardcode window size.
– TCP standard limits window to 64 KB (in practice 32KB due to
signed arithmetic)
– Required windows are much larger:
RTT 0.8 s, BW 1.54 Mbps requires 154 KB
New “large-window” TCP extension (TCP-LW) allows
windows up to 2^30 bytes (1 GB)
– But that makes the slow-start problem bigger…
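The window arithmetic on this slide can be checked with two one-line formulas: the window needed to keep a link full, and the throughput ceiling imposed by a fixed window. The function names are illustrative only.

```python
def required_window_bytes(bandwidth_bps, rtt_s):
    """Data in transit ~ bandwidth x delay, expressed in bytes."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes, rtt_s):
    """A sender can have at most one window outstanding per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # The example from the slide: RTT 0.8 s, 1.544 Mbps link.
    print(required_window_bytes(1.544e6, 0.8))   # about 154 KB
    # Ceiling with the classic 64 KB window on the same link.
    print(max_throughput_bps(65535, 0.8))        # roughly 0.66 Mbps
```

With a 64 KB window the 1.544 Mbps link can be driven at well under half its capacity, which is the low link utilization described above.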
19. Details: Congestion control
TCP congestion avoidance is bad for lossy or long latency links:
– Mistakenly interprets packet loss as congestion
– Excessively long ramp-up for new connections
RED (random early detection) gateways require each gateway to monitor its own
queue length. When imminent congestion is detected the TCP sender is notified. By
dropping a packet earlier than it would normally, RED sends an implicit notification of
congestion. The sender is effectively notified by the timeout of this packet. The
principle behind the RED approach is that a few earlier-than-usual drops may help
avoid more packet drops later on. The TCP sender can then reduce its window
before serious congestion occurs.
In TCP Vegas the TCP sender predicts when congestion is about to occur and
reduces its transmission window before intermediate routers drop packets
– TCP can keep track of the minimum round trip time seen during a transfer and use the most
recently observed round trip time to compute the data queued in the network.
– TCP can also keep track of the throughput before and after the congestion window
changes to estimate the network congestion level.
– If estimates indicate that the number of packets queued in the network is rising, it reduces
the congestion window. As it observes the number decreasing it increases the congestion
window.
Although neither approach has been widely adopted, both hold promise for satellite
networks. As we mentioned earlier, TCP congestion control responds to congestion
slowly because of latency. If such congestion can be avoided before it happens, it is
a big win for high-speed and long-delay networks.
20. TCP issue: reliability & congestion control
TCP acknowledgment is non-selective & blunt
– If a segment is lost, TCP will retransmit all data starting
from the lost segment without regard to the successful
transmission of later segments.
TCP congestion control fooled by lossy networks
– TCP considers this lost segment as an indication of
congestion and cuts its window size in half
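The effect of halving the window on every loss can be illustrated with a toy additive-increase, multiplicative-decrease loop. This models only the congestion-avoidance phase with a loss at fixed intervals; the parameters and the function name are assumptions for the example and it is not a faithful TCP simulation.

```python
def aimd_average_window(max_window, loss_every_n_rtts, rtts=1000):
    """Average window (in segments) when a loss occurs every N round trips.

    Pass loss_every_n_rtts=0 to model a loss-free link.
    """
    window = 1.0
    total = 0.0
    for t in range(1, rtts + 1):
        total += window
        if loss_every_n_rtts and t % loss_every_n_rtts == 0:
            window = max(1.0, window / 2)            # loss read as congestion
        else:
            window = min(max_window, window + 1.0)   # grow one segment per RTT
    return total / rtts


if __name__ == "__main__":
    print(aimd_average_window(100, 0))    # loss-free link
    print(aimd_average_window(100, 10))   # one random loss every 10 RTTs
```

In this model a link that loses one packet every ten round trips settles into a sawtooth far below the 100-segment limit, even though the losses have nothing to do with congestion.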
21. TCP issue: chatty reliability protocol
TCP reliability requires constant ACKs from
receiver
– Even if all messages are received…
ACK traffic consumes power and bandwidth
ACK traffic prevents stealth operations (can
reveal position of ACKer)
Other protocols (best efforts or NACK only) may
be better suited…
22. Summary TCP protocol problems
TCP is inflexible
TCP protocol not well suited for Data Link
– Low performance
– Incorrect behavior
NASA and others have tried to spearhead
efforts to modify TCP…
– Research on “delay tolerant” networks
– Research on TCP: HACK, SACK, Trunk protocols
These efforts remain in the research domain
TCP “one size fits all” Qos not suitable for Data Link
24. Implementing your own data-link protocol
Session management
Data stream management
Buffering
Traffic Prioritization/Shaping
Fragmentation / Reassembly
Reliability
Redundant links/failover
25. General Architecture
To solve the reliability, flow control, and
disconnection issues we need:
– Data buffers at both ends
– a reliable comm. protocol sends the data from the
send buffer to the receive buffer
Sender
Application
Receiver
Application
Reliability Protocol
Send Buffer Receive Buffer
26. General Architecture (2)
To avoid head-of-line blocking we need
– Separate buffers for each traffic type
– Separate reliable data streams for each traffic type,
each should have its own separate session
Sender
Application
Receiver
Application
Each traffic type has its own session
Send Buffer Receive Buffer
27. Reliable Protocol
At a minimum the reliability protocol must
– Identify each message with sessionId and a
sequence number
– Send periodic Heartbeats announcing which
sequence numbers should have been received
– Accept ACKs to record the messages and clear
from send buffer
– Accept NACKs for sequence numbers and send the
requested repairs
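The minimum sender-side behavior listed above can be sketched as follows. The message layouts and names (`Data`, `Heartbeat`, `Sender`) are hypothetical and chosen for illustration; they are not the wire format of any existing protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Data:
    session_id: int
    seq: int
    payload: bytes


@dataclass
class Heartbeat:
    session_id: int
    first_seq: int  # oldest sequence number still available for repair
    last_seq: int   # newest sequence number written so far


@dataclass
class Sender:
    session_id: int
    next_seq: int = 1
    send_buffer: dict = field(default_factory=dict)  # seq -> Data

    def write(self, payload):
        """Stamp the message with session id and sequence number and keep a copy."""
        msg = Data(self.session_id, self.next_seq, payload)
        self.send_buffer[msg.seq] = msg
        self.next_seq += 1
        return msg

    def heartbeat(self):
        """Announce which sequence numbers the receiver should have by now."""
        first = min(self.send_buffer) if self.send_buffer else self.next_seq
        return Heartbeat(self.session_id, first, self.next_seq - 1)

    def on_ack(self, up_to_seq):
        """Acknowledged messages no longer need to be kept for repair."""
        for seq in [s for s in self.send_buffer if s <= up_to_seq]:
            del self.send_buffer[seq]

    def on_nack(self, seqs):
        """Return the repairs for the requested sequence numbers still buffered."""
        return [self.send_buffer[s] for s in seqs if s in self.send_buffer]
```

A typical exchange would be `write()` four times, send `heartbeat()`, then call `on_ack(2)` and `on_nack([3])` when the receiver answers ACK 1-2, NACK 3.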
29. Company Confidential
Confirmed Reliability (TCP Style)
Some packet loss
[Sequence diagram: packets 01-08 are sent with heartbeats (HB) after 04 and 08; packet 03 is lost, so the receiver replies ACK 1-2, NACK 3 and discards 04 and 05; 03, 04 and 05 are sent again; the receiver eventually replies ACK 1-8]
Packets 04 and 05 are received but the protocol drops them because a prior packet 03 is missing.
This wastes valuable bandwidth
30. Reliable Protocol (II)
For performance the protocol should
– Accept received messages out of order and cache
them on the receiver buffer while the missing
messages are repaired
– Send selective NACKs (SACKs) for just the
sequence numbers that are missed
To handle large sensor data (e.g images)
– Fragment & re-assemble large messages
– Handle reliability on message fragments as well
To handle small updates
– Bundle small updates into batches
– Flush batches based on max delay or packet size
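The receiver-side caching and selective repair request described above can be sketched like this. The class and method names are hypothetical, and the sketch assumes sequence numbers start at 1 within a session.

```python
class Receiver:
    """Caches out-of-order messages and asks only for what is missing."""

    def __init__(self):
        self.next_expected = 1
        self.cache = {}       # seq -> payload, received ahead of a gap
        self.delivered = []   # payloads handed to the application, in order

    def on_data(self, seq, payload):
        if seq < self.next_expected or seq in self.cache:
            return  # duplicate, ignore
        self.cache[seq] = payload
        # Deliver every message that is now contiguous with what came before.
        while self.next_expected in self.cache:
            self.delivered.append(self.cache.pop(self.next_expected))
            self.next_expected += 1

    def on_heartbeat(self, last_seq):
        """Return (ack_up_to, missing) in response to a heartbeat."""
        missing = [s for s in range(self.next_expected, last_seq + 1)
                   if s not in self.cache]
        return self.next_expected - 1, missing


if __name__ == "__main__":
    r = Receiver()
    for seq in (1, 2, 4):                # 3 is lost on the link
        r.on_data(seq, "%02d" % seq)
    print(r.on_heartbeat(4))             # ack 1-2, request only 3
    r.on_data(3, "03")                   # repair arrives; 04 is released too
    print(r.delivered)
```

Because 04 stays in the cache, the sender only has to repair 03, which is the saving over the TCP-style behavior.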
32. Confirmed Reliability (Reader Cache + SACK)
Some packet loss
[Sequence diagram: packets 01-08 are sent with heartbeats (HB) after 04 and 08; packet 03 is lost; the receiver caches 04 and replies ACK 1-2, SACK 3; only 03 is sent again; the receiver replies ACK 1-8]
Packets 04 and 05 are received and cached waiting for the repair of 03.
No bandwidth is wasted.
33. Reliable Protocol (III)
For performance on a wide variety of links the
protocol must
– Allow configuration of timers and buffer sizes
– Maintain liveliness of the link via KeepAlive
messages
– Allow sessions and buffers to survive link
disconnection
– Perform output shaping with rate limits
– Support prioritization between sessions/traffic types
– Support differential shaping for each traffic type
– …
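Two of the items above, output shaping with rate limits and prioritization between sessions, can be sketched with a token bucket and a strict-priority pick. This is one common way to do it, not necessarily what any given product does; the names are hypothetical and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Rate limiter: average rate in bytes/s with a bounded burst size."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def try_send(self, size_bytes, now):
        """Return True and consume tokens if the message may be sent now."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


def next_message(queues):
    """Pick from the most urgent non-empty queue.

    queues is a list of (priority, messages); a lower number is more urgent.
    """
    for _, messages in sorted(queues, key=lambda item: item[0]):
        if messages:
            return messages.pop(0)
    return None
```

Giving each traffic type its own `TokenBucket` yields the differential shaping mentioned above, for example a small budget for sensor video and a guaranteed one for C2 data.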
34. Redundancy and Failover
Data-Link may deliver duplicate packets
Data might arrive from redundant transports
Failover requires multiple sources of the same
information
How does protocol identify/filter these duplicates?
– Needs VirtualSessionId identifying session
independent of data-link or source
– Reader queue must be 2-level. Second level
organized by VirtualSessionId filters-out duplicates
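The second-level filter can be sketched as a table of the highest sequence number delivered per virtual session. The sketch assumes the first level (per link) has already put each link's messages in order; the class name and the string ids are hypothetical.

```python
class DuplicateFilter:
    """Drops messages already delivered, whichever link or source carried them."""

    def __init__(self):
        self.highest_seen = {}  # virtual_session_id -> highest seq delivered

    def accept(self, virtual_session_id, seq):
        """Return True if the message is new for this virtual session."""
        if seq <= self.highest_seen.get(virtual_session_id, 0):
            return False
        self.highest_seen[virtual_session_id] = seq
        return True


if __name__ == "__main__":
    f = DuplicateFilter()
    print(f.accept("uav1/status", 1))  # first copy, via the primary link
    print(f.accept("uav1/status", 1))  # same message via the backup link
```

Because the key is independent of the data link, the same filter covers duplicated packets on one link, redundant transports, and failover between sources.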
35. Stealth
Reliability should be tunable:
– Best-efforts mode. No ACK traffic
Sacrifices reliability
Still ensures order & no duplicates
– NACK-only mode. Limits backward traffic
But requires smarter buffer management
– Full reliability. Both ACKs and NACKs
Ensures delivery to the receiving application
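The three modes differ only in what the receiver is allowed to transmit back. A minimal sketch, with hypothetical names and a tuple representation of feedback messages chosen for the example:

```python
from enum import Enum


class Mode(Enum):
    BEST_EFFORT = "best-effort"  # receiver never transmits
    NACK_ONLY = "nack-only"      # receiver transmits only when something is lost
    FULL = "full"                # receiver confirms and requests repairs


def feedback(mode, ack_up_to, missing):
    """Return the feedback messages a receiver sends after a heartbeat."""
    if mode is Mode.BEST_EFFORT:
        return []
    messages = []
    if mode is Mode.FULL:
        messages.append(("ACK", ack_up_to))
    if missing:
        messages.append(("NACK", list(missing)))
    return messages


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value, feedback(mode, ack_up_to=2, missing=[3]))
```

In NACK-only mode the sender never learns what was received, so it has to decide on its own when buffered messages can be dropped, which is the smarter buffer management mentioned above.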
36. Example (best effort with packet loss)
[Sequence diagram: packets 01-08 are sent with heartbeats (HB) after 04 and 08; packet 03 is lost; the receiver gets 01, 02 and 04-08 and sends no feedback]
Packet 03 is permanently lost
Repair request would compromise stealth.
Application notified of packet loss.
37. Stealth Reliability (no packet loss)
[Sequence diagram: packets 01-08 are sent with heartbeats (HB) after 04 and 08; all packets are received; the receiver sends no feedback]
Stealth not compromised under normal operating conditions.
38. Stealth Reliability (some packet loss)
[Sequence diagram: packets 01-08 are sent with heartbeats (HB) after 04 and 08; packet 03 is lost; the receiver sends NACK 3; the sender repairs 03]
Stealth minimally compromised, only when some message is lost
39. Message Batching
Without batching each message is separately sent. For small messages protocol headers might be bigger than payload
With batching messages are held a little and combined into larger batches maximizing throughput and minimizing CPU
Transparent: Receiver still sees individual messages
[Diagram: sender write() calls feed a send queue that delivers to the receive queue, shown once without batching (one packet per message) and once with batching (several messages per packet)]
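A batcher that flushes on either a size limit or a maximum holding delay can be sketched as below. The class is hypothetical and passes the current time in explicitly; a real implementation would also need a timer to call `poll()`.

```python
class Batcher:
    """Collects small messages and releases them as one batch."""

    def __init__(self, max_bytes, max_delay_s):
        self.max_bytes = max_bytes
        self.max_delay_s = max_delay_s
        self.pending = []
        self.pending_bytes = 0
        self.oldest = None  # time the first pending message was written

    def write(self, payload, now):
        """Queue a message; return a batch if one became ready, else None."""
        batch = None
        if self.pending and self.pending_bytes + len(payload) > self.max_bytes:
            batch = self.flush()  # the new message would not fit
        if not self.pending:
            self.oldest = now
        self.pending.append(payload)
        self.pending_bytes += len(payload)
        if batch is None and self.pending_bytes >= self.max_bytes:
            batch = self.flush()
        return batch

    def poll(self, now):
        """Release the pending batch if it has been held for max_delay_s."""
        if self.pending and now - self.oldest >= self.max_delay_s:
            return self.flush()
        return None

    def flush(self):
        """Manual flush: return everything pending and start a new batch."""
        batch, self.pending = self.pending, []
        self.pending_bytes = 0
        self.oldest = None
        return batch
```

The receiver unpacks each batch and hands the messages to the application one by one, so batching stays transparent to application code.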
40. Reliability with Batching
Reliability must work even when messages are
batched
ACK or NACK of individual samples would
negate some of the benefits of batching…
=> Protocol must be batch aware so that it can
ACK/NACK complete batches!
[Diagram: sender write() calls produce batches B1, B2, B3; the receiver replies ACK(B3), NACK(B2); the sender repairs B2]
41. Batching is hard but it pays!
RTI DDS 4.3b perftest results
[Chart: throughput (Mbps, 0 to 1000) versus sample size (bytes, 0 to 5000); series: Linux Baseline, Linux 10Kb Batch]
Intel Core2Duo Single-CPU Dual-Core 2.4GHz, 4MB cache
32-bit CentOS 5 (RHEL 5), 2GB memory, Intel E1000 NIC
42. Other considerations
Resource management:
– During disconnected operation buffers might fill or
overflow…
– Solution is smart caching:
Purge by age
Filter by frequency
Keep “one of each”
– requires additional insight into the data
– Some object identifier (e.g. track Id)
Filter by content
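Two of the caching policies above, keep “one of each” and purge by age, can be sketched together. The class name, the use of a track id as the object identifier, and the explicit timestamps are assumptions for illustration.

```python
class KeepLatestCache:
    """Buffer for disconnected operation that keeps one sample per object."""

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s
        self.latest = {}  # object_id -> (timestamp, sample)

    def put(self, object_id, sample, now):
        """A newer sample for the same object replaces the older one."""
        self.latest[object_id] = (now, sample)

    def purge(self, now):
        """Drop samples older than max_age_s."""
        self.latest = {oid: entry for oid, entry in self.latest.items()
                       if now - entry[0] <= self.max_age_s}

    def drain(self, now):
        """On reconnect: return surviving samples, oldest first, and empty the cache."""
        self.purge(now)
        samples = [sample for _, sample in
                   sorted(self.latest.values(), key=lambda entry: entry[0])]
        self.latest.clear()
        return samples
```

The memory needed is bounded by the number of distinct objects rather than by the length of the disconnection, which is what makes the identifier (e.g. track Id) necessary.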
45. Using a Network Middleware
Network middleware: A library between the operating system and the application
It insulates application from the raw network
Implements reliability, caching, …
[Diagram: layered stack, top to bottom: Application, Middleware, Network stack (e.g. IP), Hardware (e.g. Radio); several applications share the middleware over Ethernet, Wireless, Radio, Shared Memory, cPCI and 1553 transports]
46. Which middleware to use?
Standards based
Configurable via QoS
Not based on TCP
Manages Sessions/Fragmentation/Reliability…
Failover/handover support
Efficient use of bandwidth
Multi-platform
Embeddable, Certifiable…
Integration with net-centric back end
47. DDS mandated for data-distribution
DISR (formerly JTA)
– DoD Information Technology
Standards Registry
US Navy Open Architecture
FCS SOSCOE
– Future Combat System –
System of System Common
Operating Environment
SPAWAR NESI
– Net-centric Enterprise Solutions
for Interoperability
– Mandates DDS for Pub-Sub SOA
48. DDS Adoption
European Air Traffic Control
RETF (USA)
Train Communications
Tokyo Japan Traffic Control
Boeing Army Future Combat System
Boeing AWACS program
US Navy, DD(X), LCS, LPD-17, SeaSlice and 13 other Navies
49. Insitu Unmanned Air Vehicle
“…we have seen a 30% increase in productivity based on not
having to handle data communication issues.” Gary Viviani,
VP of Engineering
Insitu is a recognized leader in the
exploding UAV space
The next generation of UAV’s
including the Scan Eagle and newer
platforms
Challenge is to have a successful
UAV mission which requires
impressive autonomy and reliable
ground control
DDS enables an information flow
that is much more orchestrated
and flexible allowing seamless
switch control between multiple
ground stations while
connecting reliably over
unreliable links
51. CLIP Mediator Bridge
Transportation
• Common Link Integration Processing (CLIP): U.S.
Air Force and Navy joint project to build Tactical
Data Link (TDL) aggregator
• Enables information exchange between platforms
with incompatible tactical data links
• Challenge: existing system had poor integration
with platform mission systems
• With Northrop Grumman, RTI helped architect,
design, develop & test mediator bridge between
platform systems and CLIP
– RTI Services built a ‘mediator’ bridge
between Air Force, Navy, NGC, B1, B52
– First NESI DDS Compliant Product
Defense
“Working with RTI has been
both effective and productive.”
– Jim Miller, CLIP Program Manager
55. AWACS Radar System Upgrade
Airborne control system for
surveillance, command & control and
battle management
Upgrading system to be open,
supportable, less expensive to
maintain and extend
DDS is standards-based, open and
extensible, reducing integration risk
DDS is a proven COTS solution,
reducing total cost of ownership over
in-house development
56. CAE SimXXI Flight Simulation
State-of-the-art full-flight simulator
from CAE
Challenge is communication between
subsystems (over IEEE 1394) with
low-latency data transfer
DDS chosen because it excels in real-
time performance and is simple to use
and integrate
58. Next-generation of the U.S. Navy
Aegis Weapon System
Challenge to share time-critical data
across highly distributed system
including radar, weapons, displays
and controls
Need to maximize future scalability
and flexibility
DDS provides real-time
communication infrastructure.
Standards-based & extensible for
future system enhancements
Lockheed Martin US Navy Aegis Open
Architecture Weapon System
60. Sample EU project using DDS
ESO Extremely Large Telescope (E-ELT)
– 43m diameter (see vehicles on picture!)
– 30,000 sensors send data on the bus
– RTI DDS used as middleware for critical data
communication and integration
INDRA i-TEC e-FDP ATM program
– European leader in Air Traffic Mgmt
applications
– ATM integration for UK, Spain and Germany
– RTI Used as integration solution for Flight Data
Management and Distribution
EADS Euro Hawk UAV program
– EADS selected RTI for European UAV program
– RTI is used as embedded middleware in UAV
versatile payload
61. Sample EU project using DDS
PLATH (Hamburg, Germany)
– Radio signal analysis experts
– Has decided to use RTI on a large scale for key
middleware services
Volkswagen R&D
– After thorough evaluation VW has selected RTI
as a middleware for their next generation
vehicular R&D platform,
– AUTOSAR, ECS, ECU context.
MBDA France & UK
– They have been using RTI for 2 years
– Vertical launch missile program « MOUV »
62. Sample EU project using DDS
BASE 10 RoboScout Technology Reference
System (TRS)
– BTSE is a German project focused company
specialized in the defense market within NATO. They
are experts in robotics integrating systems
engineering, system qualification, manufacturing and
long term support.
– Base 10 has been working with RTI for 1 year
– We delivered Quick-Start training and an architecture
study on how to implement RTI on the vehicular
platform data flows
– There are 5 different subsystems on the data bus and
communication link between RCC and RoboScout RS
– RTI is now implemented in the RCC (bottom left
picture)
– Next release of RoboScout will implement RTI in
vehicular platform and outside services (radio and
satellite data-links).
64. Dissecting Messaging Technologies
The alternatives:
Standards based:
– Web-Service/SOAP Based (WS-Eventing, WS-Notification…)
– JMS
– CORBA
– Real-Time Data-Distribution Service
Vendor-proprietary:
– ESBs
– IBM WebSphere MQ,
– TIBCO,
– 29West,
– Gigaspaces
Custom build
Architecture
Quality of Service
Performance
& Scalability
65. Best-of breed RT-Messaging: DDS
Data Distribution Service (DDS)
– High performance real-time data distribution
– Object Management Group (OMG)
DDS Standard API (v1.2)
– Specifies user-visible API
– Ensures application portability
– Adopted in June 2003, revised June 2005,2006
DDS Standard Wire Protocol (v 2.1)
– Real-Time Publish-Subscribe (RTPS)
– Ensures application interoperability
– Adopted in June 2006, revised July 2007, 2008
Real-time
Publish-Subscribe
(RTPS) Wire Protocol
DDS
Middleware
Data Distribution
Service API
Standards-based services for
application developers
Standard protocol for
interoperability
67. Message Quality of Service
QoS: Reliability
– Purpose: Let the application decide whether messages should be confirmed and retried when missed, or else sent as best efforts.
– Example use cases: Send live voice or video data. Send sensor data (e.g., radar tracks), traffic readings, CPU/network statistics and readings.
QoS: Latency Budget
– Purpose: Specify the relative importance of different messages and the maximum acceptable delay between the time the message is sent and the time it’s delivered to the reader(s).
– Example use cases: Prioritize real-time flows like live audio over traffic that may be buffered (e.g., video replay). Prioritize critical control information (e.g., live radar tracks) over non-time-critical information such as aircraft schedule changes.
QoS: Flow Control
– Purpose: Control how much load and bandwidth a particular sender can inject into the network. Control the peak load, average load, and size of a burst.
– Example use cases: Avoid a single source from overwhelming the network. Prevent large low-urgency data (e.g., file downloads) from compromising the performance of critical data (e.g., alarms and critical news updates). Provide dedicated bandwidth to the most critical data.
68. Message Quality of Service
[Repeats the table of slide 67 (Reliability, Latency Budget, Flow Control) with technology-support annotations: DDS, JMS* (partial); DDS; DDS; WS-* (partial), Proprietary; Proprietary]
69. Message Quality of Service (Cont.)
QoS: Lifespan
– Purpose: Control how long the data must be kept by the middleware to be delivered to readers. Old data may be of little value; delivering it wastes bandwidth and gets in the way of the more recent data.
– Example use cases: Prevent data that loses value with age (e.g., old stock values, old news, old sensor readings) from using valuable system resources, while ensuring that needed historic information is kept (e.g., transaction records).
QoS: History
– Purpose: Control how many related messages (e.g., successive updates to a stock value or successive readings of a sensor) must be maintained by the middleware and delivered to readers.
– Example use cases: Prevent a rapidly changing source from using a lot of resources and starving other less-active sources. Some applications may only be interested in the last 100 events for each server regardless of the time interval when they occurred.
QoS: Transport Priority, Multicast
– Purpose: Controls the traffic class used for the underlying network transport. Takes advantage of network multicast infrastructure.
– Example use cases: Allow exploiting the differential service capabilities of the network infrastructure. Configure the network infrastructure to prioritize messages ahead of others.
70. Message Quality of Service (Cont.)
[Repeats the table of slide 69 (Lifespan, History, Transport Priority, Multicast) with technology-support annotations: DDS, JMS; DDS; DDS; Proprietary]
71. Message Quality of Service (Cont.)
QoS: Persistence
– Purpose: Externalize message history so that messages survive beyond the life of the application that generates them. Deliver messages reliably in the presence of application failure and re-starts.
– Example use cases: Prevent data loss when applications crash. Allow short-lived applications (e.g. cgi scripts) to generate messages that are received reliably even by applications that join the network later.
QoS: Content Filtering
– Purpose: Provide an application only the data it needs. Filter messages based on content as requested by the consuming application.
– Example use cases: Allow consumers with slow CPU or network (e.g. wireless). Filter data at the source or in the infrastructure. Avoid wasting CPU and bandwidth delivering data that is not of interest. Monitor aircraft in your airspace, alarms in the immediate vicinity, stocks that cross a threshold or in the industries of interest…
72. Message Quality of Service (Cont.)
[Repeats the table of slide 71 (Persistence, Content Filtering) with technology-support annotations: DDS, JMS; DDS; Proprietary; Proprietary]
73. Messaging Technologies and Standards
[Chart: technologies placed on a spectrum from non-real-time through soft real-time and hard real-time to extreme real-time: Web Services, Java/RMI, Java/JMS, CORBA, RT CORBA, Java RTSJ (soft RT), RTSJ (hard RT), MPI, and Data Distribution Service / DDS]
Adapted from NSWC-DD OA Documentation
Data Distribution Service spans a
very wide spectrum of application needs
74. Top reasons to use DDS
Flexibility and Power of the data-centric model
Performance & Scalability
Rich set of built-in services
Interoperability across platforms and Languages
Provides/integrates Pub-Sub into SOA
75. #1 DDS Data-Centric Model
[Diagram: several Data Writers and Data Readers connected through the global data space; successive builds of this slide highlight a Data Object, a Topic, and a Key (subject)]
“Global Data Space” generalizes Subject-Based Addressing
– Data objects addressed by DomainId, Topic and Key
– Domains provide a level of isolation
– Topic groups homogeneous subjects (same data-type & meaning)
– Key is a generalization of subject
Key can be any set of fields, not limited to a “x.y.z …” formatted string
79. Subscriptions: By Topic, Subject, Content
[Diagram: two topics and their fields. Topic: “Market Data” with fields Source, Symbol, Type, Exchange and Payload (Volume, Bid, Ask, …). Topic: “Order Entry” with fields Symbol, OrderKind, Stop, Limit, OrderNumber, …]
Subscription by topic: every field wildcarded (* * * * *)
Subject Filter (for a Reader): Symbol = *, Type = *, Exchange = NYSE, Payload = *
Subject Filter plus Payload Filter (for a Reader): Source = REUTERS, Symbol = *, Type = EQ, Exchange = NYSE, Payload: Volume > x, Ask < y
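The three levels of subscription on this slide (topic, subject, content) can be illustrated with a small matching function. This is a plain-Python illustration of the idea, not the DDS API; the function name, the dictionary representation of a sample, and the numeric thresholds standing in for x and y are assumptions.

```python
def matches(sample, subject_filter, payload_filter=None):
    """Return True if a sample passes a subject filter and an optional payload filter.

    subject_filter maps field names to a required value or "*" (any value).
    payload_filter is an optional predicate over the whole sample.
    """
    for field_name, wanted in subject_filter.items():
        if wanted != "*" and sample.get(field_name) != wanted:
            return False
    return payload_filter is None or payload_filter(sample)


if __name__ == "__main__":
    sample = {"Source": "REUTERS", "Symbol": "XYZ", "Type": "EQ",
              "Exchange": "NYSE", "Volume": 5000, "Ask": 41.5}

    # Subject filter only: anything traded on NYSE.
    print(matches(sample, {"Symbol": "*", "Type": "*", "Exchange": "NYSE"}))

    # Subject filter plus payload filter (thresholds are made-up values).
    subject = {"Source": "REUTERS", "Symbol": "*", "Type": "EQ", "Exchange": "NYSE"}
    print(matches(sample, subject, lambda s: s["Volume"] > 1000 and s["Ask"] < 50))
```

Evaluating such filters at the source or in the infrastructure, rather than in the consuming application, is what saves bandwidth on a constrained link.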
80. DDS Demo: Concepts
Topics
– Square, Circle, Triangle
– Attributes
Data types (schemas)
– Shape (color, x, y, size)
Color is instance Key
– Attributes
Shape & color used for key
QoS
– Deadline, Liveliness
– Reliability, Durability
– History, Partition
– Ownership
Control Area:
Allows selection of objects and QoS
Display Area:
Shows state of objects
Start demo
81. QoS: Quality of Service
QoS Policy
DURABILITY
HISTORY (per subject)
READER DATA LIFECYCLE
WRITER DATA LIFECYCLE
RESOURCE LIMITS
RELIABILITY
TIME BASED FILTER
DEADLINE
ENTITY FACTORY
LIFESPAN
CONTENT FILTERS
USER DATA
TOPIC DATA
GROUP DATA
PARTITION
OWNERSHIP
OWNERSHIP STRENGTH
LIVELINESS
LATENCY BUDGET
DESTINATION ORDER
PRESENTATION
TRANSPORT PRIORITY
82. Tunable Reliability Protocol
Reliable
– Guaranteed Ordered Delivery
– “Best effort” also supported
Configurable AckNack reply times to eliminate storms
Fully configurable to bound latency and overhead
– Heartbeats, delays, buffer sizes
Performance can be tracked by senders and recipients
– Configurable high/low watermark, Buffer full
Flexible handling of slow recipients
– Dynamically remove slow receivers
[Diagram: Producer / Writer sends samples S1-S7 to Consumer / Reader]
83. High-Throughput via Aggregation
Increases throughput by aggregating smaller
messages into larger network packets
User tunable
– # packets to aggregate for delivery
– Aggregate packet size
– Max elapsed time before data is sent
– Manual flush at any time
write()
Full or timeout
84. Demo: Quality of Service (QoS)
Topics
– Square, Circle, Triangle
– Attributes
Data types (schemas)
– Shape (color, x, y, size)
Color is instance Key
– Attributes
Shape & color used for key
QoS
– Deadline, Liveliness
– Reliability, Durability
– History, Partition
– Ownership
RTI DDS delivers
Writers and readers state
Their needs
Start demo
85. #2 Performance & Scalability
DDS was designed to support high performance
RTI DDS was developed to maximize performance and minimize
jitter
Advanced techniques employed:
– Pre-allocation of memory
Never allocate/free memory in the critical path
– Use dedicated threads per receive port
Minimize thread switching
Avoid expensive operating system calls (e.g. select())
– Maximize concurrency
Carefully design critical sections
Patented concurrent mutex-free thread-safe data structures
– Employ high-performance data-access APIs
Read data by array (no additional copies)
Scatter/gather APIs to access transport.
Buffer loaning for zero copy access
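As a rough illustration of pre-allocation and buffer loaning, the sketch below allocates all buffers once and then only loans and returns them. `BufferPool` and its methods are hypothetical names chosen for this example, not part of any DDS API.

```python
class BufferPool:
    """Fixed set of buffers allocated once, before the critical path starts."""

    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def loan(self):
        """Hand out a pre-allocated buffer, or None when the pool is exhausted.

        Nothing is allocated here, so the cost of a loan stays predictable.
        """
        return self._free.pop() if self._free else None

    def give_back(self, buf):
        """Return a loaned buffer so it can be reused without reallocation."""
        self._free.append(buf)


pool = BufferPool(count=2, size=64)
buf = pool.loan()
buf[:5] = b"hello"      # the application fills the loaned buffer in place
pool.give_back(buf)
```

Returning None on exhaustion forces the caller to decide what to do under load, instead of hiding an allocation in the critical path.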
86. Latency – (Linear Scale)
[Chart: DDS/GSOAP/JMS/Notification Service Comparison - Latency. X-axis: Message Size (bytes), 4 to 16384. Y-axis: 0 to 2500. Series: DDS, JMS, Notification Service.]
Adapted from Vanderbilt presentation at July 2006 OMG Workshop on RT Systems
88. Performance: Looking under the hood…
Performance & Scalability Technique / Description / Why It Matters
Zero Copy
– Description: Operating system network-stack technology that allows an application to put and get data from the network buffers “by reference,” without performing extra copy operations.
– Why It Matters: Increases performance for large messages, resulting in higher throughput and lower latency. Reduces CPU consumption on both sender and receiver. Makes performance scale better with message size.
Asynchronous Writes
– Description: Middleware technology that allows a write operation to be processed by a separate thread and not block the application thread that performed the write.
– Why It Matters: Decouples sender and receiver, providing more predictable performance for the writer and reducing latency jitter. Allows multiple write operations to be performed concurrently over multiple channels, batched, or optimized in other ways.
Message Fragmentation
– Description: Middleware technology that breaks a large message into smaller units, delivers them separately, and then reassembles them prior to delivery to the application.
– Why It Matters: Enables multicast use for larger (greater than 64KB) messages. Prevents “Head of Line” blocking where a high-priority message is queued behind a large message. Reduces jitter. Provides better performance in less reliable networks (wireless / WANs).
Message Batching
– Description: Middleware technology that combines multiple messages into a single unit.
– Why It Matters: Greatly increases the throughput for small messages. Reduces bandwidth and processor utilization for small messages.
Multicast
– Description: Internet technology that allows a single UDP message to be delivered to many receivers.
– Why It Matters: Provides the most efficient way to send messages to multiple receivers. Reduces bandwidth, reduces overhead on the sender, and minimizes latency and jitter.
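Message fragmentation, as described above, could look something like the following sketch. The tuple layout (index, total, data) and the function names are assumptions made for illustration; a real middleware would also carry a message identifier and handle retransmission of lost fragments.

```python
def fragment(message, max_fragment):
    """Split a message into (index, total, data) fragments of bounded size."""
    total = max(1, -(-len(message) // max_fragment))   # ceiling division
    return [
        (i, total, message[i * max_fragment:(i + 1) * max_fragment])
        for i in range(total)
    ]


def reassemble(fragments):
    """Rebuild the message, or return None while fragments are still missing.

    Fragments may arrive in any order; duplicates are ignored.
    """
    fragments = list(fragments)
    if not fragments:
        return None
    total = fragments[0][1]
    parts = {index: data for index, _, data in fragments}
    if len(parts) != total:
        return None
    return b"".join(parts[i] for i in range(total))
```

Because each fragment fits in a single datagram, a small high-priority message can be interleaved between fragments instead of waiting behind the whole large message.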
89. Performance: RTI DDS Low Latency and Jitter
[Chart: Latency and Jitter on Unloaded Network without Message Batching. X-axis: Message/Data Size (bytes), 32 to 8192. Y-axis: Latency (microseconds), 0 to 400. Series: Minimum, Median, 99%, 99.99%, Maximum.]
Reliable, ordered delivery over Gigabit Ethernet between 2.0 GHz Opteron processors running 32-bit Red Hat Enterprise Linux 4.0
91. Scalability: RTI DDS Reliable Multicast Performance
Throughput with batching (messages per second):
– 1 subscriber: 563,498
– 20 subscribers (1 per CPU and NIC): 556,896
– 40 subscribers (1 per core, 2 per NIC): 535,883
– 72 subscribers (1 per core, 2-8 per NIC): 365,760
Test conditions:
– 200-byte messages
– Gigabit Ethernet
– Single publishing thread
– All data subscribed
– No message loss, throttled to slowest subscriber
– CentOS 5, 32-bit
CPUs
– 2.4 GHz Intel Core 2 Duo E6600
– 2.4 GHz Intel Core 2 Quad Q6600
– 2.33 GHz Intel Xeon E5345
– 2.4 GHz AMD Opteron 8216
NICs
– Intel PRO/1000
– Broadcom NetXtreme II
92. Performance: RTI DDS High Performance across all Languages
[Chart: Throughput: Megabits per Second with batching. X-axis: Message Size (bytes), 32 to 16384. Y-axis: Megabits per Second, 0 to 1,000. Series: Native C++, .NET (C#), Java.]
Test conditions:
– Windows XP Pro SP2, 32-bit
– Reliable multicast
– Gigabit Ethernet
– 2.4 GHz Intel Core 2 Quad Q6600
– Single Intel PRO/1000 NIC
– Four producer and consumer threads
93. #3 Powerful Services & Tools
– High-Availability
– Persistent Data
– Recording service
– Relational Database bridge
– Development & Monitoring Tools
94. DDS High Availability via Redundancy
Owner determined per subject
Only extant writer with highest strength can publish a subject (or topic for non-keyed topics)
Automatic failover when highest strength writer:
– Loses liveliness
– Misses a deadline
– Stops writing the subject
Shared Ownership allows any writer to update the subject
[Diagram: three Producers / Writers with strength=10, strength=5, and strength=1 publish Topic T1 with instances I1 and I2; each instance has a Primary and a Backup writer]
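The ownership rule on this slide (highest-strength live writer wins, with automatic failover) can be sketched as below. `Writer` and `owner_of` are hypothetical names for illustration only; tie-breaking between writers of equal strength is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Writer:
    name: str
    strength: int
    alive: bool = True   # set to False on lost liveliness or a missed deadline


def owner_of(instance, writers_by_instance):
    """Return the live writer with the highest strength for one instance."""
    candidates = [w for w in writers_by_instance.get(instance, []) if w.alive]
    return max(candidates, key=lambda w: w.strength, default=None)


# Primary and backup writers per instance, loosely following the diagram.
primary = Writer("primary", strength=10)
backup = Writer("backup", strength=5)
spare = Writer("spare", strength=1)
table = {"I1": [primary, backup], "I2": [backup, spare]}

before = owner_of("I1", table).name    # highest strength writer owns I1
primary.alive = False                  # primary loses liveliness
after = owner_of("I1", table).name     # ownership fails over to the backup
```

Readers need no special logic for failover under this scheme: they keep accepting updates from whichever writer currently owns the instance.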
95. DDS Data Persistence
A standalone service that persists data outside of the context of a DataWriter
[Diagram: Data Writers and Data Readers connected through the Global Data Space, with two Persistence Services, each backed by Permanent Storage]
Can be configured for:
• Redundancy
• Load balancing
Demo:
1. PersistenceService
2. ShapesDemo
3. Application failure
4. Application (ShapesDemo) re-start
5. Persistence Svc failure
6. Application re-start
Cleanup database
96. DDS Real-Time Recording Service
Applications:
– Future analysis and debugging
– Post-mortem
– Compliance checking
– Replay for testing and simulation purposes
Record high-rate data arriving in real-time
Non-intrusive – multicast reception
Demo:
1. Start RecorderService
2. Start ShapesDemo
3. See output files
4. Convert to: HTML XML CSV
5. View Data: HTML XML CSV
98. DDS Enables Event Processing
CEP: programmable engines used to transform “data” into “information”
CEP engines are programmed using a derivative of SQL
CEP engines save time: They can implement a lot of the application logic:
– Classification, Correlation, Aggregation, Filter, Cleansing, Pattern Detection, etc.
DDS is the perfect ‘data’ and ‘information’ pipe for CEP engines
– Use high-speed data streams (1,000-1,000,000 msg/sec)
– Require latency measured in sub-milliseconds
– Demand access to events from heterogeneous systems
[Diagram labels: CEP Engine; Dashboards; Applications; Alerts; RTI Global Data Space; Market Data; Trades; Low Latency Messages]
99. Tools provide insight into a distributed system
RTI Analyzer
– Understand connections and data flow
– Tune QoS properties without changing code
RTI Scope
– Capture and monitor packet payloads
– Collect time histories of Topic values
RTI Protocol Analyzer
– Sniff the wire and analyze traffic
100. #4 Interoperability between platforms & languages
Data accessible to all interested applications:
– Data distribution (publishers and subscribers): DDS
– Data management (storage, retrieval, queries): SQL
– ESB Integration, Business process integration: WSDL
– Legacy Java Integration: JMS
[Diagram: Distributed Nodes and DBMSs attached to the Global Data Space through DDS, SQL, JMS, and WSDL interfaces]
101. DDS: Multi-Architecture Support
• Same API for all platforms
• Language Independence: C, C++, Java™, C#, .NET, Ada
• Enterprise and Embedded Support
VxWorks®, INTEGRITY®, LynxOS®
Linux, Solaris, Windows
• Prototype on any platform
[Diagram: RTI DDS running on Linux, Windows, Integrity, and VxWorks]
103. #5 Provides Real-Time Pub-Sub in SOA
– Real-Time Devices
– Fault Tolerance
– Auditing & Recording
– Tools & Visualization
– Database
– Event Processing
– SOA & Real-Time Web Services
– WS-DDS
Real-Time Pub-Sub/Caching/Messaging
104. Real-Time SOA Architecture/Implementation
RT Architecture/Technology
– High Performance
– Event-Driven/Publish-Subscribe
– Small footprint
– Quality of Service
– Support for embedded environments
– Support for unreliable & low-bandwidth networks
Traditional Enterprise
– Low Performance
– Client-Server
– Centralized (Server-based)
– TCP based
DDS Data Bus
105. Conclusions
Implementing your own Data-Link Protocol is HARD
The simplest, most flexible solution is to use middleware to handle the reliability, caching, failover…
Middleware must have special features to support the specialized needs of a Data Link:
Robust to packet loss and disconnects, good use of bandwidth, etc.
DDS is the best choice today
– It is a mature international standard from OMG
Platform neutral: operating systems and programming languages
Deployed worldwide in military systems and other demanding real-time applications
– It is mandated by US DoD for Publish-Subscribe and data-distribution applications
– It is ideally suited to UAVs
Highly tunable via Quality of Service (QoS)
Flexible reliability model overcomes TCP problems
Can accommodate unreliable & high-latency transports
Uses bandwidth efficiently
Rich services (persistence, filtering, high-availability)