Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems

  1. Scalability: Dealing with large systems. Lex Heerink, PhD, Software Architect, Research & Development | ZettaScale Technology
  2. Goal. What: • Give insight into some mechanisms that impact scalability • Provide some options to deal with scalability. How: • Brief intro into CycloneDDS and scalability • A nose dip into discovery aspects • Things you could do to address scalability • Example: hub-and-spoke architecture. From the specs: “The need to scale to hundreds or thousands of publishers and subscribers in a robust manner is also an important requirement.” [OMG DDS spec 1.4] “Another important requirement is the need to scale to hundreds or thousands of subscribers in a robust fault-tolerant manner.” [OMG RTPS spec 2.2]
  3. About CycloneDDS. DDS is a standards-based technology for ubiquitous, interoperable, platform-independent and real-time data sharing across network-connected devices. Characteristics: publish/subscribe technology, data centric, fault tolerant, no single point of failure, reliable. Key concepts: participants, topics, endpoints, partitions. Applied in systems with above-average availability and reliability demands; mandated in aerospace and defense. CycloneDDS is an open source and freely available DDS implementation: https://cyclonedds.io/ [Diagram: DDS concepts, a data space with topicA and topicB (each with its own QoS) shared between writers (W) and readers (R)]
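To make the key concepts concrete, here is a minimal sketch using the CycloneDDS C API. It assumes a data type generated from the IDL shown on the next slide with idlc; the header name car.h and descriptor name car_desc follow the usual idlc naming but are assumptions here.

    /* Minimal CycloneDDS C sketch: one participant, one topic, a writer and a reader.
       Assumes `idlc car.idl` generated car.h/car.c with the type `car` and descriptor `car_desc`. */
    #include <stdlib.h>
    #include "dds/dds.h"
    #include "car.h"   /* generated by idlc from the IDL on the next slide (assumed name) */

    int main (void)
    {
      /* A participant joins the data space of a DDS domain. */
      dds_entity_t participant = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
      /* A topic gives a name and a QoS to a data type in the data space. */
      dds_entity_t topic = dds_create_topic (participant, &car_desc, "car", NULL, NULL);
      /* Endpoints: a writer (W) publishes samples, a reader (R) subscribes to them. */
      dds_entity_t writer = dds_create_writer (participant, topic, NULL, NULL);
      dds_entity_t reader = dds_create_reader (participant, topic, NULL, NULL);

      car sample = { .license_plate = "AB-123-C", .brand = 0 /* some brandtype value */, .color = 1 };
      dds_write (writer, &sample);

      (void) reader;
      dds_delete (participant);   /* recursively deletes topic, writer and reader */
      return EXIT_SUCCESS;
    }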
  4. The structure of the data (“topics”) can be modelled in IDL, and properties (QoS) can be assigned to the data that specify how the data space should treat it. Data spaces can be partitioned into independent data planes. Each plane can get hold of the data for different purposes, and data can be created/modified/updated/disposed in each of these planes independently of the other planes. Example topic in IDL:
    @appendable struct car {
      @key string license_plate; /* key of the topic */
      brandtype brand;           /* non-key field */
      int32 color;
    };
  [Diagram: data space partitioning, a black partition and a red partition, each with its own readers (R) and writers (W) on the same topic] System partitioning
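The slide mentions that data can be created, modified, updated and disposed in the data space. A small sketch of what that looks like with the CycloneDDS C API and the generated car type; field values are illustrative, and the instance is identified by its key, license_plate.

    /* Sketch: create/update and dispose an instance of the keyed `car` topic. */
    #include "dds/dds.h"
    #include "car.h"   /* assumed idlc-generated type for the IDL above */

    void publish_and_dispose (dds_entity_t writer)
    {
      car sample = { .license_plate = "AB-123-C", .brand = 0, .color = 1 };
      dds_write (writer, &sample);      /* creates/updates the instance keyed by license_plate */
      sample.color = 2;
      dds_write (writer, &sample);      /* update: same key, new value */
      dds_dispose (writer, &sample);    /* dispose: the instance is removed from the data space */
    }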
  5. Scalability in the context of DDS is about the behaviour of the system when increasing the number of participants, the number of topics, the number of readers/writers, etc. Obviously, the way you model your data, the data rate at which you publish the data, and the distribution of the publishers and subscribers may impact scalability. We’ll assume that you made smart choices in data modelling, and primarily focus on the discovery aspects related to publishers and subscribers. Dealing with large systems: About scalability
  6. Discovery in DDS. DDSI-RTPS is the wire protocol used by DDS. It is designed to run over multicast and best-effort connectionless transports such as UDP/IP. It is used for discovery of remote participants, readers and writers, so that data can be delivered. • SPDP: participant discovery, periodic, best-effort • SEDP: endpoint discovery, transient-local, reliable [Diagram: two participants first discover each other (participant discovery) and then exchange their endpoints, i.e. readers and writers (endpoint discovery)]
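Because SPDP is periodic, its traffic can be tuned through the CycloneDDS configuration. A minimal sketch, assuming the Discovery/SPDPInterval setting described in the CycloneDDS configuration guide and the dds_create_domain call of the C API; the interval value is illustrative only.

    /* Sketch: apply an XML configuration programmatically instead of via CYCLONEDDS_URI.
       Discovery/SPDPInterval controls how often SPDP participant announcements are sent;
       a longer interval means less periodic discovery traffic but slower (re)discovery. */
    #include "dds/dds.h"

    static const char *config =
      "<CycloneDDS><Domain id=\"any\">"
      "  <Discovery>"
      "    <SPDPInterval>30 s</SPDPInterval>"   /* illustrative value */
      "  </Discovery>"
      "</Domain></CycloneDDS>";

    dds_entity_t start_domain (void)
    {
      dds_entity_t domain = dds_create_domain (0, config);   /* domain id 0 */
      (void) domain;
      return dds_create_participant (0, NULL, NULL);
    }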
  7. Participants periodically announce their presence (best-effort, multicast by default). If a participant discovers a new participant, it responds by sending a unicast message back. A participant that discovers N new participants therefore receives and processes N replies. The messages carry locator information of the builtin readers/writers of a participant; these are needed to kickstart endpoint discovery. → With N participants, each one replies to the other N-1, so the number of participant discovery responses scales quadratically with the number of participants. Participant discovery
  8. Endpoint discovery. [Diagram: participant A with writers on topicA and topicB in the data space; participants B, C, D and E with matching readers sending acknacks back] Builtin endpoints exchange relevant info about topics, QoS, readers and writers; this info is published as reliable, transient-local data. Writers can request matching readers to send an acknack back to the writer; acknacks are sent as directed UDP messages. A single writer with many readers leads to a large fan-in of acknacks, and a large fan-in may lead to a high processing load for the writer. Endpoint discovery scales quadratically with the number of matching endpoints.
  9. Acknacks. An acknack is sent by a reader to a matching writer, so that the writer can determine whether the reader has received all data. A writer can request an acknack from a reader. Reasons for a writer to require an acknack are: resource management, fulfilling the durability property, and flow control. CycloneDDS uses smart policies to decide when to request an acknack (e.g., an adaptive policy that requests more often when the writer history cache reaches a threshold). [Diagram: acknack example, a writer sending samples 1, 2, 3 and the reader acknacking for the missing sample #2]
  10. Dealing with scaling. Scalability is affected by characteristics of the application and by characteristics induced by DDS. We focus on things that you can do to reduce the scalability effects of DDS, in particular the quadratic scaling in discovery. Solution ingredients: separation in time and/or space • Controlled startup of participants and endpoints • Delayed acknacks • System partitioning. Separation in time and space
  11. Controlled startup. Instead of starting all participants and endpoints at the same time, start smaller batches at different times. This spreads out the amount of discovery data over time. Controlled startup can be part of the startup procedure of a system … but it does not work for disconnects/reconnects. When a reconnect occurs, discovery takes place immediately, which means you get the quadratic participant discovery and endpoint discovery problem back. [Diagram: immediate startup, all participants join the data space at once]
  12. Controlled startup. [Diagram: controlled startup, participants join the data space in batches over time] Instead of starting all participants and endpoints at the same time, start smaller batches at different times. This spreads out the amount of discovery data over time. Controlled startup can be part of the startup procedure of a system … but it does not work for disconnects/reconnects. When a reconnect occurs, discovery takes place immediately, which means you get the quadratic participant discovery and endpoint discovery problem back.
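A minimal sketch of a controlled startup, assuming the CycloneDDS C API: participants (and their endpoints) are created in small batches with a pause in between, so SPDP/SEDP traffic is spread out over time. Batch size and delay are illustrative.

    /* Sketch: create participants in batches instead of all at once,
       spreading participant and endpoint discovery over time. */
    #include "dds/dds.h"

    #define NUM_PARTICIPANTS 100
    #define BATCH_SIZE       10

    void controlled_startup (dds_entity_t participants[NUM_PARTICIPANTS])
    {
      for (int i = 0; i < NUM_PARTICIPANTS; i++)
      {
        participants[i] = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
        /* ... create topics, readers and writers for this participant here ... */
        if ((i + 1) % BATCH_SIZE == 0)
          dds_sleepfor (DDS_MSECS (500));   /* let discovery for this batch settle */
      }
    }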
  13. Delayed acknacks. Delaying the sending of acknacks may spread discovery out over time. The experiment compares the default ack delay setting (acknacks sent whenever) with an ack delay of 7 ms. Experiment setup: • 1 base station, 150 satellite stations • The base station has 100 writers and 100 readers • Satellite stations have 100 writers and 200 readers • Each satellite station receives transient-local data published by the base station • Each satellite station publishes transient-local data using 100 writers (total: 20000 instances) • Each satellite station receives data published by the base station and data from the other satellite stations. [Diagram: 1 base station and 150 satellites; base-station-to-satellite communication uses 100 topics with transient-local data; inter-satellite communication uses 100 topics with transient-local data, 20000 instances]
  14. Delayed acknacks (continued). The pictures show measurements for the same experiment, where the delay to send acknacks is 7 ms (randomized and bounded). Plots: time (x-axis, s) vs. CPU usage (y-axis, %) of CycloneDDS per thread, for the base station with the default ack delay setting (acknacks sent whenever) and with ack delay = 7 ms. Delaying acknacks and spreading them out over time reduces load (around the 30 s and 50 s marks the threads are less busy sending acknacks and less busy processing discovery data, respectively) and decreases the experiment duration from 200 s to 180 s. Threads: main – main thread of CycloneDDS, creates/deletes entities and waits and checks; dq.builtins – CycloneDDS thread that processes discovery data and does the matching of readers and writers; tev – timed-event thread that handles asynchronous events such as retransmitting data, sending heartbeats, and sending discovery messages; recv – receives data and hands it off to, e.g., dq.builtins.
  15. System partitioning. Reduce the number of matching endpoints by preventing matching altogether. Mechanisms to partition: • domainId – isolates domains, including all their participants and endpoints • Partitions – isolate readers and writers within the same domain by creating separate shared data spaces • IgnoredPartitions – prevent sending data that matches the IgnoredPartition expression to remote participants. System partitioning
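A sketch of these mechanisms: the first two via the CycloneDDS C API, the third via the Partitioning/IgnoredPartitions option described in the CycloneDDS configuration guide (the exact element names should be checked against your version of the guide); the domain id, partition names and the expression are illustrative.

    /* Sketch: three ways to prevent endpoints from matching. */
    #include "dds/dds.h"

    /* (1) domainId: participants in different domains never match. */
    dds_entity_t pp_in_domain_7 (void)
    {
      return dds_create_participant (7, NULL, NULL);   /* isolated from domain 0 */
    }

    /* (2) Partitions: readers/writers in non-matching partitions of the same domain never match. */
    dds_entity_t sub_in_partition (dds_entity_t pp, const char *name)
    {
      const char *parts[1] = { name };
      dds_qos_t *qos = dds_create_qos ();
      dds_qset_partition (qos, 1, parts);      /* partition QoS lives on publisher/subscriber */
      dds_entity_t sub = dds_create_subscriber (pp, qos, NULL);
      dds_delete_qos (qos);
      return sub;
    }

    /* (3) IgnoredPartitions: keep data for matching partition.topic combinations off the network
           (configuration-guide option; element and attribute names assumed, check your version). */
    static const char *ignored_partitions_config =
      "<CycloneDDS><Domain id=\"any\">"
      "  <Partitioning><IgnoredPartitions>"
      "    <IgnoredPartition DCPSPartitionTopic=\"local*.*\"/>"
      "  </IgnoredPartitions></Partitioning>"
      "</Domain></CycloneDDS>";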
  16. Scaling: from flat to hub-and-spoke. Use the experimental setup discussed before and compare a flat network with a hub-and-spoke architecture. Experiment setup: • 1 base station, 150 satellite stations • The base station has 100 writers and 100 readers • Satellite stations have 100 writers and 200 readers • Each satellite station receives transient-local data published by the base station • Each satellite station publishes transient-local data using 100 writers to all other satellite stations (total: 20000 instances) • Each satellite station receives data published by the base station and data from the other satellite stations. Apply architectural changes and partitioning techniques to prevent quadratic scalability problems. Recipe to scale: 1. Prevent inter-satellite participant discovery 2. Apply partitioning techniques to prevent inter-satellite communication 3. Use a forwarder to realize satellite-to-satellite communication. [Diagram: flat architecture vs. hub-and-spoke]
  17. Step 1: Prevent participant discovery. Prevent inter-satellite discovery: configure satellite stations to always send directed (unicast) SPDP messages instead of multicast SPDP messages. This prevents satellite stations from discovering each other, and consequently also prevents endpoints on satellite nodes from discovering each other. Make sure that satellite stations NEVER multicast SPDP messages. Client config:
    <CycloneDDS>
      <Domain>
        <General>
          <Interfaces>
            <NetworkInterfaceAddress>127.0.0.1</NetworkInterfaceAddress>
          </Interfaces>
          <AllowMulticast>asm</AllowMulticast>
        </General>
        <Discovery>
          <DefaultMulticastAddress>239.255.0.1</DefaultMulticastAddress>
        </Discovery>
      </Domain>
    </CycloneDDS>
  [Diagram: satellite stations send unicast SPDP towards the base station; the base station may still use multicast SPDP]
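One way to apply this per-station configuration programmatically, assuming the dds_create_domain call of the CycloneDDS C API; the same XML can also be supplied through the CYCLONEDDS_URI environment variable.

    /* Sketch: start a satellite-station domain with the client config from this slide. */
    #include "dds/dds.h"

    static const char *satellite_config =
      "<CycloneDDS><Domain>"
      "  <General>"
      "    <Interfaces><NetworkInterfaceAddress>127.0.0.1</NetworkInterfaceAddress></Interfaces>"
      "    <AllowMulticast>asm</AllowMulticast>"
      "  </General>"
      "  <Discovery>"
      "    <DefaultMulticastAddress>239.255.0.1</DefaultMulticastAddress>"
      "  </Discovery>"
      "</Domain></CycloneDDS>";

    dds_entity_t start_satellite (void)
    {
      dds_entity_t domain = dds_create_domain (0, satellite_config);
      (void) domain;
      return dds_create_participant (0, NULL, NULL);
    }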
  18. Step 1: Prevent participant discovery (continued). Configure satellite nodes to always send directed SPDP messages instead of multicast SPDP messages. This prevents satellite nodes from knowing about each other and, consequently, prevents endpoints on satellite nodes from knowing about each other. A writer on a satellite node now matches with only 1 reader on the base station instead of with 150 readers on all satellite stations. Satellite stations NEVER multicast SPDP messages. [Diagram: satellite writers each matching a single base-station reader; no communication between satellites]
  19. Step 2: Use partitions. Use satellite-specific partitions to prevent inter-satellite communication: a satellite writer publishes data in a satellite-specific partition, and the base station publishes in a global partition. Satellites subscribe to the base station. This limits endpoint discovery. The base station is then required to forward data published by a satellite station to the other satellites. [Diagram: each satellite writes in its own participant-specific partition (“yellow”, “brown”), while the base station and all satellite readers share the global “red” partition]
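A sketch of this partition layout with the CycloneDDS C API. Partition names such as "satellite-042", the "global" partition and the "satellite-*" wildcard on the base station are illustrative; DDS partition matching allows a wildcard on one side.

    /* Sketch: satellite-specific publish partition, global subscribe partition. */
    #include <stdio.h>
    #include "dds/dds.h"

    static dds_entity_t group_in_partition (dds_entity_t pp, const char *name, int is_pub)
    {
      const char *parts[1] = { name };
      dds_qos_t *qos = dds_create_qos ();
      dds_qset_partition (qos, 1, parts);
      dds_entity_t grp = is_pub ? dds_create_publisher (pp, qos, NULL)
                                : dds_create_subscriber (pp, qos, NULL);
      dds_delete_qos (qos);
      return grp;
    }

    void satellite_endpoints (dds_entity_t pp, dds_entity_t topic, int satellite_id)
    {
      char mine[32];
      snprintf (mine, sizeof (mine), "satellite-%03d", satellite_id);
      /* write into my own partition ... */
      dds_entity_t wr = dds_create_writer (group_in_partition (pp, mine, 1), topic, NULL, NULL);
      /* ... and read only what is published in the global partition */
      dds_entity_t rd = dds_create_reader (group_in_partition (pp, "global", 0), topic, NULL, NULL);
      (void) wr; (void) rd;
    }

    void base_station_endpoints (dds_entity_t pp, dds_entity_t topic)
    {
      /* the base station reads from all satellite partitions using a wildcard ... */
      dds_entity_t rd = dds_create_reader (group_in_partition (pp, "satellite-*", 0), topic, NULL, NULL);
      /* ... and writes into the global partition that all satellites subscribe to */
      dds_entity_t wr = dds_create_writer (group_in_partition (pp, "global", 1), topic, NULL, NULL);
      (void) rd; (void) wr;
    }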
  20. Step 3: Forwarding data. Use a forwarder that subscribes to the satellite-specific data and republishes the data in the global partition. Data published by satellite stations is then forwarded to the other satellites. Because the data is republished in the global partition and there are multiple interested recipients, this data (going from the base station to the satellite stations) will be multicast. [Diagram: a forwarder reads from the participant-specific “yellow” and “brown” partitions and republishes in the global “red” partition]
  21. Step 3: Forwarding data (continued). The forwarder is a component that receives data in one partition and republishes the data in another partition. A forwarder can be built as an application component, or using Zenoh routers. Zenoh is a scalable technology that is specialized in bringing data efficiently to the right location at the right time, and it integrates well with CycloneDDS. More info on Zenoh: see https://zenoh.io/ [Diagram: Zenoh routing as the forwarder between the satellite readers and the global writer]
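A minimal sketch of such an application-level forwarder with the CycloneDDS C API, assuming the partition layout sketched earlier: it takes whatever arrives on a reader subscribed to the satellite partitions and rewrites it through a writer in the global partition. A real forwarder would block on a waitset or use a listener instead of polling.

    /* Sketch: polling forwarder that republishes satellite data into the global partition. */
    #include "dds/dds.h"

    #define BATCH 16

    void forward_loop (dds_entity_t sat_reader, dds_entity_t global_writer)
    {
      void *samples[BATCH] = { NULL };   /* CycloneDDS loans the sample buffers */
      dds_sample_info_t infos[BATCH];

      for (;;)
      {
        dds_return_t n = dds_take (sat_reader, samples, infos, BATCH, BATCH);
        for (dds_return_t i = 0; i < n; i++)
        {
          if (infos[i].valid_data)
            dds_write (global_writer, samples[i]);   /* republish in the global partition */
        }
        if (n > 0)
          dds_return_loan (sat_reader, samples, n);
        dds_sleepfor (DDS_MSECS (10));   /* a real forwarder would block on a waitset */
      }
    }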
  22. Results: experiment completion time, flat network architecture vs. hub-and-spoke architecture. [Chart: with ~100 satellites the flat network completes in about 80 s and hub-and-spoke in about 25 s; with ~150 satellites the flat network takes about 200 s and hub-and-spoke about 40 s]
  23. Summary of the results. The following conclusions can be drawn from the experiments: 1. The hub-and-spoke architecture takes less time to complete 2. Threads related to sending and processing of discovery data are less busy in a hub-and-spoke architecture. Experiment environment: experiments were conducted on a 40-core Intel® Xeon CPU E5-2690 v2 @ 3.00 GHz machine, 2 threads per core. Traces of overload situations have been seen for large numbers of satellite stations in flat network architectures. Overload situations may reduce the rate at which threads can actually make progress handling data, because data has to be retransmitted more often.
  24. Thank you for your attention. Concluding remarks: Hub-and-spoke can be used to improve scalability in situations where endpoint discovery becomes a limiting factor. A single hub is also a single point of failure; to keep fault-tolerance levels up, you may need redundant hubs. Background material: CycloneDDS: https://cyclonedds.io/ Cyclone configuration guide: https://cyclonedds.io/docs/cyclonedds/latest/config/index.html Zenoh: https://zenoh.io/ DDS: https://www.omg.org/spec/DDS/1.4/PDF DDSI-RTPS: https://www.omg.org/spec/DDSI-RTPS/2.2/PDF/