In this talk, Salvatore Sanfilippo covers the internals of Disque, his distributed message queue: the general architecture of the system, the delivery guarantees it provides, its data replication, and the best-effort protocols it uses to avoid unnecessary duplicate deliveries of the same message.
17. Disque & CAP
• AP.
• Immutable messages (mostly).
• Converge to ACK state.
• CAP “A” availability (single node partition).
18. At least once delivery
• Liveness: eventually the message will be delivered.
• Safety: messages not yet delivered at least once will never be evicted from the cluster.
• (Unless the message TTL is reached.)
19. At most once delivery
• Safety: messages already dequeued will never be queued a second time.
• An immediate result of replicating to just one node and enqueueing just once (retry time set to zero).
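The two delivery modes on the slides above can be sketched as a single re-queue decision. This is a minimal illustration, not Disque's actual code: the sketch_job type, its field names, and sketch_should_requeue are hypothetical, and times are assumed to be plain seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a job: only the fields needed here. */
typedef struct {
    uint32_t etime;  /* Expire time in seconds: the TTL deadline. */
    uint32_t retry;  /* Re-queue period in seconds; 0 means never re-queue. */
    uint32_t qtime;  /* Time in seconds at which the job was last queued. */
    bool acked;      /* True once an ACK for the job has been seen. */
} sketch_job;

/* Returns true if the job should be put back on the queue at time `now`. */
bool sketch_should_requeue(const sketch_job *j, uint32_t now) {
    if (j->acked) return false;         /* Acknowledged: nothing left to do. */
    if (now >= j->etime) return false;  /* TTL reached: the job may be evicted. */
    if (j->retry == 0) return false;    /* At-most-once: a single enqueue only. */
    /* At-least-once: re-queue once the retry period has elapsed. */
    return now >= j->qtime && now - j->qtime >= j->retry;
}
```

With retry set to zero the function never asks for a second enqueue, which is the at-most-once case; with a non-zero retry the job keeps coming back until it is acknowledged or its TTL is reached.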
22. NACK and retries counters
• Alternative to explicit dead letters.
• Counter consistency is best effort.
• (But it does not matter.)
• GETJOB exposes the two counters.
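A consumer could use the two counters in place of a dead-letter queue roughly as follows. This is a sketch under stated assumptions: the sketch_limits type, the threshold values, and sketch_should_give_up are hypothetical names chosen for illustration, and the counters are taken as plain integers already read from the GETJOB reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds chosen by the consumer, not by Disque. */
typedef struct {
    uint16_t max_nacks;       /* Give up after this many observed NACKs. */
    uint16_t max_deliveries;  /* Give up after this many observed deliveries. */
} sketch_limits;

/* Returns true if the consumer should stop retrying the job (for example,
 * log it and acknowledge it) instead of processing it again. */
bool sketch_should_give_up(uint16_t nacks, uint16_t deliveries,
                           const sketch_limits *lim) {
    return nacks >= lim->max_nacks || deliveries >= lim->max_deliveries;
}
```

Because the counters are only best effort, the thresholds act as approximate limits: a job may be retried slightly more or fewer times than configured, which is acceptable for this purpose.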
24. WHY?
• Costly: think of spikes after partitions, or of CP stores used to de-dup.
• No de-dup or idempotency needed in certain use cases, if the duplication rate is acceptable.
• Not so hard: worth it.
31. [Diagram. Caption: QUEUED message on ACTIVE -> QUEUED state change. Node states shown: ACTIVE, ACKED, QUEUED. Reactions to a QUEUED message: reset retry timer; dequeue if ID1 > ID2; SETACK.]
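One possible reading of this diagram, written as a handler for an incoming QUEUED message. The mapping of states to reactions is an assumption inferred from the slide labels, not confirmed by the slide itself: a node holding the job as ACKED replies with SETACK, a node holding it as ACTIVE resets its retry timer, and a node that also has it QUEUED dequeues its copy when its own ID (assumed to be ID1) compares greater than the sender's (ID2). All names below are hypothetical, and node IDs are assumed to be NUL-terminated strings.

```c
#include <string.h>

/* Hypothetical local job states and reactions, named after the slide labels. */
typedef enum { SK_ACTIVE, SK_QUEUED, SK_ACKED } sk_state;
typedef enum { SK_NONE, SK_RESET_RETRY_TIMER, SK_DEQUEUE, SK_SEND_SETACK } sk_reaction;

/* Decide how to react to a QUEUED message about a job we hold in `local_state`.
 * Assumption: ID1 is the receiver's node ID and ID2 is the sender's. */
sk_reaction sk_on_queued(sk_state local_state, const char *my_id,
                         const char *sender_id) {
    switch (local_state) {
    case SK_ACKED:
        return SK_SEND_SETACK;        /* Tell the sender the job is done. */
    case SK_ACTIVE:
        return SK_RESET_RETRY_TIMER;  /* Someone else queued it: wait longer. */
    case SK_QUEUED:
        /* Both nodes queued the job: the ID comparison picks which copy is
         * removed from the queue, so only one remains deliverable. */
        if (strcmp(my_id, sender_id) > 0) return SK_DEQUEUE;
        return SK_NONE;
    }
    return SK_NONE;
}
```

The ID comparison is only a tie-break: any total order on node IDs would do, as long as both sides apply the same rule.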
32. [Diagram. Nodes shown: KNOWN SOURCE, ANY OTHER NODE. Messages exchanged: NEEDJOBS, YOURJOBS. Annotation: exponential delay + broadcast & ad-hoc NEEDJOBS.]
33. NEEDJOBS triggers
• Clients blocked with GETJOB (and queues are empty).
• Queue drops to zero messages (and import rate > 0).
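The triggers above, together with the exponential delay mentioned on the previous slide, can be sketched as two small functions. This is an illustration only: the function names, the millisecond units, and the base and cap parameters are assumptions, and the "queue drops to zero" event is simplified to a check on the current queue length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if this node has a reason to ask other nodes for jobs:
 * the local queue is empty and either clients are blocked waiting on it
 * or the queue has recently been importing jobs from other nodes. */
bool sk_needjobs_wanted(int blocked_clients, uint64_t queue_len,
                        double import_rate) {
    if (queue_len != 0) return false;
    return blocked_clients > 0 || import_rate > 0.0;
}

/* Delay before the next NEEDJOBS attempt: base_ms doubled once per previous
 * attempt and capped at max_ms (both values are hypothetical tunables). */
uint32_t sk_needjobs_delay_ms(uint32_t attempt, uint32_t base_ms,
                              uint32_t max_ms) {
    uint32_t delay = base_ms;
    while (attempt-- > 0 && delay < max_ms) {
        if (delay > max_ms / 2) { delay = max_ms; break; }  /* Avoid overflow. */
        delay *= 2;
    }
    return delay < max_ms ? delay : max_ms;
}
```

Doubling the delay keeps an idle node from flooding the cluster with NEEDJOBS while its queue stays empty, and the cap bounds how long it can take to notice that jobs have become available elsewhere.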
34. Message owners
Each node has, for each message, a list* of owners.
* A possibly inconsistent list.
35. Ehm… some C code.
/* Job representation in memory. */
typedef struct job {
    char id[JOB_ID_LEN];      /* Job ID. */
    unsigned int state:4;     /* Job state: one of JOB_STATE_* states. */
    unsigned int gc_retry:4;  /* GC attempts counter, for exponential delay. */
    uint8_t flags;            /* Job flags. */
    uint16_t repl;            /* Replication factor. */
    uint32_t etime;           /* Job expire time. */
    uint64_t ctime;           /* Job creation time in ms+counter. */
    uint32_t delay;           /* Delay before queueing this job for the first time. */
    uint32_t retry;           /* Job re-queue time. */
    uint16_t num_nacks;       /* Number of NACKs this node observed. */
    uint16_t num_deliv;       /* Number of deliveries this node observed. */
Immutable, converging, inconsistent
36. Ehm… some C code.
    robj *queue;              /* Job queue name. */
    sds body;                 /* Body, or NULL if job is just an ACK. */
    dict *nodes_delivered;    /* Nodes that may have a copy. */
    dict *nodes_confirmed;    /* Nodes that confirmed copy or ack. */
    mstime_t qtime;           /* Next queue time. */
    mstime_t awakeme;         /* Time at which we need to take actions. */
} job;