This document summarizes DraftKings' journey to build a scalable player engagement platform using jackpots. It outlines challenges around latency, scale, and transaction processing. It evaluates message queue options such as Kafka, RabbitMQ, and SQS, and settles on Kafka with Redis to manage retries and backpressure. Messages are polled from Kafka and inserted into Redis sorted sets; on success, they are removed from Redis. For retries, Redis is queried for old messages, which are cloned and re-dispatched to Kafka. This externalizes retry handling while leveraging Kafka's scalability. Future work includes custom backpressure using multiple Redis sorted sets and sharding them by event key.
Who are we?
Blueribbon (acquired by DraftKings (NASDAQ: DKNG))
● Player engagement platform that leverages marketing tools to achieve better user engagement
DraftKings
● The biggest Daily Fantasy Sports company in the U.S.
● Offers sports entertainment experiences across 15 professional sports, in 8 countries
● Focused on the American sports fan
Operator pains
● Diversity - all operators propose the same solutions
● Loyalty - player loyalty and stickiness are hard to build
● Engagement - bonuses are not making a real impact
How do we solve it?
● Creating new marketing tool(s) that create a better engagement platform
● The tools should not interfere with the game itself
● Easy to integrate
Jackpot as a service
What is Jackpot?
1. An additional gaming tool, similar to a lottery but with different math algorithms
2. Placed live on top of the game, where a player (or multiple players) can see it
3. Accumulates until someone wins, then reseeds
Why Jackpots?
Attracting
● Players can win big prizes
● Winning amounts range from small prizes to very big ones (everyone has heard stories about people lucking into a mega jackpot)
Additional funnel to win
● Jackpots give players new ways to win prizes
● An additional “attraction” to the game-play itself
Marketing
● One of the strongest marketing tools for player engagement and retention

(Diagram: Play -> Win -> Engage)
How we leverage Jackpots
● Generic tool
● Sits on top of all games and supports diverse verticals
● High potential as a social tool for better player engagement
● Completely SaaS - easy integration (APIs, SDK, scale)
Our Challenges
Low Latency
• Fast responses
Scale
• Concurrency
• Availability
Persistency
• No data loss
• Transactions must be completed
Atomicity
• All or nothing (distributed)
Platform Team mission
- Microservices - high complexity and different data sources
- Workflow - we have a sequence of events per transaction that must occur linearly
- Asynchronous communication - using queues to enable persistency and atomicity from one component to another
Message Queue vs Streaming Broker
Granularity
● Message Queue: delivers single messages; acknowledgement is per message
● Streaming Broker: acknowledgement granularity is per group of messages; manual commit is for the brave only
Retry management
● Message Queue: retry is managed out of the box per message; you can define the number of attempts and a DLQ
● Streaming Broker: no managed retry
Message re-consume ability
● Message Queue: once acked, the message is deleted
● Streaming Broker: all messages are persisted and can be re-consumed
Horizontal scalability
● Message Queue: not out of the box
● Streaming Broker: scaling is out of the box
Popular vendors
● Message Queue: RabbitMQ, SQS, ActiveMQ
● Streaming Broker: Kafka, Kinesis, Pulsar
● Hybrid: Pulsar / Infinitic
Nice to know (by provider implementation)
Kafka/Kinesis
• On Kinesis you can't get multiple consumers per topic out of the box with the SDK
RabbitMQ/ActiveMQ
• High maintenance on big clusters
• Performance implications when enabling backups
SQS/SNS
• SaaS
• Not for low latency
• Client SDK
• High costs
Decisions
Which road shall we take to meet our challenges?
1. Mini-teams to benchmark different solutions
2. POC
3. Conclusions
The missing part(s)
● Granularity - working on a group of messages: not available
● Retry management - not available out of the box
● Backpressure - ability to stop is limited by the consumer

Kafka looks like the only remaining choice for our problem, but the default solution is not sufficient for the overall requirements.
How to solve it with Kafka
Manage retry:
Solution:
● Create an additional retry topic per original topic
● Consumers re-dispatch failed messages to the retry topic
Drawbacks:
● Duplicating topics can introduce additional load, adding costs
● No built-in backoff in Kafka
● You can't really create a delay between one retry and the next
● Not able to separate the retry mechanism from the actual service
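The retry-topic pattern above can be sketched in a few lines. This is a minimal stand-in, not the actual implementation: a dict of lists plays the role of the Kafka broker, and the topic names, `MAX_ATTEMPTS` limit, and handler are all hypothetical.

```python
from collections import defaultdict

broker = defaultdict(list)   # topic -> list of messages; stand-in for Kafka
MAX_ATTEMPTS = 3             # illustrative retry limit

def handle_with_retry(topic, message, handler):
    """Run the handler; on failure, re-dispatch to a parallel retry topic."""
    attempts = message.get("attempts", 0)
    try:
        handler(message)
    except Exception:
        if attempts + 1 >= MAX_ATTEMPTS:
            broker[topic + ".dlq"].append(message)        # exhausted -> DLQ
        else:
            clone = dict(message, attempts=attempts + 1)  # clone & bump counter
            broker[topic + ".retry"].append(clone)        # the duplicated topic

def flaky_handler(message):
    raise RuntimeError("downstream unavailable")

handle_with_retry("wins", {"id": 1}, flaky_handler)
print(broker["wins.retry"])  # [{'id': 1, 'attempts': 1}]
```

Note how the drawbacks show up even in this sketch: every original topic grows `.retry` and `.dlq` siblings, and nothing delays the retried message before it is consumed again.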
Externalizing queue message state
To keep Kafka with its default configuration, we need another way to maintain message state outside of it.
We need to choose a data source that satisfies the following:
1. It has to be reliable
2. It has to be super fast
3. It should be as simple as a key-value store
4. It should scale horizontally
Guess who? :)
Redis as queue state manager
● Fast writes -> Redis can store and query values within milliseconds, which is required.
● Memory -> leverage the Redis zset (sorted set) to store events efficiently (avoiding per-key memory overhead).
● Fast queries -> using Redis sorted sets we can quickly query "old" entries.

Retry Set Tier 1
Score (TS)      Event
213834863555    {...}
213834863545    {...}

Using a zset we can "query" values by score, e.g. get all records whose score is between X and Y.
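To make the score-range idea concrete, here is a minimal in-memory stand-in for the zset usage above (a real deployment would call `ZADD`/`ZRANGEBYSCORE`/`ZREM` on Redis): the score is the enqueue timestamp, so finding "old" entries is just a range query up to a cutoff score.

```python
import bisect
import json

class RetrySet:
    """Tiny in-memory mimic of the Redis sorted-set calls used for retry state."""

    def __init__(self):
        self._entries = []  # sorted list of (score, member), like a zset

    def zadd(self, score, member):
        bisect.insort(self._entries, (score, member))

    def zrangebyscore(self, min_score, max_score):
        lo = bisect.bisect_left(self._entries, (min_score, ""))
        hi = bisect.bisect_right(self._entries, (max_score, "\uffff"))
        return [member for _, member in self._entries[lo:hi]]

    def zrem(self, member):
        self._entries = [(s, m) for s, m in self._entries if m != member]

retries = RetrySet()
retries.zadd(213834863545, json.dumps({"id": "a"}))
retries.zadd(213834863555, json.dumps({"id": "b"}))

# All entries whose score (timestamp) falls at or below a cutoff are "old":
assert retries.zrangebyscore(0, 213834863550) == ['{"id": "a"}']
```

The same two calls map directly onto the talk's flow: `zrangebyscore` finds entries due for retry, and `zrem` removes an entry once its message is processed successfully.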
The crime scene (by steps)

(Diagram: Kafka -> poll messages -> Redis (insert each message into a zset) -> local queue -> service (processing logic, on a different thread pool) -> Redis (zrem from zset); the retry flow retrieves "old" entries, clones each message with modified metadata (counter, ts, etc.), and re-dispatches it to the same topic.)

Step 1: Poll messages from Kafka into a dedicated pool
Step 2: Using auto-commit, insert each message into the Redis zset
Step 3: Add messages to a local queue (if full, back off)
Step 4: Retrieve messages from the local queue (on a different thread pool)
Step 5: Process the internal logic
Step 6: On success, delete the message from Redis
Step 7: Poll for old messages and apply the retry logic
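The seven steps can be walked through single-threaded with stand-ins: a list for the Kafka topic, a dict for the Redis zset, and a bounded `queue.Queue` for the local queue. All names are illustrative, and the threading, auto-commit, and real client calls are elided.

```python
import queue
import time

kafka_topic = [{"id": 1}, {"id": 2}]    # Step 1: messages polled from Kafka
retry_zset = {}                         # message id -> score (timestamp ms)
local_queue = queue.Queue(maxsize=100)  # Step 3: bounded, so a full queue backs off

def now_ms():
    return int(time.time() * 1000)

# Steps 1-3: poll, record state in the zset, enqueue locally.
for msg in kafka_topic:
    retry_zset[msg["id"]] = now_ms()  # Step 2: insert into the zset
    local_queue.put(msg, block=False) # Step 3: raises queue.Full -> back off

# Steps 4-6: a separate thread pool would drain the queue; here we drain inline.
while not local_queue.empty():
    msg = local_queue.get()           # Step 4: retrieve from the local queue
    process_ok = msg["id"] != 2       # Step 5: pretend message 2 fails processing
    if process_ok:
        retry_zset.pop(msg["id"])     # Step 6: zrem on success

# Step 7: entries still in the zset past a cutoff are cloned and re-dispatched.
cutoff = now_ms() + 1
stale = [mid for mid, score in retry_zset.items() if score <= cutoff]
for mid in stale:
    kafka_topic.append({"id": mid, "retry": True})  # re-dispatch to the same topic

print(stale)  # [2]
```

Because only the failed message (id 2) is still in the zset at step 7, it alone is cloned with modified metadata and sent back to the same topic, exactly the separation of processing and retry that the design is after.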
Outcomes
● Retry -> we get retry out of the box, as each message is monitored and cloned into a new one without the overhead of maintaining additional topics
● DLQ -> we can track a message's retry counter and forward it to a DLQ once exhausted
● Backoff -> messages won't be re-dispatched immediately
● We chose technologies we were already familiar with
Downsides
● Abstract work that has to be implemented and tested thoroughly
● Yet another data source to operate
● Not polyglot unless the logic is ported
“SDK” the logic
● Apply this logic at a common level, as an SDK or shared library
● Make the data as abstract as possible to allow generic messaging metadata (counter, source, tracking, IDs)
● Keep it testable and abstract for future maintenance
Moving forward
● Extend into custom backpressure by adding and handling more zsets
● Scale out Redis by routing zsets by event key(s)