1. A streaming platform like Kafka can provide the benefits Hadoop offers for batch processing, but faster and in real time, by processing data as it arrives rather than storing it all first and analyzing it later.
2. Virtual reality applications require stream processing to power real-time features like VR mirroring and capture. Kafka's stream processing capabilities address challenges like these for VR.
3. The presentation describes how AltspaceVR uses Kafka stream processing for VR mirroring and capture, presence tracking, scheduled tasks, and more to power its real-time VR experiences.
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Greg Fodor, Co-founder, AltspaceVR
Gehrig Kunz, Technical Product Marketing, Confluent
Streaming in Action Series
You are here!
August 16th
Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
Watch on Confluent.io
A look at today
A Streaming Platform is Hadoop Made Fast
● Hadoop was a good idea, but it has its flaws
● How a streaming platform can look like Hadoop
● Companies are using a streaming platform
Stream Processing with Kafka for Virtual Reality
● An example of Kafka with VR
● Challenges VR has that require stream processing
● Examples where it helps
● Why stream processing with Kafka makes sense
Good idea, Hadoop is
● Get all the datas
● Perform analysis, explore data
● Perfect for understanding your business
But today is different
Star Wars is good, again.
And the apps we build require constant data.
Bringing it to today
With Hadoop you wanted to:
● Get all the datas
● Explore historical data
● Understand your business
git commit -m "Today you want to"
● Get all the datas
● Process data as it arrives
● Power your business
What this looks like in practice
1. Ingest a stream of data.
2. Process and act on it as it arrives.
3. Power your business.
Kafka’s Streams API
● Kafka’s Streams API: a lightweight library for performing stream processing
• Aggregations, sessions, windowing, joins, et al.
● Build apps, not clusters
● Runs on the client side, outside the Kafka brokers!
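To make "windowing" concrete, here is a minimal plain-Java sketch of what a tumbling-window count computes. It is not the Kafka Streams API (there you would chain something like groupByKey(), windowedBy(...) and count()); it only illustrates the semantics, assuming non-negative epoch-millisecond timestamps and hypothetical user keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of tumbling-window counting (not the Kafka Streams API).
// Assumes non-negative epoch-millisecond timestamps.
public class TumblingWindowCount {

    // Tumbling windows are aligned to the epoch: a record at timestampMs
    // belongs to the window [start, start + windowSizeMs).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Counts records per key and window, keyed as "<key>@<windowStart>".
    static Map<String, Long> countPerWindow(String[] keys, long[] timestampsMs, long windowSizeMs) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            String bucket = keys[i] + "@" + windowStart(timestampsMs[i], windowSizeMs);
            counts.merge(bucket, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] users = {"alice", "alice", "bob", "alice"};
        long[] times = {1_000L, 59_999L, 30_000L, 60_000L};
        // One-minute windows: alice has two events in [0, 60000) and one in [60000, 120000).
        System.out.println(countPerWindow(users, times, 60_000L));
        // prints {alice@0=2, alice@60000=1, bob@0=1}
    }
}
```

In Kafka Streams the per-window counts would live in a local, fault-tolerant state store rather than an in-memory map, which is what lets the app run outside the brokers.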
Kafka, stream processing for developers
Deploy apps – not clusters – that are:
● Real-time
● Elastic
● Fault-tolerant
The result:
● Teams can be more efficient
● Users get a new, better experience
Virtual reality, anyone?
Psst, Greg.
Game Streams
Create a logical stream across Photon servers
• Real-time netdata transformation
• Routing between Photon servers
• Stateful, due to Photon protocol
Prefer declarative OLTP table state
Database table state should describe “how the world should be”, not “steps to perform”
The job’s duty is to make the world look like the desired one
“A stream should exist from playback A to room B”, not
“Right now, create a stream from playback A to room B”
Straightforward to test and verify: does the desired world match reality?
Easier to reason about in failure cases: a restarted job simply compares the desired world with reality again
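A minimal sketch of the declarative idea, assuming the desired and observed worlds can each be read as a set of stream identifiers (the "playbackA->roomB" ids and the Reconciler class are hypothetical). The job derives actions from the difference instead of executing stored commands, so rerunning it after a crash is harmless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Declarative reconciliation sketch: compare the desired world (e.g. rows in an
// OLTP table) with the observed world and emit only the steps needed to converge.
// The "playbackA->roomB" stream ids are hypothetical.
public class Reconciler {

    static List<String> plan(Set<String> desired, Set<String> actual) {
        List<String> actions = new ArrayList<>();
        // Iterate in sorted order so the plan is deterministic.
        for (String stream : new TreeSet<>(desired)) {
            if (!actual.contains(stream)) actions.add("create " + stream);
        }
        for (String stream : new TreeSet<>(actual)) {
            if (!desired.contains(stream)) actions.add("delete " + stream);
        }
        return actions;
    }

    // The world is converged when there is nothing left to do.
    static boolean converged(Set<String> desired, Set<String> actual) {
        return plan(desired, actual).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> desired = Set.of("playbackA->roomB", "playbackC->roomD");
        Set<String> actual = Set.of("playbackC->roomD", "playbackE->roomF");
        System.out.println(plan(desired, actual));
        // prints [create playbackA->roomB, delete playbackE->roomF]
    }
}
```

The converged() check is the "does desired world match up with reality?" test from the slide, expressed as a one-line assertion.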
Keep consistent topic naming
Kafka Streams jobs involve a lot of source and intermediate topics
We prefer:
[<data source>|<job application id>]-<avro record type>[_<specifier>]-<partition key>
Examples:
oltp_db-user-user_id
job_playbacks-photon_instantiations-game_stream_id
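One way to enforce the convention is to build every name through a single helper. This is a hypothetical sketch (TopicNames is not from the deck); since hyphens separate the components, it rejects components containing one. The slide does not say whether "photon_instantiations" is a record type plus specifier or a single record type, so the example passes it whole.

```java
// Builds topic names following the convention
// [<data source>|<job application id>]-<avro record type>[_<specifier>]-<partition key>.
// Hyphens act as the separators, so components are checked not to contain one.
public class TopicNames {

    static String topicName(String sourceOrAppId, String recordType, String specifier, String partitionKey) {
        for (String part : new String[]{sourceOrAppId, recordType, specifier, partitionKey}) {
            if (part != null && part.contains("-")) {
                throw new IllegalArgumentException("topic name component contains '-': " + part);
            }
        }
        StringBuilder name = new StringBuilder(sourceOrAppId).append('-').append(recordType);
        if (specifier != null && !specifier.isEmpty()) {
            name.append('_').append(specifier); // optional specifier
        }
        return name.append('-').append(partitionKey).toString();
    }

    public static void main(String[] args) {
        System.out.println(topicName("oltp_db", "user", null, "user_id"));
        // prints oltp_db-user-user_id
        System.out.println(topicName("job_playbacks", "photon_instantiations", null, "game_stream_id"));
        // prints job_playbacks-photon_instantiations-game_stream_id
    }
}
```

Because underscores appear inside components too, the specifier cannot be recovered unambiguously from a name; only the three hyphen-separated parts can.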
RocksDB range scans
Did you know that RocksDB stores keys in lexicographically sorted order?
Kafka Streams exposes range() queries on persistent state stores!
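Because ordering is bytewise, a range scan only returns a meaningful range if the key encoding sorts the way you expect. This sketch (plain Java, mirroring RocksDB's default bytewise comparator with Arrays.compareUnsigned) shows that decimal-string numbers mis-sort while fixed-width big-endian longs sort numerically, assuming non-negative values such as epoch-millisecond timestamps.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Bytewise key ordering, as used by RocksDB's default comparator.
// Assumes non-negative values (e.g. epoch-millisecond timestamps): with a signed
// big-endian encoding, negative numbers would sort after positive ones.
public class KeyOrdering {

    // Fixed-width, big-endian encoding of a long (ByteBuffer defaults to big-endian).
    static byte[] bigEndian(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // As decimal strings, "10" sorts before "9" ('1' < '9'), breaking numeric order.
        System.out.println(compareBytes(utf8("10"), utf8("9")) < 0);         // true
        // As 8-byte big-endian values, 10 sorts after 9, matching numeric order.
        System.out.println(compareBytes(bigEndian(10L), bigEndian(9L)) > 0); // true
    }
}
```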
Example: Scheduled tasks
Keys in the “tasks” topic are a composite key of <timestamp, id>
This allows range queries for upcoming tasks (local to each partition, obviously)
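A minimal in-memory model of the scheduled-tasks pattern, assuming one store per partition. A TreeMap ordered by (timestamp, id) stands in for the sorted state store, and its headMap view plays the role of the range query; ScheduledTasks, the ids and the payloads are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory model of a partition-local task store keyed by <timestamp, id>.
// A TreeMap stands in for a sorted state store; ids and payloads are hypothetical.
public class ScheduledTasks {

    static final class TaskKey {
        final long dueAtMs;
        final String id;

        TaskKey(long dueAtMs, String id) {
            this.dueAtMs = dueAtMs;
            this.id = id;
        }
    }

    // Order by due time first, then by id to break ties.
    private static final Comparator<TaskKey> ORDER =
            Comparator.comparingLong((TaskKey k) -> k.dueAtMs).thenComparing((TaskKey k) -> k.id);

    private final TreeMap<TaskKey, String> tasks = new TreeMap<>(ORDER);

    void schedule(long dueAtMs, String id, String payload) {
        tasks.put(new TaskKey(dueAtMs, id), payload);
    }

    // Range scan over everything due at or before nowMs. The exclusive upper bound
    // (nowMs + 1, "") works because "" is the smallest possible id.
    List<String> pollDue(long nowMs) {
        SortedMap<TaskKey, String> due = tasks.headMap(new TaskKey(nowMs + 1, ""));
        List<String> fired = new ArrayList<>(due.values());
        due.clear(); // clearing the view removes these tasks from the store
        return fired;
    }

    int pending() {
        return tasks.size();
    }

    public static void main(String[] args) {
        ScheduledTasks store = new ScheduledTasks();
        store.schedule(2_000L, "b", "send reminder");
        store.schedule(1_000L, "a", "expire session");
        store.schedule(3_000L, "c", "close room");
        System.out.println(store.pollDue(2_000L)); // prints [expire session, send reminder]
        System.out.println(store.pending());       // prints 1
    }
}
```

With a real RocksDB-backed store, the same ordering only holds if the timestamp is serialized so that bytewise order matches numeric order (for example, fixed-width big-endian).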
Dark staging jobs
Eventually you will need to deploy a staging version of a job into prod for integration testing while the known-good version is serving users.
Ensure you bake in the necessary degrees of freedom! (Duplicate topics, application ids, etc.)
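One way to bake in that freedom is to derive every name from a single stage parameter. In this hypothetical sketch (JobNames and the "_staging" suffix are assumptions, not the deck's scheme), a dark copy gets its own application.id, which Kafka Streams also uses as the consumer group id and internal-topic prefix, and its own output topics, while it can still read the same input topics.

```java
import java.util.Properties;

// Derive every name a job uses from one "stage" parameter so a dark staging copy
// can run in production alongside the known-good job without sharing an
// application.id or output topics. The "_staging" suffix scheme is hypothetical.
public class JobNames {

    private final String stage; // "" for the live job, e.g. "staging" for a dark copy

    JobNames(String stage) {
        this.stage = stage;
    }

    String applicationId(String baseAppId) {
        return stage.isEmpty() ? baseAppId : baseAppId + "_" + stage;
    }

    // Output topics embed the (possibly suffixed) application id, following the
    // <job application id>-<record type>-<partition key> naming pattern.
    String outputTopic(String baseAppId, String recordType, String partitionKey) {
        return applicationId(baseAppId) + "-" + recordType + "-" + partitionKey;
    }

    Properties streamsProperties(String baseAppId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId(baseAppId)); // also the consumer group id
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        JobNames live = new JobNames("");
        JobNames dark = new JobNames("staging");
        System.out.println(live.outputTopic("job_playbacks", "photon_instantiations", "game_stream_id"));
        // prints job_playbacks-photon_instantiations-game_stream_id
        System.out.println(dark.outputTopic("job_playbacks", "photon_instantiations", "game_stream_id"));
        // prints job_playbacks_staging-photon_instantiations-game_stream_id
    }
}
```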
KTable rematerialization
Cold nodes read the *entire* KTable transaction log (xlog) for each KTable on startup. (Of course!)
This is not something you’re likely to experience except during a failure, so you could be in for a surprise!
It’s easy to force a rematerialization to test: stop the job, remove the state dir from the job’s work directory, and restart.
(But you should probably check your xlog topic sizes first.)
In our case, AWS EBS I/O throttling left us unable to bring a fresh node up!
Ensure the xlog topic doesn’t grow unbounded:
- Delete dead keys explicitly and set proper compaction policies on xlog topics
- Or, set up topic retention policies if data can be purged after a time duration
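Why explicit deletes matter can be shown with a small model of a changelog, where a null value is a tombstone. In Kafka, cleanup.policy=compact enables compaction and tombstones are kept for delete.retention.ms before removal; this sketch simplifies that to "drop them at compaction time". The keys and values are hypothetical presence records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Model of a KTable changelog: each record is (key, value), and a null value is a
// tombstone marking the key as deleted. Keys and values are hypothetical.
public class ChangelogCompaction {

    static final class Rec {
        final String key;
        final String value; // null = tombstone

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // What a cold node does on startup: replay the whole log into a table.
    static Map<String, String> replay(List<Rec> log) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Rec r : log) {
            if (r.value == null) table.remove(r.key);
            else table.put(r.key, r.value);
        }
        return table;
    }

    // Simplified compaction: keep only the newest record per key, and drop keys whose
    // newest record is a tombstone (Kafka first retains tombstones for a while).
    static List<Rec> compact(List<Rec> log) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            latest.remove(r.key); // re-insert so entries stay in order of last write
            latest.put(r.key, r);
        }
        List<Rec> compacted = new ArrayList<>();
        for (Rec r : latest.values()) {
            if (r.value != null) compacted.add(r);
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
                new Rec("user1", "online"),
                new Rec("user2", "online"),
                new Rec("user1", "away"),
                new Rec("user2", null)); // explicit delete of a dead key
        System.out.println(compact(log).size());                      // prints 1
        System.out.println(replay(compact(log)).equals(replay(log))); // prints true
    }
}
```

Without the tombstone for user2, compaction would keep its last value forever, and every cold node would have to replay it on startup.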
Reset switches + flushing
Sometimes KTable topics or entries need to be forcibly rematerialized, flushed, or read from the beginning.
For example: the KTable topic exists before the job’s first run. Or, something broke.
It’s handy to build in mechanisms to:
- Reset consumer offsets to zero
- For OLTP/Connect-backed KTable data, force a no-op update to the database record(s) to flush them
- In Rails, ActiveRecord#flush
This may be less necessary in newer versions of Kafka Streams (e.g. due to KAFKA-4114 and other bug fixes).
Handy topic consumer group offset resetter routine (pass in the job’s Properties):
https://gist.github.com/gfodor/a4f5e4721e959766e75e4c901bf42890
Streaming for VR
Kafka Streams has been amazing for us.
So far we have shown jobs for:
• VR Mirror/Capture/Playback
• Presence
• Scheduled tasks
We are also using it for:
• Real-time game telemetry ET
• VR Capture archival to S3
• Real-time push messaging
From batch to real-time
● A streaming platform provides concepts similar to Hadoop’s
● A streaming platform is right for today’s applications
○ Distributed storage, stream processing, publish/subscribe model
A streaming platform can be ‘Hadoop Made Fast’
● Use Kafka as a ‘source of truth’
● Process data as it arrives
● Power real-time experiences (like VR)