Target and Connect Intelligently
Experience with Kafka & Storm
Otto Mok
Solution Architect, AcuityAds
April 30, 2014 – Toronto Hadoop User Group
2
Agenda
• Background
– What does AcuityAds do?
• Use case
– What are we trying to do?
• High-level System Architecture
– How does the data flow?
• Kafka & Storm
– What did we do wrong?
3
Background
Source: https://www.google.ca/search?q=banner+ads&tbm=isch&tbo=u
4
Background
• Digital Advertising
– Website banner, pre-roll video, free mobile app
• Buy ad impressions in ‘real time’
– Response within 50ms for auction
• Find best match between people and ads
– Show ad that you care about
• Use machine learning algo to ‘learn’
– Data, data, data
5
Use case
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
– Country, Province, Gender, Age Range, etc...
6
High-level System Architecture
• 10+ billion daily bid requests
• Make up to 4 billion daily bids
• Serve millions of daily impressions
• 10+ TB of messages daily
• 300k+ messages / second
[Architecture diagram; components: Bidder, Adserver, Kafka, HBase/Hadoop, Storm]
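The volume figures above can be turned into rough per-second and per-message numbers. This is a back-of-the-envelope sketch only: it uses the slide's lower bounds, assumes decimal terabytes, and assumes (for simplicity) that the 10+ TB is made up entirely of bid requests, which the slide does not state.

```python
# Back-of-the-envelope figures derived from the slide's lower bounds.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

daily_bid_requests = 10_000_000_000       # "10+ billion daily bid requests"
daily_bytes = 10 * 10**12                 # "10+ TB of messages daily" (decimal TB assumed)

# Average rate if the traffic were spread evenly over the day.
avg_requests_per_second = daily_bid_requests / SECONDS_PER_DAY

# Average message size, assuming every byte belongs to a bid request.
avg_bytes_per_request = daily_bytes / daily_bid_requests

print(f"average bid requests/s: {avg_requests_per_second:,.0f}")
print(f"average bytes/request:  {avg_bytes_per_request:,.0f}")
```

The average works out to well under the quoted 300k+ messages / second, so that figure is presumably a peak rate or includes other message types; the slide does not say which.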
7
Kafka
Source: http://kafka.apache.org/documentation.html
8
Kafka - Spec
• Kafka v0.8.0
• Servers – 10 x 2U(10 x 3TB) JBOD
• Total storage – 300 TB
• Replication – 3x
• Unique data – 100 TB
• Capacity – a few days
• Producer acknowledgment – never waits
• Topic - BIDREQUEST
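The storage figures on this slide follow from simple arithmetic, sketched below. The daily ingest value is taken from the "10+ TB of messages daily" figure on the architecture slide and is a lower bound, so the retention result is an upper bound rather than a measured value.

```python
# Kafka cluster capacity, using the numbers from the spec slide.
servers = 10
disks_per_server = 10
disk_tb = 3
replication_factor = 3
daily_ingest_tb = 10          # lower bound ("10+ TB of messages daily")

raw_tb = servers * disks_per_server * disk_tb        # total raw storage
unique_tb = raw_tb / replication_factor              # storage for unique data
max_retention_days = unique_tb / daily_ingest_tb     # upper bound on retention

print(raw_tb, unique_tb, max_retention_days)
```

The upper bound is consistent with the "a few days" capacity on the slide, since real ingest exceeds 10 TB a day and disks cannot be run completely full.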
9
Kafka - Monitoring
• Nagios
– Ping, CPU, memory, network I/O, disk space
• Producer-Consumer group message counting
– Hourly consumption rate check
Topic      | Consumer Group ID       | Producer Count | Consumer Count | Error           | Ratio
BIDREQUEST | InventoryTopology       | 122,450,812    | 122,444,294    | None            | 1.00
BIDREQUEST | SearchTargetingTopology | 122,450,812    | 107,755,295    | Ratio below 98% | 0.88
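The hourly check behind this table can be sketched as a ratio of consumed to produced messages compared against a 98% threshold. The function name and return shape are hypothetical; only the counts and the threshold come from the slide.

```python
THRESHOLD = 0.98  # from the "Ratio below 98%" error on the slide


def check_consumption(producer_count, consumer_count, threshold=THRESHOLD):
    """Return (ratio rounded to 2 places, error message or None)."""
    if producer_count == 0:
        raise ValueError("producer_count must be non-zero")
    ratio = consumer_count / producer_count
    error = None if ratio >= threshold else f"Ratio below {threshold:.0%}"
    return round(ratio, 2), error


# Counts copied from the two rows of the table.
inventory = check_consumption(122_450_812, 122_444_294)
search = check_consumption(122_450_812, 107_755_295)
print(inventory, search)
```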
10
Kafka - Monitoring
• Kafka Web Console
– Partition offset for each consumer group
11
Kafka - Issues
• Issue 1 - Partitions
– 10 partitions
– Each partition > 1 TB a day
– 100 TB / 1 TB – no problem!
• Each partition is stored in a directory
– /disk05/kafka-logs/BIDREQUEST-09
– /disk09/kafka-logs/BIDREQUEST-03
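One reading of this issue: because each partition replica lives in a single directory on a single disk, the cluster-wide 100 TB figure is misleading. The sketch below works through that reading. It assumes every replica directory lands on its own disk and that a partition grows by exactly 1 TB a day (the slide says more than 1 TB), so the results are best-case bounds.

```python
# Why "100 TB / 1 TB" is too optimistic when a partition is pinned to one disk.
partitions = 10
replication_factor = 3
total_disks = 10 * 10            # 10 servers x 10 disks
disk_tb = 3
partition_tb_per_day = 1         # lower bound ("> 1 TB a day")

replica_dirs = partitions * replication_factor            # directories to place
disks_in_use = min(replica_dirs, total_disks)             # at most one disk each
idle_disks = total_disks - disks_in_use                   # disks holding no data
days_until_disk_full = disk_tb / partition_tb_per_day     # best case per disk

print(replica_dirs, idle_disks, days_until_disk_full)
```

Under these assumptions most disks sit empty while each busy disk fills in about three days, which suggests using many more partitions than disks.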
12
Kafka - Issues
• Issue 2 – Unbalanced partition distribution
– Some servers running out of space
– Some servers are not “leader” for any partition
• A network glitch causes a server to drop out of the cluster;
it is no longer leader for any partition after it rejoins
• auto.leader.rebalance.enable=true
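A minimal sketch of how leadership imbalance could be spotted, assuming the partition-to-leader mapping has already been fetched from the cluster. The broker IDs and the mapping below are made-up illustration data, not values from the talk.

```python
from collections import Counter


def leader_counts(partition_leaders, brokers):
    """Count how many partitions each broker currently leads."""
    counts = Counter(partition_leaders.values())
    return {broker: counts.get(broker, 0) for broker in brokers}


# Hypothetical cluster state: partition id -> leader broker id.
brokers = [1, 2, 3, 4]
partition_leaders = {0: 1, 1: 1, 2: 2, 3: 1, 4: 2, 5: 4}

counts = leader_counts(partition_leaders, brokers)
idle = [broker for broker, n in counts.items() if n == 0]
print(counts, idle)
```

A broker that shows up in the idle list still stores replicas but serves no producer or consumer traffic as leader, which is the situation the setting above is meant to correct.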
13
Lots of data – now what?
Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg
14
Use case - again
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
– Country, Province, Gender, Age Range, etc...
15
Storm
Source: http://storm.incubator.apache.org/documentation/Tutorial.html
16
Storm - Spec
• Storm v0.8.2
• Servers – 13 x Dual Quad Core Xeon 36G RAM
• 4 worker slots per server
• Total logical CPUs – 208
• Total memory – 468 G
• Total slots – 52 worker slots (JVMs)
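The totals on this slide can be reproduced as follows. The factor of two threads per core is an assumption (hyper-threading): 13 dual quad-core servers have 104 physical cores, so 208 logical CPUs implies two hardware threads per core.

```python
# Storm cluster totals from the spec slide.
servers = 13
sockets_per_server = 2       # "Dual"
cores_per_socket = 4         # "Quad Core"
threads_per_core = 2         # assumption: hyper-threading enabled
ram_gb_per_server = 36
slots_per_server = 4

logical_cpus = servers * sockets_per_server * cores_per_socket * threads_per_core
total_ram_gb = servers * ram_gb_per_server
total_slots = servers * slots_per_server

print(logical_cpus, total_ram_gb, total_slots)
```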
17
Storm - Monitor
18
Storm - Topology
• Spout reads each BidRequest from the Kafka topic
• Determines whether the inventory is new or existing, then emits
tuples to different “streams”
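The routing decision can be sketched in plain Python, without the Storm API. The stream and field names are taken from the topology slides; the in-memory lookup table and the shape of a bid request are hypothetical stand-ins for whatever store the real topology consults.

```python
def route(bid_request, known_inventory):
    """Pick the output stream and tuple for one bid request.

    known_inventory maps (sourceId, domainName) -> inventoryId and is a
    hypothetical stand-in for the real lookup.
    """
    key = (bid_request["sourceId"], bid_request["domainName"])
    if key in known_inventory:
        return "ExistingInventory", {"inventoryId": known_inventory[key]}
    return "NewInventory", {"sourceId": key[0], "domainName": key[1]}


known = {("s1", "example.com"): 42}
existing = route({"sourceId": "s1", "domainName": "example.com"}, known)
new = route({"sourceId": "s2", "domainName": "example.org"}, known)
print(existing, new)
```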
19
Storm - Topology
• InsertInventoryBolt
– Process tuples from NewInventory stream
– Fields grouping on sourceId, domainName
– Tick tuple every 1 second
• UpdateInventoryBolt
– Process tuples from ExistingInventory stream
– Fields grouping on inventoryId
– Tick tuple every 1 second
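Grouping on fields means that tuples with the same field values always reach the same bolt task. Conceptually this is a hash of the grouping values modulo the number of tasks. The sketch below uses CRC32 as a stand-in hash so the result is stable across runs; it is an illustration, not Storm's actual hash function.

```python
import zlib


def choose_task(values, num_tasks):
    """Map a tuple of grouping-field values to a task index."""
    # Join with a separator that is unlikely to appear in the values.
    key = "\x1f".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_tasks


num_tasks = 8  # hypothetical parallelism
first = choose_task(("s1", "example.com"), num_tasks)
second = choose_task(("s1", "example.com"), num_tasks)
print(first, second)
```

Because the same (sourceId, domainName) pair always maps to the same task, one task can own the insert for a given new site without coordinating with the others.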
20
Storm - Topology
• LogInventoryBolt
– Process tuples from ExistingInventory stream
– Fields grouping on inventoryId
– Tick tuple every 10 seconds
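The slides do not say what the bolts do when a tick tuple arrives. A common pattern, assumed here, is to accumulate counts in memory and flush them downstream on each tick. The class, the dict-based tuples, and the sink callable are all hypothetical and do not use the Storm API.

```python
from collections import Counter


class CountingBolt:
    """Buffers per-inventory counts and flushes them on each tick."""

    def __init__(self, sink):
        self.pending = Counter()
        self.sink = sink              # callable that receives a dict of counts

    def execute(self, tup):
        if tup.get("tick"):           # tick tuple: time to write the batch
            self.flush()
        else:                         # data tuple: just count it
            self.pending[tup["inventoryId"]] += 1

    def flush(self):
        if self.pending:
            self.sink(dict(self.pending))   # hand over a copy
            self.pending.clear()


flushed = []
bolt = CountingBolt(flushed.append)
for tup in [{"inventoryId": 7}, {"inventoryId": 7}, {"inventoryId": 9}, {"tick": True}]:
    bolt.execute(tup)
print(flushed)
```

A longer tick interval means fewer, larger writes, at the cost of data that is that much less fresh.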
21
Storm - Issues
• Issue – Low uptime
– 10 workers, 100 executors
– Not processing many tuples
– Process latency < 10ms
• Bolts restart due to uncaught exceptions
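The talk does not show how this was fixed. One defensive approach, sketched here as an assumption, is to catch exceptions around per-tuple processing, log them, and mark the tuple as failed so that a single bad message does not take the worker down. All names are hypothetical, and the sketch does not use the Storm API.

```python
import logging

logger = logging.getLogger("bolt")


def safe_execute(process, tup, on_fail):
    """Run process(tup); on any exception, log it and report the failure."""
    try:
        process(tup)
        return True
    except Exception:
        logger.exception("failed to process tuple %r", tup)
        on_fail(tup)
        return False


def parse_site(tup):
    # Raises KeyError when the field is missing.
    return tup["domainName"].lower()


failed = []
ok = safe_execute(parse_site, {"domainName": "Example.com"}, failed.append)
bad = safe_execute(parse_site, {"sourceId": "s1"}, failed.append)
print(ok, bad, failed)
```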
22
Conclusion
• Cost
– Bleeding-edge technology → bugs
– Support → mailing lists
– Monitoring → roll your own
– Operation → dedicated personnel
• Benefit
– Near real-time data on site impression volume &
distribution by geo, demo, etc...
23
Forward Looking
• Kafka v0.8.1.1
– Allows specifying the broker hostname for producer &
consumer
– Change # of partitions of a topic online
• Storm v0.9.1
– Faster pure Java Netty transport
– View logs from each server from Storm UI
– Tick tuple using floating point seconds
– Storm on Hadoop (HDP 2.1)
24
Thank you
Otto Mok
otto.mok@acuityads.com
Source: http://jamesgieordano.files.wordpress.com/2011/05/babyelephant.jpg
