Apache Apex
Real Time Insights for Advertising Tech
Tushar Gosavi
November 25, 2015
Agenda
• The Customer - What they do
• The Use Case
• While (! Realtime)
– Evaluation of Application
– Challenges
More about the customer
• Leading digital automation software company for publishers
• Leading innovator in real-time bidding (RTB) auctions
• Helps publishers monetize their digital assets
• Enables publishers to make smarter inventory decisions and improve revenue
Understanding the use case
• Reporting of critical metrics from auction and client logs
• Revenue, impression, and click information
• Aggregate counters and reporting on top N metrics
• Low-latency querying using a pub-sub model
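The "aggregate counters and top N metrics" requirement comes down to summing a metric per key and keeping the largest totals. A minimal Python sketch, assuming events arrive as dicts with hypothetical field names such as `publisher` and `revenue` (the deck does not give the schema):

```python
from collections import Counter


def top_n(events, key_field, metric, n):
    """Sum `metric` per distinct value of `key_field`; return the n largest totals."""
    totals = Counter()
    for event in events:
        totals[event[key_field]] += event[metric]
    # most_common(n) returns (key, total) pairs, largest total first.
    return totals.most_common(n)


events = [  # hypothetical sample events
    {"publisher": "p1", "revenue": 2.0},
    {"publisher": "p2", "revenue": 5.0},
    {"publisher": "p1", "revenue": 4.0},
    {"publisher": "p3", "revenue": 1.0},
]
print(top_n(events, "publisher", "revenue", 2))
```

In a streaming pipeline the per-key totals would be maintained incrementally by the aggregators rather than recomputed from a list; the ranking step stays the same.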
Scale
• 6 geographically distributed data centers
• Combination of co-located and AWS-based DCs
• > 5 PB under data management
• 22 TB/day of data generated from auction and client logs
• Heterogeneous log formats
• More than 15 billion impressions per day
• Average data inflow of 200K events/sec
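The scale figures can be cross-checked against each other. The short calculation below is plain arithmetic; it assumes decimal terabytes and that the 22 TB/day covers every event counted in the 200K events/sec average, which the deck does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

impressions_per_day = 15e9    # "more than 15 billion impressions per day"
avg_events_per_sec = 200_000  # "average data inflow of 200K events/sec"
bytes_per_day = 22e12         # "22 TB/day" (decimal terabytes assumed)

# Impressions alone imply a lower bound on the average event rate.
min_rate_from_impressions = impressions_per_day / SECONDS_PER_DAY

# The stated average rate implies this many events per day.
events_per_day = avg_events_per_sec * SECONDS_PER_DAY

# Average stored bytes per event, if the 22 TB/day covers all events.
avg_bytes_per_event = bytes_per_day / events_per_day

print(f"{min_rate_from_impressions:,.0f} events/sec from impressions alone")
print(f"{events_per_day:,} events/day at the stated average rate")
print(f"{avg_bytes_per_event:,.0f} bytes per event on average")
```

This gives roughly 174K events/sec from impressions alone, about 17.3 billion events/day at 200K events/sec, and about 1.3 KB per event. That per-event size is below the roughly 2 KB uncompressed event size quoted under Initial Requirements, which would fit the 22 TB/day figure measuring compressed data; the deck does not say so, so treat that reading as an inference.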
Proprietary and Confidential
Initial Requirements
• Ad server log events consumed as Avro-encoded, Snappy-compressed files from S3. New files are uploaded every 10-20 minutes.
• Data may arrive in S3 out of order (by timestamp).
• Events are about 2 KB uncompressed; only a subset of fields is retrieved for aggregation.
• Aggregates kept in memory (checkpointed) with an expiration policy; queries are processed against the in-memory data.
• Front-end integration through a pub-sub protocol for real-time dashboard components.
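In-memory aggregates with an expiration policy, fed by data that can arrive out of timestamp order, can be modeled as sums per time bucket. The sketch below is illustrative, not the team's implementation: the bucket size, retention window, future-skew limit and the explicit `now` argument are all assumptions, and the checkpointing mentioned above (which a streaming platform such as Apex performs on operator state) is not shown.

```python
from collections import defaultdict


class ExpiringAggregates:
    """In-memory sums keyed by (time bucket, dimension key) with an expiration policy.

    Assumptions, not from the deck: the caller passes `now` from a clock,
    events older than `retention_secs` are dropped as too late, and events
    more than `max_future_secs` ahead of `now` are rejected as bad timestamps.
    """

    def __init__(self, bucket_secs, retention_secs, max_future_secs):
        self.bucket_secs = bucket_secs
        self.retention_secs = retention_secs
        self.max_future_secs = max_future_secs
        # bucket start time -> {dimension key -> running sum}
        self.buckets = defaultdict(lambda: defaultdict(float))

    def add(self, ts, key, value, now):
        """Fold one event into its time bucket; return False if it is rejected."""
        if ts > now + self.max_future_secs:
            return False  # timestamp too far in the future
        if ts < now - self.retention_secs:
            return False  # arrived too late: its bucket is past retention
        bucket = ts - ts % self.bucket_secs
        self.buckets[bucket][key] += value  # arrival order does not matter
        return True

    def expire(self, now):
        """Drop buckets that end at or before the retention cutoff."""
        cutoff = now - self.retention_secs
        for start in [b for b in self.buckets if b + self.bucket_secs <= cutoff]:
            del self.buckets[start]

    def query(self, key):
        """Total for `key` across all live buckets."""
        return sum(counts.get(key, 0.0) for counts in self.buckets.values())


agg = ExpiringAggregates(bucket_secs=60, retention_secs=3600, max_future_secs=300)
agg.add(9_990, "pub-1", 2.5, now=10_000)   # on time
agg.add(9_500, "pub-1", 1.0, now=10_000)   # out of order, still within retention
agg.add(20_000, "pub-1", 9.0, now=10_000)  # rejected: far in the future
print(agg.query("pub-1"))
```

Bucketing by event time rather than arrival time is what makes late tuples land in the right place; the retention window bounds memory, at the cost of dropping data that arrives later than the window.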
Solution (Phase 1)
[Diagram: Real-time architecture, Powered By Apex (Phase 1). The AdServer's auction logs and client logs land in S3; S3Reader operators read them, Filter operators pass filtered events to Dimensions Aggregators, and the aggregates go to the Dimensions Store. Queries from the middleware (MW) arrive through a REST proxy and a Kafka cluster, and query results are returned through Kafka.]
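The Filter → Dimensions Aggregator stages of the pipeline can be sketched as two functions: one drops unwanted events, the other sums each metric for every combination of dimensions. The dimension combinations, metric names and filter rule below are hypothetical; the deck names the stages but not their configuration.

```python
from collections import defaultdict

# Hypothetical dimension combinations and metrics; () means "grand total".
DIMENSION_COMBOS = [("publisher",), ("publisher", "country"), ()]
METRICS = ("revenue", "impressions", "clicks")


def keep(event):
    """Filter stage: drop events without a publisher (an assumed rule)."""
    return event.get("publisher") is not None


def aggregate(events):
    """Dimensions Aggregator stage: sum each metric for every dimension combination.

    Keys are (combination, values), e.g. (("publisher",), ("p1",)).
    """
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for event in filter(keep, events):
        for combo in DIMENSION_COMBOS:
            key = (combo, tuple(event[d] for d in combo))
            for metric in METRICS:
                totals[key][metric] += event.get(metric, 0)
    return dict(totals)


events = [
    {"publisher": "p1", "country": "US", "revenue": 2, "impressions": 1, "clicks": 0},
    {"publisher": "p1", "country": "DE", "revenue": 3, "impressions": 1, "clicks": 1},
    {"publisher": None, "country": "US", "revenue": 9, "impressions": 1, "clicks": 0},
]
print(aggregate(events)[(("publisher",), ("p1",))])
```

Each combination multiplies the number of stored keys, which is why the cardinality of incoming data drives the store's memory requirement.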
Learning & Challenges
• Unstable S3 client libraries
– Unpredictable hangs and corrupted data
– On a hang, the application master kills the container and restarts reading the file from a different container
– Corrupt files caused containers to be killed; added an application-configurable retry mechanism that skips bad files
• Out-of-order data
– Tuples with timestamps in the future and in the past
• Memory requirements for the store
– Cardinality estimation for incoming data
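The configurable retry-and-skip behavior for corrupt files can be sketched as follows. `read_file` is a hypothetical callable standing in for the real S3/Avro reader, and the choice of exceptions (OSError for I/O problems, ValueError for corrupt data) is an assumption. Hangs are not handled here; per the slide, the platform handles those by killing and replacing the container.

```python
import logging

logger = logging.getLogger(__name__)


def read_with_retry(paths, read_file, max_retries=3):
    """Read every file via `read_file`, retrying failures and skipping files that never succeed.

    `read_file` is a hypothetical callable that returns all records of one file.
    Records are kept only once a whole file has been read, so a failed
    attempt never leaves partial (and later duplicated) output behind.
    Returns (records, skipped_paths).
    """
    records, skipped = [], []
    for path in paths:
        for attempt in range(1, max_retries + 1):
            try:
                batch = read_file(path)
            except (OSError, ValueError) as err:  # assumed: I/O errors and corrupt data
                logger.warning("attempt %d/%d failed for %s: %s",
                               attempt, max_retries, path, err)
            else:
                records.extend(batch)
                break
        else:
            # Every attempt failed: skip the file instead of failing the operator.
            skipped.append(path)
    return records, skipped
```

Skipped paths are returned rather than silently dropped so they can be reported or reprocessed later.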
Solution (Phase 2)
[Diagram: Real-time architecture, Powered By Apex (Phase 2). Auction logs from the AdServer are consumed from Kafka clusters by Kafka Input operators, and an ETL stage decompresses and flattens the Kafka messages; client logs are still read from S3 by an S3Reader. Events then pass through Filter → Dimensions Aggregator → Dimensions Store, and middleware queries and query results travel through a REST proxy and Kafka.]
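The ETL stage ("Decompress & Flatten") turns each compressed Kafka message into flat records that the Filter can work on. The deck does not describe the message encoding, so this sketch assumes a zlib-compressed JSON array of nested events, using only the standard library; the production pipeline's format (the S3 files were Avro with Snappy) may well differ.

```python
import json
import zlib


def flatten(record, prefix=""):
    """Turn nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def decompress_and_flatten(message_bytes):
    """ETL for one Kafka message, assumed to hold a zlib-compressed JSON array of events."""
    events = json.loads(zlib.decompress(message_bytes))
    return [flatten(event) for event in events]


msg = zlib.compress(json.dumps(
    [{"auction": {"id": "a1", "bid": {"price": 1.5}}, "publisher": "p1"}]
).encode("utf-8"))
print(decompress_and_flatten(msg))
```

Flattening early keeps downstream operators simple: the Filter and Aggregator only ever see single-level records.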
Learning & Challenges
• Complex logical DAG
• Kafka Operator
– Dynamic partitioning disabled
– Memory configuration
– Offset snapshotting to ensure exactly-once semantics
– Limiting the Kafka read rate
• Harder debugging (more components)
– GBs of container logs
– Difficult to trace the sequence of failures (feature being added)
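Limiting the Kafka read rate is commonly done with a token bucket. The sketch below is a generic limiter, not the Kafka operator's actual configuration; the rate, burst capacity and injectable clock are assumptions made so the behavior is easy to test.

```python
import time


class TokenBucket:
    """Caps the average read rate at `rate` messages/sec, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, n=1):
        """Take n tokens if available; otherwise leave them and return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A consumer loop would call `try_acquire(batch_size)` before each fetch and skip the fetch when it returns False, so downstream operators see a bounded input rate instead of bursts.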
Solution
[Diagram: Real-time architecture, Powered By Apex (final solution). The AdServer sends auction logs, and the user browser sends client logs (cached via a CDN), to Kafka clusters; separate Kafka Input operators for auction logs and client logs each feed an ETL stage (Decompress & Flatten), followed by Filter → Dimensions Aggregator → Dimensions Store. Middleware queries and query results travel through a REST proxy and Kafka.]
Application Configuration
• 64 Kafka Input operators reading from 6 geographically distributed DCs
• Under 40 seconds end-to-end latency, from ad serving to visualization
• 32 instances of the in-memory distributed store
• 64 aggregators
• 1.2 TB memory footprint at peak load
• Work underway on a fault-tolerant version of the application using HDHT
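Spreading work over 64 aggregators only works if every event for a given dimension key reaches the same aggregator. A minimal hash-partitioning sketch follows; the CRC32-based routing and the `route` helper are assumptions for illustration, not how Apex partitions streams internally.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a dimension key (a tuple of values) to a partition index.

    CRC32 of the key's repr is stable across processes, unlike Python's
    built-in hash() on strings, which is randomized per interpreter run.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def route(events, key_fields, num_partitions):
    """Split events into per-partition lists so each key lands in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        key = tuple(event[f] for f in key_fields)
        partitions[partition_for(key, num_partitions)].append(event)
    return partitions


NUM_AGGREGATORS = 64  # the aggregator count from the slide
events = [{"publisher": f"p{i % 10}", "revenue": 1.0} for i in range(1000)]
parts = route(events, ["publisher"], NUM_AGGREGATORS)
print([len(p) for p in parts if p])
```

Because each key has a single owner, aggregators never need to coordinate on a key; the trade-off is that a very hot key cannot be split across partitions this way.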
Screenshots - Demo UI
Before And After
Before Scenario (No Real-time, 5 Hours+)
• No real-time processing system in place.
• Publishers and buyers could rely only on a batch processing system to gather relevant data.
• Outdated data, not relevant to the current time
• Current data pushed to a waiting queue
• Cumbersome batch-processing lifecycle
• No visualization for reports
• No visibility into day-to-day activity, leading to missed or delayed decisions
After Scenario
Phase 1, 2 (Batch + Real-time, 20 Minutes)
• With DataTorrent RTS (built on Apache Apex), the dev team built the first real-time analytics platform.
• This enabled reporting of critical metrics around campaign monetization.
• Reused the batch ingestion mechanism (S3) for impression data, shared with other pipelines.
Phase 3 (Real-time Streaming, < 1 Min)
• Reduced end-to-end latency through real-time ingestion of impression data from Kafka
• Results available much sooner to the user
• Balances load (no more batch ingestion spikes) and reduces resource consumption
• Handles ever-growing traffic with more efficient resource utilization
Operators used
S3 reader (File Input Operator)
- Recursively reads the contents of an S3 bucket based on a partitioning pattern
- Inclusion & exclusion support
- Fault tolerance (replay and idempotence)
- Throughput of over 12K reads/second for events of 1.2 KB each
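Inclusion/exclusion filtering plus partitioned reading can be sketched as a pure function over the bucket's object keys: each reader instance keeps the keys that match the include pattern and not the exclude pattern, then takes its share by a stable hash. The regex-based filtering, CRC32 spreading and the key layout in the example are assumptions; listing the bucket itself (for example with an S3 client) is left out.

```python
import re
import zlib


def files_for_reader(keys, include, exclude, num_readers, reader_index):
    """Return the object keys that reader `reader_index` of `num_readers` should read.

    Keys must match `include` and must not match `exclude` (None = no exclusion);
    the survivors are spread across readers by a stable CRC32 hash of the key.
    """
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    selected = []
    for key in sorted(keys):
        if not inc.search(key) or (exc and exc.search(key)):
            continue
        if zlib.crc32(key.encode("utf-8")) % num_readers == reader_index:
            selected.append(key)
    return selected


keys = [f"logs/auction/2015/11/25/part-{i:04d}.avro" for i in range(20)]
keys += ["logs/auction/2015/11/25/_tmp.avro", "logs/client/2015/11/25/part-0000.avro"]
readers = [files_for_reader(keys, r"^logs/auction/.*\.avro$", r"/_tmp", 4, i) for i in range(4)]
print([len(r) for r in readers])
```

Since every reader applies the same deterministic rule, no coordination is needed to guarantee that each file is read by exactly one instance.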
Kafka Input Operator
- Ability to consume from multiple Kafka clusters
- Offset management support
- Fault tolerant reads
- Support for idempotence & exactly-once semantics
- Controlled reads for managing backpressure
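Offset management for exactly-once results rests on one invariant: consumer offsets and operator state are snapshotted together and restored together. The single-process sketch below illustrates that invariant; the class and its methods are hypothetical, and in Apex the platform itself drives checkpointing and recovery.

```python
class CheckpointedConsumer:
    """Sums message values while tracking the next offset per partition.

    Offsets and state are snapshotted together, so after a restore the
    messages following the checkpoint are re-read into the restored state
    exactly once, and redelivered old messages are ignored.
    """

    def __init__(self):
        self.offsets = {}  # partition -> next offset to read
        self.total = 0     # the operator state (here, a running sum)
        self._snapshot = ({}, 0)

    def process(self, partition, offset, value):
        if offset < self.offsets.get(partition, 0):
            return  # already reflected in the state: ignore duplicate delivery
        self.total += value
        self.offsets[partition] = offset + 1

    def checkpoint(self):
        self._snapshot = (dict(self.offsets), self.total)

    def restore(self):
        """Roll back to the last checkpoint; return the offsets to seek to."""
        offsets, total = self._snapshot
        self.offsets, self.total = dict(offsets), total
        return dict(self.offsets)
```

Snapshotting offsets without the state (or the reverse) would either drop or double-count the messages processed between the checkpoint and a failure.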
Cont’d…
Dimension Store
- Distributed in-memory store
- Support for re-aggregation of events
- Partitioning of aggregates
- Low-latency query support with a pub/sub model using Kafka
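Re-aggregation means the store merges partial aggregates for the same key, produced by different aggregator partitions or windows, into one total that queries can read. A minimal sketch, assuming partials arrive as `{key: {metric: value}}` dicts (a hypothetical format); the Kafka pub/sub query path is left out.

```python
from collections import defaultdict


class DimensionStore:
    """In-memory store that re-aggregates partial sums on arrival."""

    def __init__(self):
        # key -> {metric name -> running total}
        self._data = defaultdict(lambda: defaultdict(float))

    def merge(self, partials):
        """Add each partial aggregate into the running total for its key."""
        for key, metrics in partials.items():
            slot = self._data[key]
            for name, value in metrics.items():
                slot[name] += value

    def query(self, key):
        """Return a copy of the current totals for `key` ({} if unseen)."""
        return dict(self._data.get(key, {}))


store = DimensionStore()
store.merge({("p1",): {"revenue": 2.0, "clicks": 1}})                 # from aggregator A
store.merge({("p1",): {"revenue": 3.0}, ("p2",): {"clicks": 4}})     # from aggregator B
print(store.query(("p1",)))
```

Because sums are associative, partials can be merged in any order; metrics such as distinct counts would need a mergeable representation (for example a sketch) instead.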
HDHT
- HDFS-backed embedded key-value store
- Fault-tolerant random reads and writes
- Durability in case of cold restarts
Key learnings
• DAG: sizing, locality & partitioning (benchmark)
• Benchmark each operator
• Memory sizing for the operators
• Manage backpressure
• Think about fault tolerance & recovery before starting implementation
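Memory sizing for the store depends on how many distinct dimension keys arrive, which is what the cardinality estimation challenge was about. The deck does not say how the team estimated it; the sketch below swaps in a K-minimum-values (KMV) distinct-count sketch, and the per-entry byte cost is a hypothetical number you would replace with a measured one.

```python
import hashlib
import heapq


class KMinValues:
    """K-minimum-values sketch: estimates distinct count from the k smallest hashes."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes: a max-heap of the k smallest values
        self._members = set()  # hashes currently held, to ignore repeated keys

    @staticmethod
    def _hash(key):
        digest = hashlib.blake2b(str(key).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") / 2**64  # roughly uniform in [0, 1)

    def add(self, key):
        h = self._hash(key)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the current largest of the k smallest hashes.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct keys seen: exact count
        return (self.k - 1) / -self._heap[0]


BYTES_PER_ENTRY = 200  # hypothetical in-memory cost of one aggregate entry

sketch = KMinValues(k=256)
for i in range(20_000):
    sketch.add(f"key-{i % 10_000}")  # 10,000 distinct keys, each seen twice
estimated_keys = sketch.estimate()
print(f"~{estimated_keys:,.0f} distinct keys, ~{estimated_keys * BYTES_PER_ENTRY / 1e6:.1f} MB")
```

With k = 256 the relative error is typically around 6%, using a few kilobytes regardless of input size; HyperLogLog is a common alternative with a similar trade-off.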
Resources
Apache Apex Community Page
Apache Apex LinkedIn Group
Questions?

Contenu connexe

Tendances

A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
confluent
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
HostedbyConfluent
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
confluent
 

Tendances (20)

Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafka
 
Pulsar - Real-time Analytics at Scale
Pulsar - Real-time Analytics at ScalePulsar - Real-time Analytics at Scale
Pulsar - Real-time Analytics at Scale
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environments
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian LitaKafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
New Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLNew Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQL
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
 

Similaire à Real Time Insights for Advertising Tech

From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
Thomas Weise
 

Similaire à Real Time Insights for Advertising Tech (20)

Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analytics
 
Stream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache ApexStream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time Metrics
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 

Plus de Apache Apex

Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
Apache Apex
 

Plus de Apache Apex (20)

Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App Development
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentIngesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream API
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
 

Dernier

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Dernier (20)

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Real Time Insights for Advertising Tech

  • 1. Apache Apex Real Time Insights for Advertising Tech Tushar Gosavi November 25, 2015
  • 2. Agenda • The Customer - What they do • The Use Case • While (! Realtime) – Evaluation of Application – Challenges
  • 3. • Leading digital automation software company for publishers • Leading innovator in real-time bidding (RTB) auctions • Helps publishers monetize their digital assets • Enables publishers to make smarter inventory decisions and improve revenue More about the customer
  • 4. • Reporting of critical metrics from auctions and client logs • Revenue, impression, and click information • Aggregate counters and reporting on top N metrics • Low latency querying using pub-sub model Understanding the usecase
  • 5. Scale • 6 geographically distributed data centers • Combination of co-located & AWS based DCs • > 5 PB under data management • 22 TB/day of data generated from auction & client logs • Heterogeneous data log formats • North of 15 billion impressions per day • Average data inflow of 200K events/sec 5Proprietary and Confidential
  • 6. • Ad server log events consumed as Avro-encoded, Snappy compressed files from S3. New files uploaded every 10-20 minutes. • Data may arrive in S3 out of order (time stamps). • Event size about 2KB uncompressed, only subset of fields retrieved for aggregation. • Aggregates kept in memory (checkpointed) with expiration policy and query processing against in-memory data. • Front-end integration through pub-sub protocol for real-time dashboard components. Initial Requirements
  • 7. Solution (Phase 1) 7Proprietary and Confidential AdServer REST proxy REST proxy Real-time architecture- Powered By Apex Kafka Cluster S3Reader S3Reader Filter Filter Dimensions Aggregator Dimensions Aggregator Dimensions Store Query Query Result Kafka Cluster Auction Logs Middleware Auction Logs Filtered Events Filtered Events Aggregates Query from MW Query Query Results S3 S3 Client logsAuction Logs
  • 8. Learning & Challenges • Unstable S3 client libraries – Unpredictable hangs and Corrupted data – On Hang, Master kills the container and restart reading of file from different container – Corrupt files caused containers to kill – application configurable retry mechanism and skip bad files • Out of Order data – Tuples with timestamp in future and past • Memory Requirement for Store – Cardinality Estimation for incoming data
  • 9. Solution (Phase 2) 9Proprietary and Confidential REST proxy Real-time architecture- Powered By Apex Client logs Kafka Input (Auction logs) ETL Filter Filter Dimensions Aggregator Dimensions Aggregator Dimensions Store Query Query Result Kafka Cluster Auction LogsKafka Cluste r Middleware AdServer REST proxy Kafka Cluste r Auction Logs Client logs Kafka Messages Decompress & Flatten Filtered Events Filtered Events Aggregates Query from MW Query Query Results S3 S3Reader Kafka Input (Auction logs)Auction Logs
  • 10. Learning & Challenges
    • Complex logical DAG
    • Kafka operator
      – Dynamic partitioning disabled
      – Memory configuration
      – Offset snapshotting to ensure exactly-once semantics
      – Limiting the Kafka read rate
    • Harder debugging (more components)
      – GBs of container logs
      – Difficult to locate the sequence of failures (feature being added)
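Offset snapshotting means saving the consumer offsets in the same checkpoint as the operator state, so that after a failure both are rolled back together and re-reading from the saved offset does not double-count. The sketch below uses a Python list as a stand-in for one Kafka partition and is not the Apex Kafka operator; `max_per_poll` is a hypothetical knob illustrating a capped read rate.

```python
class CheckpointedConsumer:
    """Sketch: snapshot Kafka-style offsets together with operator state.

    `log` stands in for one topic partition (a list of numeric messages);
    no real Kafka client is used.
    """

    def __init__(self, log, max_per_poll=100):
        self.log = log
        self.max_per_poll = max_per_poll   # caps how much is read per poll
        self.offset = 0                    # next offset to read
        self.total = 0                     # example operator state: running sum
        self._snapshot = (0, 0)

    def poll(self):
        batch = self.log[self.offset:self.offset + self.max_per_poll]
        for msg in batch:
            self.total += msg
        self.offset += len(batch)
        return len(batch)

    def checkpoint(self):
        # Offset and state are saved together, so they can never disagree.
        self._snapshot = (self.offset, self.total)

    def recover(self):
        # After a failure, roll both back and re-read from the saved offset.
        self.offset, self.total = self._snapshot
```

This covers state held inside the operator; effects already pushed to external systems between checkpoints are not rolled back, which is why exactly-once output also relies on idempotent writes.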
  • 11. Solution
    [Diagram: Real-time architecture, Powered By Apex. Sources shown: user browser, CDN (caching of logs), AdServer and REST proxies, feeding auction logs and client logs into Kafka clusters. Separate Kafka Input (Auction logs) and Kafka Input (Client logs) operators each feed an ETL stage (Decompress & Flatten) → Filter (filtered events) → Dimensions Aggregator (aggregates) → Dimensions Store. Queries from the middleware and their results travel through Kafka.]
  • 12. Application Configuration
    • 64 Kafka Input operators reading from 6 geographically distributed DCs
    • Under 40 seconds end-to-end latency, from ad-serving to visualization
    • 32 instances of the in-memory distributed store
    • 64 aggregators
    • 1.2 TB memory footprint at peak load
    • Work underway on a fault-tolerant application using HDHT
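Sizing an in-memory store like this depends mostly on how many distinct aggregate keys arrive, which is where cardinality estimation for incoming data comes in. One standard technique is HyperLogLog, which estimates distinct counts in a few kilobytes of registers. The deck does not say which estimator was used, so this is a minimal illustrative sketch; the class name and the precision `p` are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (standard error about 1.04/sqrt(2**p))."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                                # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant, valid for m >= 128

    def add(self, item):
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")              # 64-bit hash
        idx = h >> (64 - self.p)                       # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:              # small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw
```

Multiplying the estimated distinct keys by a benchmarked per-entry size, then dividing by the 32 store instances, gives a rough per-instance memory budget before deployment.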
  • 14. Before And After
    Before Scenario: No Real-time (5 Hours +)
    • No real-time processing system in place.
    • Publishers and buyers could rely only on a batch processing system for gathering relevant data.
    • Outdated data, not relevant to the current time.
    • Current data pushed to a waiting queue.
    • Cumbersome batch-processing lifecycle.
    • No visualization for reports.
    • No glimpse into everyday happenings, leading to missed or untimely decisions.
    After Scenario
    Phase 1, 2: Batch + Real-time (20 Minutes)
    • With DataTorrent RTS (built on Apache Apex), the dev team put together the first real-time analytics platform.
    • This enabled reporting of critical metrics around campaign monetization.
    • Reuse of the batch ingestion mechanism for the impression data, shared with other pipelines (S3).
    Phase 3: Real-time Streaming (< 1 Min)
    • Reduced end-to-end latency through real-time ingestion of impression data from Kafka.
    • Results available to the user much sooner.
    • Balanced load (no more batch ingestion spikes) and reduced resource consumption.
    • Handles ever-growing traffic with more efficient resource utilization.
  • 15. Operators used
    S3 Reader (File Input Operator)
    - Recursively reads the contents of an S3 bucket based on a partitioning pattern
    - Inclusion & exclusion support
    - Fault tolerance (replay and idempotence)
    - Throughput of over 12K reads/second for events of 1.2 KB each
    Kafka Input Operator
    - Ability to consume from multiple Kafka clusters
    - Offset management support
    - Fault-tolerant reads
    - Support for idempotence & exactly-once semantics
    - Controlled reads for managing backpressure
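File selection for the S3 reader can be sketched as regular-expression inclusion and exclusion over object keys, followed by a stable hash that spreads the surviving files across reader partitions. Listing the bucket recursively is omitted because it would need an S3 client. The patterns, key layout and helper names are hypothetical, and this is not the File Input Operator's actual configuration.

```python
import re
import zlib


def select_files(keys, include=r".*", exclude=None):
    """Keep object keys matching `include` and not matching `exclude` (regular expressions)."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    return [k for k in keys if inc.search(k) and not (exc and exc.search(k))]


def assign_partitions(keys, num_partitions):
    """Deterministically spread files across reader partitions by a stable hash of the key."""
    assignment = {p: [] for p in range(num_partitions)}
    for k in keys:
        assignment[zlib.crc32(k.encode()) % num_partitions].append(k)
    return assignment
```

`zlib.crc32` is used instead of Python's built-in `hash`, which is randomized per process for strings, so every reader instance computes the same assignment.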
  • 16. Cont’d…
    Dimension Store
    - Distributed in-memory store
    - Support for re-aggregation of events
    - Partitioning of aggregates
    - Low-latency query support with a pub/sub model using Kafka
    HDHT
    - HDFS-backed embedded key-value store
    - Fault-tolerant random reads & writes
    - Durability in case of cold restarts
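Re-aggregation and partitioning of aggregates can be sketched as a hash-partitioned map that merges partial results from several upstream aggregators. Because each key lives in exactly one partition, merging each partition's local top N yields the exact global top N. The class and key names are hypothetical, keys are assumed to be strings, and the Kafka pub/sub query path is omitted.

```python
import heapq
import zlib
from collections import Counter


class PartitionedStore:
    """Sketch of a partitioned in-memory aggregate store with re-aggregation and top-N queries."""

    def __init__(self, num_partitions):
        self.partitions = [Counter() for _ in range(num_partitions)]

    def _partition(self, key):
        # Stable hash so a key always maps to the same partition.
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]

    def merge(self, partial):
        """Re-aggregate a partial result (key -> value) emitted by any upstream aggregator."""
        for key, value in partial.items():
            self._partition(key)[key] += value

    def get(self, key):
        return self._partition(key)[key]   # Counter returns 0 for unknown keys

    def top_n(self, n):
        """Each partition contributes its local top n; the global top n is among them."""
        candidates = [kv for part in self.partitions for kv in part.most_common(n)]
        return heapq.nlargest(n, candidates, key=lambda kv: kv[1])
```

Re-aggregating in the store means upstream aggregators can emit partial sums for the same key independently; the store combines them, and only the small per-partition candidate lists are merged to answer a top-N query.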
  • 17. Key learnings
    • DAG: sizing, locality & partitioning (benchmark)
    • Benchmark each operator
    • Memory sizing for the operators
    • Manage backpressure
    • Think about fault tolerance & recovery before starting implementation
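One generic way to manage backpressure on a source such as Kafka is a token bucket: the reader pauses when tokens run out, and the unread data simply stays in the source. This is a standard technique shown as a sketch, not the mechanism Apex uses internally. `rate` and `capacity` are hypothetical tuning knobs, and the injectable clock exists only to make the example testable.

```python
import time


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, sustained `rate` items per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, n=1):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False   # caller should pause reading; the data stays in the source
```

Benchmarking each operator, as the slide suggests, is what gives a sensible `rate`: the limit should sit just below the slowest downstream operator's sustained throughput.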
  • 18. Resources
    • Apache Apex Community Page
    • Apache Apex LinkedIn Group