DOING AWESOME
THINGS IN ONLINE
DISPLAY ADVERTISING
USING HADOOP
Success Stories, Lessons Learned, and a Wish List
Dr. Jaimie Kwon, Tech Director, Data Mining
Massive cross-screen network reaching 600M+ consumers worldwide
Premium programmatic demand side platform
Leading premium video network with 67M+ uniques
Premium programmatic video platform
Branded and content entertainment platform
Premium programmatic supply side platform
5 Vs IN BIG DATA
VELOCITY
• Doesn’t always work well with “volume”… leading to silos. Technical challenge.
VOLUME
• Petabytes are the norm. Thanks, Hadoop! Bottlenecks and hotspots occur in unexpected places.
VERACITY
• “Where shall clean metadata be found?” Organizational challenge (culture and process).
VARIETY
• Diverse data sources… leading to silos. Engineering resource / architectural challenge.
VALUE
• Not to be forgotten. “Why we fight?”
IT’S BEEN A
GREAT 10 YEARS
(Taken from http://www.slideshare.net/larsgeorge/hadoop-is-dead-lars-george-bi-data2013 and http://techblog.baghel.com/index.php?itemid=132 )
AOL NETWORKS
DATA IN HADOOP
USE CASES
Aggregates : Easy via Hive
Ad hoc queries : Harder via Pig/Hive
User level analysis : Hardest
1. Customer / audience understanding,
2. Predicting look-alike audiences,
3. Measuring ad effectiveness,
4. User time-series analysis,
5. Stream analysis,
6. Ad-hoc research,
7. ...
SCALE
• > 1 Billion events / day
• > 100 million web users
Hundreds of advertisers
Thousands of ad campaigns
Thousands of pixels
Petabytes of data
CHALLENGES
VARIETY
• Acquisitions happen
• New, diverse data sources
• Speed of ingestion is the key
NEED FOR USER-LEVEL ANALYSIS
Answering such questions as:
• “What are prominent behavioral segments of
those who purchased product A?”
• “What do users do in the 2 weeks prior to
purchasing product B?”
• “What is the likelihood of a user purchasing
product C over the next week?”
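A question like the "2 weeks prior" one above only needs each user's own event history, which is why user-level data matters. The following is a minimal sketch, not the system described in the talk: the event layout `(day, action, item)`, the function name `actions_before_purchase`, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def actions_before_purchase(events, product, window_days=14):
    """Count actions buyers took in the window before first buying `product`.

    `events` maps a user id to a list of (day, action, item) tuples.
    This layout is an assumption made for illustration.
    """
    counts = Counter()
    window = timedelta(days=window_days)
    for user_events in events.values():
        purchase_days = [day for day, action, item in user_events
                         if action == "purchase" and item == product]
        if not purchase_days:
            continue  # this user never bought the product
        first = min(purchase_days)
        for day, action, _item in user_events:
            # Keep only events inside the window, excluding the purchase day.
            if first - window <= day < first:
                counts[action] += 1
    return counts


# Hypothetical sample data.
events = {
    "u1": [(date(2014, 6, 1), "view", "B"), (date(2014, 6, 5), "click", "B"),
           (date(2014, 6, 10), "purchase", "B")],
    "u2": [(date(2014, 5, 1), "view", "B"), (date(2014, 6, 10), "purchase", "B")],
    "u3": [(date(2014, 6, 2), "view", "A")],
}
result = actions_before_purchase(events, "B")
```

Only `u1` contributes here: `u2`'s view falls outside the 14-day window and `u3` never purchased product B.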
UNSTRUCTURED
DATA
MAD, MAD, MAD
Magnetic: “attracting all
the data sources that
crop up within an
organization regardless of
data quality niceties.”
Agile: “allow analysts to
easily ingest, digest,
produce and adapt data at
a rapid pace.”
Deep: “... increasingly
sophisticated statistical methods
... beyond the rollups and
drilldowns of traditional BI. ...
need to see both the forest and
the trees in running these
algorithms - they want to study
enormous datasets without
resorting to samples and
extracts. The modern data
warehouse should serve both as
a deep data repository and as
a sophisticated algorithmic
runtime engine.”
MAD Skills: New Analysis Practices for Big Data (2009, Cohen et al.)
M A D
USER PROFILE
• A daily user profile is built for all anonymous cookie IDs seen on a given day.
• Multiple days’ worth of user profiles are assembled via map-side join.
• The processing framework is built so that the map-side join and other machinery are hidden from researchers and (most) developers.
• Supports almost all advanced use cases.
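The idea behind assembling several days of profiles with a map-side join can be sketched in a few lines. This is an illustration under stated assumptions, not the framework from the talk: the record layout `(cookie_id, day, features)`, the function name `merge_daily_profiles`, and the sample dates are hypothetical, and Python's `heapq.merge` stands in for Hadoop's join machinery. A real map-side join additionally requires the inputs to be partitioned identically as well as sorted on the join key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_daily_profiles(*daily_files):
    """Merge per-day (cookie_id, day, features) records into one profile per user.

    Every input must already be sorted by cookie_id. Because of that, the
    merge is a single streaming pass with no shuffle step.
    """
    merged = heapq.merge(*daily_files, key=itemgetter(0))
    for cookie_id, records in groupby(merged, key=itemgetter(0)):
        # Collect this user's daily profiles, keyed by day.
        yield cookie_id, {day: features for _, day, features in records}


# Hypothetical daily profiles, each sorted by cookie id.
day1 = [("a", "2014-06-01", {"clicks": 1}), ("c", "2014-06-01", {"clicks": 0})]
day2 = [("a", "2014-06-02", {"clicks": 2}), ("b", "2014-06-02", {"clicks": 1})]

profiles = dict(merge_daily_profiles(day1, day2))
```

Cookie `a` ends up with two days of data, while `b` and `c` each have one. Researchers would only see the merged `profiles`, which mirrors the point about hiding the join from them.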
CHOICES WE (ALMOST) HAD:
• Flat file on HDFS,
• Pig,
• Hive,
• HBase,
• Custom “user profile”
• Ended up with the user profile
approach and never looked back...
• ... so far.
USE CASE #1:
CUSTOMER UNDERSTANDING
User profile supports AOL Networks’ audience analytics system, which answers such
questions as:
• “Are very young and old customers better clickers?”
o “Yes, but young adults are better purchasers.”
• “Are people who saw display advertising more likely to come to the online store?”
o “Yes, about twice as likely.”
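The "twice as likely" style of answer is a ratio of two visit rates. A minimal sketch of that arithmetic follows, assuming hypothetical boolean fields `saw_ad` and `visited_store` on each profile; it is a raw comparison of two groups and does not by itself establish that the ads caused the visits.

```python
def visit_lift(profiles):
    """Return (exposed_rate, unexposed_rate, lift) for store visits.

    `profiles` is a list of dicts with boolean `saw_ad` and `visited_store`
    fields. Both field names are assumptions made for illustration.
    """
    exposed = [p["visited_store"] for p in profiles if p["saw_ad"]]
    unexposed = [p["visited_store"] for p in profiles if not p["saw_ad"]]
    exposed_rate = sum(exposed) / len(exposed)
    unexposed_rate = sum(unexposed) / len(unexposed)
    return exposed_rate, unexposed_rate, exposed_rate / unexposed_rate


# Hypothetical sample: 2 of 4 exposed users visited, 1 of 4 unexposed users did.
sample = (
    [{"saw_ad": True, "visited_store": True}] * 2
    + [{"saw_ad": True, "visited_store": False}] * 2
    + [{"saw_ad": False, "visited_store": True}] * 1
    + [{"saw_ad": False, "visited_store": False}] * 3
)
exposed_rate, unexposed_rate, lift = visit_lift(sample)
```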
USE CASE #2:
LOOKALIKE AUDIENCE MODEL
User profile supports AOL
Networks’ Lookalike audience
offering, which lets you reach new
people who are likely to be
interested in an advertiser’s offering
due to their similarity to existing
customers.
Predictive Analytics
and Optimization
Logistic Regression
Neural Networks
Random Forest
Gradient Boosting Machine
…
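As a rough illustration of lookalike scoring, the sketch below fits the first model in the list, logistic regression, on existing customers (label 1) versus other users (label 0) and then scores new prospects. It is a toy written in plain Python under stated assumptions: the two features, the sample values, and the function names `train_logistic` and `score` are hypothetical, and nothing here reflects how the production models were built.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, x):
    """Estimated probability that a user with features `x` resembles a customer."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def train_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = score(weights, x) - y
            grad[0] += error
            for j, v in enumerate(x):
                grad[j + 1] += error * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features per user: [category_visit_rate, ad_click_rate].
customers = [[1.0, 0.8], [0.9, 1.0], [0.8, 0.9]]
others = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2]]
weights = train_logistic(customers + others, [1, 1, 1, 0, 0, 0])

similar_prospect = score(weights, [0.9, 0.9])
dissimilar_prospect = score(weights, [0.1, 0.1])
```

Prospects would then be ranked by score, and the top of the ranking targeted as the lookalike audience.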
VALUE UNSTRUCTURED
DATA
MORE CHALLENGES...
Cluster Ops
Tuning of Cluster / Jobs
Velocity / real-time: Want more real-time updates of the user profile. Hard.
Veracity: Organizational challenge. High-quality metadata.
Good “Data Scientists” specializing in “Big Data” are hard to find.
LOOKING FORWARD TO MORE
EXCITING DEVELOPMENT
(Taken from http://www.slideshare.net/larsgeorge/hadoop-is-dead-lars-george-bi-data2013 and http://techblog.baghel.com/index.php?itemid=132 )
2023 2015
?


Editor's notes

  1. Example of typical cover slide.
  2. http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf http://x86.cs.duke.edu/~gang/documents/CIDR11_Paper36.pdf