Company Profile
User Segmentation
in Online Advertising
Spark vs Hadoop
Sergey Zhemzhitsky,
CTO, CleverDATA,
May 22, 2015
cleverdata.ru | info@cleverdata.ru
International market
business development
since 2012
One of the three leading IT companies in Russia
43 branches in Russia and abroad
5,500+ employees
100K projects for 10K customers
Innovative data management
platform (Data Exchange Service)
Cloud Service
In-house development
Internet advertising solutions
Data Management Platforms
Customer Base Management
Web Analytics
Marketing automation
Big Data
Data Mining
Digital Intelligence
Operational Intelligence
Low Latency and NoSQL
Cloud Computing
Agenda
• The problem;
• Hadoop vs. Spark;
• Specifics;
• What's next.
Real Time Bidding (RTB)
[Diagram: publishers, SSP, ad networks, DSP, advertisers]
Data Management Platform (DMP)
[Diagram: publishers, SSP, DSP, advertisers, DMP; data labels: tracking data, cookie syncs, access logs, partner's data, 3rd-party data, click streams]
Typical data flows
[Diagram: 3rd-party data (three sources), raw data, Raw Data Store & Processing, Relational Data Store, RealTime Data Store, user profiles, aggregates]
Typical data flows :: RTB
[Diagram: 3rd-party data (three sources), raw data, Raw Data Store & Processing, Relational Data Store, RealTime Data Store, user profiles, aggregates, RTB SRV, Exchange / SSP; flow labels: bid req., bid resp., pixels :: impressions :: clicks, bid requests, user profiles]
1st-party data
[Diagram: same components and flow labels as on the RTB data flows slide, shown under the heading "1st-party data"]
1st-party data
• Why monetize?
• How to monetize?
• What to monetize with?
Why monetize?
Find all users who
took part in the “Star Wars” ad campaign [and]
saw one of the banners “Darth Vader” or “Luke Skywalker”
within the last 6 days [and]
clicked that banner [and]
visited the purchase page for Darth Vader's lightsaber [and]
still did not buy anything
In order to
retarget them with a personalized banner offering
a 40% discount on the lightsaber
How to monetize?
find all users who have
taken part in campaign[s] “Star Wars” [and]
viewed banner[s] “Darth Vader” or “Luke Skywalker”
during [last] 6 day[s] [and]
clicked banner[s] “Darth Vader's lightsaber” [and]
visited buying area of “Darth Vader's lightsaber” [and]
not visited order confirmed area of “Darth Vader's lightsaber”
Event types in the query: [impression], [click], [tr. pixel], [tr. pixel]; users: [cookies]
id | cookie | event_id | event_type | campaign_id | timestamp | …
1 | c1 | “Darth Vader” | impression | “Star Wars” | 2015-04-20 14:25:11.462 | …
2 | c1 | “Darth Vader's lightsaber” | click | “Star Wars” | 2015-04-21 06:31:12.157 | …
3 | c1 | “Darth Vader's lightsaber” | tr. pixel | “Star Wars” | 2015-04-22 18:57:19.628 | …
How to monetize?
find all users who have
taken part in campaign[s] “Star Wars”
viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
clicked banner[s] “Darth Vader's lightsaber”
visited buying area of “Darth Vader's lightsaber”
not visited order confirmed area of “Darth Vader's lightsaber”
map: (c1, 0), (c1, 1), (c1, 2), (c1, 3), Ø
reduce: (c1, 0;1;2;3)
true(0) and true(1) and true(2) and true(3) and not false(4)
result: c1
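The map and reduce steps on this slide can be sketched in plain Python, without Hadoop or Spark. This is only an illustration under stated assumptions: the `Event` fields follow the column names in the event table, the reference time `NOW` and the `ORDER_CONFIRMED` event id are hypothetical (the slides do not show how an order confirmation is recorded), and the second user `c2` is invented test data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple


class Event(NamedTuple):
    cookie: str
    event_id: str
    event_type: str
    campaign_id: str
    timestamp: datetime


NOW = datetime(2015, 4, 23)  # assumed reference time for the 6-day window
LIGHTSABER = "Darth Vader's lightsaber"
ORDER_CONFIRMED = LIGHTSABER + " (order confirmed)"  # hypothetical event_id

# Conditions 0..4, in the order they appear in the query.
CONDITIONS: List[Callable[[Event], bool]] = [
    lambda e: e.campaign_id == "Star Wars",
    lambda e: (e.event_type == "impression"
               and e.event_id in ("Darth Vader", "Luke Skywalker")
               and timedelta(0) <= NOW - e.timestamp <= timedelta(days=6)),
    lambda e: e.event_type == "click" and e.event_id == LIGHTSABER,
    lambda e: e.event_type == "tr. pixel" and e.event_id == LIGHTSABER,
    lambda e: e.event_type == "tr. pixel" and e.event_id == ORDER_CONFIRMED,
]


def map_event(e: Event) -> Iterator[Tuple[str, int]]:
    """Map step: emit (cookie, condition index) for every condition the event matches."""
    for idx, cond in enumerate(CONDITIONS):
        if cond(e):
            yield e.cookie, idx


def reduce_cookie(matched: Set[int]) -> bool:
    """Reduce step: conditions 0..3 must have matched, condition 4 must not."""
    return all(i in matched for i in (0, 1, 2, 3)) and 4 not in matched


def segment(events: Iterable[Event]) -> List[str]:
    by_cookie: Dict[str, Set[int]] = defaultdict(set)
    for e in events:
        for cookie, idx in map_event(e):
            by_cookie[cookie].add(idx)  # local stand-in for the shuffle
    return sorted(c for c, matched in by_cookie.items() if reduce_cookie(matched))


EVENTS = [
    # The three rows for c1 from the event table.
    Event("c1", "Darth Vader", "impression", "Star Wars", datetime(2015, 4, 20, 14, 25, 11, 462000)),
    Event("c1", LIGHTSABER, "click", "Star Wars", datetime(2015, 4, 21, 6, 31, 12, 157000)),
    Event("c1", LIGHTSABER, "tr. pixel", "Star Wars", datetime(2015, 4, 22, 18, 57, 19, 628000)),
    # Invented user c2, who also reached the order confirmation.
    Event("c2", "Luke Skywalker", "impression", "Star Wars", datetime(2015, 4, 21, 10, 0)),
    Event("c2", LIGHTSABER, "click", "Star Wars", datetime(2015, 4, 21, 10, 5)),
    Event("c2", LIGHTSABER, "tr. pixel", "Star Wars", datetime(2015, 4, 21, 10, 6)),
    Event("c2", ORDER_CONFIRMED, "tr. pixel", "Star Wars", datetime(2015, 4, 21, 10, 9)),
]

SEGMENT = segment(EVENTS)
```

Here `SEGMENT` contains only `c1`: `c2` matches condition 4 as well and is therefore excluded by the negated term.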
VS.
MR vs Spark :: The Truth of Life
• Stylish;
• Trendy;
• Youthful.
Spark :: Size
Before we look at Hadoop
Map-Reduce :: Size
Materials and tools
Hardware (3 Nodes)
• 12-core AMD Opteron™ 6338P, ~2.8 GHz
• 64 GB RAM
• 1 Gbps NICs
Software
• CDH 5.3.1 (Hadoop 2.5.0)
• Spark 1.2.0
Data
• 14.2 GB of raw data
• 61.1 M transactions
• 128 MB block size
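A rough back-of-the-envelope calculation from the figures above, as a sketch only. It assumes that 1 GB means 1024 MB and that the input is stored in a splittable format with one input split per block; the slides do not state either, and many small files would change the split count.

```python
import math

RAW_DATA_GB = 14.2      # from the slide
TRANSACTIONS = 61.1e6   # from the slide
BLOCK_SIZE_MB = 128     # from the slide

# Assumption: 1 GB = 1024 MB, splittable input, one split per block.
raw_data_mb = RAW_DATA_GB * 1024
input_splits = math.ceil(raw_data_mb / BLOCK_SIZE_MB)

# Average size of one transaction record in bytes.
avg_record_bytes = RAW_DATA_GB * 1024**3 / TRANSACTIONS

print(input_splits)
print(round(avg_record_bytes))
```

Under these assumptions the data set yields on the order of a hundred input splits, with records of a few hundred bytes each.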
MR vs Spark :: Execution time
Spark :: Exec-cores vs Num-execs
MR vs Spark :: Initialization
MR
protected void setup(Context ctx)
org.apache.hadoop.conf.Configured
distributed cache
Spark
mapRegion
broadcast vars
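The idea shared by both columns is to run expensive setup once per task or partition instead of once per record. Below is a minimal local sketch of that pattern in plain Python. It does not use the Hadoop or Spark APIs; `load_campaign_ids`, `enrich_partition`, and `map_partitions` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (cookie, campaign name)

init_calls = 0  # counts how often the "expensive" setup runs


def load_campaign_ids() -> Dict[str, int]:
    """Hypothetical expensive setup, e.g. loading a dictionary from a side file."""
    global init_calls
    init_calls += 1
    return {"Star Wars": 1}


def enrich_partition(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Per-partition function: set up once, then stream over the records."""
    campaign_ids = load_campaign_ids()  # once per partition, not per record
    for cookie, campaign in events:
        yield cookie, campaign_ids.get(campaign, -1)


def map_partitions(
    partitions: List[List[Event]],
    fn: Callable[[Iterable[Event]], Iterator[Tuple[str, int]]],
) -> List[List[Tuple[str, int]]]:
    """Local stand-in for applying a function to each partition as a whole."""
    return [list(fn(iter(part))) for part in partitions]


partitions = [
    [("c1", "Star Wars"), ("c2", "Star Wars")],
    [("c3", "Unknown")],
]
result = map_partitions(partitions, enrich_partition)
```

With two partitions and three records, the setup runs twice rather than three times.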
MR vs Spark :: Parallelism
MR
mapred.reduce.tasks
mapreduce.job.reduces
splittable formats
Spark
spark.default.parallelism
num-executors, executor-cores on YARN
numTasks in groupByKey, reduceByKey, aggregateByKey…
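To see how num-executors and executor-cores interact, one can count concurrent task slots. The sketch below assumes one core per task (Spark's default for `spark.task.cpus`) and uses 114 tasks, which is ceil(14.2 GB / 128 MB) under the assumption of a single splittable input. The three executor layouts are hypothetical and are not taken from the slides.

```python
import math


def concurrent_slots(num_executors: int, executor_cores: int) -> int:
    """Tasks that can run at the same time, assuming one core per task."""
    return num_executors * executor_cores


def task_waves(num_tasks: int, num_executors: int, executor_cores: int) -> int:
    """Number of scheduling 'waves' needed to run all tasks of a stage."""
    return math.ceil(num_tasks / concurrent_slots(num_executors, executor_cores))


NUM_TASKS = 114  # assumed: ceil(14.2 * 1024 / 128)

# Hypothetical layouts that all use 24 cores in total.
layouts = [(3, 8), (6, 4), (12, 2)]
waves = {layout: task_waves(NUM_TASKS, *layout) for layout in layouts}
```

All three layouts give the same number of waves. This arithmetic only counts slots; it says nothing about memory per executor or other overheads that may differ between layouts.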
MR vs Spark :: Dependencies
MR
org.apache.hadoop.util.Tool
org.apache.hadoop.util.ToolRunner
-conf app.conf
-files
-libjars
setUserClassesTakesPrecedence
Spark
--jars
--files
--conf
--driver-java-options
spark.driver.extraJavaOptions
spark.executor.extraJavaOptions
spark.driver.userClassPathFirst
spark.executor.userClassPathFirst
MR vs Spark :: Secondary Sort
MR
setSortComparatorClass
setGroupingComparatorClass
setPartitionerClass
Spark
repartitionAndSortWithinPartitions
mapPartitions
The entire result of processing a partition
must be able to fit in memory
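The secondary-sort pattern named above partitions by the grouping key only and sorts by a composite key, so that each user's events arrive already ordered by time. The following is a local sketch in plain Python, not the Spark or Hadoop API. `partition_for`, `repartition_and_sort`, and `user_timelines` are hypothetical helpers, and the `c2` record is invented.

```python
import zlib
from itertools import groupby
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (cookie, timestamp, event_type)


def partition_for(cookie: str, num_partitions: int) -> int:
    """Partition by cookie only, so all events of one user land together."""
    return zlib.crc32(cookie.encode("utf-8")) % num_partitions


def repartition_and_sort(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Local stand-in: route by cookie, then sort each partition by (cookie, timestamp)."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[partition_for(rec[0], num_partitions)].append(rec)
    for part in partitions:
        part.sort(key=lambda r: (r[0], r[1]))  # composite sort key
    return partitions


def user_timelines(partition: List[Record]) -> Iterator[Tuple[str, List[str]]]:
    """Group consecutive records by cookie; events are already time-ordered."""
    for cookie, recs in groupby(partition, key=lambda r: r[0]):
        yield cookie, [event_type for _, _, event_type in recs]


records = [
    ("c1", "2015-04-22 18:57:19.628", "tr. pixel"),
    ("c2", "2015-04-21 09:00:00.000", "impression"),  # invented record
    ("c1", "2015-04-20 14:25:11.462", "impression"),
    ("c1", "2015-04-21 06:31:12.157", "click"),
]
timelines = dict(
    item
    for part in repartition_and_sort(records, num_partitions=2)
    for item in user_timelines(part)
)
```

Because `user_timelines` is a generator, it can stream over a sorted partition one user at a time instead of holding every group in memory at once.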
MR vs Spark :: Testing
MR
MRUnit
org.apache.hadoop.hdfs.MiniDFSCluster
org.apache.hadoop.mapred.MiniMRCluster
org.apache.hadoop.yarn.server.MiniYARNCluster
org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster
Spark
Local executor
What's next, and why Spark?
• Spark Streaming;
• Micro-batches;
• λ-architecture.
without major surgery
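The micro-batch idea can be illustrated without any streaming framework: incoming events are cut into fixed-length time intervals, and ordinary batch logic is applied to each interval. This is a sketch under assumptions: the event tuples, the 2-second interval, and the `micro_batches` helper are hypothetical and only show the principle.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, cookie)


def micro_batches(events: Iterable[Event], batch_seconds: float) -> Dict[int, List[str]]:
    """Assign each event to the fixed-length interval it arrived in."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, cookie in events:
        batches[int(ts // batch_seconds)].append(cookie)
    return dict(batches)


def distinct_users(cookies: List[str]) -> int:
    """Ordinary batch logic, reused unchanged for every micro-batch."""
    return len(set(cookies))


events = [(0.4, "c1"), (1.9, "c2"), (2.1, "c1"), (5.0, "c3")]
batches = micro_batches(events, batch_seconds=2.0)
users_per_batch = {idx: distinct_users(cookies) for idx, cookies in batches.items()}
```

The batch function itself does not change; only the unit of input shrinks from the whole data set to one interval.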
Thank you for your questions!
info@cleverleaf.co.uk :: info@cleverdata.ru
cleverleaf.co.uk :: cleverdata.ru
1dmp.io :: crawler.1dmp.io
facebook.com/CleverData :: +7 (495) 967-66-50
