Company Profile
User Segmentation
in Online Advertising
Spark vs Hadoop
Sergey Zhemzhitsky,
CTO, CleverDATA,
May 22, 2015
cleverdata.ru | info@cleverdata.ru
International market
business development
since 2012
One of three leading IT companies in Russia
43 branches in Russia and abroad
+5500 employees
100K projects for 10K customers
Innovative data management
platform (Data Exchange Service)
Cloud Service
In-house development
Internet advertising solutions
Data Management Platforms
Customer Base Management
Web Analytics
Marketing automation
Big Data
Data Mining
Digital Intelligence
Operational Intelligence
Low Latency and NoSQL
Cloud Computing

Agenda
• The problem;
• Hadoop vs. Spark;
• Specifics;
• What's next.

Real Time Bidding (RTB)
[Diagram: publishers, six ad networks, SSP, DSP, advertisers]

Data Management Platform (DMP)
[Diagram: publishers, SSP, DSP, advertisers, DMP; tracking data sources: cookie syncs, access logs, partner's data, 3rd-party data, click streams]

Typical data flows
[Diagram: 3rd-party data (three sources); raw data; Raw Data Store & Processing; Relational Data Store; RealTime Data Store; user profiles; aggregates]

Typical data flows :: RTB
[Diagram: 3rd-party data (three sources); raw data; Raw Data Store & Processing; Relational Data Store; RealTime Data Store; user profiles; aggregates; RTB SRV; Exchange; SSP; bid req. / bid resp.; bid requests; pixels :: impressions :: clicks]

1st-party data
[Diagram: the RTB data-flow diagram with 1st-party data added]

1st-party data
• Why monetize?
• How to monetize?
• What to monetize with?

Why monetize?
Find all users who
took part in the “Star Wars” ad campaign [and]
saw one of the banners “Darth Vader” or “Luke Skywalker”
within the last 6 days [and]
clicked that banner [and]
visited the purchase page for Darth Vader's lightsaber [and]
did not buy anything
In order to
retarget them with a personalized banner offering
a 40% discount on the lightsaber

How to monetize?
find all users who have  [cookies]
taken part in campaign[s] “Star Wars” [and]
viewed banner[s] “Darth Vader” or “Luke Skywalker”
during [last] 6 day[s] [and]  [impression]
clicked banner[s] “Darth Vader's lightsaber” [and]  [click]
visited buying area of “Darth Vader's lightsaber” [and]  [tr. pixel]
not visited order confirmed area of “Darth Vader's lightsaber”  [tr. pixel]
id  cookie  event_id                    event_type  campaign_id  timestamp                …
1   c1      “Darth Vader”               impression  “Star Wars”  2015-04-20 14:25:11.462  …
2   c1      “Darth Vader's lightsaber”  click       “Star Wars”  2015-04-21 06:31:12.157  …
3   c1      “Darth Vader's lightsaber”  tr. pixel   “Star Wars”  2015-04-22 18:57:19.628  …

How to monetize?
find all users who have
(0) taken part in campaign[s] “Star Wars”
(1) viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
(2) clicked banner[s] “Darth Vader's lightsaber”
(3) visited buying area of “Darth Vader's lightsaber”
(4) not visited order confirmed area of “Darth Vader's lightsaber”
map: (c1, 0), (c1, 1), (c1, 2), (c1, 3), Ø
reduce: (c1, 0;1;2;3)
true(0) and true(1) and true(2) and true(3) and not false(4)
→ C1
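The map and reduce steps on this slide can be sketched in plain Python, with no Hadoop or Spark involved. This is a minimal illustration rather than the speaker's implementation: the `Event` record, the `CRITERIA` list, and the numbering of criteria 0 to 4 are assumptions derived from the slide, the "order confirmed" event id and the `c2` rows are hypothetical, and the 6-day window check is left out for brevity.

```python
from collections import defaultdict, namedtuple

# Hypothetical event record mirroring the columns of the sample table.
Event = namedtuple("Event", "cookie event_id event_type campaign_id")

LIGHTSABER = "Darth Vader's lightsaber"
ORDER_CONFIRMED = LIGHTSABER + " order confirmed"  # assumed id, not in the source

# Criteria 0..4 of the segment; each is a predicate over a single event.
CRITERIA = [
    lambda e: e.campaign_id == "Star Wars",
    lambda e: e.event_type == "impression"
    and e.event_id in ("Darth Vader", "Luke Skywalker"),
    lambda e: e.event_type == "click" and e.event_id == LIGHTSABER,
    lambda e: e.event_type == "tr. pixel" and e.event_id == LIGHTSABER,
    lambda e: e.event_type == "tr. pixel" and e.event_id == ORDER_CONFIRMED,
]
NEGATED = {4}  # criteria that must NOT be matched ("not visited ...")


def map_event(event):
    """Map step: emit (cookie, criterion index) for each criterion the event matches."""
    for index, matches in enumerate(CRITERIA):
        if matches(event):
            yield event.cookie, index


def reduce_cookie(matched):
    """Reduce step: keep a cookie only if every positive criterion was matched
    and no negated criterion was."""
    return all((i in matched) != (i in NEGATED) for i in range(len(CRITERIA)))


def segment(events):
    """Run map, shuffle (group by cookie), and reduce in a single process."""
    grouped = defaultdict(set)
    for event in events:
        for cookie, index in map_event(event):
            grouped[cookie].add(index)
    return sorted(c for c, matched in grouped.items() if reduce_cookie(matched))


events = [
    Event("c1", "Darth Vader", "impression", "Star Wars"),
    Event("c1", LIGHTSABER, "click", "Star Wars"),
    Event("c1", LIGHTSABER, "tr. pixel", "Star Wars"),
    # Hypothetical second user who confirmed an order and must be excluded.
    Event("c2", "Luke Skywalker", "impression", "Star Wars"),
    Event("c2", LIGHTSABER, "click", "Star Wars"),
    Event("c2", LIGHTSABER, "tr. pixel", "Star Wars"),
    Event("c2", ORDER_CONFIRMED, "tr. pixel", "Star Wars"),
]
print(segment(events))
```

Keeping each criterion as a predicate over a single event is what lets the map step run independently per record; only the final boolean combination needs all of a cookie's matches in one place.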

VS.

MR vs Spark :: Real life
• Stylish;
• Trendy;
• Youthful.

Spark :: Size

Before looking at Hadoop

Map-Reduce :: Size

Materials and tools
Hardware (3 Nodes)
• 12-core AMD Opteron™ 6338P, ~2.8 GHz
• 64 GB RAM
• 1 Gbps NICs
Software
• CDH 5.3.1 (Hadoop 2.5.0)
• Spark 1.2.0
Data
• 14.2 GB of raw data
• 61.1 M transactions
• 128 MB block size
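A few derived figures help put this test setup in perspective. The sketch below is back-of-the-envelope arithmetic only. It assumes 1 GB = 1024 MB and that every core and all RAM on the three nodes are available to jobs, neither of which the slide states; the block count is a lower bound because the number and layout of input files are unknown.

```python
import math

# Figures taken from the slide.
NODES = 3
CORES_PER_NODE = 12
RAM_GB_PER_NODE = 64
RAW_DATA_GB = 14.2
TRANSACTIONS = 61.1e6
BLOCK_SIZE_MB = 128

total_cores = NODES * CORES_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE

# Lower bound on the number of HDFS blocks holding the raw data; with a
# splittable format this is also roughly the number of input splits.
min_blocks = math.ceil(RAW_DATA_GB * 1024 / BLOCK_SIZE_MB)

# Average size of one transaction record in bytes.
avg_record_bytes = RAW_DATA_GB * 1024 ** 3 / TRANSACTIONS

# Share of the cluster's total RAM that the raw data would occupy.
data_to_ram_ratio = RAW_DATA_GB / total_ram_gb

print(f"cluster: {total_cores} cores, {total_ram_gb} GB RAM")
print(f"at least {min_blocks} blocks of {BLOCK_SIZE_MB} MB")
print(f"about {avg_record_bytes:.0f} bytes per transaction")
print(f"raw data is about {data_to_ram_ratio:.0%} of total RAM")
```

Under these assumptions the whole data set is much smaller than the cluster's combined memory, which is worth keeping in mind when reading the timing comparison.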

MR vs Spark :: Execution time

Spark :: Exec-cores vs Num-execs

MR vs Spark :: Initialization
MR
protected void setup(Context ctx)
o.a.h.c.Configured
distributed cache
Spark
mapRegion
broadcast vars
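Both columns serve the same purpose: run expensive setup once per task rather than once per record, and make read-only lookup data available to every worker. The plain-Python sketch below models that pattern without Hadoop or Spark. `load_campaign_names`, `enrich_partition`, and the sample data are hypothetical names chosen for illustration. In MapReduce the setup would live in `setup(Context ctx)`; in Spark it would typically sit at the top of a function applied to a whole partition (for example via `mapPartitions`), with the lookup table shipped as a broadcast variable.

```python
setup_calls = 0  # counts how often the expensive setup runs


def load_campaign_names():
    """Stand-in for expensive initialization, e.g. reading a lookup file."""
    global setup_calls
    setup_calls += 1
    return {"sw": "Star Wars"}


def enrich_partition(records):
    """Process one partition: set up once, then stream through its records."""
    names = load_campaign_names()  # once per partition, not once per record
    for cookie, campaign_key in records:
        yield cookie, names.get(campaign_key, "unknown")


# Two partitions of (cookie, campaign key) pairs, as a cluster might split them.
partitions = [
    [("c1", "sw"), ("c2", "sw")],
    [("c3", "sw"), ("c4", "other")],
]
enriched = [row for part in partitions for row in enrich_partition(part)]
print(setup_calls, enriched)
```

Because `enrich_partition` is a generator, records are processed lazily, one at a time, after the single setup call for that partition.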

MR vs Spark :: Parallelism
MR
mapred.reduce.tasks
mapreduce.job.reduces
splittable formats
Spark
spark.default.parallelism
num-executors, executor-cores on YARN
numTasks in groupByKey,
reduceByKey, aggregateByKey…
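The number of reduce-side tasks determines how keys are spread across partitions. As a rough illustration, the sketch below hash-partitions cookie keys into a configurable number of partitions and reduces each partition independently, which is approximately what a `numTasks` argument controls in Spark or `mapreduce.job.reduces` in MapReduce. It is plain Python, not either framework's partitioner: `partition_for` uses CRC32 as an assumed stable hash, and `reduce_by_key` is a hypothetical helper.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Assign a key to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def reduce_by_key(pairs, func, num_partitions):
    """Shuffle (key, value) pairs into partitions by key, then reduce each
    partition on its own, as separate reduce tasks would."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)][key].append(value)

    results = []
    for part in partitions:  # each partition could run as a separate task
        reduced = {}
        for key, values in part.items():
            acc = values[0]
            for value in values[1:]:
                acc = func(acc, value)
            reduced[key] = acc
        results.append(reduced)
    return results


pairs = [("c1", 1), ("c2", 1), ("c1", 1), ("c3", 1), ("c1", 1)]
by_partition = reduce_by_key(pairs, lambda a, b: a + b, num_partitions=4)
merged = {k: v for part in by_partition for k, v in part.items()}
print(by_partition)
```

All values for a given key land in the same partition, so changing `num_partitions` changes how much work each task gets but not the merged result.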

MR vs Spark :: Dependencies
MR
o.a.h.u.Tool
o.a.h.u.ToolRunner
-conf app.conf
-files
-libjars
setUserClassesTakesPrecedence
Spark
--jars
--files
--conf
--driver-java-options
spark.driver.extraJavaOptions
spark.executor.extraJavaOptions
spark.driver.userClassPathFirst
spark.executor.userClassPathFirst

MR vs Spark :: Secondary Sort
MR
setSortComparatorClass
setGroupingComparatorClass
setPartitionerClass
Spark
repartitionAndSortWithinPartitions
mapPartitions
The result of processing an entire partition
must fit in memory
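Secondary sort means that each key's values reach the processing function already ordered, here events per cookie ordered by timestamp. The sketch below imitates the Spark approach in plain Python under simplifying assumptions: `repartition_and_sort` is a hypothetical stand-in for `repartitionAndSortWithinPartitions` (partition by cookie, then sort each partition by cookie and timestamp), and `first_and_last` plays the role of a function passed to `mapPartitions`. CRC32 is an assumed partitioning hash, and the `c2` row is invented sample data.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def repartition_and_sort(events, num_partitions):
    """Partition (cookie, timestamp, event_type) tuples by cookie, then sort
    every partition by the composite key (cookie, timestamp)."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        cookie = event[0]
        index = zlib.crc32(cookie.encode("utf-8")) % num_partitions
        partitions[index].append(event)
    return [sorted(part, key=itemgetter(0, 1)) for part in partitions]


def first_and_last(partition):
    """Stream through one sorted partition and yield, per cookie, the first and
    last event type in time order, holding one cookie's events at a time."""
    for cookie, group in groupby(partition, key=itemgetter(0)):
        ordered = [event_type for _, _, event_type in group]
        yield cookie, ordered[0], ordered[-1]


events = [
    ("c1", "2015-04-22 18:57:19.628", "tr. pixel"),
    ("c2", "2015-04-21 09:00:00.000", "impression"),  # hypothetical row
    ("c1", "2015-04-20 14:25:11.462", "impression"),
    ("c1", "2015-04-21 06:31:12.157", "click"),
]
result = sorted(
    row
    for partition in repartition_and_sort(events, num_partitions=2)
    for row in first_and_last(partition)
)
print(result)
```

The timestamps sort correctly as strings only because they share one fixed-width format; real code would more likely parse them or use numeric epoch values.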

MR vs Spark :: Testing
MR
MRUnit
o.a.h.h.MiniDFSCluster
o.a.h.m.MiniMRCluster
o.a.h.y.s.MiniYARNCluster
o.a.h.m.v2.MiniMRYarnCluster
Spark
Local executor

What's next, and why Spark?
• Spark Streaming;
• Micro-batches;
• λ-architecture.
without serious surgical intervention

Thank you for your questions!
info@cleverleaf.co.uk :: info@cleverdata.ru
cleverleaf.co.uk :: cleverdata.ru
1dmp.io :: crawler.1dmp.io
facebook.com/CleverData :: +7 (495) 967-66-50
