SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
Large scale data
processing pipelines
at trivago: a use case
2016-04-07, Neuss, Germany
Clemens Valiente
Email: clemens.valiente@trivago.com
Phone: +49 (0) 211 540 65 - 137
de.linkedin.com/in/clemensvaliente
Lead BI Data Engineer
trivago Düsseldorf
Originally a mathematician
Studied at Uni Erlangen
At trivago for 5 years
Clemens Valiente
Data driven PR and External
Communication
Price information collected
from the various booking
websites and shown to our
visitors also gives us a
thorough overview over trends
and development of hotel
prices. This knowledge then is
used by our Content Marketing
& Communication Department
(CMC) to write stories and
articles.
3
4
The past: Data pipeline 2010 – 2015
Java Software
Engineering
Business
Intelligence
CMC
5
The past: Data pipeline 2010 – 2015
Facts & Figures
Price dimensions
- Around one million hotels
- 250 booking websites
- Travellers search for up to
180 days in advance
- Data collected over five
years
Restrictions
- Only single night stays
- Only prices from
European visitors
- Prices cached up to 30
minutes
- One price per hotel,
website and arrival date
per day
- “Insert ignore”: The first
price per key wins
Size of data
- We collected a total of 56
billion prices in those five
years
- Towards the end of this
pipeline in 2015 the
average was around 100
million prices per day
written to BI
6
The past: Data pipeline 2010 – 2015
Java Software
Engineering
Business
Intelligence
CMC
7
Refactoring the pipeline: Requirements
• Scales with an arbitrary amount of data (future proof)
• reliable and resilient
• low performance impact on Java backend
• long term storage of raw input data
• fast processing of filtered and aggregated data
• Open source
• we want to log everything:
• more prices
• Length of stay, room type, breakfast info, room category, domain
• with more information
• Net & gross price, city tax, resort fee, affiliate fee, VAT
8
Present data pipeline 2016 – ingestion
San Francisco
Düsseldorf
Hong Kong
9
Present data pipeline 2016 – processing
Camus
CMC
10
Present data pipeline 2016 – facts &
figures
Cluster specifications
- 24 machines (to be 48)
- 700 TB disc space, 60%
used
- 1.7 TB memory in Yarn
- 24 cores per machine, 864
VCores in total
(overprovisioning)
Data Size (price log)
- 1.2 trillion messages
collected so far
- 6 billion messages/day
- 44 TB of data
Data processing
- Camus: 30 mappers writing
data in 10 minute intervals
- First aggregation/filtering
stage in Hive runs in 30
minutes with 5 days of
CPU time spent
- Impala Queries across
>100 GB of result tables
usually done within a few
seconds
11
Present data pipeline 2016 – results after one
year in production
• Very reliable, no downtime or service interuptions of the system
• Java team is very happy – less load on their system
• BI team is very happy – more data, more ressources to process it
• CMC team is very happy
• Faster results
• Better quality of results due to more data
• More detailed results
• => Shorter research phase, more and better stories
• => Less requests to BI
12
Present data pipeline 2016 – use cases &
status quo
Uses for price information
- Monitoring price parity in
hotel market
- Anomaly and fraud
detection
- Price feed for online
marketing
- Display of price
development and
delivering price alerts to
website visitors
Other data sources and
usage
- Clicklog information from
our website and mobile
app
- Used for marketing
performance analysis,
product tests, invoice
generation etc
Status quo
- Our entire business runs
on and through the kafka –
hadoop pipeline
- Almost all departments rely
on data, insights and
metrics delivered by
hadoop
- Most of the company could
not do their job without
hadoop data
13
Future data pipeline 2016
Camus
CMC
Message format:
CSV
Protobuf
avro
Stream processing
Storm
Spark
Samza?
Kafka Streams
Gobblin
14
Key challenges and learnings
Mastering hadoop
- Finding your log files
- Interpreting error
messages correctly
- Understanding settings
and how to use them to
solve problem
- Store data in wide,
denormalised Hive tables
in parquet format and
nested data types
Using hadoop
- Offer easy hadoop access
to users (Impala / Hive
JDBC with visualisation
tools)
- Educate users on how to
write good code, strict
guidelines and review
- Load balancer for Impala
- Daily dump of fs image as
hive table
Bad parts
- HUE (the standard GUI)
- Write oozie workflows and
coordinators in xml, not
through the Hue interface
- Monitoring impala
- Still some hard to find bugs
in Hive & Impala
15
Ressource overview (YARN)
16
Processes
17
Mapreduce job
18
reducers
19
logfiles
20
R shiny
Email: clemens.valiente@trivago.com
Phone: +49 (0) 211 540 65 - 137
de.linkedin.com/in/clemensvaliente
Lead BI Data Engineer
trivago Düsseldorf
Originally a mathematician
Studied at Uni Erlangen
At trivago for 5 years
Clemens Valiente
Thank you!
Now it is
time for
questions
and
comments!

Contenu connexe

Similaire à Large scale data processing pipelines at trivago

WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...
WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...
WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...
WSO2
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
Paul Barsch
 

Similaire à Large scale data processing pipelines at trivago (20)

Case study: How Cozy Cloud monitors every layer of its activity using OVH Met...
Case study: How Cozy Cloud monitors every layer of its activity using OVH Met...Case study: How Cozy Cloud monitors every layer of its activity using OVH Met...
Case study: How Cozy Cloud monitors every layer of its activity using OVH Met...
 
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
 
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTBig Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
 
Big data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardBig data solutions on cloud – the way forward
Big data solutions on cloud – the way forward
 
Magento B2B e-Commerce
Magento B2B e-CommerceMagento B2B e-Commerce
Magento B2B e-Commerce
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
RPA & AI IN THE CLOUD - THE LATEST IN AS-A-SERVICE SOLUTIONS FOR FINANCE
RPA & AI IN THE CLOUD - THE LATEST IN AS-A-SERVICE SOLUTIONS FOR FINANCERPA & AI IN THE CLOUD - THE LATEST IN AS-A-SERVICE SOLUTIONS FOR FINANCE
RPA & AI IN THE CLOUD - THE LATEST IN AS-A-SERVICE SOLUTIONS FOR FINANCE
 
How First to Value Beats First to Market: Case Studies of Fast Data Success
How First to Value Beats First to Market: Case Studies of Fast Data SuccessHow First to Value Beats First to Market: Case Studies of Fast Data Success
How First to Value Beats First to Market: Case Studies of Fast Data Success
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
 
WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...
WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...
WSO2Con Asia 2014 - Simultaneous Analysis of Massive Data Streams in real-tim...
 
Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs
 
How Incuda builds user journey models with Snowplow
How Incuda builds user journey models with SnowplowHow Incuda builds user journey models with Snowplow
How Incuda builds user journey models with Snowplow
 
Apache Kafka® Use Cases for Financial Services
Apache Kafka® Use Cases for Financial ServicesApache Kafka® Use Cases for Financial Services
Apache Kafka® Use Cases for Financial Services
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
Simultaneous analysis of massive data streams in real time and batch
Simultaneous analysis of massive data streams in real time and batchSimultaneous analysis of massive data streams in real time and batch
Simultaneous analysis of massive data streams in real time and batch
 
OVH Analytics Data Compute and Apache Spark as a Service
OVH Analytics Data Compute and Apache Spark as a ServiceOVH Analytics Data Compute and Apache Spark as a Service
OVH Analytics Data Compute and Apache Spark as a Service
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Solving churn challenge in Big Data environment - Jelena Pekez
Solving churn challenge in Big Data environment  - Jelena PekezSolving churn challenge in Big Data environment  - Jelena Pekez
Solving churn challenge in Big Data environment - Jelena Pekez
 

Dernier

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
gajnagarg
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 

Dernier (20)

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

Large scale data processing pipelines at trivago

  • 1. Large scale data processing pipelines at trivago: a use case 2016-04-07, Neuss, Germany Clemens Valiente
  • 2. Email: clemens.valiente@trivago.com Phone: +49 (0) 211 540 65 - 137 de.linkedin.com/in/clemensvaliente Lead BI Data Engineer trivago Düsseldorf Originally a mathematician Studied at Uni Erlangen At trivago for 5 years Clemens Valiente
  • 3. Data driven PR and External Communication Price information collected from the various booking websites and shown to our visitors also gives us a thorough overview over trends and development of hotel prices. This knowledge then is used by our Content Marketing & Communication Department (CMC) to write stories and articles. 3
  • 4. 4 The past: Data pipeline 2010 – 2015 Java Software Engineering Business Intelligence CMC
  • 5. 5 The past: Data pipeline 2010 – 2015 Facts & Figures Price dimensions - Around one million hotels - 250 booking websites - Travellers search for up to 180 days in advance - Data collected over five years Restrictions - Only single night stays - Only prices from European visitors - Prices cached up to 30 minutes - One price per hotel, website and arrival date per day - “Insert ignore”: The first price per key wins Size of data - We collected a total of 56 billion prices in those five years - Towards the end of this pipeline in 2015 the average was around 100 million prices per day written to BI
  • 6. 6 The past: Data pipeline 2010 – 2015 Java Software Engineering Business Intelligence CMC
  • 7. 7 Refactoring the pipeline: Requirements • Scales with an arbitrary amount of data (future proof) • reliable and resilient • low performance impact on Java backend • long term storage of raw input data • fast processing of filtered and aggregated data • Open source • we want to log everything: • more prices • Length of stay, room type, breakfast info, room category, domain • with more information • Net & gross price, city tax, resort fee, affiliate fee, VAT
  • 8. 8 Present data pipeline 2016 – ingestion San Francisco Düsseldorf Hong Kong
  • 9. 9 Present data pipeline 2016 – processing Camus CMC
  • 10. 10 Present data pipeline 2016 – facts & figures Cluster specifications - 24 machines (to be 48) - 700 TB disc space, 60% used - 1.7 TB memory in Yarn - 24 cores per machine, 864 VCores in total (overprovisioning) Data Size (price log) - 1.2 trillion messages collected so far - 6 billion messages/day - 44 TB of data Data processing - Camus: 30 mappers writing data in 10 minute intervals - First aggregation/filtering stage in Hive runs in 30 minutes with 5 days of CPU time spent - Impala Queries across >100 GB of result tables usually done within a few seconds
  • 11. 11 Present data pipeline 2016 – results after one year in production • Very reliable, no downtime or service interuptions of the system • Java team is very happy – less load on their system • BI team is very happy – more data, more ressources to process it • CMC team is very happy • Faster results • Better quality of results due to more data • More detailed results • => Shorter research phase, more and better stories • => Less requests to BI
  • 12. 12 Present data pipeline 2016 – use cases & status quo Uses for price information - Monitoring price parity in hotel market - Anomaly and fraud detection - Price feed for online marketing - Display of price development and delivering price alerts to website visitors Other data sources and usage - Clicklog information from our website and mobile app - Used for marketing performance analysis, product tests, invoice generation etc Status quo - Our entire business runs on and through the kafka – hadoop pipeline - Almost all departments rely on data, insights and metrics delivered by hadoop - Most of the company could not do their job without hadoop data
  • 13. 13 Future data pipeline 2016 Camus CMC Message format: CSV Protobuf avro Stream processing Storm Spark Samza? Kafka Streams Gobblin
  • 14. 14 Key challenges and learnings Mastering hadoop - Finding your log files - Interpreting error messages correctly - Understanding settings and how to use them to solve problem - Store data in wide, denormalised Hive tables in parquet format and nested data types Using hadoop - Offer easy hadoop access to users (Impala / Hive JDBC with visualisation tools) - Educate users on how to write good code, strict guidelines and review - Load balancer for Impala - Daily dump of fs image as hive table Bad parts - HUE (the standard GUI) - Write oozie workflows and coordinators in xml, not through the Hue interface - Monitoring impala - Still some hard to find bugs in Hive & Impala
  • 21. Email: clemens.valiente@trivago.com Phone: +49 (0) 211 540 65 - 137 de.linkedin.com/in/clemensvaliente Lead BI Data Engineer trivago Düsseldorf Originally a mathematician Studied at Uni Erlangen At trivago for 5 years Clemens Valiente Thank you! Now it is time for questions and comments!

Notes de l'éditeur

  1. <number>
  2. <number>
  3. <number>
  4. <number>
  5. <number>
  6. <number>
  7. <number>
  8. <number>
  9. <number>
  10. <number>
  11. <number>
  12. <number>
  13. <number>
  14. <number>
  15. <number>
  16. <number>
  17. <number>
  18. <number>
  19. <number>
  20. <number>
  21. <number>