1/37
ElasticSearch
feedback
2/37
Introduction
3/37
Nicolas Blanc - BlaBlArchitect
Sinfomic
(1999)
@thewhitegeek
(2001)
(2005)
(2008)
(2012)
4/37
What is BlaBlaCar?
5/37
3,000,000 members in Europe
6/37
10 countries (previously 9)
● France
● Spain
● Italy
● UK
● Poland
● Portugal
● Netherlands
● Belgium
● Luxembourg
● NEW: Germany
7/37
Growth
[Chart: growth from January 2008 to January 2013, with axis marks at 25 million and 50 million]
8/37
Infrastructure
● 2 front web servers
● 2 MySQL masters (+4 slaves on SSD)
● 1 private cloud (KVM + Open vSwitch)
  ● Redis
  ● Memcache
  ● RabbitMQ/workers
● 1 ElasticSearch cluster
9/37
Changing the Search Engine
10/37
What exists? Why change?
MySQL database
● Relational DB (lots of joins needed)
● Plain SQL queries
● Home-made geographical search
Recent problems
● New features mean more complex queries
● Scalability: performance depends on DB load
11/37
Initial requirements
Scalability
● A trip search needs to complete in less than 200 ms
● The system part of the solution must be easy to maintain
● It must be possible to cluster it (also to avoid a SPOF)
Low code impact on the existing application
● Same features as today (geographical search)
● Minimize the developers' work
● Add one missing feature: facets
12/37
Initial Competitors
SenseiDB
13/37
Why ElasticSearch
✔ Easiest clustering
✔ Good indexing performance
✔ Little code to write to use it
✔ Schemaless
✔ Based on Lucene
✔ Written in Java (we need to code a grouping feature)
14/37
ElasticSearch has won,
now let's migrate our search!
15/37
Changing our mindset
Object in a relational database
● Can be split across multiple tables
● Lots of information reachable through JOINs
Object in a document-oriented database
● Only one big index for these objects
● All information needs to be in the object, not in multiple tables
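To make the shift concrete, here is a minimal Python sketch of flattening related rows into one self-contained document. The table rows, the column names and the `build_trip_document` helper are assumptions made for illustration; only the target field names come from the Trip mapping shown later in the deck.

```python
# Hypothetical rows, as they might come back from three relational tables.
trip_row = {"id": 1234567, "user_id": 42, "price": 25.0, "seats_left": 2}
user_row = {"id": 42, "ratings": 17, "is_blocked": False}
vehicle_row = {"user_id": 42, "model": "Clio"}


def build_trip_document(trip, user, vehicle):
    """Flatten related rows into one self-contained search document."""
    return {
        "userid": str(trip["user_id"]),
        "price": trip["price"],
        "seats_left": trip["seats_left"],
        # Copied from the user row so no JOIN is needed at search time.
        "user_ratings": user["ratings"],
        "is_user_blocked": user["is_blocked"],
        # Copied from the vehicle row.
        "vehicle": vehicle["model"],
    }


document = build_trip_document(trip_row, user_row, vehicle_row)
```

The price of this layout is that a change to the user or vehicle row must trigger a rebuild of every trip document that copied it.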
17/37
Defining our objects well
Need to know what we want to search
● Searching trips (front-office usage)
● Searching members (back-office usage)
● Searching FAQ (front-office usage)
Think of all the needed fields
● The ones used for queries
● The ones used for filters
● The ones used for facets
18/37
Defining the index well
System point of view
● Number of nodes in the cluster
● Number of shards
● Number of replicas
Application point of view
● Define type and attributes for all fields (mapping)
● Use parent/child or nested documents to improve indexing
● How to push documents from the DB?
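The system-side choices above end up in the settings sent when an index is created. The Python sketch below builds such a body with the standard `number_of_shards` and `number_of_replicas` settings; the helper names and the values are assumptions for illustration, not the configuration used in production.

```python
import json


def index_settings(shards, replicas):
    """Build the settings part of an index-creation request body."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(shards, replicas):
    """Primaries plus replicas: the shard copies the nodes have to host."""
    return shards * (1 + replicas)


# Illustrative values: one primary shard with one replica.
body = index_settings(shards=1, replicas=1)
print(json.dumps(body, indent=2))
```

With 2 nodes, one shard and one replica give each node a full copy of the index, which is one way to avoid a single point of failure.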
19/37
Indexing: using a river or not?
River advantages
● Plugs directly into our source backend
● An ElasticSearch API exists to code a new one
River problems
● Not easy to add business logic on some fields
● Really hard when your DB is unconventional
● Full reindex of all the documents
20/37
Indexing: our manual way
We wrote an asynchronous indexer
● Written in Java
● Applies business logic when fetching from the DB
● Fetches from multiple DBs/sources
● Uses the Java ES library
● Easy interface
  ● Send {"trip":1234567} and the server answers {"OK"}
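The real indexer is written in Java and is not shown in the deck. As a rough illustration of the asynchronous pattern only, here is a Python sketch in which `FAKE_DB`, `FAKE_INDEX`, `build_document` and the derived `seats_left` rule are all hypothetical stand-ins for the databases, the ElasticSearch client and the business logic.

```python
import json
import queue
import threading

# Hypothetical stand-ins for the source database and the search index.
FAKE_DB = {1234567: {"price": 25.0, "seats_offered": 3, "seats_reserved": 1}}
FAKE_INDEX = {}


def build_document(trip_id):
    """Fetch the row and apply business logic while building the document."""
    row = FAKE_DB[trip_id]
    return {
        "price": row["price"],
        # Assumed business rule: seats left is derived, not stored.
        "seats_left": row["seats_offered"] - row["seats_reserved"],
    }


def worker(jobs):
    """Consume messages until the None sentinel arrives."""
    while True:
        message = jobs.get()
        if message is None:
            jobs.task_done()
            break
        trip_id = json.loads(message)["trip"]
        FAKE_INDEX[trip_id] = build_document(trip_id)
        jobs.task_done()


jobs = queue.Queue()
thread = threading.Thread(target=worker, args=(jobs,))
thread.start()
jobs.put('{"trip": 1234567}')  # the caller does not wait for the indexing
jobs.put(None)                 # sentinel: stop the worker
jobs.join()
thread.join()
```

The caller only hands over an identifier, so all knowledge of how a document is assembled stays inside the indexer.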
21/37
One index sample: Trip
22/37
Defining our Trip object well
Think of all the needed fields
● The ones used for queries
  ● Trip departure date, from where, to where, user id
● The ones used for filters
  ● User ratings, price, vehicle, seats left, is user blocked
    (a blocked user is a user who performed a forbidden action on the website)
● The ones used for facets
  ● User ratings, price, vehicle
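For the facet fields listed above, a request could carry a `facets` section next to the query. The Python sketch below uses the terms and range facet syntax of ElasticSearch releases from that period; the helper names and the price boundaries are assumptions, and the deck does not show the facet requests actually used.

```python
def terms_facet(field, size=10):
    """Count documents per distinct value of a field."""
    return {"terms": {"field": field, "size": size}}


def range_facet(field, boundaries):
    """Turn sorted boundaries such as [20, 50] into three buckets:
    open-ended up to 20, 20 to 50, and open-ended from 50."""
    if not boundaries:
        raise ValueError("at least one boundary is required")
    ranges = [{"to": boundaries[0]}]
    for low, high in zip(boundaries, boundaries[1:]):
        ranges.append({"from": low, "to": high})
    ranges.append({"from": boundaries[-1]})
    return {"range": {"field": field, "ranges": ranges}}


# Hypothetical facet section for the three facet fields of a trip.
facets = {
    "vehicle": terms_facet("vehicle"),
    "user_ratings": terms_facet("user_ratings"),
    "price": range_facet("price", [20, 50]),
}
```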
23/37
Defining our Trip index well
Think of all system requirements
● The cluster has 2 nodes
● We keep the default configuration for shards/replicas
Think of object mapping
● For each field:
  ● Define the type (string, long, geo_point, date, float, boolean)
  ● Define the scope (include_in_all)
  ● Define the analyzer (for type string)
24/37
Trip Mapping
{
  "trip": {
    "properties": {
      "is_user_blocked": {
        "type": "boolean",
        "include_in_all": false
      },
      "user_ratings": {
        "type": "long",
        "include_in_all": false
      },
      "from": {
        "type": "geo_point",
        "include_in_all": false
      },
      "price": {
        "include_in_all": false,
        "type": "float"
      },
      "price_euro": {
        "type": "float",
        "include_in_all": false
      },
      "seats_left": {
        "include_in_all": false,
        "type": "long"
      },
      "seats_offered": {
        "include_in_all": false,
        "type": "long"
      },
      "to": {
        "include_in_all": false,
        "type": "geo_point"
      },
      "trip_date": {
        "format": "dateOptionalTime",
        "include_in_all": false,
        "type": "date"
      },
      "vehicle": {
        "include_in_all": false,
        "type": "string"
      },
      "userid": {
        "include_in_all": false,
        "index": "not_analyzed",
        "type": "string"
      }
    }
  }
}
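A document that fits this mapping could look like the `sample_trip` below. This is a Python sketch: apart from the field names and the two coordinates, which appear in the deck, every value is invented, and `check_document` is a hypothetical client-side sanity check rather than anything ElasticSearch provides.

```python
from datetime import datetime

# Python type expected for each field of the mapping above.
FIELD_TYPES = {
    "is_user_blocked": bool, "user_ratings": int, "from": dict,
    "price": float, "price_euro": float, "seats_left": int,
    "seats_offered": int, "to": dict, "trip_date": str,
    "vehicle": str, "userid": str,
}


def matches(value, expected):
    """Type check that does not let booleans pass as integers."""
    if expected is int:
        return type(value) is int
    return isinstance(value, expected)


def check_document(doc):
    """Return a list of problems; an empty list means the document fits."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in doc:
            problems.append("missing field: " + field)
        elif not matches(doc[field], expected):
            problems.append("wrong type for field: " + field)
    for field in ("from", "to"):
        point = doc.get(field)
        if isinstance(point, dict) and set(point) != {"lat", "lon"}:
            problems.append("not a lat/lon point: " + field)
    if isinstance(doc.get("trip_date"), str):
        try:
            datetime.fromisoformat(doc["trip_date"])
        except ValueError:
            problems.append("not an ISO 8601 date: trip_date")
    return problems


# Invented values; only the coordinates come from the sample query.
sample_trip = {
    "is_user_blocked": False, "user_ratings": 17,
    "from": {"lat": 48.856614, "lon": 2.3522219},
    "to": {"lat": 45.764043, "lon": 4.835659},
    "price": 25.0, "price_euro": 25.0,
    "seats_left": 2, "seats_offered": 3,
    "trip_date": "2013-06-15T08:30:00",
    "vehicle": "Clio", "userid": "42",
}
```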
25/37
Indexing events well
Which modifications send a change event
● All trip creations/deletions/modifications
● Member modifications (blocked or not)
● New ratings from other members
● A seat has been reserved
● A member changes their vehicle
A change event is a call to the internal indexer
● Send '{"trip":123456}' to the indexer (create/update)
● Send '{"tripd":123456}' to the indexer (delete)
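The two message shapes above can be told apart by their single key. The following Python sketch shows one way a receiver might decode them; the `ACTIONS` table, the action names and the `parse_event` helper are assumptions, since the deck only shows the messages themselves.

```python
import json

# Message key -> action the indexer should take (names are hypothetical).
ACTIONS = {"trip": "index", "tripd": "delete"}


def parse_event(message):
    """Decode a one-key message into an (action, trip_id) pair."""
    payload = json.loads(message)
    if not isinstance(payload, dict) or len(payload) != 1:
        raise ValueError("expected a JSON object with exactly one key")
    (key, trip_id), = payload.items()
    if key not in ACTIONS:
        raise ValueError("unknown event type: " + key)
    return ACTIONS[key], int(trip_id)


create_or_update = parse_event('{"trip":123456}')
delete = parse_event('{"tripd":123456}')
```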
26/37
Sample trip index query
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [{
          "geo_distance": {
            "distance": "40.14937866995km",
            "from": {
              "lat": 48.856614,
              "lon": 2.3522219
            }
          }
        }, {
          "geo_distance": {
            "distance": "40.14937866995km",
            "to": {
              "lat": 45.764043,
              "lon": 4.835659
            }
          }
        }, {
          "range": {
            "price": {
              "from": 0,
              "include_lower": false
            }
          }
        }]
      }
    }
  },
  "sort": [{
    "trip_date": { "order": "asc" }
  }],
  "filter": {
    "term": { "is_user_blocked": false }
  },
  "from": 0,
  "size": 10
}
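An application would normally build this body from the user's search rather than write it by hand. Here is a Python sketch that reproduces the structure of the sample query, in the same query DSL the slide uses; the function names and parameters are assumptions.

```python
import json


def geo_filter(field, lat, lon, distance_km):
    """geo_distance filter around a point for the given geo_point field."""
    return {
        "geo_distance": {
            "distance": "{}km".format(distance_km),
            field: {"lat": lat, "lon": lon},
        }
    }


def trip_search(origin, destination, radius_km, size=10, offset=0):
    """Build a search body with the same shape as the sample query."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        geo_filter("from", origin[0], origin[1], radius_km),
                        geo_filter("to", destination[0], destination[1], radius_km),
                        # Same range as the sample: price strictly above 0.
                        {"range": {"price": {"from": 0, "include_lower": False}}},
                    ]
                },
            }
        },
        "sort": [{"trip_date": {"order": "asc"}}],
        "filter": {"term": {"is_user_blocked": False}},
        "from": offset,
        "size": size,
    }


# Coordinates taken from the sample query; the radius is rounded here.
body = trip_search((48.856614, 2.3522219), (45.764043, 4.835659), 40)
print(json.dumps(body, indent=2))
```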
27/37
The Real World
A trip now has more than 30 fields
● (FAQ is around 25 fields)
● (members even more...)
To build a trip document we need 3 different SQL queries
● (FAQ: 2 different SQL queries)
● (Member: 10 different SQL queries)
The Trip index has only 1 shard (because of grouping)
28/37
And now the caveats
29/37
Preloaded Scripts
We use MVEL scripts to improve scoring
● They are not clustered
● Each node needs to have the scripts
● A node restart is needed to add or modify them
Solution: Chef (tool from Opscode)
All node configurations are centralized in a Chef repository
30/37
Grouping documents
Home-made patches to ElasticSearch
(based on work by Martijn van Groningen for lusini.de)
Soon in ElasticSearch
(I hope so much)
31/37
Mapping modification
On a running index:
Changing a type is not allowed
Changing an analyzer is not allowed
Solution: index alias
1) Changing the mapping → create a new index
2) When the new index is up to date → switch the alias
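Step 2 can be done in a single request to the `_aliases` endpoint, which takes a list of `remove` and `add` actions. The Python sketch below builds that request body; the alias and index names (`trip`, `trip_v1`, `trip_v2`) are hypothetical, as the deck does not give the real ones.

```python
def alias_swap_actions(alias, old_index, new_index):
    """Body for the _aliases endpoint: move an alias to a new index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application only ever queries the alias "trip".
swap = alias_swap_actions("trip", "trip_v1", "trip_v2")
```

Because the application addresses the alias and never a concrete index, the switch needs no code change or redeploy.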
32/37
I/O limits
We have only 2 nodes
● The Trip index is around 2 GB
● But only 1 shard for the Trip index
● Can index 100 trips/second on a busy evening
Solution: we installed Intel SSDs
(waiting for a distributed grouping feature)
33/37
Choosing the analyzer
Some fields must not be analyzed
● If you use ISO codes for countries
  (IT for Italy or DE for Germany are ignored in some cases)
Global analyzer has limits
● Accented characters from countries like France, Germany or Spain are not always parsed correctly
● One analyzer per country is difficult to implement in some cases
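The slide does not say why country codes get ignored. One plausible cause, assumed here, is an analyzer that lowercases tokens and then drops stop words: "IT" becomes "it", which is on the usual English stop-word list. The toy analyzer below is a Python sketch of that effect only; it is not ElasticSearch's implementation and its stop-word set is a small illustrative subset.

```python
# Small illustrative subset of an English stop-word list.
ENGLISH_STOP_WORDS = {"a", "an", "and", "is", "it", "the", "to"}


def analyze(text, stop_words=ENGLISH_STOP_WORDS):
    """Toy analyzer: lowercase, split on whitespace, drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]


def keep_as_is(text):
    """What a not_analyzed field does: the whole value is a single term."""
    return [text]


analyzed_code = analyze("IT")       # the country code disappears
untouched_code = keep_as_is("IT")   # the country code survives
```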
34/37
OK, sweet.
What's next?
35/37
Using ElasticSearch to ease log analysis
36/37
By the way…
We're hiring!
Dev, HTML ninja, leader, …
Come and see me right now
… or send me your friends
(And we have beer, table football and an arcade cabinet)
37/37
Thank you!
Follow us!
@covoiturage
Apply now:
join@BlaBlaCar.com

BlaBlaCar Elastic Search Feedback

  • 3. 3/37 Nicolas Blanc - BlaBlArchitect Sinfomic (1999) @thewhitegeek (2001) (2005) (2008) (2012)
  • 6. 6/37 10 countries (previously 9) ● France ● Spain ● Italy ● UK ● Poland ● Portugal ● Netherlands ● Belgium ● Luxembourg ● NEW: Germany
  • 8. 8/37 Infrastructure ● 2 front web servers ● 2 MySQL masters (+4 SSD slaves) ● 1 private cloud (KVM + Open vSwitch) running Redis, Memcache and RabbitMQ/workers ● 1 ElasticSearch cluster
  • 10. 10/37 What exists? Why change? MySQL database: ● Relational DB (lots of joins needed) ● Plain SQL queries ● Homemade geographical search. Recent problems: ● New features mean more complex queries ● Scalability: performance depends on DB load
  • 11. 11/37 Initial requirements. Scalability: ● Trip searches must complete in less than 200 ms ● The system side of the solution must be easy to maintain ● It must be clusterable (also to avoid a SPOF). Low code impact on the existing application: ● Same features as today (geographical search) ● Minimize the developers' work ● Add one missing feature: facets
  • 13. 13/37 Why ElasticSearch ✔ Easiest clustering ✔ Good indexing performance ✔ Little code to write to use it ✔ Schemaless ✔ Based on Lucene ✔ Written in Java (we need to code a grouping feature)
  • 14. 14/37 ElasticSearch has won; now let's migrate our search!
  • 15. 15/37 Changing our mindset. Object in a relational database: ● Can be spread across multiple tables ● Lots of information reachable through JOINs. Object in a document-oriented database: ● Only one big index for these objects ● All information needs to be in the object itself, not in multiple tables
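A minimal sketch of this shift, assuming hypothetical `trip`, `member` and `vehicle` rows (the table layout and values are illustrative, not BlaBlaCar's actual schema; the document field names follow the trip mapping shown later in the talk). Rows that SQL would JOIN at query time are merged once, at indexing time, into one self-contained document.

```python
# Hypothetical rows, as they might come back from three separate tables.
trip_row = {"id": 1234567, "user_id": 42, "price": 25.0, "seats_left": 2}
member_row = {"id": 42, "ratings": 4, "is_blocked": False}
vehicle_row = {"user_id": 42, "model": "Clio"}


def build_trip_document(trip, member, vehicle):
    """Flatten related rows into a single document ready for indexing."""
    return {
        "userid": str(trip["user_id"]),
        "price": trip["price"],
        "seats_left": trip["seats_left"],
        # Copied from the member and vehicle rows: no JOIN at search time.
        "user_ratings": member["ratings"],
        "is_user_blocked": member["is_blocked"],
        "vehicle": vehicle["model"],
    }


document = build_trip_document(trip_row, member_row, vehicle_row)
```

The trade-off is that any change to the member or the vehicle now requires the trip documents to be rebuilt.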
  • 17. 17/37 Defining our objects well. Know what we want to search: ● Trips (front office usage) ● Members (back office usage) ● FAQ (front office usage). Think of all the fields needed: ● Those used for queries ● Those used for filters ● Those used for facets
  • 18. 18/37 Defining the index well. System point of view: ● Number of nodes in the cluster ● Number of shards ● Number of replicas. Application point of view: ● Define the type and attributes of every field (mapping) ● Use parent/child or nested documents to improve indexing ● How to push documents from the DB?
  • 19. 19/37 Indexing: using a river or not? River advantages: ● Plugs directly into our source backend ● An ElasticSearch API exists for coding a new one. River problems: ● Not easy to add business logic on some fields ● Really hard when your DB is unconventional ● Fully reindexes all the documents
  • 20. 20/37 Indexing: our manual way. We wrote an asynchronous indexer: ● Written in Java ● Applies business logic when fetching from the DB ● Fetches from multiple DBs/sources ● Uses the Java ES library ● Easy interface: send {"trip":1234567} and the server answers {"OK"}
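The interface above can be sketched as follows. This is a Python illustration only (the real indexer is written in Java and uses the Java ES library); `fetch_trip`, `INDEX` and the queue layout are hypothetical stand-ins for the database access and the ElasticSearch client.

```python
import json
import queue
import threading

INDEX = {}                      # stand-in for the ElasticSearch index
pending = queue.Queue()         # trip ids waiting to be indexed


def fetch_trip(trip_id):
    """Hypothetical business logic: build the document from the database(s)."""
    return {"trip_id": trip_id, "is_user_blocked": False}


def handle_request(raw_message):
    """Acknowledge immediately; the actual indexing happens later."""
    message = json.loads(raw_message)
    pending.put(message["trip"])
    return '{"OK"}'             # reply format as shown on the slide


def worker():
    """Drain the queue in the background and index each trip."""
    while True:
        trip_id = pending.get()
        if trip_id is None:     # sentinel: stop the worker
            pending.task_done()
            break
        INDEX[trip_id] = fetch_trip(trip_id)
        pending.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
reply = handle_request('{"trip": 1234567}')
pending.join()                  # wait until the queued work has been processed
pending.put(None)
thread.join()
```

Because the caller only waits for the acknowledgement, a slow database fetch never delays the web request that triggered the reindex.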
  • 22. 22/37 Defining our Trip object well. Think of all the fields needed: ● Those used for queries: trip departure date, from where, to where, user id ● Those used for filters: user ratings, price, vehicle, seats left, is user blocked (a blocked user is one who performed a forbidden action on the website) ● Those used for facets: user ratings, price, vehicle
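As an illustration, a facet request over these three fields could look like the body below, built here as a Python dictionary. It assumes the facets API of the ElasticSearch versions of that era (`terms` and `histogram` facets; later versions replaced facets with aggregations). The facet names and the price interval are arbitrary choices, and the field names are taken from the trip mapping in the talk.

```python
import json

facet_request = {
    "query": {"match_all": {}},
    "facets": {
        # One bucket per distinct value.
        "vehicle": {"terms": {"field": "vehicle"}},
        "user_ratings": {"terms": {"field": "user_ratings"}},
        # Prices grouped into buckets of 10 (interval chosen arbitrarily).
        "price": {"histogram": {"field": "price", "interval": 10}},
    },
}

body = json.dumps(facet_request)
```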
  • 23. 23/37 Defining our Trip index well. Think of all the system requirements: ● The cluster has 2 nodes ● We keep the default configuration for shards/replicas. Think of the object mapping, for each field: ● Define the type (string, long, geo_point, date, float, boolean) ● Define the scope (include_in_all) ● Define the analyzer (for type string)
  • 24. 24/37 Trip Mapping
      "trip": {
        "properties": {
          "is_user_blocked": { "type": "boolean", "include_in_all": false },
          "user_ratings": { "type": "long", "include_in_all": false },
          "from": { "type": "geo_point", "include_in_all": false },
          "price": { "include_in_all": false, "type": "float" },
          "price_euro": { "type": "float", "include_in_all": false },
          "seats_left": { "include_in_all": false, "type": "long" },
          "seats_offered": { "include_in_all": false, "type": "long" },
          "to": { "include_in_all": false, "type": "geo_point" },
          "trip_date": { "format": "dateOptionalTime", "include_in_all": false, "type": "date" },
          "vehicle": { "include_in_all": false, "type": "string" },
          "userid": { "include_in_all": false, "index": "not_analyzed", "type": "string" }
        }
      }
  • 25. 25/37 Indexing events well. Which modifications send a change event: ● Any trip creation/deletion/modification ● Member modifications (blocked or not) ● New ratings from other members ● A seat has been reserved ● A member changes their vehicle. A change event is a call to the internal indexer: ● Send '{"trip":123456}' to the indexer (create/update) ● Send '{"tripd":123456}' to the indexer (delete)
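One way to picture this: every business event is translated into one or more indexer messages. The sketch below is an assumption about how such a fan-out might look. The event names and the `trips_of_member` lookup are hypothetical; only the `trip` and `tripd` message keys come from the slide.

```python
import json

# Hypothetical lookup: which trips belong to which member.
TRIPS_BY_MEMBER = {42: [1001, 1002]}


def trips_of_member(member_id):
    return TRIPS_BY_MEMBER.get(member_id, [])


def messages_for(event):
    """Translate a business event into messages for the indexer."""
    kind = event["kind"]
    if kind == "trip_deleted":
        return [json.dumps({"tripd": event["trip_id"]})]
    if kind in ("trip_created", "trip_modified", "seat_reserved"):
        return [json.dumps({"trip": event["trip_id"]})]
    if kind in ("member_modified", "rating_added", "vehicle_changed"):
        # Member data is copied into every trip document, so each of the
        # member's trips has to be reindexed.
        return [json.dumps({"trip": t}) for t in trips_of_member(event["member_id"])]
    return []
```

For example, `messages_for({"kind": "member_modified", "member_id": 42})` yields one `trip` message per trip of that member.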
  • 26. 26/37 Sample trip index query
      {
        "query": {
          "filtered": {
            "query": { "match_all": {} },
            "filter": {
              "and": [
                { "geo_distance": { "distance": "40.14937866995km", "from": { "lat": 48.856614, "lon": 2.3522219 } } },
                { "geo_distance": { "distance": "40.14937866995km", "to": { "lat": 45.764043, "lon": 4.835659 } } },
                { "range": { "price": { "from": 0, "include_lower": false } } }
              ]
            }
          }
        },
        "sort": [ { "trip_date": { "order": "asc" } } ],
        "filter": { "term": { "is_user_blocked": false } },
        "from": 0,
        "size": 10
      }
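A hedged sketch of how the application might assemble such a request. `build_trip_search`, `geo_filter` and their parameters are hypothetical; the structure mirrors the `filtered` query and `and` filter of the ElasticSearch versions of that period.

```python
def geo_filter(field, lat, lon, distance_km):
    """geo_distance filter on a geo_point field ("from" or "to")."""
    return {
        "geo_distance": {
            "distance": "%skm" % distance_km,
            field: {"lat": lat, "lon": lon},
        }
    }


def build_trip_search(origin, destination, distance_km, offset=0, size=10):
    """origin and destination are (lat, lon) tuples."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        geo_filter("from", origin[0], origin[1], distance_km),
                        geo_filter("to", destination[0], destination[1], distance_km),
                        {"range": {"price": {"from": 0, "include_lower": False}}},
                    ]
                },
            }
        },
        "sort": [{"trip_date": {"order": "asc"}}],
        "filter": {"term": {"is_user_blocked": False}},
        "from": offset,
        "size": size,
    }


# Coordinates taken from the sample query (Paris to Lyon).
request = build_trip_search((48.856614, 2.3522219), (45.764043, 4.835659), 40)
```

Paging is then a matter of calling the same function with a different `offset`.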
  • 27. 27/37 The Real World. A trip now has more than 30 fields ● (FAQ has around 25 fields) ● (members have even more...). Building a trip document takes 3 different SQL queries ● (FAQ: 2 different SQL queries) ● (Member: 10 different SQL queries). The Trip index has only 1 shard (because of grouping)
  • 28. 28/37 And now the caveats
  • 29. 29/37 Preloaded Scripts. We use MVEL scripts to improve scoring: ● They are not clustered ● Each node needs to have the scripts ● A node restart is needed to add or modify them. Solution: Chef (a tool from Opscode). All node configurations are centralized in a Chef repository
  • 30. 30/37 Grouping documents. Homemade patches to ElasticSearch (based on work by Martijn van Groningen for lusini.de). Coming soon in ElasticSearch (I really hope so)
  • 31. 31/37 Mapping modification. On a running index: changing a field type is not allowed; changing an analyzer is not allowed. Solution: index alias. 1) To change the mapping → create a new index 2) When the new index is up to date → switch the alias
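The alias switch in step 2 can be done with a single request to the `_aliases` endpoint, which applies its actions atomically. The sketch below only builds the request body; the index and alias names (`trips_v1`, `trips_v2`, `trips`) are made up for the example.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases: both actions are applied in one atomic step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = json.dumps(alias_swap_body("trips", "trips_v1", "trips_v2"))
```

Since the application only ever queries the alias, it never needs to know which physical index is currently behind it.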
  • 32. 32/37 IO limits. We have only 2 nodes: ● The Trip index is around 2 GB ● But the Trip index has only 1 shard ● We can index 100 trips per second on a busy evening. Solution: we installed Intel SSDs (while waiting for a distributed grouping feature)
  • 33. 33/37 Choosing the analyzer. Some fields must not be analyzed: ● For example ISO country codes (IT for Italy or DE for Germany are ignored in some cases). A global analyzer has limits: ● Accented characters from languages such as French, German or Spanish are not always parsed correctly ● One analyzer per country is difficult to implement in some cases
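The country-code problem is plausibly a stop-word effect: an analyzer that lowercases tokens and drops stop words turns `IT` into `it`, which is a common English stop word. The toy analyzer below is a simplification written for illustration, not ElasticSearch's actual implementation, and its stop-word list is a small assumed sample.

```python
# Small assumed sample of English stop words (not the full Lucene list).
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}


def toy_analyze(text):
    """Lowercase, split on whitespace, drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def not_analyzed(text):
    """Rough equivalent of "index": "not_analyzed": the value is kept as one exact term."""
    return [text]
```

With this toy model `toy_analyze("IT")` produces no terms at all, while `not_analyzed("IT")` keeps the code searchable, which is the behaviour the `userid` field gets from `"index": "not_analyzed"` in the trip mapping.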
  • 35. 35/37 Using ElasticSearch to ease log analysis
  • 36. 36/37 By the way… We're hiring! Dev, HTML Ninja, leader… Come and see me right now… or send me your friends (and we have beer, table football and an arcade cabinet)
  • 37. 37/37 Thank you! Follow us: @covoiturage. Apply now: join@BlaBlaCar.com