10. 10/37
What exists? Why change?
MySQL Database
●
Relational DB (lots of joins needed)
●
Plain SQL queries
●
Home-made geographical search
Recent problems
●
New features mean more complex queries
●
Scalability: performance depends on DB load
11. 11/37
Initial requirements
Scalability
●
Trip search needs to complete in less than 200 ms
●
The system side of the solution must be easy to maintain
●
Must be clusterable (also to avoid a SPOF)
Low code impact on existing application
●
Same features as of today (geographical search)
●
Minimize the developers' work
●
Add one missing feature: facets
13. 13/37
Why ElasticSearch
✔
Easiest clustering option
✔
Good performance when indexing
✔
Little code to write to use it
✔
Schemaless
✔
Based on Lucene
✔
Written in Java (we need to code a grouping feature)
15. 15/37
Changing our mindset
Object in a relational database
●
Can be split across multiple tables
●
Lots of information reachable through JOINs
Object in a document-oriented database
●
Only one big index for these objects
●
All information needs to be in the object, not spread over
multiple tables
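A minimal sketch of this change of mindset, assuming hypothetical rows and field names (none of them come from our real schema): the rows that a JOIN would combine are flattened into one self-contained document before indexing.

```python
import json

# Hypothetical rows, as they might come from three relational tables.
trip_row = {"id": 1234567, "user_id": 42, "price": 18.0, "seats_left": 2}
user_row = {"id": 42, "rating": 4.5, "blocked": False}
vehicle_row = {"user_id": 42, "model": "hatchback"}


def build_trip_document(trip, user, vehicle):
    """Flatten related rows into one document, so a search needs no JOIN."""
    return {
        "trip_id": trip["id"],
        "price": trip["price"],
        "seats_left": trip["seats_left"],
        "user_id": user["id"],
        "user_rating": user["rating"],
        "user_blocked": user["blocked"],
        "vehicle": vehicle["model"],
    }


document = build_trip_document(trip_row, user_row, vehicle_row)
print(json.dumps(document, indent=2))
```

The price of this layout is that a change to the user or the vehicle means rebuilding every document that embeds it.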
17. 17/37
Defining our objects well
We need to know what we want to search
●
Searching trips (front office usage)
●
Searching members (back office usage)
●
Searching FAQ (front office usage)
Think of all the needed fields
●
The ones used for queries
●
The ones used for filters
●
The ones used for facets
18. 18/37
Defining the index well
System point of view
●
Number of nodes in the cluster
●
Number of shards
●
Number of replicas
Application point of view
●
Define type and attributes for all fields (mapping)
●
Use parent/child or nested documents to improve indexing
●
How to push documents from the DB?
19. 19/37
Indexing: using a river or not?
River advantages
●
Plugs directly into our source backend
●
An ElasticSearch API exists to code a new one
River problems
●
Not easy to add business logic on some fields
●
Really hard when your DB is unconventional
●
Requires a full reindex of all the documents
20. 20/37
Indexing: our manual way
We wrote an asynchronous indexer
●
Written in Java
●
Applies business logic when fetching from the DB
●
Fetches from multiple DBs/sources
●
Uses the Java ES library
●
Simple interface
●
Send {“trip”:1234567} and the server answers {“OK”}
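The asynchronous idea can be sketched as follows. This is not our Java indexer: it is a small Python illustration in which the queue, the in-memory `index` dictionary and `fetch_trip` are hypothetical stand-ins. The caller gets its answer at once, and the document is built in the background.

```python
import json
import queue
import threading

index = {}                 # stand-in for the ElasticSearch index
pending = queue.Queue()    # trip ids waiting to be indexed


def fetch_trip(trip_id):
    """Hypothetical fetch: a real indexer would query its data sources here."""
    return {"trip_id": trip_id, "seats_left": 3}


def submit(message):
    """Accept an event such as '{"trip":1234567}' and answer immediately."""
    event = json.loads(message)
    pending.put(event["trip"])
    return "OK"


def worker():
    """Build and store documents in the background."""
    while True:
        trip_id = pending.get()
        if trip_id is None:      # sentinel: stop the worker
            break
        index[trip_id] = fetch_trip(trip_id)


thread = threading.Thread(target=worker)
thread.start()
answer = submit('{"trip":1234567}')
pending.put(None)          # ask the worker to stop once the queue is drained
thread.join()
print(answer, index)
```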
22. 22/37
Defining our Trip object well
Think of all the needed fields
●
The ones used for queries
●
Trip departure date, from where, to where, user id
●
The ones used for filters
●
User ratings, price, vehicle, seats left, is user blocked
(a blocked user is a user who did something forbidden
on the website)
●
The ones used for facets
●
User ratings, price, vehicle
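How the three groups of fields end up in one search request can be sketched like this. The body follows the facets-era query DSL (a `filtered` query plus a `facets` section); the field names, the 20km radius and the facet choices are assumptions made for the example, not our production request.

```python
import json


def build_trip_search(from_lat, from_lon, departure_date, max_price):
    """Return a search body with query, filters and facets on hypothetical fields."""
    return {
        "query": {
            "filtered": {
                # Query part: what the user is searching for.
                "query": {"term": {"departure_date": departure_date}},
                # Filter part: narrows the results without affecting the score.
                "filter": {
                    "and": [
                        {"geo_distance": {
                            "distance": "20km",
                            "from_location": {"lat": from_lat, "lon": from_lon},
                        }},
                        {"range": {"price": {"lte": max_price}}},
                        {"range": {"seats_left": {"gte": 1}}},
                        {"term": {"user_blocked": False}},
                    ]
                },
            }
        },
        # Facets: counts per value, shown next to the results.
        "facets": {
            "vehicle": {"terms": {"field": "vehicle"}},
            "price": {"histogram": {"field": "price", "interval": 10}},
        },
    }


body = build_trip_search(48.85, 2.35, "2013-06-01", 30)
print(json.dumps(body, indent=2))
```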
23. 23/37
Defining our Trip index well
Think of all the system requirements
●
The cluster has 2 nodes
●
We keep the default configuration for shards/replicas
Think of the object mapping
●
For each field:
●
Define the type (string, long, geo_point, date,
float, boolean)
●
Define the scope (include_in_all)
●
Define the analyzer (for type string)
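A mapping covering the three decisions above (type, scope, analyzer) might look like the sketch below. The field names and the choice of the `standard` analyzer are assumptions; the `string` type and `include_in_all` belong to the ElasticSearch versions of this period.

```python
import json

# Hypothetical mapping for a "trip" type; every field name is illustrative.
trip_mapping = {
    "trip": {
        "properties": {
            "from_location": {"type": "geo_point"},
            "departure_date": {"type": "date"},
            "price": {"type": "float"},
            "seats_left": {"type": "long"},
            # Scope: keep this flag out of the catch-all _all field.
            "user_blocked": {"type": "boolean", "include_in_all": False},
            # Analyzer: only meaningful for string fields.
            "description": {"type": "string", "analyzer": "standard"},
            # Stored as a single untouched term.
            "country": {"type": "string", "index": "not_analyzed"},
        }
    }
}

print(json.dumps(trip_mapping, indent=2))
```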
25. 25/37
Indexing events well
Which modifications send a change event
●
All trip creations/deletions/modifications
●
Member modifications (blocked or not)
●
New ratings from other members
●
A seat has been reserved
●
A member changes their vehicle
A change event is a call to the internal indexer
●
Send '{“trip”:123456}' to the indexer (create/update)
●
Send '{“tripd”:123456}' to the indexer (delete)
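On the receiving side, telling the two messages apart is a small dispatch step. A minimal sketch, assuming the indexer only needs an action name and a trip id (the action names `index` and `delete` are chosen for the example):

```python
import json

# Message key -> action to perform on the index.
ACTIONS = {"trip": "index", "tripd": "delete"}


def parse_event(message):
    """Turn an event message into an (action, trip_id) pair."""
    event = json.loads(message)
    if len(event) != 1:
        raise ValueError("expected exactly one key: %r" % message)
    key, trip_id = next(iter(event.items()))
    if key not in ACTIONS:
        raise ValueError("unknown event key: %r" % key)
    return ACTIONS[key], int(trip_id)


print(parse_event('{"trip":123456}'))
print(parse_event('{"tripd":123456}'))
```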
27. 27/37
The Real World
A trip now has more than 30 fields
●
(FAQ is around 25 fields)
●
(members even more...)
To build a trip document we need 3
different SQL queries
●
(FAQ: 2 different SQL queries)
●
(Member: 10 different SQL queries)
The trip index has only 1 shard (needed for grouping)
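The single-shard choice is expressed in the settings given when the index is created. A minimal sketch: the shard count comes from the slide, while the replica count of 1 is an assumption made here so that each of the two nodes could hold a copy.

```python
import json


def build_index_settings(shards, replicas):
    """Return the settings body sent when creating an index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


# 1 shard as stated above; 1 replica is an assumption for this example.
trip_settings = build_index_settings(shards=1, replicas=1)
print(json.dumps(trip_settings, indent=2))
```

The number of shards is fixed at index creation, which is why it has to be decided up front.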
29. 29/37
Preloaded Scripts
We use MVEL scripts to improve scoring
●
They are not clustered
●
Each node needs to have the scripts
●
A node restart is needed to add or modify them
Solution: Chef (a tool from Opscode)
All node configurations are centralized in the Chef
repository
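For illustration, a query that calls a preloaded script by name could look like the sketch below. It assumes the `custom_score` query of the ElasticSearch versions of this period and a script file deployed under `config/scripts` on every node; the script name `trip_score` and its parameter are hypothetical.

```python
import json

# Hypothetical scoring query: "trip_score" stands for a script file that
# would have to exist on each node before the query can run.
scored_query = {
    "query": {
        "custom_score": {
            "query": {"match_all": {}},
            "script": "trip_score",
            "lang": "mvel",
            "params": {"rating_weight": 2.0},
        }
    }
}

print(json.dumps(scored_query, indent=2))
```

Because the request only carries the name, a node that lacks the file cannot run the script, which is why the deployment has to be identical everywhere.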
30. 30/37
Grouping documents
Home-made patches to ElasticSearch
(based on work by Martijn Van Groningen for
lusini.de)
Soon in ElasticSearch
(I really hope so)
31. 31/37
Mapping modification
On a running index:
Changing a type is not allowed
Changing an analyzer is not allowed
Solution: index alias
1) To change the mapping → create a new index
2) When the new index is up to date → switch the alias
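Step 2 can be sketched as the body of one request to the `_aliases` endpoint, where the remove and add actions are applied together so the application never sees a missing alias. The alias and index names below are hypothetical.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Return the actions that move an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application only ever queries "trips".
swap = build_alias_swap("trips", "trips_v1", "trips_v2")
print(json.dumps(swap, indent=2))
```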
32. 32/37
I/O limits
We have only 2 nodes
●
Trip index is around 2GB
●
But only 1 shard for Trip index
●
Can index 100 trips/second on a busy evening
Solution: we installed Intel SSDs
(while waiting for the distributed grouping feature)
33. 33/37
Choosing the analyzer
Some fields must not be analyzed
●
For example, if you use ISO codes for countries
(IT for Italy or DE for Germany are ignored in
some cases)
A global analyzer has limits
●
Accented characters from countries like France,
Germany or Spain are not always parsed correctly
●
One analyzer per country is difficult to implement
in some cases
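One plausible reason for the ignored country codes, stated here as an assumption, is stop-word filtering: once lowercased, "IT" and "DE" look like common words. The toy analyzer below imitates that behaviour; it is not Lucene, and its stop-word list is a small illustrative sample.

```python
# Toy stop-word list: "it" as an English stop word, "de" and "la" as
# French/Spanish ones. Real analyzers ship much longer lists.
STOP_WORDS = {"it", "is", "the", "de", "la"}


def analyze(text):
    """Lowercase, split on whitespace, drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


def keep_as_is(text):
    """What a non-analyzed field does: the whole value is one term."""
    return [text]


for code in ("IT", "DE", "FR"):
    print(code, analyze(code), keep_as_is(code))
```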
36. 36/37
By the way…
We’re hiring!
Dev, HTML Ninja, leader,…
Come and see me right now
… or send me your friends
(And we have beer, table football and an arcade cabinet)