SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
Using ElasticSearch as a fast, flexible,
and scalable solution to search
occurrence records and checklists

Christian Gendreau, Canadensys
Marie-Elise Lecoq, GBIF France
Introduction
ElasticSearch is an open source, document oriented, distributed
search engine, built on top of Apache Lucene.

From ElasticSearch GitHub page
Setup
•  Java 6 or higher
•  Download : # wget …elasticsearch-0.90.5.zip
•  Unzip
Configuration
•  Name your cluster
•  Replication and multi-shard are enabled by default
•  Start : # bin/elasticsearch
Add data
Using the REST API
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1'
-d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
Import data
Rivers
•  Document-based database (mongoDB)
•  JDBC (relational database)
•  Data source (wikipedia, Twitter)
Mapping
•  Schema-less
•  Customize indexing
•  Customize querying
ElasticSearch at
Canadensys
Database of Vascular Plants of Canada (VASCAN)

data.canadensys.net/vascan
Our ElasticSearch index
Index structure for scientific names
•  autocompletion : edge_ngram filter
o 

“carex” -> “ca”,”car”,”care”,”carex”

•  genus first letter : pattern_replace filter
o 

“carex feta” -> “c. feta”

•  epithet : path_hierarchy tokenizer
o 

“carex feta” -> “feta”
ElasticSearch at GBIF France
Data stored in ElasticSearch are updated upon MongoDB
changes.
The search engine requests elasticsearch using filters like taxon,
date, place, dataset and geolocalisation.
Statistic calculation using facets
ElasticSearch at GBIF France
ElasticSearch - Solr
•  Solr and elasticsearch both tries to solve the same problem
with no much differences

•  Development setup and production deployment (replication /
sharding) easier with elasticsearch

•  By default, the elasticsearch is well configured for Lucene and
customization remains easy.
Facets
•  “Group by” in SQL
•  Mostly used for calculate statistics
•  Example :
curl -XGET [...]
"facets" : {
”dataset" : {
"terms" : {
"field" : ”dataset",
"order" : "term”
…
API and libraries
REST API
o  interoperability between different programming languages
o  HTTP request

Java API
o 
o 

more efficient than REST API due to the binary API use.
built in marshaling(data formatting on the network)
Query - RESTfull API
Example:
$ curl localhost:9200/vascan/_search?pretty=1 -d
'{"query":{
"match":{
"name" :{
"query":"carex"
}
}
}
}’
Query - Java API
Code example:
...
SearchRequestBuilder srb = client.prepareSearch(INDEX_NAME)
.setQuery(QueryBuilders
.boolQuery()
.should(QueryBuilders.matchQuery("vernacular_name",text))
.setTypes(VERNACULAR_TYPE);
...
Pitfalls
• 
• 
• 
• 

Error reporting (index creation, river creation)
Results may be hard to predict using complex queries
Documentation
With each mapping modification comes a free reindex from
data
Future
•  Scientific Name analyzer
•  Geospatial component
Thank you!

Contenu connexe

Tendances

Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack IntroductionVikram Shinde
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Burak TUNGUT
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining William Simms
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
 
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneMicroservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneNoriaki Tatsumi
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchRuslan Zavacky
 
Big Data Overview Part 1
Big Data Overview Part 1Big Data Overview Part 1
Big Data Overview Part 1William Simms
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stackVikrant Chauhan
 
What's new in Elasticsearch v5
What's new in Elasticsearch v5What's new in Elasticsearch v5
What's new in Elasticsearch v5Idan Tohami
 
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.ioDataconomy Media
 
Au cœur de la roadmap de la Suite Elastic
Au cœur de la roadmap de la Suite ElasticAu cœur de la roadmap de la Suite Elastic
Au cœur de la roadmap de la Suite ElasticElasticsearch
 
Visualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaVisualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaObjectRocket
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaAvinash Ramineni
 
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Continuent
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchBo Andersen
 

Tendances (20)

Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneMicroservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital One
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Big Data Overview Part 1
Big Data Overview Part 1Big Data Overview Part 1
Big Data Overview Part 1
 
Elastic Stack Roadmap
Elastic Stack RoadmapElastic Stack Roadmap
Elastic Stack Roadmap
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
What's new in Elasticsearch v5
What's new in Elasticsearch v5What's new in Elasticsearch v5
What's new in Elasticsearch v5
 
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.io
 
Au cœur de la roadmap de la Suite Elastic
Au cœur de la roadmap de la Suite ElasticAu cœur de la roadmap de la Suite Elastic
Au cœur de la roadmap de la Suite Elastic
 
Visualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaVisualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and Kibana
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and Kibana
 
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 

Similaire à Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists

(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
06 integrate elasticsearch
06 integrate elasticsearch06 integrate elasticsearch
06 integrate elasticsearchErhwen Kuo
 
Elk presentation1#3
Elk presentation1#3Elk presentation1#3
Elk presentation1#3uzzal basak
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"George Stathis
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineDaniel N
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAmazon Web Services
 
Java clients for elasticsearch
Java clients for elasticsearchJava clients for elasticsearch
Java clients for elasticsearchFlorian Hopf
 
Modernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with ElasticsearchModernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with ElasticsearchTaylor Lovett
 
Qui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in ActionQui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in ActionGlobalLogic Ukraine
 
Getting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchGetting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchPeter Steenbergen
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화Henry Jeong
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Domingo Suarez Torres
 

Similaire à Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists (20)

(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
06 integrate elasticsearch
06 integrate elasticsearch06 integrate elasticsearch
06 integrate elasticsearch
 
Elasticsearch as a Database?
Elasticsearch as a Database?Elasticsearch as a Database?
Elasticsearch as a Database?
 
Elk presentation1#3
Elk presentation1#3Elk presentation1#3
Elk presentation1#3
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
 
Elasticsearch as a Database?
Elasticsearch as a Database?Elasticsearch as a Database?
Elasticsearch as a Database?
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
 
Java clients for elasticsearch
Java clients for elasticsearchJava clients for elasticsearch
Java clients for elasticsearch
 
Modernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with ElasticsearchModernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with Elasticsearch
 
Qui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in ActionQui Quaerit, Reperit. AWS Elasticsearch in Action
Qui Quaerit, Reperit. AWS Elasticsearch in Action
 
Getting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchGetting started with Laravel & Elasticsearch
Getting started with Laravel & Elasticsearch
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
963
963963
963
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016
 

Dernier

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Dernier (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists

  • 1. Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists Christian Gendreau, Canadensys Marie-Elise Lecoq, GBIF France
  • 2. Introduction ElasticSearch is an open source, document oriented, distributed search engine, built on top of Apache Lucene. From ElasticSearch GitHub page
  • 3. Setup •  Java 6 or higher •  Download : # wget …elasticsearch-0.90.5.zip •  Unzip
  • 4. Configuration •  Name your cluster •  Replication and multi-shard are enabled by default •  Start : # bin/elasticsearch
  • 5. Add data Using the REST API $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elastic Search" }'
  • 6. Import data Rivers •  Document-based database (mongoDB) •  JDBC (relational database) •  Data source (wikipedia, Twitter)
  • 7. Mapping •  Schema-less •  Customize indexing •  Customize querying
  • 8. ElasticSearch at Canadensys Database of Vascular Plants of Canada (VASCAN) data.canadensys.net/vascan
  • 9. Our ElasticSearch index Index structure for scientific names •  autocompletion : edge_ngram filter o  “carex” -> “ca”,”car”,”care”,”carex” •  genus first letter : pattern_replace filter o  “carex feta” -> “c. feta” •  epithet : path_hierarchy tokenizer o  “carex feta” -> “feta”
  • 10. ElasticSearch at GBIF France Data stored in ElasticSearch are updated upon MongoDB changes. The search engine requests elasticsearch using filters like taxon, date, place, dataset and geolocalisation. Statistic calculation using facets
  • 12. ElasticSearch - Solr •  Solr and elasticsearch both tries to solve the same problem with no much differences •  Development setup and production deployment (replication / sharding) easier with elasticsearch •  By default, the elasticsearch is well configured for Lucene and customization remains easy.
  • 13. Facets •  “Group by” in SQL •  Mostly used for calculate statistics •  Example : curl -XGET [...] "facets" : { ”dataset" : { "terms" : { "field" : ”dataset", "order" : "term” …
  • 14. API and libraries REST API o  interoperability between different programming languages o  HTTP request Java API o  o  more efficient than REST API due to the binary API use. built in marshaling(data formatting on the network)
  • 15. Query - RESTfull API Example: $ curl localhost:9200/vascan/_search?pretty=1 -d '{"query":{ "match":{ "name" :{ "query":"carex" } } } }’
  • 16. Query - Java API Code example: ... SearchRequestBuilder srb = client.prepareSearch(INDEX_NAME) .setQuery(QueryBuilders .boolQuery() .should(QueryBuilders.matchQuery("vernacular_name",text)) .setTypes(VERNACULAR_TYPE); ...
  • 17. Pitfalls •  •  •  •  Error reporting (index creation, river creation) Results may be hard to predict using complex queries Documentation With each mapping modification comes a free reindex from data
  • 18. Future •  Scientific Name analyzer •  Geospatial component