SlideShare une entreprise Scribd logo
1  sur  96
“ Terms of Endearment” The ElasticSearch query language explained Clinton Gormley, YAPC::EU 2011 DRTECH @clintongormley
search for : “ DELETE QUERY ”  We can
search for : “ DELETE QUERY ”  and find : “ deleteByQuery ” We can
but you can only find  what is stored in the database
Normalise values  “ deleteByQuery” 'delete' 'by' 'query' 'deletebyquery'
Normalise values  and search terms “ deleteByQuery” “ DELETE QUERY” ' delete ' 'by' ' query ' 'deletebyquery'
Normalise  values  and search terms “ deleteByQuery” “ DELETE QUERY” ' delete ' 'by' ' query ' 'deletebyquery'
Analyse  values  and search terms “ deleteByQuery” “ DELETE QUERY” ' delete ' 'by' ' query ' 'deletebyquery'
What is stored in ElasticSearch?
{ tweet  => "Perl is GREAT!", posted => "2011-08-15", user  => { name  => "Clinton Gormley", email => "drtech@cpan.org", }, tags  => [" perl" ,"opinion"],  posts  => 2, } Document:
{ tweet   => "Perl is GREAT!", posted  => "2011-08-15", user   => { name   => "Clinton Gormley", email  => "drtech@cpan.org", }, tags   => [" perl" ,"opinion"],  posts   => 2, } Fields:
{ tweet  =>  "Perl is GREAT!", posted =>  "2011-08-15", user  =>  { name  =>  "Clinton Gormley", email =>  "drtech@cpan.org", }, tags   =>  [" perl" ,"opinion"],  posts  =>  2, } Values:
{ tweet  => "Perl is GREAT!", posted => "2011-08-15", user  => { name  => "Clinton Gormley", email => "drtech@cpan.org" }, tags  => [" perl" ,"opinion"],  posts  => 2, } Field types: # object # string # date # nested object # string # string # array of enums # integer
{ tweet  => "Perl is GREAT!", posted => "2011-08-15", user   => { name  => "Clinton Gormley", email  => "drtech@cpan.org", }, tags  => [" perl" ,"opinion"],  posts  => 2, } Nested objects flattened:
{ tweet  => "Perl is GREAT!", posted  => "2011-08-15", user.name  => "Clinton Gormley", user.email  => "drtech@cpan.org", tags  => [" perl" ,"opinion"],  posts  => 2, } Nested objects flattened
{ tweet  =>  "Perl is GREAT!", posted  =>  "2011-08-15", user.name  =>  "Clinton Gormley", user.email =>  "drtech@cpan.org", tags  =>  [" perl" ,"opinion"],  posts  =>  2, } Values analyzed into terms
{ tweet  =>  ['perl','great'], posted  =>  [Date(2011-08-15)], user.name  =>  ['clinton','gormley'], user.email =>  ['drtech','cpan.org'], tags  =>  [' perl' ,'opinion'],  posts  =>  [2], } Values analyzed into terms
database table row ⇒  many tables ⇒  many rows ⇒  one schema ⇒  many columns In MySQL
index type document ⇒  many types ⇒  many documents ⇒  one mapping ⇒  many fields In ElasticSearch
Create index with mappings $es-> create_index ( index  => 'twitter', mappings   => { tweet   => { properties  => { title  => { type => 'string' }, created => { type => 'date'  } }  } } );
Add a mapping $es-> put_mapping (  index => 'twitter', type  => ' user ', mapping   => { properties  => { name  => { type => 'string' }, created => { type => 'date'  }, }  } );
Can add to existing mapping
Can add to existing mapping Cannot change mapping for field
Core field types { type  => 'string', }
Core field types { type  => 'string', # byte|short|integer|long|double|float # date, ip addr, geolocation # boolean # binary (as base 64) }
Core field types { type  => 'string', index  => ' analyzed ', # 'Foo Bar'  ⇒  [ 'foo', 'bar' ] }
Core field types { type  => 'string', index  => ' not_analyzed ', # 'Foo Bar'  ⇒  [ 'Foo Bar' ] }
Core field types { type  => 'string', index  => ' no ', # 'Foo Bar'  ⇒  [ ] }
Core field types { type  => 'string', index  => 'analyzed', analyzer  => 'default', }
Core field types { type  => 'string', index  => 'analyzed', index_ analyzer  => 'default', search_ analyzer => 'default', }
Core field types { type  => 'string', index  => 'analyzed', analyzer  => 'default', boost  => 2, }
Core field types { type  => 'string', index  => 'analyzed', analyzer  => 'default', boost  => 2, include_in_all  => 1 |0 }
[object Object]
Simple
Whitespace
Stop
Keyword Built in analyzers ,[object Object]
Language
Snowball
Custom
The Brown-Cow's Part_No.  #A.BC123-456 joe@bloggs.com keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
Token filters ,[object Object]
ASCII Folding
Length
Lowercase
NGram
Edge NGram
Porter Stem
Shingle
Stop
Word Delimiter ,[object Object]
KStem
Snowball
Phonetic
Synonym
Compound Word
Reverse
Elision
Truncate
Unique
Custom Analyzer $c->create_index( index  => 'twitter', settings  => { analysis => { analyzer => { ascii_html => { type  => 'custom', tokenizer  => 'standard', filter  => [ qw( standard lowercase asciifolding stop ) ], char_filter => ['html_strip'] } } }} );
Searching $result = $es->search( index  => 'twitter', type  => 'tweet',  );
Searching $result = $es->search( index  =>  ['twitter','facebook'] , type  =>  ['tweet','post'] ,  );
Searching $result = $es->search( #  all indices #  all types );
Searching $result = $es->search( index  => 'twitter', type  => 'tweet',  query  => { text => { _all => 'foo' }}, );
Searching $result = $es->search( index  => 'twitter', type  => 'tweet', query b   =>  'foo' , #  b == ElasticSearch::SearchBuilder );
Searching $result = $es->search( index  => 'twitter', type  => 'tweet', query  => { text => { _all => 'foo' }}, sort  => [{ '_score': 'desc' }] );
Searching $result = $es->search( index  => 'twitter', type  => 'tweet', query  => { text => { _all => 'foo' }}, sort  => [{ '_score': 'desc' }] from  => 0, size  => 10, );
Query DSL
Queries   vs  Filters
Queries   vs  Filters  ,[object Object],[object Object]
Queries   vs  Filters  ,[object Object]
relevance scoring ,[object Object]
no scoring
Queries   vs  Filters  ,[object Object]
relevance scoring
slower ,[object Object]
no scoring
faster
Queries   vs  Filters  ,[object Object]
relevance scoring
slower
no caching ,[object Object]
no scoring
faster
cacheable
Queries   vs  Filters  ,[object Object]
relevance scoring
slower
no caching ,[object Object]
no scoring
faster
cacheable  Use filters for anything that doesn't affect the relevance score!
Query only Query DSL: $es->search(  query => {  text => { title => 'perl' }  } ); SearchBuilder: $es->search(  query b  => {  title => 'perl'  } );
Filter only Query DSL: $es->search( query => { constant_score => { filter => { term => { tag => 'perl } } } }); SearchBuilder: $es->search( query b  => { -filter => {  tag => 'perl'  } });
Query and filter Query DSL: $es->search( query => { filtered  => { query => {  text => { title => 'perl' } }, filter =>{  term => { tag => 'perl'  } } } }); SearchBuilder: $es->search( query b  => { title  => 'perl', -filter => {  tag => 'perl'  }  });

Contenu connexe

Tendances

Building Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraBuilding Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraMarkus Lanthaler
 
Referentiemodel Enterprise Content Management
Referentiemodel Enterprise Content ManagementReferentiemodel Enterprise Content Management
Referentiemodel Enterprise Content ManagementDanny Greefhorst
 
Content Management with MongoDB by Mark Helmstetter
 Content Management with MongoDB by Mark Helmstetter Content Management with MongoDB by Mark Helmstetter
Content Management with MongoDB by Mark HelmstetterMongoDB
 
Integrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and WilmaIntegrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and WilmaDalton Valadares
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020andyseaborne
 
Frontend Crash Course: HTML and CSS
Frontend Crash Course: HTML and CSSFrontend Crash Course: HTML and CSS
Frontend Crash Course: HTML and CSSThinkful
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman
 

Tendances (20)

JSON-LD and MongoDB
JSON-LD and MongoDBJSON-LD and MongoDB
JSON-LD and MongoDB
 
Advanced Json
Advanced JsonAdvanced Json
Advanced Json
 
JSON-LD
JSON-LDJSON-LD
JSON-LD
 
Building Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and HydraBuilding Next-Generation Web APIs with JSON-LD and Hydra
Building Next-Generation Web APIs with JSON-LD and Hydra
 
ShEx vs SHACL
ShEx vs SHACLShEx vs SHACL
ShEx vs SHACL
 
Referentiemodel Enterprise Content Management
Referentiemodel Enterprise Content ManagementReferentiemodel Enterprise Content Management
Referentiemodel Enterprise Content Management
 
Data Modeling with NGSI, NGSI-LD
Data Modeling with NGSI, NGSI-LDData Modeling with NGSI, NGSI-LD
Data Modeling with NGSI, NGSI-LD
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
Content Management with MongoDB by Mark Helmstetter
 Content Management with MongoDB by Mark Helmstetter Content Management with MongoDB by Mark Helmstetter
Content Management with MongoDB by Mark Helmstetter
 
Integrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and WilmaIntegrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and Wilma
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020
 
SHACL Overview
SHACL OverviewSHACL Overview
SHACL Overview
 
Frontend Crash Course: HTML and CSS
Frontend Crash Course: HTML and CSSFrontend Crash Course: HTML and CSS
Frontend Crash Course: HTML and CSS
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Formularze progstrintzao
Formularze progstrintzaoFormularze progstrintzao
Formularze progstrintzao
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)
 
Introduction to JSON
Introduction to JSONIntroduction to JSON
Introduction to JSON
 
Html projects for beginners
Html projects for beginnersHtml projects for beginners
Html projects for beginners
 
Naming Conventions
Naming ConventionsNaming Conventions
Naming Conventions
 

Similaire à Terms of endearment - the ElasticSearch Query DSL explained

Php Basic Security
Php Basic SecurityPhp Basic Security
Php Basic Securitymussawir20
 
High-level Web Testing
High-level Web TestingHigh-level Web Testing
High-level Web Testingpetersergeant
 
Exploiting Php With Php
Exploiting Php With PhpExploiting Php With Php
Exploiting Php With PhpJeremy Coates
 
PHP 102: Out with the Bad, In with the Good
PHP 102: Out with the Bad, In with the GoodPHP 102: Out with the Bad, In with the Good
PHP 102: Out with the Bad, In with the GoodJeremy Kendall
 
Intro python
Intro pythonIntro python
Intro pythonkamzilla
 
Drupal Lightning FAPI Jumpstart
Drupal Lightning FAPI JumpstartDrupal Lightning FAPI Jumpstart
Drupal Lightning FAPI Jumpstartguestfd47e4c7
 
03 Php Array String Functions
03 Php Array String Functions03 Php Array String Functions
03 Php Array String FunctionsGeshan Manandhar
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)MongoSF
 
Forum Presentation
Forum PresentationForum Presentation
Forum PresentationAngus Pratt
 
Intro to #memtech PHP 2011-12-05
Intro to #memtech PHP   2011-12-05Intro to #memtech PHP   2011-12-05
Intro to #memtech PHP 2011-12-05Jeremy Kendall
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchclintongormley
 
Haml & Sass presentation
Haml & Sass presentationHaml & Sass presentation
Haml & Sass presentationbryanbibat
 
Ods Markup And Tagsets: A Tutorial
Ods Markup And Tagsets: A TutorialOds Markup And Tagsets: A Tutorial
Ods Markup And Tagsets: A Tutorialsimienc
 
Introduction into Struts2 jQuery Grid Tags
Introduction into Struts2 jQuery Grid TagsIntroduction into Struts2 jQuery Grid Tags
Introduction into Struts2 jQuery Grid TagsJohannes Geppert
 

Similaire à Terms of endearment - the ElasticSearch Query DSL explained (20)

Php Basic Security
Php Basic SecurityPhp Basic Security
Php Basic Security
 
High-level Web Testing
High-level Web TestingHigh-level Web Testing
High-level Web Testing
 
Exploiting Php With Php
Exploiting Php With PhpExploiting Php With Php
Exploiting Php With Php
 
PHP 102: Out with the Bad, In with the Good
PHP 102: Out with the Bad, In with the GoodPHP 102: Out with the Bad, In with the Good
PHP 102: Out with the Bad, In with the Good
 
HTML5 Web Forms
HTML5 Web FormsHTML5 Web Forms
HTML5 Web Forms
 
Intro python
Intro pythonIntro python
Intro python
 
JSP Custom Tags
JSP Custom TagsJSP Custom Tags
JSP Custom Tags
 
Sencha Touch Intro
Sencha Touch IntroSencha Touch Intro
Sencha Touch Intro
 
JQuery 101
JQuery 101JQuery 101
JQuery 101
 
Drupal Lightning FAPI Jumpstart
Drupal Lightning FAPI JumpstartDrupal Lightning FAPI Jumpstart
Drupal Lightning FAPI Jumpstart
 
JQuery Basics
JQuery BasicsJQuery Basics
JQuery Basics
 
03 Php Array String Functions
03 Php Array String Functions03 Php Array String Functions
03 Php Array String Functions
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)
 
Forum Presentation
Forum PresentationForum Presentation
Forum Presentation
 
Intro to #memtech PHP 2011-12-05
Intro to #memtech PHP   2011-12-05Intro to #memtech PHP   2011-12-05
Intro to #memtech PHP 2011-12-05
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearch
 
Haml & Sass presentation
Haml & Sass presentationHaml & Sass presentation
Haml & Sass presentation
 
Ods Markup And Tagsets: A Tutorial
Ods Markup And Tagsets: A TutorialOds Markup And Tagsets: A Tutorial
Ods Markup And Tagsets: A Tutorial
 
Introduction into Struts2 jQuery Grid Tags
Introduction into Struts2 jQuery Grid TagsIntroduction into Struts2 jQuery Grid Tags
Introduction into Struts2 jQuery Grid Tags
 
Mojolicious on Steroids
Mojolicious on SteroidsMojolicious on Steroids
Mojolicious on Steroids
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Terms of endearment - the ElasticSearch Query DSL explained

  • 1. “ Terms of Endearment” The ElasticSearch query language explained Clinton Gormley, YAPC::EU 2011 DRTECH @clintongormley
  • 2. search for : “ DELETE QUERY ” We can
  • 3. search for : “ DELETE QUERY ” and find : “ deleteByQuery ” We can
  • 4. but you can only find what is stored in the database
  • 5. Normalise values “ deleteByQuery” 'delete' 'by' 'query' 'deletebyquery'
  • 6. Normalise values and search terms “ deleteByQuery” “ DELETE QUERY” ' delete ' 'by' ' query ' 'deletebyquery'
  • 7. Normalise values and search terms “ deleteByQuery” “ DELETE QUERY” ' delete ' 'by' ' query ' 'deletebyquery'
  • 8. Analyse values and search terms “ deleteByQuery” “ DELETE QUERY” ' delete ' 'by' ' query ' 'deletebyquery'
  • 9. What is stored in ElasticSearch?
  • 10. { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => "drtech@cpan.org", }, tags => [" perl" ,"opinion"], posts => 2, } Document:
  • 11. { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => "drtech@cpan.org", }, tags => [" perl" ,"opinion"], posts => 2, } Fields:
  • 12. { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => "drtech@cpan.org", }, tags => [" perl" ,"opinion"], posts => 2, } Values:
  • 13. { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => "drtech@cpan.org" }, tags => [" perl" ,"opinion"], posts => 2, } Field types: # object # string # date # nested object # string # string # array of enums # integer
  • 14. { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => "drtech@cpan.org", }, tags => [" perl" ,"opinion"], posts => 2, } Nested objects flattened:
  • 15. { tweet => "Perl is GREAT!", posted => "2011-08-15", user.name => "Clinton Gormley", user.email => "drtech@cpan.org", tags => [" perl" ,"opinion"], posts => 2, } Nested objects flattened
  • 16. { tweet => "Perl is GREAT!", posted => "2011-08-15", user.name => "Clinton Gormley", user.email => "drtech@cpan.org", tags => [" perl" ,"opinion"], posts => 2, } Values analyzed into terms
  • 17. { tweet => ['perl','great'], posted => [Date(2011-08-15)], user.name => ['clinton','gormley'], user.email => ['drtech','cpan.org'], tags => [' perl' ,'opinion'], posts => [2], } Values analyzed into terms
  • 18. database table row ⇒ many tables ⇒ many rows ⇒ one schema ⇒ many columns In MySQL
  • 19. index type document ⇒ many types ⇒ many documents ⇒ one mapping ⇒ many fields In ElasticSearch
  • 20. Create index with mappings $es-> create_index ( index => 'twitter', mappings => { tweet => { properties => { title => { type => 'string' }, created => { type => 'date' } } } } );
  • 21. Add a mapping $es-> put_mapping ( index => 'twitter', type => ' user ', mapping => { properties => { name => { type => 'string' }, created => { type => 'date' }, } } );
  • 22. Can add to existing mapping
  • 23. Can add to existing mapping Cannot change mapping for field
  • 24. Core field types { type => 'string', }
  • 25. Core field types { type => 'string', # byte|short|integer|long|double|float # date, ip addr, geolocation # boolean # binary (as base 64) }
  • 26. Core field types { type => 'string', index => ' analyzed ', # 'Foo Bar' ⇒ [ 'foo', 'bar' ] }
  • 27. Core field types { type => 'string', index => ' not_analyzed ', # 'Foo Bar' ⇒ [ 'Foo Bar' ] }
  • 28. Core field types { type => 'string', index => ' no ', # 'Foo Bar' ⇒ [ ] }
  • 29. Core field types { type => 'string', index => 'analyzed', analyzer => 'default', }
  • 30. Core field types { type => 'string', index => 'analyzed', index_ analyzer => 'default', search_ analyzer => 'default', }
  • 31. Core field types { type => 'string', index => 'analyzed', analyzer => 'default', boost => 2, }
  • 32. Core field types { type => 'string', index => 'analyzed', analyzer => 'default', boost => 2, include_in_all => 1 |0 }
  • 33.
  • 36. Stop
  • 37.
  • 41. The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
  • 42.
  • 46. NGram
  • 50. Stop
  • 51.
  • 52. KStem
  • 61. Custom Analyzer $c->create_index( index => 'twitter', settings => { analysis => { analyzer => { ascii_html => { type => 'custom', tokenizer => 'standard', filter => [ qw( standard lowercase asciifolding stop ) ], char_filter => ['html_strip'] } } }} );
  • 62. Searching $result = $es->search( index => 'twitter', type => 'tweet', );
  • 63. Searching $result = $es->search( index => ['twitter','facebook'] , type => ['tweet','post'] , );
  • 64. Searching $result = $es->search( # all indices # all types );
  • 65. Searching $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' }}, );
  • 66. Searching $result = $es->search( index => 'twitter', type => 'tweet', query b => 'foo' , # b == ElasticSearch::SearchBuilder );
  • 67. Searching $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' }}, sort => [{ '_score': 'desc' }] );
  • 68. Searching $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' }}, sort => [{ '_score': 'desc' }] from => 0, size => 10, );
  • 70. Queries vs Filters
  • 71.
  • 72.
  • 73.
  • 75.
  • 77.
  • 80.
  • 83.
  • 87.
  • 90.
  • 93. cacheable Use filters for anything that doesn't affect the relevance score!
  • 94. Query only Query DSL: $es->search( query => { text => { title => 'perl' } } ); SearchBuilder: $es->search( query b => { title => 'perl' } );
  • 95. Filter only Query DSL: $es->search( query => { constant_score => { filter => { term => { tag => 'perl } } } }); SearchBuilder: $es->search( query b => { -filter => { tag => 'perl' } });
  • 96. Query and filter Query DSL: $es->search( query => { filtered => { query => { text => { title => 'perl' } }, filter =>{ term => { tag => 'perl' } } } }); SearchBuilder: $es->search( query b => { title => 'perl', -filter => { tag => 'perl' } });
  • 98. Filters : equality Query DSL: { term => { tags => 'perl' }} { terms => { tags => ['perl','ruby'] }} SearchBuilder: { tags => 'perl' } { tags => ['perl','ruby'] }
  • 99. Filters : range Query DSL: { range => { date => { gte => '2010-11-01', lt => '2010-12-01' }} SearchBuilder: { date => { gte => '2010-11-01', lt => '2011-12-01' }}
  • 100. Filters : range (many values) Query DSL: { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01 }} SearchBuilder: { date => { ' >= ' => '2010-11-01', ' < ' => '2011-12-01' }}
  • 101. Filters : and | or | not Query DSL: { and => [ {term=>{X=>1}}, {term=>{Y=>2}} ]} { or => [ {term=>{X=>1}}, {term=>{Y=>2}} ]} { not => { or => [ {term=>{X=>1}}, {term=>{Y=>2}} ] }} SearchBuilder: { X => 1, Y => 2 } [ X => 1, Y => 2 ] { -not => { X => 1, Y => 2 } } # and { -not => [ X => 1, Y => 2 ] } # or
  • 102. Filters : exists | missing Query DSL: { exists => { field => 'title' }} { missing => { field => 'title' }} SearchBuilder: { -exists => 'title' } { -missing => 'title' }
  • 103. Filter example SearchBuilder: { -filter => [ featured => 1, { created_at => { gt => '2011-08-01' }, status => { '!=' => 'pending' }, }, ] }
  • 104. Filter example Query DSL: { constant_score => { filter => { or => [ { term => { featured => 1 }}, { and => [ { not => { term => { status => 'pending' }}, { range => { created_at => { gt => '2011-08-01' }}}, ] } ] } } }
  • 105.
  • 106. nested
  • 108. query
  • 110. prefix
  • 111.
  • 112. type
  • 117.
  • 120.
  • 121. range
  • 122. prefix
  • 123. fuzzy
  • 125. ids
  • 126.
  • 128.
  • 129.
  • 131.
  • 134.
  • 137.
  • 138. range
  • 139. prefix
  • 140. fuzzy
  • 142. ids
  • 143.
  • 145.
  • 146.
  • 148.
  • 153. Text/Analyzed Queries analyzed ⇒ text query using search_analyzer
  • 154. Text-Query Family Query DSL: { text => { title => 'great perl' }} Search Builder: { title => 'great perl' }
  • 155. Text-Query Family Query DSL: { text => { title => { query => 'great perl' }}} Search Builder: { title => { '=' => { query => 'great perl' }}}
  • 156. Text-Query Family Query DSL: { text => { title => { query => 'great perl' , operator => 'and' }}} Search Builder: { title => { '=' => { query => 'great perl', operator => 'and' }}}
  • 157. Text-Query Family Query DSL: { text => { title => { query => 'great perl' , fuzziness => 0.5 }}} Search Builder: { title => { '=' => { query => 'great perl', fuzziness => 0.5 }}}
  • 158. Text-Query Family Query DSL: { text => { title => { query => 'great perl', type => 'phrase' }}} Search Builder: { title => { '==' => { query => 'great perl', }}}
  • 159. Text-Query Family Query DSL: { text => { title => { query => ' great perl ', type => 'phrase' }}} Search Builder: { title => { '==' => { query => ' great perl ', }}}
  • 160. Text-Query Family Query DSL: { text => { title => { query => ' perl is great ', type => 'phrase' }}} Search Builder: { title => { '==' => { query => ' perl is great ', }}}
  • 161. Text-Query Family Query DSL: { text => { title => { query => ' perl great ', type => 'phrase', slop => 3 }}} Search Builder: { title => { '==' => { query => ' perl great ', slop => 3 }}}
  • 162. Text-Query Family Query DSL: { text => { title => { query => ' perl is gr ', type => ' phrase_prefix ', }}} Search Builder: { title => { '^' => { query => ' perl is gr ', }}}
  • 163. Query string / Field Lucene Query Syntax aware “ perl is great”~5 AND author:clint* -deleted
  • 164. Query string / Field Syntax errors: AND perl is great ” author : clint* -
  • 165. Query string / Field Syntax errors: AND perl is great ” author : clint* - ElasticSearch::QueryParser
  • 166. Combining: Bool Query DSL: { bool => { must => [ { term => { foo => 1}}, ... ], must_not => [ { term => { bar => 1}}, ... ], should => [ { term => { X => 2}}, { term => { Y => 2}},... ], minimum_number_should_match => 1, }}
  • 167. Combining: Bool SearchBuilder: { foo => 1, bar => { '!=' => 1}, -or => [ X => 2, Y => 2], } { -bool => { must => { foo => 1 }, must_not => { bar => 1 }, should => [{ X => 2}, { Y => 2 }], minimum_number_should_match => 1, }}
  • 168. Combining: DisMax Query DSL: { dis_max => { queries => [ { term => { foo => 1}}, { term => { bar => 1}}, ] }} SearchBuilder: { -dis_max => [ { term => { foo => 1}}, { term => { bar => 1}}, ], }
  • 169. Bool: combines scores DisMax: uses highest score from all matching clauses
  • 172. Boosting: at index time { properties => { content => { type => “string” }, title => { type => “string” }, }
  • 173. Boosting: at index time { properties => { content => { type => “string” }, title => { type => “string”, boost => 2, }, }, }
  • 174. Boosting: at index time { properties => { content => { type => “string” }, title => { type => “string”, boost => 2, }, rank => { type => “integer” }, }, _boost => { name => 'rank', null_value => 1.0 }, }
  • 175. Boosting: at search time Query DSL: { bool => { should => [ { text => { content => 'perl' }}, { text => { title => 'perl' }}, ] }} SearchBuilder: { content => 'perl', title => 'perl' }
  • 176. Boosting: at search time Query DSL: { bool => { should => [ { text => { content => 'perl' }}, { text => { title => { query => 'perl', }}, ] }} SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl' }} }
  • 177. Boosting: at search time Query DSL: { bool => { should => [ { text => { content => 'perl' }}, { text => { title => { query => 'perl', boost => 2 }}, ] }} SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl', boost=> 2 }} }
  • 178. Boosting: custom_score Query DSL: { custom_score => { query => { text => { title => 'perl' }}, script => “_score * foo /doc['rank'].value”, }} SearchBuilder: { -custom_score => { query => { title => 'perl' }, script => “_score * foo /doc['rank'].value”, }}
  • 179. Query example SearchBuilder: { -or => [ title => { '=' => { query => 'custom score', boost => 2 }}, content => 'custom score', ], -filter => { repo => 'elasticsearch/elasticsearch', created_at => { '>=' => '2011-07-01', '<' => '2011-08-01'}, -or => [ creator_id => 123, assignee_id => 123, ], labels => ['bug','breaking'] } }
  • 180. Query example Query DSL: { query => { filtered => { query => { bool => { should => [ { text => { content => &quot;custom score&quot; } }, { text => { title => { boost => 2, query => &quot;custom score&quot; } } }, ], }, }, filter => { and => [ { or => [ { term => { creator_id => 123 } }, { term => { assignee_id => 123 } }, ]}, { terms => { labels => [&quot;bug&quot;, &quot;breaking&quot;] } }, { term => { repo => &quot;elasticsearch/elasticsearch&quot; } }, { numeric_range => { created_at => { gte => &quot;2011-07-01&quot;, lt => &quot;2011-08-01&quot; }}}, ]}, }}
  • 181.