Elasticsearch presentation
Indexing a restaurant directory
● Title
● Description
● Price
● Address
● Type
Creating an index without a mapping
PUT restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}
Indexing without a mapping
PUT restaurant/restaurant/1
{
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
The risk of indexing without a mapping
PUT restaurant/restaurant/2
{
"title": "Pizza de l'ormeau",
"description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"Pizza de l'ormeau\""
}
},
"status": 400
}
Inferred mapping
GET /restaurant/_mapping
{
"restaurant": {
"mappings": {
"restaurant": {
"properties": {
"adresse": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"title": {
"type": "long"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Creating a mapping
PUT :url/restaurant
{
"settings": {
"index": {"number_of_shards": 3, "number_of_replicas": 2}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"description": {"type": "text"},
"price": {"type": "integer"},
"adresse": {"type": "text"},
"type": { "type": "keyword"}
}
}
}
}
Indexing a few restaurants
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
Basic search
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatique"
}
}
}
{
"hits": {
"total": 1,
"max_score": 0.6395861,
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
Where our mapping falls short
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatiques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What is an analyzer?
● It transforms a string into tokens
○ E.g. "Le chat est rouge" -> ["le", "chat", "est", "rouge"]
● Tokens are used to build an inverted index
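As a rough illustration of the example above, here is a minimal Python sketch of what an analyzer does. The `analyze` function is hypothetical: it only lowercases the text and splits it on non-word characters, which approximates the default behaviour without reproducing Elasticsearch's actual implementation.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase the text, then split it into word tokens."""
    return re.findall(r"\w+", text.lower())

print(analyze("Le chat est rouge"))
```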
What is an inverted index?
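The idea can be sketched in a few lines of Python: an inverted index maps each token to the documents that contain it. This is a simplified illustration, assuming a toy `analyze` function (lowercase and split on non-word characters) and two descriptions borrowed from the restaurant examples. A real Lucene index also stores positions, frequencies and more.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text, then split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """Map each token to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

docs = {
    1: "Un restaurant gastronomique où tout plat coûte 42 euros",
    3: "Restaurant asiatique très copieux",
}
index = build_inverted_index(docs)
print(index["restaurant"])          # ids of documents containing "restaurant"
print(index.get("asiatiques", []))  # the plural form is a different token
```

A lookup is an exact match on the token, which is why searching for "asiatiques" finds nothing unless the analyzer reduces singular and plural to the same token.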
Explanation: the default analyzer
GET /_analyze
{
"analyzer": "standard",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [{
"token": "un",
"start_offset": 0, "end_offset": 2,
"type": "<ALPHANUM>", "position": 0
},{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiatique",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "très",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieux",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
Explanation: the "french" analyzer
GET /_analyze
{
"analyzer": "french",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [
{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiat",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "trè",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieu",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
Anatomy of an analyzer
Elasticsearch breaks analysis down into three steps:
● Character filtering (e.g. stripping HTML tags)
● Tokenization
● Token filtering:
○ Removing tokens (stop words such as "un", "le", "la")
○ Transforming tokens (lemmatization, stemming...)
○ Adding tokens (synonyms)
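The three steps can be sketched as a small Python pipeline. Everything here is an assumption made for illustration: the function names, the three-word stop list and the synonym table are hypothetical, and the real filters are configurable Lucene components.

```python
import re

STOPWORDS = {"un", "le", "la"}      # tiny illustrative stop list
SYNONYMS = {"menu": ["formule"]}    # hypothetical synonym table

def char_filter(text):
    """Step 1, character filtering: strip HTML tags."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Step 2, tokenization: split into word tokens."""
    return re.findall(r"\w+", text)

def token_filters(tokens):
    """Step 3, token filtering: transform, remove and add tokens."""
    out = []
    for token in tokens:
        token = token.lower()                 # transformation
        if token in STOPWORDS:                # removal
            continue
        out.append(token)
        out.extend(SYNONYMS.get(token, []))   # addition
    return out

def analyze(text):
    return token_filters(tokenize(char_filter(text)))

print(analyze("<p>Le menu du <b>restaurant</b></p>"))
```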
Breaking down the french analyzer
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "elision",
"articles_case": true,
"articles": [
"l", "m", "t", "qu", "n", "s", "j", "d", "c",
"jusqu", "quoiqu", "lorsqu", "puisqu"
]
}, {
"type": "stop", "stopwords": "_french_"
}, {
"type": "stemmer", "language": "french"
}
],
"text": "ce n'est qu'un restaurant asiatique très copieux"
}
"ce n'est qu'un restaurant asiatique très copieux"
-> standard tokenizer
["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"]
-> elision
["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"]
-> stopwords
["restaurant", "asiatique", "très", "copieux"]
-> french stemming
["restaurant", "asiat", "trè", "copieu"]
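The tokenizer, elision and stopword stages can be imitated with a short Python sketch. It makes several assumptions: whitespace splitting stands in for the standard tokenizer, the stop list contains only the three words removed in this example (the real `_french_` list is much longer), and stemming is left out because it needs a real stemmer.

```python
ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
            "jusqu", "quoiqu", "lorsqu", "puisqu"}
STOPWORDS = {"ce", "est", "un"}  # illustrative subset of the French stop list

def elision(token):
    """Drop a leading elided article such as n' or qu'."""
    head, apostrophe, tail = token.partition("'")
    if apostrophe and head.lower() in ARTICLES:
        return tail
    return token

def analyze(text):
    tokens = text.split()                         # stand-in for the standard tokenizer
    tokens = [elision(t) for t in tokens]         # elision filter
    return [t for t in tokens if t not in STOPWORDS]  # stop filter

print(analyze("ce n'est qu'un restaurant asiatique très copieux"))
```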
Specifying the analyzer in the mapping
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"type": { "type": "keyword"}
}
}
}
}
Typo-tolerant search
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
One solution: the ngram token filter
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "ngram",
"min_gram": 3,
"max_gram": 7
}
],
"text": "asiatuque"
}
[
"asi",
"asia",
"asiat",
"asiatu",
"asiatuq",
"sia",
"siat",
"siatu",
"siatuq",
"siatuqu",
"iat",
"iatu",
"iatuq",
"iatuqu",
"iatuque",
"atu",
"atuq",
"atuqu",
"atuque",
"tuq",
"tuqu",
"tuque",
"uqu",
"uque",
"que"
]
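To see why ngrams help with typos, the following Python sketch regenerates the list above and compares it with the ngrams of the correctly spelled word. The `ngrams` helper is hypothetical and only mirrors the `min_gram` / `max_gram` settings used in the request.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """All substrings of length min_gram..max_gram, ordered by start position."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams

typo = ngrams("asiatuque")
correct = ngrams("asiatique")
shared = sorted(set(typo) & set(correct))
print(len(typo), shared)
```

The misspelled and the correct word share several ngrams, so a query containing the typo can still match documents indexed with the correct spelling.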
Creating a custom analyzer that uses the ngram filter
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "ngram_analyzer"},
"description": {"type": "text", "analyzer": "ngram_analyzer"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "ngram_analyzer"},
"type": {"type": "keyword"}
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.60128295,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}, {
"_score": 0.46237043,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
Noise introduced by ngrams
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "gastronomique"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.6277555,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
Specifying several analyzers for one field
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {
"type": "text", "analyzer": "french",
"fields": {
"ngram": { "type": "text", "analyzer": "ngram_analyzer"}
}
},
"price": {"type": "integer"},
Using several fields in a search
GET /restaurant/restaurant/_search
{
"query": {
"multi_match": {
"query": "gastronomique",
"fields": [
"description^4",
"description.ngram"
]
}
}
}
{
"hits": {
"hits": [
{
"_score": 2.0649285,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},
{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
{
"_index": "restaurant",
To ignore or not to ignore stopwords, that is the question
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
Stopwords are not necessarily meaningless
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": 42,
"description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
Modifying the french analyzer to keep stopwords
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision",
"articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"}
},
"analyzer": {
"custom_french": {
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stemmer"
]
}
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
Searching with stopwords without hurting performance
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": {
"query": "restaurant pas cher",
"cutoff_frequency": 0.01
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{"term": {"description": "restaurant"}},
{"term": {"description": "cher"}}]
}
},
"should": [
{"match": {
"description": "pas"
}}
]
}
Customizing the scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"functions": [{
"filter": { "terms": { "type": ["asiatique", "italien"]}},
"weight": 2
}]
}
}
}
Customizing the scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"script_score": {
"script": {
"lang": "painless",
"inline": "_score * ( 1 + 10/doc['price'].value)"
}
}
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.53484553,
"_source": {
"title": "Pizza de l'ormeau",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
}, {
"_score": 0.26742277,
"_source": {
"title": 42,
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
}, {
"_score": 0.26742277,
"_source": {
"title": "Chez l'oncle chan",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
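The scores in this response are consistent with an integer division in the script: the restaurant priced at 10 gets exactly twice the score of the other two, which both keep the same base score. The Python sketch below reproduces that behaviour under the assumption that Painless, like Java, performs integer division when both operands are integers. The `script_score` function is a hypothetical stand-in, not Elasticsearch code.

```python
def script_score(score, price):
    """Mimic `_score * (1 + 10 / price)` with integer division (assumed semantics)."""
    return score * (1 + 10 // price)

base = 0.26742277  # base _score read from the response above
for title, price in [("Pizza de l'ormeau", 10), (42, 42), ("Chez l'oncle chan", 14)]:
    print(title, script_score(base, price))
```

If a smoother boost is wanted, writing `10.0` in the script would presumably force a floating-point division, so that a price of 14 and a price of 42 no longer receive the same multiplier.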
How to index multilingual documents
Three cases:
● Fields mixing several languages (e.g. {"message": "warning | attention | cuidado"})
○ Ngrams
○ Analyze the same field several times, with one analyzer per language
● One field per language:
○ Easy, since a different analyzer can be specified for each language
○ Take care not to end up with a sparse index
● One version of the document per language (preferred)
○ One index per language
○ Above all, do not use one type per language within the same index (it skews term statistics)
Handling synonyms
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision", "articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"},
"french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]}
},
"analyzer": {
"french_with_synonym": {
"tokenizer": "standard",
"filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"coord": {"type": "geo_point"},
Handling synonyms
GET /restaurant/restaurant/_search
{
"query": {
"match": {"description": "sous-marins"}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5577519,1.4625753"
}
}
]
}
}
Geolocated data
PUT /restaurant
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"
},
"price": {"type": "integer"},
"adresse": {"type": "text","analyzer": "french"},
"coord": {"type": "geo_point"},
"type": { "type": "keyword"}
}
}
}
}
Geolocated data
POST restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"}
{"index": {"_id": 4}}
{"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"}
{"index": {"_id": 5}}
{"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"}
{"index": {"_id": 6}}
{"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
Filtering and sorting on geolocated data
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"filter": [
{"term": {"type":"français"}},
{"geo_distance": {
"distance": "1km",
"coord": {"lat": 43.5739329, "lon": 1.4893669}
}}
]
}
},
"sort": [{
"geo_distance": {
"coord": {"lat": 43.5739329, "lon": 1.4893669},
"unit": "km"
}
}]
}
{
"hits": {
"hits": [
{
"_source": {
"title": "bistronomique",
"description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents",
"price": 17,
"adresse": "73 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.57417,1.4905748"
},
"sort": [0.10081529266640063]
},{
"_source": {
"title": "L'évidence",
"description": "restaurant copieux et pas cher, cependant c'est pas bon",
"price": 12,
"adresse": "38 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.5770109,1.4846573"
},
"sort": [0.510960087579506]
},{
"_source": {
"title": "Chez Ingalls",
"description": "Contemporain et rustique, ce restaurant avec cheminée sert
savoyardes et des grillades",
The bool query explained
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {"match": {"description": "sandwitch"}},
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}}
],
"must_not": [
{"match_phrase": {
"description": "pas bon"
}}
],
"filter": [
{"range": {"price": {
"lte": "20"
}}}
]
}
}
}
The bool query explained
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}},
{"match": {"description": "service rapide"}}
],
"minimum_should_match": 2
}
}
}
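The effect of requiring two matching `should` clauses can be sketched in plain Python. This is a deliberately simplified model: the `matches` helper is hypothetical, a clause counts as matched as soon as it shares one lowercase word with the description, and no stemming is applied. The second description is made up for the illustration.

```python
import re

def tokens(text):
    """Lowercase word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))

def matches(description, should_clauses, minimum_should_match):
    """True when at least `minimum_should_match` clauses share a term with the text."""
    doc = tokens(description)
    hits = sum(1 for clause in should_clauses if tokens(clause) & doc)
    return hits >= minimum_should_match

clauses = ["bon", "excellent", "service rapide"]
# Only the "service rapide" clause matches: one hit is not enough.
print(matches("service très rapide, rapport qualité/prix médiocre", clauses, 2))
# "bon" and "service rapide" both match: the document is kept.
print(matches("un restaurant bon avec un service rapide", clauses, 2))
```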
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "-\"pas bon\" +(pizzi~2 | sandwitch)"
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must_not": {
"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"type": "phrase",
"query": "pas bon"
}
},
"should": [
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"fuzziness": 2,
"max_expansions": 50,
"query": "pizzi"
}
},
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "sandwitch"
}
Aliases: giving yourself room to maneuver
PUT /restaurant_v1/
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"lat": {"type": "double"},
"lon": {"type": "double"}
}
}
}
}
POST /_aliases
{
"actions": [
{"add": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v1", "alias": "restaurant_write"}}
]
}
Aliases, pipelines and reindexing
PUT /restaurant_v2
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"position": {"type": "geo_point"}
}
}
}
}
PUT /_ingest/pipeline/fixing_position
{
"description": "move lat lon into position parameter",
"processors": [
{"rename": {"field": "lat", "target_field": "position.lat"}},
{"rename": {"field": "lon", "target_field": "position.lon"}}
]
}
POST /_aliases
{
"actions": [
{"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_write"}}
]
}
POST /_reindex
{
"source": {"index": "restaurant_v1"},
"dest": {"index": "restaurant_v2", "pipeline": "fixing_position"}
}
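What the `fixing_position` pipeline does to each document during the reindex can be pictured with a small Python sketch. The function below is a hypothetical stand-in for the two `rename` processors, assuming that a dotted target such as `position.lat` ends up as a nested `position` object.

```python
def fixing_position(doc):
    """Mimic the two rename processors: move lat/lon under `position`."""
    doc = dict(doc)  # work on a copy, leave the source document untouched
    doc["position"] = {"lat": doc.pop("lat"), "lon": doc.pop("lon")}
    return doc

old_doc = {"title": "Pizza de l'ormeau", "lat": 43.579225, "lon": 1.4835248}
print(fixing_position(old_doc))
```

The resulting shape, an object with `lat` and `lon` keys, is one of the formats accepted by a `geo_point` field, which is what the `restaurant_v2` mapping declares for `position`.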
Analyzing fire brigade intervention data from 2005 to 2014
PUT /pompier
{
"mappings": {
"interventions": {
"properties": {
"date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
"type_incident": { "type": "keyword" },
"description_groupe": { "type": "keyword" },
"caserne": { "type": "integer"},
"ville": { "type": "keyword"},
"arrondissement": { "type": "keyword"},
"division": {"type": "integer"},
"position": {"type": "geo_point"},
"nombre_unites": {"type": "integer"}
}
}
}
}
Listing the different incident types
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"type_incident": {
"terms": {"field": "type_incident", "size": 100}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{"key": "Premier répondant", "doc_count": 437891},
{"key": "Appel de Cie de détection", "doc_count": 76157},
{"key": "Alarme privé ou locale", "doc_count": 60879},
{"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734},
{"key": "10-22 sans feu", "doc_count": 29283},
{"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663},
{"key": "Inondation", "doc_count": 26801},
{"key": "Problèmes électriques", "doc_count": 23495},
{"key": "Aliments surchauffés", "doc_count": 23428},
{"key": "Odeur suspecte - gaz", "doc_count": 21158},
{"key": "Déchets en feu", "doc_count": 18007},
{"key": "Ascenseur", "doc_count": 12703},
{"key": "Feu de champ *", "doc_count": 11518},
{"key": "Structure dangereuse", "doc_count": 9958},
{"key": "10-22 avec feu", "doc_count": 9876},
{"key": "Alarme vérification", "doc_count": 8328},
{"key": "Aide à un citoyen", "doc_count": 7722},
{"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351},
{"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232},
{"key": "Feu de véhicule extérieur", "doc_count": 5943},
{"key": "Fausse alerte 10-19", "doc_count": 4680},
{"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494},
{"key": "Assistance serv. muni.", "doc_count": 3431},
{"key": "Avertisseur de CO", "doc_count": 2542},
{"key": "Fuite gaz naturel 10-22", "doc_count": 1928},
{"key": "Matières dangereuses / 10-22", "doc_count": 1905},
{"key": "Feu de bâtiment", "doc_count": 1880},
{"key": "Senteur de feu à l'extérieur", "doc_count": 1566},
{"key": "Surchauffe - véhicule", "doc_count": 1499},
{"key": "Feu / Agravation possible", "doc_count": 1281},
{"key": "Fuite gaz naturel 10-09", "doc_count": 1257},
{"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015},
{"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
Nested aggregations
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"ville": {
"terms": {"field": "ville"},
"aggs": {
"arrondissement": {
"terms": {"field": "arrondissement"}
}
}
}
}
}
{
"aggregations": {"ville": {"buckets": [
{
"key": "Montréal", "doc_count": 768955,
"arrondissement": {"buckets": [
{"key": "Ville-Marie", "doc_count": 83010},
{"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272},
{"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933},
{"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951},
{"key": "Rosemont / Petite-Patrie", "doc_count": 59213},
{"key": "Ahuntsic / Cartierville", "doc_count": 57721},
{"key": "Plateau Mont-Royal", "doc_count": 53344},
{"key": "Montréal-Nord", "doc_count": 40757},
{"key": "Sud-Ouest", "doc_count": 39936},
{"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139}
]}
}, {
"key": "Dollard-des-Ormeaux", "doc_count": 17961,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13452},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477},
{"key": "Pierrefonds / Senneville", "doc_count": 10},
{"key": "Dorval / Ile Dorval", "doc_count": 8},
{"key": "Pointe-Claire", "doc_count": 8},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6}
]}
}, {
"key": "Pointe-Claire", "doc_count": 17925,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13126},
{"key": "Pointe-Claire", "doc_count": 4766},
{"key": "Dorval / Ile Dorval", "doc_count": 12},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7},
{"key": "Kirkland", "doc_count": 7},
{"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1},
{"key": "St-Laurent", "doc_count": 1}
Computing averages and sorting aggregations
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"avg_nombre_unites_general": {
"avg": {"field": "nombre_unites"}
},
"type_incident": {
"terms": {
"field": "type_incident",
"size": 5,
"order" : {"avg_nombre_unites": "desc"}
},
"aggs": {
"avg_nombre_unites": {
"avg": {"field": "nombre_unites"}
}
}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{
"key": "Feu / 5e Alerte", "doc_count": 162,
"avg_nombre_unites": {"value": 70.9074074074074}
}, {
"key": "Feu / 4e Alerte", "doc_count": 100,
"avg_nombre_unites": {"value": 49.36}
}, {
"key": "Troisième alerte/autre que BAT", "doc_count": 1,
"avg_nombre_unites": {"value": 43.0}
}, {
"key": "Feu / 3e Alerte", "doc_count": 173,
"avg_nombre_unites": {"value": 41.445086705202314}
}, {
"key": "Deuxième alerte/autre que BAT", "doc_count": 8,
"avg_nombre_unites": {"value": 37.5}
}
]
},
"avg_nombre_unites_general": {"value": 2.1374461758713728}
}
}
Percentiles
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"unites_percentile": {
"percentiles": {
"field": "nombre_unites",
"percents": [25, 50, 75, 100]
}
}
}
}
{
"aggregations": {
"unites_percentile": {
"values": {
"25.0": 1.0,
"50.0": 1.0,
"75.0": 3.0,
"100.0": 275.0
}
}
}
}
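To make the reading of this response concrete, here is a nearest-rank percentile computed in Python on a made-up sample. Both the `percentile` helper and the sample are assumptions for illustration. Elasticsearch computes percentiles with an approximate algorithm, so this sketch shows the concept rather than its implementation.

```python
import math

def percentile(values, percent):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]

units = [1, 1, 1, 1, 1, 2, 3, 3, 4, 275]  # hypothetical number of units per intervention
print({p: percentile(units, p) for p in (25, 50, 75, 100)})
```

As in the real data, the median stays at 1 while the 100th percentile is simply the maximum, so a single very large intervention does not move the lower percentiles.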
Histogram
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"unites_histogram": {
"histogram": {
"field": "nombre_unites",
"order": {"_key": "asc"},
"interval": 1
},
"aggs": {
"ville": {
"terms": {"field": "ville", "size": 1}
}
}
}
}
}
{
"aggregations": {
"unites_histogram": {
"buckets": [
{
"key": 1.0, "doc_count": 23507,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]}
},{
"key": 2.0, "doc_count": 1550,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]}
},{
"key": 3.0, "doc_count": 563,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]}
},{
"key": 4.0, "doc_count": 449,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]}
},{
"key": 5.0, "doc_count": 310,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]}
},{
"key": 6.0, "doc_count": 215,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]}
},{
"key": 7.0, "doc_count": 136,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]}
},{
"key": 8.0, "doc_count": 35,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]}
},{
"key": 9.0, "doc_count": 10,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 10.0, "doc_count": 11,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 11.0, "doc_count": 2,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
"Significant terms"
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"ville": {
"significant_terms": {"field": "ville", "size": 5, "percentage": {}}
}
}
}
{
"aggregations": {
"ville": {
"doc_count": 26801,
"buckets": [
{
"key": "Ile-Bizard",
"score": 0.10029498525073746,
"doc_count": 68, "bg_count": 678
},
{
"key": "Montréal-Nord",
"score": 0.0826544804291675,
"doc_count": 416, "bg_count": 5033
},
{
"key": "Roxboro",
"score": 0.08181818181818182,
"doc_count": 27, "bg_count": 330
},
{
"key": "Côte St-Luc",
"score": 0.07654825526563974,
"doc_count": 487, "bg_count": 6362
},
{
"key": "Saint-Laurent",
"score": 0.07317073170731707,
"doc_count": 465, "bg_count": 6355
Aggregations and geolocated data
GET :url/pompier/interventions/_search
{
"size": 0,
"query": {
"regexp": {"type_incident": "Feu.*"}
},
"aggs": {
"distance_from_here": {
"geo_distance": {
"field": "position",
"unit": "km",
"origin": {
"lat": 45.495902,
"lon": -73.554263
},
"ranges": [
{ "to": 2},
{"from":2, "to": 4},
{"from":4, "to": 6},
{"from": 6, "to": 8},
{"from": 8}]
}
}
}
}
{
"aggregations": {
"distance_from_here": {
"buckets": [
{
"key": "*-2.0",
"from": 0.0,
"to": 2.0,
"doc_count": 80
},
{
"key": "2.0-4.0",
"from": 2.0,
"to": 4.0,
"doc_count": 266
},
{
"key": "4.0-6.0",
"from": 4.0,
"to": 6.0,
"doc_count": 320
},
{
"key": "6.0-8.0",
"from": 6.0,
"to": 8.0,
"doc_count": 326
},
{
"key": "8.0-*",
"from": 8.0,
"doc_count": 1720
}
]
}
}
}
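The bucketing performed by this aggregation can be sketched in Python: compute the distance from the origin to each point, then find the range it falls in. The haversine formula and the Earth radius of 6371 km are assumptions made for this sketch, and the distance computed by Elasticsearch may differ slightly. Ranges are treated as including `from` and excluding `to`.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

RANGES = [(None, 2), (2, 4), (4, 6), (6, 8), (8, None)]  # same ranges as the request

def bucket_key(distance_km):
    """Return the key of the range containing the distance (from inclusive, to exclusive)."""
    for low, high in RANGES:
        if (low is None or distance_km >= low) and (high is None or distance_km < high):
            left = "*" if low is None else str(float(low))
            right = "*" if high is None else str(float(high))
            return f"{left}-{right}"

origin = (45.495902, -73.554263)
point = (45.52, -73.60)  # hypothetical intervention position
print(bucket_key(haversine_km(*origin, *point)))
```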
Any questions?
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "\"service rapide\"~2"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Un fastfood très connu",
"description": "service très rapide, rapport qualité/prix médiocre",
"price": 8,
"adresse": "210 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5536343,1.476165"
}
},{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": {
"slop": 2,
"query": "service rapide"
}
}
}
}

Contenu connexe

Tendances

Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesLorenzo Alberton
 
NoSQL-MongoDB介紹
NoSQL-MongoDB介紹NoSQL-MongoDB介紹
NoSQL-MongoDB介紹國昭 張
 
Indexing and Performance Tuning
Indexing and Performance TuningIndexing and Performance Tuning
Indexing and Performance TuningMongoDB
 
Elastic Search
Elastic SearchElastic Search
Elastic SearchNavule Rao
 
Real time entity resolution with elasticsearch - haystack 2018
Real time entity resolution with elasticsearch - haystack 2018Real time entity resolution with elasticsearch - haystack 2018
Real time entity resolution with elasticsearch - haystack 2018OpenSource Connections
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
Purely Functional Data Structures in Scala
Purely Functional Data Structures in ScalaPurely Functional Data Structures in Scala
Purely Functional Data Structures in ScalaVladimir Kostyukov
 
DATA STRUCTURE CLASS 12 .pptx
DATA STRUCTURE CLASS 12 .pptxDATA STRUCTURE CLASS 12 .pptx
DATA STRUCTURE CLASS 12 .pptxPritishMitra3
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 

How to craft your Elasticsearch mappings with care - LINAGORA

  • 2. Indexing a restaurant directory ● Title ● Description ● Price ● Address ● Type
  • 3. Creating an index without a mapping PUT restaurant { "settings": { "index": { "number_of_shards": 3, "number_of_replicas": 2 } } }
  • 4. Indexing without a mapping PUT restaurant/restaurant/1 { "title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" }
  • 5. The risk of indexing without a mapping PUT restaurant/restaurant/2 { "title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien" } { "error": { "root_cause": [ { "type": "mapper_parsing_exception", "reason": "failed to parse [title]" } ], "type": "mapper_parsing_exception", "reason": "failed to parse [title]", "caused_by": { "type": "number_format_exception", "reason": "For input string: \"Pizza de l'ormeau\"" } }, "status": 400 }
  • 6. Inferred mapping GET /restaurant/_mapping { "restaurant": { "mappings": { "restaurant": { "properties": { "adresse": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "description": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "price": { "type": "long" }, "title": { "type": "long" }, "type": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } } }
  • 7. Creating a mapping PUT :url/restaurant { "settings": { "index": {"number_of_shards": 3, "number_of_replicas": 2} }, "mappings": { "restaurant": { "properties": { "title": {"type": "text"}, "description": {"type": "text"}, "price": {"type": "integer"}, "adresse": {"type": "text"}, "type": { "type": "keyword"} } } } }
  • 8. Indexing a few restaurants POST :url/restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 9. Basic search GET :url/restaurant/_search { "query": { "match": { "description": "asiatique" } } } { "hits": { "total": 1, "max_score": 0.6395861, "hits": [ { "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
  • 10. Where our mapping falls short GET :url/restaurant/_search { "query": { "match": { "description": "asiatiques" } } } { "hits": { "total": 0, "max_score": null, "hits": [] } }
  • 11. What is an analyzer? ● It transforms a character string into tokens ○ E.g. "Le chat est rouge" -> ["le", "chat", "est", "rouge"] ● The tokens are used to build an inverted index
  • 12. What is an inverted index?
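The slide on the inverted index has no text of its own, so here is a minimal sketch of the idea in Python. The `analyze` function is a toy stand-in for a real analyzer (it only lowercases and splits on whitespace), and the second document is a hypothetical example added for illustration.

```python
from collections import defaultdict


def analyze(text):
    # Toy analyzer (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


docs = {
    1: "Le chat est rouge",   # example sentence from the slide
    2: "Le chien est noir",   # hypothetical second document
}
index = build_inverted_index(docs)

print(sorted(index["chat"]))  # documents containing "chat"
print(sorted(index["est"]))   # documents containing "est"
```

Looking up a term is then a dictionary access rather than a scan of every document, which is why the tokens produced at index time must match those produced at search time.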
  • 13. Explanation: the default analyzer GET /_analyze { "analyzer": "standard", "text": "Un restaurant asiatique très copieux" } { "tokens": [{ "token": "un", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0 },{ "token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1 },{ "token": "asiatique", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2 },{ "token": "très", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3 },{ "token": "copieux", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4 } ] }
  • 14. Explanation: the "french" analyzer GET /_analyze { "analyzer": "french", "text": "Un restaurant asiatique très copieux" } { "tokens": [ { "token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1 },{ "token": "asiat", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2 },{ "token": "trè", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3 },{ "token": "copieu", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4 } ] }
  • 15. Anatomy of an analyzer Elasticsearch breaks analysis down into three steps: ● Character filtering (e.g. stripping HTML tags) ● Splitting into tokens ● Token filtering: ○ Removing tokens (stop words such as "un", "le", "la") ○ Transforming tokens (lemmatization...) ○ Adding tokens (synonyms)
  • 16. Breaking down the french analyzer GET /_analyze { "tokenizer": "standard", "filter": [ { "type": "elision", "articles_case": true, "articles": [ "l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu" ] }, { "type": "stop", "stopwords": "_french_" }, { "type": "stemmer", "language": "french" } ], "text": "ce n'est qu'un restaurant asiatique très copieux" } Input: "ce n'est qu'un restaurant asiatique très copieux" -> standard tokenizer: ["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"] -> elision: ["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"] -> stopwords: ["restaurant", "asiatique", "très", "copieux"] -> french stemming: ["restaurant", "asiat", "trè", "copieu"]
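The first three stages of that chain can be imitated in a few lines of Python to make the order of operations concrete. This is only a sketch: the regular expression is a rough stand-in for the standard tokenizer, `STOPWORDS` is a tiny hypothetical subset of the `_french_` list, and stemming is left out because it cannot be reproduced faithfully in a short example.

```python
import re

# Elision articles copied from the request above.
ELISION_ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
                    "jusqu", "quoiqu", "lorsqu", "puisqu"}
# Hypothetical subset of French stop words, just enough for this sentence.
STOPWORDS = {"ce", "est", "un"}


def tokenize(text):
    # Rough approximation of a word tokenizer: keep letters and apostrophes.
    return re.findall(r"[\w']+", text.lower())


def elide(tokens):
    # Drop a leading elided article: "n'est" -> "est", "qu'un" -> "un".
    result = []
    for token in tokens:
        prefix, sep, rest = token.partition("'")
        result.append(rest if sep and prefix in ELISION_ARTICLES else token)
    return result


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


text = "ce n'est qu'un restaurant asiatique très copieux"
tokens = tokenize(text)
elided = elide(tokens)
kept = remove_stopwords(elided)
print(tokens)
print(elided)
print(kept)
```

The order matters: elision has to run before the stop filter, otherwise "n'est" and "qu'un" would never match the stop words "est" and "un".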
  • 17. Specifying the analyzer in the mapping { "settings": { "index": { "number_of_shards": 3, "number_of_replicas": 2 } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "french"}, "type": { "type": "keyword"} } } } }
  • 18. Making search resilient to typos GET /restaurant/restaurant/_search { "query": { "match": { "description": "asiatuques" } } } { "hits": { "total": 0, "max_score": null, "hits": [] } }
  • 19. One solution: the ngram token filter GET /_analyze { "tokenizer": "standard", "filter": [ { "type": "ngram", "min_gram": 3, "max_gram": 7 } ], "text": "asiatuque" } [ "asi", "asia", "asiat", "asiatu", "asiatuq", "sia", "siat", "siatu", "siatuq", "siatuqu", "iat", "iatu", "iatuq", "iatuqu", "iatuque", "atu", "atuq", "atuqu", "atuque", "tuq", "tuqu", "tuque", "uqu", "uque", "que" ]
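The token list above can be reproduced with a short Python function, which may help in understanding what `min_gram` and `max_gram` control. This is an illustrative re-implementation, not Elasticsearch's own code; the function name `ngrams` is chosen here for the example.

```python
def ngrams(token, min_gram=3, max_gram=7):
    # For every start position, emit substrings of length min_gram..max_gram
    # that still fit inside the token.
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


grams = ngrams("asiatuque")
print(len(grams))
print(grams)
```

A nine-letter word already yields 25 tokens, so ngram fields can make the index noticeably larger than a plain text field.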
  • 20. Creating a custom analyzer that uses the ngram filter PUT /restaurant { "settings": { "analysis": { "filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}}, "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}} } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "ngram_analyzer"}, "description": {"type": "text", "analyzer": "ngram_analyzer"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "ngram_analyzer"}, "type": {"type": "keyword"} } } } }
  • 21. GET /restaurant/restaurant/_search { "query": { "match": { "description": "asiatuques" } } } { "hits": { "hits": [ { "_score": 0.60128295, "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } }, { "_score": 0.46237043, "_source": { "title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"
  • 22. Noise introduced by the ngram GET /restaurant/restaurant/_search { "query": { "match": { "description": "gastronomique" } } } { "hits": { "hits": [ { "_score": 0.6277555, "_source": { "title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } },{ "_score": 0.56373334, "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } },
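One plausible source of this noise is that unrelated words share short ngrams. The sketch below compares ngram sets in Python, using the same 3 to 7 range as the filter; it ignores scoring and the other words of each description, so it only illustrates the overlap, not the exact scores above.

```python
def ngram_set(token, min_gram=3, max_gram=7):
    # Set of all substrings of length min_gram..max_gram.
    return {
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    }


typo = ngram_set("asiatuques")
target = ngram_set("asiatique")
unrelated = ngram_set("gastronomique")

# The misspelled query still shares several ngrams with the intended word...
print(sorted(typo & target))
# ...but two unrelated words also overlap on their common suffix.
print(sorted(target & unrelated))
```

The shared suffix ngrams are enough for a document about an "asiatique" restaurant to match a query for "gastronomique", which is the noise shown on this slide.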
  • 23. Specifying several analyzers for a single field PUT /restaurant { "settings": { "analysis": { "filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}}, "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]} } } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": { "type": "text", "analyzer": "french", "fields": { "ngram": { "type": "text", "analyzer": "ngram_analyzer"} } }, "price": {"type": "integer"},
  • 24. Using several fields in a search GET /restaurant/restaurant/_search { "query": { "multi_match": { "query": "gastronomique", "fields": [ "description^4", "description.ngram" ] } } } { "hits": { "hits": [ { "_score": 2.0649285, "_source": { "title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } }, { "_score": 0.56373334, "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } }, { "_index": "restaurant",
  • 25. To ignore or not to ignore stopwords, that is the question POST :url/restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 26. Stopwords are not necessarily meaningless GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": "pas cher" } } } { "hits": { "hits": [ { "_source": { "title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } },{ "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } }
  • 27. Modifying the french analyzer to keep stopwords PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s","j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"} }, "analyzer": { "custom_french": { "tokenizer": "standard", "filter": [ "french_elision", "lowercase", "french_stemmer" ] }
  • 28. GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": "pas cher" } } } { "hits": { "hits": [ { "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
  • 29. Searching with stopwords without hurting performance GET /restaurant/restaurant/_search { "query": { "match": { "description": { "query": "restaurant pas cher", "cutoff_frequency": 0.01 } } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must": { "bool": { "should": [ {"term": {"description": "restaurant"}}, {"term": {"description": "cher"}}] } }, "should": [ {"match": { "description": "pas" }} ] } } }
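The relationship between the two requests on this slide can be sketched in Python: terms are split by how often they occur, rare terms become required clauses and frequent terms only contribute to the score. The document frequencies, the total document count and the `>` comparison are all assumptions made for illustration; Elasticsearch derives the real figures from index statistics.

```python
def split_by_cutoff(terms, doc_freq, total_docs, cutoff_frequency):
    # Terms whose relative document frequency exceeds the cutoff are
    # treated as high-frequency (stopword-like); the rest are low-frequency.
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        (high if ratio > cutoff_frequency else low).append(term)
    return low, high


def build_bool_query(field, low, high):
    # Same shape as the second request on the slide.
    return {
        "bool": {
            "must": {"bool": {"should": [{"term": {field: t}} for t in low]}},
            "should": [{"match": {field: t}} for t in high],
        }
    }


# Hypothetical statistics for an index of 10,000 documents.
doc_freq = {"restaurant": 80, "cher": 50, "pas": 4000}
low, high = split_by_cutoff(["restaurant", "pas", "cher"], doc_freq, 10_000, 0.01)
query = build_bool_query("description", low, high)
print(low, high)
print(query)
```

With these made-up numbers, "pas" is the only term above the 1% cutoff, so it ends up in the optional `should` clause while "restaurant" and "cher" drive the matching.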
  • 30. Customizing the scoring GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "functions": [{ "filter": { "terms": { "type": ["asiatique", "italien"]}}, "weight": 2 }] } } }
  • 31. Customizing the scoring GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "script_score": { "script": { "lang": "painless", "inline": "_score * ( 1 + 10/doc['price'].value)" } } } } } { "hits": { "hits": [ { "_score": 0.53484553, "_source": { "title": "Pizza de l'ormeau", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien" } }, { "_score": 0.26742277, "_source": { "title": 42, "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } }, { "_score": 0.26742277, "_source": { "title": "Chez l'oncle chan", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
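The scores in this response are worth a second look: the restaurant priced at 14 is not boosted at all, while the one priced at 10 gets exactly double the base score. That is consistent with the script dividing two integers, so that `10 / price` is truncated. The Python sketch below reproduces the arithmetic under that assumption, using `//` for integer division and the base score 0.26742277 read from the response.

```python
def boosted_score(score, price):
    # Assumption: the division in the script is an integer division,
    # so 10 // 14 == 0 and 10 // 42 == 0, while 10 // 10 == 1.
    return score * (1 + 10 // price)


base_score = 0.26742277  # score of the unboosted hits in the response

for price in (10, 42, 14):
    print(price, boosted_score(base_score, price))
```

If a smooth decay with price is wanted instead, writing the constant as a floating-point literal (for example `10.0`) in the script would be one way to avoid the truncation.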
  • 32. How to index multilingual documents. Three cases: ● A single field mixing several languages (e.g. {"message": "warning | attention | cuidado"}) ○ N-grams ○ Analyze the same field several times, with one analyzer per language ● One field per language: ○ Easy, because a different analyzer can be specified for each language ○ Take care not to end up with a sparse index ● One version of the document per language (preferred) ○ One index per language ○ Above all, do not use one type per language within the same index (it skews term statistics)
  • 33. Handling synonyms: PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"}, "french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]} }, "analyzer": { "french_with_synonym": { "tokenizer": "standard", "filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"] } } } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "french"}, "coord": {"type": "geo_point"},
  • 34. Handling synonyms: GET /restaurant/restaurant/_search { "query": { "match": {"description": "sous-marins"} } } Response: { "hits": { "hits": [ { "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753" } } ] } }
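Why does a search for "sous-marins" return a description containing "sandwitch"? The synonym filter runs after the stemmer, so its rules are written against stemmed tokens. The sketch below applies an explicit multi-token rule to a token list. It is a toy model: `apply_synonyms` is a hypothetical helper, only explicit `a b => c` rules are handled, and the stemmed form `["sou", "marin"]` is an assumption inferred from how the rule itself is written.

```python
def apply_synonyms(tokens, rules):
    """Replace token sequences according to explicit (source, target) rules."""
    out, i = [], 0
    while i < len(tokens):
        for source, target in rules:
            if tokens[i:i + len(source)] == source:
                out.extend(target)
                i += len(source)
                break
        else:
            # No rule starts at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


# Rule taken from the "french_synonym" filter: "sou marin => sandwitch".
RULES = [(["sou", "marin"], ["sandwitch"])]

# Assumed output of tokenizer + lowercase + stemmer for the query "sous-marins".
stemmed_query = ["sou", "marin"]
print(apply_synonyms(stemmed_query, RULES))
```

Since the rewrite happens only in the `search_analyzer`, the indexed documents do not need to be reindexed when the synonym list changes.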
  • 35. Geolocated data: PUT /restaurant { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french" }, "price": {"type": "integer"}, "adresse": {"type": "text","analyzer": "french"}, "coord": {"type": "geo_point"}, "type": { "type": "keyword"} } } } }
  • 36. Geolocated data: POST restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"} {"index": {"_id": 4}} {"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"} {"index": {"_id": 5}} {"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"} {"index": {"_id": 6}} {"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
  • 37. Filtering and sorting on geolocated data: GET /restaurant/restaurant/_search { "query": { "bool": { "filter": [ {"term": {"type":"français"}}, {"geo_distance": { "distance": "1km", "coord": {"lat": 43.5739329, "lon": 1.4893669} }} ] } }, "sort": [{ "_geo_distance": { "coord": {"lat": 43.5739329, "lon": 1.4893669}, "unit": "km" } }] } Response: { "hits": { "hits": [ { "_source": { "title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748" }, "sort": [0.10081529266640063] },{ "_source": { "title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573" }, "sort": [0.510960087579506] },{ "_source": { "title": "Chez Ingalls", "description": "Contemporain et rustique, ce restaurant avec cheminée sert savoyardes et des grillades",
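The `sort` values in this response are distances in kilometres from the query point. They can be checked by hand with the haversine formula, as in the Python sketch below. The Earth radius constant is an assumption (a common mean value), and Elasticsearch's own distance computation may differ in the last digits.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ORIGIN = (43.5739329, 1.4893669)  # query point used in the search request

for title, lat, lon in [("bistronomique", 43.57417, 1.4905748),
                        ("L'évidence", 43.5770109, 1.4846573)]:
    print(title, round(haversine_km(*ORIGIN, lat, lon), 3), "km")
```

Both restaurants fall well inside the 1 km filter, at roughly 0.1 km and 0.5 km, which agrees with the sort values returned.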
  • 38. The bool query explained: GET /restaurant/restaurant/_search { "query": { "bool": { "must": {"match": {"description": "sandwitch"}}, "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}} ], "must_not": [ {"match_phrase": { "description": "pas bon" }} ], "filter": [ {"range": {"price": { "lte": 20 }}} ] } } }
  • 39. The bool query explained: GET /restaurant/restaurant/_search { "query": { "bool": { "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}}, {"match": {"description": "service rapide"}} ], "minimum_should_match": 2 } } }
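The rule expressed by this query is "at least two of the three optional clauses must match". A minimal Python sketch of that counting logic follows. It is a simplification: text is split on whitespace with no stemming, each clause matches if any of its words is present (mirroring the default OR operator of `match`), and `should_clauses_satisfied` is a hypothetical name.

```python
def should_clauses_satisfied(description, clauses, minimum_should_match):
    """Count matching should-clauses and compare against the required minimum."""
    words = set(description.lower().replace(",", " ").split())
    matched = sum(1 for clause in clauses
                  if any(word in words for word in clause.split()))
    return matched >= minimum_should_match


CLAUSES = ["bon", "excellent", "service rapide"]

# Hypothetical descriptions, used only to exercise the rule.
print(should_clauses_satisfied("bon restaurant, service rapide", CLAUSES, 2))
print(should_clauses_satisfied("service très rapide", CLAUSES, 2))
```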
  • 40. Offering advanced search to your users: GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "-\"pas bon\" +(pizzi~2 | sandwitch)" } } } This is roughly equivalent to: GET /restaurant/restaurant/_search { "query": { "bool": { "must_not": { "multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "type": "phrase", "query": "pas bon" } }, "should": [ {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "fuzziness": 2, "max_expansions": 50, "query": "pizzi" } }, {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "query": "sandwitch" } } ] } } }
  • 41. Aliases: how to give yourself room to manoeuvre: PUT /restaurant_v1/ { "mappings": { "restaurant": { "properties": { "title": {"type": "text"}, "lat": {"type": "double"}, "lon": {"type": "double"} } } } } POST /_aliases { "actions": [ {"add": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v1", "alias": "restaurant_write"}} ] }
  • 42. Aliases, pipelines and reindexing: PUT /restaurant_v2 { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "position": {"type": "geo_point"} } } } } PUT /_ingest/pipeline/fixing_position { "description": "move lat lon into position parameter", "processors": [ {"rename": {"field": "lat", "target_field": "position.lat"}}, {"rename": {"field": "lon", "target_field": "position.lon"}} ] } POST /_aliases { "actions": [ {"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_write"}} ] } POST /_reindex { "source": {"index": "restaurant_v1"}, "dest": {"index": "restaurant_v2", "pipeline": "fixing_position"} }
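The `fixing_position` pipeline reshapes each document while it is copied from `restaurant_v1` to `restaurant_v2`. The Python sketch below imitates what the two `rename` processors do to one document. It is an illustration only: `rename` here is a hypothetical stand-in for the ingest processor, and the sample document is invented.

```python
def rename(doc, field, target_field):
    """Move doc[field] to a (possibly dotted) target path, creating parent objects as needed."""
    value = doc.pop(field)  # raises KeyError if the source field is missing
    node = doc
    *parents, leaf = target_field.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return doc


# Hypothetical document as stored in restaurant_v1.
doc = {"title": "bistronomique", "lat": 43.57417, "lon": 1.4905748}
rename(doc, "lat", "position.lat")
rename(doc, "lon", "position.lon")
print(doc)
```

The resulting `position` object with `lat` and `lon` keys is one of the forms accepted by a `geo_point` field.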
  • 43. Analysing fire brigade intervention data from 2005 to 2014: PUT /pompier { "mappings": { "interventions": { "properties": { "date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"}, "type_incident": { "type": "keyword" }, "description_groupe": { "type": "keyword" }, "caserne": { "type": "integer"}, "ville": { "type": "keyword"}, "arrondissement": { "type": "keyword"}, "division": {"type": "integer"}, "position": {"type": "geo_point"}, "nombre_unites": {"type": "integer"} } } } }
  • 44. Listing the different incident types: GET /pompier/interventions/_search { "size": 0, "aggs": { "type_incident": { "terms": {"field": "type_incident", "size": 100} } } } Response: { "aggregations": { "type_incident": { "buckets": [ {"key": "Premier répondant", "doc_count": 437891}, {"key": "Appel de Cie de détection", "doc_count": 76157}, {"key": "Alarme privé ou locale", "doc_count": 60879}, {"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734}, {"key": "10-22 sans feu", "doc_count": 29283}, {"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663}, {"key": "Inondation", "doc_count": 26801}, {"key": "Problèmes électriques", "doc_count": 23495}, {"key": "Aliments surchauffés", "doc_count": 23428}, {"key": "Odeur suspecte - gaz", "doc_count": 21158}, {"key": "Déchets en feu", "doc_count": 18007}, {"key": "Ascenseur", "doc_count": 12703}, {"key": "Feu de champ *", "doc_count": 11518}, {"key": "Structure dangereuse", "doc_count": 9958}, {"key": "10-22 avec feu", "doc_count": 9876}, {"key": "Alarme vérification", "doc_count": 8328}, {"key": "Aide à un citoyen", "doc_count": 7722}, {"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351}, {"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232}, {"key": "Feu de véhicule extérieur", "doc_count": 5943}, {"key": "Fausse alerte 10-19", "doc_count": 4680}, {"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494}, {"key": "Assistance serv. muni.", "doc_count": 3431}, {"key": "Avertisseur de CO", "doc_count": 2542}, {"key": "Fuite gaz naturel 10-22", "doc_count": 1928}, {"key": "Matières dangereuses / 10-22", "doc_count": 1905}, {"key": "Feu de bâtiment", "doc_count": 1880}, {"key": "Senteur de feu à l'extérieur", "doc_count": 1566}, {"key": "Surchauffe - véhicule", "doc_count": 1499}, {"key": "Feu / Agravation possible", "doc_count": 1281}, {"key": "Fuite gaz naturel 10-09", "doc_count": 1257}, {"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015}, {"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
  • 45. Sub-aggregations: GET /pompier/interventions/_search { "size": 0, "aggs": { "ville": { "terms": {"field": "ville"}, "aggs": { "arrondissement": { "terms": {"field": "arrondissement"} } } } } } Response: { "aggregations": {"ville": {"buckets": [ { "key": "Montréal", "doc_count": 768955, "arrondissement": {"buckets": [ {"key": "Ville-Marie", "doc_count": 83010}, {"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272}, {"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933}, {"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951}, {"key": "Rosemont / Petite-Patrie", "doc_count": 59213}, {"key": "Ahuntsic / Cartierville", "doc_count": 57721}, {"key": "Plateau Mont-Royal", "doc_count": 53344}, {"key": "Montréal-Nord", "doc_count": 40757}, {"key": "Sud-Ouest", "doc_count": 39936}, {"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139} ]} }, { "key": "Dollard-des-Ormeaux", "doc_count": 17961, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13452}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477}, {"key": "Pierrefonds / Senneville", "doc_count": 10}, {"key": "Dorval / Ile Dorval", "doc_count": 8}, {"key": "Pointe-Claire", "doc_count": 8}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6} ]} }, { "key": "Pointe-Claire", "doc_count": 17925, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13126}, {"key": "Pointe-Claire", "doc_count": 4766}, {"key": "Dorval / Ile Dorval", "doc_count": 12}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7}, {"key": "Kirkland", "doc_count": 7}, {"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1}, {"key": "St-Laurent", "doc_count": 1}
  • 46. Computing averages and sorting aggregations: GET /pompier/interventions/_search { "size": 0, "aggs": { "avg_nombre_unites_general": { "avg": {"field": "nombre_unites"} }, "type_incident": { "terms": { "field": "type_incident", "size": 5, "order" : {"avg_nombre_unites": "desc"} }, "aggs": { "avg_nombre_unites": { "avg": {"field": "nombre_unites"} } } } } } Response: { "aggregations": { "type_incident": { "buckets": [ { "key": "Feu / 5e Alerte", "doc_count": 162, "avg_nombre_unites": {"value": 70.9074074074074} }, { "key": "Feu / 4e Alerte", "doc_count": 100, "avg_nombre_unites": {"value": 49.36} }, { "key": "Troisième alerte/autre que BAT", "doc_count": 1, "avg_nombre_unites": {"value": 43.0} }, { "key": "Feu / 3e Alerte", "doc_count": 173, "avg_nombre_unites": {"value": 41.445086705202314} }, { "key": "Deuxième alerte/autre que BAT", "doc_count": 8, "avg_nombre_unites": {"value": 37.5} } ] }, "avg_nombre_unites_general": {"value": 2.1374461758713728} } }
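Conceptually, this request groups interventions by incident type, computes the average number of units per group, and keeps the groups with the highest averages. The Python sketch below does the same on a handful of invented rows. It is only meant to show the logic: the real aggregation runs distributed across shards, and `top_types_by_avg_units` is a hypothetical name.

```python
from collections import defaultdict


def top_types_by_avg_units(rows, size):
    """Group (incident_type, units) rows and return the `size` groups with the highest average."""
    groups = defaultdict(list)
    for incident_type, units in rows:
        groups[incident_type].append(units)
    buckets = [
        {"key": key, "doc_count": len(values), "avg_nombre_unites": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["avg_nombre_unites"], reverse=True)
    return buckets[:size]


# Invented sample rows: (type_incident, nombre_unites).
rows = [("Feu / 5e Alerte", 70), ("Feu / 5e Alerte", 72),
        ("Inondation", 1), ("Inondation", 2), ("Ascenseur", 3)]
for bucket in top_types_by_avg_units(rows, size=2):
    print(bucket)
```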
  • 47. Percentiles: GET /pompier/interventions/_search { "size": 0, "aggs": { "unites_percentile": { "percentiles": { "field": "nombre_unites", "percents": [25, 50, 75, 100] } } } } Response: { "aggregations": { "unites_percentile": { "values": { "25.0": 1.0, "50.0": 1.0, "75.0": 3.0, "100.0": 275.0 } } } }
  • 48. Histogram: GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "unites_histogram": { "histogram": { "field": "nombre_unites", "order": {"_key": "asc"}, "interval": 1 }, "aggs": { "ville": { "terms": {"field": "ville", "size": 1} } } } } } Response: { "aggregations": { "unites_histogram": { "buckets": [ { "key": 1.0, "doc_count": 23507, "ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]} },{ "key": 2.0, "doc_count": 1550, "ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]} },{ "key": 3.0, "doc_count": 563, "ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]} },{ "key": 4.0, "doc_count": 449, "ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]} },{ "key": 5.0, "doc_count": 310, "ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]} },{ "key": 6.0, "doc_count": 215, "ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]} },{ "key": 7.0, "doc_count": 136, "ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]} },{ "key": 8.0, "doc_count": 35, "ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]} },{ "key": 9.0, "doc_count": 10, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 10.0, "doc_count": 11, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 11.0, "doc_count": 2, "ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
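A histogram aggregation assigns each value to a bucket whose key is the value rounded down to a multiple of the interval. A minimal Python sketch of that rule, using invented values, is given below. The `offset` parameter and the function names are illustrative assumptions, and the sub-aggregation by city is left out.

```python
import math
from collections import Counter


def bucket_key(value, interval, offset=0):
    """Round a value down to the start of its bucket."""
    return math.floor((value - offset) / interval) * interval + offset


def histogram(values, interval):
    """Return (bucket key, document count) pairs sorted by ascending key."""
    counts = Counter(bucket_key(value, interval) for value in values)
    return sorted(counts.items())


# Invented numbers of units sent on a few interventions.
print(histogram([1, 1, 2, 3, 3, 3], interval=1))
print(bucket_key(7, interval=5))
```

With an interval of 1 on an integer field, as on the slide, each distinct value simply gets its own bucket.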
  • 49. Significant terms: GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "ville": { "significant_terms": {"field": "ville", "size": 5, "percentage": {}} } } } Response: { "aggregations": { "ville": { "doc_count": 26801, "buckets": [ { "key": "Ile-Bizard", "score": 0.10029498525073746, "doc_count": 68, "bg_count": 678 }, { "key": "Montréal-Nord", "score": 0.0826544804291675, "doc_count": 416, "bg_count": 5033 }, { "key": "Roxboro", "score": 0.08181818181818182, "doc_count": 27, "bg_count": 330 }, { "key": "Côte St-Luc", "score": 0.07654825526563974, "doc_count": 487, "bg_count": 6362 }, { "key": "Saint-Laurent", "score": 0.07317073170731707, "doc_count": 465, "bg_count": 6355
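With the `percentage` heuristic, the score of each bucket appears to be simply the share of a city's interventions that are floods: `doc_count / bg_count`. The short Python check below recomputes the first three scores from the counts in the response. The function name is hypothetical.

```python
def percentage_score(doc_count, bg_count):
    """Fraction of the background documents for a term that also match the query."""
    return doc_count / bg_count


# (city, flood interventions, all interventions) as reported in the response.
buckets = [("Ile-Bizard", 68, 678), ("Montréal-Nord", 416, 5033), ("Roxboro", 27, 330)]
for city, doc_count, bg_count in buckets:
    print(city, percentage_score(doc_count, bg_count))
```

This explains why a small city such as Ile-Bizard ranks first: about 10% of its interventions are floods, even though the absolute count is low.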
  • 50. Aggregations and geolocated data: GET /pompier/interventions/_search { "size": 0, "query": { "regexp": {"type_incident": "Feu.*"} }, "aggs": { "distance_from_here": { "geo_distance": { "field": "position", "unit": "km", "origin": { "lat": 45.495902, "lon": -73.554263 }, "ranges": [ { "to": 2}, {"from":2, "to": 4}, {"from":4, "to": 6}, {"from": 6, "to": 8}, {"from": 8}] } } } } Response: { "aggregations": { "distance_from_here": { "buckets": [ { "key": "*-2.0", "from": 0.0, "to": 2.0, "doc_count": 80 }, { "key": "2.0-4.0", "from": 2.0, "to": 4.0, "doc_count": 266 }, { "key": "4.0-6.0", "from": 4.0, "to": 6.0, "doc_count": 320 }, { "key": "6.0-8.0", "from": 6.0, "to": 8.0, "doc_count": 326 }, { "key": "8.0-*", "from": 8.0, "doc_count": 1720 } ] } } }
  • 51. Any questions?
  • 52. Offering advanced search to your users: GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "\"service rapide\"~2" } } } A comparable match_phrase query on the description field: GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": { "slop": 2, "query": "service rapide" } } } } Response: { "hits": { "hits": [ { "_source": { "title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165" } },{ "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520
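The `~2` suffix and the `slop` parameter both allow the words of a phrase to be separated by a few extra positions. The Python sketch below models a restricted version of that rule: terms must appear in order, and the total number of extra positions between them must not exceed the slop. It is a simplification, since the real phrase matcher also accepts reordered terms at a higher slop cost, and `phrase_matches` is a hypothetical helper working on whitespace tokens.

```python
def phrase_matches(tokens, phrase, slop=0):
    """True if the phrase terms occur in order with at most `slop` extra positions in total."""

    def search(start, index, budget):
        if index == len(phrase):
            return True
        for position in range(start, len(tokens)):
            gap = position - start
            # After the first term, every skipped position consumes slop budget.
            if index > 0 and gap > budget:
                break
            if tokens[position] == phrase[index]:
                remaining = budget - (gap if index > 0 else 0)
                if search(position + 1, index + 1, remaining):
                    return True
        return False

    return search(0, 0, slop)


tokens = "service très rapide".split()
print(phrase_matches(tokens, ["service", "rapide"], slop=2))
print(phrase_matches(tokens, ["service", "rapide"], slop=0))
```

This is why "service très rapide" is found by the phrase "service rapide" once a slop of 2 is allowed, whereas a strict phrase match would miss it.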