Kyiv.py #16 October 2015

A talk about ElasticSearch in the Python world.

Published in: Internet

  1. 1. Kyiv.py #16 Andrii Soldatenko 24 October 2015 @a_soldatenko
  2. 2. ElasticSearch in Python world. Andrii Soldatenko 24 October 2015 @a_soldatenko
  3. 3. About me:
         • Software Engineer in Test at
         • Speaker at PyCon Russia 2015
         • Speaker at PyCon Ukraine 2014
         • Speaker at PyCon Belarus 2015
         • in past:
  4. 4. Preface
  5. 5. Information Explosion
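  6. 6. Text Search

         grep --ignore-case --recursive foo books/
         grep --ignore-case --recursive --file=words.txt books/

         Entry.objects.filter(headline__icontains='foo')

         from functools import reduce
         from operator import or_
         from django.db.models import Q

         with open('words.txt', 'r') as f:
             words = [line.strip() for line in f if line.strip()]
         # Django has no `icontains_in` lookup, so OR together one
         # `icontains` condition per word.
         Entry.objects.filter(reduce(or_, (Q(headline__icontains=w) for w in words)))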
  7. 7. Full text search
  8. 8. Search index
  9. 9. Simple sentences
         1. The quick brown fox jumped over the lazy dog
         2. Quick brown foxes leap over lazy dogs in summer
  10. 10. Inverted index

          Term      Doc_1  Doc_2
          -------------------------
          Quick   |       |  X
          The     |   X   |
          brown   |   X   |  X
          dog     |   X   |
          dogs    |       |  X
          fox     |   X   |
          foxes   |       |  X
          in      |       |  X
          jumped  |   X   |
          lazy    |   X   |  X
          leap    |       |  X
          over    |   X   |  X
          quick   |   X   |
          summer  |       |  X
          the     |   X   |
          ------------------------
  11. 11. Inverted index

          Term      Doc_1  Doc_2
          -------------------------
          brown   |   X   |  X
          quick   |   X   |
          ------------------------
          Total   |   2   |  1
  12. 12. Inverted index: normalization

          After normalization:

          Term      Doc_1  Doc_2
          -------------------------
          brown   |   X   |  X
          dog     |   X   |  X
          fox     |   X   |  X
          in      |       |  X
          jump    |   X   |  X
          lazy    |   X   |  X
          over    |   X   |  X
          quick   |   X   |  X
          summer  |       |  X
          the     |   X   |
          ------------------------

          Before normalization:

          Term      Doc_1  Doc_2
          -------------------------
          Quick   |       |  X
          The     |   X   |
          brown   |   X   |  X
          dog     |   X   |
          dogs    |       |  X
          fox     |   X   |
          foxes   |       |  X
          in      |       |  X
          jumped  |   X   |
          lazy    |   X   |  X
          leap    |       |  X
          over    |   X   |  X
          quick   |   X   |
          summer  |       |  X
          the     |   X   |
          ------------------------
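The inverted-index slides can be reproduced with a few lines of Python. This is a minimal sketch, not how Lucene or ElasticSearch build their indexes: the `NORMAL_FORMS` mapping is a hand-written stand-in for real stemming and synonym handling, and the names `build_index`, `search`, and `normalize` are chosen here for illustration only.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hand-written stand-in for stemming and synonym handling (an assumption
# made for this sketch; a real analyzer derives these forms itself).
NORMAL_FORMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump", "leap": "jump"}


def normalize(token):
    """Lowercase a token and map it to its hand-picked root form."""
    token = token.lower()
    return NORMAL_FORMS.get(token, token)


def build_index(docs, analyze=lambda token: token):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[analyze(token)].add(doc_id)
    return index


def search(index, query, analyze=lambda token: token):
    """Count, per document, how many query terms it contains."""
    scores = defaultdict(int)
    for token in query.split():
        for doc_id in index.get(analyze(token), ()):
            scores[doc_id] += 1
    return dict(scores)


raw_index = build_index(DOCS)
normalized_index = build_index(DOCS, analyze=normalize)

# Without normalization only Doc_1 matches both terms.
print(search(raw_index, "quick brown"))
# With the same normalization applied to index and query, both documents match.
print(search(normalized_index, "Quick fox", analyze=normalize))
```

The important design point is that the query passes through the same normalization as the indexed text; otherwise a query for "Quick" would look up a term that no longer exists in the normalized index.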
  13. 13. Search Engines
  14. 14. ElasticSearch
  15. 15. Who uses ElasticSearch?
  16. 16. ElasticSearch: Quick Intro

          Relational DB  ->  Databases  ->  Tables  ->  Rows       ->  Columns
          ElasticSearch  ->  Indices    ->  Types   ->  Documents  ->  Fields
  17. 17. ElasticSearch: Quick Intro

          PUT /haystack/user/1
          {
              "first_name": "Andrii",
              "last_name": "Soldatenko",
              "age": 30,
              "about": "I love to go rock climbing",
              "interests": [ "sports", "music" ],
              "likes": [ "python", "django" ]
          }
  18. 18. ElasticSearch: Locks
          • Pessimistic concurrency control
          • Optimistic concurrency control
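The slide only names the two strategies. ElasticSearch itself takes the optimistic route: every document carries a version number, and a write can be rejected when it is based on a stale version. The sketch below illustrates that idea with a toy in-memory store; `VersionedStore` and `VersionConflict` are hypothetical names for this example and are not part of any ElasticSearch client.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of a document."""


class VersionedStore:
    """Toy in-memory document store; hypothetical, not an ElasticSearch client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Optimistic check: nothing is locked, the write is simply refused
        # if somebody else changed the document in the meantime.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                "expected version %d, found %d" % (expected_version, current_version)
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, body)
        return new_version


store = VersionedStore()
store.put("user/1", {"age": 30})  # creates version 1
version, body = store.get("user/1")

# Two writers both read version 1; the first update succeeds ...
store.put("user/1", {"age": 31}, expected_version=version)

# ... and the second, still holding version 1, is rejected.
try:
    store.put("user/1", {"age": 32}, expected_version=version)
    conflict = False
except VersionConflict:
    conflict = True

print(conflict, store.get("user/1"))
```

A pessimistic scheme would instead lock the document before reading it, which blocks other writers but guarantees that the update cannot conflict.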
  19. 19. ElasticSearch: Setup

          #!/bin/bash
          VERSION=1.7.1
          curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
          unzip elasticsearch-$VERSION.zip
          cd elasticsearch-$VERSION

          # Download plugin marvel
          ./bin/plugin -i elasticsearch/marvel/latest
          echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml

          # run elastic
          ./bin/elasticsearch -d
  20. 20. ElasticSearch: Setup

          $ curl 'http://localhost:9200/?pretty'
          {
            "status" : 200,
            "name" : "Dredmund Druid",
            "cluster_name" : "elasticsearch",
            "version" : {
              "number" : "1.7.1",
              "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
              "build_timestamp" : "2015-07-29T09:54:16Z",
              "build_snapshot" : false,
              "lucene_version" : "4.10.4"
            },
            "tagline" : "You Know, for Search"
          }
  21. 21. ElasticSearch: Settings

          curl -X POST 'http://localhost:9200/<index_name>/_close'

          curl -X PUT 'http://localhost:9200/<index_name>/_settings' -d '
          {
            "settings": {
              "analysis": {
                "analyzer": {
                  "my_analyzer": {
                    "type": "standard",
                    "stopwords": [ "and", "the" ]
                  }
                }
              }
            }
          }'

          curl -X POST 'http://localhost:9200/<index_name>/_open'
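To see what the stop-word setting above does to a piece of text, here is a rough Python approximation. It is only a sketch: the real standard analyzer uses Unicode text segmentation rather than a `\w+` regular expression, and the function name `analyze` is chosen for this example.

```python
import re

# Same stop words as the "my_analyzer" definition on the slide.
STOPWORDS = frozenset(["and", "the"])


def analyze(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


print(analyze("The quick brown fox and the lazy dog"))
# ['quick', 'brown', 'fox', 'lazy', 'dog']
```

Only the surviving tokens end up in the inverted index, so a search for "the" alone would match nothing once this analyzer is in place.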
  22. 22. Haystack
  23. 23. Adding search functionality to Simple Model

          $ cat myapp/models.py
          from django.db import models
          from django.contrib.auth.models import User


          class Page(models.Model):
              user = models.ForeignKey(User)
              name = models.CharField(max_length=200)
              description = models.TextField()

              def __unicode__(self):
                  return self.name
  24. 24. Haystack: Installation

          $ pip install django-haystack
          $ cat settings.py
          INSTALLED_APPS = [
              'django.contrib.admin',
              'django.contrib.auth',
              'django.contrib.contenttypes',
              'django.contrib.sessions',
              'django.contrib.sites',

              # Added.
              'haystack',

              # Then your usual apps...
              'blog',
          ]
  25. 25. Haystack: Settings

          $ pip install elasticsearch
          $ cat settings.py
          ...
          HAYSTACK_CONNECTIONS = {
              'default': {
                  'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
                  'URL': 'http://127.0.0.1:9200/',
                  'INDEX_NAME': 'haystack',
              },
          }
          ...
  26. 26. Haystack: Creating SearchIndexes

          $ cat myapp/search_indexes.py
          import datetime

          from haystack import indexes
          from myapp.models import Page


          class PageIndex(indexes.SearchIndex, indexes.Indexable):
              text = indexes.CharField(document=True, use_template=True)
              author = indexes.CharField(model_attr='user')
              # Assumes Page also defines a pub_date field (not shown on slide 23).
              pub_date = indexes.DateTimeField(model_attr='pub_date')

              def get_model(self):
                  return Page

              def index_queryset(self, using=None):
                  """Used when the entire index for model is updated."""
                  return self.get_model().objects.filter(
                      pub_date__lte=datetime.datetime.now())
  27. 27. Haystack: SearchQuerySet API

          from haystack.query import SearchQuerySet
          from haystack.inputs import Raw

          all_results = SearchQuerySet().all()
          hello_results = SearchQuerySet().filter(content='hello')
          unfriendly_results = SearchQuerySet().exclude(content='hello').filter(content='world')

          # To send unescaped data:
          sqs = SearchQuerySet().filter(title=Raw(trusted_query))
  28. 28. How to configure ElasticSearch? https://github.com/django-haystack/django-haystack/blob/9d92d4da0a1ec75978fc3949375dda9a1707469f/haystack/backends/elasticsearch_backend.py#L41
  29. 29. ElasticSearch settings
  30. 30. ElasticStack backend https://github.com/bennylope/elasticstack

          HAYSTACK_CONNECTIONS = {
              'default': {
                  'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
                  'URL': 'http://127.0.0.1:9200/',
                  'INDEX_NAME': 'haystack',
              },
          }
          ELASTICSEARCH_INDEX_SETTINGS = {}
          ELASTICSEARCH_DEFAULT_ANALYZER = 'synonym_analyzer'
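The slide leaves `ELASTICSEARCH_INDEX_SETTINGS` empty, so the `synonym_analyzer` it names as the default is not defined anywhere. A filled-in version might look like the fragment below. Treat it as a sketch under assumptions: the outer `'settings'` wrapper follows the layout I recall from the elasticstack README and should be checked against the version you install, and the synonym pair is invented for illustration. The `synonym` token filter and its `synonyms` list are standard ElasticSearch analysis settings.

```python
# settings.py (fragment); the synonym list is an illustrative assumption.
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        'analysis': {
            'analyzer': {
                'synonym_analyzer': {
                    'type': 'custom',
                    'tokenizer': 'standard',
                    'filter': ['lowercase', 'synonym'],
                },
            },
            'filter': {
                'synonym': {
                    'type': 'synonym',
                    # Solr-style synonym rule: these terms are treated as equivalent.
                    'synonyms': ['leap, jump'],
                },
            },
        },
    },
}
ELASTICSEARCH_DEFAULT_ANALYZER = 'synonym_analyzer'
```

After changing analysis settings the index has to be rebuilt, because documents already indexed were analyzed with the old configuration.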
  31. 31. Keeping data in sync

          # Update everything.
          ./manage.py update_index --settings=settings.prod

          # Update everything with lots of information about what's going on.
          ./manage.py update_index --settings=settings.prod --verbosity=2

          # Update everything, cleaning up after deleted models.
          ./manage.py update_index --remove --settings=settings.prod

          # Update everything changed in the last 2 hours.
          ./manage.py update_index --age=2 --settings=settings.prod

          # Update everything between Dec. 1, 2011 & Dec 31, 2011
          ./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod
  32. 32. Signals

          class RealtimeSignalProcessor(BaseSignalProcessor):
              """
              Allows for observing when saves/deletes fire & automatically updates the
              search engine appropriately.
              """
              def setup(self):
                  # Naive (listen to all model saves).
                  models.signals.post_save.connect(self.handle_save)
                  models.signals.post_delete.connect(self.handle_delete)
                  # Efficient would be going through all backends & collecting all models
                  # being used, then hooking up signals only for those.

              def teardown(self):
                  # Naive (listen to all model saves).
                  models.signals.post_save.disconnect(self.handle_save)
                  models.signals.post_delete.disconnect(self.handle_delete)
                  # Efficient would be going through all backends & collecting all models
                  # being used, then disconnecting signals only for those.
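The processor shown on the Signals slide ships with Haystack, so it does not need to be copied into a project. It is switched on with a single setting; the fragment below is a minimal example that assumes Haystack 2.x, where `HAYSTACK_SIGNAL_PROCESSOR` was introduced.

```python
# settings.py (fragment)
# Update the search index on every model save and delete instead of
# waiting for the next `update_index` run.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
```

The trade-off is that every save and delete now makes a synchronous call to the search backend, so for write-heavy models a periodic `update_index` job may still be the better fit.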
  33. 33. Haystack: Pros and Cons

          Pros:
          • easy to set up
          • looks like the Django ORM, but for searches
          • search engine independent
          • supports 4 engines (Elastic, Solr, Xapian, Whoosh)

          Cons:
          • poor SearchQuerySet API
          • difficult to manage stop words
          • loses performance because of the extra layer
          • model-based
  34. 34. Final Thoughts https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
  35. 35. Thank You @a_soldatenko https://asoldatenko.com
  36. 36. Questions ?
