53. Thank you! Eυχαριστούμε! Grazie!
And btw: we are hiring a senior search relevance engineer!
bit.ly/2Orb8bc
https://www.bloomberg.com/careers
Malvina Josephidou & Diego Ceccarelli
Bloomberg
@malvijosephidou |@diegoceccarelli
#Activate18 #ActivateSearch
Editor's notes
Malvina starts here
Diego 16 slides
https://pxhere.com/en/photo/1143377
License: CC0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/). You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
https://upload.wikimedia.org/wikipedia/commons/a/a7/Cute_Sloth.jpg
The sloth photo is under a Creative Commons license and can be reused (https://commons.wikimedia.org/wiki/File:Cute_Sloth.jpg). You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Every update request in Solr is passed through a chain of processors defined in solrconfig.xml under the updateRequestProcessorChain. UpdateRequestProcessors allow us to create new fields from existing ones, or to change existing fields, at index time. We use UpdateRequestProcessors to compute features at index time and write their values into new feature fields that are kept in the index. In this snippet of code we add to the updateRequestProcessorChain a custom UpdateRequestProcessor that looks at the wire field: if it finds the string, it creates the new feature field with value 1; otherwise it creates it with the default value of 0. We can define other such UpdateRequestProcessors with our own feature-computation logic; for example, a feature might involve counting the terms in a multivalued field, and we can easily do that too.
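As a minimal sketch, such a chain might look like the following in solrconfig.xml. The processor class name, field names, and matched string here are hypothetical placeholders, not the actual configuration from the talk:

```xml
<!-- Hypothetical solrconfig.xml fragment: a chain with a custom processor
     that writes a 0/1 feature field based on the contents of "wire". -->
<updateRequestProcessorChain name="feature-chain" default="true">
  <!-- Illustrative custom processor: sets feature_wire_match=1 if the
       "wire" field contains the configured string, else the default 0. -->
  <processor class="com.example.WireMatchFeatureFactory">
    <str name="sourceField">wire</str>
    <str name="matchString">SOME_WIRE_CODE</str>
    <str name="featureField">feature_wire_match</str>
    <int name="defaultValue">0</int>
  </processor>
  <!-- Standard processors that log and execute the update. -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because the feature value is written at index time, query-time feature extraction only has to read the stored field rather than recompute the logic per request.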
https://www.pexels.com/photo/baby-child-close-up-crying-47090/ CC0 License
✓ Free for personal and commercial use
✓ No attribution required
Picture from
https://pxhere.com/en/photo/1143377
Licence is CC0 Public Domain
Free for personal and commercial use
No attribution required
/* Change FieldValueFeature to use docValues to fetch a feature value when the field has docValues. DocValues are faster for this particular case because they build a forward index from documents to values, so the value for a document can be read directly instead of retrieving the stored document. */
Public domain license: https://www.goodfreephotos.com/vector-images/you-shall-not-pass-sign-with-gandalf-vector-clipart.png.php
The work may be freely reproduced, distributed, transmitted, used, modified, built upon, or otherwise exploited by anyone for any purpose, commercial or non-commercial, and in any way, including by methods that have not yet been invented or conceived.
https://www.pexels.com/photo/attraction-building-city-hotel-415999/ CC0 license. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
In cases where we do grouping and ask for group.limit=1 only, it is possible to skip the second grouping step. On our test datasets this improved speed by around 40%.
Essentially, in the first grouping step each shard returns the top K groups based on the highest scoring document in each group. The top K groups from each shard are merged in the federator and in the second step we ask all the shards to return the top documents from each of the top ranking groups.
If we only want to return the highest scoring document per group we can return the top document id in the first step, merge results in the federator to retain the top K groups and then skip the second grouping step entirely. This is possible provided that:
a) We do not need to know the total number of matching documents per group, and
b) the within-group sort and the between-group sort are the same.
The LTR optimization is then to compute the LTR score only on the top-ranking document per group (rather than on all members of a group): instead of applying the model to every document in every group on every shard, we apply LTR to the top document per group, where "top" is determined by the Solr score. So documents within a group are ranked using the Solr score, and groups are ranked using LTR.
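As a rough sketch, a request combining grouping with Solr's standard LTR rerank query could look like the following. The grouping field, model name, and rerank depth are illustrative placeholders, not the production values from the talk; note that the within-group and between-group sorts match, which is one of the two conditions above:

```text
q=president speech
&group=true
&group.field=story_id
&group.limit=1
&sort=score desc
&group.sort=score desc
&rq={!ltr model=myModel reRankDocs=100}
&fl=id,score
```

With group.limit=1 and matching sorts, each shard only needs to report its single top document per group, so the federator can merge the top K groups without issuing the second grouping round trip.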
Malvina starts here
We were ready to roll out the model.
https://www.pexels.com/photo/view-ape-thinking-primate-33535/ CC0 license
What we realized at this point was that there was no model. There was something called ltr1, which we had developed using features computed offline, but that model turned out to be broken. We had product on our backs, and our code worked and was fast enough, but we didn't actually have anything to roll out; we also hadn't written a single line of code related to training models using the features logged in production.