Gain insight into the state-of-the-art deep learning algorithms used to power e-commerce search at Target, and how to customize Solr to blend multiple ML signals at scale.
Speakers:
Aashish Dattani, Lead Data Engineer, Target
Richard Wang, Principal AI Scientist, Target
Sunil Srinivasan, Lead Engineer, Target
3. Target
• 1,855 stores in the United States
• 39 distribution centers in the United
States
• 350,000+ team members worldwide
• Online business at target.com
• Global offices in China, Hong Kong and
India
4. About us
SUNIL SRINIVASAN
Lead Engineer
AASHISH DATTANI
Lead AI Engineer
RICHARD WANG
Principal AI Engineer
5. Agenda
• Solr at Target
• Architecture Overview
• Solr Components
• Deep Learning
6. Moved away from proprietary engine
to Solr
Growing index by the day
Highly performant engine
Customized for relevancy
and store availability
5 YEARS ON SOLR
2+ MILLION SKUS
P95 –
8. Querying Solr
Searchable Attributes using eDisMax query parser
• Title - Women's Sling Backpack - Universal Thread
• Category - Women > Women's Accessories > Handbags > Fashion Backpacks
• Item Type - Backpacks
• Description - Keep your essentials close at hand with this Sling Backpack from Universal Thread™.
• Augmented/Normalized data
– feet to ft, quart to qt, in to inch, “ to inch, etc.
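The unit normalization above can be sketched as a token-mapping step. This is a hypothetical Python sketch (`UNIT_MAP` and `normalize_units` are illustrative names); in practice this kind of mapping would more likely live in a Solr analysis chain, e.g. a synonym filter applied at index and query time.

```python
import re

# Hypothetical normalization table based on the examples on the slide.
# Note: a real normalizer needs context ("in" the preposition vs. the unit).
UNIT_MAP = {
    "feet": "ft",
    "quart": "qt",
    "in": "inch",
    '"': "inch",
    "“": "inch",
}

def normalize_units(query: str) -> str:
    """Replace unit spellings with a canonical form before matching."""
    # Treat the inch mark (") as its own token when tokenizing.
    tokens = re.findall(r'["“]|[\w\'\.]+', query.lower())
    return " ".join(UNIT_MAP.get(tok, tok) for tok in tokens)

print(normalize_units('5 feet garland'))  # -> 5 ft garland
print(normalize_units('32" tv stand'))    # -> 32 inch tv stand
```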
9. Querying Solr
RECALL AND PRECISION CONTROLLED BY A COMBINATION OF
Category/attribute classification (bq parameter)
– “student desk” belongs to the “desks” category/SKU hierarchy
Filtering based on attributes (fq parameter)
– “student desk” restricts results to the “desks”, “hutch tops”, and “kids desk” categories
Elevate to show the most popular items (customized component)
– maps the query to popular SKUs based on ranking signals
Precision component that filters out SKUs below a score threshold
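The bq/fq controls above amount to assembling eDisMax request parameters from the classifier's output. A minimal sketch, assuming hypothetical field names, weights, and a `build_solr_params` helper (none of these appear in the talk):

```python
def build_solr_params(query, predicted_categories):
    """Assemble illustrative eDisMax parameters for a query like 'student desk'.

    predicted_categories: list of (category, probability) from the classifier.
    """
    params = {
        "defType": "edismax",
        "q": query,
        # Hypothetical searchable fields and weights.
        "qf": "title^10 item_type^5 category^3 description",
    }
    # bq: soft boost for documents in predicted categories (recall-friendly).
    params["bq"] = " ".join(
        f'category:"{cat}"^{prob * 100:.0f}' for cat, prob in predicted_categories
    )
    # fq: hard restriction to the predicted categories (precision control).
    params["fq"] = "category:(" + " OR ".join(
        f'"{cat}"' for cat, _ in predicted_categories
    ) + ")"
    return params

params = build_solr_params(
    "student desk",
    [("desks", 0.7), ("kids desk", 0.2), ("hutch tops", 0.1)],
)
```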
10. Solr Components
CUSTOM COMPONENTS
TO IMPROVE RELEVANCY, WE USE A COMBINATION OF CUSTOMIZED POST FILTERS
AND COMPONENTS
• Precision Control (post filter)
• Score Combination Function (post filter)
• Custom Elevate (component)
11. Precision Control
TWO-PASS PROBLEM
Filter out documents based on
score distribution
This requires us to do two
passes!
SOLUTION
Post-filter API has collect()
and finish() methods
Do first pass in collect() and
second pass in finish()
(Chart: document score vs. rank, with a cutoff that drops the tail of the score distribution, e.g. at 40%)
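The two-pass logic above can be sketched language-agnostically: buffer every candidate in `collect()`, then apply the distribution-dependent cutoff in `finish()`. This is a hypothetical Python sketch of the idea only; the actual component is a Java Solr post filter (a `DelegatingCollector`), and the 40%-of-max threshold rule here is an assumption for illustration.

```python
class PrecisionPostFilter:
    """Sketch of a two-pass score filter: buffer in collect(), cut in finish()."""

    def __init__(self, threshold_fraction=0.4):
        # Hypothetical rule: keep docs scoring above this fraction of the max.
        self.threshold_fraction = threshold_fraction
        self.buffered = []  # (doc_id, score) pairs seen during the first pass

    def collect(self, doc_id, score):
        # First pass: record every candidate. We cannot decide yet, because
        # the score distribution is unknown until all documents are seen.
        self.buffered.append((doc_id, score))

    def finish(self):
        # Second pass: the full distribution is now available, so apply the cut.
        if not self.buffered:
            return []
        cutoff = self.threshold_fraction * max(s for _, s in self.buffered)
        return [doc for doc, s in self.buffered if s >= cutoff]

f = PrecisionPostFilter()
for doc, score in [(1, 10.0), (2, 9.0), (3, 3.0)]:
    f.collect(doc, score)
f.finish()  # -> [1, 2]  (doc 3 scores below 40% of the max)
```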
14. Combining scores
DIFFERENT SCORING FUNCTIONS
• Linear weighted combination: w1*s1 + w2*s2 + … + wN*sN
• Polynomial combination: w1*s1^n1 + w2*s2^n2 + … + wN*sN^nN
• Step functions
– Different functions based on score tier
– Each tier optimizes for a different metric
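The three scoring-function families above can be written down directly. A minimal sketch; the tier boundaries and per-tier functions here are made up for illustration:

```python
def linear(scores, weights):
    """Linear weighted combination: w1*s1 + w2*s2 + ... + wN*sN."""
    return sum(w * s for w, s in zip(weights, scores))

def polynomial(scores, weights, exponents):
    """Polynomial combination: w1*s1^n1 + w2*s2^n2 + ... + wN*sN^nN."""
    return sum(w * s**n for w, s, n in zip(weights, scores, exponents))

def step(score, tiers):
    """Apply a different function depending on the score tier a doc falls in.

    tiers: list of (lower_bound, fn), highest bound first. Each tier can
    optimize for a different metric.
    """
    for bound, fn in tiers:
        if score >= bound:
            return fn(score)
    return score

# Hypothetical tiers: top tier doubles the score, lower tier passes it through.
tiers = [(0.8, lambda s: 2 * s), (0.0, lambda s: s)]
linear([0.5, 0.2], [1.0, 3.0])              # ≈ 1.1
polynomial([0.5, 0.2], [1.0, 3.0], [2, 1])  # ≈ 0.85
step(0.9, tiers)                            # -> 1.8
```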
15. Signal sources
LOOKING UP VALUES
• Category/Brand/Attribute boost – Reverse index
– e.g. brand:goodfellow^20
• SKU-level query-dependent boost – Reverse index
– e.g. sku:1145367 is top selling SKU for a given query
• SKU-level query-independent boost – Forward index (docValues)
– e.g. sku:1145367 based on newness
16. Elevate component
DESCRIPTION
• Force certain results to the top of the ranking order
• Takes precedence over other sort profiles (e.g. score)
LIMITATIONS
• Can only read from a static .xml file
• Does not allow for reading ranks from different sources
18. Custom Elevate
CUSTOMIZED FEATURES
• Bury SKUs to the bottom of the result list
• Input elevated values via URL parameters
– e.g. …&elevate=sku:1,sku:2,sku:3&bury=sku:10,sku:11
• Read elevated signals from doc values (forward lookup)
– e.g. store availability etc.
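The elevate/bury behavior can be sketched as a re-sort step over the ranked results. This is a hypothetical Python sketch of the ordering logic only (the real version is a custom Solr SearchComponent, and `apply_elevate_bury` is an illustrative name):

```python
def apply_elevate_bury(ranked_skus, elevate=(), bury=()):
    """Pin elevated SKUs to the top (in the given order) and push buried
    SKUs to the bottom; everything else keeps its relevance order."""
    elevate = [s for s in elevate if s in ranked_skus]  # ignore absent SKUs
    bury = set(bury)
    middle = [s for s in ranked_skus if s not in elevate and s not in bury]
    bottom = [s for s in ranked_skus if s in bury]
    return elevate + middle + bottom

# e.g. ...&elevate=sku:1,sku:3&bury=sku:10
apply_elevate_bury(["sku:5", "sku:10", "sku:3", "sku:1"],
                   elevate=["sku:1", "sku:3"], bury=["sku:10"])
# -> ['sku:1', 'sku:3', 'sku:5', 'sku:10']
```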
19. Query Understanding
Objective: To accurately and fully understand user intent (in terms of
product attributes) based on input search query.
Example query: “c9 running shoes for boys”
• Brand: C9 Champion
• Gender: male
• Item type: athletic shoes, sneakers
• Age group: kids, toddler, junior
• Material: polyester, plastic, nylon
We treat this as a classification problem, and we designed a classification
framework that, for each product attribute, can automatically generate a
model to classify any query into that attribute.
20. Query Classification Overview
First, we gather abundant training data
1. User searches → behavior data (click, add to cart, purchase, etc.)
2. Product attributes (categories, colors, sizes, brands, gender, etc.)
Second, we train machine-learned models (per attribute)
Training data consists of a list of (query, attribute value) pairs:
• For category attribute: (“shoes”, athletic shoes), (“shoes”, sneakers), etc.
During prediction (serving) time
Input: any search query (e.g. “student desk”)
Output: a list of predicted attribute values (e.g. desks, kids desk, hutch tops, etc.), each with
a probability, that are passed to Solr via the bq, fq, and a custom parameter.
22. Training Data Preparation
We use (1) clickstream and (2) product attribute data:
(1) Search query → Product SKUs clicked/carted/purchased
– Past 2 years of clickstream data, 1.5M+ unique queries post-filtering
(2) Product SKU → Product attribute values
– Attributes (categories, gender, brands, etc.) are from Target’s item catalog (2M+ SKUs)
Combining (1) and (2) above, we get:
• Search query → list of attribute values, each with a score
• The score of an attribute value V given a query Q is:
  score(V, Q) ≈ P(V | Q) = (# of times V is clicked/carted/purchased given Q) / (total # of occurrences of Q)
– For the category attribute & for the query “running shoes”:
athletic shoes (0.5), sneakers (0.2), … sandals (0.01)
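The score computation above is a simple counting exercise over the joined clickstream/attribute data. A minimal sketch (the `attribute_scores` helper and the toy event counts are illustrative, chosen so the example reproduces the slide's 0.5 / 0.2 / 0.01 figures):

```python
from collections import Counter

def attribute_scores(events):
    """events: (query, attribute_value) pairs, one per click/cart/purchase.

    Returns score(V | Q) ≈ count(Q, V) / count(Q).
    """
    query_counts = Counter(q for q, _ in events)
    pair_counts = Counter(events)
    return {(q, v): n / query_counts[q] for (q, v), n in pair_counts.items()}

# Toy data: 100 engagement events for the query "running shoes".
events = ([("running shoes", "athletic shoes")] * 50
          + [("running shoes", "sneakers")] * 20
          + [("running shoes", "sandals")] * 1
          + [("running shoes", "other")] * 29)

scores = attribute_scores(events)
scores[("running shoes", "athletic shoes")]  # -> 0.5
scores[("running shoes", "sandals")]         # -> 0.01
```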
23. Neural Model
Training
Our hyperparameters:
Embedding dimension: d = 100
Region sizes (n-grams): 1, 2, 3, 4, 5
Filters per region: 64
Drop-out rate: 0.2
Max tokens per query: 10
# of output classes: varies depending on attribute
(Diagram: the model applied to an example query, tokenized as “room essentials full size bedding sheet set”)
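The hyperparameters above describe a text-CNN-style classifier: convolutions over 1–5-gram regions of the embedded query. A hedged Keras sketch of such an architecture, not the speakers' actual model code; `VOCAB_SIZE` and `NUM_CLASSES` are illustrative (the real class count varies per attribute, ~4K for category):

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, NUM_CLASSES, MAX_TOKENS = 50_000, 4_000, 10  # illustrative sizes

inp = layers.Input(shape=(MAX_TOKENS,))                  # max 10 tokens per query
x = layers.Embedding(VOCAB_SIZE, 100)(inp)               # embedding dim d = 100
convs = []
for region in (1, 2, 3, 4, 5):                           # n-gram region sizes
    c = layers.Conv1D(64, region, activation="relu",     # 64 filters per region
                      padding="same")(x)
    convs.append(layers.GlobalMaxPooling1D()(c))
x = layers.Concatenate()(convs)
x = layers.Dropout(0.2)(x)                               # dropout rate 0.2
out = layers.Dense(NUM_CLASSES, activation="softmax")(x) # classes vary by attribute

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```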
24. Evaluation Metrics
Precision of a query: # of correct predicted attribute values over total # of predictions
for that query from the classifier
• The higher the precision, the more accurate the predictions are.
Recall of a query: # of correct predicted attribute values over total # of attribute values
for that query in the test set
• The higher the recall, the better the predictions cover the attribute values in the test set.
Top-N accuracy:
• For a query, if any of the top N predictions is relevant, then it scores a 1, otherwise
0.
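The three metric definitions above can be computed per query as set operations over predicted and relevant attribute values. A minimal sketch (helper names are illustrative):

```python
def precision_recall(predicted, relevant):
    """Per-query precision and recall over predicted attribute values."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)  # correctly predicted attribute values
    return hits / len(predicted), hits / len(relevant)

def top_n_accuracy(ranked_predictions, relevant, n):
    """1 if any of the top-n predictions is relevant for the query, else 0."""
    return int(any(p in relevant for p in ranked_predictions[:n]))

p, r = precision_recall(["desks", "kids desk", "sofas"],
                        ["desks", "kids desk", "hutch tops"])
# p = 2/3, r = 2/3
top_n_accuracy(["sofas", "desks"], {"desks"}, n=2)  # -> 1
```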
Experimental settings:
Attribute | # of Train Queries | # of Dev Queries | # of Test Queries | # of Classes
Category | 1.5M | 12K | 12K | ~4K
27. Evaluation Results
F1 score is the harmonic mean of precision and recall.
In our experiments, models with more parameters achieved higher F1 scores.
28. Takeaway
• Our classifiers achieve precision and recall above 90%, with top-5 prediction
accuracy above 96%
• With the classification pipeline, a new model for any attribute can be
generated automatically within 18 hours
• By using state-of-the-art neural network techniques, in conjunction
with customized Solr components, we have improved our search
relevancy by more than 20%
31. STAY CONNECTED
Twitter @activate_conf
Facebook @activateconf
#Activate19