USING DEEP LEARNING AND CUSTOMIZED SOLR COMPONENTS TO IMPROVE SEARCH RELEVANCY AT TARGET
• 1,855 stores in the United States
• 39 distribution centers in the United States
• 350,000+ team members worldwide
• Online business at target.com
• Global offices in China, Hong Kong and India
About us
Sunil Srinivasan
Lead Engineer
Aashish Dattani
Lead AI Engineer
Richard Wang
Principal AI Engineer
Agenda
• Solr at Target
• Architecture Overview
• Solr Components
• Deep Learning
Moved away from proprietary engine to Solr
Growing index by the day
Highly performant engine
Customized for relevancy and store availability
5 YEARS ON SOLR
2+ MILLION SKUS
P95–
Architecture
Querying Solr
Searchable Attributes using eDisMax query parser
• Title - Women's Sling Backpack - Universal Thread
• Category - Women > Women's Accessories > Handbags > Fashion Backpacks
• Item Type - Backpacks
• Description - Keep your essentials close at hand with this Sling Backpack from Universal Thread™.
• Augmented/Normalized data
– feet to ft, quart to qt, in to inch, " to inch, etc.
Querying Solr
RECALL AND PRECISION CONTROLLED BY A COMBINATION OF
Category/Attribute classification (bq parameter)
– “student desk” belongs to the ‘desks’ category/SKU hierarchy
Filtering based on attributes (fq parameter)
– “student desk” restricts to the ‘desks’, ‘hutch tops’, ‘kids desk’ categories
Elevate to show list of most popular items (customized component)
– maps a query to its popular SKUs based on a ranking signal
Precision component that filters out SKUs based on a score threshold
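A minimal sketch of how a request combining the first two controls might be assembled. Only `defType=edismax`, `q`, `qf`, `bq`, and `fq` are standard Solr parameters; the field names (`title`, `category`, `item_type`, `description`), the boost values, the host, and the core name `products` are assumptions made for illustration.

```python
from urllib.parse import urlencode

def build_solr_params(query, boost_categories, filter_categories):
    """Assemble eDisMax request parameters for a product search."""
    params = [
        ("defType", "edismax"),
        ("q", query),
        # Searchable attributes with hypothetical per-field weights.
        ("qf", "title^5 category^3 item_type^3 description"),
    ]
    # bq: boost documents that fall in the classified categories.
    for category, weight in boost_categories.items():
        params.append(("bq", f'category:"{category}"^{weight}'))
    # fq: restrict the candidate set to the allowed categories.
    if filter_categories:
        clause = " OR ".join(f'"{c}"' for c in filter_categories)
        params.append(("fq", f"category:({clause})"))
    return params

params = build_solr_params(
    "student desk",
    boost_categories={"desks": 10},
    filter_categories=["desks", "hutch tops", "kids desk"],
)
query_string = urlencode(params)
# Hypothetical host and core name.
print("http://localhost:8983/solr/products/select?" + query_string)
```

The boost (`bq`) influences ordering, while the filter (`fq`) removes documents outright, so the two trade off recall against precision in different ways.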
Solr Components
CUSTOM COMPONENTS
TO IMPROVE RELEVANCY, WE USE A COMBINATION OF CUSTOMIZED POST FILTERS AND COMPONENTS
• Precision Control (post filter)
• Score Combination Function (post filter)
• Custom Elevate (component)
Precision Control
TWO-PASS PROBLEM
Filter out documents based on score distribution
This requires us to do two passes!
SOLUTION
Post-filter API has collect() and finish() methods
Do first pass in collect() and second pass in finish()
[Figure: score plotted against doc rank, with a 40% marker]
Post-filter: Sample code
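The sample code from this slide did not survive extraction. As a stand-in, here is a plain-Python simulation of the two-pass idea, not the authors' code. In Solr the same shape would be a `PostFilter` whose `DelegatingCollector` overrides `collect()` and `finish()`. The class names, the `keep_ratio` parameter, and the rule "keep documents scoring at least 40% of the top score" are assumptions; the slide does not say what its 40% marker refers to.

```python
class ListCollector:
    """Stand-in for the downstream collector that receives surviving docs."""

    def __init__(self):
        self.docs = []

    def collect(self, doc_id, score):
        self.docs.append((doc_id, score))


class PrecisionPostFilter:
    """Buffers every hit in collect(), then filters in finish()."""

    def __init__(self, delegate, keep_ratio=0.4):
        self.delegate = delegate
        self.keep_ratio = keep_ratio  # hypothetical: fraction of the top score
        self.buffer = []

    def collect(self, doc_id, score):
        # First pass: only record the hit; the cutoff is not known yet.
        self.buffer.append((doc_id, score))

    def finish(self):
        # Second pass: the full score distribution is now available.
        if not self.buffer:
            return
        cutoff = self.keep_ratio * max(score for _, score in self.buffer)
        for doc_id, score in self.buffer:
            if score >= cutoff:
                self.delegate.collect(doc_id, score)


sink = ListCollector()
post_filter = PrecisionPostFilter(sink)
hits = [("d1", 500), ("d2", 400), ("d3", 300), ("d4", 150), ("d5", 100)]
for doc_id, score in hits:
    post_filter.collect(doc_id, score)
post_filter.finish()
print(sink.docs)  # d4 and d5 fall below 0.4 * 500 and are dropped
```

Buffering costs memory proportional to the number of matching documents, which is the price of deciding the cutoff only after every score has been seen.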
Combination Function
[Diagram: SKU Attribute Score 1, SKU Attribute Score 2, SKU Attribute Score 3, ..., SKU Attribute Score N → Combination Function → Final Doc Score]
Combining scores
DIFFERENT SCORING FUNCTIONS
• Linear weighted combination: $w_1 s_1 + w_2 s_2 + \dots + w_N s_N$
• Polynomial combination: $w_1 s_1^{n_1} + w_2 s_2^{n_2} + \dots + w_N s_N^{n_N}$
• Step functions
– Different functions based on score tier
– Each tier optimizes for a different metric
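The three families above can be sketched as follows. The weights, exponents, tier threshold, and the choice to tier on the first score are illustrative assumptions, not values from the talk.

```python
import math

def linear(scores, weights):
    """w1*s1 + w2*s2 + ... + wN*sN"""
    return sum(w * s for w, s in zip(weights, scores))

def polynomial(scores, weights, exponents):
    """w1*s1^n1 + w2*s2^n2 + ... + wN*sN^nN"""
    return sum(w * s ** n for w, s, n in zip(weights, scores, exponents))

def stepped(scores, tiers, fallback):
    """Pick a combination function based on the tier of the first score.

    tiers: list of (minimum_first_score, function), highest threshold first.
    """
    for threshold, combine in tiers:
        if scores[0] >= threshold:
            return combine(scores)
    return fallback(scores)

weights = [2.0, 1.0, 0.5]      # hypothetical weights
exponents = [2, 1, 1]          # hypothetical exponents
scores = [0.8, 0.5, 0.2]       # hypothetical per-SKU attribute scores

tiers = [(0.7, lambda s: linear(s, weights))]
fallback = lambda s: polynomial(s, weights, exponents)

print(linear(scores, weights))                  # about 2.2
print(polynomial(scores, weights, exponents))   # about 1.88
print(stepped(scores, tiers, fallback))         # top tier, so the linear form
```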
Signal sources
LOOKING UP VALUES
• Category/Brand/Attribute boost – Inverted (reverse) index
– e.g. brand:goodfellow^20
• SKU-level query-dependent boost – Inverted (reverse) index
– e.g. sku:1145367 is the top-selling SKU for a given query
• SKU-level query-independent boost – Forward index (docValues)
– e.g. sku:1145367 boosted based on newness
Elevate component
DESCRIPTION
• Force certain results to the top of the ranking order
• Takes precedence over other sort profiles (e.g. score)
LIMITATIONS
• Can only read from a static .xml file
• Does not allow for reading ranks from different sources
Multi-sort in Solr
Doc ID | Elevate Rank | Price  | Score
d1     | 90           | $10.99 | 500
d2     | 60           | $3.99  | 400
d3     | 90           | $7.99  | 300
d4     | 100          | $12.99 | 200
d5     | 80           | $10.99 | 100
Result ordering: d4, d3, d1, d5, d2
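The slide does not state the sort keys. The ordering shown is consistent with sorting by elevate rank descending, then price ascending, then score descending, which is the assumption behind this sketch. The field names are hypothetical.

```python
docs = [
    {"id": "d1", "elevate_rank": 90, "price": 10.99, "score": 500},
    {"id": "d2", "elevate_rank": 60, "price": 3.99, "score": 400},
    {"id": "d3", "elevate_rank": 90, "price": 7.99, "score": 300},
    {"id": "d4", "elevate_rank": 100, "price": 12.99, "score": 200},
    {"id": "d5", "elevate_rank": 80, "price": 10.99, "score": 100},
]

# Assumed sort: elevate rank desc, then price asc, then score desc.
ordered = sorted(
    docs,
    key=lambda d: (-d["elevate_rank"], d["price"], -d["score"]),
)
order = [d["id"] for d in ordered]
print(order)  # ['d4', 'd3', 'd1', 'd5', 'd2']
```

Note that d3 precedes d1 even though d1 has the higher score: the two tie on elevate rank, and the lower price breaks the tie before score is consulted.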
Custom Elevate
CUSTOMIZED FEATURES
• Bury SKUs to the bottom of the result list
• Input elevated values via URL parameters
– e.g. …&elevate=sku:1,sku:2,sku:3&bury=sku:10,sku:11
• Read elevated signals from doc values (forward lookup)
– e.g. store availability etc.
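A rough sketch of the elevate/bury behaviour driven by URL parameters. The parameter names `elevate` and `bury` come from the example above; the function names, the rule that SKUs absent from the result set are ignored, and the sample ranking are assumptions.

```python
from urllib.parse import parse_qs

def parse_list(params, name):
    """Read a comma-separated URL parameter as a list of SKU keys."""
    raw = params.get(name, [""])[0]
    return [item for item in raw.split(",") if item]

def apply_elevate_bury(ranked, query_string):
    """Move elevated SKUs to the top and buried SKUs to the bottom."""
    params = parse_qs(query_string)
    elevate = parse_list(params, "elevate")
    bury = set(parse_list(params, "bury"))
    present = set(ranked)
    # Elevated SKUs keep the order given in the URL.
    top = [sku for sku in elevate if sku in present]
    # Buried SKUs keep their original relative order.
    bottom = [sku for sku in ranked if sku in bury and sku not in top]
    pinned = set(top) | set(bottom)
    middle = [sku for sku in ranked if sku not in pinned]
    return top + middle + bottom

ranked = ["sku:10", "sku:7", "sku:3", "sku:11", "sku:1", "sku:5"]
result = apply_elevate_bury(
    ranked, "elevate=sku:1,sku:2,sku:3&bury=sku:10,sku:11"
)
print(result)
```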
Query Understanding
Objective: To accurately and fully understand user intent (in terms of
product attributes) based on input search query.
Example query: “c9 running shoes for boys”
• Brand: C9 Champion
• Gender: male
• Item type: athletic shoes, sneakers
• Age group: kids, toddler, junior
• Material: polyester, plastic, nylon
We treat this as a classification problem, and we designed a classification
framework that, for each product attribute, can automatically generate a
model to classify any query into that attribute.
Query Classification Overview
First, we gather abundant training data
1. User searches → behavior data (click, add to cart, purchase, etc.)
2. Product attributes (categories, colors, sizes, brands, gender, etc.)
Second, we train machine-learned models (per attribute)
Training data consists of a list of (query, attribute value) pairs:
• For category attribute: (“shoes”, athletic shoes), (“shoes”, sneakers), etc.
During prediction (serving) time
Input: any search query (e.g. “student desk”)
Output: a list of predicted attribute values (e.g. desks, kids desk, hutch tops, etc.), each with
a probability, that are passed to Solr via the bq, fq, and a custom parameter.
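One way the predicted probabilities could be turned into Solr parameter values is sketched below. The probabilities, the `category` field name, the boost scale, and the probability cutoff for the filter are all assumptions; the talk does not describe the actual mapping.

```python
def to_solr_values(predictions, field="category", boost_scale=100, fq_min_prob=0.05):
    """Convert (attribute value, probability) pairs into bq and fq strings."""
    # Higher probability gives a proportionally larger boost.
    bq = [
        f'{field}:"{value}"^{round(prob * boost_scale)}'
        for value, prob in predictions
    ]
    # Only reasonably confident values are allowed to restrict the result set.
    allowed = [value for value, prob in predictions if prob >= fq_min_prob]
    fq = None
    if allowed:
        fq = f"{field}:(" + " OR ".join(f'"{v}"' for v in allowed) + ")"
    return {"bq": bq, "fq": fq}

# Hypothetical classifier output for the query "student desk".
predictions = [
    ("desks", 0.62),
    ("kids desk", 0.21),
    ("hutch tops", 0.09),
    ("office chairs", 0.02),
]
values = to_solr_values(predictions)
print(values["bq"])
print(values["fq"])
```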
Classification Pipeline
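[Diagram: three stages. Preparation: Search & Click Data and the Product Catalog feed an Attribute Extractor and a Query-Attribute Aggregator, producing Query-Attribute training data. Training: an N-gram Convolution Neural Network is trained on that data to produce the Query Classification Model. Prediction: an Input Query goes through the model, which returns a List of predicted attribute values.]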
Training Data Preparation
We use (1) clickstream and (2) product attribute data:
(1) Search query → Product SKUs clicked/carted/purchased
– Past 2 years of clickstream data, 1.5M+ unique queries post-filtering
(2) Product SKU → Product attribute values
– Attributes (categories, gender, brands, etc.) are from Target’s item catalog (2M+ SKUs)
Combining (1) and (2) above, we get:
• Search query → list of attribute values, each with a score
• Score of an attribute value V given a query Q is:
$\text{score}(V, Q) \approx P(V \mid Q) = \dfrac{\text{\# of times } V \text{ is clicked, carted, or purchased given } Q}{\text{total \# of occurrences of } Q}$
– For the category attribute & for the query “running shoes”:
athletic shoes (0.5), sneakers (0.2), … sandals (0.01)
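The score computation can be sketched as a simple counting job. This assumes each click, cart, or purchase event counts as one occurrence of the query and that each SKU carries a single value for the attribute; the SKU ids and the event log are made up.

```python
from collections import Counter, defaultdict

def attribute_scores(events, sku_to_value):
    """Estimate P(V | Q) from (query, sku) engagement events."""
    query_counts = Counter()
    value_counts = defaultdict(Counter)
    for query, sku in events:
        query_counts[query] += 1
        value_counts[query][sku_to_value[sku]] += 1
    return {
        query: {value: n / query_counts[query] for value, n in counts.items()}
        for query, counts in value_counts.items()
    }

# Hypothetical catalog slice: SKU -> category value.
sku_to_value = {
    "sku:a": "athletic shoes",
    "sku:b": "sneakers",
    "sku:c": "socks",
}
# Hypothetical event log: 5 + 2 + 3 = 10 events for one query.
events = (
    [("running shoes", "sku:a")] * 5
    + [("running shoes", "sku:b")] * 2
    + [("running shoes", "sku:c")] * 3
)
scores = attribute_scores(events, sku_to_value)
print(scores["running shoes"])
```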
Neural Model
Training
Our hyperparameters:
Embedding dimension: d = 100
Region sizes (n-grams): 1, 2, 3, 4, 5
Filters per region: 64
Drop-out rate: 0.2
Max tokens per query: 10
# of output classes: varies depending on attribute
[Figure: n-gram CNN applied to the example query “room essentials full size bedding sheet set”]
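To make the hyperparameters concrete, here is an untrained, pure-Python forward pass of an n-gram CNN text classifier. It is a sketch of the general architecture, not the authors' model: the weights are random, biases are omitted, and the ReLU activation, softmax output, and three class names are assumptions.

```python
import math
import random

EMBED_DIM = 100                  # embedding dimension d
REGION_SIZES = (1, 2, 3, 4, 5)   # n-gram widths
FILTERS_PER_REGION = 64
MAX_TOKENS = 10
CLASSES = ["sheet sets", "comforters", "pillows"]  # hypothetical classes

rng = random.Random(0)

def rand_vec(n, scale=0.1):
    return [rng.uniform(-scale, scale) for _ in range(n)]

# Randomly initialised, untrained parameters (illustration only).
embeddings = {}
conv_filters = {r: [rand_vec(r * EMBED_DIM) for _ in range(FILTERS_PER_REGION)]
                for r in REGION_SIZES}
N_FEATURES = len(REGION_SIZES) * FILTERS_PER_REGION  # 5 * 64 = 320
output_weights = [rand_vec(N_FEATURES) for _ in CLASSES]

def embed(query):
    """Map a query to exactly MAX_TOKENS vectors (truncate, then zero-pad)."""
    vectors = []
    for token in query.lower().split()[:MAX_TOKENS]:
        if token not in embeddings:
            embeddings[token] = rand_vec(EMBED_DIM)
        vectors.append(embeddings[token])
    padding = [[0.0] * EMBED_DIM for _ in range(MAX_TOKENS - len(vectors))]
    return vectors + padding

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def features(query):
    vectors = embed(query)
    pooled = []
    for r in REGION_SIZES:
        # Each window concatenates r consecutive token vectors.
        windows = [sum(vectors[i:i + r], []) for i in range(MAX_TOKENS - r + 1)]
        for weights in conv_filters[r]:
            activations = [max(0.0, dot(weights, w)) for w in windows]  # ReLU
            pooled.append(max(activations))  # max-over-time pooling
    return pooled

def predict(query):
    feats = features(query)  # dropout (rate 0.2) would apply in training only
    logits = [dot(weights, feats) for weights in output_weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return sorted(zip(CLASSES, [e / total for e in exps]), key=lambda p: -p[1])

print(predict("room essentials full size bedding sheet set"))
```

With five region sizes and 64 filters each, every query is reduced to a 320-dimensional feature vector regardless of its length, and only the output layer changes size from one attribute to the next.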
Evaluation Metrics
Precision of a query: # of correctly predicted attribute values over the total # of predictions the classifier makes for that query
• The higher the precision, the more accurate the predictions are.
Recall of a query: # of correctly predicted attribute values over the total # of attribute values for that query in the test set
• The higher the recall, the more of the test set's attribute values are covered.
Top-N accuracy:
• For a query, if any of the top N predictions is relevant, it scores 1, otherwise 0.
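The three per-query metrics can be written down directly. The predictions and relevant values below are made up for illustration.

```python
def precision(predicted, relevant):
    """Correct predictions over all predictions for the query."""
    if not predicted:
        return 0.0
    return len(set(predicted) & set(relevant)) / len(predicted)

def recall(predicted, relevant):
    """Correct predictions over all relevant values for the query."""
    if not relevant:
        return 0.0
    return len(set(predicted) & set(relevant)) / len(relevant)

def top_n_hit(ranked_predictions, relevant, n):
    """1 if any of the top n predictions is relevant, otherwise 0."""
    return 1 if any(p in relevant for p in ranked_predictions[:n]) else 0

# Hypothetical classifier output (best first) and test-set labels.
predicted = ["athletic shoes", "sneakers", "sandals", "boots"]
relevant = {"athletic shoes", "sneakers", "slippers"}

print(precision(predicted, relevant))   # 2 of 4 predictions are correct
print(recall(predicted, relevant))      # 2 of 3 relevant values are found
print(top_n_hit(predicted, relevant, 1))
```

Top-N accuracy over a test set is then the mean of `top_n_hit` across its queries.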
Experimental settings:
Attribute | # of Train Queries | # of Dev Queries | # of Test Queries | # of Classes
Category  | 1.5M               | 12K              | 12K               | ~4K
Evaluation Results
[Figure: results chart annotated “The Jackpot Region”]
Evaluation Results
96% of the time, at least one correct attribute value is in the top 5 predictions
Evaluation Results
The F1 score is the harmonic mean of precision and recall
The more parameters in a model, the better the F1 score
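For reference, with $P$ denoting precision and $R$ denoting recall, the harmonic mean works out to:

```latex
F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2 \cdot P \cdot R}{P + R}
```

When $P = R$ the F1 score equals that common value, and it falls toward the smaller of the two as they diverge.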
Takeaway
• Our classifiers achieve precision and recall above 90%, and have an
accuracy of top 5 predictions above 96%
• With the classification pipeline, a new model can be automatically
generated on any attribute within 18 hours
• By using state-of-the-art neural network techniques, in conjunction
with customized Solr components, we have improved our search
relevancy by more than 20%
Questions ?
THANK YOU

Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Dernier

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Using Deep Learning and Customized Solr Components to Improve Search Relevancy at Target - Aashish Dattani, Richard Wang & Sunil Srinivasan, Target

  • 2. USING DEEP LEARNING AND CUSTOMIZED SOLR COMPONENTS TO IMPROVE SEARCH RELEVANCY AT TARGET
  • 3. Target • 1,855 stores in the United States • 39 distribution centers in the United States • 350,000+ team members worldwide • Online business at target.com • Global offices in China, Hong Kong and India
  • 4. About us • Sunil Srinivasan, Lead Engineer • Aashish Dattani, Lead AI Engineer • Richard Wang, Principal AI Engineer
  • 5. Agenda • Solr at Target • Architecture Overview • Solr Components • Deep Learning
  • 6. Moved away from proprietary engine to Solr • Growing index by the day • Highly performant engine • Customized for relevancy and store availability • 5 years on Solr • 2+ million SKUs • P95–
  • 8. Querying Solr: searchable attributes using the eDisMax query parser • Title - Women's Sling Backpack - Universal Thread • Category - Women > Women's Accessories > Handbags > Fashion Backpacks • Item Type - Backpacks • Description - Keep your essentials close at hand with this Sling Backpack from Universal Thread™. • Augmented/Normalized data – feet to ft, quart to qt, in to inch, " to inch, etc.
  • 9. Querying Solr: recall and precision controlled by a combination of • Category/Attribute classification (bq parameter) – “student desk” belongs to the ‘desks’ category/SKU hierarchy • Filtering based on attributes (fq parameter) – “student desk” restricts to the ‘desks’, ‘hutch tops’, ‘kids desk’ categories • Elevate to show list of most popular items (customized component) – query to popular SKU based on ranking signal • Precision component that filters out SKUs based on a threshold
  • 10. Solr Components: custom components. To improve relevancy, we use a combination of customized post filters and components • Precision Control (post filter) • Score Combination Function (post filter) • Custom Elevate (component)
  • 11. Precision Control • Two-pass problem: filter out documents based on score distribution; this requires us to do two passes! • Solution: the post-filter API has collect() and finish() methods; do the first pass in collect() and the second pass in finish() • [Figure: score vs. doc rank, with a 40% marker]
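The two-pass idea on this slide can be sketched outside Solr. The following is a minimal Python stand-in, not the Solr post-filter API itself (in Solr the equivalent logic would live in a Java `DelegatingCollector`). The class name `PrecisionPostFilter` and the cut-off rule (keep documents scoring at least a fraction of the maximum score) are assumptions for illustration; the slides do not state the actual threshold rule.

```python
class PrecisionPostFilter:
    """Toy two-pass filter: buffer in collect(), decide in finish().

    Hypothetical rule: keep docs whose score is at least
    `min_fraction_of_max` times the best score seen.
    """

    def __init__(self, min_fraction_of_max):
        self.min_fraction_of_max = min_fraction_of_max
        self.buffered = []  # (doc_id, score) pairs seen during the first pass

    def collect(self, doc_id, score):
        # First pass: the score distribution is not known yet, so only buffer.
        self.buffered.append((doc_id, score))

    def finish(self):
        # Second pass: the full distribution is known, so apply the cut-off.
        if not self.buffered:
            return []
        max_score = max(score for _, score in self.buffered)
        cutoff = self.min_fraction_of_max * max_score
        return [doc_id for doc_id, score in self.buffered if score >= cutoff]


post_filter = PrecisionPostFilter(min_fraction_of_max=0.4)
for doc_id, score in [("d1", 10.0), ("d2", 6.0), ("d3", 3.0), ("d4", 5.0)]:
    post_filter.collect(doc_id, score)
kept = post_filter.finish()  # d3 falls below 40% of the top score
```

Buffering every matching document costs memory proportional to the result set, which is the usual trade-off of doing the second pass inside the same request.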
  • 13. Combination Function: SKU Attribute Score 1, SKU Attribute Score 2, SKU Attribute Score 3, …, SKU Attribute Score N → Combination Function → Final Doc Score
  • 14. Combining scores: different scoring functions • Linear weighted combination: $w_1 s_1 + w_2 s_2 + \dots + w_N s_N$ • Polynomial combination: $w_1 s_1^{n_1} + w_2 s_2^{n_2} + \dots + w_N s_N^{n_N}$ • Step functions – Different functions based on score tier – Each tier optimizes for a different metric
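The three families of combination functions can be written down directly. This is a sketch under assumptions: the function names are hypothetical, and the step function here picks its tier from the first score in the list, which is one possible reading of "different functions based on score tier".

```python
def linear_combination(scores, weights):
    # w1*s1 + w2*s2 + ... + wN*sN
    return sum(w * s for w, s in zip(weights, scores))


def polynomial_combination(scores, weights, exponents):
    # w1*s1^n1 + w2*s2^n2 + ... + wN*sN^nN
    return sum(w * s ** n for w, s, n in zip(weights, scores, exponents))


def step_combination(scores, tiers, default):
    """Pick a combination function by tier.

    `tiers` is a list of (lower_bound, function) pairs ordered from the
    highest tier down; the tier is chosen from scores[0] (an assumption).
    """
    primary = scores[0]
    for lower_bound, combine in tiers:
        if primary >= lower_bound:
            return combine(scores)
    return default(scores)


weights = [1.0, 0.5]
linear = linear_combination([2.0, 3.0], weights)
poly = polynomial_combination([2.0, 3.0], weights, [2, 1])
tiers = [(1.0, lambda s: linear_combination(s, weights))]
high_tier = step_combination([2.0, 3.0], tiers, default=lambda s: s[0])
low_tier = step_combination([0.5, 3.0], tiers, default=lambda s: s[0])
```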
  • 15. Signal sources: looking up values • Category/Brand/Attribute boost – Reverse index – e.g. brand:goodfellow^20 • SKU-level query-dependent boost – Reverse index – e.g. sku:1145367 is top selling SKU for a given query • SKU-level query-independent boost – Forward index (docValues) – e.g. sku:1145367 based on newness
  • 16. Elevate component • Description: – Force certain results to the top of the ranking order – Takes precedence over other sort profiles (e.g. score) • Limitations: – Can only read from a static .xml file – Does not allow for reading ranks from different sources
  • 17. Multi-sort in Solr
      Doc ID | Elevate Rank | Price  | Score
      d1     | 90           | $10.99 | 500
      d2     | 60           | $3.99  | 400
      d3     | 90           | $7.99  | 300
      d4     | 100          | $12.99 | 200
      d5     | 80           | $10.99 | 100
      Result ordering: d4, d3, d1, d5, d2
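The result ordering on this slide is consistent with sorting by elevate rank descending, then price ascending, then score descending. The slide does not spell out the sort directions, so they are inferred here from the ordering shown; the field names are hypothetical.

```python
docs = [
    {"id": "d1", "elevate_rank": 90, "price": 10.99, "score": 500},
    {"id": "d2", "elevate_rank": 60, "price": 3.99, "score": 400},
    {"id": "d3", "elevate_rank": 90, "price": 7.99, "score": 300},
    {"id": "d4", "elevate_rank": 100, "price": 12.99, "score": 200},
    {"id": "d5", "elevate_rank": 80, "price": 10.99, "score": 100},
]

# Multi-key sort: negate the keys that should sort in descending order.
ordered = sorted(
    docs,
    key=lambda d: (-d["elevate_rank"], d["price"], -d["score"]),
)
result_ordering = [d["id"] for d in ordered]
```

Note that d3 precedes d1 only because of the price tie-break: both have elevate rank 90, and score is consulted last.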
  • 18. Custom Elevate: customized features • Bury SKUs to the bottom of the result list • Input elevated values via URL parameters – e.g. …&elevate=sku:1,sku:2,sku:3&bury=sku:10,sku:11 • Read elevated signals from doc values (forward lookup) – e.g. store availability etc.
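A rough sketch of the elevate/bury behaviour, applied to an already-ranked list of SKU ids. The `elevate` and `bury` parameter names come from the slide's example URL; everything else (the function names, ignoring ids that are not in the result set, letting elevate win over bury) is an assumption, and the real component runs inside Solr rather than as post-processing.

```python
from urllib.parse import parse_qs


def _ids(params, name):
    # Read a comma-separated id list from a parsed query string.
    raw = params.get(name, [""])[0]
    return [value for value in raw.split(",") if value]


def apply_elevate_and_bury(query_string, ranked_ids):
    params = parse_qs(query_string)
    present = set(ranked_ids)
    # Elevated ids keep the order given in the URL; unknown ids are ignored.
    top = [i for i in _ids(params, "elevate") if i in present]
    bottom = [i for i in _ids(params, "bury") if i in present and i not in top]
    pinned = set(top) | set(bottom)
    middle = [i for i in ranked_ids if i not in pinned]
    return top + middle + bottom


reordered = apply_elevate_and_bury(
    "q=desk&elevate=sku:1,sku:2,sku:3&bury=sku:10,sku:11",
    ["sku:10", "sku:7", "sku:2", "sku:1", "sku:11", "sku:5"],
)
```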
  • 19. Query Understanding Objective: To accurately and fully understand user intent (in terms of product attributes) based on input search query. Example query: “c9 running shoes for boys” • Brand: C9 Champion • Gender: male • Item type: athletic shoes, sneakers • Age group: kids, toddler, junior • Material: polyester, plastic, nylon We treat this as a classification problem, and we designed a classification framework that, for each product attribute, can automatically generate a model to classify any query into that attribute.
  • 20. Query Classification Overview First, we gather abundant training data 1. User searches → behavior data (click, add to cart, purchase, etc.) 2. Product attributes (categories, colors, sizes, brands, gender, etc.) Second, we train machine-learned models (per attribute) Training data consists of a list of (query, attribute value) pairs: • For category attribute: (“shoes”, athletic shoes), (“shoes”, sneakers), etc. During prediction (serving) time Input: any search query (e.g. “student desk”) Output: a list of predicted attribute values (e.g. desks, kids desk, hutch tops, etc.), each with a probability, that are passed to Solr via the bq, fq, and a custom parameter.
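One way the predicted attribute values could be turned into Solr request parameters is sketched below. `defType`, `q`, `fq` and `bq` are standard Solr parameters, but the field name `category`, the probability cut-off, and the mapping from probability to boost are all assumptions; the slides do not say how probabilities are translated, and the custom parameter is omitted.

```python
def build_solr_params(query, predictions, field="category",
                      min_prob=0.1, max_boost=20.0):
    """Build eDisMax parameters from (attribute value, probability) pairs."""
    kept = [(value, prob) for value, prob in predictions if prob >= min_prob]
    params = {"defType": "edismax", "q": query}
    if kept:
        # fq restricts recall to the predicted values.
        values = " OR ".join(f'"{value}"' for value, _ in kept)
        params["fq"] = f"{field}:({values})"
        # bq boosts each value in proportion to its probability.
        params["bq"] = [
            f'{field}:"{value}"^{round(prob * max_boost, 2)}'
            for value, prob in kept
        ]
    return params


params = build_solr_params(
    "student desk",
    [("desks", 0.7), ("kids desk", 0.2), ("hutch tops", 0.1), ("sandals", 0.01)],
)
```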
  • 21. Classification Pipeline • Preparation: Search & Click Data and the Product Catalog feed the Query-Attribute Aggregator and Attribute Extractor, producing Query-Attribute training data • Training: Query-Attribute training data → N-gram Convolutional Neural Network → Query Classification Model • Prediction: Input Query → Query Classification Model → List of predicted attribute values
  • 22. Training Data Preparation. We use (1) clickstream and (2) product attribute data: (1) Search query → Product SKUs clicked/carted/purchased – Past 2 years of clickstream data, 1.5M+ unique queries post-filtering (2) Product SKU → Product attribute values – Attributes (categories, gender, brands, etc.) are from Target’s item catalog (2M+ SKUs) Combining (1) and (2) above, we get: • Search query → list of attribute values, each with a score • Score of an attribute value V given a query Q is: $\text{score}(V, Q) \approx P(V \mid Q) = \frac{\#\text{ of times } V \text{ is clicked, carted, or purchased given } Q}{\text{total } \#\text{ of occurrences of } Q}$ – For the category attribute & for the query “running shoes”: athletic shoes (0.5), sneakers (0.2), … sandals (0.01)
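The join of clickstream and catalog data can be sketched as a simple count-and-divide. This is a minimal illustration with made-up data; the function name is hypothetical, and it assumes each click, cart, or purchase event counts once and that "occurrences of Q" means the number of such events for Q.

```python
from collections import Counter, defaultdict


def attribute_scores(interactions, sku_attributes):
    """Estimate P(V | Q) from (query, sku) events and a sku -> values map."""
    query_counts = Counter()
    value_counts = defaultdict(Counter)
    for query, sku in interactions:
        query_counts[query] += 1
        for value in sku_attributes.get(sku, []):
            value_counts[query][value] += 1
    return {
        query: {value: count / query_counts[query]
                for value, count in counts.items()}
        for query, counts in value_counts.items()
    }


# Made-up events and catalog entries, for illustration only.
interactions = [
    ("running shoes", "sku_a"),
    ("running shoes", "sku_a"),
    ("running shoes", "sku_b"),
    ("running shoes", "sku_c"),
]
sku_attributes = {
    "sku_a": ["athletic shoes"],
    "sku_b": ["sneakers"],
    "sku_c": ["athletic shoes"],
}
scores = attribute_scores(interactions, sku_attributes)
```

If a SKU carries several values for the same attribute, the scores for a query can sum to more than 1 under this counting scheme.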
  • 23. Neural Model Training. Our hyperparameters: • Embedding dimension: d = 100 • Region sizes (n-grams): 1, 2, 3, 4, 5 • Filters per region: 64 • Drop-out rate: 0.2 • Max tokens per query: 10 • # of output classes: varies depending on attribute • Example query (from the figure): “room essentials full size bedding sheet set”
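To make the hyperparameters concrete, the sketch below prepares the slide's example query and counts the n-gram windows each region size would see. It is not the model itself. Whitespace tokenization, right-padding with a `<pad>` token, and max-over-time pooling (one value per filter, as in common n-gram CNN text classifiers) are assumptions not confirmed by the slides.

```python
MAX_TOKENS = 10
REGION_SIZES = (1, 2, 3, 4, 5)
FILTERS_PER_REGION = 64
PAD = "<pad>"  # hypothetical padding token


def prepare_tokens(query):
    # Truncate to MAX_TOKENS, then right-pad to a fixed length.
    tokens = query.lower().split()[:MAX_TOKENS]
    return tokens + [PAD] * (MAX_TOKENS - len(tokens))


def ngram_windows(tokens, n):
    # Every contiguous window of n tokens that a size-n filter slides over.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = prepare_tokens("room essentials full size bedding sheet set")
windows_per_region = {n: len(ngram_windows(tokens, n)) for n in REGION_SIZES}

# With max-over-time pooling, each filter contributes one value, so the
# vector fed to the output layer has regions * filters entries.
pooled_feature_size = len(REGION_SIZES) * FILTERS_PER_REGION
```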
  • 24. Evaluation Metrics • Precision of a query: # of correct predicted attribute values over total # of predictions for that query from the classifier – The higher the precision, the more accurate the predictions are. • Recall of a query: # of correct predicted attribute values over total # of attribute values there are for that query in the test set – The higher the recall, the more coverage of those attribute values in the test set. • Top-N accuracy: for a query, if any of the top N predictions is relevant, then it scores a 1, otherwise 0. • Experimental settings: Attribute: Category; # of Train Queries: 1.5M; # of Dev Queries: 12K; # of Test Queries: 12K; # of Classes: ~4K
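The per-query metrics defined on this slide translate directly into code. The function names and the example values below are illustrative only.

```python
def precision(predicted, relevant):
    # Correct predictions over all predictions for the query.
    predicted, relevant = set(predicted), set(relevant)
    return len(predicted & relevant) / len(predicted) if predicted else 0.0


def recall(predicted, relevant):
    # Correct predictions over all relevant values in the test set.
    predicted, relevant = set(predicted), set(relevant)
    return len(predicted & relevant) / len(relevant) if relevant else 0.0


def top_n_hit(ranked_predictions, relevant, n):
    # 1 if any of the top-n predictions is relevant, otherwise 0.
    return int(any(value in relevant for value in ranked_predictions[:n]))


ranked = ["desks", "kids desk", "sofas", "hutch tops"]
relevant = {"desks", "hutch tops", "office chairs"}

p = precision(ranked, relevant)         # 2 of 4 predictions are correct
r = recall(ranked, relevant)            # 2 of 3 relevant values are found
hit_at_1 = top_n_hit(ranked, relevant, 1)
miss_at_1 = top_n_hit(["sofas", "desks"], relevant, 1)
```

Top-N accuracy over a test set is then the mean of `top_n_hit` across queries.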
  • 26. Evaluation Results: 96% of the time, at least one correct attribute value is in the top 5 predictions
  • 27. Evaluation Results • The F1 score is the harmonic mean of precision and recall • The more parameters in a model, the better the F1 score
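For reference, the harmonic mean mentioned here can be computed as follows; returning 0 when both inputs are 0 is a common convention, assumed here.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.9, 0.9)
skewed = f1_score(0.5, 1.0)  # the harmonic mean is pulled toward the lower value
```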
  • 28. Takeaway • Our classifiers achieve precision and recall above 90%, and have an accuracy of top 5 predictions above 96% • With the classification pipeline, a new model can be automatically generated on any attribute within 18 hours • By using state-of-the-art neural network techniques, in conjunction with customized Solr components, we have improved our search relevancy by more than 20%