© 2018 Bloomberg Finance L.P. All rights reserved.
Learning to Rank:
From Theory to Production
Malvina Josephidou & Diego Ceccarelli
Bloomberg
@malvijosephidou | @diegoceccarelli
#Activate18 #ActivateSearch
About Us
o Software Engineers at Bloomberg
o Working on relevance of the News search engine
o Before joining, PhDs in ML and IR
Bloomberg – Who are we?
o A technology company with
5,000+ software engineers
o Financial data, analytics,
communication and trading tools
o More than 325K subscribers in
170 countries
News search in numbers
How do we retrieve relevant results?
o Use relevance
functions to assign
scores to each
matching document
o Sort documents by
relevance score
Relevance scores (example): 50.1, 35.5, 10.2
How do we design relevance functions?
score = tf(body)
+ 5.2 x tf(title)
+ 4.5 x tf(desc)
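A minimal sketch of what such a hand-tuned function looks like in code, assuming a naive whitespace tokenizer and three made-up documents (the `docs` list, the `tf` helper and the `rank` helper are hypothetical, only the weights 5.2 and 4.5 come from the slide):

```python
# Hand-tuned relevance function: a weighted sum of term frequencies.
# The tokenizer and the sample documents are illustrative assumptions.

def tf(term, text):
    """Count occurrences of `term` in `text` (case-insensitive, whitespace tokens)."""
    return text.lower().split().count(term.lower())

def score(query, doc):
    """score = tf(body) + 5.2 x tf(title) + 4.5 x tf(desc)"""
    return (tf(query, doc["body"])
            + 5.2 * tf(query, doc["title"])
            + 4.5 * tf(query, doc["desc"]))

docs = [
    {"id": "a", "title": "Solr tips", "desc": "Tuning Solr", "body": "solr solr search"},
    {"id": "b", "title": "Cooking", "desc": "Pasta", "body": "solr mentioned once"},
    {"id": "c", "title": "Solr", "desc": "News", "body": "nothing relevant"},
]

def rank(query, documents):
    """Score every matching document, then sort by descending relevance score."""
    scored = [(score(query, d), d["id"]) for d in documents]
    return sorted(scored, reverse=True)

ranked = rank("solr", docs)
ranked_ids = [doc_id for _, doc_id in ranked]
```

Every new signal (freshness, popularity, author) would add another weight to guess by hand, which is the problem the next slides describe.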
Good Luck With That…
Example queries: Solr, Italy, Facebook, Trump
score = tf(body)
+ 5.2 x tf(title)
+ 4.5 x tf(desc)
+ ??? x doc-length
+ ??? x freshness
+ ??? x popularity
+ ??? x author
+ ..... ?????
?
How do we come up with ranking functions?
o We don’t. Hand-tuning ranking functions is
insane
o ML to the rescue: Use data to train algorithms
to distinguish relevant documents from
irrelevant documents
2018 achievement
Learning-to-Rank fully deployed in production
Learning-to-Rank
(aka LTR)
Use machine learning algorithms to rank
results in a way that optimizes search
relevance
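How does this work?
o The user query runs against the index (People, Commodities, News, other sources)
o Without LTR: top-k retrieval returns the results directly
o With LTR: top-x retrieval feeds a re-ranking model, which returns the top-k re-ranked results
o x >> k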
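The two-phase flow can be sketched as follows. This is an illustration under stated assumptions, not the production code: the `index` contents, the field names `match` and `popularity`, and both scoring functions are hypothetical.

```python
# Two-phase ranking: a cheap score picks x candidates, a costlier model
# re-ranks only those candidates and returns the best k (x >> k).

def retrieve_top_x(index, cheap_score, x):
    """First phase: keep the x best documents by the cheap retrieval score."""
    return sorted(index, key=cheap_score, reverse=True)[:x]

def rerank_top_k(candidates, model_score, k):
    """Second phase: apply the model only to the candidates, keep the best k."""
    return sorted(candidates, key=model_score, reverse=True)[:k]

index = [
    {"id": 1, "match": 3.0, "popularity": 10},
    {"id": 2, "match": 2.5, "popularity": 500},
    {"id": 3, "match": 2.0, "popularity": 50},
    {"id": 4, "match": 0.5, "popularity": 9999},
    {"id": 5, "match": 0.1, "popularity": 1},
]

candidates = retrieve_top_x(index, lambda d: d["match"], x=3)
reranked = rerank_top_k(
    candidates,
    lambda d: d["match"] + 0.01 * d["popularity"],  # stand-in for a trained model
    k=2,
)
reranked_ids = [d["id"] for d in reranked]
```

Document 4 never reaches the model even though its popularity is high, because it did not make the first-phase cut. That is the trade-off of re-ranking: the model can only reorder what retrieval already found.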
How to Ship LTR in Production in 3 Steps
Make it Work
Make it Fast
Deploy to Production
LTR steps
I. Collect query-document judgments [Offline]
II. Extract query-document features [Solr]
III. Train model with judgments + features [Offline]
IV. Deploy model [Solr]
V. Apply model [Solr]
VI. Evaluate results [Offline]
I. Collect judgments
o Binary judgment: good/bad
o Graded judgment (5 stars), e.g. 3/5, 5/5, 0/5
I. Collect judgments
Explicit – judges assess search results manually
o Experts
o Crowdsourced
Implicit – infer assessments through user behavior
o Aggregated result clicks
o Query reformulation
o Dwell time
II. Extract Features
Signals that give an indication of a result’s importance
Query matches the title   Freshness   Is it from bloomberg.com?   Popularity
0                         0.7         0                           3583
1                         0.9         1                           625
0                         0.1         0                           129
II. Extract Features
o Define features to extract in
myFeatures.json
o Deploy features definition file
to Solr
curl -XPUT 'http://localhost:8983/solr/myCollection/schema/feature-store' --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'
[
  {
    "name": "matchTitle",
    "type": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!field f=title}${text}"
    }
  },
  {
    "name": "freshness",
    "type": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!func}recip(ms(NOW,timestamp),3.16e-11,1,1)"
    }
  },
  { "name": "isFromBloomberg", … },
  { "name": "popularity", … }
]
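To see what the freshness feature computes: Solr's recip(x,m,a,b) function evaluates a / (m*x + b), and ms(NOW,timestamp) is the document age in milliseconds. The sketch below reproduces that arithmetic in Python; the helper names and the year-length constant are my own, chosen for illustration.

```python
# Reproduces recip(ms(NOW,timestamp),3.16e-11,1,1) outside Solr.
# 3.16e-11 is roughly 1 / (milliseconds in a year), so the value halves
# after about one year.

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.156e10 milliseconds

def recip(x, m, a, b):
    """Same formula as Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)

def freshness(age_ms):
    """Freshness feature value for a document that is `age_ms` milliseconds old."""
    return recip(age_ms, 3.16e-11, 1.0, 1.0)

brand_new = freshness(0)
one_year_old = freshness(MS_PER_YEAR)
ten_years_old = freshness(10 * MS_PER_YEAR)
```

A brand-new document scores 1.0, a one-year-old document about 0.5, and the value keeps decaying towards 0 with age.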
II. Extract Features
o Add features transformer to Solr config
<!-- Document transformer adding feature vectors with each retrieved document -->
<transformer name="features"
class="org.apache.solr…LTRFeatureLoggerTransformerFactory" />
o Request features for document by adding [features] to the fl parameter
http://localhost:8983/solr/myCollection/query?q=test&fl=title,url,[features]
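Once [features] is in the fl parameter, each returned document carries its feature vector as a string. The sketch below turns that string into a dict for building training data. It assumes a comma-separated name=value layout; the exact layout depends on the Solr version and transformer settings, so the sample document and both separators are assumptions to check against a real response.

```python
# Parse the "[features]" value of a Solr document into {name: value}.
# The "name=value,name=value" layout is an assumption, not a guarantee.

def parse_feature_vector(raw, feature_sep=",", kv_sep="="):
    """Split a logged feature string into a dict of floats."""
    features = {}
    for pair in raw.split(feature_sep):
        if not pair.strip():
            continue  # tolerate trailing separators
        name, _, value = pair.partition(kv_sep)
        features[name.strip()] = float(value)
    return features

# Hypothetical document, shaped like one entry of a Solr JSON response.
doc = {
    "title": "Example story",
    "[features]": "matchTitle=0.0,freshness=0.7,isFromBloomberg=0.0,popularity=3583.0",
}

vector = parse_feature_vector(doc["[features]"])
```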
III. Train Model
o Combine query-document judgments & features into training data
file
o Train ranking model offline
• RankSVM (1) [liblinear]
• LambdaMART (2) [ranklib]
Example: Linear model
score = 1.2 x matchTitle + 10 x popularity
+ 5.4 x isFromBloomberg - 2 x freshness
(1) T. Joachims, Optimizing Search Engines Using Clickthrough Data,
Proceedings of the ACM Conference on Knowledge Discovery and Data
Mining (KDD), ACM, 2002.
(2) C.J.C. Burges, "From RankNet to LambdaRank to LambdaMART: An
Overview", Microsoft Research Technical Report MSR-TR-2010-82, 2010.
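A rough sketch of both halves of this step: writing judgments plus features as training lines, and applying the example linear model. The line layout (`label qid:N index:value ...`) is the SVMlight-style ranking format read by tools such as RankLib. Pairing the 3/5, 5/5 and 0/5 judgments with the three feature rows shown earlier is an assumption made only for illustration.

```python
# Feature order used for the numbered columns of each training line.
FEATURES = ["matchTitle", "freshness", "isFromBloomberg", "popularity"]

# Weights of the example linear model from the slide.
WEIGHTS = {"matchTitle": 1.2, "popularity": 10.0,
           "isFromBloomberg": 5.4, "freshness": -2.0}

def to_training_line(label, qid, values):
    """Format one judged query-document pair as 'label qid:N 1:v1 2:v2 ...'."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(values, start=1))
    return f"{label} qid:{qid} {feats}"

def linear_score(values):
    """score = sum of weight x feature value, as in the example model."""
    return sum(WEIGHTS[name] * v for name, v in zip(FEATURES, values))

# (judgment, feature values); the pairing is hypothetical.
rows = [
    (3, [0, 0.7, 0, 3583]),
    (5, [1, 0.9, 1, 625]),
    (0, [0, 0.1, 0, 129]),
]

training_lines = [to_training_line(label, 1, values) for label, values in rows]
scores = [linear_score(values) for _, values in rows]
```

With these weights the raw popularity count dominates the score, which hints at why feature scaling matters before training a linear model.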
IV. Deploy Model
o Generate trained output model
in myModelName.json
o Deploy model definition file to
Solr
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myModelName.json" -H 'Content-type:application/json'
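For a linear model, the uploaded file can be generated from the trained weights. The sketch below follows the JSON layout Solr's LTR module uses for `org.apache.solr.ltr.model.LinearModel` (class, name, features, params.weights). The weights are the ones from the example model above, and the helper name is hypothetical.

```python
import json

def linear_model_json(name, weights):
    """Serialize feature weights as a Solr LTR LinearModel definition."""
    model = {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        # Every feature must already exist in the feature store.
        "features": [{"name": feature} for feature in weights],
        "params": {"weights": dict(weights)},
    }
    return json.dumps(model, indent=2)

WEIGHTS = {"matchTitle": 1.2, "popularity": 10.0,
           "isFromBloomberg": 5.4, "freshness": -2.0}

model_text = linear_model_json("myModelName", WEIGHTS)
parsed = json.loads(model_text)
```

Writing `model_text` to myModelName.json gives the file the curl command uploads.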
V. Re-rank Results
o Add LTR query parser to Solr config
<!-- Query parser used to re-rank top docs with a provided model -->
<queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>
o Search and re-rank results
http://localhost:8983/solr/myCollection/query?q=AAPL&rq={!ltr model="myModelName" reRankDocs=100}
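When the request is built programmatically, the braces, spaces and `!` in the rq parameter need URL encoding. A small sketch using only the standard library follows; the helper name and the `fl` value are assumptions, while the host, collection and model name are the ones from the example URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def ltr_query_url(base, query, model, rerank_docs, fields="title,url,score"):
    """Build a Solr query URL that re-ranks the top `rerank_docs` hits with `model`."""
    params = {
        "q": query,
        "rq": f"{{!ltr model={model} reRankDocs={rerank_docs}}}",
        "fl": fields,
    }
    return f"{base}?{urlencode(params)}"

url = ltr_query_url(
    "http://localhost:8983/solr/myCollection/query",
    query="AAPL",
    model="myModelName",
    rerank_docs=100,
)

# Round-trip check: decoding the URL gives back the readable rq parameter.
decoded_rq = parse_qs(urlsplit(url).query)["rq"][0]
```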
VI. Evaluate quality of search
Precision
relevant results returned divided by the total number of results returned
Recall
relevant results returned divided by the total number of relevant results for the query
NDCG
normalized discounted cumulative gain
Image credit: https://commons.wikimedia.org/wiki/File:Precisionrecall.svg by User:Walber
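The three metrics can be sketched in a few lines. The NDCG version below uses the common gain 2^grade - 1 with a log2 rank discount; other gain definitions exist, so treat that choice and the sample data as assumptions.

```python
import math

def precision(returned, relevant):
    """Relevant results returned / total results returned."""
    if not returned:
        return 0.0
    return len(set(returned) & set(relevant)) / len(returned)

def recall(returned, relevant):
    """Relevant results returned / total relevant results for the query."""
    if not relevant:
        return 0.0
    return len(set(returned) & set(relevant)) / len(relevant)

def dcg(grades):
    """Discounted cumulative gain of graded judgments listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))

def ndcg(grades):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

returned = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
p = precision(returned, relevant)
r = recall(returned, relevant)
best_order = ndcg([5, 3, 0])   # judgments already in ideal order
worst_order = ndcg([0, 3, 5])  # best document ranked last
```

Unlike precision and recall, NDCG rewards putting the highest-graded documents first, which is why it suits graded (5-star) judgments.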
How to Ship LTR in Production in 3 Steps
Make it Work
Make it Fast
Deploy to Production
Are We Good to Go?
o Mid 2017: the infrastructure bit
seemed to be done.
o We had a ‘placeholder’ feature store.
But, we needed useful features
computed in production.
o We deployed a new shiny feature
store…
There’s no model without features…
Search latency when we rolled out a
new set of features
Why was it slow?
Which features were to blame?
Metrics on feature latency
o We added instrumentation to Apache Solr to log
the time it took to compute each feature on each
document.
o We added analytics on top of that.
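The analytics step can be as simple as grouping the logged timings by feature. The record layout below (feature name, document id, microseconds) is hypothetical, since the talk does not show the real log format. SlothFeature is the slow feature named later in the deck; the other names and all timings are invented.

```python
from collections import defaultdict
from statistics import median

def summarize(records):
    """Aggregate per-document timings into per-feature stats, slowest first."""
    per_feature = defaultdict(list)
    for feature, _doc_id, micros in records:
        per_feature[feature].append(micros)
    summary = [
        (feature, {"count": len(times), "total": sum(times), "median": median(times)})
        for feature, times in per_feature.items()
    ]
    # Sort by total time so the most expensive feature is listed first.
    return sorted(summary, key=lambda item: item[1]["total"], reverse=True)

records = [
    ("matchTitle", "d1", 40), ("matchTitle", "d2", 60),
    ("SlothFeature", "d1", 1400), ("SlothFeature", "d2", 1500),
    ("freshness", "d1", 5), ("freshness", "d2", 7),
]

summary = summarize(records)
slowest_feature, slowest_stats = summary[0]
```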
Why is it slow?
o 1.9ms is ok for 10 documents, but not for 100 docs…
Feature latencies (charts of total feature time per doc):
o Old set of features: total time per search 19ms
o New set of features, with SlothFeature: total time per search 145ms
o New set of features, including the FastSlothFeature: total time per search 15ms
How do we go faster?
o Some of the features are unrelated to
the query. For example: is the source
reliable?
o We can precompute them and store
them in the index.
Index Static Features
• Add feature_is_wire_BLAH to the Solr schema
<field name="feature_is_wire_BLAH" type="tdouble" indexed="false" stored="true"
docValues="false" required="false"/>
• Use UpdateRequestProcessors in Solr config to create new fields
<updateRequestProcessorChain>
  <processor class="com.internal.solr.update.processor.IsFoundUpdateProcessorFactory">
    <str name="source">wire</str>
    <str name="dest">feature_is_wire_BLAH</str>
    <str name="map">
      {
        "BLAH" : 1.0
      }
    </str>
    <double name="default">0.0</double>
  </processor>
</updateRequestProcessorChain>
Index Static Features
• Features at runtime are produced by reading the value from the
index using the FieldValueFeature.
"features":[
{
"type":"ltr.feature.impl.FieldValueFeature",
"params": {
"field": "feature_is_wire_BLAH"
},
"name": "is_wire_BLAH",
"default": 0.0
},
Would this ease all our performance troubles?
Better?
Why is this happening to us?
Reading from the index can still be slow…
o Retrieving field values from stored fields is slow
o So we changed the implementation of
FieldValueFeature to use DocValues if they are
available
o DocValues record field values in a column-oriented
way, mapping doc ids to values
From Stored Fields to Doc Values
Feature latencies when retrieving 100 docs
o Using StoredFields: total time per search 135ms
o Using DocValues: total time per search 25ms
o About 5x faster!
Are we there yet?
Feature logging/computation
Document re-ranking
Evaluating performance: NO-OP Model
o A linear model, performing all the computations but not
modifying the original order
Original Solr Score   Query matches the title   Freshness   Is the document from Bloomberg.com?   Popularity
2.3                   0                         0.7         0                                     3583
2.1                   1                         0.9         1                                     625
1.3                   0                         0.1         0                                     129
No-op weights: 1.0, 0.0, 0.0, 0.0, 0.0
Feature values x no-op weights = Final Score: 2.3, 2.1, 1.3
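The no-op model is just a dot product whose only non-zero weight sits on the original Solr score. The sketch below reproduces the numbers from the table; the feature names in the list are labels of my choosing.

```python
# No-op linear model: every feature is computed and multiplied, but only the
# original Solr score has a non-zero weight, so the ranking cannot change.

FEATURES = ["originalScore", "matchTitle", "freshness", "isFromBloomberg", "popularity"]
NO_OP_WEIGHTS = [1.0, 0.0, 0.0, 0.0, 0.0]

docs = [
    [2.3, 0, 0.7, 0, 3583],
    [2.1, 1, 0.9, 1, 625],
    [1.3, 0, 0.1, 0, 129],
]

def linear_score(weights, values):
    """Dot product of model weights and feature values."""
    return sum(w * v for w, v in zip(weights, values))

final_scores = [linear_score(NO_OP_WEIGHTS, d) for d in docs]
order_unchanged = final_scores == sorted(final_scores, reverse=True)
```

Because the output order is known in advance, any extra latency measured with this model is pure LTR overhead (feature extraction plus re-ranking), not a side effect of different results.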
Need for speed
Setup                               Median search time (ms)
No LTR, retrieve 3 docs             39
LTR, retrieve 3 docs, re-rank 25    77
Why is it so slow???
News needs grouping
o Similar news stories are grouped (clustered) together and only the
top-ranked story in each group is shown
Grouping + Re-ranking:
Not a match made in heaven
o Regular grouping involves 3
communication rounds between
coordinator and shards
o With re-ranking, we have to re-rank
the groups and the
documents in each group
[ SOLR-8776 Support RankQuery in grouping ]
Vegas baby
What is the Las Vegas Patch?
Grouping
Three requests from the
coordinator:
1. Coordinator asks the shards for their top n
groups for the query and
computes the overall top n groups.
2. Each shard computes the top m
documents for the top n groups.
3. Coordinator selects the top docs
for each group and retrieves them
from the shards.
Why? Example:
o Top 2 groups, top 2 documents, 2 shards

Machine 1
Doc     Group    Score
doc1    group1   20.0
doc2    group2   5.0
doc3    group1   100.0
doc4    group1   120.0
doc5    group2   10.0

Group    Docs   Score
group1   doc4   120.0
         doc1   20.0
group2   doc5   10.0
         doc2   5.0

Machine 2
Doc     Group    Score
doc6    group3   70.0
doc7    group2   65.0
doc8    group3   60.0
doc9    group2   60.0
doc10   group1   50.0

Group    Docs   Score
group3   doc6   70.0
         doc8   60.0
group2   doc7   65.0
         doc9   60.0

Top Groups:
Group1: doc4 doc1
Group3: doc6 doc8
Top docs should be:
Group1: doc4 doc10
Group3: doc6 doc8
Las Vegas Idea
o If you want just one document per group,
you do not have this problem
o We can return the top document from each
group in the first step and skip the second
step entirely
o For LTR: Re-rank only the top document of
each group
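A sketch of the idea using the documents from the two-shard example: each shard returns only the best document of each of its top n groups, and the coordinator merges those lists in a single pass. The function names are hypothetical and this is an illustration of the principle, not the actual patch.

```python
def shard_top_groups(docs, n):
    """On one shard: best (doc, score) per group, limited to the top n groups."""
    best = {}
    for doc_id, group, score in docs:
        if group not in best or score > best[group][1]:
            best[group] = (doc_id, score)
    ranked = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
    return dict(ranked[:n])

def merge_top_groups(shard_results, n):
    """On the coordinator: keep the best doc per group, return the top n groups."""
    merged = {}
    for result in shard_results:
        for group, (doc_id, score) in result.items():
            if group not in merged or score > merged[group][1]:
                merged[group] = (doc_id, score)
    ranked = sorted(merged.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(group, doc_id, score) for group, (doc_id, score) in ranked[:n]]

machine1 = [("doc1", "group1", 20.0), ("doc2", "group2", 5.0),
            ("doc3", "group1", 100.0), ("doc4", "group1", 120.0),
            ("doc5", "group2", 10.0)]
machine2 = [("doc6", "group3", 70.0), ("doc7", "group2", 65.0),
            ("doc8", "group3", 60.0), ("doc9", "group2", 60.0),
            ("doc10", "group1", 50.0)]

top = merge_top_groups(
    [shard_top_groups(machine1, 2), shard_top_groups(machine2, 2)], n=2)
```

With one document per group, the best document of any globally top-n group is always inside its own shard's top n groups, so the merged answer is exact and the second round is unnecessary.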
Show me the numbers
o We made plain old search faster: by
about 40%!
o LTR-served searches are still faster
than they were before we did the Las
Vegas optimization
Method                     Median time   Perc95 time
Normal Grouping (No LTR)   0.20          0.35
Las Vegas (No LTR)         0.12          0.26
Las Vegas+LTR no-op        0.18          0.27
How to Ship LTR in Production in 3 Steps
Make it Work
Make it Fast
Deploy to Production
Where is the model?
Let the LTR hackathon start…
o Write code to process training data
in SVMlight format.
o Wrappers around scikit-learn to
train various linear models, do
regularization, hyperparameter
optimization and model debugging
o Evaluate the model: MAP, NDCG,
MRR
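Two of the evaluation metrics named above, sketched for binary relevance. Each query is a list of 0/1 flags in ranked order. The average-precision variant here divides by the number of relevant documents found in the list, which assumes every relevant document was retrieved. The sample queries are made up.

```python
def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def average_precision(relevances):
    """Mean of precision@rank taken at each relevant result."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean(values):
    """Plain arithmetic mean; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

# Ranked relevance flags for two hypothetical queries.
queries = [[0, 1, 0], [1, 0, 1]]

mrr = mean([reciprocal_rank(q) for q in queries])
map_score = mean([average_precision(q) for q in queries])
```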
A Small Model for LTR, a Giant Step for
Bloomberg
o Released initially to select
internal users
o Then, to all internal users
o Then, to 10% of our clients
o And finally, to 100% of our
clients
It’s a whole new (LTR) world!
o Trying out new features, new models
and classes of models
o Experimenting with different types of
training data
o Rolling out new rankers, for different
types of queries
Take home messages
o Make sure you can measure success
and failure – metrics, metrics, metrics!
o If a feature is static, index it
o Don’t use stored values for static
features, always rely on DocValues
o If you are not happy with the
performance of search, consider a trip
to Las Vegas: you may end up improving
performance by 40%
Thank you! Eυχαριστούμε! Grazie!
And btw: we are hiring a senior search relevance engineer!
bit.ly/2Orb8bc
https://www.bloomberg.com/careers

Contenu connexe

Tendances

Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSujit Pal
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingDatabricks
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive HookMinwoo Kim
 
Neo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4jNeo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4jNeo4j
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model ServingDatabricks
 
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Databricks
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Sease
 
How Lazada ranks products to improve customer experience and conversion
How Lazada ranks products to improve customer experience and conversionHow Lazada ranks products to improve customer experience and conversion
How Lazada ranks products to improve customer experience and conversionEugene Yan Ziyou
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiTimothy Spann
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle Databricks
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéDatabricks
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowFernando Ortega Gallego
 
GPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a ServiceGPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a ServicePivotalOpenSourceHub
 
Training Week: Create a Knowledge Graph: A Simple ML Approach
Training Week: Create a Knowledge Graph: A Simple ML Approach Training Week: Create a Knowledge Graph: A Simple ML Approach
Training Week: Create a Knowledge Graph: A Simple ML Approach Neo4j
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowDatabricks
 

Tendances (20)

Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentation
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Neo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4jNeo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4j
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
Zipline: Airbnb’s Machine Learning Data Management Platform with Nikhil Simha...
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
 
How Lazada ranks products to improve customer experience and conversion
How Lazada ranks products to improve customer experience and conversionHow Lazada ranks products to improve customer experience and conversion
How Lazada ranks products to improve customer experience and conversion
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFi
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlow
 
GPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a ServiceGPORCA: Query Optimization as a Service
GPORCA: Query Optimization as a Service
 
Training Week: Create a Knowledge Graph: A Simple ML Approach
Training Week: Create a Knowledge Graph: A Simple ML Approach Training Week: Create a Knowledge Graph: A Simple ML Approach
Training Week: Create a Knowledge Graph: A Simple ML Approach
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflow
 

Similaire à Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Ceccarelli, Bloomberg

ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...Amazon Web Services
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Amazon Web Services
 
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...Amazon Web Services
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streamsconfluent
 
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Amazon Web Services
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksAmazon Web Services
 
Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...
Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...
Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...Amazon Web Services
 
Analyze your application portfolio to know where the quality and risk issues ...
Analyze your application portfolio to know where the quality and risk issues ...Analyze your application portfolio to know where the quality and risk issues ...
Analyze your application portfolio to know where the quality and risk issues ...Sogeti Nederland B.V.
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Amazon Web Services
 
Talentica - JS Meetup - Angular Schematics
Talentica - JS Meetup - Angular SchematicsTalentica - JS Meetup - Angular Schematics
Talentica - JS Meetup - Angular SchematicsKrishnan Mudaliar
 
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...Jonathan Dion
 
Search and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinSearch and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinNick Pentreath
 
TenYearsCPOptimizer
TenYearsCPOptimizerTenYearsCPOptimizer
TenYearsCPOptimizerPaulShawIBM
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopJulien SIMON
 
Serverless patterns
Serverless patternsServerless patterns
Serverless patternsJesse Butler
 
Agile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged ApplicationsAgile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged ApplicationsWorksoft
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Amazon Web Services
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Romit Mehta
 

Similaire à Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Ceccarelli, Bloomberg (20)

ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
 
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streams
 
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
Smarter Event-Driven Edge with Amazon SageMaker & Project Flogo (AIM204-S) - ...
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech Talks
 
vinay-mittal-new
vinay-mittal-newvinay-mittal-new
vinay-mittal-new
 
Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...
Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...
Using Amazon Mechanical Turk to Crowdsource Data Collection (AIM359) - AWS re...
 
Analyze your application portfolio to know where the quality and risk issues ...
Analyze your application portfolio to know where the quality and risk issues ...Analyze your application portfolio to know where the quality and risk issues ...
Analyze your application portfolio to know where the quality and risk issues ...
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
Talentica - JS Meetup - Angular Schematics
Talentica - JS Meetup - Angular SchematicsTalentica - JS Meetup - Angular Schematics
Talentica - JS Meetup - Angular Schematics
 
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
 
Search and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinSearch and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same Coin
 
TenYearsCPOptimizer
TenYearsCPOptimizerTenYearsCPOptimizer
TenYearsCPOptimizer
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
 
Serverless patterns
Serverless patternsServerless patterns
Serverless patterns
 
Agile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged ApplicationsAgile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged Applications
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 

Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Ceccarelli, Bloomberg

  • 1. © 2018 Bloomberg Finance L.P. All rights reserved. Learning to Rank: From Theory to Production Malvina Josephidou & Diego Ceccarelli Bloomberg @malvijosephidou | @diegoceccarelli #Activate18 #ActivateSearch
  • 2. About Us
      o Software engineers at Bloomberg
      o Working on relevance of the News search engine
      o Before joining, PhDs in ML and IR
  • 3. Bloomberg – Who are we?
      o A technology company with 5,000+ software engineers
      o Financial data, analytics, communication and trading tools
      o More than 325K subscribers in 170 countries
  • 4. News search in numbers
  • 5. How do we retrieve relevant results?
      o Use relevance functions to assign a score to each matching document
      o Sort documents by relevance score
      Example relevance scores: 50.1, 35.5, 10.2
  • 6. How do we design relevance functions?
      score = tf + 5.2 x tf(title) + 4.5 x tf(desc)
  • 7. Good Luck With That…
      query = Solr, query = Italy, query = Facebook, query = Trump
      score = tf(body)
            + 5.2 x tf(title)
            + 4.5 x tf(desc)
            + ??? x doc-length
            + ??? x freshness
            + ??? x popularity
            + ??? x author
            + ..... ?????
  • 8. How do we come up with ranking functions?
      o We don’t. Hand-tuning ranking functions is insane
      o ML to the rescue: use data to train algorithms to distinguish relevant documents from irrelevant documents
  • 9. 2018 achievement: Learning-to-Rank fully deployed in production
  • 10. Learning-to-Rank (aka LTR): use machine learning algorithms to rank results in a way that optimizes search relevance
  • 11. How does this work?
      [Diagram: User Query → Index (People, Commodities, News, Other Sources) → top-x retrieval → re-ranking model → top-k re-ranked, with x >> k; plain search does top-k retrieval directly]
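The retrieve-then-re-rank flow on slide 11 can be sketched roughly as follows. This is a minimal illustration, not Bloomberg's or Solr's implementation: the function name `rerank_top_x`, the candidate tuples and the toy model are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# (doc_id, first-pass relevance score, feature values); all values are made up.
Candidate = Tuple[str, float, Features]


def rerank_top_x(
    candidates: List[Candidate],
    model: Callable[[Features], float],
    x: int,
    k: int,
) -> List[Tuple[str, float]]:
    """Keep the top-x docs by first-pass score, re-score them with the model,
    and return the top-k by model score (x is expected to be larger than k)."""
    first_pass = sorted(candidates, key=lambda c: c[1], reverse=True)[:x]
    rescored = [(doc_id, model(features)) for doc_id, _, features in first_pass]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k]


# Hypothetical candidates; the "model" just returns the popularity feature.
candidates = [
    ("a", 2.3, {"popularity": 10.0}),
    ("b", 2.1, {"popularity": 90.0}),
    ("c", 1.3, {"popularity": 50.0}),
    ("d", 0.4, {"popularity": 99.0}),
]
top = rerank_top_x(candidates, model=lambda f: f["popularity"], x=3, k=2)
print(top)
```

Document "d" has the highest model score but never reaches the model because it falls outside the first-pass top-x, which is the trade-off that choosing x >> k is meant to soften.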
  • 12. How to Ship LTR in Production in 3 Steps
      1. Make it Work
      2. Make it Fast
      3. Deploy to Production
  • 13. LTR steps
      I. Collect query-document judgments [Offline]
      II. Extract query-document features [Solr]
      III. Train model with judgments + features [Offline]
      IV. Deploy model [Solr]
      V. Apply model [Solr]
      VI. Evaluate results [Offline]
  • 14. I. Collect judgments
      o Judgment (good/bad)
      o Judgment (5 stars), e.g. 3/5, 5/5, 0/5
  • 15. I. Collect judgments
      Explicit – judges assess search results manually
        o Experts
        o Crowdsourced
      Implicit – infer assessments through user behavior
        o Aggregated result clicks
        o Query reformulation
        o Dwell time
  • 16. II. Extract Features
      Signals that give an indication of a result’s importance

          Query matches the title | Freshness | Is it from bloomberg.com? | Popularity
          0                       | 0.7       | 0                         | 3583
          1                       | 0.9       | 1                         | 625
          0                       | 0.1       | 0                         | 129
  • 17. II. Extract Features
      o Define features to extract in myFeatures.json

          [
            { "name": "matchTitle",
              "type": "org.apache.solr.ltr.feature.SolrFeature",
              "params": { "q": "{!field f=title}${text}" } },
            { "name": "freshness",
              "type": "org.apache.solr.ltr.feature.SolrFeature",
              "params": { "q": "{!func}recip(ms(NOW,timestamp),3.16e-11,1,1)" } },
            { "name": "isFromBloomberg", … },
            { "name": "popularity", … }
          ]

      o Deploy features definition file to Solr

          curl -XPUT 'http://localhost:8983/solr/myCollection/schema/feature-store' \
               --data-binary "@/path/myFeatures.json" \
               -H 'Content-type:application/json'
  • 18. II. Extract Features
      o Add features transformer to Solr config

          <!-- Document transformer adding feature vectors with each retrieved document -->
          <transformer name="features" class="org.apache.solr…LTRFeatureLoggerTransformerFactory" />

      o Request features for a document by adding [features] to the fl parameter

          http://localhost:8983/solr/myCollection/query?q=test&fl=title,url,[features]
  • 19. III. Train Model
      o Combine query-document judgments & features into training data file
      o Train ranking model offline
          • RankSVM [1] [liblinear]
          • LambdaMART [2] [ranklib]
      Example: Linear model
          score = 1.2 x matchTitle + 10 x popularity + 5.4 x isFromBloomberg - 2 x freshness
      [1] T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.
      [2] C.J.C. Burges, "From RankNet to LambdaRank to LambdaMART: An Overview", Microsoft Research Technical Report MSR-TR-2010-82, 2010.
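To make the linear-model example concrete, the sketch below applies the slide's example weights to the three feature rows shown on slide 16. The helper name `linear_score` and the pairing of those rows with this model are assumptions made for illustration only.

```python
# Example weights copied from the linear model on the slide.
WEIGHTS = {
    "matchTitle": 1.2,
    "popularity": 10.0,
    "isFromBloomberg": 5.4,
    "freshness": -2.0,
}


def linear_score(features, weights=WEIGHTS):
    """Weighted sum of feature values; features missing from the dict count as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


# Feature rows from the "Extract Features" table (slide 16).
docs = [
    {"matchTitle": 0, "freshness": 0.7, "isFromBloomberg": 0, "popularity": 3583},
    {"matchTitle": 1, "freshness": 0.9, "isFromBloomberg": 1, "popularity": 625},
    {"matchTitle": 0, "freshness": 0.1, "isFromBloomberg": 0, "popularity": 129},
]
scores = [linear_score(d) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

With these raw values the popularity term dominates every other feature, which suggests that a real model would probably be trained on scaled or normalized features.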
  • 20. IV. Deploy Model
      o Generate trained output model in myModelName.json
      o Deploy model definition file to Solr

          curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' \
               --data-binary "@/path/myModelName.json" \
               -H 'Content-type:application/json'
  • 21. V. Re-rank Results
      o Add LTR query parser to Solr config

          <!-- Query parser used to re-rank top docs with a provided model -->
          <queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>

      o Search and re-rank results

          http://localhost:8983/solr/myCollection/query?q=AAPL&rq={!ltr model="myModelName" reRankDocs=100}
  • 22. VI. Evaluate quality of search
      o Precision: number of relevant results returned divided by the total number of results returned
      o Recall: number of relevant results returned divided by the total number of relevant results for the query
      o NDCG (normalized discounted cumulative gain)
      Image credit: https://commons.wikimedia.org/wiki/File:Precisionrecall.svg by User:Walber
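The three metrics on this slide can be written down in a few lines. This is a minimal sketch that follows the definitions above; it uses the linear-gain form of DCG (the `2^rel - 1` gain is another common variant), and the document ids are hypothetical. The graded judgments 3, 5, 0 are the star ratings shown on slide 14.

```python
import math
from typing import Sequence, Set


def precision(returned: Sequence[str], relevant: Set[str]) -> float:
    """Relevant results returned / total results returned."""
    if not returned:
        return 0.0
    return sum(1 for doc in returned if doc in relevant) / len(returned)


def recall(returned: Sequence[str], relevant: Set[str]) -> float:
    """Relevant results returned / total relevant results for the query."""
    if not relevant:
        return 0.0
    return sum(1 for doc in set(returned) if doc in relevant) / len(relevant)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


returned = ["a", "b", "c", "d"]   # hypothetical result list
relevant = {"a", "c", "x"}        # hypothetical relevant set
print(precision(returned, relevant), recall(returned, relevant))
print(ndcg([3, 5, 0]))            # judgments in the order the results were shown
```

Precision and recall treat relevance as binary, while NDCG rewards placing the highest-graded documents first, which is why it is often preferred for graded judgments.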
  • 23. How to Ship LTR in Production in 3 Steps
      1. Make it Work
      2. Make it Fast
      3. Deploy to Production
  • 24. Are We Good to Go?
      o Mid 2017: the infrastructure bit seemed to be done.
      o We had a ‘placeholder’ feature store, but we needed useful features computed in production.
      o We deployed a shiny new feature store…
  • 25. There’s no model without features…
      [Figure: search latency when we rolled out a new set of features]
  • 26. Why was it slow? Which features were to blame?
  • 27. Metrics on feature latency
      o We instrumented Apache Solr to log the time it took to compute each feature on each document.
      o We added analytics on top of that.
  • 28. Why is it slow?
      o 1.9ms is ok for 10 documents, but not for 100 docs…
      [Figures: total feature time per doc]
        - Feature latencies using the old set of features. Total time per search: 19ms
        - Feature latencies using the new set of features (SlothFeature). Total time per search: 145ms
        - Feature latencies using the new set of features, including the FastSlothFeature. Total time per search: 15ms
  • 29. How do we go faster?
      o Some of the features are unrelated to the query. For example: is the source reliable?
      o We can precompute them and store them in the index.
  • 30. Index Static Features
      • Add feature_is_wire_BLAH to the Solr schema

          <field name="feature_is_wire_BLAH" type="tdouble" indexed="false" stored="true" docValues="false" required="false"/>

      • Use UpdateRequestProcessors in Solr config to create new fields

          <updateRequestProcessorChain>
            <processor class="com.internal.solr.update.processor.IsFoundUpdateProcessorFactory">
              <str name="source">wire</str>
              <str name="dest">feature_is_wire_BLAH</str>
              <str name="map"> { "BLAH" : 1.0 } </str>
              <double name="default">0.0</double>
            </processor>
          </updateRequestProcessorChain>
  • 31. Index Static Features
      • Features at runtime are produced by reading the value from the index using the FieldValueFeature. Would this ease all our performance troubles?

          "features":[
            {
              "type":"ltr.feature.impl.FieldValueFeature",
              "params": { "field": "feature_is_wire_BLAH" },
              "name": "is_wire_BLAH",
              "default": 0.0
            },
  • 32. Better?
  • 33. Why is this happening to us?
  • 34. Reading from the index can still be slow…
      o Retrieving field values from stored fields is slow
      o So we changed the implementation of FieldValueFeature to use DocValues if they are available
      o DocValues record field values in a column-oriented way, mapping doc ids to values
  • 35. From Stored Fields to Doc Values
      o Feature latencies when retrieving 100 docs using StoredFields. Total time per search: 135ms
      o Feature latencies when retrieving 100 docs using DocValues. Total time per search: 25ms (5x faster!)
  • 36. Are we there yet?
      o Feature logging/computation
      o Document re-ranking
  • 37. Evaluating performance: NO-OP Model
      o A linear model, performing all the computations but not modifying the original order

          Row           | Original Solr Score | Query matches the title | Freshness | Is the document from Bloomberg.com? | Popularity | Final Score
          Doc 1         | 2.3                 | 0                       | 0.7       | 0                                   | 3583       | 2.3
          Doc 2         | 2.1                 | 1                       | 0.9       | 1                                   | 625        | 2.1
          Doc 3         | 1.3                 | 0                       | 0.1       | 0                                   | 129        | 1.3
          No-op weights | 1.0                 | 0.0                     | 0.0       | 0.0                                 | 0.0        |
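The no-op model is simply a dot product whose only non-zero weight sits on the original Solr score. A small sketch using the numbers from the slide (the helper name `score` is hypothetical):

```python
# Columns: original Solr score, title match, freshness, from Bloomberg.com, popularity.
ROWS = [
    [2.3, 0, 0.7, 0, 3583],
    [2.1, 1, 0.9, 1, 625],
    [1.3, 0, 0.1, 0, 129],
]
NO_OP_WEIGHTS = [1.0, 0.0, 0.0, 0.0, 0.0]


def score(row, weights):
    """Linear model: dot product of feature values and weights."""
    return sum(w * v for w, v in zip(weights, row))


final_scores = [score(row, NO_OP_WEIGHTS) for row in ROWS]
print(final_scores)
```

Every feature is still extracted and multiplied, so the full cost of the LTR pipeline is paid, yet the final scores equal the original Solr scores and the ranking is unchanged. That makes any latency difference attributable to the LTR machinery rather than to a change in results.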
  • 38. Need for speed

          Setup                            | Median search time (ms)
          No LTR, retrieve 3 docs          | 39
          LTR, retrieve 3 docs, re-rank 25 | 77

      Why is it so slow???
  • 39. News needs grouping
      o Similar news stories are grouped (clustered) together and only the top-ranked story in each group is shown
  • 40. Grouping + Re-ranking: Not a match made in heaven
      o Regular grouping involves 3 communication rounds between coordinator and shards
      o With re-ranking, we have to re-rank the groups and the documents in each group
      [SOLR-8776: Support RankQuery in grouping]
  • 41.
  • 42. Vegas baby
  • 43. What is the Las Vegas Patch?
  • 44. Grouping
      Three requests from the coordinator:
      1. Coordinator asks the shards for their top n groups for the query and computes the overall top n groups.
      2. Each shard computes the top m documents for the top n groups.
      3. Coordinator selects the top docs for each group and retrieves them from the shards.
  • 45. Why?
      Example: top 2 groups, top 2 documents, 2 shards

      Machine 1 documents:
          Doc   | Group  | Score
          doc1  | group1 | 20.0
          doc2  | group2 | 5.0
          doc3  | group1 | 100.0
          doc4  | group1 | 120.0
          doc5  | group2 | 10.0
      Machine 1 top groups:
          group1: doc4 (120.0), doc1 (20.0)
          group2: doc5 (10.0), doc2 (5.0)

      Machine 2 documents:
          Doc   | Group  | Score
          doc6  | group3 | 70.0
          doc7  | group2 | 65.0
          doc8  | group3 | 60.0
          doc9  | group2 | 60.0
          doc10 | group1 | 50.0
      Machine 2 top groups:
          group3: doc6 (70.0), doc8 (60.0)
          group2: doc7 (65.0), doc9 (60.0)

      Top groups:
          Group1: doc4, doc1
          Group3: doc6, doc8
      Top docs should be:
          Group1: doc4, doc10
          Group3: doc6, doc8
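The example can be replayed with a small simulation of the coordinator and two shards. This is only a sketch of the idea, not Solr's grouping code, and all function names are hypothetical. One loud assumption: the slide lists doc3 in group1 with score 100.0, which would conflict with the per-shard and expected results shown on the same slide, so the sketch uses 10.0 for doc3 instead.

```python
from collections import defaultdict

# (doc_id, group, score) per shard, from the slide's example.
# ASSUMPTION: doc3 is scored 10.0 here (the slide says 100.0, which would
# contradict the per-shard results the slide itself shows for group1).
SHARD_A = [
    ("doc1", "group1", 20.0), ("doc2", "group2", 5.0), ("doc3", "group1", 10.0),
    ("doc4", "group1", 120.0), ("doc5", "group2", 10.0),
]
SHARD_B = [
    ("doc6", "group3", 70.0), ("doc7", "group2", 65.0), ("doc8", "group3", 60.0),
    ("doc9", "group2", 60.0), ("doc10", "group1", 50.0),
]


def docs_by_group(shard):
    """Group a shard's docs, each group sorted by descending score."""
    groups = defaultdict(list)
    for doc_id, group, score in shard:
        groups[group].append((score, doc_id))
    return {g: sorted(docs, reverse=True) for g, docs in groups.items()}


def shard_top_groups(shard, n):
    """Round 1 on a shard: its best n groups, ranked by each group's best doc."""
    groups = docs_by_group(shard)
    ranked = sorted(groups, key=lambda g: groups[g][0][0], reverse=True)
    return {g: groups[g][0][0] for g in ranked[:n]}


def merge_groups(responses, n):
    """Coordinator: overall top n groups from the per-shard answers."""
    best = {}
    for response in responses:
        for group, score in response.items():
            best[group] = max(score, best.get(group, score))
    return sorted(best, key=best.get, reverse=True)[:n]


def shard_top_docs(shard, wanted, m):
    """Round 2 on a shard: top m docs for each requested group it holds."""
    groups = docs_by_group(shard)
    return {g: groups[g][:m] for g in wanted if g in groups}


def merge_docs(responses, wanted, m):
    """Coordinator: top m doc ids per group across all shard answers."""
    merged = {g: [] for g in wanted}
    for response in responses:
        for group, docs in response.items():
            merged[group].extend(docs)
    return {g: [d for _, d in sorted(docs, reverse=True)[:m]] for g, docs in merged.items()}


shards = [SHARD_A, SHARD_B]
round1 = [shard_top_groups(s, 2) for s in shards]
top_groups = merge_groups(round1, 2)

# Shortcut: reuse only the groups each shard reported in round 1.
shortcut = merge_docs(
    [shard_top_docs(s, [g for g in top_groups if g in seen], 2)
     for s, seen in zip(shards, round1)],
    top_groups, 2,
)
# Full protocol: ask every shard for the top docs of every winning group.
full = merge_docs([shard_top_docs(s, top_groups, 2) for s in shards], top_groups, 2)
print(top_groups, shortcut, full)
```

The shortcut misses doc10 because group1 was not among the second shard's own top groups, so that shard never reported it; the extra round exists to recover such documents.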
  • 46. Las Vegas Idea o If you want just one document per group, you do not have this problem o We can return the top document from each group in the first step and skip the second step entirely o For LTR: Re-rank only the top document of each group
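The single-pass idea can be sketched as follows. This is an illustrative simulation in plain Python, not the actual Solr patch; the function names and the `shard_a`/`shard_b` variables are hypothetical, and the documents and scores are taken from the example on slide 45.

```python
def shard_top_groups(docs, n):
    """On one shard: keep the best document of every group, return the top n groups."""
    best = {}
    for doc_id, group, score in docs:
        if group not in best or score > best[group][1]:
            best[group] = (doc_id, score)
    ranked = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
    return ranked[:n]

def merge_top_groups(shard_results, n):
    """On the coordinator: merge per-shard results, keep the overall top n groups."""
    best = {}
    for result in shard_results:
        for group, (doc_id, score) in result:
            if group not in best or score > best[group][1]:
                best[group] = (doc_id, score)
    ranked = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(group, doc_id, score) for group, (doc_id, score) in ranked[:n]]

shard_a = [("doc1", "group1", 20.0), ("doc2", "group2", 5.0), ("doc3", "group1", 100.0),
           ("doc4", "group1", 120.0), ("doc5", "group2", 10.0)]
shard_b = [("doc6", "group3", 70.0), ("doc7", "group2", 65.0), ("doc8", "group3", 60.0),
           ("doc9", "group2", 60.0), ("doc10", "group1", 50.0)]

# One round trip: every shard already ships the top document of each of its top groups.
top = merge_top_groups([shard_top_groups(s, 2) for s in (shard_a, shard_b)], 2)
print(top)
```

With one document per group, a group's best document always lives on a shard where that group ranks among the shard's top n, so the merged result is already final and no second request is needed.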
  • 47. Show me the numbers
      o We made plain old search faster: by about 40%!
      o LTR-served searches are still faster than they were before we did the Las Vegas optimization
      Method                     Median time   Perc95 time
      Normal Grouping (No LTR)   0.20          0.35
      Las Vegas (No LTR)         0.12          0.26
      Las Vegas+LTR no-op        0.18          0.27
  • 48. How to Ship LTR in Production in 3 Steps: Make it Work, Make it Fast, Deploy to Production
  • 49. Where is the model? Let the LTR hackathon start… o Write code to process training data in SVMlight format o Wrappers around scikit-learn to train various linear models, do regularization, hyperparameter optimization and model debugging o Evaluate the model: MAP, NDCG, MRR
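The three evaluation metrics named on this slide can be computed per query and then averaged. The sketch below uses plain Python with common textbook definitions; the function names, the exponential gain in DCG, and the toy relevance labels are assumptions, not the authors' evaluation code. Each input list holds the relevance labels of one query's results, in ranked order.

```python
import math

def reciprocal_rank(relevances):
    """1 / rank of the first relevant result (0.0 if none is relevant)."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def average_precision(relevances):
    """Mean of precision@k over the ranks k that hold a relevant result.
    Assumes every relevant document appears somewhere in the ranked list."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 rank discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical judgments: binary relevance labels for two queries.
queries = {"q1": [1, 0, 0], "q2": [0, 1, 0, 1]}
mrr = mean(reciprocal_rank(r) for r in queries.values())
map_score = mean(average_precision(r) for r in queries.values())
mean_ndcg = mean(ndcg(r) for r in queries.values())
print(mrr, map_score, mean_ndcg)
```

MRR and MAP only use binary relevance, while NDCG also rewards placing highly relevant documents above mildly relevant ones when graded labels are available.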
  • 50. A Small Model for LTR, a Giant Step for Bloomberg o Released initially to select internal users o Then, to all internal users o Then, to 10% of our clients o And finally, to 100% of our clients
  • 51. It’s a whole new (LTR) world! o Trying out new features, new models and classes of models o Experimenting with different types of training data o Rolling out new rankers, for different types of queries
  • 52. Take home messages o Make sure you can measure success and failure: metrics, metrics, metrics! o If a feature is static, index it o Don’t use stored values for static features; always rely on DocValues o If you are not happy with the performance of search, consider a trip to Las Vegas: you may end up improving performance by 40%
  • 53. Thank you! Eυχαριστούμε! Grazie! And btw: we are hiring a senior search relevance engineer! bit.ly/2Orb8bc https://www.bloomberg.com/careers Malvina Josephidou & Diego Ceccarelli Bloomberg @malvijosephidou |@diegoceccarelli #Activate18 #ActivateSearch

Editor's notes

  1. Malvina starts here
  2. Diego 16 slides
  3. Malvina starts here
  4. https://pxhere.com/en/photo/1143377 https://creativecommons.org/publicdomain/zero/1.0/ You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
  5. https://upload.wikimedia.org/wikipedia/commons/a/a7/Cute_Sloth.jpg the sloth is under CC commons, can be reused https://commons.wikimedia.org/wiki/File:Cute_Sloth.jpg You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission
  6. Every update request in Solr is passed through a chain of processors defined in the Solr config under the updateRequestProcessorChain. UpdateRequestProcessors allow us to create new fields from existing ones or to change existing fields at index time. We use them to compute features at index time and write those values to new feature fields, which are kept in the index. In this snippet of code we add to the updateRequestProcessorChain a custom-made UpdateRequestProcessor that looks at the wire field: if it finds the string, it creates the new feature field with value 1; otherwise it creates it with the default value of 0. We can define other such UpdateRequestProcessors with our own feature computation logic. For example, a feature might involve counting the terms in a multivalued field, and we can easily do that too.
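The feature logic described in this note can be illustrated outside Solr. The sketch below is plain Python operating on a dictionary, not a Solr UpdateRequestProcessor (those are written in Java); the feature field names, the `needle` string and the `tickers` field are all hypothetical, and only the `wire` field name comes from the note.

```python
def add_wire_feature(doc, needle, feature_field="feature_wire_match"):
    """Return a copy of doc with a 0/1 feature: 1 if `needle` occurs in the wire field."""
    wire_value = doc.get("wire") or ""
    updated = dict(doc)
    updated[feature_field] = 1 if needle in wire_value else 0
    return updated

def add_count_feature(doc, source_field, feature_field):
    """Return a copy of doc with the number of values in a multivalued field."""
    updated = dict(doc)
    updated[feature_field] = len(doc.get(source_field) or [])
    return updated

# Hypothetical documents as they might look just before indexing.
story = {"id": "1", "wire": "EXAMPLE-WIRE", "tickers": ["AAA", "BBB"]}
other = {"id": "2", "wire": "another-source"}

story = add_count_feature(add_wire_feature(story, "EXAMPLE"), "tickers", "feature_ticker_count")
other = add_count_feature(add_wire_feature(other, "EXAMPLE"), "tickers", "feature_ticker_count")
print(story, other)
```

Computing such values once at index time means the query-time model only has to read a stored number per document instead of recomputing the feature for every request.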
  7. https://www.pexels.com/photo/baby-child-close-up-crying-47090/ CC0 License ✓ Free for personal and commercial use ✓ No attribution required
  8. Picture from https://pxhere.com/en/photo/1143377 Licence is CC0 Public Domain Free for personal and commercial use No attribution required
  9. /* Change FieldValueFeature to use docvalues to fetch a feature value when the field has docValues Docvalues are faster for this particular case because they build a forward index from documents to values. They
  10. Public domain license: https://www.goodfreephotos.com/vector-images/you-shall-not-pass-sign-with-gandalf-vector-clipart.png.php  the Work may be freely reproduced, distributed, transmitted, used, modified, built upon, or otherwise exploited by anyone for any purpose, commercial or non-commercial, and in any way, including by methods that have not yet been invented or conceived.
  11. https://www.pexels.com/photo/attraction-building-city-hotel-415999/ CCO license You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission
  12. When we group and ask for only one document per group (group.limit=1), it is possible to skip the second grouping step. On our test datasets this improved speed by around 40%. In regular grouping, each shard returns its top K groups in the first step, ranked by the highest-scoring document in each group. The top K groups from each shard are merged in the federator, and in the second step we ask all the shards to return the top documents from each of the top-ranking groups. If we only want the highest-scoring document per group, each shard can return the top document id in the first step; we then merge the results in the federator to retain the top K groups and skip the second grouping step entirely. This is possible provided that: a) we do not need to know the total number of matching documents per group, and b) the within-group sort and the between-group sort are the same. The LTR optimization is then to compute the LTR score on the top-ranking document per group only (rather than on all members of a group).
  13. Instead of applying the model to each document in each group on each shard, apply LTR to the top document per group, where ‘top’ is determined by the Solr score. So documents within a group are ranked using the Solr score, and groups are ranked using LTR.
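The split described in this note, Solr score inside a group and LTR across groups, can be sketched as below. This is a toy illustration in plain Python; the function name, the document fields and the toy scoring function are hypothetical stand-ins, not the production model or the Solr implementation.

```python
def rerank_groups(groups, ltr_model):
    """Pick each group's top document by Solr score, then order groups by LTR score."""
    reranked = []
    for group_id, docs in groups.items():
        # Within a group, 'top' is decided by the original Solr score only.
        top_doc = max(docs, key=lambda d: d["solr_score"])
        # The (expensive) model is applied once per group, not once per document.
        reranked.append((group_id, top_doc["id"], ltr_model(top_doc)))
    reranked.sort(key=lambda g: g[2], reverse=True)
    return reranked

def toy_model(doc):
    """Hypothetical stand-in for a trained linear ranker."""
    return doc["solr_score"] + 10 * doc["freshness"]

groups = {
    "g1": [{"id": "a", "solr_score": 5.0, "freshness": 0.0},
           {"id": "b", "solr_score": 3.0, "freshness": 1.0}],
    "g2": [{"id": "c", "solr_score": 4.0, "freshness": 0.5}],
}

result = rerank_groups(groups, toy_model)
print(result)
```

In this example document `b` would receive the highest model score, but it is never scored because `a` has the higher Solr score within its group; that is the trade-off accepted in exchange for applying the model to far fewer documents.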
  14. Malvina starts here
  15. We were ready to roll out the model. https://www.pexels.com/photo/view-ape-thinking-primate-33535/ CC0 license What we realized at this point was that there was no model. There was something called ltr1 we had developed using features computed offline but we realized that model was broken. We had product on our backs, our code worked and was fast enough, but we didn’t actually have anything to roll out and we also hadn't written a single line of code related to training models using features logged in production.
  16. https://www.pexels.com/photo/drink-beer-cheers-toast-8859/ CC0
  17. You can’t optimize what you can’t measure