2016
OCTOBER 11-14

BOSTON, MA
http://lucenerevolution.com
Search and Recommenders
Grant Ingersoll, @gsingers, CTO, Lucidworks
Jake Mannix, @pbrane, Lead Data Engineer, Lucidworks
Agenda
• Vision, motivations and definitions
• Use cases for ecommerce, compliance, fraud and customer support
• Fusion and the evolution of recommenders
• Demo
• Future Directions
Search-Driven Everything
• Customer Service
• Customer Insights
• Fraud Surveillance
• Research Portal
• Online Retail
• Digital Content
Three Sides of the Same Coin
• Many companies treat search, recommendations/discovery and analytics as different beasts, yet:
  • The same inputs that make search better can also drive recommendations and better analytics
• Engagement analytics is the key:
  • Your users give you engagement signals regarding the content that is relevant to them
  • Over time, patterns emerge in similarities of behavior (the simplest possible pattern is just “popularity”)
  • These signals are often the biggest factor in both search relevance AND recommendations
  • In the enterprise, this is still the case, but the types of signals are often different (email, IM)
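As a rough illustration of the "popularity" pattern mentioned on this slide, engagement signals can be rolled up into one weighted count per document. This is a minimal sketch, not Fusion's signal aggregation: the signal types, their weights, and the `popularity` helper are all hypothetical names chosen for the example.

```python
from collections import Counter

# Hypothetical weights per signal type (assumed for illustration only).
SIGNAL_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def popularity(signals):
    """Sum weighted engagement signals per document."""
    totals = Counter()
    for user_id, doc_id, signal_type in signals:
        # Unknown signal types contribute nothing.
        totals[doc_id] += SIGNAL_WEIGHTS.get(signal_type, 0.0)
    return totals


# Toy (user_id, doc_id, signal_type) events.
signals = [
    ("u1", "doc_a", "click"),
    ("u2", "doc_a", "purchase"),
    ("u1", "doc_b", "click"),
    ("u3", "doc_b", "click"),
    ("u3", "doc_c", "add_to_cart"),
]

# Documents ordered from most to least popular.
ranked = popularity(signals).most_common()
print(ranked)
```

The same per-document totals could feed either a search boost or a "most popular" recommendation list, which is the point of the slide: one set of inputs, several uses.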
Defining Moments
• Content: documents which are textually similar are often good “similar items” to recommend
• Collaborative: documents which have been engaged with by the same people (and/or in the same search context) are also similar, in a more subtle but often more powerful way
• Multi-Modal: why choose one? Try a smooth interpolation between a content-based similarity metric and an engagement-based one!
Search-Driven Online Retail
Increase conversions with a personalized shopping experience with best-in-class reliability and performance.
[Diagram: Catalog, Promotions, User History; Data Acquisition, Data Processing, Smart Access API; Dynamic Navigation and Landing Pages, Instant Insights and Analytics, Personalized Shopping Experience]
Search-Driven Compliance and Surveillance
Detect and investigate activity for regulatory compliance, from one unified view.
[Diagram: Database, Messages, Logs; Data Acquisition, Indexing & Streaming, Smart Access API; Accurate Real-Time Information, Contextually-Enriched Information, Data Exploration and Visualization]
Search-Driven Customer Service
Resolve customer issues quickly with immediate access to relevant answers.
[Diagram: Knowledge Base, CRM, Support Tickets & Issue Tracking; Data Acquisition, Data Processing, Smart Access API; Customer Self-Service, Proactive Alerts and Recommendations, Expert Tuned Relevancy Driven by Analytics and Insights]
Fusion and Recommenders
Lucidworks Fusion Is Search-Driven Everything
• Drive next-generation relevance via Content, Collaboration and Context
• Harness best-in-class open source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
Access data from anywhere to build intelligent, data-driven applications.
[Diagram: Catalog, Promotions, User History; Data Acquisition, Indexing & Streaming, Smart Access API; Dynamic Navigation and Landing Pages, Instant Insights and Analytics, Personalized Shopping Experience; Recommendations & Alerts, Analytics & Insights, Extreme Relevancy]
Fusion Architecture
[Architecture diagram: REST API; Apache Spark (Worker, Worker, Cluster Mgr.); Apache Solr (Shards, Shards); HDFS (optional); Apache ZooKeeper (ZK 1 … ZK N: Shared Config Mgmt, Leader Election, Load Balancing); Connectors (Database, Web, File, Logs, Hadoop, Cloud); Core Services (Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …); Admin UI; Lucidworks View; Security built in]
Key Platform Tech
• Fusion
  • Recommenders API
  • Machine Learning pipeline stages
  • Scheduling
• Solr:
  • More Like This + Signals
• Spark:
  • MLlib, Mahout, custom
Content-focused
• Solr comes with a built-in query parser, MoreLikeThis, which takes a given document and:
  • Extracts nontrivial terms from specified fields in it
  • Builds an “OR” query to search for the closest matches (like a cosine similarity computation)
  • Has many knobs to tune for “data-cleaning” non-useful terms out of the query
• TF-IDF is great, but there are other metrics possible: LSI, LDA, W2V
{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
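The query string above can be assembled programmatically. The sketch below only builds the string; the `mlt_query` helper is a hypothetical name, and the defaults simply mirror the values on the slide. In the MoreLikeThis query parser, `qf` lists the fields to draw terms from, `mintf` is the minimum term frequency in the source document, `mindf` is the minimum number of documents a term must appear in, and `minwl` is the minimum word length.

```python
def mlt_query(doc_id, fields, mintf=2, mindf=5, minwl=3):
    """Build a Solr MoreLikeThis query-parser string for one document.

    Hypothetical helper: it formats the local-params string only and
    does not contact Solr.
    """
    if not fields:
        raise ValueError("at least one field is required")
    qf = ",".join(fields)
    # Doubled braces are literal braces inside an f-string.
    return f"{{!mlt qf={qf} mintf={mintf} mindf={mindf} minwl={minwl}}}{doc_id}"


# "DOC_ID" stands in for the slide's <DOC_ID> placeholder.
q = mlt_query("DOC_ID", ["body", "suggest", "subject", "title"])
print(q)
```

The resulting string would typically be sent as the `q` parameter of a Solr search request. Raising `mintf`, `mindf` or `minwl` drops more terms from the generated query, which is the "data-cleaning" knob the slide refers to.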
Collaborative Filtering
“People who bought X also bought Y” / “Movies recommended for you”
[Diagram: Current Doc; Search User/Item Index; Top K users who’ve interacted with this item; Search and Rollup on User/Item Index; Top Y docs; Filter by context; Profit. Offline Tasks: User/Item Signals; Math!; User/Item Index]
Collaborative Filtering: step by step in Fusion
• Fusion CF-based “documents like this” pipeline stages:
  • Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
  • Sub-query: search that index again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
  • Roll-up: topN(sum(score_i * weight_i))
  • Sub-query: fetch the documents for these top N doc_ids from the primary Solr index
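The first three stages above can be sketched in memory to show the arithmetic. This is an illustration under stated assumptions, not the Fusion pipeline: the `similar_docs` function and the toy signals are hypothetical, the stored signal weight stands in for the Solr score of the weighted OR query, and the final fetch from the primary index is left out.

```python
from collections import defaultdict
import heapq


def similar_docs(signals, doc_id, k=2, n=3):
    """Two-hop item similarity over aggregated (user, doc, weight) signals."""
    by_doc = defaultdict(dict)   # doc_id -> {user_id: weight}
    by_user = defaultdict(dict)  # user_id -> {doc_id: weight}
    for user, doc, weight in signals:
        by_doc[doc][user] = weight
        by_user[user][doc] = weight

    # Hop 1: top-K (user_id, weight) pairs for the current document.
    top_users = heapq.nlargest(k, by_doc[doc_id].items(), key=lambda kv: kv[1])

    # Hop 2 and roll-up: sum(score_i * weight_i) per candidate document,
    # where score_i is that user's signal weight on the candidate.
    rollup = defaultdict(float)
    for user, user_weight in top_users:
        for other_doc, score in by_user[user].items():
            if other_doc != doc_id:
                rollup[other_doc] += score * user_weight

    # Keep the top-N candidates by rolled-up score.
    return heapq.nlargest(n, rollup.items(), key=lambda kv: kv[1])


# Toy aggregated signals: (user_id, doc_id, weight).
signals = [
    ("u1", "d1", 3.0), ("u1", "d2", 2.0), ("u1", "d3", 1.0),
    ("u2", "d1", 1.0), ("u2", "d3", 4.0),
    ("u3", "d1", 0.5), ("u3", "d4", 10.0),
]

recs = similar_docs(signals, "d1", k=2, n=3)
print(recs)
```

With k=2 only u1 and u2 survive the first hop, so d4 never becomes a candidate even though u3 weighted it heavily. Choosing K therefore trades recall against the size of the OR query in the second stage.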
Multi-modal
• Both content-based and CF recommenders use features of the documents to generate a similarity metric
  • Content uses the tokens in the document
  • CF uses user ids who have engaged with it
• Metrics can be weighted-summed, allowing a “slider” between the two
• Fancy similarity techniques which can be done to a (doc, token) matrix can often be done on a (doc, userId) matrix, or even a joint (doc, (token or userId)) concatenated matrix
• There is a cost to such techniques: harder to maintain, harder to A/B test variations
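The “slider” can be read as a weighted sum of two similarity scores computed over different feature sets. The sketch below assumes cosine similarity for both sides and a single mixing parameter `alpha`; the function names and the toy documents are hypothetical, and a real deployment would more likely combine Solr scores than raw term counts.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def blended_similarity(doc_a, doc_b, alpha):
    """alpha=1.0 is pure content similarity, alpha=0.0 is pure CF."""
    content = cosine(doc_a["tokens"], doc_b["tokens"])       # token features
    collaborative = cosine(doc_a["users"], doc_b["users"])   # user-id features
    return alpha * content + (1.0 - alpha) * collaborative


# Two toy documents with identical text but disjoint audiences.
doc_a = {"tokens": {"solr": 1.0, "spark": 1.0}, "users": {"u1": 1.0, "u2": 1.0}}
doc_b = {"tokens": {"solr": 1.0, "spark": 1.0}, "users": {"u3": 1.0}}

for alpha in (0.0, 0.25, 1.0):
    print(alpha, round(blended_similarity(doc_a, doc_b, alpha), 3))
```

Because both sides are plain vectors, the same `cosine` function covers the token case and the user-id case, which mirrors the slide's remark that techniques for a (doc, token) matrix often carry over to a (doc, userId) matrix.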
Demo
• Basics:
  • 26 Apache Projects registered so far plus LW web properties
  • 93 datasources* including email, GitHub, JIRA*, Website and Wiki
  • Fusion 2.4
  • Signals everywhere
  • UI based on Lucidworks View
  • ASF Mail archives mirrored at: http://asfmail.lucidworks.io
http://searchhub.lucidworks.com
Implementation Details
http://github.com/lucidworks/searchhub
Branch: GH-28-doc-view
Key Source Code
• UI
  • Angular Directives: perdocument, recommendations
• Offline Tasks
  • Spark Jobs: mail_thread_signal_creation_job.json, SimpleTwoHopRecommender.scala
• Fusion Pipelines
  • Query: lucidfind-recommendations, cf-similar-items-batch-rec, cf-similar-items-rec
Future Work
• Ensemble and Click-based approaches
  • https://github.com/lucidworks/searchhub/issues/40
  • https://github.com/lucidworks/searchhub/issues/28
  • https://github.com/lucidworks/searchhub/issues/22
• Deploy live
• User registrations
  • https://github.com/lucidworks/searchhub/issues/30
Resources
Fusion: http://www.lucidworks.com/products/fusion
Search Hub: http://searchhub.lucidworks.com
Company: http://www.lucidworks.com
Our blog: http://www.lucidworks.com/blog
Twitter: @gsingers, @pbrane

Related content

Trending (18)

Data Science with Solr and Spark
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
Webinar: Site Search in an Hour with Fusion
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Practical Machine Learning for Smarter Search with Solr and Spark
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
State-of-the-Art Drupal Search with Apache Solr
Building a real time big data analytics platform with solr
Data Engineering with Solr and Spark
Elasticsearch
Whitepaper- Real World Search
Introduction to Elasticsearch with basics of Lucene
Webinar: Solr & Fusion for Big Data
Click-through relevance ranking in solr & lucid works enterprise - By Andrz...
Parallel SQL and Streaming Expressions in Apache Solr 6
Elasticsearch

Featured (20)

Understanding the Solr security framework - Lucene Solr Revolution 2015
Apache Solr 5.0 and beyond
Webinar: Fusion for Business Intelligence
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
What's New in Apache Solr 4.10
What's new in Solr 5.0
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Scaling SolrCloud to a large number of Collections
it's just search
Ease of use in Apache Solr
Solr security frameworks
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
SolrCloud Cluster management via APIs
Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch, Wipro...
Downtown SF Lucene/Solr Meetup: Developing Scalable Search for User Generated...
Working with deeply nested documents in Apache Solr
Managing a SolrCloud cluster using APIs
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
Webinar: Replace Google Search Appliance with Lucidworks Fusion

Similar to Webinar: Search and Recommenders (20)

South Big Data Hub: Text Data Analysis Panel
Webinar: Increase Conversion With Better Search
Sweeny ux-seo om-cap 2014_v3
Search Solutions 2011: Successful Enterprise Search By Design
AI, Search, and the Disruption of Knowledge Management
NOW! Get the internet to work for you!
Graphs for Recommendation Engines: Looking beyond Social, Retail, and Media
What IA, UX and SEO Can Learn from Each Other
Personalized Search at Sandia National Labs
Optimising Your Content for Findability
Mini-training: Personalization & Recommendation Demystified
Search Me: Designing Information Retrieval Experiences
Big Data Evolution
Optimising Your Content for findability
Webinar: Building Customer-Targeted Search with Fusion
Solving Real World Challenges with Enterprise Search
Productionalize content recommendation engine
Introduction to Enterprise Search
Data Profiling: The First Step to Big Data Quality
Presentation by Meshlabs at Zensar #TechShowcase - An iSPIRT ProductNation in...

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Drive Agent Effectiveness in Salesforce
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Connected Experiences Are Personalized Experiences
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Preparing for Peak in Ecommerce | eTail Asia 2020
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
AI-Powered Linguistics and Search with Fusion and Rosette
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Smart Answers for Employee and Customer Support After COVID-19
Applying AI & Search in Europe - featuring 451 Research
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Webinar: Building a Business Case for Enterprise Search
Why Insight Engines Matter in 2020 and Beyond


Webinar: Search and Recommenders

  • 1.
  • 3. Search and Recommenders Grant Ingersoll @gsingers CTO, Lucidworks Jake Mannix @pbrane Lead Data Engineer, Lucidworks
  • 4. • Vision, motivations and definitions • Use cases for ecommerce, compliance, fraud and customer support • Fusion and the evolution of recommenders • Demo • Future Directions Agenda
  • 6. • Many companies treat search, recommendations/discovery and analytics as different beasts, yet: • The same inputs that make search better can also drive recommendations and better analytics • Engagement analytics is the key: • Your users give you engagement signals regarding the content that is relevant to them • Over time, patterns emerge in similarities of behavior (simplest possible pattern is just “popularity”) • These signals are often the biggest factor in both search relevance AND recommendations • In the enterprise, this is still the case, but the types of signals are often different (email, IM) Three Sides of the Same Coin
  • 7. • Content — documents which are textually similar are often good as “similar items” to be recommended • Collaborative — documents which have been engaged with by the same people (and/or in the same search context) are also similar in a more subtle, but often more powerful way • Multi-Modal — why choose one? Try a smooth interpolation between using a content-based similarity metric, and an engagement based one! Defining Moments
  • 8. Search-Driven Online Retail  Increase conversions with a personalized shopping experience with best in class reliability and performance. CATALOG DYNAMIC NAVIGATION AND LANDING PAGES INSTANT INSIGHTS AND ANALYTICS PERSONALIZED SHOPPING EXPERIENCE PROMOTIONS USER HISTORY Data Acquisition Data Processing Smart Access API
  • 9. Search-Driven Compliance and Surveillance Detect and investigate activity for regulatory compliance, from one unified view. DATABASE ACCURATE REAL-TIME INFORMATION CONTEXTUALLY- ENRICHED INFORMATION MESSAGESLOGS DATA EXPLORATION AND VISUALIZATION Data Acquisition Indexing & Streaming Smart Access API
  • 10. Search-Driven Customer Service Resolve customer issues quickly with immediate access to relevant answers. CUSTOMER SELF-SERVICE KNOWLEDGE BASE PROACTIVE ALERTS AND RECOMMENDATIONS EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS CRM SUPPORT TICKETS & ISSUE TRACKING Data Acquisition Data Processing Smart Access API
  • 12. Lucidworks Fusion Is Search-Driven Everything • Drive next generation relevance via Content, Collaboration and Context • Harness best in class Open Source: Apache Solr + Spark • Simplify application development and reduce ongoing maintenance CATALOG DYNAMIC NAVIGATION AND LANDING PAGES INSTANT INSIGHTS AND ANALYTICS PERSONALIZED SHOPPING EXPERIENCE PROMOTIONS USER HISTORY Data Acquisition Indexing & Streaming Smart Access API Recommendations & Alerts Analytics & Insights Extreme Relevancy Access data from anywhere to build intelligent, data-driven applications.
  • 13. Fusion Architecture REST API Worker Worker Cluster Mgr. Apache Spark Shards Shards Apache Solr HDFS (Optional) Shared Config Mgmt Leader Election Load Balancing ZK 1 Apache Zookeeper ZK N DATABASE WEB FILE LOGS HADOOP CLOUD Connectors Alerting/Messaging NLP Pipelines Blob Storage Scheduling Recommenders/Signals … Core Services Admin UI SECURITY BUILT-IN Lucidworks View
  • 14. • Fusion • Recommenders API • Machine Learning pipeline stages • Scheduling • Solr: • More Like This + Signals • Spark: • MLlib, Mahout, custom Key Platform Tech
  • 15. • Solr comes built-in with a query parser, MoreLikeThis, which takes a given document, and: • Extracts nontrivial terms from specified fields in it • Builds an “OR” query to search for closest matches (like a cosine similarity computation) • Has many knobs to tune regarding “data-cleaning” non-useful terms from the query • TF-IDF is great, but there are other metrics possible: LSI, LDA, W2V Content-focused {!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
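The MoreLikeThis query on this slide can be assembled programmatically. The following is a minimal sketch, not Fusion's or Solr's own client code: it only builds the query string and a request URL, and does not contact a server. The parameter names `qf`, `mintf`, `mindf` and `minwl` come from the slide; the host `http://localhost:8983/solr`, the collection name `docs`, the document id `doc-42` and the helper function names are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_mlt_query(doc_id, fields, mintf=2, mindf=5, minwl=3):
    """Build a MoreLikeThis query-parser string for one seed document.

    qf    : fields to extract "interesting" terms from
    mintf : minimum term frequency within the seed document
    mindf : minimum number of documents a term must appear in
    minwl : minimum word length
    """
    local_params = "qf={} mintf={} mindf={} minwl={}".format(
        ",".join(fields), mintf, mindf, minwl
    )
    # Plain concatenation avoids having to escape the literal braces.
    return "{!mlt " + local_params + "}" + doc_id


def build_select_url(base_url, collection, mlt_query, rows=10):
    """Build a /select URL carrying the MLT query (URL-encoded)."""
    params = urlencode({"q": mlt_query, "rows": rows, "fl": "id,score"})
    return "{}/{}/select?{}".format(base_url, collection, params)


if __name__ == "__main__":
    # Hypothetical host, collection and document id.
    query = build_mlt_query("doc-42", ["body", "suggest", "subject", "title"])
    print(query)
    print(build_select_url("http://localhost:8983/solr", "docs", query))
```

Raising `mintf`, `mindf` or `minwl` is the "data-cleaning" knob the slide mentions: it drops rare, very short or incidental terms before the OR query is built.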
  • 16. “People who bought X also bought Y” / “Movies recommended for you” Collaborative Filtering [Flow diagram labels: Current Doc; Search User/Item Index; Top K users who’ve interacted with this Item; Search and Rollup on User/Item Index; Top Y docs; Filter by context; Profit. Offline Tasks: User/Item Signals; Math!; User/Item Index]
  • 17. • Fusion CF-based “documents like this” pipeline stages: • Sub-query: search aggregated signals index for current doc_id, extracting the top-K pairs of (user_id, weight) • Sub-query: search that table again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … ) • Roll-up: topN(sum(score_i * weight_i)) • Sub-query: fetch the documents from primary Solr index of these top N doc_ids Collaborative Filtering: step by step in Fusion
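The first three steps above can be sketched in memory, without Solr. This is an illustrative approximation, not the Fusion pipeline itself: the aggregated signals index is modelled as a list of `(user_id, doc_id, weight)` rows, and the score of a row matched by the weighted OR query is assumed to equal the boost of the matching user, whereas a real Solr score also reflects other ranking factors. The function name, the sample data and the `top_k` / `top_n` defaults are assumptions.

```python
import heapq
from collections import defaultdict


def similar_items(signals, doc_id, top_k=2, top_n=2):
    """Two-hop 'documents like this' over aggregated (user, doc, weight) rows."""
    # Step 1: top-K (user_id, weight) pairs for the current document.
    users = [(u, w) for (u, d, w) in signals if d == doc_id]
    top_users = dict(heapq.nlargest(top_k, users, key=lambda uw: uw[1]))

    # Step 2: "weighted OR query": visit every row of one of those users.
    # Step 3: roll up by document, summing score_i * weight_i, where
    #         score_i is approximated by the boost of the matching user.
    totals = defaultdict(float)
    for u, d, w in signals:
        if d != doc_id and u in top_users:
            totals[d] += top_users[u] * w

    # Top-N documents by rolled-up score.
    return heapq.nlargest(top_n, totals.items(), key=lambda dw: dw[1])


if __name__ == "__main__":
    signals = [
        ("alice", "A", 3.0), ("bob", "A", 1.0), ("carol", "A", 0.5),
        ("alice", "B", 2.0), ("bob", "B", 1.0),
        ("alice", "C", 1.0),
        ("carol", "D", 10.0),
    ]
    print(similar_items(signals, "A"))
```

The fourth step, fetching the full documents for the returned ids from the primary index, is omitted here. Note that document D never surfaces in the example even though it has a heavy signal, because carol falls outside the top-K users for document A.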
  • 18. • Both content-based and CF recommenders use features of the documents to generate a similarity metric • Content uses the tokens in the document • CF uses user ids who have engaged with it • Metrics can be weighted-summed, allowing a “slider” between the two • Fancy similarity techniques which can be done to a (doc, token) matrix can often be done on a (doc, userId) matrix, or even a joint (doc, (token or userId)) concatenated matrix • There is a cost to such techniques: harder to maintain, harder to A/B test variations Multi-modal
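The “slider” between the two metrics can be sketched as a linear interpolation. This is a minimal sketch under stated assumptions, not the method used in Fusion: both score dictionaries are assumed to be already scaled to a comparable range such as [0, 1], a document missing from one side is given a score of 0 there, and the names `blend_scores` and `alpha` are hypothetical.

```python
def blend_scores(content_scores, cf_scores, alpha):
    """Rank documents by (1 - alpha) * content + alpha * collaborative.

    alpha = 0.0 is purely content-based, alpha = 1.0 purely
    engagement-based; values in between interpolate smoothly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(content_scores) | set(cf_scores)
    blended = {
        d: (1.0 - alpha) * content_scores.get(d, 0.0)
        + alpha * cf_scores.get(d, 0.0)
        for d in doc_ids
    }
    # Highest blended score first.
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    content = {"A": 0.9, "B": 0.2}   # e.g. from a MoreLikeThis query
    collab = {"B": 0.8, "C": 0.5}    # e.g. from a CF roll-up
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, [d for d, _ in blend_scores(content, collab, alpha)])
```

Keeping the two recommenders separate and blending only their scores means each side can still be tuned and A/B tested on its own, which sidesteps part of the maintenance cost the slide attributes to joint-matrix techniques.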
  • 19. • Basics: • 26 Apache Projects registered so far plus LW web properties • 93 datasources* including email, GitHub, JIRA*, Website and Wiki • Fusion 2.4 • Signals everywhere • UI based on Lucidworks View • ASF Mail archives mirrored at: http://asfmail.lucidworks.io Demo http://searchhub.lucidworks.com
  • 20. Implementation Details http://github.com/lucidworks/searchhub Branch: GH-28-doc-view Key Source Code UI Angular Directives: per-document recommendations Offline Tasks Spark Jobs: mail_thread_signal_creation_job.json SimpleTwoHopRecommender.scala Fusion Pipelines Query: lucidfind-recommendations cf-similar-items-batch-rec cf-similar-items-rec
  • 21. • Ensemble and Click-based approaches • https://github.com/lucidworks/searchhub/issues/40 • https://github.com/lucidworks/searchhub/issues/28 • https://github.com/lucidworks/searchhub/issues/22 • Deploy live • User registrations • https://github.com/lucidworks/searchhub/issues/30 Future Work
  • 22. Resources Fusion: http://www.lucidworks.com/products/fusion Search Hub: http://searchhub.lucidworks.com Company: http://www.lucidworks.com Our blog: http://www.lucidworks.com/blog Twitter: @gsingers, @pbrane