Applying NLP to Product Comparison at Visual Meta
Ross Turner
Elasticsearch Meetup Berlin 22/02/17
Overview
1. Product Comparison on the Visual Meta Platform
2. Using NLP to Maintain a Product Catalogue
3. Making Product Discovery Conversational
About Me
Previously…
• Researcher in Natural Language Generation (NLG)
• Software Engineer on Local Search
• Co-founder and Principal Engineer at an NLG Start Up
Currently…
• Engineering Head at Visual Meta
Product Comparison on the Visual Meta Platform
Product Comparison at Visual Meta
‘All shops, one site’
• Online marketing platform with shopping portals in 12 different countries
• 3 brands: Ladenzeile, ShopAlike, UmSóLugar
• 100,000,000+ items
• 6,000+ partner shops
Faceted Search at Visual Meta
Discover fashion, furniture and more…
• 800,000 platform visits per day
• 80 filter types across 21 categories
• Currently porting filter search to Elasticsearch
Maintaining a Product Catalogue at Visual Meta
Product feeds are continuously synced from partner shops:
• Feed items must be categorised in order to be discoverable on the platform
We want to:
• Identify all variants of a product
• Compare offers across shops
• Make it easy for our users to browse through millions of products
Model             Colour       Memory
Apple iPhone 6s   Space Grey   32GB
Apple iPhone 6s   Space Grey   128GB
Apple iPhone 6s   Gold         32GB
Apple iPhone 6s   Gold         128GB
Apple iPhone 6s   Rose Gold    32GB
Apple iPhone 6s   Rose Gold    128GB
Apple iPhone 6s   Silver       32GB
Apple iPhone 6s   Silver       128GB
Assigning Tags Based on Textual Attributes
String Matching
Index item names and descriptions, query product variant tag names against the index
Lucene query:
• +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s) +(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey)
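A query of this shape can be generated mechanically from a tag name. The sketch below is illustrative only: the function name `build_tag_query` is hypothetical, it assumes the index analyser lowercases tokens, and it does not escape Lucene special characters.

```python
def build_tag_query(tag_name, fields=("Name", "Description")):
    """Build a Lucene query string in which every token of the tag name
    is required (+) to appear in at least one of the given fields."""
    clauses = []
    for token in tag_name.lower().split():
        # Within the parentheses the field alternatives are optional (OR),
        # while the leading + makes the whole clause mandatory.
        alternatives = " ".join(f"{field}:{token}" for field in fields)
        clauses.append(f"+({alternatives})")
    return " ".join(clauses)


print(build_tag_query("Apple iPhone 6s"))
```

Requiring every token keeps precision up but, as the error analysis shows, costs recall when a feed omits an attribute such as colour or memory.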
Test by manually assigning items to a random sample of products
Recall   Precision   Fscore
0.59     0.64        0.61
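The reported F-score is consistent with the harmonic mean of precision and recall (F1). A minimal sketch of how such figures could be computed from a manually labelled sample follows; the function names and the toy item IDs are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted_items, expected_items):
    """Compare the items a query returned with the manually assigned items."""
    predicted, expected = set(predicted_items), set(expected_items)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall, f_score(precision, recall)


# Figures from the slide: precision 0.64, recall 0.59.
print(round(f_score(0.64, 0.59), 2))

# Toy example with made-up item IDs.
print(evaluate(["a", "b", "c", "d"], ["b", "c", "e"]))
```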
Error Analysis
Naming for the same product is not consistent across feeds:
1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)”
2. efg.com: “Apple iPhone 6 64 GB Space Grey”
3. xyz.com: “Apple iPhone 6”
Naming for the same product is not consistent within the same feed:
1. “Apple Iphone 6 - 64GB”
2. “Apple Iphone 6 64GB Space Grey”
3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone”
Wrongly categorised products in the feed:
• “Cover for Apple Iphone 6 - 64GB”
Comparing Tag Names to Item Names
Comparing Names Between Item Feeds
Text Classification
Language Models
Drawbacks of bag of words / n-grams:
• Words are equally distant
• Vectors are sparse
Word embeddings capture semantics:
• Vectors are continuous
• Similar words are close in vector space
1. T Mikolov, K Chen, G Corrado, J Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781
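The contrast between the two representations can be sketched with toy vectors. The one-hot vectors below stand in for a bag-of-words encoding; the dense vectors are made-up values for illustration, not trained embeddings.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocabulary = ["galaxy", "samsung", "sofa"]

# One-hot (bag-of-words style): one dimension per word, mostly zeros.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocabulary))]
    for i, word in enumerate(vocabulary)
}

# Hypothetical dense embeddings: few dimensions, continuous values.
dense = {
    "galaxy": [0.9, 0.1],
    "samsung": [0.8, 0.3],
    "sofa": [-0.2, 0.9],
}

for a, b in combinations(vocabulary, 2):
    print(a, b,
          round(euclidean(one_hot[a], one_hot[b]), 3),
          round(euclidean(dense[a], dense[b]), 3))
```

Every pair of one-hot vectors is the same distance apart, so the encoding carries no notion of similarity; in the dense space, related words can sit closer together than unrelated ones.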
Word2Vec for Mobile Phone Items
Mobile phone item corpus:
• 7,890 feed items
• 863k tokens, 41.5k unique
Closest words to “Galaxy”:
Rank   Word      Cosine Distance
1      Samsung   0.51
2      S2        0.48
3      S5        0.46
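A nearest-word lookup of this kind can be sketched as a ranking by cosine similarity. The vectors below are hypothetical toy values, not the embeddings trained on the mobile phone corpus, and `most_similar` is an illustrative helper rather than a library call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical three-dimensional embeddings for illustration only.
vectors = {
    "galaxy": [0.9, 0.1, 0.0],
    "samsung": [0.8, 0.3, 0.1],
    "s5": [0.7, 0.0, 0.4],
    "cover": [0.0, 0.2, 0.9],
}

for other, score in most_similar("galaxy", vectors):
    print(other, round(score, 2))
```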
Classification Performance
                        Best BOW Classifier            Decision Tree with Word2Vec
Tag                     Fscore   Precision   Recall    Fscore   Precision   Recall
“Smartphone”            0.95     0.99        0.84      0.92     0.94        0.90
“Home Speakers”         0.55     0.67        0.32      0.79     0.79        0.80
“Creeper Cipök”         0.58     0.97        0.22      0.75     0.86        0.66
“Leder Schuhe”          0.52     0.73        0.25      0.71     0.93        0.58
“Bett mit Schubladen”   0.52     0.65        0.29      0.70     0.81        0.62
Feed Enhancement
Two Descriptions of a Samsung TV
Samsung UE40H6400AK. Display diagonal: 101.6 cm (40"), HD type: Full HD, Display resolution: 1920 x 1080 pixels. Tuner type: Analog & Digital, Digital signal format system: DVB-C, DVB-T. RMS rated power: 20 W. Consumer Electronics Control (CEC): Anynet+. Picture processing technology: Samsung Wide Color Enhancer

The Samsung UE40H6400 has a 101.6cm screen size and a resolution of 1920 x 1080 pixels. It is a Full HD TV, has an Analog & Digital tuner and comes with Anynet+.
Generating Product Descriptions
• Choosing what to say
• Deciding how to say it
3. E Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104
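The two stages can be illustrated with a deliberately small template-based sketch. This is not the system described in the talk: the function names, the `TEMPLATES` table and the choice of attributes are assumptions made for illustration, using values from the Samsung TV specification.

```python
# Hypothetical phrase templates, one per attribute we know how to verbalise.
TEMPLATES = {
    "Display diagonal": "a {value} screen size",
    "Display resolution": "a resolution of {value}",
}


def select_content(spec, wanted):
    """Choosing what to say: keep only the wanted attributes, in that order."""
    return {key: spec[key] for key in wanted if key in spec}


def realise(name, facts):
    """Deciding how to say it: turn the selected facts into one sentence."""
    phrases = [TEMPLATES[key].format(value=value) for key, value in facts.items()]
    if not phrases:
        return ""
    return f"The {name} has " + " and ".join(phrases) + "."


spec = {
    "Display diagonal": "101.6 cm",
    "HD type": "Full HD",
    "Display resolution": "1920 x 1080 pixels",
    "RMS rated power": "20 W",
}

facts = select_content(spec, ["Display diagonal", "Display resolution"])
print(realise("Samsung UE40H6400", facts))
```

Keeping selection separate from realisation means the same selected facts could later be rendered with different wording, or in another language, without touching the selection logic.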
Two Descriptions of a Samsung Smartphone
Samsung SM-G920F, Galaxy. Display diagonal: 12.9 cm (5.1"), Display resolution: 2560 x 1440 pixels, Display type: SAMOLED. Processor frequency: 2.1 GHz, Coprocessor frequency: 1.5 GHz. Internal storage capacity: 32 GB, Internal RAM: 3072 MB. Main camera resolution (numeric): 16 MP, Video recording modes: 1080p, 2160p, Maximum frame rate: 30 fps. SIM card capability: Single SIM, SIM card type: NanoSIM, 2G standards: GSM

The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072 MB of internal RAM with 32 GB of internal storage capacity.
Building Messages from a Product Catalogue
The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072 MB of internal RAM with 32 GB of internal storage capacity.
Making Product Discovery Conversational
Entity Recognition for Voice Search
Input - “I’d like some red adidas trainers”
Output:
• <brands, [adidas]>
• <categories, [trainers]>
• <colours, [red]>
4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
Using the Product Catalogue to Parse Queries
A Lucene index is built from labels to tag tree tokens
1. Word shingles are extracted from the input query
2. Each shingle is queried against the index (top down, greedy)
Labeled tokens are used to:
1. Query the product index
2. Keep track of the dialogue state
• “I’d like some red adidas trainers”
• “I’d like some red adidas”
• “like some red adidas trainers”
• “I’d like some red”
• “like some red adidas”
• “some red adidas trainers”
• ...
• “red”
• “adidas”
• “trainers”
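The shingle enumeration and greedy lookup can be sketched as follows. A plain dictionary stands in for the Lucene index here, and the function names and index contents are assumptions for illustration; only the longest-first order and the expected labels come from the slides.

```python
def word_shingles(tokens):
    """Yield (start, end) spans of contiguous tokens, longest first,
    left to right within each length."""
    n = len(tokens)
    for size in range(n, 0, -1):
        for start in range(0, n - size + 1):
            yield start, start + size


def parse_query(query, index):
    """Greedy, top-down labelling: once a shingle matches, its tokens are
    consumed and cannot be matched again by a shorter shingle."""
    tokens = query.lower().split()
    used = [False] * len(tokens)
    labels = {}
    for start, end in word_shingles(tokens):
        if any(used[start:end]):
            continue
        shingle = " ".join(tokens[start:end])
        label = index.get(shingle)
        if label is not None:
            labels.setdefault(label, []).append(shingle)
            for i in range(start, end):
                used[i] = True
    return labels


# Hypothetical stand-in for the index of tag tree tokens and their labels.
index = {"adidas": "brands", "trainers": "categories", "red": "colours"}

print(parse_query("I'd like some red adidas trainers", index))
```

Trying longer shingles first means a multi-word entry such as a two-word colour would win over its individual words, provided the index contains it.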
Putting It all Together: Answering Queries
How big is the Samsung Galaxy S6’s screen?
The Samsung Galaxy S6 has a 12.9 cm display
How much RAM does it have?
It has 3072MB of RAM
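The follow-up question only works if the system remembers which product is under discussion. A minimal sketch of that dialogue state is shown below; the catalogue contents, keyword table and function names are hypothetical, and the lookup returns the raw attribute value rather than a generated sentence.

```python
import re

# Hypothetical catalogue entry, with values taken from the specification above.
CATALOGUE = {
    "samsung galaxy s6": {"display": "12.9 cm", "ram": "3072MB"},
}

# Hypothetical mapping from question words to catalogue attributes.
ATTRIBUTE_KEYWORDS = {"screen": "display", "display": "display", "ram": "ram"}


class DialogueState:
    """Remembers the product most recently mentioned by the user."""

    def __init__(self):
        self.product = None


def answer(question, state):
    """Return the requested attribute value, or None if it cannot be resolved."""
    text = question.lower()
    # Update the state when a known product is named explicitly.
    for name in CATALOGUE:
        if name in text:
            state.product = name
    # Match attribute keywords on whole tokens to avoid substring accidents.
    tokens = re.findall(r"[a-z0-9]+", text)
    attribute = next(
        (ATTRIBUTE_KEYWORDS[token] for token in tokens if token in ATTRIBUTE_KEYWORDS),
        None,
    )
    if state.product is None or attribute is None:
        return None
    return CATALOGUE[state.product][attribute]


state = DialogueState()
print(answer("How big is the Samsung Galaxy S6's screen?", state))
print(answer("How much RAM does it have?", state))
```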
Wrapping Up
Takeaways
1. Word embeddings, even when trained on limited data, can:
a. provide significant improvement over bag of words models for text classification; and
b. reduce the amount of manually curated data required for the task
2. Product catalogues provide a rich information source for conversational apps
3. NLG can be utilised for product feed enhancement as well as conversation
Thank you

Contenu connexe

En vedette

Developing highly scalable applications with Symfony and RabbitMQ
Developing highly scalable applications with  Symfony and RabbitMQDeveloping highly scalable applications with  Symfony and RabbitMQ
Developing highly scalable applications with Symfony and RabbitMQAlexey Petrov
 
CloudStack EU user group - Trillian
CloudStack EU user group - TrillianCloudStack EU user group - Trillian
CloudStack EU user group - TrillianShapeBlue
 
NSM (Network Security Monitoring) - Tecland Chapeco
NSM (Network Security Monitoring) - Tecland ChapecoNSM (Network Security Monitoring) - Tecland Chapeco
NSM (Network Security Monitoring) - Tecland ChapecoRodrigo Montoro
 
Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud. Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud. Amazon Web Services
 
Reactive Cloud Security | AWS Public Sector Summit 2016
Reactive Cloud Security | AWS Public Sector Summit 2016Reactive Cloud Security | AWS Public Sector Summit 2016
Reactive Cloud Security | AWS Public Sector Summit 2016Amazon Web Services
 
Apostila De Dispositivos EléTricos
Apostila De Dispositivos EléTricosApostila De Dispositivos EléTricos
Apostila De Dispositivos EléTricoselkbcion
 
Business selectors
Business selectorsBusiness selectors
Business selectorsbenwaine
 
Writing New Relic Plugins: NSQ
Writing New Relic Plugins: NSQWriting New Relic Plugins: NSQ
Writing New Relic Plugins: NSQlxfontes
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)Brian Brazil
 
Orchestrating Docker in production - TIAD Camp Docker
Orchestrating Docker in production - TIAD Camp DockerOrchestrating Docker in production - TIAD Camp Docker
Orchestrating Docker in production - TIAD Camp DockerThe Incredible Automation Day
 
Hunting powerpoint
Hunting powerpointHunting powerpoint
Hunting powerpointKJRoss9
 
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUGMicroservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUGMarcin Grzejszczak
 
Automated Infrastructure Security: Monitoring using FOSS
Automated Infrastructure Security: Monitoring using FOSSAutomated Infrastructure Security: Monitoring using FOSS
Automated Infrastructure Security: Monitoring using FOSSSonatype
 
Application Deployment at UC Riverside
Application Deployment at UC RiversideApplication Deployment at UC Riverside
Application Deployment at UC RiversideMichael Kennedy
 
Python Pants Build System for Large Codebases
Python Pants Build System for Large CodebasesPython Pants Build System for Large Codebases
Python Pants Build System for Large CodebasesAngad Singh
 
API Management - Practical Enterprise Implementation Experience
API Management - Practical Enterprise Implementation ExperienceAPI Management - Practical Enterprise Implementation Experience
API Management - Practical Enterprise Implementation ExperienceCapgemini
 

En vedette (17)

Developing highly scalable applications with Symfony and RabbitMQ
Developing highly scalable applications with  Symfony and RabbitMQDeveloping highly scalable applications with  Symfony and RabbitMQ
Developing highly scalable applications with Symfony and RabbitMQ
 
CloudStack EU user group - Trillian
CloudStack EU user group - TrillianCloudStack EU user group - Trillian
CloudStack EU user group - Trillian
 
NSM (Network Security Monitoring) - Tecland Chapeco
NSM (Network Security Monitoring) - Tecland ChapecoNSM (Network Security Monitoring) - Tecland Chapeco
NSM (Network Security Monitoring) - Tecland Chapeco
 
Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud. Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud.
 
Reactive Cloud Security | AWS Public Sector Summit 2016
Reactive Cloud Security | AWS Public Sector Summit 2016Reactive Cloud Security | AWS Public Sector Summit 2016
Reactive Cloud Security | AWS Public Sector Summit 2016
 
Apostila De Dispositivos EléTricos
Apostila De Dispositivos EléTricosApostila De Dispositivos EléTricos
Apostila De Dispositivos EléTricos
 
Business selectors
Business selectorsBusiness selectors
Business selectors
 
Writing New Relic Plugins: NSQ
Writing New Relic Plugins: NSQWriting New Relic Plugins: NSQ
Writing New Relic Plugins: NSQ
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 
Orchestrating Docker in production - TIAD Camp Docker
Orchestrating Docker in production - TIAD Camp DockerOrchestrating Docker in production - TIAD Camp Docker
Orchestrating Docker in production - TIAD Camp Docker
 
Hunting powerpoint
Hunting powerpointHunting powerpoint
Hunting powerpoint
 
Jake Fox Pd. 5
Jake Fox Pd. 5Jake Fox Pd. 5
Jake Fox Pd. 5
 
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUGMicroservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG
 
Automated Infrastructure Security: Monitoring using FOSS
Automated Infrastructure Security: Monitoring using FOSSAutomated Infrastructure Security: Monitoring using FOSS
Automated Infrastructure Security: Monitoring using FOSS
 
Application Deployment at UC Riverside
Application Deployment at UC RiversideApplication Deployment at UC Riverside
Application Deployment at UC Riverside
 
Python Pants Build System for Large Codebases
Python Pants Build System for Large CodebasesPython Pants Build System for Large Codebases
Python Pants Build System for Large Codebases
 
API Management - Practical Enterprise Implementation Experience
API Management - Practical Enterprise Implementation ExperienceAPI Management - Practical Enterprise Implementation Experience
API Management - Practical Enterprise Implementation Experience
 

Similaire à Applying NLP to product comparison at visual meta

Using Machine Learning at Scale: A Gaming Industry Experience!
Using Machine Learning at Scale: A Gaming Industry Experience!Using Machine Learning at Scale: A Gaming Industry Experience!
Using Machine Learning at Scale: A Gaming Industry Experience!Databricks
 
Unify Your Selling Channels in One Product Catalog Service
Unify Your Selling Channels in One Product Catalog ServiceUnify Your Selling Channels in One Product Catalog Service
Unify Your Selling Channels in One Product Catalog ServiceMongoDB
 
Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Sandra Garcia
 
Tokens, Complex Systems, and Nature
Tokens, Complex Systems, and NatureTokens, Complex Systems, and Nature
Tokens, Complex Systems, and NatureTrent McConaghy
 
Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼
Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼
Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼Elasticsearch
 
Jeremy cabral search marketing summit - scraping data-driven content (1)
Jeremy cabral   search marketing summit - scraping data-driven content (1)Jeremy cabral   search marketing summit - scraping data-driven content (1)
Jeremy cabral search marketing summit - scraping data-driven content (1)Jeremy Cabral
 
Transformer_Clustering_PyData_2022.pdf
Transformer_Clustering_PyData_2022.pdfTransformer_Clustering_PyData_2022.pdf
Transformer_Clustering_PyData_2022.pdfChristopherLennan
 
Oracle Endeca 101 Developer Introduction High Level Overview
Oracle Endeca 101 Developer Introduction High Level OverviewOracle Endeca 101 Developer Introduction High Level Overview
Oracle Endeca 101 Developer Introduction High Level OverviewGordon Kiser
 
World of IoT by Microsoft Co #iotconfua
World of IoT by Microsoft Co #iotconfuaWorld of IoT by Microsoft Co #iotconfua
World of IoT by Microsoft Co #iotconfuaAndy Shutka
 
Tokens and Complex Systems
Tokens and Complex SystemsTokens and Complex Systems
Tokens and Complex SystemsTrent McConaghy
 
MongoDB, E-commerce and Transactions
MongoDB, E-commerce and TransactionsMongoDB, E-commerce and Transactions
MongoDB, E-commerce and TransactionsSteven Francia
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBMongoDB
 
Design Systems at Scale - Design Systems London
Design Systems at Scale - Design Systems LondonDesign Systems at Scale - Design Systems London
Design Systems at Scale - Design Systems LondonSarah Federman
 
MongoDB and Ecommerce : A perfect combination
MongoDB and Ecommerce : A perfect combinationMongoDB and Ecommerce : A perfect combination
MongoDB and Ecommerce : A perfect combinationSteven Francia
 
Gadget Store Application
Gadget Store ApplicationGadget Store Application
Gadget Store Applicationmakersbay
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
Accessibility for design system 19
Accessibility for design system 19Accessibility for design system 19
Accessibility for design system 19Paya Do
 
The paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECHThe paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECHDataiku
 
Cross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlowCross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlowChristian Hassa
 

Similaire à Applying NLP to product comparison at visual meta (20)

Using Machine Learning at Scale: A Gaming Industry Experience!
Using Machine Learning at Scale: A Gaming Industry Experience!Using Machine Learning at Scale: A Gaming Industry Experience!
Using Machine Learning at Scale: A Gaming Industry Experience!
 
Unify Your Selling Channels in One Product Catalog Service
Unify Your Selling Channels in One Product Catalog ServiceUnify Your Selling Channels in One Product Catalog Service
Unify Your Selling Channels in One Product Catalog Service
 
Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)
 
Tokens, Complex Systems, and Nature
Tokens, Complex Systems, and NatureTokens, Complex Systems, and Nature
Tokens, Complex Systems, and Nature
 
Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼
Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼
Customer Story: Elastic Stack을 이용한 게임 서비스 통합 로깅 플랫폼
 
Jeremy cabral search marketing summit - scraping data-driven content (1)
Jeremy cabral   search marketing summit - scraping data-driven content (1)Jeremy cabral   search marketing summit - scraping data-driven content (1)
Jeremy cabral search marketing summit - scraping data-driven content (1)
 
Transformer_Clustering_PyData_2022.pdf
Transformer_Clustering_PyData_2022.pdfTransformer_Clustering_PyData_2022.pdf
Transformer_Clustering_PyData_2022.pdf
 
Oracle Endeca 101 Developer Introduction High Level Overview
Oracle Endeca 101 Developer Introduction High Level OverviewOracle Endeca 101 Developer Introduction High Level Overview
Oracle Endeca 101 Developer Introduction High Level Overview
 
World of IoT by Microsoft Co #iotconfua
World of IoT by Microsoft Co #iotconfuaWorld of IoT by Microsoft Co #iotconfua
World of IoT by Microsoft Co #iotconfua
 
Search enginebasics
Search enginebasicsSearch enginebasics
Search enginebasics
 
Tokens and Complex Systems
Tokens and Complex SystemsTokens and Complex Systems
Tokens and Complex Systems
 
MongoDB, E-commerce and Transactions
MongoDB, E-commerce and TransactionsMongoDB, E-commerce and Transactions
MongoDB, E-commerce and Transactions
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDB
 
Design Systems at Scale - Design Systems London
Design Systems at Scale - Design Systems LondonDesign Systems at Scale - Design Systems London
Design Systems at Scale - Design Systems London
 
MongoDB and Ecommerce : A perfect combination
MongoDB and Ecommerce : A perfect combinationMongoDB and Ecommerce : A perfect combination
MongoDB and Ecommerce : A perfect combination
 
Gadget Store Application
Gadget Store ApplicationGadget Store Application
Gadget Store Application
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Accessibility for design system 19
Accessibility for design system 19Accessibility for design system 19
Accessibility for design system 19
 
The paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECHThe paradox of big data - dataiku / oxalide APEROTECH
The paradox of big data - dataiku / oxalide APEROTECH
 
Cross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlowCross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlow
 

Dernier

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Dernier (20)

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Applying NLP to product comparison at visual meta

  • 1. Applying NLP to Product Comparison at Visual Meta 1 Ross Turner Elasticsearch Meetup Berlin 22/02/17
  • 2. Overview Product Comparison on the Visual Meta Platform1 Applying NLP to Product Comparison Using NLP to Maintain a Product Catalogue2 Making Product Discovery Conversational3 2
  • 3. About Me Previously… • Researcher in Natural Language Generation (NLG) • Software Engineer on Local Search • Co-founder and Principal Engineer at an NLG Start Up Currently… • Engineering Head at Visual Meta
  • 4. Product Comparison on the Visual Meta Platform 4
  • 5. Product Comparison at Visual Meta ‘All shops, one site’ • Online marketing platform with shopping portals in 12 different countries • 3 brands: Ladenzeile, ShopAlike, UmSóLugar • 100,000,000+ items • 6,000+ partner shops
  • 6. Faceted Search at Visual Meta Discover fashion, furniture and more…. • 800,000 platform visits per day • 80 filter types across 21 categories • Currently porting filter search to ElasticSearch
  • 7. Maintaining a Product Catalogue at Visual Meta Product feeds are continuously synced from partner shops: • Feed items must be categorised in order to be discoverable on the platform We want to: • Identify all variants of a product • Compare offers across shops • Make it easy for our for users to browse through millions of products Model Colour Memory Apple iPhone 6s Space Grey 32GB Apple iPhone 6s Space Grey 128GB Apple iPhone 6s Gold 32GB Apple iPhone 6s Gold 128GB Apple iPhone 6s Rose Gold 32GB Apple iPhone 6s Rose Gold 128GB Apple iPhone 6s Silver 32GB Apple iPhone 6s Silver 128GB
  • 8. Assigning Tags Based on Textual Attributes 8
  • 9. String Matching Index item names and descriptions, query product variant tag names against the index Lucene query: • +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s) +(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey) Test by manually assigning items to a random sample of products Recall Precision Fscore 0.59 0.64 0.61
  • 10. Error Analysis Naming for the same product is not consistent across feeds: 1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)” 2. efg.com: “Apple iPhone 6 64 GB Space Grey” 3. xyz.com: “Apple iPhone 6” Naming for the same product is not consistent within the same feed: 1. “Apple Iphone 6 - 64GB” 2. “Apple Iphone 6 64GB Space Grey” 3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone” Wrongly categorised Products in the feed: • “Cover for Apple Iphone 6 - 64GB”
  • 11. Comparing Tag Names to Item Names
  • 14. Language Models Drawbacks of bag of words / n-grams: • Words are equally distant • Vectors are sparse Word embeddings capture semantics: • Vectors are continuous • Similar words are close in vector space 1. T. Mikolov, K. Chen, G. Corrado, J. Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781
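The two drawbacks of bag-of-words representations can be checked directly on one-hot word vectors. The sketch below uses a hypothetical four-word vocabulary; it shows that every pair of distinct words is the same distance apart and that each vector has a single non-zero entry, however large the vocabulary grows.

```python
import math


def one_hot(word, vocabulary):
    """Represent a word as a vector with a single 1.0 at its vocabulary index."""
    return [1.0 if entry == word else 0.0 for entry in vocabulary]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical vocabulary, chosen only for illustration.
vocabulary = ["galaxy", "samsung", "iphone", "sofa"]
vectors = {word: one_hot(word, vocabulary) for word in vocabulary}

# Distance between every pair of distinct words.
distances = {
    (a, b): euclidean(vectors[a], vectors[b])
    for a in vocabulary
    for b in vocabulary
    if a < b
}

if __name__ == "__main__":
    for pair, distance in distances.items():
        print(pair, round(distance, 3))
```

"galaxy" is exactly as far from "samsung" as it is from "sofa", so the representation carries no notion of similarity.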
  • 15. Word2Vec for Mobile Phone Items Mobile phone item corpus: • 7,890 feed items • 863k tokens, 41.5k unique Closest words to “Galaxy”:
      Rank  Word     Cosine Distance
      1     Samsung  0.51
      2     S2       0.48
      3     S5       0.46
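A "closest words" lookup of this kind can be sketched as a ranking by cosine score over an embedding table. The three-dimensional vectors below are invented for illustration and are not the trained Word2Vec vectors from the talk; the function names are likewise hypothetical. The slide ranks the highest score first, which suggests its scores behave as cosine similarities, and the sketch follows that reading.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(query, embeddings, top_n=3):
    """Rank all other words by cosine similarity to the query word."""
    query_vector = embeddings[query]
    scored = [
        (word, cosine_similarity(query_vector, vector))
        for word, vector in embeddings.items()
        if word != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "galaxy": [0.9, 0.1, 0.3],
    "samsung": [0.8, 0.2, 0.4],
    "s5": [0.7, 0.0, 0.6],
    "cover": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for word, score in closest_words("galaxy", toy_embeddings):
        print(word, round(score, 2))
```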
  • 16. Classification Performance
      Tag                     Best BOW Classifier          Decision Tree with Word2Vec
                              Fscore  Precision  Recall    Fscore  Precision  Recall
      “Smartphone”            0.95    0.99       0.84      0.92    0.94       0.90
      “Home Speakers”         0.55    0.67       0.32      0.79    0.79       0.80
      “Creeper Cipök”         0.58    0.97       0.22      0.75    0.86       0.66
      “Leder Schuhe”          0.52    0.73       0.25      0.71    0.93       0.58
      “Bett mit Schubladen”   0.52    0.65       0.29      0.70    0.81       0.62
  • 18. Two Descriptions of a Samsung TV Samsung UE40H6400AK. Display diagonal: 101.6 cm (40"), HD type: Full HD, Display resolution: 1920 x 1080 pixels. Tuner type: Analog & Digital, Digital signal format system: DVB-C, DVB-T. RMS rated power: 20 W. Consumer Electronics Control (CEC): Anynet+. Picture processing technology: Samsung Wide Color Enhancer The Samsung UE40H6400 has a 101.6cm screen size and a resolution of 1920 x 1080 pixels. It is a Full HD TV, has an Analog & Digital tuner and comes with Anynet+.
  • 19. Generating Product Descriptions • Choosing what to say • Deciding how to say it 3. E. Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104
  • 20. Two Descriptions of a Samsung Smartphone Samsung SM-G920F, Galaxy. Display diagonal: 12.9 cm (5.1"), Display resolution: 2560 x 1440 pixels, Display type: SAMOLED. Processor frequency: 2.1 GHz, Coprocessor frequency: 1.5 GHz. Internal storage capacity: 32 GB, Internal RAM: 3072 MB. Main camera resolution (numeric): 16 MP, Video recording modes: 1080p, 2160p, Maximum frame rate: 30 fps. SIM card capability: Single SIM, SIM card type: NanoSIM, 2G standards: GSM The Samsung GALAXY S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
  • 21. Building Messages from a Product Catalogue The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
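The two stages named earlier, choosing what to say and deciding how to say it, can be sketched with a dictionary of catalogue attributes and a fixed sentence template. This is a simplified illustration and not the data-to-text architecture from the cited paper: the attribute keys, `select_content` and `realise` are all hypothetical names, and the attribute values are copied from the specification on the previous slide.

```python
# Hypothetical catalogue record; values taken from the specification slide.
SPEC = {
    "name": "Samsung Galaxy S6",
    "display_diagonal_cm": 12.9,
    "display_resolution": "2560 x 1440",
    "processor_ghz": 2.1,
    "camera_mp": 16,
    "ram_mb": 3072,
    "storage_gb": 32,
    "sim_card_type": "NanoSIM",
    "max_frame_rate_fps": 30,
}

# Attributes judged worth mentioning (an assumption for this sketch).
WANTED = [
    "display_diagonal_cm",
    "display_resolution",
    "processor_ghz",
    "camera_mp",
    "ram_mb",
    "storage_gb",
]


def select_content(spec, wanted):
    """Choosing what to say: keep only the attributes worth mentioning."""
    return {key: spec[key] for key in wanted if key in spec}


def realise(name, message):
    """Deciding how to say it: fill a fixed two-sentence template."""
    first = (
        f"The {name} has a {message['display_diagonal_cm']} cm display "
        f"with {message['display_resolution']} pixel resolution."
    )
    second = (
        f"It has a {message['processor_ghz']} GHz processor, "
        f"a {message['camera_mp']} megapixel camera and "
        f"{message['ram_mb']}MB of internal RAM with "
        f"{message['storage_gb']}GB of internal storage capacity."
    )
    return f"{first} {second}"


if __name__ == "__main__":
    print(realise(SPEC["name"], select_content(SPEC, WANTED)))
```

The template raises a `KeyError` when a selected attribute is missing from the record; a fuller system would choose between several templates depending on which attributes are available.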
  • 22. Making Product Discovery Conversational
  • 23. Entity Recognition for Voice Search Input: “I’d like some red adidas trainers” Output: • <brands, [adidas]> • <categories, [trainers]> • <colours, [red]> 4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
  • 24. Using the Product Catalogue to Parse Queries A Lucene index is built from labels to tag tree tokens 1. Word shingles are extracted from the input query 2. Each shingle is queried against the index (top down, greedy) Labelled tokens are used to: 1. Query the product index 2. Keep track of the dialogue state Example shingles: • “I’d like some red adidas trainers” • “I’d like some red adidas” • “like some red adidas trainers” • “I’d like some red” • “like some red adidas” • “some red adidas trainers” • ... • “red” • “adidas” • “trainers”
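The shingle-based parsing steps can be sketched as follows. Several things here are assumptions: a plain dictionary (`TAG_INDEX`) stands in for the Lucene index, "top down, greedy" is read as "try the longest shingles first and do not reuse tokens that a longer match has already claimed", and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for the Lucene index: phrase -> label.
TAG_INDEX = {
    "adidas": "brands",
    "trainers": "categories",
    "red": "colours",
    "gold": "colours",
    "rose gold": "colours",
}


def shingle_spans(token_count):
    """Yield (start, end) spans of contiguous tokens, longest spans first."""
    for size in range(token_count, 0, -1):
        for start in range(0, token_count - size + 1):
            yield start, start + size


def parse_query(query, index=TAG_INDEX):
    """Label parts of the query by looking up shingles, longest first."""
    normalised = query.lower().replace("’", "'")
    tokens = re.findall(r"[a-z0-9']+", normalised)
    used = [False] * len(tokens)
    labelled = {}
    for start, end in shingle_spans(len(tokens)):
        if any(used[start:end]):
            continue  # greedy: a longer shingle already claimed these tokens
        phrase = " ".join(tokens[start:end])
        label = index.get(phrase)
        if label is not None:
            labelled.setdefault(label, []).append(phrase)
            used[start:end] = [True] * (end - start)
    return labelled


if __name__ == "__main__":
    print(parse_query("I’d like some red adidas trainers"))
    print(parse_query("a rose gold phone"))
```

Trying the longest shingles first means "rose gold" is labelled as one colour instead of being split into "rose" and "gold".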
  • 25. Putting It All Together: Answering Queries Q: How big is the Samsung Galaxy S6’s screen? A: The Samsung Galaxy S6 has a 12.9 cm display. Q: How much RAM does it have? A: It has 3072MB of RAM.
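The exchange on this slide depends on remembering which product the user last mentioned, so that "it" in the follow-up question can be resolved. A minimal sketch of that dialogue state is shown below. The catalogue record, the keyword table and the `Dialogue` class are hypothetical, and the substring matching is far cruder than the shingle-based parsing described on the previous slide.

```python
# Hypothetical catalogue record with pre-built answer phrases.
CATALOGUE = {
    "samsung galaxy s6": {
        "name": "Samsung Galaxy S6",
        "screen": "a 12.9 cm display",
        "ram": "3072MB of RAM",
    },
}

# Hypothetical mapping from question keywords to catalogue attributes.
ATTRIBUTE_KEYWORDS = {"screen": "screen", "display": "screen", "ram": "ram"}


class Dialogue:
    """Answers attribute questions and remembers the last product mentioned."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.current_product = None  # dialogue state

    def answer(self, question):
        text = question.lower().replace("’", "'")
        mentioned = next((key for key in self.catalogue if key in text), None)
        if mentioned is not None:
            self.current_product = mentioned
            subject = f"The {self.catalogue[mentioned]['name']}"
        elif self.current_product is not None:
            subject = "It"  # refer back to the product already in focus
        else:
            return "Which product do you mean?"
        attribute = next(
            (attr for word, attr in ATTRIBUTE_KEYWORDS.items() if word in text),
            None,
        )
        if attribute is None:
            return "Sorry, I don't know that."
        return f"{subject} has {self.catalogue[self.current_product][attribute]}"


if __name__ == "__main__":
    dialogue = Dialogue(CATALOGUE)
    print(dialogue.answer("How big is the Samsung Galaxy S6’s screen?"))
    print(dialogue.answer("How much RAM does it have?"))
```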
  • 27. Takeaways 1. Word embeddings, even when trained on limited data, can: a. provide significant improvement over bag of words models for text classification; and b. reduce the amount of manually curated data required for the task 2. Product catalogues provide a rich information source for conversational apps 3. NLG can be utilised for product feed enhancement as well as conversation