Cold-Start Recommendations to Users With Rich Profiles
Harlan D. Harris, PhD
Director of Data Science at WayUp
September 2018
RecSys NYC Meetup
After This Meetup!
• Go to The Storehouse!
• Meet other RecSys peeps!
Why Build a RecSys?
• College students may not know what they want, so we must show them options
• Promote customer jobs
• Ongoing engagement through content (blog, guide) recommendations
RecSys UX Categories
Who You Are / What You’ve Done
Feed-Like / Catalog-Like
(Feed)
The problem with collaborative filters…
Leverage the Profile
• Structured & Unstructured Data
• Natural Language Processing
• Learning to Rank
• Domain Knowledge & Feature Engineering
Architecture
• User & Front End: “Hey, show me jobs!”
• Main App: “That’s hard! But I know who you are!”
• Microservice: “Got you. Feature Engineering your Profile…”
• Two DBs and Offline Machine Learning
• Data flow labels: User ID, Params; User Details; Profile, Interaction History; Listing IDs; Listing Details; Ranked Listings & Details
What do you mean by… Similar?
• Graphic Designer: Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re a great artist.
• Risk Manager: Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re OK at math.
• Visual Brand Lead: Can you draw? Dunder Mifflin seeks a talented person to help bring our office paper business to the next level. And you’ll be on television!
Meetup, next week!
How to Build a Multi-Factor, Profile-Based, Cold-Start Content Recommendation System
Divide & Conquer
[Diagram: a profile such as {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} goes to the rankers (Ranker #1, ...); their lists of content IDs feed an Aggregator, which outputs a single list.]
1. Popular: nightly update, absolute or relative?
2. Relevant to Career Status: needs content tagging/taxonomy
3. Relevant to Major (Category): needs content tagging
4. Recent: e.g., “10 great internships you can apply to now!”
5. Collaborative: people with profiles like yours read content with tags like this
6. Sponsored: why wouldn’t we…?
7. Random!
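The divide-and-conquer idea above, where several simple rankers each order the same content, could be sketched roughly as follows. This is only an illustration: the catalog fields (`views`, `majors`, `published`), the function names, and the sample items are all assumptions and do not come from the talk.

```python
from datetime import date

# Hypothetical catalog; field names and values are illustrative only.
CATALOG = [
    {"id": 4, "views": 120, "majors": {"Math"}, "published": date(2018, 9, 1)},
    {"id": 12, "views": 300, "majors": {"Art"}, "published": date(2018, 8, 1)},
    {"id": 17, "views": 50, "majors": {"Math", "Physics"}, "published": date(2018, 9, 10)},
]

def rank_popular(profile, catalog):
    # General ranker: ignores the profile, most-viewed content first.
    return [c["id"] for c in sorted(catalog, key=lambda c: -c["views"])]

def rank_major(profile, catalog):
    # Personalized ranker: content tagged with the user's major first.
    # sorted() is stable, so untagged items keep their catalog order.
    return [c["id"] for c in sorted(catalog, key=lambda c: profile["major"] not in c["majors"])]

def rank_recent(profile, catalog):
    # General ranker: newest content first.
    return [c["id"] for c in sorted(catalog, key=lambda c: c["published"], reverse=True)]

RANKERS = {"popular": rank_popular, "major": rank_major, "recent": rank_recent}

def run_rankers(profile, catalog):
    # Every ranker sees the same inputs and returns an ordered list of IDs.
    return {name: ranker(profile, catalog) for name, ranker in RANKERS.items()}

profile = {"major": "Math", "grad_date": "2018/05/15", "college": "Yale", "skills": ["video games"]}
rankings = run_rankers(profile, CATALOG)
print(rankings)
```

Because each ranker shares one signature, adding a new one (sponsored, random, collaborative) would only mean registering another function in `RANKERS`.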
The More the Better
• Sum Weighted Log Rank (not Score)
• Tune with A/B tests (or reinforcement learning)
• Plausible “why” could be exposed to user
• Mix of general and personalized rankers

id  recent  major  log rec  log maj  tot  rank  why
1   1       4      0        1.4      2.1  3     rec
2   2       2      0.7      0.7      1.7  2     maj
3   4       3      1.4      1.1      3.0  4     maj
4   3       1      1.1      0        1.1  1     maj

Weights: *1.0 and *1.5 (the totals are consistent with tot = 1.0 * log rec + 1.5 * log maj, using natural logs)
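The "sum weighted log rank" aggregation can be illustrated with a small sketch that reproduces the numbers in the table above. The weights (1.0 for recent, 1.5 for major) are inferred from the table's totals, and the tie-breaking rule for the "why" column (prefer the higher-weighted ranker) is an assumption made here so that the output matches the slide.

```python
import math

# Weights inferred from the table's totals; treat them as an assumption.
WEIGHTS = {"recent": 1.0, "major": 1.5}

# Rank of each content ID under each ranker (1 = best), as in the table.
RANKS = {
    "recent": {1: 1, 2: 2, 3: 4, 4: 3},
    "major": {1: 4, 2: 2, 3: 3, 4: 1},
}

def aggregate(ranks_by_ranker, weights):
    # Only items ranked by every ranker are considered in this sketch.
    items = set.intersection(*(set(r) for r in ranks_by_ranker.values()))
    totals, why = {}, {}
    for item in items:
        # Weighted sum of natural-log ranks; lower totals are better.
        totals[item] = sum(
            weights[name] * math.log(ranks[item])
            for name, ranks in ranks_by_ranker.items()
        )
        # "Why": the ranker that ranked the item best.
        # Assumed tie-break: the ranker with the larger weight wins.
        why[item] = min(
            ranks_by_ranker,
            key=lambda name: (ranks_by_ranker[name][item], -weights[name]),
        )
    ordered = sorted(items, key=lambda i: (totals[i], i))
    return ordered, totals, why

ordered, totals, why = aggregate(RANKS, WEIGHTS)
print(ordered)
print({i: round(t, 1) for i, t in totals.items()})
print(why)
```

Working on ranks rather than raw scores means the rankers do not need comparable score scales, and the log compresses differences far down the list so that top positions matter most.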
Separation of Concerns
Main App
• Built by software engineers, not data scientists
• Knows about user immediately
• Sends JSON profile with no feature engineering
Recommender microservice
• Knows about content, not users
• Updated nightly with new content & statistics
• Parses, engineers features, ranks
• Returns ranked IDs
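The split described above might look something like the following sketch: the main app sends a raw JSON profile, and the recommender parses it, engineers features, ranks, and returns IDs only. The function names, the `career_stage` feature, the content records, and the toy scoring rule are all hypothetical; a simple additive score stands in for the full set of rankers.

```python
import json
from datetime import date, datetime

# Hypothetical content store held by the microservice (refreshed nightly).
CONTENT = [
    {"id": 4, "career_stage": "student", "majors": ["Math"]},
    {"id": 12, "career_stage": "graduate", "majors": ["Math", "Art"]},
    {"id": 17, "career_stage": "graduate", "majors": ["History"]},
]

def engineer_features(profile, today):
    # Feature engineering lives in the microservice, not in the main app.
    grad = datetime.strptime(profile["grad_date"], "%Y/%m/%d").date()
    stage = "graduate" if grad <= today else "student"
    return {"major": profile.get("major"), "career_stage": stage}

def recommend(request_json, today):
    profile = json.loads(request_json)  # raw profile, exactly as sent
    feats = engineer_features(profile, today)

    def score(item):
        # Toy stand-in for the rankers: one point per matching attribute.
        return (item["career_stage"] == feats["career_stage"]) + (feats["major"] in item["majors"])

    ranked = sorted(CONTENT, key=score, reverse=True)
    # Only IDs go back; the main app looks up the details itself.
    return json.dumps({"ranked_ids": [c["id"] for c in ranked]})

request = json.dumps({"major": "Math", "grad_date": "2018/05/15", "college": "Yale"})
response = recommend(request, today=date(2018, 9, 1))
print(response)
```

Keeping the request as an unprocessed profile means the main app never needs to change when features are added or reworked inside the recommender.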
Metrics & Tuning
• Need to store: User X was recommended Content A, B, C on Page Y, then read B
• Metrics & A/B tests: Click-through Rate (did they like the suggestions?), Mean Reciprocal Rank (did they like the top items?)
• Avoid hurting top KPIs!
• Offline debugging tool is very handy
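Given logs of the kind described above (what was shown, in what order, and what was then read), the two metrics could be computed along these lines. The log layout and sample rows are assumptions for illustration, and click-through rate is taken here per recommended item; a per-page definition would be equally reasonable.

```python
# Hypothetical log rows: ordered recommendations plus what the user then read.
LOGS = [
    {"user": "X", "page": "Y", "shown": ["A", "B", "C"], "read": ["B"]},
    {"user": "Z", "page": "Y", "shown": ["A", "B", "C"], "read": []},
    {"user": "W", "page": "Y", "shown": ["C", "A", "B"], "read": ["C", "B"]},
]

def click_through_rate(logs):
    # Share of recommended items that were subsequently read.
    shown = sum(len(row["shown"]) for row in logs)
    clicked = sum(len(set(row["read"]) & set(row["shown"])) for row in logs)
    return clicked / shown if shown else 0.0

def mean_reciprocal_rank(logs):
    # Average of 1 / (position of the first read item); 0 if nothing was read.
    total = 0.0
    for row in logs:
        positions = [i + 1 for i, item in enumerate(row["shown"]) if item in row["read"]]
        total += 1.0 / positions[0] if positions else 0.0
    return total / len(logs) if logs else 0.0

ctr = click_through_rate(LOGS)
mrr = mean_reciprocal_rank(LOGS)
print(round(ctr, 3), round(mrr, 3))
```

Click-through rate rewards any useful suggestion, while mean reciprocal rank rewards putting the useful suggestion near the top, so the two can move differently when ranker weights are tuned.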
Pros & Cons
• Incredibly fast to prototype offline; fairly fast to build in production
• Amenable to explanations
• Easy to extend once history is available (matrix factorization or learning-to-rank subrankers)
• Easy to incorporate business priorities
• Works with new users and new-ish content
• Doesn’t work with a very large number of items; requires tuning
Thank You!
Harlan Harris
harlan@wayup.com
@harlanh on Twitter, Medium, GitHub
http://harlan.harris.name
What Happens When?
Real Time
• Ranking
Nightly
• Update content
• Compute popularity
• Refit collaborative ranker
Periodically
• Tuning parameters
• Exploring new rankers

Cold-Start Recommendations to Users With Rich Profiles

  • 1. Cold-Start Recommendations to Users With Rich Profiles Harlan D. Harris, PhD
 Director of Data Science at WayUp September, 2018 RecSys NYC Meetup 1
  • 2. After This Meetup! • Go to The Storehouse! • Meet other RecSys peeps! 2
  • 3. 3
  • 4. Why Build a RecSys? • College students may not know what they want — must show options • Promote customer jobs • Ongoing engagements with content (blog, guide) recs 4
  • 5. RecSys UX Categories Who You AreWhat You’ve Done Feed-Like Catalog-Like 5
  • 6. RecSys UX Categories Who You AreWhat You’ve Done Feed-Like Catalog-Like 5
  • 7. RecSys UX Categories Who You AreWhat You’ve Done Feed-Like Catalog-Like 5
  • 8. RecSys UX Categories Who You AreWhat You’ve Done Feed-Like Catalog-Like (Feed) 5
  • 9. RecSys UX Categories Who You AreWhat You’ve Done Feed-Like Catalog-Like (Feed) 5
  • 12. Leverage the Profile • Structured & Unstructured Data • Natural Language Processing • Learning to Rank • Domain Knowledge & Feature Engineering 7
  • 13. Architecture User & 
 Front End: Hey, show me jobs! Main App:
 That’s hard! But I know who you are! DB Microservice: Got you. Feature Engineering your Profile… DB Profile,
 Interaction History Listing IDs Listing
 Details User
 Details User ID, Params Ranked Listings
 & Details Offline Machine Learning 8
  • 14. What do you mean by… Similar? Graphic Designer
 Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re a great artist. Risk Manager
 Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re OK at math. Visual Brand Lead
 Can you draw? Dunder Mifflin seeks a talented person to help bring our office paper business to the next level. And you’ll be on television! Meetup, next week! 9
  • 15. How to Build a Multi-Factor, Profile-Based, Cold-Start Content Recommendation System 10
  • 16. 11
  • 17. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 18. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 19. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? 2. Relevant to Career Status — needs content tagging/ taxonomy Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 20. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? 2. Relevant to Career Status — needs content tagging/ taxonomy 3. Relevant to Major (Category) — needs content tagging Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 21. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? 2. Relevant to Career Status — needs content tagging/ taxonomy 3. Relevant to Major (Category) — needs content tagging 4. Recent — e.g., “10 great internships you can apply to now!” Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 22. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? 2. Relevant to Career Status — needs content tagging/ taxonomy 3. Relevant to Major (Category) — needs content tagging 4. Recent — e.g., “10 great internships you can apply to now!” 5. Collaborative — people with profiles like yours read content with tags like this Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 23. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? 2. Relevant to Career Status — needs content tagging/ taxonomy 3. Relevant to Major (Category) — needs content tagging 4. Recent — e.g., “10 great internships you can apply to now!” 5. Collaborative — people with profiles like yours read content with tags like this 6. Sponsored — why wouldn’t we…? Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 24. 17, 4, 12, 97, 17, 4 Ranker #1 Divide & Conquer 1. Popular — nightly update, absolute or relative? 2. Relevant to Career Status — needs content tagging/ taxonomy 3. Relevant to Major (Category) — needs content tagging 4. Recent — e.g., “10 great internships you can apply to now!” 5. Collaborative — people with profiles like yours read content with tags like this 6. Sponsored — why wouldn’t we…? 7. Random! Ranker #1 {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} 17 4 12 97 11 3 Aggregator 4 12 7 17 2 3 12
  • 26. The More the Better
      • Sum Weighted Log Rank (not Score)
      • Tune with A/B tests (or reinforcement learning)
      • Plausible “why” could be exposed to user
      • Mix of general and personalized rankers

      id  recent  major  log rec  log maj  tot  rank  why
      1   1       4      0        1.4      2.1  3     rec
      2   2       2      0.7      0.7      1.7  2     maj
      3   4       3      1.4      1.1      3.0  4     maj
      4   3       1      1.1      0        1.1  1     maj
      Weights: log rec × 1.0, log maj × 1.5
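The table's arithmetic can be reproduced with a short sketch: take the natural log of each ranker's rank, multiply by that ranker's weight, sum, and sort ascending. The weights (1.0 for recent, 1.5 for major) are the ones that reproduce the slide's totals. The rule used for the "why" column is an assumption that happens to match the table (best raw rank, ties going to the heavier-weighted ranker); the slide does not state it. The function names are hypothetical.

```python
import math

# Per-ranker ranks (1 = best) for each content id, copied from the slide's table.
ranks = {
    "recent": {1: 1, 2: 2, 3: 4, 4: 3},
    "major": {1: 4, 2: 2, 3: 3, 4: 1},
}
# Ranker weights; these values reproduce the "tot" column.
weights = {"recent": 1.0, "major": 1.5}


def aggregate(ranks, weights):
    """Sum weighted log ranks per item; a lower total means a better final rank."""
    ids = list(next(iter(ranks.values())))
    totals = {
        i: sum(w * math.log(ranks[name][i]) for name, w in weights.items())
        for i in ids
    }
    order = sorted(ids, key=lambda i: totals[i])
    return totals, order


def explain(ranks, weights, item_id):
    """Assumed rule: credit the ranker with the best raw rank for this item,
    breaking ties in favor of the heavier-weighted ranker."""
    return min(weights, key=lambda name: (ranks[name][item_id], -weights[name]))


totals, order = aggregate(ranks, weights)
for final_rank, item_id in enumerate(order, start=1):
    print(item_id, round(totals[item_id], 1), final_rank, explain(ranks, weights, item_id))
```

Working on log ranks rather than raw scores means rankers with incompatible score scales can be combined, and a difference between ranks 1 and 2 counts for more than a difference between ranks 30 and 31.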
  • 30. Separation of Concerns
      Main App
      • Built by software engineers, not data scientists
      • Knows about user immediately
      • Sends JSON profile with no feature engineering
      Recommender microservice
      • Knows about content, not users
      • Updated nightly with new content & statistics
      • Parses, engineers features, ranks
      • Returns ranked IDs
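A minimal sketch of that contract, assuming a plain function in place of a real HTTP endpoint: the main app sends the raw JSON profile, and the recommender parses it, derives features, ranks, and returns only IDs. The `CONTENT` index, the `career_status` feature, and the `recommend` function are hypothetical names used for illustration, not the actual WayUp service.

```python
import json
from datetime import date, datetime

# Hypothetical content index held by the microservice and refreshed nightly.
CONTENT = [
    {"id": 17, "career_status": "new_grad", "views": 900},
    {"id": 4, "career_status": "student", "views": 500},
    {"id": 12, "career_status": "new_grad", "views": 300},
]


def engineer_features(profile, today):
    """Derive features from the raw profile; the main app never does this."""
    grad = datetime.strptime(profile["grad_date"], "%Y/%m/%d").date()
    return {"career_status": "new_grad" if grad <= today else "student"}


def recommend(profile_json, today):
    """Parse the raw JSON profile, engineer features, rank, return IDs only."""
    profile = json.loads(profile_json)
    features = engineer_features(profile, today)
    ranked = sorted(
        CONTENT,
        # Matching career status first, then most viewed.
        key=lambda c: (c["career_status"] != features["career_status"], -c["views"]),
    )
    return json.dumps({"ranked_ids": [c["id"] for c in ranked]})


# The main app's side: send the profile as-is, with no feature engineering.
request = json.dumps(
    {"major": "Math", "grad_date": "2018/05/15", "college": "Yale", "skills": ["video games"]}
)
response = recommend(request, today=date(2018, 9, 1))
print(response)
```

Because the response carries only IDs, the main app stays responsible for fetching listing details and rendering them, and the feature engineering can change without a main-app release.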
  • 35. Metrics & Tuning
      • Need to store: User X was recommended Content A, B, C on Page Y, then read B
      • Metrics & A/B tests: Click-through Rate (did they like the suggestions?), Mean Reciprocal Rank (did they like the top items?)
      • Avoid hurting top KPIs!
      • Offline debugging tool is very handy
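Both metrics can be computed directly from the stored recommendation log. The sketch below assumes a hypothetical log format (one row per page view, with the ordered list shown and the items read) and counts click-through per recommendation list rather than per item shown; other definitions of click-through rate are also common.

```python
# Hypothetical log: one row per page view, with the ordered recommendations
# shown and the content the user went on to read.
log = [
    {"user": "X", "page": "Y", "shown": ["A", "B", "C"], "read": ["B"]},
    {"user": "Z", "page": "Y", "shown": ["A", "B", "C"], "read": []},
    {"user": "W", "page": "Y", "shown": ["C", "A", "B"], "read": ["C"]},
]


def click_through_rate(log):
    """Fraction of recommendation lists that led to at least one read."""
    return sum(1 for row in log if row["read"]) / len(log)


def reciprocal_rank(row):
    """1 / position of the first recommended item that was read, or 0 if none."""
    for position, content_id in enumerate(row["shown"], start=1):
        if content_id in row["read"]:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(log):
    """Average reciprocal rank over all recommendation lists."""
    return sum(reciprocal_rank(row) for row in log) / len(log)


ctr = click_through_rate(log)
mrr = mean_reciprocal_rank(log)
print(f"CTR={ctr:.2f} MRR={mrr:.2f}")
```

Click-through rate rewards any read, while mean reciprocal rank rewards reads near the top of the list, so the two can move in different directions during an A/B test.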
  • 42. Pros & Cons
      • Incredibly fast to prototype offline; fairly fast to build in production
      • Amenable to explanations
      • Easy to extend once history is available (matrix factorization or learning-to-rank subrankers)
      • Easy to incorporate business priorities
      • Works with new users and new-ish content
      • Doesn’t work with a very large number of items; requires tuning
  • 43. Thank You! Harlan Harris, harlan@wayup.com, @harlanh on Twitter, Medium, GitHub, http://harlan.harris.name
  • 44. What Happens When?
      Real Time
      • Ranking
      Nightly
      • Update content
      • Compute popularity
      • Refit collaborative ranker
      Periodically
      • Tuning parameters
      • Exploring new rankers