Scaling Machine Learning and
Statistics for Web Applications
Recommendations, Search, Advertising
Deepak Agarwal
dagarwal@linkedin.com
KDD, Aug 12th, 2015 @ Sydney, AUS
Our vision
Create economic opportunity for every
member of the global workforce
Our mission
Connect the world’s professionals to
make them more productive and
successful
Our core value: Members first!
HOW ?
Connect talent with opportunity at scale
Algorithmic Match-Making
via Machine Learning and Data Mining
• Jobs to Apply
• Feed
• Articles to Read
• Connections to Nurture, Keep in Touch with
• Search
• PYMK (People You May Know)
Scale Algorithmic Match Making
• Approach: continuously learn from historical data
– Machine learning/statistical models
– Run experiments (test new procedures and collect
data)
• Scalable and robust infrastructure
– Large scale batch computations
– Large scale near-line computations
– High throughput, low latency online computations
– Fast Retrieval Engines (Search)
Search versus Recommendation
Search: User specified query: Recall important
Recommendations: Delivery Mechanisms
• Pull Model: When the user visits, serve the most
relevant
– Desktop, mobile web, mobile app, iPad
• Push Model: User is not visiting but we need to reach
out with information {Email, Notifications}
– Higher relevance bar: Right message, right user, right
time, right frequency, right channel
Done through ML and optimization
Rest of the Talk
• Data and Problem Formulation
• Machine Learning Process
– Illustrated with Feed Application
• Lessons
WHAT INFORMATION DO WE HAVE ABOUT
USERS AND ITEMS ?
MATCH-MAKING: Know your items, your users and
their intent
User Characteristics
• Profile Information: title, seniority, skills, education, endorsements, presentations, …
• Behavioral: activities, search, …
• Edge features (ego-centric network): connection strength, content affinities, …
• Professional profile of record
User Intent
• What are you here for ?
– Hire, get hired, stay informed, grow network,
nurture connections, sell, market,..
• Explicit (e.g., visiting jobs homepage, search query),
• Implicit (needs to be inferred, e.g., based on activities)
Things to recommend/search
• Diverse
– People, companies, groups, publishers, slides,
courses, ads, jobs, leads, universities
• Features: NLP, semantic (e.g., topics), Unsupervised
(e.g., LDA) , Profile standardization (Title, Skills, …)
How to scale Recommendations ?
• Formulate objectives to optimize
• Optimize via ML models
– incorporate both implicit and explicit signals about user and intent
• Automate
Match making: Connecting long-term objectives to proxies that
can be optimized by machines/algorithms
Long-term objectives
(return visits, advertising
revenue, sign-ups, job apply,..)
Formulate objectives,
proxies (CTR, revenue/visit,
multiple-objectives, …)
Large scale
optimization via ML, UI
changes,..
Engage, experiment, Learn,
Deploy, Innovate
Automation
Optimize proxies with short feedback loop via Machine Learning
INPUT SIGNALS
• Whom? User Profile, User Intent
• What? Item Filtering, Understanding
• Context
• Interaction Data
MACHINE LEARNING
• SCORE Items: P(Click), P(Share), Similarity, …
• RANK Items: Sort by Score, Multi-objective, Business rule
The Feed: Important starting point
The Feed: Heterogeneity of “types”
Network Updates (Nurture, keep in
touch)
• Job change
• Job anniversaries
• Connections
• Change Profile Picture
• …
Content with Explicit follow
• Articles by Influencer
• Shares by members in your
network
• Content in Channels followed
• Content by companies followed
• …
Recommendations & Ads
• Articles, PYMK, Endorsements
• Sponsored updates (Ads)
• Jobs
• …
The Feed: How to build a relevant/optimized feed?
• Independent vs Dependent
– Feed has dependent observations within and across types
• CTR decays when showing same type multiple times
• CTR decays when showing multiple things from the same connection
• What is the Metric we optimize for the Feed?
– Connections
– Revenue
– Clicks
– Likes, shares, comments
– Job applications
– …..
Fundamental Problem: Response
Prediction
Predict the probability that a user will respond to
an item in a given context
Provides a statistical framework to incorporate
downstream utilities and other constraints in ranking
items for users
Response Prediction
• Click prediction
– Estimate P(click | user, item, context)
– Use to calculate E[utility] = P(click) * utility
• Logistic regression for click prediction, LTR (learning to rank) for search-like problems
– Scalable
– Well understood
• Challenges
– Integrating feature data from multiple sources
– Scaling training on large data with many features
– Flexible and rapid experimentation
– Real time scoring
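A minimal Python sketch of the expected-utility calculation E[utility] = P(click) * utility, used here to order candidates. The item names, click probabilities and utilities are made-up illustrative values, not figures from the talk, and `rank_by_expected_utility` is a hypothetical helper name.

```python
def rank_by_expected_utility(candidates):
    """Rank items by P(click) * utility, highest first.

    candidates: list of (item_id, p_click, utility) tuples.
    Returns a list of (item_id, expected_utility).
    """
    scored = [(item_id, p_click * utility) for item_id, p_click, utility in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative inputs only: a likely-clicked article can still rank below
# a rarely clicked item whose click is worth much more.
candidates = [
    ("job", 0.02, 5.0),
    ("article", 0.10, 0.5),
    ("ad", 0.01, 20.0),
]
ranking = rank_by_expected_utility(candidates)
for item_id, value in ranking:
    print(item_id, round(value, 3))
```

The same shape works for other responses (share, apply): only the probability model and the utility attached to the response change.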
Response Prediction Models
• Three main sources of features
– User (e.g. industry, title)
– Item (e.g. keywords, LDA topics)
– Context (e.g. time, page)
• Interactions are important
– E.g., Industry-specific ads may only
appeal to people in that industry
• Features only get you so far
– Hard to beat Σclicks / Σviews for items/users with a lot of data
– “Warmstart” terms incorporate per-item/per-user information
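x_i : Features from user i
y_j : Features from item j
z_k : Features from context k
α, β, γ, ... : Coefficients
α_i, β_j, γ_k, ... : Coefficients indexed by user, item, etc.
A, B, C, ... : Interaction coefficients

$$
\log\left(\frac{p}{1-p}\right) =
\underbrace{\omega + \alpha^{T} x_i + \beta^{T} y_j + \gamma^{T} z_k}_{\text{Cold Start}}
+ \underbrace{x_i^{T} A y_j + z_k^{T} B y_j}_{\text{Interaction}}
+ \underbrace{\omega_j + \alpha_j^{T} x_i}_{\text{Warm Start}}
$$

$$
\log\left(\frac{p}{1-p}\right) =
\omega + \alpha^{T} x_i + \beta^{T} y_j + \gamma^{T} z_k
+ \eta_j^{T} x_i + \tau_k^{T} y_j
+ \omega_j + \alpha_j^{T} x_i
$$

$$
\eta_j = A y_j, \qquad x_i^{T} A y_j = x_i^{T} (A y_j) = \eta_j^{T} x_i
$$

$$
\tau_k = B^{T} z_k, \qquad z_k^{T} B y_j = (z_k^{T} B)\, y_j = \tau_k^{T} y_j
$$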
Global
Per-partition
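A small Python sketch of how the log-odds in the model above could be evaluated for one (user, item, context) triple. It groups the terms as main effects, the interactions x_i^T A y_j and z_k^T B y_j, and per-item warm-start terms. All numeric values are invented for illustration, and the function and variable names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def logit(x_i, y_j, z_k, w, alpha, beta, gamma, A, B, w_j, alpha_j):
    """Log-odds of a response: main effects + interactions + warm-start terms."""
    cold = w + dot(alpha, x_i) + dot(beta, y_j) + dot(gamma, z_k)
    eta_j = matvec(A, y_j)              # A y_j, so eta_j . x_i == x_i^T A y_j
    tau_k = matvec(transpose(B), z_k)   # B^T z_k, so tau_k . y_j == z_k^T B y_j
    interaction = dot(eta_j, x_i) + dot(tau_k, y_j)
    warm = w_j + dot(alpha_j, x_i)      # per-item intercept and per-item coefficients
    return cold + interaction + warm


def probability(score):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-score))


# Illustrative values only (2 user features, 2 item features, 1 context feature).
x_i, y_j, z_k = [1.0, 0.0], [0.0, 1.0], [1.0]
score = logit(
    x_i, y_j, z_k,
    w=0.1, alpha=[0.2, 0.3], beta=[0.4, 0.5], gamma=[0.6],
    A=[[0.0, 1.0], [0.0, 0.0]],   # user x item interaction
    B=[[0.0, 2.0]],               # context x item interaction
    w_j=0.5, alpha_j=[0.1, 0.0],
)
print(score, probability(score))
```

Precomputing `eta_j` and `tau_k` once per item or context is what lets the interaction terms be scored as plain dot products at request time.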
Model Fitting at Scale
• Most fitting methods for LR are iterative
– Multiple passes through the data
– Poor fit with map/reduce
– We use Apache Spark
• We use Data Partitioning
– Split data into blocks
– Train separately
– Merge the parameters
• Alternating Direction Method of
Multipliers (ADMM)
– Proven to converge to the global optimum for convex objectives
• Small tweaks can improve performance
– Optimized the starting values
– Learning rate decay
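Θ : Global parameter estimate
Θ_r : Parameter estimate for partition r
d_r : Data in partition r
L_r(Θ_r; d_r) : Likelihood function for partition r (used as a loss, i.e. negative log-likelihood, since it is minimized)
P(Θ) : Regularization penalty term
u_r : Per-partition bias values
ρ : ADMM penalty parameter
R : Number of partitions

$$
\min \; \sum_{r=1}^{R} L_r(\Theta_r; d_r) + P(\Theta)
\quad \text{subject to } \Theta_r - \Theta = 0 \text{ for } r = 1, \dots, R
$$

$$
\Theta_r^{(t+1)} = \arg\min_{\Theta_r} \; L_r(\Theta_r; d_r)
+ \frac{\rho}{2} \left\| \Theta_r - \Theta^{(t)} + u_r^{(t)} \right\|_2^2
$$

$$
\Theta^{(t+1)} = \arg\min_{\Theta} \; P(\Theta)
+ \frac{R\rho}{2} \left\| \Theta - \bar{\Theta}^{(t+1)} - \bar{u}^{(t)} \right\|_2^2
$$

$$
u_r^{(t+1)} = u_r^{(t)} + \Theta_r^{(t+1)} - \Theta^{(t+1)}
$$

Here $\bar{\Theta}^{(t+1)}$ and $\bar{u}^{(t)}$ are the averages of $\Theta_r^{(t+1)}$ and $u_r^{(t)}$ over the R partitions.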
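A toy, single-process Python sketch of the partitioned ADMM scheme above for logistic regression. Several choices are assumptions made for illustration and are not stated in the talk: the penalty is taken to be L2, P(Θ) = (λ/2)‖Θ‖², which gives a closed-form global step; each local step is solved approximately with a few gradient-descent iterations; and the data set is synthetic. The production system ran on Apache Spark, and this sketch does not attempt to mirror it.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(theta_r, data_r, theta, u_r, rho, steps=50, lr=0.05):
    """Approximately minimise L_r(theta_r) + rho/2 * ||theta_r - theta + u_r||^2
    by gradient descent, warm-started from the previous theta_r."""
    t = list(theta_r)
    dim = len(t)
    for _ in range(steps):
        grad = [rho * (t[k] - theta[k] + u_r[k]) for k in range(dim)]
        for x, y in data_r:
            err = sigmoid(sum(a * b for a, b in zip(t, x))) - y
            for k in range(dim):
                grad[k] += err * x[k]
        t = [t[k] - lr * grad[k] for k in range(dim)]
    return t


def global_update(thetas, us, rho, lam):
    """Closed-form global step for the assumed penalty P(theta) = lam/2 * ||theta||^2."""
    R, dim = len(thetas), len(thetas[0])
    out = []
    for k in range(dim):
        mean = sum(thetas[r][k] + us[r][k] for r in range(R)) / R
        out.append(R * rho * mean / (lam + R * rho))
    return out


def admm_logistic(partitions, dim, rho=1.0, lam=0.1, iters=100):
    R = len(partitions)
    theta = [0.0] * dim
    thetas = [[0.0] * dim for _ in range(R)]
    us = [[0.0] * dim for _ in range(R)]
    for _ in range(iters):
        thetas = [local_update(thetas[r], partitions[r], theta, us[r], rho)
                  for r in range(R)]
        theta = global_update(thetas, us, rho, lam)
        us = [[us[r][k] + thetas[r][k] - theta[k] for k in range(dim)]
              for r in range(R)]
    return theta, thetas


# Synthetic data: intercept plus one feature; label mostly follows the sign
# of the feature, with a few flipped labels so the classes overlap.
points = []
for i in range(41):
    x = -2.0 + 0.1 * i
    y = 1.0 if x > 0 else 0.0
    if i % 7 == 0:
        y = 1.0 - y
    points.append(([1.0, x], y))
partitions = [points[r::4] for r in range(4)]   # R = 4 partitions

theta, thetas = admm_logistic(partitions, dim=2)
gap = max(abs(thetas[r][k] - theta[k]) for r in range(4) for k in range(2))
print("global estimate:", theta, "max consensus gap:", gap)
```

In a distributed setting only the local step touches data, so each partition can run it independently; the global and dual steps exchange nothing but parameter vectors.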
Flexible Configuration
• Feature engineering
– Every problem is different
– Lots of trial and error
– Faster, easier feature engineering
translates to gains sooner
• JSON-based config language
– Sources: import features from outside
– Transformers: apply functions to
feature vectors
– Assembler: packages feature terms
for fitting/scoring
• Rapid development
– No code for most changes
– Offline and Online in sync
[Diagram: Request / User / Item → User Source, Context Source, Item Source → Subset, Subset, Interaction → Assembler → Training or Scoring]
Runtime scoring optimizations
• Real time performance
– About 10µs per inference (1500 items = 15ms)
– Reacts to changing features immediately
• “Better wrong than late”
– If a feature isn’t immediately available, back off to prior value
• Asynchronous computation
– Actions that block or take time run in background threads
• Lazy evaluation
– Sources & transformers do not create feature vectors for all items
– Feature vectors are constructed/transformed only when needed
• Partial results cache
– Logistic regression scoring is a series of dot products
– Scalars are small; cache can be huge
– Hardware-like implementation to minimize locking and heap pressure
[Figure: Time per request (ms) vs. Time (m); y-axis from 8 to 16, x-axis from 0 to 60]
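One way to picture the partial results cache from the bullets above, as a hedged Python sketch: because a logistic-regression score is a sum of dot products, the item-only part can be computed once and reused across requests as a single scalar. The class name, the miss counter and the weights are hypothetical; the real implementation is described as hardware-like and is not shown in the talk.

```python
class PartialScoreCache:
    """Caches the item-only dot product beta . y_j as one scalar per item."""

    def __init__(self, beta):
        self.beta = beta
        self._cache = {}
        self.misses = 0

    def item_term(self, item_id, y_j):
        if item_id not in self._cache:
            self.misses += 1
            self._cache[item_id] = sum(b * y for b, y in zip(self.beta, y_j))
        return self._cache[item_id]


def score(user_term, item_term):
    """Total log-odds is the sum of the partial results."""
    return user_term + item_term


cache = PartialScoreCache(beta=[1.0, 2.0])       # illustrative weights
first = cache.item_term("item-a", [3.0, 4.0])    # computed: 1*3 + 2*4
second = cache.item_term("item-a", [3.0, 4.0])   # served from the cache
print(score(0.5, first), cache.misses)
```

Since each cached entry is one float, a very large number of items fits in memory, which matches the "scalars are small; cache can be huge" point.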
Beyond Response Prediction
• Explore/exploit
• Impression discounting
– Don’t show same/similar stuff many times to the user
• Diversification
• Multi-objective optimization
Explore/Exploit: How to Score a New Item ?
• Predict the click rate for new item
• Cold-start problem
– No data to estimate warm start for new item just added
• Solution: Controlled and economical experiments
– Explore (experiment): Collect data by promoting new item
to a small random sample of users
– Exploit: Update warm start based on collected data
– Automate explore/exploit
• Only experiment when we can get bang for the bucks (potential of
gain high)
Explore/Exploit Problem
Simplified setting: Two items
[Figure: probability density of CTR for Item A and Item B]
We know the CTR of Item A (say, shown 1 million times)
We are uncertain about the CTR of Item B (only 100 times)
If we only make a single decision,
give 100% page views to Item A
If we make multiple decisions in the future,
explore Item B since its CTR can potentially be
higher

$$
\text{Potential} = \int_{p > q} (p - q)\, f(p)\, dp
$$
CTR of item A is q
CTR of item B is p
Probability density function of item B’s CTR is f(p)
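A numerical sketch of the Potential integral in Python, under an assumption the slide does not make: Item B's CTR density f(p) is taken to be a Beta posterior with a uniform prior, Beta(1 + clicks, 1 + views - clicks). The click and view counts below are invented, and the integral is approximated with a midpoint rule.

```python
import math


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def potential(q, clicks, views, n=20000):
    """Midpoint-rule estimate of the integral of (p - q) f(p) over p > q."""
    if q >= 1.0:
        return 0.0
    a, b = 1 + clicks, 1 + views - clicks   # assumed uniform prior
    step = (1.0 - q) / n
    total = 0.0
    for i in range(n):
        p = q + (i + 0.5) * step
        total += (p - q) * beta_pdf(p, a, b) * step
    return total


# Item B: 5 clicks in 100 views (illustrative). The potential shrinks as the
# known CTR q of Item A rises, so exploring B is less attractive.
for q in (0.0, 0.05, 0.10):
    print(q, potential(q, clicks=5, views=100))
```

With q = 0 the integral reduces to the posterior mean of p, which gives a quick sanity check on the numerics.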
Automating Explore/Exploit via
Thompson Sampling Heuristic
Cold Start
Cold + Warm
POSTERIOR of Warm-start
COEFFICIENTS
E/E: Sample warm start
from the posterior
(Thompson Sampling)
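A simplified Python sketch of Thompson sampling. The slide samples warm-start coefficients from their posterior; this sketch swaps in the simpler Beta-Bernoulli version, where each item's CTR gets a Beta posterior (uniform prior assumed) and the item with the highest sampled CTR is served. The counts and item names are illustrative.

```python
import random


def thompson_pick(stats, rng=random):
    """Pick the item whose sampled CTR is highest.

    stats: {item_id: (clicks, views)}. Each CTR is drawn from
    Beta(1 + clicks, 1 + views - clicks), an assumed posterior.
    """
    best_item, best_draw = None, -1.0
    for item_id, (clicks, views) in stats.items():
        draw = rng.betavariate(1 + clicks, 1 + views - clicks)
        if draw > best_draw:
            best_item, best_draw = item_id, draw
    return best_item


rng = random.Random(0)
# Item A is well measured; item B is new, so its posterior is wide and it
# still wins some of the draws (exploration) until data accumulates.
stats = {"A": (50_000, 1_000_000), "B": (6, 100)}
picks = [thompson_pick(stats, rng) for _ in range(1000)]
print({item: picks.count(item) for item in stats})
```

Exploration is automatic: an uncertain item is served in proportion to the posterior probability that it is the best one, so traffic spent on it falls as evidence against it builds up.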
Impression Discounting
• Reduce the chance of
showing the same item to
the same user repeatedly
• Decay the score of an item
based on #times that the
user saw the item before
• Using real-time feedback
• Discounting by user
segments and item types
[Figure: impression discounting curves, global (over all types) and for a few item types]
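A minimal Python sketch of impression discounting. The slides show empirical discounting curves but no formula, so the exponential form and the per-type decay rates here are assumptions chosen only to illustrate the mechanism.

```python
# Hypothetical per-type decay rates; the real curves are fitted from data.
DECAY_BY_TYPE = {"job": 0.8, "article": 0.5}


def discounted_score(score, impressions, decay):
    """Multiply the score by decay once for every earlier impression."""
    return score * (decay ** impressions)


def rescore(item_type, score, impressions):
    return discounted_score(score, impressions, DECAY_BY_TYPE[item_type])


# An article already seen twice drops below a fresh, lower-scored job.
print(rescore("article", 1.0, 2), rescore("job", 0.6, 0))
```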
Diversification
• Users’ experience deteriorates when exposed to the same kind of items multiple times on the same page
• Discounting actor repetitions

Group Discussion          CTR Drop
2 adjacent discussions    21%
3 adjacent discussions    48%
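A greedy re-ranking sketch in Python that penalises placing an item directly after another item of the same type. The multiplicative penalty of 0.5 and the sample items are assumptions for illustration; the talk reports the CTR drops but not the re-ranking rule.

```python
def diversify(items, penalty=0.5):
    """Greedy re-rank: an item's score is multiplied by `penalty` when its
    type matches the type of the item placed just before it.

    items: list of (item_id, item_type, score). Returns ordered item ids.
    """
    remaining = list(items)
    ranked = []
    prev_type = None
    while remaining:
        def adjusted(item):
            return item[2] * (penalty if item[1] == prev_type else 1.0)

        best = max(remaining, key=adjusted)
        ranked.append(best[0])
        prev_type = best[1]
        remaining.remove(best)
    return ranked


items = [
    ("d1", "discussion", 1.0),
    ("d2", "discussion", 0.9),
    ("j1", "job", 0.8),
]
print(diversify(items))   # the job is placed between the two discussions
```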
Multi-Objective Optimization
• E.g: Maximize advertising revenue s.t.
CTR ≥ (1-ε) max achievable CTR
– Solve via the dual after making the problem strongly convex; this helps obtain a serving plan for new users
• Obtain Pareto optimal solutions ( efficient
frontier )
[Figure: CTR vs. Revenue trade-off, showing the Feasible and Impossible regions and the tolerance ε]
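A deliberately small Python sketch of the constrained objective: maximize revenue subject to CTR ≥ (1-ε) × max achievable CTR. It enumerates a handful of candidate serving plans rather than solving the dual problem described in the talk; the plan names and numbers are invented.

```python
def pick_plan(plans, eps):
    """Return the name of the highest-revenue plan whose CTR stays within
    a fraction eps of the best achievable CTR.

    plans: list of (name, ctr, revenue).
    """
    max_ctr = max(ctr for _, ctr, _ in plans)
    feasible = [plan for plan in plans if plan[1] >= (1 - eps) * max_ctr]
    return max(feasible, key=lambda plan: plan[2])[0]


plans = [
    ("ctr_only", 0.10, 1.0),
    ("balanced", 0.096, 1.4),
    ("revenue_only", 0.07, 2.0),
]
for eps in (0.0, 0.05, 0.5):
    print(eps, pick_plan(plans, eps))
```

Sweeping ε from 0 upward traces out the efficient frontier: each value yields the best revenue obtainable for that level of CTR sacrifice.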
Putting it all together
• Federated Model
– Tier 1: Local models for update types
– Tier 2: Calibrate local models and add more
features to do holistic personalization
• Re-rank by applying diversification, impression
discounting, multi-objective optimization
• Separate teams for Tier 1 and Tier 2
Scaling Model Building
Data Tracking → Pre-Processing → Feature Generation → Model Training → Offline Model Evaluation → Online Experiment (only if good)
Extremely Computationally Intensive
▪ Research and Development: Flexible and easy to use software
environment to create models [offline compatible with online]
▪ Maintenance: Models in production trained continuously and
automatically, proper monitoring and testing for building reliable
workflows, config based model deployment, A/B testing platform
▪ Stack of Models: From simple baselines to more sophistication
▪ Feature Management: Easy to discover and share features across
applications
Going beyond ML, Statistics
• Dogfooding our own products
– Look at employee experience
– Debug model scores if something does not look right
• Focus research group studies, user surveys
• Product strategy & Intuition
– E.g., remove/add certain content types
– Adding constraints like freshness bounds, etc.
• Presentation
– Testing different UI, presentation templates, fonts, etc.
Summary
• Formulating objectives is important (not easy)
• Machine Learning is slow and fragile
– Model training is not the only bottleneck
• Data pre-processing, feature management, near-line
computations and online scoring all important
• Need A/B testing platform, Fast Retrieval systems
• Need an end-to-end framework that can make
it easy for modelers to test rapidly
– Data Miners should be closely involved
ADDITIONAL SLIDES
Example System Architecture
[Diagram components: Feature Extraction (Profile, Network), Model, User Data Store, Online Data Processing, Apache, Ranking, Retrieval]
  • 2. Our vision: Create economic opportunity for every member of the global workforce. Our mission: Connect the world’s professionals to make them more productive and successful. Our core value: Members first!
  • 4. Connect talent with opportunity at scale
  • 6. Algorithmic Match-Making via Machine Learning and Data Mining: Jobs to Apply; Feed; Articles to Read; Connections to Nurture, Keep in Touch with; Search; PYMK
  • 7. Scale Algorithmic Match Making • Approach: continuously learn from historical data – Machine learning/statistical models – Run experiments (test new procedures and collect data) • Scalable and robust infrastructure – Large scale batch computations – Large scale near-line computations – High throughput, low latency online computations – Fast Retrieval Engines (Search)
  • 8. Search versus Recommendation Search: User specified query: Recall important
  • 9. Recommendations: Delivery Mechanisms • Pull Model: When the user visits, serve the most relevant – Desktop, mobile web, mobile app, iPad • Push Model: User is not visiting but we need to reach out with information {Email, Notifications} – Higher relevance bar: Right message, right user, right time, right frequency, right channel Done through ML and optimization
  • 10. Rest of the Talk • Data and Problem Formulation • Machine Learning Process – Illustrated with Feed Application • Lessons
  • 11. WHAT INFORMATION DO WE HAVE ABOUT USERS AND ITEMS ? MATCH-MAKING: Know your items, your users and their intent
  • 12. User Characteristics • Profile information: title, seniority, skills, education, endorsements, presentations, … • Behavioral: activities, search, … • Edge features (ego-centric network): connection strength, content affinities, … • Professional profile of record
  • 13. User Intent • What are you here for ? – Hire, get hired, stay informed, grow network, nurture connections, sell, market,.. • Explicit (e.g., visiting jobs homepage, search query), • Implicit (needs to be inferred, e.g., based on activities)
  • 14. Things to recommend/search • Diverse – People, companies, groups, publishers, slides, courses, ads, jobs, leads, universities • Features: NLP, semantic (e.g., topics), unsupervised (e.g., LDA), profile standardization (title, skills, …)
  • 15. How to scale Recommendations ? • Formulate objectives to optimize • Optimize via ML models – incorporate both implicit and explicit signals about user and intent • Automate
  • 16. Match making: Connecting long-term objectives to proxies that can be optimized by machines/algorithms • Long-term objectives (return visits, advertising revenue, sign-ups, job applications, …) • Formulate objectives, proxies (CTR, revenue/visit, multiple objectives, …) • Large scale optimization via ML, UI changes, … • Engage, experiment, learn, deploy, innovate
  • 17. Automation: Optimize proxies with short feedback loop via Machine Learning. INPUT SIGNALS (Whom? User Profile, User Intent; What? Item Filtering, Understanding; Context; Interaction Data) → MACHINE LEARNING → SCORE Items (P(Click), P(Share), Similarity, …) → RANK Items (Sort by Score, Multi-objective, Business rule)
  • 18. The Feed: Important starting point
  • 19. The Feed: Heterogeneity of “types” Network Updates (Nurture, keep in touch) • Job change • Job anniversaries • Connections • Change Profile Picture • … Content with Explicit follow • Articles by Influencer • Shares by members in your network • Content in Channels followed • Content by companies followed • … Recommendations & Ads • Articles, PYMK, Endorsements • Sponsored updates (Ads) • Jobs • …
  • 20. The Feed: How to build a relevant/optimized feed? • Independent vs. dependent – Feed has dependent observations within and across types • CTR decays when showing the same type multiple times • CTR decays when showing multiple things from the same connection • What is the metric we optimize for the Feed? – Connections – Revenue – Clicks – Likes, shares, comments – Job applications – …
  • 21. Fundamental Problem: Response Prediction Predict the probability that a user will respond to an item in a given context Provides a statistical framework to incorporate downstream utilities and other constraints in ranking items for users
  • 22. Response Prediction • Click prediction – Estimate P(click | user, item, context) – Use to calculate E[utility] = P(click) * utility • Logistic regression for click prediction, learning to rank (LTR) for search-like problems – Scalable – Well understood • Challenges – Integrating feature data from multiple sources – Scaling training on large data with many features – Flexible and rapid experimentation – Real-time scoring
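The ranking rule on slide 22, E[utility] = P(click) * utility, can be sketched in a few lines. This is only an illustration under stated assumptions: the weights, feature vectors, and utilities are made up, and `rank_by_expected_utility` is a hypothetical helper, not something named in the talk.

```python
import math

def sigmoid(t):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def p_click(weights, features):
    """P(click | user, item, context) under a logistic regression model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def rank_by_expected_utility(weights, items):
    """Sort items by E[utility] = P(click) * utility, highest first.

    `items` maps an item id to (feature vector, utility of a click).
    """
    scored = {
        item_id: p_click(weights, feats) * utility
        for item_id, (feats, utility) in items.items()
    }
    return sorted(scored, key=scored.get, reverse=True), scored

# Made-up numbers, for illustration only.
weights = [0.5, -1.0, 2.0]
items = {
    "job_ad": ([1.0, 0.0, 0.0], 2.0),   # lower P(click), higher utility
    "article": ([1.0, 0.0, 1.0], 0.5),  # higher P(click), lower utility
}
ranking, scores = rank_by_expected_utility(weights, items)
```

In this toy setup the article has the higher click probability, but the job ad ranks first because a click on it is worth more.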
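  • 23. Response Prediction Models • Three main sources of features – User (e.g. industry, title) – Item (e.g. keywords, LDA topics) – Context (e.g. time, page) • Interactions are important – E.g., industry-specific ads may only appeal to people in that industry • Features only get you so far – Hard to beat Σclicks / Σviews for items/users with a lot of data – “Warm start” terms incorporate per-item/per-user information
    Notation: $x_i$: features from user $i$; $y_j$: features from item $j$; $z_k$: features from context $k$; $\alpha, \beta, \gamma, \ldots$: coefficients; $\alpha_i, \beta_j, \gamma_k, \ldots$: coefficients indexed by user, item, etc.; $A, B, C, \ldots$: interaction coefficients.
    $$\log\frac{p}{1-p} = \underbrace{\omega + \alpha^\top x_i + \beta^\top y_j + \gamma^\top z_k}_{\text{cold start}} + \underbrace{x_i^\top A y_j + z_k^\top B y_j}_{\text{interaction}} + \underbrace{\omega_j + \alpha_j^\top x_i}_{\text{warm start}}$$
    Equivalently, with the interaction terms folded into per-item and per-context coefficients:
    $$\log\frac{p}{1-p} = \omega + \alpha^\top x_i + \beta^\top y_j + \gamma^\top z_k + \eta_j^\top x_i + \tau_k^\top y_j + \omega_j + \alpha_j^\top x_i$$
    where $\eta_j = A y_j$, so that $x_i^\top A y_j = x_i^\top (A y_j) = \eta_j^\top x_i$, and $\tau_k = B^\top z_k$, so that $z_k^\top B y_j = (z_k^\top B)\, y_j = \tau_k^\top y_j$.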
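The model on slide 23 can be sketched as a scoring function. This is a rough sketch, assuming small dense vectors stored as Python lists; the parameter dictionary layout, the helper names, and every number are hypothetical. An item with no warm-start entry falls back to the cold-start and interaction terms only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    """Matrix-vector product M v, with M given as a list of rows."""
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def click_logit(x, y, z, p, item_id):
    """Log-odds of a click: cold start + interaction + warm start."""
    cold = p["omega"] + dot(p["alpha"], x) + dot(p["beta"], y) + dot(p["gamma"], z)
    eta_j = mat_vec(p["A"], y)             # eta_j = A y_j
    tau_k = mat_vec(transpose(p["B"]), z)  # tau_k = B^T z_k
    interaction = dot(eta_j, x) + dot(tau_k, y)
    # Warm-start terms exist only for items with enough history.
    omega_j = p["omega_item"].get(item_id, 0.0)
    alpha_j = p["alpha_item"].get(item_id, [0.0] * len(x))
    return cold + interaction + omega_j + dot(alpha_j, x)

# Hypothetical parameters and features, for illustration only.
params = {
    "omega": -2.0, "alpha": [0.4, -0.1], "beta": [0.3, 0.2], "gamma": [0.1],
    "A": [[0.5, 0.0], [0.0, -0.5]],  # user x item interaction
    "B": [[0.2, 0.1]],               # context x item interaction
    "omega_item": {"item_1": 0.6},
    "alpha_item": {"item_1": [0.2, 0.0]},
}
x, y, z = [1.0, 2.0], [1.0, 0.0], [1.0]
known = click_logit(x, y, z, params, "item_1")
new = click_logit(x, y, z, params, "brand_new_item")
p_known = 1.0 / (1.0 + math.exp(-known))
```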
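  • 24. Model Fitting at Scale • Most fitting methods for LR are iterative – Multiple passes through the data – Poor fit with map/reduce – We use Apache Spark • We use data partitioning – Split data into blocks – Train separately – Merge the parameters • Alternating Direction Method of Multipliers (ADMM) – Proven to converge to the global optimum • Small tweaks can improve performance – Optimized the starting values – Learning rate decay [Diagram labels: Global, Per-partition]
    Notation: $\Theta$: global parameter estimate; $\Theta_r$: parameter estimate for partition $r$; $d_r$: data in partition $r$; $L_r(\Theta_r; d_r)$: likelihood-based loss (negative log-likelihood) for partition $r$; $P(\Theta)$: regularization penalty term; $u_r$: per-partition bias values; $\rho$: ADMM penalty parameter; $R$: number of partitions.
    $$\min \sum_{r=1}^{R} L_r(\Theta_r; d_r) + P(\Theta) \quad \text{subject to } \Theta_r - \Theta = 0 \text{ for } r = 1, \ldots, R$$
    $$\Theta_r^{(t+1)} = \arg\min_{\Theta_r} \; L_r(\Theta_r; d_r) + \frac{\rho}{2} \left\lVert \Theta_r - \Theta^{(t)} + u_r^{(t)} \right\rVert^2$$
    $$\Theta^{(t+1)} = \arg\min_{\Theta} \; P(\Theta) + \frac{R\rho}{2} \left\lVert \Theta - \bar{\Theta}^{(t+1)} - \bar{u}^{(t)} \right\rVert^2, \quad \bar{\Theta}^{(t+1)} = \frac{1}{R}\sum_{r=1}^{R} \Theta_r^{(t+1)}, \quad \bar{u}^{(t)} = \frac{1}{R}\sum_{r=1}^{R} u_r^{(t)}$$
    $$u_r^{(t+1)} = u_r^{(t)} + \Theta_r^{(t+1)} - \Theta^{(t+1)}$$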
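The split, train, and merge loop on slide 24 can be sketched as consensus ADMM in plain Python. The talk runs this on Apache Spark; the version below is a single-process toy under stated assumptions: an L2 penalty P(Θ) = (λ/2)‖Θ‖², local subproblems solved approximately by gradient descent, synthetic data, and hypothetical function names throughout.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def loss_grad(theta, data):
    """Gradient of the logistic-regression negative log-likelihood."""
    g = [0.0] * len(theta)
    for x, y in data:
        err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for d, xi in enumerate(x):
            g[d] += err * xi
    return g

def local_update(theta_r, target, data, rho, steps=50):
    """Approximate argmin of L_r(theta_r) + rho/2 * ||theta_r - target||^2
    by gradient descent with a conservative 1/L step size."""
    lipschitz = 0.25 * sum(sum(xi * xi for xi in x) for x, _ in data) + rho
    for _ in range(steps):
        g = loss_grad(theta_r, data)
        theta_r = [t - (gd + rho * (t - c)) / lipschitz
                   for t, gd, c in zip(theta_r, g, target)]
    return theta_r

def consensus_admm(partitions, dim, lam=1.0, rho=2.0, iters=100):
    """Consensus ADMM with penalty P(theta) = lam/2 * ||theta||^2."""
    R = len(partitions)
    theta = [0.0] * dim
    thetas = [[0.0] * dim for _ in range(R)]
    us = [[0.0] * dim for _ in range(R)]
    for _ in range(iters):
        # Per-partition step: pull theta_r toward (theta - u_r).
        for r, data in enumerate(partitions):
            target = [t - u for t, u in zip(theta, us[r])]
            thetas[r] = local_update(thetas[r], target, data, rho)
        # Global step: closed form for the L2 penalty.
        avg = [sum(thetas[r][d] + us[r][d] for r in range(R)) / R
               for d in range(dim)]
        theta = [R * rho / (lam + R * rho) * a for a in avg]
        # Dual step: accumulate each partition's disagreement.
        for r in range(R):
            us[r] = [u + tr - t for u, tr, t in zip(us[r], thetas[r], theta)]
    return theta, thetas

# Synthetic data: an intercept plus one Gaussian feature.
rng = random.Random(0)
true_theta = [0.3, 1.0]
points = []
for _ in range(80):
    x = [1.0, rng.gauss(0.0, 1.0)]
    p = sigmoid(sum(t * xi for t, xi in zip(true_theta, x)))
    points.append((x, 1.0 if rng.random() < p else 0.0))
partitions = [points[i::4] for i in range(4)]  # four blocks
theta, thetas = consensus_admm(partitions, dim=2)
```

Each partition only ever touches its own block of data, which is what makes the per-partition step easy to distribute; only the small parameter vectors are exchanged in the merge step.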
  • 25. Flexible Configuration • Feature engineering – Every problem is different – Lots of trial and error – Faster, easier feature engineering translates to gains sooner • JSON-based config language – Sources: import features from outside – Transformers: apply functions to feature vectors – Assembler: packages feature terms for fitting/scoring • Rapid development – No code for most changes – Offline and online in sync [Diagram: Request, User, and Item feed the User Source, Context Source, and Item Source; Subset, Subset, and Interaction transformers; Assembler; Training or Scoring]
  • 26. Runtime scoring optimizations • Real-time performance – About 10µs per inference (1500 items = 15ms) – Reacts to changing features immediately • “Better wrong than late” – If a feature isn’t immediately available, back off to prior value • Asynchronous computation – Actions that block or take time run in background threads • Lazy evaluation – Sources & transformers do not create feature vectors for all items – Feature vectors are constructed/transformed only when needed • Partial results cache – Logistic regression scoring is a series of dot products – Scalars are small; cache can be huge – Hardware-like implementation to minimize locking and heap pressure [Figure: time per request (ms) against time (m)]
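The partial results cache on slide 26 rests on the observation that a logistic regression score is a sum of dot products, each of which collapses to one scalar. A minimal sketch of that idea follows, assuming a model with only a user term and an item term; the class name, its fields, and the numbers are hypothetical, and cache invalidation when features change is deliberately left out.

```python
import math

class PartialScoreCache:
    """Caches per-user and per-item partial dot products (hypothetical)."""

    def __init__(self, alpha, beta):
        self.alpha = alpha      # user-feature coefficients
        self.beta = beta        # item-feature coefficients
        self._cache = {}
        self.dot_products = 0   # counts real computations, for the demo

    def _partial(self, key, coeffs, features):
        if key not in self._cache:
            self._cache[key] = sum(c * f for c, f in zip(coeffs, features))
            self.dot_products += 1
        return self._cache[key]

    def p_click(self, user_id, x, item_id, y):
        logit = (self._partial(("user", user_id), self.alpha, x)
                 + self._partial(("item", item_id), self.beta, y))
        return 1.0 / (1.0 + math.exp(-logit))

# Made-up coefficients and features.
scorer = PartialScoreCache(alpha=[0.2, -0.1], beta=[0.5, 0.3])
user_x = [1.0, 2.0]
item_feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
probs = {i: scorer.p_click("u1", user_x, i, y) for i, y in item_feats.items()}
first_pass = scorer.dot_products
probs_again = {i: scorer.p_click("u1", user_x, i, y) for i, y in item_feats.items()}
```

Scoring one user against three items costs four dot products rather than six, and a repeated request costs none.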
  • 27. Beyond Response Prediction • Explore/exploit • Impression discounting – Don’t show same/similar stuff many times to the user • Diversification • Multi-objective optimization
  • 28. Explore/Exploit: How to Score a New Item ? • Predict the click rate for new item • Cold-start problem – No data to estimate warm start for new item just added • Solution: Controlled and economical experiments – Explore (experiment): Collect data by promoting new item to a small random sample of users – Exploit: Update warm start based on collected data – Automate explore/exploit • Only experiment when we can get bang for the bucks (potential of gain high)
  • 29. Explore/Exploit Problem. Simplified setting: two items. [Figure: probability density of CTR for Item A and Item B] We know the CTR of Item A (say, shown 1 million times). We are uncertain about the CTR of Item B (shown only 100 times). If we only make a single decision, give 100% of page views to Item A. If we make multiple decisions in the future, explore Item B, since its CTR can potentially be higher. $$\text{Potential} = \int_{p > q} (p - q)\, f(p)\, dp$$ where the CTR of Item A is $q$, the CTR of Item B is $p$, and the probability density function of Item B’s CTR is $f(p)$.
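The potential on slide 29, the integral of (p - q) f(p) over p > q, can be evaluated numerically once a density f is chosen. The sketch below assumes a Beta posterior for Item B's CTR (a uniform prior updated with clicks and non-clicks), which the slide does not specify, and uses a plain midpoint rule; the click counts are invented.

```python
import math

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def potential(q, a, b, n=20000):
    """Midpoint-rule estimate of the integral of (p - q) f(p) over q < p < 1."""
    h = (1.0 - q) / n
    total = 0.0
    for i in range(n):
        p = q + (i + 0.5) * h
        total += (p - q) * beta_pdf(p, a, b)
    return total * h

q = 0.05                                 # Item A: CTR treated as known
few_views = potential(q, 1 + 4, 1 + 96)  # Item B: 4 clicks in 100 views
many_views = potential(q, 1 + 400, 1 + 9600)  # same rate, 10,000 views
```

With the same observed click rate, the potential shrinks as views accumulate, which is why exploration pays off mainly for items with little data.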
  • 30. Automating Explore/Exploit via Thompson Sampling Heuristic. Cold Start; Cold + Warm. [Figure: posterior of warm-start coefficients] E/E: Sample warm start from the posterior (Thompson Sampling)
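Thompson sampling can be sketched on a much simpler model than the one on slide 30. The talk samples warm-start coefficients from their posterior; the toy below swaps in a Beta-Bernoulli bandit, where each item's CTR has a Beta posterior, purely to show the sample-then-pick-the-best mechanic. Item names, true CTRs, and function names are all made up.

```python
import random

def thompson_pick(stats, rng):
    """Pick the item whose sampled CTR is highest.

    `stats` maps item id -> [clicks, views]; each item's CTR has a
    Beta(1 + clicks, 1 + non-clicks) posterior under a uniform prior.
    """
    draws = {
        item: rng.betavariate(1 + clicks, 1 + views - clicks)
        for item, (clicks, views) in stats.items()
    }
    return max(draws, key=draws.get)

def simulate(true_ctr, rounds, seed=0):
    """Serve `rounds` impressions, updating posteriors after each one."""
    rng = random.Random(seed)
    stats = {item: [0, 0] for item in true_ctr}
    for _ in range(rounds):
        item = thompson_pick(stats, rng)
        stats[item][1] += 1
        if rng.random() < true_ctr[item]:
            stats[item][0] += 1
    return stats

stats = simulate({"A": 0.05, "B": 0.20}, rounds=5000)
```

Early on both items are shown because their posteriors overlap; as data accumulates the draws for the weaker item rarely win, so exploration tapers off without a hand-tuned schedule.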
  • 31. Impression Discounting • Reduce the chance of showing the same item to the same user repeatedly • Decay the score of an item based on the number of times the user saw the item before • Using real-time feedback • Discounting by user segments and item types [Figures: global (over all types) impression discounting curve; impression discounting curves of a few item types]
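Impression discounting as described on slide 31 can be sketched as a multiplicative decay applied before sorting. The geometric form and the per-type decay rates below are assumptions for illustration; the talk shows empirical discounting curves rather than a formula, and `rerank` is a hypothetical helper.

```python
def discount(score, impressions, decay):
    """Scale a relevance score down geometrically per past impression."""
    return score * decay ** impressions

def rerank(candidates, seen_counts, decay_by_type, default_decay=0.8):
    """Re-sort candidates after discounting by how often the user saw each.

    `candidates` is a list of (item_id, item_type, score) tuples.
    """
    adjusted = []
    for item_id, item_type, score in candidates:
        decay = decay_by_type.get(item_type, default_decay)
        adjusted.append((discount(score, seen_counts.get(item_id, 0), decay), item_id))
    adjusted.sort(reverse=True)
    return [item_id for _, item_id in adjusted]

# Made-up scores and decay rates.
candidates = [("job_1", "job", 0.9), ("article_7", "article", 0.6), ("pymk_3", "pymk", 0.5)]
decay_by_type = {"job": 0.6, "article": 0.9}
fresh = rerank(candidates, {}, decay_by_type)
after_three_views = rerank(candidates, {"job_1": 3}, decay_by_type)
```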
  • 32. Diversification • Users’ experience deteriorates when exposed to the same kind of items multiple times on the same page • Discounting actor repetitions
      Group Discussion          | CTR Drop
      2 adjacent discussions    | 21%
      3 adjacent discussions    | 48%
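One simple way to act on the CTR drops reported on slide 32 is a greedy re-rank that penalizes an item for each same-type item already placed. This is a sketch of one possible heuristic, not the method used in the talk; the penalty value and the candidate scores are invented.

```python
def diversify(candidates, penalty=0.8):
    """Greedy re-rank: at each slot, take the candidate with the highest
    score * penalty ** (number of same-type items already placed).

    `candidates` is a list of (item_id, item_type, score) tuples.
    """
    remaining = list(candidates)
    placed = []
    type_counts = {}
    while remaining:
        best = max(
            remaining,
            key=lambda c: c[2] * penalty ** type_counts.get(c[1], 0),
        )
        remaining.remove(best)
        placed.append(best[0])
        type_counts[best[1]] = type_counts.get(best[1], 0) + 1
    return placed

# Made-up candidates: three discussions outrank everything by raw score.
candidates = [
    ("d1", "discussion", 0.90),
    ("d2", "discussion", 0.85),
    ("d3", "discussion", 0.80),
    ("j1", "job", 0.70),
    ("a1", "article", 0.60),
]
order = diversify(candidates)
```

Sorting by raw score would put three discussions in a row; the penalized ordering interleaves the job and the article between them.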
  • 33. Multi-Objective Optimization • E.g.: Maximize advertising revenue s.t. CTR ≥ (1-ε) max achievable CTR – Invert via duals by making it strongly convex; helps obtain serving for new users • Obtain Pareto optimal solutions (efficient frontier) [Figure: efficient frontier of CTR vs. revenue, separating feasible from impossible outcomes, with ε marked]
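The constraint on slide 33 can be illustrated over a small, fixed set of candidate serving plans. This sketch does not implement the dual-based optimization mentioned on the slide; it only enumerates hypothetical plans with made-up (CTR, revenue) outcomes, extracts the Pareto-optimal ones, and picks the highest-revenue plan whose CTR stays within a factor (1 - ε) of the best achievable.

```python
def pareto_frontier(plans):
    """Names of plans not dominated in both CTR and revenue.

    `plans` maps a plan name to a (ctr, revenue) pair.
    """
    def dominates(b, a):
        return b[0] >= a[0] and b[1] >= a[1] and (b[0] > a[0] or b[1] > a[1])
    return [
        name for name, v in plans.items()
        if not any(dominates(w, v) for other, w in plans.items() if other != name)
    ]

def max_revenue_with_ctr_floor(plans, eps):
    """Highest-revenue plan with CTR >= (1 - eps) * max achievable CTR."""
    max_ctr = max(ctr for ctr, _ in plans.values())
    floor = (1.0 - eps) * max_ctr
    feasible = {n: v for n, v in plans.items() if v[0] >= floor}
    return max(feasible, key=lambda n: feasible[n][1])

# Made-up outcomes for five hypothetical serving plans.
plans = {
    "ctr_only": (0.100, 1.0),
    "blend_a": (0.097, 1.6),
    "blend_b": (0.090, 2.0),
    "revenue_only": (0.070, 2.2),
    "bad": (0.080, 1.5),
}
frontier = pareto_frontier(plans)
tight = max_revenue_with_ctr_floor(plans, eps=0.05)
loose = max_revenue_with_ctr_floor(plans, eps=0.12)
```

Loosening ε moves the chosen plan along the frontier toward higher revenue at the cost of CTR.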
  • 34. Putting it all together • Federated Model – Tier 1: Local models for update types – Tier 2: Calibrate local models and add more features to do holistic personalization • Re-rank by applying diversification, impression discounting, multi-objective optimization • Separate teams for Tier 1 and Tier 2
  • 35. Scaling Model Building. Pipeline: Data Tracking → Pre-Processing → Feature Generation → Model Training → Offline Model Evaluation → Online Experiment (only if good). Extremely computationally intensive. ▪ Research and Development: Flexible and easy to use software environment to create models [offline compatible with online] ▪ Maintenance: Models in production trained continuously and automatically, proper monitoring and testing for building reliable workflows, config based model deployment, A/B testing platform ▪ Stack of Models: From simple baselines to more sophistication ▪ Feature Management: Easy to discover and share features across applications
  • 36. Going beyond ML, Statistics • Dogfooding our own products – Look at employee experience – Debug model scores if something does not look right • Focus research group studies, user surveys • Product strategy & intuition – E.g., remove/add certain content types – Adding constraints like freshness bounds, etc. • Presentation – Testing different UI, presentation templates, fonts, etc.
  • 37. Summary • Formulating objectives is important (not easy) • Machine Learning is slow and fragile – Model training is not the only bottleneck • Data pre-processing, feature management, near-line computations and online scoring all important • Need A/B testing platform, Fast Retrieval systems • Need an end-to-end framework that can make it easy for modelers to test rapidly – Data Miners should be closely involved
  • 40. Example System Architecture Feature Extraction Feature Extraction Profile Network Feature Extraction Model User Data Store Online Data Processing Apache
  • 41. Example System Architecture Feature Extraction Feature Extraction Profile Network Feature Extraction Model User Data Store Online Data Processing Ranking Retrieval