1. (A Few) Key Lessons Learned Building LinkedIn's Online Experimentation Platform
Experimentation Panel, 3-20-13
2. Experimentation at LinkedIn
• Essential part of the release process
• 1000s of concurrent experiments
• Complex range of target populations based on content, behavior, and social graph data
• Cater to a wide demographic
• Large set of KPIs
3. The next frontier
• KPIs – Beyond CTR
• Multiple objective optimization
• KPI reconciliation
• User visit imbalance
• Virality-preserving A/B testing
• Context-dependent novelty effect
• Explicit feedback vs. implicit feedback
4. Picking the right KPI can be tricky
• Example: engagement measured by # comments on posts on a blog website
• KPI1 = average # comments per user – B wins by 30%
• KPI2 = ratio of active (at least one posting) to inactive users – A wins by 30%
• How is this possible? B concentrates activity among fewer users: each active user comments more, but a smaller share of users is active at all (see the toy example below)
• Do you want a smaller, highly engaged community, or a larger, less engaged community?
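To make the divergence concrete, here is a minimal sketch with invented numbers (not from the talk): arm B has fewer active users who each comment more, so B wins on average comments per user while A wins on the active-to-inactive ratio.

```python
# Illustrative only: synthetic populations where two KPIs disagree.
# All numbers are made up for the example, not LinkedIn data.

def kpis(n_users, n_active, comments_per_active):
    total_comments = n_active * comments_per_active
    kpi1 = total_comments / n_users         # average # comments per user
    kpi2 = n_active / (n_users - n_active)  # active-to-inactive ratio
    return kpi1, kpi2

kpi1_a, kpi2_a = kpis(n_users=1000, n_active=500, comments_per_active=2)  # arm A
kpi1_b, kpi2_b = kpis(n_users=1000, n_active=300, comments_per_active=5)  # arm B

print(f"KPI1 (avg comments/user): A={kpi1_a:.2f}  B={kpi1_b:.2f}")  # B wins
print(f"KPI2 (active:inactive):   A={kpi2_a:.2f}  B={kpi2_b:.2f}")  # A wins
```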
5. Winback campaign
• Definition
– Returning to the web site at least once?
– Returning to the web site with a certain level of engagement, possibly comparable to, more than, or a bit less than before the account went dormant?
• Example: reminder email at 30 days after registration
[Chart: Loyalty distribution – time since last visit (days, 3 to 339) vs. occurrence counts (0 to 4000), for users who registered 335 days ago, came back once at 30 days, then went dormant.]
6. Multiple competing objectives
• Suggest relevant groups … that one is more likely to participate in
• TalentMatch (a paid product: top 24 matches for a posted job) – suggest skilled candidates … who will likely respond to hiring managers' inquiries
• Semantic + engagement objectives (one way to blend them is sketched below)
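A common way to trade off such objectives (a sketch, not necessarily the method of the Rodriguez et al. paper cited in the references) is to blend a semantic relevance score with a predicted engagement probability. The weight `alpha` and both scores below are hypothetical stand-ins.

```python
# Hedged sketch: linear scalarization of two objectives for ranking.
# semantic_score and p_engage would come from real models; toy values here.

def blended_score(semantic_score: float, p_engage: float, alpha: float = 0.7) -> float:
    """Combine relevance and engagement into one ranking score."""
    return alpha * semantic_score + (1 - alpha) * p_engage

candidates = [
    {"id": "c1", "semantic": 0.92, "p_engage": 0.10},  # great match, unlikely to reply
    {"id": "c2", "semantic": 0.75, "p_engage": 0.60},  # decent match, likely to reply
]
ranked = sorted(
    candidates,
    key=lambda c: blended_score(c["semantic"], c["p_engage"]),
    reverse=True,
)
print([c["id"] for c in ranked])
```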
7. TalentMatch use case
• KPI: repeat TalentMatch buyers – a 6-month to 1-year window!
• Short-term proxy with predictive power:
– Optimize for InMail response rate while controlling for booking rate and InMail sent rate (see the sketch below)
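One simple way to operationalize "optimize X while controlling for Y and Z" (a sketch; the thresholds and the ship rule below are invented for illustration, not LinkedIn's actual criteria) is a decision that requires a lift on the target proxy and bounded movement on the guardrails.

```python
# Hedged sketch of a proxy-plus-guardrails ship rule.
# Lifts are relative (treatment / control - 1); thresholds are invented.

def should_ship(response_rate_lift: float,
                booking_rate_lift: float,
                inmail_sent_lift: float,
                min_target_lift: float = 0.02,
                guardrail_tolerance: float = 0.01) -> bool:
    """Ship only if the proxy improves and the guardrails don't regress."""
    target_ok = response_rate_lift >= min_target_lift
    guardrails_ok = (booking_rate_lift >= -guardrail_tolerance
                     and inmail_sent_lift >= -guardrail_tolerance)
    return target_ok and guardrails_ok

print(should_ship(0.05, 0.001, -0.004))  # True: proxy up, guardrails flat
print(should_ship(0.05, -0.03, 0.0))     # False: booking rate regressed
```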
8. KPI reconciliation
• How do you compare apples and oranges?
– E.g. a People vs. Job recommendations swap
– X% lift in job applications vs. Y% drop in invitations
– Value of an invitation vs. value of a job application? (a common-currency sketch follows)
• Long-term cascading effects on a set of site-wide KPIs
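One way to reconcile the two KPIs (a sketch; the per-event values and volumes below are hypothetical placeholders, and estimating them is exactly the hard problem the slide points at) is to convert each movement into a common currency and compare net impact.

```python
# Hedged sketch: reducing two incommensurable KPI movements to one number.
# All values below are invented; estimating value_per_* is the real problem.

baseline_job_apps = 100_000     # weekly volume, hypothetical
baseline_invitations = 500_000  # weekly volume, hypothetical
value_per_job_app = 1.5         # assumed long-term value units
value_per_invitation = 0.4      # assumed long-term value units

lift_job_apps = 0.08            # X% lift in job applications
drop_invitations = -0.02        # Y% drop in invitations

net_value = (baseline_job_apps * lift_job_apps * value_per_job_app
             + baseline_invitations * drop_invitations * value_per_invitation)
print(f"net value change: {net_value:+.0f} units")  # positive => swap pays off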
9. User visit imbalance
• Observed sample ≠ intended random sample
• Consider an A/B test on the homepage lasting
L days. Your likely observed sample will have
– Repeated (>> L) obs for super power users
– ≈ L obs for daily users
– ≈ L/7 obs for weekly users
– No obs for users who visit less often than every L days
• κ statistics
• Random effects models (a simpler user-level alternative is sketched below)
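A common mitigation for repeated, imbalanced observations (a sketch of one standard approach, user-level aggregation; the κ statistics and random effects models above are more principled alternatives) is to collapse events to one observation per user before comparing arms.

```python
# Hedged sketch: user-level aggregation so heavy visitors don't dominate.
# Event-level averages would overweight power users with >> L observations.
from collections import defaultdict

# (user_id, arm, clicked) event stream; toy data
events = [
    ("u1", "A", 1), ("u1", "A", 1), ("u1", "A", 0),  # power user, 3 visits
    ("u2", "A", 0),                                  # infrequent user, 1 visit
    ("u3", "B", 1), ("u4", "B", 0), ("u4", "B", 1),
]

per_user = defaultdict(list)
for user, arm, clicked in events:
    per_user[(user, arm)].append(clicked)

arm_means = defaultdict(list)
for (user, arm), clicks in per_user.items():
    arm_means[arm].append(sum(clicks) / len(clicks))  # one number per user

for arm, means in sorted(arm_means.items()):
    print(arm, sum(means) / len(means))  # each user counts once
```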
10. Virality-preserving A/B testing
• Random sampling destroys the social graph
• Critical for social referrals
– 'Warm' recommendations
– 'Wisdom of your friends' social proof
• 'Core + fringe' sampling to minimize this (sketched below)
– WWW'11 FB, '12 Yahoo Group recommendations
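A minimal sketch of core + fringe sampling (my reading of the idea in the cited WWW'11 work, not the paper's exact algorithm): sample a core set of users, then include their immediate neighbors as a fringe, so treated users still see their social context.

```python
# Hedged sketch: core + fringe sampling on a toy social graph.
import random

graph = {  # adjacency list, toy data
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
    "d": ["b", "e"], "e": ["d"], "f": [],
}

random.seed(0)
core = {u for u in graph if random.random() < 0.3}   # randomized core
fringe = {v for u in core for v in graph[u]} - core  # their neighbors

exposed = core | fringe  # everyone exposed to the treatment
# Metrics are computed on the core only, whose neighborhoods stay intact.

print("core:", sorted(core))
print("fringe:", sorted(fringe))
print("exposed:", sorted(exposed))
```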
11. Context-dependent novelty effect
• Job recommendation algorithms A/B test
– First 2 weeks: 2x the long-term stationary lift (a detection sketch follows)
• TalentMatch – no short-term novelty effect
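One simple way to detect such a novelty effect (a sketch; the window lengths and daily lift values are arbitrary toy data) is to compare the lift in an initial burn-in window against the lift after the metric has stabilized.

```python
# Hedged sketch: compare early (burn-in) lift vs. later (stationary) lift.
# daily_lift would come from the experiment readout; toy values here.

daily_lift = [0.22, 0.20, 0.19, 0.18, 0.17, 0.16, 0.16,   # week 1
              0.11, 0.10, 0.10, 0.10, 0.09, 0.10, 0.10,   # week 2
              0.10, 0.10, 0.10, 0.10, 0.09, 0.10, 0.10]   # week 3

burn_in_days = 7  # arbitrary choice
early = sum(daily_lift[:burn_in_days]) / burn_in_days
late = sum(daily_lift[burn_in_days:]) / len(daily_lift[burn_in_days:])

print(f"early lift {early:.2f} vs stationary lift {late:.2f}")
if early > 1.5 * late:
    print("likely novelty effect: don't extrapolate week-1 numbers")
```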
12. Explicit feedback A/B testing
• Enables you to understand the usefulness of a product/feature/algorithm with unequaled depth
• Text-based A/B testing! Sentiment analysis (a toy readout is sketched below)
• Reveals unexpected complexities
– E.g. 'Local' means different things to different members
• Prevents misinterpretation of implicit user feedback!
• Helps prioritize future improvements
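As a sketch of what a text-based A/B readout could look like (the lexicon and feedback strings are invented; a production system would use a real sentiment model), one can compare a simple sentiment score across arms.

```python
# Hedged sketch: lexicon-based sentiment comparison of free-text feedback.
# Toy lexicon and toy feedback; real systems would use a trained model.

POSITIVE = {"useful", "relevant", "great", "helpful"}
NEGATIVE = {"irrelevant", "confusing", "spam", "useless"}

def sentiment(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

feedback = {
    "A": ["great and relevant results", "a bit confusing layout"],
    "B": ["irrelevant suggestions", "useless for my local area"],
}

for arm, comments in feedback.items():
    scores = [sentiment(c) for c in comments]
    print(arm, sum(scores) / len(scores))
```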
13. References
• C. Posse. 2012. A (Few) Key Lessons Learned Building Recommender Systems for Large-Scale Social Networks. Invited Talk, Industry Practice Expo, 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Beijing, China.
• M. Rodriguez, C. Posse and E. Zhang. 2012. Multiple Objective Optimization in Recommendation Systems. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 11-18.
• M. Amin, B. Yan, S. Sriram, A. Bhasin and C. Posse. 2012. Social Referral: Using Network Connections to Deliver Recommendations. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 273-276.
• X. Amatriain, P. Castells, A. de Vries and C. Posse. 2012. Workshop on Recommendation Utility Evaluation: Beyond RMSE. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 351-352.
Editor's notes
- At LinkedIn we A/B tested everything: new features, new algorithms, user experience (user flows, UI). From simple samples to highly targeted samples, such as all users who have come to the site in the last 30 days, work for US companies with at least 500 employees, and have not uploaded their email address book in the last 90 days… Demographics: job seekers, recruiters, outbound professionals, content providers, content consumers, networkers, branders.
- Complex metrics beyond CTR; the engagement component is context dependent; short-term proxies avoid long-term A/B tests. I will illustrate each with real problems we had at LinkedIn.
- The same applies to cannibalization.
- Social Referral: Leveraging Network Connections to Deliver Recommendations. 'Wisdom of your friends' social proofs.