SlideShare a Scribd company logo
1 of 28
Beyond Collaborative
Filtering: ML &
Recommendations at
Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola
My Background
● Machine Learning Engineer
● At Meetup since May 2012
● BS Computer Science
○ Information Retrieval
○ Data Mining
○ Math
■ Linear Algebra
■ Graph Theory
Meetup what are you
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh and >125 million rsvps by >17 million
members
● ~3 million rsvps in the last 30 days
○ >1/second
Tools at Meetup
● Hive - SQL on Hadoop
● Spark - Distributed Scala on Hadoop cluster
● Scala - Recommendations service
● R - Data analysis, Model building
● Python - Scripting, Data organizing
● Java - Backend of our web stack
“Everything is a recommendation”
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
Weaknesses of CF
● Sparsity
● Cold Start
● Coverage
● Diversity
Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Groups/person
○ Membership: 0.001%
○ Compare to Netflix: 1%
Cleaning data
● Schenectady
● Beverly Hills
● Fake RSVP boosts (+100 guests!)
● Rsvp hogs
Real data is gross
● Preprocessing is critical!
○ missing data
○ outliers
○ log scale
○ bucketing
○ sampling bias
Supervised Learning/Classification
● “Inferring a function from labeled training
data”
● Joined Meetup group/Didn’t join Meetup
group
Ranking
● Membership << expected error rate
○ Sample to 50/50 join/no-join
● Model output label no longer explicitly true
● Use a classifier that gives you a useful
output
Topic Match
State Match
Ensemble Learning
“... use multiple learning algorithms to obtain
better predictive performance than could be
obtained from any of the constituent learning
algorithms”
Ensemble Learning
● Collaborative Filtering on Topics
● Other simple features
Logistic Regression Output
● RsvpScore 0.02
● FbFriends 2.02
● 2ndFbFriends 0.09
● AgeUnmatch -2.40
● GenUnmatch -3.37
● Distance -0.04
● StateMatch 0.54
● CountyMatch 0.41
● ZipScore 0.06
● TopicScore 4.14
● ExtendedTS 0.47
● RelatedTS 0.66
● FbLikeTS 0.78
Facebook Likes
● Lots of information, but how to use?
● Map to topics, let training the model take
care of the rest!
● Bonus: Recommendations server knows
topics, generated topics can be passed in by
request
Mapping FB Likes to Meetup Topics
● Text based?
○ Go(game) vs Go(lang)?
○ Burton?
● Data approach!
○ Grab most popular topics across all members with
the same like
Normalization
● Top topics for Burton-Likers
○ Meeting New People, Coffee, bla bla
○ Most popular still dominates
● Normalize based on expected topic
occurrence in sample
Normalization
● For members with a given Like
● Compare percent with each topic to
expected among total population
● Burton:
○ 20% “Meeting New People”
○ 9% “Snowboarding”
● Total population
○ 20% “Meeting New People”
○ 2% “Snowboarding
Processing
● Load FB Like connections, topics into
Hadoop
● Process with Hive to generate top topics for
each like
● Join with member likes to generate top
topics per member
● Add feature to model using FB-Like-
Generated-Topics crossover with groups...
Results
● Positive weight
○ Very good sign
○ Captured information about member identity, not just
behavior
● Deploy/Split test
○ 1.5% lift in conversion overall
○ (Only have facebook data for ~10% of members)
Thanks!
Smart people come work with me.
http://www.meetup.com/jobs/

More Related Content

Similar to Estola meetup big_datacampla_6_14_evan_estola

Parismlmeetupfinalslides 151209190037-lva1-app6892
Parismlmeetupfinalslides 151209190037-lva1-app6892Parismlmeetupfinalslides 151209190037-lva1-app6892
Parismlmeetupfinalslides 151209190037-lva1-app6892mercedes calderon
 
Customer segmentation scbcn17
Customer segmentation scbcn17Customer segmentation scbcn17
Customer segmentation scbcn17Julio Martinez
 
Life in CSE.pptx
Life in CSE.pptxLife in CSE.pptx
Life in CSE.pptxVedVekhande
 
Computer Science Career Guidance
Computer Science Career GuidanceComputer Science Career Guidance
Computer Science Career GuidanceDeepak Sood
 
Cepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxCepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxgyan98
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and SparkFelix Crisan
 
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...Amazon Web Services
 
Graph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesGraph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesRon Barabash
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...Dataiku
 
Maintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learningMaintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learningNikhil Dandekar
 
Better programmer overview
Better programmer   overviewBetter programmer   overview
Better programmer overviewCourseHunt
 
Overcome the Reign of Chaos
Overcome the Reign of ChaosOvercome the Reign of Chaos
Overcome the Reign of ChaosMichael Stockerl
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with PythonBenjamin Bengfort
 
Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...
Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...
Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...Jose Mº Muñoz
 
Active Learning on Question Answering with Dialogues
 Active Learning on Question Answering with Dialogues Active Learning on Question Answering with Dialogues
Active Learning on Question Answering with DialoguesJinho Choi
 
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...Quora
 
Recommending the world's knowledge
Recommending the world's knowledgeRecommending the world's knowledge
Recommending the world's knowledgeLei Yang
 
How we won the 2nd place In KaggleDays Championship.pptx
How we won the 2nd place In KaggleDays Championship.pptxHow we won the 2nd place In KaggleDays Championship.pptx
How we won the 2nd place In KaggleDays Championship.pptxjficwwc
 

Similar to Estola meetup big_datacampla_6_14_evan_estola (20)

Parismlmeetupfinalslides 151209190037-lva1-app6892
Parismlmeetupfinalslides 151209190037-lva1-app6892Parismlmeetupfinalslides 151209190037-lva1-app6892
Parismlmeetupfinalslides 151209190037-lva1-app6892
 
Customer segmentation scbcn17
Customer segmentation scbcn17Customer segmentation scbcn17
Customer segmentation scbcn17
 
Life in CSE.pptx
Life in CSE.pptxLife in CSE.pptx
Life in CSE.pptx
 
Computer Science Career Guidance
Computer Science Career GuidanceComputer Science Career Guidance
Computer Science Career Guidance
 
Be Part of a Community
Be Part of a CommunityBe Part of a Community
Be Part of a Community
 
Cepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxCepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptx
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and Spark
 
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
 
Graph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesGraph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph frames
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
 
Maintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learningMaintaining high quality user generated content through machine learning
Maintaining high quality user generated content through machine learning
 
Better programmer overview
Better programmer   overviewBetter programmer   overview
Better programmer overview
 
Overcome the Reign of Chaos
Overcome the Reign of ChaosOvercome the Reign of Chaos
Overcome the Reign of Chaos
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with Python
 
Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...
Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...
Codemotion 2017 - "Dime cómo manejas tus datos y te diré qué clase de base de...
 
Active Learning on Question Answering with Dialogues
 Active Learning on Question Answering with Dialogues Active Learning on Question Answering with Dialogues
Active Learning on Question Answering with Dialogues
 
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
Quora ML Workshop: Maintaining High Quality User-Generated Content through Ma...
 
Recommending the world's knowledge
Recommending the world's knowledgeRecommending the world's knowledge
Recommending the world's knowledge
 
How we won the 2nd place In KaggleDays Championship.pptx
How we won the 2nd place In KaggleDays Championship.pptxHow we won the 2nd place In KaggleDays Championship.pptx
How we won the 2nd place In KaggleDays Championship.pptx
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 

Estola meetup big_datacampla_6_14_evan_estola

  • 1. Beyond Collaborative Filtering: ML & Recommendations at Meetup Evan Estola Meetup.com evan@meetup.com @estola
  • 2. My Background ● Machine Learning Engineer ● At Meetup since May 2012 ● BS Computer Science ○ Information Retrieval ○ Data Mining ○ Math ■ Linear Algebra ■ Graph Theory
  • 4. Why Meetup data is cool ● Real people meeting up ● Every meetup could change someone's life ● No ads, just do the best thing ● Oh and >125 million rsvps by >17 million members ● ~3 million rsvps in the last 30 days ○ >1/second
  • 5.
  • 6. Tools at Meetup ● Hive - SQL on Hadoop ● Spark - Distributed Scala on Hadoop cluster ● Scala - Recommendations service ● R - Data analysis, Model building ● Python - Scripting, Data organizing ● Java - Backend of our web stack
  • 7. “Everything is a recommendation”
  • 8.
  • 9. Collaborative Filtering ● Classic recommendations approach ● Users who like this also like this
  • 10. Weaknesses of CF ● Sparsity ● Cold Start ● Coverage ● Diversity
  • 11. Why Recs at Meetup are hard ● Incomplete Data (topics) ● Cold start ● Asking user for data is hard ● Going to meetups is scary ● Sparsity ○ Location ○ Groups/person ○ Membership: 0.001% ○ Compare to Netflix: 1%
  • 12. Cleaning data ● Schenectady ● Beverly Hills ● Fake RSVP boosts (+100 guests!) ● Rsvp hogs
  • 13. Real data is gross ● Preprocessing is critical! ○ missing data ○ outliers ○ log scale ○ bucketing ○ sampling bias
  • 14. Supervised Learning/Classification ● “Inferring a function from labeled training data” ● Joined Meetup group/Didn’t join Meetup group
  • 15. Ranking ● Membership << expected error rate ○ Sample to 50/50 join/no-join ● Model output label no longer explicitly true ● Use a classifier that gives you a useful output
  • 16.
  • 19. Ensemble Learning “... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”
  • 20. Ensemble Learning ● Collaborative Filtering on Topics ● Other simple features
  • 21. Logistic Regression Output ● RsvpScore 0.02 ● FbFriends 2.02 ● 2ndFbFriends 0.09 ● AgeUnmatch -2.40 ● GenUnmatch -3.37 ● Distance -0.04 ● StateMatch 0.54 ● CountyMatch 0.41 ● ZipScore 0.06 ● TopicScore 4.14 ● ExtendedTS 0.47 ● RelatedTS 0.66 ● FbLikeTS 0.78
  • 22. Facebook Likes ● Lots of information, but how to use? ● Map to topics, let training the model take care of the rest! ● Bonus: Recommendations server knows topics, generated topics can be passed in by request
  • 23. Mapping FB Likes to Meetup Topics ● Text based? ○ Go(game) vs Go(lang)? ○ Burton? ● Data approach! ○ Grab most popular topics across all members with the same like
  • 24. Normalization ● Top topics for Burton-Likers ○ Meeting New People, Coffee, bla bla ○ Most popular still dominates ● Normalize based on expected topic occurrence in sample
  • 25. Normalization ● For members with a given Like ● Compare percent with each topic to expected among total population ● Burton: ○ 20% “Meeting New People” ○ 9% “Snowboarding” ● Total population ○ 20% “Meeting New People” ○ 2% “Snowboarding
  • 26. Processing ● Load FB Like connections, topics into Hadoop ● Process with Hive to generate top topics for each like ● Join with member likes to generate top topics per member ● Add feature to model using FB-Like- Generated-Topics crossover with groups...
  • 27. Results ● Positive weight ○ Very good sign ○ Captured information about member identity, not just behavior ● Deploy/Split test ○ 1.5% lift in conversion overall ○ (Only have facebook data for ~10% of members)
  • 28. Thanks! Smart people come work with me. http://www.meetup.com/jobs/