Exploring content recommendation
Felipe Besson
@fmbesson
March, 2013
“A lot of times, people don't know what they want until you show it to them.”
Steve Jobs
“We don't make money when we sell things; we make money when we help customers make purchase decisions.”
Jeff Bezos, Amazon
Why is recommendation important?
Apache Mahout is an Apache project to build scalable machine learning libraries
● Focused on large data sets
● Adaptation of standard machine learning algorithms
● Runs on Apache Hadoop (map/reduce paradigm)
… or on a non-Hadoop node
Who is using Mahout?
Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
Supported core algorithms
● Classification
● Clustering
● Recommendation
● Pattern Mining
● Regression
● Dimension Reduction
● Evolutionary Algorithms
● Vector Similarity
Mahout Recommender
Collaborative filtering
● People often get the best recommendations from someone with similar taste
● People tend to like things that are similar to other things they like
● There are patterns in people's likes and dislikes
John Bob
movie1 movie1
movie2
movie2
movie42
movie4
movie5
Will Bob like movie4 and movie5?
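The question on this slide can be made concrete with a minimal user-based sketch. It is not Mahout's implementation: the `likes` data, the extra user `carol`, and the choice of Jaccard (Tanimoto) overlap as the similarity are all assumptions made for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard (Tanimoto) overlap between two sets of liked items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for_user(target, likes, top_n=3):
    """Score unseen items by the similarity of the users who liked them."""
    scores = defaultdict(float)
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], items)
        if sim == 0:
            continue  # users with no overlap contribute nothing
        for item in items - likes[target]:
            scores[item] += sim
    # Highest score first; ties broken by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical likes, loosely modeled on the John/Bob slide.
likes = {
    "john": {"movie1", "movie2", "movie4", "movie5"},
    "bob": {"movie1", "movie2"},
    "carol": {"movie7", "movie8"},
}

print(recommend_for_user("bob", likes))
```

Because Bob shares two movies with John and none with Carol, only John's remaining movies are suggested.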
Mahout Recommender
Available recommenders
● Item based
● User based
Execution modes
● Taste: online but not distributed
● Hadoop: offline (batch) but distributed
Parameters
● Many coefficients to calculate user and item similarity and neighborhood
● Data model abstractions
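To contrast with the user-based approach, here is a rough item-based sketch built on item co-occurrence counts. The data and function names are hypothetical, and raw co-occurrence is only one of many possible item similarities; it is not presented as what Mahout computes internally.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(likes):
    """Count, for each ordered item pair, how many users liked both."""
    counts = defaultdict(int)
    for items in likes.values():
        for a, b in permutations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


def recommend_by_items(user, likes, top_n=3):
    """Score each unseen item by its co-occurrence with the user's items."""
    counts = cooccurrence(likes)
    seen = likes[user]
    candidates = set().union(*likes.values()) - seen
    scores = {c: sum(counts[(s, c)] for s in seen) for c in candidates}
    ranked = sorted(
        ((item, score) for item, score in scores.items() if score > 0),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return ranked[:top_n]


# Hypothetical boolean preferences (user -> set of liked items).
item_likes = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a"},
}

print(recommend_by_items("u2", item_likes))
```

The item similarities depend only on the item pairs, so they can be precomputed in a batch job, which is one reason the item-based variant suits the offline Hadoop mode.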
Mahout Recommender (Hadoop)
Input
user_id, item_id, preference_value (optional)
1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
9, 10, 0.5
9, 99, 0.9
9, 11, 0.1
8, 11, 0.5
...
Output
user_id: [recommended_item, score]
1: [10, 0.93; 11, 0.84; … ]
2: [23, 0.72; 17, 0.60; … ]
8: [121, 0.98; 23, 0.78; … ]
17: [12, 0.89; 32, 0.56; … ]
42: [129, 0.92; 98, 0.45; … ]
...
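As an illustration of the input layout, the sketch below reads `user_id, item_id, preference_value` lines into a nested dictionary. The function name is hypothetical, and treating a missing preference as 1.0 is an assumption made here for boolean data, not something stated on the slide.

```python
import csv
from collections import defaultdict
from io import StringIO


def load_preferences(lines):
    """Parse 'user_id, item_id[, preference]' rows into {user: {item: value}}."""
    prefs = defaultdict(dict)
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        user_id, item_id = int(row[0]), int(row[1])
        # Assumption: a missing preference value means "liked" (1.0).
        has_value = len(row) > 2 and row[2].strip()
        prefs[user_id][item_id] = float(row[2]) if has_value else 1.0
    return dict(prefs)


# A few rows in the format shown on the slide, plus one without a value.
sample = StringIO("1, 23, 0.9\n1, 15, 0.5\n2, 11\n")
preferences = load_preferences(sample)
print(preferences)
```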
1st try!
Movie recommendation
Netflix dataset (http://www.netflixprize.com/)
● # of user tastes: 2,817,131
● # of movies: 17,770
● # of users: 472,891
Environment and performance
● Hadoop pseudo-distributed
● Computer
  ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
  ● 6 GB RAM
● Total time: ~16 minutes
How to run?
1. Copy the input file to HDFS (Hadoop Distributed File System)
hadoop fs -put qualifying.txt /netflix/input/data.txt
2. Run the recommender
hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/netflix/input/data.txt \
  -Dmapred.output.dir=/netflix/output \
  --numRecommendations 10 \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
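The log-likelihood similarity selected above is based on Dunning's log-likelihood ratio over a 2×2 table of co-occurrence counts. The sketch below shows that ratio under stated assumptions: the argument names (`k11` = users who liked both items, `k12` and `k21` = users who liked only one, `k22` = users who liked neither) and the final squashing into a 0 to 1 score are choices made here for illustration, not a statement of Mahout's exact code.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(k11, k12, k21, k22):
    """Map the ratio into [0, 1); larger means a stronger association."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


# Independent items score near 0; perfectly co-occurring items score high.
print(llr_similarity(10, 10, 10, 10))
print(llr_similarity(10, 0, 0, 10))
```

The ratio measures how surprising the observed overlap is compared with what chance would produce, so very popular items that co-occur with everything are not rewarded automatically.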
Results
Recommender analyzer
https://github.com/besson/recommender_analyzer
http://rec-analyzer.herokuapp.com/
Results
References
Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.
Thanks
Felipe Besson
@fmbesson
