Mahout part1

Yasmine Gaber

Part one of a presentation about the Apache Mahout system, based on Mahout in Action (http://my.safaribooksonline.com/9781935182689/).

Mahout in Action
Part 1

Yasmine M. Gaber
28 February 2013

Agenda

Meet Apache Mahout

Part 1: Recommendation

Part 2: Clustering

Part 3: Classification

Meet Apache Mahout

It is an open source machine learning library
from Apache

It is scalable

It is a Java library

It can be used with Hadoop to deal with large
scale data.

Famous Engines

Recommender engines:

Amazon.com

Netflix

Dating sites like Líbímseti

Social networking sites like Facebook

Clustering engines:

Google News

Search engines like Clusty

Classification engines:

Email spam filters

Google’s Picasa

Optical character recognition software

Apple’s Genius feature in iTunes

Recommender Input

A preference consists of a user ID, an item ID,
and a value expressing the user's preference for the item

The input is a .csv file with one preference per line
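
As a rough illustration of the input described above, the following plain-Java sketch parses such lines into records. The column order (user ID, item ID, value) and the names PreferenceCsv and PreferenceRecord are assumptions made for this example; they are not Mahout classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PreferenceCsv {
    // Parses lines assumed to look like "userID,itemID,value"; blank lines are skipped.
    static List<PreferenceRecord> parse(List<String> lines) {
        List<PreferenceRecord> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(",");
            out.add(new PreferenceRecord(
                    Long.parseLong(parts[0].trim()),
                    Long.parseLong(parts[1].trim()),
                    Float.parseFloat(parts[2].trim())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<PreferenceRecord> prefs = parse(Arrays.asList("1,101,5.0", "1,102,3.0", "2,101,2.0"));
        for (PreferenceRecord p : prefs) {
            System.out.println("user " + p.userId + ", item " + p.itemId + ", value " + p.value);
        }
    }
}

// Hypothetical holder for one input line; not Mahout's own Preference type.
final class PreferenceRecord {
    final long userId;
    final long itemId;
    final float value;

    PreferenceRecord(long userId, long itemId, float value) {
        this.userId = userId;
        this.itemId = itemId;
        this.value = value;
    }
}
```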

Recommender Evaluation

Average absolute difference vs. root-mean-square of the differences between estimated and actual preferences
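
To make the two scores concrete, here is a small plain-Java sketch that computes both over the same estimated and actual values. The class and method names are hypothetical, and this is not Mahout's evaluator; it only shows the arithmetic. Because the differences are squared, root-mean-square penalizes large misses more heavily.

```java
final class EvalMetrics {
    // Mean of |estimated - actual| over all test preferences.
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / actual.length;
    }

    // Square root of the mean of squared differences.
    static double rootMeanSquare(double[] estimated, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double d = estimated[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] estimated = {3.5, 2.0, 5.0};
        double[] actual = {3.0, 5.0, 4.0};
        // Differences are 0.5, 3.0 and 1.0, so the average is 1.5.
        System.out.println(averageAbsoluteDifference(estimated, actual));
        // The single large miss (3.0) pushes this score above the average.
        System.out.println(rootMeanSquare(estimated, actual));
    }
}
```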

Representing Recommender Data

Preference object
− new GenericPreference(123, 456, 3.0f)

Preference Array

FastByIDMap and FastIDSet

In-memory DataModels

GenericDataModel


File-based data


Refreshable components


Database-based data

User-based Recommender

The algorithm

for every item i that u has no preference for yet
  for every other user v that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average
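
The loop above can be sketched in plain Java as follows. This is an illustrative sketch rather than Mahout's implementation: the nested-map data layout, the class name, and the choice to ignore undefined or non-positive similarities are all assumptions. The loops run over users first and items second, which yields the same weighted averages as the pseudocode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

final class UserBasedSketch {
    // Estimates u's preference for every item u has not rated, as a similarity-weighted average.
    static Map<Long, Double> estimate(long u, Map<Long, Map<Long, Double>> prefs,
                                      ToDoubleBiFunction<Long, Long> similarity) {
        Map<Long, Double> own = prefs.get(u);
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> other : prefs.entrySet()) {
            long v = other.getKey();
            if (v == u) {
                continue;
            }
            double s = similarity.applyAsDouble(u, v);
            if (Double.isNaN(s) || s <= 0.0) {
                continue; // assumption: skip undefined or non-positive similarities
            }
            for (Map.Entry<Long, Double> pref : other.getValue().entrySet()) {
                long item = pref.getKey();
                if (own.containsKey(item)) {
                    continue; // u already has a preference for this item
                }
                weightedSum.merge(item, s * pref.getValue(), Double::sum);
                weightTotal.merge(item, s, Double::sum);
            }
        }
        Map<Long, Double> estimates = new HashMap<>();
        for (Long item : weightedSum.keySet()) {
            estimates.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        return estimates;
    }

    // Returns up to n item IDs, highest estimate first.
    static List<Long> topItems(Map<Long, Double> estimates, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(estimates.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Tiny made-up data set: user ID -> (item ID -> preference value).
    static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, new HashMap<>());
        prefs.get(1L).put(101L, 5.0);
        prefs.put(2L, new HashMap<>());
        prefs.get(2L).put(101L, 4.0);
        prefs.get(2L).put(102L, 2.0);
        prefs.put(3L, new HashMap<>());
        prefs.get(3L).put(101L, 1.0);
        prefs.get(3L).put(102L, 4.0);
        prefs.get(3L).put(103L, 3.0);
        return prefs;
    }

    public static void main(String[] args) {
        // Stand-in similarity: user 2 counts fully, everyone else half.
        Map<Long, Double> estimates = estimate(1L, sampleData(), (u, v) -> v == 2L ? 1.0 : 0.5);
        System.out.println(topItems(estimates, 2));
    }
}
```

Passing the similarity in as a function keeps the sketch independent of any particular metric, in the same spirit as the separate similarity component described in the slides.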

Recommender Components

Data model, implemented via DataModel


User-user similarity metric, implemented via
UserSimilarity


User neighborhood definition, implemented via
UserNeighborhood


Recommender engine, implemented via a
Recommender (here,

User Neighborhoods

Fixed-size neighborhoods


Threshold-based neighborhood
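
The difference between the two neighborhood definitions can be sketched in plain Java. The names below are hypothetical and this is not Mahout's UserNeighborhood API; the input is assumed to be a map from each other user's ID to that user's similarity with the target user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NeighborhoodSketch {
    // Fixed-size neighborhood: the n most similar users, however similar they are.
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarities.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Threshold-based neighborhood: every user at least this similar, however many there are.
    static Set<Long> atLeast(Map<Long, Double> similarities, double threshold) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Made-up similarities between a target user and users 2 to 5.
    static Map<Long, Double> sample() {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, 0.2);
        sims.put(4L, 0.7);
        sims.put(5L, -0.4);
        return sims;
    }

    public static void main(String[] args) {
        System.out.println(nearestN(sample(), 3));   // always three users here
        System.out.println(atLeast(sample(), 0.95)); // may be empty if nobody is similar enough
    }
}
```

A fixed size always returns neighbors, even weak ones, while a threshold can return none at all; which trade-off is better depends on the data.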

Similarity metrics

Pearson correlation–based similarity
− It is a number between –1 and 1 that measures
the tendency of two series of numbers, paired up
one-to-one, to move together
− Problems:

It doesn’t take into account the number of items on
which two users’ preferences overlap, which is probably
a weakness in the context of recommender engines.

If two users overlap on only one item, no correlation can
be computed because of how the computation is
defined
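
Both problems can be seen in a direct computation. The plain-Java sketch below computes the Pearson correlation over the items two users have in common; the class name and the ratings helper are hypothetical, and returning NaN for the undefined case is a choice made for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class PearsonSketch {
    // Pearson correlation over the items both users have rated; NaN when it is undefined.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y != null) {
                n++;
                sumX += e.getValue();
                sumY += y;
            }
        }
        if (n == 0) {
            return Double.NaN;
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue;
            }
            double dx = e.getValue() - meanX;
            double dy = y - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        // With a single shared item both variances are zero, so no correlation exists.
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    // Hypothetical helper: rates consecutive item IDs starting at firstItem.
    static Map<Long, Double> ratings(long firstItem, double... values) {
        Map<Long, Double> m = new HashMap<>();
        for (int k = 0; k < values.length; k++) {
            m.put(firstItem + k, values[k]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Two shared items or twenty: the overlap size does not enter the result.
        System.out.println(pearson(ratings(101, 1.0, 2.0), ratings(101, 2.0, 4.0)));
        // One shared item: undefined.
        System.out.println(pearson(ratings(101, 4.0), ratings(101, 2.0, 5.0)));
    }
}
```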

Similarity metrics

Euclidean distance similarity
− similarity = 1 / (1 + d), where d is the Euclidean distance between the two users' preferences

Cosine measure similarity
− between –1 and 1

Tanimoto coefficient similarity
− The ratio of the size of the
intersection to the size of
the union of two users' preferred items
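
The three measures above can be written out in a few lines of plain Java. This is a sketch under assumptions: distances and dot products are taken over shared items only, NaN is returned when there is nothing to compare, and the class and method names are hypothetical rather than Mahout's.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SimilaritySketch {
    // 1 / (1 + d), where d is the Euclidean distance over shared items.
    static double euclideanSimilarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue;
            }
            double d = e.getValue() - y;
            sumSq += d * d;
            shared++;
        }
        return shared == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Cosine of the angle between the two preference vectors, restricted to shared items.
    static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue;
            }
            dot += e.getValue() * y;
            normA += e.getValue() * e.getValue();
            normB += y * y;
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0.0 ? Double.NaN : dot / denom;
    }

    // |intersection| / |union| of the two users' item sets; preference values are ignored.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? Double.NaN : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Map<Long, Double> a = new HashMap<>();
        a.put(101L, 5.0);
        a.put(102L, 3.0);
        Map<Long, Double> b = new HashMap<>();
        b.put(101L, 2.0);
        b.put(102L, 7.0);
        b.put(103L, 4.0);
        System.out.println(euclideanSimilarity(a, b)); // distance 5 over the two shared items
        System.out.println(cosine(a, b));
        System.out.println(tanimoto(a.keySet(), b.keySet())); // 2 shared items out of 3 in total
    }
}
```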

Item-based recommendation

The algorithm

for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
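
The inner loop of the item-based algorithm, estimating one candidate item i for user u, might look like this in plain Java. The names are hypothetical, the item-item similarity is passed in as a function, and skipping undefined or non-positive similarities is an assumption of this sketch rather than something the slides specify.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

final class ItemBasedSketch {
    // Estimates the user's preference for item i from the items the user has already rated.
    static double estimate(Map<Long, Double> userPrefs, long i,
                           ToDoubleBiFunction<Long, Long> itemSimilarity) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<Long, Double> rated : userPrefs.entrySet()) {
            double s = itemSimilarity.applyAsDouble(i, rated.getKey());
            if (Double.isNaN(s) || s <= 0.0) {
                continue; // assumption: skip undefined or non-positive similarities
            }
            weightedSum += s * rated.getValue();
            weightTotal += s;
        }
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        Map<Long, Double> userPrefs = new HashMap<>();
        userPrefs.put(101L, 4.0);
        userPrefs.put(102L, 2.0);
        // Stand-in similarity: candidate item 103 resembles item 101 more than item 102.
        double estimate = estimate(userPrefs, 103L, (i, j) -> j == 101L ? 0.75 : 0.25);
        System.out.println(estimate); // 0.75 * 4.0 + 0.25 * 2.0 = 3.5, divided by total weight 1.0
    }
}
```

Unlike the user-based loop, only the target user's own preferences are read here; other users matter only through the item-item similarity.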

Slope-one recommender

The algorithm

for every item i the user u expresses no preference for
  for every item j that user u expresses a preference for
    find the average preference difference between j and i
    add this diff to u's preference value for j
    add this to a running average
return the top items, ranked by these averages
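
A minimal plain-Java sketch of these steps follows. It uses an unweighted running average, recomputes the item-item differences on every call instead of precomputing them, and uses hypothetical names and made-up data, so it illustrates the idea rather than a production implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class SlopeOneSketch {
    // Average of (preference for i - preference for j) over all users who rated both items.
    static double averageDiff(Map<Long, Map<Long, Double>> prefs, long i, long j) {
        double sum = 0.0;
        int count = 0;
        for (Map<Long, Double> p : prefs.values()) {
            Double pi = p.get(i);
            Double pj = p.get(j);
            if (pi != null && pj != null) {
                sum += pi - pj;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Estimates u's preference for item i by shifting each of u's known preferences by the average diff.
    static double estimate(Map<Long, Map<Long, Double>> prefs, long u, long i) {
        double sum = 0.0;
        int count = 0;
        for (Map.Entry<Long, Double> rated : prefs.get(u).entrySet()) {
            double diff = averageDiff(prefs, i, rated.getKey());
            if (Double.isNaN(diff)) {
                continue; // no user has rated both items
            }
            sum += rated.getValue() + diff;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Made-up data: user ID -> (item ID -> preference value).
    static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, new HashMap<>());
        prefs.get(1L).put(101L, 5.0);
        prefs.get(1L).put(102L, 3.0);
        prefs.put(2L, new HashMap<>());
        prefs.get(2L).put(101L, 4.0);
        prefs.get(2L).put(102L, 2.0);
        prefs.get(2L).put(103L, 3.0);
        prefs.put(3L, new HashMap<>());
        prefs.get(3L).put(102L, 4.0);
        return prefs;
    }

    public static void main(String[] args) {
        // Item 101 is rated 2 higher than item 102 on average, and user 3 gave 102 a 4.0.
        System.out.println(estimate(sampleData(), 3L, 101L));
    }
}
```

Note that the estimate is not clamped to the rating scale; in the sample data it exceeds the highest rating any user actually gave.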

Thank You

Contact at:
Email: Yasmine.Gaber@espace.com.eg
Twitter: Twitter.com/yasmine_mohamed
