2. Machine Learning, a branch of AI, is about the
construction and study of systems that can
learn from existing data.
It is used in fields like:
Information retrieval
Identifying key topics in large collections of text
Biology
Linear algebra, etc.
3. An Apache Software Foundation project to
create scalable machine learning libraries
under the Apache Software License.
WHY MAHOUT?
Many open-source machine learning libraries either:
Lack community
Lack documentation and examples
Lack scalability
Lack the Apache license
Or are research-oriented rather than production-ready
4. Began life in 2008 as a subproject of Apache
Lucene (the search and text-mining API).
Lucene committers felt it deserved to be a
separate project, and Mahout absorbed the Taste
collaborative-filtering project.
In April 2010, Mahout became a top-level
Apache project.
5. Google News sees about 3.5 million new
news articles per day; each must be clustered with
related articles within minutes to stay timely.
Another example: Picasa.
Mahout makes use of Hadoop.
Some algorithm implementations won't scale to massive
machine clusters on their own, but a map-reduce
framework like Apache Hadoop does.
Mahout adapts its algorithms to work at scale on top
of Hadoop.
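To make the map-reduce idea concrete, here is a tiny self-contained sketch of the classic word-count pattern (plain Java, not Hadoop or Mahout code; the class and method names are illustrative). Hadoop distributes exactly these two phases, map and reduce, across a cluster:

```java
import java.util.*;
import java.util.stream.*;

public class MapReduceSketch {
    // "Map" phase: emit a (word, 1) pair for every word in a document.
    static Stream<Map.Entry<String, Integer>> map(String doc) {
        return Arrays.stream(doc.toLowerCase().split("\\s+"))
                     .map(w -> Map.entry(w, 1));
    }

    // "Reduce" phase: sum the emitted counts for each distinct word.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(
            Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static Map<String, Integer> wordCount(List<String> docs) {
        return docs.stream().flatMap(MapReduceSketch::map)
                   .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("mahout uses hadoop", "hadoop scales"));
        System.out.println(counts.get("hadoop")); // 2
    }
}
```

In a real Hadoop job the mapper and reducer run on different machines over partitions of the data; the logic per record stays this simple.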
7. Extensive framework for collaborative
filtering.
Recommenders:
-- User-based
-- Item-based
Online and offline support
-- Offline computation can utilize Hadoop
Similar recommenders are used by Amazon, Facebook, etc.
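A minimal, self-contained sketch of the user-based idea (this is not Mahout's actual Taste API; the cosine similarity choice and all names here are assumptions for illustration): score each item the target user has not rated by the similarity-weighted ratings of other users.

```java
public class UserBasedCF {
    // Cosine similarity between two users' rating vectors (0 = unrated).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Recommend the unrated item with the highest similarity-weighted score.
    public static int recommend(double[][] ratings, int user) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int item = 0; item < ratings[0].length; item++) {
            if (ratings[user][item] != 0) continue;      // already rated
            double score = 0;
            for (int u = 0; u < ratings.length; u++) {
                if (u == user || ratings[u][item] == 0) continue;
                score += cosine(ratings[user], ratings[u]) * ratings[u][item];
            }
            if (score > bestScore) { bestScore = score; best = item; }
        }
        return best;  // index of recommended item, or -1 if nothing is unrated
    }

    public static void main(String[] args) {
        double[][] r = {
            {5, 4, 0, 1},   // user 0: item 2 unrated
            {5, 5, 4, 1},   // user 1: similar tastes, liked item 2
            {1, 1, 1, 5},   // user 2: opposite tastes
        };
        System.out.println(recommend(r, 0)); // 2
    }
}
```

Mahout's offline recommenders do the same kind of similarity-weighted scoring, but distributed over Hadoop for large rating matrices.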
9. Clustering techniques attempt to group a
large number of things together into clusters
that share some similarity.
Examples: k-means, fuzzy k-means.
The Summly app similarly summarizes related stories
from different news sites and presents a brief
digest in the app (the same concept as Google News).
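A minimal sketch of the k-means idea on one-dimensional points (plain Java, not Mahout's distributed implementation; the class and method names are illustrative). Each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of its assigned points:

```java
import java.util.*;

public class KMeans1D {
    // Lloyd's algorithm on 1-D points: assign, then re-center, for `iters` rounds.
    public static double[] cluster(double[] points, double[] centroids, int iters) {
        for (int it = 0; it < iters; it++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (double p : points) {
                // Assignment step: find the nearest centroid for this point.
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[nearest]))
                        nearest = c;
                sum[nearest] += p;
                count[nearest]++;
            }
            // Update step: move each centroid to the mean of its points.
            for (int c = 0; c < centroids.length; c++)
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] pts = {1, 2, 3, 10, 11, 12};          // two obvious groups
        double[] cents = cluster(pts, new double[]{0, 5}, 10);
        System.out.println(Arrays.toString(cents));    // [2.0, 11.0]
    }
}
```

Fuzzy k-means differs only in the assignment step: each point belongs to every cluster with a degree of membership instead of exactly one.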
10. Classification techniques decide how much a
thing is or isn’t part of some type or
category, or how much it does or doesn’t
have some attribute.
Example:
-- Yahoo Mail spam checker
-- Facebook face detection
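A toy sketch of how a spam checker can score "how much" a message belongs to the spam category, using a naive-Bayes-style word model (illustrative only; this is not Yahoo's or Mahout's implementation, and all names are assumptions):

```java
import java.util.*;

public class SpamScore {
    // Returns P(spam | message) under a naive-Bayes independence assumption,
    // with Laplace (add-one) smoothing so unseen words don't zero things out.
    public static double spamProbability(List<String> spam, List<String> ham, String message) {
        Map<String, Integer> spamCounts = counts(spam), hamCounts = counts(ham);
        Set<String> vocab = new HashSet<>(spamCounts.keySet());
        vocab.addAll(hamCounts.keySet());
        int spamTotal = spamCounts.values().stream().mapToInt(Integer::intValue).sum();
        int hamTotal  = hamCounts.values().stream().mapToInt(Integer::intValue).sum();
        // Class priors from the relative number of training messages.
        double logSpam = Math.log(spam.size() / (double) (spam.size() + ham.size()));
        double logHam  = Math.log(ham.size()  / (double) (spam.size() + ham.size()));
        for (String w : message.toLowerCase().split("\\s+")) {
            logSpam += Math.log((spamCounts.getOrDefault(w, 0) + 1.0) / (spamTotal + vocab.size()));
            logHam  += Math.log((hamCounts.getOrDefault(w, 0)  + 1.0) / (hamTotal  + vocab.size()));
        }
        // Convert the two log-scores into a probability for the spam class.
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    private static Map<String, Integer> counts(List<String> docs) {
        Map<String, Integer> m = new HashMap<>();
        for (String d : docs)
            for (String w : d.toLowerCase().split("\\s+")) m.merge(w, 1, Integer::sum);
        return m;
    }

    public static void main(String[] args) {
        List<String> spam = List.of("win money now", "free money offer");
        List<String> ham  = List.of("meeting at noon", "lunch at noon");
        System.out.println(spamProbability(spam, ham, "free money") > 0.5); // true
    }
}
```

The graded probability, rather than a hard yes/no, is exactly the "how much it does or doesn't have some attribute" notion above.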
11. Mahout is a young, open-source, scalable
machine learning library from Apache.
Its techniques are no longer just theory; they are
deployed to solve real-world problems in e-commerce,
video, images, and more.
With scalability being the major issue, Hadoop
comes to the rescue.