2. Machine Learning, a branch of AI, is about the
construction and study of systems that can
learn from existing data.
It is used in fields like:
Information retrieval
Identifying key topics in large collections of text
Biology
Linear algebra, etc.
3. An Apache Software Foundation project to
create scalable machine learning libraries
under the Apache Software License.
WHY MAHOUT?
Many open-source machine learning libraries either:
Lack community
Lack documentation and examples
Lack scalability
Lack the Apache license
Or are research-oriented rather than production-ready
4. Began life in 2008 as a subproject of Apache
Lucene (the search and text-mining API).
Lucene committers felt it deserved to be a
separate project, and Mahout absorbed the Taste
collaborative-filtering project.
In April 2010, Mahout became a top-level
Apache project.
5. Google News sees about 3.5 million new
news articles per day; each must be clustered with
related articles within minutes to stay timely.
Another example: Picasa.
Mahout makes use of Hadoop.
Some algorithm implementations won't scale to massive
machine clusters on their own, but a map-reduce
framework like Apache Hadoop does.
Mahout adapts its algorithms to work at scale on top
of Hadoop.
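To make the map-reduce idea concrete, here is a tiny self-contained sketch of the classic word-count pattern (plain Java, not Hadoop or Mahout code; the class and method names are illustrative). Hadoop distributes exactly these two phases, map and reduce, across a cluster:

```java
import java.util.*;
import java.util.stream.*;

public class MapReduceSketch {
    // "Map" phase: emit a (word, 1) pair for every word in a document.
    static Stream<Map.Entry<String, Integer>> map(String doc) {
        return Arrays.stream(doc.toLowerCase().split("\\s+"))
                     .map(w -> Map.entry(w, 1));
    }

    // "Reduce" phase: sum the emitted counts for each distinct word.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(
            Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static Map<String, Integer> wordCount(List<String> docs) {
        return docs.stream().flatMap(MapReduceSketch::map)
                   .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("mahout uses hadoop", "hadoop scales"));
        System.out.println(counts.get("hadoop")); // 2
    }
}
```

In a real Hadoop job the mapper and reducer run on different machines over partitions of the data; the logic per record stays this simple.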
7. Extensive framework for collaborative
filtering.
Recommenders:
-- User-based
-- Item-based
Online and offline support
-- Offline computation can utilize Hadoop
Similar recommenders are used by Amazon, Facebook, etc.
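A minimal, self-contained sketch of the user-based idea (this is not Mahout's actual Taste API; the cosine similarity choice and all names here are assumptions for illustration): score each item the target user has not rated by the similarity-weighted ratings of other users.

```java
public class UserBasedCF {
    // Cosine similarity between two users' rating vectors (0 = unrated).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Recommend the unrated item with the highest similarity-weighted score.
    public static int recommend(double[][] ratings, int user) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int item = 0; item < ratings[0].length; item++) {
            if (ratings[user][item] != 0) continue;      // already rated
            double score = 0;
            for (int u = 0; u < ratings.length; u++) {
                if (u == user || ratings[u][item] == 0) continue;
                score += cosine(ratings[user], ratings[u]) * ratings[u][item];
            }
            if (score > bestScore) { bestScore = score; best = item; }
        }
        return best;  // index of recommended item, or -1 if nothing is unrated
    }

    public static void main(String[] args) {
        double[][] r = {
            {5, 4, 0, 1},   // user 0: item 2 unrated
            {5, 5, 4, 1},   // user 1: similar tastes, liked item 2
            {1, 1, 1, 5},   // user 2: opposite tastes
        };
        System.out.println(recommend(r, 0)); // 2
    }
}
```

Mahout's offline recommenders do the same kind of similarity-weighted scoring, but distributed over Hadoop for large rating matrices.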
9. Clustering techniques attempt to group a
large number of things together into clusters
that share some similarity.
Examples: k-means, fuzzy k-means.
The Summly app similarly summarizes related stories
from different news sites and presents a brief
digest in the app (the same concept as Google News).
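A minimal sketch of the k-means idea on one-dimensional points (plain Java, not Mahout's distributed implementation; the class and method names are illustrative). Each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of its assigned points:

```java
import java.util.*;

public class KMeans1D {
    // Lloyd's algorithm on 1-D points: assign, then re-center, for `iters` rounds.
    public static double[] cluster(double[] points, double[] centroids, int iters) {
        for (int it = 0; it < iters; it++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (double p : points) {
                // Assignment step: find the nearest centroid for this point.
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[nearest]))
                        nearest = c;
                sum[nearest] += p;
                count[nearest]++;
            }
            // Update step: move each centroid to the mean of its points.
            for (int c = 0; c < centroids.length; c++)
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] pts = {1, 2, 3, 10, 11, 12};          // two obvious groups
        double[] cents = cluster(pts, new double[]{0, 5}, 10);
        System.out.println(Arrays.toString(cents));    // [2.0, 11.0]
    }
}
```

Fuzzy k-means differs only in the assignment step: each point belongs to every cluster with a degree of membership instead of exactly one.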
10. Classification techniques decide how much a
thing is or isn’t part of some type or
category, or how much it does or doesn’t
have some attribute.
Example:
-- Yahoo Mail spam checker
-- Facebook face detection
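A toy sketch of how a spam checker can score "how much" a message belongs to the spam category, using a naive-Bayes-style word model (illustrative only; this is not Yahoo's or Mahout's implementation, and all names are assumptions):

```java
import java.util.*;

public class SpamScore {
    // Returns P(spam | message) under a naive-Bayes independence assumption,
    // with Laplace (add-one) smoothing so unseen words don't zero things out.
    public static double spamProbability(List<String> spam, List<String> ham, String message) {
        Map<String, Integer> spamCounts = counts(spam), hamCounts = counts(ham);
        Set<String> vocab = new HashSet<>(spamCounts.keySet());
        vocab.addAll(hamCounts.keySet());
        int spamTotal = spamCounts.values().stream().mapToInt(Integer::intValue).sum();
        int hamTotal  = hamCounts.values().stream().mapToInt(Integer::intValue).sum();
        // Class priors from the relative number of training messages.
        double logSpam = Math.log(spam.size() / (double) (spam.size() + ham.size()));
        double logHam  = Math.log(ham.size()  / (double) (spam.size() + ham.size()));
        for (String w : message.toLowerCase().split("\\s+")) {
            logSpam += Math.log((spamCounts.getOrDefault(w, 0) + 1.0) / (spamTotal + vocab.size()));
            logHam  += Math.log((hamCounts.getOrDefault(w, 0)  + 1.0) / (hamTotal  + vocab.size()));
        }
        // Convert the two log-scores into a probability for the spam class.
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    private static Map<String, Integer> counts(List<String> docs) {
        Map<String, Integer> m = new HashMap<>();
        for (String d : docs)
            for (String w : d.toLowerCase().split("\\s+")) m.merge(w, 1, Integer::sum);
        return m;
    }

    public static void main(String[] args) {
        List<String> spam = List.of("win money now", "free money offer");
        List<String> ham  = List.of("meeting at noon", "lunch at noon");
        System.out.println(spamProbability(spam, ham, "free money") > 0.5); // true
    }
}
```

The graded probability, rather than a hard yes/no, is exactly the "how much it does or doesn't have some attribute" notion above.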
11. Mahout is a young, open-source, scalable
machine learning library from Apache.
Its techniques are no longer just theory; they are
deployed to solve real-world problems in e-commerce,
video, images, and more.
With scalability being the major issue, Hadoop
comes to the rescue.