Machine Learning Techniques in Java
Ramesh Gundeti & Ferosh Jacob
Search and Personalization, The Home Depot
Agenda
• Motivation
• Introduction to machine learning
• Generating Recommendations
• Weka tutorial
• Conclusion
Motivation: TheHomeDepot.com
• More than 4 Million sessions in a day
• 1 Billion searches last year
• 4K different types of products
• Can you guess the most searched phrase last year?
toilet (1,177,157)
bathroom vanity (1,141,770)
refrigerator (1,128,169)
Introduction to Machine learning
• “Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.” - Wikipedia
• Types of machine learning
  • Supervised machine learning
  • Unsupervised machine learning
Introduction to Machine learning:
Machine learning at The Home Depot
• Smart Sort in product listing page
• Search results
• Recommendations
Generating Recommendations :
HomeDepot.com Recommendations
• There is no store associate on the HD.com site.
• 20% of HD.com revenue is generated through recommendations.
Generating Recommendations :
HomeDepot.com Recommendations
• Frequently bought together
• Item related groups
• Frequently compared
Generating Recommendations :
Mahout Introduction
• Mahout
  • Apache license
  • Java library
  • Also has implementations on Hadoop, Spark, and H2O
• Recommendations using Mahout
  • Data preparation
  • Training models
  • Evaluating/Testing
Generating Recommendations :
Data preparation
• “Garbage in – Garbage out”
• Select data
• Preprocess and format data
• Clean up
Generating Recommendations :
Frequent Pattern Growth
• A pattern mining algorithm.
• Takes in transactions.
  p1,p2,p3
  p1,p2,p4
  p1,p5,p2
• Generates frequent patterns.
  p5 :: ([p1, p2, p5],1)
  p4 :: ([p1, p2, p4],1)
  p3 :: ([p1, p2, p3],1)
  p2 :: ([p1, p2],3), ([p1, p2, p4],1), ([p1, p2, p5],1), ([p1, p2, p3],1)
  p1 :: ([p1, p2],3), ([p1, p2, p4],1), ([p1, p2, p5],1), ([p1, p2, p3],1)
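The support count of 3 for the pair [p1, p2] can be checked by hand. The following is a minimal Java sketch that brute-forces pair support over the three transactions from the slide. It is not FP-Growth and not Mahout's implementation; the class name `PairSupport` and the method `countPairs` are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Brute-force pair support counting (NOT FP-Growth; all names are illustrative).
final class PairSupport {

    /** Returns, for each unordered item pair, the number of transactions containing both items. */
    static Map<String, Integer> countPairs(List<List<String>> transactions) {
        Map<String, Integer> support = new TreeMap<>();
        for (List<String> transaction : transactions) {
            // Sort and de-duplicate so that (p2, p1) and (p1, p2) share one key.
            String[] items = new TreeSet<>(transaction).toArray(new String[0]);
            for (int i = 0; i < items.length; i++) {
                for (int j = i + 1; j < items.length; j++) {
                    support.merge(items[i] + "," + items[j], 1, Integer::sum);
                }
            }
        }
        return support;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
                Arrays.asList("p1", "p2", "p3"),
                Arrays.asList("p1", "p2", "p4"),
                Arrays.asList("p1", "p5", "p2"));
        // Print every pair together with its support count.
        countPairs(transactions).forEach((pair, count) -> System.out.println(pair + " -> " + count));
    }
}
```

The pair p1,p2 appears in all three transactions, which agrees with the ([p1, p2],3) entry above; every other pair has support 1. FP-Growth reaches the same counts without enumerating all pairs, which matters once transactions number in the millions.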
Generating Recommendations :
Frequent Pattern Growth
• Example
Generating Recommendations :
Collaborative filtering
• Item based recommendations
• User based recommendations
• Preferences data
  • Users (long userId)
  • Items (long itemId)
  • Preferences/Ratings (float preference)
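The preference triples can be held in a nested map keyed by user and then by item. The sketch below parses comma-separated `userId,itemId,preference` lines with plain Java; the line layout, the class name `PreferenceData`, and the method `parse` are assumptions made for illustration and are not taken from the talk or from Mahout's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative holder for (userId, itemId, preference) triples; names are hypothetical.
final class PreferenceData {

    /** Parses "userId,itemId,preference" lines into userId -> (itemId -> preference). */
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> byUser = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            float preference = Float.parseFloat(parts[2].trim());
            byUser.computeIfAbsent(userId, id -> new LinkedHashMap<>()).put(itemId, preference);
        }
        return byUser;
    }
}
```

Using `long` identifiers and a `float` preference mirrors the types listed on the slide.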
Generating Recommendations :
User-Item matrix
          Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Item 7  Similarity to User 1
User 1    5.0     3.0     2.5     -       -       -       -
User 2    2.0     2.5     5.0     2.0     -       -       -
User 3    2.5     -       -       4.0     4.5     -       5.0
User 4    5.0     -       3.0     4.5     -       4.0     -
User 5    4.0     3.0     2.0     4.0     3.5     4.0     -
Generating Recommendations :
Similarity metrics
• Pearson correlation-based similarity
  n = number of pairs of scores
  ∑xy = sum of products of paired scores
  ∑x = sum of x scores
  ∑y = sum of y scores
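The formula itself did not survive on this slide, but the listed symbols suggest the computational form of Pearson's r: r = (n∑xy − ∑x∑y) / √((n∑x² − (∑x)²)(n∑y² − (∑y)²)). That reading is an assumption. The sketch below applies it to the ratings from the user-item matrix slide, using only the items both users rated; the class name `PearsonSimilarity` and the use of NaN for missing ratings are illustrative choices, not Mahout's API.

```java
// Pearson correlation over co-rated items (plain Java sketch; names are illustrative).
final class PearsonSimilarity {
    private static final double NA = Double.NaN; // marks a missing rating

    // Ratings from the user-item matrix slide (rows: User 1..5, columns: Item 1..7).
    static final double[][] RATINGS = {
        {5.0, 3.0, 2.5, NA, NA, NA, NA},
        {2.0, 2.5, 5.0, 2.0, NA, NA, NA},
        {2.5, NA, NA, 4.0, 4.5, NA, 5.0},
        {5.0, NA, 3.0, 4.5, NA, 4.0, NA},
        {4.0, 3.0, 2.0, 4.0, 3.5, 4.0, NA},
    };

    /** Returns Pearson's r over items rated by both users, or NaN when it is undefined. */
    static double similarity(double[] x, double[] y) {
        int n = 0;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue; // only co-rated items count as a pair of scores
            }
            n++;
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt((n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY));
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }

    public static void main(String[] args) {
        for (int user = 1; user < RATINGS.length; user++) {
            System.out.println("User 1 vs User " + (user + 1) + ": "
                    + similarity(RATINGS[0], RATINGS[user]));
        }
    }
}
```

User 1 and User 3 share only one rated item, so the correlation is undefined and the sketch returns NaN. User 1 and User 4 share two items, which always yields a perfect correlation of +1 or −1; this is one reason Pearson similarity can be misleading on sparse data.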
Generating Recommendations :
Similarity metrics
• Tanimoto coefficient
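The Tanimoto (Jaccard) coefficient ignores rating values and compares only which items each user has expressed a preference for: the size of the intersection divided by the size of the union. The following is a minimal plain-Java sketch; the class name `TanimotoSimilarity` is an illustrative assumption and this is not Mahout's implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Tanimoto coefficient on sets of item IDs (plain Java sketch; names are illustrative).
final class TanimotoSimilarity {

    /** Returns |A ∩ B| / |A ∪ B|, or NaN when both sets are empty. */
    static double similarity(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN; // undefined: no items on either side
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

In the user-item matrix, User 1 rated items {1, 2, 3} and User 2 rated items {1, 2, 3, 4}, so their Tanimoto similarity is 3/4 = 0.75, even though their actual ratings disagree.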
Generating Recommendations :
Similarity metrics
• Log-likelihood-based similarity
  Measures how unlikely it is that the overlap between two users' preferences is due to chance.
  LLR = 2 sum(k) (H(k) - H(rowSums(k)) - H(colSums(k)))
  H is Shannon's entropy
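For two users, k can be read as a 2x2 contingency table: k11 counts items both users prefer, k12 and k21 count items only one of them prefers, and k22 counts items neither prefers. The sketch below computes the log-likelihood ratio in the equivalent form 2 · (rowEntropy + columnEntropy − matrixEntropy), where each entropy is unnormalized and non-negative. The sign convention therefore differs from the formula as written on the slide, and how H is defined there is an assumption. The class name `LogLikelihoodSketch` is illustrative.

```java
// Log-likelihood ratio for a 2x2 contingency table (plain Java sketch; names are illustrative).
final class LogLikelihoodSketch {

    /** x * ln(x), with the usual convention that 0 * ln(0) = 0. */
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: N*ln(N) - sum(k*ln(k)), i.e. N times Shannon's entropy in nats. */
    static double entropy(long... counts) {
        long total = 0;
        double sum = 0.0;
        for (long count : counts) {
            sum += xLogX(count);
            total += count;
        }
        return xLogX(total) - sum;
    }

    /** Larger values mean the observed overlap is less likely to be due to chance. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }
}
```

A table whose rows and columns are independent, such as (10, 10, 10, 10), scores approximately zero, while a table with all mass on the diagonal scores well above zero.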
Generating Recommendations :
Neighborhoods
• Fixed-size neighborhoods
  • Nearest n users
• Threshold based neighborhood
  • Similarity threshold
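The two neighborhood strategies differ only in how they cut off a list of similarity scores. The sketch below takes an array of similarities to a target user (index = other user, NaN = undefined) and returns the selected user indexes. The class name `Neighborhoods`, its method names, and the sample scores in the usage note are hypothetical; this is not Mahout's neighborhood API.

```java
import java.util.ArrayList;
import java.util.List;

// Fixed-size and threshold-based neighborhoods (plain Java sketch; names are illustrative).
final class Neighborhoods {

    /** Fixed-size neighborhood: indexes of the n most similar users, most similar first. */
    static List<Integer> nearestN(double[] similarities, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < similarities.length; i++) {
            if (!Double.isNaN(similarities[i])) {
                candidates.add(i); // users with undefined similarity are never neighbors
            }
        }
        candidates.sort((a, b) -> Double.compare(similarities[b], similarities[a]));
        int size = Math.max(0, Math.min(n, candidates.size()));
        return new ArrayList<>(candidates.subList(0, size));
    }

    /** Threshold neighborhood: indexes of all users whose similarity is at least minSimilarity. */
    static List<Integer> byThreshold(double[] similarities, double minSimilarity) {
        List<Integer> neighbors = new ArrayList<>();
        for (int i = 0; i < similarities.length; i++) {
            if (similarities[i] >= minSimilarity) { // NaN fails this comparison
                neighbors.add(i);
            }
        }
        return neighbors;
    }
}
```

With hypothetical scores {-0.76, NaN, 1.0, 0.94}, `nearestN(scores, 2)` selects indexes 2 and 3, and `byThreshold(scores, 0.95)` selects only index 2. A fixed size always returns neighbors, even poor ones; a threshold may return none.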
Generating recommendations:
Demo
• Example
Generating Recommendations :
Evaluating recommendations
            Item 1  Item 2  Item 3  Item 4
Actual      4.0     3.5     2.0     5.0
Estimate    3.5     3.0     2.5     4.0
Difference  0.5     0.5     0.5     1.0
• Average Absolute Difference
  (0.5 + 0.5 + 0.5 + 1.0) / 4 = 0.625
• Root Mean Square
  √((0.5² + 0.5² + 0.5² + 1.0²) / 4) = √0.4375 ≈ 0.661
• Precision
  Fraction of retrieved products that are relevant.
• Recall
  Fraction of relevant products that are retrieved.
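The four metrics can be sketched in a few lines of plain Java. For the actual and estimated ratings in the table, the mean of the squared differences is 0.4375 and its square root is about 0.661. The class name `RecommenderMetrics` and its method names are illustrative assumptions, not Mahout's evaluator API.

```java
import java.util.HashSet;
import java.util.Set;

// Rating-error and retrieval metrics (plain Java sketch; names are illustrative).
final class RecommenderMetrics {

    /** Average absolute difference between actual and estimated ratings. */
    static double meanAbsoluteError(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the mean squared difference; penalizes large errors more heavily. */
    static double rootMeanSquareError(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimate[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    /** Fraction of retrieved products that are relevant (0 when nothing was retrieved). */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) overlap(retrieved, relevant) / retrieved.size();
    }

    /** Fraction of relevant products that are retrieved (0 when nothing is relevant). */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) overlap(retrieved, relevant) / relevant.size();
    }

    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        double[] actual = {4.0, 3.5, 2.0, 5.0};
        double[] estimate = {3.5, 3.0, 2.5, 4.0};
        System.out.println("MAE  = " + meanAbsoluteError(actual, estimate));
        System.out.println("RMSE = " + rootMeanSquareError(actual, estimate));
    }
}
```

The root mean square error exceeds the average absolute difference here because the single 1.0 error on Item 4 is weighted more heavily once squared.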
Generating Recommendations :
Evaluating recommendations demo
• Example
WEKA Tutorial
Machine learning overview
“The acquisition of knowledge is always of use to the intellect, because it may thus
drive out useless things and retain the good. For nothing can be loved or hated
unless it is first known.”
Data vs Information
Machine learning overview: Contact lenses
Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
Presbyopia is a condition associated with aging in which the eye exhibits a progressively diminished ability to focus on near objects.
Machine learning overview: Contact lenses
if tearProductionRate == reduced
then recommendation = none
if age == young && astigmatic == no && tearProductionRate == normal
then recommendation = soft
if age == pre-presbyopic && astigmatic == no && tearProductionRate == normal
then recommendation = soft
if age == presbyopic && spectaclePrescription == myope && astigmatic == no
then recommendation = none
if spectaclePrescription == hypermetrope && astigmatic == no && tearProductionRate == normal
then recommendation = soft
if spectaclePrescription == myope && astigmatic == yes && tearProductionRate == normal
then recommendation = hard
if age == young && astigmatic == yes && tearProductionRate == normal
then recommendation = hard
if age == pre-presbyopic && spectaclePrescription == hypermetrope && astigmatic == yes
then recommendation = none
if age == presbyopic && spectaclePrescription == hypermetrope && astigmatic == yes
then recommendation = none
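The rule set can be checked against the 24-row contact lens table. The sketch below encodes the rules as an ordered list of `if` statements, with the first match winning, and counts how many table rows they fail to reproduce. Treating the rules as ordered is an assumption, and the class name `ContactLensRules` and its members are illustrative.

```java
// The contact lens rules as a first-match rule list (plain Java sketch; names are illustrative).
final class ContactLensRules {
    static final String[] AGES = {"young", "pre-presbyopic", "presbyopic"};
    static final String[] PRESCRIPTIONS = {"myope", "hypermetrope"};
    static final String[] ASTIGMATIC = {"no", "yes"};
    static final String[] TEAR_RATES = {"reduced", "normal"};

    // Recommended lenses from the table, in the same row order (age, prescription, astigmatism, tear rate).
    static final String[] EXPECTED = {
        "none", "soft", "none", "hard", "none", "soft", "none", "hard",
        "none", "soft", "none", "hard", "none", "soft", "none", "none",
        "none", "none", "none", "hard", "none", "soft", "none", "none",
    };

    /** Applies the rules top to bottom and returns the first matching recommendation. */
    static String recommend(String age, String prescription, String astigmatic, String tearRate) {
        boolean astig = astigmatic.equals("yes");
        boolean normal = tearRate.equals("normal");
        if (tearRate.equals("reduced")) return "none";
        if (age.equals("young") && !astig && normal) return "soft";
        if (age.equals("pre-presbyopic") && !astig && normal) return "soft";
        if (age.equals("presbyopic") && prescription.equals("myope") && !astig) return "none";
        if (prescription.equals("hypermetrope") && !astig && normal) return "soft";
        if (prescription.equals("myope") && astig && normal) return "hard";
        if (age.equals("young") && astig && normal) return "hard";
        if (age.equals("pre-presbyopic") && prescription.equals("hypermetrope") && astig) return "none";
        if (age.equals("presbyopic") && prescription.equals("hypermetrope") && astig) return "none";
        throw new IllegalArgumentException("no rule matches the given attributes");
    }

    /** Counts table rows whose recommendation the rules do not reproduce. */
    static int countMismatches() {
        int mismatches = 0;
        int row = 0;
        for (String age : AGES) {
            for (String prescription : PRESCRIPTIONS) {
                for (String astigmatic : ASTIGMATIC) {
                    for (String tearRate : TEAR_RATES) {
                        if (!recommend(age, prescription, astigmatic, tearRate).equals(EXPECTED[row++])) {
                            mismatches++;
                        }
                    }
                }
            }
        }
        return mismatches;
    }
}
```

Writing such rules by hand works for 24 rows and four attributes; the point of a learner such as Weka's classifiers is to derive them from the data instead.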
WEKA Introduction
“The weka (also known as Maori hen or woodhen) (Gallirallus australis) is a
flightless bird species of the rail family. It is endemic to New Zealand” -Wikipedia
WEKA Introduction
• Weka is a collection of machine learning algorithms for data mining tasks.
• The algorithms can either be applied directly to a dataset or called from your own Java code.
• Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
• Weka is open source software issued under the GNU General Public License.
Overview:
WORD SENSE DISAMBIGUATION using WEKA
1. Problem specification
2. Data preparation
3. Modeling using the WEKA GUI
4. Using the model from Java/Scala code
1. Problem specification:
Identify product senses of words
• Words have different meanings in different contexts (e.g., "speaker" can be used in the context of an "electrical device" or in the context of a "presiding officer").
• The goal is to identify whether a given word within a given context can be identified as a product sold in a retail/home improvement store (e.g., "speaker" as an "electrical device" can be found in a retail/home improvement store, but "speaker" as a "presiding officer" cannot).
1. Problem specification:
Identify product senses of words
• Example 1. Speaker
  • speaker – “an electrical device”
    • THIS IS A PRODUCT SENSE
  • speaker – “presiding officer”
    • THIS IS NOT A PRODUCT SENSE
• Example 2. Hammer
  • hammer – “act of pounding (delivering repeated heavy blows); the sudden hammer of fists caught him off guard; the pounding of feet on the hallway”
    • THIS IS NOT A PRODUCT SENSE
  • hammer – “hand tool with a heavy rigid head and a handle; used to deliver an impulsive force by striking”
    • THIS IS A PRODUCT SENSE
Problem specification:
Identify product senses of words
4958550   light           the visual effect of illumination on objects or scenes as created in pictures; "he could paint the lightest light and the darkest dark"
8272926   smoker          a party for men only (or one considered suitable for men only)
7023062   book            a written version of a play or other dramatic composition; used in preparing for a performance
3464523   grille          a framework of metal bars used as a partition or a grate; "he cooked hamburgers on the grill"
2937374   cable           a television system that transmits over cables
3860335   pipe            the flues and stops on a pipe organ
9984335   scribe          someone employed to make written copies of documents and manuscripts
4316686   steamer         a cooking utensil that can be used to cook food by steaming it
10090370  shower          someone who organizes an exhibit for others to see
2884787   bowl            a wooden ball (with flattened sides so that it rolls on a curved course) used in the game of lawn bowling
3688932   locker          a fastener that locks or closes
3347207   escutcheon      a flat protective covering (on a door or wall etc) to prevent soiling by dirty fingers
12808124  christmas tree  Australian tree or shrub with red flowers; often used in Christmas decoration
7688535   suet            hard fat around the kidneys and loins in beef and sheep
4504300   tumbler         a movable obstruction in a lock that must be adjusted to a given position (as by a key) before the bolt can be thrown
3084637   compass         drafting instrument used for drawing circles
4453410   toilet          a room or building equipped with one or more toilets
3413354   futon           mattress consisting of a pad of cotton batting that is used for sleeping on the floor or on a raised frame
Problem specification:
Identify product senses of words
“CrowdFlower is a data enrichment, data mining and crowdsourcing company
based in the Mission District of San Francisco, California. The company's
software as a service platform allows users to access an online workforce of
millions of people to clean, label and enrich data.” - Wikipedia
Data preparation:
ARFF file generation
What are ARFF files?
• An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
• ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.
Data preparation:
ARFF file generation
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
%
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
Header section
Data section
Data preparation:
ARFF file generation
@relation ProductSense
@attribute text string
@attribute isValid {yes,no}
@data
'a party for men only (or one considered suitable for men only)',yes
'a written version of a play or other dramatic composition; used in preparing for a performance',no
'a framework of metal bars used as a partition or a grate; "he cooked hamburgers on the grill"',no
'a television system that transmits over cables',no
'the flues and stops on a pipe organ',yes
'someone employed to make written copies of documents and manuscripts',yes
'a cooking utensil that can be used to cook food by steaming it',no
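A file in this shape can be generated with a few lines of plain Java. The sketch below writes the same header and quotes each gloss. Escaping backslashes and single quotes with a backslash is an assumption about what the ARFF reader accepts and should be checked against Weka's ARFF documentation. The class name `ArffSketch` and its methods are hypothetical, and the labels in the usage note are copied from the slide as-is.

```java
// Builds a ProductSense ARFF document as a string (plain Java sketch; names are illustrative).
final class ArffSketch {

    /** Wraps a value in single quotes, backslash-escaping backslashes and single quotes (assumed convention). */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** rows[i][0] is the gloss text, rows[i][1] is the class label ("yes" or "no"). */
    static String toArff(String[][] rows) {
        StringBuilder arff = new StringBuilder();
        arff.append("@relation ProductSense\n");
        arff.append("@attribute text string\n");
        arff.append("@attribute isValid {yes,no}\n");
        arff.append("@data\n");
        for (String[] row : rows) {
            arff.append(quote(row[0])).append(',').append(row[1]).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"a television system that transmits over cables", "no"},
            {"the flues and stops on a pipe organ", "yes"},
        };
        System.out.print(toArff(rows));
    }
}
```

Keeping the gloss as a single `string` attribute means a text-to-vector step (for example, a bag-of-words filter) is still needed before most classifiers can use it.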
Modeling using the WEKA GUI:
WEKA GUI in Action
Modeling using the WEKA GUI:
Algorithm comparison
Algorithm      TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
J48            0.698    0.34     0.695      0.698   0.696      0.721
Naive Bayes    0.721    0.299    0.722      0.721   0.721      0.776
Random Forest  0.724    0.297    0.725      0.724   0.725      0.778
LibSVM         0.601    0.601    0.361      0.601   0.451      0.5
Logistic       0.622    0.398    0.627      0.622   0.624      0.632
Using the model from Java/Scala code:
Source code view
• https://github.com/feroshjacob/AJUGDemos
• http://localhost:8080
Questions?
48

Contenu connexe

En vedette

Montaggio Doccia Chiocciola
Montaggio Doccia ChiocciolaMontaggio Doccia Chiocciola
Montaggio Doccia Chiocciola
Galli Gianni
 
Attachments 2009 04 08
Attachments 2009 04 08Attachments 2009 04 08
Attachments 2009 04 08
guest4e6c81
 
лекц №9
лекц №9лекц №9
лекц №9
azora14
 
Tafsir Ahsan-ul-Bayan┇Para 1┇ آلم
Tafsir Ahsan-ul-Bayan┇Para 1┇ آلمTafsir Ahsan-ul-Bayan┇Para 1┇ آلم
Tafsir Ahsan-ul-Bayan┇Para 1┇ آلم
Quran Juz (Para)
 
хайр
хайрхайр
хайр
azora14
 

En vedette (11)

Montaggio Doccia Chiocciola
Montaggio Doccia ChiocciolaMontaggio Doccia Chiocciola
Montaggio Doccia Chiocciola
 
Attachments 2009 04 08
Attachments 2009 04 08Attachments 2009 04 08
Attachments 2009 04 08
 
лекц №9
лекц №9лекц №9
лекц №9
 
Tafsir Ahsan-ul-Bayan┇Para 1┇ آلم
Tafsir Ahsan-ul-Bayan┇Para 1┇ آلمTafsir Ahsan-ul-Bayan┇Para 1┇ آلم
Tafsir Ahsan-ul-Bayan┇Para 1┇ آلم
 
Smart Emission - Citizens measuring Air Quality - Overview
Smart Emission - Citizens measuring Air Quality - OverviewSmart Emission - Citizens measuring Air Quality - Overview
Smart Emission - Citizens measuring Air Quality - Overview
 
Lightweight developer provisioning with gradle and seu as-code
Lightweight developer provisioning with gradle and seu as-codeLightweight developer provisioning with gradle and seu as-code
Lightweight developer provisioning with gradle and seu as-code
 
Geri Yayılım Algoritması
Geri Yayılım AlgoritmasıGeri Yayılım Algoritması
Geri Yayılım Algoritması
 
Der Cloud Native Stack in a Nutshell
Der Cloud Native Stack in a NutshellDer Cloud Native Stack in a Nutshell
Der Cloud Native Stack in a Nutshell
 
хайр
хайрхайр
хайр
 
(کافر کے کُفر میں شک کا ازالہ (القول المحتد | Al qaul muhtud -Kafir ke kufr m...
(کافر کے کُفر میں شک کا ازالہ (القول المحتد | Al qaul muhtud -Kafir ke kufr m...(کافر کے کُفر میں شک کا ازالہ (القول المحتد | Al qaul muhtud -Kafir ke kufr m...
(کافر کے کُفر میں شک کا ازالہ (القول المحتد | Al qaul muhtud -Kafir ke kufr m...
 
Machine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics OverviewMachine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics Overview
 

Similaire à Java andml may17-v1

Part XIV
Part XIVPart XIV
Part XIV
butest
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
Maria Eskevich
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Marcel Kurovski
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
inovex GmbH
 

Similaire à Java andml may17-v1 (20)

Hijacking Treejack UXPA 2023 Talk.pptx.pdf
Hijacking Treejack UXPA 2023 Talk.pptx.pdfHijacking Treejack UXPA 2023 Talk.pptx.pdf
Hijacking Treejack UXPA 2023 Talk.pptx.pdf
 
Mendeley’s Research Catalogue: building it, opening it up and making it even ...
Mendeley’s Research Catalogue: building it, opening it up and making it even ...Mendeley’s Research Catalogue: building it, opening it up and making it even ...
Mendeley’s Research Catalogue: building it, opening it up and making it even ...
 
Understanding online audiences ara conf 28 aug 15 martin bazley upload version
Understanding online audiences ara conf 28 aug 15 martin bazley upload versionUnderstanding online audiences ara conf 28 aug 15 martin bazley upload version
Understanding online audiences ara conf 28 aug 15 martin bazley upload version
 
Business idea jenga hackathon
Business idea jenga hackathonBusiness idea jenga hackathon
Business idea jenga hackathon
 
World Future Society 2015 Professional Members Forum
World Future Society 2015 Professional Members ForumWorld Future Society 2015 Professional Members Forum
World Future Society 2015 Professional Members Forum
 
Mobile UX London - Mobile Usability Hands-on by SABRINA DUDA
Mobile UX London - Mobile Usability Hands-on by SABRINA DUDAMobile UX London - Mobile Usability Hands-on by SABRINA DUDA
Mobile UX London - Mobile Usability Hands-on by SABRINA DUDA
 
Field Research at the Speed of Business
Field Research at the Speed of BusinessField Research at the Speed of Business
Field Research at the Speed of Business
 
AI Orange Belt - Session 1
AI Orange Belt - Session 1AI Orange Belt - Session 1
AI Orange Belt - Session 1
 
Why am I doing this???
Why am I doing this???Why am I doing this???
Why am I doing this???
 
Part XIV
Part XIVPart XIV
Part XIV
 
creativity
creativitycreativity
creativity
 
Coaching teams in creative problem solving
Coaching teams in creative problem solvingCoaching teams in creative problem solving
Coaching teams in creative problem solving
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
 
An Introduction to Usability
An Introduction to UsabilityAn Introduction to Usability
An Introduction to Usability
 
The Essentials of Great Product Design
The Essentials of Great Product DesignThe Essentials of Great Product Design
The Essentials of Great Product Design
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation System
 
New perspectives in innovation methods
New perspectives in innovation methodsNew perspectives in innovation methods
New perspectives in innovation methods
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data Extraction
 

Dernier

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 

Dernier (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 

Java andml may17-v1

  • 1. Machine Learning Techniques in Java Ramesh Gundeti & Ferosh Jacob Search and Personalization, The Home Depot
  • 2. Agenda • Motivation • Introduction to machine learning • Generating Recommendations • Weka tutorial • Conclusion 2
  • 3. Agenda • Motivation • Introduction to machine learning • Generating Recommendations • Weka tutorial • Conclusion 3
  • 5. Motivation: TheHomeDepot.com • More than 4 Million sessions in a day • 1 Billion searches last year • 4K different types of products • Can you guess the most searched phrase last year? toilet (1,177,157) bathroom vanity (1,141,770) refrigerator (1,128,169) 5
  • 6. Agenda • Motivation • Introduction to machine learning • Generating Recommendations • Weka tutorial • Conclusion 6
  • 7. Introduction to Machine learning  “Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.” - Wikipedia  Types of machine learning  Supervised machine learning  Unsupervised machine learning 7
  • 8. Introduction to Machine learning: Machine learning at home depot  Smart Sort in product listing page  Search results  Recommendations 8
  • 9. Agenda • Motivation • Introduction to machine learning • Generating Recommendations • Weka tutorial • Conclusion 9
  • 10. Generating Recommendations : HomeDepot.com Recommendations 10 • There is no store associate on HD.com site • 20% of HD.com revenue is generated through recommendations.
  • 11. Generating Recommendations : HomeDepot.com Recommendations  Frequently bought together  Item related groups  Frequently compared 11
  • 12. Generating Recommendations : Mahout Introduction  Mahout  Apache license  Java library  Also has implementation in Hadoop, Spark, H2O  Recommendations using Mahout  Data preparation  Training models  Evaluating/Testing 12
  • 13. Generating Recommendations : Data preparation  “Garbage in – Garbage out”  Select data  Preprocess and format data  Clean up 13
  • 14. Generating Recommendations : Frequent Pattern Growth  A pattern mining algorithm.  Takes in transactions. p1,p2,p3 p1,p2,p4 p1,p5,p2  Generates frequent patterns. p5 :: ([p1, p2, p5],1) p4 :: ([p1, p2, p4],1) p3 :: ([p1, p2, p3],1) p2 :: ([p1, p2],3), ([p1, p2, p4],1), ([p1, p2, p5],1), ([p1, p2, p3],1) p1 :: ([p1, p2],3), ([p1, p2, p4],1), ([p1, p2, p5],1), ([p1, p2, p3],1) 14
  • 15. Generating Recommendations : Frequent Pattern Growth  Example 15
  • 16. Generating Recommendations : Collaborative filtering  Item based recommendations  User based recommendations  Preferences data  Users (long userId)  Items (long itemId)  Preferences/Ratings (float preference) 16
  • 17. Generating Recommendations : User-Item matrix Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Similar ity to User 1 User 1 5.0 3.0 2.5 - - - - User 2 2.0 2.5 5.0 2.0 - - - User 3 2.5 - - 4.0 4.5 - 5.0 User 4 5.0 - 3.0 4.5 - 4.0 - User 5 4.0 3.0 2.0 4.0 3.5 4.0 - 17
  • 18. Generating Recommendations : Similarity metrics  Pearson correlation-based similarity n = number of pairs of scores ∑xy = sum of products of paired scores ∑x = sum of x scores ∑y = sum of y scores 18
  • 19. Generating Recommendations : Similarity metrics  Tanimoto coefficient  19
  • 20. Generating Recommendations : Similarity metrics  Log-likelihood-based Similarity How strongly unlikely it is that two users have no resemblance in their preferences. LLR = 2 sum(k) (H(k) - H(rowSums(k)) - H(colSums(k))) H is Shannon's entropy 20
  • 21. Generating Recommendations : Neighborhoods  Fixed-size neighborhoods  Nearest n users  Threshold based neighborhood  Similarity threshold 21
  • 23. Generating Recommendations : Evaluating recommendations  Average Absolute Difference (0.5 + 0.5 + 0.5 + 1.0) / 4 = 0.625  Root Mean Square ⎷((0.52 + 0.52 + 0.52 + 1.02)/4) = 0.4375  Precision Fraction of retrieved products that are relevant.  Recall Fraction of relevant products that are retrieved. 23 Item 1 Item 2 Item 3 Item 4 Actual 4.0 3.5 2.0 5.0 Estimate 3.5 3.0 2.5 4.0 Difference 0.5 0.5 0.5 1.0
  • 24. Generating Recommendations : Evaluating recommendations demo  Example 24
  • 26. 26
  • 27. Machine learning overview 27 “The acquisition of knowledge is always of use to the intellect, because it may thus drive out useless things and retain the good. For nothing can be loved or hated unless it is first known.” Data vs Information
  • 28. Machine learning overview: Contact lenses 28 Age Spectacle prescription Astigmatism Tear production rate Recommended lenses Young Myope No Reduced None Young Myope No Normal Soft Young Myope Yes Reduced None Young Myope Yes Normal Hard Young Hypermetrope No Reduced None Young Hypermetrope No Normal Soft Young Hypermetrope Yes Reduced None Young Hypermetrope Yes Normal hard Pre-presbyopic Myope No Reduced None Pre-presbyopic Myope No Normal Soft Pre-presbyopic Myope Yes Reduced None Pre-presbyopic Myope Yes Normal Hard Pre-presbyopic Hypermetrope No Reduced None Pre-presbyopic Hypermetrope No Normal Soft Pre-presbyopic Hypermetrope Yes Reduced None Pre-presbyopic Hypermetrope Yes Normal None Presbyopic Myope No Reduced None Presbyopic Myope No Normal None Presbyopic Myope Yes Reduced None Presbyopic Myope Yes Normal Hard Presbyopic Hypermetrope No Reduced None Presbyopic Hypermetrope No Normal Soft Presbyopic Hypermetrope Yes Reduced None Presbyopic Hypermetrope Yes Normal None Presbyopia is a condition associated with aging in which the eye exhibits a progressively diminished ability to focus on near objects
  • 29. Machine learning overview: Contact lenses 29
  • 30. Machine learning overview: Contact lenses
      if tearProductionRate == reduced then recommendation == none
      if age == young && astigmatic == no && tearProductionRate == normal then recommendation == soft
      if age == pre-presbyopic && astigmatic == no && tearProductionRate == normal then recommendation == soft
      if age == presbyopic && spectaclePrescription == myope && astigmatic == no then recommendation == none
      if spectaclePrescription == hypermetrope && astigmatic == no && tearProductionRate == normal then recommendation == soft
      if spectaclePrescription == myope && astigmatic == yes && tearProductionRate == normal then recommendation == hard
      if age == young && astigmatic == yes && tearProductionRate == normal then recommendation == hard
      if age == pre-presbyopic && spectaclePrescription == hypermetrope && astigmatic == yes then recommendation == none
      if age == presbyopic && spectaclePrescription == hypermetrope && astigmatic == yes then recommendation == none
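The rule set above translates fairly directly into Java. The following is a minimal sketch, assuming the rules are tried in the order listed and the first match wins; the enum and method names are illustrative and do not come from the talk or from Weka.

```java
final class ContactLensRules {

    enum Age { YOUNG, PRE_PRESBYOPIC, PRESBYOPIC }
    enum Prescription { MYOPE, HYPERMETROPE }
    enum Lens { NONE, SOFT, HARD }

    // Applies the slide's rules in order; the first matching rule decides.
    static Lens recommend(Age age, Prescription rx, boolean astigmatic, boolean normalTears) {
        // Rule 1: reduced tear production always means no lenses.
        if (!normalTears) return Lens.NONE;
        // From here on, tear production is normal.
        // Rules 2 and 3: young or pre-presbyopic, not astigmatic.
        if (age == Age.YOUNG && !astigmatic) return Lens.SOFT;
        if (age == Age.PRE_PRESBYOPIC && !astigmatic) return Lens.SOFT;
        // Rule 4: presbyopic myope, not astigmatic.
        if (age == Age.PRESBYOPIC && rx == Prescription.MYOPE && !astigmatic) return Lens.NONE;
        // Rule 5: hypermetrope, not astigmatic.
        if (rx == Prescription.HYPERMETROPE && !astigmatic) return Lens.SOFT;
        // Rule 6: astigmatic myope.
        if (rx == Prescription.MYOPE && astigmatic) return Lens.HARD;
        // Rule 7: young and astigmatic.
        if (age == Age.YOUNG && astigmatic) return Lens.HARD;
        // Rules 8 and 9: older astigmatic hypermetropes.
        if (age == Age.PRE_PRESBYOPIC && rx == Prescription.HYPERMETROPE && astigmatic) return Lens.NONE;
        if (age == Age.PRESBYOPIC && rx == Prescription.HYPERMETROPE && astigmatic) return Lens.NONE;
        // The nine rules cover every combination, so this should be unreachable.
        throw new IllegalStateException("no rule matched");
    }

    public static void main(String[] args) {
        // Print the recommendation for every combination of attribute values.
        for (Age age : Age.values()) {
            for (Prescription rx : Prescription.values()) {
                for (boolean astigmatic : new boolean[] {false, true}) {
                    for (boolean normalTears : new boolean[] {false, true}) {
                        System.out.println(age + " " + rx
                                + " astigmatic=" + astigmatic
                                + " normalTears=" + normalTears
                                + " -> " + recommend(age, rx, astigmatic, normalTears));
                    }
                }
            }
        }
    }
}
```

Running `main` prints one line per combination, which makes it easy to compare the rules against the 24 rows of the contact lenses table.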
  • 31. WEKA Introduction 31 “The weka (also known as Maori hen or woodhen) (Gallirallus australis) is a flightless bird species of the rail family. It is endemic to New Zealand” -Wikipedia
  • 32. WEKA Introduction
      • A collection of machine learning algorithms for data mining tasks.
      • The algorithms can either be applied directly to a dataset or called from your own Java code.
      • Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
      • Weka is open source software issued under the GNU General Public License.
  • 33. Overview: WORD SENSE DISAMBIGUATION using WEKA 1. Problem specification 2. Data preparation 3. Modeling using the WEKA GUI 4. Using the model from Java/SCALA code 33
  • 34. 1. Problem specification: Identify product senses of words
      Words have different meanings in different contexts (e.g., "speaker" can mean an "electrical device" or a "presiding officer").
      The goal is to identify whether a given word in a given context denotes a product sold in a retail/home improvement store (i.e., "speaker" as an "electrical device" can be found in a retail/home improvement store, but "speaker" as a "presiding officer" cannot).
  • 35. 1. Problem specification: Identify product senses of words
      Example 1. Speaker
        speaker: "an electrical device". THIS IS A PRODUCT SENSE
        speaker: "presiding officer". THIS IS NOT A PRODUCT SENSE
      Example 2. Hammer
        hammer: "act of pounding (delivering repeated heavy blows); the sudden hammer of fists caught him off guard; the pounding of feet on the hallway". THIS IS NOT A PRODUCT SENSE
        hammer: "hand tool with a heavy rigid head and a handle; used to deliver an impulsive force by striking". THIS IS A PRODUCT SENSE
  • 36. Problem specification: Identify product senses of words
      4958550   light           the visual effect of illumination on objects or scenes as created in pictures; "he could paint the lightest light and the darkest dark"
      8272926   smoker          a party for men only (or one considered suitable for men only)
      7023062   book            a written version of a play or other dramatic composition; used in preparing for a performance
      3464523   grille          a framework of metal bars used as a partition or a grate; "he cooked hamburgers on the grill"
      2937374   cable           a television system that transmits over cables
      3860335   pipe            the flues and stops on a pipe organ
      9984335   scribe          someone employed to make written copies of documents and manuscripts
      4316686   steamer         a cooking utensil that can be used to cook food by steaming it
      10090370  shower          someone who organizes an exhibit for others to see
      2884787   bowl            a wooden ball (with flattened sides so that it rolls on a curved course) used in the game of lawn bowling
      3688932   locker          a fastener that locks or closes
      3347207   escutcheon      a flat protective covering (on a door or wall etc) to prevent soiling by dirty fingers
      12808124  christmas tree  Australian tree or shrub with red flowers; often used in Christmas decoration
      7688535   suet            hard fat around the kidneys and loins in beef and sheep
      4504300   tumbler         a movable obstruction in a lock that must be adjusted to a given position (as by a key) before the bolt can be thrown
      3084637   compass         drafting instrument used for drawing circles
      4453410   toilet          a room or building equipped with one or more toilets
      3413354   futon           mattress consisting of a pad of cotton batting that is used for sleeping on the floor or on a raised frame
  • 37. Problem specification: Identify product senses of words 37 “CrowdFlower is a data enrichment, data mining and crowdsourcing company based in the Mission District of San Francisco, California. The company's software as a service platform allows users to access an online workforce of millions of people to clean, label and enrich data.” - Wikipedia
  • 38. Overview: WORD SENSE DISAMBIGUATION using WEKA 1. Problem specification 2. Data preparation 3. Modeling using the WEKA GUI 4. Using the model from Java/SCALA code 38
  • 39. Data preparation: ARFF file generation What are ARFF files  An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.  ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software 39
  • 40. Data preparation: ARFF file generation
      Header section:
        % 1. Title: Iris Plants Database
        %
        % 2. Sources:
        %      (a) Creator: R.A. Fisher
        %      (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
        %      (c) Date: July, 1988
        %
        @RELATION iris
        @ATTRIBUTE sepallength NUMERIC
        @ATTRIBUTE sepalwidth NUMERIC
        @ATTRIBUTE petallength NUMERIC
        @ATTRIBUTE petalwidth NUMERIC
        @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
      Data section:
        @DATA
        5.1,3.5,1.4,0.2,Iris-setosa
        4.9,3.0,1.4,0.2,Iris-setosa
        4.7,3.2,1.3,0.2,Iris-setosa
        4.6,3.1,1.5,0.2,Iris-setosa
  • 41. Data preparation: ARFF file generation
        @relation ProductSense
        @attribute text string
        @attribute isValid {yes,no}
        @data
        'a party for men only (or one considered suitable for men only)',yes
        'a written version of a play or other dramatic composition; used in preparing for a performance',no
        'a framework of metal bars used as a partition or a grate; "he cooked hamburgers on the grill"',no
        'a television system that transmits over cables',no
        'the flues and stops on a pipe organ',yes
        'someone employed to make written copies of documents and manuscripts',yes
        'a cooking utensil that can be used to cook food by steaming it',no
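A file in this shape can be produced with a few lines of plain Java. The sketch below is an assumption about how one might generate it, not the authors' actual tooling: the class name, the `quote` helper, and its minimal escaping (backslashes and single quotes only) are illustrative, and the two sample rows are copied from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ArffSketch {

    // Wraps a string value in single quotes, escaping backslashes and
    // single quotes. This is a minimal scheme chosen for illustration.
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds an ARFF document with one string attribute and one yes/no class.
    static String toArff(String relation, Map<String, Boolean> labeledTexts) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute isValid {yes,no}\n\n");
        sb.append("@data\n");
        for (Map.Entry<String, Boolean> row : labeledTexts.entrySet()) {
            sb.append(quote(row.getKey()))
              .append(',')
              .append(row.getValue() ? "yes" : "no")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two rows taken from the slide; insertion order is preserved.
        Map<String, Boolean> rows = new LinkedHashMap<>();
        rows.put("a television system that transmits over cables", false);
        rows.put("the flues and stops on a pipe organ", true);
        System.out.print(toArff("ProductSense", rows));
    }
}
```

Using a map keeps the sketch short but means duplicate texts collapse into one row; a list of (text, label) pairs would be the safer choice for real data.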
  • 42. Overview: WORD SENSE DISAMBIGUATION using WEKA 1. Problem specification 2. Data preparation 3. Modeling using the WEKA GUI 4. Using the model from Java/SCALA code 42
  • 43. Modeling using the WEKA GUI: WEKA GUI in Action 43
  • 44. Modeling using the WEKA GUI: Algorithm comparison
      Algorithm      TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
      J48            0.698    0.34     0.695      0.698   0.696      0.721
      Naive Bayes    0.721    0.299    0.722      0.721   0.721      0.776
      Random Forest  0.724    0.297    0.725      0.724   0.725      0.778
      LibSVM         0.601    0.601    0.361      0.601   0.451      0.5
      Logistic       0.622    0.398    0.627      0.622   0.624      0.632
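For readers unfamiliar with the column headings, the sketch below shows how TP rate, FP rate, precision, recall, and F-measure are derived for a single class from confusion-matrix counts. The counts in `main` are hypothetical and are not taken from the talk's experiments; Weka reports these figures per class and as a weighted average, which this sketch does not reproduce.

```java
import java.util.Locale;

final class ConfusionMatrixMetrics {

    // Precision: of everything predicted positive, how much was truly positive.
    static double precision(int tp, int fp) {
        return tp / (double) (tp + fp);
    }

    // Recall (also the TP rate): of all true positives, how many were found.
    static double recall(int tp, int fn) {
        return tp / (double) (tp + fn);
    }

    // FP rate: of all true negatives, how many were wrongly flagged positive.
    static double falsePositiveRate(int fp, int tn) {
        return fp / (double) (fp + tn);
    }

    // F-measure: harmonic mean of precision and recall.
    static double fMeasure(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical counts for one class (not from the slides).
        int tp = 60, fp = 20, fn = 40, tn = 80;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println(String.format(Locale.ROOT,
                "TP rate/recall=%.3f FP rate=%.3f precision=%.3f F-measure=%.3f",
                r, falsePositiveRate(fp, tn), p, fMeasure(p, r)));
    }
}
```

The helpers do not guard against empty denominators (for example, a classifier that never predicts the positive class), so real code would need to handle that case.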
  • 45. Overview: WORD SENSE DISAMBIGUATION using WEKA 1. Problem specification 2. Data preparation 3. Modeling using the WEKA GUI 4. Using the model from Java/SCALA code 45
  • 46. Using the model from Java/SCALA code: Source code view  https://github.com/feroshjacob/AJUGDemos  http://localhost:8080 46
  • 47. Agenda • Motivation • Introduction to machine learning • Generating Recommendations • Weka tutorial • Conclusion 47