An introduction to Machine Learning and how we used it at FreshBooks to automatically categorize our customers' expenses. Presented at the November 2015 ExploreTech Toronto meetup by Alex Vermeulen & Tobi Ogunbiyi.
2. What is Machine Learning?
A computer program is said to learn if its
measured performance on a task improves with
experience.
3. Machine Learning Applications
• Google’s self-driving car
• Optical Character Recognition (OCR)
• Google Street View
• Facebook
4. Supervised Learning
The machine is “trained” using examples for which we know the correct answer.
• Labeled data
• Used for classification or prediction
5. Supervised Learning Example
• Features: shape, size, colour, and sound
• Labels: “cow”, “pig”, “chicken”, “llama”
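To make the supervised idea concrete, here is a minimal Python sketch using a 1-nearest-neighbour rule: a new animal receives the label of the most similar labeled example. The feature values are hypothetical, and nearest neighbour is only an easy-to-read stand-in; it is not the classifier described later in this talk (multinomial logistic regression).

```python
# Hypothetical, hand-labeled training examples: (features, label).
TRAINING_DATA = [
    ({"shape": "quadruped", "size": "large", "colour": "black-white", "sound": "moo"}, "cow"),
    ({"shape": "quadruped", "size": "medium", "colour": "pink", "sound": "oink"}, "pig"),
    ({"shape": "biped", "size": "small", "colour": "white", "sound": "cluck"}, "chicken"),
    ({"shape": "quadruped", "size": "large", "colour": "brown", "sound": "hum"}, "llama"),
]


def similarity(a, b):
    """Count how many feature values two animals share."""
    return sum(1 for key in a if a[key] == b.get(key))


def predict(features, training_data=TRAINING_DATA):
    """Return the label of the most similar training example (1-nearest neighbour)."""
    _, best_label = max(training_data, key=lambda example: similarity(features, example[0]))
    return best_label


new_animal = {"shape": "biped", "size": "small", "colour": "brown", "sound": "cluck"}
print(predict(new_animal))  # chicken
```

The labels are what make this supervised: the prediction can only ever be one of the answers we supplied during training.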
6. Unsupervised Learning
The machine tries to find patterns and groupings by analyzing the characteristics of the data.
• Unlabeled data
• Identifies patterns and groupings in the data
7. Unsupervised Learning Example
• No labels: the goal is just to group similar animals
• Features: shape, size, colour, and sound
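For the unsupervised case, a minimal sketch of k-means clustering, which is one common grouping algorithm (the slides do not name a specific one). k-means needs numeric features, so the two features here (weight in kg, height in cm) and their values are hypothetical. No labels are given; the algorithm only groups points that sit close together.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical (weight kg, height cm): three chicken-like, three cow-like animals.
animals = [(2.0, 40.0), (2.5, 45.0), (1.8, 38.0), (600.0, 150.0), (650.0, 145.0), (620.0, 140.0)]
labels, centroids = kmeans(animals, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers themselves carry no meaning; a person still has to look at each group and decide what it represents.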
16. Pre-processing
Raw expense descriptions:
#5795# QTH Toronto ON
#991# Toronto ON
Roots #130 Etobicoke ON
True North Climbing Toronto ON
Tim Hortons
Eddie Bauer Canada
M9C
18. Pre-processing
After removing the “#” codes and the stray postal-code fragment “M9C”:
QTH Toronto ON
Toronto ON
Roots Etobicoke ON
True North Climbing Toronto ON
Tim Hortons
Eddie Bauer Canada
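The cleaning shown above could be approximated with two regular expressions. This is an assumption: the slides show only the before and after, not the actual rules. The sketch strips “#”-prefixed numeric codes and Canadian postal-code prefixes (letter-digit-letter, such as “M9C”), collapses leftover whitespace, and drops descriptions that end up empty.

```python
import re

# Hypothetical cleaning rules inferred from the before/after slides.
HASH_CODE = re.compile(r"#\d+#?")                 # matches "#5795#", "#991#", "#130"
POSTAL_PREFIX = re.compile(r"\b[A-Z]\d[A-Z]\b")   # matches "M9C"


def preprocess(descriptions):
    """Return cleaned descriptions, skipping any that become empty."""
    cleaned = []
    for text in descriptions:
        text = HASH_CODE.sub(" ", text)
        text = POSTAL_PREFIX.sub(" ", text)
        text = " ".join(text.split())  # collapse runs of whitespace
        if text:
            cleaned.append(text)
    return cleaned


raw = [
    "#5795# QTH Toronto ON",
    "#991# Toronto ON",
    "Roots #130 Etobicoke ON",
    "True North Climbing Toronto ON",
    "Tim Hortons",
    "Eddie Bauer Canada",
    "M9C",
]
for line in preprocess(raw):
    print(line)
```

Rules like these are brittle; real bank descriptions vary a lot, so each pattern is worth checking against a sample of actual data before relying on it.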
19. Vectorize
(i) Tim Hortons QTH Toronto ON
(ii) Eddie Bauer Canada Toronto ON

Term counts per document:
Term        (i)  (ii)
“tim”        1    0
“hortons”    1    0
“qth”        1    0
“toronto”    1    1
“on”         1    1
“eddie”      0    1
“bauer”      0    1
“canada”     0    1
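A minimal sketch of the vectorize step in plain Python that reproduces the table above. Tokenizing by lowercasing and splitting on whitespace is an assumption. In practice, scikit-learn's CountVectorizer does the same job, with its own tokenization rules and an alphabetically ordered vocabulary.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Collect unique terms in order of first appearance."""
    vocabulary = []
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def vectorize(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    tokens = tokenize(document)
    return [tokens.count(term) for term in vocabulary]


docs = ["Tim Hortons QTH Toronto ON", "Eddie Bauer Canada Toronto ON"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(vocab)    # ['tim', 'hortons', 'qth', 'toronto', 'on', 'eddie', 'bauer', 'canada']
print(vectors)  # [[1, 1, 1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1]]
```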
33. Transform
Term Frequency - Inverse Document Frequency (tf-idf)
A numerical statistic that reflects how important or descriptive a term is to a single document in a collection.
$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$$
where tf(t, d) is the number of occurrences of term t in document d, and
$$\mathrm{idf}(t, D) = \log_{10} \frac{\text{total documents in } D}{\text{documents in } D \text{ containing } t}$$
36. Tf-Idf Example
The collection D contains two documents: d1 = (i) “Tim Hortons QTH Toronto ON” and d2 = (ii) “Eddie Bauer Canada Toronto ON”.

Worked example for “tim” in (i):
$$\mathrm{tf}(\text{tim}, d_1) = 1$$
$$\mathrm{idf}(\text{tim}, D) = \log_{10}\frac{\text{total docs}}{\text{docs containing the term}} = \log_{10}\frac{2}{1} = 0.301$$
$$\mathrm{tfidf}(\text{tim}, d_1) = 1 \times 0.301 = 0.301$$

“toronto” and “on” appear in both documents, so their idf is $\log_{10}(2/2) = 0$; terms missing from (i) have tf = 0. Either way, their tf-idf weight in (i) is 0:

Term        tf (i)  tf-idf (i)
“tim”         1       .301
“hortons”     1       .301
“qth”         1       .301
“toronto”     1       0
“on”          1       0
“eddie”       0       0
“bauer”       0       0
“canada”      0       0
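The same calculation in code, applying the formulas above with a base-10 log (which is what produces 0.301). Whitespace tokenization is an assumption. Note that scikit-learn's TfidfVectorizer uses a smoothed, natural-log idf and normalizes each vector by default, so it will not reproduce these exact numbers.

```python
import math


def tfidf_vectors(documents):
    """Return (vocabulary, tf-idf vectors) using tf = raw count, idf = log10(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = []
    for tokens in tokenized:
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)

    n_docs = len(documents)
    # idf(t, D) = log10(total docs / docs containing t); every term appears somewhere,
    # so the denominator is never zero.
    idf = {
        term: math.log10(n_docs / sum(1 for tokens in tokenized if term in tokens))
        for term in vocabulary
    }
    # tfidf(t, d) = tf(t, d) * idf(t, D)
    vectors = [[tokens.count(term) * idf[term] for term in vocabulary] for tokens in tokenized]
    return vocabulary, vectors


docs = ["Tim Hortons QTH Toronto ON", "Eddie Bauer Canada Toronto ON"]
vocab, weights = tfidf_vectors(docs)
print([round(w, 3) for w in weights[0]])  # [0.301, 0.301, 0.301, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The words shared by every expense (“toronto”, “on”) drop to zero, which is the point of the transform: they say nothing about which category an expense belongs to.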
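53. Classify
• Multinomial Logistic Regression (supervised)
• Used for unordered, categorical outputs with more than 2 possible categories
• A series of linear sub-models, one per category, each outputting a real-valued score; the softmax function converts these scores into probabilities that sum to 1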
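A minimal sketch of how multinomial logistic regression turns a feature vector into category probabilities: each category's linear sub-model computes a score, and softmax converts the scores into probabilities. The three categories, the weights, and the biases below are hypothetical placeholders; in practice they are learned from labeled expenses (for example with scikit-learn's LogisticRegression).

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights, biases):
    """One linear sub-model per category: score_k = w_k . x + b_k, then softmax."""
    scores = [
        sum(w * x for w, x in zip(w_k, features)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical model over the 8-term vocabulary from the tf-idf example.
categories = ["Meals & Entertainment", "Personal", "Travel"]
weights = [
    [2.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0],  # favours "tim", "hortons", "qth"
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.5, 1.5, 1.0],  # favours "eddie", "bauer", "canada"
    [0.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0],  # favours "toronto", "on"
]
biases = [0.0, 0.0, 0.0]

doc_i = [0.301, 0.301, 0.301, 0.0, 0.0, 0.0, 0.0, 0.0]  # tf-idf vector of (i)
probabilities = predict_proba(doc_i, weights, biases)
for category, p in zip(categories, probabilities):
    print(f"{category}: {p:.3f}")
```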
65. Refinement
• How do you know if your model is good enough?
• What can you do to improve your model?
  • Adjust the amount of data
  • Remove irrelevant parts of the data
  • Tweak the algorithm's parameters
72. Prediction
Input: “Tim Hortons 3335 QPS Toronto ON”

Category                  Probability
“Advertising”               0.020
“Car & Truck Expenses”      0.018
“Meals & Entertainment”     0.710
“Personal”                  0.230
“Rent or Lease”             0.003
“Travel”                    0.011
“Utilities”                 0.008

Threshold: 0.6 (only “Meals & Entertainment”, at 0.710, clears it)
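Applying the threshold can be sketched as: accept the most likely category only if its probability is at least 0.6. The function name `categorize` is hypothetical, and the slide does not say what happens below the threshold; returning None (for example, to leave the expense for the customer to categorize) is an assumption.

```python
def categorize(probabilities, threshold=0.6):
    """Return the top category if it meets the threshold, otherwise None (assumed fallback)."""
    category, probability = max(probabilities.items(), key=lambda item: item[1])
    return category if probability >= threshold else None


# Probabilities from the prediction slide.
prediction = {
    "Advertising": 0.020,
    "Car & Truck Expenses": 0.018,
    "Meals & Entertainment": 0.710,
    "Personal": 0.230,
    "Rent or Lease": 0.003,
    "Travel": 0.011,
    "Utilities": 0.008,
}
print(categorize(prediction))                 # Meals & Entertainment
print(categorize(prediction, threshold=0.8))  # None
```

Raising the threshold trades coverage for confidence: fewer expenses get categorized automatically, but the ones that do are more likely to be right.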
75. Getting Started
• What kind of data do you have?
  • Labeled or unlabeled?
• What questions are you trying to answer?
  • Make predictions
  • Label or classify
  • Identify patterns or groupings
76. Lessons Learned
• Machines are intelligent, but not magicians
• It’s easy to know you’re wrong, but harder to
know when you’re right
• Some people prefer to have control
77. Takeaways
• Opens up new opportunities
• Potential to deliver amazing user experiences
• Machine Learning is fun!
79. Resources
• Interested in following along with what FreshBooks is learning?
  • medium.com/@freshbookspd
• Want to learn more about Machine Learning?
  • udacity.com
  • coursera.org
  • Python’s scikit-learn