This talk presents three real-world use cases in which major web platforms (Google, Facebook, Amazon, Twitter, PayPal) use machine learning to implement specific features. For each example, the underlying algorithm is explained, along with how to build the same functionality using Apache Spark MLlib and the Scala language.
Machine Learning Real Life Applications By Examples
1. DATA
DRIVEN
INNOVATION
Rome 2017 | Open Summit
MACHINE LEARNING
REAL LIFE APPLICATIONS
BY EXAMPLES
SPEAKER
MARIO CARTIA
MARIO@BIG-DATA.NINJA
2. DDI
ROME | 2017
MARIO CARTIA
Can machines
think?
Computing machinery and intelligence.
Mind, 59, 433-460 (1950)
Turing A.M.
3. DDI
1968
2001: A Space Odyssey
“I'm sorry Dave, I'm afraid I can't do that”
4. DDI
1982
Knight Rider (Italian title: Supercar)
5. DDI
1983
WarGames
6. DDI
1996
Kasparov vs.
Deep Blue
7. DDI
Does Deep Blue use artificial intelligence?
The short answer is "no." Earlier computer designs that tried
to mimic human thinking weren't very good at it. No formula
exists for intuition. So Deep Blue's designers have gone
"back to the future." Deep Blue relies more on computational
power and a simpler search and evaluation function.
The long answer is "no." "Artificial Intelligence" is more
successful in science fiction than it is here on earth, and you
don't have to be Isaac Asimov to know why it's hard to
design a machine to mimic a process we don't understand
very well to begin with.
Source:
https://www.research.ibm.com/deepblue/meet/html/d.3.3a.shtml
8. DDI
Decision Tree (IF... THEN)
9. DDI
“Machine learning is the
subfield of computer science
that gives computers the
ability to learn without being
explicitly programmed.”
Arthur Samuel, 1959
10. DDI
Spam Email Filtering
11. DDI
Email Category Tabs
12. DDI
“If you can't
explain it simply
you don't
understand
it well enough”
13. DDI
SUPERVISED LEARNING
Supervised learning is where you
have input variables (x) and an
output variable (y) and you use an
algorithm to learn the mapping
function from the input to the
output
y=f(x)
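As a minimal sketch of what "learning the mapping function" can mean, the snippet below fits a straight line y = slope * x + intercept to labelled examples by ordinary least squares. It is plain Python rather than the Spark MLlib and Scala code the talk refers to; the function name `fit_line` and the training data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: inputs x with known outputs y (generated from y = 2x + 1)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)


def f(x):
    """The learned mapping from input to output."""
    return slope * x + intercept
```

Once fitted, `f` can be applied to inputs that were not in the training set, which is the point of learning the mapping rather than storing the examples.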
14. DDI
SUPERVISED LEARNING
Classification is a general process
related to categorization, the process
in which ideas and objects are
recognized, differentiated, and
understood
A classification system is an approach
to accomplishing classification
15. DDI
CLASSIFICATION
In Machine Learning, Naive
Bayes Classifiers are a family of
simple probabilistic classifiers
based on applying Bayes'
theorem with strong (naive)
independence assumptions
between the features
16. DDI
NAIVE BAYES CLASSIFIERS
Naive Bayes has been studied
extensively since the 1950s and
remains a popular (baseline) method for
text categorization, the problem of
judging documents as belonging to one
category or the other (such as spam or
legitimate, sports or politics, etc.) with
word frequencies as the features
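The idea of classifying text from word frequencies can be sketched as a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. This is an illustrative plain-Python version, not the Spark MLlib implementation the talk uses; the function names `train` and `predict` and the four training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (label, text) pairs. Returns the counts needed for prediction."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log P(label) + sum of log P(word | label), treating words as independent
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids zero probabilities for unseen (word, label) pairs
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled messages
training = [
    ("spam", "win money now"),
    ("spam", "free money offer"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]
model = train(training)
```

Calling `predict(model, "free money")` scores each label and returns the more probable one. Working in log space keeps the product of many small probabilities from underflowing.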
17. DDI
TEXT CATEGORIZATION
SUPERVISED LEARNING
CLASSIFICATION
NAIVE BAYES CLASSIFIER
?
23. DDI
Recommendation system
24. DDI
Recommendation system
25. DDI
Recommendation system
26. DDI
Recommendation system
27. DDI
Recommendation system
28. DDI
UNSUPERVISED LEARNING
Unsupervised learning algorithms
are machine learning algorithms
that work without a desired output
label
Essentially, the algorithm attempts
to estimate the underlying structure
of the population of input data
29. DDI
UNSUPERVISED LEARNING
Collaborative Filtering is a method of making
automatic predictions (filtering) about the
interests of a user by collecting preferences or
taste information from many users
(collaborating)
In the more general sense, Collaborative
Filtering is the process of filtering for
information or patterns using techniques
involving collaboration among multiple agents,
viewpoints, data sources, etc.
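One simple way to turn "preferences from many users" into a prediction is user-based collaborative filtering: estimate a user's rating for an item as the similarity-weighted average of other users' ratings for it. The sketch below is plain Python and assumes cosine similarity over co-rated items with strictly positive ratings; it is not the approach Spark MLlib takes (MLlib's collaborative filtering is based on alternating least squares). The names `cosine` and `predict_rating` and the ratings data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings on a 1-5 scale: user -> {item: rating}
ratings = {
    "alice": {"matrix": 5, "titanic": 1},
    "bob": {"matrix": 5, "titanic": 1, "inception": 5},
    "carol": {"matrix": 1, "titanic": 5, "inception": 1},
}
```

Here `predict_rating(ratings, "alice", "inception")` leans toward bob's rating, because alice's past ratings match bob's far more closely than carol's. Every prediction compares the user against all other users, which is the scalability concern that arises as the number of users grows.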
30. DDI
COLLABORATIVE FILTERING
Applications of Collaborative Filtering typically
involve very large data sets
As the numbers of users and items grow,
traditional CF algorithms will suffer serious
scalability problems
Large web companies use clusters of
machines to scale recommendations for their
millions of users
31. DDI
RECOMMENDATION SYSTEM
UNSUPERVISED LEARNING
COLLABORATIVE FILTERING
USER BASED / ITEM BASED / OTHER
?
36. DDI
Targeted Advertising
37. DDI
Targeted Advertising
38. DDI
UNSUPERVISED LEARNING
Cluster analysis or Clustering is the
task of grouping a set of objects in
such a way that objects in the same
group (called a cluster) are more
similar (in some sense or another)
to each other than to those in other
groups (clusters)
39. DDI
CLUSTERING
K-means clustering is a method of vector
quantization, originally from signal
processing, that is popular for cluster
analysis in data mining
K-means clustering aims to partition n
observations into k clusters in which each
observation belongs to the cluster with
the nearest mean
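The "nearest mean" rule can be sketched with Lloyd's algorithm, the usual iterative procedure for k-means: assign every point to its closest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. The version below is plain Python for illustration, not the Spark MLlib implementation; the function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the centroid with the smallest squared distance
        assignment = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Hypothetical 2-D observations forming two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, 2)
```

The result depends on the initial centroids, so practical implementations typically use a smarter initialization or several restarts; the fixed `seed` here only makes the sketch reproducible.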
40. DDI
TARGETED ADVERTISING
UNSUPERVISED LEARNING
CLUSTERING
K-MEANS CLUSTERING
?
43. DDI
TYPICAL ML WORKFLOW
✓ Data and problem definition
✓ Data collection
✓ Data preprocessing
✓ Data analysis and modeling with unsupervised and supervised learning
✓ Process evaluation
44. DDI
EVALUATION METRICS
The root-mean-square deviation
(RMSD) or root-mean-square error
(RMSE) is a frequently used
measure of the differences between
values (sample and population
values) predicted by a model or an
estimator and the values actually
observed
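Concretely, RMSE is the square root of the mean of the squared differences between predicted and observed values. A minimal plain-Python sketch follows; the function name `rmse` is hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences of numbers."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large mistakes weigh more heavily than many small ones, and the result is expressed in the same unit as the predicted quantity.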
45. DDI
BEYOND ML
Deep learning is a branch of
machine learning based on a set of
algorithms that attempt to model
high level abstractions in data
Deep learning is part of a broader
family of machine learning methods
based on learning representations
46. DDI
BEYOND ML
One of the promises of Deep Learning
is replacing handcrafted features with
efficient algorithms for unsupervised
or semi-supervised feature learning
and hierarchical feature extraction
Some of the representations are
inspired by advances in neuroscience
47. DDI
BEYOND ML
Various Deep Learning architectures
such as deep neural networks have
been applied to fields like computer
vision, automatic speech recognition,
natural language processing, audio
recognition and bioinformatics where
they have been shown to produce state-
of-the-art results on various tasks
50. DDI
ML & BIG DATA
“We don’t have better algorithms.
We just have more data.”
Peter Norvig
Google’s Research Director
51. DDI
ML & BIG DATA
Apache Hadoop is an open-source
software framework used for distributed
storage and processing of big data sets
using clusters built from commodity
hardware
52. DDI
ML & BIG DATA
Apache Spark is a fast and general-purpose
cluster computing system
It provides high-level APIs in Scala, Java, Python
and R, and an optimized engine that supports
general execution graphs
It also supports a rich set of higher-level tools
including Spark SQL for SQL and structured data
processing, MLlib for machine learning, GraphX
for graph processing, and Spark Streaming
53. DDI
ML & BIG DATA
54. DDI
WHY USE SCALA?
Spark Survey 2016
55. DDI
WHY USE SCALA?
Scala is one of the most exciting languages to be created in the 21st
century. It is a multi-paradigm language that fully supports functional,
object-oriented, imperative and concurrent programming. It also has a
strong type system, and from our point of view, strong type is a
convenient form of self-documenting code.
Scala works on the JVM and has access to the riches of the Java
ecosystem, but it is less verbose than Java. As we employ it for ND4J,
its syntax is strikingly similar to Python, a language that many data
scientists are comfortable with. Like Python, Scala makes
programmers happy, but like Java, it is quite fast.
Finally, Apache Spark is written in Scala, and any library that purports
to work on distributed run times should at the very least be able to
interface with Spark
Source: https://deeplearning4j.org/scala
56. DDI
THANK YOU!
MARIO@BIG-DATA.NINJA