These slides were presented by Imron Zuhri at the seminar & workshop "Pengenalan & Potensi Big Data & Machine Learning" (Introduction to, and the Potential of, Big Data & Machine Learning), organized by KUDO on 14 May 2016.
3. in 1996, Garry Kasparov was not afraid of a computer, and he won.
the next year, he played against a new and improved Deep Blue, and lost.
4. this is the move that was so surprising, so un-machine-like,
that he was sure the IBM team had cheated
(Rd5 vs. Rd1)
5. a random move, a computer bug;
to Kasparov, a sign of superior intelligence
(Rd5 vs. Rd1)
6. big data analytics is the culmination
of the machine way of thinking:
we can now immensely
extend our memory and computational power
to help us do just that
8. some definitions
a (hypnotized) user’s perspective:
a scientific (witchcraft) field that
researches fundamental principles from data (potions) and
develops magical algorithms (spells to cast)
(Pascal Vincent, 2015)
“field of study that gives computers the ability to learn without
being explicitly programmed”
(Arthur Samuel, 1959)
formal definition (Tom Mitchell, 1998):
“a computer program is said to learn from experience E
with respect to some task T and performance measure P,
if its performance at T, as measured by P,
improves with experience E”
for example: T = filtering spam, P = the fraction of emails classified correctly, E = a stream of emails labeled by users
10. three niches for machine learning
data mining: using historical data to improve
decisions
(e.g. medical records → medical knowledge)
software applications that are too difficult to program
by hand:
autonomous driving
image classification
user modeling
automatic recommender systems
source: rong jin, 2013
49. (some) open problems in machine learning
one-shot learning
unsupervised learning
reinforcement learning
artificial general intelligence
“most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake.”
Yann LeCun
50. challenges in machine learning
data-related:
abundant yet scattered data
unstructured, noisy data
offline-stored data (duh!)
resource-related:
data storage
space constraints
computing power
training time
inve$$$tments:
initial investments
running costs
52. recent breakthroughs in machine learning
deepmind atari q-learner (2014)
plays 5 kinds of atari 2600 games
states: the raw pixels of the atari screen
actions: left/right moves
reward: the game score
algorithm used:
feedforward “q-learning” with a conv-net
that learns to map pixels to expected reward
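the q-learning loop behind the atari agent can be sketched in miniature with a tabular version (the corridor environment, learning rate, and exploration rate below are illustrative assumptions; the real system replaces the table with a conv-net over raw pixels):

```python
import random

# minimal tabular q-learning on a 1-d corridor: states 0..4, reward 1 at state 4.
N_STATES, ACTIONS = 5, (-1, +1)          # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3    # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if rng.random() < EPSILON:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# after training, "move right" should dominate in every non-terminal state
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
```

the learned greedy policy heads toward the reward from every state; scaling this up means swapping the table for a network and the corridor for a game screen.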
53. recent breakthroughs in machine learning
the translator (2015)
real-time translation of speech
from/into 7 different languages
able to run even on
resource-constrained embedded
hardware (e.g. smartphones)
uses the same engine that powers
microsoft cortana (creepy!)
54. Reinforcement Learning: DeepMind AlphaGo
google deepmind alphago (2016)
99.8% winning rate
vs. other go programs
first program to defeat a
human go champion
algorithms used:
deep neural networks
monte carlo tree search
supervised learning from expert games
reinforcement learning against other alphago instances
55. supervised learning: random forest
Fernández-Delgado et al. (2014) evaluated 179 classifiers on 121 UCI data sets;
result:
the top 5 are random forest classifiers
for kaggle competitions, try gbm: xgboost
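the random-forest result rests on one core idea, bagging: train many simple, high-variance learners on bootstrap resamples and let them vote. a minimal sketch with one-feature decision stumps (the data set and stump learner are made up for illustration; a real random forest also subsamples features at each split, and in practice you would reach for scikit-learn's RandomForestClassifier or xgboost):

```python
import random

# toy 1-d data set: the true rule is "label = 1 when x > 5"
rng = random.Random(42)
X = [rng.uniform(0, 10) for _ in range(200)]
y = [1 if x > 5 else 0 for x in X]

def train_stump(xs, ys):
    """pick the threshold that best separates one bootstrap sample."""
    best_t, best_err = 0.0, float("inf")
    for t in xs:  # candidate thresholds = observed values
        err = sum((x > t) != bool(label) for x, label in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# bagging: each stump is trained on a different bootstrap resample
stumps = []
for _ in range(25):
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    stumps.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

def forest_predict(x):
    # majority vote over the ensemble
    votes = sum(1 for t in stumps if x > t)
    return 1 if 2 * votes >= len(stumps) else 0

acc = sum(forest_predict(x) == label for x, label in zip(X, y)) / len(X)
```

each stump alone is crude, but averaging over bootstrap resamples cancels their individual variance, which is exactly why forests top the Delgado ranking.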
56. supervised: deep learning
don’t be fooled: dl research improves
part by part, via a new kind of layer,
a new activation function, a new
non-convex optimization solver, or a
deeper neural net.
from Rodrigo Benenson’s
deep learning accuracy rankings
57. supervised: deep learning
summary:
relu works better than the sigmoid function as an activation.
maxout works better when combined with dropconnect as the
activation function.
dropout layers help fight overfitting.
adagrad and adadelta work well if you don’t want to
tune the optimizer’s hyperparameters.
deeper networks work: highway layers and residual layers.
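the first bullet is usually explained via gradients: the sigmoid's derivative vanishes for large inputs, so deep stacks of sigmoid layers stop learning, while relu passes a gradient of 1 through every active unit. a quick numeric check (pure python, no framework):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, vanishes for large |x|

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 on the active side

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.0f}")
```

at x = 10 the sigmoid gradient is about 4.5e-5; ten such layers multiply that down to nothing, which is the vanishing-gradient problem relu sidesteps.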
58. unsupervised: t-sne
t-distributed stochastic neighbor embedding
van der Maaten and Hinton (2008):
mnist data set visualization
works best for data-viz
can be used for clustering too
(if you’d bother to tweak the algo)
59. semi-supervised learning: ladder neural networks
given 100 or 1,000 labeled examples, with the rest (~50,000) unlabeled,
try to predict 10,000 future data points.
● it works! even with that little labeled data.
● now we don’t have to tell some interns or PhD students to label the
data. :)
A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko (2015)
60. collaborative filtering: restricted boltzmann machine
rbm for collaborative filtering (hinton, 2008):
it has been used in the netflix and spotify algorithms.
it works better than svd!
correlation(svd, rbm): −1 < c < 1
• can be ensembled with svd
to improve the prediction.
61. some advice for applied machine learning research
(this competition)
preprocessing: scaling & imputation
cross-validation: choose the best algos
hyperparameter optimization
ensembling n models: dark knowledge
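the first step above, scaling & imputation, can be sketched without any library (the feature column is made up; in practice scikit-learn's SimpleImputer and StandardScaler do this for whole matrices):

```python
import math

# a feature column with missing values (None); the numbers are illustrative
col = [4.0, None, 10.0, 7.0, None, 3.0]

# 1) imputation: replace missing entries with the mean of the observed ones
observed = [v for v in col if v is not None]
mean = sum(observed) / len(observed)
imputed = [v if v is not None else mean for v in col]

# 2) scaling: standardize to zero mean and unit variance
mu = sum(imputed) / len(imputed)
sd = math.sqrt(sum((v - mu) ** 2 for v in imputed) / len(imputed))
scaled = [(v - mu) / sd for v in imputed]
```

fit the imputation mean and the scaling statistics on the training split only, then apply them to the validation split, otherwise information leaks across the cross-validation boundary.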
63. cross-validation: how to choose the best algo?
cross-validation is a must!
(Tibshirani et al., 2014)
don’t overlap your cross-validation data partitions!
(Zhang, DataRobot)
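“don’t overlap your partitions” means the k validation folds must be disjoint and together cover every example exactly once. a minimal sketch (fold count and data size are arbitrary; in practice you would shuffle indices first, as scikit-learn's KFold can):

```python
def kfold_indices(n, k):
    """split indices 0..n-1 into k disjoint validation folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)
for val in folds:
    # training set = everything outside the validation fold: no overlap
    train = [j for f in folds if f is not val for j in f]
    assert not set(train) & set(val)
```

if the same example lands in two validation folds, its score is double-counted and the cross-validation estimate is biased; disjoint folds are what make the estimate honest.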
64. hyperparameter optimization
if you want to find the best hyperparameters:
do random search.
random search is better than grid search
(Bergstra & Bengio, 2012)
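random search in its simplest form: sample settings at random and keep the best scorer. the score function below is a made-up stand-in for a cross-validated model score; the paper's key point is that, for two hyperparameters, n random samples try n distinct values of each one, while an n-point grid only tries √n of each:

```python
import random

rng = random.Random(0)

def score(lr, reg):
    """stand-in for a cross-validated score (higher is better);
    the optimum sits at lr=0.1, reg=0.01 by construction."""
    return -((lr - 0.1) ** 2) - ((reg - 0.01) ** 2)

best_score, best_params = float("-inf"), None
for _ in range(60):
    # sample each hyperparameter log-uniformly over its plausible range
    lr = 10 ** rng.uniform(-4, 0)
    reg = 10 ** rng.uniform(-4, 0)
    s = score(lr, reg)
    if s > best_score:
        best_score, best_params = s, (lr, reg)
```

the log-uniform sampling matters: learning rates and regularization strengths vary over orders of magnitude, so sampling on a log scale spends the budget where it counts.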
65. ensembling n models: dark knowledge
if two models give the same accuracy, but their
prediction outputs have low correlation, then we can
improve prediction accuracy by averaging the
models’ predictions.
(Hinton, 2015)
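the averaging claim can be checked with two hypothetical probabilistic classifiers that are equally accurate but err on different examples (all numbers below are made up for illustration):

```python
def accuracy(probs, labels):
    # threshold the predicted probabilities at 0.5 and count matches
    return sum((p >= 0.5) == bool(t) for p, t in zip(probs, labels)) / len(labels)

y_true  = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.6, 0.2]  # errs on examples 2 and 6
model_b = [0.9, 0.3, 0.8, 0.7, 0.2, 0.6, 0.1, 0.2]  # errs on examples 1 and 5

# ensemble by averaging the two models' predicted probabilities
avg = [(a + b) / 2 for a, b in zip(model_a, model_b)]

print(accuracy(model_a, y_true), accuracy(model_b, y_true), accuracy(avg, y_true))
# -> 0.75 0.75 1.0
```

each model alone scores 0.75, the average scores 1.0: because the errors are uncorrelated, each model's confident correct answer outvotes the other's marginal mistake.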
74. do you follow waze’s instructions during your first week of using it?
75. would you buy a self-driving car that couldn’t drive
itself in 99 percent of the country?
or that knew nearly nothing about parking,
couldn’t be taken out in snow or heavy rain,
and would drive straight over a gaping pothole?
if your answer is yes, then check out the google self-driving car, model year
2014
81. the current challenges of big data analytics?
heterogeneous data sources, systems and formats
time-consuming and complex data preparation process
the almost impossible task of integrating various kinds of data
it requires experts to analyze big and complex data
most of the user interactions are not intuitive
“before performing analytics, data scientists must first
format and prepare the raw data for analytics, often spending
more than 80% of the total effort,” according to Intel Corp. research
82. what would it be like
if we could simplify the whole process?
83. hence our vision
we believe humans should not be bogged down by tedious matters.
by reimagining analytics, we envision the creation of intelligent
machines
that will free humans to focus on solving the world’s toughest
problems.
84. intelligent machines that can help us collect massive amounts of data
they automatically read and connect to
any kind of data, including automatic
machine-to-machine connections:
structured data, printed invoices, social media conversations
86. then help us separate the signal from the noise
automatic data quality assessments,
data cleansing and data filtering
(example entities from the demo: regi, mita, gundam, x-men)
88. complete the information and connect it all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
(example entities from the demo: regi, mita, gundam, batman, tom, mediatrac)
91. and finally help us make sense of the massively connected data
contextual search and
recommendation
intelligent data discovery
(example entities from the demo: gundam, batman, sith)
93. through a highly intuitive and natural user interface
natural language interface
voice and gesture recognition
example query (in Indonesian): “how many restaurants sell soto along Jalan Senopati?”
107. when we have intelligent machines that can
connect everything in a meaningful way…
we can start asking questions about things we never
thought possible to ask before
108. Spotify can map songs across social graphs.
Shazam can give us situational data: where
someone is listening to a song,
when, how and even (to an extent) why.
YouTube can help us track the growth of a song
using search and streams.
Instagram & Vine are becoming hotbeds for music discovery.
what if we could connect all their data together?
109. or, if you ran a radio station: what sort of playlist would appeal to
your target audience if we knew that a sizeable percentage of them
own a hummer?
110. we could even predict the specific combinations of words, notes and
beats that increase the chance of putting a song in the
billboard top 40 this upcoming season.
111. here are some samples of the hidden insights
that we can discover from our own large repository of data,
using our intelligent data integration and data discovery tools
112. when we integrate historical media articles with geodemographic and
point-of-interest databases, we can create a model that predicts a high
probability of fire incidents down to the street level
153. scalability problems - outline
large scale machine learning
mahout: scalable ml on hadoop
jubatus: distributed online real-time ml
vowpal wabbit: fast learning at yahoo/microsoft
trident-ml and storm-pattern: ml on storm, yarn
upcoming: samoa: ml on s4, storm
issues in scalable distributed ml
load balancing
auto scaling
job scheduling
workflow management
data and model parallelism
parameter server framework
peer-to-peer framework
154. scalability problems - outline
distributed deep learning
yahoo lda: scalable parallel framework for latent variable models
distbelief: distributed deep learning on clusters
h2o: distributed deep learning on spark
adam at msr: distributed deep learning
dl4j: open source deep learning on hadoop and spark
petuum: distributed machine learning
singa: distributed deep learning
tensorflow: google’s large-scale distributed dl
mxnet: heterogeneous distributed deep learning
caffe on spark: yahoo
distributed learning and optimization
proximal splitting / auxiliary coordinates
bundle (sub-gradient) methods
shotgun: parallelized cdm (coordinate descent method)
asynchronous sgd
hogwild / dogwild
159. emerging analytics technologies for automatic
analytics on high-dimensional data
online deep learning
topological data analysis
fuzzy-rough set based data exploration systems
granular computing
kernel set and spatiotemporal analysis
applied differential geometry
non-axiomatic reasoning systems
intelligent rule and knowledge extraction/discovery
multi-agent-based modeling
weak signal detection and analysis
bayesian network analysis
genetic programming
self-organizing neural networks
160. and also more humanlike user
interaction and data visualization
technologies
eye tracking
glasses-free autostereoscopy
touch-sensitive holograms
natural language user interfaces
tangible user interfaces
wearable gestural interfaces
brain-computer interfaces
sensor network user interfaces
162. principles for the development of a complete mind:
study the science of art. study the art of science.
develop your senses — especially learn how to see.
realize that everything connects to everything else.
Leonardo da Vinci