challenges, learnings and opportunities
presented by imron zuhri, adit, and samudra
KUDO codefest 14 May 2016
machine learning
can a machine think?
in 1996, Garry Kasparov was not afraid of a computer, and he won
the next year, he played against a new and improved Deep Blue and lost
this is the move that was so surprising, so un-machine-like,
that he was sure the IBM team had cheated
Rd5
Rd1
a random move, a computer bug
to kasparov, a sign of superior intelligence
big data analytics is the culmination
of the machine way of thinking
we can now immensely
extend our memory and computational power
to help us do that
what is machine learning
some definitions
 a (hypnotized) user’s perspective
a scientific (witchcraft) field that:
researches fundamental principles from data (potions) and
develops magical algorithms (spells to cast)
 (pascal vincent, 2015)
 field of study that gives computers the ability to learn without
being explicitly programmed
 arthur samuel (1959)
 formal definitions (tom mitchell, 1998):
“a machine is said to be learning if
its performance improves:
 with each experience E
 on specific tasks T
 as measured by a specific performance measure P”
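Mitchell's definition can be read as a learning curve. The sketch below is a minimal illustration under made-up assumptions: the task T is to decide whether a number lies above a hidden threshold, the performance P is accuracy on held-out points, and the experience E is the number of labeled examples seen. The learner, the data and all function names are hypothetical.

```python
import random

def make_data(n, rng, threshold=0.6):
    """Task T: label x as 1 when x >= threshold (the threshold is hidden from the learner)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x >= threshold)) for x in xs]

def fit_threshold(examples):
    """Learner: place the boundary midway between the largest 0 and the smallest 1 seen."""
    zeros = [x for x, y in examples if y == 0]
    ones = [x for x, y in examples if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2

def accuracy(boundary, examples):
    """Performance P: fraction of examples classified correctly."""
    return sum(int(x >= boundary) == y for x, y in examples) / len(examples)

rng = random.Random(0)
train = make_data(1000, rng)
test = make_data(2000, rng)

# Experience E: train on growing prefixes of the same stream of examples
learning_curve = [(n, accuracy(fit_threshold(train[:n]), test)) for n in (2, 10, 100, 1000)]
for n, acc in learning_curve:
    print(f"E={n:4d} examples  P={acc:.3f}")
```

With more experience the interval that brackets the hidden threshold can only shrink, so accuracy on the test points typically rises toward 1, although a lucky small sample can occasionally do well too.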
CURRENT VIEW OF ML FOUNDING DISCIPLINES
three niches for machine learning
data mining: using historical data to improve
decisions
 medical records → medical knowledge
software applications that are difficult to program
by hand
 autonomous driving
 image classification
user modeling
 automatic recommender systems
source: rong jin, 2013
(some) open problems in machine learning
 one-shot learning
 unsupervised learning
 reinforcement learning
 artificial general intelligence
“most of human and animal learning
is unsupervised learning. If
intelligence was a cake, unsupervised
learning would be the cake,
supervised learning would be the
icing on the cake, and reinforcement
learning would be the cherry on the
cake. We know how to make the icing
and the cherry, but we don't know
how to make the cake.”
yann lecun
challenges in machine learning
 data-related:
 abundant yet scattered data
 unstructured, noisy data
 offline-stored data (duh!)
 resource-related:
 data storage
 space constraints
 computing power
 training time
 inve$$$tments
• initial investments
• running costs
challenges in machine learning
 methodical issues:
 result consistency
(i.e. accuracy)
 overfitting
 algorithm computational efficiency
 miscellaneous:
 architectural differences/portability issues
 popularity of non-open-standard, vendor-locked compute libraries/apis
(rawr!)
recent breakthroughs in machine learning
deepmind atari q learner (2014)
plays 5 kinds of atari 2600 games
states: pixels in atari
actions: left/right move
reward: score
algorithm used:
feedforward “q-learning”
conv-net
for unsupervised map of reward
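The q-learning idea named on this slide can be sketched in its simplest, tabular form. This is not DeepMind's system: their learner replaces the table with a conv-net over pixels, while the sketch below assumes a hypothetical five-cell corridor with the same ingredients (states, left/right actions, a score-like reward). All constants and names are illustrative.

```python
import random

N_STATES = 5          # positions 0..4 on a line; reaching position 4 ends the episode
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Hypothetical environment: reward 1 for reaching the rightmost cell, else 0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit, sometimes explore; break ties randomly
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q

q_table = train()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(greedy_policy)
```

The table is only workable because the corridor has five states. With raw Atari pixels as the state, a function approximator such as a conv-net has to stand in for `q_table`.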
recent breakthroughs in machine learning
the translator (2015)
real-time translations of speech
from/into 7 different languages
able to run even on
resource-constrained embedded
hardware (e.g. smartphones)
uses same engine that was used in
microsoft cortana (creepy!)
Reinforcement Learning: DeepMind AlphaGo
 google deepmind alphago (2016)
 99.8% winning rate
vs other go programs
 first program to defeat
human go champion
 algorithm used:
 deep neural network
 monte carlo tree search
 supervised learning from expert games
 reinforcement learning vs other alphago instances
supervised learning: random forest
fernández-delgado et al. (2014) evaluated 179 classifiers on 121 data sets, mostly from the uci repository;
result:
 the top 5 are random forest classifiers
 for kaggle competitions, try gradient boosting (gbm), e.g. xgboost.
supervised: deep learning
don’t be fooled: dl research improves
piece by piece, with a new kind of layer,
a new activation function, a new non-convex
optimization solver, or a deeper
neural net.
from rodrigo benenson
deep learning accuracies ranking
supervised: deep learning
summary:
 relu works better than sigmoid as an activation function.
 maxout works better as an activation function when combined with dropconnect.
 dropout layers help fight overfitting.
 adagrad and adadelta work better if you don’t want to
tune optimization hyperparameters.
 deeper networks work: highway layers and residual layers.
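Two of the points in this summary can be illustrated without any framework. The sketch below compares the gradients of sigmoid and relu (a commonly cited reason for preferring relu is that the sigmoid gradient shrinks toward zero for large inputs) and shows one common formulation of dropout, the "inverted" variant that rescales the surviving units. The function names and sample values are illustrative assumptions, not taken from the talk.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, and tiny for large |x|

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # stays 1 for any positive input

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale the rest."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

for x in (0.5, 5.0, 10.0):
    print(f"x={x:4.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")

rng = random.Random(0)
hidden = [relu(v) for v in (-1.0, 0.3, 2.0, 4.5)]
print(dropout(hidden, p_drop=0.5, rng=rng))                  # training: random units zeroed
print(dropout(hidden, p_drop=0.5, rng=rng, training=False))  # inference: unchanged
```

Rescaling by `1 / keep` during training keeps the expected activation the same, so nothing needs to change at inference time.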
unsupervised: t-sne
t-distributed stochastic neighbor embedding
van der maaten and hinton (2008):
mnist data set visualization
 works best for data-viz
 can be used for clustering too
(if you’d bother to tweak the algo)
semi-supervised learning: ladder neural networks
Given only 100 or 1,000 labeled examples, with the rest (~50,000) unlabeled,
predict the labels of 10,000 unseen examples.
● It works, even with very few labels!
● Now we don’t have to ask interns or PhD students to label
data. :)
A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko (2015)
collaborative filtering: restricted boltzmann machine
rbm for collaborative filtering (hinton, 2008):
 it has been used in the netflix and spotify algorithms.
 it works better than svd!
 correlation(svd, rbm): -1 < c < 1, i.e. the two sets of predictions are not perfectly correlated
• so rbm can be ensembled with svd
 to improve the prediction.
some advice for applied machine learning research
(this competition)
 preprocessing: scaling & imputation
 cross-validation: choose best algos
 hyperparameter optimization
 ensembling n-models: dark knowledge
data preprocessing: scaling & imputation
raschka (2014):
scaling improves prediction!
gelman (2006):
predict the missing (n/a) values, then
add noise to the predicted values:
less biased!
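A minimal sketch of both preprocessing steps, using only the standard library. It reads the imputation advice as random regression imputation (predict each missing value from another column, then add noise drawn from the residual spread), which is an interpretation of the slide rather than a quoted recipe. The columns and values are made up.

```python
import math
import random

def standardize(values):
    """Scale to zero mean and unit variance, ignoring missing entries (None)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    return [None if v is None else (v - mean) / std for v in values]

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; also return the residual standard deviation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    resid_sd = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / max(n - 2, 1))
    return a, b, resid_sd

def impute_with_noise(xs, ys, rng):
    """Fill missing ys with the regression prediction plus random residual noise."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, resid_sd = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    return [y if y is not None else a + b * x + rng.gauss(0.0, resid_sd)
            for x, y in zip(xs, ys)]

age = [23, 35, 41, 52, 60, 29]              # made-up data
income = [2.1, 3.4, None, 5.2, 6.1, None]   # None marks n/a entries
filled = impute_with_noise(age, income, random.Random(0))
scaled = standardize(filled)
print([round(v, 2) for v in filled])
print([round(v, 2) for v in scaled])
```

Filling gaps with the bare prediction would make the imputed points look more regular than real data; the added noise puts that spread back.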
cross-validation: how to choose best algo?
 cross-validation is a must!
 (tibshirani et al., 2014)
 don’t overlap your cross-validation
data partitions!
 (zhang, data robot)
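The two rules above (always cross-validate, and keep the partitions disjoint) can be sketched as follows. The splitter shuffles the indices once and cuts them into k folds that never share a sample. The two candidate "algorithms" and the data set are hypothetical stand-ins.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once, then cut them into k disjoint validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

def cross_val_score(fit, score, xs, ys, k=5):
    """Average validation score of a model over k folds."""
    scores = []
    for train, validation in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in validation], [ys[i] for i in validation]))
    return sum(scores) / len(scores)

# Two hypothetical candidate algorithms for a regression task
def fit_mean(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + b * (x - mx)

def neg_mse(model, xs, ys):
    return -sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + (-1) ** i for i, x in enumerate(xs)]   # made-up: a line plus a small wiggle
cv_scores = {name: cross_val_score(fit, neg_mse, xs, ys)
             for name, fit in {"mean": fit_mean, "line": fit_line}.items()}
print(max(cv_scores, key=cv_scores.get), cv_scores)
```

Because every sample lands in exactly one validation fold, each score is computed on data the model did not see during fitting.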
hyperparameter optimization
if you want to search for the best hyperparameters:
do random search.
random search is better than grid search
(bergstra and bengio, 2012)
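The intuition behind this advice can be sketched with a toy objective. The function `validation_loss` below is a hypothetical stand-in for "train a model and measure validation loss", built so that one hyperparameter matters far more than the other. With the same budget of 16 trials, a 4 x 4 grid only ever tries 4 distinct learning rates, while random search tries 16.

```python
import random

def validation_loss(learning_rate, dropout):
    """Hypothetical stand-in for training a model and returning its validation loss.
    The loss depends strongly on learning_rate and barely on dropout."""
    return (learning_rate - 0.013) ** 2 * 1e4 + 0.01 * (dropout - 0.5) ** 2

def grid_search(points_per_axis):
    """Evaluate every combination on a regular grid; return (loss, lr, dropout) of the best."""
    lrs = [0.1 * (i + 1) / points_per_axis for i in range(points_per_axis)]
    drops = [(i + 0.5) / points_per_axis for i in range(points_per_axis)]
    trials = [(validation_loss(lr, d), lr, d) for lr in lrs for d in drops]
    return min(trials)

def random_search(n_trials, seed=0):
    """Sample each hyperparameter independently; return (loss, lr, dropout) of the best."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = rng.uniform(0.0, 0.1)
        d = rng.uniform(0.0, 1.0)
        trials.append((validation_loss(lr, d), lr, d))
    return min(trials)

# Same budget of 16 trials for both strategies
print("grid  :", grid_search(4))
print("random:", random_search(16))
```

When only a few hyperparameters really matter, the extra distinct values along those axes are what give random search its edge; how large the edge is depends on the objective and on the seed.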
ensembling n-models: dark knowledge
If two models give the same accuracy but their
predictions have low correlation, then we can
improve prediction accuracy by averaging
the models’ predictions.
(Hinton, 2015)
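A small, hand-made illustration of the averaging argument. The two "models" below are just lists of made-up predicted probabilities, chosen so that both reach the same accuracy but make their mistakes on different examples.

```python
import math

def accuracy(probs, labels):
    """Fraction of examples where thresholding the probability at 0.5 matches the label."""
    return sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels)) / len(labels)

def pearson(a, b):
    """Pearson correlation between two lists of predictions."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

labels  = [1,   1,   1,   1,   0,   0,   0,   0]
model_a = [0.4, 0.9, 0.8, 0.9, 0.1, 0.2, 0.1, 0.6]   # wrong on examples 0 and 7
model_b = [0.9, 0.4, 0.9, 0.8, 0.6, 0.1, 0.2, 0.1]   # wrong on examples 1 and 4
averaged = [(pa + pb) / 2 for pa, pb in zip(model_a, model_b)]

print("accuracy a:", accuracy(model_a, labels))
print("accuracy b:", accuracy(model_b, labels))
print("correlation of outputs:", round(pearson(model_a, model_b), 3))
print("accuracy of average:", accuracy(averaged, labels))
```

Averaging helps here because each model is confidently right where the other is mildly wrong. Two models that err on the same examples would gain nothing from being averaged.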
the landscape of opportunities
Popular Big Data Industry
Financial Services Telco Web/Media Retail Healthcare Government
• Fraud detection
• Compliance reporting
• Portfolio analysis
• Customer statements
• Wire transfer alerts
• Customer acquisition, retention, and profitability
• Subscriber data management
• Fraud analysis
• Social analysis
• Response times
• Traffic analysis
• Product affinity/bundling
• Sentiment Analysis
• Content monetization
• Advertising optimization
• Optimization of user experience/click stream analysis
• Network optimization to support service levels
• Store operation analysis
• Customer loyalty programs
• Collaborative planning and forecasting
• Loss prevention
• Supply chain optimization
• Drug development and launch cost reduction
• Regulatory compliance
• Product quality
• Return on promotional investment
• Lowered risk of new product success
• Security/anti-terror
• Recovery Act public disclosure
• Budgetary control and management
• Educational reporting
• Asset control and assessment
• Environment monitoring
*cisco 2013-2014
currently the biggest prescriptive analytics engine:
contextual advertising
http://www.flashtalking.com/us/targeted-ads/
another one:
marketplace and services recommendation engine
challenges of implementation
and
what we do with machine learning
did you follow waze’s instructions during your first week of using it?
 would you buy a self-driving car that couldn’t drive
itself in 99 percent of the country?
 or that knew nearly nothing about parking,
 couldn’t be taken out in snow or heavy rain,
 and would drive straight over a gaping pothole?
if your answer is yes, then check out the google self-driving car, model year
2014
but
can we trust them enough?
the BIGGEST CHALLENGES in indonesia
DATA SETS
the current analytics technology
humans are still doing
most of the process
the current challenges of big data analytics?
heterogeneous data sources, systems and formats
time-consuming and complex data preparation process
almost impossible task of integrating various kinds of data
it requires experts to analyze big and complex data
most of the user interactions are not intuitive
“Before performing analytics, data scientists must first
format and prepare the raw data for analytics, often with
more than 80% of the effort.”, said Intel Corp. Research
what would it be like
if we could simplify the whole process?
hence our vision
we believe humans should not be bogged down by tedious matters.
by reimagining analytics, we envision the creation of intelligent
machines
that will free humans to focus on solving the world’s toughest
problems.
intelligent machines that can help us collect massive amounts of data
automatically read and connect to any kind of data, including automatic machine-to-machine connections
structured data
printed invoices
social media conversation
then help us separate the signals from the noise
automatic data quality assessments,
data cleansing and data filtering
regi
mita
gundam
x-men
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
batman
tom
mediatrac
and finally help us make sense of the massively connected data
contextual search and
recommendation
intelligent data discovery
gundam
batman
sith
through a highly intuitive and natural user interface
natural language interface
voice and gesture recognition
how many restaurants sell soto along jalan senopati?
digital, telco, legal, retail, healthcare, agriculture
multi-format, structured, unstructured, unclean, missing data, unstandardized, unconnected, difficult to analyze
cleaned and standardized, enriched and validated, connected at granular level, analytics-ready data
automatic data collection, automatic data preparation, automatic data integration
territory management
all of our siloed data will have a far higher value
once we connect it all in a meaningful way
are all of our current data connected yet?
Almost…
google is a humongous library index, with a smart
library card search that redirects you to the original
documents
facebook is a giant personal scrapbook of all your
acquaintances that are currently linked by manual
tagging and friends list
source:techglimpse
youtube and instagram are huge repositories of
current knowledge, lifestyle and trends that are still
largely unconnected
now imagine this!
when we can have intelligent machines that can
connect everything, in a meaningful way…
we can start asking questions, on things we never
thought possible to be asked before
Spotify can map songs across social graphs.
Shazam can give us situational data: where someone is listening to a song, when, how and even (to an extent) why.
YouTube can help us track the growth of a song using search and streams.
Instagram & Vine are becoming hotbeds for music discovery.
what if we could connect all their data together?
or, if you run a radio station, what sort of playlist will appeal to
your target audience, if we know that a sizeable percentage of them
own a hummer?
we can even predict specific combination of words, notes and
beats that will increase the chance of putting the song in
billboard top 40 this upcoming season.
here are some samples of hidden insights
that we can discover from our own large repository of data,
using our intelligent data integration and data discovery tools
when we integrate historical media articles with geodemographic and point of
interest database we can create a model that can predict high probability of fire
incidence down to street level
productivity optimization
lessons learned including how to scale your ML
scalability problems - outline
 large scale machine learning
 mahout - scalable ml on hadoop
 jubatus – distributed online real-time ml
 vowpal wabbit – fast learning at yahoo/ms
 trident ml and storm pattern: ml on storm, yarn
 upcoming --- samoa: ml on s4, storm
 issues in scalable distributed ml
 load balancing
 auto scaling
 job scheduling
 workflow management
 data and model parallelism
 parameter server framework
 peer-to-peer framework
scalability problems - outline
 distributed deep learning
 yahoolda: scalable parallel framework in latent variable models
 distbelief – distributed deep learning on cluster
 h2o – distributed deep learning on spark
 adam at msr – distributed deep learning
 dl4j – open source for deep learning on hadoop and spark
 petuum – distributed machine learning
 singa – distributed deep learning
 tensorflow: google large scale distributed dl
 mxnet: heterogeneous distributed deep learning
 caffe on spark: yahoo
 distributed learning and optimization
 proximal splitting/auxiliary coordinates;
 bundle (sub-gradient);
 shotgun: parallelized cdm (coordinate descent method)
 asynchronous sgd;
 hogwild/dogwild;
what’s next?
emerging analytics technology for automatic
analytics on large dimensional data
online deep learning
topological data analysis
fuzzy-rough set based data exploration system
granular computing
kernel set and spatiotemporal analysis
applied differential geometry
non axiomatic reasoning system
intelligent rule and knowledge extraction/discovery
multi agent based modeling
weak signal detection and analysis
bayesian networks analysis
genetic programming
self organizing neural networks
and also more humanlike user
interaction and data visualization
technology
eye tracking
glasses-free autostereoscopy
touch sensitive hologram
natural language user interface
tangible user interface
wearable gestural interface
brain-computer interface
sensor network user interface
In the meantime
principles for the development of a complete mind:
study the science of art. study the art of science.
develop your senses — especially learn how to see.
realize that everything connects to everything else.
Leonardo da Vinci

Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 

Dernier (20)

Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 

ML insights from KUDO codefest

  • 1. machine learning: challenges, learnings and opportunities. presented by imron zuhri, adit, and samudra. KUDO codefest, 14 May 2016
  • 2. can a machine think?
  • 3. in 1996, Garry Kasparov was not afraid of a computer, and he won. the next year, he played against a new and improved Deep Blue and lost.
  • 4. this is the move that was so surprising, so un-machine-like, that he was sure the IBM team had cheated (board labels: Rd5, Rd1)
  • 5. a random move, a computer bug; to kasparov, a sign of superior intelligence (board labels: Rd5, Rd1)
  • 6. big data analytics is the culmination of the machine way of thinking: we can now immensely extend our memory and computational power to help us do that
  • 7. what is machine learning
  • 8. some definitions
      • a (hypnotized) user’s perspective: a scientific (witchcraft) field that researches fundamental principles from data (potions) and develops magical algorithms (spells to cast) (pascal vincent, 2015)
      • a field of study that gives computers the ability to learn without being explicitly programmed (arthur samuel, 1959)
      • formal definition (tom mitchell, 1998): “a machine is said to be learning if it improves with each experience E, on specific tasks T, with specific performance P”
  • 9. CURRENT VIEW OF ML FOUNDING DISCIPLINES
  • 10. three niches for machine learning (source: rong jin, 2013)
      • data mining: using historical data to improve decisions (medical records → medical knowledge)
      • software applications that are difficult to program by hand: autonomous driving; image classification
      • user modeling: automatic recommender systems
  • 49. (some) open problems in machine learning: one-shot learning; unsupervised learning; reinforcement learning; artificial general intelligence. “most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake.” (yann lecun)
  • 50. challenges in machine learning
      • data-related: abundant yet scattered data; unstructured, noisy data; offline-stored data (duh!)
      • resource-related: data storage; space constraints; computing power; training time; inve$$$tments (initial investments, running costs)
  • 51. challenges in machine learning
      • methodical issues: result consistency (i.e. accuracy); overfitting; algorithm computational efficiency
      • miscellaneous: architectural differences / portability issues; popularity of non-open standard, vendor-locked compute libraries/apis (rawr!)
  • 52. recent breakthroughs in machine learning: deepmind atari q learner (2014). plays 5 kinds of atari 2600 games. states: pixels in atari; actions: left/right move; reward: score. algorithm used: feedforward “q-learning” conv-net for unsupervised map of reward
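
the state / action / reward framing on the atari slide can be sketched with plain tabular q-learning. this is a deliberately simplified stand-in, not deepmind's method: their learner approximates the q-function with a conv-net over pixels, while the sketch below uses a lookup table on a hypothetical 5-cell corridor. the environment, the function name `q_learning`, and all parameter values are assumptions made up for illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are positions 0..n_states-1, actions are 0 (left) and 1 (right),
    and the only reward (+1) is given on reaching the right-most state.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # epsilon-greedy: explore sometimes (and on ties), else act greedily
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # move Q(s, a) toward reward + gamma * max over a' of Q(s', a')
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# greedy action per non-terminal state: 1 means "move right"
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

after training, the greedy policy should move right in every cell, because the discounted reward is larger the closer the agent is to the goal.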
  • 53. recent breakthroughs in machine learning: the translator (2015). real-time translation of speech from/into 7 different languages. able to run even on resource-constrained embedded hardware (e.g. smartphones). uses the same engine that was used in microsoft cortana (creepy!)
  • 54. Reinforcement Learning: DeepMind AlphaGo
      • google deepmind alphago (2016)
      • 99.8% winning rate vs other go programs
      • first program to defeat a human go champion
      • algorithm used: deep neural network; monte carlo tree search; supervised learning from expert games; reinforcement learning vs other alphago instances
  • 55. supervised learning: random forest. fernández-delgado et al. (2014) evaluated 179 classifiers on 121 data sets in uci data. result: the top 5 are random forest classifiers. for kaggle competitions, try gradient boosting (gbm), e.g. xgboost.
  • 56. supervised: deep learning. don’t be fooled: dl research improves part by part, through either a new kind of layer, a new activation function, a new non-convex optimization solver, or a deeper neural net. (from rodrigo benenson's deep learning accuracies ranking)
  • 57. supervised: deep learning summary
      • relu works better than the sigmoid function for activation.
      • maxout works better when applied to dropconnect for activation function.
      • a dropout layer works to fight overfitting.
      • adagrad and adadelta work better if you don’t want to tune optimization hyperparameters.
      • deeper layers work: highway layer and residual layer.
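
a small sketch of two items from the summary above. it illustrates one commonly cited reason relu trains more easily than sigmoid (the sigmoid gradient shrinks toward zero for large inputs, while the relu gradient stays at 1 for any positive input) and shows an "inverted dropout" layer. the helper names and the dropout rate are assumptions for illustration, not taken from the slides.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# the sigmoid gradient vanishes as the input grows; the relu gradient does not
for x in (0.5, 5.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))

rng = random.Random(0)
activations = [relu(x) for x in (-1.0, 0.5, 2.0, 3.0)]
print(dropout(activations, rate=0.5, rng=rng))
```

at inference time the layer is called with `training=False`, which passes the activations through unchanged.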
  • 58. unsupervised: t-sne. t-distributed stochastic neighbor embedding, van der maaten and hinton (2008): mnist data set visualization. works best for data-viz; can be used for clustering too (if you’d bother to tweak the algo)
  • 59. semi-supervised learning: ladder neural networks (A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko, 2015). given 100 or 1,000 labeled examples, with the rest unlabeled (~50,000), try to predict 10,000 future examples. it works, even with little labeled data! now we don’t have to tell some interns or PhD students to label the data :)
  • 60. collaborative filtering: restricted boltzmann machine. rbm for collaborative filtering (hinton, 2008): it has been used in the netflix and spotify algos. it works better than svd! the predictions of svd and rbm are not perfectly correlated (-1 < c < 1), so rbm can be ensembled with svd to improve the prediction.
  • 61. some advice for applied machine learning research (this competition): preprocessing (scaling & imputation); cross-validation (choose the best algos); hyperparameter optimization; ensembling n models (dark knowledge)
  • 62. data preprocessing: scaling & imputation. raschka (2014): scaling improves prediction! gelman (2006): predict the n/a values first, with noise added to the predictions, so the result is less biased!
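
a minimal sketch of the two preprocessing steps named above, under loud assumptions: the imputation here is plain mean imputation, which is simpler than the prediction-with-noise approach the slide attributes to gelman, and the column `ages` is made-up data. in practice the mean and standard deviation should be computed on the training split only.

```python
import math

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0.0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

ages = [20.0, None, 40.0, 60.0]   # hypothetical feature with one missing value
filled = impute_mean(ages)        # missing entry becomes the observed mean, 40.0
scaled = standardize(filled)
print(filled)
print(scaled)
```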
  • 63. cross-validation: how to choose the best algo? cross-validation is a must! (tibshirani et al., 2014). don’t overlap your cross-validation data partitions! (zhang, data robot)
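
one way to keep cross-validation partitions from overlapping is to shuffle the sample indices once and deal them into k disjoint folds. the sketch below assumes hypothetical `fit` and `score` callables supplied by the caller; the toy "model" (the training mean) and the data `ys` are made up for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle range(n_samples) and deal it into k disjoint validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, fit, score, seed=0):
    """Average `score(fit(train_idx), val_idx)` over k held-out folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # train on every fold except the one held out for validation
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score(fit(train_idx), val_idx))
    return sum(scores) / k

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical targets

def fit_mean(train_idx):
    # toy model: always predict the mean of the training targets
    return sum(ys[j] for j in train_idx) / len(train_idx)

def neg_mse(model, val_idx):
    # higher is better, so return the negated mean squared error
    return -sum((ys[j] - model) ** 2 for j in val_idx) / len(val_idx)

print(cross_validate(len(ys), 3, fit_mean, neg_mse))
```

to compare algorithms, run `cross_validate` with the same `seed` for each candidate so that every candidate sees identical folds.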
  • 64. hyperparameter optimization: if you want to search for the best hyperparameters, do random search. random search is better than grid search (bengio, 2012)
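
a minimal random-search sketch. the usual intuition is that when only a few hyperparameters matter, random sampling tries a new value of each one on every trial, whereas a 3 x 3 grid spends 9 trials on only 3 distinct values per hyperparameter. the objective, the search space and the names `lr` and `reg` below are hypothetical.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameters uniformly from `space` and keep the best score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params):
    # hypothetical validation score: `lr` matters a lot, `reg` barely at all
    return -(params["lr"] - 0.3) ** 2 - 0.001 * params["reg"]

space = {"lr": (0.0, 1.0), "reg": (0.0, 1.0)}
best_params, best_score = random_search(toy_objective, space)
print(best_params, best_score)
```

in a real run, `objective` would train a model with the sampled hyperparameters and return its cross-validated score.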
  • 65. ensembling n models: dark knowledge. if two models give the same accuracy but their prediction outputs have low correlation, then we can improve prediction accuracy by averaging the models' predictions. (Hinton, 2015)
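
the averaging claim above can be checked on a toy regression example. the two prediction lists are made up so that both models have the same error but make their mistakes on different samples; this shows only the averaging step, not the distillation part of hinton's dark knowledge work.

```python
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def average_ensemble(*model_predictions):
    """Average the predictions of several models, element by element."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]

targets = [1.0, 2.0, 3.0, 4.0]
# two hypothetical models: equal error, but errors on different samples
model_a = [1.5, 2.0, 2.5, 4.0]
model_b = [1.0, 2.5, 3.0, 3.5]
blended = average_ensemble(model_a, model_b)

print(mse(model_a, targets), mse(model_b, targets), mse(blended, targets))
# prints: 0.125 0.125 0.0625
```

if both models made identical errors (perfectly correlated predictions), averaging would leave the error unchanged, which is why low correlation is the condition that matters.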
  • 66. the landscape of opportunities
  • 68. Popular Big Data Industry (source: cisco, 2013-2014)
      • industries: Financial Services; Telco; Web/Media; Retail; Healthcare; Government
      • use cases: Fraud detection; Compliance reporting; Portfolio analysis; Customer statements; Wire transfer alerts; Customer acquisition, retention, and profitability; Subscriber data management; Fraud analysis; Social analysis; Response times; Traffic analysis; Product affinity/bundling; Sentiment Analysis; Content monetization; Advertising optimization; Optimization of user experience/click stream analysis; Network optimization to support service levels; Store operation analysis; Customer loyalty programs; Collaborative planning and forecasting; Loss prevention; Supply chain optimization; Drug development and launch cost reduction; Regulatory compliance; Product quality; Return on promotional investment; Lowered risk of new product success; Security/anti-terror; Recovery Act public disclosure; Budgetary control and management; Educational reporting; Asset control and assessment; Environment monitoring
  • 71. currently the biggest prescriptive analytics engine: contextual advertising http://www.flashtalking.com/us/targeted-ads/
  • 72. another one: marketplace and services recommendation engine
  • 73. challenges of implementation and what we do with machine learning
  • 74. did you follow waze's instructions during your first week?
  • 75. would you buy a self-driving car that couldn’t drive itself in 99 percent of the country? or that knew nearly nothing about parking, couldn’t be taken out in snow or heavy rain, and would drive straight over a gaping pothole? if your answer is yes, then check out the google self-driving car, model year 2014
  • 76. but
  • 77. can we trust them enough?
  • 78. the BIGGEST CHALLENGES in indonesia
  • 80. the current analytics technology: humans are still doing most of the process
  • 81. the current challenges of big data analytics
      • heterogeneous data sources, systems and formats
      • time-consuming and complex data preparation process
      • almost impossible task of integrating various kinds of data
      • it requires experts to analyze big and complex data
      • most of the user interactions are not intuitive
      • “Before performing analytics, data scientists must first format and prepare the raw data for analytics, often with more than 80% of the effort.” (Intel Corp. Research)
  • 82. what would it be like if we could simplify the whole process?
  • 83. hence our vision: we believe humans should not be bogged down by tedious matters. by reimagining analytics, we envision the creation of intelligent machines that will free humans to focus on solving the world’s toughest problems.
  • 84. intelligent machines that can help us collect the massive amount of data: automatically read and connect to any kind of data, including automatic machine-to-machine connections (structured data, printed invoices, social media conversations)
  • 86. then help us separate the signals from the noise: automatic data quality assessments, data cleansing and data filtering (figure labels: regi, mita, gundam, x-men)
  • 87. then help us separate the signals from the noise: automatic data quality assessments, data cleansing and data filtering (figure labels: regi, mita, gundam)
  • 88. complete the information and connect it all in a meaningful way: automatic data transformation, entity extraction, contextual profiling (figure labels: regi, mita, gundam)
  • 89. complete the information and connect it all in a meaningful way: automatic data transformation, entity extraction, contextual profiling (figure labels: regi, mita, gundam, batman, tom, mediatrac)
  • 91. and finally help us make sense of the massively connected data: contextual search and recommendation, intelligent data discovery (figure labels: gundam, batman, sith)
  • 92. and finally help us make sense of the massively connected data: contextual search and recommendation, intelligent data discovery (figure labels: regi, mita, gundam, batman, tom, mediatrac; gundam, batman, sith)
  • 93. through a highly intuitive and natural user interface: natural language interface, voice and gesture recognition. example query: “how many restaurants sell soto along jalan senopati?”
  • 96. raw data (multi-format, structured, unstructured, unclean, missing data, unstandardized, unconnected, difficult to analyze) → automatic data collection, automatic data preparation, automatic data integration → analytics-ready data (cleaned and standardized, enriched and validated, connected at granular level)
  • 100. all of our siloed data will have a greatly elevated value once we connect it all in a meaningful way
  • 101. are all of our current data connected yet?
  • 103. google is a humongous library index, with a smart library card search that redirects you to the original documents
  • 104. facebook is a giant personal scrapbook of all your acquaintances, who are currently linked by manual tagging and friends lists (source: techglimpse)
  • 105. youtube and instagram are huge repositories of current knowledge, lifestyle and trends that are still largely unconnected
  • 107. when we can have intelligent machines that can connect everything in a meaningful way… we can start asking questions about things we never thought possible to ask before
  • 108. Spotify can map songs across social graphs. Shazam can give us situational data: where someone is listening to a song, when, how and even (to an extent) why. YouTube can help us track the growth of a song using search and streams. Instagram & Vine are becoming hotbeds for music discovery. what if we can connect all their data together?
  • 109. or, if you have a radio station: what sort of playlist will appeal to your target audience, if we know that a sizeable percentage of them own a hummer?
  • 110. we can even predict the specific combination of words, notes and beats that will increase the chance of putting a song in the billboard top 40 this upcoming season.
  • 111. here are some samples of hidden insights that we can discover from our own large repository of data, using our intelligent data integration and data discovery tools
  • 112. when we integrate historical media articles with geodemographic and point-of-interest databases, we can create a model that predicts areas with a high probability of fire incidents, down to street level
  • 120. lessons learned including how to scale your ML
  • 153. scalability problems - outline
      • large scale machine learning: mahout (scalable ml on hadoop); jubatus (distributed online real-time ml); vowpal wabbit (fast learning at yahoo/ms); trident ml and storm pattern (ml on storm, yarn); upcoming: samoa (ml on s4, storm)
      • issues in scalable distributed ml: load balancing; auto scaling; job scheduling; workflow management; data and model parallelism; parameter server framework; peer-to-peer framework
  • 154. scalability problems - outline
      • distributed deep learning: yahoo lda (scalable parallel framework in latent variable models); distbelief (distributed deep learning on cluster); h2o (distributed deep learning on spark); adam at msr (distributed deep learning); dl4j (open source for deep learning on hadoop and spark); petuum (distributed machine learning); singa (distributed deep learning); tensorflow (google large scale distributed dl); mxnet (heterogeneous distributed deep learning); caffe on spark (yahoo)
      • distributed learning and optimization: proximal splitting/auxiliary coordinates; bundle (sub-gradient); shotgun: parallelized cdm (coordinate descent method); asynchronous sgd; hogwild/dogwild
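
the "data parallelism" and "parameter server" items in the outline can be sketched in a few lines. this is a toy, synchronous, single-process simulation, not any of the frameworks listed: each "worker" computes a gradient on its own data shard, and a central "server" averages the gradients and updates the shared weight. asynchronous schemes such as hogwild drop the synchronization step, which this sketch does not model. the data, the 1-d model y = w * x, and all names are assumptions for illustration.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def parameter_server_sgd(shards, steps=100, lr=0.05):
    """Synchronous data-parallel gradient descent with a central weight."""
    w = 0.0
    for _ in range(steps):
        # each worker's gradient would be computed in parallel on a real cluster
        worker_grads = [gradient(w, shard) for shard in shards]
        # the server averages the workers' gradients and applies one update
        w -= lr * sum(worker_grads) / len(worker_grads)
    return w

# hypothetical data generated by y = 3x, split across two workers
data = [(float(x), 3.0 * x) for x in range(1, 5)]
shards = [data[:2], data[2:]]
w = parameter_server_sgd(shards)
print(w)
```

the learned weight should approach 3, the slope used to generate the data.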
  • 159. emerging analytics technology for automatic analytics on large dimensional data: online deep learning; topological data analysis; fuzzy-rough set based data exploration system; granular computing; kernel set and spatiotemporal analysis; applied differential geometry; non axiomatic reasoning system; intelligent rule and knowledge extraction/discovery; multi agent based modeling; weak signal detection and analysis; bayesian networks analysis; genetic programming; self organizing neural networks
  • 160. and also more humanlike user interaction and data visualization technology: eye tracking; glasses-free autostereoscopy; touch sensitive hologram; natural language user interface; tangible user interface; wearable gestural interface; brain-computer interface; sensor network user interface
  • 162. principles for the development of a complete mind: study the science of art. study the art of science. develop your senses, especially learn how to see. realize that everything connects to everything else. (Leonardo da Vinci)