Our Summer 2017 release introduces Deepnets, an optimized implementation of deep neural networks: a supervised learning method for classification and regression that can match or exceed human performance, especially in domains where effective feature engineering is difficult. BigML Deepnets bring two unique parameter optimization options, Automatic Network Search and Structure Suggestion, which avoid the difficult, time-consuming work of hand-tuning the algorithm and help find a network well suited to your problem. The new resource is available from the BigML Dashboard and API, as well as from WhizzML for automation. Deepnets are state-of-the-art in many important supervised learning applications.
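As a minimal sketch of invoking the two options through the BigML Python bindings (the dataset ID and input field are hypothetical; `search` and `suggest_structure` are the deepnet creation arguments as documented in BigML's API):

```python
# Sketch with the BigML Python bindings (pip install bigml). Assumes
# BIGML_USERNAME and BIGML_API_KEY are set in the environment.
from bigml.api import BigML

api = BigML()
dataset = "dataset/59c0b1b4eba31d5e56000000"  # hypothetical dataset ID

# Automatic Network Search: train and evaluate many candidate networks,
# keeping the best one found.
deepnet = api.create_deepnet(dataset, {"search": True})

# Structure Suggestion: a fast, metalearned guess at a good topology.
# deepnet = api.create_deepnet(dataset, {"suggest_structure": True})

api.ok(deepnet)  # wait until training finishes
prediction = api.create_prediction(deepnet, {"field1": 42})  # hypothetical field
```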
Summer 2017 Release
Speaker: CHARLES PARKER, PH.D. - VP of Machine Learning Algorithms
Moderator: ATAKAN CETINSOY - VP of Predictive Applications
Resources: https://bigml.com/releases/summer-2017
Questions: Please enter questions into the chat box. We will answer some via chat and others at the end of the session. Contact support@bigml.com or Twitter @bigmlcom.
Power To The People!
• Why another supervised learning algorithm?
• Deep neural networks have been shown to be state-of-the-art in several niche applications:
  • Vision
  • Speech recognition
  • NLP
• While powerful, these networks have historically been difficult for novices to train
Goals of BigML Deepnets
• What BigML Deepnets are not (yet):
  • Convolutional networks
  • Recurrent networks (e.g., LSTM networks)
  • These solve particular types of sub-problems and are carefully engineered by experts to do so
• Can we bring some of the power of deep neural networks to your problem, even if you have no deep learning expertise?
• Let's try to separate deep neural network myths from realities
Myth #1
Deep neural networks are the next step in evolution, destined to perfect humanity or destroy it utterly.
Some Weaknesses
• Trees
  • Pro: Massive representational power that expands as the data gets larger; efficient search through this space
  • Con: Difficult to represent smooth functions and functions of many variables
  • Ensembles mitigate some of these difficulties
• Logistic Regression
  • Pro: Some smooth, multivariate functions are not a problem; fast optimization
  • Con: Parametric - if the decision boundary is nonlinear, tough luck
• Can these weaknesses be mitigated? (A quick illustration of the logistic con follows.)
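To make the logistic con concrete, here is a toy scikit-learn comparison (an illustration, not from the talk): a nonlinear decision boundary defeats a linear parametric model but not a tree.

```python
# Nonlinear boundary: logistic regression vs. a shallow decision tree.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

for name, model in [("logistic", LogisticRegression()),
                    ("tree", DecisionTreeClassifier(max_depth=5))]:
    score = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {score:.3f}")  # the tree wins on this curved boundary
```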
Logistic Level Up
[Figure sequence: logistic regression drawn as a network, with input nodes connected by weights w_i to an output node computing class "a" = logistic(w, b); a hidden layer of logistic(w, b) nodes is then inserted between the inputs and the output; the construction generalizes to n hidden nodes per layer and to n hidden layers.]
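A minimal numpy sketch of that construction (sizes and weights are illustrative):

```python
# Logistic regression is a one-layer network; inserting a hidden layer of
# logistic units "levels it up", and stacking n such layers gives a deepnet.
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=5)  # five input features

# Plain logistic regression: class "a" = logistic(w . x + b)
w, b = rng.normal(size=5), 0.1
p_a = logistic(w @ x + b)

# One hidden layer of 4 logistic nodes feeding the same output unit.
W1, b1 = rng.normal(size=(4, 5)), np.zeros(4)   # inputs -> hidden
w2, b2 = rng.normal(size=4), 0.0                # hidden -> output
hidden = logistic(W1 @ x + b1)
p_a_deep = logistic(w2 @ hidden + b2)
```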
Myth #2
Deep neural networks are great for the established marquee applications, but less interesting for general use.
Parameter Paralysis
Parameter Name                  | Possible Values
--------------------------------|----------------------------------------
Descent Algorithm               | Adam, RMSProp, Adagrad, Momentum, FTRL
Number of hidden layers         | 0 - 32
Activation Function (per layer) | relu, tanh, sigmoid, softplus, etc.
Number of nodes (per layer)     | 1 - 8192
Learning Rate                   | 0 - 1
Dropout Rate                    | 0 - 1
Batch size                      | 1 - 1024
Batch Normalization             | True, False
Learn Residuals                 | True, False
Missing Numerics                | True, False
Objective weights               | Weight per class

. . . and that's ignoring the parameters that are specific to the descent algorithm.
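To see what hand-tuning looks like, here is a sketch of setting a few of these knobs explicitly via the Python bindings; the field names follow BigML's deepnet creation arguments, the dataset ID is hypothetical, and the values are illustrative, not recommendations.

```python
# Hand-tuned deepnet: every one of these choices is on you.
from bigml.api import BigML

api = BigML()
deepnet = api.create_deepnet("dataset/59c0b1b4eba31d5e56000000", {
    "hidden_layers": [
        {"number_of_nodes": 64, "activation_function": "relu"},
        {"number_of_nodes": 32, "activation_function": "tanh"},
    ],
    "learning_rate": 0.01,
    "dropout_rate": 0.2,
    "batch_normalization": False,
    "learn_residuals": False,
    "missing_numerics": True,
})
api.ok(deepnet)
```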
What Can We Do?
• Clearly there are too many parameters to fuss with
• Setting them takes significant expert knowledge
• Solution: Metalearning (a good initial guess)
• Solution: Network search (try a bunch)
Metalearning
We trained 296,748 deep neural networks so you don't have to!
Bayesian Parameter Optimization
[Figure sequence: six candidate network structures queue up to be modeled and evaluated; evaluation results come back one at a time (scores 0.75, 0.48, 0.91, ...), and each result informs which structure is tried next.]
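The idea can be sketched in a few lines: fit a surrogate regressor (here a toy RBF-kernel Gaussian process) to the structure-to-score results seen so far, then pick the next structure by expected improvement. This is only an illustration of the general technique, not BigML's implementation; the one-dimensional "nodes" knob and the synthetic evaluate function are stand-ins.

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(X_train, y_train, X_test, length=8.0, noise=1e-3):
    """RBF-kernel GP regression; posterior mean and std at X_test."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K_inv = np.linalg.inv(k(X_train, X_train) + noise * np.eye(len(X_train)))
    K_s = k(X_train, X_test)
    mu = K_s.T @ K_inv @ y_train
    var = 1.0 - np.sum(K_s * (K_inv @ K_s), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def evaluate(nodes):
    """Stand-in for 'train a network with this many nodes, evaluate it'."""
    return 0.9 - 0.002 * (nodes - 40) ** 2 / 40 + 0.01 * np.random.randn()

np.random.seed(0)
candidates = np.arange(1, 129, dtype=float)               # 1..128 nodes
tried = list(np.random.choice(candidates, 3, replace=False))
scores = [evaluate(n) for n in tried]

for _ in range(10):
    mu, sd = gp_posterior(np.array(tried), np.array(scores), candidates)
    best = max(scores)
    z = (mu - best) / sd
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)     # expected improvement
    nxt = candidates[np.argmax(ei)]                       # most promising next try
    tried.append(nxt)
    scores.append(evaluate(nxt))

print(f"best structure found: {tried[int(np.argmax(scores))]:.0f} nodes")
```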
Benchmarking
• The ML world is filled with crummy benchmarks:
  • Not enough datasets
  • No cross-validation
  • Only one metric
• Solution: Roll our own
  • 50+ datasets, 5 replications of 10-fold CV
  • 10 different metrics
  • 30+ competing algorithms (R, scikit-learn, weka, xgboost)
http://www.clparker.org/ml_benchmark/
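For reference, here is what that evaluation protocol looks like in scikit-learn terms: 5 replications of 10-fold cross-validation, scored on several metrics at once (the dataset and model are stand-ins for the benchmark's 50+ datasets and 30+ algorithms).

```python
# 5 x 10-fold CV with multiple metrics per fold.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)

results = cross_validate(
    RandomForestClassifier(random_state=0), X, y, cv=cv,
    scoring=["accuracy", "balanced_accuracy", "roc_auc", "f1"],
)
for metric in ["accuracy", "balanced_accuracy", "roc_auc", "f1"]:
    print(metric, results[f"test_{metric}"].mean())  # mean over 50 folds
```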
Myth #3
Deep neural networks have such spectacular performance that all other supervised learning techniques are now irrelevant.
Caveat Emptor
• Things that make deep learning less useful:
  • Small data (where that could still be thousands of instances)
  • Problems where you could benefit by iterating quickly (better features always beat better models)
  • Problems that are easy, or for which top-of-the-line performance isn't absolutely critical
• Remember: deep learning is just another sort of supervised learning algorithm

"…deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers." — Stuart Russell
Learn about Deepnets: https://bigml.com/releases/summer-2017