Harnessing Neural Networks
Corinna Cortes
Google Research, NY
Harnessing the Power of Neural Networks
Introduction
How do we standardize the output?
How do we speed up inference?
How do we automatically find a good network architecture?
Google’s mission is to organize the world’s information
and make it universally accessible and useful.
Google Translate
Smart Reply in Inbox: 10% of all responses sent on mobile
LSTM in Action
LSTMs and Extrapolation
They daydream or hallucinate :-)
Feature or bug?
DeepDream Art Auction and Symposium (A&MI)
Magenta
[Figure: recurrent cell A with input x_t and output h_t]
A.I. Duet
https://aiexperiments.withgoogle.com/ai-duet/view/
How do we standardize the output?
Restricting the Output. Smart Replies.
http://www.kdd.org/kdd2016/papers/files/Paper_1069.pdf
● Ungrammatical and inappropriate answers
○ thanks hon!; Yup, got it thx; Leave me alone!
● Work with a Fixed Response Set
○ Sanitized answers are clustered into groups of semantically similar answers
using label propagation;
○ The answers in the clusters are used to filter the candidate set generated
by the LSTM. Diversity is ensured by using top answers from different
clusters.
● Efficient search via tries
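A minimal sketch of the filtering and diversity step above, assuming the LSTM yields one score per candidate and every response in the fixed set carries a cluster label. The function name `pick_diverse`, the scores, and the cluster names are hypothetical and not taken from the Smart Reply paper.

```python
def pick_diverse(scored, cluster_of, k=3):
    """Return up to k responses, best first, with at most one per cluster.

    scored:     candidate text -> model score (higher is better); hypothetical values
    cluster_of: whitelisted response -> cluster label; anything missing is filtered out
    """
    chosen, used = [], set()
    for response, _ in sorted(scored.items(), key=lambda kv: kv[1], reverse=True):
        if response not in cluster_of:
            continue  # not in the fixed response set
        cluster = cluster_of[response]
        if cluster in used:
            continue  # a better answer from this cluster was already picked
        chosen.append(response)
        used.add(cluster)
        if len(chosen) == k:
            break
    return chosen


scores = {
    "thanks hon!": -0.5,
    "I can do Tuesday.": -1.0,
    "Tuesday works!": -1.2,
    "What time works for you?": -1.5,
    "How about Wednesday?": -2.0,
}
clusters = {
    "I can do Tuesday.": "accept",
    "Tuesday works!": "accept",
    "What time works for you?": "ask_time",
    "How about Wednesday?": "propose",
}
suggestions = pick_diverse(scores, clusters, k=3)
```

Here the highest-scoring candidate is dropped because it is not in the fixed set, and the second "accept" answer is skipped in favor of answers from other clusters.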
Search Tree (Trie) for Valid Responses
[Figure: trie over valid responses. Node labels: "I can do", "How about", "Tuesday", "Wednesday", "Tuesday?", "Wednesday?", ".", "!", "What time works for you?". Annotation: "Cluster responses".]
Computational Complexity
● Exhaustive: R x l
R size of response set, l length of longest sentence
● Beam search: b x l
Typical size of R ~ millions, typical size of b ~ 10-30
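The b x l cost of beam search over a trie can be illustrated with a small sketch. It assumes a word-level trie and a per-word scoring function; `toy_score` is a hypothetical stand-in for the LSTM, and the response list is made up for illustration.

```python
def build_trie(responses):
    """Nested-dict trie over word tokens; the key None marks a complete response."""
    root = {}
    for text in responses:
        node = root
        for word in text.split():
            node = node.setdefault(word, {})
        node[None] = True
    return root


def beam_search(trie, score_word, beam_width=2, max_len=10):
    """Keep the beam_width best prefixes per step; only trie edges are expanded."""
    beam = [(0.0, [], trie)]  # (log-score, words so far, trie node)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, words, node in beam:
            for word, child in node.items():
                if word is None:
                    continue
                candidates.append((logp + score_word(words, word), words + [word], child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        finished.extend((lp, " ".join(ws)) for lp, ws, nd in beam if None in nd)
    return sorted(finished, reverse=True)


def toy_score(prefix, word):
    """Hypothetical per-word log-scores; a real system would query the LSTM here."""
    table = {"I": -0.1, "can": -0.1, "do": -0.1, "Tuesday.": -0.2, "Wednesday.": -0.9,
             "How": -1.5, "about": -0.1, "Tuesday?": -0.3, "Wednesday?": -0.4}
    return table.get(word, -3.0)


RESPONSES = ["I can do Tuesday.", "I can do Wednesday.", "How about Tuesday?",
             "How about Wednesday?", "What time works for you?"]
results = beam_search(build_trie(RESPONSES), toy_score, beam_width=2)
```

Each step scores at most b prefixes times their trie children, so the work grows with b and l rather than with the size R of the response set, and every returned string is a member of the set by construction.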
What if the Response Set Is in the Billions?
Rules for Response Set
● A more elegant solution based on rules
○ Exploit rules to efficiently enlarge the response set:
■ “Can you do Monday?” “Yes, I can do Monday”
■ “Can you do Tuesday?” “Yes, I can do Tuesday”
■ ...
“Can you do <time>?”
“Yes, I can do <time>” or “No, I can do <time + 1>”
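The template rule above can be sketched as a single pattern that covers every weekday without enumerating each response. This is an illustration only: it assumes <time> is a weekday name and reads <time + 1> as the following day; `rule_responses` and `DAYS` are hypothetical names.

```python
import re

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
PATTERN = re.compile(r"^Can you do (?P<time>\w+)\?$")


def rule_responses(message):
    """Instantiate the <time> template; return [] when the rule does not apply."""
    match = PATTERN.match(message)
    if not match or match.group("time") not in DAYS:
        return []
    day = match.group("time")
    next_day = DAYS[(DAYS.index(day) + 1) % len(DAYS)]  # assumed meaning of <time + 1>
    return [f"Yes, I can do {day}", f"No, I can do {next_day}"]


monday = rule_responses("Can you do Monday?")
```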
Text Normalization for Text-to-Speech (TTS) Systems
Navigation assistant
Text Normalization
Richard Sproat, Navdeep Jaitly, Google: “RNN Approaches to Text Normalization: A Challenge”
https://arxiv.org/pdf/1611.00068.pdf
Break the Task in Two
● Channel model
○ What are the possible normalizations of a given token? Sequence of tokens to words.
○ Example: 123
■ one hundred twenty three, one two three, one twenty three, ...
● Language model
○ Which one is appropriate in the given context? Words to words.
○ Example: 123
■ 123 King Ave. - the correct reading in American English would
normally be one twenty three.
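The two-model split can be illustrated with a toy example: a channel model proposes readings for a three-digit token, and a language model picks one from context. Both functions are hand-written stand-ins (assumptions for illustration), not the RNN models from the paper, and the street-word list is hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
STREET_WORDS = {"Ave.", "St.", "Rd.", "Blvd."}  # hypothetical context cue


def two_digits(n):
    """Words for 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def channel_model(token):
    """Candidate readings for a three-digit token, keyed by reading style."""
    if not (token.isdigit() and len(token) == 3 and token[0] != "0"):
        raise ValueError("toy channel model only handles tokens like '123'")
    hundreds, rest = divmod(int(token), 100)
    cardinal = f"{ONES[hundreds]} hundred" + (f" {two_digits(rest)}" if rest else "")
    return {
        "cardinal": cardinal,
        "digits": " ".join(ONES[int(d)] for d in token),
        "address": f"{ONES[hundreds]} {two_digits(rest)}",
    }


def language_model(style, following_words):
    """Toy context score: prefer the address reading right before a street name."""
    near_street = any(w in STREET_WORDS for w in following_words[:3])
    preferred = "address" if near_street else "cardinal"
    return 1.0 if style == preferred else 0.0


def normalize(token, following_words):
    """Choose the channel-model candidate that the language model scores highest."""
    candidates = channel_model(token)
    best = max(candidates, key=lambda style: language_model(style, following_words))
    return candidates[best]


address_reading = normalize("123", ["King", "Ave."])
```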
Combining the Models
One combined LSTM
Silly Mistakes
Add a Grammar to Constrain the Output
Rule: <number> + <measurement abbreviation> => <number> + the possible
verbalizations of the measure abbreviation.
Instantiation: 24.2kg => twenty four point two kilogram, twenty four point two
kilograms, twenty four point two kilo.
Finite State Transducers: finite state automata that produce output as well as
read input; closely related to pattern matching and regular expressions.
Thrax Grammar
MEASURE: <number> + <measurement abbreviation> -> <number> +
measurement verbalizations
Input: 5 kg -> five kilo/kilograms/kilogram
MONEY: $ <number> -> <number> dollars
Input composed with FSTs. The output of the FST is used to restrict the output of
the LSTM.
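A rough sketch of how a grammar can restrict the model's output. It uses plain regular expressions in place of compiled Thrax FSTs, which is a simplification; the function names, the tiny number verbalizer (numbers below 100 only), and the candidate scores are all assumptions for illustration.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
MEASURES = {"kg": ["kilogram", "kilograms", "kilo"]}  # verbalizations listed on the slide


def number_words(text):
    """Verbalize a number below 100; decimals are read digit by digit."""
    whole, _, frac = text.partition(".")
    n = int(whole)
    tens, ones = divmod(n, 10)
    words = ONES[n] if n < 20 else TENS[tens] + (f" {ONES[ones]}" if ones else "")
    if frac:
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def allowed_outputs(token):
    """Verbalizations permitted by the MEASURE and MONEY rules, or None if neither applies."""
    measure = re.fullmatch(r"(\d{1,2}(?:\.\d+)?)\s?([a-z]+)", token)
    if measure and measure.group(2) in MEASURES:
        number = number_words(measure.group(1))
        return {f"{number} {unit}" for unit in MEASURES[measure.group(2)]}
    money = re.fullmatch(r"\$(\d{1,2})", token)
    if money:
        return {f"{number_words(money.group(1))} dollars"}
    return None


def constrained_choice(token, scored_candidates):
    """Best-scoring model candidate among those the grammar allows."""
    allowed = allowed_outputs(token)
    pool = {c: s for c, s in scored_candidates.items() if allowed is None or c in allowed}
    return max(pool, key=pool.get) if pool else None


choice = constrained_choice("24.2kg", {
    "twenty four point two gigabytes": 0.5,   # hypothetical silly mistake
    "twenty four point two kilograms": 0.4,
})
```

The higher-scoring but wrong reading is rejected because the grammar does not license it for the token "24.2kg".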
TTS: RNN + FST
Measure and Money
restricted by grammar.
How do we speed up inference?
Super-Multiclass Classification Problem
One class per image type (horse, car, …), M classes.
Neural network inference: computing the last layer alone requires MN
multiply-adds.
[Figure: last hidden layer with N units, fully connected to an output layer with M units]
Asymmetric Hashing
Weights to the output layer, W1, W2, W3, …, WM, each partitioned into N/k chunks
● Represent each chunk with a set of cluster centers (256) using k-means.
● Save the coordinates of the centers, (ID, coordinates).
● Save each weight vector as the IDs of its closest centers, one per chunk:
its hash code.
Example hash codes (one row per weight vector, one center ID per chunk):
78 184 15
12 63 192
56 82 72
201 37 51
Asymmetric Hashing, Searching
● For a given activation u, divide it into its N/k chunks u_j:
○ Compute the 256 · N/k distances to the centers: 256N multiply-adds, not MN.
○ Compute the distances to all hash codes: MN/k additions.
● The “Asymmetric” in “Asymmetric Hashing” refers to the fact that we hash the
weight vectors but not the activation vector.
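The encode-and-search procedure above can be sketched in pure Python. Several choices here are assumptions: the sketch scores with inner products (what the output layer computes) where the slide says "distances", it uses 4 centers per chunk instead of 256 to keep the toy small, and all names and sizes are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def chunks(vec, k):
    """Split a vector into consecutive pieces of length k."""
    return [vec[i:i + k] for i in range(0, len(vec), k)]


def nearest(p, centers):
    """Index of the center closest to p in squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda i: sum((x - c) ** 2 for x, c in zip(p, centers[i])))


def kmeans(points, n_centers, iters=10, seed=0):
    """Plain Lloyd iterations; returns a list of centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_centers)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if its cluster is empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def train(weights, k, n_centers):
    """One codebook per chunk position, plus one hash code per weight vector."""
    split = [chunks(w, k) for w in weights]
    codebooks = [kmeans([s[j] for s in split], n_centers) for j in range(len(split[0]))]
    codes = [[nearest(s[j], codebooks[j]) for j in range(len(codebooks))] for s in split]
    return codebooks, codes


def search(u, codebooks, codes, k):
    """Score every class with table lookups; u itself is never hashed."""
    u_chunks = chunks(u, k)
    # n_centers * N multiply-adds: every center against its chunk of u.
    tables = [[dot(c, u_chunks[j]) for c in book] for j, book in enumerate(codebooks)]
    # M * N/k additions: one table lookup per chunk per class.
    scores = [sum(tables[j][cid] for j, cid in enumerate(code)) for code in codes]
    return max(range(len(scores)), key=scores.__getitem__), scores


rng = random.Random(1)
M, N, K, CENTERS = 50, 8, 2, 4
W = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(M)]
u = [rng.gauss(0, 1) for _ in range(N)]
codebooks, codes = train(W, K, CENTERS)
best, approx = search(u, codebooks, codes, K)
```

With hypothetical sizes M = 1,000,000, N = 1,024, and k = 8, the exact last layer costs MN ≈ 1.02 × 10^9 multiply-adds, while the hashed version costs 256N = 262,144 multiply-adds plus MN/k = 1.28 × 10^8 additions.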
Asymmetric Hashing
Incredible saving in inference time
Sometimes also with a bit of improved accuracy
How do we automatically find a good network architecture?
“Learning to Learn”, a.k.a. “Automated Hyperparameter Tuning”
Google: AdaNet; Neural Architecture Search with Reinforcement Learning
MIT: Designing Neural Network Architectures Using Reinforcement Learning
Harvard, Toronto, MIT, Intel: Scalable Bayesian Optimization Using Deep Neural
Networks
Approaches: genetic algorithms, reinforcement learning, boosting algorithms
Modeling Challenges for ML
The right model choice can significantly improve performance. For deep
learning it is particularly hard, as the search space is huge and we face:
● Difficult non-convex optimization
● Lack of sufficient theory
Questions
● Can neural network architectures be learned
together with their weights?
● Can this problem be solved efficiently and in a
principled way?
● Can we capture the end-to-end process?
AdaNet
● Incremental construction: At each round, the algorithm adds a subnetwork to
the existing neural network;
● The algorithm leverages previously learned embeddings;
● Adaptively grows network, balancing trade-off between empirical error and
model complexity;
● Learning bound:
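The idea of growing a model round by round while trading empirical error against complexity can be illustrated with a toy greedy ensemble. This is not the AdaNet algorithm or its objective: the candidates are simple functions instead of subnetworks, and the penalty form, `lam`, and the complexity values are assumptions made for this sketch.

```python
def objective(preds, targets, penalty):
    """Mean squared error plus the accumulated complexity penalty."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    return mse + penalty


def grow(candidates, xs, ys, lam=0.01, max_rounds=5):
    """Greedily add (name, weight) pairs while the penalized objective improves.

    candidates: list of (name, function, complexity) triples (hypothetical units).
    """
    ensemble, preds, penalty = [], [0.0] * len(xs), 0.0
    best = objective(preds, ys, penalty)
    for _ in range(max_rounds):
        trial = None
        for name, fn, complexity in candidates:
            outs = [fn(x) for x in xs]
            denom = sum(o * o for o in outs)
            if denom == 0:
                continue
            # Least-squares weight for fitting the current residual with this candidate.
            w = sum(o * (y - p) for o, y, p in zip(outs, ys, preds)) / denom
            new_preds = [p + w * o for p, o in zip(preds, outs)]
            new_penalty = penalty + lam * complexity * abs(w)
            value = objective(new_preds, ys, new_penalty)
            if trial is None or value < trial[0]:
                trial = (value, name, w, new_preds, new_penalty)
        if trial is None or trial[0] >= best:
            break  # no candidate improves the penalized objective
        best, name, w, preds, penalty = trial
        ensemble.append((name, w))
    return ensemble, best


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
candidates = [
    ("constant", lambda x: 1.0, 1),
    ("linear", lambda x: x, 2),
    ("cubic", lambda x: x ** 3, 8),
]
initial = objective([0.0] * len(xs), ys, 0.0)
ensemble, final = grow(candidates, xs, ys)
```

A larger `lam` makes complex candidates more expensive, so the ensemble stops growing earlier.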
Experimental Results, AdaNet
CIFAR-10: 60,000 images, 10 classes
Standard deviation of all numbers: 0.01

Label pair          AdaNet   Log. Reg.   NN
deer-truck          0.94     0.90        0.92
deer-horse          0.84     0.77        0.81
automobile-truck    0.85     0.80        0.81
cat-dog             0.69     0.67        0.66
dog-horse           0.84     0.80        0.81
Neural Architecture Search with RL
[Figures: error rates on CIFAR-10; perplexity on Penn Treebank]
Current accuracy of NAS on ImageNet: 78%
State of the art: 80.x%

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 

Corinna Cortes, Head of Research, Google, at MLconf NYC 2017

  • 1. Harnessing Neural Networks Corinna Cortes Google Research, NY
  • 2. Harnessing the Power of Neural Networks Introduction How do we standardize the output? How do we speed up inference? How do we automatically find a good network architecture?
  • 3. Google’s mission is to organize the world’s information and make it universally accessible and useful.
  • 5. Smart reply in Inbox 10% of all responses sent on mobile
  • 7. LSTMs and Extrapolation They daydream or hallucinate :-) Feature or bug?
  • 8. DeepDream Art Auction and Symposium (A&MI)
  • 10. Harnessing the Power of Neural Networks Introduction How do we standardize the output? How do we speed up inference? How do we automatically find a good network architecture?
  • 11. Restricting the Output. Smart Replies. http://www.kdd.org/kdd2016/papers/files/Paper_1069.pdf ● Ungrammatical and inappropriate answers ○ thanks hon!; Yup, got it thx; Leave me alone! ● Work with a Fixed Response Set ○ Sanitized answers are clustered into groups of semantically similar answers using label propagation; ○ The answers in the clusters are used to filter the candidate set generated by the LSTM. Diversity is ensured by using top answers from different clusters. ● Efficient search via tries
  • 12. Search Tree, Trie, for Valid Responses Tuesday Wednesday Tuesday? Wednesday? I can do Cluster responses How about . ! ! What time works for you? . What time works for you?
  • 13. Computational Complexity ● Exhaustive: R x l, where R is the size of the response set and l is the length of the longest sentence ● Beam search: b x l, where b is the beam width ● Typical size of R ~ millions, typical size of b ~ 10-30
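The trie and beam search from slides 12 and 13 can be sketched together as follows. This is a minimal illustration, not the Smart Reply implementation: `END`, `build_trie`, `beam_search`, and the hand-made `toy_log_prob` scorer (standing in for the LSTM) are all hypothetical names and values.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "<end>"  # hypothetical end-of-response marker stored in the trie


def build_trie(responses: List[List[str]]) -> Dict:
    """Build a nested-dict trie over tokenized responses."""
    root: Dict = {}
    for tokens in responses:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def beam_search(trie: Dict,
                log_prob: Callable[[Tuple[str, ...], str], float],
                beam_width: int,
                max_len: int) -> List[Tuple[float, Tuple[str, ...]]]:
    """Extend only prefixes that exist in the trie and keep the best
    `beam_width` partial responses at each step, so the work grows with
    b x l rather than with the number of responses R."""
    beam = [(0.0, (), trie)]
    finished = []
    for _ in range(max_len + 1):
        candidates = []
        for score, prefix, node in beam:
            for tok, child in node.items():
                if tok == END:
                    finished.append((score, prefix))  # a complete valid response
                else:
                    candidates.append(
                        (score + log_prob(prefix, tok), prefix + (tok,), child))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        if not beam:
            break
    return sorted(finished, key=lambda f: f[0], reverse=True)


def toy_log_prob(prefix: Tuple[str, ...], tok: str) -> float:
    """Stand-in for the LSTM: fixed, made-up per-token probabilities."""
    probs = {"I": 0.6, "can": 0.9, "do": 0.9, "Tuesday": 0.7,
             "How": 0.3, "about": 0.9, "Wednesday": 0.2}
    return math.log(probs.get(tok, 0.05))


RESPONSES = [
    "I can do Tuesday".split(),
    "I can do Wednesday".split(),
    "How about Tuesday ?".split(),
    "How about Wednesday ?".split(),
]
ranked = beam_search(build_trie(RESPONSES), toy_log_prob, beam_width=2, max_len=5)
best = " ".join(ranked[0][1])
print(best)
```

Because every extension is read off the trie, the search can only ever return a member of the fixed response set, whatever the scorer prefers.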
  • 14. What if the Response Set Is in the Billions? ● A more elegant solution based on rules ○ Exploit rules to efficiently enlarge the response set: ■ “Can you do Monday?” “Yes, I can do Monday” ■ “Can you do Tuesday?” “Yes, I can do Tuesday” ■ ... “Can you do <time>?” “Yes, I can do <time>” or “No, I can do <time + 1>”
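A rule of the form “Can you do <time>?” can be sketched as a template. The snippet below is only an illustration under stated assumptions: it treats <time> as a day of the week and <time + 1> as the following day, and `DAYS`, `QUESTION`, and `rule_responses` are hypothetical names.

```python
import re
from typing import List

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# One rule stands in for a whole family of stored question/response pairs.
QUESTION = re.compile(r"^Can you do (?P<time>\w+)\?$")


def rule_responses(message: str) -> List[str]:
    """Instantiate the response templates if the rule matches, else return []."""
    match = QUESTION.match(message)
    if not match or match.group("time") not in DAYS:
        return []
    day = match.group("time")
    # Assumption: "<time + 1>" means the next day, wrapping around the week.
    next_day = DAYS[(DAYS.index(day) + 1) % len(DAYS)]
    return [f"Yes, I can do {day}", f"No, I can do {next_day}"]


print(rule_responses("Can you do Monday?"))
```

One rule covers every value of the slot, so the response set grows without each instantiation being stored explicitly.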
  • 15. Rules for Response Set Text Normalization for Text-to-Speech, TTS, Systems Navigation assistant
  • 16. Text Normalization Richard Sproat, Navdeep Jaitly, Google: “RNN Approaches to Text Normalization: A Challenge” https://arxiv.org/pdf/1611.00068.pdf
  • 17. Break the Task in Two ● Channel model ○ What are the possible normalizations of a token? Sequence of tokens to words. ○ Example: 123 ■ one hundred twenty three, one two three, one twenty three, ... ● Language model ○ Which normalization is appropriate in the given context? Words to words. ○ Example: 123 ■ 123 King Ave.: the correct reading in American English would normally be one twenty three.
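The two-stage split can be illustrated with a toy example for the token 123. This is a sketch, not the models from the paper: `channel_model` only handles three-digit tokens such as 123, and `language_model` is a hypothetical hand-written rule standing in for a learned context model.

```python
from typing import Dict, List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Verbalize 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])


def channel_model(token: str) -> Dict[str, str]:
    """Enumerate possible readings of a three-digit token (toy coverage)."""
    hundreds, rest = int(token[0]), int(token[1:])
    return {
        "cardinal": f"{ONES[hundreds]} hundred"
                    + (f" {two_digits(rest)}" if rest else ""),
        "digits": " ".join(ONES[int(d)] for d in token),
        "address": f"{ONES[hundreds]} {two_digits(rest)}",
    }


STREET_WORDS = {"Ave.", "St.", "Rd.", "Blvd."}  # hypothetical context cue


def language_model(candidates: Dict[str, str], right_context: List[str]) -> str:
    """Toy context model: pick the address reading if a street word follows."""
    if any(word in STREET_WORDS for word in right_context):
        return candidates["address"]
    return candidates["cardinal"]


readings = channel_model("123")
print(readings)
print(language_model(readings, ["King", "Ave."]))
```

The channel model proposes readings without looking at context, and the language model chooses among them using only the surrounding words.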
  • 18. Combining the Models One combined LSTM
  • 20. Add a Grammar to Constrain the Output Rule: <number> + <measurement abbreviation> => <number> + the possible verbalizations of the measurement abbreviation. Instantiation: 24.2kg => twenty four point two kilogram, twenty four point two kilograms, twenty four point two kilo. Finite State Transducers: a finite state transducer is a finite state automaton that produces output as well as reading input (pattern matching, regular expressions).
  • 21. Thrax Grammar MEASURE: <number> + <measurement abbreviation> -> <number> + measurement verbalizations Input: 5 kg -> five kilo/kilograms/kilogram MONEY: $ <number> -> <number> dollars Input composed with FSTs. The output of the FST is used to restrict the output of the LSTM.
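The idea of restricting the LSTM output with a grammar can be sketched in plain Python. This is not Thrax and not a real FST: regular expressions stand in for the MEASURE and MONEY rules, the lookup tables are tiny, and `allowed_outputs` and `constrain` are hypothetical names. The fallback used when no candidate is allowed is my assumption as well.

```python
import re
from typing import List, Set

# Toy lookup tables; a real grammar would verbalize arbitrary numbers.
NUMBER_WORDS = {"1": "one", "2": "two", "5": "five", "10": "ten"}
MEASURES = {"kg": ["kilo", "kilogram", "kilograms"],
            "km": ["kilometer", "kilometers"]}

MEASURE_RULE = re.compile(r"^(\d+)\s*([a-z]+)$")  # <number> <abbreviation>
MONEY_RULE = re.compile(r"^\$\s*(\d+)$")          # $ <number>


def allowed_outputs(token: str) -> Set[str]:
    """Return the verbalizations the grammar permits, or an empty set."""
    m = MEASURE_RULE.match(token)
    if m and m.group(1) in NUMBER_WORDS and m.group(2) in MEASURES:
        number = NUMBER_WORDS[m.group(1)]
        return {f"{number} {unit}" for unit in MEASURES[m.group(2)]}
    m = MONEY_RULE.match(token)
    if m and m.group(1) in NUMBER_WORDS:
        return {f"{NUMBER_WORDS[m.group(1)]} dollars"}
    return set()


def constrain(token: str, ranked_candidates: List[str]) -> str:
    """Pick the best-ranked model output that the grammar allows."""
    allowed = allowed_outputs(token)
    if not allowed:
        return ranked_candidates[0]  # no rule applies: trust the model
    for candidate in ranked_candidates:
        if candidate in allowed:
            return candidate
    return sorted(allowed)[0]  # assumed fallback: any grammatical reading


print(constrain("5 kg", ["five kilometers", "five kilograms"]))
print(constrain("$5", ["five pounds", "five dollars"]))
```

The network still ranks the candidates; the grammar only removes readings that cannot be correct for the matched pattern.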
  • 22. TTS: RNN + FST Measure and Money restricted by grammar.
  • 23. Harnessing the Power of Neural Networks Introduction How do we standardize the output? How do we speed up inference? How do we automatically find a good network architecture?
  • 24. Super-Multiclass Classification Problem One class per image type (horse, car, …), M classes. Output layer: M units. Last hidden layer: N units. Neural network inference: computing the last layer alone requires MN multiply-adds.
  • 25. Asymmetric Hashing W1 W2 W3 WM Weights to the output layer, partitioned into N/k chunks ● Represent each chunk with a set of cluster centers (256) using k-means. ● Save the coordinates of the centers, (ID, coordinates). ● Save each weight vector as a set of closest IDs, its hash code.
  • 26. Asymmetric Hashing W1 W2 W3 WM Weights to the output layer, partitioned into N/k chunks ● Represent each chunk with a set of cluster centers (256) using k-means. ● Save the coordinates of the centers, (ID, coordinates). ● Save each weight vector as a set of closest IDs, its hash code. 78 184 15 12 63 192 56 82 72 201 37 51
  • 27. Asymmetric Hashing, Searching ● For a given activation u, divide it into its N/k chunks, uj: ○ Compute the 256 N/k distances to centers: 256N multiply-adds, not MN. ○ Compute the distances to all hash codes: ● MN/k additions needed. ● The “Asymmetric” in “Asymmetric Hashing” refers to the fact that we hash the weight vectors but not the activation vector.
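The encode and search steps above can be sketched in pure Python. This is a rough illustration, not the production system: it uses toy sizes (16 centers instead of 256) and a basic k-means, and it scores with inner products, which is my assumption about what the output layer needs. A squared-distance table would be built the same way. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vector, centers: List[Vector]) -> int:
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans(points: List[Vector], n_centers: int, n_iters: int,
           rng: random.Random) -> List[Vector]:
    """Plain Lloyd iterations, initialized from randomly sampled points."""
    centers = [list(p) for p in rng.sample(points, n_centers)]
    for _ in range(n_iters):
        groups: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the old center if its cluster is empty
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def encode(weights: List[Vector], k: int, n_centers: int,
           rng: random.Random) -> Tuple[List[List[Vector]], List[List[int]]]:
    """Split each weight vector into N/k chunks and store center IDs only."""
    n_chunks = len(weights[0]) // k
    codebooks: List[List[Vector]] = []
    codes: List[List[int]] = [[] for _ in weights]
    for j in range(n_chunks):
        chunks = [w[j * k:(j + 1) * k] for w in weights]
        centers = kmeans(chunks, n_centers, 10, rng)
        codebooks.append(centers)
        for i, chunk in enumerate(chunks):
            codes[i].append(nearest(chunk, centers))
    return codebooks, codes


def approx_scores(u: Vector, codebooks: List[List[Vector]],
                  codes: List[List[int]], k: int) -> List[float]:
    """Build one table per chunk (n_centers * N multiply-adds in total),
    then score every class with N/k table lookups and additions."""
    tables = [[dot(u[j * k:(j + 1) * k], center) for center in centers]
              for j, centers in enumerate(codebooks)]
    return [sum(tables[j][cid] for j, cid in enumerate(code)) for code in codes]


rng = random.Random(0)
M, N, K, CENTERS = 200, 8, 2, 16  # toy sizes; the slides use 256 centers
weights = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(M)]
activation = [rng.gauss(0, 1) for _ in range(N)]
codebooks, codes = encode(weights, K, CENTERS, rng)
scores = approx_scores(activation, codebooks, codes, K)
top_class = max(range(M), key=lambda i: scores[i])
print(top_class)
```

Only the weight vectors are quantized. The activation is used at full precision when the tables are built, which is the asymmetry the slide refers to.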
  • 28. Asymmetric Hashing ● Incredible saving in inference time ● Sometimes also with a bit of improved accuracy
  • 29. Harnessing the Power of Neural Networks Introduction How do we standardize the output? How do we speed up inference? How do we automatically find a good network architecture?
  • 30. “Learning to Learn” a.k.a. “Automated Hyperparameter Tuning” Google: AdaNet, Architecture Search with Reinforcement Learning. MIT: Designing Neural Network Architectures Using Reinforcement Learning. Harvard, Toronto, MIT, Intel: Scalable Bayesian Optimization Using Deep Neural Networks. Genetic Algorithms, Reinforcement Learning, Boosting Algorithm
  • 31. Modeling Challenges for ML The right model choice can significantly improve the performance. For Deep Learning it is particularly hard as the search space is huge and ● Difficult non-convex optimization ● Lack of sufficient theory Questions ● Can neural network architectures be learned together with their weights? ● Can this problem be solved efficiently and in a principled way? ● Can we capture the end-to-end process?
  • 32. AdaNet ● Incremental construction: At each round, the algorithm adds a subnetwork to the existing neural network; ● Algorithm leverages previously learned embeddings; ● Adaptively grows the network, balancing the trade-off between empirical error and model complexity; ● Learning bound:
  • 33. Experimental Results, AdaNet. CIFAR-10: 60,000 images, 10 classes. Standard deviation of all numbers: 0.01.
      Label Pair         AdaNet   Log. Reg.   NN
      deer-truck         0.94     0.90        0.92
      deer-horse         0.84     0.77        0.81
      automobile-truck   0.85     0.80        0.81
      cat-dog            0.69     0.67        0.66
      dog-horse          0.84     0.80        0.81
  • 35. Neural Architecture Search with RL ● Error rates on CIFAR-10 ● Perplexity on Penn Treebank ● Current accuracy of NAS on ImageNet: 78%; state of the art: 80.x%
  • 36. “Learning to Learn” a.k.a. “Automated Hyperparameter Tuning” Google: AdaNet, Architecture Search with Reinforcement Learning. MIT: Designing Neural Network Architectures Using Reinforcement Learning. Harvard, Toronto, MIT, Intel: Scalable Bayesian Optimization Using Deep Neural Networks. Genetic Algorithms, Reinforcement Learning, Boosting Algorithm
  • 37. Harnessing the Power of Neural Networks Introduction How do we standardize the output? How do we speed up inference? How do we automatically find a good network architecture?