Dl applicationlandscape-mar2018-180405144127

Deep Learning:
Application landscape
Grigory Sapunov
Private Event / Mar 2018
gs@inten.to

AI/ML/DL
● Artificial Intelligence (AI) is a broad field of
study dedicated to complex problem solving.
● Machine Learning (ML) is usually considered
a subfield of AI. ML is a data-driven
approach focused on creating algorithms that
have the ability to learn from data without
being explicitly programmed.
● Deep Learning (DL) is a subfield of ML focused
on deep neural networks (NN) able to
automatically learn hierarchical
representations.

Different approaches to solving problems

Deep Learning success: why now?

Typical image-related tasks
https://research.facebook.com/blog/learning-to-segment/
Detection is a harder task than classification, but both are close to solved,
and with better-than-human quality.

Image recognition quality on ImageNet dataset
Human quality is estimated as a ~5.1% error rate on this dataset (0.051)
From Lex Fridman’s slides: https://selfdrivingcars.mit.edu/

Example: Semantic Segmentation

Example: Radiologist-Level Pneumonia Detection
https://stanfordmlgroup.github.io/projects/chexnet/

Example: Image Colorization
Learning Representations for Automatic Colorization https://arxiv.org/abs/1603.06668

Example: Photo-realistic Style Transfer
https://arxiv.org/abs/1703.07511 Deep Photo Style Transfer

Example: Background removal
https://towardsdatascience.com/background-removal-with-deep-learning-c4f2104b3157

Example: Object removal
http://hi.cs.waseda.ac.jp/~iizuka/projects/completion/en/

Example: Image completion
http://hi.cs.waseda.ac.jp/~iizuka/projects/completion/en/

Example: Learning Lip Sync from Audio
http://grail.cs.washington.edu/projects/AudioToObama/
https://www.youtube.com/watch?v=9Yq67CjDqvw

Example: DeepFakes, FakeApp
https://thenextweb.com/artificial-intelligence/2018/02/21/deepfakes-algorithm-nails-donald-trump-in-most-convincing-fake-yet/

New kid on the block: GAN
https://www.technologyreview.com/lists/technologies/2018/

Example: Generating images by GAN
Progressive Growing of GANs for Improved Quality, Stability, and Variation,
https://github.com/tkarras/progressive_growing_of_gans
https://www.youtube.com/watch?v=XOxxPcy5Gr4

GAN rapid evolution
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
https://arxiv.org/abs/1802.07228

Example: Multi-Domain Image-to-Image Translation
https://github.com/yunjey/StarGAN

Example: Unsupervised Image-to-Image Translation
http://research.nvidia.com/publication/2017-12_Unsupervised-Image-to-Image-Translation
https://www.youtube.com/watch?v=nlyXoX2aIek

What’s with the Big Picture?
https://www.engadget.com/2018/01/23/photo-stitch-ai-fail-the-big-picture/

Still some issues exist: Reasoning
Deep learning is mainly about perception, but there is a lot of inference involved in
everyday human reasoning.
● Neural networks lack common sense
● Cannot find information by inference
● Cannot explain the answer
○ It could be a must-have requirement in
some areas, e.g. law, medicine.
○ GDPR is coming
The most fruitful approach is likely to be a hybrid
neural-symbolic system. Topic of active research
right now.

Adversarial Examples
https://spectrum.ieee.org/cars-that-think/transportation/sensors/slight-street-sign-modifications-can-fool-machine-learning-algorithms
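
The slide does not name a specific attack. One common way to build such inputs is the fast gradient sign method (FGSM), used here purely as an illustrative assumption. The sketch below applies it to a toy logistic-regression classifier with hypothetical weights: every input feature is nudged by at most epsilon in the direction that increases the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, epsilon):
    """Return x shifted by epsilon * sign(d loss / d x) for cross-entropy loss."""
    p = predict(w, b, x)
    # For logistic regression, d loss / d x_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (not taken from the talk).
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, -0.2, 0.3], 1
x_adv = fgsm(w, b, x, y, epsilon=0.4)
clean_p = predict(w, b, x)      # confidence for the true class on the clean input
adv_p = predict(w, b, x_adv)    # confidence after the bounded perturbation
```

In this toy setting the bounded perturbation is enough to push the prediction across the 0.5 decision threshold. Real attacks on image classifiers work in far higher dimensions, where much smaller per-pixel changes can suffice.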

Robust Adversarial Examples
https://blog.openai.com/robust-adversarial-inputs/

Physical Adversarial Examples
http://www.labsix.org/physical-objects-that-fool-neural-nets/

Adversarial Patch

Computer & Human Adversarial Examples
https://spectrum.ieee.org/the-human-os/robotics/artificial-intelligence/hacking-the-brain-with-adversarial-images

Deep Learning and NLP
Variety of tasks:
● Classification: language detection, genre and topic detection,
positive/negative sentiment analysis, authorship detection, …
● Fact extraction: people and company names, geography, prices, dates,
product names, …
● Language modeling, part-of-speech tagging
● Key phrase extraction
● Finding synonyms
● Machine translation
● Search (written and spoken)
● Question answering
● Dialog systems

Example: Entity Extraction
https://aws.amazon.com/blogs/aws/amazon-comprehend-continuously-trained-natural-language-processing/

Example: Neural Machine Translation vs. other
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

Example: Machine Translation Quality Evolution
https://bit.ly/mt_mar2018

Example: Legal document analysis / NDA
https://www.prnewswire.com/news-releases/artificial-intelligence-more-accurate-than-lawyers-for-reviewing-contracts-new-study-reveals-300603781.html
“The highest performing lawyer in the
study achieved 94% accuracy -
matching the AI - while the lowest
performing lawyer achieved an average
67% accuracy. The challenge took the
LawGeex AI 26 seconds to complete,
compared to an average of 92 minutes
for the lawyers. The longest time taken
by a lawyer to complete the test was
156 minutes, and the shortest time was
51 minutes.”

Example: Legal document analysis / Privacy policies
https://www.wired.com/story/polisis-ai-reads-privacy-policies-so-you-dont-have-to/
“In about 30 seconds, Polisis can read
a privacy policy it's never seen before
and extract a readable summary,
displayed in a graphic flow chart, of
what kind of data a service collects,
where that data could be sent, and
whether a user can opt out of that
collection or sharing.”

Example: Text generation / Smart Reply
https://research.googleblog.com/2017/05/efficient-smart-reply-now-for-gmail.html

Example: Review generation (Human-like!)
https://arxiv.org/abs/1708.08151 Automated Crowdturfing Attacks and Defenses in Online Review Systems

Example: Seq2SQL
https://arxiv.org/abs/1709.00103 Seq2SQL: Generating Structured Queries from Natural Language ...

Example: Question Answering
SQuAD: 100,000+ Questions for Machine Comprehension of Text, https://arxiv.org/abs/1606.05250
https://rajpurkar.github.io/SQuAD-explorer/
http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf

https://blog.drift.com/chatbots-report/

Still many problems with chatbots
http://www.eweek.com/big-data-and-analytics/state-of-chatbots-in-2018-rapidly-moving-into-the-mainstream
Key PointSource findings include:
● When AI is present, about half (49 percent) of consumers are already willing to
shop more frequently, 34 percent will spend more money and 38 percent will
share their experiences with friends and family.
● 51 percent of consumers still anticipate frustrations around chatbots not
understanding what they’re looking for; 44 percent question the accuracy
of the information chatbots provide.
● More than half (54 percent) of consumers would still prefer to talk to a
customer service representative.
● If a customer is on hold with a customer service rep, 34 percent of customers
want to switch to a chatbot after 5 minutes have passed. However, 59
percent get frustrated if a chatbot doesn’t resolve their inquiry in that same
time.

Text + Image / Multimodal learning

DL/Multi-modal Learning
Deep Learning models are becoming multi-modal: they use 2+ modalities
simultaneously, e.g.:
● Image caption generation: images + text
● Searching the Web by image: images + text
● Video description: the same, with an added time dimension
● Visual question answering: images + text
● Speech recognition: audio + video (lip motion)
● Image classification and navigation: RGB-D (color + depth)
It will become possible to match different modalities easily.

Example: Caption Generation (text by image)
http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”

Example: NeuralTalk and Walk
Ingredients:
● https://github.com/karpathy/neuraltalk2
Project for learning Multimodal Recurrent Neural Networks that describe
images with sentences
● Webcam/notebook
Result:
● https://vimeo.com/146492001

More hacking: NeuralTalk and Walk

Example: Video description (text by video)
https://vsubhashini.github.io/s2vt.html

Example: Image generation by text
AttnGAN: Fine-Grained Text to Image Generation with
Attentional Generative Adversarial Networks, https://arxiv.org/abs/1711.10485

Example: Code generation by image
pix2code: Generating Code from a Graphical User Interface Screenshot

SketchCode: Go from idea to HTML in 5 seconds
Automated front-end development using deep learning
https://blog.insightdatascience.com/automated-front-end-development-using-deep-learning-3169dd086e82

Speech Recognition: Word Error Rate (WER) [2017]
“Google’s speech recognition technology now has a 4.9% word error rate” (2017)
https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/
Microsoft “It can now transcribe human speech with a 5.1% error rate”
http://uk.businessinsider.com/microsofts-speech-recognition-5-1-error-rate-human-level-accuracy-2017-8
IBM. “The company has reached a 5.5 percent word error rate that's nearly on par
with humans.”
https://www.engadget.com/2017/03/10/ibm-speech-recognition-accuracy-record/
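
For reference, WER is conventionally the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below is a minimal implementation of that definition. The example sentences are made up, and the quoted vendors may apply additional text normalization that is not modeled here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Made-up example: one substitution and one deletion against 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Read this way, a 4.9% WER corresponds to roughly one word error per 20 reference words.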

Speech Recognition: Lip Reading
“This lip reading performance beats a professional lip reader on videos from BBC
television, and we also demonstrate that visual information helps to improve
speech recognition performance even when the audio is available.”
Lip Reading Sentences in the Wild, https://arxiv.org/abs/1611.05358
“To the best of our knowledge, LipNet is the first end-to-end sentence-level
lipreading model that simultaneously learns spatiotemporal visual features and a
sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy in
sentence-level, overlapped speaker split task, outperforming experienced human
lipreaders and the previous 86.4% word-level state-of-the-art accuracy.”
LipNet: End-to-End Sentence-level Lipreading, https://arxiv.org/abs/1611.01599

Case: Amazon Echo
Amazon Alexa is in more than 20 million devices. The vast majority of these are in the
Amazon Echo portfolio.
https://www.voicebot.ai/2017/10/27/bezos-says-20-million-amazon-alexa-devices-sold/

Case: Skype Live Translation
Translating voice calls and video calls in 8 languages and instant messages in over 50.
https://www.skype.com/en/features/skype-translator/

Case: Google Pixel Buds
Google packed its headphones (in combination with the Pixel 2) with the power to
translate between 40 languages, literally in real-time. The company has finally done
what science fiction and countless Kickstarters have been promising us, but failing
to deliver on, for years. This technology could fundamentally change how we
communicate across the global community.
https://www.engadget.com/2017/10/04/google-pixel-buds-translation-change-the-world/

Speech Synthesis: Tacotron 2 (Google, 2017)
● “Our approach does not use complex linguistic and acoustic features as input. Instead, we generate
human-like speech from text using neural networks trained using only speech examples and
corresponding text transcripts.”
https://research.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html

Speech Synthesis: Deep Voice 3 (Baidu, 2017)
● “Deep Voice 3 introduces a completely novel neural network architecture for speech synthesis. This
novel architecture trains an order of magnitude faster, allowing us to scale over 800 hours of
training data and synthesize speech from over 2,400 voices, which is more than any other
previously published text-to-speech model.”
http://research.baidu.com/deep-voice-3-2000-speaker-neural-text-speech/

But the same problem with adversarial examples...
Did you hear that? Adversarial Examples Against Automatic Speech Recognition


Drone control
http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/
This drone can automatically follow forest
trails to track down lost hikers

Car control
Meet the 26-Year-Old Hacker Who Built a
Self-Driving Car... in His Garage
https://www.youtube.com/watch?v=KTrgRYa2wbI

Car driving
https://www.youtube.com/watch?v=YuyT2SDcYrU
“Actually a ‘Perception to Action’ system. The visual perception and control
system is a Deep learning architecture trained end to end to transform pixels
from the cameras into steering angles. And this car uses regular color cameras,
not LIDARS like the Google cars. It is watching the driver and learns.”

Example: Sensorimotor Deep Learning
“In this project we aim to develop deep learning techniques that can be deployed
on a robot to allow it to learn directly from trial-and-error, where the only
information provided by the teacher is the degree to which it is succeeding at the
current task.”
http://rll.berkeley.edu/deeplearningrobotics/

Games
https://blog.openai.com/dota-2/
https://blog.openai.com/more-on-dota-2/

AlphaGo Lee: Computer-Human 4:1

Poker: Libratus
http://www.dailymail.co.uk/sciencetech/article-4177262/AI-beats-professional-poker-players-Pittsburgh.html
https://fr.pokernews.com/news/2017/01/ai-bot-libratus-poker-no-limit-wins-science-32312.htm
“The research has implications for situations where information is incomplete and
misinformation can be given, such as business negotiations, military strategy,
cybersecurity and planning of medical treatments.”

ML in datacenters
“We’ve managed to reduce the amount of energy we use for cooling by up to 40 percent.”
https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/

Device Placement with Reinforcement Learning
Device Placement Optimization with Reinforcement Learning

Neural Architecture Search
Efficient Neural Architecture Search via Parameter Sharing

Examples
- Improving ML algorithms: Device placement, Architecture search, Optimizer
search, Ensembling, ...
- Optimizing indexes in DB (The Case for Learned Index Structures,
https://arxiv.org/abs/1712.01208)
- Improving datacenter efficiency: optimize cooling, optimize virtual machine
placement, ...
- …
Computer systems are filled with heuristics that work well in the “general case”, but
they generally don’t adapt to actual usage patterns and don’t take the
available context into account.
We can use ML anywhere we’re using heuristics to make a decision!
See Jeff Dean’s talk at NIPS 2017
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
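
The learned-index idea mentioned above can be sketched as follows: fit a model that maps a key to its position in a sorted array, record the model's worst-case error, and at lookup time search only inside that error window. This is a minimal sketch assuming a single least-squares line (the cited paper discusses more elaborate models). The class and method names are hypothetical.

```python
import bisect

class LinearLearnedIndex:
    """Toy learned index over a non-empty sorted list of numeric keys."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(sorted_keys))
        # Least-squares line: position ~ slope * key + intercept.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error bounds the search window at lookup time.
        self.max_err = max(abs(self._predict(k) - p)
                           for p, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or -1 if it is absent."""
        # Clamp the guess into the valid range before building the window.
        guess = min(max(self._predict(key), 0), len(self.keys) - 1)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        return pos if pos < len(self.keys) and self.keys[pos] == key else -1

# Hypothetical key set; the model replaces a full binary search with a
# prediction plus a bounded local search.
keys = [2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
index = LinearLearnedIndex(keys)
position = index.lookup(13)
```

The heuristic being replaced here is the fixed traversal of a general-purpose index. The model adapts to the actual key distribution, and the recorded error bound keeps lookups correct even when the fit is poor.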

Examples
Compilers: instruction scheduling, register allocation, loop nest parallelization
strategies, …
Networking: TCP window size decisions, backoff for retransmits, data
compression, ...
Operating systems: process scheduling, buffer cache insertion/replacement, file
system prefetching, …
Job scheduling systems: which tasks/VMs to co-locate on same machine, which
tasks to pre-empt, ...
ASIC design: physical circuit layout, test case selection, …

No dataset — no deep learning
Deep learning requires a lot of data (otherwise simpler models may work better).
But sometimes you have no dataset…
Nonetheless, several workarounds are available:
● Transfer learning
● Data augmentation
● Mechanical Turk
● Unsupervised pre-training
● Moving towards one-shot and zero-shot learning
● …
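
Of the options listed, data augmentation is the easiest to illustrate: each labelled example is turned into several variants that should keep the same label. The sketch below works on a tiny grayscale image stored as nested lists. The specific transforms (horizontal flip, one-pixel shift) are assumptions chosen for illustration, and whether they preserve the label depends on the task.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels=1, fill=0):
    """Shift the image right, padding the left edge with `fill`."""
    if pixels <= 0:
        return [row[:] for row in image]
    return [[fill] * min(pixels, len(row)) + row[:max(len(row) - pixels, 0)]
            for row in image]

def augment(image):
    """Return the original image plus simple variants of it."""
    return [image,
            flip_horizontal(image),
            shift_right(image),
            shift_right(flip_horizontal(image))]

# Hypothetical 3x3 grayscale image.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
variants = augment(image)  # four training examples from one labelled image
```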

The data scale versus the model performance

http://www.spacemachine.net/views/2016/3/datasets-over-algorithms
Importance of Datasets

Data & Models vs. Code
Nearly the same state-of-the-art code is available to everyone on the market.
Currently the real differentiator is data or trained models (a derivative of
the data). With publicly available code/algorithms and unique data it’s possible to
create a better model than with highly specialized code and public
data.
There is room for a new type of infrastructure:
● Data and algorithm marketplaces
● Model marketplaces and model repositories
● AutoML (already appearing)
● Model management
● Model quality evaluation
● ...

Still some issues exist: Computing power
DL requires a lot of computation. Without a cluster or GPU machines,
training takes much longer.
● Currently GPUs (mostly NVIDIA) are the only practical choice
● FPGAs/ASICs are entering this field (Google TPU gen. 2, Bitmain Sophon,
Intel 2018+). The situation resembles the path of Bitcoin mining
● Neuromorphic computing is on the rise (IBM TrueNorth, Intel, memristors, etc.)
● Quantum computing can benefit machine learning as well (but it probably won’t be
a desktop or in-house server solution)

NVIDIA slides: http://www.nvidia.com/content/events/geoInt2015/LBrown_DL.pdf

Computing power grows
https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664

Distributed training is a commodity now
Image from: https://github.com/uber/horovod

Case: AlphaGo Zero
https://deepmind.com/blog/alphago-zero-learning-scratch/

Trends: Supercomputer performance (GFLOPS FP64)
https://en.wikipedia.org/wiki/TOP500

Personal Supercomputers
● NVIDIA DGX-1 Server ($149,000)
Performance: 1000 TFLOPS FP16, 125 TFLOPS FP32
* NVIDIA DGX-2 (16 Tesla V100, 2 PFLOPS FP16) has just been announced
● DeepLearning11 ($16,500, contains 10x NVIDIA GeForce GTX 1080 Ti)
Performance: 100 TFLOPS FP32
● NVIDIA Titan V desktop card ($3000): 6.9 TFLOPS FP64 (note: this is FP64, not
the FP16 figure that is usually reported)
○ Comparable to the best supercomputer in the world in 2001–2002 (IBM ASCI
White with 7.226 TFLOPS on LINPACK) and to the supercomputer in 500th place (still
a cool supercomputer) of the TOP500 list in November 2007 (the entry level to the
list was 5.9 TFLOPS)
● For comparison: Huawei Mate 10 smartphone with Kirin 970 Neural Network
Processing Unit, 1.92 TFLOPS FP16
○ The top-performing supercomputer of 1997 had similar performance (but in FP64)
https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664
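
The prices and FP32 figures quoted on this slide can be turned into a rough cost per TFLOPS. This is a back-of-the-envelope sketch that uses only the slide's numbers (list price, peak throughput) and ignores power, memory, interconnect and FP16 performance.

```python
# Prices (USD) and peak FP32 throughput (TFLOPS) as quoted on the slide.
systems = {
    "NVIDIA DGX-1": (149_000, 125),
    "DeepLearning11": (16_500, 100),
}

def usd_per_tflops(price_usd, tflops):
    """Dollars paid per TFLOPS of peak throughput."""
    return price_usd / tflops

cost = {name: usd_per_tflops(*spec) for name, spec in systems.items()}
ratio = cost["NVIDIA DGX-1"] / cost["DeepLearning11"]

for name, value in cost.items():
    print(f"{name}: ${value:,.0f} per peak FP32 TFLOPS")
```

On these figures the consumer-GPU build comes out roughly 7x cheaper per peak FP32 TFLOPS. The comparison says nothing about FP16 throughput, where the DGX-1 figure on the slide is much higher.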

AI at the edge
● NVIDIA Jetson TK1/TX1/TX2
○ 192/256/256 CUDA Cores
○ 64/64/128-bit 4/4/6-Core ARM CPU, 2/4/8 GB Mem
○ Xavier is coming
● Tablets, Smartphones
○ Qualcomm Snapdragon 845
○ Apple A11 Bionic
○ Huawei Kirin 970
● Raspberry Pi 3 (1.2 GHz 4-core)
● Movidius Neural Compute Stick

References:
Hardware for Deep Learning series of posts:
https://blog.inten.to/hardware-for-deep-learning-current-state-and-trends-51c01ebbb6dc
● Part 1: Introduction and Executive summary
● Part 2: CPU
● Part 3: GPU
● Part 4: FPGA
● Part 5: ASIC
● Part 6: Mobile AI
● Part 7: Neuromorphic computing
● Part 8: Quantum computing

https://blog.openai.com/preparing-for-malicious-uses-of-ai/

AI changes the landscape of threats
● Expansion of existing threats
○ The costs of attacks are lowered
■ Set of actors who can carry out attacks expands
■ The rate and scale of attacks can increase
■ The set of potential targets can expand
● Introduction of new threats
○ AI systems can complete tasks that would be otherwise impractical for
humans
○ Exploiting vulnerabilities of AI systems
● Change to the typical character of threats
○ Attacks can be especially effective
○ Finely targeted
○ Difficult to attribute

Many other issues exist as well
● Unintentional forms of AI misuse like algorithmic bias
● Indirect threats: mass unemployment, or other second- or third-order effects
from the deployment of AI technology
● System-level threats that would come from the dynamic interaction between
non-malicious actors, e.g. “race to the bottom” on AI safety
● Existential risks from human-level AI
● Unclear regulation

https://ru.linkedin.com/in/grigorysapunov
gs@inten.to
Thanks!
