Who doesn't remember the super cool scenes in "Minority Report": intelligent machines and innovative user interfaces driven by speech and gestures?
In this deep dive, we show how deep learning can enable such interactions, using Microsoft projects in the area of natural user interfaces (NUI) as examples: Kinect, Handpose, Skype Translator, and others. Which predictive models are used? What can we do when we don't have sufficient data? Finally, we venture an outlook on how new human-machine interaction concepts could change our experience with computers, also in light of Industry 4.0.
Deep Learning for New User Interactions (Gestures, Speech and Emotions)
1. Deep Learning for New User Interactions (Gestures, Speech and Emotions)
Olivia Klose, Software Development Engineer, Microsoft
Dr. Marcel Tilly, Program Manager, Microsoft
3. Deep Neural Networks
… are inspired by the neural networks in the brain
# of neurons in the brain (~100 billion) ≈ # of trees in the Amazon rainforest (~300 billion)
# of synapses (~100 to 1,000 trillion) ≈ # of leaves in the Amazon rainforest
18. Skype Translator
Diagram labels: Skype, Translator, Bots, Skype Service
Pipeline: Automatic Speech Recognition → Speech Correction → Translation → Text To Speech
Speech Correction:
• Punctuation
• Capitalization
• Disfluency removal
• Lattice Rescoring
Example: "this is hum pig" → "this is hum pig." → "This is hum pig." → "This is pig." → "This is big."
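The correction steps on this slide can be pictured as a chain of small text transforms. The sketch below is a toy illustration only, not how Skype Translator is implemented: the filler list, all function names, and the per-word candidate scores that stand in for lattice rescoring are assumptions made for the example. Real lattice rescoring works on word lattices from the recognizer, not on a lookup table.

```python
# Toy sketch only: every name and value here is hypothetical.
FILLERS = {"hum", "um", "uh", "er"}  # assumed list of filler words


def add_punctuation(text):
    """Append a period when the utterance has no end punctuation."""
    return text if text.endswith((".", "?", "!")) else text + "."


def capitalize_first(text):
    """Upper-case the first character of the utterance."""
    return text[:1].upper() + text[1:]


def remove_disfluencies(text):
    """Drop filler words such as 'hum' (punctuation is ignored when matching)."""
    kept = [w for w in text.split() if w.strip(".,?!").lower() not in FILLERS]
    return " ".join(kept)


def rescore(text, alternatives):
    """Crude stand-in for lattice rescoring.

    For each word that has candidates in `alternatives`, pick the candidate
    with the highest (made-up) language-model score.
    """
    out = []
    for word in text.split():
        core = word.rstrip(".,?!")
        tail = word[len(core):]
        candidates = alternatives.get(core.lower())
        if candidates:
            core = max(candidates, key=candidates.get)
        out.append(core + tail)
    return " ".join(out)


def correct(text, alternatives):
    """Run the four stages in slide order and return every intermediate result."""
    steps = [text]
    steps.append(add_punctuation(steps[-1]))
    steps.append(capitalize_first(steps[-1]))
    steps.append(remove_disfluencies(steps[-1]))
    steps.append(rescore(steps[-1], alternatives))
    return steps


# Assumed scores: the language model prefers "big" over "pig" in this context.
steps = correct("this is hum pig", {"pig": {"pig": 0.2, "big": 0.8}})
print(" -> ".join(steps))
```

With the assumed scores, the intermediate strings reproduce the progression on the slide, ending in "This is big."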
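20. Skype Translator
Diagram labels: Skype, Translator, Bots, Skype Service
Pipeline: Automatic Speech Recognition → Speech Correction → Translation → Text To Speech
Translation:
• Microsoft Translator core API
• Statistical Machine Translation
• 45 supported languages
Example: "this is hum pig" → "this is hum pig." → "This is hum pig." → "This is pig." → "This is big." → "C’est grand."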
• Microsoft Translator core API
• Statistical Machine Translation
• 45 supported languages
25. Figure panels: input depth | inferred body parts (no tracking or smoothing) | front view | side view | top view
https://www.microsoft.com/en-us/research/video/real-time-human-pose-recognition-in-parts-from-single-depth-images-2/
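The paper behind the linked video classifies each depth pixel into a body part using decision forests over simple depth-comparison features: the difference between two depth probes whose offsets are scaled by 1/depth, so the feature behaves the same for near and far bodies. The sketch below shows only that feature, as a minimal illustration. The function names, the list-of-lists image, the rounding of probe positions, and the background constant are assumptions, and the forest itself is omitted.

```python
# Minimal sketch of a depth-comparison feature; names and constants are assumed.
BACKGROUND = 1e6  # large depth used for probes that fall outside the image


def depth_at(depth, x, y):
    """Depth at pixel (x, y); probes outside the image read as background."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND


def depth_feature(depth, x, y, u, v):
    """Difference of two depth probes around (x, y).

    The offsets u and v are divided by the depth at (x, y), which makes the
    probe pattern shrink for distant pixels and grow for close ones.
    """
    d = depth_at(depth, x, y)
    ux, uy = x + int(round(u[0] / d)), y + int(round(u[1] / d))
    vx, vy = x + int(round(v[0] / d)), y + int(round(v[1] / d))
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)


# Tiny 3x3 depth map (values in metres) with one farther pixel on the right.
depth = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 2.0],
]
print(depth_feature(depth, 1, 1, u=(2, 0), v=(-2, 0)))  # 4.0 - 2.0 = 2.0
```

A forest would evaluate many such features with different offsets per pixel and vote on the body-part label.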
32. Open-source, cross-platform toolkit for learning and evaluating deep neural networks.
Expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks.
Production-ready: state-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.
http://cntk.ai
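The idea of composing simple building blocks into a larger network can be illustrated without any framework. The sketch below is plain Python and deliberately does not use the toolkit's API: `dense`, `relu`, `softmax` and `sequential` are hypothetical helpers written for this example, and the weights are arbitrary.

```python
import math


def dense(weights, bias):
    """Return a block computing W·x + b for a list-of-lists W."""
    def block(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return block


def relu(x):
    """Element-wise max(0, x)."""
    return [max(0.0, xi) for xi in x]


def softmax(x):
    """Normalize scores into probabilities (shifted by the max for stability)."""
    m = max(x)
    exps = [math.exp(xi - m) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]


def sequential(*blocks):
    """Compose blocks left to right into one network function."""
    def network(x):
        for block in blocks:
            x = block(x)
        return x
    return network


# A two-layer network assembled from the blocks above (arbitrary weights).
model = sequential(
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    relu,
    dense([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    softmax,
)
probs = model([2.0, 1.0])
print(probs)
```

Each block is just a function from a vector to a vector, so swapping or stacking blocks changes the network without touching the others.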
40. Cognitive Services: Give your solutions a human side
Vision: Computer Vision | Emotion | Face | Video
Speech: Computer Recognition | Speaker Recognition | Speech | Translator
Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Web Language Model
Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations
Search: Bing Auto Suggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search
http://microsoft.com/cognitive
41. Computer Vision API
Content of Image:
Categories v0: [{ "name": "animal", "score": 0.9765625 }]
V1: [{ "name": "grass", "confidence": 0.9999992847442627 },
{ "name": "outdoor", "confidence": 0.9999072551727295 },
{ "name": "cow", "confidence": 0.99954754114151 },
{ "name": "field", "confidence": 0.9976195693016052 },
{ "name": "brown", "confidence": 0.988935649394989 },
{ "name": "animal", "confidence": 0.97904372215271 },
{ "name": "standing", "confidence": 0.9632768630981445 },
{ "name": "mammal", "confidence": 0.9366017580032349,
"hint": "animal" },
{ "name": "wire", "confidence": 0.8946959376335144 },
{ "name": "green", "confidence": 0.8844101428985596 },
{ "name": "pasture", "confidence": 0.8332059383392334 },
{ "name": "bovine", "confidence": 0.5618471503257751,
"hint": "animal" },
{ "name": "grassy", "confidence": 0.48627158999443054 },
{ "name": "lush", "confidence": 0.1874018907546997 },
{ "name": "staring", "confidence": 0.165890634059906 }]
Describe
0.975 "a brown cow standing on top of a lush green field"
0.974 "a cow standing on top of a lush green field"
0.965 "a large brown cow standing on top of a lush green field"
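A result like the one on this slide is usually consumed by filtering tags on confidence and picking the top caption. The sketch below works on a hand-copied subset of the slide's values. The wrapper keys `tags` and `captions`, the `text` field, the 0.9 threshold, and both helper names are assumptions for the example, since the slide shows only the tag list and the scored descriptions. No network call is made.

```python
import json

# Subset of the values shown on the slide; the wrapper structure is assumed.
response_text = """
{
  "tags": [
    {"name": "grass", "confidence": 0.9999992847442627},
    {"name": "cow", "confidence": 0.99954754114151},
    {"name": "mammal", "confidence": 0.9366017580032349, "hint": "animal"},
    {"name": "bovine", "confidence": 0.5618471503257751, "hint": "animal"},
    {"name": "staring", "confidence": 0.165890634059906}
  ],
  "captions": [
    {"text": "a brown cow standing on top of a lush green field", "confidence": 0.975},
    {"text": "a cow standing on top of a lush green field", "confidence": 0.974}
  ]
}
"""


def confident_tags(response, threshold=0.9):
    """Names of all tags whose confidence reaches the threshold, in original order."""
    return [t["name"] for t in response["tags"] if t["confidence"] >= threshold]


def best_caption(response):
    """Text of the caption with the highest confidence."""
    return max(response["captions"], key=lambda c: c["confidence"])["text"]


response = json.loads(response_text)
print(confident_tags(response))  # ['grass', 'cow', 'mammal']
print(best_caption(response))    # a brown cow standing on top of a lush green field
```

Low-confidence tags such as "staring" (about 0.17) show why a threshold is worth applying before showing tags to users.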