Speech recognition: Art of the possible
Dominik.Lukes@ctl.ox.ac.uk @techczech
Dominik’s journey
1990–1995: Computational linguistics, Cognitive linguistics, Language teaching
1995–2008: Language teacher training, Translation, Metaphor / discourse studies
2009–present: Readability, Learning / Assistive technology, Dyslexia teacher training
Bill Gates in 2011
“The next big thing is definitely
speech and voice recognition.”
What do we want to know?
What is the current state of the art?
How did we get here?
Where are we going?
Are we asking the right questions?
Tasks for speech recognition by difficulty
Select word from list
Interpret command
Type dictation
Transcribe presentation
Transcribe conversation
How we think of it vs how it is
Select word from list
Interpret command
Type dictation
Transcribe presentation
Transcribe conversation
Transcribe conversation
Transcribe presentation
Type dictation
Interpret command
Select word from list
Speech recognition approximate timeline
1950s: Select digit
1970s: Select from 1000 words
1980s: Select from large vocabulary
1990s: Dictate word by word
1997: Dictate whole sentences
2012: Transcribe YouTube video
2019: Transcribe conversation
What is the actual job of speech recognition?
What is this word?
[pʰɹɛtsɫ̩]
[pɹɛtsl]
/pretsəl/
<pretzel>
What’s the problem?
Aspirated /p/ at the start of a stressed syllable
Devoiced /r/ following /p/
Labialised /r/ following /p/
Dark /l/
Syllabic consonant
Glottal stop
It gets worse: find the missing sounds
Course on speech recognition, 1993: Faster computers won’t help improve speech recognition. We need a new approach.
Dragon NaturallySpeaking released in 1997. It can recognise whole sentences.
What happened?
How speech recognition does not work:
Finding individual sounds (phonemes) in the speech and matching them to letters.
How speech recognition actually works:
P(W|C)
What is the likelihood of the next word W, given the context C that came before it?
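As a toy illustration of P(W|C), the sketch below estimates the probability of the next word from counts in a tiny corpus, taking the context C to be just the previous word (a bigram model). The three training sentences and the function names `train_bigram` and `p_next` are invented for this example; real recognisers use much longer contexts, smoothing for unseen word pairs, and an acoustic model alongside the language model.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows each previous word ("<s>" = sentence start)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def p_next(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    following = counts.get(prev.lower())
    if not following:
        return 0.0
    return following[word.lower()] / sum(following.values())

# Invented toy corpus, for illustration only.
corpus = [
    "the lecture starts at noon",
    "the lecture is recorded",
    "the lecturer is late",
]
model = train_bigram(corpus)
print(p_next(model, "the", "lecture"))   # 2 of the 3 words after "the"
print(p_next(model, "the", "lecturer"))  # 1 of the 3
```

Word pairs never seen in training get probability 0 here, which is one reason rare names and terms are hard and why practical models add smoothing.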
Actually, it is quite a bit more complicated (Huang and Deng 2009)
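One standard way to write the fuller picture is the Bayes decision rule of statistical speech recognition: choose the word sequence W = w1 … wn that is most probable given the acoustic observations A. Bayes' rule splits this into an acoustic model P(A|W) and a language model P(W); P(A) is the same for every candidate and drops out. The chain rule then shows where the next-word probability P(W|C) fits in, with the context C being the words before each w_i.

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\,P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\,P(W),
\qquad
P(W) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```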
Probabilistic (stochastic) ASR enabled the change. Linguistics took a back seat.
Fred Jelinek (ASR pioneer, 1988?): "Every time I fire a linguist, the performance of the speech recognizer goes up."
Consequence of the probabilistic approach: worse on words that are not predictable from context
Names
Acronyms
Specialist terms
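The effect on names can be seen by applying the same decision rule to two competing transcriptions. All probabilities in this sketch are invented for illustration: the rare name "Jelinek" fits the audio better (higher acoustic probability), but the common words "jelly neck" are so much more likely in text that their combined score wins. The `best_transcript` helper is hypothetical, not any product's API.

```python
import math

def best_transcript(candidates):
    """Pick the candidate maximising log P(A|W) + log P(W) (noisy-channel rule)."""
    def score(text):
        c = candidates[text]
        return math.log(c["acoustic"]) + math.log(c["language"])
    return max(candidates, key=score)

# Invented probabilities for illustration only (not from a real system).
# "acoustic" ~ P(audio | words), "language" ~ P(words).
candidates = {
    "jelinek":    {"acoustic": 0.5, "language": 1e-7},  # rare name, good acoustic fit
    "jelly neck": {"acoustic": 0.3, "language": 1e-5},  # common words, worse fit
}
print(best_transcript(candidates))  # → jelly neck
```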
Question in 2011: “I recorded a lecture, can I use Dragon to transcribe it?”
“Caption fails” in 2014 provided material for comedy
YouTube captions today are usable and useful
So what happened between 2014 and 2022?
Ingredients of success
Larger data sets
More computing power
Neural networks
Patrick Winston (2015), MIT AI course (6.034), Lecture 12a:
“It was in 2010, yes, that's right. It was in 2010. We were having our annual discussion about what we would dump from 6.034 in order to make room for some other stuff. And we almost killed off neural nets. That might seem strange because our heads are stuffed with neurons. … But many of us felt that the neural models of the day weren't much in the way of faithful models of what actually goes on inside our heads. And besides that, nobody had ever made a neural net that was worth a darn for doing anything.”
2012: the ImageNet competition (won by the AlexNet neural network) showed that neural networks are much better at computing the probabilities for complex data.
OK, we have neural nets. What does that mean?
Things to know about neural nets
Everything has a probability
The same input does not always produce the same output
They have no ‘sanity check’ or ‘common sense’
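The first two points can be illustrated with a softmax, which turns a model's raw scores into a probability for every candidate, followed by two ways of choosing a word. The logits and helper names are invented for this sketch. Taking the most probable word (greedy decoding) is repeatable; sampling in proportion to probability is not, which is one simple way the same input can yield different outputs. Real services can also vary for other reasons (for example model updates or different surrounding context), so sampling here only makes the principle visible.

```python
import math
import random

def softmax(scores):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def greedy(probs):
    """Always pick the most probable word: same input, same output."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw a word in proportion to its probability: output can vary."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical logits for the word after "spell ..." (invented numbers).
probs = softmax({"check": 2.0, "czech": 1.5, "chuck": 0.2})
print(greedy(probs))  # → check
print([sample(probs, random.Random(seed)) for seed in range(5)])
```

Nothing in the code asks whether "spell czech" makes sense: every candidate simply has a number, and there is no separate sanity check.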
What do probabilities look like?
Allyson Ettinger (2019), “What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models”
https://what-if.xkcd.com/34
Output changes as more information is made available (not always for the better).
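Streaming captions show the current best guess and may revise it as more audio arrives. This sketch assumes, for simplicity, that each competing hypothesis gets a log score for each successive chunk of audio; the scores and the `best_so_far` helper are invented. After one and two chunks "chris is" leads, but the third chunk flips the ranking to "crystal". Nothing guarantees the revision is correct; it only reflects the extra evidence.

```python
def best_so_far(chunk_scores, n_chunks):
    """Return the top hypothesis using only the first n_chunks of audio."""
    totals = {hyp: sum(scores[:n_chunks]) for hyp, scores in chunk_scores.items()}
    return max(totals, key=totals.get)

# Invented log scores per audio chunk for two readings of the same speech.
chunk_scores = {
    "chris is": [-1.0, -1.2, -4.0],
    "crystal":  [-1.5, -1.4, -0.5],
}
for n in range(1, 4):
    print(n, best_so_far(chunk_scores, n))  # chris is, chris is, then crystal
```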
Examples from today’s captions
Crystal > Chris is
Am > and
experts > experience
AR > a our
Different ways of transcribing Dua Lipa
alipa
dualipa
dua lipa
lipa
duda lipa
Rise and mostly fall of Google’s new spell Czech
Tracking faces at the tips of the shoes
Hallucination is a big problem
Question asked by a faculty member in 2021: “We correct the transcripts, so why doesn’t the system learn the correct spelling?”
Adding your own word list just tweaks the probabilities.
Choosing a genre setting also just tweaks the probabilities.
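One way to picture a custom word list is as a bonus added to the log score of any hypothesis containing a listed word, while the underlying model stays untouched. This is why correcting transcripts does not by itself "teach" the system. The flat per-word `boost`, the `rescore` helper and the probabilities are assumptions for illustration; vendors implement vocabulary biasing in their own ways.

```python
import math

def rescore(candidates, boost_words=(), boost=0.0):
    """Noisy-channel score plus a fixed bonus per boosted word (hypothetical scheme)."""
    def score(text):
        c = candidates[text]
        s = math.log(c["acoustic"]) + math.log(c["language"])
        s += boost * sum(word in boost_words for word in text.split())
        return s
    return max(candidates, key=score)

# Invented probabilities for illustration only.
candidates = {
    "jelinek":    {"acoustic": 0.5, "language": 1e-7},
    "jelly neck": {"acoustic": 0.3, "language": 1e-5},
}
print(rescore(candidates))                                    # → jelly neck
print(rescore(candidates, boost_words={"jelinek"}, boost=2))  # → jelly neck
print(rescore(candidates, boost_words={"jelinek"}, boost=5))  # → jelinek
```

A small boost leaves the rare name losing; only a large enough nudge flips the result, and nothing the model knows about language has changed.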
Another thing to know about neural nets: they are trained on very large data sets, and training can take days or weeks.
Consequences of neural net size
Speech recognition is often not done on the device.
Individual users’ input usually cannot improve the quality (except during training).
Most applications use APIs from the big players.
There are few open-source or free options.
Big players in the field
Google
Microsoft (now also Nuance)
Amazon
Interesting smaller companies
Verbit.ai
Carescribe.io (Caption.Ed)
Otter.ai
Rev.ai
Interesting applications
Descript
Microsoft Reading Progress
Microsoft Presentation Coach
What can we expect in the future?
Cautionary tale by SMBC
The original Roomba (2002) vs Roomba S9+ (2019): Wow!
What happens in speeches: fillers, repetition
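Turning raw speech into readable text involves, among other things, removing fillers and accidental repetitions. The sketch below is a deliberately naive rule-based pass, assuming a short English filler list invented for the example; the `tidy` helper is hypothetical and is a starting point, not a solution.

```python
import re

FILLERS = {"um", "uh", "er", "erm"}  # assumed filler list, English only

def tidy(raw):
    """Naive raw-to-readable pass: drop fillers and immediate word repeats."""
    words = re.findall(r"[\w']+", raw.lower())
    out = []
    for word in words:
        if word in FILLERS:
            continue  # skip hesitation sounds
        if out and out[-1] == word:
            continue  # skip a word repeated straight after itself
        out.append(word)
    return " ".join(out)

print(tidy("So um the the model uh learns learns from data"))
# → so the model learns from data
```

It also lowercases and drops punctuation, and it would wrongly collapse legitimate repeats such as "had had", which hints at why readable transcripts are harder than they look.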
What does conversation actually look like?
Possible futures?
Incremental improvement (similar to the Roomba in 17 years): accurate lecture transcripts, fluent dictation with pauses, better meeting transcription
Revolutionary change (similar to the change in speech recognition in 6 years): informal conversation transcription, interactive dictation, multilingual speech transcription
How should we think about accuracy?
We speak 120–180 words per minute.
99% accurate = 1.2–1.8 (roughly 2) errors per minute
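Accuracy figures are usually reported as word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recogniser's output into a reference transcript, divided by the number of reference words. On the usual reading, "99% accurate" corresponds to a WER of about 1%. This sketch, with invented example sentences, computes WER with a word-level edit distance and converts an accuracy figure into errors per minute at a given speaking rate.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance: fewest substitutions, deletions, insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete reference word r
                            curr[j - 1] + 1,          # insert hypothesis word h
                            prev[j - 1] + (r != h)))  # substitute (or match)
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    return word_errors(reference, hypothesis) / len(reference.split())

def errors_per_minute(accuracy, words_per_minute):
    """Expected errors per minute for a given word accuracy and speaking rate."""
    return (1 - accuracy) * words_per_minute

print(wer("the rise of the spell check", "the rise of the spell czech"))  # 1/6
print(wer("dua lipa", "dualipa"))                                          # 1.0
for wpm in (120, 180):
    print(wpm, round(errors_per_minute(0.99, wpm), 2))  # 120 1.2, then 180 1.8
```

"dualipa" against the reference "dua lipa" scores a WER of 1.0 (one substitution plus one deletion over two words), even though a reader would barely notice the difference.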
From xkcd.com/1425 (September 2014): Sometimes it is hard to judge how much effort will be needed to solve a seemingly easy problem.
Wishlist (a few hours of coding)
Transcripts that indicate the level of confidence
Benchmarks for lecture transcripts
Better manual control of transcripts (like Descript)
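The first wishlist item could start as simply as flagging words the recogniser is unsure about. Many recognition services can return a confidence score per word, but field names and scales differ, so the `Word` type, the 0.6 threshold and the confidence values below are hypothetical, applied to the caption fail quoted earlier.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    confidence: float  # assumed 0.0-1.0 score attached by the recogniser

def mark_uncertain(words, threshold=0.6):
    """Render a transcript, wrapping words below the threshold as [word?]."""
    return " ".join(w.text if w.confidence >= threshold else f"[{w.text}?]"
                    for w in words)

# Invented confidence values, for illustration only.
transcript = [Word("tracking", 0.95), Word("faces", 0.42), Word("at", 0.90),
              Word("the", 0.97), Word("tips", 0.55), Word("of", 0.93),
              Word("the", 0.96), Word("shoes", 0.38)]
print(mark_uncertain(transcript))
# → tracking [faces?] at the [tips?] of the [shoes?]
```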
Dreamlist (5 years and a research team)
Multilingual transcription (identify change in language)
Multimodal transcription (use information from video)
Raw to readable transcript
Welcome to the panel
Kate Knill, Machine Intelligence Lab, University of Cambridge
Richard Cave, MND Association (and formerly Google Project Euphonia)
Richard Purcell, Caption.Ed
Irit Opher, Head of Research at Verbit.ai
What is the current state of the art of speech recognition in general, and in the transcription of recorded speech in particular?
What are the current quality metrics, and how much do they tell us about the suitability of models? Do we need better ones?
After the big recent jump in performance, are we seeing a plateau with incremental growth, or can we expect another step change in quality?
Where can we see the most innovation? What are the research and development blind spots where more effort is needed?
Which problems do we currently have no solution for?
How much room is there for smaller players to innovate in this space? How much do they have to rely on pre-trained models from big providers? Is there space for open source?
This presentation is licensed under a Creative Commons Attribution (CC BY) license, except where otherwise noted.
Icons and stock images are from Microsoft Office 365 creative premium. They cannot be distributed separately from this document.

Speech recognition - Art of the possible

  • 1. Speech recognition: Art of the possible Dominik.Lukes@ctl.ox.ac.uk @techczech
  • 2. Dominik’s journey Computational linguistics Cognitive linguistics Language teaching 1990–1995 Language teacher training Translation Metaphor / discourse studies 1995–2008 Readability Learning / Assistive technology Dyslexia teacher training 2009 – present
  • 3. Bill Gates in 2011 “The next big thing is definitely speech and voice recognition.”
  • 4. What do we want to know? What is the current state of the art? How we got here? Where are going?
  • 5. Are we asking the right questions?
  • 6. Tasks for speech recognition by difficulty Select word from list Interpret command Type dictation Transcribe presentation Transcribe conversation
  • 7. How we think of it vs how it is Select word from list Interpret command Type dictation Transcribe presentation Transcribe conversation Transcribe conversation Transcribe presentation Type dictation Interpret command Select word from list
  • 8. Speech recognition approximate timeline Select digit 1950s Select from 1000 words 1970s Select from large vocabulary 1980s Dictate word by word 1990s Dictate whole sentences 1997 Transcribe YouTube video 2012 Transcribe conversation 2019
  • 9. What is the actual job of speech recognition?
  • 10. What is this word? [pʰɹɛtsɫ̩] [pɹɛtsl] /pretsəl/ <pretzel>
  • 11. What’s the problem aspirated /p/ at start of a stressed syllable devoiced /r/ following /p/ labialised /r/ following /p/ dark /l/ syllabic consonant glottal stop
  • 12. It gets worse: find the missing sounds
  • 13. Course on speech recognition 1993 Faster computers won’t help improve speech recognition. We need a new approach.
  • 14. Dragon NaturallySpeaking, released in 1997, could recognise whole sentences. What happened?
  • 15. How does speech recognition not work? Not by finding individual sounds (phonemes) in the speech and matching them to letters.
  • 16. How does speech recognition actually work? P(W|C): what is the likelihood that the next word is W, given the context C (what came before)?
  • 17. Actually, it is quite a bit more complicated (Huang and Deng 2009)
  • 18. Probabilistic (stochastic) ASR enabled the change. Linguistics took a back seat.
  • 19. Fred Jelinek (ASR pioneer, 1988?): "Every time I fire a linguist, the performance of the speech recognizer goes up"
  • 20. Consequence of the probabilistic approach: worse on words not predictable from context, such as names, acronyms and specialist terms
  • 21. Question in 2011: "I recorded a lecture. Can I use Dragon to transcribe it?"
  • 22. “Caption fails” in 2014 provided source for comedy
  • 23. YouTube Captions today are usable and useful
  • 24. So what happened between 2014 and 2022?
  • 25. Ingredients of success Larger data sets More computing power Neural networks
  • 26. Patrick Winston (2015), MIT Lecture 12a in AI course: "It was in 2010, yes, that's right. It was in 2010. We were having our annual discussion about what we would dump from 6.034 in order to make room for some other stuff. And we almost killed off neural nets. That might seem strange because our heads are stuffed with neurons. … But many of us felt that the neural models of the day weren't much in the way of faithful models of what actually goes on inside our heads. And besides that, nobody had ever made a neural net that was worth a darn for doing anything."
  • 27. 2012 – ImageNet showed that Neural Networks are much better at computing the probabilities for complex data.
  • 28. Ok, we have neural nets, what does that mean?
  • 29. Things to know about Neural Nets Everything has a probability Same input does not produce same output They have no ‘sanity check’ or ‘common sense’
  • 30. What do probabilities look like?
  • 31. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models Allyson Ettinger 2019
  • 33. Output changes as more information is made available. (Not always for the better)
  • 34. Examples from today’s captions Crystal > Chris is Am > and experts > experience AR > a our
  • 35. Different ways of transcribing Dua Lipa alipa dualipa dua lipa lipa duda lipa
  • 36. Rise and mostly fall of Google’s new spell Czech
  • 37. Tracking faces at the tips of the shoes
  • 38. Hallucination is a big problem
  • 39. Question asked by a faculty member in 2021: "We correct the transcripts. Why doesn’t the system learn the correct spelling?"
  • 40. Adding your own word list just tweaks the probabilities.
  • 41. Choosing a genre setting tweaks the probabilities.
  • 42. Another thing to know about NN Neural Nets use very large data sets and can take days or weeks to train.
  • 43. Consequences of NN size Speech recognition is often not done on device. Individual input often cannot adjust the quality (except in pre-training) Most applications use APIs from the big players Few open source/free options
  • 44. Big players in the field Google Microsoft (now also Nuance) Amazon
  • 46. Interesting applications Descript Microsoft Reading Progress Microsoft Presentation Coach
  • 47. What can we expect in the future?
  • 49.
  • 50. The Original Roomba (2002) vs Roomba S9+ (2019) - Wow!
  • 51. What happens in speeches Fillers Repetition
  • 52. What does conversation actually look like?
  • 53. Possible futures? Incremental improvement similar to Roomba in 17 years Accurate lecture transcripts Fluent dictation with pauses Better meeting transcription Revolutionary change similar to change in speech recognition in 6 years Informal conversation transcription Interactive dictation Multilingual speech transcription
  • 54. How should we think about accuracy? We speak 120-180 words per minute, so 99% accurate = 1.2-1.8 (roughly 2) errors per minute
  • 55. From Sept 2014 xkcd.com/1425 Sometimes it is hard to judge how much effort will be needed to solve a seemingly easy problem.
  • 56. Wishlist (a few hours of coding) Transcripts indicate level of confidence Benchmarks for lecture transcripts Better manual control of transcripts (like Descript)
  • 57. Dreamlist (5 years and a research team) Multilingual transcription (identify change in language) Multimodal transcription (use information from video) Raw to readable transcript
  • 59. Kate Knill Machine Intelligence Lab, University of Cambridge Richard Cave MND Association (and formerly Google project Euphonia) Richard Purcell Caption.Ed Irit Opher Head of Research at Verbit.ai
  • 60. What is the current state of the art of speech recognition in general and in the transcription of recorded speech in particular? What are the current quality metrics and how much do they tell us about the suitability of models? Do we need better ones? After the big recent jump in performance, are we seeing a plateau with incremental growth, or can we expect another step change in quality? Where can we see the most innovation? What are the research and development blind spots where more effort is needed? Which problems remain unsolved? How much room is there for smaller players to innovate in this space? How much do they have to rely on pre-trained models from big providers? Is there space for open source?
  • 61. This presentation is licensed under Creative Commons By Attribution license except where otherwise noted. Icons and stock images from Microsoft Office 365 creative premium. They cannot be distributed separately from this document.
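Slides 16, 20 and 40 describe recognition as estimating P(W|C), the probability of a word W given its context C, and note that adding a custom word list only shifts those probabilities. The following is a minimal sketch of that idea, assuming a toy bigram language model with add-one smoothing; real recognizers use neural models trained on far larger data. The corpus, the function names and the `bonus` multiplier are hypothetical illustrations, not any vendor's word-list API.

```python
from collections import Counter, defaultdict

# Hypothetical toy training text. In it, the name "dua" (as in Dua Lipa)
# follows "the" less often than the common word "dual" does.
CORPUS = (
    "we drove on the dual carriageway . "
    "the dual carriageway was busy . "
    "take the dual carriageway . "
    "i like the dua lipa song ."
).split()


def train_bigrams(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
    return counts


def next_word_distribution(counts, vocab, context, boost=None, bonus=1.0):
    """Estimate P(w | context) for every w in vocab with add-one smoothing.

    `boost` stands in for a user's custom word list (hypothetical mechanism):
    each listed word's weight is multiplied by `bonus` before renormalising,
    so the list shifts probabilities but never forces a word.
    """
    boost = boost or set()
    following = counts[context]
    weights = {}
    for w in vocab:
        weight = following[w] + 1  # +1 so unseen words keep a small share
        if w in boost:
            weight *= bonus
        weights[w] = weight
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


if __name__ == "__main__":
    counts = train_bigrams(CORPUS)
    vocab = sorted(set(CORPUS))
    # Assumption: the audio alone cannot tell these two words apart.
    candidates = ["dual", "dua"]
    settings = [
        ("no word list", None, 1.0),
        ("small boost", {"dua"}, 1.5),
        ("large boost", {"dua"}, 3.0),
    ]
    for label, boost, bonus in settings:
        dist = next_word_distribution(counts, vocab, "the", boost, bonus)
        best = max(candidates, key=dist.get)
        print(f"{label}: P(dual|the)={dist['dual']:.3f} "
              f"P(dua|the)={dist['dua']:.3f} -> {best}")
```

With the small bonus the common word still wins; only the larger bonus flips the choice to the name. In this sketch, that is the sense in which a word list tweaks probabilities rather than enforcing a spelling.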
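Slide 29 says that in a neural network everything has a probability and the same input does not always produce the same output. The following is a minimal sketch of one way both can happen, assuming the network's final scores are turned into probabilities with a softmax and that the output is sampled rather than always taking the top candidate. The candidate strings echo slide 36, but the scores are hypothetical, not measured from any real system.

```python
import math
import random

# Hypothetical raw scores for three candidate transcriptions of one audio clip.
CANDIDATES = ["spell check", "spell Czech", "smell check"]
SCORES = [2.0, 1.5, -1.0]


def softmax(scores):
    """Convert raw scores to probabilities: every value > 0, total = 1."""
    top = max(scores)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(candidates, probs):
    """Always pick the most probable candidate (deterministic)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best]


def sample(candidates, probs, rng):
    """Pick a candidate at random in proportion to its probability."""
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = softmax(SCORES)
    for text, p in zip(CANDIDATES, probs):
        print(f"{text!r}: {p:.3f}")  # even the odd one out gets a non-zero share
    print("greedy:", greedy(CANDIDATES, probs))
    rng = random.Random(42)
    print("sampled:", [sample(CANDIDATES, probs, rng) for _ in range(5)])
```

Nothing in the softmax checks whether a candidate makes sense (the missing "sanity check" of slide 29); a candidate only becomes unlikely if its score is low. Whether a given product decodes greedily or samples is not stated in the slides.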
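Slide 54's accuracy arithmetic and slide 60's question about quality metrics both depend on how errors are counted. A widely used metric, not named in the slides, is word error rate (WER): the number of substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. Below is a minimal sketch using word-level edit distance; the example sentences are hypothetical, built around the "experts > experience" and "Crystal > Chris is" errors from slide 34.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions (word-level Levenshtein).

    Comparison is case-sensitive; lowercase both strings first if needed.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # cost of building hyp[:j] from nothing
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra word in output
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: total word errors divided by reference length."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript is empty")
    return word_errors(reference, hypothesis) / n


def errors_per_minute(words_per_minute, accuracy):
    """Expected word errors per minute at a given word accuracy."""
    return words_per_minute * (1 - accuracy)


if __name__ == "__main__":
    print(wer("talk to the experts", "talk to the experience"))  # one substitution
    print(wer("thanks crystal", "thanks chris is"))  # substitution + insertion
    for wpm in (120, 180):
        print(f"{wpm} wpm at 99%: {errors_per_minute(wpm, 0.99):.1f} errors/min")
```

The second example shows why one misheard name can cost two errors: "Crystal" heard as "Chris is" is one substitution plus one insertion. At 120 to 180 words per minute, 1% of words wrong works out to 1.2 to 1.8 errors per minute.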