International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2714
A survey on Enhancements in Speech Recognition
Nitin Kendre1, Vivek Parit2, Asazad Pathan3, Akash Kamble4, Prof. T.V. Deokar5
1,2,3,4 Students, Department of Computer Science & Engineering, Sanjeevan Engineering and Technology Institute
Panhala (Maharashtra)
5 Assistant Professor, Department of Computer Science & Engineering, Sanjeevan Engineering and Technology
Institute Panhala (Maharashtra)
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – The purpose of this study is to survey enhancements in the field of speech recognition. Since the beginning of the 21st century, people have been working and researching in the area of voice recognition, and researchers have contributed many advances. Clean speech without noise is easy for computers to understand, but if the speech includes noise it becomes very difficult for a computer to understand it and to separate the noise from the speech. There are various sources of noise in speech, such as background noise, environmental noise, signal noise, and crowded places. In this paper, we present various techniques proposed by researchers to make speech recognition systems work in any environment, along with some advanced enhancements that extend speech recognition to other tasks such as emotion recognition.
Key Words: Robust Speech Recognition, Artificial
Intelligence, Feature Extraction, Noise Reduction, Deep
Learning.
1. INTRODUCTION
Speech recognition is the technique through which a computer (or another sort of machine) recognizes spoken words. Simply put, it means talking to the computer and having it correctly recognize what you are saying.
Voice is the most common and fastest mode of
communication, and each human voice has a distinct quality
that distinguishes it from the others. As a result, not only for
humans but also for automated machines, voice recognition
is required for easy and natural interaction [12].
Speech recognition has applications in a variety of sectors, including medicine, engineering, and business. The general challenges in speech recognition are variations in speaking rate, gender, age, and the environment in which the conversation takes place; a second major issue is noise in the speech [12]. If we can solve these issues, it will be much easier to create products or systems that people can use everywhere, even in crowded or noisy environments.
Therefore, it is necessary to remove or reduce the amount of noise in speech to recognize it effectively, and to do so we first have to know the basics of recognition. The basic model of speech recognition, also known as the speech-to-text model, is shown in Figure 1.
Fig-1: The basic model of Speech Recognition.
1.1 Speech Input
A human voice is captured or recorded as speech input using a microphone and a sound card connected to the computer. Modern sound cards can record audio at sampling rates ranging from 16 kHz to 48 kHz, with 8 to 16 bits per sample, and playback rates of up to 96 kHz [12].
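The sample rate and bit depth quoted above determine how much raw data a recording produces. As a small illustrative calculation (not from the paper), the uncompressed PCM data rate is simply the product of sample rate, bytes per sample, and channel count:

```python
# Raw PCM data rate: sample_rate * (bits_per_sample / 8) * channels
# gives bytes of audio produced per second of recording.
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                         channels: int = 1) -> int:
    """Bytes of uncompressed audio produced per second of recording."""
    return sample_rate_hz * bits_per_sample // 8 * channels

# A common speech-recognition setting: 16 kHz, 16-bit, mono.
rate = pcm_bytes_per_second(16_000, 16)  # 32,000 bytes per second
```

At the high end of the ranges above (48 kHz, 16-bit, stereo), the same formula gives 192,000 bytes per second, which is why recognition front ends usually downsample to 16 kHz mono.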
1.2 Preprocessing
Signal processing takes place in this step. The analog signal is converted to a digital signal, noise reduction is applied, and audio frequencies are adjusted to make the signal machine-ready [12][13].
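Two typical first steps of this digitized-signal preprocessing are pre-emphasis (boosting high frequencies) and framing (cutting the signal into short overlapping windows). The following is a minimal sketch of those two steps, assuming a 16 kHz signal and the common 25 ms / 10 ms window settings; it is illustrative, not the specific pipeline of any cited paper:

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Apply the pre-emphasis filter y[t] = x[t] - alpha * x[t-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (shape: n_frames x frame_len)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

x = np.random.randn(16_000)  # one second of synthetic audio at 16 kHz
# 400 samples = 25 ms window, 160 samples = 10 ms hop at 16 kHz
frames = frame_signal(pre_emphasis(x), frame_len=400, hop=160)
```

Each row of `frames` is then windowed and transformed during feature extraction, as described next.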
1.2.1 Feature Extraction
The next step in pre-processing is to choose which features will be valuable and which are unnecessary. To extract features, we need to understand MFCCs (Mel-Frequency Cepstral Coefficients).
1.2.2 MFCCs
MFCC is a method for extracting features from audio signals. It divides the audio signal's frequency bands using the MEL scale, then extracts coefficients from each frequency band to create a frequency separation; the Discrete Cosine Transform is used to perform this operation. The MEL scale is based on human sound perception, that is, on how the brain analyzes audio signals and distinguishes between different frequencies.
The MEL scale was born from a pitch scale proposed by Stevens, Volkmann, and Newman in 1937. It is a scale of audio signals at varying pitch levels that listeners judge to be equally spaced from one another; in other words, it is a scale based on human perception, on how we use our sense of hearing to judge the separation between audio signals. The distances on this scale increase with frequency because human perception is non-linear.
This MEL scale is used to separate the frequency bands in audio, after which the information needed for recognition is extracted.
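The mel scale described above is commonly approximated by the formula mel = 2595 · log10(1 + f/700). This short sketch converts between Hz and mel both ways, showing the non-linear spacing the text describes (the formula is the standard approximation, not something specific to this paper):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Standard approximation of the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in mel are non-linear in Hz, matching the perceptual claim
# above: band edges bunch together at low frequencies and spread out
# at high ones.
edges_hz = [mel_to_hz(m) for m in [0, 500, 1000, 1500, 2000]]
```

In MFCC extraction, such equally spaced mel points become the edges of the triangular filter bank applied to each frame's spectrum.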
1.3 Model
Classification models are mostly used in the model section to recognize or search for the words detected in audio. It is now simple to recognize speech using neural networks, and employing neural networks for speech recognition has proven to be quite beneficial; RNNs (Recurrent Neural Networks) are commonly used. Speech recognition can also be done without neural networks, although it will be less accurate. Some models used for speech recognition, such as the Recurrent Neural Network and the Hidden Markov Model, are described below.
1.3.1 Recurrent Neural Network
Recurrent Neural Networks are a type of neural network used mostly in NLP. RNNs use past outputs as inputs while maintaining hidden states; this gives them a notion of memory that preserves information about what has been calculated up to the current time step. They are called recurrent because they perform the same operation at each time step, with the result dependent on past calculations. Fig-2 shows a simple illustration of a recurrent network [15].
Fig-2: Simple Architecture of RNN.
In speech recognition, recurrent neural networks are used for voice classification; they are also used for language translation.
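The recurrence described above can be sketched in a few lines: a single hidden state vector is updated at every time step from the current input frame and the previous state. This is a minimal untrained vanilla-RNN forward pass over random feature frames, purely for illustration (the dimensions, e.g. 13 MFCCs per frame, are assumptions):

```python
import numpy as np

def rnn_forward(xs: np.ndarray, W_xh: np.ndarray, W_hh: np.ndarray,
                b_h: np.ndarray) -> np.ndarray:
    """xs: (T, input_dim). Returns hidden states, shape (T, hidden_dim)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:  # one time step per input frame
        # The hidden state h carries memory of all previous frames.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 13, 8  # e.g. 13 MFCCs per frame, 8 hidden units
hs = rnn_forward(rng.normal(size=(T, d_in)),
                 rng.normal(size=(d_h, d_in)) * 0.1,
                 rng.normal(size=(d_h, d_h)) * 0.1,
                 np.zeros(d_h))
```

A real recognizer would add an output layer over phoneme or character labels and train the weights; the loop above is only the "same task at every step, depending on past calculations" idea in code.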
1.3.2 Hidden Markov Model
When a person speaks, the speech creates vibrations. To convert this speech into text, the computer performs several steps. First it converts the analog speech signal into a digital signal. Then it pre-processes that signal: it removes unneeded parts, removes noise, and performs feature extraction. The selected features are passed to the Hidden Markov Model, which searches for the corresponding word in a database. The recognized word is then sent to the user [14].
Fig-3 shows a simple architecture of a Hidden Markov Model for speech recognition.
Fig-3: HMM for Speech Recognition
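The "search for the word" step of an HMM is usually done with Viterbi decoding: finding the most likely hidden state sequence given the observed acoustic features. Below is a toy sketch with made-up numbers (two states, two observation symbols), not the specific model of any cited paper:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """pi: initial probs (S,), A: transition probs (S, S),
    B: emission probs (S, n_symbols), obs: list of observation indices.
    Returns the most likely hidden state path."""
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # best path probability ending in state s
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A  # scores[i, j]: from state i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Two states where state i strongly emits symbol i: the decoded path
# should simply track the observations.
pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.1, 0.9]])
B  = np.array([[0.9, 0.1], [0.1, 0.9]])
path = viterbi(pi, A, B, [0, 0, 1, 1])  # -> [0, 0, 1, 1]
```

In a real recognizer the states correspond to sub-phone units and the emission probabilities come from acoustic models, but the dynamic-programming search is the same.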
1.4 Transcript
The transcript is the text obtained after the user's audio has been converted to text; this procedure is called the speech-to-text method. The transcript can then be fed into a text-to-speech model or system, which allows the computer to speak.
2. HISTORY OF SPEECH RECOGNITION
"Audrey," a system built by Bell Laboratories in the 1950s, was the first recognized example of our contemporary speech recognition technology. However, Audrey took a full room to set up and could only identify the nine digits uttered by its inventor, with 90% accuracy. Its goal was to assist toll operators in taking more telephone calls over the line, but its high price and inability to detect a wide range of voices rendered it unfeasible.
The next innovation was IBM's Shoebox, which debuted at the World's Fair in 1962. It took nearly 12 years to build and was able to detect and discriminate between 16 words. However, a person had to pause and talk slowly to ensure that the system picked up what they were saying.
Then the Defence department began to appreciate the significance of voice recognition technology in the early 1970s: the capacity of a computer to interpret genuine human language could be extremely useful in a variety of military and national defence applications.
As a result, they spent five years on DARPA's Speech Understanding Research programme, one of the largest initiatives of its sort in speech recognition history. One of the most notable systems to emerge from this research effort was "Harpy," which was capable of recognising over 1,000 words.
Speech recognition systems were common enough in the late 1970s and early 1980s that they found their way into children's toys. The Speak & Spell, which used a voice chip, was created in 1978 to help kids spell words. That voice chip would prove to be a significant instrument in the next stage of speech recognition software development.
The "Julie" doll, released in 1987, was capable of answering a person and distinguishing between speakers' voices in an impressive (if not unsettling) exhibition.
Three years later, AT&T was experimenting with over-the-phone voice recognition technologies to help answer customer support calls.
Then Dragon launched "Naturally Speaking" in 1997, which allowed natural speech to be interpreted without the need for pauses.
After much further innovation and research in this field, today we have digital assistants such as Google Assistant, Alexa, Siri, and Cortana, which can process natural speech without pauses and can recognize thousands of words in many languages.
3. RELATED WORK
Many researchers have worked on various techniques to make speech recognition systems work in any environment. The enhancements presented in the following papers have proved helpful for speech recognition systems.
3.1 Automatic Recognition of Noisy Speech
Jisi Zhang et al. [1] have explored a framework for training a monoaural neural enhancement model for robust voice recognition. To train the model, they used unpaired speech data and noisy speech data. They conducted trials using an end-to-end acoustic model and found that the WER was lowered by 16 to 20%. On the real development and evaluation sets, using a more powerful hybrid acoustic model, the processed speech reduced WER by 28% and 39%, respectively, compared to the unprocessed signal.
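The WER figures quoted here and in the following sections follow the standard definition: the word-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length. A minimal sketch with a toy example (not the papers' data):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance between word sequences / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

score = wer("the cat sat on the mat", "the cat sat on mat")  # one deletion
```

One missed word out of six reference words gives a WER of about 16.7%; note that WER can exceed 100% when the hypothesis contains many insertions.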
3.2 Integration of Speech Recognition and Self-
Supervised learning
Xuankai Chang et al. [2] enhanced the end-to-end ASR (Automatic Speech Recognition) model for robust voice recognition. Compared with a traditional end-to-end ASR model, their proposed model incorporates a speech enhancement module and a self-supervised learning representation module. The speech enhancement module improves the quality of noisy speech, while the self-supervised learning representation module performs feature extraction and picks useful features for voice recognition.
3.3 Noisy Robust Speech Recognition
Developing a speech recognition system that works in noisy surroundings requires a vast amount of data, including noisy speech and transcripts, to train the system, yet such large datasets are not readily available. Chen Chen et al. have suggested a generative adversarial network to recreate a noisy spectrum from a clean spectrum with only 10 minutes of in-domain noisy voice data as labels. They have also proposed a dual-path speech recognition method to boost the system's robustness in noisy situations. According to the experimental data, the proposed technique improves the absolute WER of the dual-path ASR system by 7.3% over the best baseline [3].
3.4 Speech Emotion Recognition
Surekha Reddy Bandela et al. [4] have presented a strategy that combines unsupervised feature selection with INTERSPEECH 2010 paralinguistic features, Gammatone Cepstral Coefficients (GTCC), and Power Normalized Cepstral Coefficients (PNCC) to improve speech emotion recognition. The proposed system is tested in both clean and noisy conditions. For de-noising the noisy voice signal, they adopted a dense Non-Negative Matrix Factorization (denseNMF) approach. For speech emotion recognition analysis, they employed the EMO-DB and IEMOCAP databases, achieving an accuracy of 85% on EMO-DB and 77% on IEMOCAP. With cross-corpus analysis, the proposed system could be improved toward language independence [4].
3.5 Speech Recognition based on multiple Deep
Learning Features
Because large datasets and a wide community are available for English, it is a popular choice for speech recognition research. Accordingly, Zhaojuan Song et al. [5] chose English speech as their study topic and created a deep learning speech recognition method that incorporates both speech features and speech attributes. They used a supervised deep neural network approach to extract high-level qualities of the voice and to train the GMM-HMM ANN model with additional speech features. They also used a speech attribute extractor based on a deep neural network trained for several speech attributes, with the retrieved features being categorized into phonemes.
Then, using a neural network based on a linear feature fusion technique, speech features and speech attributes are integrated into the same CNN framework, considerably improving the performance of the English voice recognition system [5].
3.6 Robust Speech Recognition
A. I. Alhamada et al. [6] have reviewed previous work and identified a suitable deep learning architecture. To improve the performance of speech recognition systems, they used a convolutional neural network and found that the CNN-based approach achieved a validated accuracy of 94.3%.
3.7 Hybrid-Task Learning for ASR
Gueorgui Pironkov et al. [7] presented a new technique named Hybrid-Task Learning. This method is based on a mix of Multi-Task and Single-Task learning architectures, resulting in a dynamic hybrid system that switches between single- and multi-task learning depending on the type of input feature. This hybrid-task learning system is especially well suited to robust automatic speech recognition when two types of data are available: real data recorded in real-life noisy conditions, and simulated data obtained by artificially adding noise to clean speech recorded in noiseless conditions [7].

As a result, Hybrid-Task Learning's ASR performance on real and simulated data beats both Multi-Task Learning and Single-Task Learning, with up to a 4.4% relative improvement over Single-Task Learning. This technique can be tested on different datasets in the future [7].
3.8 Audio Visual Speech Recognition
An audio-visual speech recognition system is believed to be one of the most promising technologies for robust voice recognition in a noisy environment. Pan Zhou et al. [9] proposed a multimodal attention-based approach for audio-visual speech recognition, which can automatically learn a fused representation from both modalities depending on their importance. According to experimental results, the suggested method achieves a 36% relative improvement and beats previous feature-concatenation-based AVSR systems. This strategy can be tested with a larger AVSR dataset in the future [9].
3.9 Audio Visual Speech Recognition for Hearing
Impaired
Hearing-impaired persons would benefit greatly from artificial intelligence; approximately 466 million people worldwide suffer from hearing loss. Students who are deaf or hard of hearing depend on lip-reading to grasp what is being said, and they encounter further hurdles, including a shortage of skilled sign language facilitators and the high cost of assistive technology. As a result, L Ashok Kumar et al. [8] proposed an approach to visual speech recognition built on cutting-edge deep learning methods, merging the results of the audio and visual streams. They also suggested a novel deep-learning-based audio-visual speech recognition algorithm for efficient lip reading. Using this model, they reduced the word error rate of the Automatic Speech Recognition system to about 6.59% and established a lip-reading model accuracy of nearly 95%. In the future, a BERT-based language model could improve the suggested model's word error rate [8].
3.10 Robust Speech Recognition Using
Reinforcement Learning.
The goal of deep-neural-network-based voice enhancement is to reduce the mean square error between enhanced speech and a clean reference. Yih-Liang Shen et al. [10] suggested a reinforcement learning approach that improves the voice enhancement model based on recognition results. They tested the proposed voice enhancement system on the Chinese broadcast news dataset (MATBN). By using recognition errors as the objective function, the RL-based voice enhancement effectively reduced character error rates by 12.40% and 19.23%. More noise types will be investigated in the future development of the RL-based system [10].
3.11 Noise Reduction in Speech Recognition
We have seen that reducing speech noise is essential for constructing reliable speech recognition systems in noisy contexts. Imad Qasim Habeeb et al. [11] suggested an ensemble approach that uses multiple noise-reduction filters to increase the accuracy of Automatic Speech Recognition. They applied three noise reduction filters, which generate multiple copies of the speech signal; the best copy is chosen for the voice recognition system's final output. They studied this model and found promising results, including a 16.61% drop in character error rate and an 11.54% reduction in word error rate compared to previous strategies [11].
The results of the experiments showed that alternative noise reduction filters can provide additional information about the phonemes to be classified, which can be used to improve the Automatic Speech Recognition system's accuracy. Moreover, because a human can guess words or phrases even without hearing them completely, a technique based on the n-gram model could be developed to provide a mechanism similar to the human hearing system [11].
4. DISCUSSION AND CONCLUSIONS
This study discusses robust voice recognition and critical factors in speech recognition, as well as the diverse methodologies used by different researchers. We have looked at how speech recognition works, covering audio recording, voice pre-processing, classification models, and neural networks for speech recognition, and we have discussed how to extract features from speech signals. An overview of the experiments conducted by the cited authors has also been included.

Truly robust speech recognition is still a long way off, and developing such a system with good performance is a difficult undertaking. In this paper, we have tried to provide a comprehensive overview of key advancements in the field of Automatic Speech Recognition. We have also attempted to list some research that could be useful in developing systems for physically challenged people.
REFERENCES
[1] Jisi Zhang, Catalin Zorila, Rama Doddipatla and Jon Barker, "On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training." arXiv preprint arXiv:2205.01751, 2022.
[2] Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji
Watanabe “End-to-End Integration of Speech
Recognition, Speech Enhancement, and Self-Supervised
Learning Representation.” arXiv preprint
arXiv:2204.00540, 2022.
[3] Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng
Siong Chng “NOISE-ROBUST SPEECH RECOGNITION
WITH 10 MINUTES UNPARALLELED IN-DOMAIN
DATA.” arXiv preprint arXiv:2203.15321, 2022.
[4] Surekha Reddy Bandela, T. Kishore Kumar
“Unsupervised feature selection andNMFde-noisingfor
robust Speech Emotion Recognition”,AppliedAcoustics,
Elsevier, Vol 172, January 2021.
[5] Zhaojuan Song “English speech recognition based on
deep learning with multiple features.” Springer, 2019.
[6] A. I. Alhamada, O. O. Khalifa, and A. H. Abdalla “Deep
Learning for Environmentally Robust Speech
Recognition”, AIP Conference Proceedings, 2020
[7] Gueorgui Pironkov, Sean UN Wood, Stephane Dupont
“Hybrid-task learning for robust automatic speech
recognition.” Elsevier, Computer Speech & Language,
Volume 64, 2020.
[8] L Ashok Kumar, D Karthika Renuka, S Lovelyn Rose, MC
Shunmuga Priya, I Made Wartana “Deep learning based
assistive technology on audio visual speech recognition
for hearing impaired.”,International Journal ofCognitive
Computing in Engineering, Volume 3, 2022.
[9] Pan Zhou, Wenwen Yang, Wei Chen, Yanfeng Wang, Jia
Jia “MODALITY ATTENTION FOR END-TO-END AUDIO-
VISUAL SPEECH RECOGNITION”, arXiv preprint
arXiv:1811.05250, 2019.
[10] Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu
Tsao, Hsin-Min Wang, and Tai-Shih Chi,
“REINFORCEMENT LEARNING BASED SPEECH
ENHANCEMENT FOR ROBUSTSPEECHRECOGNITION “,
arXiv preprint arXiv:1811.04224, 2018.
[11] Imad Qasim Habeeb, Tamara Z. Fadhil, Yaseen Naser
Jurn, Zeyad Qasim Habeeb, Hanan Najm Abdulkhudhur,
“An ensemble technique for speech recognition in noisy
environments.”, Indonesian Journal of Electrical
Engineering and Computer Science Vol. 18, No. 2, May
2020
[12] Pranjal Maurya, Dayasankar Singh, "A Survey – Robust Speech Recognition", International Journal of Advance Research in Science and Engineering, Volume 7, 2018.
[13] Atma Prakash Singh, Ravindra Nath, Santosh Kumar, “A
Survey: Speech Recognition Approaches and
Techniques”, IEEE Xplore, 2019.
[14] Neha Jain, Somya Rastogi “SPEECH RECOGNITION
SYSTEMS - A COMPREHENSIVE STUDY OF CONCEPTS
AND MECHANISM.”, Acta Informatica Malaysia, 2019
[15] Dr. R. L. K. Venkateswarlu, Dr. R. Vasantha Kumari, G. Vani JayaSri, "Speech Recognition By Using Recurrent Neural Networks.", International Journal of Scientific & Engineering Research, Volume 2, 2011.
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

Plus de IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Dernier

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 

Dernier (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 

A survey on Enhancements in Speech Recognition

Simply put, it means talking to the computer and having it correctly recognize what you are saying. Voice is the most common and fastest mode of communication, and each human voice has a distinct quality that distinguishes it from others. As a result, voice recognition is needed for easy and natural interaction, not only between humans but also with automated machines [12]. Speech recognition has applications in a variety of sectors, including medicine, engineering, and business. The general problems in speech recognition are speaking rate, gender, age, and the environment in which the conversation takes place; the second issue is noise in the speech [12]. If we can solve these issues, it will be much easier to create products or systems that people can use anywhere, even in crowded or noisy environments. It is therefore necessary to remove or reduce the noise in a speech signal before recognition, and to do that we must first know the basics of recognition. Figure 1 depicts the basic model of speech recognition, also known as the speech-to-text paradigm.

Fig-1: The basic model of Speech Recognition.

1.1 Speech Input

A human voice is captured or recorded as speech input using a microphone and a sound card connected to the computer. Modern sound cards can record audio at sampling rates ranging from 16 kHz to 48 kHz, with 8 to 16 bits per sample, and playback rates of up to 96 kHz [12].

1.2 Preprocessing

Signal processing takes place in this step. It converts the analog signal to a digital signal, performs noise reduction, and adjusts the audio frequencies to make the signal machine-ready [12][13].

1.2.1 Feature Extraction

The next step in preprocessing is to choose which features will be valuable and which will be unnecessary.
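Feature extraction is usually realized with mel-frequency cepstral coefficients. As a rough, self-contained sketch of that computation in plain NumPy/SciPy (the frame sizes, filter counts, and coefficient count below are illustrative assumptions, not values from the paper):

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Simplified MFCC pipeline: frame -> window -> power spectrum ->
    mel filterbank -> log -> DCT (keep the first n_ceps coefficients)."""
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # 4. Log mel energies, then DCT to decorrelate -> cepstral coefficients.
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

For one second of 16 kHz audio this yields a (98, 13) matrix: 98 overlapping frames, 13 coefficients each, which is the kind of feature matrix passed on to the model stage.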
We need to understand MFCCs (Mel-Frequency Cepstral Coefficients) in order to extract features.

1.2.2 MFCCs

MFCC is a method for extracting features from audio signals. It divides the audio signal's frequency range into bands using the mel scale, then extracts coefficients from each band to create a frequency separation. MFCC uses the Discrete Cosine Transform to perform this operation. The mel scale is based on human sound
perception, that is, on how the brain analyses audio impulses and distinguishes between different frequencies. The mel scale was born from a pitch scale proposed by Stevens, Volkmann, and Newman in 1937. It is a scale of pitches that listeners judge to be equally distant from one another; in other words, it is a perceptual scale. For example, if we hear two sound sources that are far apart, our brain can judge the distance between them without seeing them; the mel scale models how we use our sense of hearing to judge the separation between audio signals. The distances on this scale grow with frequency because human perception is non-linear. The mel scale is used to separate the frequency bands in audio, after which the information needed for recognition is extracted.

1.3 Model

Classification models are mostly used in the model stage to recognize or search for the words detected in audio. Recognizing speech with neural networks is now straightforward and has proven quite beneficial; RNNs (Recurrent Neural Networks) are commonly used for speech recognition. Speech recognition can also be done without neural networks, although it will be less accurate. Some models used for speech recognition, such as the recurrent neural network and the Hidden Markov Model, are described below.

1.3.1 Recurrent Neural Network

Recurrent Neural Networks are a type of neural network used mostly in NLP. RNNs use past outputs as inputs while maintaining hidden states, giving them a notion of memory that preserves information about everything computed up to the current time step.
RNNs are referred to as recurrent because they perform the same task for each element of the input sequence, with the result depending on past computations. Fig-2 shows a simple illustration of a recurrent network [15].

Fig-2: Simple Architecture of RNN.

In speech recognition, recurrent neural networks are used for voice classification; they are also used for language translation.

1.3.2 Hidden Markov Model

When a person speaks, the speech creates vibrations. To convert this speech into text, the computer performs several steps. First, it converts the analog speech signal into a digital signal. Then it preprocesses that signal: it removes unneeded parts, reduces noise, and performs feature extraction, selecting the required features from the signal and passing them to the Hidden Markov Model. The HMM then searches for the word in the database, and the matched word is returned to the user [14]. Fig-3 shows a simple architecture of the Hidden Markov Model for speech recognition.

Fig-3: HMM for Speech Recognition

1.4 Transcript

The transcript is the text obtained after the user's audio has been converted to text; this procedure is called speech-to-text. The transcript can then be fed into a text-to-speech model or system, which allows the computer to speak.
2. HISTORY OF SPEECH RECOGNITION

"Audrey," a system built by Bell Laboratories in the 1950s, was the first true ancestor of our contemporary speech recognition technology. However, Audrey took up a full room and could only identify the digits 0 to 9 uttered by its inventor, with 90% accuracy. Its goal was to help toll operators handle more telephone calls over the line, but its high price and inability to recognize a wide range of voices made it impractical.

The next innovation was IBM's Shoebox, which debuted at the 1962 World's Fair. It took nearly twelve years to build and could detect and discriminate between 16 words, although the speaker had to pause and talk slowly to ensure that the system picked up what was said.

The Defense Department began to appreciate the significance of voice recognition technology in the early 1970s: a computer able to interpret genuine human language could be extremely useful in a variety of military and national defense applications. As a result, it invested five years in DARPA's Speech Understanding Research program, one of the largest initiatives of its kind in speech recognition history. One of the most notable systems to emerge from this research effort was "Harpy," capable of recognizing over 1,000 words.

Speech recognition systems were so common in the late 1970s and early 1980s that they found their way into children's toys. The Speak & Spell, which used a voice chip, was created in 1978 to help children spell words. The voice chip it contained would prove to be a significant instrument for the next stage of speech recognition software development. Then the "Julie" doll was released in 1987.
Julie was capable of answering a person and distinguishing between speakers' voices in an impressive (if somewhat unsettling) exhibition. Three years later, AT&T was experimenting with over-the-phone voice recognition technologies to help answer customer support calls. Dragon then launched "Naturally Speaking" in 1997, which allowed natural speech to be interpreted without the need for pauses. After much further innovation and research in this field, today we have digital assistants such as Google Assistant, Alexa, Siri, and Cortana, which can process natural speech without pauses and can recognize thousands of words in many languages.

3. RELATED WORK

Many researchers have worked on various techniques to enhance speech recognition systems in arbitrary environments. The enhancements presented in the following papers have proved helpful for speech recognition systems.

3.1 Automatic Recognition of Noisy Speech

Jisi Zhang et al. [1] have explored a framework for training a monoaural neural enhancement model for robust voice recognition. To train the model, they used unpaired speech data and noisy speech data. They conducted trials using an end-to-end acoustic model and found that the WER was lowered by 16 to 20%. On the real development and evaluation sets, using a more powerful hybrid acoustic model, the processed speech reduced WER by 28% and 39%, respectively, when compared to the unprocessed signal.

3.2 Integration of Speech Recognition and Self-Supervised Learning

Xuankai Chang et al. [2] enhanced the end-to-end ASR (Automatic Speech Recognition) model for robust voice recognition. Compared with a traditional end-to-end ASR model, their model incorporates a speech enhancement module and a self-supervised learning representation module. The speech enhancement module improves the sound quality of noisy speech, while the self-supervised learning representation module performs feature extraction and picks useful features for voice recognition.
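Word error rate (WER), the metric quoted in the results above, is the word-level edit distance between the recognized hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch (the example sentences are invented):

```python
# Word error rate: minimum number of substitutions, deletions, and
# insertions needed to turn the hypothesis into the reference,
# divided by the reference length.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(ref)

# One substitution ("the" -> "a") over six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

A "relative" WER reduction of, say, 20% means the new WER is 0.8 times the baseline WER, which is how the improvements in 3.1 should be read.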
3.3 Noise-Robust Speech Recognition

Developing a speech recognition system that works in noisy surroundings requires a vast amount of data, including noisy speech and transcripts, to train the system, yet such large datasets are not readily available. Chen Chen et al. have suggested a generative adversarial network to recreate a noisy spectrum from a clean spectrum with only 10 minutes of in-domain noisy voice data as labels. They have also proposed a dual-path speech recognition method to boost the system's robustness in noisy situations. According to the experimental data, the proposed technique improves the absolute WER of the dual-path ASR system by 7.3% over the best baseline [3].

3.4 Speech Emotion Recognition

Surekha Reddy Bandela et al. [4] have presented a strategy that combines unsupervised feature selection with INTERSPEECH 2010 paralinguistic features, Gammatone Cepstral Coefficients (GTCC), and Power Normalized Cepstral Coefficients (PNCC) to improve speech emotion recognition. The proposed system is tested in both clean and noisy conditions. For de-noising the noisy voice signal, they adopted a dense Non-Negative Matrix Factorization (denseNMF) approach. For Speech Emotion Recognition analysis, they employed the EMO-DB and IEMOCAP databases, achieving an accuracy of 85% on EMO-DB and 77% on IEMOCAP. With cross-corpus analysis, the proposed system could be extended toward language independence [4].

3.5 Speech Recognition Based on Multiple Deep Learning Features

Because the English language has the advantage of large datasets and a wide research community, it is a popular choice for speech recognition research. Zhaojuan Song et al. [5] therefore chose English speech as their study topic and created a deep learning speech recognition method that incorporates both speech features and speech attributes. They used a supervised deep neural network to extract high-level qualities of the voice and trained a GMM-HMM ANN model with additional speech features. They also used a speech attribute extractor based on a deep neural network trained for several speech attributes, with the retrieved features being categorized into phonemes. Then, using a neural network based on a linear feature fusion technique, speech features and speech attributes are integrated into the same CNN framework, considerably improving the performance of the English voice recognition system [5].

3.6 Robust Speech Recognition

A. I. Alhamada et al. [6] have reviewed previous work and identified a suitable deep learning architecture. To improve the performance of speech recognition systems, they used a convolutional neural network and found that the CNN-based approach reached a validated accuracy of 94.3%.
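The CNN approach in 3.6 rests on one core operation: sliding a small learned kernel over the signal (or its spectrogram) and taking dot products. As an illustration of just that operation, here is a hedged sketch; the kernel and signal values are invented, and a real system would stack many such filters with learned weights and nonlinearities:

```python
# Valid-mode 1-D convolution (technically cross-correlation, as in most
# deep learning frameworks): slide a kernel over the signal and take
# dot products.  Kernel and signal values are purely illustrative.

def conv1d(signal, kernel):
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + k] * kernel[k] for k in range(len(kernel)))
        for i in range(n)
    ]

# A [1, -1] kernel acts as a crude onset/offset detector: it responds
# wherever adjacent samples differ.
signal = [0.0, 0.0, 1.0, 1.0, 0.0]
print(conv1d(signal, [1.0, -1.0]))  # [0.0, -1.0, 0.0, 1.0]
```

In a trained CNN the kernel weights are learned from data rather than hand-set, but the sliding dot product is the same.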
3.7 Hybrid-Task Learning for ASR

Gueorgui Pironkov et al. [7] presented a new technique named Hybrid-Task Learning. This method is based on a mix of Multi-Task and Single-Task learning architectures, resulting in a dynamic hybrid system that switches between single- and multi-task learning depending on the type of input feature. This hybrid-task learning system is especially well suited to robust automatic speech recognition in situations where two types of data are available: real data recorded in noisy, real-life conditions, and simulated data obtained by artificially adding noise to clean speech recorded in noiseless conditions [7]. With Hybrid-Task Learning, ASR performance on real and simulated data beats both Multi-Task Learning and Single-Task Learning, with Hybrid-Task Learning bringing up to a 4.4% relative improvement over Single-Task Learning. This technique can be tested on different datasets in the future [7].

3.8 Audio-Visual Speech Recognition

An audio-visual speech recognition (AVSR) system is believed to be one of the most promising technologies for robust voice recognition in a noisy environment. Pan Zhou et al. [9] proposed a multimodal attention-based approach for audio-visual speech recognition, which can automatically learn a fused representation from both modalities depending on their importance. The suggested method achieves a 36% relative improvement and beats previous feature-concatenation-based AVSR systems, according to experimental results. This strategy can be tested with a larger AVSR dataset in the future [9].

3.9 Audio-Visual Speech Recognition for the Hearing Impaired

Hearing-impaired persons would benefit greatly from artificial intelligence; approximately 466 million people worldwide suffer from hearing loss. Students who are deaf or hard of hearing depend on lip-reading to grasp what is being said.
However, hearing-impaired students encounter other hurdles, including a shortage of skilled sign language facilitators and the high cost of assistive technology. As a result, L. Ashok Kumar et al. [8] proposed an approach for visual speech recognition built on state-of-the-art deep learning methods, merging the results of the audio and visual modalities. They also suggested a novel audio-visual speech recognition algorithm based on deep learning for efficient lip reading. Using this model, they reduced the word error rate of the Automatic Speech Recognition system to about 6.59% and established a lip-reading model accuracy of nearly 95%. In the future, a BERT-based language model could improve the suggested model's word error rate [8].

3.10 Robust Speech Recognition Using Reinforcement Learning

The goal of Deep Neural Network-based voice enhancement is to reduce the mean square error between enhanced speech and a clean reference. Yih-Liang Shen et al. [10] instead suggested a reinforcement learning approach that improves the voice enhancement model based on recognition results. They tested the proposed voice enhancement system on the Chinese broadcast news dataset (MATBN). By using recognition errors as the objective function, the RL-based voice enhancement can reduce character error rates by 12.40% and 19.23%. More noise types will be investigated in the future when developing the RL-based system [10].
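The key idea in 3.10 is that the enhancement front-end is scored by the recognition errors it causes downstream, not by its distance to clean speech. The following is a deliberately toy caricature of that idea, with everything invented for illustration: the "recognizer" is just a threshold detector, the candidate moving-average filter widths are arbitrary, and no actual reinforcement learning update is performed; the point is only that the filter is chosen by downstream error.

```python
import random

def moving_average(signal, width):
    """Crude denoising filter: average each sample with its neighbours."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def recognition_error(signal, labels, threshold=0.5):
    """Stand-in 'recognizer': label each frame 1 if it exceeds the threshold."""
    predictions = [1 if x > threshold else 0 for x in signal]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

random.seed(0)
labels = [0] * 50 + [1] * 50                     # ground-truth frame labels
noisy = [y + random.gauss(0, 0.6) for y in labels]

# Pick the filter width by downstream recognition error, not by distance
# to the clean signal -- the same objective shift as in [10].
best_width = min((1, 3, 5, 9),
                 key=lambda w: recognition_error(moving_average(noisy, w), labels))
print(best_width, recognition_error(moving_average(noisy, best_width), labels))
```

Since width 1 leaves the signal unchanged, the selected filter can never do worse than no enhancement on this criterion; the RL method in [10] learns a far richer enhancement function, but optimizes it against the same kind of recognition-error signal.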
3.11 Noise Reduction in Speech Recognition

We have seen that reducing speech noise is essential for constructing reliable speech recognition systems in noisy contexts. Imad Qasim Habeeb et al. [11] suggested an ensemble approach that uses multiple noise-reducing filters to increase the accuracy of Automatic Speech Recognition. They applied three noise reduction filters, which generate multiple copies of the speech signal; the best copy is then chosen as the final input to the voice recognition system. They studied this model and found promising results, such as a 16.61% drop in character error rate and an 11.54% reduction in word error rate when compared to previous strategies [11]. The experiments showed that alternative noise reduction filters can provide additional information about the phonemes to be classified, which can be used to improve the Automatic Speech Recognition system's accuracy. Moreover, because a human can guess words or phrases even without hearing them completely, a technique based on the n-gram model could be developed to mimic this aspect of the human hearing system [11].

4. DISCUSSION AND CONCLUSIONS

This study discusses robust voice recognition and critical factors in speech recognition, as well as the diverse methodologies used by different researchers. We have looked at how speech recognition works, covering audio recording, voice pre-processing, classification models, and neural networks for speech recognition. We have also discussed how to extract features from speech signals, and included an overview of the experiments the cited authors conducted in their work.
Robust speech recognition is still far from human-level performance, and developing such a system is a difficult undertaking. In this paper, we have tried to provide a comprehensive overview of key advancements in the field of Automatic Speech Recognition. We have also attempted to list some research that could be useful in developing systems for physically challenged people.

REFERENCES

[1] Jisi Zhang, Catalin Zorila, Rama Doddipatla, and Jon Barker, "On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training," arXiv preprint arXiv:2205.01751, 2022.

[2] Xuankai Chang, Takashi Maekaku, Yuya Fujita, and Shinji Watanabe, "End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation," arXiv preprint arXiv:2204.00540, 2022.

[3] Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, and Eng Siong Chng, "Noise-Robust Speech Recognition with 10 Minutes Unparalleled In-Domain Data," arXiv preprint arXiv:2203.15321, 2022.

[4] Surekha Reddy Bandela and T. Kishore Kumar, "Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition," Applied Acoustics, Elsevier, Vol. 172, January 2021.

[5] Zhaojuan Song, "English speech recognition based on deep learning with multiple features," Springer, 2019.

[6] A. I. Alhamada, O. O. Khalifa, and A. H. Abdalla, "Deep Learning for Environmentally Robust Speech Recognition," AIP Conference Proceedings, 2020.

[7] Gueorgui Pironkov, Sean UN Wood, and Stephane Dupont, "Hybrid-task learning for robust automatic speech recognition," Computer Speech & Language, Elsevier, Volume 64, 2020.

[8] L Ashok Kumar, D Karthika Renuka, S Lovelyn Rose, M C Shunmuga Priya, and I Made Wartana, "Deep learning based assistive technology on audio visual speech recognition for hearing impaired," International Journal of Cognitive Computing in Engineering, Volume 3, 2022.
[9] Pan Zhou, Wenwen Yang, Wei Chen, Yanfeng Wang, and Jia Jia, "Modality Attention for End-to-End Audio-Visual Speech Recognition," arXiv preprint arXiv:1811.05250, 2019.

[10] Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu Tsao, Hsin-Min Wang, and Tai-Shih Chi, "Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition," arXiv preprint arXiv:1811.04224, 2018.

[11] Imad Qasim Habeeb, Tamara Z. Fadhil, Yaseen Naser Jurn, Zeyad Qasim Habeeb, and Hanan Najm Abdulkhudhur, "An ensemble technique for speech recognition in noisy environments," Indonesian Journal of Electrical Engineering and Computer Science, Vol. 18, No. 2, May 2020.

[12] Pranjal Maurya and Dayasankar Singh, "A Survey – Robust Speech Recognition," International Journal of Advance Research in Science and Engineering, Volume 7, 2018.

[13] Atma Prakash Singh, Ravindra Nath, and Santosh Kumar, "A Survey: Speech Recognition Approaches and Techniques," IEEE Xplore, 2019.
[14] Neha Jain and Somya Rastogi, "Speech Recognition Systems – A Comprehensive Study of Concepts and Mechanism," Acta Informatica Malaysia, 2019.

[15] R. L. K. Venkateswarlu, R. Vasantha Kumari, and G. Vani JayaSri, "Speech Recognition By Using Recurrent Neural Networks," International Journal of Scientific & Engineering Research, Volume 2, 2011.