Report on Speech Recognition AI
Tehmeena Naheed (043)
E-mail:
Tayyaba Rani (046)
E-mail :
Tehzeeb Khan Marwat (016)
E-mail :
Abstract:
Artificial Intelligence is becoming a popular field in computer science. In this report we explored its
history, major accomplishments and the visions of its creators. We looked at how Artificial Intelligence
experts influence reporting and designed a survey to gauge public opinion. We also examined expert
predictions concerning the future of the field as well as media coverage of its recent accomplishments.
These results were then used to explore the links between expert opinion, public opinion and media
coverage.
Introduction:
Artificial Intelligence has been studied for decades and is still one of the most elusive subjects in
Computer Science. This is partly due to how large and nebulous the subject is. AI ranges from machines
truly capable of thinking to search algorithms used to play board games. It has applications in nearly
every way we use computers in society. This paper examines the history of artificial intelligence
from theory to practice and from its rise to its fall, highlighting a few major themes and advances.
Goal:
There have been various trends in AI ever since its inception. In the earlier days of Artificial Intelligence,
there was an enormous amount of hype about the possibilities of computer technology in creating
intelligent machines. These expectations were unrealistic. We wish to examine the current views
expressed by both experts and laypeople about the nature of Artificial Intelligence, as well as about the
possibilities of AI technology in the near future. In examining both of these we will consider the extent
to which expert opinions and the current trends in Artificial Intelligence align with the views and
opinions of laypeople. From this we hope to comprehend the extent to which the opinions held by
laypeople correspond to the actual innovations in Artificial Intelligence, as well as its past and future
applications.
Speech Recognition:
Definition:
Artificial Intelligence (AI) is the science and engineering of making intelligent machines, especially
intelligent computer programs. Intelligence itself resists precise definition, but AI can be described
as the branch of computer science dealing with the simulation of intelligent behavior by machines.
Speaker independence:
Speech quality varies from person to person. It is therefore difficult to build an electronic system
that recognizes everyone's voice. By limiting the system to the voice of a single person, the system
becomes not only simpler but also more reliable. The computer must be trained to the voice of that
particular individual. Such a system is called a speaker-dependent system. Speaker-independent
systems can be used by anybody and can recognize any voice, even though voice characteristics vary
widely from one speaker to another. Most of these systems are costly and complex, and they have very
limited vocabularies.
It is important to consider the environment in which the speech recognition system has to work. The
grammar used by the speaker and accepted by the system, the noise level, the noise type, the position
of the microphone, and the speed and manner of the user's speech are some of the factors that may
affect the quality of speech recognition.
Environmental influence:
Real applications demand that the performance of the recognition system be unaffected by changes in
the environment. However, when a system is trained and tested under different conditions, the
recognition rate drops unacceptably. We need to be concerned about the variability present when
different microphones are used in training and testing, and specifically during development of
procedures. Such care can significantly improve the accuracy of recognition systems that use desktop
microphones. Acoustical distortions can degrade the accuracy of recognition systems. Obstacles to
robustness include additive noise from machinery, competing talkers, reverberation from surface
reflections in a room, and spectral shaping by microphones and the vocal tracts of individual
speakers. These sources of distortion fall into two complementary classes: additive noise, and
distortions resulting from the convolution of the speech signal with an unknown linear system. A
number of algorithms for speech enhancement have been proposed. These include the following:
1. Spectral subtraction of DFT coefficients
2. MMSE techniques to estimate the DFT coefficients of corrupted speech
3. Spectral equalization to compensate for convolutional distortions
4. Spectral subtraction and spectral equalization combined
Although relatively successful, all these methods depend on the assumption of independence of the
spectral estimates across frequencies. Improved performance can be obtained with an MMSE estimator
in which correlation among frequencies is modeled explicitly.
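As a minimal sketch of the first technique in the list above (spectral subtraction of DFT coefficients), the Python below subtracts an estimated noise magnitude from each DFT bin of one frame and keeps the noisy phase. The function names, the `floor` parameter and the toy signals are illustrative assumptions, not taken from any particular system. The naive DFT is used only to keep the example self-contained; a practical implementation would likely use an FFT library and average the noise estimate over many frames.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_mag, floor=0.0):
    """Subtract a noise magnitude estimate from each DFT bin of one frame.

    noise_mag is an assumed per-bin noise magnitude, e.g. measured on a
    frame that contains no speech. Magnitudes are clamped at `floor` so
    they never become negative; the phase of the noisy signal is reused.
    """
    cleaned = []
    for coeff, n_mag in zip(dft(noisy_frame), noise_mag):
        mag = max(abs(coeff) - n_mag, floor)
        cleaned.append(cmath.rect(mag, cmath.phase(coeff)))
    return idft(cleaned)


# Toy example: a slow sine ("speech") plus an alternating-sign disturbance ("noise").
noise_only = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
speech = [math.sin(2 * math.pi * t / 8) for t in range(8)]
noisy = [s + n for s, n in zip(speech, noise_only)]

noise_mag = [abs(c) for c in dft(noise_only)]
enhanced = spectral_subtract(noisy, noise_mag)
```

In this toy case the noise occupies a single frequency bin that the speech does not use, so the subtraction recovers the sine almost exactly. Real noise overlaps the speech spectrum, which is one reason the independence assumption mentioned above limits these methods.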
Speaker-specific features:
Speaker identity correlates with the physiological and behavioral characteristics of the speaker. These
characteristics exist both in the vocal tract characteristics and in the voice source characteristics, as
well as in the dynamic features spanning several segments. The most common short-term spectral
measurements currently used are the cepstral coefficients derived from Linear Predictive Coding (LPC)
and their regression coefficients. A spectral envelope reconstructed from a truncated set of cepstral
coefficients is much smoother than one reconstructed from LPC coefficients. Therefore, it provides a
more stable representation from one repetition to another of a particular speaker's utterances. As for
the regression coefficients, typically the first- and second-order coefficients are extracted at every
frame period to represent the spectral dynamics. These coefficients are derivatives of the time
function of the cepstral coefficients and are called the delta- and delta-delta-cepstral coefficients
respectively.
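The regression coefficients described above can be sketched as follows. This is a minimal illustration, assuming the commonly used regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2) with edge frames repeated at the boundaries. The window size `window=2` and the function name `delta` are assumptions made for the example, not values given in this report.

```python
def delta(frames, window=2):
    """First-order regression (delta) coefficients for a sequence of frames.

    frames: list of feature vectors, one per frame period.
    window: number of frames on each side used in the regression (assumed).
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            vec.append(acc / denom)
        result.append(vec)
    return result


# One coefficient per frame that rises by 1.0 every frame period.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
delta_coeffs = delta(ramp)                # first-order (delta) coefficients
delta_delta_coeffs = delta(delta_coeffs)  # second-order (delta-delta) coefficients
```

Applying the same function twice gives the second-order coefficients. Away from the edges, the delta of the ramp equals its slope, which matches the description of these coefficients as time derivatives.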
Speech Recognition:
The user communicates with the application through an appropriate input device, i.e. a microphone.
The Recognizer converts the analog signal into a digital signal for speech processing. A stream of text
is generated after the processing. This source-language text becomes the input to the Translation
Engine, which converts it to target-language text.
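The analog-to-digital step mentioned above can be illustrated with a short sketch. It samples a continuous signal at fixed intervals and quantizes each sample to a signed integer. The 16 kHz sample rate, the 16-bit depth and the 440 Hz test tone are assumptions chosen for illustration; the report does not state what the Recognizer actually uses.

```python
import math


def sample_and_quantize(signal, num_samples, sample_rate=16000, bits=16):
    """Sample a continuous signal and quantize it to signed integers.

    signal: function mapping time in seconds to an amplitude in [-1.0, 1.0].
    num_samples: how many samples to take, starting at time zero.
    sample_rate, bits: assumed values, not taken from the report.
    """
    max_level = 2 ** (bits - 1) - 1
    samples = []
    for i in range(num_samples):
        value = signal(i / sample_rate)
        value = max(-1.0, min(1.0, value))  # clip out-of-range amplitudes
        samples.append(round(value * max_level))
    return samples


def tone(t):
    """Stand-in for the microphone signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


# 160 samples at 16 kHz correspond to 10 ms of audio.
pcm = sample_and_quantize(tone, num_samples=160)
```

The resulting list of integers is the kind of digital signal that later processing stages, such as feature extraction, would operate on.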
Salient Features:
1. Input modes
   Through speech engine
   Through soft copy
2. Interactive graphical user interface
3. Format retention
4. Fast and standard translation
5. Interactive pre-processing tool
   Spell checker.
   Phrase marker.
   Proper noun, date and other package-specific identifier.
Input format: .txt, .doc, .rtf.
User-friendly selection of multiple output.
Online thesaurus for selection of contextually appropriate synonym.
Online word addition, grammar creation and updating facility.
Personal account creation and inbox management.
Applications:
One of the main benefits of a speech recognition system is that it lets the user do other work
simultaneously. The user can concentrate on observation and manual operations, and still control the
machinery by voice input commands. Another major application of speech processing is in military
operations. Voice control of weapons is an example. With reliable speech recognition equipment, pilots
can give commands and information to the computers by simply speaking into their microphones; they
don't have to use their hands for this purpose. Another good example is a radiologist scanning
hundreds of X-rays, ultrasonograms and CT scans while simultaneously dictating conclusions to a speech
recognition system connected to word processors. The radiologist can focus attention on the images
rather than on writing the text. Voice recognition could also be used on computers for making airline
and hotel reservations. A user simply states his or her needs to make a reservation, cancel a
reservation, or make enquiries about a schedule.
Conclusion:
Speech recognition technology has many uses. It helps physically challenged skilled persons, who can do
their work with this technology without pushing any buttons. Automatic speech recognition (ASR)
technology is also used in military weapons and in research centers. Nowadays this technology is also
used by CID officers, who use it to track criminal activities.