Presented By:
Patel Prashant(CE 137)
The Web, databases, and other digitized information storehouses contain a
growing volume of audio content.
• Examples include newscasts, sporting events, telephone conversations,
recordings of meetings, Webcasts, and documentary archives.
• Users want to make the most of this material by searching and indexing
the digitized audio content.
• In the past, companies had to create and manually analyze written
transcripts of audio content because using computers to recognize, interpret,
and analyze digitized speech was difficult.
• However, faster microprocessors, larger storage
capacities, and better speech-recognition algorithms have made audio mining
easier.
Audio mining, also called audio searching, takes a text-based query
and locates the search term or phrase in an audio file.
• This helps users by, for example, letting them quickly get to specific
places in a recorded conversation or determine when a company is
mentioned in a newscast.
• Audio indexing uses speech recognition to analyze an entire file and
produce a searchable index of content-bearing words and their
locations.
This is critical because audio content is in a binary format that is
otherwise not readily searchable.
Indexing audio content thus enables searching.
• There are two main approaches to audio mining:
1. Text-based indexing:
o It converts speech to text and then identifies words in a dictionary
that can contain several hundred thousand entries. If a word or name is
not in the dictionary, the system will choose the most similar word it
can find.
o The system uses language understanding to create a confidence level
for its findings. For findings with less than a 100 percent confidence
level, the system offers other possible word matches.
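The text-based index described above can be sketched as a mapping from each recognized word to the places where it occurs. This is a minimal illustration, not any vendor's implementation: the recognizer output, the timestamps, the confidence values, and the names `build_index` and `search` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical recognizer output: (word, start time in seconds, confidence 0..1).
# A real system would obtain these from a speech recognizer.
RECOGNIZED = [
    ("the", 0.0, 0.99),
    ("company", 0.4, 0.95),
    ("reported", 1.0, 0.90),
    ("record", 1.6, 0.72),
    ("profits", 2.1, 0.88),
    ("the", 2.9, 0.98),
    ("company", 3.2, 0.61),
]


def build_index(recognized):
    """Map each lower-cased word to a list of (start_time, confidence) hits."""
    index = defaultdict(list)
    for word, start, confidence in recognized:
        index[word.lower()].append((start, confidence))
    return dict(index)


def search(index, term, min_confidence=0.0):
    """Return start times where `term` was recognized at or above the threshold."""
    hits = index.get(term.lower(), [])
    return [start for start, confidence in hits if confidence >= min_confidence]


index = build_index(RECOGNIZED)
print(search(index, "company"))                      # [0.4, 3.2]
print(search(index, "company", min_confidence=0.8))  # [0.4]
```

Raising `min_confidence` trades recall for precision: the low-confidence hit at 3.2 seconds is dropped.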
2. Phoneme-based indexing:
o It doesn’t convert speech to text but instead works only with sounds.
o The system first analyzes and identifies sounds in a piece of audio
content to create a phonetic-based index. It then uses a dictionary of
several dozen phonemes to convert a user’s search term to the correct
phoneme string.
o Phonemes are the smallest units of speech in a language; every word
is a sequence of phonemes.
o Finally, the system looks for the search term in the index.
• A phonetic system requires a more efficient search tool because it
must phoneticize the query term, then try to match it with the existing
phonetic string output. This is considerably more complex than using
one of the many existing text-based search tools.
Phoneme-based searches can result in more false matches than the
text-based approach, particularly for short search terms, because many
words sound alike or sound like parts of other words.
However, phonetic indexing can still be useful if the analyzed
material contains important words that are likely to be missing from a
text system’s dictionary, such as foreign terms and names of people and
places.
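The phoneme-based search described above can be sketched in two steps: phoneticize the query, then look for that phoneme string in the phonetic index. The pronunciation dictionary, the ARPAbet-style symbols, the timestamps, and the function names below are illustrative assumptions; a real system would match approximately rather than exactly.

```python
# Hypothetical pronunciation dictionary (ARPAbet-style symbols, for illustration).
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "catalog": ["K", "AE", "T", "AH", "L", "AO", "G"],
    "scan": ["S", "K", "AE", "N"],
}

# Hypothetical phonetic index of one audio file: (phoneme, start time in seconds).
# The spoken content is "scan" followed later by "catalog".
PHONETIC_INDEX = [
    ("S", 0.00), ("K", 0.08), ("AE", 0.15), ("N", 0.27),
    ("K", 1.00), ("AE", 1.07), ("T", 1.18), ("AH", 1.25),
    ("L", 1.31), ("AO", 1.40), ("G", 1.52),
]


def phoneticize(term):
    """Convert a query term to its phoneme string using the dictionary."""
    return PRONUNCIATIONS[term.lower()]


def find_term(index, term):
    """Return the start times of every exact occurrence of the query's phonemes."""
    query = phoneticize(term)
    phonemes = [phoneme for phoneme, _ in index]
    hits = []
    for i in range(len(phonemes) - len(query) + 1):
        if phonemes[i:i + len(query)] == query:
            hits.append(index[i][1])
    return hits


print(find_term(PHONETIC_INDEX, "catalog"))  # [1.0]
print(find_term(PHONETIC_INDEX, "cat"))      # [1.0], a match inside "catalog"
```

The second query shows why short search terms produce false matches: "cat" was never spoken, but its phonemes occur inside "catalog".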
Text- and phoneme-based systems operate in much the same way, except
that the former uses a text-based dictionary and the latter uses a phonetic
dictionary.
A speech recognizer converts the observed acoustic signal into the
corresponding written representation of the spoken words.
Speech recognition software contains acoustic models of the way in which
all phonemes are represented.
Also, there is a statistical language model that indicates how likely words are
to follow each other in a specific language.
By using these capabilities, as well as complex probability analysis, the
technology can take a speech signal of unknown content and convert it to a
series of words.
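One simple form of the statistical language model mentioned above is a bigram model, which estimates how likely each word is to follow the previous one. The sketch below uses a tiny made-up corpus and plain relative frequencies with no smoothing; it is an assumption for illustration, not the model used by any particular recognizer.

```python
from collections import Counter

# Tiny hypothetical training corpus, already split into tokens.
CORPUS = "the company reported record profits . the company reported losses .".split()


def train_bigrams(tokens):
    """Estimate P(next word | current word) as a relative frequency."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])  # words that have a successor
    return {
        (first, second): count / first_counts[first]
        for (first, second), count in pair_counts.items()
    }


model = train_bigrams(CORPUS)
print(model[("company", "reported")])  # 1.0
print(model[("reported", "record")])   # 0.5
```

A recognizer can use such probabilities to prefer word sequences that are plausible in the language when the acoustic evidence alone is ambiguous.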
Figure: ScanSoft Audio Mining System
• Since music is often described by genre, we would like to annotate
our music data with genre.
Classification by genre is useful for music search and retrieval and
also for playlist generation.
Linear and non-linear neural networks
Gaussian Classification
Gaussian mixture models
Hidden Markov model
Let us consider a simple weather forecast problem and try to
build a model that can predict tomorrow’s weather based on
today’s condition. In this example the weather takes one of three states,
each assumed to hold for the whole day: sunny (S), cloudy (C), or rainy (R).
From the history of the weather of the town under investigation we have the
following table.
• We refer to the weather conditions by state q, sampled at
instant t. The problem is to find the probability of tomorrow's
weather condition given today's condition:
P(q_{t+1} / q_t).
• An acceptable approximation for a history of n instants is:
P(q_{t+1} / q_t, q_{t-1}, q_{t-2}, ..., q_{t-n}) ≈ P(q_{t+1} / q_t).
• This is a first-order Markov chain, as the history is
considered to be one instant only.
Let us now ask this question: given that today is sunny (S), what is
the probability that the following five days are S, C, C, R,
and S under the above model?
The answer follows from the first-order Markov chain formula:
P(q_1 = S, q_2 = S, q_3 = C, q_4 = C, q_5 = R, q_6 = S)
= P(q_1 = S) x P(q_2 = S / q_1 = S) x P(q_3 = C / q_2 = S) x P(q_4 = C / q_3 = C)
x P(q_5 = R / q_4 = C) x P(q_6 = S / q_5 = R)
= 1 x 0.7 x 0.2 x 0.8 x 0.15 x 0.15
= 0.00252
The initial probability P(q_1 = S) = 1, as it is assumed that today is
sunny.
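The calculation above can be checked with a few lines of code. This sketch uses only the five transition probabilities that appear in the worked example; the remaining entries of the transition table are not filled in, and the function name `sequence_probability` is an assumption for illustration.

```python
# Transition probabilities P(next / current) taken from the worked example.
# Entries of the table that the example does not use are deliberately omitted.
TRANSITIONS = {
    ("S", "S"): 0.7,
    ("S", "C"): 0.2,
    ("C", "C"): 0.8,
    ("C", "R"): 0.15,
    ("R", "S"): 0.15,
}


def sequence_probability(states, transitions, initial_probability=1.0):
    """Multiply the initial probability by each one-step transition probability."""
    probability = initial_probability
    for current, following in zip(states, states[1:]):
        probability *= transitions[(current, following)]
    return probability


# Today is sunny (probability 1), followed by S, C, C, R, S.
p = sequence_probability(["S", "S", "C", "C", "R", "S"], TRANSITIONS)
print(round(p, 5))  # 0.00252
```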
The model is completely defined by three sets of parameters, A,
B, and π, and a model with N states and M observation symbols can be
referred to by:
λ = (A, B, π)
where A = {a_ij}, B = {b_j(w_k)}, 1 <= i, j <= N, and 1 <= k <= M.
a_ij is the state transition probability (the probability of moving
to state S_j given state S_i):
a_ij = P(q_{t+1} = S_j / q_t = S_i).
b_j(w_k) is the probability of observing symbol w_k in state S_j.
w is the observation alphabet, k indexes its symbols, and M is the number of symbols in this alphabet.
π = {1 0 0} is the initial state probability distribution.
Each phoneme (unit of sound) can be represented by a
state of varying duration.
Accordingly, the transitions between the phonemes that
form a word can be represented by A = {a_ij}.
• The observations in this case are the sounds produced in
each position. Because each sound varies as it evolves,
the observations can also be represented by a probabilistic function
B = {b_j(w_k)}.
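To make the roles of A, B, and π concrete, the sketch below scores an observation sequence against a toy three-state, left-to-right hidden Markov model using the standard forward algorithm. Every number, the three abstract sound symbols, and the function name `forward` are hypothetical; only the initial distribution {1 0 0} mirrors the slides.

```python
# Hypothetical 3-state left-to-right HMM; all probabilities are illustrative.
# States stand for three successive phonemes; symbols 0, 1, 2 stand for three
# abstract acoustic observations.
A = [  # A[i][j] = probability of moving from state i to state j
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
B = [  # B[j][k] = probability of observing symbol k in state j
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
PI = [1.0, 0.0, 0.0]  # the model always starts in the first state


def forward(observations, a, b, pi):
    """Return P(observations / model) using the forward algorithm."""
    n = len(pi)
    # Initialization: start in state j and emit the first observation.
    alpha = [pi[j] * b[j][observations[0]] for j in range(n)]
    # Induction: move to state j from any state i, then emit the next observation.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


in_order = forward([0, 1, 2], A, B, PI)
reversed_order = forward([2, 1, 0], A, B, PI)
print(in_order > reversed_order)  # True
```

The model assigns a higher probability to sounds arriving in the order its states expect, which is the basis for choosing the word model that best explains an utterance.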
A major challenge for speech recognition tools has been recognizing the
speech of different users in different environments.
With this in mind, BBN, IBM, Fast-Talk, and ScanSoft have designed their
audio mining technology to be largely speaker independent.
For example, Fast-Talk’s acoustic models are trained to recognize numerous
speakers via exposure to audio data from speakers representing various ages,
dialects, and speaking styles.
Some audio mining technology uses acoustic models tuned to understand
speech from different environments, such as telephony, TV, or radio.
Designing filters to reduce the background noise that can interfere
with accurate speech recognition.
Creating efficient data structures for representing content.
Developing algorithms that quickly work through the data structures
during indexing and searching.
• Precision is improving, but it is still a key issue impeding the
technology’s widespread adoption, particularly in such accuracy-critical
applications as court reporting and medical dictation.
Processing conversational speech can be particularly difficult
because of such factors as overlapping words and background noise.
Breakthroughs in natural language understanding will eventually
lead to big improvements, but until then audio mining will get
better only incrementally.
It’s currently seen as a ‘nice to have,’ not quite yet a ‘need to have’
technology.
Companies could use audio mining to analyze customer-service and
helpdesk conversations or even voice mail.
• Law enforcement and intelligence organizations could use the
technology to analyze intercepted phone conversations.
Broadcast companies like CNN and Radio Free Asia are already using
audio mining to quickly retrieve important background information
from previous broadcasts when new stories break.
A US prison is using ScanSoft’s audio mining product to analyze
recordings of prisoners’ phone calls to identify illegal activity.
• Musical audio mining relates to the identification of perceptually
important characteristics of a piece of music such as melodic,
harmonic or rhythmic structure.
• Searches can then be carried out to find pieces of music that are
similar in terms of their melodic, harmonic and/or rhythmic
characteristics.
This type of analysis can also be used in music to determine
characteristics like beats per minute (BPM), musical key, and musical
structure, information that is employed to classify music.
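As a small illustration of one such characteristic, beats per minute can be estimated from the times at which beats are detected. The sketch assumes beat detection has already been done; the timestamps and the function name `estimate_bpm` are hypothetical.

```python
from statistics import median


def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps (seconds) via the median beat interval."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats to measure an interval")
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


# Hypothetical detected beats, one every half second.
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
print(estimate_bpm(beats))  # 120.0
```

Using the median rather than the mean keeps the estimate stable when a few beats are missed or detected twice.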
Music downloading sites that categorize music by genre use audio
mining to organize the music.
• Research paper: “Let’s Hear It for Audio Mining” by Neal Leavitt.
Research paper: “Tendencies, Perspectives, and Opportunities of
Musical Audio-Mining” by Ghent University, Belgium.
wikipedia.org/wiki/Audio_mining
Thank You

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

Audio Mining Techniques and Applications

  • 2. The Web, databases, and other digitized information storehouses contain a growing volume of audio content. Examples include newscasts, sporting events, telephone conversations, recordings of meetings, Webcasts, and documentary archives. Users want to make the most of this material by searching and indexing the digitized audio content. In the past, companies had to create and manually analyze written transcripts of audio content because using computers to recognize, interpret, and analyze digitized speech was difficult. However, the development of faster microprocessors, larger storage capacities, and better speech-recognition algorithms has made audio mining easier.
  • 3. Audio mining, also called audio searching, takes a text-based query and locates the search term or phrase in an audio file. This helps users by, for example, letting them quickly get to specific places in a recorded conversation or determine when a company is mentioned in a newscast. Audio indexing uses speech recognition to analyze an entire file and produce a searchable index of content-bearing words and their locations. This is critical because audio content is in a binary format that is otherwise not readily searchable. Indexing audio content thus enables searching.
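The idea of a searchable index of words and their locations can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `recognized` list stands in for speech-recognizer output and is entirely hypothetical, as are the function names `build_index` and `find_phrase`.

```python
from collections import defaultdict

# Hypothetical recognizer output: (word, start time in seconds).
recognized = [
    ("the", 0.0), ("merger", 0.4), ("was", 0.9), ("announced", 1.1),
    ("today", 1.8), ("the", 2.5), ("merger", 2.7), ("closes", 3.2),
]


def build_index(words):
    """Map each word to the (position, start time) pairs where it occurs."""
    index = defaultdict(list)
    for position, (word, start) in enumerate(words):
        index[word.lower()].append((position, start))
    return index


def find_phrase(index, phrase):
    """Return the start times at which the whole phrase occurs consecutively."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    for position, start in index.get(terms[0], []):
        # Every later term must appear at the next consecutive word position.
        if all(
            any(p == position + k for p, _ in index.get(term, []))
            for k, term in enumerate(terms[1:], start=1)
        ):
            hits.append(start)
    return hits


index = build_index(recognized)
print(find_phrase(index, "the merger"))     # start times of both mentions
print(find_phrase(index, "merger closes"))  # start time of the single match
```

Storing the start time next to each word is what lets a player jump straight to the matching point in the recording.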
  • 4. There are two main approaches to audio mining. 1. Text-based indexing: It converts speech to text and then identifies words in a dictionary that can contain several hundred thousand entries. If a word or name is not in the dictionary, the system will choose the most similar word it can find. The system uses language understanding to create a confidence level for its findings. For findings with less than a 100 percent confidence level, the system offers other possible word matches.
  • 5. 2. Phoneme-based indexing: It doesn't convert speech to text but instead works only with sounds. The system first analyzes and identifies sounds in a piece of audio content to create a phonetic-based index. It then uses a dictionary of several dozen phonemes to convert a user's search term to the correct phoneme string. Phonemes are the smallest units of speech in a language; every word is a sequence of phonemes. Finally, the system looks for the search terms in the index.
  • 6. A phonetic system requires a more efficient search tool because it must phoneticize the query term, then try to match it with the existing phonetic string output. This is considerably more complex than using one of the many existing text-based search tools. Phoneme-based searches can result in more false matches than the text-based approach, particularly for short search terms, because many words sound alike or sound like parts of other words. However, phonetic indexing can still be useful if the analyzed material contains important words that are likely to be missing from a text system's dictionary, such as foreign terms and names of people and places.
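A small Python sketch can illustrate phoneme-based search and why short queries produce false matches. Everything here is an assumption for illustration: the `PRONUNCIATIONS` table uses ARPAbet-style symbols chosen by hand, and the phonetic index is just a flat list, whereas a real system would derive phonemes from audio and attach time offsets.

```python
# Hypothetical pronunciation dictionary (ARPAbet-style symbols, illustrative only).
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "catalog": ["K", "AE", "T", "AH", "L", "AO", "G"],
    "scatter": ["S", "K", "AE", "T", "ER"],
}


def to_phonemes(term):
    """Convert a text query into its phoneme string."""
    return PRONUNCIATIONS[term.lower()]


def search(phonetic_index, query_phonemes):
    """Return every offset where the query phonemes occur in the index."""
    n = len(query_phonemes)
    return [
        i
        for i in range(len(phonetic_index) - n + 1)
        if phonetic_index[i:i + n] == query_phonemes
    ]


# Hypothetical phonetic index for the spoken phrase "scatter the catalog".
phonetic_index = to_phonemes("scatter") + ["DH", "AH"] + to_phonemes("catalog")

# The word "cat" is never spoken, yet its phonemes occur inside both other words.
print(search(phonetic_index, to_phonemes("cat")))
# The longer query matches only where "catalog" is actually spoken.
print(search(phonetic_index, to_phonemes("catalog")))
```

The short query matches inside "scatter" and "catalog", which is exactly the sounds-like-part-of-another-word problem described above; the longer query is unambiguous.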
  • 7. Text- and phoneme-based systems operate in much the same way, except that the former uses a text-based dictionary and the latter uses a phonetic dictionary. A speech recognizer converts the observed acoustic signal into the corresponding written representation of the spoken words. Speech recognition software contains acoustic models of the way in which all phonemes are represented. There is also a statistical language model that indicates how likely words are to follow each other in a specific language. By using these capabilities, as well as complex probability analysis, the technology can take a speech signal of unknown content and convert it to a series of words.
  • 8. Figure: ScanSoft Audio Mining System
  • 9.
  • 10. Since music is often described by genre, we would like to annotate our music data with genre. Classification by genre is useful for music search and retrieval and also for playlist generation.
  • 11. Linear and non-linear neural networks; Gaussian classification; Gaussian mixture models; hidden Markov models.
  • 12. Let us consider a simple weather forecast problem and try to build a model that can predict tomorrow's weather based on today's condition. In this example the weather takes one of three states, each assumed to hold for the whole day: sunny (S), cloudy (C), or rainy (R). From the history of the weather of the town under investigation we have the following table.
  • 13. We refer to the weather condition sampled at instant t as the state $q_t$, and the problem is to find the probability of tomorrow's weather condition given today's condition, $P(q_{t+1} \mid q_t)$. An acceptable approximation for a history of n instants is $P(q_{t+1} \mid q_t, q_{t-1}, q_{t-2}, \ldots, q_{t-n}) \approx P(q_{t+1} \mid q_t)$. This is the first-order Markov chain, as the history is considered to be one instant only.
  • 14. Let us now ask this question: given that today is sunny (S), what is the probability that the following five days are S, C, C, R, and S under the model above? The answer follows from the first-order Markov chain: $P(q_1 = S, q_2 = S, q_3 = C, q_4 = C, q_5 = R, q_6 = S) = P(S) \cdot P(q_2 = S \mid q_1 = S) \cdot P(q_3 = C \mid q_2 = S) \cdot P(q_4 = C \mid q_3 = C) \cdot P(q_5 = R \mid q_4 = C) \cdot P(q_6 = S \mid q_5 = R) = 1 \times 0.7 \times 0.2 \times 0.8 \times 0.15 \times 0.15 = 0.00252$. The initial probability is $P(S) = 1$, as it is assumed that today is sunny.
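The same calculation can be written as a short Python sketch. The `TRANSITIONS` table contains only the five transition probabilities quoted in the worked example; the remaining entries of the weather table are not reproduced in this transcript, so they are deliberately left out. The function name `sequence_probability` is an illustrative choice.

```python
# Transition probabilities P(next | current) quoted in the worked example.
# Other entries of the weather table are not available here and are omitted.
TRANSITIONS = {
    ("S", "S"): 0.7,
    ("S", "C"): 0.2,
    ("C", "C"): 0.8,
    ("C", "R"): 0.15,
    ("R", "S"): 0.15,
}


def sequence_probability(states, transitions, initial=1.0):
    """Probability of a state sequence under a first-order Markov chain.

    `initial` is the probability of the first state (1 here, because
    today is known to be sunny).
    """
    prob = initial
    for current, nxt in zip(states, states[1:]):
        prob *= transitions[(current, nxt)]
    return prob


p = sequence_probability(["S", "S", "C", "C", "R", "S"], TRANSITIONS)
print(round(p, 5))  # 0.00252
```

Because each factor depends only on the previous day's state, the loop needs nothing more than consecutive pairs of states.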
  • 15.
  • 16. The model is completely defined by three sets of parameters, A, B, and π. A model with N states and M observation symbols can be written $\lambda = (A, B, \pi)$, where $A = \{a_{ij}\}$ and $B = \{b_j(w_k)\}$, with $1 \le i, j \le N$ and $1 \le k \le M$. $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ is the state transition probability (the probability of being in state $S_j$ given state $S_i$). $b_j(w_k)$ is the observation probability distribution in state $S_j$, where w is the alphabet and k indexes the symbols in this alphabet. $\pi = \{1, 0, 0\}$ is the initial state probability distribution.
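To make the roles of A, B, and π concrete, here is a minimal Python sketch of the standard forward algorithm, which computes the probability of an observation sequence under such a model. Only π = {1, 0, 0} comes from the slide; the values in `A` and `B`, the choice of N = 3 states and M = 2 symbols, and the function name are hypothetical and chosen purely for illustration.

```python
from itertools import product

# Initial state distribution from the slide: the model always starts in state 1.
pi = [1.0, 0.0, 0.0]

# Hypothetical transition matrix A (N = 3 states); each row sums to 1.
A = [
    [0.6, 0.3, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

# Hypothetical observation matrix B (M = 2 symbols); each row sums to 1.
B = [
    [0.8, 0.2],
    [0.3, 0.7],
    [0.5, 0.5],
]


def forward_probability(observations, A, B, pi):
    """P(observations | model) computed with the forward algorithm."""
    n = len(pi)
    # alpha[j]: probability of the observations so far, ending in state j.
    alpha = [pi[j] * B[j][observations[0]] for j in range(n)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


print(forward_probability([0, 1, 1], A, B, pi))

# Sanity check: the probabilities of all length-3 sequences sum to 1.
total = sum(forward_probability(list(seq), A, B, pi)
            for seq in product(range(2), repeat=3))
print(round(total, 10))
```

The zero entries below the diagonal of `A` make this a left-to-right model, a common choice when states represent successive sounds, though the slide does not specify the structure.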
  • 17. Each phoneme (unit of sound) could be represented by a state of different and varying duration. Accordingly, the transition between different phonemes to form a word can be represented by $A = \{a_{ij}\}$. The observations in this case are the sounds produced in each position, and due to the variations in the evolution of each sound this can also be represented by a probabilistic function $B = \{b_j(w_k)\}$.
  • 18. A major challenge for speech recognition tools has been recognizing the speech of different users in different environments. With this in mind, BBN, IBM, Fast-Talk, and ScanSoft have designed their audio mining technology to be largely speaker independent. For example, Fast-Talk's acoustic models are trained to recognize numerous speakers via exposure to audio data from speakers representing various ages, dialects, and speaking styles. Some audio mining technology uses acoustic models tuned to understand speech from different environments, such as telephony, TV, or radio.
  • 19. Designing filters to reduce the background noise that can interfere with accurate speech recognition. Creating efficient data structures for representing content. Developing algorithms that quickly work through the data structures during indexing and searching.
  • 20. Precision is improving but it is still a key issue impeding the technology's widespread adoption, particularly in such accuracy-critical applications as court reporting and medical dictation. Processing conversational speech can be particularly difficult because of such factors as overlapping words and background noise. Breakthroughs in natural language understanding will eventually lead to big improvements, but until then audio mining will get better only incrementally. It's currently seen as a 'nice to have,' not quite yet a 'need to have' technology.
  • 21. Companies could use audio mining to analyze customer-service and helpdesk conversations or even voice mail. Law enforcement and intelligence organizations could use the technology to analyze intercepted phone conversations. Broadcast companies like CNN and Radio Free Asia are already using audio mining to quickly retrieve important background information from previous broadcasts when new stories break. A US prison is using ScanSoft's audio mining product to analyze recordings of prisoners' phone calls to identify illegal activity.
  • 22. Musical audio mining relates to the identification of perceptually important characteristics of a piece of music such as melodic, harmonic, or rhythmic structure. Searches can then be carried out to find pieces of music that are similar in terms of their melodic, harmonic, and/or rhythmic characteristics. This type of analysis can also be used to determine characteristics like beats per minute (BPM), musical key, and musical structure, information that is employed to classify music. Music downloading sites that categorize music by genre use audio mining to organize the music.
  • 23. Research paper: "Let's Hear It for Audio Mining" by Neal Leavitt. Research paper: "Tendencies, Perspectives, and Opportunities of Musical Audio-Mining" by Ghent University, Belgium. wikipedia.org/wiki/Audio_mining