[Title slide graphic: a cat's "meow" written in several languages (mjav, meong, miaou, meow, ...), scattered across the slide and ending in a question mark]

Automatic Language Identification (LiD)

Alexander Hosford
@awhosford
  
The LiD Problem

• The identification of a given spoken language from a speech signal
• 94% of the global population speaks only 6% of the world's languages
  
Real World Situations

• Automation of LiD is desirable
• Offers many benefits to international service industries
• Hotels, airports, global call centres
  
Language Differences

• Languages contain information that makes one discernible from another:
   – Phonemes
   – Prosody
   – Phonotactics
   – Syntax
  
Human Abilities

• Bias towards the native language arises in infancy
• Prosodic features are some of the first cues to be recognised
• Humans can make a reasonable estimate of the language heard within 3-4 seconds of audition
• Even unfamiliar languages may be plausibly judged
	
  
Previous Attempts

• Attempts as early as 1974: USAF work, therefore classified
• Methodological investigations as early as 1977 (House & Neuburg)
• Studies for the most part center on phonotactic constraints and phoneme modeling
• Raw acoustic waveforms have also been investigated (Kwasny 1993)
  
A Simpler Approach?

• Phonotactic approaches require expert linguistic knowledge
• Phoneme and phonotactic modeling is time consuming
• Given the speed of human LiD abilities, discrimination is most likely based on acoustic features
  
System Overview

[Diagram, left to right: pre-processed speech utterance → onset detection → feature extraction in SuperCollider (MFCC vectors, nPVI, pitch contour) → ARFF file generation → SuperCollider SystemCmd → WEKA tools]
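
The ARFF generation step in the diagram can be illustrated with a minimal sketch. The slides do this inside SuperCollider; the Python below is only an illustration of the ARFF layout WEKA reads, and the function name `to_arff`, the relation name and the feature names are all hypothetical.

```python
from typing import List, Sequence, Tuple


def to_arff(
    relation: str,
    feature_names: Sequence[str],
    rows: Sequence[Tuple[Sequence[float], str]],
) -> str:
    """Render (feature vector, language label) rows as ARFF text.

    Assumption: every feature is numeric and the class attribute is the
    language label, declared last as a nominal attribute.
    """
    labels = sorted({label for _, label in rows})
    lines: List[str] = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute language {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for values, label in rows:
        if len(values) != len(feature_names):
            raise ValueError("row length does not match feature_names")
        lines.append(",".join(f"{v:.6f}" for v in values) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical two-feature example (values are made up for illustration).
example = to_arff(
    "lid",
    ["mfcc1", "npvi"],
    [([0.5, 40.0], "english"), ([0.25, 55.0], "german")],
)
```

Writing `example` to a file with an `.arff` extension would give WEKA one row per utterance (or per segment), with the language as the class to predict.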
Feature Extraction

• Spectral information: MFCC vectors
• Pitch contour information
• Handled by the SCMIR library (Collins, 2010)
• Speech rhythm: Normalised Pairwise Variability Index (nPVI)

Feature Type        Implemented Measure
Spectral Content    MFCC vector UGen
Pitch Contour       Tartini UGen
Speech Rhythm       nPVI function
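
As a rough illustration of the speech rhythm measure, the sketch below computes the nPVI in its commonly used form: 100 times the mean of |d_k - d_(k+1)| divided by the average of the two durations, taken over successive intervals. It is not the author's SuperCollider function; the assumption that intervals come from the gaps between detected onsets, and the names `npvi` and `inter_onset_intervals`, are illustrative.

```python
from typing import List, Sequence


def inter_onset_intervals(onsets: Sequence[float]) -> List[float]:
    """Durations between successive onset times (assumed sorted, in seconds)."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def npvi(durations: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index of successive durations."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    # Each term is the difference of a neighbouring pair, normalised by
    # the mean of that pair, so overall speech rate cancels out.
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Perfectly regular intervals give an nPVI of zero.
regular = npvi([0.2, 0.2, 0.2])
# Hypothetical onset times; the intervals are 0.5 s and 1.0 s.
uneven = npvi(inter_onset_intervals([0.0, 0.5, 1.5]))
```

Higher values indicate more contrast between neighbouring intervals, which is why the measure is used as a single-number rhythm feature.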
  
Classification

• Handled by the WEKA toolkit
• Built-in Multilayer Perceptron
• Called from the command line through a SuperCollider system command
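
A command-line call to WEKA's multilayer perceptron might look like the sketch below. The slides issue the call from SuperCollider; this Python version only assembles an equivalent argument list. The jar path, the ARFF file name and the choice of 10-fold cross-validation are assumptions, not settings reported in the slides.

```python
import subprocess
from typing import List


def weka_mlp_command(weka_jar: str, arff_path: str, folds: int = 10) -> List[str]:
    """Build a WEKA command line for the built-in multilayer perceptron.

    -t names the training ARFF file and -x sets the number of
    cross-validation folds.
    """
    return [
        "java",
        "-cp", weka_jar,
        "weka.classifiers.functions.MultilayerPerceptron",
        "-t", arff_path,
        "-x", str(folds),
    ]


def run_weka(command: List[str]) -> str:
    """Run the command and return WEKA's textual report (requires Java and WEKA)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical paths, for illustration only.
command = weka_mlp_command("weka.jar", "features.arff")
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.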
  
Comparisons Made

• 66 language pairs from 12 languages
• A comparison within language families
• A comparison of all 12 languages
• Averaged & segmented data
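
The figure of 66 pairs follows from choosing 2 of 12 languages without regard to order: 12 × 11 / 2 = 66. A short check, using placeholder language names because the slides do not list the 12 languages:

```python
from itertools import combinations

# Placeholder names; the actual 12 languages are not listed in the slides.
languages = [f"lang{i:02d}" for i in range(1, 13)]

# Every unordered pair of distinct languages is one two-way comparison.
pairs = list(combinations(languages, 2))
pair_count = len(pairs)
```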
  
Results Within Families

[Chart: results on a scale from 20.00 to 100.00 for the Germanic, Romance, Slavic and SinoAltaic families and their mean, across four feature sets: 4 MFCC; 13 MFCC Tartini; 41 MFCC Tartini; All Features]

*Data averaged across files
  
Results Within Families

[Chart: results on a scale from 20.00 to 100.00 for the Germanic, Romance, Slavic and SinoAltaic families and their mean, across three feature sets: 4 MFCC; 13 MFCC Tartini; 41 MFCC Tartini]

*Data in 5 second segments
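
The difference between the two data conditions can be sketched as follows. This assumes the features are per-frame vectors that are averaged either over a whole file or over consecutive 5 second segments; the slides do not spell out the exact procedure, and the function names and frame rate here are hypothetical.

```python
from typing import List, Sequence


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized feature vectors."""
    if not frames:
        raise ValueError("no frames to average")
    width = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(width)]


def segment_means(
    frames: Sequence[Sequence[float]],
    frames_per_second: float,
    segment_seconds: float = 5.0,
) -> List[List[float]]:
    """Mean feature vector for each full segment; a trailing partial segment is dropped."""
    size = int(round(frames_per_second * segment_seconds))
    if size < 1:
        raise ValueError("segment is shorter than one frame")
    return [
        mean_vector(frames[start:start + size])
        for start in range(0, len(frames) - size + 1, size)
    ]


# Toy data: 25 two-dimensional frames at a hypothetical 2 frames per second.
frames = [[float(i), 1.0] for i in range(25)]
per_file = mean_vector(frames)            # one training instance per file
per_segment = segment_means(frames, 2.0)  # one instance per 5 s segment
```

Segmenting yields more training instances per recording, at the cost of each instance summarising less speech.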
  
                              	
  
Results From All Languages

[Chart: results on a scale from 10.00 to 100.00 for averaged data and 5s segments, each with its mean, across nine feature sets: 4 MFCC; 4 MFCC Tartini; 4 MFCC Tartini nPVI; 13 MFCC; 13 MFCC Tartini; 13 MFCC Tartini nPVI; 41 MFCC; 41 MFCC Tartini; 41 MFCC Tartini nPVI]
  
Extensions

• nPVI function to use vowel onsets
• Phonemic segmentation
• A larger dataset
• Better efficiency
• Real-time operation
  
Robustness

• Real-world signals are very different from processed 'clean' data
• 'Ideal' LiD systems: independent & robust
• A need to analyse only the part of the signal that matters
  
CASA

• Computational modeling of human 'Auditory Scene Analysis' (Bregman 1990)
• Separation of the signal into component parts and reconstitution into meaningful 'streams'
  
Using Noisy Signals

I'm open to suggestions!
  
alexanderhosford@googlemail.com
07814 692 549
@awhosford
  
