*
    Nick Campbell
    Speech Communication Lab
    Trinity College Dublin, Ireland
*
    * TCD – Stokes Professor (Dublin)
    * CNGL – PI – Delivery & Interaction
    * ELRA – board member / VP – speech
    * ISCA – board member – workshops
    * IEEE – Signal Processing Society – SLTC member
    * ATR/NiCT – research director (Japan)
    * Speech Prosody 2014 (Dublin) host

    * Speech scientist/researcher/corpus analyst
* AT&T Bell Labs
    * The ideas people – think ‘BIG’

* IBM UK Scientific Centre
    * The corpus people – ‘collect it all’

* ATR basic telecom research
    * The fundamentals - learn how to ‘infer’ from it


*
* we used to be considered BIG – speech data
  (and now multimedia) gobbled up memory
* I collected 1500 hours of everyday chat/daily
  conversations in 2000 (@ 1GB per minute) –
  took 5 years to process!

* now Apple, Google, Microsoft, ... get that each minute
       (but the secret is in the metadata)

* we need accessible data & tools for everybody!
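
As a rough check on the scale quoted above, here is a minimal sketch. It assumes the 1 GB per minute rate applies uniformly to all 1500 hours and uses decimal units; the function and variable names are illustrative only, not from the talk.

```python
# Back-of-the-envelope size of the corpus, using the figures on this slide.
# Assumption: the 1 GB/minute rate holds across all 1500 hours (decimal units).
HOURS_RECORDED = 1500
GB_PER_MINUTE = 1


def corpus_size_gb(hours: float, gb_per_minute: float) -> float:
    """Total raw size in gigabytes for a recording of the given length."""
    return hours * 60 * gb_per_minute


total_gb = corpus_size_gb(HOURS_RECORDED, GB_PER_MINUTE)
total_tb = total_gb / 1000  # 1 TB = 1000 GB
print(f"{total_gb:,.0f} GB = about {total_tb:.0f} TB")
```

Under those assumptions the collection comes to roughly 90 TB of raw data.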

   *
* but we need to manage privacy issues first!




  *
* and we need a way to protect IP as well

* written publications have ISBN standard
* work is now underway (cf ELRA & COCOSDA) to
  institute ISLRN for Language Resources
* researchers need to get credit for corpora as
  well as for publishing research results
* The community needs a way to identify,
  acknowledge, attribute, and reference data



 *
* tools for processing speech & multimodal data

* HTK, HTS, R, etc. – not simple to use


* little consensus on what features to encode

* manual bootstrap – much too time-consuming!


*
* social interaction

* personal idiosyncrasies

* group dynamics – multimodal data (TB/hr)

* issues of robustness / domain specificity /
 privacy / storage & archiving / redistribution


     *
context analytics:


* cultural and language-specific needs
* multimodal – multimedia – multilingual
* tools for ‘less-well-supported’ languages

* e.g., U-STAR consortium for speech research –
 sharing tools & data & knowledge for research



     *
* European Language Resources Association
* COCOSDA – int’l coordinating committee
* IEEE SLTC, ISCA SIGs – there are places to go

    * but are they ready for really BIG data?
               perhaps not yet . . .




                          *
* curricula prepare people

* what standards to rely on?
* what resources available?
* what features to extract?
* what tools to work with?
* what use to put it to?
* what info to hide?
* what to do next?


Speech Technology and Big Data
