rospeex

A Cloud-based speech communication toolkit for ROS
2013/12/13

Komei Sugiura
National Institute of Information and Communications Technology, Japan
komei.sugiura@nict.go.jp
ROS (Robot Operating System)
• ROS: middleware for robots
– Version 1.0 released in 2010
– Global de facto standard
– From driver and package management to learning and visualization

Speech communication toolkit for ROS

rospeex

• ROS compatible
• Speech recognition using VoiceTra engine
• Other functionalities
– Noise reduction, non-monologue speech synthesis

                               Conventional packages       rospeex
Speech recognition/synthesis   Sphinx, Festival, Julius    VoiceTra engine
                               (or commercial tools)       (or third-party engines)
Engine                         Stand-alone                 Cloud-based
Language                       Single language             ja, en, zh, ko

Position in Cloud Robotics
• Cloud robotics [James Kuffner@Google, 2011]
– Manipulation using Google Goggles [Kehoe+ 2013]
– Knowledge sharing based on RoboEarth [Tenorth+ 2012]
– Speech communication for robots

[Figure: speech systems placed on two axes, cloud-based vs. stand-alone and robot-middleware-compatible vs. incompatible. Labels: rospeex; Commercial systems (Nuance, ToSpeak, AmiVoice Cloud, ..); Many; OpenHRI, HARK, PocketSphinx, Festival.]

Quadrilingual communication using rospeex

rospeex provides speech recognition/synthesis; user constructs dialogue processing

[Figure: system architecture. Blocks: Input from other modules (sensors, recognized objects, etc.); Speech module (Speech input, Noise reduction, VAD, Speech recognition, Dialogue processing, Speech synthesis, Speech output); Task manager; Output to other modules (actuators, learning, etc.); Speech recognition & synthesis servers. Legend: provided by rospeex; provided by the user; provided by third parties.]

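
The division of labour on this slide can be sketched in Python. This is only an illustration under stated assumptions: `SpeechModule`, `handle_utterance`, and `echo_dialogue` are hypothetical names, not the rospeex API, and the recognizer and synthesizer are stand-in callables rather than calls to the actual servers. Noise reduction and VAD are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechModule:
    """Hypothetical wiring of the stages named on the slide (not the rospeex API)."""

    recognize: Callable[[bytes], str]          # stand-in for the recognition server
    synthesize: Callable[[str], bytes]         # stand-in for the synthesis server
    dialogue: Callable[[str], Optional[str]]   # dialogue processing, written by the user

    def handle_utterance(self, audio: bytes) -> Optional[bytes]:
        """Run one utterance through recognition, dialogue, and synthesis."""
        text = self.recognize(audio)
        reply = self.dialogue(text)
        if reply is None:
            # The dialogue step chose not to answer, so nothing is synthesized.
            return None
        return self.synthesize(reply)


def echo_dialogue(text: str) -> Optional[str]:
    """Toy user-side dialogue processing: answer a greeting, ignore the rest."""
    if text.strip().lower() == "hello":
        return "Hello, how can I help?"
    return None


if __name__ == "__main__":
    module = SpeechModule(
        recognize=lambda audio: audio.decode("utf-8"),  # fake recognizer
        synthesize=lambda text: text.encode("utf-8"),   # fake synthesizer
        dialogue=echo_dialogue,
    )
    print(module.handle_utterance(b"hello"))
```

Only `echo_dialogue` corresponds to the part the slide assigns to the user; the two stand-in callables mark where the speech recognition and synthesis servers would be contacted.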
Non-monologue speech synthesis for robots
• Reading-style robot voice
– Monotonous, unnatural and unfriendly
– Hard to tell that the robot is asking a question

[Slide labels: XIMERA 3 (text reading); Voice talent]

• Conventional text-to-speech (TTS) systems are not optimized for communication

Demo
http://komeisugiura.jp/software/nm_tts.html

Using speech recognition/synthesis without ROS
• Send a JSON request to the server
– Recognition: http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSR
– Synthesis: http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSS
• Sample code (JavaScript, Python, C++) is available
Non-monologue speech synthesis
{ "method": "recognize",
  "params": [
    "ja",
    { "audio": "base64-encoded wav",
      "audioType": "audio/x-wav",
      "voiceType": "*"
    }]}
Recognition

Search
{ "method": "speak",
  "params": [
    "ja",
    "こんにちは",
    "*",
    "audio/x-wav"
  ]}

Synthesis
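
As a rough sketch of how a client might assemble the two request bodies shown on this slide, the Python below builds the "recognize" and "speak" JSON with the standard library only. The function names, the `Content-Type` header, and the check against the four language codes from the comparison slide are assumptions made for illustration. The server's response format is not described in the slides, so `post_json` simply returns the raw bytes, and the endpoint URL is passed in by the caller.

```python
import base64
import json
import urllib.request

# Language codes listed on the comparison slide; whether the JSON
# endpoints accept exactly these codes is an assumption.
SUPPORTED_LANGUAGES = ("ja", "en", "zh", "ko")


def _check_language(language: str) -> None:
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")


def build_recognize_request(wav_bytes: bytes, language: str = "ja") -> str:
    """Return the JSON body for a "recognize" call on WAV audio."""
    _check_language(language)
    audio = base64.b64encode(wav_bytes).decode("ascii")
    payload = {
        "method": "recognize",
        "params": [
            language,
            {"audio": audio, "audioType": "audio/x-wav", "voiceType": "*"},
        ],
    }
    return json.dumps(payload)


def build_speak_request(text: str, language: str = "ja") -> str:
    """Return the JSON body for a "speak" (synthesis) call."""
    _check_language(language)
    payload = {"method": "speak", "params": [language, text, "*", "audio/x-wav"]}
    # ensure_ascii=False keeps non-ASCII text such as Japanese readable.
    return json.dumps(payload, ensure_ascii=False)


def post_json(url: str, body: str, timeout: float = 10.0) -> bytes:
    """POST a JSON body and return the raw response bytes.

    The Content-Type header is an assumption; the slides do not specify it.
    """
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(build_speak_request("こんにちは"))
```

Keeping the payload builders separate from `post_json` lets the request format be checked offline, without contacting the server.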
