[DL Hacks]Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

Deep Learning JP

2018/05/07 Deep Learning JP: http://deeplearning.jp/hacks/

Power-Normalized Cepstral
Coefficients (PNCC)
for Robust Speech Recognition
Department of Systems Innovation, Course C, Faculty of Engineering, The University of Tokyo
B3 中村泰貴

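Self-introduction
・B3 (third-year undergraduate), Department of Systems Innovation, Course C, Faculty of Engineering, The University of Tokyo: 中村泰貴
・Interested in speech technology (including its combination with deep learning) and signal processing
・This is my first presentation...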

Bibliographic information
・Title
・Power-Normalized Cepstral Coefficients (PNCC)
for Robust Speech Recognition
・Authors
・Chanwoo Kim (Google)
・Richard M. Stern (Carnegie Mellon University)
・Publication date
・2016/06/24
・Paper URL
・http://www.cs.cmu.edu/~robust/Papers/OnlinePNCC_V25.pdf

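Background
・Feature extraction used in speech recognition
・Almost always MFCC or mel spectrogram
・Are there no other feature extraction methods...?
・Robustness is also desirable!
・Worth trying
Deep Speech 2
PNCC!!!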

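What is PNCC?
・Main features
・Whereas MFCC and similar features use a logarithm,
PNCC uses a power law
・Asymmetric filtering that reduces noise
・Higher recognition accuracy than MFCC and PLP under various
types of noise and in reverberant environments
・Differences from conventional feature extraction
・Higher computational cost
・Recognition accuracy does not drop on clean speech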

Results first...
Environmental noise was added to LibriSpeech dev-clean audio
at an SNR of about 4 dB
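The noise-mixing step above can be sketched as follows. This is a minimal illustration, not the presenter's actual script: the function name `mix_at_snr`, the sine-wave stand-in for an utterance, and the Gaussian noise are all assumptions made for the example.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    noise = np.resize(noise, speech.shape)   # repeat or truncate noise to the speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is p_speech / 10^(snr_db / 10); solve for the amplitude gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

rng = np.random.default_rng(0)
sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # stand-in for a real utterance
noise = rng.standard_normal(sr // 2)                     # stand-in for environmental noise
noisy = mix_at_snr(speech, noise, snr_db=4.0)
```

In practice `speech` and `noise` would be loaded from audio files; the scaling logic stays the same.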

Gammatone Frequency Integration
・Filterbank
http://aidiary.hatenablog.com/entry/20120225/1330179868
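A rough sketch of gammatone frequency integration, under stated assumptions: a 4th-order gammatone impulse response, the Glasberg and Moore ERB approximation for the bandwidth, and three hypothetical center frequencies (a real front end would use many more channels). The function names are illustrative and do not come from the slides.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_weights(center_freqs, n_fft, sr, order=4, ir_len=1024):
    """Squared magnitude responses |H_l(k)|^2, shape (n_fft // 2 + 1, n_channels)."""
    t = np.arange(ir_len) / sr
    weights = []
    for fc in center_freqs:
        b = 1.019 * erb(fc)
        # Gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
        ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        mag2 = np.abs(np.fft.rfft(ir, n_fft)) ** 2
        weights.append(mag2 / mag2.max())     # peak-normalize each channel
    return np.stack(weights, axis=1)

def frequency_integration(frames, weights, n_fft):
    """P[m, l] = sum_k |X[m, k]|^2 * |H_l(k)|^2 for frames of shape (n_frames, frame_len)."""
    power_spec = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2
    return power_spec @ weights

sr, n_fft = 16000, 1024
weights = gammatone_weights([200.0, 1000.0, 4000.0], n_fft=n_fft, sr=sr)
frame = np.hamming(400) * np.sin(2 * np.pi * 1000.0 * np.arange(400) / sr)  # 1 kHz tone
P = frequency_integration(frame[None, :], weights, n_fft)
```

For the 1 kHz test tone, the channel centered at 1000 Hz receives by far the most power, which is the behavior expected of the filterbank.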

Medium-Time Power Calculation
・M = 2
・Moving average of P
・Effective against Gaussian noise
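The moving average above can be sketched as below, assuming it is taken over the 2M + 1 frames centered on each frame m, separately for each channel. Truncating the window at the first and last frames is an assumption of this sketch.

```python
import numpy as np

def medium_time_power(P, M=2):
    """Average P[m, l] over frames m-M .. m+M (window truncated at the edges)."""
    n_frames = P.shape[0]
    Q = np.empty_like(P, dtype=float)
    for m in range(n_frames):
        lo, hi = max(0, m - M), min(n_frames, m + M + 1)
        Q[m] = P[lo:hi].mean(axis=0)
    return Q

# One channel, six frames of hypothetical short-time power.
P = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
Q = medium_time_power(P, M=2)
```

With M = 2 each interior value of Q is the mean of five consecutive frames, which smooths out frame-to-frame fluctuations in the noise power estimate.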

Asymmetric Noise Suppression
Detects floor-level noise
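One way to detect the noise floor is a first-order filter with two different forgetting factors, so that the output rises slowly and falls quickly and therefore follows the lower envelope of the power. The sketch below assumes that form. The values lam_a = 0.999 and lam_b = 0.5 and the initial value are assumptions, since they are not given on the slide.

```python
import numpy as np

def asymmetric_lowpass(Q_in, lam_a=0.999, lam_b=0.5):
    """Track the lower envelope of Q_in (n_frames x n_channels) with a two-rate filter."""
    Q_out = np.empty_like(Q_in, dtype=float)
    prev = 0.9 * Q_in[0]                 # initial envelope value; an assumption of this sketch
    for m in range(Q_in.shape[0]):
        rising = Q_in[m] >= prev
        # Slow update while the input is above the envelope, fast update while it is below.
        prev = np.where(rising,
                        lam_a * prev + (1.0 - lam_a) * Q_in[m],
                        lam_b * prev + (1.0 - lam_b) * Q_in[m])
        Q_out[m] = prev
    return Q_out

# Hypothetical input: a constant floor of 1.0 with a burst of 10.0 in frames 20-29.
Q_in = np.ones((60, 1))
Q_in[20:30] = 10.0
Q_le = asymmetric_lowpass(Q_in)
```

In this toy input the estimated envelope stays close to the floor level of 1.0 even during the burst, so subtracting it removes the slowly varying floor while leaving the burst largely intact.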

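Asymmetric Noise Suppression
Applying lowpass filtering to portions of the signal
that are presumed not to be driven by
an excitation function such as voiced speech
improves recognition accuracy.
Because this operation amounts to repeated lowpass filtering,
it would blur the power coefficients of speech
and lower recognition accuracy,
so it is not applied to speech segments.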

Asymmetric Noise Suppression
If the signal is smaller than a constant multiple (c) of
its own lower envelope, it is regarded as
non-excited.
c = 2 is the most effective
for white noise

Temporal masking
The final value of R[m, l] is:
R[m, l] = Rsp[m, l] (excitation)
R[m, l] = Qf[m, l] (non-excitation)
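The excitation test and the final selection can be sketched together as below. It assumes Q is the medium-time power, Q_le its lower envelope, and that R_sp and Q_f have already been computed by the earlier stages. The function name and the toy values are hypothetical.

```python
import numpy as np

def select_output(Q, Q_le, R_sp, Q_f, c=2.0):
    """Return R[m, l] and the excitation mask; all inputs share the same shape."""
    excited = Q >= c * Q_le               # excitation if power is at least c times the lower envelope
    R = np.where(excited, R_sp, Q_f)      # R_sp for excitation frames, Q_f otherwise
    return R, excited

Q = np.array([[1.0, 5.0], [3.0, 1.5]])    # hypothetical medium-time power
Q_le = np.ones_like(Q)                    # hypothetical lower envelope
R_sp = np.full_like(Q, 10.0)              # placeholder for the temporally masked output
Q_f = np.zeros_like(Q)                    # placeholder for the floor-level output
R, excited = select_output(Q, Q_le, R_sp, Q_f, c=2.0)
```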

Power Function nonlinearity
Processing in MFCC
Processing in PNCC
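The difference between the two nonlinearities can be illustrated numerically. The exponent 1/15 used here is an assumption (it is the value usually quoted for PNCC and does not appear on this slide), and the channel powers are made up for the example.

```python
import numpy as np

power = np.array([1e-8, 1e-4, 1.0, 1e4])   # hypothetical channel powers
log_out = np.log(power)                     # logarithmic compression, as in MFCC
pow_out = power ** (1.0 / 15.0)             # power-law compression, as in PNCC (exponent assumed)

for p, lg, pw in zip(power, log_out, pow_out):
    print(f"power={p:8.1e}  log={lg:8.3f}  power-law={pw:6.3f}")
```

As the power approaches zero the logarithm grows without bound in the negative direction, whereas the power-law output stays bounded and approaches zero, so small changes in low-power (noise-dominated) regions have less influence on the feature.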

EXPERIMENTAL RESULTS
(a) white noise
(b) street noise
(c) background music
(d) interfering speech
(e) artificial reverberation

[DL Hacks]Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

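1. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition Department of Systems Innovation, Course C, Faculty of Engineering, The University of Tokyo B3 中村泰貴

2. Self-introduction ・B3 (third-year undergraduate), Department of Systems Innovation, Course C, Faculty of Engineering, The University of Tokyo: 中村泰貴 ・Interested in speech technology (including its combination with deep learning) and signal processing ・This is my first presentation...

3. Bibliographic information ・Title ・Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition ・Authors ・Chanwoo Kim (Google) ・Richard M. Stern (Carnegie Mellon University) ・Publication date ・2016/06/24 ・Paper URL ・http://www.cs.cmu.edu/~robust/Papers/OnlinePNCC_V25.pdf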

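4. Background ・Feature extraction used in speech recognition ・Almost always MFCC or mel spectrogram ・Are there no other feature extraction methods...? ・Robustness is also desirable! ・Worth trying Deep Speech 2 PNCC!!!

5. What is PNCC? ・Main features ・Whereas MFCC and similar features use a logarithm, PNCC uses a power law ・Asymmetric filtering that reduces noise ・Higher recognition accuracy than MFCC and PLP under various types of noise and in reverberant environments ・Differences from conventional feature extraction ・Higher computational cost ・Recognition accuracy does not drop on clean speech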

6. Results first... Environmental noise was added to LibriSpeech dev-clean audio at an SNR of about 4 dB

7. Results first... mel spectrogram PNCC

8. Results first...

9. PNCC mechanism

10.

11. Gammatone Frequency Integration ・Filterbank http://aidiary.hatenablog.com/entry/20120225/1330179868

12.

13. Medium-Time Power Calculation ・M = 2 ・Moving average of P ・Effective against Gaussian noise

14.

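15. Asymmetric Noise Suppression Detects floor-level noise

16. Asymmetric Noise Suppression Applying lowpass filtering to portions of the signal that are presumed not to be driven by an excitation function such as voiced speech improves recognition accuracy. Because this operation amounts to repeated lowpass filtering, it would blur the power coefficients of speech and lower recognition accuracy, so it is not applied to speech segments.

17. Asymmetric Noise Suppression If the signal is smaller than a constant multiple (c) of its own lower envelope, it is regarded as non-excited. c = 2 is the most effective for white noise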

18. Temporal masking The final value of R[m, l] is: R[m, l] = Rsp[m, l] (excitation), R[m, l] = Qf[m, l] (non-excitation)

19.

20. Weight Smoothing

21.

22. Mean power normalization

23.

24. Power Function nonlinearity Processing in MFCC Processing in PNCC

25. EXPERIMENTAL RESULTS (a) white noise (b) street noise (c) background music (d) interfering speech (e) artificial reverberation

26. Computational Complexity