Daichi Kitamura, "Audio source separation based on independent low-rank matrix analysis and its extensions" (original Japanese title: 独立低ランク行列分析に基づく音源分離とその発展), IEICE Technical Group on Signal Processing, Aug. 24, 2021.
http://d-kitamura.net
49. Making the ILRMA source model supervised
• Independent low-rank matrix analysis (ILRMA)
• Independent deeply learned matrix analysis (IDLMA)
– Source separation based on statistical independence and a supervised DNN source model
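[Figure: block diagrams of ILRMA and IDLMA. In both, the observed signal is transformed by the STFT and a frequency-wise demixing matrix produces the separated signals; all spectrogram panels have time and frequency axes.]
– ILRMA panel: low-rank approximation by NMF. The demixing matrix is updated so that the separated signals are mutually independent and have a low-rank time-frequency structure.
– IDLMA panel: variance estimation by a DNN source model. The demixing matrix is updated so that the separated signals are mutually independent and have a time-frequency structure that the trained DNN can represent.
– Arrow from ILRMA to IDLMA: the source model is made supervised by using a DNN.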
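The ILRMA loop described on this slide can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `ilrma_sketch` and `ilrma_cost` are hypothetical, the updates follow the commonly published form (multiplicative NMF updates for the source variances, iterative projection for the demixing matrices), and the scale normalization and back-projection steps used in practice are omitted. Replacing the NMF variance `R = T @ V` with the output of a trained network would give an IDLMA-style variant.

```python
import numpy as np

def ilrma_cost(Y, R, W):
    """Negative log-likelihood (up to constants) under the local Gaussian model."""
    J = Y.shape[1]
    P = np.abs(Y).transpose(2, 0, 1) ** 2        # (N, I, J) power spectrograms
    _, logabsdet = np.linalg.slogdet(W)          # one value per frequency bin
    return float(np.sum(P / R + np.log(R)) - 2.0 * J * np.sum(logabsdet))

def ilrma_sketch(X, n_bases=2, n_iter=30, seed=0, eps=1e-12):
    """X: observed STFT of shape (I bins, J frames, N channels), N sources assumed."""
    rng = np.random.default_rng(seed)
    I, J, N = X.shape
    W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))   # demixing matrix per bin
    T = rng.uniform(0.1, 1.0, (N, I, n_bases))          # NMF bases per source
    V = rng.uniform(0.1, 1.0, (N, n_bases, J))          # NMF activations per source
    Y = X.astype(complex)                               # W = identity at the start
    costs = [ilrma_cost(Y, T @ V, W)]
    for _ in range(n_iter):
        # Source model: fit a low-rank variance R = T V to each separated power spectrogram.
        P = np.abs(Y).transpose(2, 0, 1) ** 2
        R = T @ V
        Vt = V.transpose(0, 2, 1)
        T = np.maximum(T * np.sqrt(((P / R**2) @ Vt) / ((1.0 / R) @ Vt)), eps)
        R = T @ V
        Tt = T.transpose(0, 2, 1)
        V = np.maximum(V * np.sqrt((Tt @ (P / R**2)) / (Tt @ (1.0 / R))), eps)
        R = T @ V
        # Spatial model: iterative projection update of each row of W.
        for n in range(N):
            U = np.einsum('ij,ijm,ijk->imk', 1.0 / R[n], X, X.conj()) / J
            e_n = np.zeros((I, N, 1))
            e_n[:, n, 0] = 1.0
            w = np.linalg.solve(W @ U, e_n)[..., 0]     # (I, N)
            scale = np.sqrt(np.real(np.einsum('im,imk,ik->i', w.conj(), U, w)))
            W[:, n, :] = (w / scale[:, None]).conj()    # row n holds w_n^H
        Y = np.einsum('inm,ijm->ijn', W, X)             # y_ij = W_i x_ij
        costs.append(ilrma_cost(Y, R, W))
    return Y, W, costs
```

The returned `costs` list makes it easy to check that the objective goes down over the iterations, which is a useful sanity check when modifying either the source model or the demixing update.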
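84. Summary
• Essence of determined BSS
– Separate the sources spatially by exploiting inter-source independence (demixing matrix estimation)
– Introduce some source model to solve the permutation problem
• ILRMA: low-rank source model based on NMF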
– D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source
separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM
Trans. ASLP, vol. 24, no. 9, pp. 1626–1641, Sep. 2016.
– D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source
separation with independent low-rank matrix analysis,” in Audio Source Separation (Signals and
Communication Technology), S. Makino, Ed. Cham: Springer, pp. 125–155, Mar. 2018.
• IDLMA: supervised source model based on DNNs
– S. Mogami, H. Sumino, D. Kitamura, N. Takamune, S. Takamichi, H. Saruwatari, and N. Ono,
“Independent deeply learned matrix analysis for multichannel audio source separation,” Proc.
EUSIPCO, pp. 1571–1575, Sep. 2018.
– N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and
N. Ono, “Independent deeply learned matrix analysis for determined audio source separation,”
IEEE/ACM Trans. ASLP, vol. 27, no. 10, pp. 1601–1615, Oct. 2019.
• Consistent ILRMA: NMF + spectrogram consistency
– D. Kitamura and K. Yatabe, “Consistent independent low-rank matrix analysis for determined blind
source separation,” EURASIP J. ASP, vol. 2020, no. 46, p. 35, Nov. 2020.
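The spectrogram consistency named in the Consistent ILRMA bullet can be illustrated with a short SciPy sketch. A complex spectrogram is consistent when it equals the STFT of some time-domain signal, so applying the inverse STFT followed by the STFT leaves it unchanged. The helper names `project_consistent` and `inconsistency` are hypothetical, the window settings are SciPy defaults (Hann, 50% overlap), and this shows only the consistency concept, not the Consistent ILRMA algorithm itself.

```python
import numpy as np
from scipy.signal import stft, istft

def project_consistent(Z, nperseg=256):
    """Return STFT(iSTFT(Z)), which is a consistent spectrogram.

    Z must have nperseg // 2 + 1 frequency bins (one-sided STFT layout).
    """
    _, x = istft(Z, nperseg=nperseg)          # back to the time domain
    _, _, Z_proj = stft(x, nperseg=nperseg)   # re-analyze with the same settings
    return Z_proj

def inconsistency(Z, nperseg=256):
    """Norm of the part of Z that the iSTFT/STFT round trip removes."""
    return float(np.linalg.norm(Z - project_consistent(Z, nperseg)))

# An arbitrary complex matrix is generally not a valid STFT of any signal.
rng = np.random.default_rng(0)
Z = rng.standard_normal((129, 20)) + 1j * rng.standard_normal((129, 20))
Z_proj = project_consistent(Z)
print(inconsistency(Z), inconsistency(Z_proj))
```

The first printed value is clearly nonzero, while the second is at the level of floating-point rounding, because the projected spectrogram is already the STFT of a real signal.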
85. Other ILRMA extensions (partial list)
• Extension to super-Gaussian generative models
– D. Kitamura, S. Mogami, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, Y. Takahashi, and K. Kondo,
“Generalized independent low-rank matrix analysis using heavy-tailed distributions for blind source
separation,” EURASIP J. ASP, vol. 2018, no. 28, p. 25, May 2018.
• Extension to sub-Gaussian generative models
– S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and N. Ono,
“Independent low-rank matrix analysis based on time-variant sub-Gaussian source model for
determined blind source separation,” IEEE/ACM Trans. ASLP, vol. 28, pp. 503–518, Dec. 2019.
• BSS based on time-frequency masking (TFMBSS)
– K. Yatabe and D. Kitamura, “Time-frequency-masking-based determined BSS with application to
sparse IVA,” Proc. ICASSP, pp. 715–719, May 2019.
– S. Oyabu, D. Kitamura, and K. Yatabe, “Linear multichannel blind source separation based on
time-frequency mask obtained by harmonic/percussive sound separation,” Proc. ICASSP, pp. 201–205,
Jun. 2021.
– K. Yatabe and D. Kitamura, “Determined BSS based on time-frequency masking and its application to
harmonic vector analysis,” IEEE/ACM Trans. ASLP, vol. 29, pp. 1609–1625, Apr. 2021.
• ILRMA with user interaction
– F. Oshima, M. Nakano, and D. Kitamura, “Interactive speech source separation based on independent
low-rank matrix analysis,” AST, vol. 42, no. 4, pp. 222–225, Jul. 2021.