Slides for an ICML 2018 paper reading group.
Overview of NLP / Adversarial Attacks
- Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples
- Synthesizing Robust Adversarial Examples
- Black-box Adversarial Attacks with Limited Queries and Information
11. Adversarial Attacks
1. Obfuscated gradients give a false sense of security:
circumventing defenses to adversarial examples [Best Paper]
Anish Athalye, Nicholas Carlini, David Wagner
2. Synthesizing Robust Adversarial Examples
Anish Athalye*, Logan Engstrom*, Andrew Ilyas*, Kevin Kwok
3. Black-box Adversarial Attacks with Limited Queries and
Information
Andrew Ilyas*, Logan Engstrom*, Anish Athalye*, Jessy Lin*
The same group (MIT / UC Berkeley) got three adversarial-example papers accepted at ICML!
One of them won the Best Paper award.
12. 1. Obfuscated gradients give a false sense of security
What is an adversarial example?
Compute the gradient that changes the model's output, and repeatedly update the input x along it.
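The update described on this slide can be sketched on a toy model. The block below is a minimal illustration, not the setup used in the paper: the logistic-regression classifier, the weights `w` and `b`, and the parameters `eps`, `step`, and `n_steps` are all assumptions made for the example.

```python
import math

def predict(w, b, x):
    """Class-1 probability under a toy logistic-regression model (assumed)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def iterative_attack(w, b, x, y, eps=0.5, step=0.1, n_steps=10):
    """Signed-gradient ascent on the loss, clipped to an L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = loss_grad_wrt_input(w, b, x_adv, y)
        for i, gi in enumerate(g):
            moved = x_adv[i] + step * ((gi > 0) - (gi < 0))  # step in sign(g)
            x_adv[i] = min(max(moved, x[i] - eps), x[i] + eps)
    return x_adv

# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1
x_adv = iterative_attack(w, b, x, y)
```

Here `x` is classified as class 1, while `x_adv` stays within `eps` of `x` in every coordinate and is classified as class 0.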
13. 1. Obfuscated gradients give a false sense of security
What is an adversarial example?
(Aside) Adversarial examples are strongly associated with the panda image,
but the ICML talks mostly used a cat (tabby cat) as the example...
14. 1. Obfuscated gradients give a false sense of security
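Overview
The authors were able to attack 7 of the 9 defense papers from ICLR 2018.
Defenses that rely on the phenomenon of obfuscated gradients can be attacked.
Obfuscated gradients
1. Shattered gradients
Non-differentiable operations make the gradient impossible to compute.
2. Stochastic gradients
Randomness is introduced at prediction time.
3. Vanishing/exploding gradients
The gradient vanishes (or explodes).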
15. 1. Obfuscated gradients give a false sense of security
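Backward Pass Differentiable Approximation (BPDA)
On the backward pass, replace a layer whose gradient cannot be computed with a different function whose gradient is approximately 1.
g(x) ≈ x, so ∇_x g(x) ≈ ∇_x x = 1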
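The identity approximation above can be sketched as follows. This is a toy illustration under stated assumptions: the quantization layer `g`, the constant `LEVELS`, the logistic classifier, and the attack parameters are hypothetical and are not taken from the paper.

```python
import math

LEVELS = 4  # hypothetical quantization depth of the toy defense

def g(x):
    """Toy non-differentiable preprocessing: quantize each feature."""
    return [round(xi * LEVELS) / LEVELS for xi in x]

def f_prob(w, b, z):
    """Class-1 probability of a toy logistic classifier (assumed model)."""
    s = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-s))

def bpda_grad(w, b, x, y):
    """Forward pass through the real g; backward pass treats g as the identity."""
    z = g(x)                              # forward: actual defense
    p = f_prob(w, b, z)
    return [(p - y) * wi for wi in w]     # backward: dg/dx approximated by 1

def bpda_attack(w, b, x, y, eps=0.5, step=0.05, n_steps=20):
    """Signed-gradient ascent using the BPDA gradient, clipped to radius eps."""
    x_adv = list(x)
    for _ in range(n_steps):
        grad = bpda_grad(w, b, x_adv, y)
        for i, gi in enumerate(grad):
            moved = x_adv[i] + step * ((gi > 0) - (gi < 0))
            x_adv[i] = min(max(moved, x[i] - eps), x[i] + eps)
    return x_adv

# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1
x_adv = bpda_attack(w, b, x, y)
```

The true gradient of `g` is zero almost everywhere, so a naive gradient attack would stall; with the identity on the backward pass the attack still finds an input whose quantized version is misclassified.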
16. 1. Obfuscated gradients give a false sense of security
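Expectation Over Transformation (EOT)
When the defense applies a random transformation to the input before prediction, sample several transformed inputs, take the expectation, and use it to approximate the gradient.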
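A minimal sketch of this averaging, assuming a toy randomized defense that rescales the input by a random factor `a` before a logistic classifier. The transformation, the model, and every parameter value are assumptions made for illustration.

```python
import math
import random

def f_prob(w, b, z):
    """Class-1 probability of a toy logistic classifier (assumed model)."""
    s = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-s))

def eot_grad(w, b, x, y, rng, n_samples=50):
    """Monte Carlo estimate of the expected loss gradient over t(x) = a * x."""
    grad = [0.0] * len(x)
    for _ in range(n_samples):
        a = rng.uniform(0.5, 1.5)                 # one sampled transformation
        p = f_prob(w, b, [a * xi for xi in x])
        for i, wi in enumerate(w):
            # Chain rule through t: d loss / d x_i = (p - y) * w_i * a
            grad[i] += (p - y) * wi * a / n_samples
    return grad

def eot_attack(w, b, x, y, rng, eps=0.5, step=0.1, n_steps=10):
    """Signed-gradient ascent using the averaged gradient, clipped to radius eps."""
    x_adv = list(x)
    for _ in range(n_steps):
        grad = eot_grad(w, b, x_adv, y, rng)
        for i, gi in enumerate(grad):
            moved = x_adv[i] + step * ((gi > 0) - (gi < 0))
            x_adv[i] = min(max(moved, x[i] - eps), x[i] + eps)
    return x_adv

# Hypothetical weights and input, chosen only for illustration.
rng = random.Random(0)
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1
x_adv = eot_attack(w, b, x, y, rng)
```

Averaging over samples gives a gradient that points in a useful direction for the whole distribution of transformations, not just for one random draw.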
17. 1. Obfuscated gradients give a false sense of security
Experiments
Authors' poster:
https://www.anishathalye.com/media/2018/07/19/poster.pdf
23. Gradient estimation [Wierstra et al. (2014)]
(Natural Evolution Strategies; NES)
Gradient to estimate
Search distribution of random Gaussian noise around the current image x
Approximated gradient: approximated by an expectation
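A minimal sketch of such an estimator, using only function values of a black-box `F`. The antithetic pairing (querying both x + σu and x − σu), the value of `sigma`, the sample count, and the linear test function are assumptions for illustration. The estimate averages (F(x + σu) − F(x − σu)) u / (2σ) over Gaussian directions u.

```python
import random

def nes_gradient(F, x, sigma=0.1, n_pairs=2000, seed=0):
    """Estimate the gradient of F at x from queries only (no backpropagation)."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in x]          # Gaussian search direction
        f_plus = F([xi + sigma * ui for xi, ui in zip(x, u)])
        f_minus = F([xi - sigma * ui for xi, ui in zip(x, u)])
        for i, ui in enumerate(u):
            grad[i] += (f_plus - f_minus) * ui / (2.0 * sigma * n_pairs)
    return grad

# Hypothetical black-box score whose true gradient is [2, -1] everywhere.
def score(x):
    return 2.0 * x[0] - 1.0 * x[1]

estimate = nes_gradient(score, [0.3, 0.1])
```

The estimate is noisy and approaches the true gradient only as the number of queries grows, which is why the query budget matters in the black-box setting.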
31. List of papers
NLP-related papers
– Towards Binary-Valued Gates for Robust LSTM Training
– Learning K-way D-dimensional Discrete Codes for Compact Embedding
Representations
– Extracting Automata from Recurrent Neural Networks Using Queries and
Counterexamples
– Adaptive Sampled Softmax with Kernel Based Sampling
Adversarial Attack
1. Obfuscated gradients give a false sense of security:
circumventing defenses to adversarial examples [Best Paper]
2. Synthesizing Robust Adversarial Examples
3. Black-box Adversarial Attacks with Limited Queries and
Information
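38. Related work
Adversarial examples for NLP
Creating input sentences that change the output
– Use crowdsourcing to write input sentences that fool reading-comprehension systems
[Jia and Liang, 2017]
– Search for input sentences that fool NMT by randomly swapping characters
[Belinkov and Bisk, 2018]
– Generate many input sentences by synonym replacement and search for ones that fool the classifier
[Samanta and Mehta, 2017]
Understanding how the model behaves improves interpretability.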
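39. Related work
Adversarial training
Adding adversarial examples to the training data improves generalization [Goodfellow et al., 2015]
Objective: the original objective function plus an objective function for classifying adversarial examples correctly
Adversarial training extended to semi-supervised learning (Virtual Adversarial Training; VAT)
[Miyato et al., 2016]
Adversarial Training for Text
Add perturbations to the word vectors and perform adversarial training [Miyato et al., 2017]
– State-of-the-art accuracy on text classification, but the interpretability of the perturbations is not discussed
The existing method [Miyato et al., 2017] is described in more detail next.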
40. Existing method: [Miyato et al., 2017]
Takeru Miyato, Andrew M Dai, and Ian Goodfellow, ICLR 2017
“Adversarial training methods for semi-supervised text classification.”
Single-layer LSTM + pre-training (language model) + adversarial training
Word vectors: Adversarial perturbation vectors:
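The perturbation of word vectors can be sketched as follows: take the gradient g of the loss with respect to the whole embedding sequence and scale it to L2 norm ε. The mean-pooling logistic classifier below stands in for the LSTM and is an assumption, as are all names and numeric values.

```python
import math

def prob(w, b, embeddings):
    """Class-1 probability of a toy classifier on the mean word vector (assumed)."""
    T = len(embeddings)
    mean = [sum(v[d] for v in embeddings) / T for d in range(len(w))]
    s = sum(wd * md for wd, md in zip(w, mean)) + b
    return 1.0 / (1.0 + math.exp(-s))

def nll(w, b, embeddings, y):
    """Negative log-likelihood of label y."""
    p = prob(w, b, embeddings)
    return -math.log(p if y == 1 else 1.0 - p)

def adversarial_perturbation(w, b, embeddings, y, eps=1.0):
    """r = eps * g / ||g||_2, with g the loss gradient over the whole sequence."""
    T = len(embeddings)
    p = prob(w, b, embeddings)
    g = [[(p - y) * wd / T for wd in w] for _ in embeddings]
    norm = math.sqrt(sum(gd * gd for row in g for gd in row))
    return [[eps * gd / norm for gd in row] for row in g]

# Hypothetical 2-dimensional word vectors for a 3-word sentence.
w, b, y = [2.0, -1.0], 0.0, 1
sentence = [[0.5, 0.1], [0.2, -0.3], [0.4, 0.0]]
r_adv = adversarial_perturbation(w, b, sentence, y)
perturbed = [[vd + rd for vd, rd in zip(v, r)] for v, r in zip(sentence, r_adv)]
```

Adversarial training would then add the loss on `perturbed` to the usual training loss. With mean pooling every word receives the same perturbation; a recurrent model would give each position a different one.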
50. Creating input sentences that change the prediction
Replace words whose perturbation has a large norm,
and check whether the classifier's prediction changes.
Test sentence (prediction: Positive):
This movie turned out to be better than I had expected it to be
Some parts were pretty funny It was nice to have a movie with a
new plot <eos>
Adversarial example (prediction: Negative):
This movie turned out to be worse than I had expected it to be
Some parts were pretty funny It was nice to have a movie with a
new plot <eos>
Replacing “better” with “worse” flipped the prediction
(the meaning of the sentence also changes).
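The replace-and-check procedure on this slide can be sketched roughly as below. How the replacement word is chosen is not stated here, so the sketch takes it as an argument; the lexicon-based `classify` function and the perturbation values are hypothetical stand-ins for the real classifier and its perturbations.

```python
import math

def pick_position(perturbations):
    """Index of the word whose perturbation vector has the largest L2 norm."""
    norms = [math.sqrt(sum(c * c for c in r)) for r in perturbations]
    return max(range(len(norms)), key=norms.__getitem__)

def replace_and_check(tokens, perturbations, replacement, classify):
    """Swap the most-perturbed word and report whether the prediction flips."""
    i = pick_position(perturbations)
    candidate = tokens[:i] + [replacement] + tokens[i + 1:]
    return candidate, classify(tokens) != classify(candidate)

# Hypothetical stand-in classifier: a two-word sentiment lexicon.
LEXICON = {"better": 1.0, "worse": -1.0}

def classify(tokens):
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return "Positive" if score > 0 else "Negative"

tokens = "This movie turned out to be better than I had expected".split()
# Hypothetical perturbations: the vector for "better" (index 6) is the largest.
perturbations = [[0.01, 0.0]] * 6 + [[0.6, 0.3]] + [[0.01, 0.0]] * 4
candidate, flipped = replace_and_check(tokens, perturbations, "worse", classify)
```

In this toy run the word at index 6 is selected, and swapping it for the supplied replacement changes the predicted label.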
51. Creating input sentences that change the prediction
Test sentence (prediction: Positive):
There is really but one thing to say about this sorry movie It should never
have been made The first one one of my favourites An American Werewolf
in London is a great movie with a good plot good actors and good FX But
this one It stinks to heaven with a cry of helplessness <eos>
Adversarial example (prediction: Negative):
There is really but one thing to say about that sorry movie It should never
have been made The first one one of my favourites An American Werewolf
in London is a great movie with a good plot good actors and good FX But
this one It stinks to heaven with a cry of helplessness <eos>
Replacing “this” with “that” flipped the prediction
(the meaning of the sentence does not change).