Anomaly Detection based on Diffusion

Smart Production Systems Lab.
구병모
2023.01.06
Anomaly Detection based on Diffusion Model
Winter 2023 Paper Seminar

INDEX
01 Denoising diffusion probabilistic models
Ho, Jonathan, Ajay Jain, and Pieter Abbeel. (NeurIPS 2020, Citation: 956)
02 AnoDDPM: Anomaly Detection With Denoising Diffusion Probabilistic Models Using Simplex Noise
Wyatt, Julian, et al. (CVPR 2022, Citation: 8)

Diffusion?
Diffusion in thermodynamics
• Origin: Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." PMLR, 2015. (Citation: 668)
Source: 2022 CVPR Nvidia Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications

Growing interest in Diffusion
A graduate student who studies late at night
Pixel art Digital art Oil Painting
Source: https://labs.openai.com/
DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called
“diffusion”, which starts with a pattern of random dots and gradually alters that pattern towards an image when it
recognizes specific aspects of that image.

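Generative model
▪ VAE
Encoder: $q_\theta(z \mid x)$ / Decoder: $P_\theta(x \mid z)$
• The trained decoder network maps a latent variable to the distribution of a specific pattern.
• An encoder is added to the model structure, so the latent variable, encoder, and decoder are all learned.
▪ GAN
$z \sim p_z(z)$ (Gaussian $N$) → Generator → $p'_x(x)$ → Discriminator (compares with $p_x(x)$)
• The trained generator maps a latent variable to the distribution of a specific pattern.
• A discriminator is added to the model structure in order to train the generator.
▪ Flow-based Model
Flow (forward): $f \circ \cdots \circ f$ maps $p_x(x)$ to $z \sim p_z(z)$ (Gaussian $N$)
Flow (inverse): $f^{-1} \circ \cdots \circ f^{-1}$ maps $z$ back to $p'_x(x)$
• The inverse mapping of the trained flow model maps a latent variable to the distribution of a specific pattern.
• An invertible function is learned in order to obtain the inverse mapping used for generation.
▪ Diffusion based generative model
Iterative MC (forward): $q(x_1 \mid x_0) \circ q(x_2 \mid x_1) \circ \cdots \circ q(x_T = z \mid x_{T-1})$
Iterative MC (reverse): $P_\theta(z_1 \mid z_0) \circ P_\theta(z_2 \mid z_1) \circ \cdots \circ P_\theta(z_T = x \mid z_{T-1})$
• The learned conditional distribution $P_\theta(x \mid z)$ of the diffusion model yields the distribution of a specific pattern.
• To learn $P_\theta(x \mid z)$, which is used for generation, the model is trained to reverse the diffusion process $q(z \mid x)$.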

Generative model
Source: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

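Diffusion?
❖ Diffusion Model
▪ A type of generative model whose goal is to generate the patterns found in the training data
▪ To generate a pattern, noise is injected to corrupt it, and the conditional function that restores it is learned
▪ Diffusion process (Noising): Fixed
$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})$, $q(x_t \mid x_{t-1}) := N(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$
$x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_{t-1}$, $\epsilon \sim N(0, I)$ (Reparameterization trick)
▪ Reverse process (Denoising): Learned
$P_\theta(x_{0:T}) := P(x_T) \prod_{t=1}^{T} P_\theta(x_{t-1} \mid x_t)$, $P_\theta(x_{t-1} \mid x_t) := N(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$
Learning target: $\mu_\theta(x_t, t)$, $\Sigma_\theta(x_t, t)$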
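The fixed noising step can be sketched in a few lines of plain Python. This is only an illustration: the linear schedule from 1e-4 to 0.02 with T = 1000 is the one reported by Ho et al. and is not stated on the slide, and the helper names `forward_step` and `run_forward` are hypothetical.

```python
import math
import random


def forward_step(x_prev, beta_t, rng):
    """One noising step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps."""
    keep = math.sqrt(1.0 - beta_t)
    spread = math.sqrt(beta_t)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x_prev]


def run_forward(x0, betas, rng):
    """Apply the fixed forward chain q(x_t | x_{t-1}) for every beta in order."""
    x = list(x0)
    for beta_t in betas:
        x = forward_step(x, beta_t, rng)
    return x


rng = random.Random(0)
T = 1000
# Assumed linear schedule (values from Ho et al., not from the slide).
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

x0 = [1.0] * 400                      # toy "image": 400 pixels, all equal to 1
x_T = run_forward(x0, betas, rng)

mean_T = sum(x_T) / len(x_T)
var_T = sum((v - mean_T) ** 2 for v in x_T) / len(x_T)
```

After all T steps the toy signal is close to standard Gaussian noise (mean near 0, variance near 1), which is why the reverse process can start from $P(x_T) = N(0, I)$.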

Diffusion?
❖ Diffusion Loss
▪ Regularization → Learning $\beta_t$
▪ Reconstruction
▪ Denoising
Can this be made simpler? → DDPM's contribution
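For reference, the variational bound from Ho et al. splits into three groups of terms. The mapping of the slide's three labels onto $L_T$, $L_{t-1}$, and $L_0$ is an assumption based on that paper, since the equation itself did not survive in the slide text.

```latex
\begin{aligned}
L = \mathbb{E}_q\Big[\;
  & \underbrace{D_{KL}\big(q(x_T \mid x_0)\,\|\,P(x_T)\big)}_{L_T:\ \text{regularization}} \\
  & + \sum_{t>1} \underbrace{D_{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,P_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}:\ \text{denoising}} \\
  & \underbrace{-\,\log P_\theta(x_0 \mid x_1)}_{L_0:\ \text{reconstruction}}
\;\Big]
\end{aligned}
```

$L_T$ contains no $\theta$; it depends only on the forward process, which is why it matters only if $\beta_t$ is learned.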

DDPM
❖ Loss Simplification
▪ Remove Regularization Term
Use Fixed 𝛽𝑡
Inductive Bias ↑
(∵ the learnable part is removed and the model is built exactly as the user designed it)


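DDPM
▪ Denoising loss Modification - #1 Make $\Sigma_\theta(x_t, t)$ a constant
$\Sigma_\theta(x_t, t) = \sigma_t^2 I$ (time dependent constants) → the noise accumulated up to step $t$
$\sigma_t^2 = \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t$ or $\sigma_t^2 = \beta_t$ ($\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, $\alpha_t = 1 - \beta_t$)
$P_\theta(x_{t-1} \mid x_t) := N(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$ → $P_\theta(x_{t-1} \mid x_t) := N(x_{t-1}; \mu_\theta(x_t, t), \sigma_t^2 I)$
Learning target: $\mu_\theta(x_t, t)$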
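A minimal sketch of how the two constant variance choices could be computed from a fixed schedule. The linear schedule values are an assumption taken from Ho et al., and `linear_betas` and `variance_schedules` are illustrative names rather than anything from the slides.

```python
def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule (values from Ho et al., not from the slide)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]


def variance_schedules(betas):
    """Return (alpha_bars, beta_tildes) for t = 1..T.

    alpha_bar_t  = prod_{s<=t} (1 - beta_s)
    beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
    """
    alpha_bars, beta_tildes = [], []
    prev_bar = 1.0                      # alpha_bar_0 = 1 (empty product)
    for beta_t in betas:
        cur_bar = prev_bar * (1.0 - beta_t)
        beta_tildes.append((1.0 - prev_bar) / (1.0 - cur_bar) * beta_t)
        alpha_bars.append(cur_bar)
        prev_bar = cur_bar
    return alpha_bars, beta_tildes


betas = linear_betas(1000)
alpha_bars, beta_tildes = variance_schedules(betas)
```

Under this schedule $\tilde{\beta}_t$ never exceeds $\beta_t$, and the two options nearly coincide at large $t$, where $\bar{\alpha}_{t-1}$ and $\bar{\alpha}_t$ are both close to 0.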

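DDPM
▪ Denoising loss Modification - #2 Redefine $\mu_\theta(x_t, t)$ through denoising matching
<Bayes Rule> $q(x_{t-1} \mid x_t, x_0) = N(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$
$P_\theta(x_{t-1} \mid x_t) = N(x_{t-1}; \mu_\theta(x_t, t), \tilde{\beta}_t I)$
& KL Divergence between the two distributions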

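DDPM
$x_t(x_0, \epsilon) = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ for $\epsilon \sim N(0, I)$ ($\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$, $\alpha_t = 1 - \beta_t$)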
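The closed form means $x_t$ can be sampled in one shot without walking the chain, and $x_0$ can be recovered exactly once the noise $\epsilon$ is known. A small sketch, assuming the linear schedule from Ho et al.; `alpha_bar`, `q_sample`, and `recover_x0` are hypothetical helper names.

```python
import math
import random


def alpha_bar(betas, t):
    """alpha_bar_t = prod_{s=1}^{t} (1 - beta_s)."""
    prod = 1.0
    for beta_s in betas[:t]:
        prod *= 1.0 - beta_s
    return prod


def q_sample(x0, t, betas, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps directly."""
    a_bar = alpha_bar(betas, t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e
           for v, e in zip(x0, eps)]
    return x_t, eps


def recover_x0(x_t, eps, t, betas):
    """Invert the closed form: x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    a_bar = alpha_bar(betas, t)
    return [(v - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
            for v, e in zip(x_t, eps)]


rng = random.Random(0)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed schedule

x0 = [0.0, 0.25, 0.5, 0.75, 1.0]
x_t, eps = q_sample(x0, 250, betas, rng)
x0_hat = recover_x0(x_t, eps, 250, betas)
max_error = max(abs(a - b) for a, b in zip(x0, x0_hat))
```

This inversion is what makes predicting the noise equivalent to predicting the clean image.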

DDPM
▪ Denoising loss Modification - #3 The full expression is mathematically complex, and ignoring it makes no big difference in performance
→ Remove the coefficient term
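The chain from the KL term to the simplified loss can be written out as follows. The equations follow Ho et al.; the noise-prediction network $\epsilon_\theta$ does not appear in the slide text and is introduced here from that paper.

```latex
\begin{aligned}
L_{t-1} &= \mathbb{E}_q\!\left[\frac{1}{2\sigma_t^2}
  \big\|\tilde{\mu}_t(x_t, x_0) - \mu_\theta(x_t, t)\big\|^2\right] + C \\[4pt]
\tilde{\mu}_t(x_t, x_0) &= \frac{1}{\sqrt{\alpha_t}}
  \left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon\right),
\qquad
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}
  \left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) \\[4pt]
L_{t-1} - C &= \mathbb{E}_{x_0, \epsilon}\!\left[
  \frac{\beta_t^2}{2\sigma_t^2\,\alpha_t\,(1-\bar{\alpha}_t)}
  \big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0
    + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big)\big\|^2\right] \\[4pt]
L_{\text{simple}} &= \mathbb{E}_{t, x_0, \epsilon}\!\left[
  \big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0
    + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big)\big\|^2\right]
\end{aligned}
```

The only difference between the last two lines is the coefficient $\beta_t^2 / (2\sigma_t^2 \alpha_t (1-\bar{\alpha}_t))$; this is the term that is dropped.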

DDPM
Loss terms (recap): Regularization (→ Learning $\beta_t$), Reconstruction, Denoising

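Experiments
Interpretation
1. RMSE along the noise-to-image (reverse) process
2. Amount of information along the noise-to-image (reverse) process
3. X-axis: amount of information / Y-axis: RMSE
→ Graph 3 shows that the RMSE (distortion) is already sufficiently low at a small amount of information.
→ Early in the reverse process the amount of information is small but most of the distortion is already removed; late in the process the information mainly reduces distortion that is almost invisible (imperceptible distortion).
• IS (Inception Score): a score of how confidently a classifier assigns generated images to specific classes (good classification and even generation across all classes → IS ↑)
• FID (Frechet Inception Distance): compares the mean and covariance of generated data against the real data (more precisely, against the data distribution) / lower is better
• NLL (Negative Log Likelihood): a measure of whether each image pixel was generated correctly / a high NLL means uncertainty (the average per-pixel probability is low)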
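For reference, the standard definitions of the three metrics are given below. The notation is assumed rather than taken from the slides: $p_g$ is the generator distribution, $p(y \mid x)$ the Inception classifier output, $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ the mean and covariance of Inception features of real and generated images, and $D$ the number of image dimensions.

```latex
\begin{aligned}
\mathrm{IS} &= \exp\!\Big(\mathbb{E}_{x \sim p_g}\,
  D_{KL}\big(p(y \mid x)\,\|\,p(y)\big)\Big) \\[4pt]
\mathrm{FID} &= \|\mu_r - \mu_g\|_2^2
  + \mathrm{Tr}\!\Big(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\Big) \\[4pt]
\mathrm{NLL} &= -\frac{1}{D}\,\log_2 P_\theta(x) \quad \text{(bits per dimension)}
\end{aligned}
```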

Experiments
• The coefficient term tends to get smaller as t increases → the loss at large t (more noisy) steps is down-weighted
• Removing the coefficient term raises the share of the loss coming from the noisier steps → training can focus more on denoising
Source: https://www.youtube.com/watch?v=_JQSMhqXw-4&t=1431s
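The size of the coefficient term can be checked numerically. The sketch below assumes the linear schedule from Ho et al. and the choice $\sigma_t^2 = \beta_t$; `loss_coefficients` is a hypothetical helper name.

```python
def loss_coefficients(betas):
    """Weight beta_t^2 / (2 * sigma_t^2 * alpha_t * (1 - alpha_bar_t)) for t = 1..T.

    Assumes sigma_t^2 = beta_t (one of the two fixed-variance options).
    """
    coeffs = []
    alpha_bar = 1.0
    for beta_t in betas:
        alpha_t = 1.0 - beta_t
        alpha_bar *= alpha_t
        sigma_sq = beta_t
        coeffs.append(beta_t ** 2 / (2.0 * sigma_sq * alpha_t * (1.0 - alpha_bar)))
    return coeffs


T = 1000
# Assumed linear schedule (values from Ho et al., not from the slide).
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

coeffs = loss_coefficients(betas)
first_weight = coeffs[0]    # weight on the least noisy step, t = 1
last_weight = coeffs[-1]    # weight on the noisiest step, t = T
```

Under these assumptions the weight at t = 1 is dozens of times larger than the weight at t = T, so keeping the coefficient would concentrate training on the nearly clean steps.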

02 Anomaly Detection With Denoising Diffusion Probabilistic Models Using Simplex Noise
Introduction
❖ Background
• This work targets anomaly detection in medical images.
• Generative-model-based anomaly detection: the model is trained to reconstruct only normal (healthy) images → an anomalous image is also reconstructed as if it were healthy, and the difference is used for detection
• Why a diffusion model: better mode coverage than GANs and higher sample quality than VAEs
• Limitations of existing methods: ① Sampling time is too expensive ② Gaussian noise → fails to detect anomalies (performance is too good)
❖ Contribution
① “Partial diffusion strategy”
② “Multi-scale (multi-octave) Simplex noise”
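The reconstruct-and-compare idea behind partial diffusion can be sketched as below. Several things here are assumptions for illustration only: the noise is Gaussian rather than simplex, the threshold of 0.5 is arbitrary, and `toy_healthy_denoiser` is a hypothetical stand-in for a trained reverse process (it simply returns an all-zero "healthy" image).

```python
import math
import random


def partial_noise(x0, lam, betas, rng):
    """Noise x0 only up to step lam (< T) using the closed-form forward process."""
    a_bar = 1.0
    for beta_s in betas[:lam]:
        a_bar *= 1.0 - beta_s
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for v in x0]


def toy_healthy_denoiser(x_lam, lam):
    """HYPOTHETICAL stand-in for the learned reverse process from step lam to 0.

    A real model would denoise step by step; this toy returns the only
    'healthy' image it knows (all zeros).
    """
    return [0.0 for _ in x_lam]


def anomaly_map(x0, x_rec, threshold):
    """Per-pixel squared reconstruction error and its thresholded mask."""
    err = [(a - b) ** 2 for a, b in zip(x0, x_rec)]
    mask = [e > threshold for e in err]
    return err, mask


rng = random.Random(0)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed schedule

query = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # "lesion" at pixels 3 and 4
x_lam = partial_noise(query, 250, betas, rng)       # stop at lambda = 250, not T
x_rec = toy_healthy_denoiser(x_lam, 250)
err, mask = anomaly_map(query, x_rec, threshold=0.5)
```

Stopping at a step below T keeps enough of the input that the reconstruction stays aligned with the query image, and it shortens the reverse chain, which addresses the sampling cost.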

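Methodology - Simplex Noise ①
❖ Limitations of Gaussian noise
• Natural images follow a power-law distribution of frequencies, in which low-frequency components (e.g., an anomalous region) contribute more to the image.
• Power law? ≈ Pareto principle (80 : 20 Rule) → put simply, a small set of low-frequency parts is more useful for characterizing an image than the bulk of it.
• Why is Gaussian noise a problem? To reconstruct at high quality, DDPM tends to corrupt the low-frequency parts only weakly and the high-frequency parts strongly.
• Therefore, the injected noise should follow a power-law distribution, matching the images themselves.
[Figure: Gaussian noise vs. Simplex noise; label: Low Frequency]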

Methodology - Simplex Noise ②
❖ Simplex Noise ≈ Perlin noise
• A noise function developed by Ken Perlin in the early 1980s, while working on the film “Tron”, to create procedural textures for computer effects
• A way to render the irregular noise found in nature in CG; it uses a fractal sum to produce an ideal noise
• Benefit : the corruption is more structured | the denoising process will be able to “repair” those structured anomalies
Octave: superposition of multiple noise layers

Methodology - AnoDDPM
❖ AnoDDPM
• Comparative experiments using both Gaussian and Simplex noise
• Simplex noise → Starting frequency $\nu = 2^{-6}$, octaves $N = 6$, decay $\gamma = 0.8$
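How the three parameters combine in a multi-octave sum can be sketched as follows. This is not simplex noise: `make_value_noise` is a simple 1D value-noise stand-in chosen to keep the example short, and the assumption that the frequency doubles with each octave is not stated on the slide.

```python
import math
import random


def make_value_noise(seed, size=256):
    """Stand-in smooth 1D noise (value noise, NOT simplex noise).

    Random lattice values in [-1, 1] joined by cosine interpolation.
    """
    rng = random.Random(seed)
    lattice = [rng.uniform(-1.0, 1.0) for _ in range(size)]

    def noise(x):
        i = math.floor(x)
        frac = x - i
        w = (1.0 - math.cos(math.pi * frac)) / 2.0
        a = lattice[i % size]
        b = lattice[(i + 1) % size]
        return a * (1.0 - w) + b * w

    return noise


def octave_noise(x, base_noise, nu=2 ** -6, octaves=6, gamma=0.8):
    """Fractal sum: octave i has frequency nu * 2^i and amplitude gamma^i."""
    total = 0.0
    for i in range(octaves):
        frequency = nu * 2 ** i   # assumed: frequency doubles per octave
        amplitude = gamma ** i    # amplitude decays by gamma per octave
        total += amplitude * base_noise(x * frequency)
    return total


base = make_value_noise(seed=0)
signal = [octave_noise(x, base) for x in range(256)]
amplitude_bound = sum(0.8 ** i for i in range(6))
```

Low octaves set the large-scale structure and higher octaves add progressively weaker fine detail, which gives the corruption a structured look in contrast to the pixel-independent Gaussian case.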

Training & Inference (Segmentation)

Experiment Dataset
❖ Healthy Dataset
• Neurofeedback Skull-Stripped (NFBS) Repository
• Database of 125 T1-weighted anatomical MRI scans that are manually skull-stripped
• Full-skull images: more complex, but anomalies can appear in more varied ways
• Training : Testing = 100 : 25
❖ Anomalous Dataset
• Brain tumours dataset provided by the Centre for Clinical Brain Sciences at the University of Edinburgh
• Database of 22 T1-weighted MRI scans

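Gaussian vs Simplex
• Gaussian diffusion clearly produces high-quality samples, BUT as $\lambda$ increases through 250, 500, and 750 it generates an entirely different image
• The release time needed to reconstruct the brain well differs between patients with different tumours → on this dataset, 250 is generally optimal
[Figure: reconstructions at $\lambda$ = 250, 500, 750; panels: Gaussian noise diffusion, Simplex noise diffusion]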

Performance

Conclusion
❖ Future work
1. The imbalance of Simplex noise tends to lower sample quality somewhat → study other kinds of noise and customize the noise
2. DDPM uses a Markov chain, i.e., it is stochastic → rather than generating once, generate several times and run anomaly detection on the averaged image
3. Apply to 3D images or color images
❖ Conclusion
• As the experiments show, anomaly detection is possible without large datasets (not require large datasets)
• Simplex noise is used in place of Gaussian noise
• Applicable to domains other than medicine → ex) MVTec AD
• Well suited to tasks that do not need the full-length Markov chain (image enhancement, semantic segmentation, filtering)

Reference
1. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020).
2. Wyatt, Julian, et al. "AnoDDPM: Anomaly Detection With Denoising Diffusion Probabilistic Models Using Simplex Noise." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
3. YouTube lecture [DSBA Lab, 김정섭, master's student], https://www.youtube.com/watch?v=_JQSMhqXw-4&t=1431s
4. YouTube lecture [PR 409], https://www.youtube.com/watch?v=1j0W_lu55nc
5. Blog, https://ivdevlog.tistory.com/14

Thank you
