[DL Reading Group] Segment Anything


Deep Learning JP

2023/4/7 Deep Learning JP http://deeplearning.jp/seminar-2/


Segment Anything
Shohei Taniguchi, Matsuo Lab

Segment Anything
Bibliographic information
Authors
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura
Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
Overview
• SAM, a foundation model for segmentation released by Meta
• SA-1B, a dataset of 11 million images annotated with more than 1 billion masks,
was also released

Overview
Segment Anything Model (SAM)
• A model that can generate object masks from a variety of prompts
Points, text, regions, etc.

Overview
Segment Anything Model (SAM)
• Edge prediction and text-to-mask also work reasonably well zero-shot

Presentation outline
• Task: Promptable segmentation
• Model: Segment Anything Model
• Data: Data engine
• Experiments
• Summary

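Background
• Large language models have advanced remarkably in recent years
‣ Given a prompt, they can generate language flexibly
‣ Performance keeps improving in line with scaling laws
➡ Can the same be done in computer vision?
https://j.gifs.com/Y7mBPW.gif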

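Task
Promptable Segmentation
• Unlike conventional segmentation tasks,
the target to segment is specified with a prompt
‣ Points, regions, text, etc.
• Because a prompt can be ambiguous,
there is not necessarily a single correct mask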

Model
Segment Anything Model (SAM)
• The architecture is fairly simple
1. Embed the image and the prompt separately
2. A Transformer-based decoder
generates masks from the embeddings

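Model
• Image encoder
‣ Embeds the image into features
‣ Implemented as a ViT
‣ This is the most computationally expensive part, but
at inference time the features can be cached,
so prompts can be changed interactively in real time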
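
The "encode once, prompt many times" pattern described above can be sketched as follows. This is only a toy illustration: `CachedSegmenter`, `toy_encode`, and `toy_decode` are hypothetical names, and the toy functions stand in for the ViT encoder and the mask decoder rather than reproducing them.

```python
class CachedSegmenter:
    """Runs the expensive encoder once per image and reuses its output for every prompt."""

    def __init__(self, encode_image, decode_mask):
        self.encode_image = encode_image  # expensive step (a ViT in SAM)
        self.decode_mask = decode_mask    # cheap step (prompt encoder + mask decoder in SAM)
        self.embedding = None
        self.encoder_calls = 0

    def set_image(self, image):
        # Compute and cache the image embedding.
        self.embedding = self.encode_image(image)
        self.encoder_calls += 1

    def predict(self, prompt):
        # Only the cheap decoder runs when the prompt changes.
        if self.embedding is None:
            raise RuntimeError("call set_image() first")
        return self.decode_mask(self.embedding, prompt)


def toy_encode(image):
    # Stand-in for the heavy encoder: copy pixel values as floats.
    return [[float(v) for v in row] for row in image]


def toy_decode(embedding, point):
    # Stand-in for the decoder: select every pixel with the same value as the clicked pixel.
    x, y = point
    target = embedding[y][x]
    return [[1 if v == target else 0 for v in row] for row in embedding]


image = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 1]]
seg = CachedSegmenter(toy_encode, toy_decode)
seg.set_image(image)
mask_a = seg.predict((0, 0))  # first click
mask_b = seg.predict((2, 0))  # second click, no re-encoding
```

After both clicks `seg.encoder_calls` is still 1, which is the property that makes interactive prompting cheap.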

Model
• Prompt encoder (points, box)
‣ Embeds the prompt
‣ The coordinates are converted to a positional encoding
and summed with
a learnable embedding
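
A minimal sketch of the sum "positional encoding + learnable embedding" for point and box prompts. Several things here are assumptions made for illustration: the random sine/cosine positional encoding, the tiny `EMBED_DIM`, and the randomly initialised vectors in `TYPE_EMBED` that stand in for trained parameters. A box is treated as two corner points, each with its own type embedding.

```python
import math
import random

EMBED_DIM = 8  # toy dimension (assumption); real models use far more
_rng = random.Random(0)

# Fixed random frequencies: 2 coordinates -> EMBED_DIM // 2 features.
FREQS = [[_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM // 2)] for _ in range(2)]


def _new_embedding():
    # Stand-in for a learnable parameter vector (randomly initialised, never trained here).
    return [_rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]


TYPE_EMBED = {
    kind: _new_embedding()
    for kind in ("foreground_point", "background_point", "box_top_left", "box_bottom_right")
}


def positional_encoding(x, y, width, height):
    # Normalise pixel coordinates to [-1, 1], project, then take sin and cos.
    u = 2.0 * x / width - 1.0
    v = 2.0 * y / height - 1.0
    proj = [2.0 * math.pi * (u * FREQS[0][k] + v * FREQS[1][k]) for k in range(EMBED_DIM // 2)]
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


def encode_point(x, y, kind, width, height):
    # Prompt embedding = positional encoding of the location + embedding of the prompt type.
    pe = positional_encoding(x, y, width, height)
    return [p + t for p, t in zip(pe, TYPE_EMBED[kind])]


def encode_box(x0, y0, x1, y1, width, height):
    # A box becomes a pair of embeddings, one per corner.
    return [
        encode_point(x0, y0, "box_top_left", width, height),
        encode_point(x1, y1, "box_bottom_right", width, height),
    ]


fg = encode_point(100, 50, "foreground_point", 640, 480)
bg = encode_point(100, 50, "background_point", 640, 480)
box = encode_box(10, 20, 200, 300, 640, 480)
```

The same location yields different embeddings for foreground and background points, and they differ only by the type embedding.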

Model
• Prompt encoder (text)
‣ Embeds the prompt
‣ Uses the CLIP text encoder

Model
• Prompt encoder (mask)
‣ Embeds the prompt
‣ The mask is passed through convolutions and
added to the image embedding
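
A single-channel sketch of "convolve the mask, then add it to the image embedding". The single 2x2 stride-2 convolution, the averaging kernel (in place of learned weights), and the tiny grid sizes are all assumptions chosen to keep the example readable.

```python
def downsample_2x2(mask, kernel, bias=0.0):
    """Apply one 2x2 convolution with stride 2 to a single-channel 2D grid."""
    height, width = len(mask), len(mask[0])
    out = []
    for i in range(0, height - 1, 2):
        row = []
        for j in range(0, width - 1, 2):
            s = bias
            for di in range(2):
                for dj in range(2):
                    s += kernel[di][dj] * mask[i + di][j + dj]
            row.append(s)
        out.append(row)
    return out


def add_mask_prompt(image_embedding, mask, kernel):
    """Downsample the mask to the embedding resolution and add it element-wise."""
    dense = downsample_2x2(mask, kernel)
    if len(dense) != len(image_embedding) or len(dense[0]) != len(image_embedding[0]):
        raise ValueError("mask embedding and image embedding must have the same shape")
    return [[e + d for e, d in zip(e_row, d_row)]
            for e_row, d_row in zip(image_embedding, dense)]


kernel = [[0.25, 0.25], [0.25, 0.25]]  # averaging weights stand in for learned ones
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
image_embedding = [[0.5, -0.5],
                   [0.0, 2.0]]
combined = add_mask_prompt(image_embedding, mask, kernel)
```

Because the mask prompt is spatial, it is combined with the image embedding position by position instead of being turned into a token like a point or a box.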

Model
• Mask decoder
‣ Outputs candidate masks
‣ Implemented as a Transformer decoder
‣ Outputs three candidates
to handle prompt ambiguity
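
One way to work with three candidates is sketched below, under assumptions not stated on the slide: each candidate carries a confidence score (here called `predicted_iou`) used to pick one mask at inference, and during training only the candidate with the lowest loss is penalised. The labels and numbers are made up for illustration.

```python
def select_best_candidate(candidates):
    """Pick the candidate with the highest confidence score.

    candidates: list of dicts with keys "label" and "predicted_iou".
    """
    return max(candidates, key=lambda c: c["predicted_iou"])


def min_loss_over_candidates(losses):
    """Return (index, loss) of the candidate with the smallest loss.

    Penalising only the best candidate lets the others cover alternative
    interpretations of an ambiguous prompt.
    """
    best = min(range(len(losses)), key=lambda i: losses[i])
    return best, losses[best]


# A single click on a shirt could mean the person, the shirt, or a button.
candidates = [
    {"label": "whole (person)", "predicted_iou": 0.81},
    {"label": "part (shirt)", "predicted_iou": 0.93},
    {"label": "subpart (button)", "predicted_iou": 0.64},
]
chosen = select_best_candidate(candidates)
best_index, best_loss = min_loss_over_candidates([0.9, 0.2, 1.4])
```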

Model
• Training
‣ Trained with a combination of
focal loss and dice loss
‣ Prompts are sampled
randomly
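
A plain-Python sketch of combining focal loss and dice loss over flattened pixels. The slide does not give the hyperparameters, so `gamma`, `alpha`, `smooth`, and the 20:1 weighting used as defaults here should be read as assumptions.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; down-weights pixels that are already classified well."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if t == 1 else 1.0 - p            # probability assigned to the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice coefficient; measures overlap between prediction and target."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, focal_weight=20.0, dice_weight=1.0):
    """Weighted sum of the two losses (the weights are assumed defaults)."""
    return focal_weight * focal_loss(probs, targets) + dice_weight * dice_loss(probs, targets)


targets = [1, 1, 0, 0]
good = combined_loss([0.9, 0.8, 0.1, 0.2], targets)
bad = combined_loss([0.2, 0.3, 0.8, 0.7], targets)
```

Focal loss works per pixel, while dice loss looks at overlap over the whole mask, so the two terms complement each other.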

Data
Data Engine
• SAM itself is also used for annotation
‣ Model-in-the-loop
• Annotation proceeds in three stages

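1. Correct the masks predicted by SAM
• SAM is first pretrained
on a separate dataset
• Once a certain amount of data has been collected,
SAM is retrained on it
• Annotation is limited to what can be labeled
within 30 seconds per image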

2. Annotate what SAM did not predict
• Annotate finer parts
• SAM is again retrained
on the newly added data
• 10.2 million masks are obtained up to this point

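3. Annotate with SAM's predictions
• By stage 2 SAM is already quite accurate,
so its predictions can be used
as annotations almost as they are
• Masks with high model confidence are selected,
and duplicates are removed with NMS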
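
The "keep confident masks, then remove duplicates with NMS" step of stage 3 can be sketched as a greedy box-based NMS. The dictionary layout, the use of bounding boxes instead of masks for the overlap test, and the two threshold values are assumptions for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, score_thresh=0.88, iou_thresh=0.7):
    """Drop low-confidence masks, then greedily suppress overlapping duplicates."""
    confident = [d for d in detections if d["score"] >= score_thresh]
    confident.sort(key=lambda d: d["score"], reverse=True)
    kept = []
    for d in confident:
        # Keep a detection only if it does not overlap too much with any higher-scoring kept one.
        if all(box_iou(d["box"], k["box"]) <= iou_thresh for k in kept):
            kept.append(d)
    return kept


detections = [
    {"name": "A", "box": (0, 0, 10, 10), "score": 0.95},
    {"name": "B", "box": (1, 1, 10, 10), "score": 0.90},    # near-duplicate of A
    {"name": "C", "box": (20, 20, 30, 30), "score": 0.92},
    {"name": "D", "box": (40, 40, 50, 50), "score": 0.50},  # low confidence
]
kept_names = [d["name"] for d in filter_and_nms(detections)]
```

In this toy input, D is dropped by the confidence filter and B is suppressed because it overlaps A with IoU 0.81.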

Data
SA-1B
• The final dataset has 1.1 billion masks
on 11 million images
• The number of masks per image is
considerably larger than in existing datasets
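
As a quick check of the scale, the two totals quoted above imply the following average number of masks per image (rounded totals, so the result is approximate):

```python
images = 11_000_000        # 11 million images
masks = 1_100_000_000      # 1.1 billion masks
masks_per_image = masks / images
print(masks_per_image)     # 100.0
```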

• Mask positions are also less biased
• In existing datasets, masks are strongly concentrated near the image center

Experiments
Mask prediction from points
• Zero-shot SAM outperforms existing models on many benchmarks
• Zero-shot: no fine-tuning on each dataset
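
The slide does not name the evaluation metric. Assuming mask quality is scored with intersection over union (IoU) averaged over examples, which is a common choice for this kind of benchmark, the computation looks like this:

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) pairs."""
    return sum(mask_iou(p, g) for p, g in pairs) / len(pairs)


pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # IoU = 1 / 2
    ([[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # IoU = 2 / 2
]
score = mean_iou(pairs)
```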

Experiments
Other zero-shot performance
[Figures: edge prediction, text-to-mask]

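Experiments
Ablation study
• Analysis of how performance changes with the amount of data and the model size
• For the amount of data, performance appears to largely saturate at around 1 million images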

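Summary
• Proposes SAM, a promptable foundation model for segmentation
• The SA-1B dataset, collected with SAM in a model-in-the-loop fashion, is also released
• A demo is also available
https://segment-anything.com/demo
• Prompting looks likely to become a general-purpose approach in vision as well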
