Susang Kim (healess1@gmail.com)
3D Representation
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
(CVPR 2021 Best Paper Award)
Related work & References
NeRF: Neural Radiance Fields (ECCV 2020, Best Paper Honorable Mention)
NeRF represents a scene with a function whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location.
The function is an MLP FΘ : (x, d) → (c, σ), whose weights Θ are optimized to map each input 5D coordinate to its corresponding volume density and directional emitted color.
Positional encoding: γ(·) maps the inputs to a higher-dimensional space so that the MLP can more easily approximate a higher-frequency function. It is applied separately to each of the three coordinate values in x (which are normalized to lie in [−1, 1]) and to the three components of the Cartesian viewing direction unit vector d (which by construction lie in [−1, 1]). In the NeRF experiments, L = 10 frequencies are used for γ(x) and L = 4 for γ(d).
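The encoding can be sketched in NumPy using the form given in the NeRF paper, γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^(L−1)πp), cos(2^(L−1)πp)), applied to every scalar component. This is a minimal sketch: the function name is hypothetical, and real implementations differ in details such as whether the raw input is also concatenated to the output.

```python
import numpy as np

def positional_encoding(p, num_freqs):
    """NeRF-style encoding: each scalar component p_i becomes
    (sin(2^0*pi*p_i), cos(2^0*pi*p_i), ..., sin(2^(L-1)*pi*p_i), cos(2^(L-1)*pi*p_i))."""
    p = np.asarray(p, dtype=np.float64)            # shape (..., D)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi  # shape (L,)
    angles = p[..., None] * freqs                  # shape (..., D, L)
    # pair sin/cos per frequency, then flatten the per-component features
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)  # (..., D, L, 2)
    return enc.reshape(*p.shape[:-1], -1)          # (..., D * L * 2)

x = np.array([[0.1, -0.5, 0.9]])   # a 3D point normalized to [-1, 1]
d = np.array([[0.0, 0.0, 1.0]])    # a unit viewing direction
print(positional_encoding(x, 10).shape)  # (1, 60)
print(positional_encoding(d, 4).shape)   # (1, 24)
```

With L = 10 and L = 4, the three-component inputs grow to 60 and 24 features respectively, which is the "higher-dimensional space" the MLP actually sees.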
GRAF: Generative Radiance Fields (NeurIPS 2020)
A generative model of radiance fields for high-resolution 3D-aware image synthesis from unposed images. It uses a patch-based discriminator that samples the image at multiple scales, which is key to learning high-resolution generative radiance fields efficiently.
(Figure labels: camera matrix, camera pose, 2D sampling pattern; Γ(I, ν) denotes the bilinear sampling operation.)
GRAF generates high-resolution images with better multi-view consistency than voxel-based approaches.
Limitation: it is restricted to simple scenes with single objects. Incorporating inductive biases such as depth maps or symmetry could extend it to more challenging real-world scenes.
(Figure label: the dimensionalities of the latent codes)
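The bilinear sampling operation Γ(I, ν) can be illustrated with a small NumPy sketch. It assumes the sampling pattern ν is a K×K grid described by a patch center and a scale (the pixel spacing between sampled points); that parameterization and both function names are assumptions for illustration, not GRAF's code.

```python
import numpy as np

def sampling_pattern(center, scale, k):
    """Hypothetical 2D sampling pattern nu: a k x k grid of (row, col)
    coordinates spaced `scale` pixels apart around `center`."""
    offsets = (np.arange(k) - k // 2) * scale
    rows = center[0] + offsets
    cols = center[1] + offsets
    return np.meshgrid(rows, cols, indexing="ij")

def bilinear_sample(image, rows, cols):
    """Gamma(I, nu): bilinearly interpolate an (H, W, C) image at real-valued coordinates."""
    h, w = image.shape[:2]
    rows = np.clip(rows, 0, h - 1)
    cols = np.clip(cols, 0, w - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr = (rows - r0)[..., None]  # vertical interpolation weight
    wc = (cols - c0)[..., None]  # horizontal interpolation weight
    top = image[r0, c0] * (1 - wc) + image[r0, c1] * wc
    bottom = image[r1, c0] * (1 - wc) + image[r1, c1] * wc
    return top * (1 - wr) + bottom * wr

# Usage: an image whose value equals its row index, sampled with half-pixel spacing.
img = np.repeat(np.arange(8.0)[:, None, None], 8, axis=1)  # shape (8, 8, 1)
rows, cols = sampling_pattern(center=(4.0, 4.0), scale=0.5, k=2)
patch = bilinear_sample(img, rows, cols)[..., 0]  # first row 3.5, second row 4.0
```

Larger scales give coarse patches covering more of the image, smaller scales give fine detail, which is how a fixed-size patch discriminator can see the image at multiple scales.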
Fréchet Inception Distance (FID) (NeurIPS 2017)
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
For evaluating GANs at image generation, FID captures the similarity of generated images to real ones better and more consistently than the Inception Score.
FID is the Fréchet distance between the Gaussian with mean and covariance (m, C) fitted to features of generated images and the Gaussian (m_w, C_w) fitted to features of real images, where Tr(A) denotes the trace of a square matrix A:
$$ d^2\big((m, C), (m_w, C_w)\big) = \lVert m - m_w \rVert_2^2 + \operatorname{Tr}\big(C + C_w - 2\,(C\,C_w)^{1/2}\big) $$
GIRAFFE reports the FID score to quantify image quality (using 20,000 real and fake samples).
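FID can be computed from two sets of feature vectors as below. This is a NumPy-only sketch: it assumes the features (for example Inception activations) have already been extracted, which is not shown, and it evaluates Tr((C C_w)^{1/2}) through the equivalent symmetric form Tr((C^{1/2} C_w C^{1/2})^{1/2}) instead of a general matrix square root. Function names are hypothetical.

```python
import numpy as np

def _psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(m, c, m_w, c_w):
    """d^2((m, C), (m_w, C_w)) = ||m - m_w||^2 + Tr(C + C_w - 2 (C C_w)^{1/2})."""
    diff = m - m_w
    s = _psd_sqrt(c)
    inner = s @ c_w @ s
    inner = (inner + inner.T) / 2.0  # symmetrise against round-off
    # Tr((C C_w)^{1/2}) equals the sum of sqrt-eigenvalues of C^{1/2} C_w C^{1/2}
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    return float(diff @ diff + np.trace(c) + np.trace(c_w) - 2.0 * tr_sqrt)

def fid_from_features(fake, real):
    """fake, real: (num_samples, feature_dim) arrays of pre-extracted features."""
    return frechet_distance(fake.mean(axis=0), np.cov(fake, rowvar=False),
                            real.mean(axis=0), np.cov(real, rowvar=False))

rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
fake = rng.normal(loc=0.5, size=(2000, 8))
print(fid_from_features(fake, real))  # roughly 8 * 0.5**2 = 2, plus sampling noise
```

Identical feature sets give a distance of zero, and a pure mean shift with unchanged covariance contributes exactly the squared norm of the shift.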
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
(CVPR 2021 Best Paper Award)
Abstract
While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Most approaches also do not consider the compositional nature of scenes: a key limitation of NeRF and GRAF is that the entire scene is represented by a single model.
Incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. GIRAFFE is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.
Image Generation
(Figure panels: disentangled representations (without changing) / disentangled representations (changing))
Definitions of disentanglement vary, but commonly refer to being able to control an attribute of
interest, e.g. object shape, size, or pose, without changing other attributes.
Overview
GIRAFFE is a novel method for generating scenes in a controllable and photorealistic manner while training from raw unstructured image collections, by incorporating a compositional 3D scene representation into the generative model.
A neural renderer processes the rendered feature images and outputs the final renderings, enabling controllable image synthesis.
GIRAFFE generates high-quality images and scales to real-world scenes.
Method
Object Representation: to disentangle the different entities in the scene, each object is represented by a separate feature field in combination with an affine transformation that includes a rotation matrix R ∈ SO(3). The feature field itself is defined in a canonical object space.
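The object transformation can be sketched as follows. The slide names only the rotation R ∈ SO(3) and the canonical object space, so this sketch assumes the affine transformation also has a per-axis scale s and a translation t, mapping object to scene space as k(x) = R·diag(s)·x + t; scene points are pulled back into canonical object space with k⁻¹ before the object's feature field is queried. Function names are hypothetical.

```python
import numpy as np

def rotation_z(angle):
    """Rotation about the z-axis, an element of SO(3)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def object_to_scene(x, scale, translation, rot):
    """k(x) = R diag(s) x + t for row-vector points x of shape (N, 3)."""
    return (x * scale) @ rot.T + translation

def scene_to_object(x, scale, translation, rot):
    """k^{-1}(x) = diag(s)^{-1} R^T (x - t): pulls scene points into canonical object space."""
    return ((x - translation) @ rot) / scale

R = rotation_z(np.pi / 2)
s = np.array([2.0, 1.0, 1.0])
t = np.array([0.0, 0.0, 0.5])
p = np.array([[1.0, 0.0, 0.0]])   # a point in canonical object space
q = object_to_scene(p, s, t, R)    # approximately [[0, 2, 0.5]]
```

Because R is orthogonal, its inverse is simply its transpose, so moving or rotating an object only changes (s, t, R) while the feature field in canonical space stays untouched.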
Neural Radiance Fields: the low-dimensional inputs x and d need to be mapped to higher-dimensional features so that f can represent complex signals when it is parameterized with an MLP. This is done with a positional encoding γ(t, L) applied to each scalar t (a component of x or d), where L is the number of frequency octaves. The MLP maps the encoded 3D point x and viewing direction d, whose sizes are the output dimensionalities of the positional encodings, to a volume density σ and an RGB color value c.
Generative Neural Feature Fields: to learn a latent space of NeRFs, the MLP is conditioned on shape and appearance codes, and the 3D color output c is replaced by a higher-dimensional feature vector f.
GIRAFFE
(Figure labels: objects, background, shape and appearance codes)
GIRAFFE generates scenes in a controllable and photorealistic manner while training from raw unstructured image collections. Orange indicates learnable and blue non-learnable operations.
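The figure combines the object and background feature fields into a single scene. A minimal sketch, assuming a density-weighted composition operator: the total density at a point is the sum of the entity densities, and the feature is the density-weighted mean of the entity features. The function name and array layout are hypothetical.

```python
import numpy as np

def compose(sigmas, feats, eps=1e-8):
    """Combine N entity feature fields (objects + background) at the same sample points.

    sigmas: (N, S) densities, feats: (N, S, M) features for S sample points.
    Returns the summed density (S,) and the density-weighted mean feature (S, M).
    """
    sigma = sigmas.sum(axis=0)
    weighted = (sigmas[..., None] * feats).sum(axis=0)
    # where no entity has density, weighted is 0 and the feature becomes 0
    return sigma, weighted / np.maximum(sigma, eps)[..., None]

sigmas = np.array([[1.0, 0.0], [3.0, 0.0]])                # 2 entities, 2 sample points
feats = np.stack([np.ones((2, 4)), 5.0 * np.ones((2, 4))])  # (2, 2, 4)
sigma, f = compose(sigmas, feats)
# sigma is [4, 0]; f[0] is all 4.0 = (1*1 + 3*5) / 4; f[1] is all 0.0
```

Weighting by density means an entity only influences the features where it is actually present, which is what lets objects be added, removed, or moved independently.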
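Volume rendering: given a pixel, the features f_j and densities σ_j at N_s sample points x_j along its camera ray are composited as
$$ f = \sum_{j=1}^{N_s} \tau_j \alpha_j f_j, \qquad \tau_j = \prod_{k=1}^{j-1} (1 - \alpha_k), \qquad \alpha_j = 1 - e^{-\sigma_j \delta_j} $$
where τ_j is the transmittance, α_j the alpha value, and δ_j = ‖x_{j+1} − x_j‖₂ the distance between neighboring sample points.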
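Volume rendering along one ray can be sketched as below, assuming the standard NeRF-style numerical quadrature: transmittance τ_j = ∏_{k<j}(1 − α_k), alpha α_j = 1 − exp(−σ_j δ_j), and δ_j the distance between neighboring sample points. Treating the last interval as very long is a common convention, not something stated on the slide; the function name is hypothetical.

```python
import numpy as np

def volume_render(sigma, feats, points):
    """Composite per-sample features along one ray.

    sigma: (N,) densities, feats: (N, M) features, points: (N, 3) sample positions ordered near to far.
    """
    # delta_j = ||x_{j+1} - x_j||; the last interval is treated as very long
    deltas = np.append(np.linalg.norm(points[1:] - points[:-1], axis=-1), 1e10)
    alpha = 1.0 - np.exp(-sigma * deltas)
    # transmittance tau_j = prod_{k<j} (1 - alpha_k), with tau_1 = 1
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return weights @ feats

points = np.stack([np.zeros(4), np.zeros(4), np.linspace(0.5, 2.0, 4)], axis=-1)  # 4 samples on the z-axis
sigma = np.array([0.0, 50.0, 0.0, 0.0])   # an opaque surface at the second sample
feats = np.arange(8.0).reshape(4, 2)      # 2-dim features per sample
pixel_feature = volume_render(sigma, feats, points)  # close to feats[1] = [2, 3]
```

In GIRAFFE the composited value is a feature vector rather than an RGB color, so the same weights produce a feature image that the 2D neural renderer then turns into RGB.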
Feature Fields Architecture: the feature field takes a 3D point x and a viewing direction d together with the latent shape and appearance codes z_s, z_a. The positional encoding γ is applied to the viewing direction d, and γ(d) is concatenated to the latent appearance code z_a. The network consists of fully-connected layers (yellow in the figure) with ReLU activations (red).
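The data flow through a feature field can be sketched as a tiny NumPy forward pass with random weights. Only the concatenation of γ(d) with z_a comes from the slide; the layer count, all sizes, concatenating γ(x) with z_s, and the ReLU on the density head are assumptions for illustration, and training is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: octaves for x and d, code sizes, hidden width, output feature size.
LX, LD, ZS, ZA, HID, MF = 10, 4, 64, 64, 128, 16

def encode(p, num_freqs):
    """Sin/cos positional encoding of an (N, 3) array -> (N, 3 * 2 * num_freqs)."""
    angles = p[..., None] * (2.0 ** np.arange(num_freqs)) * np.pi
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).reshape(len(p), -1)

def dense(in_dim, out_dim):
    """Randomly initialised fully-connected layer as (weights, bias)."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def linear(layer, v):
    w, b = layer
    return v @ w + b

def relu(v):
    return np.maximum(v, 0.0)

LAYER_1 = dense(6 * LX + ZS, HID)        # assumption: gamma(x) concatenated with z_s
LAYER_2 = dense(HID, HID)
SIGMA_HEAD = dense(HID, 1)
LAYER_3 = dense(HID + 6 * LD + ZA, HID)  # gamma(d) concatenated with z_a, as on the slide
FEAT_HEAD = dense(HID, MF)

def feature_field(x, d, z_s, z_a):
    h = relu(linear(LAYER_1, np.concatenate([encode(x, LX), z_s], axis=-1)))
    h = relu(linear(LAYER_2, h))
    sigma = relu(linear(SIGMA_HEAD, h))[:, 0]  # assumption: ReLU keeps density non-negative
    g = relu(linear(LAYER_3, np.concatenate([h, encode(d, LD), z_a], axis=-1)))
    return sigma, linear(FEAT_HEAD, g)

n = 5
x = rng.uniform(-1.0, 1.0, size=(n, 3))
d = rng.normal(size=(n, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
z_s = np.tile(rng.normal(size=ZS), (n, 1))  # one shape code shared by all points
z_a = np.tile(rng.normal(size=ZA), (n, 1))  # one appearance code shared by all points
sigma, f = feature_field(x, d, z_s, z_a)
print(sigma.shape, f.shape)  # (5,) (5, 16)
```

Injecting the viewing direction and appearance code only after the density head keeps σ independent of d and z_a, so geometry is controlled by z_s while appearance can change separately.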
2D Neural Rendering
Related architecture: Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2, CVPR 2020)
The neural rendering operator maps the feature image to an RGB image at every spatial resolution and adds the previous output to the next via bilinear upsampling. These skip connections ensure a strong gradient flow to the feature fields. The final image prediction Î is obtained by applying a sigmoid activation to the last RGB layer.
Gray color indicates outputs, orange learnable,
and blue non-learnable operations.
(Figure labels: N blocks, final stage, train GAN)
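The RGB skip path of the neural rendering operator can be sketched as follows. As a simplification, the N feature-upsampling blocks are omitted and feature maps at each resolution are supplied directly; the per-pixel linear projection to RGB (a 1×1 convolution), the channel count, and the function names are assumptions for illustration.

```python
import numpy as np

def upsample_bilinear_2x(img):
    """Bilinear 2x upsampling of an (H, W, C) array, interpolating one axis at a time."""
    def resize_axis(a, axis):
        n = a.shape[axis]
        # source coordinate of each output pixel centre, clamped at the borders
        src = np.clip((np.arange(2 * n) + 0.5) / 2.0 - 0.5, 0, n - 1)
        return np.apply_along_axis(lambda v: np.interp(src, np.arange(n), v), axis, a)
    return resize_axis(resize_axis(img, 0), 1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def render(feature_maps, rgb_weights):
    """feature_maps[i]: (H_i, W_i, M) array, each twice the resolution of the previous one.
    rgb_weights[i]: (M, 3) hypothetical per-pixel projection to RGB."""
    rgb = feature_maps[0] @ rgb_weights[0]
    for feat, w in zip(feature_maps[1:], rgb_weights[1:]):
        # skip connection: upsample the previous RGB output and add this resolution's RGB
        rgb = upsample_bilinear_2x(rgb) + feat @ w
    return sigmoid(rgb)  # final image prediction I_hat in (0, 1)

rng = np.random.default_rng(0)
feats = [rng.normal(size=(16 * 2**i, 16 * 2**i, 8)) for i in range(3)]  # 16^2, 32^2, 64^2
weights = [rng.normal(scale=0.1, size=(8, 3)) for _ in range(3)]
image = render(feats, weights)
print(image.shape)  # (64, 64, 3)
```

Every resolution writes directly into the running RGB image, so gradients from the discriminator reach the low-resolution features without passing through the whole upsampling stack.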
Datasets
PhotoShape: Photorealistic Materials for Large-Scale Shape Collections (ACM 2018)
Automatically assigns high-quality, realistic appearance models to large-scale 3D shape collections.
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (CVPR 2017)
A diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations (3D shapes: color, material, rotation, size).
To render multi-object scenes of random primitives, the camera position is adjusted to have a rotation of 0° instead of 43°. Renderings and positions of the placed primitives are saved to files, and during training the translations of the object feature fields are sampled from the saved positions. Controllable Image Synthesis (CIS) and GIRAFFE are compared on scenes with 0, 1, 2, or 3 primitives (Clevr-0123) at 64² pixels.
https://knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=clevr
Datasets (from the supplementary document)
Dataset Parameters: object rotation, background rotation, camera elevation, horizontal and depth translation, and object size are sampled from uniform distributions over the indicated ranges. For the Clevr datasets, object locations are sampled from the distribution obtained during dataset generation.
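Sampling scene parameters from these uniform priors can be sketched as below. The numeric ranges are placeholders only (the actual per-dataset ranges are in the supplementary document and are not reproduced here), and `sample_scene` and `saved_positions` are hypothetical names; the Clevr branch draws translations from positions stored during dataset generation.

```python
import numpy as np

# Placeholder ranges for illustration only; NOT the values from the supplementary document.
RANGES = {
    "object_rotation": (0.0, 2.0 * np.pi),
    "background_rotation": (0.0, 2.0 * np.pi),
    "camera_elevation": (0.0, 0.25 * np.pi),
    "translation_horizontal": (-0.5, 0.5),
    "translation_depth": (-0.5, 0.5),
    "object_size": (0.8, 1.0),
}

def sample_scene(rng, n_objects, ranges=RANGES, saved_positions=None):
    """Draw one set of scene parameters from independent uniform priors.

    If `saved_positions` (a hypothetical (P, 2) array, Clevr case) is given,
    object translations are drawn from it instead of the uniform ranges.
    """
    def u(key):
        return rng.uniform(*ranges[key])

    objects = []
    for _ in range(n_objects):
        if saved_positions is None:
            translation = np.array([u("translation_horizontal"), u("translation_depth")])
        else:
            translation = saved_positions[rng.integers(len(saved_positions))]
        objects.append({"rotation": u("object_rotation"),
                        "translation": translation,
                        "size": u("object_size")})
    return {"camera_elevation": u("camera_elevation"),
            "background_rotation": u("background_rotation"),
            "objects": objects}

rng = np.random.default_rng(0)
scene = sample_scene(rng, n_objects=2)
clevr_scene = sample_scene(rng, n_objects=3,
                           saved_positions=np.array([[0.1, 0.2], [-0.3, 0.0]]))
```

Independent uniform priors are the "simple uniform priors" that the limitations section proposes replacing with distributions learned from data.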
Image Center Cropping: center crop (CelebA, CelebA-HQ)
Image Random Cropping: rescale and random crop (CompCars)
Data Augmentation: for all experiments, images are randomly flipped horizontally during training.
Clevr Dataset Generation: the script that renders multi-object scenes of random primitives is adjusted so that the camera position has a rotation of 0° instead of 43°.
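The preprocessing steps can be sketched in NumPy as follows. The crop sizes in the usage lines and the nearest-neighbour resize (a stand-in for a proper resize from an image library) are assumptions; the slide names only the steps, not their parameters.

```python
import numpy as np

def center_crop(img, size):
    """Square center crop of an (H, W, C) image (CelebA / CelebA-HQ step)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def rescale_nearest(img, new_h, new_w):
    """Nearest-neighbour resize; a stand-in for a library resize."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def random_crop(img, size, rng):
    """Square crop at a uniformly random position (CompCars step, after rescaling)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_hflip(img, rng, p=0.5):
    """Horizontal flip with probability p (applied in all experiments during training)."""
    return img[:, ::-1] if rng.random() < p else img

rng = np.random.default_rng(0)
img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
face = center_crop(img, 4)
car = random_crop(rescale_nearest(img, 12, 16), 8, rng)
flipped = random_hflip(face, rng, p=1.0)
```

A horizontal flip is a safe augmentation here because the scenes are roughly left-right symmetric, so it adds variety without changing what a plausible image looks like.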
Experiments
GIRAFFE correctly disentangles individual objects when trained on multi-object scenes with a fixed or varying number of objects.
Conclusion & Limitations
Conclusion
A novel method for controllable image synthesis.
Incorporates a compositional 3D scene representation into the generative model
Disentangles individual objects from the background, as well as their shape and appearance, without explicit supervision
Limitations (Dataset Bias & Object Transformation Distributions)
Investigate how the distributions over object-level transformations and camera poses can be learned from data (the current model assumes simple uniform priors).
Incorporate supervision that is easy to obtain (e.g., predicted object masks) to scale to more complex, multi-object scenes.
(Figure: dataset bias, e.g. entangled eye and hair attributes)
Thanks
Any Questions?
You can send mail to
Susang Kim (healess1@gmail.com)

Contenu connexe

Similaire à [Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUESA STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUEScscpconf
 
3 d graphics with opengl part 2
3 d graphics with opengl  part 23 d graphics with opengl  part 2
3 d graphics with opengl part 2Sardar Alam
 
CS 354 Pixel Updating
CS 354 Pixel UpdatingCS 354 Pixel Updating
CS 354 Pixel UpdatingMark Kilgard
 
Comparison of image segmentation
Comparison of image segmentationComparison of image segmentation
Comparison of image segmentationHaitham Ahmed
 
Presentation on Face Recognition Based on 3D Shape Estimation
Presentation on Face Recognition Based on 3D Shape EstimationPresentation on Face Recognition Based on 3D Shape Estimation
Presentation on Face Recognition Based on 3D Shape EstimationRapidAcademy
 
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLING
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLINGAPPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLING
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLINGsipij
 
GEOMETRIC TAMPERING ESTIMATION BY MEANS OF A SIFT-BASED FORENSIC ANALYSIS
GEOMETRIC TAMPERING ESTIMATION  BY MEANS OF A SIFT-BASED FORENSIC ANALYSISGEOMETRIC TAMPERING ESTIMATION  BY MEANS OF A SIFT-BASED FORENSIC ANALYSIS
GEOMETRIC TAMPERING ESTIMATION BY MEANS OF A SIFT-BASED FORENSIC ANALYSISICL - Image Communication Laboratory
 
Scattered gis handbook
Scattered gis handbookScattered gis handbook
Scattered gis handbookWaleed Liaqat
 
Different Image Segmentation Techniques for Dental Image Extraction
Different Image Segmentation Techniques for Dental Image ExtractionDifferent Image Segmentation Techniques for Dental Image Extraction
Different Image Segmentation Techniques for Dental Image ExtractionIJERA Editor
 
Neural Radiance Fields & Neural Rendering.pdf
Neural Radiance Fields & Neural Rendering.pdfNeural Radiance Fields & Neural Rendering.pdf
Neural Radiance Fields & Neural Rendering.pdfNavneetPaul2
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsMLAI2
 
Dense Visual Odometry Using Genetic Algorithm
Dense Visual Odometry Using Genetic AlgorithmDense Visual Odometry Using Genetic Algorithm
Dense Visual Odometry Using Genetic AlgorithmSlimane Djema
 
Laplacian-regularized Graph Bandits
Laplacian-regularized Graph BanditsLaplacian-regularized Graph Bandits
Laplacian-regularized Graph Banditslauratoni4
 
Learning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RLLearning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RLlauratoni4
 
Data-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial AdaptationData-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial AdaptationCSCJournals
 

Similaire à [Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (20)

TransNeRF
TransNeRFTransNeRF
TransNeRF
 
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUESA STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
 
3 d graphics with opengl part 2
3 d graphics with opengl  part 23 d graphics with opengl  part 2
3 d graphics with opengl part 2
 
CS 354 Pixel Updating
CS 354 Pixel UpdatingCS 354 Pixel Updating
CS 354 Pixel Updating
 
Comparison of image segmentation
Comparison of image segmentationComparison of image segmentation
Comparison of image segmentation
 
EIS_REVIEW_1.pptx
EIS_REVIEW_1.pptxEIS_REVIEW_1.pptx
EIS_REVIEW_1.pptx
 
Presentation on Face Recognition Based on 3D Shape Estimation
Presentation on Face Recognition Based on 3D Shape EstimationPresentation on Face Recognition Based on 3D Shape Estimation
Presentation on Face Recognition Based on 3D Shape Estimation
 
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLING
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLINGAPPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLING
APPLYING R-SPATIOGRAM IN OBJECT TRACKING FOR OCCLUSION HANDLING
 
regions
regionsregions
regions
 
GEOMETRIC TAMPERING ESTIMATION BY MEANS OF A SIFT-BASED FORENSIC ANALYSIS
GEOMETRIC TAMPERING ESTIMATION  BY MEANS OF A SIFT-BASED FORENSIC ANALYSISGEOMETRIC TAMPERING ESTIMATION  BY MEANS OF A SIFT-BASED FORENSIC ANALYSIS
GEOMETRIC TAMPERING ESTIMATION BY MEANS OF A SIFT-BASED FORENSIC ANALYSIS
 
Scattered gis handbook
Scattered gis handbookScattered gis handbook
Scattered gis handbook
 
Different Image Segmentation Techniques for Dental Image Extraction
Different Image Segmentation Techniques for Dental Image ExtractionDifferent Image Segmentation Techniques for Dental Image Extraction
Different Image Segmentation Techniques for Dental Image Extraction
 
Neural Radiance Fields & Neural Rendering.pdf
Neural Radiance Fields & Neural Rendering.pdfNeural Radiance Fields & Neural Rendering.pdf
Neural Radiance Fields & Neural Rendering.pdf
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with Hypergraphs
 
Dense Visual Odometry Using Genetic Algorithm
Dense Visual Odometry Using Genetic AlgorithmDense Visual Odometry Using Genetic Algorithm
Dense Visual Odometry Using Genetic Algorithm
 
Laplacian-regularized Graph Bandits
Laplacian-regularized Graph BanditsLaplacian-regularized Graph Bandits
Laplacian-regularized Graph Bandits
 
Learning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RLLearning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RL
 
ICRA Nathan Piasco
ICRA Nathan PiascoICRA Nathan Piasco
ICRA Nathan Piasco
 
IJET-V2I6P17
IJET-V2I6P17IJET-V2I6P17
IJET-V2I6P17
 
Data-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial AdaptationData-Driven Motion Estimation With Spatial Adaptation
Data-Driven Motion Estimation With Spatial Adaptation
 

Plus de Susang Kim

[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)Susang Kim
 
[Paper] dynamic routing between capsules
[Paper] dynamic routing between capsules[Paper] dynamic routing between capsules
[Paper] dynamic routing between capsulesSusang Kim
 
[Paper] anti spoofing for face recognition
[Paper] anti spoofing for face recognition[Paper] anti spoofing for face recognition
[Paper] anti spoofing for face recognitionSusang Kim
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)Susang Kim
 
[Paper] shuffle net an extremely efficient convolutional neural network for ...
[Paper] shuffle net  an extremely efficient convolutional neural network for ...[Paper] shuffle net  an extremely efficient convolutional neural network for ...
[Paper] shuffle net an extremely efficient convolutional neural network for ...Susang Kim
 
[Paper] EDA : easy data augmentation techniques for boosting performance on t...
[Paper] EDA : easy data augmentation techniques for boosting performance on t...[Paper] EDA : easy data augmentation techniques for boosting performance on t...
[Paper] EDA : easy data augmentation techniques for boosting performance on t...Susang Kim
 
[Paper] auto ml part 1
[Paper] auto ml part 1[Paper] auto ml part 1
[Paper] auto ml part 1Susang Kim
 
[Paper] eXplainable ai(xai) in computer vision
[Paper] eXplainable ai(xai) in computer vision[Paper] eXplainable ai(xai) in computer vision
[Paper] eXplainable ai(xai) in computer visionSusang Kim
 
[Paper] learning video representations from correspondence proposals
[Paper]  learning video representations from correspondence proposals[Paper]  learning video representations from correspondence proposals
[Paper] learning video representations from correspondence proposalsSusang Kim
 
[Paper] DetectoRS for Object Detection
[Paper] DetectoRS for Object Detection[Paper] DetectoRS for Object Detection
[Paper] DetectoRS for Object DetectionSusang Kim
 
Long term feature banks for detailed video understanding (Action Recognition)
Long term feature banks for detailed video understanding (Action Recognition)Long term feature banks for detailed video understanding (Action Recognition)
Long term feature banks for detailed video understanding (Action Recognition)Susang Kim
 
I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)Susang Kim
 
GroupFace (Face Recognition)
GroupFace (Face Recognition)GroupFace (Face Recognition)
GroupFace (Face Recognition)Susang Kim
 
제11회공개sw개발자대회 금상 TensorMSA(소개)
제11회공개sw개발자대회 금상 TensorMSA(소개)제11회공개sw개발자대회 금상 TensorMSA(소개)
제11회공개sw개발자대회 금상 TensorMSA(소개)Susang Kim
 
Sk t academy lecture note
Sk t academy lecture noteSk t academy lecture note
Sk t academy lecture noteSusang Kim
 
Python과 Tensorflow를 활용한 AI Chatbot 개발 및 실무 적용
Python과 Tensorflow를 활용한  AI Chatbot 개발 및 실무 적용Python과 Tensorflow를 활용한  AI Chatbot 개발 및 실무 적용
Python과 Tensorflow를 활용한 AI Chatbot 개발 및 실무 적용Susang Kim
 

Plus de Susang Kim (16)

[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)
 
[Paper] dynamic routing between capsules
[Paper] dynamic routing between capsules[Paper] dynamic routing between capsules
[Paper] dynamic routing between capsules
 
[Paper] anti spoofing for face recognition
[Paper] anti spoofing for face recognition[Paper] anti spoofing for face recognition
[Paper] anti spoofing for face recognition
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)
 
[Paper] shuffle net an extremely efficient convolutional neural network for ...
[Paper] shuffle net  an extremely efficient convolutional neural network for ...[Paper] shuffle net  an extremely efficient convolutional neural network for ...
[Paper] shuffle net an extremely efficient convolutional neural network for ...
 
[Paper] EDA : easy data augmentation techniques for boosting performance on t...
[Paper] EDA : easy data augmentation techniques for boosting performance on t...[Paper] EDA : easy data augmentation techniques for boosting performance on t...
[Paper] EDA : easy data augmentation techniques for boosting performance on t...
 
[Paper] auto ml part 1
[Paper] auto ml part 1[Paper] auto ml part 1
[Paper] auto ml part 1
 
[Paper] eXplainable ai(xai) in computer vision
[Paper] eXplainable ai(xai) in computer vision[Paper] eXplainable ai(xai) in computer vision
[Paper] eXplainable ai(xai) in computer vision
 
[Paper] learning video representations from correspondence proposals
[Paper]  learning video representations from correspondence proposals[Paper]  learning video representations from correspondence proposals
[Paper] learning video representations from correspondence proposals
 
[Paper] DetectoRS for Object Detection
[Paper] DetectoRS for Object Detection[Paper] DetectoRS for Object Detection
[Paper] DetectoRS for Object Detection
 
Long term feature banks for detailed video understanding (Action Recognition)
Long term feature banks for detailed video understanding (Action Recognition)Long term feature banks for detailed video understanding (Action Recognition)
Long term feature banks for detailed video understanding (Action Recognition)
 
I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)I3D and Kinetics datasets (Action Recognition)
I3D and Kinetics datasets (Action Recognition)
 
GroupFace (Face Recognition)
GroupFace (Face Recognition)GroupFace (Face Recognition)
GroupFace (Face Recognition)
 
제11회공개sw개발자대회 금상 TensorMSA(소개)
제11회공개sw개발자대회 금상 TensorMSA(소개)제11회공개sw개발자대회 금상 TensorMSA(소개)
제11회공개sw개발자대회 금상 TensorMSA(소개)
 
Sk t academy lecture note
Sk t academy lecture noteSk t academy lecture note
Sk t academy lecture note
 
Python과 Tensorflow를 활용한 AI Chatbot 개발 및 실무 적용
Python과 Tensorflow를 활용한  AI Chatbot 개발 및 실무 적용Python과 Tensorflow를 활용한  AI Chatbot 개발 및 실무 적용
Python과 Tensorflow를 활용한 AI Chatbot 개발 및 실무 적용
 

Dernier

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 

Dernier (20)

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 

[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

  • 1. Susang Kim(healess1@gmail.com) 3D Representation GIRAFFE : Representing Scenes as Compositional Generative Neural Feature Fields (CVPR 2021 Best Paper Award)
  • 2. Related work & References
  • 3. NeRF : Neural Radiance Fields (ECCV 2020 - Best Paper Honorable Mention) Input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location FΘ : (x, d) → (c, σ) and optimize its weights Θ to map from each input 5D coordinate to its corresponding volume density and directional emitted color Positional encoding : γ(·) is applied separately to each of the three coordinate values in x (which are normalized to lie in [−1, 1]) and to the three components of the Cartesian viewing direction unit vector d (which by construction lie in [−1, 1]). In our experiments, we set L = 10 for γ(x) and L = 4 for γ(d). higher dimensional space to enable our MLP to more easily approximate a higher frequency function
  • 4. GRAF: Generative Radiance Fields (NeurIPS 2020) A generative model for radiance fields for high-resolution 3D-aware image synthesis from unposed images. A patch-based discriminator that samples the image at multiple scales and which is key to learn high-resolution generative radiance fields efficiently camera matrix camera pose 2D sampling pattern Γ(I, ν) to denote this bilinear sampling operation Generate high resolution images with better multi-view consistency compared to voxel-based approaches. Limitation is simple scenes with single objects. (inductive bias : depth maps, symmetry and real-world scenes) the dimensionalities of the latent codes
  • 5. Frechet Inception Distance (FID) (NeurlPS 2017) GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium For the evaluation of the performance of GANs at image generation, FID captures the similarity of generated images to real ones better and more consistent than the Inception Score. FID score(real(m,c), fake(m,c)) : the Gaussian with mean and covariance (m, C) the trace of a square matrix A, denoted Tr(A) Giraffe report FID score to quantify image quality. (20,000 real and fake samples)
  • 6. GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (CVPR 2021 Best Paper Award)
  • 7. Abstract Most existing approaches do not consider the compositional nature of scenes, which are 3D rather than 2D. Methods that learn to disentangle the underlying factors of variation in the data mostly operate in 2D and hence ignore that our world is three-dimensional. A key limitation of NeRF and GRAF is that the entire scene is represented by a single model. Incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis: GIRAFFE disentangles individual objects and allows translating and rotating them in the scene as well as changing the camera pose.
  • 8. Image Generation The figure contrasts disentangled representations in which one attribute is changed with those in which it is left unchanged. Definitions of disentanglement vary, but the term commonly refers to being able to control an attribute of interest, e.g. object shape, size, or pose, without changing other attributes.
  • 9. Overview GIRAFFE is a novel method for generating scenes in a controllable and photorealistic manner while training from raw, unstructured image collections. It incorporates a compositional 3D scene representation into the generative model: the scene is rendered into feature images, and a neural renderer processes these feature images and outputs the final renderings, which enables controllable image synthesis. GIRAFFE achieves high-quality images and scales to real-world scenes.
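  • 10. Method Object representation: to disentangle the different entities in the scene, GIRAFFE represents each object with a separate feature field in combination with an affine transformation T = {s, t, R}, where s is a scale, t a translation, and R ∈ SO(3) a rotation matrix. A point in canonical object space is mapped to the scene by $k(\mathbf{x}) = R \cdot \mathrm{diag}(s_1, s_2, s_3) \cdot \mathbf{x} + \mathbf{t}$, and the feature field is evaluated in canonical object space at $k^{-1}(\mathbf{x})$. Neural radiance fields: the low-dimensional inputs x and d need to be mapped to higher-dimensional features to represent complex signals when f is parameterized with an MLP; this is done with a positional encoding γ(t, L) that applies sines and cosines at L frequency octaves to each scalar component t of x or d. L_x and L_d denote the output dimensionalities of the positional encodings of the point and the viewing direction, and the radiance field outputs a volume density σ and an RGB color value c. Generative neural feature fields: to learn a latent space of NeRFs, the MLP is conditioned on shape and appearance codes z_s and z_a, and the 3D color output c is replaced by a more general M_f-dimensional feature f: $h_\theta : \mathbb{R}^{L_x} \times \mathbb{R}^{L_d} \times \mathbb{R}^{M_s} \times \mathbb{R}^{M_a} \to \mathbb{R}^{+} \times \mathbb{R}^{M_f}$, $(\gamma(\mathbf{x}), \gamma(\mathbf{d}), \mathbf{z}_s, \mathbf{z}_a) \mapsto (\sigma, \mathbf{f})$.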
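The object transformation k(x) = R · diag(s) · x + t and its inverse k⁻¹(x) = diag(1/s) · Rᵀ · (x − t) can be sketched as follows. The choice of a rotation about the z axis is only one hypothetical element of SO(3), and points are stored as rows of an (N, 3) array, which is why the matrix products appear transposed.

```python
import numpy as np


def rotation_z(angle):
    """Rotation about the z axis (one hypothetical choice of R in SO(3))."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_scene(x, R, scale, t):
    """k(x) = R · diag(s) · x + t for row-vector points x of shape (N, 3)."""
    return (x * scale) @ R.T + t


def to_object(x, R, scale, t):
    """k^{-1}(x) = diag(1/s) · R^T · (x - t): scene space -> canonical object space."""
    return ((x - t) @ R) / scale
```

A feature field would then be queried at `to_object(x, R, scale, t)`, so the same canonical field can be placed, rotated, and resized in the scene without retraining.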
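  • 11. GIRAFFE The overview figure shows the scene composed of feature fields for the objects and the background, each with its own shape and appearance codes. GIRAFFE generates scenes in a controllable and photorealistic manner while training from raw, unstructured image collections. Orange indicates learnable and blue non-learnable operations. Given a pixel, the feature fields are evaluated at sample points along its camera ray and volume-rendered as $\mathbf{f} = \sum_{j=1}^{N_s} \tau_j \alpha_j \mathbf{f}_j$, with transmittance $\tau_j = \prod_{k=1}^{j-1} (1 - \alpha_k)$, alpha value $\alpha_j = 1 - e^{-\sigma_j \delta_j}$, and $\delta_j$ the distance between neighboring sample points.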
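Two operations turn the per-entity fields into a pixel feature. GIRAFFE combines N fields at a shared sample point with a density-weighted composition (σ = Σᵢ σᵢ, f = (1/σ) Σᵢ σᵢ fᵢ), and then volume-renders the composed values along the ray with the τ_j, α_j, δ_j defined above. The NumPy sketch below handles one ray; the array layouts and the small `eps` guard against division by zero are assumptions of this illustration.

```python
import numpy as np


def compose(sigmas, feats, eps=1e-8):
    """Density-weighted composition of N fields at S shared sample points.

    sigmas: (N, S) densities, feats: (N, S, M_f) features.
    Returns sigma = sum_i sigma_i and f = (1/sigma) * sum_i sigma_i * f_i.
    """
    sigma = sigmas.sum(axis=0)                                              # (S,)
    f = (sigmas[..., None] * feats).sum(axis=0) / (sigma[:, None] + eps)   # (S, M_f)
    return sigma, f


def volume_render(sigma, feats, deltas):
    """f = sum_j tau_j * alpha_j * f_j along one ray with S samples."""
    alpha = 1.0 - np.exp(-sigma * deltas)                                   # (S,)
    # tau_j = prod_{k<j} (1 - alpha_k), with tau_1 = 1.
    tau = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return ((tau * alpha)[:, None] * feats).sum(axis=0)                     # (M_f,)
```

If the first sample is fully opaque, the pixel feature equals that sample's feature and everything behind it is occluded; where two fields overlap with equal density, their features are averaged.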
  • 12. Feature Fields Architecture A 3D point x and a viewing direction d enter the network together with the latent shape and appearance codes z_s and z_a. The network consists of fully-connected layers (yellow) with ReLU activations (red); the positional encoding γ is applied to the viewing direction d, and γ(d) is concatenated with the latent appearance code z_a.
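A forward pass through such a feature field can be sketched with plain matrix products. Everything concrete here is an assumption for illustration, not GIRAFFE's published configuration: z_s is concatenated with γ(x) at the input, the depth and widths are arbitrary, the weights are random rather than learned, and a ReLU is applied to the density head to keep σ non-negative.

```python
import numpy as np


def encode(p, num_freqs):
    """NeRF-style positional encoding of each component of p (shape (N, 3))."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    ang = p[..., :, None] * freqs                                        # (N, 3, L)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1).reshape(p.shape[0], -1)


def dense(rng, n_in, n_out):
    # Hypothetical random initialization; a trained model would use learned weights.
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)


def relu(a):
    return np.maximum(a, 0.0)


class FeatureField:
    """Sketch of h_theta: (gamma(x), gamma(d), z_s, z_a) -> (sigma, f); all sizes are assumptions."""

    def __init__(self, l_x=10, l_d=4, m_s=64, m_a=64, hidden=128, m_f=128, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.l_x, self.l_d = l_x, l_d
        self.trunk = [dense(rng, 3 * 2 * l_x + m_s, hidden)]              # input: gamma(x) + z_s
        self.trunk += [dense(rng, hidden, hidden) for _ in range(depth - 1)]
        self.sigma_head = dense(rng, hidden, 1)
        self.view_layer = dense(rng, hidden + 3 * 2 * l_d + m_a, hidden)  # + gamma(d) + z_a
        self.feat_head = dense(rng, hidden, m_f)

    def __call__(self, x, d, z_s, z_a):
        n = x.shape[0]
        h = np.concatenate([encode(x, self.l_x), np.broadcast_to(z_s, (n, z_s.size))], axis=-1)
        for w, b in self.trunk:
            h = relu(h @ w + b)                                           # FC layer + ReLU
        sigma = relu(h @ self.sigma_head[0] + self.sigma_head[1])[:, 0]   # density, kept >= 0
        # Appearance branch: concatenate gamma(d) with the appearance code z_a.
        v = np.concatenate([h, encode(d, self.l_d), np.broadcast_to(z_a, (n, z_a.size))], axis=-1)
        v = relu(v @ self.view_layer[0] + self.view_layer[1])
        f = v @ self.feat_head[0] + self.feat_head[1]                     # M_f-dim feature, not RGB
        return sigma, f
```

Calling `FeatureField()(x, d, z_s, z_a)` on N points returns a density vector of shape (N,) and a feature matrix of shape (N, M_f); the feature output, rather than a 3D color, is what the 2D neural renderer consumes later.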
  • 13. 2D Neural Rendering The neural rendering operator is inspired by the StyleGAN2 architecture (Analyzing and Improving the Image Quality of StyleGAN, CVPR 2020). It maps the feature image to an RGB image at every spatial resolution and adds the previous output to the next via bilinear upsampling. These skip connections ensure a strong gradient flow to the feature fields. The final image prediction Î is obtained by applying a sigmoid activation to the last RGB layer. In the figure, gray indicates outputs, orange learnable, and blue non-learnable operations; the operator stacks N blocks, and in the final stage the full model is trained as a GAN.
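The upsample-and-add RGB skip scheme described above can be sketched as follows. Where the paper's renderer uses learned convolutional layers, this illustration substitutes random per-pixel linear maps (1×1 "convolutions") with leaky ReLU, so only the data flow is meant to match: features are upsampled block by block, each block contributes an RGB image, the running RGB image is bilinearly upsampled and added, and a sigmoid produces Î.

```python
import numpy as np


def upsample2x_bilinear(img):
    """Bilinear 2x upsampling of an (H, W, C) array (half-pixel centers, edge clamping)."""
    h, w, _ = img.shape
    ys = np.clip((np.arange(2 * h) + 0.5) / 2.0 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2.0 - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def leaky_relu(a, slope=0.2):
    return np.where(a > 0, a, slope * a)


def neural_render(feat, n_blocks, seed=0):
    """Feature image (H, W, M_f) -> RGB image (H * 2**n_blocks, W * 2**n_blocks, 3)."""
    rng = np.random.default_rng(seed)                 # random weights stand in for learned ones
    m_f = feat.shape[-1]
    rgb = feat @ rng.normal(0.0, 0.1, (m_f, 3))       # to-RGB at the lowest resolution
    for _ in range(n_blocks):
        feat = leaky_relu(upsample2x_bilinear(feat) @ rng.normal(0.0, 0.1, (m_f, m_f)))
        rgb = upsample2x_bilinear(rgb) + feat @ rng.normal(0.0, 0.1, (m_f, 3))  # RGB skip
    return 1.0 / (1.0 + np.exp(-rgb))                 # sigmoid on the last RGB layer
```

For instance, a 16 × 16 feature image passed through two blocks yields a 64 × 64 × 3 image with values in (0, 1). Because every resolution writes directly into the RGB output, gradients from the discriminator reach the low-resolution features through short paths.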
  • 14. Datasets PhotoShape: Photorealistic Materials for Large-Scale Shape Collections (ACM 2018) automatically assigns high-quality, realistic appearance models to large-scale 3D shape collections. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (CVPR 2017) is a diagnostic dataset that tests a range of visual reasoning abilities; it contains minimal biases and has detailed annotations (3D shapes with color, material, rotation, and size). GIRAFFE uses its script to render multi-object scenes of random primitives, adjusting the camera position to have a rotation of 0° instead of 43°. The renderings and the positions of the placed primitives are saved to files, and during training the translations of the object feature fields are sampled from the saved positions. The slide also compares Controllable Image Synthesis (CIS) and GIRAFFE on scenes with 0, 1, 2, or 3 primitives (Clevr-0123) at 64² pixels. https://knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=clevr
  • 15. Datasets (from the supplementary document) Dataset parameters: object rotation, background rotation, camera elevation, horizontal and depth translation, and object size are sampled from uniform distributions over the indicated ranges. For the Clevr datasets, object locations are sampled from the distribution obtained during dataset generation. Image center cropping: center crop (CelebA, CelebA-HQ). Image random cropping: rescale and random crop (CompCars). Data augmentation: for all experiments, images are randomly flipped horizontally during training. Clevr dataset generation: the script renders multi-object scenes of random primitives, with the camera position adjusted to have a rotation of 0° instead of 43°.
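Sampling object transformations from uniform priors and applying the horizontal-flip augmentation can be sketched as below. The function names, the rotation about a single vertical axis, the isotropic object size, and any concrete ranges passed in are hypothetical; the supplementary document lists the actual per-dataset ranges.

```python
import numpy as np


def sample_object_transform(rng, rot_range, scale_range, t_low, t_high):
    """Draw (R, s, t) from uniform priors; all ranges are caller-supplied assumptions."""
    angle = rng.uniform(*rot_range)
    c, s = np.cos(angle), np.sin(angle)
    # Hypothetical choice: rotate about the vertical (z) axis only.
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = np.full(3, rng.uniform(*scale_range))      # isotropic object size
    t = rng.uniform(t_low, t_high)                     # per-axis uniform translation
    return R, scale, t


def random_hflip(image, rng, p=0.5):
    """Random horizontal flip of an (H, W, C) image for data augmentation."""
    return image[:, ::-1] if rng.random() < p else image
```

Each training sample draws fresh parameters, so the generator sees objects at all placements covered by the priors; the limitation slide notes that learning these distributions from data, rather than fixing uniform ranges, is left open.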
  • 16. Experiments Our model correctly disentangles individual objects when trained on multi-object scenes with a fixed or varying number of objects.
  • 17. Conclusion & Limitations Conclusion: GIRAFFE is a novel method for controllable image synthesis. It incorporates a compositional 3D scene representation into the generative model and disentangles individual objects from the background, as well as their shape and appearance, without explicit supervision. Limitations (dataset bias and object transformation distributions): the model assumes simple uniform priors, so future work could investigate how the distributions over object-level transformations and camera poses can be learned from data, and could incorporate supervision that is easy to obtain (e.g. predicted object masks) to scale to more complex, multi-object scenes. The figure shows an example of dataset bias: eyes and hair remain entangled with the face rotation.
  • 18. Thanks Any Questions? You can send mail to Susang Kim(healess1@gmail.com)