Stanford-Nokia Collaboration: Mobile Augmented Reality. August 2009 Review. Bernd Girod (Stanford University), Radek Grzeszczuk (Nokia Research Center)
Mobile Augmented Reality Team: Radek Grzeszczuk, Bernd Girod, Vijay Chandrasekhar, Gabriel Takacs, Wei-Chao Chen, Natasha Gelfand, Yingen Xiong, Kari Pulli, Sam Tsai, David Chen, Jana Kosecka, Ramakrishna Vedantham, Mina Makar
Outline
  Review: landmark recognition system
    Architecture: location-based pre-fetching and matching on the phone
    Computer vision: “Bag of Words” matching
  Feature compression for server-side matching
    Approaches explored: transform coding of features, patch compression
    Compressible descriptor: CHoG (Compressed Histogram of Gradients)
  Scalability for large databases
    From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests”
    Accuracy vs. database size
  Towards 3D
    Multi-view vocabulary trees
    Matching against 3D models
  Summary and future directions
Mobile Visual Search: user takes picture … chooses action … confirms POI
Mobile Visual Search Applications: museum guide, tourist guide, landmarks, wine labels, comparison shopping, ads/catalogs, CDs/DVDs/books, movie posters
Landmark Recognition with Feature Matching on the Phone [Diagram labels: GPS, server; example landmark: Memorial Church]
“Bag of Words” Matching [Diagram labels: prefetched data, database images, query image, feature descriptors, feature correspondences, geometric consistency check]
Computing Visual Words [Figure: color patch → gray → blob-response filters (Dxx, Dyy, Dxy) → maxima of DxxDyy − (0.9Dxy)² across scale → orient along dominant gradient → oriented patch → gradient field (dx, dy) → SIFT descriptor, or SURF descriptor built from Σdx, Σdy, Σ|dx|, Σ|dy| per cell]
Matching Performance [Plot: true matches vs. false matches; panel labels ~90 images/kernel, ~90 images/kernel, ~1000 images/kernel]
Timing Analysis (Q2 2008): Nokia N95, 332 MHz ARM, 64 MB RAM; 100 KByte JPEG; uplink 60 Kbps. [Bar chart: three configurations, “Extract Features on Phone”, “All on Phone”, and “All on Server”; bar segments: downloads, upload, extract features, feature matching, geometric consistency]
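As a rough check on the upload cost in the timing slide above, the transfer time for the 100 KByte JPEG over the 60 Kbps uplink can be estimated directly. This is only a back-of-the-envelope sketch: it assumes 1 KByte = 1000 bytes and 1 Kbps = 1000 bit/s, ignores protocol overhead and latency, and the function name is illustrative rather than taken from the slides.

```python
def upload_seconds(payload_bytes: float, uplink_bits_per_second: float) -> float:
    """Ideal transfer time: payload size in bits divided by the link rate."""
    return payload_bytes * 8 / uplink_bits_per_second


# Figures quoted on the slide: 100 KByte JPEG, 60 Kbps uplink.
jpeg_time = upload_seconds(100_000, 60_000)
print(f"100 KByte JPEG at 60 Kbps: {jpeg_time:.1f} s")  # 13.3 s
```

Under these assumptions the upload alone takes more than ten seconds, which is the motivation for sending compact features instead of the image.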
Advanced Feature Compression
  Transform coding of SIFT/SURF descriptors [Chandrasekhar et al., VCIP 09]
  Direct compression of oriented image patch [M. Makar et al., ICASSP 09]
  Descriptor designed for compressibility: CHoG [Chandrasekhar et al., CVPR 09]
  Tree-Structured Vector Quantization, Tree Histogram Coding [Chen et al., DCC 09]
  Compression of location information [Tsai et al., MobiMedia 09]
CHoG: Compressed Histogram of Gradients [Figure: patch → gradients (dx, dy) → spatial binning → gradient distributions for each bin → histogram compression → CHoG descriptor (bit string)]
CHoG: Histogram Compression. Gradient binning yields a gradient distribution (0.46, 0.21, 0.16, 0.09, 0.08); a Huffman tree approximates these probabilities as (1/2, 1/4, 1/8, 1/16, 1/16).
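The numbers on this slide can be reproduced with a standard Huffman construction: a symbol at depth n in the tree is implicitly assigned probability 2^-n. The sketch below is a generic Huffman code-length routine, not the authors' implementation; the function and variable names are illustrative.

```python
import heapq
from fractions import Fraction


def huffman_code_lengths(probs):
    """Return the Huffman code length (leaf depth) of each symbol."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every symbol under the merged node moves one level deeper
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


gradient_distribution = [0.46, 0.21, 0.16, 0.09, 0.08]
lengths = huffman_code_lengths(gradient_distribution)
approx = [Fraction(1, 2 ** n) for n in lengths]
print(lengths)                   # [1, 2, 3, 4, 4]
print([str(q) for q in approx])  # ['1/2', '1/4', '1/8', '1/16', '1/16']
```

Only the shape of the tree has to be stored to recover the approximated distribution.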
Enumerating Huffman Trees: rooted binary trees with n leaf nodes
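To get a feel for how many tree shapes there are, the sketch below counts full binary trees with n leaves. It assumes left and right children are distinguished and leaves are unlabeled, in which case the count is the Catalan number C(n − 1); the slide does not say which counting convention was used, so treat this as one plausible reading.

```python
from math import comb


def count_binary_trees(n_leaves: int) -> int:
    """Ordered full binary trees with n_leaves leaves: Catalan(n_leaves - 1)."""
    k = n_leaves - 1
    return comb(2 * k, k) // (k + 1)


for n in range(1, 8):
    print(n, count_binary_trees(n))
# 1 1, 2 1, 3 2, 4 5, 5 14, 6 42, 7 132
```

For small n the set of trees is small enough that a tree can be identified by a short index.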
Feature Matching Performance [Plot: matching performance vs. descriptor size (bits) for Tree Structured Vector Quantizer, SURF Transform, Random Projections, BoostSSC, Patch + SIFT, CHoG, and SIFT Transform; ground-truth data set of matching patches from Winder & Brown, CVPR ’07]
Compressed Domain Matching [Diagram: gradient binning → gradient distribution → tree index; the distance Dist(·) between two tree indices is read from a look-up table with indices 1 to 6 on each axis]
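The look-up-table idea can be sketched as follows: distances between all pairs of quantized distributions are computed once, and two descriptors are then compared by summing table entries, one per spatial bin, without decoding. The codebook values, the L1 distance, and all names below are assumptions made for illustration; the slide only shows a generic Dist(·).

```python
# Hypothetical codebook: each tree index maps to the dyadic distribution
# implied by its leaf depths (values invented for illustration).
CODEBOOK = [
    (0.5, 0.25, 0.125, 0.125),
    (0.25, 0.25, 0.25, 0.25),
    (0.125, 0.125, 0.25, 0.5),
]


def l1_distance(p, q):
    """L1 distance between two distributions (an assumed choice of Dist)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Precompute every pairwise distance once.
LUT = [[l1_distance(p, q) for q in CODEBOOK] for p in CODEBOOK]


def descriptor_distance(desc_a, desc_b):
    """Sum of table look-ups, one per spatial bin; no decoding needed."""
    return sum(LUT[i][j] for i, j in zip(desc_a, desc_b))


# Each descriptor is a tuple of tree indices, one per spatial bin.
a = (0, 1, 2, 0)
b = (0, 2, 2, 1)
print(descriptor_distance(a, b))  # 1.0
```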
Nearest Neighbor Search [Bar chart: query time (sec), axis 0 to 400, for SIFT and CHoG; bars labeled Exact, ANN (0.3% errors), Exact, with values 372, 47, 28; 10^6 database descriptors, 10^3 query descriptors]
Location Histogram Coding [Diagram labels: feature locations (x, y), spatial binning, context-based arithmetic coding, quantize, refinement bits] [Tsai et al., MobiMedia 2009]
Compressed Feature Vector [Bar chart: size (bits). SIFT + location (x, y): 1088 bits; CHoG + location (x, y): ~84 bits; CHoG + compressed (x, y): ~59 bits. Other values shown: 1024, 52.] [Tsai et al., MobiMedia 2009]
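The size reduction implied by these figures is easy to work out. The sketch below uses only the three totals quoted on the slide (the CHoG values are marked approximate there); the dictionary keys are descriptive labels, not names from the source.

```python
# Per-feature sizes in bits, as quoted on the slide.
sizes_bits = {
    "SIFT + location": 1088,
    "CHoG + location": 84,
    "CHoG + compressed location": 59,
}

baseline = sizes_bits["SIFT + location"]
ratios = {name: baseline / bits for name, bits in sizes_bits.items()}

for name, ratio in ratios.items():
    print(f"{name}: {sizes_bits[name]} bits, ratio to baseline {ratio:.1f}x")
# CHoG + location is about 13.0x smaller, with compressed location about 18.4x.
```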
Pairwise Comparison “Bag of Words” Matching & Affine Consistency Check
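An affine consistency check keeps only those feature correspondences that agree with a single affine transform between the two images. The sketch below is a simplified illustration, not the system's implementation: it fits a transform to every triple of correspondences exhaustively, where a RANSAC-style check would sample triples at random, and it keeps the largest set of inliers. The tolerance, the test data, and all names are assumptions.

```python
from itertools import combinations


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if near-singular."""
    d = det3(m)
    if abs(d) < 1e-9:
        return None
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(src, dst):
    """Fit x' = a*x + b*y + c, y' = d*x + e*y + f from three correspondences."""
    m = [(x, y, 1.0) for x, y in src]
    row_x = solve3(m, [u for u, _ in dst])
    row_y = solve3(m, [v for _, v in dst])
    if row_x is None or row_y is None:
        return None  # the three source points are (nearly) collinear
    return row_x, row_y


def consistent_matches(src_pts, dst_pts, tol=0.5):
    """Indices of the largest set of correspondences sharing one affine map."""
    best = []
    for triple in combinations(range(len(src_pts)), 3):
        model = fit_affine([src_pts[i] for i in triple], [dst_pts[i] for i in triple])
        if model is None:
            continue
        (a, b, c), (d, e, f) = model
        inliers = []
        for i, ((x, y), (u, v)) in enumerate(zip(src_pts, dst_pts)):
            du, dv = a * x + b * y + c - u, d * x + e * y + f - v
            if du * du + dv * dv <= tol * tol:
                inliers.append(i)
        if len(inliers) > len(best):
            best = inliers
    return best


# Six matches that follow one affine map, plus two false matches.
src = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3), (2, 7), (3, 3), (7, 1)]
dst = [(2 * x - y + 5, x + 3 * y - 2) for x, y in src[:6]] + [(100, -50), (0, 0)]
print(consistent_matches(src, dst))  # [0, 1, 2, 3, 4, 5]
```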
Growing Vocabulary Tree [Nistér and Stewenius, 2006]
Growing Vocabulary Tree, k = 3 [Nistér and Stewenius, 2006]
Querying Vocabulary Tree Query
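Growing and querying a vocabulary tree can be sketched with hierarchical k-means: descriptors are clustered into k groups, each group is clustered again, and a query descriptor is quantized by descending to the nearest child at every level. The code below is a minimal pure-Python sketch under simplifying assumptions (2-D toy descriptors, a fixed number of k-means iterations, random initialization); it is not the implementation from the cited paper, and all names are illustrative.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def assign(points, centers):
    """Group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, k, rng, iters=10):
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centers)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)


def grow_tree(points, k, depth, rng):
    """Recursively split the descriptors into k clusters per level."""
    node = {"children": []}
    if depth == 0 or len(points) < k:
        return node  # leaf: one visual word
    centers, clusters = kmeans(points, k, rng)
    for center, cluster in zip(centers, clusters):
        if cluster:
            child = grow_tree(cluster, k, depth - 1, rng)
            child["center"] = center
            node["children"].append(child)
    return node


def quantize(tree, descriptor):
    """Descend to the nearest child at each level; the path is the visual word."""
    path, node = [], tree
    while node["children"]:
        kids = node["children"]
        idx = min(range(len(kids)), key=lambda i: sq_dist(descriptor, kids[i]["center"]))
        path.append(idx)
        node = kids[idx]
    return tuple(path)


rng = random.Random(0)
descriptors = [(rng.random(), rng.random()) for _ in range(200)]
tree = grow_tree(descriptors, k=3, depth=2, rng=rng)
word = quantize(tree, (0.2, 0.7))
print(word)
```

A query costs k distance computations per level rather than one per visual word, which is what makes large vocabularies practical.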
Recognition Accuracy [Plot: recall (percent) vs. number of database images; curves for a forest of 6 trees and a single vocabulary tree]
Vocabulary Forest [Diagram labels: features, several SVTs, per-tree IFS image counts, combine matches, early termination, GCC]
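One way to read the forest diagram is that each tree has its own inverted file mapping visual words to database images, the query votes in every tree, and the per-image counts are combined before the geometric check. The sketch below makes that reading concrete with unweighted vote counts summed across trees; the combination rule, the data, and the names are all assumptions, and practical systems typically weight the votes.

```python
from collections import Counter, defaultdict


def build_inverted_file(image_words):
    """Map each visual word to the set of images that contain it."""
    inverted = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            inverted[w].add(image_id)
    return inverted


def vote(inverted, query_words):
    """Count, per database image, how many query words it shares."""
    counts = Counter()
    for w in query_words:
        for image_id in inverted.get(w, ()):
            counts[image_id] += 1
    return counts


# Two hypothetical trees quantize the same database images into different word ids.
tree_a_db = {"church": [1, 4, 7], "tower": [2, 4, 9], "library": [3, 5, 8]}
tree_b_db = {"church": [10, 12], "tower": [11, 13], "library": [12, 14]}
forest = [build_inverted_file(tree_a_db), build_inverted_file(tree_b_db)]
query_words_per_tree = [[1, 4, 9], [10, 12]]

combined = Counter()
for inverted, query_words in zip(forest, query_words_per_tree):
    combined.update(vote(inverted, query_words))

# The top candidates would then go on to the geometric consistency check.
shortlist = [image_id for image_id, _ in combined.most_common(2)]
print(shortlist)  # ['church', 'tower']
```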
Real-time System: Send Image [Diagram: client (camera) → image → wireless network → server (feature extraction, VocTree image matching) → information back to client]
Real-time System: Send Features [Diagram: client (camera, feature extraction, coding) → features → wireless network → server (VocTree image matching) → information back to client]
Timing Analysis: Nokia N95, 332 MHz ARM, 64 MB RAM. [Bar chart of execution time (sec). “Send Features”: extract features, upload features (2.2 kByte), server delay. “Send Image”: upload image (40 kByte), server delay.]
Streaming MAR [Diagram over time. Client: track camera pose, send query frame, compensate camera pose, display ID and draw boundary. Network: query frame up, ID and geometry down. Server: extract features, search k-d tree, check geometry, send ID and geometry. Low-motion and high-motion phases are marked. Example result: John Mayer, “Inside Wants Out”.]
Outline
  Review: landmark recognition system
    Architecture: location-based pre-fetching and matching on the phone
    Computer vision: “Bag of Words” matching
  Feature compression for server-side matching
    Approaches explored: transform coding of features, patch compression
    Compressible descriptor: CHoG (Compressed Histogram of Gradients)
  Scalability for large databases
    From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests”
    Accuracy vs. database size
  Towards 3D
    Multi-view vocabulary trees
    City-scale landmark recognition using view-invariant matching
  Summary and future directions
Multiview Database: front view images, top view images, bottom view images, right view images, left view images
Multiview Vocabulary Trees [Diagram: the query image is matched against the left, front, top, bottom, and right trees; top matches are selected from each, then a geometric consistency check picks the top match]
Multiview Matching Performance [Plots: image recall and match rate vs. query view (front, left, right, top, bottom), comparing a front-view SVT with multiview SVTs]
Compact Architectural Models from Geo-Registered Image Collections [Grzeszczuk, 3DIM 2009]. Diagram labels: GPS-tagged images, building outline, camera pose estimation, robust map alignment, efficient view selection, 3D model of landmark. Sources: unstructured image collections (Panoramio); structured image collections (Street View data, Navteq).
View-Invariant Matching Pipeline [Diagram. Database side: image database → image rectification using 3D model → rectified database images → feature extraction → feature store. Query side: oblique query image → image rectification using vanishing points → rectified query image → feature extraction → matching results.]
Research Directions
  Research area: image features
    Keypoint detection optimized for CHoG, prioritization
    Comprehensive performance analysis of compressed feature matching
    Next-generation CHoG: soft kernels vs. hard binning, embedded, refinable bitstream
    Beyond RANSAC: advanced geometry matching and coding, incorporate scale and orientation
  Research area: image database/vocabulary trees
    Optimum tree/forest growing, CHoG trees, incremental database update
    Fast query, early termination, distance metrics, scoring, nearest neighbor algorithms
    Trees for phone implementation, inverted file caching, tree histogram coding
  Research area: streaming mobile augmented reality
    Camera pose estimation, feature tracking, temporally coherent feature extraction
    Continuous recognition strategies, scheduling, latency minimization
    Superposition of graphics information, motion compensation, occlusion handling
  Research area: 3D modeling
    Image matching pipeline using 3D models
    Automatic image rectification, features from texture maps
    Methods for integrating heterogeneous image sources
    Demonstrate improved landmark recognition for large-scale urban scenes
    Collaboration with Marc Pollefeys, ETH Zurich

Nokia Augmented Reality

  • 1. Stanford-Nokia CollaborationMobile Augmented RealityAugust 2009 Review Bernd Girod Radek Grzeszczuk Stanford University Nokia Research Center
  • 2. Mobile Augmented Reality Team Radek Grzeszczuk Bernd Girod Vijay Chandrasekhar Gabriel Takacs Wei-Chao Chen Natasha Gelfand Yingen Xiong Kari Pulli Sam Tsai David Chen Jana Kosecka Ramakrishna Vedantham Mina Makar
  • 3. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large data bases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. data base size Towards 3D Multi-viewvocabulary trees Matching against 3-d models Summary and future directions
  • 4. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large data bases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. data base size Towards 3D Multi-viewvocabulary trees Matching against 3-d models Summary and future directions
  • 5. Mobile Visual Search User takes picture … chooses action … …confirms POI
  • 6. Mobile Visual Search Applications Museum Guide Tourist Guide Landmarks Wine Labels Comparison Shopping Ads/Catalogs CDs/DVDs/Books Movie Posters
  • 7. GPS Server Landmark Recognition withFeature Matching on the Phone Memorial Church
  • 8. Prefetched Data “Bag of Words” Matching Query Image Geometric Consistency Check Feature Descriptors Feature Correspondences Database Images
  • 9. Computing Visual Words dx dy scale SIFT Descriptor SURF Descriptor y x Σdx Σdy Σ|dx| Σ|dy| Σ Σ Σ Σ Σ Σ Σ Σ Color Gray Dxx Σdx Σdy Σ|dx| Σ|dy| Maxima Dxy … … DxxDyy-(0.9Dxy)2 Σdx Σdy Σ|dx| Σ|dy| Dyy Orient along dominant gradient Oriented Patch Gradient Field Filters Blob Response
  • 10. Matching Performance ~90 images/kernel ~90 images/kernel ~1000 images/kernel True Matches False Matches
  • 11. Timing Analysis(Q2 2008) Nokia N95 332 MHz ARM 64 MB RAM 100 KByte JPEG; uplink 60 Kbps Downloads Upload Upload Geometric Consistency Extract Features Extract Features Feature Matching Extract Features on Phone All on Phone All on Server
  • 12. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large data bases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. data base size Towards 3D Multi-viewvocabulary trees Matching against 3-d models Summary and future directions
  • 13. Advanced Feature Compression Transform Coding of SIFT/SURF descriptors[Chandrasekhar et al., VCIP 09] Direct compression of oriented image patch [M. Makar et al., ICASSP 09] Descriptor designed for compressibility: CHoG[Chandrasekhar et al., CVPR 09] Tree-Structured Vector QuantizationTree Histogram Coding [Chen et al., DCC 09] Compression of Location Information[Tsai et al., Mobimedia 09]
  • 14. Patch CHoG: Compressed Histogram of Gradients Gradient distributions for each bin Gradients dx dx dx dx dx dx dx dx dy dy dy dy dy dy dy dy Spatial binning 01101 101101 Histogram compression 0100011 111001 0010011 01100 1010100 CHoGDescriptor
  • 15. CHoG: Histogram Compression 0.46 1/2 0.21 1/4 0.46 0.16 1/8 0.09 0.08 1/16 1/16 0.21 Gradient distribution 0.08 0.16 0.09 Huffman treeapproximatesprobabilities Gradient binning
  • 16. Enumerating Huffman Trees. Rooted binary trees with n leaf nodes
  • 17. Feature Matching Performance [plot over Descriptor Size (bits); ground truth data set of matching patches [Winder & Brown CVPR ’07]; curves: SIFT, SIFT Transform, SURF Transform, Tree Structured Vector Quantizer, Random Projections, BoostSSC, Patch + SIFT, CHoG]
  • 18. Compressed Domain Matching [diagram: Gradient binning; Gradient distribution; Tree index (1 to 6 on each axis); Distance Look-up table giving Dist(·)]
  • 19. Nearest Neighbor Search. Query time (sec) for 10^6 database descriptors and 10^3 query descriptors [bar chart]: SIFT exact 372; SIFT ANN (0.3% errors) 47; CHoG exact 28
  • 20. Location Histogram Coding [Tsai et al., MobiMedia 2009] [diagram: Feature Locations (x, y); Quantize; Spatial Binning; Context-based Arithmetic Coding; Refinement Bits]
  • 21. Compressed Feature Vector [Tsai et al., MobiMedia 2009]. SIFT + location (x, y): 1088 bits; CHoG + location (x, y): ~84 bits; CHoG + compressed (x, y): ~59 bits [bar chart, size in bits; tick labels 52, 59, 84, 1024, 1088]
  • 23. Pairwise Comparison: “Bag of Words” Matching & Affine Consistency Check
  • 24-26. Growing Vocabulary Tree [Nistér and Stewenius, 2006]
  • 27-28. Growing Vocabulary Tree, k = 3 [Nistér and Stewenius, 2006]
  • 30. Recognition Accuracy [plot: Recall (percent) vs. number of database images; curves: forest of 6 trees, single vocabulary tree]
  • 31. Vocabulary Forest [diagram: Features; SVT …; IFS (Image …, Count …); Combine Matches; Early Termination; GCC]
  • 32. Real-time System: Send Image [diagram: Client (Camera) → Image → Wireless Network → Server (Feature Extraction, VocTree Image Matching) → Information]
  • 33. Real-time System: Send Features [diagram: Client (Camera, Feature Extraction, Coding) → Features → Wireless Network → Server (VocTree Image Matching) → Information]
  • 34-36. Timing Analysis. Nokia N95: 332 MHz ARM, 64 MB RAM. [bar chart built up over three slides: Execution Time (sec) for “Send Image” (Upload Image, 40 kByte; Server Delay) vs. “Send Features” (Extract Features; Upload Features, 2.2 kByte; Server Delay)]
  • 37. Streaming MAR [timeline diagram; Low Motion vs. High Motion]. Client: Track Camera Pose …; Compensate Camera Pose; Display ID and Draw Boundary (example: John Mayer, “Inside Wants Out”). Network: Send Query Frame; Send ID and Geometry. Server: Extract Features; Search K-D Tree; Check Geometry
  • 38. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large data bases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. data base size Towards 3D Multi-view vocabulary trees City-scale landmark recognition using view invariant matching Summary and future directions
  • 39. Multiview Database: Front View Images, Top View Images, Bottom View Images, Right View Images, Left View Images
  • 40. Multiview Vocabulary Trees [diagram: Query Image → five trees (Left, Front, Top, Bottom, Right) → Select Top Matches (per tree) → Geometric Consistency Check → Top Match]
  • 41. Multiview Matching Performance [plots: Image Recall and Match Rate vs. Query View (Front, Left, Right, Top, Bottom); Front SVT vs. Multiview SVTs]
  • 42. Compact Architectural Models from Geo-Registered Image Collections [Grzeszczuk, 3DIM 2009]. GPS-tagged Images; Building Outline; Camera Poses Estimation; Robust Map Alignment; Efficient View Selection; 3D Model of Landmark. Unstructured image collections: Panoramio. Structured image collections: Street View data (Navteq)
  • 43. View-Invariant Matching Pipeline [diagram]. Database side: Image Database → Image Rectification using 3D Model → Rectified Database Images → Feature Extraction → Feature Store. Query side: Oblique Query Image → Image Rectification using Vanishing Points → Rectified Query Image → Feature Extraction → Matching Results
  • 45. Research Directions
      Research area: image features. Keypoint detection optimized for CHoG, prioritization; comprehensive performance analysis of compressed feature matching; next generation CHoG: soft kernels vs. hard binning, embedded, refinable bitstream; beyond RANSAC: advanced geometry matching and coding, incorporate scale and orientation.
      Research area: image database/vocabulary trees. Optimum tree/forest growing, CHoG trees, incremental data base update; fast query, early termination, distance metrics, scoring, nearest neighbor algorithms; trees for phone implementation, inverted file caching, tree histogram coding.
      Research area: streaming mobile augmented reality. Camera pose estimation, feature tracking, temporally coherent feature extraction; continuous recognition strategies, scheduling, latency minimization; superposition of graphics information, motion compensation, occlusion handling.
      Research area: 3D modeling. Image matching pipeline using 3D models; automatic image rectification, features from texture maps; methods for integrating heterogeneous image sources; demonstrate improved landmark recognition for large-scale urban scene. Collaboration with Marc Pollefeys, ETH Zurich.
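
As a companion to slides 15 and 16, here is a minimal Python sketch of how a Huffman tree approximates a gradient distribution by negative powers of two, and of how few tree shapes exist for a small number of bins. The function names `huffman_code_lengths` and `count_tree_shapes` are hypothetical and do not come from the deck; the probabilities are the ones shown on slide 15. The count covers only tree shapes with ordered leaves (the Catalan number mentioned in the editor's notes), not the assignment of bins to leaves.

```python
import heapq
from math import comb


def huffman_code_lengths(probs):
    """Return the Huffman code length (leaf depth) for each probability."""
    # Heap entries: (probability, tie-breaker, indices of the leaves in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1  # each merge pushes these leaves one level deeper
        heapq.heappush(heap, (p1 + p2, counter, leaves1 + leaves2))
        counter += 1
    return lengths


def count_tree_shapes(n_leaves):
    """Number of rooted full binary trees with n ordered leaves: Catalan(n - 1)."""
    k = n_leaves - 1
    return comb(2 * k, k) // (k + 1)


probs = [0.46, 0.21, 0.16, 0.09, 0.08]      # gradient distribution from slide 15
lengths = huffman_code_lengths(probs)
approx = [2.0 ** -length for length in lengths]
print(lengths)                         # [1, 2, 3, 4, 4]
print(approx)                          # [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(count_tree_shapes(len(probs)))   # 14
```

For the five-bin distribution this reproduces the approximations 1/2, 1/4, 1/8, 1/16, 1/16 shown on slide 15. Because the number of tree shapes is small, a tree can in principle be identified by a short index, which is the idea the slides describe as a "tree index".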

Editor's notes

  1. There is only a limited number of different Huffman trees. The Catalan number yields the number of rooted binary trees (ordered leaves, no cross-overs). Count unique permutations.
  2. Winder, Brown (Microsoft Research), “Learning Local Image Descriptors”: 64x64 patches; tourist photographs of the Trevi Fountain and of Yosemite Valley (920 images), and a test set consisting of images of Notre Dame (500 images). BoostSSC (Boosting Similarity Sensitive Coding): G. Shakhnarovich, P. Viola, and T. Darrell, “Fast pose estimation with parameter sensitive hashing,” in Proc. ICCV, 2003; Torralba et al., “Small Codes and Large Image Databases for Recognition,” CVPR 2009. Random Projections: P. A. Chuohao Yeo and K. Ramchandran, “Rate-Efficient Visual Correspondences Using Random Projections,” 2008.
  3. Most retrieval applications require NN search in some form. The descriptors for both SIFT and CHoG were computed from the same set of patches. VQ-5 bin configuration, GLOH-9 cell configurations and Huffman Tree Coding are used for CHoG, resulting in a 45-dimensional descriptor. We observe that exact nearest neighbor searching is 10X faster for CHoG. Furthermore, CHoG is still 2X faster than using SIFT with ANN eps = 1, which incurs a small error rate of 0.30%. The speed-up results from the lower dimensionality of the CHoG descriptor, and the use of look-up tables for fast distance computation.
  4-6. The scalable vocabulary tree is the data structure at the center of our recognition system. To construct an SVT, first we take every database CD cover and extract robust local features. These features can be SIFT, SURF, or your own favorite type. Then, all the feature descriptors from all the images are represented as vectors in a high-dimensional space. Here, they are shown as 2-dimensional vectors, but in reality, they can be 64-dimensional or 128-dimensional vectors.
  7. To impose some structure on this space, we perform hierarchical k-means clustering, the first step of which is dividing the space into k clusters using regular k-means.
  8. Then we recursively split each large cluster into k smaller clusters, repeating this process until the clusters become sufficiently small. The result of the hierarchical k-means algorithm is a tree structure, where tree nodes are the cluster centroids and their children are the subcluster centroids.
  9. Here is the same tree as on the previous slide, except the tree structure is more apparent. Once we have constructed an SVT on a server, how to process an incoming query is straightforward. For every query descriptor, we classify it by traversing the SVT greedily from top to bottom. Suppose the first descriptor follows this nearest neighbor path. The SVT knows which database images have features associated with every node, so it votes for the two images found on this path. Both the blue nodes and green nodes vote, but since the blue nodes are more discriminative, their vote counts for more. Then, another query descriptor goes down a different path and votes for other images. And so on, until all the query descriptors are classified. The final vote tally is a histogram indicating how likely each database image is a match. We notice that when both the query and database images are fronto-parallel, the voting scheme works well and will select the correct database match. This is because similar features are extracted from the query image and the matching database image, leading to their descriptors visiting many of the same nodes in the SVT.
  10. Performance drops with a single tree, since nodes become less discriminative: fewer features are unique to a particular database image.
  11. Feature extraction is robust against rotation and scale change, but NOT robust against foreshortening. This is overcome by putting multiple examples into the database that show the object from different angles.
  12. One could put all these views into one vocabulary tree. Distributing views across parallel trees prevents competition among the features belonging to different views of the same object. Views compete only once all the features are considered. Select the 25 top matches for each SVT based on bin count similarity, then find the match with the best geometric consistency. The multiview SVT approach is attractive for a multi-core server, since the search through the different trees can be run in parallel.
  13. ICCV: Sept/Oct Kyoto
  14. Reduce database size; increase robustness.
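
The tree construction and voting described in notes 7 to 9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system from the slides: the helper names (`nearest`, `kmeans`, `build_tree`, `query_votes`) are hypothetical, plain Lloyd's k-means stands in for whatever clustering the authors used, and the node weight log(N / number of images at the node) is an assumed IDF-style choice that makes more discriminative nodes count for more. The synthetic descriptors are 8-dimensional rather than 64 or 128.

```python
import numpy as np


def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, nearest(points, centroids)


def build_tree(desc, image_ids, k=3, min_size=8, max_depth=4, rng=None,
               centroid=None, depth=0):
    """Hierarchical k-means: each node stores its centroid and the images seen there."""
    rng = rng if rng is not None else np.random.default_rng(0)
    node = {"centroid": centroid, "children": [], "images": set(image_ids.tolist())}
    if depth >= max_depth or len(desc) < max(min_size, k):
        return node  # cluster is small enough: make it a leaf
    centroids, labels = kmeans(desc, k, rng)
    for j in range(k):
        mask = labels == j
        if mask.any() and not mask.all():  # skip empty or non-splitting clusters
            node["children"].append(
                build_tree(desc[mask], image_ids[mask], k, min_size, max_depth,
                           rng, centroids[j], depth + 1))
    return node


def query_votes(tree, query_desc, n_images):
    """Greedy top-down traversal; every visited node votes for its images."""
    votes = np.zeros(n_images)
    for d in query_desc:
        node = tree
        while node["children"]:
            node = min(node["children"],
                       key=lambda c: np.linalg.norm(d - c["centroid"]))
            weight = np.log(n_images / len(node["images"]))  # assumed weighting
            for img in node["images"]:
                votes[img] += weight
    return votes


# Synthetic example: three "images", each with 40 descriptors around its own center.
data_rng = np.random.default_rng(1)
centers = np.eye(3, 8) * 20.0
descriptors = np.vstack([c + data_rng.normal(size=(40, 8)) for c in centers])
image_ids = np.repeat(np.arange(3), 40)

tree = build_tree(descriptors, image_ids, k=3)
votes = query_votes(tree, descriptors[image_ids == 1], n_images=3)
print(votes, "best match:", int(votes.argmax()))
```

The resulting `votes` array plays the role of the histogram in note 9. A real system would follow it with a geometric consistency check on the top candidates, as the slides describe.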