This document summarizes research on mobile augmented reality from the Stanford-Nokia collaboration. It describes a landmark recognition system based on "bag of words" feature matching and explores approaches to feature compression, including the CHoG (Compressed Histogram of Gradients) descriptor. It discusses scaling to large databases with vocabulary trees and forests, multi-view matching, 3D modeling from images, and streaming augmented reality with minimal latency. Future research directions include improved features, matching algorithms, and 3D modeling to enable large-scale urban landmark recognition.
2. Mobile Augmented Reality Team Radek Grzeszczuk, Bernd Girod, Vijay Chandrasekhar, Gabriel Takacs, Wei-Chao Chen, Natasha Gelfand, Yingen Xiong, Kari Pulli, Sam Tsai, David Chen, Jana Kosecka, Ramakrishna Vedantham, Mina Makar
3. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large databases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. database size Towards 3D Multi-view vocabulary trees Matching against 3D models Summary and future directions
4. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large databases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. database size Towards 3D Multi-view vocabulary trees Matching against 3D models Summary and future directions
11. Timing Analysis (Q2 2008) Nokia N95: 332 MHz ARM, 64 MB RAM; 100 kByte JPEG; 60 kbps uplink. [Chart comparing execution time for three configurations — “Extract Features on Phone”, “All on Phone”, “All on Server” — broken into download, upload, feature extraction, feature matching, and geometric consistency stages.]
12. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large databases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. database size Towards 3D Multi-view vocabulary trees Matching against 3D models Summary and future directions
13. Advanced Feature Compression Transform coding of SIFT/SURF descriptors [Chandrasekhar et al., VCIP 09] Direct compression of oriented image patches [M. Makar et al., ICASSP 09] Descriptor designed for compressibility: CHoG [Chandrasekhar et al., CVPR 09] Tree-structured vector quantization / tree histogram coding [Chen et al., DCC 09] Compression of location information [Tsai et al., Mobimedia 09]
14. CHoG: Compressed Histogram of Gradients [Diagram: the gradients (dx, dy) of an image patch are pooled by spatial binning; a gradient distribution is kept for each spatial bin; histogram compression turns the per-bin distributions into short bit strings that together form the CHoG descriptor.]
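A minimal NumPy sketch of the pipeline on this slide — gradients, spatial binning, per-bin gradient histograms — with illustrative parameters; the bin counts and the entropy-coding stage of the published CHoG design are not reproduced here:

```python
import numpy as np

def chog_descriptor(patch, n_spatial=3, n_grad=5):
    """Sketch of a CHoG-style descriptor (illustrative, not the published design).

    patch: 2-D grayscale array. Returns one coarse gradient histogram per
    spatial bin; a real system would compress these histograms.
    """
    dy, dx = np.gradient(patch.astype(float))
    h, w = patch.shape
    # Assign each pixel to one of n_spatial x n_spatial spatial bins.
    ys = np.minimum(np.arange(h) * n_spatial // h, n_spatial - 1)
    xs = np.minimum(np.arange(w) * n_spatial // w, n_spatial - 1)
    sbin = ys[:, None] * n_spatial + xs[None, :]
    # Quantize gradient orientation into n_grad coarse bins per spatial cell,
    # weighting each vote by gradient magnitude.
    angle = np.arctan2(dy, dx)  # range (-pi, pi]
    gbin = ((angle + np.pi) / (2 * np.pi) * n_grad).astype(int) % n_grad
    hist = np.zeros((n_spatial * n_spatial, n_grad))
    np.add.at(hist, (sbin.ravel(), gbin.ravel()), np.hypot(dx, dy).ravel())
    # Normalize each cell's histogram; CHoG then entropy-codes these.
    hist /= hist.sum(axis=1, keepdims=True) + 1e-9
    return hist
```

The output is a small set of normalized distributions per spatial cell, which is what makes the descriptor amenable to histogram compression.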
22. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large databases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. database size Towards 3D Multi-view vocabulary trees Matching against 3D models Summary and future directions
32. Real-time System: Send Image [Diagram: the client (camera) sends the image over the wireless network; the server runs feature extraction and VocTree image matching and returns information.]
33. Real-time System: Send Features [Diagram: the client (camera) runs feature extraction and coding, then sends features over the wireless network; the server runs VocTree image matching and returns information.]
34–36. Timing Analysis Nokia N95: 332 MHz ARM, 64 MB RAM. [Chart builds comparing execution time (sec): “Send Image” = upload a 40 kByte image + server delay; “Send Features” = extract features on the phone + upload 2.2 kByte + server delay.]
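The upload-time gap between the two configurations follows from back-of-the-envelope arithmetic using the slide's payload sizes and the 60 kbps uplink figure from the earlier Q2 2008 timing slide:

```python
# Upload time over the 60 kbit/s uplink assumed on the Q2 2008 timing slide.
UPLINK_KBPS = 60

def upload_seconds(kbytes):
    """Seconds to upload a payload of the given size in kilobytes."""
    return kbytes * 8 / UPLINK_KBPS

send_image = upload_seconds(40)      # 40 kByte image query
send_features = upload_seconds(2.2)  # 2.2 kByte compressed features

print(f"Send Image upload:    {send_image:.2f} s")    # ~5.33 s
print(f"Send Features upload: {send_features:.2f} s") # ~0.29 s
```

"Send Features" trades most of that upload time for feature-extraction compute on the phone, which is why the total latency comparison on the slide still includes an "Extract Features" segment on the client side.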
37. Streaming MAR [Timeline diagram of client and server: the client continuously tracks camera pose; during low motion it sends a query frame; the server extracts features, searches a k-d tree, and checks geometry, then sends back the ID and geometry; the client displays the ID (e.g., the CD cover “John Mayer – Inside Wants Out”), draws the boundary, and compensates camera pose during high motion.]
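The client side of this timeline can be sketched as a loop (hypothetical interfaces — `frames`, `query_server`, and the motion threshold are assumptions, not the system's actual API): query only when motion is low, and keep overlaying the last known result between server responses.

```python
def streaming_mar_client(frames, query_server, motion_threshold=1.0):
    """Sketch of the streaming MAR client loop (hypothetical interfaces).

    frames: iterable of (frame, motion) pairs, where motion is a scalar
    estimate from the pose tracker. query_server(frame) returns
    (object_id, boundary) or None. Yields the overlay shown per frame.
    """
    last = None
    for frame, motion in frames:
        # Low motion: send a query frame; the server extracts features,
        # searches its k-d tree, and runs the geometric consistency check.
        if motion < motion_threshold:
            result = query_server(frame)
            if result is not None:
                last = result
        # Between server responses the client keeps tracking camera pose
        # and motion-compensates the last known ID and boundary.
        yield last

# Usage: a fake server that recognizes the frame labeled "cd_cover".
hits = list(streaming_mar_client(
    [("blur", 5.0), ("cd_cover", 0.2), ("cd_cover", 3.0)],
    lambda f: ("Inside Wants Out", "bbox") if f == "cd_cover" else None))
```

Real clients would also warp the boundary by the tracked pose change; this sketch only shows the scheduling logic.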
38. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large databases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. database size Towards 3D Multi-view vocabulary trees City-scale landmark recognition using view-invariant matching Summary and future directions
39. Multiview Database [Image sets: front, top, bottom, right, and left view images.]
40. Multiview Vocabulary Trees [Diagram: a query image is scored against five SVTs (left, front, top, bottom, right); the top matches from each tree are pooled, and a geometric consistency check selects the top match.]
41. Multiview Matching Performance [Charts: image recall and match rate per query view (front, left, right, top, bottom) for a single front SVT vs. multiview SVTs.]
42. Compact Architectural Models from Geo-Registered Image Collections [Pipeline: GPS-tagged images + building outline → camera pose estimation → robust map alignment → efficient view selection → 3D model of landmark. Unstructured image collections: Panoramio; structured image collections: Street View data (Navteq).] [Grzeszczuk, 3DIM 2009]
43. View-Invariant Matching Pipeline [Diagram: database images are rectified using a 3D model before feature extraction into the feature store; an oblique query image is rectified using vanishing points before feature extraction; matching the two feature sets yields the results.]
44. Outline Review: landmark recognition system Architecture: location-based pre-fetching and matching on the phone Computer vision: “Bag of Words” matching Feature compression for server-side matching Approaches explored: Transform coding of features, patch compression Compressible descriptor: CHoG (Compressed Histogram of Gradients) Scalability for large databases From “Bags of Words” to “Vocabulary Trees” to “Vocabulary Forests” Accuracy vs. database size Towards 3D Multi-view vocabulary trees Matching against 3D models Summary and future directions
45. Research Directions
Image features: keypoint detection optimized for CHoG, prioritization; comprehensive performance analysis of compressed feature matching; next-generation CHoG: soft kernels vs. hard binning, embedded, refinable bitstream; beyond RANSAC: advanced geometry matching and coding, incorporating scale and orientation.
Image database / vocabulary trees: optimum tree/forest growing, CHoG trees, incremental database updates; fast query, early termination, distance metrics, scoring, nearest-neighbor algorithms; trees for phone implementation, inverted file caching, tree histogram coding.
Streaming mobile augmented reality: camera pose estimation, feature tracking, temporally coherent feature extraction; continuous recognition strategies, scheduling, latency minimization; superposition of graphics information, motion compensation, occlusion handling.
3D modeling: image matching pipeline using 3D models; automatic image rectification, features from texture maps; methods for integrating heterogeneous image sources; demonstrate improved landmark recognition for large-scale urban scenes. Collaboration with Marc Pollefeys, ETH Zurich.
Editor's notes
There are only a limited number of different Huffman trees. The Catalan number gives the number of rooted binary trees (ordered leaves, no cross-overs); count unique permutations.
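The count referenced in this note can be checked directly: the number of rooted binary trees with n internal nodes (equivalently, n + 1 ordered leaves with no cross-overs) is the n-th Catalan number, C_n = C(2n, n) / (n + 1).

```python
from math import comb

def catalan(n):
    """n-th Catalan number: rooted binary trees with n internal nodes
    (equivalently n + 1 ordered leaves, no cross-overs)."""
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(8)])
# -> [1, 1, 2, 5, 14, 42, 132, 429]
```

This bounds how many distinct Huffman tree shapes can occur for a given alphabet size, which is what makes enumerating (and coding) the tree shape itself feasible.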
Winder and Brown (Microsoft Research), “Learning Local Image Descriptors”: 64×64 patches; tourist photographs of the Trevi Fountain and of Yosemite Valley (920 images), and a test set consisting of images of Notre Dame (500 images). BoostSSC – Boosting Similarity Sensitive Coding: G. Shakhnarovich, P. Viola, and T. Darrell, “Fast pose estimation with parameter sensitive hashing,” in Proc. ICCV, 2003. Torralba et al., “Small Codes and Large Image Databases for Recognition,” CVPR 2008. Random projections: C. Yeo, P. Ahammad, and K. Ramchandran, “Rate-Efficient Visual Correspondences Using Random Projections,” 2008.
Most retrieval applications require NN search in some form. The descriptors for both SIFT and CHoG were computed from the same set of patches. The VQ-5 bin configuration, GLOH-9 cell configuration, and Huffman tree coding are used for CHoG, resulting in a 45-dimensional descriptor. We observe that exact nearest-neighbor searching is 10× faster for CHoG. Furthermore, CHoG is still 2× faster than SIFT with ANN (eps = 1), which incurs a small error rate of 0.30%. The speed-up results from the lower dimensionality of the CHoG descriptor and the use of lookup tables for fast distance computation.
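The "lookup tables for fast distance computation" can be illustrated as follows (a sketch under assumed structure, not the paper's exact method): once each histogram cell is quantized to one of K codebook entries, all K × K cell-to-cell distances can be precomputed, so comparing two descriptors reduces to index lookups and a sum.

```python
import numpy as np

def build_distance_table(codebook):
    """Precompute distances between all pairs of quantized histogram cells.

    codebook: (K, B) array of the K possible per-cell histograms (B bins).
    Returns a (K, K) table of L2 distances (a real system might use KL or
    another histogram divergence instead).
    """
    diff = codebook[:, None, :] - codebook[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def descriptor_distance(a_idx, b_idx, table):
    """Distance between two descriptors given as per-cell codebook indices."""
    return table[a_idx, b_idx].sum()

# Usage with a toy 3-entry codebook of one-hot "histograms".
table = build_distance_table(np.eye(3))
d = descriptor_distance(np.array([0, 1]), np.array([0, 2]), table)
```

No floating-point histogram arithmetic happens at query time; the per-cell cost is a single table read, which is where the speed-up over full-dimensional SIFT distances comes from.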
The scalable vocabulary tree is the data structure at the center of our recognition system. To construct an SVT, first we take every database CD cover and extract robust local features. These features can be SIFT, SURF, or your own favorite type. Then, all the feature descriptors from all the images are represented as vectors in a high-dimensional space. Here, they are shown as 2-dimensional vectors, but in reality, they can be 64-dimensional or 128-dimensional vectors.
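The construction described in these notes — extract descriptors, then recursively split the descriptor space with hierarchical k-means — can be sketched in plain NumPy (illustrative helper names; production systems use optimized k-means and much larger branching factors):

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
    return C, labels

def build_svt(X, k=2, depth=3):
    """Hierarchical k-means: split the data into k clusters, then
    recursively split each cluster, until depth runs out or clusters
    become too small. Nodes are the cluster centroids; children are
    the subcluster centroids."""
    if depth == 0 or len(X) < k:
        return {"centroid": X.mean(axis=0), "children": []}
    _, labels = kmeans(X, k)
    return {"centroid": X.mean(axis=0),
            "children": [build_svt(X[labels == j], k, depth - 1)
                         for j in range(k) if (labels == j).any()]}
```

In a real system X would hold 64- or 128-dimensional SIFT/SURF (or CHoG) descriptors from every database image, with k around 10 and millions of rows; the 2-D case here is only for readability, mirroring the slides.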
To impose some structure on this space, we perform hierarchical k-means clustering, the first step of which is dividing the space into k clusters using regular k-means.
And then again, recursively splitting each large cluster into k smaller clusters. We repeat this process until the clusters become sufficiently small. What results from the hierarchical k-means algorithm is a tree structure, where tree nodes are the cluster centroids and their children are the subcluster centroids.
Here is the same tree as on the previous slide, except the tree structure is more apparent. Once we have constructed an SVT on a server, how to process an incoming query is straightforward. For every query descriptor, we classify it by traversing the SVT greedily from top to bottom. Suppose the first descriptor follows this nearest neighbor path. The SVT knows which database images have features associated with every node, so it votes for the two images found on this path. Both the blue nodes and green nodes vote, but since the blue nodes are more discriminative, their vote counts for more. Then, another query descriptor goes down a different path and votes for other images. And so on, until all the query descriptors are classified. The final vote tally is a histogram indicating how likely each database image is a match. We notice that when both the query and database images are fronto-parallel, the voting scheme works well and will select the correct database match. This is because similar features are extracted from the query image and the matching database image, leading to their descriptors visiting many of the same nodes in the SVT.
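The greedy traversal and weighted voting described here can be sketched as follows (assumed node structure: each node is a dict with a 'centroid', 'children', an inverted file 'images' listing database images whose features reached that node, and a discriminability 'weight' — more discriminative nodes count for more, as with the blue vs. green nodes on the slide):

```python
import numpy as np
from collections import Counter

def query_svt(root, descriptors):
    """Greedy SVT traversal with weighted voting (illustrative sketch).

    Returns a Counter of weighted vote tallies per database image; the
    tally is the histogram indicating how likely each image is a match.
    """
    votes = Counter()
    for d in descriptors:
        node = root
        while node["children"]:
            # Descend to the nearest child centroid, top to bottom.
            node = min(node["children"],
                       key=lambda c: np.sum((d - c["centroid"]) ** 2))
            # Every node on the path votes for its inverted-file images.
            for img in node.get("images", []):
                votes[img] += node.get("weight", 1.0)
    return votes

# Usage: a one-level toy tree with two leaves indexing images "A" and "B".
root = {"centroid": np.zeros(2), "children": [
    {"centroid": np.array([-1.0, 0.0]), "children": [], "images": ["A"], "weight": 2.0},
    {"centroid": np.array([1.0, 0.0]), "children": [], "images": ["B"], "weight": 2.0}]}
tally = query_svt(root, [np.array([-0.9, 0.1]),
                         np.array([-1.1, 0.0]),
                         np.array([0.8, 0.0])])
```

Two of the three query descriptors land in "A"'s cluster, so "A" accumulates the larger weighted tally, mirroring the vote histogram on the slide.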
Performance drops with a single tree, since nodes become less discriminative: fewer features are unique to a particular database image.
Feature extraction is robust against rotation and scale change, but NOT robust against foreshortening. This is overcome by putting multiple examples into the database that show the object from different angles.
One could put all these views into one vocabulary tree. Distributing views across parallel trees prevents competition among the features belonging to different views of the same object; views compete only once all the features are considered. Select the 25 top matches for each SVT based on bin-count similarity, then find the match with the best geometric consistency. The multiview SVT approach is attractive for multi-core servers, since the search through the different trees can run in parallel.
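This per-view query scheme can be sketched as independent tree queries whose candidates are pooled before a single geometric check (hypothetical function signatures — `score_fn` and `geometry_fn` stand in for the SVT scoring and RANSAC-style verification stages):

```python
from concurrent.futures import ThreadPoolExecutor

def multiview_match(query_features, view_svts, score_fn, geometry_fn, top_n=25):
    """Query one SVT per database view, pool each tree's top candidates,
    then pick the candidate with the best geometric consistency.

    view_svts: dict view_name -> tree; score_fn(tree, feats) -> {image: score};
    geometry_fn(feats, image) -> consistency score (higher is better).
    """
    def top_matches(tree):
        scores = score_fn(tree, query_features)
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    # Run the per-tree searches in parallel (one per core on a server);
    # views compete only after all candidates are collected, so features
    # from different views of the same object do not suppress each other.
    with ThreadPoolExecutor() as pool:
        candidate_lists = pool.map(top_matches, view_svts.values())
    candidates = set().union(*candidate_lists)
    return max(candidates, key=lambda img: geometry_fn(query_features, img))

# Usage with stub scoring/geometry functions for two views.
views = {"front": "front_tree", "left": "left_tree"}
scores = {"front_tree": {"img1": 1.0, "img2": 0.5},
          "left_tree": {"img2": 2.0, "img3": 0.1}}
best = multiview_match(None, views,
                       lambda tree, f: scores[tree],
                       lambda f, img: {"img1": 3, "img2": 9, "img3": 1}[img])
```

Deferring the cross-view competition to the geometric consistency stage is the key design choice the note describes; the thread pool is just one way to exploit the trees' independence.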