Mobile Visual Search (MVS) is a fascinating research field with the potential to change how visual data is organized, annotated, and retrieved using mobile devices. This deck outlines opportunities in MVS, basic concepts, and technical aspects of MVS systems. It walks through the MVS pipeline (interest point detection, feature descriptor computation, feature indexing/matching, and geometric verification) and covers challenges such as low latency, robust recognition, and handling broad and narrow domains. The Compressed Histogram of Gradients (CHoG) descriptor is presented as an example of a compact descriptor designed for MVS.
Mobile Visual Search: Driving Factors and Technical Challenges
1. Mobile Visual Search
Oge Marques
Florida Atlantic University
Universitat Politècnica de Catalunya
Barcelona
2 Mar 2012
2. Take-home message
Mobile Visual Search (MVS) is a fascinating research
field with many open challenges and opportunities
which have the potential to impact the way we
organize, annotate, and retrieve visual data (images
and videos) using mobile devices.
3. Outline
• This talk is structured in four parts:
1. Opportunities
2. Basic concepts
3. Technical aspects
4. Examples and applications
5. Mobile visual search: driving factors
• Age of mobile computing
http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
6. Mobile visual search: driving factors
• Why do I need a camera? I have a smartphone…
(22 Dec 2011)
http://www.cellular-news.com/story/52382.php
7. Mobile visual search: driving factors
• Powerful devices
– 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset
http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
8. Mobile visual search: driving factors
• Powerful devices
http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/
9. Mobile visual search: driving factors
• Social networks and mobile devices (May 2011)
http://jess3.com/geosocial-universe-2/
10. Mobile visual search: driving factors
• Social networks and mobile devices
– Motivated users: image taking and image sharing are
huge!
http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
11. Mobile visual search: driving factors
• Instagram:
– 15 million registered users (in 13 months)
– 7 employees
– A growing ecosystem based on it!
• Search
• Send postcards
• Manage your photos
• Build a poster
• etc.
http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
http://www.nuwomb.com/instagram/
12. Mobile visual search: driving factors
• Legitimate (or not quite…) needs and use cases
http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575
13. Mobile visual search: driving factors
• A natural use case for CBIR with QBE (at last!)
– The example is right in front of the user!
[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.
Girod et al., IEEE Multimedia, 2011
15. MVS: technical challenges
• How to ensure low latency (and interactive
queries) under constraints such as:
– Network bandwidth
– Computational power
– Battery consumption
• How to achieve robust visual recognition in spite
of low-resolution cameras, varying lighting
conditions, etc.
• How to handle broad and narrow domains
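To make the bandwidth constraint concrete, a back-of-the-envelope sketch of query transmission time. All sizes and link rates below are illustrative assumptions, not figures from the talk:

```python
# Back-of-the-envelope query latency: sending a JPEG query image versus
# compact compressed features over a slow uplink. The link rate and the
# payload sizes are assumed values for illustration only.

UPLINK_BPS = 256_000          # assumed 3G uplink: 256 kbit/s
JPEG_QUERY_BYTES = 50_000     # assumed ~50 KB JPEG query image
CHOG_QUERY_BYTES = 4_000      # assumed ~4 KB of compressed descriptors

def transmit_seconds(num_bytes: int, bps: int) -> float:
    """Time to push num_bytes over a link of bps bits per second."""
    return num_bytes * 8 / bps

jpeg_t = transmit_seconds(JPEG_QUERY_BYTES, UPLINK_BPS)   # ~1.56 s
chog_t = transmit_seconds(CHOG_QUERY_BYTES, UPLINK_BPS)   # ~0.13 s
```

Under these assumptions, shrinking the query payload by an order of magnitude cuts transmission time by the same factor, which is the motivation for sending compact descriptors instead of images.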
16. MVS: Pipeline for image retrieval
Girod et al., IEEE Multimedia, 2011
19. Part III - Outline
• The MVS pipeline in greater detail
• Datasets for MVS research
• MPEG Compact Descriptors for Visual Search
(CDVS)
20. MVS: descriptor extraction
• Interest point detection
• Feature descriptor computation
Girod et al., IEEE Multimedia, 2011
21. Interest point detection
• Numerous interest-point detectors have been proposed in
the literature:
– Harris Corners (Harris and Stephens 1988)
– Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian
(DoG) (Lowe 2004)
– Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
– Hessian affine (Mikolajczyk et al. 2005)
– Features from Accelerated Segment Test (FAST) (Rosten and
Drummond 2006)
– Hessian blobs (Bay, Tuytelaars and Van Gool 2006)
• Different tradeoffs in repeatability and complexity
• See (Mikolajczyk and Schmid 2005) for a comparative
performance evaluation of local descriptors in a common
framework.
Girod et al., IEEE Signal Processing Magazine, 2011
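As a concrete illustration of one detector from the list above, a minimal Harris corner response (Harris and Stephens 1988) in plain numpy. This is a sketch, not a tuned implementation: the 3x3 box window and the constant k are arbitrary choices, and real detectors smooth with a Gaussian.

```python
import numpy as np

def harris_response(img: np.ndarray, k: float = 0.05) -> np.ndarray:
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the 2x2 structure tensor of image gradients summed over a local
    window. Minimal illustrative sketch (box window, no Gaussian)."""
    img = img.astype(float)
    # central-difference gradients (borders left at zero)
    Ix = np.zeros_like(img); Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    # structure-tensor entries, summed over a 3x3 window
    def box3(a):
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```

On a synthetic image containing a single bright square, the response peaks at the square's inner corner: corners have strong gradients in both directions (large det), while edges and flat regions score low or negative.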
22. Feature descriptor computation
• After interest-point detection, we compute a
visual word descriptor on a normalized patch.
• Ideally, descriptors should be:
– robust to small distortions in scale, orientation, and
lighting conditions;
– discriminative, i.e., characteristic of an image or a small
set of images;
– compact, due to typical mobile computing constraints.
Girod et al., IEEE Signal Processing Magazine, 2011
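The properties above can be illustrated with a toy descriptor: a single magnitude-weighted histogram of gradient orientations over a normalized patch. Real descriptors such as SIFT or CHoG aggregate many such histograms over a spatial grid, so this single-cell version is only a sketch:

```python
import numpy as np

def orientation_histogram(patch: np.ndarray, bins: int = 8) -> np.ndarray:
    """Toy single-cell descriptor: a histogram of gradient orientations
    weighted by gradient magnitude, L2-normalized. Normalization gives
    some robustness to lighting; binning orientations gives robustness
    to small distortions. Illustrative sketch only."""
    p = patch.astype(float)
    gx = p[:, 1:] - p[:, :-1]     # horizontal finite differences
    gy = p[1:, :] - p[:-1, :]     # vertical finite differences
    h = min(gx.shape[0], gy.shape[0])
    w = min(gx.shape[1], gy.shape[1])
    gx, gy = gx[:h, :w], gy[:h, :w]
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                        # in (-pi, pi]
    idx = ((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```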
23. Feature descriptor computation
• Examples of feature descriptors in the literature:
– SIFT (Lowe 1999)
– Speeded-Up Robust Features (SURF) (Bay et al. 2008)
– Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005)
– Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010)
• See (Winder and Brown, CVPR 2007), (Winder, Hua, and Brown, CVPR 2009), and (Mikolajczyk and Schmid, PAMI 2005) for comparative performance evaluations of different descriptors.
Girod et al., IEEE Signal Processing Magazine, 2011
24. Feature descriptor computation
• What about compactness?
– Option 1: Compress off-the-shelf descriptors.
• Result: poor rate-constrained image-retrieval
performance.
– Option 2: Design a descriptor with compression in
mind.
– Example: CHoG (Compressed Histogram of Gradients)
(Chandrasekhar et al. 2009, 2010)
Girod et al., IEEE Signal Processing Magazine, 2011
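A minimal sketch of the "compress off-the-shelf descriptors" route: uniform scalar quantization of a descriptor with entries in [0, 1]. The dimension (128) and bit depth (2 bits per dimension) are illustrative assumptions; CHoG instead designs quantization into the descriptor itself, which is why it performs better at a given rate.

```python
import numpy as np

def quantize(desc: np.ndarray, bits: int = 2) -> np.ndarray:
    """Uniformly quantize a descriptor with entries in [0, 1] to
    2**bits levels. A crude stand-in for 'Option 1' above;
    illustrative sketch only."""
    levels = 2 ** bits
    q = np.floor(desc * levels).astype(int)
    return np.clip(q, 0, levels - 1)

# Rate comparison for a 128-dimensional descriptor:
float_bits = 128 * 32    # 4,096 bits as 32-bit floats
coarse_bits = 128 * 2    # 256 bits at 2 bits per dimension
```

Even this naive scheme cuts the rate 16x; the catch, as the slide notes, is that retrieval performance degrades badly when the descriptor was not designed with such coarse quantization in mind.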
25. CHoG: Compressed Histogram of Gradients
[Figure] CHoG pipeline: patch → gradients (dx, dy) → spatial binning → gradient distribution for each bin → histogram compression → compressed bitstream (CHoG descriptor).
Bernd Girod: Mobile Visual Search; Chandrasekhar et al., CVPR 2009, 2010
26. CHoG: Compressed Histogram of Gradients
• Performance evaluation
– Recall vs. bit rate
[Figure 7] Comparison of different schemes (send feature CHoG, send image JPEG, send feature SIFT) with regard to classification accuracy and query size. CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.
Girod et al., IEEE Multimedia, 2011
27. MVS: feature indexing and matching
• Goal: produce a data structure that can quickly return a short
list of the database candidates most likely to match the query
image.
– The short list may contain false positives as long as the correct match
is included.
– Slower pairwise comparisons can be subsequently performed on just
the short list of candidates rather than the entire database.
• Example of a technique: Vocabulary Tree (VT)-Based Retrieval
Girod et al., IEEE Multimedia, 2011
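A toy sketch of the short-list idea: each database image is reduced to a set of visual-word IDs, an inverted index maps each word to the sorted IDs of images containing it, and a query is scored by counting shared words. Posting lists are stored as deltas between consecutive image IDs, the index-compression trick described in the Girod et al. articles. The image and word IDs are made up, and the hierarchical quantization of a real vocabulary tree is omitted:

```python
from collections import defaultdict

def build_index(db_words):
    """db_words: {image_id: set of visual-word ids}. Returns an inverted
    index {word: delta-encoded sorted list of image ids}."""
    index = defaultdict(list)
    for img in sorted(db_words):          # sorted so posting lists are sorted
        for w in db_words[img]:
            index[w].append(img)
    # store the first id, then differences between consecutive ids
    return {w: [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
            for w, ids in index.items()}

def decode(deltas):
    """Recover the sorted image ids from a delta-encoded posting list."""
    ids, cur = [], 0
    for d in deltas:
        cur += d
        ids.append(cur)
    return ids

def short_list(index, query_words, k=2):
    """Vote for images sharing visual words with the query; return the
    top-k candidates for slower pairwise verification."""
    votes = defaultdict(int)
    for w in query_words:
        for img in decode(index.get(w, [])):
            votes[img] += 1
    return sorted(votes, key=lambda i: -votes[i])[:k]
```

Delta-encoding works because sorted IDs yield small differences that compress well (with carryover, RBUC, or arithmetic codes in the real system); here only the encoding itself is shown.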
28. MVS: geometric verification
• Goal: use location information of features in
query and database images to confirm that the
feature matches are consistent with a change in
view-point between the two images.
Girod et al., IEEE Multimedia, 2011
29. MVS: geometric verification
• Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences.
• Techniques:
– The geometric transform between the query and the database image is usually estimated using robust regression techniques such as:
• Random sample consensus (RANSAC) (Fischler and Bolles 1981)
• Hough transform (Lowe 2004)
– The transformation is often represented by an affine mapping or a homography.
• Note: GV is computationally expensive, which is why it's only used for a subset of images selected during the feature-matching stage.
[FIG4] In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric transformation.
[Sidebar] Encoding consecutive ID differences of each sorted inverted list, with carryover or recursive bottom-up complete (RBUC) codes, reduces the inverted index from nearly 10 GB to 2 GB without affecting recognition accuracy; this avoids swapping between main and virtual memory, so memory congestion no longer contributes to query latency.
Girod et al., IEEE Multimedia, 2011
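A minimal RANSAC sketch of the GV step using an affine model: repeatedly fit a transform to three random correspondences and keep the model with the most inliers. The iteration count, inlier threshold, and synthetic test data are arbitrary assumptions; production systems add degeneracy checks, refitting on inliers, and homography models.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine transform A with dst ~ A @ [src; 1]."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])         # n x 3
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)   # 3 x 2 solution
    return A.T                                    # 2 x 3

def ransac_affine(src, dst, iters=200, thresh=2.0, seed=0):
    """RANSAC (Fischler and Bolles 1981): sample 3 correspondences,
    fit an affine model, count inliers, keep the best. Sketch only."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        pred = (A[:, :2] @ src.T + A[:, 2:3]).T
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

A candidate image is accepted only if enough correspondences survive as inliers, which is what makes GV expensive and worth restricting to the short list.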
30. Datasets for MVS research
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Key characteristics:
• rigid objects
• widely varying lighting conditions
• perspective distortion
• foreground and background clutter
• realistic ground-truth reference data
• query data collected from heterogeneous low- and high-end camera phones.
Chandrasekhar et al., ACM MMSys, 2011
31. SMVS Data Set: categories and examples
• DVD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
32. SMVS Data Set: categories and examples
• CD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
33. SMVS Data Set: categories and examples
• Museum paintings
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
34. Other MVS data sets
ISO/IEC JTC1/SC29/WG11 N12202, July 2011, Torino, IT
35. MPEG Compact Descriptors for Visual Search (CDVS)
• Objective
– Define a standard that enables efficient
implementation of visual search functionality on mobile
devices
• Scope
• bitstream of descriptors
• parts of descriptor extraction process (e.g. key-point
detection) needed to ensure interoperability
– Additional info:
• https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
• http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups)
Bober, Cordara, and Reznik (2010)
36. MPEG CDVS
• Summarized timeline
Table 1. Timeline for development of MPEG standard for visual search:
– March 2011: Call for Proposals is published (registration deadline: 11 July 2011; proposals due: 21 November 2011)
– December 2011: Evaluation of proposals
– February 2012: 1st Working Draft (first specification and test software model that can be used for subsequent improvements)
– July 2012: Committee Draft (essentially complete and stabilized specification)
– January 2013: Draft International Standard (complete specification; only minor editorial changes are allowed after DIS)
– July 2013: Final Draft International Standard (finalized specification, submitted for approval and publication as International Standard)
Girod et al., IEEE Multimedia, 2011
38. Examples
• Google Goggles
• SnapTell
• oMoby (and the IQ Engines API)
• pixlinQ
• Moodstocks
39. Examples of commercial MVS apps
• Google Goggles
– Android and iPhone
– Narrow-domain search and retrieval
http://www.google.com/mobile/goggles
40. SnapTell
• One of the earliest (ca. 2008) MVS apps for iPhone
– Eventually acquired by Amazon (A9)
• Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”)
http://www.snaptell.com/technology/index.htm
41. oMoby (and the IQ Engines API)
– iPhone app
http://omoby.com/pages/screenshots.php
42. oMoby (and the IQ Engines API)
• The IQ Engines API:
“vision as a service”
http://www.iqengines.com/applications.php
43. pixlinQ
• A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.”
– Powered by image recognition from LTU technologies
http://www.pixlinq.com/home
44. pixlinQ
• Example app (La Redoute)
http://www.youtube.com/watch?v=qUZCFtc42Q4
45. Moodstocks: overview
• Offline image recognition via smart synchronization of image signatures
http://www.youtube.com/watch?v=tsxe23b12eU
46. Moodstocks: technology
• Unique features:
– offline image recognition via smart synchronization of image signatures,
– QR code decoding,
– EAN-8/13 decoding,
– online image recognition as a fallback for very large image databases,
– simultaneous image recognition and barcode decoding,
– seamless logging of scans in the background.
• Cross-platform (iOS / Android) client-side SDK and HTTP API
available: https://github.com/Moodstocks
• JPEG encoder used within their SDK also publicly
available: https://github.com/Moodstocks/jpec
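Alongside image recognition, such apps decode barcodes. The EAN-13 check-digit rule is simple enough to show inline; this is the standard algorithm, included purely as an illustration and not code from the Moodstocks SDK:

```python
def ean13_check_digit(digits12: str) -> int:
    """Check digit for a 12-digit EAN-13 body: digits in odd positions
    (1st, 3rd, ...) weigh 1, even positions weigh 3; the check digit
    brings the weighted sum to a multiple of 10."""
    s = sum(int(d) * (1 if i % 2 == 0 else 3)
            for i, d in enumerate(digits12))
    return (10 - s % 10) % 10

def ean13_is_valid(code: str) -> bool:
    """Validate a full 13-digit EAN-13 code against its check digit."""
    return (len(code) == 13 and code.isdigit()
            and ean13_check_digit(code[:12]) == int(code[12]))
```

The checksum is why barcode decoding can run in parallel with image recognition at little cost: a misread almost always fails validation and can be discarded immediately.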
47. Moodstocks
• Many successful apps for different platforms
http://www.moodstocks.com/gallery/
49. Concluding thoughts
• Mobile Visual Search (MVS) is coming of age.
• This is not a fad and it can only grow.
• Still a good research topic
– Many relevant technical challenges
– MPEG efforts have just started
• Infinite creative commercial possibilities