Mobile Visual Search (MVS) is a fascinating research field with the potential to change how visual data is organized, annotated, and retrieved using mobile devices. This deck outlines opportunities in MVS, basic concepts, and technical aspects of MVS systems. It walks through the MVS pipeline (interest point detection, feature descriptor computation, feature indexing/matching, and geometric verification) and covers challenges such as low latency, robust recognition, and handling broad and narrow domains. The Compressed Histogram of Gradients (CHoG) descriptor is presented as an example of a compact descriptor designed for MVS.
Mobile Visual Search: Driving Factors and Technical Challenges
1. Mobile Visual Search
Oge Marques
Florida Atlantic University
Universitat Politècnica de Catalunya
Barcelona
2 Mar 2012
2. Take-home message
Mobile Visual Search (MVS) is a fascinating research
field with many open challenges and opportunities
which have the potential to impact the way we
organize, annotate, and retrieve visual data (images
and videos) using mobile devices.
3. Outline
• This talk is structured in four parts:
1. Opportunities
2. Basic concepts
3. Technical aspects
4. Examples and applications
5. Mobile visual search: driving factors
• Age of mobile computing
http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
6. Mobile visual search: driving factors
• Why do I need a camera? I have a smartphone…
(22 Dec 2011)
http://www.cellular-news.com/story/52382.php
7. Mobile visual search: driving factors
• Powerful devices
– 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset
http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
8. Mobile visual search: driving factors
• Powerful devices
http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/
9. Mobile visual search: driving factors
• Social networks and mobile devices (May 2011)
http://jess3.com/geosocial-universe-2/
10. Mobile visual search: driving factors
• Social networks and mobile devices
– Motivated users: image taking and image sharing are
huge!
http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
11. Mobile visual search: driving factors
• Instagram:
– 15 million registered users (in 13 months)
– 7 employees
– A growing ecosystem based on it!
• Search
• Send postcards
• Manage your photos
• Build a poster
• etc.
http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
http://www.nuwomb.com/instagram/
12. Mobile visual search: driving factors
• Legitimate (or not quite…) needs and use cases
http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575
13. Mobile visual search: driving factors
• A natural use case for CBIR with QBE (at last!)
– The example is right in front of the user!
[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.
Girod et al., IEEE Multimedia, 2011
15. MVS: technical challenges
• How to ensure low latency (and interactive
queries) under constraints such as:
– Network bandwidth
– Computational power
– Battery consumption
• How to achieve robust visual recognition in spite
of low-resolution cameras, varying lighting
conditions, etc.
• How to handle broad and narrow domains
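To make the bandwidth constraint concrete, a back-of-the-envelope sketch of query transmission time. All sizes and link rates below are illustrative assumptions, not figures from the talk:

```python
# Back-of-the-envelope query latency: sending a JPEG query image versus
# compact compressed features over a slow uplink. The link rate and the
# payload sizes are assumed values for illustration only.

UPLINK_BPS = 256_000          # assumed 3G uplink: 256 kbit/s
JPEG_QUERY_BYTES = 50_000     # assumed ~50 KB JPEG query image
CHOG_QUERY_BYTES = 4_000      # assumed ~4 KB of compressed descriptors

def transmit_seconds(num_bytes: int, bps: int) -> float:
    """Time to push num_bytes over a link of bps bits per second."""
    return num_bytes * 8 / bps

jpeg_t = transmit_seconds(JPEG_QUERY_BYTES, UPLINK_BPS)   # ~1.56 s
chog_t = transmit_seconds(CHOG_QUERY_BYTES, UPLINK_BPS)   # ~0.13 s
```

Under these assumptions, shrinking the query payload by an order of magnitude cuts transmission time by the same factor, which is the motivation for sending compact descriptors instead of images.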
16. MVS: Pipeline for image retrieval
Girod et al., IEEE Multimedia, 2011
19. Part III - Outline
• The MVS pipeline in greater detail
• Datasets for MVS research
• MPEG Compact Descriptors for Visual Search
(CDVS)
20. MVS: descriptor extraction
• Interest point detection
• Feature descriptor computation
Girod et al., IEEE Multimedia, 2011
21. Interest point detection
• Numerous interest-point detectors have been proposed in
the literature:
– Harris Corners (Harris and Stephens 1988)
– Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian
(DoG) (Lowe 2004)
– Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
– Hessian affine (Mikolajczyk et al. 2005)
– Features from Accelerated Segment Test (FAST) (Rosten and
Drummond 2006)
– Hessian blobs (Bay, Tuytelaars and Van Gool 2006)
• Different tradeoffs in repeatability and complexity
• See (Mikolajczyk and Schmid 2005) for a comparative
performance evaluation of local descriptors in a common
framework.
Girod et al., IEEE Signal Processing Magazine, 2011
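As a concrete illustration of one detector from the list above, a minimal Harris corner response (Harris and Stephens 1988) in plain numpy. This is a sketch, not a tuned implementation: the 3x3 box window and the constant k are arbitrary choices, and real detectors smooth with a Gaussian.

```python
import numpy as np

def harris_response(img: np.ndarray, k: float = 0.05) -> np.ndarray:
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the 2x2 structure tensor of image gradients summed over a local
    window. Minimal illustrative sketch (box window, no Gaussian)."""
    img = img.astype(float)
    # central-difference gradients (borders left at zero)
    Ix = np.zeros_like(img); Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    # structure-tensor entries, summed over a 3x3 window
    def box3(a):
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```

On a synthetic image containing a single bright square, the response peaks at the square's inner corner: corners have strong gradients in both directions (large det), while edges and flat regions score low or negative.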
22. Feature descriptor computation
• After interest-point detection, we compute a
visual word descriptor on a normalized patch.
• Ideally, descriptors should be:
– robust to small distortions in scale, orientation, and
lighting conditions;
– discriminative, i.e., characteristic of an image or a small
set of images;
– compact, due to typical mobile computing constraints.
Girod et al., IEEE Signal Processing Magazine, 2011
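The properties above can be illustrated with a toy descriptor: a single magnitude-weighted histogram of gradient orientations over a normalized patch. Real descriptors such as SIFT or CHoG aggregate many such histograms over a spatial grid, so this single-cell version is only a sketch:

```python
import numpy as np

def orientation_histogram(patch: np.ndarray, bins: int = 8) -> np.ndarray:
    """Toy single-cell descriptor: a histogram of gradient orientations
    weighted by gradient magnitude, L2-normalized. Normalization gives
    some robustness to lighting; binning orientations gives robustness
    to small distortions. Illustrative sketch only."""
    p = patch.astype(float)
    gx = p[:, 1:] - p[:, :-1]     # horizontal finite differences
    gy = p[1:, :] - p[:-1, :]     # vertical finite differences
    h = min(gx.shape[0], gy.shape[0])
    w = min(gx.shape[1], gy.shape[1])
    gx, gy = gx[:h, :w], gy[:h, :w]
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                        # in (-pi, pi]
    idx = ((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```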
23. Feature descriptor computation
• Examples of feature descriptors in the literature:
– SIFT (Lowe 1999)
– Speeded-Up Robust Features (SURF) (Bay et al. 2008)
– Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005)
– Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010)
• See (Winder and Brown, CVPR 2007), (Winder, Hua, and Brown, CVPR 2009), and (Mikolajczyk and Schmid, PAMI 2005) for comparative performance evaluations of different descriptors.
Girod et al., IEEE Signal Processing Magazine, 2011
24. Feature descriptor computation
• What about compactness?
– Option 1: Compress off-the-shelf descriptors.
• Result: poor rate-constrained image-retrieval
performance.
– Option 2: Design a descriptor with compression in
mind.
– Example: CHoG (Compressed Histogram of Gradients)
(Chandrasekhar et al. 2009, 2010)
Girod et al., IEEE Signal Processing Magazine, 2011
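A minimal sketch of the "compress off-the-shelf descriptors" route: uniform scalar quantization of a descriptor with entries in [0, 1]. The dimension (128) and bit depth (2 bits per dimension) are illustrative assumptions; CHoG instead designs quantization into the descriptor itself, which is why it performs better at a given rate.

```python
import numpy as np

def quantize(desc: np.ndarray, bits: int = 2) -> np.ndarray:
    """Uniformly quantize a descriptor with entries in [0, 1] to
    2**bits levels. A crude stand-in for 'Option 1' above;
    illustrative sketch only."""
    levels = 2 ** bits
    q = np.floor(desc * levels).astype(int)
    return np.clip(q, 0, levels - 1)

# Rate comparison for a 128-dimensional descriptor:
float_bits = 128 * 32    # 4,096 bits as 32-bit floats
coarse_bits = 128 * 2    # 256 bits at 2 bits per dimension
```

Even this naive scheme cuts the rate 16x; the catch, as the slide notes, is that retrieval performance degrades badly when the descriptor was not designed with such coarse quantization in mind.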
25. CHoG: Compressed Histogram of Gradients
[Figure] CHoG pipeline: patch → gradients (dx, dy) → spatial binning → gradient distribution for each bin → histogram compression → compressed bitstream (CHoG descriptor).
Bernd Girod: Mobile Visual Search; Chandrasekhar et al., CVPR 2009, 2010
26. CHoG: Compressed Histogram of Gradients
• Performance evaluation
– Recall vs. bit rate
[Figure 7] Comparison of different schemes (send feature CHoG, send image JPEG, send feature SIFT) with regard to classification accuracy and query size. CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.
Girod et al., IEEE Multimedia, 2011
27. MVS: feature indexing and matching
• Goal: produce a data structure that can quickly return a short
list of the database candidates most likely to match the query
image.
– The short list may contain false positives as long as the correct match
is included.
– Slower pairwise comparisons can be subsequently performed on just
the short list of candidates rather than the entire database.
• Example of a technique: Vocabulary Tree (VT)-Based Retrieval
Girod et al., IEEE Multimedia, 2011
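A toy sketch of the short-list idea: each database image is reduced to a set of visual-word IDs, an inverted index maps each word to the sorted IDs of images containing it, and a query is scored by counting shared words. Posting lists are stored as deltas between consecutive image IDs, the index-compression trick described in the Girod et al. articles. The image and word IDs are made up, and the hierarchical quantization of a real vocabulary tree is omitted:

```python
from collections import defaultdict

def build_index(db_words):
    """db_words: {image_id: set of visual-word ids}. Returns an inverted
    index {word: delta-encoded sorted list of image ids}."""
    index = defaultdict(list)
    for img in sorted(db_words):          # sorted so posting lists are sorted
        for w in db_words[img]:
            index[w].append(img)
    # store the first id, then differences between consecutive ids
    return {w: [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
            for w, ids in index.items()}

def decode(deltas):
    """Recover the sorted image ids from a delta-encoded posting list."""
    ids, cur = [], 0
    for d in deltas:
        cur += d
        ids.append(cur)
    return ids

def short_list(index, query_words, k=2):
    """Vote for images sharing visual words with the query; return the
    top-k candidates for slower pairwise verification."""
    votes = defaultdict(int)
    for w in query_words:
        for img in decode(index.get(w, [])):
            votes[img] += 1
    return sorted(votes, key=lambda i: -votes[i])[:k]
```

Delta-encoding works because sorted IDs yield small differences that compress well (with carryover, RBUC, or arithmetic codes in the real system); here only the encoding itself is shown.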
28. MVS: geometric verification
• Goal: use location information of features in
query and database images to confirm that the
feature matches are consistent with a change in
view-point between the two images.
Girod et al., IEEE Multimedia, 2011
29. MVS: geometric verification
• Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences.
• Techniques:
– The geometric transform between the query and the database image is usually estimated using robust regression techniques such as:
• Random sample consensus (RANSAC) (Fischler and Bolles 1981)
• Hough transform (Lowe 2004)
– The transformation is often represented by an affine mapping or a homography.
• Note: GV is computationally expensive, which is why it's only used for a subset of images selected during the feature-matching stage.
[FIG4] In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric transformation.
[Sidebar] Encoding consecutive ID differences of each sorted inverted list, with carryover or recursive bottom-up complete (RBUC) codes, reduces the inverted index from nearly 10 GB to 2 GB without affecting recognition accuracy; this avoids swapping between main and virtual memory, so memory congestion no longer contributes to query latency.
Girod et al., IEEE Multimedia, 2011
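A minimal RANSAC sketch of the GV step using an affine model: repeatedly fit a transform to three random correspondences and keep the model with the most inliers. The iteration count, inlier threshold, and synthetic test data are arbitrary assumptions; production systems add degeneracy checks, refitting on inliers, and homography models.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine transform A with dst ~ A @ [src; 1]."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])         # n x 3
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)   # 3 x 2 solution
    return A.T                                    # 2 x 3

def ransac_affine(src, dst, iters=200, thresh=2.0, seed=0):
    """RANSAC (Fischler and Bolles 1981): sample 3 correspondences,
    fit an affine model, count inliers, keep the best. Sketch only."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        pred = (A[:, :2] @ src.T + A[:, 2:3]).T
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

A candidate image is accepted only if enough correspondences survive as inliers, which is what makes GV expensive and worth restricting to the short list.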
30. Datasets for MVS research
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Key characteristics:
• rigid objects
• widely varying lighting conditions
• perspective distortion
• foreground and background clutter
• realistic ground-truth reference data
• query data collected from heterogeneous low- and high-end camera phones.
Chandrasekhar et al., ACM MMSys, 2011
31. SMVS Data Set: categories and examples
• DVD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
32. SMVS Data Set: categories and examples
• CD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
33. SMVS Data Set: categories and examples
• Museum paintings
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
34. Other MVS data sets
ISO/IEC JTC1/SC29/WG11 N12202, July 2011, Torino, IT
35. MPEG Compact Descriptors for Visual Search (CDVS)
• Objective
– Define a standard that enables efficient
implementation of visual search functionality on mobile
devices
• Scope
• bitstream of descriptors
• parts of descriptor extraction process (e.g. key-point
detection) needed to ensure interoperability
– Additional info:
• https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
• http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups)
Bober, Cordara, and Reznik (2010)
36. MPEG CDVS
• Summarized timeline
Table 1. Timeline for development of MPEG standard for visual search:
– March 2011: Call for Proposals is published (registration deadline: 11 July 2011; proposals due: 21 November 2011)
– December 2011: Evaluation of proposals
– February 2012: 1st Working Draft (first specification and test software model that can be used for subsequent improvements)
– July 2012: Committee Draft (essentially complete and stabilized specification)
– January 2013: Draft International Standard (complete specification; only minor editorial changes are allowed after DIS)
– July 2013: Final Draft International Standard (finalized specification, submitted for approval and publication as International Standard)
Girod et al., IEEE Multimedia, 2011
38. Examples
• Google Goggles
• SnapTell
• oMoby (and the IQ Engines API)
• pixlinQ
• Moodstocks
39. Examples of commercial MVS apps
• Google Goggles
– Android and iPhone
– Narrow-domain search and retrieval
http://www.google.com/mobile/goggles
40. SnapTell
• One of the earliest (ca. 2008) MVS apps for iPhone
– Eventually acquired by Amazon (A9)
• Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”)
http://www.snaptell.com/technology/index.htm
41. oMoby (and the IQ Engines API)
– iPhone app
http://omoby.com/pages/screenshots.php
42. oMoby (and the IQ Engines API)
• The IQ Engines API:
“vision as a service”
http://www.iqengines.com/applications.php
43. pixlinQ
• A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.”
– Powered by image recognition from LTU technologies
http://www.pixlinq.com/home
44. pixlinQ
• Example app (La Redoute)
http://www.youtube.com/watch?v=qUZCFtc42Q4
45. Moodstocks: overview
• Offline image recognition via smart synchronization of image signatures
http://www.youtube.com/watch?v=tsxe23b12eU
46. Moodstocks: technology
• Unique features:
– offline image recognition via smart synchronization of image signatures,
– QR code decoding,
– EAN-8/13 decoding,
– online image recognition as a fallback for very large image databases,
– simultaneous image recognition and barcode decoding,
– seamless logging of scans in the background.
• Cross-platform (iOS / Android) client-side SDK and HTTP API
available: https://github.com/Moodstocks
• JPEG encoder used within their SDK also publicly
available: https://github.com/Moodstocks/jpec
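Alongside image recognition, such apps decode barcodes. The EAN-13 check-digit rule is simple enough to show inline; this is the standard algorithm, included purely as an illustration and not code from the Moodstocks SDK:

```python
def ean13_check_digit(digits12: str) -> int:
    """Check digit for a 12-digit EAN-13 body: digits in odd positions
    (1st, 3rd, ...) weigh 1, even positions weigh 3; the check digit
    brings the weighted sum to a multiple of 10."""
    s = sum(int(d) * (1 if i % 2 == 0 else 3)
            for i, d in enumerate(digits12))
    return (10 - s % 10) % 10

def ean13_is_valid(code: str) -> bool:
    """Validate a full 13-digit EAN-13 code against its check digit."""
    return (len(code) == 13 and code.isdigit()
            and ean13_check_digit(code[:12]) == int(code[12]))
```

The checksum is why barcode decoding can run in parallel with image recognition at little cost: a misread almost always fails validation and can be discarded immediately.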
47. Moodstocks
• Many successful apps for different platforms
http://www.moodstocks.com/gallery/
49. Concluding thoughts
• Mobile Visual Search (MVS) is coming of age.
• This is not a fad and it can only grow.
• Still a good research topic
– Many relevant technical challenges
– MPEG efforts have just started
• Infinite creative commercial possibilities