Slide 1
Hashing: Object Embedding
Reporter: Xu Jiaming (Ph.D. Student)
Date: 2014.03.27
Computational-Brain Research Center
Institute of Automation, Chinese Academy of Sciences
Report
Slide 2
First, What is Embedding?
[Source]: https://en.wikipedia.org/wiki/Embedding
When some object X is said to be embedded in another object Y,
the embedding is given by some injective and structure-
preserving map f : X → Y. The precise meaning of "structure-
preserving" depends on the kind of mathematical structure of which
X and Y are instances.
Structure-Preserving in IR:
f : X → Y,  such that  Sim(X₁, X₂) ≈ Sim(Y₁, Y₂)
Slide 3
Then, What is Hash?
[Source]: https://en.wikipedia.org/wiki/Hash_table
The hash function will assign each key to a unique bucket, but this
situation is rarely achievable in practice (usually some keys will
hash to the same bucket). Instead, most hash table designs assume
that hash collisions—different keys that are assigned by the hash
function to the same bucket—will occur and must be
accommodated in some way.
Slide 4
Combine the Two Properties
[1998, Piotr Indyk, cited: 1847]
Locality Sensitive Hashing
if D(p, q) ≤ r,         then Pr[h(p) = h(q)] ≥ p₁
if D(p, q) > (1 + ε)r,  then Pr[h(p) = h(q)] ≤ p₂
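As a concrete toy instance of this definition, the random-hyperplane family (SimHash) can be sketched as follows; this is one classic LSH family, not necessarily the exact construction of the cited paper, and the dimensions, bit count, and noise level below are illustrative choices:

```python
import numpy as np

# Sketch of random-hyperplane LSH: each bit is the sign of a projection onto
# a random hyperplane, so nearby vectors collide in more bits than distant
# ones -- the locality-sensitive property stated above.
rng = np.random.default_rng(0)

def make_hash(dim, n_bits):
    planes = rng.standard_normal((n_bits, dim))             # random hyperplanes
    return lambda x: tuple(int(b) for b in (planes @ x > 0))  # sign pattern

h = make_hash(dim=50, n_bits=8)

d_near = d_far = 0
for _ in range(200):
    x = rng.standard_normal(50)
    near = x + 0.1 * rng.standard_normal(50)   # small perturbation of x
    far = rng.standard_normal(50)              # unrelated random vector
    cx, cn, cf = h(x), h(near), h(far)
    d_near += sum(a != b for a, b in zip(cx, cn))
    d_far += sum(a != b for a, b in zip(cx, cf))

# Average Hamming distance to the perturbed copy stays far below the
# distance to the random vector, mirroring Pr[h(p) = h(q)] falling with D(p, q).
print(d_near / 200, d_far / 200)
```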
Slide 9
Data-Aware: Spectral Hashing [NIPS.2008]
min:  Σ_ij S_ij ‖y_i − y_j‖²
s.t.: y_i ∈ {−1, 1}^k
      Σ_i y_i = 0
      (1/n) Σ_i y_i y_iᵀ = I

In matrix form:

min:  trace(Yᵀ(D − W)Y)
s.t.: Y(i, j) ∈ {−1, 1}
      Yᵀ1 = 0
      YᵀY = I

Laplacian Eigenmap
XW = Y
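The spectral relaxation above can be sketched on toy data: drop the binary constraint, solve the Laplacian eigenproblem (the Laplacian Eigenmap step), and binarize by sign. The two-cluster dataset and Gaussian affinity below are made-up illustrations; the actual paper instead uses analytic eigenfunctions of an assumed uniform data distribution for efficiency.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)),
               rng.normal(5, 1, (20, 2))])          # two separated clusters

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
W = np.exp(-sq / 2.0)                                # Gaussian affinity S_ij
D = np.diag(W.sum(axis=1))                           # degree matrix
L = D - W                                            # graph Laplacian

# Smallest non-trivial eigenvectors of L (skip the constant one at
# eigenvalue 0), thresholded at zero to obtain k-bit codes.
k = 2
vals, vecs = np.linalg.eigh(L)                       # eigenvalues ascending
Y = np.sign(vecs[:, 1:k + 1])

# The first bit follows the Fiedler vector and separates the two clusters.
print(Y[:3, 0], Y[20:23, 0])
```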
Slide 10
Some Questions?
1. Can we obtain hashing codes by binarizing the real-valued low-
dimensional vectors such as LSI?
2. Can we get hashing codes by Deep Learning approaches such
as RBM, or AutoEncoder?
Slide 11
Some Questions?
1. Can we obtain hashing codes by binarizing the real-valued low-
dimensional vectors such as LSI?
Of course!
[R. Salakhutdinov, G. Hinton. Semantic Hashing, SIGIR2007]
2. Can we get hashing codes by Deep Learning approaches such
as RBM, or AutoEncoder?
No problem!
[R. Salakhutdinov, G. Hinton. Semantic Hashing, SIGIR2007]
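Question 1 can be illustrated with a minimal sketch: run LSI (truncated SVD) on a tiny made-up term-document matrix and threshold the low-dimensional document vectors. This only demonstrates the binarization idea; it is not the Semantic Hashing model itself, which trains a deep autoencoder.

```python
import numpy as np

# 6 terms x 4 documents: docs 0,1 share one group of terms,
# docs 2,3 share a different group (values are made up).
A = np.array([[2, 3, 0, 0],
              [1, 2, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 3, 2],
              [0, 0, 1, 3],
              [0, 0, 2, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs = Vt[:k].T                                   # LSI document vectors, 4 x k
codes = (docs > docs.mean(axis=0)).astype(int)    # binarize per dimension

# Documents with similar term usage receive identical codes,
# and the two groups receive different codes.
print(codes)
```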
Slide 13
1/9 - ICML2013:
Title: Learning Hash Functions Using Column Generation
Authors: Xi Li, Guosheng Lin, Chunhua Shen, Anton van den Hengel, Anthony Dick
Organization: The University of Adelaide (Australia)
Based On: NIPS2005: Distance Metric Learning for Large Margin Nearest Neighbor Classification
Motivation: In content-based image retrieval, to collect feedback, users may be asked to report
whether image x looks more similar to x⁺ than to a third image x⁻. This task is typically much
easier than labeling each individual image.
min_{w,ξ}  1ᵀw + C Σ_{i=1}^{J} ξ_i
s.t.:  w ≥ 0,  ξ ≥ 0;
       d_H(x_i, x_i⁻) − d_H(x_i, x_i⁺) ≥ 1 − ξ_i,  ∀i
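The triplet constraint above can be checked numerically for one made-up triplet, taking d_H to be a weighted Hamming distance with nonnegative bit weights w (all codes and weights below are illustrative, not learned):

```python
import numpy as np

w = np.array([0.5, 1.0, 0.25])              # nonnegative per-bit weights

def d_h(code_a, code_b):
    # Weighted Hamming distance: sum of weights over disagreeing bits.
    return float(w @ (code_a != code_b))

x     = np.array([1, 1, 0])
x_pos = np.array([1, 1, 1])                 # labeled similar to x
x_neg = np.array([0, 0, 1])                 # labeled dissimilar to x

# The constraint requires the dissimilar pair to be farther than the
# similar pair by a margin of 1, up to slack xi_i.
margin = d_h(x, x_neg) - d_h(x, x_pos)
xi = max(0.0, 1.0 - margin)                 # slack needed for this triplet
print(margin, xi)
```

Here d_h(x, x⁻) = 1.75 and d_h(x, x⁺) = 0.25, so the margin of 1.5 already exceeds 1 and no slack is needed.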
Slide 14
2/9 - ICML2013:
Title: Predictable Dual-View Hashing
Authors: Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daume III, Larry S. Davis
Organization: The University of Maryland (USA)
Motivation: It is often the case that information about data is available from two or more views, e.g.,
images and their textual descriptions. It is highly desirable to embed information from both domains in
the binary codes, to increase search and retrieval capabilities.
min_{W_T, W_V, Y_T, Y_V}  ‖W_TᵀX_T − Y_T‖² + ‖Y_T Y_Tᵀ − I‖² + ‖W_VᵀX_V − Y_V‖² + ‖Y_V Y_Vᵀ − I‖²
s.t.:  Y_T = sgn(W_TᵀX_T)
       Y_V = sgn(W_VᵀX_V)
Slide 15
3/9 - SIGIR2013:
Title: Semantic Hashing Using Tags and Topic Modeling.
Authors: Qifan Wang, Dan Zhang, Luo Si
Organization: Purdue University (USA)
Motivation: Two major issues are not addressed in the existing hashing methods: (1) Tag information
is not fully utilized in previous methods. Most existing methods only deal with the contents of
documents without utilizing the information contained in tags; (2) Document similarity in the
original keyword feature space is used as guidance for generating hashing codes in previous methods,
which may not fully reflect the semantic relationship.
min_{Y,U}  ‖T − UᵀY‖²_F + C‖U‖²_F + γ‖Y − θ‖²_F
s.t.:  Y ∈ {−1, 1}^{k×n},  Y1 = 0
Slide 16
3/9 - SIGIR2013 (cont.): Semantic Hashing Using Tags and Topic Modeling
[Results figure: experiments on 20Newsgroups]
Slide 17
4/9 - IJCAI2013:
Title: A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and
Hashing.
Authors: Debing Zhang, Genmao Yang, Yao Hu, Zhongming Jin, Deng Cai, Xiaofei He
Organization: Zhejiang University (China)
Motivation: Traditionally, to solve the problem of nearest neighbor search, researchers have mainly
focused on building effective data structures, such as hierarchical k-means trees, or on using hashing
methods to accelerate the query process. In this paper, the authors propose a novel unified approximate
nearest neighbor search scheme that combines the advantages of an effective data structure with the
fast Hamming distance computation of hashing methods.
Slide 18
5/9 - CVPR2013:
Title: K-means Hashing: an Affinity-Preserving Quantization Method for Learning Binary
Compact Codes.
Authors: Kaiming He, Fang Wen, Jian Sun
Organization: Microsoft Research Asia (China)
Motivation: Both Hamming-based and lookup-based methods have attracted growing interest recently,
and each category has its benefits depending on the scenario. Lookup-based methods have been shown
to be more accurate than some Hamming-based methods at the same code length. However, lookup-based
distance computation is slower than Hamming distance computation. Hamming-based methods also have
the advantage that the distance computation is problem-independent.
E_aff = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} w_ij ( d(c_i, c_j) − d_h(i, j) )²
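The affinity-preserving error above can be evaluated directly on a toy 2-bit codebook (k = 4 codewords at the corners of a unit square), with the pair weights w_ij set to 1 uniformly for simplicity (the paper weights pairs by how often they occur in the data):

```python
import numpy as np
from itertools import product

def e_aff(centers):
    # E_aff with w_ij = 1: squared gap between the Euclidean distance of
    # codewords c_i, c_j and the Hamming distance of their binary indices.
    k = len(centers)
    err = 0.0
    for i, j in product(range(k), range(k)):
        d_euc = np.linalg.norm(centers[i] - centers[j])
        d_ham = bin(i ^ j).count("1")        # Hamming distance of indices i, j
        err += (d_euc - d_ham) ** 2
    return err

# Indexing the corners so that 1-bit flips move to adjacent corners
# preserves affinity better than an indexing where 1-bit flips jump
# across the diagonal.
good = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # index i at its bits
bad  = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], float)
print(e_aff(good), e_aff(bad))
```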
Slide 19
6/9 - ICCV2013:
Title: Complementary Projection Hashing.
Authors: Zhongming Jin¹, Yao Hu¹, Yue Lin¹, Debing Zhang¹, Shiding Lin², Deng Cai¹, Xuelong Li³
Organization: 1. Zhejiang University, 2. Baidu Inc., 3. Chinese Academy of Sciences, Xi’an (China)
Motivation: 1. (a) Hyperplane a crosses a sparse region, and the neighbors are quantized into the
same bucket; (b) hyperplane b crosses a dense region, and the neighbors are quantized into different
buckets. Apparently, hyperplane a is more suitable as a hashing function. 2. (a)(b) Both hyperplane a
and hyperplane b can evenly separate the data. (c) However, putting them together does not generate a
good two-bit hash function. (d) A better example of a two-bit hash function.
Slide 20
7/9 - CVPR2013:
Title: Hash Bit Selection: a Unified Solution for Selection Problems in Hashing.
Authors: Xianglong Liu¹, Junfeng He²,³, Bo Lang¹, Shih-Fu Chang².
Organization: 1. Beihang University(China), 2. Columbia University(US), 3. Facebook(US)
Motivation: Recent years have witnessed the active development of hashing techniques for nearest
neighbor search over big datasets. However, to apply hashing techniques successfully, several
important issues remain open: selecting features, hashing algorithms, parameter settings, kernels, etc.
Slide 21
8/9 - ICCV2013:
Title: A General Two-Step Approach to Learning-Based Hashing.
Authors: Guosheng Lin, Chunhua Shen, David Suter, Anton van den Hengel.
Organization: University of Adelaide (Australia)
Based On: SIGIR2010: Self-Taught Hashing for Fast Similarity Search
Motivation: Most existing approaches to hashing apply a single form of hash function, and an
optimization process that is typically deeply coupled to this specific form. This tight coupling
restricts the flexibility of the method to respond to the data, and can result in complex optimization
problems that are difficult to solve. Their framework decomposes the hashing learning problem into two
steps: hash bit learning, and hash function learning based on the learned bits.
Slide 22
9/9 - IJCAI2013:
Title: Smart Hashing Update for Fast Response.
Authors: Qiang Yang, Long-Kai Huang, Wei-Shi Zheng, Yingbiao Ling.
Organization: Sun Yat-sen University (China)
Based On: DMKD2012: Active Hashing and Its Application to Image and Text Retrieval
Motivation: Although most existing hashing-based methods have been shown to obtain high accuracy,
they are regarded as passive hashing and assume that the labeled points are provided in advance. In this
paper, the authors consider updating a hashing model over gradually increasing labeled data while
responding quickly to users, called smart hashing update (SHU).
1. Consistency-based Selection;
2. Similarity-based Selection.
[CVPR.2012]
Diff(k, j) = min{ num(k, j, −1), num(k, j, 1) }

min_{H_l ∈ {−1,1}^{l×r}}  Q = ‖(1/r) H_l H_lᵀ − S_l‖²_F

min_{k ∈ {1,2,…,r}}  R_k = ‖rS − H_{r−1}^k (H_{r−1}^k)ᵀ‖²_F
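One plausible reading of the consistency-based score above (an assumption for illustration, not taken verbatim from the paper): num(k, j, v) counts the labeled similar pairs in set j whose codes agree (v = 1) or disagree (v = −1) on bit k, so Diff(k, j) is small when bit k treats the pairs decisively and large when the bit is ambiguous. A toy computation with made-up codes and pairs:

```python
import numpy as np

H = np.array([[ 1,  1, -1],
              [ 1, -1, -1],
              [ 1,  1,  1],
              [-1, -1,  1]])               # 4 labeled items x r = 3 bits

similar_pairs = [(0, 1), (0, 2)]           # labeled "must-link" pairs

def diff(k, pairs):
    # num(k, j, 1): pairs whose codes agree on bit k; num(k, j, -1): the rest.
    agree = sum(int(H[a, k] == H[b, k]) for a, b in pairs)
    disagree = len(pairs) - agree
    return min(agree, disagree)

scores = [diff(k, similar_pairs) for k in range(H.shape[1])]
print(scores)  # bit 0 agrees on every similar pair, so its score is 0
```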
Slide 23
Reporter: Xu Jiaming (Ph.D. Student)
Date: 2014.03.27
Computational-Brain Research Center
Institute of Automation, Chinese Academy of Sciences
[Speaker notes] This ICML paper is work from the University of Maryland (USA). Its motivation is that information is now typically acquired through multiple channels, for example images together with their textual descriptions; the paper learns hash codes from both domains at once, so that text and images are ultimately mapped into the same Hamming space for retrieval. The lower-right corner shows their experimental examples: in the bottom row, the query "Plane flying on the air" indeed returns pictures of planes flying in the sky. Likewise, for the top query "Laptop placed on the table", the returned images contain not only laptops but also a television.
[Speaker notes] This SIGIR paper is work from Purdue University (USA). The head of their lab is Luo Si, who completed his bachelor's and master's degrees at Tsinghua University, then went to CMU for another master's and a Ph.D., and afterwards joined the faculty at Purdue, focusing on information retrieval, machine learning, and natural language processing. Most of his students also came to Purdue from Tsinghua, and the lab has done a lot of meaningful work in IR. The motivation of this paper comes from two directions: (1) existing hashing methods do not make use of tag information; since some earlier hashing work did in fact consider it, the authors add the qualifier "fully"; (2) earlier hashing work preserved inter-document similarity through the original document features, which cannot fully reflect the semantic relationships between documents, so they introduce topic modeling. When I was writing a paper last year I also surveyed this area and indeed found no prior work introducing topic models into hashing, so with a newcomer's fearlessness I wrote up a fast retrieval method based on topic features. Accordingly, this paper also claims that, as far as the authors know, theirs is the first work to bring topic models into hashing. In the formula, θ is the topic feature, T is the tag information, and U is a latent variable that models the tags; the constraints apply to the hash codes. The lower-right corner shows their experimental results on 20Newsgroups and WebKB; the improvement is actually not very pronounced, since apart from SSH, none of the baselines uses tag information.