20140327 - Hashing Object Embedding

  1. Hashing: Object Embedding. Reporter: Xu Jiaming (Ph.D. Student). Date: 2014.03.27. Computational-Brain Research Center, Institute of Automation, Chinese Academy of Sciences.
  2. First, What is Embedding? [Source: https://en.wikipedia.org/wiki/Embedding] When some object X is said to be embedded in another object Y, the embedding is given by some injective and structure-preserving map f : X → Y. The precise meaning of "structure-preserving" depends on the kind of mathematical structure of which X and Y are instances. Structure-preserving in IR: f : X → Y such that Sim(X1, X2) ≈ Sim(Y1, Y2), where X1, X2 are objects in X and Y1, Y2 are their images in Y.
  3. Then, What is a Hash? [Source: https://en.wikipedia.org/wiki/Hash_table] The hash function will assign each key to a unique bucket, but this situation is rarely achievable in practice (usually some keys will hash to the same bucket). Instead, most hash table designs assume that hash collisions—different keys that are assigned by the hash function to the same bucket—will occur and must be accommodated in some way.
  4. Combine the Two Properties: Locality Sensitive Hashing [Indyk & Motwani, 1998; cited: 1847]. A hash family is locality-sensitive if: if D(p, q) ≤ r, then Pr[h(p) = h(q)] ≥ p1; if D(p, q) > (1 + ε)r, then Pr[h(p) = h(q)] ≤ p2.
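To make the definition concrete, here is a minimal sketch of random-hyperplane LSH for cosine similarity (in the style of Charikar's scheme); the function names and parameters are illustrative, not from the slides:

```python
# Random-hyperplane LSH sketch: h(x) = sign(R @ x) for random R.
import numpy as np

def make_lsh(dim, n_bits, seed=0):
    """Sample n_bits random hyperplanes; close vectors collide on most bits."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n_bits, dim))
    return lambda x: (R @ x > 0).astype(np.uint8)

h = make_lsh(dim=2000, n_bits=32)
rng = np.random.default_rng(1)
p = rng.standard_normal(2000)
q = p + 0.1 * rng.standard_normal(2000)   # a near neighbor of p
r = rng.standard_normal(2000)             # an unrelated point

# A near pair differs on few bits; an unrelated pair on roughly half.
print("Hamming(p, q):", int(np.sum(h(p) != h(q))))
print("Hamming(p, r):", int(np.sum(h(p) != h(r))))
```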
  5. Overview of Hashing: binary reduction maps the real world into a binary space, e.g., a real-valued vector of 2000 values reduced to a 32-bit code.
  6. Facing Big Data: Approximation.
  7. Learning to Hash. Data-Oblivious methods: LSH, Kernel-LSH, SimHash, …; Data-Aware methods: LSI, RBM, SpH, STH, …
  8. Data-Oblivious: SimHash [WWW 2007]. Step 1: Compute TF-IDF weights W1, …, Wn for the observed features of the text. Step 2: Hash each feature to a bit string (e.g., W1 → 100110). Step 3: Form the signed signature: in each feature's bit string, replace a 1 bit with +Wi and a 0 bit with −Wi. Step 4: Sum the signed signatures column-wise (e.g., 13, 108, −22, −5, −32, 55). Step 5: Generate the fingerprint from the signs of the sums (e.g., 1, 1, 0, 0, 0, 1).
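The five steps translate almost directly into code. A minimal sketch, assuming MD5 as the per-feature hash function; the helper names are mine, not the paper's:

```python
# SimHash sketch following the slide's five steps.
import hashlib

def simhash(weighted_features, n_bits=64):
    """weighted_features: dict mapping feature (word) -> TF-IDF weight."""
    v = [0.0] * n_bits
    for feat, w in weighted_features.items():          # Steps 1-2: weights + hash
        digest = int(hashlib.md5(feat.encode()).hexdigest(), 16)
        for i in range(n_bits):                        # Step 3: signed signature
            v[i] += w if (digest >> i) & 1 else -w     # Step 4: column-wise sum
    # Step 5: fingerprint = sign pattern of the summed columns
    return sum(1 << i for i in range(n_bits) if v[i] > 0)

doc_a = {"hashing": 3.2, "binary": 1.5, "codes": 1.1}
doc_b = {"hashing": 3.0, "binary": 1.4, "search": 0.9}
fp_a, fp_b = simhash(doc_a), simhash(doc_b)
print("Hamming distance:", bin(fp_a ^ fp_b).count("1"))
```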
  9. Data-Aware: Spectral Hashing [NIPS 2008]. min Σ_ij S_ij ||y_i − y_j||², s.t. y_i ∈ {−1, 1}^k, Σ_i y_i = 0, (1/n) Σ_i y_i y_iᵀ = I. In matrix form: min_Y trace(Yᵀ(D − W)Y), s.t. Y(i, j) ∈ {−1, 1}, Yᵀ1 = 0, YᵀY = I. Relaxing the binary constraint gives a Laplacian Eigenmap problem; out-of-sample codes via Y = XW.
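A compact illustration of the relaxed objective: dropping the binary constraint turns the problem into a Laplacian eigenproblem whose bottom eigenvectors can be thresholded into bits. This is only the generic Laplacian-eigenmap route, not the full out-of-sample algorithm of the NIPS 2008 paper; the Gaussian affinity and its sigma parameter are assumptions:

```python
# Relaxed spectral hashing sketch: Laplacian eigenvectors, then thresholding.
import numpy as np

def spectral_codes(X, k_bits, sigma=1.0):
    # Affinity W from pairwise distances, degree D, Laplacian L = D - W
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    L = np.diag(W.sum(1)) - W
    # Smallest-eigenvalue eigenvectors minimize trace(Y^T (D - W) Y);
    # skip the trivial constant eigenvector, then binarize at zero.
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:k_bits + 1]
    return (Y > 0).astype(np.uint8)

X = np.random.default_rng(0).standard_normal((100, 5))
codes = spectral_codes(X, k_bits=8)
print(codes.shape)  # (100, 8)
```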
  10. Some Questions? 1. Can we obtain hashing codes by binarizing real-valued low-dimensional vectors such as LSI? 2. Can we get hashing codes by Deep Learning approaches such as RBM or AutoEncoder?
  11. Some Questions? 1. Can we obtain hashing codes by binarizing real-valued low-dimensional vectors such as LSI? Of course! [R. Salakhutdinov, G. Hinton. Semantic Hashing, SIGIR 2007] 2. Can we get hashing codes by Deep Learning approaches such as RBM or AutoEncoder? No problem! [R. Salakhutdinov, G. Hinton. Semantic Hashing, SIGIR 2007]
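For question 1, a minimal sketch of the idea: compute LSI coordinates via truncated SVD and binarize each dimension at its median. The median threshold (chosen here to keep the bits balanced) is my assumption, not a detail from the paper:

```python
# Binarized-LSI hashing sketch: truncated SVD + per-dimension median split.
import numpy as np

def lsi_hash(A, k_bits):
    """A: term-document matrix (terms x docs). Returns k_bits-bit codes per doc."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Z = (s[:k_bits, None] * Vt[:k_bits, :]).T           # docs x k_bits LSI coords
    return (Z > np.median(Z, axis=0)).astype(np.uint8)  # balanced bits per dim

A = np.random.default_rng(0).random((500, 40))  # toy 500-term, 40-doc matrix
print(lsi_hash(A, k_bits=16).shape)             # (40, 16)
```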
  12. In 2013, What Did They Think About? Total: 30 papers.
  13. 1/9 - ICML 2013: Title: Learning Hash Functions Using Column Generation. Authors: Xi Li, Guosheng Lin, Chunhua Shen, Anton van den Hengel, Anthony Dick. Organization: The University of Adelaide (Australia). Based On: NIPS 2005: Distance Metric Learning for Large Margin Nearest Neighbor Classification. Motivation: In content-based image retrieval, to collect feedback, users may be required to report whether image x looks more similar to x+ than it is to a third image x−. This task is typically much easier than labeling each individual image. min_{w,ξ} ||w||_1 + C Σ_i ξ_i, s.t. w ≥ 0, ξ ≥ 0; d_H(x_i, x_i−) − d_H(x_i, x_i+) ≥ 1 − ξ_i, ∀i.
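The constraint can be read as a hinge loss on triplets: the weighted Hamming distance to the dissimilar item x− should exceed the distance to the similar item x+ by a margin of 1. An illustrative check, where the per-bit weight vector w is my simplification of the paper's setup:

```python
# Triplet hinge-loss sketch for weighted Hamming distances.
import numpy as np

def triplet_hinge(w, h_x, h_pos, h_neg):
    d_pos = np.sum(w * (h_x != h_pos))       # weighted Hamming to x+
    d_neg = np.sum(w * (h_x != h_neg))       # weighted Hamming to x-
    return max(0.0, 1.0 - (d_neg - d_pos))   # slack xi_i for this triplet

w = np.ones(8)
h_x   = np.array([1, 0, 1, 1, 0, 0, 1, 0])
h_pos = np.array([1, 0, 1, 0, 0, 0, 1, 0])
h_neg = np.array([0, 1, 0, 1, 1, 0, 1, 1])
print(triplet_hinge(w, h_x, h_pos, h_neg))   # 0.0: constraint satisfied
```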
  14. 2/9 - ICML 2013: Title: Predictable Dual-View Hashing. Authors: Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daume III, Larry S. Davis. Organization: The University of Maryland (USA). Motivation: It is often the case that information about data is available from two or more views, e.g., images and their textual descriptions. It is highly desirable to embed information from both domains in the binary codes, to increase search and retrieval capabilities. min_{Y, W_T, W_V} ||W_Tᵀ X_T − Y||² + ||W_Vᵀ X_V − Y||² + ||Y Yᵀ − I||², s.t. Y = sgn(W_Tᵀ X_T) = sgn(W_Vᵀ X_V).
  15. 3/9 - SIGIR 2013: Title: Semantic Hashing Using Tags and Topic Modeling. Authors: Qifan Wang, Dan Zhang, Luo Si. Organization: Purdue University (USA). Motivation: Two major issues are not addressed in the existing hashing methods: (1) Tag information is not fully utilized in previous methods. Most existing methods only deal with the contents of documents without utilizing the information contained in tags; (2) Document similarity in the original keyword feature space is used as guidance for generating hashing codes in previous methods, which may not fully reflect the semantic relationship. min_{Y,U} ||T − Uᵀ Y||² + C||U||² + γ||Y − θ||², s.t. Y ∈ {−1, 1}^{k×n}, Y1 = 0.
  16. 3/9 - SIGIR 2013 (continued): Our experiments on 20Newsgroups. [Figure: retrieval results on 20Newsgroups]
  17. 4/9 - IJCAI 2013: Title: A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing. Authors: Debing Zhang, Genmao Yang, Yao Hu, Zhongming Jin, Deng Cai, Xiaofei He. Organization: Zhejiang University (China). Motivation: Traditionally, to solve the nearest neighbor search problem, researchers mainly focus on building effective data structures such as hierarchical k-means trees, or on using hashing methods to accelerate the query process. In this paper, they propose a novel unified approximate nearest neighbor search scheme to combine the advantages of both the effective data structure and the fast Hamming distance computation in hashing methods.
  18. 5/9 - CVPR 2013: Title: K-means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes. Authors: Kaiming He, Fang Wen, Jian Sun. Organization: Microsoft Research Asia (China). Motivation: Both Hamming-based methods and lookup-based methods are of growing interest recently, and each category has its benefits depending on the scenario. The lookup-based methods have been shown to be more accurate than some Hamming methods with the same code length. However, the lookup-based distance computation is slower than the Hamming distance computation. Hamming methods also have the advantage that the distance computation is problem-independent. E_aff = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} w_ij (d(c_i, c_j) − d_h(i, j))².
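A small sketch of the affinity-preserving error above: it measures how well Hamming distances between codeword indices match Euclidean distances between the k-means centers. The paper additionally rescales the Hamming distance; that scale factor is omitted here for brevity, and the names are illustrative:

```python
# Affinity-preserving quantization error E_aff for k centers and their codes.
import numpy as np
from itertools import product

def affinity_error(centers, codes, weights=None):
    """centers: (k, d) k-means centers; codes: (k, b) binary code per center."""
    k = len(centers)
    w = np.ones((k, k)) if weights is None else weights
    err = 0.0
    for i, j in product(range(k), range(k)):
        d_eu = np.linalg.norm(centers[i] - centers[j])  # d(c_i, c_j)
        d_h = np.sum(codes[i] != codes[j])              # d_h(i, j)
        err += w[i, j] * (d_eu - d_h) ** 2
    return err

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
codes = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(affinity_error(centers, codes))
```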
  19. 6/9 - ICCV 2013: Title: Complementary Projection Hashing. Authors: Zhongming Jin (1), Yao Hu (1), Yue Lin (1), Debing Zhang (1), Shiding Lin (2), Deng Cai (1), Xuelong Li (3). Organization: 1. Zhejiang University, 2. Baidu Inc., 3. Chinese Academy of Sciences, Xi'an (China). Motivation: 1. (a) The hyperplane a crosses a sparse region, and the neighbors are quantized into the same bucket; (b) the hyperplane b crosses a dense region, and the neighbors are quantized into different buckets. Apparently, hyperplane a is more suitable as a hashing function. 2. (a), (b) Both hyperplane a and hyperplane b can evenly separate the data. (c) However, putting them together does not generate a good two-bit hash function. (d) A better example of a two-bit hash function.
  20. 7/9 - CVPR 2013: Title: Hash Bit Selection: A Unified Solution for Selection Problems in Hashing. Authors: Xianglong Liu (1), Junfeng He (2, 3), Bo Lang (1), Shih-Fu Chang (2). Organization: 1. Beihang University (China), 2. Columbia University (USA), 3. Facebook (USA). Motivation: Recent years have witnessed the active development of hashing techniques for nearest neighbor search over big datasets. However, to apply hashing techniques successfully, several important issues remain open in selecting features, hashing algorithms, parameter settings, kernels, etc.
  21. 8/9 - ICCV 2013: Title: A General Two-Step Approach to Learning-Based Hashing. Authors: Guosheng Lin, Chunhua Shen, David Suter, Anton van den Hengel. Organization: University of Adelaide (Australia). Based On: SIGIR 2010: Self-Taught Hashing for Fast Similarity Search. Motivation: Most existing approaches to hashing apply a single form of hash function, and an optimization process that is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. Their framework decomposes the hashing learning problem into two steps: hash bit learning, and hash function learning based on the learned bits.
  22. 9/9 - IJCAI 2013: Title: Smart Hashing Update for Fast Response. Authors: Qiang Yang, Long-Kai Huang, Wei-Shi Zheng, Yingbiao Ling. Organization: Sun Yat-sen University (China). Based On: DMKD 2012: Active Hashing and Its Application to Image and Text Retrieval. Motivation: Although most existing hashing-based methods have been proven to obtain high accuracy, they are regarded as passive hashing and assume that the labeled points are provided in advance. In this paper, they consider updating a hashing model upon gradually increased labeled data with a fast response to users, called smart hashing update (SHU). 1. Consistency-based selection [CVPR 2012]: Diff(k, j) = min{num(k, j, −1), num(k, j, 1)}; 2. Similarity-based selection: min_{H^l ∈ {−1,1}^{l×r}} Q = ||(1/r) H^l (H^l)ᵀ − S^l||_F²; min_{k ∈ {1,2,…,r}} R_k = ||r S − H^{−k} (H^{−k})ᵀ||_F², where H^{−k} denotes the code matrix with the k-th bit removed.
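A sketch of the consistency-based selection statistic: num(k, j, b) counts how often bit j takes value b among the labeled points of group k, so a large Diff(k, j) flags an inconsistent bit worth re-learning. The data layout here is my assumption, not the paper's:

```python
# Consistency-based bit selection sketch: Diff(k, j) per bit within a group.
import numpy as np

def diff(codes_k, j):
    """codes_k: (n, r) codes of the labeled points in group k, bits in {-1, 1}."""
    num_neg = int(np.sum(codes_k[:, j] == -1))  # num(k, j, -1)
    num_pos = int(np.sum(codes_k[:, j] == 1))   # num(k, j, 1)
    return min(num_neg, num_pos)

group = np.array([[1, -1, 1], [1, 1, 1], [1, -1, -1], [1, -1, 1]])
print([diff(group, j) for j in range(3)])  # bit 0 is fully consistent: Diff = 0
```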
  23. Reporter: Xu Jiaming (Ph.D. Student). Date: 2014.03.27. Computational-Brain Research Center, Institute of Automation, Chinese Academy of Sciences.
