1
Learning to Hash for Large-Scale Search
Xu Jiaming
Chinese Academy of Sciences
2014-07-04 @CUHK
2
Motivation
• Similarity-based search is widely used in many applications:
– Image/video search and retrieval: find the most similar images/videos
– Audio search: find similar songs
– Product search: find shoes with a similar style but a different color
– Patient search: find patients with a similar diagnostic status
• Two key components:
– Similarity/distance measure
– Indexing scheme
WhittleSearch (Kovashka et al. 2013)
- 2013 CIKM Tutorial by Jun Wang
3
A Conceptual Diagram for a Hashing-Based Image Search System
[Diagram components: Image Database; Hash Function Design; Indexing and Search; Similarity Search & Retrieval; Reranking; Refinement; Visual Search Applications]
Designing compact yet accurate hash codes is critical to making the search effective.
- 2013 CIKM Tutorial by Jun Wang
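Why compact codes matter: once items are stored as binary codes, comparing two of them is an XOR followed by a bit count. The plain-Python sketch below ranks a small code database by Hamming distance to a query; `pack_bits`, `hamming` and `hamming_rank` are hypothetical helper names, not taken from the tutorial.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer (first bit = most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming(a, b):
    """Hamming distance between two packed codes: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def hamming_rank(query, database, top_k=3):
    """Return (distance, index) pairs for the top_k database codes closest to query."""
    scored = sorted((hamming(query, code), i) for i, code in enumerate(database))
    return scored[:top_k]


# Usage: four 3-bit database codes and one query.
db = [pack_bits(b) for b in ([1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0])]
print(hamming_rank(pack_bits([1, 0, 1]), db, top_k=2))  # [(0, 0), (1, 1)]
```

Exhaustive Hamming ranking still touches every item, but each comparison costs only a few machine instructions; hash-table lookup avoids the full scan.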
4
Outline
• Background (data-independent)
– Locality Sensitive Hashing [1999-VLDB, 2006-FOCS, 2008-Communications]
– SimHash [2002-STOC, 2007-WWW]
• Learning to Hash (data-dependent)
– Unsupervised vs. Supervised: STH [2010-SIGIR] vs. SHK [2012-CVPR]
– One-Step vs. Two-Step: ITQ [2011-CVPR, 2013-TPAMI] vs. TSH [2013-ICCV]
• Others (data-dependent)
– Smart Hashing Update for Fast Response [2013-IJCAI]
– Two-Stage Hashing [2014-ACL]
– Semantic Hashing with Topics and Tags [2013-SIGIR]
– Dual-View Hashing [2013-ICML]
– Multiple View Hashing [2011-SIGIR]
• LSH in MapReduce
6
LSH [1999-VLDB, 2006-FOCS, 2008-Communications]
Locality Sensitive Hashing (LSH)
[Figure: database items and a query are encoded by random hash functions, each assigning bit 0 or 1; the query receives the code 101.]
- 2013 CIKM Tutorial by Jun Wang
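The figure can be illustrated with random-hyperplane LSH, where each bit is the sign of a random projection (the family used by Charikar [4]). This is one LSH family among several; [1]-[3] analyze other families. A minimal NumPy sketch, assuming Gaussian hyperplanes and several independent hash tables; the class and parameter names are illustrative, not from the slides.

```python
import numpy as np
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH: each table keys a point by the signs of n_bits random projections."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One (n_bits x dim) matrix of Gaussian hyperplane normals per hash table.
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    @staticmethod
    def _key(planes, x):
        # Bit i is 1 when x lies on the positive side of hyperplane i.
        return tuple(int(b) for b in (planes @ x > 0))

    def index(self, X):
        for item_id, x in enumerate(X):
            for planes, table in zip(self.planes, self.tables):
                table[self._key(planes, x)].append(item_id)

    def candidates(self, q):
        """Union of the query's buckets over all tables (candidates for exact re-ranking)."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, q), []))
        return found


rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 32))
lsh = HyperplaneLSH(dim=32, n_bits=10, n_tables=6)
lsh.index(X)
query = X[42] + 0.01 * rng.standard_normal(32)  # a slightly perturbed copy of item 42
print(42 in lsh.candidates(query), len(lsh.candidates(query)))
```

More bits per table make buckets purer but sparser; more tables raise the chance that a true neighbor shares at least one bucket with the query.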
7
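SimHash [2002-STOC, 2007-WWW]
Step 1: Compute TF-IDF. The text is represented by its observed features with weights W1, W2, …, Wn.
Step 2: Hash function. Each feature is hashed to a bit string (in the example, 100110 for W1, 110000 for W2, 001001 for Wn).
Step 3: Signature. Each 1 bit becomes +w and each 0 bit becomes −w: (W1, −W1, −W1, W1, W1, −W1), (W2, W2, −W2, −W2, −W2, −W2), …, (−Wn, −Wn, Wn, −Wn, −Wn, Wn).
Step 4: Sum. The signatures are added position by position, e.g. (13, 108, −22, −5, −32, 55).
Step 5: Generate fingerprint. Positive sums become 1 and the others 0, giving 1, 1, 0, 0, 0, 1.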
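The five steps can be sketched in Python as follows. Assumptions: features arrive with precomputed weights (standing in for the TF-IDF of Step 1), each feature is hashed with a truncated MD5, and fingerprints are 64 bits as in Manku et al. [5]; the function names are hypothetical.

```python
import hashlib


def feature_hash(feature, n_bits=64):
    """Deterministic n_bits-bit hash of a feature string (truncated MD5, an assumption)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << n_bits) - 1)


def fingerprint_from_sums(sums):
    """Step 5: bit i is 1 if the i-th accumulated sum is positive, else 0."""
    return [1 if s > 0 else 0 for s in sums]


def simhash(weighted_features, n_bits=64):
    """Steps 2-5 for a dict {feature: weight}; the weights play the role of Step 1's TF-IDF."""
    sums = [0.0] * n_bits
    for feature, weight in weighted_features.items():
        h = feature_hash(feature, n_bits)  # Step 2
        for i in range(n_bits):
            bit = (h >> (n_bits - 1 - i)) & 1
            sums[i] += weight if bit else -weight  # Steps 3-4: signed weight, summed
    bits = fingerprint_from_sums(sums)
    return int("".join(str(b) for b in bits), 2)


print(fingerprint_from_sums([13, 108, -22, -5, -32, 55]))  # [1, 1, 0, 0, 0, 1]
doc = {"hash": 2.0, "learning": 1.5, "search": 1.0}
print(hex(simhash(doc)))
```

A low-weight feature can only flip bits whose sums are already close to zero, so near-duplicate documents tend to receive fingerprints at a small Hamming distance.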
9
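STH [2010-SIGIR]
Self-Taught Hashing (STH)
Step 1, unsupervised learning (Laplacian Eigenmap): learn k-bit codes $y_i$ for the n training items:
$$\min \sum_{i,j} S_{ij} \lVert y_i - y_j \rVert^2 \quad \text{s.t.} \quad y_i \in \{-1,1\}^k,\quad \sum_i y_i = 0,\quad \frac{1}{n}\sum_i y_i y_i^\top = I$$
With $Y = [y_1, \ldots, y_n]^\top$ (n × k) and degree matrix $D = \operatorname{diag}(S\mathbf{1})$, the objective equals $2\operatorname{tr}(Y^\top (D - S) Y)$, so the problem can be written as:
$$\min\ \operatorname{tr}\big(Y^\top (D - S)\, Y\big) \quad \text{s.t.} \quad Y_{ij} \in \{-1,1\},\quad Y^\top \mathbf{1} = 0,\quad \frac{1}{n} Y^\top Y = I$$
Step 2, supervised learning: the learned codes serve as labels for training one classifier per bit, which then predicts codes for unseen queries.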
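A compact NumPy sketch of the two steps, under stated assumptions: S is a symmetric k-nearest-neighbor graph with 0/1 weights, the relaxed eigenvectors are binarized at their medians, and regularized least squares stands in for the per-bit classifiers (the STH paper uses SVMs). All function names are hypothetical.

```python
import numpy as np


def sth_codes(X, n_bits=4, n_neighbors=5):
    """Step 1 (unsupervised): binarized Laplacian Eigenmap codes for X (n x d)."""
    n = X.shape[0]
    # Symmetric kNN graph with 0/1 affinities S (the graph construction is an assumption).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    S = np.maximum(S, S.T)
    L = np.diag(S.sum(axis=1)) - S  # L = D - S
    # Relaxed problem: eigenvectors of L with the smallest eigenvalues, skipping
    # the constant eigenvector (ruled out by the balance constraint Y^T 1 = 0).
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_bits + 1]
    # Threshold each dimension at its median to get balanced {-1, 1} bits.
    return np.where(Y > np.median(Y, axis=0), 1, -1)


def fit_bit_predictors(X, codes, reg=1e-3):
    """Step 2 (supervised): one linear predictor per bit (least squares as an SVM stand-in)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ codes)  # (d + 1) x n_bits weights


def predict_codes(W, X):
    """Codes for unseen items: the sign of each bit's linear score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ W > 0, 1, -1)


rng = np.random.default_rng(0)
X_train = rng.standard_normal((60, 8))
W = fit_bit_predictors(X_train, sth_codes(X_train, n_bits=4))
print(predict_codes(W, rng.standard_normal((3, 8))))
```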
10
SHK [2012-CVPR]
Supervised Hashing with Kernels
[Figure: the inner products of the learned codes approximate the pairwise similarities.]
- 2013 CIKM Tutorial by Jun Wang
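To make "code inner product approximates pairwise similarity" concrete, the sketch below learns r bits greedily so that $HH^\top \approx rS$, with S holding +1 for similar and −1 for dissimilar pairs. Each bit is a kernel hash function $h(x) = \operatorname{sgn}(\bar{k}(x)^\top a)$ over zero-centered RBF features to a few anchor points, found by a spectral relaxation on the remaining residual. The anchors, RBF kernel, gamma and names are assumptions; SHK additionally refines each bit with a smoothed gradient step, which is omitted here.

```python
import numpy as np


def kernel_features(X, anchors, gamma=0.5):
    """RBF kernel values between each row of X and each anchor point (n x m)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)


def shk_train(X, S, anchors, n_bits=4, gamma=0.5, reg=1e-6):
    """Greedy, bit-by-bit spectral relaxation of min ||H H^T - r S||_F^2."""
    K = kernel_features(X, anchors, gamma)
    mu = K.mean(axis=0)
    Kc = K - mu  # zero-centred kernel features
    R = n_bits * S.astype(float)  # residual, starts at r * S
    B = Kc.T @ Kc + reg * np.eye(Kc.shape[1])
    Li = np.linalg.inv(np.linalg.cholesky(B))  # B = L L^T, Li = L^{-1}
    A = []
    for _ in range(n_bits):
        # max a^T Kc^T R Kc a  s.t.  a^T B a = const, solved via Cholesky whitening.
        _, vecs = np.linalg.eigh(Li @ (Kc.T @ R @ Kc) @ Li.T)
        a = Li.T @ vecs[:, -1]  # top generalized eigenvector
        h = np.where(Kc @ a > 0, 1.0, -1.0)
        R -= np.outer(h, h)  # remove this bit's contribution from the residual
        A.append(a)
    return np.column_stack(A), mu


def shk_hash(X, anchors, A, mu, gamma=0.5):
    Kc = kernel_features(X, anchors, gamma) - mu
    return np.where(Kc @ A > 0, 1, -1)


def shk_loss(H, S):
    """||H H^T - r S||_F^2 for codes H (n x r) and similarities S in {-1, 1}."""
    r = H.shape[1]
    return float(np.linalg.norm(H @ H.T - r * S) ** 2)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
y = np.array([1] * 10 + [-1] * 10)
S = np.outer(y, y)  # +1 same label, -1 different label
anchors = X[[0, 1, 10, 11]]
A, mu = shk_train(X, S, anchors, n_bits=4)
H = shk_hash(X, anchors, A, mu)
print(shk_loss(H, S))
```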
12
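ITQ [2011-CVPR, 2013-TPAMI]
Iterative Quantization
• Apply PCA for dimensionality reduction: for zero-centered data X (n × d), find W (d × c) to maximize $\sum_k w_k^\top X^\top X w_k = \operatorname{tr}(W^\top X^\top X W)$ subject to $W^\top W = I$.
• Keep the top c eigenvectors of the data covariance matrix $X^\top X$ to obtain W; the projected data is $V = XW$.
• Note that if W is an optimal solution, then $\tilde{W} = WR$ is also optimal for any c × c orthogonal matrix R.
• Key idea: find R to minimize the quantization loss $Q(B, R) = \lVert B - VR \rVert_F^2$, where $B \in \{-1,1\}^{n \times c}$.
• Since $\lVert B \rVert_F^2 = nc$ and V are fixed, this is equivalent to maximizing $\operatorname{tr}(B R^\top V^\top) = \sum_{i,j} B_{ij} \tilde{V}_{ij}$, where $\tilde{V} = VR$.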
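The alternation that gives ITQ its name (fix R and set B = sgn(VR); fix B and solve the orthogonal Procrustes problem for R) can be sketched in NumPy as follows. The iteration count, the random initial rotation and the function names are assumptions.

```python
import numpy as np


def itq(X, n_bits, n_iter=50, seed=0):
    """Iterative Quantization on the rows of X. Returns (mean, W, R, B) with B = sgn(V R)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA: top-c eigenvectors of the covariance matrix X^T X.
    _, vecs = np.linalg.eigh(Xc.T @ Xc)
    W = vecs[:, ::-1][:, :n_bits]
    V = Xc @ W
    # Start from a random orthogonal rotation.
    R, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)  # fix R, update B = sgn(VR)
        # Fix B, update R: with B^T V = S Omega S_hat^T (SVD), the optimum is R = S_hat S^T.
        S, _, S_hat_T = np.linalg.svd(B.T @ V)
        R = S_hat_T.T @ S.T
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return mean, W, R, B


def quantization_loss(B, V, R):
    """Q(B, R) = ||B - V R||_F^2."""
    return float(np.linalg.norm(B - V @ R) ** 2)


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16)) @ rng.standard_normal((16, 16))
mean, W, R, B = itq(X, n_bits=8)
V = (X - mean) @ W
print(quantization_loss(B, V, R))
```

Each half-step minimizes Q with the other variable held fixed, so the loss never increases. A new point x is encoded as sgn((x − mean) W R).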
13
TSH [2013-ICCV]
Two-Step Hashing
15
SHU [2013-IJCAI]
Smart Hashing Update
1. Consistency-based selection:
$$\mathrm{Diff}(k, j) = \min\{\mathrm{num}(k, j, -1),\ \mathrm{num}(k, j, 1)\}$$
2. Similarity-based selection:
$$Q = \min_{H_l \in \{-1,1\}^{l \times r}} \big\lVert H_l H_l^\top - rS \big\rVert_F^2$$
$$R_k = \big\lVert rS - H_{r-1}^{k} (H_{r-1}^{k})^\top \big\rVert_F^2, \qquad \min_{k \in \{1,2,\ldots,r\}} R_k$$
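A hedged NumPy sketch of the two selection scores. The definitions are assumptions rather than the paper's code: num(k, j, b) is read as the number of samples of class j whose k-th bit equals b, and $H_{r-1}^{k}$ as the code matrix with bit k removed. Under this reading, a bit that splits classes (high Diff) is also the one whose removal costs least (small $R_k$), which makes it a candidate for updating.

```python
import numpy as np


def diff_scores(H, labels):
    """Consistency-based selection: Diff[k, j] = min(num(k, j, -1), num(k, j, 1)).

    num(k, j, b) is assumed to count samples of class j whose k-th bit equals b.
    """
    classes = np.unique(labels)
    D = np.zeros((H.shape[1], len(classes)), dtype=int)
    for j, c in enumerate(classes):
        Hc = H[labels == c]
        D[:, j] = np.minimum((Hc == -1).sum(axis=0), (Hc == 1).sum(axis=0))
    return D


def removal_residuals(H, S):
    """Similarity-based selection: R_k = ||r S - H_{r-1}^k (H_{r-1}^k)^T||_F^2,
    with H_{r-1}^k assumed to be H with its k-th bit (column) removed."""
    r = H.shape[1]
    out = np.empty(r)
    for k in range(r):
        Hk = np.delete(H, k, axis=1)
        out[k] = np.sum((r * S - Hk @ Hk.T) ** 2)
    return out


H = np.array([[1, 1, -1], [1, 1, 1], [-1, -1, 1], [-1, -1, -1]])
labels = np.array([0, 0, 1, 1])
S = np.outer([1, 1, -1, -1], [1, 1, -1, -1])
print(diff_scores(H, labels).tolist())   # [[0, 0], [0, 0], [1, 1]]
print(removal_residuals(H, S).tolist())  # [80.0, 80.0, 16.0]
```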
16
TSH [2014-ACL]
Two-Stage Hashing
• LSH for neighbor candidate pruning; ITQ for effective re-ranking.
• LSH captures term similarity; ITQ captures topic similarity.
• Advantages:
– A high hash-lookup success rate is attained by the LSH stage;
– High search precision comes from the ITQ re-ranking stage;
– Only a small portion of the entire dataset is scanned;
– Two similarity measures are integrated.
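The two-stage pipeline can be sketched generically: stage 1 uses an LSH key only to fetch a candidate bucket, and stage 2 re-ranks just those candidates by Hamming distance on a second set of codes (ITQ codes in the paper). This sketch assumes both code sets are precomputed and uses a single hash table; the function names and toy data are hypothetical.

```python
import numpy as np
from collections import defaultdict


def build_buckets(lsh_keys):
    """Stage-1 index: map each LSH key (a bit tuple) to the ids of items sharing it."""
    buckets = defaultdict(list)
    for i, key in enumerate(lsh_keys):
        buckets[tuple(key)].append(i)
    return buckets


def two_stage_search(q_lsh, q_code, buckets, codes, top_k=5):
    """Stage 1: look up the query's LSH bucket. Stage 2: re-rank only those
    candidates by Hamming distance between their re-ranking codes and the query's."""
    candidates = buckets.get(tuple(q_lsh), [])
    if not candidates:
        return []
    cand = np.array(candidates)
    dists = (codes[cand] != q_code).sum(axis=1)
    order = np.argsort(dists, kind="stable")
    return [int(cand[i]) for i in order[:top_k]]


lsh_keys = [[0, 1], [0, 1], [1, 0], [0, 1]]
codes = np.array([[1, 1, 1], [1, -1, -1], [1, 1, 1], [1, 1, -1]])
buckets = build_buckets(lsh_keys)
print(two_stage_search([0, 1], np.array([1, 1, 1]), buckets, codes))  # [0, 3, 1]
```

Item 2 has a code identical to the query's but is never scanned, because it falls in a different LSH bucket; this is the trade-off behind "scan only a small portion of the dataset".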
17
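SHTTM [2013-SIGIR]
Semantic Hashing Using Tags and Topic Modeling
Hash code learning (tag consistency + similarity preservation):
$$\min_{Y,U}\ \lVert T - U^\top Y \rVert_F^2 + C \lVert U \rVert_F^2 + \gamma\, g(Y, \theta) \quad \text{s.t.} \quad Y \in \{-1,1\}^{k \times n},\quad Y\mathbf{1} = 0$$
Hash function learning:
$$y_j = f(x_j) = W x_j, \qquad W^* = \arg\min_W \sum_{j=1}^{n} \lVert y_j - W x_j \rVert^2 + \lambda \lVert W \rVert_F^2 \;\Rightarrow\; W^* = Y X^\top \big(X X^\top + \lambda I\big)^{-1}$$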
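The hash-function step is ordinary ridge regression from features to codes. Setting the gradient of $\lVert Y - WX \rVert_F^2 + \lambda \lVert W \rVert_F^2$ to zero gives $W(XX^\top + \lambda I) = YX^\top$, which the NumPy sketch below solves directly. Assumptions: X is d × n with one column per document, random codes stand in for the output of the code-learning step, and new codes are thresholded at 0; names are hypothetical.

```python
import numpy as np


def learn_hash_function(X, Y, lam=1.0):
    """Closed-form W* = Y X^T (X X^T + lam I)^{-1}.

    X: d x n matrix (one column per document); Y: k x n codes in {-1, 1}.
    """
    d = X.shape[0]
    # Solve (X X^T + lam I) W^T = X Y^T rather than forming the inverse explicitly.
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ Y.T).T


def hash_codes(W, X):
    """Codes for (new) documents: the sign of W x for each column x of X."""
    return np.where(W @ X >= 0, 1, -1)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))  # 20 features, 100 documents
Y = np.where(rng.standard_normal((8, 100)) >= 0, 1, -1)  # stand-in for learned codes
W = learn_hash_function(X, Y, lam=0.5)
new_codes = hash_codes(W, rng.standard_normal((20, 5)))
print(new_codes.shape)  # (8, 5)
```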
18
DVH [2013-ICML]
Predictable Dual-View Hashing
The goal is to find two sets of hyperplanes that map the visual and textual spaces into a common subspace.
[Figure labels: CCA; Multi-SVM]
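The slide's labels point to CCA and multi-class SVMs. The NumPy sketch below covers only a CCA step, which produces one set of hyperplanes per view so that paired items land on the same side. Codes are taken as the signs of the centered projections; the regularization, synthetic data and names are assumptions, and the SVM refinement is not shown.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_hash(X, Y, n_bits, reg=1e-3):
    """Fit CCA between views X (n x dx) and Y (n x dy); return (mean, hyperplanes) per view."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_bits]     # hyperplanes for the first view
    Wy = Ky @ Vt.T[:, :n_bits]  # hyperplanes for the second view
    return (mx, Wx), (my, Wy)


def encode(view, params):
    mean, W = params
    return np.where((view - mean) @ W >= 0, 1, -1)


rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 2))  # shared latent factors behind both views
X_vis = Z @ rng.standard_normal((2, 6)) + 0.05 * rng.standard_normal((500, 6))
X_txt = Z @ rng.standard_normal((2, 4)) + 0.05 * rng.standard_normal((500, 4))
px, py = cca_hash(X_vis, X_txt, n_bits=2)
agreement = (encode(X_vis, px) == encode(X_txt, py)).mean()
print(agreement)
```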
19
MVH [2011-SIGIR]
Composite Hashing with Multiple Information Sources
Overall objective:
$$J(Y, W, \alpha) = \operatorname{tr}\big(Y \tilde{L} Y^\top\big) + C_1 \Big\lVert Y - \sum_{k=1}^{M} \alpha_k W^{(k)\top} X^{(k)} \Big\rVert_F^2 + C_2 \sum_{k=1}^{M} \lVert W^{(k)} \rVert_F^2$$
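A small NumPy function that evaluates the objective term by term, under assumed shapes: Y is r × n (one column per item), $\tilde{L}$ is an n × n graph Laplacian, and source k has features $X^{(k)}$ (d_k × n) and projection $W^{(k)}$ (d_k × r). It only makes the three terms concrete; how Y, W and α are optimized is not shown, and the function name is hypothetical.

```python
import numpy as np


def mvh_objective(Y, L_tilde, Xs, Ws, alpha, C1=1.0, C2=1.0):
    """J(Y, W, alpha) = tr(Y L~ Y^T) + C1 ||Y - sum_k a_k W_k^T X_k||_F^2 + C2 sum_k ||W_k||_F^2."""
    smooth = np.trace(Y @ L_tilde @ Y.T)  # graph-smoothness term
    combined = sum(a * (W.T @ X) for a, W, X in zip(alpha, Ws, Xs))
    fit = np.sum((Y - combined) ** 2)  # weighted combination of sources should reproduce Y
    reg = sum(np.sum(W ** 2) for W in Ws)  # Frobenius-norm regularizer on each W_k
    return float(smooth + C1 * fit + C2 * reg)


Y = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(mvh_objective(Y, np.zeros((2, 2)), [np.eye(2)], [Y.T], [1.0]))  # 4.0
```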
21
LSH in MapReduce – Key Idea
22
LSH in MapReduce – First Round of MapReduce
23
LSH in MapReduce – Second Round of MapReduce
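As a hedged illustration of one common two-round pattern, not necessarily the design in [16]-[18]: round 1 splits each item's LSH signature into bands, uses (band index, band value) as the shuffle key and emits candidate pairs from each bucket; round 2 groups identical pairs and counts in how many bands they collided. The in-process `run_round` helper only simulates map, shuffle and reduce; all names are hypothetical, and the signatures are assumed to be precomputed LSH bits.

```python
from collections import defaultdict
from itertools import combinations


def run_round(records, mapper, reducer):
    """In-process stand-in for one MapReduce round: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


def lsh_mapreduce(signatures, band_size):
    """Round 1: bucket items by (band index, band value) and emit candidate pairs.
    Round 2: group identical pairs and count the bands in which they collided."""
    def map1(record):
        item_id, sig = record
        for start in range(0, len(sig), band_size):
            yield (start // band_size, tuple(sig[start:start + band_size])), item_id

    def reduce1(bucket, ids):
        return [(pair, 1) for pair in combinations(sorted(set(ids)), 2)]

    def map2(record):
        pair, one = record
        yield pair, one

    def reduce2(pair, ones):
        return [(pair, sum(ones))]

    candidates = run_round(signatures.items(), map1, reduce1)
    return dict(run_round(candidates, map2, reduce2))


sigs = {0: [1, 0, 1, 1], 1: [1, 0, 0, 0], 2: [0, 1, 1, 1], 3: [0, 0, 0, 0]}
print(lsh_mapreduce(sigs, band_size=2))  # {(0, 1): 1, (0, 2): 1, (1, 3): 1}
```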
24
References
[1]. Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via
hashing[C]//VLDB. 1999, 99: 518-529.
[2]. Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor
in high dimensions[C]//Foundations of Computer Science, 2006. FOCS'06. 47th Annual
IEEE Symposium on. IEEE, 2006: 459-468.
[3]. Andoni A, Indyk P. Near-Optimal Hashing Algorithms for Approximate Nearest
Neighbor in High Dimensions[J]. Communications of the ACM, 2008, 51(1): 117.
[4]. Charikar M S. Similarity estimation techniques from rounding
algorithms[C]//Proceedings of the thirty-fourth annual ACM symposium on Theory of
computing. ACM, 2002: 380-388.
[5]. Manku G S, Jain A, Das Sarma A. Detecting near-duplicates for web
crawling[C]//Proceedings of the 16th international conference on World Wide Web. ACM,
2007: 141-150.
[6]. Zhang D, Wang J, Cai D, et al. Self-taught hashing for fast similarity
search[C]//Proceedings of the 33rd international ACM SIGIR conference on Research
and development in information retrieval. ACM, 2010: 18-25.
[7]. Liu W, Wang J, Ji R, et al. Supervised hashing with kernels[C]//Computer Vision and
Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 2074-2081.
[8]. Gong Y, Lazebnik S. Iterative quantization: A procrustean approach to learning binary
codes[C]//Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on.
IEEE, 2011: 817-824.
[9]. Gong Y, Lazebnik S, Gordo A, et al. Iterative quantization: A procrustean approach to
learning binary codes for large-scale image retrieval[J]. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 2013, 35(12): 2916-2929.
[10]. Lin G, Shen C, Suter D, et al. A general two-step approach to learning-based
hashing[C]//Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE,
2013: 2552-2559.
[11]. Yang Q, Huang L K, Zheng W S, et al. Smart hashing update for fast
response[C]//Proceedings of the Twenty-Third international joint conference on Artificial
Intelligence. AAAI Press, 2013: 1855-1861.
[12]. Li H, Liu W, Ji H. Two-Stage Hashing for Fast Document Retrieval[C]//ACL. 2014.
[13]. Wang Q, Zhang D, Si L. Semantic hashing using tags and topic
modeling[C]//Proceedings of the 36th international ACM SIGIR conference on Research
and development in information retrieval. ACM, 2013: 213-222.
[14]. Rastegari M, Choi J, Fakhraei S, et al. Predictable Dual-View
Hashing[C]//Proceedings of The 30th International Conference on Machine Learning.
2013: 1328-1336.
[15]. Zhang D, Wang F, Si L. Composite hashing with multiple information
sources[C]//Proceedings of the 34th international ACM SIGIR conference on Research
and development in Information Retrieval. ACM, 2011: 225-234.
[16]. Szmit, Radosław. "Locality Sensitive Hashing for Similarity Search Using
MapReduce on Large Scale Data." Language Processing and Intelligent Information
Systems. Springer Berlin Heidelberg, 2013. 171-178.
[17]. Blog: Location Sensitive Hashing in Map Reduce:
http://horicky.blogspot.hk/2012/09/location-sensitive-hashing-in-map-reduce.html
[18]. Likelike Project: https://github.com/takahi-i/likelike
[19]. Jun Wang. Learning to Hash for Large-Scale Search. 2013 CIKM Tutorial.
27
Discussions and Questions?
Thank you!
2014-07-04

Contenu connexe

Similaire à 20140702 xu jiaming hashinglearning - lite

2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...
2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...
2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...Rudolf Husar
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
Egu2013 final
Egu2013 finalEgu2013 final
Egu2013 finalsirf13
 
The development of a Geographic Information System for traffic route planni...
The development of a  Geographic  Information System for traffic route planni...The development of a  Geographic  Information System for traffic route planni...
The development of a Geographic Information System for traffic route planni...Matthew Pulis
 
Li Cheng WUSTL resume(Amazon)
Li Cheng WUSTL resume(Amazon)Li Cheng WUSTL resume(Amazon)
Li Cheng WUSTL resume(Amazon)Li Cheng
 
0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track
0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track
0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech TrackRudolf Husar
 
2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...
2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...
2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...Rudolf Husar
 
Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...
Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...
Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...Sandro Andrade
 
Big Data Analysis of Airline Data Set on Cloud Computing
Big Data Analysis of Airline Data Set on Cloud ComputingBig Data Analysis of Airline Data Set on Cloud Computing
Big Data Analysis of Airline Data Set on Cloud ComputingNillohit Bhattacharya
 
CIKM Tutorial 2008
CIKM Tutorial 2008CIKM Tutorial 2008
CIKM Tutorial 2008Peiling Wang
 
A Preliminary Study on Architecting Cyber-Physical Systems
A Preliminary Study on Architecting Cyber-Physical SystemsA Preliminary Study on Architecting Cyber-Physical Systems
A Preliminary Study on Architecting Cyber-Physical SystemsHenry Muccini
 
remotesensing-12-01253.pdf
remotesensing-12-01253.pdfremotesensing-12-01253.pdf
remotesensing-12-01253.pdfNguyenVanTuan29
 
060128 Galeon Rept
060128 Galeon Rept060128 Galeon Rept
060128 Galeon ReptRudolf Husar
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysYork University
 
MawereC- Ubuntunet paper publication 2015
MawereC- Ubuntunet paper publication 2015MawereC- Ubuntunet paper publication 2015
MawereC- Ubuntunet paper publication 2015CEPHAS MAWERE
 
Chek mate geolocation analyzer
Chek mate geolocation analyzerChek mate geolocation analyzer
Chek mate geolocation analyzerpriyal mistry
 
Toward a Semantic Web of Vehicles
Toward a Semantic Web of VehiclesToward a Semantic Web of Vehicles
Toward a Semantic Web of VehiclesAmélie Gyrard
 
REAL TIME POLLING SYSTEM
REAL TIME POLLING SYSTEMREAL TIME POLLING SYSTEM
REAL TIME POLLING SYSTEMIRJET Journal
 

Similaire à 20140702 xu jiaming hashinglearning - lite (20)

2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...
2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...
2006-03-14 WG on HTAP-Relevant IT Techniques, Tools and Philosophies: DataFed...
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
Egu2013 final
Egu2013 finalEgu2013 final
Egu2013 final
 
Democratizing Data Science in the Cloud
Democratizing Data Science in the CloudDemocratizing Data Science in the Cloud
Democratizing Data Science in the Cloud
 
The development of a Geographic Information System for traffic route planni...
The development of a  Geographic  Information System for traffic route planni...The development of a  Geographic  Information System for traffic route planni...
The development of a Geographic Information System for traffic route planni...
 
Li Cheng WUSTL resume(Amazon)
Li Cheng WUSTL resume(Amazon)Li Cheng WUSTL resume(Amazon)
Li Cheng WUSTL resume(Amazon)
 
0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track
0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track
0603 Esip Fed Wash Dc Tech Pres 060103 Esip Aq Tech Track
 
2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...
2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...
2006-01-11 Data Flow & Interoperability in DataFed Service-based AQ Analysis ...
 
Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...
Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...
Architectural Design Spaces for Feedback Control in Self-Adaptive Systems Con...
 
Big Data Analysis of Airline Data Set on Cloud Computing
Big Data Analysis of Airline Data Set on Cloud ComputingBig Data Analysis of Airline Data Set on Cloud Computing
Big Data Analysis of Airline Data Set on Cloud Computing
 
CIKM Tutorial 2008
CIKM Tutorial 2008CIKM Tutorial 2008
CIKM Tutorial 2008
 
A Preliminary Study on Architecting Cyber-Physical Systems
A Preliminary Study on Architecting Cyber-Physical SystemsA Preliminary Study on Architecting Cyber-Physical Systems
A Preliminary Study on Architecting Cyber-Physical Systems
 
remotesensing-12-01253.pdf
remotesensing-12-01253.pdfremotesensing-12-01253.pdf
remotesensing-12-01253.pdf
 
060128 Galeon Rept
060128 Galeon Rept060128 Galeon Rept
060128 Galeon Rept
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in Highways
 
MawereC- Ubuntunet paper publication 2015
MawereC- Ubuntunet paper publication 2015MawereC- Ubuntunet paper publication 2015
MawereC- Ubuntunet paper publication 2015
 
Chek mate geolocation analyzer
Chek mate geolocation analyzerChek mate geolocation analyzer
Chek mate geolocation analyzer
 
Resume sandeep chakraborty
Resume sandeep chakrabortyResume sandeep chakraborty
Resume sandeep chakraborty
 
Toward a Semantic Web of Vehicles
Toward a Semantic Web of VehiclesToward a Semantic Web of Vehicles
Toward a Semantic Web of Vehicles
 
REAL TIME POLLING SYSTEM
REAL TIME POLLING SYSTEMREAL TIME POLLING SYSTEM
REAL TIME POLLING SYSTEM
 

Dernier

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 

Dernier (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 

20140702 xu jiaming hashinglearning - lite

  • 1. 1 Learning to Hash for Large-Scale Search Xu Jiaming Chinese Academe of Science 2014-07-04 @CUHK
  • 2. 2 Motivation  Similarity based search has been popular in many applications – Image/video search and retrieval: finding most similar images/videos – Audio search: find similar songs – Product search: find shoes with similar style but different color – Patient search: find patients with similar diagnostic status  Two key components: – Similarity/distance measure – Indexing scheme Whittlesearch (Kovashka et al. 2013) - 2013CIKM Tutorial by Jun Wang
  • 3. 3 A Conceptual Diagram for Hashing Based Image Search System Indexing and Search Image Database Similarity Search & Retrieval Hash Function Design Visual Search ApplicationsVisual Search Applications Reranking Refinement Designing compact yet accurate hashing codes is a critical component to make the search effective - 2013CIKM Tutorial by Jun Wang
  • 4. 4 Outline  Background (data-independent)  Locality Sensitive Hashing [1999-VLDB, 2006-FOCS, 2008-Communications]  SimHash [2002-STOC, 2007-WWW]  Learning to Hashing (data-dependent)  Unsupervised V.S. Supervised STH [2010-SIGIR] V.S. SHK [2012-CVPR]  One-Step V.S. Two-Step ITQ [2011-CVPR, 2013-TPAMI] V.S. TSH [2013-ICCV]  Others (data-dependent)  Smart Hashing Update for Fast Response [2013-IJCAI]  Two-Stage Hashing [2014-ACL]  Semantic Hashing with Topics and Tags [2013-SIGIR]  Dual-View Hashing [2013-ICML]  Multiple View Hashing [2011-SIGIR]  LSH in MapReduce
  • 5. 5 Outline  Background (data-independent)  Locality Sensitive Hashing [1999-VLDB, 2006-FOCS, 2008-Communications]  SimHash [2002-STOC, 2007-WWW]  Learning to Hashing (data-dependent)  Unsupervised V.S. Supervised STH [2010-SIGIR] V.S. SHK [2012-CVPR]  One-Step V.S. Two-Step ITQ [2011-CVPR, 2013-TPAMI] V.S. TSH [2013-ICCV]  Others (data-dependent)  Smart Hashing Update for Fast Response [2013-IJCAI]  Two-Stage Hashing [2014-ACL]  Semantic Hashing with Topics and Tags [2013-SIGIR]  Dual-View Hashing [2013-ICML]  Multiple View Hashing [2011-SIGIR]  LSH in MapReduce
  • 6. 6 LSH [1999-VLDB, 2006-FOCS, 2008-Communications] 0 1 Database Items hash function random 101 Query Locality Sensitive Hashing (LSH) - 2013CIKM Tutorial by Jun Wang 0 1 0 1
  • 7. 7 SimHash [2002-STOC, 2007-WWW] Text … … Observed Features W1 W2 Wn 100110 W1 110000 W2 001001 Wn … … W1 –W1 -W1 W1 W1 -W1 W2 W2 -W2 -W2 -W2 -W2 -Wn –Wn Wn –Wn –Wn Wn … …13, 108, -22, -5, -32, 551, 1, 0, 0, 0, 1 Step1: Compute TF-IDF Step2: Hash Function Step3: Signature Step4: Sum Step5: Generate Fingerprint
  • 8. 8 Outline  Background (data-independent)  Locality Sensitive Hashing [1999-VLDB, 2006-FOCS, 2008-Communications]  SimHash [2002-STOC, 2007-WWW]  Learning to Hashing (data-dependent)  Unsupervised V.S. Supervised STH [2010-SIGIR] V.S. SHK [2012-CVPR]  One-Step V.S. Two-Step ITQ [2011-CVPR, 2013-TPAMI] V.S. TSH [2013-ICCV]  Others (data-dependent)  Smart Hashing Update for Fast Response [2013-IJCAI]  Two-Stage Hashing [2014-ACL]  Semantic Hashing with Topics and Tags [2013-SIGIR]  Dual-View Hashing [2013-ICML]  Multiple View Hashing [2011-SIGIR]  LSH in MapReduce
  • 9. 9 STH [2010-SIGIR] 2 min : . .: { 1,1} 0 1 ij i j ij k i i i T i i i S y y s t y y y y n − ∈ − = = ∑ ∑ ∑ I min : ( ( ) ) . .: ( , ) { 1,1} 0 T k T T trace Y D W Y s t Y i j − ∈ − = = Y 1 Y Y I Laplacian Eigenmap Self Taught Hashing (STH) Unsupervised Learning Supervised Learning
  • 10. 10 SHK [2012-CVPR] Pairwise similarity Code inner product approximates pairwise similarity Supervised Hashing with Kernels - 2013CIKM Tutorial by Jun Wang
  • 11. 11 Outline  Background (data-independent)  Locality Sensitive Hashing [1999-VLDB, 2006-FOCS, 2008-Communications]  SimHash [2002-STOC, 2007-WWW]  Learning to Hashing (data-dependent)  Unsupervised V.S. Supervised STH [2010-SIGIR] V.S. SHK [2012-CVPR]  One-Step V.S. Two-Step ITQ [2011-CVPR, 2013-TPAMI] V.S. TSH [2013-ICCV]  Others (data-dependent)  Smart Hashing Update for Fast Response [2013-IJCAI]  Two-Stage Hashing [2014-ACL]  Semantic Hashing with Topics and Tags [2013-SIGIR]  Dual-View Hashing [2013-ICML]  Multiple View Hashing [2011-SIGIR]  LSH in MapReduce
  • 12. 12 ITQ [2011-CVPR, 2013-TPAMI] Iterative Quantization  Apply PCA for dimensionality reduction, find to maximize:  Keep top c eigenvectors of the data covariance matrix to obtain , projected data is  Note that if is an optimal solution then is also optimal for any orthogonal matrix  Key idea: Find to minimize the quantization loss:  nc and V are fixed so this is equivalent to maximizing ( ) :
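  • 15. SHU [2013-IJCAI]: Smart Hashing Update
    – The bits to update are chosen by one of two strategies: (1) consistency-based selection; (2) similarity-based selection.
    – Consistency-based selection: $\mathrm{Diff}(k, j) = \min\{\mathrm{num}(k, j, -1), \mathrm{num}(k, j, 1)\}$, where $\mathrm{num}(k, j, b)$ counts the class-j samples whose k-th bit equals b; a large value means bit k is inconsistent within class j.
    – Similarity-based selection starts from the SHK criterion on the l labeled samples:
      $\min_{H_l \in \{-1,1\}^{l \times r}} Q = \big\| \tfrac{1}{r} H_l H_l^{T} - S \big\|_F^2$
    – and removes each bit k in turn, scoring the remaining $r - 1$ bits $H_l^{(-k)}$ (the same criterion for $r - 1$ bits, up to the constant factor $(r-1)^2$):
      $\min_{k \in \{1, 2, \dots, r\}} R_k = \big\| (r-1)\, S - H_l^{(-k)} H_l^{(-k)T} \big\|_F^2$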
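Both selection strategies can be sketched in a few lines of NumPy. Assumptions, not taken from the slide: the per-class $\mathrm{Diff}(k, j)$ values are summed over classes to rank bits, S is built as +1 for same-class pairs and −1 otherwise, and the function names are hypothetical. The toy example has two bits that match the classes and one alternating bit that does not.

```python
import numpy as np

def consistency_scores(H, labels):
    """Per bit k: sum over classes j of Diff(k, j) = min{num(k,j,-1), num(k,j,1)}."""
    scores = np.zeros(H.shape[1], dtype=int)
    for j in np.unique(labels):
        Hj = H[labels == j]                 # codes of the class-j samples
        n_pos = (Hj == 1).sum(axis=0)       # num(k, j, 1)
        n_neg = (Hj == -1).sum(axis=0)      # num(k, j, -1)
        scores += np.minimum(n_pos, n_neg)
    return scores

def removal_scores(H, S):
    """R_k = ||(r - 1) S - H_{-k} H_{-k}^T||_F^2, with bit k removed."""
    r = H.shape[1]
    out = np.empty(r)
    for k in range(r):
        Hk = np.delete(H, k, axis=1)
        out[k] = np.sum(((r - 1) * S - Hk @ Hk.T) ** 2)
    return out

def select_bits(H, labels, t, strategy="consistency"):
    """Indices of the t bits to relearn; any other strategy value means similarity-based."""
    if strategy == "consistency":
        # The least consistent bits within classes come first.
        return np.argsort(-consistency_scores(H, labels), kind="stable")[:t].tolist()
    S = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
    # Bits whose removal lowers the error the most are the least useful.
    return np.argsort(removal_scores(H, S), kind="stable")[:t].tolist()

labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
good = np.where(labels == 0, 1, -1)
noisy = np.array([1, -1, 1, -1, 1, -1, 1, -1])
H = np.stack([good, good, noisy], axis=1)
print(select_bits(H, labels, t=1), select_bits(H, labels, t=1, strategy="similarity"))  # → [2] [2]
```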
  • 16. TSH [2014-ACL]: Two-Stage Hashing
    – LSH prunes the neighbor candidates; ITQ re-ranks them effectively.
    – LSH captures term similarity; ITQ captures topic similarity.
    – Advantages:
      – a high hash-lookup success rate, attained by the LSH stage;
      – high search precision, due to the ITQ re-ranking stage;
      – only a small portion of the entire dataset is scanned;
      – the two similarity measures are integrated.
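A two-stage search of this kind can be sketched as follows. Assumptions, not taken from the slide: stage 1 uses random-hyperplane (sign-random-projection) LSH over term vectors with several hash tables, and stage 2 ranks the candidates by Hamming distance on separately supplied re-ranking codes, which in TSH would come from ITQ on topic features. The class `TwoStageIndex` and its parameters are hypothetical, and random data stands in for real documents.

```python
import numpy as np
from collections import defaultdict

class TwoStageIndex:
    def __init__(self, term_vecs, rerank_codes, n_tables=4, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per table: shape (n_tables, n_bits, d).
        self.planes = rng.standard_normal((n_tables, n_bits, term_vecs.shape[1]))
        self.rerank_codes = rerank_codes
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        for i, v in enumerate(term_vecs):
            for t, key in enumerate(self._keys(v)):
                self.tables[t][key].append(i)

    def _keys(self, v):
        bits = (self.planes @ v) > 0          # (n_tables, n_bits) sign bits
        return [row.tobytes() for row in bits]

    def query(self, q_vec, q_code, top=5):
        # Stage 1: gather candidates that share a bucket in any LSH table.
        cand = set()
        for t, key in enumerate(self._keys(q_vec)):
            cand.update(self.tables[t].get(key, []))
        if not cand:
            return []
        cand = np.array(sorted(cand), dtype=int)
        # Stage 2: re-rank only the candidates by Hamming distance.
        ham = (self.rerank_codes[cand] != q_code).sum(axis=1)
        order = np.argsort(ham, kind="stable")
        return cand[order[:top]].tolist()

rng = np.random.default_rng(0)
docs = rng.random((100, 50))                    # stand-in term vectors
codes = rng.integers(0, 2, size=(100, 32))      # stand-in re-ranking codes
index = TwoStageIndex(docs, codes)
result = index.query(docs[3], codes[3])
print(result)
```

Only the bucket candidates are scanned in stage 2, which is where the "small portion of the dataset" saving comes from.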
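  • 17. SHTTM [2013-SIGIR]: Semantic Hashing Using Tags and Topic Modeling
    – Hash code learning, combining tag consistency and similarity preservation:
      $\min_{Y, U} \|T - U^{T} Y\|_F^2 + C \|U\|_F^2 + \gamma \|Y - \theta\|_F^2 \quad \text{s.t.}\; Y \in \{-1,1\}^{k \times n},\; Y\mathbf{1} = 0$
      where the first two terms enforce tag consistency (T holds the document tags) and the γ term preserves the similarity given by the topic representation θ.
    – Hash function learning: with $y_j = f(x_j) = W^{T} x_j$,
      $W^{*} = \arg\min_{W} \sum_{j=1}^{n} \|y_j - W^{T} x_j\|^2 + \lambda \|W\|_F^2 \;\Rightarrow\; W^{*} = (X X^{T} + \lambda I)^{-1} X Y^{T}$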
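The hash-function step is ridge regression with a closed-form solution: setting the gradient $2X(X^{T}W - Y^{T}) + 2\lambda W$ to zero gives $(XX^{T} + \lambda I)W = XY^{T}$. The sketch below solves that linear system rather than forming an inverse, and checks that the gradient vanishes. The codes Y here are random stand-ins for the output of the code-learning step, and the function names are hypothetical.

```python
import numpy as np

def learn_hash_function(X, Y, lam=1.0):
    """W* = (X X^T + lam I)^{-1} X Y^T.
    X: m x n features (one column per document); Y: k x n codes in {-1, 1}."""
    m = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(m), X @ Y.T)  # m x k

def hash_query(W, x):
    """Binary code for a new document: sign of W^T x (ties mapped to +1)."""
    return np.where(W.T @ x >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 60))                          # 20 features, 60 documents
Y = np.where(rng.standard_normal((8, 60)) >= 0, 1.0, -1.0)  # stand-in 8-bit codes
lam = 0.5
W = learn_hash_function(X, Y, lam)

# At the minimizer, the gradient of sum_j ||y_j - W^T x_j||^2 + lam ||W||^2 vanishes.
grad = 2 * X @ (X.T @ W - Y.T) + 2 * lam * W
print(np.abs(grad).max(), hash_query(W, X[:, 0]))
```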
  • 18. DVH [2013-ICML]: Predictable Dual-View Hashing
    – The goal is to find two sets of hyperplanes that map the visual and the textual space into a common subspace.
    – Diagram labels: CCA, Multi-SVM.
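The slide names CCA as one component. The sketch below is not DVH itself; it shows only how CCA yields two sets of hyperplanes, one per view, whose projections are maximally correlated, so thresholding both at zero gives codes in a shared Hamming space. The regularizer `reg`, the function names, and the synthetic "visual" and "textual" data are assumptions.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca(X, Z, k, reg=1e-3):
    """Top-k CCA hyperplanes for two paired views (rows are paired samples)."""
    mx, mz = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - mx, Z - mz
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Czz = Zc.T @ Zc / n + reg * np.eye(Z.shape[1])
    Cxz = Xc.T @ Zc / n
    Kx, Kz = inv_sqrt(Cxx), inv_sqrt(Czz)
    U, s, Vt = np.linalg.svd(Kx @ Cxz @ Kz)     # whitened cross-covariance
    Wx = Kx @ U[:, :k]                           # hyperplanes for view 1
    Wz = Kz @ Vt.T[:, :k]                        # hyperplanes for view 2
    return (Wx, mx), (Wz, mz), s[:k]             # s[:k]: canonical correlations

def to_codes(A, proj):
    """Project with the training mean removed, then threshold at zero."""
    W, mean = proj
    return np.where((A - mean) @ W >= 0, 1, -1)

rng = np.random.default_rng(0)
visual = rng.standard_normal((500, 10))
textual = visual @ rng.standard_normal((10, 6)) + 0.1 * rng.standard_normal((500, 6))
pv, pt, corr = cca(visual, textual, k=4)
bv, bt = to_codes(visual, pv), to_codes(textual, pt)
print(corr.round(3), (bv == bt).mean())
```

Because paired items land on strongly correlated projections, their codes mostly agree, which is what allows a query in one view to retrieve items stored in the other.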
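  • 19. MVH [2011-SIGIR]: Composite Hashing with Multiple Information Sources
    – Overall objective, combining a similarity-preservation term and a consistency term over the M information sources:
      $J(Y, W, \alpha) = \operatorname{tr}(Y \tilde{L} Y^{T}) + C_1 \big\| Y - \sum_{k=1}^{M} \alpha_k W^{(k)T} X^{(k)} \big\|_F^2 + C_2 \sum_{k=1}^{M} \|W^{(k)}\|_F^2$
      where $\tilde{L}$ is the graph Laplacian combining the sources, $X^{(k)}$ and $W^{(k)}$ are the features and hash projections of source k, and $\alpha_k$ weights source k.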
  • 21. LSH in MapReduce – Key Idea
  • 22. LSH in MapReduce – First Round of MapReduce
  • 23. LSH in MapReduce – Second Round of MapReduce
  • 24. References
    [1]. Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing[C]//VLDB. 1999, 99: 518-529.
    [2]. Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions[C]//Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on. IEEE, 2006: 459-468.
    [3]. Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions[J]. Communications of the ACM, 2008, 51(1): 117.
    [4]. Charikar M S. Similarity estimation techniques from rounding algorithms[C]//Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. ACM, 2002: 380-388.
    [5]. Manku G S, Jain A, Das Sarma A. Detecting near-duplicates for web crawling[C]//Proceedings of the 16th international conference on World Wide Web. ACM, 2007: 141-150.
    [6]. Zhang D, Wang J, Cai D, et al. Self-taught hashing for fast similarity search[C]//Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010: 18-25.
    [7]. Liu W, Wang J, Ji R, et al. Supervised hashing with kernels[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 2074-2081.
  • 25. References
    [8]. Gong Y, Lazebnik S. Iterative quantization: A procrustean approach to learning binary codes[C]//Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011: 817-824.
    [9]. Gong Y, Lazebnik S, Gordo A, et al. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2013, 35(12): 2916-2929.
    [10]. Lin G, Shen C, Suter D, et al. A general two-step approach to learning-based hashing[C]//Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013: 2552-2559.
    [11]. Yang Q, Huang L K, Zheng W S, et al. Smart hashing update for fast response[C]//Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. AAAI Press, 2013: 1855-1861.
    [12]. Li H, Liu W, Ji H. Two-stage hashing for fast document retrieval[C]//ACL. 2014.
    [13]. Wang Q, Zhang D, Si L. Semantic hashing using tags and topic modeling[C]//Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2013: 213-222.
    [14]. Rastegari M, Choi J, Fakhraei S, et al. Predictable dual-view hashing[C]//Proceedings of the 30th International Conference on Machine Learning. 2013: 1328-1336.
  • 26. References
    [15]. Zhang D, Wang F, Si L. Composite hashing with multiple information sources[C]//Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 2011: 225-234.
    [16]. Szmit R. Locality sensitive hashing for similarity search using MapReduce on large scale data[M]//Language Processing and Intelligent Information Systems. Springer Berlin Heidelberg, 2013: 171-178.
    [17]. Location Sensitive Hashing in Map Reduce (blog)[EB/OL]. http://horicky.blogspot.hk/2012/09/location-sensitive-hashing-in-map-reduce.html
    [18]. Likelike project[EB/OL]. https://github.com/takahi-i/likelike
    [19]. Wang J. Learning to hash for large-scale search[Z]. CIKM 2013 tutorial.

Editor's notes

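  1. Go straight into the individual hashing models.
  2. Go straight into the individual hashing models.
  3. LSH, the most widely applied work.
  4. Work from Google, used to remove duplicate text content during web crawling.
  5. Go straight into the individual hashing models.
  6. Go straight into the individual hashing models.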
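  7. First reduce the dimensionality with PCA to obtain the low-dimensional vectors V. The most direct approach would be to fit these vectors directly, i.e., to binarize V itself. However, in the PCA problem any orthogonal transformation of the optimal solution W is still optimal. We can therefore apply an arbitrary orthogonal transformation to the low-dimensional vectors and fit the hash codes to the rotated matrix.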
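  8. This ICCV paper is work from the University of Adelaide, Australia. Their motivation is that most current hashing methods learn the hash encoding of the dataset and the hash prediction function jointly. This tight coupling limits flexibility on the one hand, and on the other makes the optimization problem complex and hard to solve. They propose a framework that splits hashing into two stages: the first stage learns hash codes for the existing dataset, and the second learns hash functions from those codes. If the motivation is unclear, look at the figure, which comes from this paper's main reference work, Self-Taught Hashing (SIGIR 2010). STH is a typical two-stage hash learning method from Si Luo's lab at Purdue University; its third author is Deng Cai of Zhejiang University, so it was probably done together during an academic exchange. In the figure, a document collection is first passed through an unsupervised dimensionality-reduction method to obtain binary hash codes; this is the first stage. The learned hash codes are then used as binary labels to train a hash function with supervised learning. Both stages are offline; query processing is the online stage. STH is already a two-stage framework in itself, and the ICCV paper essentially generalizes it into a unified two-stage hash learning framework.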
  9. Go straight into the individual hashing models.
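  10. This stays within the earlier two-step framework and updates the hash functions. This IJCAI paper is work from Sun Yat-sen University and builds on a DMKD 2012 paper on hashing with active learning (DMKD is a class-B journal in the retrieval category). Their motivation is quite practical: existing hashing methods already perform well, but most learn passively and assume that the labeled data are provided up front. This paper considers how to update the hashing model as labeled data gradually accumulate, so as to respond quickly to users; this is called Smart Hashing Update. In active learning, the system automatically selects some data for the user to label, then updates the whole hashing model from the existing data and the newly labeled data. The algorithm flow is shown in the figure: each time the user labels new data, it is added to the existing dataset, and the system selects which hash bits need updating; the hash functions of the t selected bits take part in this round of updating. Choosing these t bits is the key step, and the paper gives two strategies: (1) consistency-based selection and (2) similarity-based selection. Consistency-based selection checks, over the whole dataset, whether each bit of the hash codes is consistent among samples of the same class, i.e., whether the labels {-1, 1} on that bit agree (all +1 or all -1); bits with poor consistency are selected for updating. The drawback of this strategy is that it ignores the similarity between the existing data and the new data, so the second strategy, similarity-based selection, measures how well the hash codes within one class perform. CVPR 2012 gives a performance measure for this, where H is the hash codes of one class and S is the affinity matrix; the smaller the value, the better. To pick out the t worst hash functions, the measure is modified: each bit k is removed from the hash functions in turn, and the t bits whose removal lowers the measure the most (the bits with the largest effect) are selected. This completes the selection, after which the selected hash functions are relearned from the labeled data.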
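  11. Different from Two-Step Hashing: two hashing methods are applied one after the other, the first to retrieve candidates and the second to re-rank them.
  12. Takes both topic models and tag information into account; a two-step hashing approach.
  13. A one-step hashing method; constraints are added to prevent trivial solutions, and CCA is used to solve this optimization. However, orthogonality is sometimes unnecessary and even harmful.
  14. A one-step hashing method.
  15. Go straight into the individual hashing models.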