Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Paper review
"Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search",
Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, Xindong Wu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013


  1. 1. Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, Xindong Wu IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013
  2. 2. Introduction
  3. 3. Tag-based social image search • Social media data (Flickr, YouTube, etc.) • Associated with user-generated tags and meta information (date, location, etc.) • Conventional tag-based social image search • Too much noise in tags • Lacks an optimal ranking strategy (e.g., Flickr offers time-based and interestingness-based ranking) • Existing relevance-based ranking methods • Explore visual content and tags separately or sequentially
  4. 4. Proposed scheme • A hypergraph-based approach that uses visual information and tags simultaneously • Vertex: social image • Hyperedge: visual word or tag • Learns the hyperedge weights (the importance of different visual words and tags) • Outputs the relevance scores of images
  5. 5. Related works
  6. 6. Social image search • Separated methods • Only the textual content or only the visual content is used for tag analysis • Useful information from the other modality is lost
  7. 7. Social image search • Sequential methods • The visual content and the tags are used one after the other for image search • The correlation between visual content and tags is not exploited
  8. 8. Social image search Joint method
  9. 9. Hypergraph learning • A hypergraph is a generalization of a graph in which an edge can connect more than two vertices • Used for data mining and information retrieval tasks • Effective in capturing higher-order relationships
  10. 10. Hypergraph analysis
  11. 11. Definition (image from Wikipedia) • Vertex set $\mathcal{V} = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7\}$ • Hyperedge set $\mathcal{E} = \{e_1, e_2, e_3, e_4\} = \{\{v_1, v_2, v_3\}, \{v_2, v_3\}, \{v_3, v_5, v_6\}, \{v_4\}\}$ • A hyperedge can link more than two vertices. • Edge weight set $w$ • Hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, w)$
  12. 12. Hypergraph analysis • Learning with hypergraphs • Binary classification with a hypergraph • The normalized Laplacian method is formulated as a regularization framework: $\arg\min_f \{\lambda R_{emp}(f) + \Omega(f)\}$, where $f$ is the to-be-learned classification function, $\Omega(f)$ is the regularizer, $R_{emp}(f)$ is the empirical loss, and $\lambda$ is the weighting parameter
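The example hypergraph on this slide can be written down as an incidence matrix. The sketch below is only an illustration: the 0-based indexing, the uniform weights, and the variable names are assumptions, not taken from the slides.

```python
import numpy as np

# Hyperedges from the slide, with 0-based vertex indices (v1 -> 0, ..., v7 -> 6).
hyperedges = [
    [0, 1, 2],  # e1 = {v1, v2, v3}
    [1, 2],     # e2 = {v2, v3}
    [2, 4, 5],  # e3 = {v3, v5, v6}
    [3],        # e4 = {v4}
]
n_vertices = 7

# Incidence matrix H: H[v, e] = 1 if vertex v belongs to hyperedge e.
H = np.zeros((n_vertices, len(hyperedges)), dtype=int)
for e, members in enumerate(hyperedges):
    H[members, e] = 1

w = np.ones(len(hyperedges))   # assumed uniform hyperedge weights
edge_degree = H.sum(axis=0)    # number of vertices in each hyperedge
vertex_degree = H @ w          # total weight of the hyperedges containing each vertex
```

Note that v7 belongs to no hyperedge, so its vertex degree is zero.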
  13. 13. Visual-textual joint relevance learning
  14. 14. Hypergraph construction • Vertex construction • Vertices: the social image set • The number of vertices in the hypergraph equals the number of images in the dataset.
  15. 15. Hypergraph construction • Hyperedge construction • Feature 1: visual content • Bag of visual words • Extract local SIFT descriptors for each image • Train a visual vocabulary from the descriptors • $f_i^{bow}(k, 1) = 1$ if the $i$-th image has the $k$-th visual word, and $0$ otherwise
  16. 16. Hypergraph construction • Hyperedge construction • Feature 2: textual information • Bag of textual words • The tags of each image are ranked by TagRanking • Only the top $n_l$ tags of each image are kept for further processing • For hyperedge construction, only the $n_t$ tags with the highest TF-IDF values in the database are kept • $f_i^{tag}(k, 1) = 1$ if the $i$-th image has the $k$-th tag, and $0$ otherwise
  17. 17. Hypergraph construction • Hyperedge construction • If two images contain the same visual word, they are connected by the corresponding hyperedge. • If two images contain the same tag, they are connected by the corresponding hyperedge. • If $f_i^{bow}(k, 1) = f_j^{bow}(k, 1) = 1$, images $i$ and $j$ are connected. • If $f_i^{tag}(k, 1) = f_j^{tag}(k, 1) = 1$, images $i$ and $j$ are connected. • $n_c$ visual-content-based hyperedges • $n_t$ tag-based hyperedges • $n_c + n_t$ hyperedges in total
  18. 18. Hypergraph construction Example of textual hyperedge construction Example of visual hyperedge construction
  19. 19. Example of the connection between two images
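The construction on the preceding slides can be sketched in a few lines. The sketch assumes the binary indicators are already available as matrices. The function name `build_incidence` and the toy data are made up for illustration.

```python
import numpy as np

def build_incidence(F_bow, F_tag):
    """Stack binary visual-word and tag indicators into one incidence matrix.

    F_bow: (n_images, n_c) array, F_bow[i, k] = 1 if image i has visual word k.
    F_tag: (n_images, n_t) array, F_tag[i, k] = 1 if image i has tag k.
    Returns H of shape (n_images, n_c + n_t); each column is one hyperedge.
    """
    return np.hstack([np.asarray(F_bow), np.asarray(F_tag)]).astype(int)

# Toy data (made up): 4 images, 3 visual words, 2 tags.
F_bow = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])
F_tag = np.array([[1, 0],
                  [1, 0],
                  [0, 1],
                  [0, 1]])
H = build_incidence(F_bow, F_tag)

# shared[i, j] counts the hyperedges that contain both image i and image j.
shared = H @ H.T
connected = shared > 0
```

Here images 0 and 1 share one visual word and one tag, so they are connected through two hyperedges, while images 0 and 2 share none.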
  20. 20. Social image relevance learning • Social image search task • Treated as a binary classification problem • Measures the relevance score of every vertex in the hypergraph • Transductive inference is also formulated as a regularization framework • Objective function • $\arg\min_{f,\omega} \{\Omega(f) + \lambda R_{emp}(f) + \mu \Psi(\omega)\}$, where $f$ is the to-be-learned relevance score vector, $\omega$ is the hyperedge weight vector, $\Omega(f)$ is the regularizer term, $R_{emp}(f)$ is the empirical loss term, and $\Psi(\omega)$ is the weight regularizer term • The regularizer term expresses that highly related vertices should receive similar labels
  21. 21. Social image relevance learning • Objective function • $\arg\min_{f,\omega} \{\Omega(f) + \lambda R_{emp}(f) + \mu \Psi(\omega)\}$ • $\Omega(f) = f^T \Delta f$ • $R_{emp}(f) = \|f - y\|^2 = \sum_{u \in V} (f(u) - y(u))^2$ • Guarantees that the newly generated labels do not stray far from the initial label information • $\Psi(\omega) = \sum_{i=1}^{n_e} \omega_i^2$ s.t. $\sum_{i=1}^{n_e} \omega_i = 1$ • $\arg\min_{f,\omega} \Phi(f, \omega) = \arg\min_{f,\omega} \{f^T \Delta f + \lambda \|f - y\|^2 + \mu \sum_{i=1}^{n_e} \omega_i^2\}$ s.t. $\sum_{i=1}^{n_e} \omega_i = 1$ • ($\Delta$: the normalized hypergraph Laplacian) • ($y$: $n \times 1$ initial label vector)
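For fixed hyperedge weights, the objective is quadratic in $f$, so setting its gradient to zero gives the linear system $(\Delta + \lambda I) f = \lambda y$. The sketch below assumes the commonly used normalized hypergraph Laplacian $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$. The slides only name $\Delta$ and do not spell it out, so this form, the function names, and the toy data are assumptions.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (assumed standard form)."""
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w                     # vertex degrees: total weight of incident hyperedges
    d_e = H.sum(axis=0)             # hyperedge degrees: number of vertices per hyperedge
    S = H / np.sqrt(d_v)[:, None]   # Dv^{-1/2} H (assumes every vertex degree is > 0)
    theta = (S * (w / d_e)) @ S.T   # Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    return np.eye(H.shape[0]) - theta

def relevance_scores(H, w, y, lam):
    """Solve (Delta + lam * I) f = lam * y, the stationarity condition in f."""
    delta = hypergraph_laplacian(H, w)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(delta + lam * np.eye(len(y)), lam * y)

# Toy example (made up): 3 images, 2 hyperedges, image 0 initially labeled relevant.
H = np.array([[1, 0],
              [1, 1],
              [0, 1]])
w = np.array([0.5, 0.5])
y = np.array([1.0, 0.0, 0.0])
lam = 1.0
delta = hypergraph_laplacian(H, w)
f = relevance_scores(H, w, y, lam)
```

In this toy case the score decays along the chain: image 1 shares a hyperedge with the labeled image 0 and scores higher than image 2, which is reachable only through image 1.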
  22. 22. Optimization • Alternating optimization strategy • There are two variables to learn, $\omega$ and $f$: fix one and optimize the other at each step • Iterating this procedure yields both $\omega$ and $f$.
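A minimal sketch of the alternating scheme follows. It makes two assumptions that the slides do not state: the normalized Laplacian has the form $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$, and the vertex degrees are held fixed during the weight step. Under those assumptions the weight step has a closed form from the Lagrangian of the sum-to-one constraint. All function names and toy data are hypothetical.

```python
import numpy as np

def update_f(H, w, y, lam):
    """f-step: with w fixed, solve (Delta + lam * I) f = lam * y."""
    d_v = H @ w
    d_e = H.sum(axis=0)
    S = H / np.sqrt(d_v)[:, None]
    delta = np.eye(H.shape[0]) - (S * (w / d_e)) @ S.T
    return np.linalg.solve(delta + lam * np.eye(H.shape[0]), lam * y)

def update_w(H, w, f, mu):
    """w-step: with f fixed and vertex degrees frozen at the current w (a simplification).

    f^T Delta f = f^T f - sum_i w_i * c_i with c_i = g_i^2 / delta(e_i), so
    minimizing -sum_i w_i c_i + mu * sum_i w_i^2 subject to sum_i w_i = 1
    gives w_i = 1/n_e + (c_i - mean(c)) / (2 * mu).
    """
    d_v = H @ w
    d_e = H.sum(axis=0)
    g = H.T @ (f / np.sqrt(d_v))   # per-hyperedge sum of degree-normalized scores
    c = g ** 2 / d_e
    return 1.0 / len(w) + (c - c.mean()) / (2.0 * mu)

def alternate(H, y, lam=1.0, mu=10.0, n_iter=10):
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.full(H.shape[1], 1.0 / H.shape[1])   # start from uniform weights
    for _ in range(n_iter):
        f = update_f(H, w, y, lam)
        w = update_w(H, w, f, mu)
    return update_f(H, w, y, lam), w

# Toy data (made up): 4 images, 3 hyperedges, images 0 and 1 initially labeled relevant.
H_toy = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 1, 1]])
y_toy = np.array([1.0, 1.0, 0.0, 0.0])
f_out, w_out = alternate(H_toy, y_toy)
```

The closed-form weight step does not enforce non-negativity, so a small $\mu$ can produce negative weights. A real implementation would need to guard against that.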
  23. 23. Probabilistic explanation • Probabilistic perspective • Derive the optimal $f$ and $\omega$ as those with the maximum posterior probability given the samples $X$ and the label vector $y$: $(f, \omega)^* = \arg\max p(f, \omega \mid X, y)$ • Equivalent to the objective function $\arg\min_{f,\omega} \{f^T \Delta f + \lambda \|f - y\|^2 + \mu \sum_{i=1}^{n_e} \omega_i^2\}$ s.t. $\sum_{i=1}^{n_e} \omega_i = 1$
  24. 24. Pseudo-relevant sample selection • Pseudo-relevant samples • Associated with the query tag • Have high relevance probabilities • They are not far away from result • Used for noise reduction
  25. 25. Pseudo-relevant sample selection • Semantic relevance measuring • All the social images associated with the query tag are ranked in descending order of semantic relevance • The top $K$ results are selected as the pseudo-relevant images • Semantic similarity • Flickr Distance (FD) between two tags • Based on a latent-topic-based visual language model • $s(x_i, t_q) = \frac{1}{n_i} \sum_{t \in T_i} s_{tag}(t_q, t)$ • $s_{tag}(t_1, t_2) = \exp(-FD(t_1, t_2))$
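The selection rule on this slide can be sketched as follows. Computing the Flickr Distance itself is out of scope here, so the sketch takes it as a caller-supplied function. The lookup table `TOY_FD`, its values, and all function names are made up for illustration.

```python
import math

def semantic_relevance(image_tags, query_tag, flickr_distance):
    """s(x_i, t_q) = (1 / n_i) * sum over t in T_i of exp(-FD(t_q, t))."""
    if not image_tags:
        return 0.0
    total = sum(math.exp(-flickr_distance(query_tag, t)) for t in image_tags)
    return total / len(image_tags)

def select_pseudo_relevant(images, query_tag, flickr_distance, k):
    """Rank the images carrying the query tag by semantic relevance; keep the top k."""
    candidates = [(img, tags) for img, tags in images.items() if query_tag in tags]
    ranked = sorted(
        candidates,
        key=lambda item: semantic_relevance(item[1], query_tag, flickr_distance),
        reverse=True,
    )
    return [img for img, _ in ranked[:k]]

# Hypothetical distance table; real Flickr Distance values come from a visual language model.
TOY_FD = {
    frozenset(("apple", "fruit")): 0.2,
    frozenset(("apple", "phone")): 1.5,
}

def toy_flickr_distance(t1, t2):
    return 0.0 if t1 == t2 else TOY_FD[frozenset((t1, t2))]

images = {
    "a": ["apple", "fruit"],
    "b": ["apple", "phone"],
    "c": ["car"],
}
top = select_pseudo_relevant(images, "apple", toy_flickr_distance, k=2)
```

Image "c" is never a candidate because it does not carry the query tag.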
  26. 26. Experiments
  27. 27. Experimental settings • Dataset: Flickr dataset (104,000 images, 83,999 tags) + NUS-WIDE (370K+ images) • Labeling: three relevance levels: very relevant (2), relevant (1), and irrelevant (0) • Compared algorithms • Graph-based semi-supervised learning (Graph) • Sequential social image relevance learning (Sequential) • Tag ranking (TagRanking) • Tag relevance combination (Uniform Tagger) • Hypergraph-based relevance learning (HG) • HG + hyperedge weight estimation (HG+WE) • HG + WE (visual content only) • HG + WE (textual content only) • Performance evaluation metric • Normalized Discounted Cumulative Gain (NDCG)
  28. 28. The NDCG@20 results of different methods • [Bar chart: average NDCG@20 of the compared methods; bar values 0.8814, 0.8578, 0.8463, 0.7418, 0.6281, 0.5994, 0.5778, 0.5727]
  29. 29. Average NDCG@k comparison • The proposed approach consistently outperforms the other methods • [Plot: average NDCG@k versus NDCG depth k]
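For reference, NDCG@k over the three relevance levels (0, 1, 2) can be computed as below. Several NDCG variants exist and the slides do not say which one was used, so the exponential gain $2^{rel} - 1$ and the $\log_2$ discount are assumptions.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k items (rank 1 has discount log2(2) = 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([2, 1, 0], 3)    # already in ideal order
reversed_ = ndcg_at_k([0, 1, 2], 3)  # worst order of the same labels
```

The ideal ranking here is built from the same judged list, so the score measures only the ordering of the returned items.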
  30. 30. • Top results obtained by different methods for the query weapon. • the final ranking list can preserve images from all different meanings
  31. 31. • Top results obtained by different methods for the query apple. • the proposed method can return relevant results with different meanings
  32. 32. The effects of hyperedge weight learning Top 100 visual words with the highest weights after the hypergraph learning process
  33. 33. The effects of hyperedge weight learning Ten tags with the highest weights after the hypergraph learning process for the queries (a) car and (b) weapon.
  34. 34. Variation of weighting parameters Average NDCG@20 performance curves with respect to the variation of λ and μ.
  35. 35. Variation of dictionary size NDCG@20 comparison of the proposed method with different sizes of the tag and visual word dictionaries, i.e., 𝑛 𝑐 and 𝑛 𝑡.
  36. 36. Variation of max. number of tags NDCG@20 comparison of the proposed method with different 𝑛𝑙 selection The parameter 𝑛𝑙 is employed to filter noise tags
  37. 37. Computational cost comparison
  38. 38. Conclusion
  39. 39. Conclusion • Proposal: joint use of visual content and tags, through a hypergraph and a relevance learning procedure, for social image search • Learns the weights of the hyperedges • Differs from previous hypergraph learning algorithms • Minimizes the effect of uninformative features • Future work • Diversity of search results
  40. 40. Thank you ! Q&A