The dark art of search relevancy

1,062 views

PyData London 2015

Published in: Data & Analytics

  1. 1. The dark art of search relevancy
  2. 2. Hi, I’m Eddie. @ejlbell
  3. 3. We collect the world of fashion into a customisable shopping experience.
  4. 4. 90 thousand 3,800,000 items scraped per day. items updated / hour
  5. 5. 5
  6. 6. Why is search hard?
  7. 7. Relevant elements: true positives and false negatives. Selected elements: true positives and false positives. Everything else: true negatives. Precision: how many selected items are relevant? Recall: how many relevant items are selected?
  8. 8. BM25
  9. 9. When it goes wrong… “Little Black Dress”
  10. 10. When it goes wrong… “Red Valentino”
  11. 11. BCBGMAXZRIA
  12. 12. Dress
  13. 13. Dress Dress Shirt
  14. 14. Dress Dress Shirt Shirt Dress
  15. 15. What is a good result?
  16. 16. Clickthrough Data
  17. 17. Crowdsource Relevance
  18. 18. Ordinal “Light Pink Heels”
  19. 19. Pairwise “Blue Trainers”
  20. 20. Pairwise “Blue Trainers”
  21. 21. What are people actually searching? Search term patterns with examples: 1.) designer + category: “hermes sandal”; 2.) designer + colour + category: “burberry black boots”; 3.) designer + fabric + category: “chloe leather top”; 4.) colour + category: “gray hoodie”; 5.) designer + type: “asos bag”
  22. 22. DSSM
  23. 23. C-DSSM: DSSM with 1D convolution and max pooling
  24. 24. CSSM Results. Query “Microsoft Office” (search result, score): Download office Excel, 0.54; Word office online, 0.50; Apartment office hours, 0.33; International office Berkeley, 0.27. Query “Car body shop” (search result, score): Car body kits, 0.70; Auto body parts, 0.55; Calculate body fat, 0.22; Forcefield body armour, 0.17.
  25. 25. Computing the results: store the hidden layer representation in Postgres; rank by cosine distance; speed up search with ANN; random projection trees; calculate the query representation at run time; Docker, Django-rest, chef, empire, auto-scaling
  26. 26. Images?
  27. 27. Images: train an 8 layer CNN; swap out the soft max layer; a soft max layer for each label of interest; 60 million parameters; build robust learned representations; represents products as a 4096 element vector
  28. 28. Image Classifier. Sub-category (score): Male, clothing, suits, 3 piece suits (0.72263); Male, clothing, suits, 2 piece suits (0.12102); Male, clothing, jackets, formal jackets (0.10818); Male, clothing, coats, trench coats (0.02396); Male, clothing, jackets, casual jackets (0.00949). Colors (score): Blue (0.99397); Gray (0.00481); Black (0.00071)
  29. 29. Conclusion: Search is fun but search is hard
  30. 30. thank you
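
Slide 7 introduces precision and recall. As a minimal sketch of the standard definitions (precision is the share of selected items that are relevant, recall is the share of relevant items that were selected), the following may help. The function name `precision_recall` and the item identifiers are hypothetical and do not come from the talk.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) for a set of selected and relevant items."""
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    # Precision: fraction of selected items that are relevant.
    precision = true_positives / len(selected) if selected else 0.0
    # Recall: fraction of relevant items that were selected.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 items returned, 3 items actually relevant, 2 in common.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision, recall)
```

Returning 0.0 for an empty selection or an empty relevant set is a convention chosen here to avoid a division by zero, not something the slides specify.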
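
Slide 8 names BM25 without detail. The sketch below shows one common formulation of Okapi BM25 over whitespace-tokenised text; it is not necessarily the configuration used in the talk. The parameter values `k1=1.2` and `b=0.75`, the "+1" smoothed IDF, and the example product titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n_t = doc_freq[term]
            # Smoothed IDF (the "+1" keeps it non-negative).
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            # Term frequency saturation with document length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical product titles.
docs = ["blue dress shirt", "red shirt dress", "black leather boots"]
scores = bm25_scores("dress shirt", docs)
print(scores)
```

Because this scoring treats text as a bag of words, the first two titles receive the same score for the query "dress shirt" even though word order changes the garment being described.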
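
Slide 25 mentions ranking stored representations by cosine distance. A brute-force sketch of that ranking step is shown below, assuming the vectors have already been loaded into memory; the talk speeds this step up with approximate nearest neighbours, which is not reproduced here. The function names, item identifiers, and two-dimensional vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, item_vecs):
    """Return item ids sorted from nearest to farthest from the query."""
    return sorted(
        item_vecs,
        key=lambda item_id: cosine_distance(query_vec, item_vecs[item_id]),
    )


# Hypothetical stored representations keyed by item id.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_by_cosine([1.0, 0.1], items)
print(ranking)
```

An exhaustive scan like this costs one distance computation per stored item, which is the cost an approximate index such as random projection trees is meant to reduce.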
