
Do Better ImageNet Models Transfer Better... for Image Recommendation?

Article presented at the RecSysKTL workshop, co-located with ACM RecSys 2018



  1. Do Better ImageNet Models Transfer Better … for Image Recommendation? Felipe del Río, Pablo Messina, Vicente Dominguez, Denis Parra. CS Department, School of Engineering, Pontificia Universidad Católica de Chile. RecSysKTL Workshop, October 6, 2018
  2. Artwork Recommendation • Online artwork market: growing since 2008, despite global crises! – In 2011, art brought in $11.57 billion in total global annual revenue, over $2 billion more than in 2010 (*Forbes). • Previous recommendation projects date back to at least 2007, such as the CHIP project to recommend paintings from the Rijksmuseum. • Little use of recent advances in deep neural networks for computer vision. October 6th, 2018 · del Rio et al. · RecSysKTL 2018. [Forbes] The World’s Strongest Economy? The Global Art Market. https://www.forbes.com/sites/abigailesman/2012/02/29/the-worlds-strongest-economy-the-global-art-market/ (2012)
  3. Image Recommendation • Since 2017 we have been working on recommending art images, using data from the online store UGallery. • Two papers published: – DLRS 2017: Dominguez, V., Messina, P., Parra, D., Mery, D., Trattner, C., & Soto, A. (2017, August). Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (pp. 55-59). ACM. – UMUAI 2018: Messina, P., Dominguez, V., Parra, D., Trattner, C., & Soto, A. (2018). Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features. User Modeling and User-Adapted Interaction, 1-40.
  4. Data: UGallery • Online artwork store, based in CA, USA. • Mostly sells one-of-a-kind physical artwork.
  5. Image Recommendation • Our top approach is a hybrid recommender, based on metadata and visual features from deep convolutional neural networks.
  6. Motivation • When submitting our work we usually received criticism for not using the latest DNN model. • An actual review from a previous article submission (2017): “Overall an interesting paper although … the choice of AlexNet is rather odd as there are better pre-trained networks available, e.g. VGG16.”
  7. Motivation • Is it always the case that better pre-trained deep convolutional models (on the ImageNet Challenge) produce better results in a transfer-learning setting?
  8. ImageNet: Crowdsourcing a Large Dataset of Image Labels
  9. Datasets in Computer Vision • 1996: faces and cars, 14,000 images of 10,000 people • 1998: MNIST, 70,000 images of handwritten digits • 2004: Caltech 101, 9,146 images of 101 categories • 2005: PASCAL VOC, 20,000 images with 20 classes
  10. Datasets in Computer Vision • ImageNet: presented in 2009 at CVPR • Crowdsourced • 14,197,122 images • 21,841 categories (non-empty synsets) • Categories based on the WordNet taxonomy
  11. WordNet • WordNet: George Miller’s project, started in 1985 at Princeton; a hierarchy for the English language. • Prof. Fei-Fei Li (UIUC, Princeton, Stanford) worked on filling WordNet with many images.
  12. Crowdsourced • Amazon Mechanical Turk • It took 2.5 years to complete. Originally 3.2 million images in 5,247 categories (mammal, vehicle, etc.)
  13. ImageNet Challenge • The dataset has been used for an image classification competition since 2010. • In 2012 a team used deep learning and drove the error rate far below the previous ~25% state of the art (Hinton et al.): a 10.8-point margin, 41% better than the next best entry.
  14. Transfer Learning • The 2012 model was called AlexNet: a convolutional neural network. • The features it learned (fc6, fc7) have been used successfully to transfer the learning to other tasks.
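The off-the-shelf transfer recipe described above can be sketched as follows: run an input through a pretrained network, keep a penultimate-layer activation (analogous to AlexNet’s fc6/fc7) as a generic visual embedding, and discard the original classifier head. The random weights and tiny dimensions here are stand-ins for real pretrained parameters, not an actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
W_lower = rng.standard_normal((64, 128)) * 0.1   # stand-in for pretrained lower layers
W_fc = rng.standard_normal((32, 64)) * 0.1       # stand-in for an fc6-style layer
W_head = rng.standard_normal((1000, 32)) * 0.1   # original 1000-way ImageNet head (unused)

def embed(x):
    """Return the fc-layer activation as the transferred image embedding;
    the classification head W_head is simply never applied."""
    h = np.maximum(0.0, W_lower @ x)   # ReLU
    return np.maximum(0.0, W_fc @ h)

image_vec = rng.standard_normal(128)   # stand-in for a flattened image
embedding = embed(image_vec)
print(embedding.shape)                 # (32,)
```

The point of the sketch is only the shape of the pipeline: everything up to the last hidden layer is reused, and downstream tasks consume that activation vector.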
  15. Recent ImageNet results https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
      Method              Top-1 Accuracy   Top-5 Accuracy
      NASNet Large        82.7             96.2
      InceptionResNetV2   80.4             95.3
      InceptionV3         78.0             93.9
      ResNet50            75.6             92.8
      VGG19               71.1             89.8
  16. Inspiration • Simon Kornblith, Jonathon Shlens, and Quoc V. Le. 2018. Do Better ImageNet Models Transfer Better? https://arxiv.org/abs/1805.08974 • Without fine-tuning, ResNet outperforms NASNet (the SOTA).
  17. Evaluation 1 • Does pre-trained ImageNet model performance correlate with image recommendation performance?
  18. UGallery Data and Evaluation • 1,371 users / 3,940 items / 2,846 transactions
  19. Recommendation • Scoring items by the cosine similarity between the user model and the item model.
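The scoring rule on this slide can be sketched in a few lines. Building the user model as the mean of the user’s purchased-item embeddings is one plausible aggregation assumed here for illustration, not necessarily the paper’s exact formulation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def score_items(purchased_embeddings, catalog_embeddings):
    """Score every catalog item against a user model built by averaging the
    embeddings of purchased items (assumed aggregation)."""
    user_model = np.mean(purchased_embeddings, axis=0)
    return [cosine(user_model, item) for item in catalog_embeddings]

# Toy example: the user bought two similar items; catalog item 0 matches
# that taste, item 1 does not.
purchased = np.array([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]])
catalog = np.array([[1.0, 0.05, 0.05], [0.0, 1.0, 0.0]])
scores = score_items(purchased, catalog)
print(scores)  # item 0 scores near 1.0, item 1 near 0.0
```

Ranking the catalog by these scores and taking the top-k gives the recommendation list.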
  20. Experiment 1: Results
  21. Experiment 1: Results • No correlation between ImageNet performance and image recommendation performance.
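One way such a (lack of) correlation can be quantified is a Spearman rank correlation between each model’s ImageNet accuracy and its recommendation metric. A minimal sketch, using the double-argsort rank trick and assuming no tied values; the toy inputs are sanity checks, not the paper’s numbers.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of the ranks.
    argsort(argsort(x)) assigns each value its rank (ties not handled)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    rank_x -= rank_x.mean()
    rank_y -= rank_y.mean()
    return float(rank_x @ rank_y / (np.linalg.norm(rank_x) * np.linalg.norm(rank_y)))

# Perfectly concordant and perfectly discordant toy rankings.
print(spearman([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # 1.0
print(spearman([1.0, 2.0, 3.0], [30.0, 20.0, 10.0]))  # -1.0
```

With real data one would pass the per-model ImageNet accuracies and recommendation scores; a value near zero supports the “no correlation” finding on this slide.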
  22. Experiment 2 • What is the effect of fine-tuning? • How should fine-tuning be performed?
  23. Tuning I: Shallow vs. Deep • Shallow fine-tuning
  24. Tuning I: Shallow vs. Deep • Deep fine-tuning
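The shallow/deep distinction on these two slides can be illustrated with a toy linear model: shallow fine-tuning updates only the newly added task head, while deep fine-tuning also backpropagates into the pretrained layers. This is a made-up numpy sketch of the idea, not the paper’s actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 16)) * 0.1   # "pretrained" lower layer
W2 = rng.standard_normal((1, 8)) * 0.1    # freshly added task head

def finetune_step(W1, W2, x, y, lr=0.01, deep=False):
    """One gradient step on a squared-error loss. With deep=False only the
    head W2 moves (shallow fine-tuning); with deep=True the gradient is
    also propagated into the pretrained layer W1 (deep fine-tuning)."""
    h = W1 @ x
    err = W2 @ h - y                              # prediction error, shape (1,)
    grad_W2 = np.outer(err, h)
    grad_W1 = np.outer(W2.T @ err, x) if deep else 0.0
    return W1 - lr * grad_W1, W2 - lr * grad_W2

x, y = rng.standard_normal(16), np.array([1.0])
W1_s, W2_s = finetune_step(W1, W2, x, y, deep=False)  # W1 stays frozen
W1_d, W2_d = finetune_step(W1, W2, x, y, deep=True)   # W1 is updated too
```

In a real network the same choice is made by freezing or unfreezing the pretrained parameters before training on the target dataset.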
  25. Learning: Multitask vs. Single Task • Dataset 1: OmniArt – 432,217 images – Target classes: artist, artwork type, year • Dataset 2: UGallery – 3,940 images – Target classes: artist, medium (oil, acrylic, etc.)
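The multitask setup on this slide can be sketched as a shared trunk feeding one softmax head per target class (e.g. artist and medium for UGallery), trained on the sum of the per-task cross-entropies; dropping one term recovers single-task training. The head sizes and weights below are toy stand-ins, not the real label spaces.

```python
import numpy as np

rng = np.random.default_rng(1)
W_trunk = rng.standard_normal((8, 16)) * 0.1    # shared feature extractor
W_artist = rng.standard_normal((5, 8)) * 0.1    # toy 5-artist head
W_medium = rng.standard_normal((3, 8)) * 0.1    # toy 3-medium head

def cross_entropy(logits, label):
    z = logits - logits.max()                   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return float(-np.log(p[label] + 1e-12))

def multitask_loss(x, artist_label, medium_label):
    """Sum of per-task losses over the shared representation; single-task
    training would keep only one of the two terms."""
    h = np.maximum(0.0, W_trunk @ x)
    return (cross_entropy(W_artist @ h, artist_label)
            + cross_entropy(W_medium @ h, medium_label))

x = rng.standard_normal(16)
loss = multitask_loss(x, artist_label=2, medium_label=0)
```

Sharing the trunk lets the small UGallery labels piggyback on gradients from the second task, which is the trade-off Experiment 2 evaluates.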
  26. OmniArt Dataset • http://isis-data.science.uva.nl/strezoski/#3
  27. Results 1 • Deep fine-tuning worked better than shallow fine-tuning.
  28. Results 2 • ResNet was better than shallow fine-tuning • Consistent with Kornblith et al.: ResNet is the best generic visual feature extractor
  29. Results 3 • Training with a smaller but focused target dataset results in better transfer-learning performance
  30. Results 4 • There was no clear winner between multitask and single-task learning, probably because the artist category is highly descriptive
  31. Conclusion • Pre-trained neural image embeddings are great, but do not assume that performance on the original task is correlated with your current recommendation task. • If you are still going to use a pre-trained ImageNet visual embedding, ResNet is a good option, although it is not the current SOTA in ILSVRC. • Fine-tuning is strongly suggested, even if your dataset is small.
  32. THANKS! dparra@ing.puc.cl
