Categorisation at Jellysmack by Virginie Cornu, VP Data @Jellysmack

Paris Women in Machine Learning and Data Science
30 Nov 2021

  1. Empower Creators with AI. Paris WiMLDS - 29/11/2021
  2. Jellysmack detects and develops the world’s most talented video Creators through technology. We are the only company building the hyper-engaged communities that every Creator dreams of. We optimize and distribute video content across social platforms allowing Creators to reach authentic new fans with zero effort. Through the power of our data, we maximize reach and revenue so our Creators can stay focused on their passion—creating the best content. Jellysmack unlocks each creator’s full potential.
  3. CONFIDENTIAL AND PROPRIETARY INFORMATION. NOT FOR DISCLOSURE OR PUBLICATION. Genesis of Jellysmack: TV is being replaced by social media as the new source of entertainment.
  4. So let’s create thematic channels on social networks, and on several of them! And many more.
  5. Let’s create content, but let’s make sure our content will be successful! That’s when the data came into play.
  6. Jellysmack Creator: 10B views / month, 500M reached / month
  7. Expertise + AI tools to serve Creators! Our goal: to optimize and distribute content cross-platform in order to reach a bigger audience. The challenge: every platform is different and each piece of content has to be adapted accordingly. Figure labels: Engagements: 10 sec., 16x9; Engagements: 3 sec., 4x3; YouTube; Facebook. Steps: Original video, Logging, Reframing, Editing ...
  8. First success: reaction videos. The start of a Data Science journey to business value. First problem: how to find more of them when there are no precise categories on social media platforms?
  9. Data at heart: Proprietary (data enriched and generated by Jellysmack), Private (data coming from platforms on our own channels), Public (data directly accessible from the platforms)
  10. A text-classification-based approach. Why text? Data at hand: ● Title ● Tag ● Comments ● Thumbnail ● Video
  11. Human to the rescue
  12. Dataset
  13. DATASET
  14. EXPERIMENT ● TF-IDF ● CNN ● MG-CNN ● ULMFiT ● BERT
  15. A text-classification-based approach: TF-IDF ● Bag of words ● Logistic Regression ● Millions of videos = huge dimension (V > 10k)
  16. A text-classification-based approach: CNN, MG-CNN
  17. A text-classification-based approach: ULMFiT ○ Pre-trained LM ○ Transfer learning ○ Target dataset fine-tuning
  18. A text-classification-based approach: BERT ○ State of the art at the time ○ Masked language model ○ Bidirectional model
  19. RESULTS: accuracy, time on an 8-core CPU, and size of the trained models. Reality check! BERT is doing well, but at what cost ...
  20. CONCLUSION: CNN is the best compromise between accuracy, speed, and memory. Tags work better than titles.
  21. Data Product: how we enhanced our product to get continuous end-user feedback.
  22. Business Impacts & Benefits: ReactionHacks, Reaction Time, AzzyLand. 4000+ discovered, 35 signed. $4M revenue, 12B views.
  23. Human expertise powering Data products: Data Science, Human Expertise, Cross-Platform Obsession
  24. Interested in joining the adventure? Check our opportunities online!
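
Editor's sketch for slide 15 (TF-IDF, bag of words, logistic regression). This is not Jellysmack's code or data; it is a minimal, dependency-free illustration of the technique named on that slide, assuming a binary "reaction video or not" label. The toy tag strings, the function names, and the training settings (epochs, learning rate) are all hypothetical. Vectors are stored as sparse dictionaries because, as the slide notes, the vocabulary quickly becomes very large.

```python
import math
from collections import Counter

# Hypothetical toy data: video tags with a label (1 = reaction video, 0 = other).
TAGGED_VIDEOS = [
    ("reaction react first time watching", 1),
    ("reacting to funny memes reaction", 1),
    ("try not to laugh reaction", 1),
    ("easy pasta recipe cooking", 0),
    ("workout routine fitness gym", 0),
    ("travel vlog paris cooking", 0),
]

def tokenize(text):
    """Bag of words: lowercase and split on whitespace, ignoring word order."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens outside the vocabulary are dropped."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(vec, weights, bias):
    """Linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(tok, 0.0) * v for tok, v in vec.items())

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = sigmoid(score(vec, weights, bias)) - y
            bias -= lr * err
            for tok, v in vec.items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * v
    return weights, bias

def predict_proba(doc, idf, weights, bias):
    """Estimated probability that the tags describe a reaction video."""
    return sigmoid(score(tfidf(doc, idf), weights, bias))

DOCS = [text for text, _ in TAGGED_VIDEOS]
LABELS = [label for _, label in TAGGED_VIDEOS]
IDF = fit_idf(DOCS)
WEIGHTS, BIAS = train_logreg([tfidf(d, IDF) for d in DOCS], LABELS)

if __name__ == "__main__":
    for tags in ("funny reaction video", "gym cooking recipe"):
        print(tags, round(predict_proba(tags, IDF, WEIGHTS, BIAS), 3))
```

In practice a library implementation would likely replace the hand-written vectoriser and optimiser; the sketch only shows how tags become weighted sparse features that a linear classifier can score.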