Data Analytics using R with Yelp Dataset

Microsoft Student Partners - Developer's Conference Presentation

  1. 1. Text Analytics on Dataset #DevConMru
  2. 2. Data Science: processes and systems to extract knowledge or insights from data. Big Data: large and complex data that has been collected over several years.
  3. 3. What is Yelp?
  4. 4. Dataset: yelp_academic_dataset_business – Business Information; yelp_academic_dataset_review – User Reviews. Combine the 2 using
  5. 5. DEMO
  6. 6. Text Analytics Methodologies: Natural Language Processing (NLP)
  7. 7. Part 1: Natural Language Processing (NLP) with Microsoft Text Analytics: Sentiment Analysis, Keyword Extraction, Topic Detection, Language Detection
  8. 8. DEMO
  9. 9. Stemming: reduce each word to its root form (its stem)
  10. 10. Stemming (Results)
  11. 11. Common Words Removal
  12. 12. DEMO
  13. 13. Part 2: Natural Language Processing (NLP) with Stanford CoreNLP
  14. 14. Part 2: Natural Language Processing (NLP) with Stanford CoreNLP: Part of Speech (POS) tags
      POS Tag  Description                                Example
      CC       coordinating conjunction                   and
      CD       cardinal number                            1, third
      DT       determiner                                 the
      EX       existential there                          there is
      FW       foreign word                               d'oeuvre
      IN       preposition/subordinating conjunction      in, of, like
      JJ       adjective                                  big
      JJR      adjective, comparative                     bigger
      JJS      adjective, superlative                     biggest
  15. 15. DEMO
  16. 16. Part 2: Natural Language Processing (NLP) with Stanford CoreNLP
  17. 17. Part 2: Natural Language Processing (NLP) with Stanford CoreNLP
  18. 18. DEMO
  19. 19. Term Document Matrix: describes the frequency of terms that occur in a collection of documents
  20. 20. Term Document Matrix: term frequency and weighting. TF-IDF weighting gives higher weight to terms that are rare across the collection of documents.
  21. 21. Unsupervised Learning: K-means Clustering. “Art of finding groups in data” – Kaufman, Rousseeuw
      • No clear picture of what is within the document
      • Natural pair of groupings
      • Simple to run
  22. 22. K-means Clustering
  23. 23. DEMO
  24. 24. Conclusion: (1) Data Manipulation in R, (2) Natural Language Processing, (3) Machine Learning, (4) Visualization
  25. 25. Text Analytics on Dataset. Thank you.
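
Slides 9 to 11 name stemming and common-word removal, but the demo code is not part of this transcript. The following is a minimal Python sketch of those two steps, not the presenter's R code. The stop-word list, the suffix list, and the function names are all assumptions made for illustration, and the stemmer is a toy suffix stripper rather than the Porter algorithm.

```python
import re

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "in", "it", "we"}

# Suffixes tried in this order by the toy stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, remove common words, then stem what is left."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("We loved the tacos and the waiters were friendly"))
    # prints ['lov', 'taco', 'waiter', 'friendly']
```

Stems such as "lov" are not dictionary words; stemming only needs to map related word forms to the same token so that they are counted together.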
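
Slides 19 and 20 describe the term-document matrix and TF-IDF weighting. As a rough illustration, the Python sketch below builds a small matrix from whitespace-split text and weights it with idf = log(N / df), one common TF-IDF variant. The function names and the three sample reviews are made up for this example and are not taken from the talk or from the Yelp dataset.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def tf_idf(matrix):
    """Scale raw counts by idf = log(N / df), df = documents containing the term."""
    n_docs = len(matrix[0]) if matrix else 0
    weighted = []
    for row in matrix:
        df = sum(1 for count in row if count > 0)
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append([count * idf for count in row])
    return weighted


if __name__ == "__main__":
    # Hypothetical reviews, invented for the example.
    reviews = ["good pizza good service", "good burger", "slow service"]
    terms, matrix = term_document_matrix(reviews)
    for term, row in zip(terms, tf_idf(matrix)):
        print(term, [round(w, 3) for w in row])
```

In the first review, "pizza" appears once and "good" twice, yet "pizza" receives the larger weight because it occurs in only one of the three documents.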
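
Slides 21 and 22 introduce k-means clustering without showing the procedure. The sketch below is a small Python version of Lloyd's algorithm, the usual way k-means is computed, and it is not the presenter's R demo. Seeding the centroids with the first k points is an assumption made so the run is repeatable, and the 2-D sample points are invented; in the talk's setting each point would be a document's row of term weights.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start (an assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Invented sample data: two visibly separate groups.
    data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
    labels, centroids = k_means(data, k=2)
    print(labels)
    print(centroids)
```

The result of k-means depends on the starting centroids, so practical implementations usually try several random starts and keep the best one.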