
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin

This talk shows how to use Apache Flink and Apache Zeppelin for interactive data analysis. The examples use FlinkML to solve a linear regression problem and a classification problem.

Published in: Technology

  1. Interactive Data Analysis with Apache Flink. Till Rohrmann, Flink PMC member (trohrmann@apache.org, @stsffap)
  2. Data Analysis
  3. Exploratory Data Analysis
       • Visualize data
       • Calculate main characteristics
       • Understand the data and find possibly new hypotheses
  4. Data Analysts
  5. Read-Evaluate-Print Loop
       • New Scala shell offers a REPL
       • Interactive queries
       • Lets you explore data quickly
  6. Scala Shell
  7. Simple Scala Shell Example
  8. Problems
       • No visualization
       • No saving or replaying of written code
       • No assistance → Bad IDE
  9. Notebooks
       • Web-based interactive computation environment
       • Combines rich text, executable code, plots and rich media
       • Storytelling
  10. Apache Zeppelin
       • Web-based REPL with pluggable interpreters
       • In the Apache Incubator since 2014
       • Supported interpreters:
            • Flink
            • Spark
            • Python
            • Markdown
            • Many more …
  11. Word Count with Zeppelin
       • Find the 10 most frequent words with more than 4 letters in the King James Version of the Bible.
  16. Linear regression
       • Let’s predict the influence of advertisement spending on sales
       • Input data set: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
       • Features:
            • TV advertisement money
            • Radio advertisement money
            • Newspaper advertisement money
       • Response:
            • Sales
  26. Classification
       • Let’s build a classifier for insult detection
       • Kaggle challenge: https://www.kaggle.com/c/detecting-insults-in-social-commentary
       • Label: 1 (insult), 0 (no insult)
       • Feature: comment text
  29. Conclusion
       • Interactive data analysis is really easy with Apache Flink
       • Apache Zeppelin is a great interactive notebook
       • Zeppelin and Flink play well together to solve machine learning tasks and more
  31. flink.apache.org, @ApacheFlink
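
The notebook code for the word count task on slide 11 is not part of this transcript. As a rough illustration of the task itself (the 10 most frequent words with more than 4 letters), here is a minimal sketch in plain Python rather than Flink's Scala API, so it runs without a cluster. The function name `top_words`, its parameters, and the tokenization rule (lowercase runs of the letters a to z) are assumptions made for this example, not taken from the talk.

```python
import re
from collections import Counter


def top_words(text, n=10, longer_than=4):
    """Return the n most frequent words that have more than `longer_than` letters.

    Assumption: a "word" is a run of ASCII letters, compared case-insensitively.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > longer_than)
    return counts.most_common(n)


if __name__ == "__main__":
    # Short sample text; the talk uses the full King James Version instead.
    sample = (
        "In the beginning God created the heaven and the earth. "
        "And the earth was without form, and void."
    )
    for word, count in top_words(sample, n=3):
        print(word, count)
```

In a Flink job the same steps would typically map onto a split, a length filter, a grouped count, and a sort with a limit; the exact operators used in the talk are not shown in the transcript.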
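
The linear regression example on slide 16 fits sales as a linear function of three spending features. The transcript does not include the FlinkML code, so the following is only a sketch of the underlying idea: batch gradient descent on the mean squared error, written in plain Python. It is not FlinkML's API. The data are synthetic, generated from made-up coefficients instead of Advertising.csv, and the features are assumed to be scaled to the range 0 to 1; `fit_linear_regression`, `predict`, and all constants are hypothetical names chosen for this example.

```python
import itertools


def predict(intercept, weights, x):
    """Linear model: intercept + sum of weight * feature."""
    return intercept + sum(w * xi for w, xi in zip(weights, x))


def fit_linear_regression(rows, targets, step_size=0.5, iterations=2000):
    """Fit intercept and weights by batch gradient descent on the squared error."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    intercept = 0.0
    for _ in range(iterations):
        grad_b = 0.0
        grad_w = [0.0] * len(weights)
        for x, y in zip(rows, targets):
            error = predict(intercept, weights, x) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        # Step against the averaged gradient.
        intercept -= step_size * grad_b / n
        weights = [w - step_size * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


# Synthetic, noise-free data (NOT the Advertising data set): every 0/1
# combination of (tv, radio, newspaper), with made-up true coefficients.
TRUE_INTERCEPT = 1.0
TRUE_WEIGHTS = (2.0, 3.0, 0.5)
ROWS = list(itertools.product((0.0, 1.0), repeat=3))
TARGETS = [predict(TRUE_INTERCEPT, TRUE_WEIGHTS, x) for x in ROWS]

if __name__ == "__main__":
    intercept, weights = fit_linear_regression(ROWS, TARGETS)
    print("intercept:", round(intercept, 3))
    print("weights:", [round(w, 3) for w in weights])
```

On real, unscaled spending figures the step size would need tuning or the features would need normalizing first; the values here are chosen only so that the toy problem converges.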
