
Flink4 jug


Flink presentation to Grenoble and Lausanne Jug

Published in: Engineering


  1. 1. Big Data series: Apache Flink. Jérôme Blachon, Laurent Tardif, Stéphane Thiers. June 2015: JUG Grenoble; September 2015: JUG Lausanne
  2. 2. Who we are: Jérôme Blachon, Laurent Tardif, Stéphane Thiers
  3. 3. Tonight's agenda: a bit of history; the Flink stack; demo; how it works; the strengths; roadmap
  4. 4. History
  5. 5. BigData success story
     ● Map/Reduce (OSDI '04) → Hadoop 1: small recoverable tasks, sequential code
     ● Dryad (EuroSys '07) → Tez: Map/Reduce extended to DAG, backtracking recovery
     ● RDDs (HotCloud '10, NSDI '12) → Spark: functional implementation of Dryad recovery
     ● PACTs (SoCC '10, VLDB '12) → Flink: cyclic graph (and incremental construction), query processing runtime embedded in the DAG engine
     ● Stonebraker / Cetintemel / Zdonik 2005
  6. 6. Criteria for stream processing (1/2)
     ● Keep data moving: low latency on the critical path
     ● Query on streams: high-level language
     ● Handle stream imperfections: timeouts (e.g. average of the last 25 securities), out-of-order data (must leave the window open)
     ● Generate predictable outcomes: time-ordered
  7. 7. Criteria for stream processing (2/2)
     ● Integrate stored and streaming data: uniform language for both stored and streamed data, combine streamed and stored data
     ● Data safety / availability: resistant to failure
     ● Partition and scale automatically
     ● Process and respond instantaneously: 100,000 msg/s
  8. 8. Big data stack
  9. 9. The stack: data processing engine; user requirement; app and resource management; storage / stream
  10. 10. Ecosystem: applications; data processing engines; app and resource management; storage / stream
  11. 11. Another view: http://practicalanalytics.wordpress.com
  12. 12. Demo
  13. 13. Word count The hello world // read test file or in Memory, and generate a set of String DataSet<String> text = getTextDataSet(env); DataSet<Tuple2<String, Integer>> counts = // split up the lines in pairs (2-tuples) containing: (word,1) text.flatMap(new Tokenizer()) // group by the tuple field "0" and sum up tuple field "1“ .groupBy(0) .sum(1);
  14. 14. Word count: lines "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune" → flatMap(Tokenizer) → (to,1) (be,1) (or,1) → groupBy → (to,1) (to,1), (be,1) (be,1), (or,1) → sum → (to,2) (be,2) (or,1)
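The flatMap → groupBy → sum flow of the word count can be tried without a cluster. The following is a minimal sketch in plain Java using `java.util.stream`, not the Flink API; the class name `WordCountSketch` and the tokenizing rule (lower-case, split on non-word characters) are assumptions made for illustration, since the deck does not show the `Tokenizer` source.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountSketch {
    // Mirrors flatMap(Tokenizer) -> groupBy(word) -> sum(count) on in-memory lines.
    static Map<String, Integer> count(String... lines) {
        return Arrays.stream(lines)
                // tokenizer step (assumed rule): lower-case, split on non-word characters
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(token -> !token.isEmpty())
                // groupBy + sum step: every token counts as (word, 1), summed per word
                .collect(Collectors.groupingBy(
                        token -> token, TreeMap::new, Collectors.summingInt(token -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(count("To be, or not to be,--that is the question:--"));
    }
}
```

Unlike the simplified diagram, this sketch counts every word of the line, so "not", "that" and the others appear in the result as well.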
  15. 15. Data in memory
        public static final String[] WORDS = new String[] {
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer",
            "The slings and arrows of outrageous fortune",
            "Or to take arms against a sea of troubles,",
            "And by opposing end them?--To die,--to sleep,--",
            "No more; and by a sleep to say we end",
            "The heartache, and the thousand natural shocks",
            "That flesh is heir to,--'tis a consummation",
            "Devoutly to be wish'd. To die,--to sleep;--",
            ….
  16. 16. File private static DataSet<String> getTextDataSet(ExecutionEnvironment env) { return env.readTextFile(textPath); }
  17. 17. With POJO
        public static class Word {
            // fields
            private String word;
            private Integer frequency;
            // constructors
            public Word() { }
            public Word(String word, int i) {
                this.word = word;
                this.frequency = i;
            }
            // getters setters
            // to String
            @Override
            public String toString() {
                return "Word=" + word + " freq=" + frequency;
            }
        }
  18. 18. Pojo: lines "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune" → flatMap(Tokenizer) → Word 1 {to,1}, Word 2 {be,1}, Word 3 {or,1} → groupBy → Word 1 {to,1} Word 5 {to,1}, Word 2 {be,1} Word 6 {be,1}, Word 3 {or,1} → sum → Word 7 {to,2}, Word 8 {be,2}, Word 9 {or,1}
  19. 19. JDBC: table "hamlet" with rows ("To be, or not to be,--that is the question:--"), ("Whether 'tis nobler in the mind to suffer") → map + flatMap(Tokenizer) → (to,1) (be,1) (or,1) → groupBy → (to,1) (to,1), (be,1) (be,1), (or,1) → sum → (to,2) (be,2) (or,1)
  20. 20. Stream: lines "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune" → flatMap(Tokenizer) → (to,1) (be,1) (or,1) → groupBy → (to,1) (to,1), (be,1) (be,1), (or,1) → sum → (to,2) (be,2) (or,1)
  21. 21. Stream: same pipeline and results; a new line, "Or to take arms against a sea of troubles,", arrives at the source
  22. 22. Stream: the new line "Or to take arms against a sea of troubles," enters flatMap(Tokenizer)
  23. 23. Stream: flatMap(Tokenizer) emits a second (or,1)
  24. 24. Stream: groupBy now holds (or,1) (or,1)
  25. 25. Stream: sum emits (or,2), next to (to,2) and (be,2)
  26. 26. Multiple: several parallel inputs, each shown with the lines "To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune"; each instance runs flatMap(Tokenizer) → groupBy → sum: (to,1) (be,1) (or,1) → (to,1) (to,1), (be,1) (be,1), (or,1) → (to,2) (be,2) (or,1)
  27. 27. Multiple: each parallel instance produces partial counts such as (to,2) (be,2) ...; a final groupBy + sum combines them into (to,6) (be,6) (or,3) ...
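The two-stage picture of the "Multiple" slides (count locally per parallel instance, then group and sum the partial results) can be sketched in plain Java. This is an illustration only, not how the Flink runtime is implemented; the class and method names (`PartialCountSketch`, `countPartition`, `mergeAll`) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartialCountSketch {
    // Local step: one parallel instance counts the words of its own partition.
    static Map<String, Integer> countPartition(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // Global step: group the partial results by word and sum them.
    static Map<String, Integer> mergeAll(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials)
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> partial = countPartition(Arrays.asList("to", "be", "or", "to", "be"));
        // three instances with identical partial counts, as drawn on the slide
        System.out.println(mergeAll(Arrays.asList(partial, partial, partial)));
    }
}
```

With three identical partitions holding (to,2) (be,2) (or,1), the merged totals are (to,6) (be,6) (or,3), the values shown on slide 27.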
  28. 28. Demo
      ● Input records (product, price, date): Produit1, 14, 1/6/2015; Produit2, 13.5, 1/6/2015; Produit3, 24, 1/6/2015; Produit1, 14, 30/5/2015; Produit2, 13, 30/5/2015; Produit3, 24, 30/5/2015; Produit4, 124, 30/5/2015
      ● Grouped by product: Produit1: 14, 1/6/2015; 14, 30/5/2015; 13, 29/5/2015. Produit2: 13.5, 1/6/2015; 13, 30/5/2015; 13, 29/5/2015
      ● Output for Produit1: average price over 7 days: 14; over 30 days: 14; over 365 days: 13.5
  29. 29. Demo 2: Twitter
      ● Input records: twit, Flink is…, 1/6/2015; twit, #Flink, 1/6/2015; twit, #Flink, 1/6/2015; twit, #Flink, 30/5/2015 (four times)
      ● Grouped by author: @writer1: #Flink, 1/6/2015; #Flink, 30/5/2015; #Flink, 29/5/2015. @writer3: #Flink, 1/6/2015; #Flink, 30/5/2015; #Flink, 29/5/2015
      ● Output: tag cloud
      ● Also shown on the slide: Jira, Stack Overflow
  30. 30. Demo 3: Scala shell … Word count demo from the Flink Scala shell ...
  31. 31. Demo 4: ML demo. Classifier (SVM) from MLLib (Scala only): learn + predict
  32. 32. Some basics (covered by the demos): types, streaming, loops, ….
  33. 33. Data
      Tuples with primitive types:
        DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
            new Tuple2<String, Integer>("hello", 1),
            new Tuple2<String, Integer>("world", 2));
      POJOs (constructor + get/set):
        public class WordWithCount {
            public String word;
            public int count;
            public WordWithCount() {}
            public WordWithCount(String word, int count) {
                this.word = word;
                this.count = count;
            }
        }
      Hadoop types: the org.apache.hadoop.io.Writable interface
  34. 34. Data sources: file based
        // local file system
        DataSet<String> localLines = env.readTextFile("file:///path/to/my/textfile");
        // read text file from a HDFS running at nnHost:nnPort
        DataSet<String> hdfsLines = env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
        // read a CSV file with three fields
        DataSet<Tuple3<Integer, String, Double>> csvInput = env.readCsvFile("hdfs:///the/CSV/file")
            .types(Integer.class, String.class, Double.class);
        // create a set from some given elements
        DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");
  35. 35. Data sources
        // Read data from a relational database using the JDBC input format
        DataSet<Tuple2<String, Integer>> dbData =
            env.createInput(
                // create and configure input format
                JDBCInputFormat.buildJDBCInputFormat()
                    .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
                    .setDBUrl("jdbc:derby:memory:persons")
                    .setQuery("select name, age from persons")
                    .finish(),
                // specify type information for DataSet
                new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO)
            );
  36. 36. Data sinks
        // text data
        DataSet<String> textData = // [...]
        // write DataSet to a file on the local file system
        textData.writeAsText("file:///my/result/on/localFS");
        // write DataSet to a file on a HDFS with a namenode running at nnHost:nnPort
        textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");
        // write DataSet to a file and overwrite the file if it exists
        textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);
        // tuples as lines with pipe as the separator "a|b|c"
        DataSet<Tuple3<String, Integer, Double>> values = // [...]
        values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");
  37. 37. Variable and storage
        DataSet<Tuple...> large = env.readCsvFile(...);
        DataSet<Tuple...> medium = env.readCsvFile(...);
        DataSet<Tuple...> small = env.readCsvFile(...);
        DataSet<Tuple...> largeAndMedium = large.join(medium)
            .where(3).equalTo(1)
            .with(new JoinFunction() { ... });
        DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium)
            .where(0).equalTo(2)
            .with(new JoinFunction() { ... });
        DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
        DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(MAX, 2);
        DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(MAX, 2);
  38. 38. Map Filter Reduce Join Cross Union First-n …. Lazy Evaluation Operators
  39. 39. Streaming. DataStream: a continuous, parallel, immutable stream of data. Socket stream (Twitter, …), message queue connector (RabbitMQ), file stream
  40. 40. Iterative
      ● Algorithms that need iterations
      ● Clustering (K-Means, Canopy, …)
      ● Gradient descent (e.g., logistic regression, matrix factorization)
      ● Graph algorithms (e.g., PageRank, Line-Rank, components, paths, reachability, centrality)
      ● Graph communities / dense sub-components
      ● Inference (belief propagation)
      ● …
      A loop makes multiple passes over the data
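To make "a loop makes multiple passes over the data" concrete, here is a minimal one-dimensional K-Means sketch in plain Java. It is not Flink's iteration operator; the class name `KMeans1DSketch`, the fixed iteration count and the sample points in `main` are assumptions chosen for illustration.

```java
import java.util.Arrays;

class KMeans1DSketch {
    // Runs a fixed number of passes; every pass reads the full data set again.
    static double[] cluster(double[] points, double[] initialCentroids, int iterations) {
        double[] centroids = initialCentroids.clone();
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[centroids.length];
            int[] n = new int[centroids.length];
            for (double p : points) {
                // assign the point to its nearest centroid
                int best = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
                sum[best] += p;
                n[best]++;
            }
            // move each centroid to the mean of the points assigned to it
            for (int c = 0; c < centroids.length; c++)
                if (n[c] > 0) centroids[c] = sum[c] / n[c];
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        System.out.println(Arrays.toString(cluster(points, new double[] {0, 5}, 5)));
    }
}
```

The centroids computed in one pass are the input of the next pass, which is why such algorithms benefit from an engine with native support for loops.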
  41. 41. Windowing: stream (to,2) (be,2) … with .window(Count.of(4)).every(Count.of(2)); window policies: Count, Time, ….
  42. 42. Windowing: stream (to,2) (be,2) (or,1) (lord,2) … with .window(Count.of(4)).every(Count.of(2)); window policies: Count, Time, ….
  43. 43. Windowing: stream (to,2) (be,2) (or,1) (lord,2) (my,2) (king,1) … with .window(Count.of(4)).every(Count.of(2)); window policies: Count, Time, ….
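The window definition on these slides keeps the last 4 elements and evaluates every 2 new elements. A minimal plain-Java sketch of such a sliding count window follows; it is not the Flink windowing API. The class name `CountWindowSketch` is hypothetical, and emitting a partially filled first window is an assumption of this sketch that may differ from Flink's behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class CountWindowSketch {
    // Emits a snapshot of the last `size` elements each time `slide` new elements have arrived.
    static <T> List<List<T>> windows(List<T> stream, int size, int slide) {
        Deque<T> buffer = new ArrayDeque<>();
        List<List<T>> emitted = new ArrayList<>();
        int sinceLastEmit = 0;
        for (T element : stream) {
            buffer.addLast(element);
            if (buffer.size() > size) buffer.removeFirst(); // evict the oldest element
            if (++sinceLastEmit == slide) {
                emitted.add(new ArrayList<>(buffer));
                sinceLastEmit = 0;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "lord", "my", "king");
        windows(words, 4, 2).forEach(System.out::println);
    }
}
```

With the six words of slide 43, the sketch emits three snapshots: the first two words, then the first four, then the last four.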
  44. 44. Go inside Flink
  45. 45. © 2015 Persistent Systems Ltd
  46. 46. How it works, the naive idea. Diagram: code, Flink, Job Manager, execution plan, data, results
  47. 47. Execution plan
  48. 48. We have resources, let's optimize! Diagram: code, Flink, Job Manager, execution plan, and four parallel data → result pairs
  49. 49. Distributed runtime
      ● The master (Job Manager) handles job submission, scheduling, and metadata
      ● Workers (Task Managers) execute operations
      ● Data can be streamed between nodes
      ● All operators start in memory and gradually go out-of-core
  50. 50. How the magic happens: Flink runtime, Flink optimizer
  51. 51. Flink Optimizer
      ● The optimizer is the component that selects an execution plan for a Common API program
      ● Think of an AI system manipulating your program for you
      ● But don't be scared, it works: relational databases have been doing this for decades, and Flink ports the technology to API-based systems
  52. 52. Program lifecycle (the slide annotates the steps with numbers 1 to 5)
        val source1 = …
        val source2 = …
        val maxed = source1
          .map(v => (v._1, v._2, math.max(v._1, v._2)))
        val filtered = source2
          .filter(v => (v._1 > 4))
        val result = maxed
          .join(filtered).where(0).equalTo(0)
          .filter(_1 > 3)
          .groupBy(0)
          .reduceGroup { … }
  53. 53. Some fancy stuff to help Flink: forwarded fields
        @ForwardedFields("f0->f2")
        public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
            @Override
            public Tuple3<…> map(Tuple2<…> val) {
                return new Tuple3<…>("foo", val.f1 / 2, val.f0);
            }
        }
  54. 54. Some fancy stuff to help Flink: partitioning
      Partitioning controls how individual data points of a stream are distributed among the parallel instances of the transformation operators. Flink Streaming supports several partitioning types, for example:
      ● Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O
      ● Shuffle: randomly partitions the output data stream to the next operator using a uniform distribution
      ● Rebalance: directs the output data stream to the next operator in a round-robin fashion
      ● Broadcast: sends the output data stream to all parallel instances of the next operator
      Usage: dataStream.broadcast()
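The difference between rebalance, shuffle and broadcast is easy to see on a small in-memory list. The following plain-Java sketch models each parallel instance of the next operator as a list; it is an illustration under that assumption, not Flink's implementation, and the class name `PartitioningSketch` and the `seed` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class PartitioningSketch {
    // One empty output list per parallel instance of the next operator.
    private static <T> List<List<T>> channels(int parallelism) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) out.add(new ArrayList<>());
        return out;
    }

    // Rebalance: round-robin over the instances.
    static <T> List<List<T>> rebalance(List<T> records, int parallelism) {
        List<List<T>> out = channels(parallelism);
        for (int i = 0; i < records.size(); i++) out.get(i % parallelism).add(records.get(i));
        return out;
    }

    // Shuffle: each record goes to a uniformly random instance.
    static <T> List<List<T>> shuffle(List<T> records, int parallelism, long seed) {
        List<List<T>> out = channels(parallelism);
        Random random = new Random(seed);
        for (T record : records) out.get(random.nextInt(parallelism)).add(record);
        return out;
    }

    // Broadcast: every instance receives every record.
    static <T> List<List<T>> broadcast(List<T> records, int parallelism) {
        List<List<T>> out = channels(parallelism);
        for (List<T> channel : out) channel.addAll(records);
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println("rebalance: " + rebalance(records, 2));
        System.out.println("shuffle:   " + shuffle(records, 2, 42L));
        System.out.println("broadcast: " + broadcast(records, 2));
    }
}
```

Forward partitioning is left out because it depends on where operators are placed, which a single-process sketch cannot represent.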
  55. 55. Performance
  56. 56. Performance
      ● More info soon
      ● Demo on 100,000 products / 3 years of prices: ~20 minutes
      ● On a "small cluster" of 3 nodes: 4 processors, 8 GB of RAM, virtualized
  57. 57. Limits
  58. 58. Limitations: the API is still moving; diagnosis is hard (Flink, Hadoop, network, OS, JVM, …); heap usage is (too?) high
  59. 59. API & Big Data ecosystem
  60. 60. The growing Flink stack: Flink Optimizer; Flink Stream Builder; Common API; Scala API; Java API; Python API (upcoming); Graph API; Apache MRQL; Flink Local Runtime; embedded environment (Java collections); local environment (for debugging); remote environment (regular cluster execution); Apache Tez. Data storage: HDFS, files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, … Execution: single node, standalone or YARN cluster
  61. 61. Roadmap
  62. 62. Flink roadmap. Currently being discussed by the Flink community. Flink has a major release every 3 months, and one or more bug-fixing releases between major releases. Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes
  63. 63. Roadmap for 2015 (highlights), items listed per area in slide order across Q1, Q2, Q3
      ● APIs: logical query integration; additional operators; interactive programs; interactive Scala shell; SQL-on-Flink
      ● Optimizer: semantic annotations; HCatalog integration; optimizer hints
      ● Runtime: dual engine (blocking & pipelining); fine-grained fault tolerance; dynamic memory allocation
      ● Streaming: better memory management; more operators in API; at-least-once processing guarantees; unify batch and streaming; exactly-once processing guarantees
      ● ML library: first version; additional algorithms; Mahout integration
      ● Graph library: first version
      ● Integration: Tez, Samoa; Mahout
  64. 64. Integration with other projects
      ● Machine learning: Samoa (incubating), a distributed streaming machine learning (ML) framework
      ● Apache Tez (runs complex directed acyclic graphs of tasks for processing data; simplifies Pig and Hive task definition)
      ● Storage: Tachyon (a memory-centric distributed storage system)
      ● Mahout (data analytics); H2O (distributed scalable machine learning system)
      ● Apache Hive (high-level language for data processing)
      ● Expected Q3/Q4 2015
      ● Apache Zeppelin (incubating): a web-based notebook that enables interactive data analytics
  65. 65. And many more…
      ● Runtime: even better performance and robustness; using off-heap memory, dynamic memory allocation
      ● Improvements to the Flink optimizer: integration with HCatalog, better statistics; runtime optimization
      ● Streaming graph and ML pipeline libraries
  66. 66. Summary and conclusion
  67. 67. Flink in one slide
      ● Flink is optimized for cyclic or iterative processes by using iterative transformations on collections
      ● Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This allows flexible window operations on streams
      ● Built-in optimizer
  68. 68. flink.apache.org; Flink Forward (http://flink-forward.org/): 15 October, Berlin
