2. Hadoop MapReduce Jobs
[Diagram: Input → Map → Reduce → Output, implemented by InputFormat, Mapper, Reducer, and OutputFormat]
• Jobs have a static structure.
• Input, Output, Map, Reduce run your custom (or library) code.
• If the application logic is too complex, you need more than one job.
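To make the static structure concrete, a minimal driver sketch using the classic mapred API; Tokenizer and Counter stand in for your Mapper and Reducer classes, and the paths are placeholders:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCountJob.class);
    conf.setJobName("wordcount");
    // the four fixed slots: Input -> Map -> Reduce -> Output
    conf.setInputFormat(TextInputFormat.class);   // Input
    conf.setMapperClass(Tokenizer.class);         // Map: your code
    conf.setReducerClass(Counter.class);          // Reduce: your code
    conf.setOutputFormat(TextOutputFormat.class); // Output
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(conf, new Path("hdfs:///input"));
    FileOutputFormat.setOutputPath(conf, new Path("hdfs:///output"));
    JobClient.runJob(conf); // one job; more complex logic means chaining several jobs
  }
}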
3. Flink Programs
[Diagram: a DAG data flow with multiple Sources feeding Map, Reduce, Filter, Join, and CoGroup operators into a Sink]
• Flink programs are DAG data flows.
• Data Sources, Data Sinks, Map and Reduce operators are included.
• Everything that MapReduce provides, and much more (a superset).
• Much better performance
– Especially if more than one MapReduce job is executed.
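A minimal DataSet API sketch of such a DAG; file paths, field types, and key positions are made up for illustration:
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// two sources instead of one fixed input
DataSet<Tuple2<Integer, String>> left =
  env.readCsvFile("hdfs:///left").types(Integer.class, String.class);
DataSet<Tuple2<Integer, Double>> right =
  env.readCsvFile("hdfs:///right").types(Integer.class, Double.class);

// operators compose freely into a DAG: filter, then join on field 0
left.filter(new FilterFunction<Tuple2<Integer, String>>() {
    public boolean filter(Tuple2<Integer, String> t) { return t.f1.startsWith("a"); }
  })
  .join(right).where(0).equalTo(0)
  .writeAsText("hdfs:///result"); // sink

env.execute("DAG data flow");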
4. Run your Hadoop code with Flink?
• Hadoop data types (Writable) are natively supported.
• Hadoop Filesystems are natively supported.
• Flink features Input- & OutputFormats, Map, and Reduce functions, just like Hadoop MapReduce.
• Concepts are the same, but interfaces are not :-(
But Flink provides wrappers for Hadoop code :-)
• mapred.* API: In/OutputFormat, Mappers, & Reducers
• mapreduce.* API: In/OutputFormat (sketch after the example below)
6. Hadoop Compat WordCount
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// set up the Hadoop InputFormat
HadoopInputFormat<LongWritable, Text> hadoopInputFormat =
    new HadoopInputFormat<LongWritable, Text>(
        new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
TextInputFormat.addInputPath(hadoopInputFormat.getJobConf(), new Path(inputPath));

// read data with the Hadoop InputFormat
DataSet<Tuple2<LongWritable, Text>> text = env.createInput(hadoopInputFormat);

DataSet<Tuple2<Text, LongWritable>> words =
    // apply the Hadoop Mapper (Tokenizer is your existing Mapper implementation)
    text.flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer()))
        // apply the Hadoop Reducer (Counter is your existing Reducer implementation)
        .groupBy(0)
        .reduceGroup(new HadoopReduceFunction<Text, LongWritable, Text, LongWritable>(new Counter()));

// set up the Hadoop OutputFormat
HadoopOutputFormat<Text, LongWritable> hadoopOutputFormat =
    new HadoopOutputFormat<Text, LongWritable>(
        new TextOutputFormat<Text, LongWritable>(), new JobConf());
hadoopOutputFormat.getJobConf().set("mapred.textoutputformat.separator", " ");
TextOutputFormat.setOutputPath(hadoopOutputFormat.getJobConf(), new Path(outputPath));

// write data with the Hadoop OutputFormat
words.output(hadoopOutputFormat);

// execute the program
env.execute("Hadoop Compat WordCount");
Hadoop data types, Hadoop Input- & OutputFormats, and your Hadoop functions, all reused unchanged. Yes, it will…
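The example above uses the mapred.* wrappers. For the newer mapreduce.* API, only Input- & OutputFormats are wrapped; a minimal read sketch, assuming the wrapper package name of Flink's hadoop-compatibility module and a placeholder path:
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapreduce.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Job job = Job.getInstance();
HadoopInputFormat<LongWritable, Text> input = new HadoopInputFormat<LongWritable, Text>(
    new TextInputFormat(), LongWritable.class, Text.class, job);
FileInputFormat.addInputPath(job, new Path("hdfs:///input"));

// read with a mapreduce.* InputFormat; Mapper/Reducer wrappers exist only for mapred.*
DataSet<Tuple2<LongWritable, Text>> text = env.createInput(input);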
7. Use MapReduce like you always wanted
• Freely assemble your functions into a program (see the sketch below).
• Very efficient, pipelined execution.
– Program is executed on Flink (no Hadoop involved).
– No writing to/reading from HDFS within a program.
• Caveat: No support for custom Hadoop partitioners & sorters, yet :-(
[Diagram: a freely assembled flow mixing multiple Inputs, chained Map and Reduce operators, and multiple Outputs in one program]
9. Hadoop Job
Do not change a single line of code!
• Inject MapReduce jobs as a whole into Flink programs
– with support for custom partitioners, sorters, groupers.
• Run Hadoop MapReduce jobs on Flink
– without changing a single line of code (see the driver sketch below).
[Diagram: the same DAG data flow (Sources, Filter, Join, CoGroup, Sink) with a complete Hadoop Map/Reduce job embedded as one piece of the flow]
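For reference, the kind of unchanged driver this targets: a complete mapred job whose custom partitioner, sorter, and grouper (MyPartitioner and the comparators are hypothetical user classes) are preserved by whole-job injection, unlike the function-level wrappers above:
JobConf conf = new JobConf();
conf.setMapperClass(Tokenizer.class);
conf.setReducerClass(Counter.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);

// kept by whole-job injection, not expressible with the function wrappers:
conf.setPartitionerClass(MyPartitioner.class);                     // custom partitioner
conf.setOutputKeyComparatorClass(MySortComparator.class);          // custom sorter
conf.setOutputValueGroupingComparator(MyGroupingComparator.class); // custom grouper

JobClient.runJob(conf); // the very same code runs on Hadoop or inside a Flink program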