Spark Day 2017 - The Past, Present, and Future of Spark

This deck shows a basic word count program written in Java against the Hadoop MapReduce API. The program defines Mapper and Reducer classes that count word frequencies in a text file, plus a main method that configures and runs the job. The deck then shows the same computation as a three-line Spark (Scala) program. Other sections link to histories of Spark. They also summarize the 2016 Spark survey and Spark Summit 2017 trends, focusing on machine learning, streaming, and scaling Spark applications.

https://github.com/apache/spark/commits/master?after=9e50a1d37a4cf0c34e20a7c1a910ceaff41535a2+1574&author=mateiz
http://blog.madhukaraphatak.com/history-of-spark/

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also registered as the combiner): sums the counts for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: sets output types, mapper/combiner/reducer, I/O formats and paths, then runs the job
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
wordcount (Hadoop MapReduce, Java)
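The Hadoop job above leaves the framework's part implicit. Between Map and Reduce, Hadoop groups every emitted (word, 1) pair by key and hands each reducer its keys in sorted order. Below is a minimal single-process sketch of that map, shuffle, and reduce data flow in plain Java. It assumes no Hadoop dependency, and MapReduceSketch and its methods are hypothetical names, not Hadoop APIs.

The sketch omits the combiner that conf.setCombinerClass(Reduce.class) enables. That combiner runs the same summing reduce on each mapper's local output before the shuffle. Reusing Reduce this way is valid because addition is associative and commutative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, dependency-free simulation of the WordCount data flow.
// Not Hadoop code: there is no cluster, no combiner, and no partitioning here.
class MapReduceSketch {

    // Map phase: emit (word, 1) for every token, like Map.map(...)
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key.
    // TreeMap keeps keys sorted, mirroring the sorted keys a reducer receives.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values for one key, like Reduce.reduce(...)
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, shuffles once, then reduces each key group
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or not to be", "to do")));
    }
}
```

Running main prints {be=2, do=1, not=1, or=1, to=3}.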

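// Assumes `file` is an RDD[String] of text lines, e.g. sc.textFile(path) (path is a placeholder)
file.flatMap(line => line.split(" "))   // split each line into words
  .map(word => (word, 1))               // pair each word with a count of 1
  .reduceByKey(_ + _)                   // sum the counts for each word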
wordcount (Spark, Scala)
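For readers coming from the Java version, the three-line Spark chain maps closely onto java.util.stream. Below is a local, single-JVM analogue, not Spark itself; the class name StreamWordCount is hypothetical. flatMap splits lines into words. Collectors.toMap with Integer::sum as the merge function then plays the role of map(word => (word, 1)) followed by reduceByKey(_ + _). Spark additionally partitions the data across executors and combines counts within each partition before shuffling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local analogue of the Spark word count chain; not Spark code.
class StreamWordCount {

    static Map<String, Integer> wordCount(List<String> file) {
        return file.stream()
                // flatMap: one line -> many words, like line.split(" ") in the Scala version
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // toMap(key, value 1, merge by summing) ~ map(word => (word, 1)).reduceByKey(_ + _);
                // TreeMap::new only makes the printed order deterministic
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or not to be", "to do")));
    }
}
```

Running main prints {be=2, do=1, not=1, or=1, to=3}.

One behavioral difference from the Hadoop version is tokenization. split(" ") treats every single space as a separator, so leading or repeated spaces yield empty-string "words". StringTokenizer skips them instead.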
Logistic regression in Hadoop and Spark
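Logistic regression is the usual benchmark for this comparison because training is iterative and every iteration scans the full dataset. In Hadoop MapReduce, each iteration typically runs as a separate job that rereads its input from storage. Spark can instead cache the parsed points in memory and reuse them across iterations.

Below is a plain-Java, single-machine sketch of that loop, using batch gradient descent with labels -1/+1. It is modeled on the well-known Spark logistic regression example rather than taken from this deck. The dataset, learning rate, and iteration count are made up, and the class name is hypothetical.

```java
// Hypothetical single-machine sketch of logistic regression trained by
// batch gradient descent; labels are -1 or +1. Not Spark or Hadoop code.
class LogisticRegressionSketch {

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Each iteration makes a full pass over the same in-memory points:
    // the access pattern that benefits from keeping the data cached.
    static double[] train(double[][] xs, double[] ys, int iterations, double learningRate) {
        double[] w = new double[xs[0].length]; // weights start at zero
        for (int iter = 0; iter < iterations; iter++) {
            double[] gradient = new double[w.length];
            // "map" each point to its gradient contribution, then "reduce" by summing
            for (int n = 0; n < xs.length; n++) {
                double margin = ys[n] * dot(w, xs[n]);
                double scale = (1.0 / (1.0 + Math.exp(-margin)) - 1.0) * ys[n];
                for (int i = 0; i < w.length; i++) {
                    gradient[i] += scale * xs[n][i];
                }
            }
            // step against the gradient of the summed logistic loss
            for (int i = 0; i < w.length; i++) {
                w[i] -= learningRate * gradient[i];
            }
        }
        return w;
    }

    // Predicted label: the sign of w . x
    static double predict(double[] w, double[] x) {
        return dot(w, x) >= 0 ? 1.0 : -1.0;
    }

    public static void main(String[] args) {
        // Tiny made-up dataset; the first feature is a constant bias term.
        double[][] xs = {{1, 2.0}, {1, 1.5}, {1, -1.0}, {1, -2.5}};
        double[] ys = {1, 1, -1, -1};
        double[] w = train(xs, ys, 100, 0.1);
        for (double[] x : xs) {
            System.out.println(predict(w, x));
        }
    }
}
```

In a distributed setting, the per-point gradient computation is the part that gets spread across machines. The scan over xs inside train is what Spark avoids rereading from disk on every iteration.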

2016 Spark survey
https://www.slideshare.net/SparkSummit/trends-for-big-data-and-apache-spark-in-2017-by-matei-zaharia

Keyword        Count
Learning       52
Streaming      39
Machine        38
Deep           33
Science        30
Analytics      27
Scale          21
Developer      20
Enterprise     20
Ecosystem      20
Research       13
Applications   9
Processing     8
Pipeline       8
Platform       7

2017 Spark Summit
https://www.slideshare.net/SparkSummit/trends-for-big-data-and-apache-spark-in-2017-by-matei-zaharia

• http://blog.madhukaraphatak.com/history-of-spark/
• http://cdn2.hubspot.net/hubfs/438089/DataBricks_Surveys_-_Content/2016_Spark_Survey/2016_Spark_Infographic.pdf
• http://cdn2.hubspot.net/hubfs/438089/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf
