SlideShare une entreprise Scribd logo
1  sur  52
Télécharger pour lire hors ligne
Open Source Big Data in OPC
Edelweiss Kammermann
Frank MunzJava One 2017
munz & more #2
© IT Convergence 2016. All rights reserved.
© IT Convergence 2016. All rights reserved.
About Me
à Computer Engineer, BI and Data Integration Specialist
à Over 20 years of Consulting and Project Management experience in Oracle
technology.
à Co-founder and Vice President of Uruguayan Oracle User Group (UYOUG)
à Director of Community of LAOUC
à Head of BI Team CMS at ITConvergence
à Writer and frequent speaker at international conferences:
à Collaborate, OTN Tour LA, UKOUG Tech & Apps, OOW, Rittman Mead BI Forum
à Oracle ACE Director
© IT Convergence 2016. All rights reserved.
Uruguay
6
Dr. Frank Munz
•Founded munz & more in 2007
•17 years Oracle Middleware,
Cloud, and Distributed Computing
•Consulting and
High-End Training
•Wrote two Oracle WLS and
one Cloud book
#1
Hadoop
© IT Convergence 2016. All rights reserved.
What is Big Data?
à Volume: The high amount of data
à Variety: The wide range of different data formats and schemas.
Unstructured and semi-structured data
à Velocity: The speed which data is created or consumed
à Oracle added another V in this definition
à Value: Data has intrinsic value—but it must be discovered.
© IT Convergence 2016. All rights reserved.
What is Oracle Big Data Cloud Compute Edition?
à Big Data Platform that integrates Oracle Big Data solution with
Open Source tools
à Fully Elastic
à Integrated with Other Paas Services as Database Cloud Service, MySQL Cloud
Service, Event Hub Cloud Service
à Access, Data and Network Security
à REST access to all the funcitonality
© IT Convergence 2016. All rights reserved.
Big Data Cloud Service – Compute Edition (BDCS-CE)
© IT Convergence 2016. All rights reserved.
BDCS-CE Notebook: Interactive Analysis
à Apache Zeppelin Notebook (version0.7) to interactively work with data
© IT Convergence 2016. All rights reserved.
What is Hadoop?
à An open source software platform for distributed storage and
processing
à Manage huge volumes of unstructured data
à Parallel processing of large data set
à Highly scalable
à Fault-tolerant
à Two main components:
à HDFS: Hadoop Distributed File System for storing information
à MapReduce: programming framework that process information
© IT Convergence 2016. All rights reserved.
Hadoop Components: HFDS
à Stores the data on the cluster
à Namenode: block registry
à DataNode: block containers themselves (Datanode)
à HDFS cartoon by Mvarshney
© IT Convergence 2016. All rights reserved.
Hadoop Components: MapReduce
à Retrieves data from HDFS
à A MapReduce program is composed by
à Map() method: performs filtering and sorting of the <key, value> inputs
à Reduce() method: summarize the <key,value> pairs provided by the Mappers
à Code can be written in many languages (Perl, Python, Java etc)
© IT Convergence 2016. All rights reserved.
MapReduce Example
© IT Convergence 2016. All rights reserved.
Code Example
© IT Convergence 2016. All rights reserved.
Code Example
© IT Convergence 2016. All rights reserved.
#2
Hive
© IT Convergence 2016. All rights reserved.
What is Hive?
à An open source data warehouse software on top of Apache Hadoop
à Analyze and query data stored in HDFS
à Structure the data into tables
à Tools for simple ETL
à SQL- like queries (HiveQL)
à Procedural language with HPL-SQL
à Metadata storage in a RDBMS
© IT Convergence 2016. All rights reserved.
Hadoop & Hive Demo
#3
Spark
Revisited: Map Reduce I/O
munz & more #23
Source:	Hadoop	Application	Architecture	Book
Spark
• Orders of magnitude(s) faster than M/R
• Higher level Scala, Java or Python API
• Standalone, in Hadoop, or Mesos
• Principle: Run an operation on all data
-> ”Spark is the new MapReduce”
• See also: Apache Storm, etc
• Uses RDDs, or Dataframes, or Datasets
munz & more #24
https://stackoverflow.com/questions/31508083/difference-between-
dataframe-in-spark-2-0-i-e-datasetrow-and-rdd-in-spark
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
RDDs
Resilient Distributed Datasets
Where do they come from?
Collection of data grouped into named columns.
Supports text, JSON, Apache Parquet, sequence.
Read	in	
HDFS,	Local	FS,	S3,	Hbase
Parallelize	
existing	Collection
Transform
other	RDD
->	RDDs	are	immutable
Lazy Evaluation
munz & more #26
Nothing	is	executed Execution
Transformations:
map(), flatMap(),
reduceByKey(), groupByKey()
Actions:
collect(), count(), first(), takeOrdered(),
saveAsTextFile(), …
http://spark.apache.org/docs/2.1.1/programming-guide.html
map(func) Return	a	new	distributed	dataset	formed	
by	passing	each	element	of	the	source	
through	a	function func.
flatMap(func) Similar	to	map,	but	each	input	item	can	be	
mapped	to	0	or	more	output	items	(so func
should	return	a	Seq rather	than	a	single	
item).
reduceByKey(func,	[numTasks]) When	called	on	a	dataset	of	(K,	V)	pairs,	
returns	a	dataset	of	(K,	V)	pairs	where	the	
values	for	each	key	are	aggregated	using	
the	given	reduce	function func,	which	must	
be	of	type	(V,V)	=>	V.	
groupByKey([numTasks]) When	called	on	a	dataset	of	(K,	V)	pairs,	
returns	a	dataset	of	(K,	Iterable<V>)	pairs.
Transformations
Spark Demo
munz & more #30
Apache Zeppelin Notebook
munz & more #31
Word Count and Histogram
munz & more #32
res =
t.flatMap(lambda line: line.split(" "))
.map(lambda word: (word, 1))
.reduceByKey(lambda a, b: a + b)
res.takeOrdered(5, key = lambda x: -x[1])
Zeppelin Notebooks
munz & more #33
Big Data Compute Service CE
munz & more #34
#4
Kafka
Kafka
Partitioned, replicated commit log
munz & more #36
0 1 2 3 4 … n
Immutable	log:	Messages	with	offset
Producer
Consumer	A
Consumer	B
https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it-
is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
Broker1
Broker2
Broker3
Topic	A	
(1)
Topic	A	
(2)
Topic	A	
(3)
Partition	/
Leader
Repl A	
(1)
Repl A	
(2)
Repl A	
(3)
Producer
Replication	/
Follower
Zoo-
keeper
Zoo-
keeper
Zoo-
keeper
State	/
HA
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
- 1 topic
- 1 partition
- Contains every article published
since 1851
- Multiple producers / consumers
Example	for	
Stream	/	Table	Duality
Kafka Clients
SDKs Connect Streams
- OOTB:	Java,	Scala
- Confluent:	Python,	C,	
C++
Confluent:
- HDFS	sink,	
- JDBC	source,
- S3	sink
- Elastic	search	sink
- Plugin	.jar	file
- JDBC:	Change	data	
capture	(CDC)
- Real-time	data	ingestion
- Microservices
- KSQL:	SQL	streaming	
engine	for	streaming	
ETL,	anomaly	detection,	
monitoring
- .jar	file	runs	anywhere
High	/	low	level	Kafka	API Configuration	only
Integrate	external	Systems
Data	in	Motion
Stream	/	Table	duality	
REST
- Language	
agnostic
- Easy	for	
mobile	apps
- Easy	to	
tunnel	
through	FW	
etc.
Lightweight
Oracle Event Hub Cloud Service
• PaaS: Managed Kafka 0.10.2
• Two deployment modes
– Basic (Broker and ZK on 1 node)
– Recommended (distributed)
• REST Proxy
– Separate sever(s) running REST Proxy
munz & more #40
Event Hub
munz & more #41
Event Hub Service
munz & more #42
Ports
You must open ports to allow access for
external clients
• Kafka Broker (from OPC connect string)
• Zookeeper with port 2181
munz & more #43
Scaling
munz & more #44
horizontal (up)vertical
Event Hub REST Interface
munz & more #45
https://129.151.91.31:1080/restproxy/topics/a12345orderTopic
Service = Topic
Interesting to Know
• Event Hub topics are prefixed with ID domain
• With Kafka CLI topics with ID Domain can be
created
• Topics without ID domain are not shown in
OPC console
46
#5
Conclusion
TL;DR #bigData #openSource #OPC
OpenSource: entry point to Oracle Big
Data world / Low(er) setup times /
Check for resource usage & limits in
Big Data OPC / BDCS-CE: managed
Hadoop, Hive, Spark + Event hub:
Kafka / Attend a hands-on workshop! /
Next level: Oracle Big Data tools
@EdelweissK
@FrankMunz
www.linkedin.com/in/frankmunz/
www.munzandmore.com/blog
facebook.com/cloudcomputingbook
facebook.com/weblogicbook
@frankmunz
youtube.com/weblogicbook
-> more than 50 web casts
Don’t be
shy J
email:	ekammermann@itconvergence.com
Twitter:	@EdelweissK
3	Membership	Tiers
• Oracle	ACE	Director
• Oracle	ACE
• Oracle	ACE	Associate
bit.ly/OracleACEProgram
500+	Technical	Experts	
Helping	Peers	Globally
Connect:
Nominate	yourself	or	someone	you	know:	acenomination.oracle.com
@oracleace
Facebook.com/oracleaces
oracle-ace_ww@oracle.com
Sign up for Free Trial
http://cloud.oracle.com

Contenu connexe

Tendances

Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and cephShapeBlue
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalleybuildacloud
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosHeiko Loewe
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesosnelsonadpresent
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittalbuildacloud
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerYahoo Developer Network
 
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...ClusterHQ
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech dayArthur Berezin
 
Cloud OS development
Cloud OS developmentCloud OS development
Cloud OS developmentSean Chang
 
Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)Lucas Jellema
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsBlueData, Inc.
 
Build public private cloud using openstack
Build public private cloud using openstackBuild public private cloud using openstack
Build public private cloud using openstackFramgia Vietnam
 
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David NalleyJenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David Nalleybuildacloud
 
Cloud stack design camp on jun 15
Cloud stack design camp on jun 15Cloud stack design camp on jun 15
Cloud stack design camp on jun 15Isaac Chiang
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Janos Matyas
 

Tendances (20)

Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and ceph
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalley
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with Docker
 
Server 2016 sneak peek
Server 2016 sneak peekServer 2016 sneak peek
Server 2016 sneak peek
 
OpenStack Cinder
OpenStack CinderOpenStack Cinder
OpenStack Cinder
 
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech day
 
Cloud OS development
Cloud OS developmentCloud OS development
Cloud OS development
 
CloudStack Hyderabad Meetup: How the Apache community works
CloudStack Hyderabad Meetup: How the Apache community worksCloudStack Hyderabad Meetup: How the Apache community works
CloudStack Hyderabad Meetup: How the Apache community works
 
Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
 
Build public private cloud using openstack
Build public private cloud using openstackBuild public private cloud using openstack
Build public private cloud using openstack
 
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David NalleyJenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
 
Cloud stack design camp on jun 15
Cloud stack design camp on jun 15Cloud stack design camp on jun 15
Cloud stack design camp on jun 15
 
Cloud stack for_beginners
Cloud stack for_beginnersCloud stack for_beginners
Cloud stack for_beginners
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
Cloud data center and openstack
Cloud data center and openstackCloud data center and openstack
Cloud data center and openstack
 

Similaire à Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark and Kafka

The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersEdelweiss Kammermann
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsgagravarr
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?samthemonad
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Hortonworks
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologiesinside-BigData.com
 
Impala turbocharge your big data access
Impala   turbocharge your big data accessImpala   turbocharge your big data access
Impala turbocharge your big data accessOphir Cohen
 

Similaire à Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark and Kafka (20)

The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needs
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
Impala turbocharge your big data access
Impala   turbocharge your big data accessImpala   turbocharge your big data access
Impala turbocharge your big data access
 

Plus de Frank Munz

Microservices Runtimes
Microservices RuntimesMicroservices Runtimes
Microservices RuntimesFrank Munz
 
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017Frank Munz
 
Docker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSDocker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSFrank Munz
 
Oracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to KnowOracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to KnowFrank Munz
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...Frank Munz
 
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cDocker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cFrank Munz
 
12 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla1512 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla15Frank Munz
 
WebLogic JMX for DevOps
WebLogic JMX for DevOpsWebLogic JMX for DevOps
WebLogic JMX for DevOpsFrank Munz
 
Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial Frank Munz
 

Plus de Frank Munz (9)

Microservices Runtimes
Microservices RuntimesMicroservices Runtimes
Microservices Runtimes
 
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
 
Docker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSDocker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCS
 
Oracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to KnowOracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to Know
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
 
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cDocker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
 
12 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla1512 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla15
 
WebLogic JMX for DevOps
WebLogic JMX for DevOpsWebLogic JMX for DevOps
WebLogic JMX for DevOps
 
Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial
 

Dernier

Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...tanu pandey
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Standkumarajju5765
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.soniya singh
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...SUHANI PANDEY
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)Delhi Call girls
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445ruhi
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Dernier (20)

Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 

Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark and Kafka

  • 1. Open Source Big Data in OPC Edelweiss Kammermann Frank MunzJava One 2017
  • 3. © IT Convergence 2016. All rights reserved.
  • 4. © IT Convergence 2016. All rights reserved. About Me à Computer Engineer, BI and Data Integration Specialist à Over 20 years of Consulting and Project Management experience in Oracle technology. à Co-founder and Vice President of Uruguayan Oracle User Group (UYOUG) à Director of Community of LAOUC à Head of BI Team CMS at ITConvergence à Writer and frequent speaker at international conferences: à Collaborate, OTN Tour LA, UKOUG Tech & Apps, OOW, Rittman Mead BI Forum à Oracle ACE Director
  • 5. © IT Convergence 2016. All rights reserved. Uruguay
  • 6. 6 Dr. Frank Munz •Founded munz & more in 2007 •17 years Oracle Middleware, Cloud, and Distributed Computing •Consulting and High-End Training •Wrote two Oracle WLS and one Cloud book
  • 7.
  • 9. © IT Convergence 2016. All rights reserved. What is Big Data? à Volume: The high amount of data à Variety: The wide range of different data formats and schemas. Unstructured and semi-structured data à Velocity: The speed which data is created or consumed à Oracle added another V in this definition à Value: Data has intrinsic value—but it must be discovered.
  • 10. © IT Convergence 2016. All rights reserved. What is Oracle Big Data Cloud Compute Edition? à Big Data Platform that integrates Oracle Big Data solution with Open Source tools à Fully Elastic à Integrated with Other Paas Services as Database Cloud Service, MySQL Cloud Service, Event Hub Cloud Service à Access, Data and Network Security à REST access to all the funcitonality
  • 11. © IT Convergence 2016. All rights reserved. Big Data Cloud Service – Compute Edition (BDCS-CE)
  • 12. © IT Convergence 2016. All rights reserved. BDCS-CE Notebook: Interactive Analysis à Apache Zeppelin Notebook (version0.7) to interactively work with data
  • 13. © IT Convergence 2016. All rights reserved. What is Hadoop? à An open source software platform for distributed storage and processing à Manage huge volumes of unstructured data à Parallel processing of large data set à Highly scalable à Fault-tolerant à Two main components: à HDFS: Hadoop Distributed File System for storing information à MapReduce: programming framework that process information
  • 14. © IT Convergence 2016. All rights reserved. Hadoop Components: HFDS à Stores the data on the cluster à Namenode: block registry à DataNode: block containers themselves (Datanode) à HDFS cartoon by Mvarshney
  • 15. © IT Convergence 2016. All rights reserved. Hadoop Components: MapReduce à Retrieves data from HDFS à A MapReduce program is composed by à Map() method: performs filtering and sorting of the <key, value> inputs à Reduce() method: summarize the <key,value> pairs provided by the Mappers à Code can be written in many languages (Perl, Python, Java etc)
  • 16. © IT Convergence 2016. All rights reserved. MapReduce Example
  • 17. © IT Convergence 2016. All rights reserved. Code Example
  • 18. © IT Convergence 2016. All rights reserved. Code Example
  • 19. © IT Convergence 2016. All rights reserved. #2 Hive
  • 20. © IT Convergence 2016. All rights reserved. What is Hive? à An open source data warehouse software on top of Apache Hadoop à Analyze and query data stored in HDFS à Structure the data into tables à Tools for simple ETL à SQL- like queries (HiveQL) à Procedural language with HPL-SQL à Metadata storage in a RDBMS
  • 21. © IT Convergence 2016. All rights reserved. Hadoop & Hive Demo
  • 23. Revisited: Map Reduce I/O munz & more #23 Source: Hadoop Application Architecture Book
  • 24. Spark • Orders of magnitude(s) faster than M/R • Higher level Scala, Java or Python API • Standalone, in Hadoop, or Mesos • Principle: Run an operation on all data -> ”Spark is the new MapReduce” • See also: Apache Storm, etc • Uses RDDs, or Dataframes, or Datasets munz & more #24 https://stackoverflow.com/questions/31508083/difference-between- dataframe-in-spark-2-0-i-e-datasetrow-and-rdd-in-spark https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
  • 25. RDDs Resilient Distributed Datasets Where do they come from? Collection of data grouped into named columns. Supports text, JSON, Apache Parquet, sequence. Read in HDFS, Local FS, S3, Hbase Parallelize existing Collection Transform other RDD -> RDDs are immutable
  • 26. Lazy Evaluation munz & more #26 Nothing is executed Execution Transformations: map(), flatMap(), reduceByKey(), groupByKey() Actions: collect(), count(), first(), takeOrdered(), saveAsTextFile(), … http://spark.apache.org/docs/2.1.1/programming-guide.html
  • 27. map(func) Return a new distributed dataset formed by passing each element of the source through a function func. flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item). reduceByKey(func, [numTasks]) When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V. groupByKey([numTasks]) When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs. Transformations
  • 28.
  • 29.
  • 30. Spark Demo munz & more #30
  • 32. Word Count and Histogram munz & more #32 res = t.flatMap(lambda line: line.split(" ")) .map(lambda word: (word, 1)) .reduceByKey(lambda a, b: a + b) res.takeOrdered(5, key = lambda x: -x[1])
  • 34. Big Data Compute Service CE munz & more #34
  • 36. Kafka Partitioned, replicated commit log munz & more #36 0 1 2 3 4 … n Immutable log: Messages with offset Producer Consumer A Consumer B https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it- is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
  • 37. Broker1 Broker2 Broker3 Topic A (1) Topic A (2) Topic A (3) Partition / Leader Repl A (1) Repl A (2) Repl A (3) Producer Replication / Follower Zoo- keeper Zoo- keeper Zoo- keeper State / HA
  • 38. https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ - 1 topic - 1 partition - Contains every article published since 1851 - Multiple producers / consumers Example for Stream / Table Duality
  • 39. Kafka Clients SDKs Connect Streams - OOTB: Java, Scala - Confluent: Python, C, C++ Confluent: - HDFS sink, - JDBC source, - S3 sink - Elastic search sink - Plugin .jar file - JDBC: Change data capture (CDC) - Real-time data ingestion - Microservices - KSQL: SQL streaming engine for streaming ETL, anomaly detection, monitoring - .jar file runs anywhere High / low level Kafka API Configuration only Integrate external Systems Data in Motion Stream / Table duality REST - Language agnostic - Easy for mobile apps - Easy to tunnel through FW etc. Lightweight
  • 40. Oracle Event Hub Cloud Service • PaaS: Managed Kafka 0.10.2 • Two deployment modes – Basic (Broker and ZK on 1 node) – Recommended (distributed) • REST Proxy – Separate sever(s) running REST Proxy munz & more #40
  • 41. Event Hub munz & more #41
  • 43. Ports You must open ports to allow access for external clients • Kafka Broker (from OPC connect string) • Zookeeper with port 2181 munz & more #43
  • 44. Scaling munz & more #44 horizontal (up)vertical
  • 45. Event Hub REST Interface munz & more #45 https://129.151.91.31:1080/restproxy/topics/a12345orderTopic Service = Topic
  • 46. Interesting to Know • Event Hub topics are prefixed with ID domain • With Kafka CLI topics with ID Domain can be created • Topics without ID domain are not shown in OPC console 46
  • 48. TL;DR #bigData #openSource #OPC OpenSource: entry point to Oracle Big Data world / Low(er) setup times / Check for resource usage & limits in Big Data OPC / BDCS-CE: managed Hadoop, Hive, Spark + Event hub: Kafka / Attend a hands-on workshop! / Next level: Oracle Big Data tools @EdelweissK @FrankMunz
  • 51. 3 Membership Tiers • Oracle ACE Director • Oracle ACE • Oracle ACE Associate bit.ly/OracleACEProgram 500+ Technical Experts Helping Peers Globally Connect: Nominate yourself or someone you know: acenomination.oracle.com @oracleace Facebook.com/oracleaces oracle-ace_ww@oracle.com
  • 52. Sign up for Free Trial http://cloud.oracle.com