www.edureka.co/big-data-and-hadoop
Hadoop IN 2015
View Big Data and Hadoop Course at: http://www.edureka.co/big-data-and-hadoop
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : sales@edureka.co
For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
Slide 2 www.edureka.co/big-data-and-hadoop
Objectives
At the end of this module, you will be able to understand:
• Hadoop the Swiss Knife - Integration with tools and frameworks
» Spark Integration with Hadoop
» Cassandra Integration with Hadoop
» Pentaho Integration with Hadoop
• From Batch to Real-time Processing
• Lambda Architecture
• New and Upcoming Tools
Slide 3 www.edureka.co/big-data-and-hadoop
Predictions for Hadoop in 2015
Slide 4 www.edureka.co/big-data-and-hadoop
Monte Zweben
Co-founder and CEO of Splice Machine
There will be "strong demand" for
Hadoop to become more real-time
and transactional, as it becomes a
viable alternative for traditional
database vendors like Oracle MySQL.
Gary Nakamura
CEO of Concurrent, Inc
As the market continues to catch up
to the hype, 2015 will be the year
that Hadoop becomes a worldwide
phenomenon. As part of this, expect
to see more Hadoop-related
acquisitions, IPOs and the rise of new
jobs.
Neil Mendelson
Oracle's VP of Big Data and Advanced Analytics
Hadoop and NoSQL will graduate
from mostly experimental pilots to
standard components of enterprise
data management, taking their place
alongside relational databases.
Predictions for Hadoop in 2015
Slide 5 www.edureka.co/big-data-and-hadoop
Predictions for Hadoop in 2015
The Big Data movement will generate 4.4 million new IT
jobs globally by 2015.
SQL, the data querying language tool used by
application developers, will become one of the most
prolific use cases in the Hadoop ecosystem.
Slide 6 www.edureka.co/big-data-and-hadoop
Hadoop – The Swiss Knife of the 21st Century
• Hadoop can be integrated with multiple analytic tools, such as machine learning libraries, R, Python,
Spark, and MongoDB, to get the best out of it.
Slide 7 www.edureka.co/big-data-and-hadoop
Spark can be used along with Hadoop 2.x
Spark can use YARN as the cluster resource manager (Spark-on-YARN mode)
Spark has a separate build for YARN-specific integration
YARN launches the Spark program as a YARN application, where the Application Master process hosts the driver program
(in yarn-cluster mode) and the YARN child processes are the Spark workers
This integration is the preferable mode if the data size per node is much larger than the memory available to cache the
RDDs
With lower volumes of data per node, launching Spark without YARN yields better results
Spark Integration with Hadoop
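The rule of thumb above (prefer YARN when the data per node far exceeds the memory available for caching RDDs) can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `recommend_launch_mode`, its parameters, and the cut-off at "everything fits in cache" are hypothetical and do not come from the webinar or from Spark itself.

```python
def recommend_launch_mode(data_per_node_gb, cache_memory_per_node_gb):
    """Return (mode, cacheable_fraction) following the slide's rule of thumb.

    Hypothetical helper: the threshold used here (all data fits in the
    RDD cache) is an assumption, not a value given in the slides.
    """
    if data_per_node_gb <= 0:
        raise ValueError("data_per_node_gb must be positive")
    # Fraction of the per-node data that could be held in the RDD cache.
    cacheable = min(1.0, cache_memory_per_node_gb / data_per_node_gb)
    # Data fits in memory: run Spark without YARN; otherwise use YARN.
    mode = "standalone" if cacheable >= 1.0 else "yarn"
    return mode, cacheable


if __name__ == "__main__":
    print(recommend_launch_mode(64, 16))  # large data per node
    print(recommend_launch_mode(4, 16))   # small data per node
```

In practice the decision also depends on what else runs on the cluster, so treat the returned mode as a starting point rather than a rule.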
Slide 8 www.edureka.co/big-data-and-hadoop
[Figure: Apache Hadoop and OSS ecosystem on the MapR Data Platform, split into execution engines and data governance and operations (components marked * as in the original diagram)
Categories: Batch; ML, Graph; SQL; NoSQL & Search; Streaming; Data Integration & Access; Security; Workflow & Data Governance; Provisioning; Management
Components: Tez*, Spark, Cascading, Pig, MapReduce v1 & v2, YARN, GraphX, MLLib, Mahout, Drill*, Shark, Impala, Hive, Accumulo*, Solr, HBase, Storm*, Spark Streaming, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*, Falcon*, Oozie, Savannah*, Juju, Whirr, ZooKeeper]
Spark + Hadoop
Slide 9 www.edureka.co/big-data-and-hadoop
[Figure: The Combination of Spark on Hadoop: operational applications augmented by in-memory performance. Benefits shown: ease of development, combined workflows, in-memory performance, unlimited scale, wide range of applications, enterprise platform]
Spark + Hadoop
Slide 10 www.edureka.co/big-data-and-hadoop
Spark can use Hadoop as Storage
» Spark can use Hadoop as storage as well as cluster manager
» HDFS provides distributed storage of large datasets
» High availability is provided natively by HDFS
» No extra software installation is required
» Also compatible with Hadoop 1.x: using HDFS as storage does not require Hadoop 2.x
» Data loss during computation is handled by HDFS itself
Using Hadoop as Storage
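HDFS protects against data loss by splitting each file into blocks and keeping several replicas of every block on different nodes. The arithmetic can be sketched as follows. The helper name `hdfs_footprint` is hypothetical, and the defaults (128 MB blocks, replication factor 3) are the usual Hadoop 2.x defaults, assumed here rather than taken from the slides.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block_replicas, raw_storage_mb) for one file.

    Assumed defaults: 128 MB block size and replication factor 3.
    The last block only occupies the space it actually needs, so the raw
    storage is simply the file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    block_replicas = blocks * replication
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(1000)
    print(f"{blocks} blocks, {replicas} block replicas, {raw} MB raw storage")
```

With replication factor r, a block stays readable as long as at least one of its r replicas is on a live node, which is why a failed Spark task can simply re-read its input.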
Slide 11 www.edureka.co/big-data-and-hadoop
Spark can use Hadoop as execution engine
» Spark can be integrated with YARN for its execution
» Spark can also be used with other cluster managers (such as Mesos or Spark's standalone cluster manager)
» YARN integration automatically provides processing scalability to Spark
» Spark needs Hadoop 2.0 or later in order to use it for execution
» Every node in the Hadoop cluster also needs Spark to be installed
» Using a Hadoop cluster for Spark processes requires upgrading the RAM of the data nodes
» The integration distribution of Spark is quite new and still in the process of stabilization
Using Hadoop as Execution Engine
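The choice of cluster manager listed above shows up in practice as the `--master` value passed to `spark-submit`. A minimal sketch of that mapping follows. The function names, the host name `master-host`, and the application file `my_job.py` are hypothetical; the URL schemes (`yarn`, `mesos://`, `spark://`, `local[*]`) and the default ports 5050 and 7077 are the standard ones as far as I know, but check them against the Spark version you run.

```python
def master_url(manager, host=None, port=None):
    """Map a cluster manager name to a Spark master URL (illustrative)."""
    if manager == "yarn":
        # Cluster location comes from the Hadoop configuration, not the URL.
        return "yarn"
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"
    if manager == "local":
        return "local[*]"
    raise ValueError(f"unknown cluster manager: {manager}")


def spark_submit_args(manager, app, host=None, port=None):
    """Build a spark-submit argument list; nothing is executed here."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]


if __name__ == "__main__":
    print(" ".join(spark_submit_args("yarn", "my_job.py")))
    print(" ".join(spark_submit_args("standalone", "my_job.py", "master-host")))
```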
Slide 12 www.edureka.co/big-data-and-hadoop
Real Time Analytics – Accepted Way
Streaming
Data
Storing
Slide 13 www.edureka.co/big-data-and-hadoop
Real Time Analytics – Accepted Way
14 sec
0.6 sec
Slide 14 www.edureka.co/big-data-and-hadoop
Cassandra Integration with Hadoop
Stand Alone Model
» Stand Alone Independent Clusters
» Existing Cassandra and Hadoop Platforms
» Different Environments
» Different Business Units
» Exposing For B2B Consumption
[Figure: Hadoop cluster with a Master Node running the Job Tracker and Slave Nodes 1-3, each running a Task Tracker and Map Reduce]
Slide 15 www.edureka.co/big-data-and-hadoop
Real-time Application and Analytics in One Cluster with
Resource Isolation
Cassandra Integration with Hadoop
Hybrid Model
» Single & Hybrid Cluster
» Shared Infrastructure
» Shared Workload
» Dedicated groups
» Run Cassandra & Hadoop on same cluster
» No SPOF
[Figure: Cassandra nodes arranged in Replica Group 1 and Replica Group 2, linked by write replication, together with the Hadoop Job Tracker and Hadoop Task Tracker]
Slide 16 www.edureka.co/big-data-and-hadoop
Integrating Hadoop with Cassandra gives remarkable performance, which helps companies that use big data
improve their business
Hadoop integration with Cassandra includes support for MapReduce, Pig, Hive, and Oozie
Hadoop provides distributed processing and high scalability
Cassandra gives us linear scalability and high availability
Together, Hadoop and Cassandra help us to process and manage big data easily
Cassandra Integration with Hadoop
Slide 17 www.edureka.co/big-data-and-hadoop
Pentaho distributes its big data plugin together with its standard products
Hadoop configurations within PDI are collections of the Hadoop libraries
required to communicate with a specific version of Hadoop and Hadoop-related
tools, such as Hive, HBase, Sqoop, or Pig
The Hadoop distribution configuration can be found at this
location: plugins/pentaho-big-data-plugin/plugin.properties
As of PDI 5.1, the supported standard Hadoop distributions are CDH 4.2 and 5.0,
MapR 3.1, and HDP 2.0.
Pentaho Integration with Hadoop
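To see which Hadoop configuration a PDI installation is set to use, the plugin.properties file mentioned above can be read with a few lines of Python. This is a sketch under assumptions: the keys `active.hadoop.configuration` and `hadoop.configurations.path` and the value `cdh50` are quoted from memory of PDI 5.x and are not confirmed by the slides, so verify them against your own file. The sample text stands in for the real file.

```python
# Illustrative excerpt only; key names and values are assumptions.
SAMPLE_PROPERTIES = """\
# plugins/pentaho-big-data-plugin/plugin.properties
active.hadoop.configuration=cdh50
hadoop.configurations.path=hadoop-configurations
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SAMPLE_PROPERTIES)
# Directory (relative to the plugin folder) holding the selected libraries.
shim_dir = f"{props['hadoop.configurations.path']}/{props['active.hadoop.configuration']}"
print(shim_dir)
```

To inspect a real installation, replace `SAMPLE_PROPERTIES` with the contents of the file read from disk.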
Slide 18 www.edureka.co/big-data-and-hadoop
Pentaho Integration with Hadoop
MapReduce jobs can be written in traditional languages like Java, but that requires a specific skill set
PDI provides a powerful alternative for creating MapReduce jobs with minimal technical skill
Compared with traditional coding and ETL approaches, Pentaho’s visual development
tools reduce the time to design, develop and deploy Hadoop analytics solutions by 15x
Joe Nicholson
Pentaho’s Vice President of Product
Marketing
Our goal is Hadoop with practically zero
programming, so we can simplify the use
of Hadoop for analytics, including file
input and output steps as well as
managing Hadoop jobs.
Slide 19 www.edureka.co/big-data-and-hadoop
Real-time Analytics in…
Financial Services
» Fraud detection/prevention
Retail
» Brand sentiment analysis
» Localized, personalized promotions
Telecom
» Cell tower diagnostics
» Bandwidth allocation
Manufacturing
» Proactive maintenance
Healthcare
» Monitor patient vitals
» Patient care and safety
» Reduce re-admittance rates
Utilities, Oil & Gas
» Smart meter stream analysis
» Proactive equipment repair
» Power and consumption matching
Public Sector
» Network intrusion detection and prevention
» Disease outbreak detection
Transportation
» Unsafe driving detection and monitoring
Slide 20 www.edureka.co/big-data-and-hadoop
All data entering the system is dispatched to both the batch layer and the speed layer for processing.
[Figure: Lambda Architecture diagram, step 1: New Data, Batch Layer, Serving Layer, Speed Layer]
Lambda Architecture
Slide 21 www.edureka.co/big-data-and-hadoop
The batch layer has two functions:
» managing the master dataset (an immutable, append-only set of raw data), and
» pre-computing the batch views.
The serving layer indexes the batch views so that they can be queried in a low-latency, ad hoc way.
[Figure: Lambda Architecture diagram, steps 1-3: New Data, Batch Layer with the Master Dataset, Serving Layer with Batch Views, Speed Layer]
Lambda Architecture
Slide 22 www.edureka.co/big-data-and-hadoop
The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
[Figure: Lambda Architecture diagram, steps 1-4: New Data, Batch Layer with the Master Dataset, Serving Layer with Batch Views, Speed Layer with Real-time Views]
Lambda Architecture
Slide 23 www.edureka.co/big-data-and-hadoop
Any incoming query can be answered by merging results from batch views and real-time views.
[Figure: Lambda Architecture diagram, steps 1-5: New Data, Batch Layer with the Master Dataset, Serving Layer with Batch Views, Speed Layer with Real-time Views, and Queries answered from both kinds of view]
Lambda Architecture
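The five steps of the Lambda Architecture can be made concrete with a toy, single-process model that counts events per key. This is a sketch for illustration only: the class `LambdaCounts` and its methods are hypothetical names, and a real deployment would typically use Hadoop for the batch layer and a stream processor for the speed layer instead of in-memory counters.

```python
from collections import Counter


class LambdaCounts:
    """Toy Lambda Architecture that counts events per key (illustrative)."""

    def __init__(self):
        self.master_dataset = []        # immutable, append-only raw events
        self.batch_view = Counter()     # pre-computed by the batch layer
        self.realtime_view = Counter()  # maintained by the speed layer

    def ingest(self, key):
        # Step 1: new data goes to both the batch layer and the speed layer.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Steps 2-3: recompute the batch view from the whole master dataset.
        self.batch_view = Counter(self.master_dataset)
        # Step 4: the speed layer only keeps data the batch view lacks.
        self.realtime_view.clear()

    def query(self, key):
        # Step 5: merge results from the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    la = LambdaCounts()
    for k in ["a", "b", "a"]:
        la.ingest(k)
    la.run_batch()
    la.ingest("a")  # arrives after the last batch run
    print(la.query("a"))
```

Clearing the real-time view is only safe here because the model is single-threaded; a real system has to expire exactly the events that the finished batch run covered.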
Slide 24 www.edureka.co/big-data-and-hadoop
New and Upcoming Tools
Apache Tez: Application framework which allows for a complex directed-acyclic-graph of tasks for
processing data.
Apache Accumulo: Sorted, distributed key/value store that provides robust, scalable, high-performance data
storage and retrieval
Apache Kafka: Distributed, partitioned, replicated commit log service. It provides the functionality of
a messaging system, but with a unique design
Apache Nutch: Highly extensible and highly scalable web crawler
Apache Knox Gateway: System that provides a single point of authentication and access for
Apache™ Hadoop® services in a cluster
Apache S4: General-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows
programmers to easily develop applications for processing continuous unbounded streams of data
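The "partitioned commit log" idea behind Kafka in the list above can be illustrated with a toy in-memory model. This is not Kafka's API: the class `PartitionedLog` is hypothetical, replication is left out, and CRC32 is used only as a stand-in for a stable key hash.

```python
import zlib


class PartitionedLog:
    """Toy append-only log split into partitions (illustrative only)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # The same key always maps to the same partition, so the order of
        # records for that key is preserved.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)
        # The offset is the record's position within its partition.
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        # Consumers track their own offset and can re-read from any point.
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    log = PartitionedLog(3)
    p, first = log.append("user-1", "login")
    log.append("user-1", "click")
    print(log.read(p, first))
```

Because records are never modified in place, several consumers can read the same partition independently, each from its own offset.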
Slide 25 www.edureka.co/big-data-and-hadoop
Questions
Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Slide 26 www.edureka.co/big-data-and-hadoop
Slide 27 www.edureka.co/big-data-and-hadoop
• Module 1
» Understanding Big Data and Hadoop
• Module 2
» Hadoop Architecture and HDFS
• Module 3
» Hadoop MapReduce Framework - I
• Module 4
» Hadoop MapReduce Framework - II
• Module 5
» Advanced MapReduce
• Module 6
» PIG
• Module 7
» HIVE
• Module 8
» Advanced HIVE and HBase
• Module 9
» Advanced HBase
• Module 10
» Oozie and Hadoop Project
Course Topics
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
Slide 28 www.edureka.co/big-data-and-hadoop
How it Works?
Webinar: Ways to Succeed with Hadoop in 2015

A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Dernier (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Webinar: Ways to Succeed with Hadoop in 2015

  • 1. Hadoop in 2015. View the Big Data and Hadoop course at: http://www.edureka.co/big-data-and-hadoop. For more details, please contact us: US: 1800 275 9730 (toll free); India: +91 88808 62004; email: sales@edureka.co. For queries: post on Twitter @edurekaIN with #askEdureka, or on Facebook /edurekaIN.
  • 2. Objectives. This module covers:  Hadoop the Swiss Knife: integration with tools and frameworks » Spark integration with Hadoop » Cassandra integration with Hadoop » Pentaho integration with Hadoop  From batch to real-time processing  Lambda Architecture  New and upcoming tools
  • 4. Predictions for Hadoop in 2015. Monte Zweben, co-founder and CEO of Splice Machine: There will be "strong demand" for Hadoop to become more real-time and transactional, as it becomes a viable alternative for traditional database vendors like Oracle MySQL. Gary Nakamura, CEO of Concurrent, Inc: As the market continues to catch up to the hype, 2015 will be the year that Hadoop becomes a worldwide phenomenon. As part of this, expect to see more Hadoop-related acquisitions, IPOs and the rise of new jobs. Neil Mendelson, Oracle's VP of Big Data and Advanced Analytics: Hadoop and NoSQL will graduate from mostly experimental pilots to standard components of enterprise data management, taking their place alongside relational databases.
  • 5. Predictions for Hadoop in 2015. The Big Data movement will generate 4.4 million new IT jobs globally by 2015. SQL, the data querying language used by application developers, will become one of the most prolific use cases in the Hadoop ecosystem.
  • 6. Hadoop – The Swiss Knife of the 21st Century. Hadoop can be integrated with multiple analytics tools and frameworks, such as M-Learning, R, Python, Spark, and MongoDB, to get the best out of it.
  • 7. Spark Integration with Hadoop. Spark can be used along with Hadoop 2.x. Spark can use YARN as the cluster resource manager (Spark-on-YARN mode). Spark has a separate build for YARN-specific integration. YARN spawns the Spark program as a YARN process, where the Application Master process is the driver program and the YARN child processes are the Spark workers. This integration is the preferable mode when the data size per node is far larger than the memory available to cache the RDDs. With lower data volumes per node, launching Spark without YARN yields better results.
  • 8. Spark + Hadoop. (Figure: the Apache Hadoop and OSS ecosystem on the MapR Data Platform. Components shown: Tez*, Spark, Cascading, Pig, MR v1 & v2, YARN, GraphX, MLLib, Mahout, Drill*, Shark, Impala, Hive, Accumulo*, Solr, HBase, Storm*, Spark Streaming, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*, Falcon*, Oozie, Savannah*, Juju, Whirr, Zookeeper. Category labels: Batch; ML, Graph; SQL; NoSQL & Search; Streaming; Data Integration & Access; Security; Workflow & Data Governance; Provision. Layer labels: Execution Engines; Data Governance and Operations; Management.)
  • 9. Spark + Hadoop. The combination of Spark on Hadoop: operational applications augmented by in-memory performance. Benefits shown: ease of development; combined workflows; in-memory performance; unlimited scale; wide range of applications; enterprise platform.
  • 10. Using Hadoop as Storage. Spark can use Hadoop as storage: » Spark can use Hadoop as storage as well as cluster manager » HDFS provides distributed storage of large datasets » High availability is assured natively through HDFS » No extra software installation is required » Also compatible with Hadoop 1.x; using HDFS as storage doesn't require Hadoop 2.x » Data loss during computation is handled by HDFS itself
  • 11. Using Hadoop as Execution Engine. Spark can use Hadoop as its execution engine: » Spark can be integrated with YARN for its execution » Spark can also be used with other engines (like Mesos or the Spark cluster manager) » YARN integration automatically provides processing scalability to Spark » Spark needs Hadoop 2.0 or later in order to use it for execution » Every node in the Hadoop cluster also needs Spark to be installed » Using a Hadoop cluster for Spark processes requires upgrading the RAM of the data nodes » The integration distribution of Spark is quite new and still in the process of stabilization
  • 12. Real Time Analytics – Accepted Way. (Figure labels: Streaming Data; Storing.)
  • 13. Real Time Analytics – Accepted Way. (Figure labels: 14 sec; 0.6 sec.)
  • 14. Cassandra Integration with Hadoop: Stand Alone Model » Stand-alone independent clusters » Existing Cassandra and Hadoop platforms » Different environments » Different business units » Exposing for B2B consumption. (Figure labels: Master Node with Job Tracker; Slave Nodes 1 to 3; Task Tracker and MapReduce on each node.)
  • 15. Cassandra Integration with Hadoop: Hybrid Model. Real-time application and analytics in one cluster with resource isolation » Single and hybrid cluster » Shared infrastructure » Shared workload » Dedicated groups » Run Cassandra and Hadoop on the same cluster » No SPOF. (Figure labels: Replica Group 1 and Replica Group 2 of Cassandra nodes; Write Replication; Hadoop Job Tracker; Hadoop Task Tracker.)
  • 16. Cassandra Integration with Hadoop. Integrating Hadoop with Cassandra gives a remarkable performance improvement for businesses using big data. Hadoop integration with Cassandra includes support for MapReduce, Pig, Hive, and Oozie. Hadoop provides distributed processing and high scalability. Cassandra provides linear scalability and high availability. Together, Hadoop and Cassandra help process and manage big data easily.
  • 17. Pentaho Integration with Hadoop. Pentaho distributes its big data plugin only along with its standard products. Hadoop configurations within PDI are collections of the Hadoop libraries required to communicate with a specific version of Hadoop and Hadoop-related tools, such as Hive, HBase, Sqoop, or Pig. The Hadoop distribution configuration can be found at this location: plugins/pentaho-big-data-plugin/plugin.properties. As of PDI 5.1, it supports the standard Hadoop distributions CDH 4.2 and 5.0, MapR 3.1, and HDP 2.0.
  • 18. Pentaho Integration with Hadoop. MapReduce jobs can be written in traditional languages like Java, but that requires a specific skill set. PDI provides a powerful alternative for creating MapReduce jobs with minimal technical skill. Compared to traditional coding and ETL approaches, Pentaho's visual development tools reduce the time to design, develop and deploy Hadoop analytics solutions by 15x. Joe Nicholson, Pentaho's Vice President of Product Marketing: Our goal is Hadoop with practically zero programming, so we can simplify the use of Hadoop for analytics, including file input and output steps as well as managing Hadoop jobs.
  • 19. Real-time Analytics in… Sectors shown: Financial Services; Retail; Telecom; Manufacturing; Healthcare; Utilities, Oil & Gas; Public Sector; Transportation. Use cases shown: » Fraud detection/prevention » Brand sentiment analysis » Localized, personalized promotions » Cell tower diagnostics » Bandwidth allocation » Proactive maintenance » Monitor patient vitals » Patient care and safety » Reduce re-admittance rates » Smart meter stream analysis » Proactive equipment repair » Power and consumption matching » Network intrusion detection and prevention » Disease outbreak detection » Unsafe driving detection and monitoring
  • 20. Lambda Architecture. All data entering the system is dispatched to both the batch layer and the speed layer for processing. (Figure labels: New Data; Batch Layer; Serving Layer; Speed Layer; step 1.)
  • 21. Lambda Architecture. The batch layer has two functions: » managing the master dataset (an immutable, append-only set of raw data), and » pre-computing the batch views. The serving layer indexes the batch views so that they can be queried in a low-latency, ad hoc way. (Figure labels: New Data; Master Dataset; Batch Views; Batch Layer; Serving Layer; Speed Layer; steps 1 to 3.)
  • 22. Lambda Architecture. The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only. (Figure labels: New Data; Master Dataset; Batch Views; Real-time Views; Batch Layer; Serving Layer; Speed Layer; steps 1 to 4.)
  • 23. Lambda Architecture. Any incoming query can be answered by merging results from batch views and real-time views. (Figure labels: New Data; Master Dataset; Batch Views; Real-time Views; Queries; Batch Layer; Serving Layer; Speed Layer; steps 1 to 5.)
  • 24. New and Upcoming Tools. Apache Tez: an application framework that allows a complex directed acyclic graph of tasks for processing data. Apache Accumulo: a sorted, distributed key/value store that is a robust, scalable, high-performance data storage and retrieval system. Apache Kafka: a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system, but with a unique design. Apache Nutch: a highly extensible and highly scalable web crawler. Apache Knox Gateway: a system that provides a single point of authentication and access for Apache™ Hadoop® services in a cluster. Apache S4: a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
  • 26. Questions. Twitter @edurekaIN, Facebook /edurekaIN; use #askEdureka for questions.
  • 27. Course Topics.  Module 1 » Understanding Big Data and Hadoop  Module 2 » Hadoop Architecture and HDFS  Module 3 » Hadoop MapReduce Framework - I  Module 4 » Hadoop MapReduce Framework - II  Module 5 » Advance MapReduce  Module 6 » PIG  Module 7 » HIVE  Module 8 » Advance HIVE and HBase  Module 9 » Advance HBase  Module 10 » Oozie and Hadoop Project
  • 28. How it Works? Live online class; class recording in LMS; 24/7 post-class support; module-wise quiz; project work; verifiable certificate.
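
Returning to the Lambda Architecture slides (20 to 23): the data flow they describe can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not how a production system is built. The class name `LambdaPipeline`, its method names, and the choice of per-key event counts as the "view" are all hypothetical and do not come from the slides. It also assumes that no new data arrives while the batch recomputation runs, which a real deployment cannot assume.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaPipeline:
    """Toy Lambda Architecture that counts events per key (hypothetical names)."""

    # Batch layer: immutable, append-only master dataset of raw events.
    master_dataset: list = field(default_factory=list)
    # Serving layer: batch view precomputed from the master dataset.
    batch_view: Counter = field(default_factory=Counter)
    # Speed layer: real-time view covering only data not yet in the batch view.
    realtime_view: Counter = field(default_factory=Counter)

    def ingest(self, key):
        # New data is dispatched to both the batch layer and the speed layer.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # The batch layer recomputes the batch view from the whole master dataset.
        self.batch_view = Counter(self.master_dataset)
        # Simplifying assumption: nothing arrived during the recomputation,
        # so the real-time view is now fully covered and can be reset.
        self.realtime_view = Counter()

    def query(self, key):
        # A query is answered by merging the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


pipeline = LambdaPipeline()
for page in ["home", "cart", "home"]:
    pipeline.ingest(page)
print(pipeline.query("home"))  # 2, served entirely from the real-time view

pipeline.run_batch()
pipeline.ingest("home")
print(pipeline.query("home"))  # 3 = 2 from the batch view + 1 from the real-time view
```

The point of the sketch is the merge in `query`: the batch view is complete but stale, the real-time view is fresh but partial, and only their combination answers a query correctly.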