Hortonworks: We Do Hadoop.
Our mission is to enable your Modern Data Architecture
by Delivering Enterprise Apache Hadoop
YARN, Tez, Stinger
June 2014
Our Mission:
Our Commitment
Open Leadership
Drive innovation in the open exclusively via the
Apache community-driven open source process
Enterprise Rigor
Engineer, test and certify Apache Hadoop with
the enterprise in mind
Ecosystem Endorsement
Focus on deep integration with existing data
center technologies and skills
Headquarters: Palo Alto, CA
Employees: 300+ and growing
Trusted Partners
Enable your Modern Data Architecture by
Delivering Enterprise Apache Hadoop
Driving Our Innovation Through Apache
147,933 lines
614,041 lines
End Users
449,768 lines
Total Net Lines Contributed
to Apache Hadoop
Total Number of Committers to Apache Hadoop: 63 total
Hortonworks: 21
Yahoo: 10
Cloudera: 7
Facebook: 5
IBM: 3
LinkedIn: 3
10 Others
Hortonworks' mission is
to power your modern data architecture by enabling
Hadoop to be an enterprise data platform that
deeply integrates with your data center technologies
Apache Project   Committers   PMC Members
Hadoop           21           13
Tez              10           4
Hive             11           3
HBase            8            3
Pig              6            5
Sqoop            1            0
Ambari           20           12
Knox             6            2
Falcon           2            2
Oozie            2            2
ZooKeeper        2            1
Flume            1            0
Accumulo         2            2
Storm            1            0
Drill            1            0
TOTAL            95           48
Broad Ecosystem Integration
APPLICATIONS / DATA SYSTEM / SOURCES
RDBMS EDW MPP
Emerging Sources
(Sensor, Sentiment, Geo, Unstructured)
HANA
BusinessObjects BI
OPERATIONAL TOOLS
DEV & DATA TOOLS
Existing Sources
(CRM, ERP, Clickstream, Logs)
INFRASTRUCTURE
UDA
Diagram
Relying on Hortonworks…
Teradata Portfolio
for Hadoop
• Seamless data access
between Teradata and
Hadoop (SQL-H)
• Simple management &
monitoring with Viewpoint
integration
• Flexible deployment
options
HDInsight &
HDP for Windows
• Only Hadoop Distribution
for Windows Azure &
Windows Server
• Native integration with
SQL Server, Excel, and
System Center
• Extends Hadoop to .NET
community
Complete Portfolio for Hadoop
Appliances
Instant Access +
Infinite Scale
• SAP can assure their
customers they are
deploying an SAP HANA
+ Hadoop architecture
fully supported by SAP
• Enables analytics apps
(BOBJ) to interact with
Hadoop
HDP 2.1: Enterprise Hadoop Platform
Hortonworks
Data Platform (HDP)
• The ONLY 100% open source
and most current platform
• Integrates full range of
enterprise-ready services
• Certified and tested at scale
• Engineered for deep
ecosystem interoperability
[Diagram: HDP stack, deployable on OS/VM, Cloud, or Appliance]
CORE SERVICES: HDFS (Storage), YARN (Resource Management)
DATA SERVICES: MapReduce, Tez (Process); Hive & HCatalog, Pig, HBase (Data Access); Sqoop, Flume, NFS, WebHDFS, Knox* (Load & Extract / Data Movement)
OPERATIONAL SERVICES: Ambari (Cluster Mgmt), Falcon* (Dataset Mgmt), Oozie (Schedule)
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Our Vision: Hadoop as Next-Gen Platform
HADOOP 1.0
HDFS
(redundant, reliable storage)
MapReduce
(cluster resource management
& data processing)
HDFS2
(redundant, highly-available & reliable storage)
YARN
(cluster resource management)
MapReduce
(data processing)
Others
HADOOP 2.0
Single Use System
Batch Apps
Multi Purpose Platform
Batch, Interactive, Online, Streaming, …
The 1st Generation of Hadoop: Batch
HADOOP 1.0
Built for Web-Scale Batch Apps
[Diagram: siloed single-app clusters on HDFS: BATCH, INTERACTIVE, ONLINE]
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
Hadoop MapReduce Classic
• JobTracker
–Manages cluster resources and job scheduling
• TaskTracker
–Per-node agent
–Manages tasks
YARN: Taking Hadoop Beyond Batch
Applications Run Natively in Hadoop
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH
(MapReduce)
INTERACTIVE
(Tez)
STREAMING
(Storm, S4,…)
GRAPH
(Giraph)
IN-MEMORY
(Spark)
HPC MPI
(OpenMPI)
ONLINE
(HBase)
OTHER
(Search)
(Weave…)
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
5 Key Benefits of YARN
1. Scale
2. New Programming Models &
Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
Concepts
• Application
–An application is a temporal job or a service submitted to YARN
–Examples
– MapReduce job (job)
– HBase cluster (service)
• Container
–Basic unit of allocation
–Fine-grained resource allocation across multiple resource
types (memory, cpu, disk, network, gpu etc.)
– container_0 = 2 GB, 1 CPU
– container_1 = 1 GB, 6 CPU
–Replaces the fixed map/reduce slots
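The container idea above can be illustrated with a minimal sketch. It is a toy model, not YARN's API: the `Resource` class, the `allocate` helper, the node size, and `container_2` are hypothetical names and values introduced only for illustration, and only memory and CPU are modeled. It shows requests of different shapes (2 GB/1 CPU, 1 GB/6 CPU) being granted against one node's free capacity instead of being forced into fixed map or reduce slots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-dimensional resource vector (memory and CPU only)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        # A request fits only if every dimension fits.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


def allocate(node_capacity: Resource, requests: dict) -> dict:
    """Grant requests in order while the node still has room for them."""
    free = node_capacity
    granted = {}
    for name, req in requests.items():
        if req.fits_in(free):
            free = free.minus(req)
            granted[name] = True
        else:
            granted[name] = False
    return granted


# Assumed node size; the first two container shapes come from the slide.
node = Resource(memory_mb=4096, vcores=8)
requests = {
    "container_0": Resource(memory_mb=2048, vcores=1),  # 2 GB, 1 CPU
    "container_1": Resource(memory_mb=1024, vcores=6),  # 1 GB, 6 CPU
    "container_2": Resource(memory_mb=2048, vcores=1),  # hypothetical extra request
}
result = allocate(node, requests)
```

In this model the third request is refused because memory runs out even though a CPU is still free, which is the kind of decision fixed slots cannot express.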
Design Centre
• Split up the two major functions of JobTracker
–Cluster resource management
–Application life-cycle management
• MapReduce becomes user-land library
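A toy sketch of this split, under stated assumptions: the two classes below borrow the names ResourceManager and ApplicationMaster from YARN, but their methods (`request`, `release`, `run_round`) are hypothetical and do not reflect the real interfaces. The point is only the division of labor: the cluster-wide component counts free containers, while the per-application component owns the task list and life cycle of one job.

```python
class ResourceManager:
    """Cluster-wide role: tracks free capacity and hands out containers."""

    def __init__(self, total_containers):
        self.free = total_containers

    def request(self, count):
        granted = min(count, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


class ApplicationMaster:
    """Per-application role: owns the tasks and life cycle of a single job."""

    def __init__(self, name, tasks, rm):
        self.name = name
        self.pending = list(tasks)
        self.done = []
        self.rm = rm

    def run_round(self):
        granted = self.rm.request(len(self.pending))
        running, self.pending = self.pending[:granted], self.pending[granted:]
        self.done.extend(running)  # tasks "finish" instantly in this toy model
        self.rm.release(granted)
        return granted

    def finished(self):
        return not self.pending


rm = ResourceManager(total_containers=3)
mr_job = ApplicationMaster(
    "mapreduce-job", ["map1", "map2", "map3", "map4", "reduce1"], rm
)
rounds = 0
while not mr_job.finished():
    mr_job.run_round()
    rounds += 1
```

Because the job-specific logic lives entirely in the per-application class, a different framework could plug in its own master without changing the resource-tracking side.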
YARN Applications
• Data processing applications and services
–Online Serving – HOYA (HBase on YARN)
–Real-time event processing – Storm, S4, other commercial
platforms
–Interactive SQL – Tez (Generalization of MR)
–Machine Learning – MPI (OpenMPI, MPICH2)
–In-Memory: Spark
–Graph processing: Giraph
–Enabled by allowing the use of paradigm-specific application
master
Run all on the same Hadoop cluster!
© Hortonworks Inc. 2012
YARN as OS for Data Lake
[Diagram: a ResourceManager (Scheduler) placing work on a grid of NodeManagers shared by three workloads]
Batch: map1.1, map1.2, reduce1.1
Interactive SQL: vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2
Real-Time: nimbus0, nimbus1, nimbus2
Multi-Tenant YARN
[Diagram: ResourceManager Scheduler with a hierarchy of queues under root]
root: Adhoc 10%, DW 60%, Mrkting 30%
Sub-queues: Dev 10%, Reserved 20%, Prod 70%
Sub-queues: Prod 80%, Dev 20%
Sub-queues: P0 70%, P1 30%
Multi-Tenancy with CapacityScheduler
• Queues
• Economics as queue-capacity
–Hierarchical Queues
• SLAs
–Preemption
• Resource Isolation
–Linux: cgroups
–MS Windows: Job Control
–Roadmap: Virtualization (Xen, KVM)
• Administration
–Queue ACLs
–Run-time re-configuration for queues
–Charge-back
[Diagram: Capacity Scheduler with hierarchical queues under the ResourceManager Scheduler]
root: Adhoc 10%, DW 70%, Mrkting 20%
Sub-queues: Dev 10%, Reserved 20%, Prod 70%
Sub-queues: Prod 80%, Dev 20%
Sub-queues: P0 70%, P1 30%
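The arithmetic behind hierarchical queue capacities can be sketched as follows. The percentages are the ones shown in the queue figure (Adhoc 10%, DW 70%, Mrkting 20%, and so on), but the parent/child nesting used here is an assumption, since the extracted slide text does not preserve which sub-queue belongs to which parent. The `QUEUES` structure and `absolute_capacities` function are hypothetical and are not the CapacityScheduler's configuration format.

```python
# Each entry: queue name -> (percent of parent, child queues).
# ASSUMPTION: this nesting is a guess at the figure's layout.
QUEUES = {
    "Adhoc": (10, {}),
    "DW": (70, {
        "Dev": (10, {}),
        "Reserved": (20, {}),
        "Prod": (70, {"P0": (70, {}), "P1": (30, {})}),
    }),
    "Mrkting": (20, {"Prod": (80, {}), "Dev": (20, {})}),
}


def absolute_capacities(children, parent_share=100.0, prefix="root"):
    """Return each leaf queue's share of the whole cluster, in percent."""
    total = sum(pct for pct, _ in children.values())
    if total != 100:
        raise ValueError(f"children of {prefix} sum to {total}%, expected 100%")
    result = {}
    for name, (pct, sub) in children.items():
        path = f"{prefix}.{name}"
        share = parent_share * pct / 100.0
        if sub:
            result.update(absolute_capacities(sub, share, path))
        else:
            result[path] = share
    return result


caps = absolute_capacities(QUEUES)
for path, share in sorted(caps.items()):
    print(f"{path}: {share:.1f}% of cluster")
```

A leaf's cluster-wide share is the product of the percentages along its path, so under this assumed nesting `root.DW.Prod.P0` gets 70% of 70% of 70% of the cluster.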
Tez (“Speed”)
• What is it?
–A data processing framework as an alternative to MapReduce
–A new incubation project in the ASF
• Who else is involved?
–22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo,
Microsoft
• Why does it matter?
–Widens the platform for Hadoop use cases
–Crucial to improving the performance of low-latency applications
–Core to the Stinger initiative
–Evidence of Hortonworks leading the community in the evolution
of Enterprise Hadoop
Moving Hadoop Beyond MapReduce
• Low level data-processing execution engine
• Built on YARN
• Enables pipelining of jobs
• Removes task and job launch times
• Does not write intermediate output to HDFS
–Much lighter disk and network usage
• New base of MapReduce, Hive, Pig, Cascading etc.
• Hive and Pig jobs no longer need to move to the end
of the queue between steps in the pipeline
Tez - Core Idea
Task with pluggable Input, Processor & Output
YARN ApplicationMaster to run DAG of Tez Tasks
Input Processor
Task
Output
Tez Task - <Input, Processor, Output>
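As a rough illustration of the <Input, Processor, Output> idea, the sketch below composes a task from three pluggable callables. It is not the Tez API: `make_task` and the lambdas are hypothetical, an in-memory list stands in for HDFS input, and `sorted` and `dict` stand in for sorted and final outputs. The two tasks are chained by hand here; in Tez the ApplicationMaster would run them as vertices of a DAG.

```python
from itertools import groupby


def make_task(read_input, process, write_output):
    """Build a task from three pluggable pieces: input, processor, output."""
    def run():
        return write_output(process(read_input()))
    return run


# Stand-in for an HDFS input split (assumption for illustration).
lines = ["tez yarn", "yarn hive", "tez"]

# 'Map'-style task: <list input, tokenizing processor, sorted output>
map_task = make_task(
    read_input=lambda: lines,
    process=lambda recs: [(word, 1) for line in recs for word in line.split()],
    write_output=sorted,
)
map_out = map_task()

# 'Reduce'-style task: <upstream output as input, summing processor, dict output>
reduce_task = make_task(
    read_input=lambda: map_out,
    process=lambda pairs: [
        (key, sum(count for _, count in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    ],
    write_output=dict,
)
counts = reduce_task()
```

Swapping only the output callable (for example, an unsorted or in-memory variant) changes the task's behavior without touching the processor, which is the flexibility the task triple is meant to provide.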
Building Blocks for Tasks
MapReduce ‘Map’ Task: <HDFS Input, Map Processor, Sorted Output>
MapReduce ‘Reduce’ Task: <Shuffle Input, Reduce Processor, HDFS Output>
Intermediate ‘Reduce’ for Map-Reduce-Reduce: <Shuffle Input, Reduce Processor, Sorted Output>
Special Pig/Hive ‘Map’ (Tez Task): <HDFS Input, Map Processor, Pipeline Sorter Output>
Special Pig/Hive ‘Reduce’ (Tez Task): <Shuffle Skip-merge Input, Reduce Processor, Sorted Output>
In-memory Map (Tez Task): <HDFS Input, Map Processor, In-memory Sorted Output>
Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state
Pig/Hive - MR: Job 1, Job 2, Job 3, separated by I/O synchronization barriers
Pig/Hive - Tez: Single Job
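The difference between the two execution styles can be sketched by treating the query as a small DAG. The vertex names and the packaging of the plan into three MapReduce jobs are assumptions chosen to match the slide's Job 1 to Job 3; an actual Hive or Pig plan may be shaped differently. The sketch orders the stages with Python's standard `graphlib` module and counts how often results would be materialized on HDFS in each style.

```python
from graphlib import TopologicalSorter

# Hypothetical plan for the query above: vertex -> upstream vertices.
plan = {
    "join_a_b": {"scan_a", "scan_b"},
    "join_ab_c": {"join_a_b", "scan_c"},
    "group_by_state": {"join_ab_c"},
}

# A valid execution order for the whole DAG (upstream vertices first).
order = list(TopologicalSorter(plan).static_order())

# ASSUMPTION: MapReduce packaging uses one job per join/aggregate stage,
# and every job writes its output to HDFS before the next job starts.
mr_jobs = ["join_a_b", "join_ab_c", "group_by_state"]
mr_hdfs_writes = len(mr_jobs)
mr_intermediate_writes = mr_hdfs_writes - 1

# As a single DAG job, only the final result needs to be written to HDFS.
dag_hdfs_writes = 1
dag_intermediate_writes = dag_hdfs_writes - 1

print("execution order:", order)
print("intermediate HDFS writes, MR chain:", mr_intermediate_writes)
print("intermediate HDFS writes, single DAG:", dag_intermediate_writes)
```

Under these assumptions the chain of jobs pays for two intermediate HDFS writes that the single-job DAG avoids, which corresponds to the I/O synchronization barriers in the figure.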
Tez on YARN: Going Beyond Batch
Tez Optimizes Execution
New runtime engine for
more efficient data processing
Always-On Tez Service
Low latency processing for
all Hadoop data processing
Tez Task
SQL-in-Hadoop with Apache Hive
• Apache Hive is the standard for
SQL interaction with Hadoop
–Enterprises make final purchasing
decisions on two key characteristics:
compatibility with existing
investments (60%) and skills (20%)
–Most applications claim Hive
compatibility TODAY*
• Stinger Initiative: Simple Focus
–Performance
–SQL-Compatibility
–Scalability
* Claims publicly made by: Teradata, Microsoft, Oracle, Microstrategy, IBM, Information Builders,
SAS, QlikTech, SAP, Tableau, Tibco, Actuate, Jaspersoft, Alteryx, Datameer, Pentaho
[Diagram: Business Analytics and Custom Apps issue SQL to Hive, which runs on Tez or MapReduce over YARN and HDFS in Hadoop]
Improves existing tools & preserves investments
Stinger Project
(announced February 2013)
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative
A broad, community-based effort to
drive the next generation of HIVE
Goals:
Speed
Improve Hive query performance by 100X to
allow for interactive query times (seconds)
Scale
The only SQL interface to Hadoop designed
for queries that scale from TB to PB
SQL
Support broadest range of SQL semantics for
analytic applications running against Hadoop
…all IN Hadoop
Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN
Hive 0.13, April 2014:
• Hive on Apache Tez
• Query Service
• Buffer Cache
• Cost Based Optimizer (Optiq)
• Vectorized Processing
Hortonworks: The Value of “Open” for You
Validate & Try
1. Download the
Hortonworks Sandbox
2. Learn Hadoop using the
technical tutorials
3. Investigate a business
case using the step-by-step
business case scenarios
4. Validate YOUR business
case using your data in
the sandbox
Connect With the Hadoop Community
We employ a large number of Apache project committers & innovators so
that you are represented in the open source community
Avoid Vendor Lock-In
Hortonworks Data Platform remains as close to the open source trunk as
possible and is developed 100% in the open so you are never locked in
The Partners you Rely On, Rely On Hortonworks
We work with partners to deeply integrate Hadoop with data center
technologies so you can leverage existing skills and investments
Certified for the Enterprise
We engineer, test and certify the Hortonworks Data Platform at scale to
ensure the reliability and stability you require for enterprise use
Support from the Experts
We provide the highest quality of support for deploying at scale. You are
supported by hundreds of years of Hadoop experience
Engage
1. Execute a Business Case
Discovery Workshop with
our architects
2. Build a business case for
Hadoop today

Hackathon bonn

  • 1. Hortonworks: We Do Hadoop. Our mission is to enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop YARN, Tez, Stinger June 2014
  • 2. Our Mission: Our Commitment Open Leadership Drive innovation in the open exclusively via the Apache community-driven open source process Enterprise Rigor Engineer, test and certify Apache Hadoop with the enterprise in mind Ecosystem Endorsement Focus on deep integration with existing data center technologies and skills Page 2 Headquarters: Palo Alto, CA Employees: 300+ and growing Trusted Partners Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop
  • 3. Driving Our Innovation Through Apache 147,933 lines 614,041 lines End Users 449,768 lines Total Net Lines Contributed to Apache Hadoop Yahoo: 10 Cloudera: 7 IBM: 3 10 Others 21 Facebook: 5 LinkedIn: 3 Total Number of Committers to Apache Hadoop 63 total Hortonworks mission is to power your modern data architecture by enabling Hadoop to be an enterprise data platform that deeply integrates with your data center technologies Page 3 Apache Project Committers PMC Members Hadoop 21 13 Tez 10 4 Hive 11 3 HBase 8 3 Pig 6 5 Sqoop 1 0 Ambari 20 12 Knox 6 2 Falcon 2 2 Oozie 2 2 Zookeepe r 2 1 Flume 1 0 Accumulo 2 2 Storm 1 0 Drill 1 0 TOTAL 95 48
  • 4. Broad Ecosystem Integration Page 4 APPLICATIONSDATASYSTEMSOURCES RDBMS EDW MPP Emerging Sources (Sensor, Sentiment, Geo, Unstructured) HANA BusinessObjects BI OPERATIONAL TOOLS DEV & DATA TOOLS Existing Sources (CRM, ERP, Clickstream, Logs) INFRASTRUCTURE
  • 5. UDA Diagram Relying on Hortonworks… Teradata Portfolio for Hadoop • Seamless data access between Teradata and Hadoop (SQL-H) • Simple management & monitoring with Viewpoint integration • Flexible deployment options Page 5 HDInsight & HDP for Windows • Only Hadoop Distribution for Windows Azure & Windows Server • Native integration with SQL Server, Excel, and System Center • Extends Hadoop to .NET community Complete Portfolio for Hadoop Appliances Instant Access + Infinite Scale • SAP can assure their customers they are deploying an SAP HANA + Hadoop architecture fully supported by SAP • Enables analytics apps (BOBJ) to interact with Hadoop
  • 6. HDP 2.1: Enterprise Hadoop Platform Page 6 Hortonworks Data Platform (HDP) • The ONLY 100% open source and most current platform • Integrates full range of enterprise-ready services • Certified and tested at scale • Engineered for deep ecosystem interoperability OS/VM Cloud Appliance CORE SERVICES CORE Enterprise Readiness High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots HORTONWORKS DATA PLATFORM (HDP) OPERATIONAL SERVICES DATA SERVICES HDFS SQOOP FLUME NFS LOAD & EXTRACT WebHDFS KNOX* OOZIE AMBARI FALCON* YARN MAP TEZREDUCE HIVE & HCATALOG PIGHBASE OPERATIONAL SERVICES DATA SERVICES CORE SERVICES HORTONWORKS DATA PLATFORM (HDP) Schedule Enterprise Readiness High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots Storage Resource Management Process Data Movement Cluster Mgmnt Dataset Mgmnt Data Access CORE SERVICES HORTONWORKS DATA PLATFORM (HDP) OPERATIONAL SERVICES DATA SERVICES HDFS SQOOP FLUMEAMBARI FALCON YARN MAP TEZREDUCE HIVEPIG HBASE OOZIE Enterprise Readiness High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots LOAD & EXTRACT WebHDFS NFS KNOX*
  • 7. Our Vision: Hadoop as Next-Gen Platform HADOOP 1.0 HDFS (redundant, reliable storage) MapReduce (cluster resource management & data processing) HDFS2 (redundant, highly-available & reliable storage) YARN (cluster resource management) MapReduce (data processing) Others HADOOP 2.0 Single Use System Batch Apps Multi Purpose Platform Batch, Interactive, Online, Streaming, … Page 7
  • 8. The 1st Generation of Hadoop: Batch HADOOP 1.0 Built for Web-Scale Batch Apps Single App BATCH HDFS Single App INTERACTIVE Single App BATCH HDFS • All other usage patterns must leverage that same infrastructure • Forces the creation of silos for managing mixed workloads Single App BATCH HDFS Single App ONLINE
  • 9. Hadoop MapReduce Classic • JobTracker –Manages cluster resources and job scheduling • TaskTracker –Per-node agent –Manage tasks Page 9
  • 10. YARN: Taking Hadoop Beyond Batch Page 10 Applications Run Natively in Hadoop HDFS2 (Redundant, Reliable Storage) YARN (Cluster Resource Management) BATCH (MapReduce) INTERACTIVE (Tez) STREAMING (Storm, S4,…) GRAPH (Giraph) IN-MEMORY (Spark) HPC MPI (OpenMPI) ONLINE (HBase) OTHER (Search) (Weave…) Store ALL DATA in one place… Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service
  • 11. 5 Key Benefits of YARN 1. Scale 2. New Programming Models & Services 3. Improved cluster utilization 4. Agility 5. Beyond Java Page 11
  • 12. Concepts • Application –Application is a temporal job or a service submitted YARN –Examples – Map Reduce Job (job) – Hbase Cluster (service) • Container –Basic unit of allocation –Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.) – container_0 = 2GB, 1CPU – container_1 = 1GB, 6 CPU –Replaces the fixed map/reduce slots 12
  • 13. Design Centre • Split up the two major functions of JobTracker –Cluster resource management –Application life-cycle management • MapReduce becomes user-land library 13
  • 14. YARN Applications • Data processing applications and services –Online Serving – HOYA (HBase on YARN) –Real-time event processing – Storm, S4, other commercial platforms –Interactive SQL – Tez (Generalization of MR) –Machine Learning – MPI (OpenMPI, MPICH2) –In-Memory: Spark –Graph processing: Giraph –Enabled by allowing the use of paradigm-specific application master Run all on the same Hadoop cluster! Page 14
  • 15. © Hortonworks Inc. 2012 NodeManager NodeManager NodeManager NodeManager map 1.1 vertex1.2.2 NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager map1.2 reduce1.1 Batch vertex1.1.1 vertex1.1.2 vertex1.2.1 Interactive SQL YARN as OS for Data Lake ResourceManager Scheduler Real-Time nimbus0 nimbus1 nimbus2
  • 16. © Hortonworks Inc. 2012 Multi-Tenant YARN ResourceManager Scheduler root Adhoc 10% DW 60% Mrkting 30% Dev 10% Reserved 20% Prod 70% Prod 80% Dev 20% P0 70% P1 30%
  • 17. Multi-Tenancy with CapacityScheduler • Queues • Economics as queue-capacity –Hierarchical Queues • SLAs –Preemption • Resource Isolation –Linux: cgroups –MS Windows: Job Control –Roadmap: Virtualization (Xen, KVM) • Administration –Queue ACLs –Run-time re-configuration for queues –Charge-back Page 17 ResourceManager Scheduler root Adhoc 10% DW 70% Mrkting 20% Dev 10% Reserved 20% Prod 70% Prod 80% Dev 20% P0 70% P1 30% Capacity Scheduler Hierarchical Queues
  • 18. Tez (“Speed”) • What is it? –A data processing framework as an alternative to MapReduce –A new incubation project in the ASF • Who else is involved? –22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft • Why does it matter? –Widens the platform for Hadoop use cases –Crucial to improving the performance of low-latency applications –Core to the Stinger initiative –Evidence of Hortonworks leading the community in the evolution of Enterprise Hadoop
  • 19. Moving Hadoop Beyond MapReduce • Low level data-processing execution engine • Built on YARN • Enables pipelining of jobs • Removes task and job launch times • Does not write intermediate output to HDFS –Much lighter disk and network usage • New base for MapReduce, Hive, Pig, Cascading, etc. • Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline
  • 20. Tez - Core Idea • Task with pluggable Input, Processor & Output • YARN ApplicationMaster to run DAG of Tez Tasks [Diagram: Tez Task = <Input, Processor, Output>]
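The core idea on slide 20, a task assembled from a pluggable input, processor and output, with tasks run as a DAG, can be sketched in a few lines. This is a toy model under stated assumptions and not the Tez API: the `Task` class, `run_dag` function and the word-count example are all hypothetical, and an in-memory list stands in for the shuffle between tasks.

```python
from collections import Counter
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Iterable


@dataclass
class Task:
    """Hypothetical stand-in for a task: <Input, Processor, Output>."""
    read: Callable[[], Iterable]            # pluggable input
    process: Callable[[Iterable], Iterable]  # pluggable processor
    write: Callable[[Iterable], None]        # pluggable output

    def run(self) -> None:
        self.write(self.process(self.read()))


def tokenize(lines):
    """Map-style processor: emit (word, 1) for every word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def count(pairs):
    """Reduce-style processor: sum the counts per word, sorted by word."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())


def run_dag(tasks, dependencies):
    """Run every task after the tasks it depends on."""
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name].run()


source = ["tez runs on yarn", "hive runs on tez"]
shuffle_buffer = []  # in-memory hand-off instead of an HDFS write
result = []

tasks = {
    "map": Task(lambda: source, tokenize, shuffle_buffer.extend),
    "reduce": Task(lambda: shuffle_buffer, count, result.extend),
}
dependencies = {"map": set(), "reduce": {"map"}}

run_dag(tasks, dependencies)
print(result)
```

Swapping the `write` of the first task or the `read` of the second changes how data moves between them without touching either processor, which is the point of keeping the three parts separate.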
  • 21. Building Blocks for Tasks • MapReduce ‘Map’ Task: <HDFS Input, Map Processor, Sorted Output> • MapReduce ‘Reduce’ Task: <Shuffle Input, Reduce Processor, HDFS Output> • Intermediate ‘Reduce’ for Map-Reduce-Reduce: <Shuffle Input, Reduce Processor, Sorted Output> • Special Pig/Hive ‘Map’ (Tez Task): <HDFS Input, Map Processor, Pipeline Sorter Output> • Special Pig/Hive ‘Reduce’ (Tez Task): <Shuffle Skip-merge Input, Reduce Processor, Sorted Output> • In-memory Map (Tez Task): <HDFS Input, Map Processor, In-memory Sorted Output>
  • 22. Pig/Hive-MR versus Pig/Hive-Tez • Query: SELECT a.state, COUNT(*), AVG(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state • Pig/Hive - MR: Job 1, Job 2 and Job 3, separated by I/O synchronization barriers • Pig/Hive - Tez: single job
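The contrast on slide 22 can be made concrete with a small counting sketch. It assumes that the query's two joins and its GROUP BY each need one shuffle stage, and that each such stage becomes its own MapReduce job; the slide only shows three jobs, so that mapping is an assumption, as are the names `SHUFFLE_STAGES`, `mapreduce_plan` and `tez_plan`. Nothing here calls Hive, Pig or Tez.

```python
# Assumed breakdown of the slide's query into shuffle stages.
SHUFFLE_STAGES = ["join a with b", "join result with c", "group by a.state"]


def mapreduce_plan(stages):
    """One job per stage; each hand-off between jobs is an I/O barrier."""
    jobs = [f"Job {i}: {stage}" for i, stage in enumerate(stages, start=1)]
    barriers = max(len(jobs) - 1, 0)
    return jobs, barriers


def tez_plan(stages):
    """All stages run as vertices of one DAG, so no barrier between jobs."""
    jobs = ["Single job: " + " -> ".join(stages)] if stages else []
    return jobs, 0


mr_jobs, mr_barriers = mapreduce_plan(SHUFFLE_STAGES)
tez_jobs, tez_barriers = tez_plan(SHUFFLE_STAGES)

print(f"MapReduce: {len(mr_jobs)} jobs, {mr_barriers} I/O barriers")
print(f"Tez: {len(tez_jobs)} job, {tez_barriers} I/O barriers")
```

The counts match the slide: three jobs with two barriers between them for MapReduce, and a single job for Tez.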
  • 23. Tez on YARN: Going Beyond Batch • Tez Optimizes Execution –New runtime engine for more efficient data processing • Always-On Tez Service –Low latency processing for all Hadoop data processing [Diagram: Tez Task]
  • 24. SQL-in-Hadoop with Apache Hive • Apache Hive is the standard for SQL interaction with Hadoop –Enterprises make final purchasing decisions on two key characteristics: 'compatibility' with existing investments (60%) and skills (20%) –Most applications claim Hive compatibility TODAY* • Stinger Initiative: Simple Focus –Performance –SQL-Compatibility –Scalability • *Claims publicly made by: Teradata, Microsoft, Oracle, Microstrategy, IBM, Information Builders, SAS, QlikTech, SAP, Tableau, Tibco, Actuate, Jaspersoft, Alteryx, Datameer, Pentaho [Diagram: Business Analytics and Custom Apps use SQL through Hive, which runs on Tez and MapReduce on top of YARN and HDFS] • Improves existing tools & preserves investments
  • 25. Stinger Project (announced February 2013) • Batch AND Interactive SQL-IN-Hadoop • Stinger Initiative: A broad, community-based effort to drive the next generation of Hive • Goals: –Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds) –Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB –SQL: Support broadest range of SQL semantics for analytic applications running against Hadoop …all IN Hadoop • Hive 0.11, May 2013: –Base Optimizations –SQL Analytic Functions –ORCFile, Modern File Format • Hive 0.12, October 2013: –VARCHAR, DATE Types –ORCFile predicate pushdown –Advanced Optimizations –Performance Boosts via YARN • Hive 0.13, April 2014: –Hive on Apache Tez –Query Service –Buffer Cache –Cost Based Optimizer (Optiq) –Vectorized Processing
  • 26. Hortonworks: The Value of “Open” for You • Validate & Try 1. Download the Hortonworks Sandbox 2. Learn Hadoop using the technical tutorials 3. Investigate a business case using the step-by-step business case scenarios 4. Validate YOUR business case using your data in the sandbox • Connect With the Hadoop Community: We employ a large number of Apache project committers & innovators so that you are represented in the open source community • Avoid Vendor Lock-In: Hortonworks Data Platform remains as close to the open source trunk as possible and is developed 100% in the open so you are never locked in • The Partners You Rely On, Rely On Hortonworks: We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments • Certified for the Enterprise: We engineer, test and certify the Hortonworks Data Platform at scale to ensure the reliability and stability you require for enterprise use • Support from the Experts: We provide the highest quality of support for deploying at scale. You are supported by hundreds of years of Hadoop experience • Engage 1. Execute a Business Case Discovery Workshop with our architects 2. Build a business case for Hadoop today

Editor's Notes

  1. Hello. Today I’m going to talk to you about HW and how we deliver an enterprise-ready Hadoop to enable your modern data architecture.
  2. Founded just 2.5 years ago by the original Hadoop team members at Yahoo, Hortonworks emerged as the leader in open source Hadoop. We are committed to ensuring Hadoop is an enterprise-viable data platform ready for your modern data architecture. Our team is probably the largest assembled team of Hadoop experts and active leaders in the community. We not only make sure Hadoop meets all your enterprise requirements, like operations, reliability & security; it also needs to be packaged & tested, and we do this. It has to work with what you have. • Make Hadoop an enterprise data platform • Make the market function • Innovate core platform, data, & operational services • Integrate deeply with enterprise ecosystem • Provide world-class enterprise support • Drive 100% open source software development and releases through the core Apache projects • Address enterprise needs in community projects • Establish Apache foundation projects as “the standard” • Promote open community vs. vendor control / lock-in • Enable the Hadoop market to function • Make it easy for enterprises to deploy at scale • Be the best at enabling deep ecosystem integration • Create a pull market with key strategic partners
  3. Tez Approved as New Apache Incubator Project: Hortonworks Introduces Next-Generation Runtime for Improving Latency and Throughput of Hadoop Apps
  4. • Make Hadoop an enterprise data platform • Innovate core platform, data, & operational services • Integrate deeply with enterprise ecosystem • Provide world-class enterprise support • Drive 100% open source software development and releases through the core Apache projects • Address enterprise needs in community projects • Establish Apache foundation projects as “the standard” • Promote open community vs. vendor control / lock-in • Enable the Hadoop market to function • Make it easy for enterprises to deploy at scale • Be the best at enabling deep ecosystem integration • Create a pull market with key strategic partners