HADOOP AT THE CENTER: THE NEXT GENERATION OF HADOOP
DATA MARKETING 2014 - TORONTO
Adam Muise
Principal Architect
Hortonworks
Who am I?
Who is Hortonworks?
The leaders of Hadoop’s development
We do Hadoop
Community driven, Enterprise Focused
Drive Innovation in the platform: We lead the roadmap
100% Open Source: Democratized Access to Data
We do Hadoop successfully.
> Develop Open Source Hadoop
> Distribute Hadoop with HDP
> Support
> Professional Services
> Training
Hortonworks Approach
1 Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN: Data Operating System
• HDFS as the storage layer
• Key processing engines
2 Extend Hadoop as an Enterprise Data Platform
Extend Hadoop with enterprise capabilities for governance, security & operations
Apply enterprise software rigor to the open source development process
3 Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as standard component of a modern data architecture
• Joint engineering
[Diagram: HDP 2.2. Data Access engines (Script: Pig; SQL: Hive/Tez, HCatalog; NoSQL: HBase, Accumulo; Stream: Storm; In-Memory: Spark; Batch: MapReduce) on YARN: Data Operating System, over Data Management with HDFS (Hadoop Distributed File System), alongside Governance & Integration, Security, and Operations]
…all done completely in Open Source
Innovating within the community for the enterprise
• Open Source: fastest path to innovation for a platform technology
• Complete open source platform speeds enterprise adoption and minimizes lock in
• Enables the ecosystem and the market to function much bigger, much faster
Driving our innovation through Apache Software Foundation Projects
Apache Project   Committers   PMC Members
Hadoop           27           20
Pig              5            5
Hive             16           4
Tez              15           15
HBase            6            4
Phoenix          4            4
Accumulo         2            2
Storm            3            2
Slider           10           10
Flume            1            0
Sqoop            1            1
Ambari           32           27
Oozie            3            2
Zookeeper        2            1
Knox             11           5
Argus            10           n/a
Falcon           5            3
TOTAL            153          105
Let’s talk challenges…
Storage, Management, Processing all become challenges with Data at Volume
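To make the storage side of "Data at Volume" concrete, here is a back-of-envelope sketch in Python. It assumes the Hadoop 2.x HDFS defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); your cluster may be configured differently, and the 10 TB dataset and the `hdfs_footprint` helper are hypothetical.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4

def hdfs_footprint(dataset_bytes, block_size=128 * MB, replication=3):
    """Estimate block count and raw disk usage for a dataset stored as one HDFS file.

    Assumes Hadoop 2.x defaults. The last block only occupies the bytes it
    actually holds, so raw disk usage is simply size * replication.
    """
    blocks = math.ceil(dataset_bytes / block_size)  # logical blocks tracked by the NameNode
    replicas = blocks * replication                 # block replicas spread across DataNodes
    raw_bytes = dataset_bytes * replication         # disk consumed across the cluster
    return blocks, replicas, raw_bytes

blocks, replicas, raw = hdfs_footprint(10 * TB)
print(blocks, replicas, raw / TB)  # 81920 245760 30.0
```

The estimate treats the data as a single file; many small files inflate the block count, because every non-empty file needs at least one block of its own.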
Traditional technologies adopt a divide, drop, and conquer approach
The solution?
[Diagram: the data is divided among separate systems: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
Ummm…you dropped something.
[Diagram: the same systems (EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW), with data left scattered outside them]
What keeps us from our Data?
Data Silos. Your data silos are lonely places.
[Diagram: EDW, Accounts, Customers, and Web Properties, each holding its own isolated data]
… Data likes to be together.
[Diagram: EDW, Accounts, Customers, and Web Properties with their data brought together]
Data likes to socialize too.
[Diagram: EDW, Accounts, Customers, and Web Properties joined by new sources: Facebook, Twitter, Machine Data, CDR, Weather Data]
New types of data don’t quite fit into your pristine view of the world.
[Diagram: "My Little Data Empire" next to Logs, Machine Data, and unknown sources (? ? ? ?)]
To resolve this, some people take hints from Lord Of The Rings...
…and create One-Schema-To-Rule-Them-All…
[Diagram: an EDW with a single Schema imposed over all the data]
…but that has its problems too.
[Diagram: EDWs and their single Schema, each fed by a tangle of ETL jobs]
What if the data were processed and stored centrally?
What if you didn’t need to force it into a single schema?
We call it a Modern Data Architecture*
*AKA Data Lake
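The contrast between forcing data into one schema on write and interpreting raw data on read can be sketched in a few lines of Python. This is a toy illustration of schema-on-read, not how HDFS or Hive implement it: raw records are kept exactly as they arrive, and each consumer applies its own hypothetical schema (`marketing_view`, `ops_view`) at query time.

```python
import json

# Raw records land in the "lake" untouched, whatever their shape.
raw_lake = [
    '{"customer_id": 1, "page": "/tea-cozies", "seconds": 1500}',
    '{"customer_id": 2, "plan": "basic", "dropped_calls": 7}',
    '{"customer_id": 1, "plan": "premium", "dropped_calls": 0}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: keep records that have every field, project those fields."""
    for line in lines:
        record = json.loads(line)
        if all(f in record for f in fields):
            yield {f: record[f] for f in fields}

# Two teams project different schemas over the same raw data.
marketing_view = list(read_with_schema(raw_lake, ["customer_id", "page", "seconds"]))
ops_view = list(read_with_schema(raw_lake, ["customer_id", "plan", "dropped_calls"]))

print(marketing_view)
print(ops_view)
```

Because nothing was discarded at load time, a third team could apply yet another schema later without re-ingesting anything.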
A Modern Data Architecture 
• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive and real time
• Central services for security, governance and operations
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, and supply chain
[Diagram: SOURCES (existing systems such as CRM, ERP, and other; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured) feed the DATA SYSTEM (HDFS with YARN: Data Operating System running batch, interactive and real-time workloads, alongside RDBMS, EDW, and MPP), which serves APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications)]
What do you want to do with your data?
Marketing Analytics needs data.
Work with the population, not just a sample.
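Why the whole population matters: a systematic 1-in-N sample can miss a small but valuable segment entirely. The Python sketch below uses made-up spend figures and hypothetical helpers (`every_nth`, `high_value_share`) purely for illustration.

```python
# Hypothetical annual spend per customer; two big spenders hide in the long tail.
spend = [40, 55, 38, 61, 47, 52, 4900, 44, 39, 58, 50, 43, 49, 5200, 41, 60]

def every_nth(values, n):
    """Systematic sample: keep every n-th value, starting with the first."""
    return values[::n]

def high_value_share(values, threshold=1000):
    """Fraction of customers whose spend exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)

sample = every_nth(spend, 4)
print(high_value_share(spend))   # 0.125 (whole population)
print(high_value_share(sample))  # 0.0 (the sample never sees the segment)
```

A random sample would not fail this systematically, but for any small segment it still risks under-counting or missing it, which scanning the full population avoids.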
Your segmentation today.
Male / Female
Age: 25-30
Town/City
Middle Income Band
Product Category Preferences
Your segmentation with better data.
Male / Female
Age: 27 but feels old
GPS coordinates
$65-68k per year
Product recommendations per time of day and per weather
Tea Party Hippie
Walking into Starbucks right now…
A depressed Toronto Maple Leafs fan
Products left in basket indicate drunk Amazon shopper
Purchase history indicates a risk taker
Thinking about a new house
Unhappy with his cell phone plan
Pregnant
Spent 25 minutes looking at tea cozies
Looking to start a business
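Most of these richer attributes come from combining sources that usually sit in separate silos. A minimal Python sketch of that combination, keyed on a customer ID, follows; the source names and fields (`accounts`, `web_sessions`, `cdr`), the `build_profiles` helper, and the dropped-call threshold are all hypothetical stand-ins for the account, web-property, and call-detail-record data described earlier.

```python
from collections import defaultdict

# Hypothetical extracts from three separate silos, all keyed by customer_id.
accounts = [{"customer_id": 7, "age": 27, "income_band": "$65-68k"}]
web_sessions = [
    {"customer_id": 7, "page": "tea cozies", "minutes": 25},
    {"customer_id": 7, "page": "mortgage rates", "minutes": 4},
]
cdr = [{"customer_id": 7, "dropped_calls": 9}]

def build_profiles(accounts, web_sessions, cdr, dropped_call_limit=5):
    """Merge per-silo records into a single profile per customer."""
    profiles = defaultdict(dict)
    for row in accounts:
        profiles[row["customer_id"]].update(age=row["age"], income_band=row["income_band"])
    for row in web_sessions:
        # Accumulate browsing time per page for each customer.
        minutes = profiles[row["customer_id"]].setdefault("minutes_by_page", {})
        minutes[row["page"]] = minutes.get(row["page"], 0) + row["minutes"]
    for row in cdr:
        # Derived flag with an arbitrary threshold: many dropped calls suggests an unhappy subscriber.
        profiles[row["customer_id"]]["unhappy_with_plan"] = row["dropped_calls"] > dropped_call_limit
    return dict(profiles)

print(build_profiles(accounts, web_sessions, cdr)[7])
```

In a real deployment the same join would run over full data sets with an engine such as Hive or Spark; the point here is only that each attribute depends on data from a different silo.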
Pick up all of that data that was prohibitively expensive to store and use.
To approach these use cases, you need an affordable platform that stores, processes, and analyzes the data.
Don’t wait for your data.
Batch is often too late to influence the person who is in your store or on your website right now.
Streaming Processing, Search, and Storage
[Diagram: real-time data feeds flow through Apache Kafka (Streaming Ingest) into Hortonworks Data Platform 2.2, where YARN and HDFS host Real Time Stream Processing (Storm), Online Data Processing (HBase), Search (Slider, Solr), and SQL (Hive)]
Stream data into Hadoop and process it in near real-time
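The difference between batch and near-real-time processing is when the answer becomes available. The Python sketch below processes a click stream event by event and fires as soon as a visitor crosses a threshold, instead of waiting for an end-of-day batch. It is a generic illustration using only the standard library; it does not use Kafka or Storm APIs, and the event layout, threshold, and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str, str]  # (timestamp_seconds, visitor_id, page)

def streaming_alerts(events: Iterable[Event], page: str, threshold: int) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, visitor) the moment a visitor's views of `page` reach the threshold."""
    views = Counter()
    for ts, visitor, viewed in events:
        if viewed != page:
            continue
        views[visitor] += 1
        if views[visitor] == threshold:  # fire once, while the visitor is still on the site
            yield ts, visitor

def batch_report(events: Iterable[Event], page: str, threshold: int) -> List[str]:
    """Same rule, but only computable after all of the day's events have been collected."""
    views = Counter(visitor for _, visitor, viewed in events if viewed == page)
    return sorted(v for v, n in views.items() if n >= threshold)

clicks = [
    (10, "a", "/tea-cozies"), (12, "b", "/home"), (15, "a", "/tea-cozies"),
    (20, "a", "/tea-cozies"), (30, "b", "/tea-cozies"),
]
print(list(streaming_alerts(clicks, "/tea-cozies", 3)))  # [(20, 'a')]
print(batch_report(clicks, "/tea-cozies", 3))            # ['a']
```

Both functions apply the same rule; the streaming version can act at t=20, while the batch version can only answer once the whole window has closed.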
How? With Hortonworks Data Platform*
*AKA Hadoop
What’s New in HDP 2.2 
New and Improved YARN Ready Engines
• Enterprise SQL at Hadoop Scale with 
Stinger.next 
• Enterprise Ready Spark on YARN 
• Deep YARN integration for real-time 
engines: HBase, Accumulo, Storm 
• Enabling ISVs with a general SDK and API 
for direct YARN integration 
• Only solution to provide real-time to micro 
batch for analyzing the internet of things 
• Other engines/tools: Solr, Cascading 
Continued Innovation of 
Central Enterprise Services 
• Centralized security administration 
and policy enforcement 
• Ease of use and operations agility 
features to speed cluster 
deployment 
• 100% uptime target with cluster 
rolling upgrades 
Expanded Deployment Options 
• Enhanced business continuity with 
replication/archival across on-premises 
and cloud storage tiers (Azure Blob, S3) 
• Simultaneous ship of Windows and Linux 
installs 
• Expand Azure support beyond HDInsight 
Azure to include HDP for Windows or 
Linux in Azure VMs 
HDP 2.2: Delivering Apache Hadoop for the Enterprise
Complete List of New Features in HDP 2.2 
Apache Hadoop YARN 
• Slide existing services onto YARN through ‘Slider’ 
• GA release of HBase, Accumulo, and Storm on 
YARN 
• Support long running services: handling of logs, 
containers not killed when AM dies, secure token 
renewal, YARN Labels for tagging nodes for specific 
workloads 
• Support for CPU Scheduling and CPU Resource 
Isolation through CGroups 
Apache Hadoop HDFS 
• Heterogeneous storage: Support for archival 
• Rolling Upgrade: comprehensive support across the entire HDP stack (YARN, Hive,
HBase, everything)
• Multi-NIC Support 
• Heterogeneous storage: Support memory as a 
storage tier (TP) 
• HDFS Transparent Data Encryption (TP) 
Apache Hive, Apache Pig, and Apache Tez 
• Hive Cost Based Optimizer: Function Pushdown & 
Join re-ordering support for other join types: star & 
bushy. 
• Hive SQL Enhancements including: 
• ACID Support: Insert, Update, Delete 
• Temporary Tables 
• Metadata-only queries return instantly 
• Pig on Tez 
• Including DataFu for use with Pig 
• Vectorized shuffle 
• Tez Debug Tooling & UI 
Hue 
• Support for HiveServer 2 
• Support for Resource Manager HA 
Apache Spark 
• Refreshed Tech Preview to Spark 1.1.0 (available 
now) 
• ORC File support & Hive 0.13 integration 
• Planned for GA of Spark 1.2.0 
• Operations integration via YARN ATS and Ambari 
• Security: Authentication 
Apache Solr
• Added Banana, a rich and flexible UI for visualizing
time series data indexed in Solr
Cascading
• Cascading 3.0 on Tez distributed with HDP (coming soon)
Apache Falcon 
• Authentication Integration 
• Lineage: now GA (previously a tech preview feature)
• Improve UI for pipeline management & editing: list, 
detail, and create new (from existing elements) 
• Replicate to Cloud – Azure & S3 
Apache Sqoop, Apache Flume & Apache Oozie 
• Sqoop import support for Hive types via HCatalog 
• Secure Windows cluster support: Sqoop, Flume, 
Oozie 
• Flume streaming support: sink to HCat on secure 
cluster 
• Oozie HA now supports secure clusters 
• Oozie Rolling Upgrade 
• Operational improvements for Oozie to better 
support Falcon 
• Capture workflow job logs in HDFS 
• Don’t start new workflows for re-run 
• Allow job property updates on running jobs 
Apache HBase, Apache Phoenix, & Apache 
Accumulo 
• HBase & Accumulo on YARN via Slider 
• HBase HA 
• Replicas update in real-time 
• Fully supports region split/merge 
• Scan API now supports standby RegionServers 
• HBase Block cache compression 
• HBase optimizations for low latency 
• Phoenix Robust Secondary Indexes 
• Performance enhancements for bulk import into 
Phoenix 
• Hive over HBase Snapshots 
• Hive Connector to Accumulo 
• HBase & Accumulo wire-level encryption 
• Accumulo multi-datacenter replication 
Apache Storm 
• Storm-on-YARN via Slider 
• Ingest & notification for JMS (IBM MQ not 
supported) 
• Kafka bolt for Storm – supports sophisticated 
chaining of topologies through Kafka 
• Kerberos support 
• Hive update support – Streaming Ingest 
• Connector improvements for HBase and HDFS 
• Deliver Kafka as a companion component 
• Kafka install, start/stop via Ambari 
• Security Authorization Integration with Ranger 
Apache Slider 
• Allow on-demand create and run different versions 
of heterogeneous applications 
• Allow users to configure different application 
instances differently 
• Manage operational lifecycle of application 
instances 
• Expand / shrink application instances 
• Provide application registry for publish and 
discovery 
Apache Knox & Apache Ranger (Argus) & HDP 
Security 
• Apache Ranger – Support authorization and auditing 
for Storm and Knox 
• Introducing REST APIs for managing policies in 
Apache Ranger 
• Apache Ranger – Support native grant/revoke 
permissions in Hive and HBase 
• Apache Ranger – Support Oracle DB and storing of 
audit logs in HDFS 
• Apache Ranger to run on Windows environment 
• Apache Knox to protect YARN RM 
• Apache Knox support for HDFS HA 
• Apache Ambari install, start/stop of Knox 
Apache Ambari 
• Support for HDP 2.2 Stack, including support for 
Kafka, Knox and Slider 
• Enhancements to Ambari Web configuration 
management including: versioning, history and 
revert, setting final properties and downloading client 
configurations 
• Launch and monitor HDFS rebalance 
• Perform Capacity Scheduler queue refresh 
• Configure High Availability for ResourceManager 
• Ambari Administration framework for managing user 
and group access to Ambari 
• Ambari Views development framework for 
customizing the Ambari Web user experience 
• Ambari Stacks for extending Ambari to bring custom 
Services under Ambari management 
• Ambari Blueprints for automating cluster 
deployments 
• Performance improvements and enterprise usability 
guardrails
Hortonworks Data Platform:
A comprehensive data management platform
[Diagram: Hortonworks Data Platform 2.2. Center: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines (Java, Scala: Cascading on Tez; Script: Pig on Tez; SQL: Hive on Tez; Stream: Storm; Search: Solr; NoSQL: HBase, Accumulo; In-Memory: Spark; Others: ISV Engines; Slider) on YARN: Data Operating System (Cluster Resource Management), over HDFS (Hadoop Distributed File System). GOVERNANCE: Data Workflow, Lifecycle & Governance with Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger). OPERATIONS: Provision, Manage & Monitor with Ambari and Zookeeper; Scheduling with Oozie. Deployment Choice: Linux, Windows, On-Premises, Cloud]
YARN is the architectural center of HDP
Enables batch, interactive and real-time workloads
Provides comprehensive enterprise capabilities
The widest range of deployment options
Delivered Completely in the OPEN
HDP = Apache Hadoop
[Chart: component versions shipped in HDP 2.0 (October 2013), HDP 2.1 (April 2014), and HDP 2.2 (October 2014), grouped as Data Management, Data Access, Governance & Integration, Operations, and Security. Components: Hadoop & YARN, Pig, Hive & HCatalog, HBase, Phoenix, Accumulo, Storm, Spark, Solr, Tez, Slider, Kafka, Sqoop, Flume, Falcon, Oozie, Zookeeper, Ambari, Knox, Ranger]
What else are we working on?
hortonworks.com/labs/
Hadoop is the new Data Operating System for the Enterprise
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
There is NO second place
Hortonworks…the Bull Elephant of Hadoop Innovation

 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 

Hadoop at the Center: The Next Generation of Hadoop

  • 1. HADOOP AT THE CENTER: THE NEXT GENERATION OF HADOOP. DATA MARKETING 2014 - TORONTO. Adam Muise, Principal Architect, Hortonworks
  • 4. The leaders of Hadoop’s development. We do Hadoop: community driven, enterprise focused. Drive innovation in the platform: we lead the roadmap. 100% open source: democratized access to data.
  • 5. We do Hadoop successfully. > Develop Open Source Hadoop > Distribute Hadoop with HDP > Support > Professional Services > Training
  • 6. Hortonworks Approach. 1. Innovate the Core: architect and build innovation at the core of Hadoop • YARN: Data Operating System • HDFS as the storage layer • Key processing engines. 2. Extend Hadoop as an Enterprise Data Platform: extend Hadoop with enterprise capabilities for governance, security & operations; apply enterprise software rigor to the open source development process. 3. Enable the Ecosystem: enable the leaders in the data center to easily adopt & extend their platforms • Establish Hadoop as a standard component of a modern data architecture • Joint engineering. [Diagram: HDP 2.2. Data Access on YARN: Data Operating System (Batch: MapReduce; Script: Pig; SQL: Hive/Tez, HCatalog; NoSQL: HBase, Accumulo; Stream: Storm; In-Memory: Spark); Data Management: HDFS (Hadoop Distributed File System); Governance & Integration; Security; Operations]
  • 7. …all done completely in Open Source. Innovating within the community for the enterprise • Open Source: fastest path to innovation for a platform technology • Complete open source platform speeds enterprise and ecosystem adoption and minimizes lock-in • Enables the market to function much bigger, much faster. [Diagram: the same HDP 2.2 stack as slide 6] Driving our innovation through Apache Software Foundation projects. Apache project (committers / PMC members): Hadoop 27 / 20; Pig 5 / 5; Hive 16 / 4; Tez 15 / 15; HBase 6 / 4; Phoenix 4 / 4; Accumulo 2 / 2; Storm 3 / 2; Slider 10 / 10; Flume 1 / 0; Sqoop 1 / 1; Ambari 32 / 27; Oozie 3 / 2; Zookeeper 2 / 1; Knox 11 / 5; Argus 10 / n/a; Falcon 5 / 3; TOTAL 153 / 105
  • 10. Volume (the word repeated across the slide)
  • 11. Volume (repeated more densely)
  • 12. Volume (repeated until it fills the slide)
  • 13. Storage, Management, Processing all become challenges with Data at Volume
  • 14. Traditional technologies adopt a divide, drop, and conquer approach
  • 15. The solution? [Diagram: data split across separate silos: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
  • 16. Ummm…you dropped something. [Diagram: the same silos (EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW), with data spilling out between them]
  • 17. What keeps us from our Data?
  • 18. Data Silos. Your data silos are lonely places. [Diagram: separate silos for EDW, Accounts, Customers, Web Properties]
  • 19. …Data likes to be together. [Diagram: the EDW, Accounts, Customers and Web Properties data brought together]
  • 20. Data likes to socialize too. [Diagram: EDW, Accounts, Customers and Web Properties data joined by Facebook, Twitter, Machine Data, CDR and Weather Data]
  • 21. New types of data don’t quite fit into your pristine view of the world. [Diagram: "My Little Data Empire", with Logs and Machine Data left outside it, marked "? ? ? ?"]
  • 22. To resolve this, some people take hints from Lord Of The Rings...
  • 23. …and create One-Schema-To-Rule-Them-All… [Diagram: an EDW forcing all data into a single Schema]
  • 24. …but that has its problems too. [Diagram: two EDWs, each with its own single Schema, fed by multiple ETL jobs]
  • 25. What if the data were processed and stored centrally? What if you didn’t need to force it into a single schema? We call it a Modern Data Architecture* (*AKA Data Lake)
  • 26. A Modern Data Architecture • Consolidate siloed data sets, structured and unstructured • Central data set on a single cluster • Multiple workloads across batch, interactive, and real time • Central services for security, governance, and operation • Preserve existing investment in current tools and platforms • Single view of the customer, product, and supply chain. [Diagram: SOURCES (existing systems such as CRM, ERP and others, plus clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data) feed a DATA SYSTEM in which HDFS (Hadoop Distributed File System) and YARN: Data Operating System run batch, interactive and real-time workloads alongside RDBMS, EDW and MPP; APPLICATIONS on top: business analytics, custom applications, packaged applications]
  • 27. What do you want to do with your data?
  • 28. Marketing Analytics needs data. Work with the population, not just a sample.
  • 29. Your segmentation today: Male / Female • Age: 25-30 • Town/City • Middle Income Band • Product Category Preferences
  • 30. Your segmentation with better data: Male / Female • Age: 27 but feels old • GPS coordinates • $65-68k per year • Product recommendations per time of day and per weather • Tea Party • Hippie • Walking into Starbucks right now… • A depressed Toronto Maple Leafs fan • Products left in basket indicate drunk Amazon shopper • Purchase history indicates a risk taker • Thinking about a new house • Looking to start a business • Unhappy with his cell phone plan • Pregnant • Spent 25 minutes looking at tea cozies
  • 31. Pick up all of that data that was prohibitively expensive to store and use.
  • 32. To approach these use cases you need an affordable platform that stores, processes, and analyzes the data.
  • 33. Don’t wait for your data. Batch is often too late to influence the person who is in your store or on your website right now.
  • 34. Stream Processing, Search, and Storage. Stream data into Hadoop and process it in near real time. [Diagram: real-time data feeds enter through Apache Kafka into Hortonworks Data Platform 2.2: real-time stream processing (Storm), online data processing (HBase), search (Solr), SQL (Hive streaming ingest), with Slider, all on YARN and HDFS]
  • 35. How? With Hortonworks Data Platform* (*AKA Hadoop)
  • 36. What’s New in HDP 2.2 New and Improved YARN Ready Engines • Enterprise SQL at Hadoop Scale with Stinger.next • Enterprise Ready Spark on YARN • Deep YARN integration for real-time engines: HBase, Accumulo, Storm • Enabling ISVs with a general SDK and API for direct YARN integration • Only solution to provide real-time to micro batch for analyzing the internet of things • Other engines/tools: Solr, Cascading Continued Innovation of Central Enterprise Services • Centralized security administration and policy enforcement • Ease of use and operations agility features to speed cluster deployment • 100% uptime target with cluster rolling upgrades Expanded Deployment Options • Enhanced business continuity with replication/archival across on-premises and cloud storage tiers (Azure Blob, S3) • Simultaneous ship of Windows and Linux installs • Expand Azure support beyond HDInsight Azure to include HDP for Windows or Linux in Azure VMs HDP 2.2 Delivering Apache Hadoop for the Enterprise
  • 37. Complete List of New Features in HDP 2.2. Apache Hadoop YARN: • Slide existing services onto YARN through ‘Slider’ • GA release of HBase, Accumulo, and Storm on YARN • Support long-running services: handling of logs, containers not killed when the AM dies, secure token renewal, YARN Labels for tagging nodes for specific workloads • Support for CPU scheduling and CPU resource isolation through CGroups. Apache Hadoop HDFS: • Heterogeneous storage: support for archival • Rolling Upgrade (this applies to the entire HDP stack: YARN, Hive, HBase, everything; HDP now supports comprehensive rolling upgrade across the stack) • Multi-NIC support • Heterogeneous storage: support memory as a storage tier (TP) • HDFS Transparent Data Encryption (TP). Apache Hive, Apache Pig, and Apache Tez: • Hive Cost Based Optimizer: function pushdown & join re-ordering support for other join types: star & bushy • Hive SQL enhancements, including ACID support (insert, update, delete), temporary tables, and metadata-only queries that return instantly • Pig on Tez • Including DataFu for use with Pig • Vectorized shuffle • Tez debug tooling & UI. Hue: • Support for HiveServer 2 • Support for Resource Manager HA. Apache Spark: • Refreshed Tech Preview to Spark 1.1.0 (available now) • ORC File support & Hive 0.13 integration • Planned for GA of Spark 1.2.0 • Operations integration via YARN ATS and Ambari • Security: authentication. Apache Solr: • Added Banana, a rich and flexible UI for visualizing time series data indexed in Solr. Cascading: • Cascading 3.0 on Tez distributed with HDP (coming soon). Apache Falcon: • Authentication integration • Lineage: now GA (it’s been a tech preview feature…) • Improved UI for pipeline management & editing: list, detail, and create new (from existing elements) • Replicate to cloud: Azure & S3. Apache Sqoop, Apache Flume & Apache Oozie: • Sqoop import support for Hive types via HCatalog • Secure Windows cluster support: Sqoop, Flume, Oozie • Flume streaming support: sink to HCat on secure cluster • Oozie HA now supports secure clusters • Oozie rolling upgrade • Operational improvements for Oozie to better support Falcon • Capture workflow job logs in HDFS • Don’t start new workflows for re-run • Allow job property updates on running jobs. Apache HBase, Apache Phoenix, & Apache Accumulo: • HBase & Accumulo on YARN via Slider • HBase HA: replicas update in real time, region split/merge fully supported, Scan API now supports standby RegionServers • HBase block cache compression • HBase optimizations for low latency • Phoenix robust secondary indexes • Performance enhancements for bulk import into Phoenix • Hive over HBase snapshots • Hive connector to Accumulo • HBase & Accumulo wire-level encryption • Accumulo multi-datacenter replication. Apache Storm: • Storm-on-YARN via Slider • Ingest & notification for JMS (IBM MQ not supported) • Kafka bolt for Storm: supports sophisticated chaining of topologies through Kafka • Kerberos support • Hive update support: streaming ingest • Connector improvements for HBase and HDFS • Deliver Kafka as a companion component • Kafka install, start/stop via Ambari • Security authorization integration with Ranger. Apache Slider: • Allow on-demand creation and running of different versions of heterogeneous applications • Allow users to configure different application instances differently • Manage the operational lifecycle of application instances • Expand / shrink application instances • Provide an application registry for publish and discovery. Apache Knox & Apache Ranger (Argus) & HDP Security: • Apache Ranger: support authorization and auditing for Storm and Knox • Introducing REST APIs for managing policies in Apache Ranger • Apache Ranger: support native grant/revoke permissions in Hive and HBase • Apache Ranger: support Oracle DB and storing of audit logs in HDFS • Apache Ranger to run on Windows environments • Apache Knox to protect YARN RM • Apache Knox support for HDFS HA • Apache Ambari install, start/stop of Knox. Apache Ambari: • Support for the HDP 2.2 stack, including support for Kafka, Knox and Slider • Enhancements to Ambari Web configuration management, including versioning, history and revert, setting final properties, and downloading client configurations • Launch and monitor HDFS rebalance • Perform Capacity Scheduler queue refresh • Configure High Availability for ResourceManager • Ambari Administration framework for managing user and group access to Ambari • Ambari Views development framework for customizing the Ambari Web user experience • Ambari Stacks for extending Ambari to bring custom services under Ambari management • Ambari Blueprints for automating cluster deployments • Performance improvements and enterprise usability guardrails
  • 38. Hortonworks Data Platform: A comprehensive data management platform. [Diagram: Hortonworks Data Platform 2.2. Batch, interactive & real-time data access on YARN: Data Operating System (cluster resource management): Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV engines), with Tez and Slider; storage on HDFS (Hadoop Distributed File System). Governance: data workflow, lifecycle & governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS). Security: authentication, authorization, accounting, data protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger). Operations: provision, manage & monitor (Ambari, Zookeeper); scheduling (Oozie). Deployment choice: Linux, Windows, on-premises, cloud.] • YARN is the architectural center of HDP • Enables batch, interactive and real-time workloads • Provides comprehensive enterprise capabilities • The widest range of deployment options • Delivered completely in the OPEN
  • 39. HDP = Apache Hadoop. [Table: component versions across HDP 2.0 (October 2013), HDP 2.1 (April 2014) and HDP 2.2 (October 2014), grouped under Data Management, Data Access, Governance & Integration, Operations and Security. HDP 2.2 ships Hadoop & YARN 2.6.0, Pig 0.14.0, Hive 0.14.0, HBase 0.98.4, Phoenix 4.2, Accumulo 1.6.1, Storm 0.9.3, Spark 1.2.0, Kafka 0.8.1, Sqoop 1.4.5, Flume 1.5.0, Falcon 0.6.0, Oozie 4.1.0, Zookeeper 3.4.5, Ambari 1.7.0, Knox 0.5.0, Ranger 0.4.0, Slider 0.60, plus Tez, HCatalog and Solr]
  • 40. What else are we working on? hortonworks.com/labs/
  • 41. Hadoop is the new Data Operating System for the Enterprise
  • 42. There is NO second place. Hortonworks: …the Bull Elephant of Hadoop Innovation. © Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
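Slide 25 asks what happens if data no longer has to be forced into a single schema before it is stored. A common name for that idea is schema-on-read: raw data is kept as it arrived, and each consumer applies its own schema when it reads. The Python below is a minimal, illustrative sketch of that principle only, not HDP or Hive code; the record layout, the source names, and the functions `read_with_schema`, `web_visit_schema` and `crm_plan_schema` are all hypothetical. In HDP itself, an engine such as Hive would play the role of the read-time schema over files stored in HDFS.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]

# Hypothetical raw records from two different sources, kept exactly as they
# arrived. Nothing forces them into one shared schema at write time.
RAW_STORE: List[str] = [
    '{"source": "web", "user": "u1", "page": "/tea-cozies", "secs": 1500}',
    '{"source": "crm", "customer_id": "u1", "plan": "mobile-basic"}',
    '{"source": "web", "user": "u2", "page": "/home", "secs": 12}',
]


def read_with_schema(raw: Iterable[str],
                     schema: Callable[[Record], Optional[Record]]) -> Iterator[Record]:
    """Parse each stored line and apply the caller's schema at read time.

    The schema is just a function: it returns a projected record, or None
    to skip records it does not describe.
    """
    for line in raw:
        projected = schema(json.loads(line))
        if projected is not None:
            yield projected


def web_visit_schema(r: Record) -> Optional[Record]:
    # Hypothetical view 1: clickstream visits only.
    if r.get("source") != "web":
        return None
    return {"user_id": r["user"], "page": r["page"], "seconds": int(r["secs"])}


def crm_plan_schema(r: Record) -> Optional[Record]:
    # Hypothetical view 2: customer phone plans, read from the same store.
    if r.get("source") != "crm":
        return None
    return {"user_id": r["customer_id"], "plan": r["plan"]}


visits = list(read_with_schema(RAW_STORE, web_visit_schema))
plans = list(read_with_schema(RAW_STORE, crm_plan_schema))

# Both views come from one central store, so combining them is a join at
# read time rather than an ETL job between silos.
plan_by_user = {p["user_id"]: p["plan"] for p in plans}
joined = [dict(v, plan=plan_by_user.get(v["user_id"])) for v in visits]

for row in joined:
    print(row)
```

Each "schema" here is only a projection function, so adding a new view over the same data means writing one more function rather than redesigning a shared schema and the ETL jobs that feed it.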