1. Page1 © Hortonworks Inc. 2015
A First-Hand Look at What's New in HDP 2.3
Tim E. Hall
VP, Product Management
Hortonworks
June 2015
2. Page2 © Hortonworks Inc. 2015
Empowering More Organizations to
Drive Transformational Outcomes
Introducing Hortonworks® Data Platform 2.3
3. Page3 © Hortonworks Inc. 2015
Retailer builds 360° view of its customers
Challenges
• Cost: Data silos led to duplicate storage expenses
• Customer: Data fragmentation (with as many as 15 different records
on the same customer) harmed service quality
• Supply chain: Mismatch between inventory and store-specific
demand led to inefficient carrying costs
Results
• Cost: Data offload and consolidation saved millions
• Customer: Single view of the customer enabled personalized promotions
• Supply chain: A single view fed by 12 legacy systems improved
visibility and streamlined inventory management
• Pricing: Optimization added $80 million in top-line revenue
4. Page4 © Hortonworks Inc. 2015
Security company protects its customers from intrusions
Challenges
• Cost: Redundant storage systems cost many millions annually;
data retention was limited to no more than two years
• Multi-tenancy: Unable to support simultaneous users with ad
hoc, data science and predictive analytics tasks
• Speed: Latencies created lags that attackers could exploit
Results
• Cost: Millions saved through elimination of redundant platforms
• Multi-tenancy: Concurrent jobs run in a private cloud
• Ingest: 105 million log events per minute
• Processing time: Time reduced from four hours to two seconds
• High availability: Zero downtime across rolling upgrades
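To put the ingest and processing-time figures above in per-second terms, the arithmetic can be checked with a short calculation. The inputs are the numbers quoted on the slide; the variable names are illustrative only.

```python
# Figures quoted on the slide.
EVENTS_PER_MINUTE = 105_000_000
BEFORE_SECONDS = 4 * 60 * 60  # four hours, in seconds
AFTER_SECONDS = 2             # two seconds

# Convert the ingest rate to events per second.
events_per_second = EVENTS_PER_MINUTE / 60

# Ratio of the old processing time to the new one.
speedup = BEFORE_SECONDS / AFTER_SECONDS

print(f"{events_per_second:,.0f} events per second")  # 1,750,000 events per second
print(f"{speedup:,.0f}x faster")                      # 7,200x faster
```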
“The recent transformation of business and consumer technologies has driven pervasive mobility and an explosion of data resulting in the need for a new approach to protecting devices, applications, data and users.”
Company’s 2014 annual report
5. Page5 © Hortonworks Inc. 2015
New Capabilities in Hortonworks Data Platform 2.3
Breakthrough User Experience
Dramatic Improvement in the User Experience
HDP 2.3 eliminates much of the complexity of administering Hadoop and improves developer productivity.
Enhanced Security and Governance
Enhanced Security and Data Governance
HDP 2.3 delivers new encryption of data at rest and extends the data governance initiative with Apache™ Atlas.
Proactive Support
Extending the Value of a Hortonworks Subscription
Hortonworks® SmartSense™ adds proactive cluster monitoring, enhancing Hortonworks’ award-winning support in key areas.
Apache is a trademark of the Apache Software Foundation.
7. Page7 © Hortonworks Inc. 2015
Ambari Views Framework
Goal: enable the delivery of custom UI experiences in Ambari Web
Developers can extend the Ambari Web interface
• Views expose custom UI features for Hadoop Services
Ambari Admins can entitle Views to Ambari Web users
• Entitlements framework for controlling access to Views
8. Page8 © Hortonworks Inc. 2015
Views Framework vs. Views
• Views Framework: core to Ambari
• Views: built by Hortonworks, community, partners
10. Page10 © Hortonworks Inc. 2015
View Components
• Serve client-side assets (such as HTML + JavaScript)
• Expose server-side resources (such as REST endpoints)
[Diagram: a View's client-side assets (.js, .html) run in Ambari Web; its server-side resources (Java) run in Ambari Server and reach Hadoop and other systems over REST]
11. Page11 © Hortonworks Inc. 2015
View Delivery
1. Develop the View (just like you would for a Web App)
2. Package as a View (basically a WAR)
3. Deploy the View into Ambari
4. Ambari Admins create and configure View instance(s) and give
access to users and groups
Develop → Package → Deploy → Create Instance(s)
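The packaging step can be sketched in a few lines. A View archive carries a `view.xml` descriptor at its root next to the client-side assets. The snippet below is a minimal sketch, not the Ambari build tooling: the view name, label, instance name and `index.html` content are hypothetical, and a real View would normally be built with a Java build tool rather than assembled by hand.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_view_descriptor(name, label, version, instance_name):
    """Return a minimal view.xml document as a string."""
    view = ET.Element("view")
    ET.SubElement(view, "name").text = name
    ET.SubElement(view, "label").text = label
    ET.SubElement(view, "version").text = version
    # An <instance> element pre-defines one instance of the view.
    instance = ET.SubElement(view, "instance")
    ET.SubElement(instance, "name").text = instance_name
    return ET.tostring(view, encoding="unicode")


def package_view(descriptor_xml, assets):
    """Bundle the descriptor and client-side assets into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("view.xml", descriptor_xml)
        for path, content in assets.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical view with a single static page.
descriptor = build_view_descriptor("HELLO_VIEW", "Hello View", "1.0.0", "INSTANCE_1")
package = package_view(descriptor, {"index.html": "<h1>Hello from a View</h1>"})

with zipfile.ZipFile(io.BytesIO(package)) as archive:
    names = archive.namelist()
print(names)
```

Keeping the version in the descriptor is what lets several versions of the same view sit side by side, each with its own instances.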
12. Page12 © Hortonworks Inc. 2015
Versions and Instances
• Deploy multiple versions and create multiple instances of a view
• Manage accessibility and usage
13. Page13 © Hortonworks Inc. 2015
Choice of Deployment Model
• For Hadoop Operators:
Deploy Views in an Ambari Server that is managing a Hadoop cluster
• For Data Workers:
Run Views in a “standalone” Ambari Server
[Diagram: two Ambari Servers in front of Hadoop (Store & Process)]
Operators manage the cluster, may have Views deployed
Data Workers use the cluster and use a “standalone” Ambari Server for Views
14. Page14 © Hortonworks Inc. 2015
Improved Ease of Use for the Hadoop Operator
Responsibilities include:
• Deploying Hadoop® clusters
• Managing cluster health
• Troubleshooting and resolving issues
Hadoop Operator
Simpler administration speeds time to value
Easy Setup and Installation
Streamlined configuration experience
Customizable Dashboards
Track cluster health with KPIs and drill downs
Easier Provisioning and Faster Cluster Formation
Cloudbreak simplifies provisioning. Ambari speeds cluster formation with automated host discovery.
16. Page16 © Hortonworks Inc. 2015
Ease installation and configuration for HDFS, YARN, Hive and HBase
Makes Key Configs Visible
Clearly displays the set of options
Recommends Settings
Suggests optimal ranges
Highlights Dependencies
Lets you visualize any impact on dependent services
Hadoop Operator
18. Page18 © Hortonworks Inc. 2015
System Administrator
Hadoop operators can configure dashboards to show KPIs
Out-of-box Templates
Based on common best practices
Personalized Experience
Create new display widgets built from Hadoop metrics. Add or remove existing widgets.
Reusable and Shareable
Widget library allows other operators to re-use community widgets
20. Page20 © Hortonworks Inc. 2015
Host discovery makes cluster expansion automatic, fast, orderly and predictable
Faster
Expand clusters incrementally and automatically as each new node becomes available
Easier
Pre-plan automatic expansion paths
Flexible for Cloud or On-premises
Discover hosts wherever they are
Ambari
Hadoop Operator
Host Discovery Eases Cluster Formation
21. Page21 © Hortonworks Inc. 2015
Learn More about Ambari
Thursday, 3:10-3:50 – What’s New in Apache Ambari
with Sumit Mohanty & Yusaku Sako
22. Page22 © Hortonworks Inc. 2015
Launch HDP on Leading Cloud Platforms
BI / Analytics
(Hive)
IoT Apps
(Storm, HBase, Hive)
Dev / Test
(all HDP services)
Data Science
(Spark)
Cloudbreak
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints:
IoT Apps, BI / Analytics, Data Science,
Dev / Test
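An Ambari Blueprint is a JSON document that names a stack and groups components into host groups. The sketch below builds one for a "Data Science (Spark)" style cluster. The blueprint name, host group names and the abbreviated component lists are assumptions for illustration; a real blueprint would list every master, worker and client component the stack requires.

```python
import json


def make_blueprint(name, stack_version, host_groups):
    """Build a blueprint document in the layout Ambari Blueprints use.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group_name,
                "cardinality": cardinality,
                "components": [{"name": component} for component in components],
            }
            for group_name, (cardinality, components) in host_groups.items()
        ],
    }


# Hypothetical, abbreviated layout: one master host and three workers.
blueprint = make_blueprint(
    "data-science",
    "2.3",
    {
        "master": ("1", ["NAMENODE", "RESOURCEMANAGER", "SPARK_JOBHISTORYSERVER"]),
        "worker": ("3", ["DATANODE", "NODEMANAGER", "SPARK_CLIENT"]),
    },
)

print(json.dumps(blueprint, indent=2))
```

A document like this is typically registered by POSTing it to the Ambari REST API under `/api/v1/blueprints/<name>`; the same blueprint can then be reused for any number of clusters.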
24. Page24 © Hortonworks Inc. 2015
Cloudbreak automates provisioning and scaling clusters in the cloud
Hadoop Operator
25. Page25 © Hortonworks Inc. 2015
Hadoop Operator
Cloudbreak automates cluster provisioning and scaling for the cloud in only 3 steps
26. Page26 © Hortonworks Inc. 2015
Step 1: Choose your cloud provider – Microsoft Azure, Amazon AWS, Google Cloud Platform or OpenStack
31. Page31 © Hortonworks Inc. 2015
Hadoop Operator
Leverage re-usable blueprints to provision HDP in any environment
Public or Private Clouds
Dynamically set up public or private cloud clusters from the web console
Automated Scaling
Manage elasticity requirements as cluster demands grow
Choice of Many Clouds
Supports Microsoft Azure, AWS, Google and OpenStack clouds
32. Page32 © Hortonworks Inc. 2015
Learn More about Cloudbreak
Wednesday, 2:35-3:15 – One-click Hadoop Clusters - anywhere (using Docker)
with Janos Matyas
33. Page33 © Hortonworks Inc. 2015
Preview URL: launch.hortonworks.com
Launch an HDP cluster with only a few clicks
Easy Setup
With the leading public cloud platforms: Microsoft Azure, AWS and Google Cloud
Easy Exploration
Try out the latest features in HDP
Your Data
Use the newest cluster technologies with your own familiar dataset
34. Page34 © Hortonworks Inc. 2015
Advances for the Developer
Responsibilities include:
• Developing SQL queries
• Developing new Spark applications
• Implementing streaming data analytics
Developer
Develop Hadoop applications with ease and speed
Visualization of SQL Queries
Streamlined user interface for Apache Hive
Improvements to Apache Spark on YARN
Machine Learning, Data Frame API, New SQL (Preview)
Enterprise Enhancements for Streaming
Fault tolerance, security, and rolling upgrades for Apache Kafka and Apache Storm
35. Page35 © Hortonworks Inc. 2015
Enhanced SQL Semantics and New SQL User View
The rich developer experience includes enhanced SQL semantics and a new user interface
Enhanced SQL Semantics
Interval types in expressions, plus added UNION support
SQL User View in Ambari
Write, debug and run Hive SQL queries
Performance Improvements
2.5x performance gain
Query Scheduling
Dynamically share resources for Hive queries
[Diagram: HDP architecture with YARN as the data operating system, alongside storage, resource management, governance, security and operations]
36. Page36 © Hortonworks Inc. 2015
Developer
New user interface enables fast and easy SQL definition and execution.
37. Page37 © Hortonworks Inc. 2015
New capabilities add dynamic access methods to feature-rich Spark applications
Data Frame API
Enables common and easy interchange between Spark components for data imports and exports
Machine Learning
Introduces multiclass classification, clustering, frequent pattern-mining algorithms
Enterprise-Ready
Consistent operations, comprehensive security, deployable anywhere
Spark SQL
[Tech Preview] A new module for structured data processing in Spark
Improvements for Apache Spark on YARN
38. Page38 © Hortonworks Inc. 2015
Stream analysis, scalable across the cluster
Nimbus High Availability
No single point of failure for stream processing job management
Ease of Deployment
Quickly create stream processing pipelines
Rolling Upgrades
Update Storm to newer versions, with zero downtime
Enhanced Security for Kafka
Authorization via Ranger and authentication via Kerberos
Streaming Analysis Ready for Mainstream Adoption
41. Page41 © Hortonworks Inc. 2015
HDP Security: Comprehensive, Complete, Extensible
Security in HDP is the most comprehensive, complete and extensible for Hadoop
Administration: Central management and consistent security
Only HDP delivers a single administrative console to set policy across the entire cluster
Authentication: Authenticate users and systems
Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions
Authorization: Provision access to data
Provides consistent authorization controls across all Apache components within HDP
Audit: Maintain a record of data access
Maintains a record of data access events across all components that is consistent and accessible
Data Protection: Protect data at rest and in motion
Encrypts data in motion and data at rest; refer to partner encryption solutions for broader needs
42. Page42 © Hortonworks Inc. 2015
Enhanced Security Capabilities in HDP 2.3
Columns: Security pillar | Project | New features
Administration (central management and consistent security) | Ranger
• Administer Kafka, Solr and multi-tenant YARN queues
• Support for custom plugins via Ranger and Knox stacks
Authentication (authenticate users and systems) | Knox
• Bi-directional SSL supports trust between clients and servers
• LDAP data caching reduces server load, improves performance
Authorization (provision access to data) | Ranger
• Authorization for Kafka, Solr and multi-tenant YARN queues
• Hooks for dynamic policy rules (e.g., by geo-location)
Audit (maintain a record of data access) | Atlas
• Scalable metadata service
• Hive integration leverages existing metadata
• UI: Hive table lineage and domain-specific search
Data Protection (protect data at rest and in motion) | HDFS, Ranger
• HDFS transparent data encryption (for data at rest)
• Key management store (KMS) that’s robust and highly available
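For HDFS transparent data encryption, the usual workflow is to create a key in the KMS and then mark an empty directory as an encryption zone that uses it. The sketch below only assembles and prints the standard Hadoop CLI invocations (`hadoop key create`, `hdfs crypto -createZone`, `hdfs crypto -listZones`); it does not run them. The key name and path are hypothetical, and the helper function is illustrative rather than part of any Hortonworks tooling.

```python
import shlex


def encryption_zone_commands(key_name, zone_path):
    """Return the CLI invocations that set up one HDFS encryption zone."""
    return [
        # 1. Create the encryption key in the key management store.
        ["hadoop", "key", "create", key_name],
        # 2. Create the directory that will become the zone (it must be empty).
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the directory into an encryption zone backed by the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. List zones to confirm the result.
        ["hdfs", "crypto", "-listZones"],
    ]


# Hypothetical key name and path.
commands = encryption_zone_commands("customer_key", "/data/secure/customers")
for argv in commands:
    print(shlex.join(argv))
```

Files written under the zone are then encrypted and decrypted transparently for authorized clients, which is why no application changes appear in the sketch.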
44. Page44 © Hortonworks Inc. 2015
Learn More about Ranger
Thursday, 3:10-3:50 – Securing Hadoop with Apache Ranger: Strategies and Best Practices
with Selvamohan Neethiraj & Velmurugan Periasamy
45. Page45 © Hortonworks Inc. 2015
Extending Data Governance to Hadoop
Data Governance Requirements
• Transparent: Governance standards and protocols must be clearly defined and available to all
• Reproducible: Recreate the relevant data landscape at a given point in time
• Auditable: Trace all relevant events and assets with appropriate historical lineage
• Consistent: Compliance practices must be consistent
Hadoop Data Platform: Must snap into existing data governance frameworks and openly exchange metadata
A group of companies dedicated to meeting these requirements in the open
[Diagram: holistic data governance spanning traditional data systems (ETL / DQ, MDM, archive; SCM, CRM, ERP), the Hadoop data platform, and business analytics, visualization and dashboards]
46. Page46 © Hortonworks Inc. 2015
Apache Atlas Is Now Included in HDP
[Diagram: Apache Atlas architecture. Knowledge Store (models, type system, taxonomies, policy rules) and Audit Store; REST API; services for search, lineage and exchange; tag-based policies, data lifecycle management and real-time tag-based access control; industry vocabularies for healthcare (HIPAA, HL7), financial (SOX, Dodd-Frank), energy (PPDM), retail (PCI, PII) and other (CWM)]
Scalable Metadata Service
Agile Centralized Taxonomy – Enterprise/Business unit level modeling with industry-specific vocabulary
Operational Metadata – Extend visibility into HDFS Path, Hive DB, table, columns
REST API – Modern, flexible access to Atlas services
Hive Integration
Hive Metadata – Leverage existing metadata with import / export capability and capture SQL runtime metrics directly
User Interface
Hive Table Lineage and Search DSL – Support for keyword, faceted and free text searches
48. Page48 © Hortonworks Inc. 2015
The Hadoop Industry’s Best Subscription Value
HDP Subscriptions Deliver
Global support coverage, 24x7x365
Hortonworks University self-paced learning
Premier Support: designated support engineer
Influence on the direction of the technology
From Architecture to Expansion
[Chart: Hortonworks Support (# tickets) across the Architecture & Development, Implementation, Production and Expansion phases, for Project 2, Project 3 ... Project N]
“Hortonworks loves and lives open-source innovation”
49. Page49 © Hortonworks Inc. 2015
Hortonworks® SmartSense™ provides comprehensive visibility into cluster issues
Hadoop Operator
50. Page50 © Hortonworks Inc. 2015
Hortonworks® SmartSense™ makes tailored recommendations based on analysis of operational data
Hadoop Operator
51. Page51 © Hortonworks Inc. 2015
Hortonworks® SmartSense™ solicits feedback from Hadoop Operators to optimize its recommendations
Hadoop Operator
52. Page52 © Hortonworks Inc. 2015
Hadoop Operator
Hortonworks® SmartSense™ enhances the support subscription
Faster Case Resolution
Easily capture log files and metrics for insight and resolution
Proactive Configuration
Via intelligent stream of cluster analytics and data-driven recommendations
Capacity Planning
Through proactive view into customer’s cluster utilization
53. Page53 © Hortonworks Inc. 2015
Hortonworks® SmartSense™ Resolves Issues Proactively
Integrated Customer Portal
Knowledge Base
On-Demand
Training
Customer Environment
• Any cloud
• Hybrid environment
• Multi-tenant
“5 out of 5” Enterprise Hadoop Support
Connection to the customer’s environment via telephone or web support
55. Page55 © Hortonworks Inc. 2015
In Summary: New in HDP
Breakthrough User Experience
Enhanced Security and Governance
Proactive Support
HDP 2.3 is a Major Step Forward for Open Enterprise Hadoop®
Editor's notes
TYPE OF ANALYSIS: SQL QUERIES WITH HIVE
+ A MAJOR HOME IMPROVEMENT RETAILER
+ SINGLE VIEW OF ITS CUSTOMERS – “THE GOLDEN RECORD”
+ SINGLE VIEW OF INVENTORY FOR SUPPLY CHAIN OPTIMIZATION
+ AND ALSO A SINGLE VIEW OF ITS OWN AND COMPETITORS’ PRICES
+ LOW COST OF STORAGE = MORE DATA RETAINED FOR LONGER
+ LONGER RETENTION POWER = MULTIPLE “SINGLE VIEWS”
TYPE OF ANALYSIS: STREAM ANALYSIS
A MAJOR PROVIDER OF DIGITAL SECURITY SOLUTIONS CUT ITS TIME PROCESSING THE THREAT LANDSCAPE FROM FOUR HOURS TO 2 SECONDS, WHICH DRAMATICALLY REDUCED ITS CLIENTS’ WINDOW OF VULNERABILITY
STATS
+ PROCESSES 105 MILLION LOG EVENTS PER MINUTE
Dynamic availability
A data governance framework in any organization comprises a combination of people, process and technology that is in place to establish decision rights and accountabilities for information. A governance policy defines who can take what actions with what information, and when, under what circumstances, using what methods.
The technology goals for a data governance framework are to provide a platform for a common approach across all systems and data within the organization. Explicitly, governance practices need to be:
- Transparent: Governance standards & protocols must be clearly defined and available to all
- Reproducible: Recreate the relevant data landscape at a point in time
- Auditable: All relevant events and assets must be traceable with appropriate historical lineage
- Consistent: Compliance practices must be consistent
Apache Atlas is the only open source project created to solve the governance challenge in the open. The founding members of the project include all the members of the data governance initiative and others from the Hadoop community. The core functionality defined by the project includes the following:
Data Classification – create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources
Centralized Auditing – provide a framework to capture and report on access to and modifications of data within Hadoop
Search & Lineage – allow pre-defined and ad hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed.
Security and Policy Engine – implement engines to protect and rationalize data access according to compliance policy.
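The lineage idea above, keeping a history of how a data source was constructed, can be illustrated with a toy in-memory graph. This is not the Atlas API or its data model; the class, method and dataset names are hypothetical and exist only to show what an upstream-lineage query answers.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store: remembers which inputs each dataset was built from."""

    def __init__(self):
        self._inputs = defaultdict(set)

    def record(self, output, inputs):
        """Record that `output` was constructed from the given input datasets."""
        self._inputs[output].update(inputs)

    def upstream(self, dataset):
        """Return every dataset that `dataset` directly or indirectly depends on."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._inputs.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical tables: two raw feeds joined, then refined into a single view.
graph = LineageGraph()
graph.record("customer_orders", ["raw_orders", "raw_customers"])
graph.record("customer_360", ["customer_orders"])

print(sorted(graph.upstream("customer_360")))
# ['customer_orders', 'raw_customers', 'raw_orders']
```

An auditor asking "where did this table come from?" is asking for exactly this upstream set, which is why lineage and auditing appear side by side in the list above.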
You should be hiring people focused on your unique data and application needs, not support engineers focused on the complicated internals of the data platform.
Many users who started out self-supporting have ultimately come to us to support the platform so they can focus on their application and business needs.
We enable HDP in the market through three types of offerings: 1) software support subscriptions, 2) expert consulting services, and 3) training.
Our primary focus is on our annual support subscriptions for HDP which provide the 24x7 support enterprises expect along with patches, updates, hot fixes, etc. that help keep their mission critical workloads running.
Since we have the most committers working on the dozens of open source projects, we’re uniquely able to:
-- Define and deliver an enterprise-focused roadmap for Enterprise Hadoop
-- Provide customers and partners a direct way to engage with the community to affect that roadmap (you can think of us as the product management function for these projects)
-- And finally, we ensure the patches and updates we make available to our customers are applied to the corresponding open source projects so there are no regressions in future releases of those open source components.
To net out: we enable customer success by listening to their needs and driving innovation into HDP. And our open source model provides the leverage to evolve the technology faster than any single vendor could accomplish alone.