Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
1. © 2014 MapR Technologies
Getting Real With Hadoop
Jim Scott, Director, Enterprise Strategy & Architecture
@kingmesal #BigDataEverywhere #Chicago - October 1st, 2014
7. Can’t We All Just Get Along?
10. High Availability (HA) Everywhere
No NameNode architecture
• Distributed metadata can self-heal
• No practical limit on # of files
MapReduce/YARN HA
• Jobs are not impacted by failures
• Meet your data processing SLAs
NFS HA
• High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
Instant recovery
• Files and tables are accessible within seconds of a node failure or cluster restart
Rolling upgrades
• Upgrade the software with no downtime
HA is built in
• No special configuration to enable HA
• All MapR customers operate with HA
15. Data Everywhere!
Social Media
Messages
Audio
Sensors
Mobile Data
Email
Clickstream
19. Volumes
100K volumes are OK, create as many as needed
Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
/projects
    /tahoe
    /yosemite
/user
    /msmith
    /bjohnson
20. MapR M7: The Best In-Hadoop Database
MapR-DB
• NoSQL Columnar Store
• Apache HBase API
• Integrated with Hadoop
[Diagram: storage stacks compared. Other Distros: HBase / JVM / HDFS / JVM / ext3/ext4 / Disks. MapR: Tables/Files / Disks.]
MapR Enterprise Database Edition (M7)
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
21. Tradeoffs with Other NoSQL Solutions
• Reliability: 24x7 applications with strong data consistency
• Performance: Continuous low latency with horizontal scaling
• Easy Administration: Easy day-to-day management with minimal learning curve
22. Consistent, Low Read Latency
[Chart: M7 read latency vs. others' read latency]
24. Hadoop Security
• Authorization to ensure the right access to files and databases
• Authentication for users and user-created job requests
• Encryption to ensure user credentials and data are always secure
• Integration with existing security infrastructure
25. Fine-Grained Access Control
Full POSIX permissions on files and directories
ACLs on tables, column families and columns
ACLs on MapReduce jobs and queues
Administration ACLs on cluster and volumes
ACLs for Apache Hive, Apache Drill and Impala
27. Seamless Integration with Direct Access NFS
• MapR is POSIX compliant
– Random reads/writes
– Simultaneous reading and writing to a file
– Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
– Stream data into the cluster
– Leverage thousands of tools and applications
– Easier to use non-Java programming languages
– No need for most proprietary Hadoop connectors
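Because the cluster is exposed through a standard NFS mount, ordinary file I/O from a non-Java language should be enough for random reads and writes. The sketch below is a minimal Python illustration of that idea. The mount path named in the comment is a hypothetical example, and a temporary directory stands in for it so the snippet runs anywhere; `overwrite_at` is a helper invented for this example.

```python
import os
import tempfile

def overwrite_at(path, offset, payload):
    """Overwrite bytes in place at `offset` (a random write), then read them back."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        f.seek(offset)
        return f.read(len(payload))

# Stand-in for an NFS-mounted cluster directory, for example
# /mapr/my.cluster.com/projects (hypothetical path; substitute the real mount).
mount_dir = tempfile.mkdtemp()
data_file = os.path.join(mount_dir, "events.log")

# Plain sequential write using nothing but the standard library.
with open(data_file, "wb") as f:
    f.write(b"0123456789")

# Random write into the middle of the file, then a full read.
echoed = overwrite_at(data_file, 4, b"ABC")
with open(data_file, "rb") as f:
    contents = f.read()
print(echoed, contents)
```

No Hadoop client library appears anywhere in the snippet, which is the point of the NFS interface: any tool that can open a file can work with the data.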
29. Disaster Recovery: Mirroring
• Flexible
– Choose the volumes/directories to mirror
– You don’t need to mirror the entire cluster
– Active/active
• Fast
– No performance impact
– Block-level (8KB) deltas
– Automatic compression
• Safe
– Point-in-time consistency
– End-to-end checksums
• Easy
– Graceful handling of network issues
– No third-party software
– Takes less than two minutes to configure!
[Diagram: mirroring between Production and Research clusters over a WAN linking Datacenter 1 and Datacenter 2, and over a WAN to EC2]
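The block-level (8KB) delta idea can be illustrated with a small sketch: split two versions of a file into 8 KB blocks, checksum each block, and transfer only the blocks whose checksums differ. This shows the general technique only and is not MapR's mirroring implementation. The function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB, the delta granularity quoted on the slide

def block_digests(data: bytes):
    """Checksum every fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old: bytes, new: bytes):
    """Return indices of blocks in `new` that differ from, or are absent in, `old`."""
    old_d = block_digests(old)
    new_d = block_digests(new)
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or old_d[i] != d]

# Hypothetical source data: four zero-filled blocks.
source_v1 = bytes(4 * BLOCK_SIZE)
source_v2 = bytearray(source_v1)
source_v2[BLOCK_SIZE + 10] = 0xFF   # change one byte inside block 1
source_v2 += b"appended"            # grow the file into a new block 4

delta = changed_blocks(source_v1, bytes(source_v2))
print(delta)
```

Only the modified block and the newly appended block need to cross the WAN; the three untouched blocks are skipped.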
30. MapR Advantages
Feature | MapR-DB | Others
99.999% uptime | ✓ | X
Instant recovery from failures | ✓ | X
Continuous low latency (no compactions) | ✓ | X
Zero administration (no processes to manage, self-tuning) | ✓ | X
Online data protection (snapshots, mirroring) | ✓ | X
Scalability (number of tables supported) | Trillion | Hundreds
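For scale, the 99.999% uptime figure can be converted into a yearly downtime budget. The calculation below is plain arithmetic assuming a 365-day year. It is not a measurement taken from the slides, and `downtime_minutes_per_year` is a helper written for this example.

```python
def downtime_minutes_per_year(availability_percent, days_per_year=365):
    """Minutes per year a service may be down at the given availability."""
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * (1 - availability_percent / 100)

five_nines = downtime_minutes_per_year(99.999)
print(f"{five_nines:.2f} minutes of downtime per year")
```

Five nines works out to a little over five minutes of downtime per year.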
31. Packages Supported by Various Distributions
Red – lacking
Blue – leading
Category | Package | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache Versions (Sep 12th, 2014)
Core Hadoop | Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1
Batch | Map Reduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2
Batch | Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13
Batch | Tez | 0.4 (Dev Preview Only) | X | 0.4 | 0.5
Batch | Pig | 0.12 | 0.12 | 0.12 | 0.12
Batch | Cascading | 2.1.6 | X | X | 2.5
Batch | Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Interactive SQL | Impala | 1.2.3 | 1.4 | X | 1.4
Interactive SQL | Drill | 0.5 | X | X | 0.5
Interactive SQL | SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1
NoSQL and Search | HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98
NoSQL and Search | Phoenix | X | X | 4.0.0 | 4.1.0
NoSQL and Search | AsyncHBase | 1.5 | X | X | 1.5
NoSQL and Search | Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA
Machine Learning and Graph | Mahout | 0.9 | 0.9 | 0.9 | 0.9
Machine Learning and Graph | MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Machine Learning and Graph | GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2
Streaming/Messaging | Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1
Data Integration | Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5
Data Integration | Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0
Data Integration | Knox | X | X | 0.4 | 0.4
Coordination | Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1
Coordination | ZooKeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5
GUI, Configuration, Monitoring | Management | MCS | CM | Ambari | Ambari
GUI, Configuration, Monitoring | Hue | 3.5 | 3.6 | 2.5.1 | 3.6
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html
32. Pick the Right Tool for the Job
34. MapR Distribution for Apache Hadoop
[Diagram: Apache Hadoop and OSS ecosystem running on the MapR Distribution for Apache Hadoop]
Sections: Execution Engines; Data Governance and Operations
Category labels: SQL; Batch; Streaming; NoSQL & Search; ML, Graph; Data Integration & Access; Security; Workflow & Data Governance; Provisioning & coordination
Components: Drill, SparkSQL, Impala, Hive, Spark, Cascading, Pig, MapReduce v1 & v2, Tez*, YARN, Storm*, Spark Streaming, Solr, HBase, Accumulo*, GraphX, MLLib, Mahout, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*, Oozie, Falcon*, Savannah*, Juju, Whirr, ZooKeeper
Interfaces: NFS, HDFS API, HBase API, JSON API
MapR Control System (Management and Monitoring): CLI, REST API, GUI
* Certification/support planned for 2014
36. 1/7th the Hardware Footprint
37. Forrester Wave™: Big Data Hadoop Solutions, Q1 ’14
February 2014, “The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014”
39. APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
40+ contributors
150+ years of experience building databases and distributed systems
40. Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
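To make schema discovery on the fly concrete, the sketch below infers field names and value types from self-describing JSON records as they are read, with no schema declared in advance. It illustrates the idea only and is not Drill's implementation or API. `discover_schema` and the sample records are hypothetical.

```python
import json

def discover_schema(json_lines):
    """Map each field name to the set of Python type names seen across records."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

# Hypothetical self-describing records; the shape changes from row to row.
records = [
    '{"user": "msmith", "clicks": 3}',
    '{"user": "bjohnson", "clicks": 7, "device": "mobile"}',  # new field appears
    '{"user": "tahoe", "clicks": "n/a"}',                      # field type changes
]

schema = discover_schema(records)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

A schema-on-write system would reject the second and third records or require a schema migration first; here the evolving shape is simply recorded as the data is scanned.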
41. Operational Analytics
43. Real-time and Operational, Actionable Analytics
[Diagram: applications built around Hadoop (MapR M7)]
Applications: Mobile application server; Web application server; Real-time ad targeting; Data exploration (SQL); Customer 360 dashboard; Churn analysis (predictive analytics); Product/service optimization and personalization
Hadoop (MapR M7)
• User profiles and state
• User interactions
• Real-time location data
• Web and mobile session state
• Comments/rankings
44. General Application Monitoring
45. Hard Drive Failure Rates
47. Media Content Recommendation Engine
20M SONGS
49. Offer Serving, Credit Risk & Fraud
104M CARD MEMBERS
More than $600B
50. Fastest Data Ingest Rates
100M data points per second
51. Speed and Intelligence…
52. Forrester Wave™: NoSQL Key-Value Databases, Q3 ’14
September 2014, “The Forrester Wave™: NoSQL Key-Value Databases, Q3 2014”
53. MapR Editions
Control System
NFS Access
Performance
Unlimited Nodes
Free
All the Features of M5
Simplified Administration for HBase
Increased Performance
Consistent Low Latency
Unified Snapshots, Mirroring
Control System
NFS Access
Performance
High Availability
Snapshots & Mirroring
24 X 7 Support
Annual Subscription
Fastest On-Ramp: MapR Sandbox for Hadoop
54. Engage with us!
@mapr maprtech
jscott@mapr.com
MapR
maprtech
mapr-technologies