Cloudera Manager – APIs & Extensibility
Patrick Angeles, Director Field Technical Services
December 2013


CONFIDENTIAL - RESTRICTED
Cloudera Manager
End-to-End Administration for CDH

• Manage: Easily deploy, configure & optimize clusters
• Monitor: Maintain a central view of all activity
• Diagnose: Easily identify and resolve issues
• Integrate: Use Cloudera Manager with existing tools

©2013 Cloudera, Inc. All Rights Reserved.
Integrating with your IT Mgmt tools
Various options of integrating Cloudera Manager into your existing Datacenter Operations/Tools

[Diagram: Datacenter Operations tools integrating with Cloudera Manager]
• Installation, Deployment tools: e.g. Chef, Puppet etc.
• Monitoring Tools: e.g. Orion, Tivoli, BMC etc.
• Alerting Tools: e.g. Nagios, SNMP etc.
• And more…

• Cloudera Manager API
• Introduced in CM4 (June 2012)
• Installation & deployment
• Monitoring
• SNMP Alerts
• Introduced in CM4.5 (Feb 2013)
• Hadoop Operations
• Monitoring ‘tsquery’ (Feb 2013)
• User-defined triggers/alarms (new for C5!)
• Service extensibility (new for C5!)

Cloudera Manager (CM) API
• API access was a new feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).
• The CM API is an HTTP REST API, using JSON serialization. The API is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI world.
• Docs & Examples
http://cloudera.github.io/cm_api/
https://github.com/cloudera/cm_api
• Java/Python clients
http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
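
Because the API is plain HTTP with JSON, a client needs little more than a URL and credentials. The following is a minimal sketch using only the Python standard library, not the official cm_api client. The host name `cm-host.example.com`, the `admin`/`admin` credentials, port 7180 and the `v1` version prefix are assumptions for illustration; adjust them to your deployment.

```python
import base64
import json
import urllib.request


def build_cm_request(host, path, user, password, port=7180, version="v1"):
    """Build (but do not send) an authenticated GET request for the CM REST API.

    Host, port, version and credentials are illustrative assumptions.
    """
    url = f"http://{host}:{port}/api/{version}/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def cluster_names(response_body):
    """Extract cluster names from a JSON list response wrapped in an 'items' array."""
    return [item["name"] for item in json.loads(response_body).get("items", [])]


if __name__ == "__main__":
    request = build_cm_request("cm-host.example.com", "/clusters", "admin", "admin")
    print(request.full_url)
    # To actually call a running Cloudera Manager:
    # with urllib.request.urlopen(request) as response:
    #     print(cluster_names(response.read().decode("utf-8")))
```

The request is built separately from sending it so the sketch can be inspected without a running Cloudera Manager.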

Examples of integration with CM API
• Installation & Deployment
  • Chef
  • Puppet
  • Dell Crowbar
    • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
  • StackIQ
    • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
  • WANdisco – non-stop NN setup
  • Several other customers/partners leveraging the APIs as part of their install & deployment process
• Monitoring & Alerting
  • Oracle Enterprise Manager (via Big Data Appliance)
  • Nagios
    • https://github.com/cloudera/cm_api/tree/master/nagios
    • https://github.com/harisekhon/nagiosplugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  • SNMP alerts integration with IBM Netcool

Develop & contribute your plug-ins using the Cloudera Manager API
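
To give a feel for what a monitoring plug-in does, here is a minimal sketch of mapping service health reported by the CM API onto Nagios plugin exit codes. It is not the code from the Nagios integrations linked above. It assumes each service is a dict with `name` and `healthSummary` fields, and that any health value other than GOOD, CONCERNING or BAD should be reported as UNKNOWN.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed mapping from CM health summaries to Nagios states.
HEALTH_TO_NAGIOS = {"GOOD": OK, "CONCERNING": WARNING, "BAD": CRITICAL}

# Ranking used to pick the worst state; CRITICAL outranks UNKNOWN here.
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def nagios_check(services):
    """Return (exit_code, status_line) for a list of service dicts."""
    exit_code = OK
    details = []
    for service in services:
        health = service.get("healthSummary", "NOT_AVAILABLE")
        code = HEALTH_TO_NAGIOS.get(health, UNKNOWN)
        details.append(f"{service['name']}={health}")
        if SEVERITY[code] > SEVERITY[exit_code]:
            exit_code = code
    return exit_code, f"{LABELS[exit_code]} - {', '.join(details)}"


if __name__ == "__main__":
    sample = [
        {"name": "hdfs1", "healthSummary": "GOOD"},
        {"name": "hbase1", "healthSummary": "BAD"},
    ]
    code, line = nagios_check(sample)
    print(line)
    raise SystemExit(code)
```

The service list would normally come from an API call; it is hard-coded here so the sketch runs without a cluster.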

Cloudera Manager – Monitoring via ‘tsquery’
• Introduced as part of CM4.5 release (Feb 2013)
• Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters
• The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store
• Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service?
  select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
• Retrieved time-series data can be plotted via various options – line, bar, scatter, heat maps, table list etc.
• Extending this concept to create user-defined triggers/alarms (new for C5!).
• More details
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/ClouderaManager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
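
A tsquery can also be sent programmatically. The sketch below only builds the request URL for the DataNode example above. The `/timeseries` path, the `v4` version prefix, the `query`, `from` and `to` parameter names, and the host `cm-host.example.com` are assumptions that should be checked against the API documentation for your Cloudera Manager version.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY = (
    "select bytes_read, bytes_written "
    "where roleType=DATANODE and serviceName=hdfs1"
)


def timeseries_url(base_url, tsquery, api_version="v4", start=None, end=None):
    """Build a URL that passes a tsquery statement as a query-string parameter.

    Path, version and parameter names are assumptions, not confirmed by the slides.
    """
    params = {"query": tsquery}
    if start is not None:
        params["from"] = start  # e.g. an ISO 8601 timestamp
    if end is not None:
        params["to"] = end
    return f"{base_url.rstrip('/')}/api/{api_version}/timeseries?{urlencode(params)}"


if __name__ == "__main__":
    url = timeseries_url("http://cm-host.example.com:7180", QUERY)
    print(url)
    # Round-trip check: the encoded statement decodes back to the original.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

URL-encoding matters here because tsquery statements contain spaces, commas and `=` signs.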

Examples of Cloudera Manager ‘tsquery’
Example 1: How do I track the aggregate Cluster Disk IO?
  select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID

Example 2: How do I compare CPU usage across hosts?
  select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100

Create & contribute your ‘tsqueries’!
https://github.com/cloudera/cm_charting_scrapbook
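
The arithmetic in Example 2 can be worked through by hand. The sketch below assumes `dt0` yields the per-second rate of change of a counter with negative rates (for example after a counter reset) replaced by zero, and that the CPU counters are cumulative CPU seconds. Both are assumptions about tsquery semantics, and the sample values are made up.

```python
def dt0(samples):
    """Per-second rate of change between consecutive (timestamp, value) samples.

    Negative rates are clamped to zero (assumed behaviour of tsquery's dt0).
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rate = (v1 - v0) / (t1 - t0)
        rates.append((t1, max(rate, 0.0)))
    return rates


def cpu_percent(samples, num_cores):
    """Mirror `dt0(counter) / numCores * 100` from Example 2."""
    return [(t, rate / num_cores * 100) for t, rate in dt0(samples)]


if __name__ == "__main__":
    # Hypothetical cumulative CPU-seconds counter sampled every 60 s on a 4-core host.
    # The last sample drops, as it would after a counter reset.
    user_cpu = [(0, 0.0), (60, 120.0), (120, 60.0)]
    for timestamp, percent in cpu_percent(user_cpu, num_cores=4):
        print(timestamp, percent)
```

Dividing by the core count normalizes hosts of different sizes onto the same 0 to 100 scale, which is what makes the cross-host comparison meaningful.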
Cloudera Manager – Service Extensibility
• Introduced in C5
  • Still in Beta!
• Some aspects (especially Parcel mgmt) available in CM4.x
• Example: Collaboration with Syncsort to deploy DMX-h libraries
• Single management console for CDH, non-CDH services and ISV applications
• Similar look and feel as existing services
• Easy to write (Java-free!)
• Flexible
• Independent release cycle
Analogy from Operating Systems (OS) world

ISV’s view of OS
• Systems Management
• Package Mgmt
• Process/Resource Mgmt
• Security Mgmt
• Data Access Mgmt
• Core OS kernel

Bringing ISV Apps to CDH
ISV’s view of Hadoop
• Cloudera Manager
• Parcels
• Resource Mgmt
• Security Mgmt
• CDK APIs
• Core Hadoop/CDH kernel

Integrating into the Cloudera Product Portfolio
Features / Description / Examples

Package Mgmt
- Ability to easily package and distribute binaries/jars via “Parcels”
- Examples: Informatica, Syncsort

Resource Mgmt
- Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop grid
- Resource isolation of cluster resources
- Examples: SAS, 0xData, Accumulo

Security Mgmt
- Support for Kerberos Mgmt
- Role-based access control for Tables/Views in Hive/Impala via Sentry

Data Access Mgmt
- HDFS and HBase API abstraction and simplification

Systems Mgmt (Cloudera Manager)
- Manage: Deploy and upgrade (rolling) services and pkgs; manage configurations
- Monitor: Proactive health checks; track resource utilization; custom metrics charts
- Diagnose: Distributed log collection and searching; tag and track key events
- Integrate: Access operational tools via API; surface overall cluster metrics to ISV dashboard

[Slide labels: ISV’s; Non-CDH Apps… Accumulo, Spark, Giraph etc.]

* Support for YARN planned as part of CM5.x in FY14

So.. How does it work?
• A JSON file that describes your service

• Set of control scripts
• Packaged as a JAR file

• As promised, Java-free
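
The three ingredients above (a JSON descriptor, control scripts, a JAR) can be assembled without any Java tooling, because a JAR is a ZIP archive. The following is a minimal sketch of that packaging step. The archive entry name `descriptor/service.json` is hypothetical; the actual layout expected by Cloudera Manager is defined by the SDK, not by this example.

```python
import io
import json
import zipfile


def build_extension_jar(descriptor, control_script):
    """Package a service descriptor and a control script into JAR (ZIP) bytes.

    The entry name 'descriptor/service.json' is a hypothetical layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("descriptor/service.json", json.dumps(descriptor, indent=2))
        jar.writestr("scripts/control.sh", control_script)
    return buffer.getvalue()


if __name__ == "__main__":
    descriptor = {"name": "spark", "roles": [{"name": "master"}]}
    script = '#!/bin/bash\necho "control script placeholder"\n'
    jar_bytes = build_extension_jar(descriptor, script)
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        print(jar.namelist())
```

The archive is built in memory here; writing `jar_bytes` to a `.jar` file is the only additional step needed to produce something on disk.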

Example: Cloudera Manager Extensions - Spark

Cloudera Manager Extensions

Cloudera Manager Extensions: Spark

The Code
Service descriptor (JSON):

    {
      "name" : "spark",
      "roles" : [{
        "name" : "master",
        "startRunner" : {
          "program" : "scripts/control.sh",
          "args" : [ "start_master", "./params.properties" ]
        },
        "parameters" : [{
          "name" : "master_port",
          "type" : "port",
          "default" : 7077
        }],
        "configWriter" : {
          "generators" : [{
            "filename" : "params.properties"
          }]
        }
      }]
    }

Control script (scripts/control.sh):

    #!/bin/bash
    # First argument selects the action, e.g. start_master.
    CMD=$1
    # MASTER_PORT=<read in from ./params.properties>
    timestamp=$(date)
    case $CMD in
      (start_master)
        exec "$SPARK_HOME/scripts/spark-start.sh" master
        ;;
      (*)
        echo "$timestamp Don't understand [$CMD]"
        ;;
    esac
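
The control script leaves reading `./params.properties` as a placeholder. As an illustration of that step, here is a small sketch that parses such a file, assuming it holds simple `key=value` lines with `#` comments. The file format is an assumption; the slides do not show the generated file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    The key=value format is an assumption about the generated params.properties.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    sample = "# hypothetical generated file\nmaster_port=7077\n"
    params = parse_properties(sample)
    print(params["master_port"])
```

Values are kept as strings, so a port would still need converting with `int()` before numeric use.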

Next Steps
• Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
• Working with select ISVs (SAS, Syncsort, 0xData etc.) as part of Beta to further fine-tune this feature

Develop & contribute your Cloudera Manager service extensibility plug-ins!

Vision of CM Extensibility
[Diagram. Labels: Service Extensibility, Vertical Extension, Horizontal Extension, API, SNMP, CM, CDH. Services and ISVs shown: 0xData, SAS, Syncsort, Informatica, Revolution, Accumulo, Spark, Giraph. Ops Apps shown: Capacity Mgr, Security ISV’s, SLA Mgr, Cost Optimizer. Tools shown: Oracle OEM, Nagios, Dell, Chef/Puppet.]

Q&A

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Pa cloudera manager-api's_extensibility_v2

  • 1. Cloudera Manager – API’s & Extensibility Patrick Angeles, Director Field Technical Services December 2013 1 CONFIDENTIAL - RESTRICTED
  • 2. Cloudera Manager End-to-End Administration for CDH 1 Monitor 2 Diagnose 3 Integrate 4 Manage Easily deploy, configure & optimize clusters Maintain a central view of all activity Easily identify and resolve issues Use Cloudera Manager with existing tools 2 ©2013 Cloudera, Inc. All Rights Reserved.
  • 3. Integrating with your IT Mgmt tools: Various options of integrating Cloudera Manager into your existing Datacenter Operations/Tools • Cloudera Manager API • Introduced in CM4 (June 2012) • Installation & deployment • Monitoring • SNMP Alerts • Introduced in CM4.5 (Feb 2013) • Hadoop Operations • Monitoring ‘tsquery’ (Feb 2013) • User-defined triggers/alarms (new for C5!) • Service extensibility (new for C5!) [Diagram: Datacenter Operations tools feeding into Cloudera Manager: Installation/Deployment Tools (e.g. Chef, Puppet etc.), Monitoring Tools (e.g. Nagios, SNMP etc.), Alerting Tools (e.g. Orion, Tivoli, BMC etc.), and more…]
  • 4. Cloudera Manager (CM) API • API access was a new feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics). • The CM API is an HTTP REST API, using JSON serialization. The API is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI world. • Docs & Examples: http://cloudera.github.io/cm_api/ https://github.com/cloudera/cm_api • Java/Python clients: http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
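As a rough illustration of the REST/JSON access described on slide 4, the sketch below builds an authenticated GET request using only the Python standard library. The host name, the credentials, port 7180, the `v1` version segment, and the `clusters` path are assumptions for illustration; check the API docs linked on the slide for the resources your CM version exposes. `build_request` and `fetch_json` are hypothetical helper names, not part of the cm_api client.

```python
import base64
import json
import urllib.request


def build_request(host, port, path, user, password, version="v1"):
    """Build a GET request for a CM API resource with HTTP Basic auth.

    The URL layout (/api/<version>/<path>) is an assumption; adjust it to
    match the API documentation for your Cloudera Manager release.
    """
    url = f"http://{host}:{port}/api/{version}/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical host and credentials; nothing is sent until fetch_json is called.
example = build_request("cm-host.example.com", 7180, "clusters", "admin", "admin")
print(example.full_url)
```

Because the API shares the web UI's host, port, and privileges, the same account used in the browser would be the one passed to `build_request`; calling `fetch_json(example)` against a reachable CM server would then return the decoded JSON.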
  • 5. Examples of integration with CM API • Installation & Deployment: Chef; Puppet; Dell Crowbar (http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/); StackIQ (http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera); WANdisco – non-stop NN setup; several other customers/partners leveraging the APIs as part of their install & deployment process • Monitoring & Alerting: Oracle Enterprise Manager (via Big Data Appliance); Nagios (https://github.com/cloudera/cm_api/tree/master/nagios and https://github.com/harisekhon/nagiosplugins/blob/master/check_hadoop_cloudera_manager_metrics.pl); SNMP alerts integration with IBM Netcool • Develop & Contribute your plug-ins using Cloudera Manager API
  • 6. Cloudera Manager – Monitoring via ‘tsquery’ • Introduced as part of CM4.5 release (Feb 2013) • Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters • The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store • Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service? select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1 • Retrieved time-series data can be plotted via various options – line, bar, scatter, heat maps, table list etc. • Extending this concept to create user-defined triggers/alarms (new for C5!). • More details: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/ClouderaManager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
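The slide's DataNode example follows a simple `select <metrics> where <predicates>` shape. A minimal Python sketch of composing such a statement is shown below; `build_tsquery` is a hypothetical helper, it covers only equality predicates joined with `and`, and the `query` parameter name used for URL-encoding is an assumption rather than something the slides specify.

```python
from urllib.parse import parse_qs, urlencode


def build_tsquery(metrics, filters):
    """Compose a 'select ... where ...' tsquery statement.

    metrics: list of metric names or expressions to select.
    filters: mapping of attribute name to required value (equality only).
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    statement = "select " + ", ".join(metrics)
    if filters:
        predicates = " and ".join(f"{key}={value}" for key, value in filters.items())
        statement += " where " + predicates
    return statement


# Reproduce the disk IO example from the slide.
query = build_tsquery(
    ["bytes_read", "bytes_written"],
    {"roleType": "DATANODE", "serviceName": "hdfs1"},
)

# URL-encode the statement so it could travel as an HTTP query parameter.
# The parameter name "query" is an assumption, not taken from the slides.
encoded = urlencode({"query": query})
print(query)
```

Building the statement from a list and a mapping keeps metric names and filter values in one place, which helps when the same chart is generated for several services.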
  • 7. Examples of Cloudera Manager ‘tsquery’ • Example 1: How do I track the aggregate Cluster Disk IO? select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID • Example 2: How do I compare CPU usage across hosts? select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100, dt0(total_cpu_system) / getHostFact(numCores, 1) * 100, dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100, dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100, dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100, dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100 • Create & Contribute your ‘tsqueries’! https://github.com/cloudera/cm_charting_scrapbook
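The second example on this slide divides a per-second CPU rate by the core count and scales it to a percentage. The arithmetic can be sketched in Python as follows, assuming `dt0` behaves as a rate of change that never goes negative (for instance across counter resets) and that `getHostFact(numCores, 1)` yields the host's core count with 1 as a fallback. Both readings are assumptions, the function names are hypothetical, and the sample values are made up.

```python
def nonneg_rate(samples):
    """Per-second rate between consecutive (timestamp, counter) samples.

    Negative deltas are clamped to zero, which is this sketch's assumed
    reading of dt0.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip out-of-order or duplicate timestamps
        rates.append(max(0.0, (v1 - v0) / elapsed))
    return rates


def cpu_percent(samples, num_cores=1):
    """Normalize CPU-seconds-per-second by core count, as a percentage."""
    return [rate / num_cores * 100 for rate in nonneg_rate(samples)]


# Made-up cumulative CPU seconds sampled every 10 s on a 4-core host.
user_cpu = [(0, 0.0), (10, 20.0), (20, 60.0)]
print(cpu_percent(user_cpu, num_cores=4))  # → [50.0, 100.0]
```

Dividing by the core count is what makes hosts of different sizes comparable on one chart: 2 CPU-seconds per second is 50% of a 4-core host but 100% of a 2-core one.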
  • 8. Cloudera Manager – Service Extensibility • Introduced in C5 • Still in Beta! • Some aspects (especially Parcel mgmt) available in CM4.x • Example: Collaboration with Syncsort to deploy DMX-h libraries • Single management console for CDH, non-CDH services and ISV applications • Similar look and feel as existing services • Easy to write (Java-free!) • Flexible • Independent release cycle
  • 9. Analogy from Operating Systems (OS) world: ISV’s view of OS [Diagram layers: Systems Management; Package Mgmt; Process/Resource Mgmt; Security Mgmt; Data Access Mgmt; Core OS kernel]
  • 10. Bringing ISV Apps to CDH: ISV’s view of Hadoop [Diagram layers: Cloudera Manager; Parcels; Resource Mgmt; Security Mgmt; CDK API’s; Core Hadoop/CDH kernel]
  • 11. Integrating into the Cloudera Product Portfolio (table columns: Features, Description, Examples) • Package Mgmt: Ability to easily package and distribute binaries/jars via “Parcels”. Examples: Informatica, Syncsort • Resource Mgmt: Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop grid; resource isolation of cluster resources. Examples: SAS, 0xData, Accumulo • Security Mgmt: Support for Kerberos Mgmt; role-based access control for Tables/Views in Hive/Impala via Sentry • Data Access Mgmt: HDFS and HBase API abstraction and simplification • Cloudera Manager Systems Mgmt: Manage (deploy and upgrade (rolling) services and pkgs; manage configurations); Monitor (proactive health checks; track resource utilization; custom metrics charts); Diagnose (distributed log collection and searching; tag and track key events); Integrate (access operational tools via API; surface overall cluster metrics to ISV dashboard) • Further labels in the Examples column: ISV’s; Non-CDH Apps… Accumulo, Spark, Giraph etc. • * Support for YARN planned as part of CM5.x in FY14
  • 12. So.. How does it work? • A JSON file that describes your service • Set of control scripts • Packaged as a JAR file • As promised, Java-free
  • 13. Example: Cloudera Manager Extensions - Spark
  • 14. Cloudera Manager Extensions
  • 15. Cloudera Manager Extensions: Spark
  • 16. Cloudera Manager Extensions: Spark
  • 17. Cloudera Manager Extensions: Spark
  • 18. The Code
    Service descriptor (the JSON file):
      name : "spark",
      roles : [{
        name : "master",
        startRunner : {
          program : "scripts/control.sh",
          args : [ "start_master",
                   "./params.properties" ]
        },
        parameters : [{
          name : "master_port",
          type : "port",
          default : 7077
        }],
        configWriter : {
          generators : [{
            filename : "params.properties"
          }]
        }
      }]
    Control script (scripts/control.sh):
      #!/bin/bash
      # First argument selects the action, e.g. start_master
      CMD=$1
      MASTER_PORT=<read in from ./params.properties>
      case $CMD in
        (start_master)
          exec $SPARK_HOME/scripts/spark-start.sh master
          ;;
        (*)
          echo "$timestamp Don't understand [$CMD]"
          ;;
      esac
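The descriptor on slide 18 uses unquoted keys, so it is not strict JSON as shown. A minimal Python sketch of the same structure, serialized with the standard `json` module, follows. It mirrors only the fields visible on the slide; the real descriptor schema may require more, and the variable names here are hypothetical.

```python
import json

# Fields copied from the slide; anything the real schema needs beyond
# these is unknown here and deliberately left out.
service_descriptor = {
    "name": "spark",
    "roles": [
        {
            "name": "master",
            "startRunner": {
                "program": "scripts/control.sh",
                "args": ["start_master", "./params.properties"],
            },
            "parameters": [
                {"name": "master_port", "type": "port", "default": 7077}
            ],
            "configWriter": {
                "generators": [{"filename": "params.properties"}]
            },
        }
    ],
}

# Serialize to strict JSON (quoted keys, balanced brackets).
descriptor_json = json.dumps(service_descriptor, indent=2)
print(descriptor_json)
```

Generating the file from a data structure rather than writing it by hand avoids the unbalanced brackets and stray quotes that are easy to introduce in a hand-edited descriptor.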
  • 19. Next Steps • Documentation & SDK as part of C5 Beta2 or later (definitely before GA!) • Working with select ISVs (SAS, Syncsort, 0xData etc.) as part of Beta to further fine-tune this feature • Develop & Contribute your Cloudera Manager service extensibility plug-ins!
  • 20. Vision of CM Extensibility [Diagram, labels only: Service Extensibility; Vertical Extension; Horizontal Extension; 0xData, SAS, Syncsort, Informatica, Revolution; API; Ops Apps; Capacity Mgr; Security; ISV’s; SLA Mgr; Cost Optimizer; CDH; CM; SNMP; API; Oracle OEM; Nagios; Dell; Chef/Puppet; Accumulo, Spark, Giraph]
  • 21. Q&A

Editor's notes

  1. Software: Cloudera Enterprise – The Platform for Big Data
     • A complete data management solution that includes and expands upon Apache Hadoop
     • A collection of open source projects form the foundation of the platform
     • Cloudera has wrapped the open source core with additional software for system and data management as well as technical support
     5 Attributes of Cloudera Enterprise:
     • Scalable
        • Storage and compute in a single system – brings computation to data (rather than the other way around)
        • Scale capacity and performance linearly – just add nodes
        • Proven at massive scale – tens of PB of data, millions of users
     • Flexible
        • Store any type of data: structured, unstructured, semi-structured
        • In its native format – no conversion required
        • No loss of data fidelity due to ETL
        • Fluid structuring: no single model or schema that the data must conform to
        • Determine how you want to look at data at the time you ask the question – if the attribute exists in the raw data, you can query against it
        • Alter structure to optimize query performance as desired (not required) – multiple open source file formats like Avro, Parquet
        • Multiple forms of computation: bring different tools to bear on the data, depending on your skillset and what you want to do
        • Batch processing – MapReduce, Hive, Pig, Java
        • Interactive SQL – Impala, BI tools
        • Interactive Search – for non-technical users, or helping to identify datasets for further analysis
        • Machine learning – apply algorithms to large datasets using libraries like Apache Mahout
        • Math – tools like SAS and R for data scientists and statisticians
        • More to come…
     • Cost-Effective
        • Scale out on inexpensive, industry standard hardware (vs. highly tuned, specialized hardware)
        • Fault tolerance built-in
        • Leverage cost structures with existing vendors
        • Reduced data movement – can perform more operations in a single place due to flexible tooling
        • Fewer redundant copies of data
        • Less time spent migrating/managing
        • Open source software is easy to acquire and prove the value/ROI
     • Open
        • Rapid innovation
        • Large development communities
        • The most talented engineers from across the world
        • Easy to acquire and prove value
        • Free to download and deploy
        • Demonstrate the value of the technology before you make a large-scale investment
        • No vendor lock-in – choose your vendor based solely on merit
        • Cloudera’s open source strategy: if it stores or processes data, it’s open source
        • Big commitment to open source
        • Leading contributor to the Apache Hadoop ecosystem – defining the future of the platform together with the community
     • Integrated
        • Works with all your existing investments
        • Databases and data warehouses
        • Analytics and BI solutions
        • ETL tools
        • Platforms and operating systems
        • Hardware and networking equipment
        • Over 700 partners including all of the leaders in the market segments above
        • Complements those investments by allowing you to align data and processes to the right solution