As Big Data becomes an integral element in your data management infrastructure, you need mature controls for governance, provisioning, automation, disaster recovery, and high-availability services. Find out how you can integrate and simplify the management of Big Data with Hadoop and address the needs of your sensitive information and compliance policies.
5. Bettering the Business
Coordination & Automation
Advanced Analysis & Monitoring
Provisioning & Multi-Cluster Management
Data Protection & Business Continuity
(Diagram labels: HIVE, HDFS, NODES)
6. Bettering the Business
Verify Permissions
Audit Configuration
Audit Dashboard
Information Export
(Diagram: IAM / LDAP system; Cloudera Navigator 1.0 with view permissions, audit log config, and audit log collection; access audit logs from the HDFS, HBase, and Hive services; export to a 3rd-party SIEM / GRC system)
7. Focusing on Business
(Chart: enterprise readiness, 2009 to 2012. Business benefits: get faster results, use familiar tools, access more data, ask new questions. IT benefits: Hadoop distribution, improved stability, system management, technology certification, security, real-time query, high availability, seamless integration, price-performance, reduced risk.)
8. Ask Bigger Questions:
How can we increase ad clicks?
CBS Interactive re-optimizes web page layouts
for each user segment every hour, improving
website stickiness and driving ad clicks.
9. Ask Bigger Questions:
How do we feed the world?
A Fortune 500 company specializing in agriculture and
genomics can automate data-driven R&D decisions to
reduce time to market from years to months.
10. Ask Bigger Questions:
How can we help healthcare
providers get paid faster?
RelayHealth, a McKesson subsidiary,
streamlines message processing
between payers and providers.
11. Key Takeaways
Streamlined operations
Accelerating innovation
Business focus
12. Starting Point – Avoiding Planned Downtime
Establish a
development cluster
Start work,
monitor performance
Push changes
and restart
How you can more easily build, operate, monitor, and maintain your Hadoop clusters
Outgoing with RapLeaf (video): https://cloudera.box.com/files/0/f/532875998/1/f_5336356340
Hadoop is becoming a central, strategic technology - business critical, business sensitive.
* Control is key
But from an operational POV it's still somewhat separate and demanding to operate
* Distributed system - nodes, roles, services all acting in concert
* Imposes management complexity - monitoring, diagnostics, provisioning, upgrades, etc.
Customers are concerned that centralized command and controls are elusive and are asking questions
* Make Hadoop operations easier, more centralized, and more comprehensive?
* Reconcile the expanding needs of your business w/ the need to uphold critical policies and oversight?
* Promote Big Data innovation w/o sacrificing the necessary data protection and assurances?
Cloudera hears these questions & we have a purposeful direction
* Make tools and offer support to simplify integration & management
* Extend business continuity & governance to Hadoop solutions
There are opposing forces at play: innovation vs. control
* Organizations are struggling to find the balance between maximizing ROI from the platform and maintaining oversight and control
As we build out the toolset and operational capabilities around Hadoop, our goal is to remove those tradeoffs as much as possible
* Empower IT to foster use and production without infringement by controls, but also without worry and compromise.
There are 3 focus areas that help remove those tradeoffs and enable IT to
* Foster use and innovation with the platform
* While still maintaining the appropriate level of control
Appropriate rights
* See, use & contribute
* Improve productivity by finding data more easily & quickly
Comprehensive protection
* Guarding data at rest or in transit
* Participating in backup routines & continuity plans
* Complying with regulations
Focused productivity
* Multi-disciplined athletes
* Focus on the business vs. the busy-work
Again, the question is
* How do you get the most into/out of the system
* Yet maintain the stability & reliability of system & data
* Any system needs to answer this question to be fit for enterprise use & wide adoption
So what does it mean to make Hadoop a better "enterprise citizen"?
* Many advances in Hadoop, and in Cloudera's platform more specifically
* Empower you to focus on your objectives, not just the means to those ends
* Start with system management, then data management
Cluster coordination and automation
* Planned vs. unplanned downtime (3:1) - time is money
* CM rolling updates & restarts - minimize planned downtime, take advantage of new features faster
We've also made it easier to analyze and monitor the cluster
* CM: centralized management and monitoring
* Custom charting & dashboards
* Hadoop-centric events and alerts w/ SNMP support
* Immediate operational visibility = more control, less time spent
These benefits extend to provisioning and multi-cluster management
* Grow at the rate & form needed to satisfy business demands
* Node templating - quickly add servers for storage & compute power, heterogeneous clusters
* Multi-cluster management - purpose-built clusters without losing central management
Hadoop also offers assurances for data protection & business continuity
* System is fully fault tolerant at single component and cluster level - maintain SLAs in a wide range of compromises and disruptions
* Block replication - redundancy for disk, node & rack failure
* Checksums for bit rot & other compromises w/ automatic self-healing
* Alerts for failed disks, missing blocks, corrupt blocks, under-replicated blocks, etc.
* Pluggable encryption for data at rest (Gazzang, Vormetric), industry standard for data in transit (SSL)
* HA - recent updates provide native failover for file system management, eliminating the often-derided SPOF
* Confidence that data stored stays intact / protected from prying eyes / available - critical functions for proper governance & SLAs
How can you protect data when the cluster in its entirety is at risk?
* All services that store data have mechanisms for backup & DR - but no centralized solution
* BDR - centrally manage DR workflows for files and metadata - define, monitor, alert
* Extend existing SLAs and RTOs to Hadoop
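To make the "checksums w/ automatic self-healing" point concrete, here is a minimal sketch of the idea in Python. It is a toy model, not how HDFS is implemented: the function names, the in-memory list of replicas, and the sample data are all assumptions made for illustration. It shows a corrupt copy of a block being detected by a checksum mismatch and replaced from a healthy replica.

```python
import zlib


def checksum(data: bytes) -> int:
    """CRC32 of a block's contents (stand-in for a stored block checksum)."""
    return zlib.crc32(data)


def verify_and_heal(replicas: list[bytes], expected: int) -> list[bytes]:
    """Return the replica list with corrupt copies replaced by a healthy one.

    Hypothetical helper: raises ValueError if no replica matches the
    expected checksum, since there is nothing left to heal from.
    """
    healthy = [r for r in replicas if checksum(r) == expected]
    if not healthy:
        raise ValueError("all replicas are corrupt; block is lost")
    good = healthy[0]
    return [r if checksum(r) == expected else good for r in replicas]


# Illustrative data: three replicas, the second one has suffered bit rot.
block = b"claims batch 0001"
expected = checksum(block)
replicas = [block, b"claims batch 0O01", block]
healed = verify_and_heal(replicas, expected)
```

With three replicas, losing one copy to corruption still leaves two healthy sources, which is why replication and checksums work together.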
Another area where there have been rapid advances is data management
* New data management application called Cloudera Navigator
* First release focuses on security & governance
The application simplifies the process of verifying permissions
* Hadoop has integrated with industry standard directory services & authentication for some time
* Various services have their own mechanisms for applying permissions & logging access
* Navigator allows administrators to verify access permissions by user & type across these services through a single, centralized view
It also enables a full suite of data audit capabilities for HDFS, Hive and HBase
* Configuration of audit tracking - set up which services get audited as well as filters & thresholds
* Audit dashboard - centrally view all access to data, filter by service, user & type
* Information export for your existing governance, risk management & compliance systems
These features extend your existing control systems to your data housed within Hadoop
* Granting you visibility into the activities & permissions of the actors in your Hadoop system
In coming releases
* Additional mechanisms for exploration, data lineage & lifecycle management
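The "filter by service, user & type" view of an audit dashboard can be pictured with a small Python sketch. This is not Cloudera Navigator's API or log format: the `AuditEvent` record, its field names, and the sample events are hypothetical, chosen only to show how one centralized list of access events can be narrowed along those three dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; field names are assumptions."""
    service: str     # e.g. "hdfs", "hive", "hbase"
    user: str
    operation: str   # e.g. "read", "write"
    allowed: bool


def filter_events(events, service=None, user=None, operation=None):
    """Return events matching every filter that is not None."""
    def matches(e):
        return ((service is None or e.service == service)
                and (user is None or e.user == user)
                and (operation is None or e.operation == operation))
    return [e for e in events if matches(e)]


# Illustrative events collected from three services.
events = [
    AuditEvent("hdfs", "alice", "read", True),
    AuditEvent("hive", "bob", "write", False),
    AuditEvent("hbase", "alice", "write", True),
]
alice_writes = filter_events(events, user="alice", operation="write")
denied = [e for e in events if not e.allowed]
```

The same filtered list is what an export to an external governance or SIEM system would carry in this simplified picture.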
To sum up: Our goal (past, present & future) is to allow resources to focus on business, not the scaffolding
* Tools designed specifically for Hadoop
* Modeled very similarly to "standard" enterprise management applications (multi-disciplined athletes)
Innovation of our platform continues to accelerate
* CM: More advanced capabilities around monitoring & resource management
* Navigator: exploration, lineage & lifecycle management
Progress means you can focus on putting Hadoop to work for you rather than working on Hadoop
* Now: examples of how improved management & control is helping organizations maximize the value of their Hadoop deployments
The first example is CBS Interactive
Goal w/ Hadoop is to increase ad clicks by lengthening each user session on the website
* Customize pages for each user segment based on behavior
* Processing >500M global events daily: clicks, page views, downloads, streaming video events, ad events
At first they built out on CDH without a subscription to Cloudera Enterprise - led to operational challenges
* Lacked a holistic view of their entire Hadoop cluster
* Trouble controlling configuration changes
* No audit trail tracking historical changes
* Difficult to use Ganglia and maintain Hadoop Web UI pages
* No visibility into activity failures - approach was reactive when users complained about failed or long running jobs
Now, with Cloudera Enterprise they have a web analytics platform with centralized system mgmt
* Don't need a dedicated Hadoop administrator to keep the cluster running
* Avoiding the licensing costs, time, and skills associated with managing different tools
* A single, holistic view that helps them understand the performance of their cluster
* Operations are more efficient and repeatable using defined workflows
The 2nd example is a Fortune 500 biotech company
They use Cloudera Enterprise as a PB-scale platform for a single view of all R&D data
* Lab data, field data, literature, etc.
* Accelerate data processing with MapReduce vs. specialized systems
Benefits:
* Usability: 1000+ scientists have direct access to data in Hadoop
* Cloudera Navigator offers auditing & access control
The last example is RelayHealth, a connectivity & IT subsidiary of McKesson
They use Cloudera Enterprise as a 24x7 data processing engine for claims & remit data
* Use Cloudera Manager for centralized administration
Benefits:
* Simplified deployment
* Easy on-going management
* Software engineer: “The deployment process into production with Hadoop was actually quite easy… Cloudera Manager really helped us out a lot. It’s as easy as clicking a few buttons and you’re up and going. It’s really simple.”
So to recap what we've discussed, this is how you can simplify management & operations of Hadoop
First, streamline operations
* Quickly and easily build, manage, monitor & analyze operations of the cluster with a centralized tool like CM
* Minimize complex and time-consuming tasks with the latest in automation - provisioning & rolling upgrades
Next, accelerate innovation
* Foster use and production without compromising control
* Leverage the robust data protection and business continuity advancements
* Maintain visibility with the access control & auditing capabilities
Finally, focus on the business
* More of a meta-point to the other two
* These innovations let you align your resources to the projects which can best advance your business and minimize the amount of attention needed for the maintenance and operations of the underlying systems.
Focus on minimizing planned downtime
Straightforward and effective; immediately applicable
Establish a development cluster; start work; push changes; continue work; monitor performance; repeat
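The "push changes and restart" step is where planned downtime usually comes from, and a rolling restart is one way to reduce it. The sketch below is an assumption-laden illustration in Python, not Cloudera Manager's implementation: `rolling_restart`, the `restart` and `is_healthy` callbacks, and the node names are all hypothetical. The idea it shows is restarting one node at a time and halting at the first node that does not come back healthy, so the remaining nodes keep serving.

```python
def rolling_restart(nodes, restart, is_healthy):
    """Restart nodes one at a time.

    Returns (restarted, failed): the nodes restarted successfully, and the
    first node that failed its health check (or None if all succeeded).
    Stops at the first failure so untouched nodes stay in service.
    """
    restarted = []
    for node in nodes:
        restart(node)
        if not is_healthy(node):
            return restarted, node
        restarted.append(node)
    return restarted, None


# Toy cluster state: each node runs either the "old" or the "new" config.
state = {"node1": "old", "node2": "old", "node3": "old"}


def restart(node):
    state[node] = "new"   # pretend the node picks up the pushed changes


def is_healthy(node):
    return state[node] == "new"


done, failed = rolling_restart(list(state), restart, is_healthy)
```

Trying the same loop on a development cluster first, as the slide suggests, is a low-risk way to check that a change restarts cleanly before it reaches production.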
Intro with RapLeaf (video): https://cloudera.box.com/files/0/f/532875998/1/f_5336356340