Join 451 Research analyst Matt Aslett, Cloudera CEO Mike Olson, and Cloudera customers RIM and YP (formerly AT&T Interactive) to learn:
» Why Cloudera customers have chosen CDH to get started with Hadoop
» The business value resulting from analyzing new data sources in new ways
» How Hadoop will change these customers’ businesses and industries over the next 3-5 years
The Business Advantage of Hadoop: Lessons from the Field – Cloudera Summer Webinar Series: 451 Research
1. THE BUSINESS ADVANTAGE OF HADOOP: LESSONS FROM THE FIELD
Matt Aslett, Research Manager, 451 Research
Mike Olson, CEO, Cloudera
Bill Theisinger, Executive Director, Platform Data Services, YP
Aaron Wiebe, Blackberry Infrastructure Architect, Research In Motion
12. Timeline: 2008 to 2012 and beyond
• 2008: Cloudera founded by Mike Olson, Amr Awadallah & Jeff Hammerbacher
• 2009: CDH, the first commercial Apache Hadoop distribution
• 2009: Hadoop creator Doug Cutting joins Cloudera
• 2010: Cloudera Manager, the first management application for Hadoop
• 2011: Cloudera reaches 100 production customers
• 2011: Cloudera University expands to 140 countries
• 2012: Cloudera Enterprise 4, the standard for Hadoop in the enterprise
• 2012: Cloudera Connect reaches 300 partners
• Beyond…: Transforming how companies think about data; changing the world one petabyte at a time
13. Cloudera Enterprise
• Cloudera Support: our team of experts on call to help you meet your service level agreements (SLAs)
• Cloudera Manager: end-to-end management application for the deployment & operation of CDH
• CDH: big data storage, processing & analytics platform based on Apache Hadoop, 100% open source
• Education: developers, administrators, data scientists, certification programs
• Professional services: use case discovery, new Hadoop deployment, proof of concept, production pilots, process & team development, deployment certification
21. What we were facing
• Increasing volume of traffic data through our distribution network
• Need for a system to support changing data complexity and detail
• Adhere to tighter SLAs
• Provide intra-day reporting
• Benefit from the intelligence trapped in our data
22. Legacy processing flow
Flow: Application → Log Data Layer → ETL and data processing → multiple data loads → Data Warehouse
• Reportable events dropped on the floor
• Loading multiple DBs
• Processing time was significant
• Reporting lag was in days, not hours
• High maintenance required
24. Hadoop processing flow
Flow: Applications → LWES → Data Collection Layer → Hadoop Platform → Data Warehouse
• All ETL processing in Hadoop
• Several systems integrate with the Hadoop platform
• All Java MapReduce, with some Hive for end users and dependent systems
• Reporting lag in hours, not days
• Actual reduction in maintenance needs
26. Hadoop processing flow
Flow: Applications → LWES → Data Collection Layer → Hadoop Platform → Data Warehouse and HBase Platform
• Migrating some reporting to HBase
• Exposing core business KPIs via APIs
• Replacing various data marts with HBase tables/schemas
• Reducing TCO
• Alignment of core skill sets
27. Hadoop @ Research In Motion
Aaron Wiebe
BlackBerry Infrastructure Architect
28. Internal Use Only
The Problem
1. BlackBerry Services currently generate 500TB of instrumentation data daily (and growing rapidly).
2. Traditional systems unable to cope with both growth and access requests.
3. Total global dataset of ~100PB.
28 Confidential and Proprietary
29. The Old Way
Diagram: Services → Split, feeding:
  Filter → Event Monitoring and Alerting
  Streaming ETL → Complex Correlation
  Streaming ETL → Data Warehouse
  Archive Storage
- Focus on reducing data to the required data set
- Pipelined data flows to avoid hitting disk
- Scalability issues at most stages
- Going back to the archive was very time-consuming
30. The Hadoop Way
Diagram: Services → Split, feeding:
  Filter → Event Monitoring and Alerting
  Hadoop (Archive Storage, ETL, Correlation, Stage 1 DWH) → Data Warehouse
- Archive storage moved to HDFS
- ETL processes converted to Hadoop (Pig + Hive)
- Some data warehouse functions migrating to Hadoop
31. Real Results
- 90% reduction in the ETL tools' code base
- Example performance:
  - A previous ad-hoc query took around 4 days
  - It now takes 53 minutes
- Significant capital cost reductions compared with the previous system
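As a rough check on the query example, the speedup can be worked out directly. This assumes "around 4 days" means exactly 96 hours of elapsed time; the slide gives only an approximate figure, so the ratio is approximate too:

```latex
\frac{4\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} \times 60\ \tfrac{\text{min}}{\text{h}}}{53\ \text{min}}
  = \frac{5760\ \text{min}}{53\ \text{min}}
  \approx 108.7
```

That is roughly a 100-fold (two orders of magnitude) reduction in query time.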
Hadoop typically solves two types of problems. Data processing is the first step after collection: data is combined and prepared, and features are extracted and curated. Advanced analytics is where science is applied: extracting and understanding models of how the business operates. The results are then integrated back into business operations. These go by different terms in different industries, and the applicability of these solutions is broad. We've successfully deployed Hadoop and helped solve a diverse set of business problems.
Speak to the size and scope of the problem: the difficulty of handling ~100PB of data using traditional methods.
- Data is lost as the pipelines progress
- Going back for information after the fact is hard, if not impossible
This is where Hadoop fit in for us.
- But the move to Hadoop has had far bigger impacts overall
- Things we couldn't even consider doing before are now feasible