8. CDH & Enterprise Ecosystem
File System Mount: FUSE-DFS
UI Framework: HUE
SDK: HUE SDK
Workflow: APACHE OOZIE
Scheduling: APACHE OOZIE
Metadata: APACHE HIVE
Languages / Compilers: APACHE PIG, APACHE HIVE
Data Integration: APACHE FLUME, APACHE SQOOP
Fast Read/Write Access: APACHE HBASE
Coordination: APACHE ZOOKEEPER
Diagram callouts: Drivers, language enhancements, testing; Sqoop framework; More adapters; coming…; Packaging, testing
9. Hadoop / RDBMS Use Cases
Analyze unstructured data (classification, text mining)
Parse, aggregate / Analyze, report semi-structured data
Analyze, report structured data
Active archival
Long running queries
Create context
Slide borrowed from Krishnan Parasuraman’s presentation at Enzee’11
Copyright 2011 Cloudera Inc. All rights reserved
12. Use Case: Customer Risk
Build a comprehensive data picture of customer-side risk
Publish a consolidated set of attributes for analysis
Map ratings across products
Parse and aggregate data from different sources
Credit and debit cards, product payments, deposits and savings
Banking activity, browsing behavior, call logs, e-mails and chats
Merge data into a single view
A “fuzzy join” among data sources
Structure and normalize attributes
Sentiment analysis, pattern recognition
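The "fuzzy join" step can be illustrated with a minimal sketch. It assumes customer records from two sources share a free-text `name` field that is spelled inconsistently; the record layouts, the `fuzzy_join` helper, and the 0.85 threshold are all hypothetical and chosen only for illustration. It uses plain Python string similarity rather than a Hadoop job, so it shows the matching idea, not a scalable implementation.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase and drop simple punctuation so trivially different spellings compare equal.
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_join(left, right, key, threshold=0.85):
    """Pair each left record with its best-matching right record, if similar enough."""
    joined = []
    for l in left:
        best, best_score = None, 0.0
        for r in right:
            score = similarity(l[key], r[key])
            if score > best_score:
                best, best_score = r, score
        if best is not None and best_score >= threshold:
            # Left-hand values win on key collisions, so the left spelling of the name is kept.
            joined.append({**best, **l, "match_score": round(best_score, 2)})
    return joined


# Hypothetical sample data standing in for two of the sources on the slide.
card_accounts = [
    {"name": "Jonathan Q. Smith", "card_balance": 1200},
    {"name": "Maria Garcia", "card_balance": 300},
]
call_logs = [
    {"name": "jonathan q smith", "calls_last_30d": 4},
    {"name": "Marla Garcia", "calls_last_30d": 1},
    {"name": "Wei Chen", "calls_last_30d": 2},
]

customer_view = fuzzy_join(card_accounts, call_logs, key="name")
```

Each merged record carries attributes from both sources plus a `match_score`, which gives later steps (normalizing attributes, mapping ratings) a way to discount uncertain matches. Comparing every pair of records is quadratic, so a real job would first group candidates by a coarse blocking key before scoring them.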
Copyright 2010 Cloudera Inc. All rights reserved
13. Use Case: Sentiment Analysis
Internet generates a lot of chatter about brands
Understanding what’s being said is crucial to protecting brand value
Facebook, Twitter generate a lot of data for a global top brand
Capturing and processing direct feedback
Better engagement and alerting via Sentiment Analysis
Not yet ready for fully automated customer service
Hadoop handles the diverse data types and processing
Sources of data changing and semantics continuously evolving
Sophistication of algorithms is improving daily
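As a rough illustration of the alerting idea above, here is a minimal lexicon-based sentiment sketch. The word lists, the `sentiment_score` and `flag_for_alert` helpers, and the sample posts are all hypothetical; production systems use far richer models, and the per-post scoring shown here is the kind of work that would run in a map step on Hadoop.

```python
import re

# Tiny illustrative lexicons (assumed); a real system would use a much larger, maintained vocabulary.
POSITIVE = {"love", "great", "good", "fast", "helpful"}
NEGATIVE = {"hate", "terrible", "bad", "slow", "broken"}


def sentiment_score(text):
    """Positive word hits minus negative word hits: >0 positive, <0 negative, 0 neutral."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def flag_for_alert(posts, threshold=-1):
    """Return the posts negative enough to route to a human for follow-up."""
    return [p for p in posts if sentiment_score(p) <= threshold]


# Hypothetical posts standing in for social media chatter about a brand.
posts = [
    "Love the new app, support was great and fast",
    "The checkout page is broken and support is terrible",
    "Picked up my order today",
]

scores = [sentiment_score(p) for p in posts]
alerts = flag_for_alert(posts)
```

Flagged posts go to a person rather than triggering an automatic reply, in line with the point that the approach is not yet ready for fully automated customer service.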
Financial services companies are realizing that they need to understand the fundamental risk in their customer base. All of a bank’s working capital originates with customers. Being able to better predict fluctuations can help them optimize how to put that capital to work.
Much of the discussion about brands today happens in social media. This not only affects how a company is perceived but can also directly influence its relationships with customers and its ability to sell. Hadoop is a natural solution for gathering and contextualizing discussions about company brands and products.