This document discusses a project between Pentaho and Verizon to leverage big data analytics. Verizon generates vast amounts of call detail record (CDR) data from mobile networks that is currently stored in a data warehouse for 2 years and then archived to tape. Pentaho's platform will help optimize the data warehouse by using Hadoop to store all CDR data history. This will free up data warehouse capacity for high value data and allow analysis of the full 10 years of CDR data. Pentaho tools will ingest raw CDR data into Hadoop, execute MapReduce jobs to enrich the data, load results into Hive, and enable analyzing the data to understand calling patterns by geography over time.
9. Pentaho Platform Design Drivers
1. Big data is changing the world
2. Open systems are more innovative
3. Subscription models reduce cost and risk
4. Simplicity empowers the masses
5. Pluggable Java architectures enable flexibility and competitive advantage
6. Enterprise-wide integration reduces cost and complexity
7. Predictive technologies are the next big thing in analytics
11. Big Wireless - Wireless Carrier
3 Calling Plans
• Nationwide
• PAYG
• Prepaid 50
2 Business Units
• B2B
• B2C
7 Retail Stores
• San Francisco
• Boston
• NYC
• Paris
• Tokyo
• Sydney
• London
7 Product Lines
• Smartphones
• Home Phones
• Wifi Devices
• Modems
• Notebooks
• Tablets
• Accessories
3 Websites
• Ecommerce Site
• Reseller Portal
• Manufacturer Portal

Store Managers
Executives & Product Managers
Operations and Store Employees
Marketing & Customer Support
B2B Sales Organization

Databases
• Call Detail Records
• Retail Sales
• Website Clickstream
• Website User Registration
12. 2013 Performance Goals
Increase subscription revenue
Improve store profitability
Eliminate inventory stock outs
Leverage big data to maximize profits
Profile and target profitable customers
Improve supply chain visibility for partners
13. 2013 Performance Goals
Goals: Objectives / Enablers
• Increase subscription revenue: Analyze call data to upsell PAYG customers to subscriptions
• Improve store profitability: Hold store managers accountable by pushing store income statements to email
• Eliminate inventory stock outs: Empower store employees with iPads and real-time inventory reports
• Profile and target profitable customers: Profile mobile plan customers with high average call duration
• Leverage big data to maximize profits: Analyze e-commerce clickstream data in MongoDB to profile purchasing users and predict users' propensity to purchase
• Improve supply chain visibility for partners: Give phone manufacturers and resellers web access to secure sales reports
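The objective "profile mobile plan customers with high average call duration" can be illustrated with a minimal sketch. The input shape (pairs of customer ID and call duration in seconds), the function name, and the threshold are all assumptions made for illustration; the slides do not specify how the profiling is implemented.

```python
from collections import defaultdict


def high_duration_customers(calls, threshold_seconds):
    """Return customer IDs whose average call duration meets the threshold.

    `calls` is an iterable of (customer_id, duration_seconds) pairs.
    This record shape is hypothetical, not taken from the slides.
    """
    totals = defaultdict(lambda: [0, 0])  # customer -> [total seconds, call count]
    for customer_id, seconds in calls:
        totals[customer_id][0] += seconds
        totals[customer_id][1] += 1
    averages = {c: total / count for c, (total, count) in totals.items()}
    return sorted(c for c, avg in averages.items() if avg >= threshold_seconds)


# Illustrative data only.
sample_calls = [("a", 600), ("a", 300), ("b", 60), ("c", 450)]
profiled = high_duration_customers(sample_calls, threshold_seconds=400)
```

At production scale the same grouping would more plausibly run as a query or job inside the analytics platform; the sketch only shows the aggregation logic.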
14. Enterprise-Wide Analytics
Red River Mobile
3 Calling Plans
• Nationwide
• PAYG
• Prepaid 50
2 Business Units
• B2B
• B2C
7 Retail Stores
• San Francisco
• Boston
• NYC
• Paris
• Tokyo
• Sydney
• London
7 Product Lines
• Smartphones
• Home Phones
• Wifi Devices
• Modems
• Notebooks
• Tablets
• Accessories
3 Websites
• Ecommerce Site
• Reseller Portal
• Manufacturer Portal
10 Resellers
10 Phone Manufacturers

EXTERNAL / INTERNAL
• IFrame Integration
• Custom Widget Embedding
16. Mobile Network Provider
Call Detail Records (CDR)
• Mobile networks generate vast amounts of daily call data
• CDR tracks every voice, SMS, or location service
• 2 years of detailed CDR records in DW
• Archived to tape after 2 years
[Diagram: Data Warehouse Architecture. Data Sources: ERP, CRM, CDR. Data Warehouse (Master & Transactional Data). Analytic Data Mart(s). Tape Archive.]
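The retention rule on this slide (2 years of detailed CDR records in the data warehouse, then archived to tape) can be sketched as a small routing function. The function name, the tier labels, and the exact handling of the 2-year boundary are assumptions made for illustration.

```python
from datetime import date

RETENTION_YEARS = 2  # from the slide: 2 years of detailed CDR records in the DW


def storage_tier(call_date: date, today: date) -> str:
    """Return where a CDR would live under the current architecture.

    The tier names and the inclusive boundary are hypothetical choices.
    """
    try:
        cutoff = today.replace(year=today.year - RETENTION_YEARS)
    except ValueError:
        # today is Feb 29 and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - RETENTION_YEARS, day=28)
    return "data_warehouse" if call_date >= cutoff else "tape_archive"


recent = storage_tier(date(2012, 6, 1), today=date(2013, 1, 1))
old = storage_tier(date(2010, 12, 31), today=date(2013, 1, 1))
```

Anything routed to the tape tier is effectively unavailable for analysis, which is the limitation the rest of the deck addresses.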
17. Current Data Warehouse Architecture
[Diagram: Data Warehouse Architecture. Data Sources: ERP, CRM, CDR. Data Warehouse (Master & Transactional Data). Analytic Data Mart(s). Tape Archive.]
With Current EDW Architecture → With Hadoop
EDW stores only 2 years of data → Hadoop active archive for all history
Infrastructure at capacity → Frees EDW capacity for high value data
Expensive to scale → Lowers cost and inexpensive to scale
ETL process complex and slow → Streamlined ingestion of raw data
Only analyze 2 years of data → Analyze 10 years of data
18. Data Warehouse Optimization
[Diagram: Big Data Architecture. Data Sources: ERP, CRM, CDR, Logs, Other Data. Raw Data, Parsed Data, Analytic Datasets, Master Data. Data Warehouse (Master & Transactional Data). Analytic Data Mart(s). Tape Archive.]
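The progression from Raw Data to Parsed Data to Analytic Datasets can be sketched in plain Python, imitating the map and reduce steps in memory rather than on a Hadoop cluster. The CSV field layout, the sample records, and all function names below are assumptions made for illustration; the slides do not give the CDR format or the actual job logic.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw CDR layout (an assumption, not from the slides):
# caller_id,call_type,city,start_time_iso,duration_seconds
RAW_CDRS = [
    "1001,voice,Boston,2012-03-01T09:15:00,120",
    "1002,sms,Boston,2012-07-04T18:30:00,0",
    "1001,voice,Paris,2013-01-10T11:00:00,300",
    "not,a,valid,record",
]


def parse(line):
    """Raw data -> parsed data: return a dict, or None if the line is malformed."""
    parts = line.strip().split(",")
    if len(parts) != 5:
        return None
    caller, call_type, city, start, duration = parts
    try:
        return {
            "caller": caller,
            "type": call_type,
            "city": city,
            "year": datetime.fromisoformat(start).year,
            "duration": int(duration),
        }
    except ValueError:
        return None


def map_record(record):
    """Map step: key by (city, year), emit (call count, seconds)."""
    return (record["city"], record["year"]), (1, record["duration"])


def reduce_pairs(pairs):
    """Reduce step: sum call counts and durations per key."""
    totals = defaultdict(lambda: [0, 0])
    for key, (count, seconds) in pairs:
        totals[key][0] += count
        totals[key][1] += seconds
    return {key: tuple(value) for key, value in totals.items()}


def calls_by_city_and_year(lines):
    """Parsed data -> analytic dataset: calling volume by geography over time."""
    parsed = (parse(line) for line in lines)
    return reduce_pairs(map_record(r) for r in parsed if r is not None)


summary = calls_by_city_and_year(RAW_CDRS)
```

Keeping malformed lines out at the parse step, rather than failing the whole run, reflects the idea of ingesting raw data first and structuring it afterwards; whether the real pipeline behaves this way is not stated in the slides.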