At CenterPoint Energy, both structured and unstructured data continue to grow at a rapid pace. This growth presents many opportunities to deliver business value, and many challenges in controlling costs. To maximize the value of this data while controlling costs, CenterPoint Energy created a data lake using SAP HANA and Hadoop. During this presentation, CenterPoint will discuss its journey of moving smart meter data to Hadoop, how Hadoop is allowing CenterPoint to derive value from big data, and its future use-case roadmap.
6. Agenda
CenterPoint Energy Proprietary Information
About CenterPoint
Business Challenge
Design
Smart Meter Use Cases
CNP Architecture
Other Hadoop Initiatives
7. About
Publicly traded on the New York Stock Exchange
Headquartered in Houston, Texas
Over 5,000 square miles of electric transmission and distribution service area
Assets totaling $22 billion
More than 7,700 employees
CNP and its predecessor companies have been in business for over 140 years
Over 5.5 million metered customers
2.4 million smart meters
3,718 miles of transmission
52,639 miles of distribution

Business segments:
Electric Transmission & Distribution
Natural Gas Distribution
Competitive Natural Gas Sales and Services
8. Business Challenge
1+ PB of Smart Meter Data
2.4MM smart meters taking readings every 15 minutes, creating 230MM readings per day, or over 84 billion readings in a year
Regulatory requirements mandate that historical readings be available for 10 years
Uncompressed data growth of 8TB per month, and over 1PB over a 10-year period
Current DW technology is approaching end of life
Massive amounts of data stored in a proprietary vendor solution were hard to manage and carried a significantly high total cost of ownership
Need a cost-effective solution for today's analytics and regulatory requirements, and preparation for future use cases
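The volume figures on this slide are easy to sanity-check. A quick back-of-the-envelope calculation, assuming one reading per meter every 15 minutes:

```python
# Sanity check of the smart-meter data volumes quoted on the slide.
METERS = 2_400_000
INTERVALS_PER_DAY = 24 * 4                 # one reading every 15 minutes

readings_per_day = METERS * INTERVALS_PER_DAY      # 230.4 million
readings_per_year = readings_per_day * 365         # ~84.1 billion

GROWTH_TB_PER_MONTH = 8
ten_year_growth_pb = GROWTH_TB_PER_MONTH * 12 * 10 / 1024   # ~0.94 PB

print(f"{readings_per_day / 1e6:.1f}M readings/day")
print(f"{readings_per_year / 1e9:.1f}B readings/year")
print(f"{ten_year_growth_pb:.2f} PB uncompressed over 10 years")
```

The results line up with the slide: roughly 230MM readings per day, over 84 billion per year, and just under 1PB of uncompressed growth over the 10-year retention window.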
9. Vision for ADMP
Cost-effective, scalable data management platform
Data resides in the data tier that aligns with the required response time
Real-time reporting
Reliable
Support for future advanced use cases: streaming, machine learning, cognitive computing, etc.
11. Data Flow
Interval data is loaded into SAP HANA 3 times a day using SAP Data Services
• Intervals can be updated at any point, but the majority of updates happen within 13 months
After 13 months, interval data is aged from SAP HANA to Hive using Sqoop
• Interval data can still be updated occasionally after 13 months, e.g. after a meter firmware update
Master data is loaded into Hadoop using Sqoop
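The aging step might look something like the following Sqoop invocation. This is an illustrative sketch only: the host, credential path, table, and column names are assumptions, not CenterPoint's actual configuration.

```shell
# Sketch: age interval data older than 13 months from SAP HANA into a
# Hive staging table. Sqoop cannot write directly to a transactional
# (ACID) Hive table, so the data lands in staging first.
sqoop import \
  --driver com.sap.db.jdbc.Driver \
  --connect jdbc:sap://hana-host:30015/ \
  --username "$HANA_USER" --password-file /user/etl/.hana_pw \
  --query 'SELECT * FROM INTERVAL_READS
           WHERE READ_DATE < ADD_MONTHS(CURRENT_DATE, -13)
           AND $CONDITIONS' \
  --split-by PREMISE_ID \
  --hive-import --hive-table staging.interval_reads \
  --num-mappers 8
```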
12. Hive Design
A transactional Hive table is required for updates
A shell script is used to move data from staging to the transactional target, since Sqoop does not support inserts into a transactional table
Partitioned by day, with 8 buckets on premise identification number
File size aligned with the HDFS block size
Master data is bucketed the same as interval data to take advantage of performance gains during joins
Data is sorted during the insert to the transactional table
• If new data is inserted into a partition after the initial load, the partition is reloaded
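A minimal sketch of what this design implies in HiveQL, with assumed table and column names (Hive ACID tables must be bucketed and stored as ORC):

```sql
-- Illustrative sketch; table and column names are assumptions.
-- Transactional target: partitioned by day, 8 buckets on premise id.
CREATE TABLE interval_reads (
  premise_id   BIGINT,
  meter_id     STRING,
  read_ts      TIMESTAMP,
  kwh          DECIMAL(12,4)
)
PARTITIONED BY (read_date DATE)
CLUSTERED BY (premise_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Staging-to-target move, sorting during the insert
-- (Sqoop cannot write to the ACID table directly):
INSERT INTO TABLE interval_reads PARTITION (read_date)
SELECT premise_id, meter_id, read_ts, kwh, read_date
FROM staging.interval_reads
SORT BY premise_id, read_ts;
```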
13. Smart Meter Use Cases
Forecasting Model Engine
How do weather and consumer behavior impact load?
Weather response functions
Short-term and long-term forecasts
Weather normalization
14. Smart Meter Use Cases Continued
Diversion
Utilize interval and event data to detect and analyze any tampering or diversion attempt
15. Smart Meter Use Cases Continued
Usage History Portal
Web front-end for internal and external customers to
view interval data for a premise
Transformer Load Management
Identify at-risk transformers
Maximize usable life
Load Studies
Hourly loads by rate class, used in rate cases to allocate costs to rate classes
Previously, random samples were used
16. Other Hadoop Initiatives
Document Storage
Historical invoices
5 million gas & electric PDF invoices a month
10 years of history required
Sub-second response time required by the web front-end
Each invoice is less than 100 KB
Historical mainframe reports
The mainframe is being decommissioned, but business clients still need access to historical reports
A response time of less than 10 seconds is acceptable
Reports are converted to text files and stored as blobs in Hive
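A minimal sketch of the blob-in-Hive approach for the converted mainframe reports; table, column names, and the report key in the query are assumptions for illustration:

```sql
-- Illustrative sketch; names are assumptions.
-- Each converted mainframe report is stored whole as one text blob.
CREATE TABLE mainframe_reports (
  report_id    STRING,
  report_name  STRING,
  report_date  DATE,
  report_text  STRING   -- entire converted report body
)
STORED AS ORC;

-- The front-end then fetches a whole report by key
-- (hypothetical id value):
SELECT report_text
FROM mainframe_reports
WHERE report_id = 'RPT-2016-000123';
```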