Pouring the Foundation: The Journey to Big
Data Management at CenterPoint Energy
CenterPoint Energy Proprietary Information
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Utility Industry Challenges & Pressures
Aging Assets & Workforce
Data Growth
Regulatory Pressure
Alternative / Distributed Energy
Driving Innovation
In the power and utilities space, the Big Data challenge is centered on harnessing massive new influxes of information to meet business imperatives such as reliability & efficiency, safety & security, profitability, and an evolving intelligent grid serving an increasingly sophisticated customer base.
Source: PennWell, “Big Data: Business Insight for Power and Utilities”, February 2016
In addition to these challenges, customers are becoming more demanding.
Smart Grid Explosion
Big Data in the utilities sector can only get even bigger as the smart transformation of the industry accelerates. It is estimated that 680 million smart meters will be installed globally by 2017, leading to 280 petabytes of data a year.
Source: Capgemini Consulting, “Big Data BlackOut: Are Utilities Powering Up Their Data Analytics?”, 2015
https://www.marsdd.com/wp-content/uploads/2014/08/MaRS-ConnectedWorld-AMI-Figure2-GlobalSmartMeterInstallations.jpg
Utility Analytics Spending on the Rise
Market analyst GTM Research predicts global utility company expenditure on data analytics will grow from $700m in 2012 to
$3.8bn in 2020, with gas, electricity, and water suppliers in all regions of the world increasing their investment.
Source: Engineering and Technology Magazine “How utilities are profiting from Big Data analytics”, January 20, 2014
Actionable Intelligence Transforms Energy & Utilities
Data sources:
• Asset Data
• Customer Surveys
• Weather & Environmental
• Service Fleet GPS Data
• Smart Meter Streams
• Commodity Prices
• Social Media
• GIS Data
• SCADA
• Outage Histories
• CIS Records
• EDW
Use cases:
• Revenue Protection
• Single View of Customer
• Predictive Equipment Maintenance
• Conservation Voltage Reduction
• Next Best Action Programs
Agenda
About CenterPoint
Business Challenge
Design
Smart Meter Use Cases
CNP Architecture
Other Hadoop Initiatives
About
• Publicly traded on the New York Stock Exchange
• Headquartered in Houston, Texas
• Over 5,000 square miles of electric transmission and distribution service area
• Assets total $22 billion
• Over 7,700 employees
• CNP and its predecessor companies have been in business for over 140 years
• Over 5.5 million metered customers
• 2.4 million smart meters
• 3,718 miles of transmission
• 52,639 miles of distribution
• Electric Transmission & Distribution
• Natural Gas Distribution
• Competitive Natural Gas Sales and Services
Business Challenge
1+ PB of Smart Meter Data
• 2.4 million smart meters take a reading every 15 minutes, creating 230 million readings per day, or over 84 billion readings a year.
• Regulations require historical readings to be available for 10 years.
• Uncompressed data growth of 8 TB per month, and over 1 PB in a 10-year period.
• Current DW technology is approaching end of life.
• Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a high total cost of ownership.
• Need a cost-effective solution for today's analytics, regulatory requirements, and preparation for future use cases.
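The volume figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the constant names are illustrative, and the storage estimate assumes a flat growth rate and decimal units (1 PB = 1,000 TB).

```python
# Figures quoted on the slide.
METERS = 2_400_000
READ_INTERVAL_MINUTES = 15
RETENTION_YEARS = 10
GROWTH_TB_PER_MONTH = 8

# One reading every 15 minutes is 96 readings per meter per day.
readings_per_meter_per_day = 24 * 60 // READ_INTERVAL_MINUTES
readings_per_day = METERS * readings_per_meter_per_day
readings_per_year = readings_per_day * 365

# Retained volume if growth stays flat at 8 TB per month (an assumption).
retained_tb = GROWTH_TB_PER_MONTH * 12 * RETENTION_YEARS

print(f"{readings_per_day:,} readings per day")      # 230,400,000
print(f"{readings_per_year:,} readings per year")    # 84,096,000,000
print(f"{retained_tb} TB retained after {RETENTION_YEARS} years")  # 960
```

At a flat 8 TB per month the retained volume comes to roughly 0.96 PB, the same order of magnitude as the 1 PB figure on the slide.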
Vision for ADMP
Cost-effective, scalable data management platform
Data resides in the data tier that matches the required response time
Real-time reporting
Reliable
Supports future advanced use cases: streaming, machine learning, cognitive computing, etc.
Architecture
[Architecture diagram: Data Sources (Traditional: OLTP, OLAP, RDBMS; Unstructured), ETL and Streaming, Data Lake, Applications]
Data Flow
Interval data is loaded to SAP HANA 3 times a
day using SAP Data Services
• Intervals can be updated at any point but the majority of the
updates happen within 13 months
After 13 months, interval data is aged from SAP
HANA to Hive using Sqoop
• Interval data can still be updated occasionally after 13 months, e.g., after a meter firmware update
Master data is loaded into Hadoop using Sqoop
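The aging step can be sketched as a cutoff calculation plus a Sqoop import of everything older than the cutoff. This is a minimal illustration, not the pipeline CenterPoint runs: the month-level granularity, host name, credentials, table, column, and staging path are all hypothetical, and the JDBC URL follows the usual SAP HANA form. Only standard Sqoop import options are used.

```python
import shlex
from datetime import date
from typing import List


def aging_cutoff(today: date, months: int = 13) -> date:
    """First day of the month that lies `months` months before `today`."""
    total = today.year * 12 + (today.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


def sqoop_import_args(cutoff: date) -> List[str]:
    """Build a Sqoop import command for intervals older than the cutoff.

    Host, user, table, column, and path names are hypothetical placeholders.
    """
    return [
        "sqoop", "import",
        "--connect", "jdbc:sap://hana-host:30015",
        "--driver", "com.sap.db.jdbc.Driver",
        "--username", "etl_user",
        "--password-file", "/user/etl/.hana_password",
        "--table", "INTERVAL_READS",
        "--where", f"READ_DATE < '{cutoff.isoformat()}'",
        "--target-dir", f"/staging/interval/before_{cutoff.isoformat()}",
        "--split-by", "PREMISE_ID",
        "--num-mappers", "8",
    ]


cutoff = aging_cutoff(date(2016, 6, 15))
print(cutoff)  # 2015-05-01
print(" ".join(shlex.quote(arg) for arg in sqoop_import_args(cutoff)))
```

The import lands in a staging directory rather than in the final table, which matches the staging-then-transactional pattern described on the Hive Design slide.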
Hive Design
Transactional Hive table required for updates
A shell script moves data from staging to the transactional target, because Sqoop does not support inserts into a transactional table
Partitioned by day, with 8 buckets on premise identification number
File size aligned with HDFS block size
Master data is bucketed the same way as interval data to take advantage of performance gains during joins
Data is sorted during the insert to the transactional table
• If new data is inserted to a partition after the initial load, the partition is reloaded
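Why bucketing both tables the same way helps joins can be shown with a small model. The sketch below is a simplified stand-in, not Hive's implementation: it assigns buckets with a plain modulo (Hive uses its own hash function), and the row fields and sample values are made up. The point is that when both sides use the same key and bucket count, bucket i of the interval data only ever needs to be matched against bucket i of the master data.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # bucket count from the slide


def bucket_for(premise_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Simplified bucket assignment: premise id modulo the bucket count."""
    return premise_id % num_buckets


def bucketize(rows, key):
    """Group rows into buckets by the bucketing key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key])].append(row)
    return buckets


def bucketed_join(intervals, master):
    """Join each interval bucket only against the matching master bucket."""
    interval_buckets = bucketize(intervals, "premise_id")
    master_buckets = bucketize(master, "premise_id")
    joined = []
    for bucket, rows in interval_buckets.items():
        # Only the master rows in the same bucket are loaded for the lookup.
        lookup = {m["premise_id"]: m for m in master_buckets.get(bucket, [])}
        for row in rows:
            match = lookup.get(row["premise_id"])
            if match is not None:
                joined.append({**row, **match})
    return joined


# Made-up sample rows.
intervals = [
    {"premise_id": 1001, "read_date": "2016-06-01", "kwh": 0.42},
    {"premise_id": 1009, "read_date": "2016-06-01", "kwh": 0.35},
    {"premise_id": 2004, "read_date": "2016-06-01", "kwh": 1.10},
]
master = [
    {"premise_id": 1001, "rate_class": "RES"},
    {"premise_id": 1009, "rate_class": "RES"},
    {"premise_id": 2004, "rate_class": "COM"},
]

for row in bucketed_join(intervals, master):
    print(row)
```

Each per-bucket lookup holds roughly one eighth of the master data, which is the source of the join performance gain the slide refers to.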
Smart Meter Use Cases
Forecasting Model Engine
• How do weather and consumer behavior impact load?
• Weather response functions
• Short-term and long-term forecasts
• Weather normalization
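The slide does not say how the forecasting engine models weather. As one common illustration of a weather response function and weather normalization, the sketch below fits a straight line of daily load against cooling degree days and then adjusts an observed load to "normal" weather. The 65 °F base temperature, the single-variable linear model, and all sample values are assumptions made for the example.

```python
def cooling_degree_days(mean_temp_f: float, base_f: float = 65.0) -> float:
    """Degrees by which the daily mean temperature exceeds the base."""
    return max(0.0, mean_temp_f - base_f)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weather_normalize(load, actual_cdd, normal_cdd, slope):
    """Remove the part of the load explained by abnormal weather."""
    return load - slope * (actual_cdd - normal_cdd)


# Made-up daily mean temperatures (°F) and daily loads (MWh).
temps = [70.0, 80.0, 90.0, 85.0, 75.0]
loads = [120.0, 160.0, 200.0, 180.0, 140.0]

cdd = [cooling_degree_days(t) for t in temps]
slope, intercept = fit_line(cdd, loads)
print(f"weather response: load = {intercept:.1f} + {slope:.1f} * CDD")

# Normalize the hottest day to a hypothetical "normal" of 15 CDD.
normalized = weather_normalize(loads[2], cdd[2], normal_cdd=15.0, slope=slope)
print(f"normalized load: {normalized:.1f} MWh")
```

A production model would typically add heating degree days, calendar effects, and customer behavior terms; the single slope here only shows the mechanics.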
Smart Meter Use Cases Continued
Diversion
• Use interval and event data to detect and analyze any tamper or diversion attempt
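The deck does not describe the detection logic. One simple way to combine event and interval data, shown here purely as a hypothetical sketch, is to flag a premise when a meter tamper event is followed by a sharp, sustained drop in daily usage. The 7-day window and 50% drop threshold are arbitrary example values.

```python
from statistics import mean


def flag_possible_diversion(daily_kwh, tamper_days, window=7, drop_ratio=0.5):
    """Return tamper-event day indexes that are followed by a usage drop.

    daily_kwh   -- daily kWh totals for one premise, derived from interval data
    tamper_days -- indexes into daily_kwh where the meter logged a tamper event
    """
    flagged = []
    for day in tamper_days:
        before = daily_kwh[max(0, day - window):day]
        after = daily_kwh[day:day + window]
        if not before or not after:
            continue  # not enough history on one side of the event
        if mean(after) < drop_ratio * mean(before):
            flagged.append(day)
    return flagged


# Made-up series: usage falls from 30 kWh/day to 10 kWh/day at day 10.
usage = [30.0] * 10 + [10.0] * 10
print(flag_possible_diversion(usage, tamper_days=[3, 10]))  # [10]
```

The event at day 3 is not flagged because usage is unchanged afterwards; the event at day 10 is flagged because average usage drops to a third of its prior level.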
Smart Meter Use Cases Continued
Usage History Portal
• Web front-end for internal and external customers to view interval data for a premise
Transformer Load Management
• Identify at-risk transformers
• Maximize usable life
Load Studies
• Hourly loads by rate class are used in rate cases to allocate cost to rate classes
• Previously, random samples were used
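With interval data for every meter, the hourly load by rate class becomes a straightforward aggregation rather than a sample-based estimate. The sketch below is a minimal illustration that assumes readings are 15-minute kWh values; the field layout, rate class codes, and sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_load_by_rate_class(intervals, rate_class_of):
    """Sum 15-minute kWh readings into hourly totals per rate class.

    intervals     -- iterable of (premise_id, interval_start, kwh)
    rate_class_of -- dict mapping premise_id to its rate class
    """
    totals = defaultdict(float)
    for premise_id, start, kwh in intervals:
        hour = start.replace(minute=0, second=0, microsecond=0)
        totals[(rate_class_of[premise_id], hour)] += kwh
    return dict(totals)


# Made-up sample data.
rate_class_of = {1: "RES", 2: "COM"}
readings = [
    (1, datetime(2016, 6, 1, 13, 0), 0.25),
    (1, datetime(2016, 6, 1, 13, 15), 0.25),
    (1, datetime(2016, 6, 1, 13, 30), 0.25),
    (1, datetime(2016, 6, 1, 13, 45), 0.25),
    (2, datetime(2016, 6, 1, 13, 0), 2.0),
    (2, datetime(2016, 6, 1, 14, 0), 1.5),
]

for (rate_class, hour), kwh in sorted(hourly_load_by_rate_class(readings, rate_class_of).items()):
    print(rate_class, hour.isoformat(), kwh)
```

Because the readings are energy values, each hourly kWh total is also the average demand in kW for that hour.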
Other Hadoop Initiatives
Document Storage
• Historical invoices
  • 5 million gas & electric PDF invoices a month
  • 10 years of history required
  • Sub-second response time required by web front-end
  • Less than 100 KB
• Historical mainframe reports
  • Mainframe is being decommissioned, but business clients still need access to historical reports
  • Response time of less than 10 seconds is acceptable
  • Reports are converted to text files and stored as blobs in Hive
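The invoice figures above give a rough upper bound on the archive size. The sketch below assumes that "Less than 100 KB" refers to the size of a single invoice PDF and uses decimal units; both are assumptions for the sake of the estimate.

```python
# Figures quoted on the slide.
INVOICES_PER_MONTH = 5_000_000
RETENTION_YEARS = 10
MAX_INVOICE_KB = 100  # assumed to be a per-invoice upper bound

total_invoices = INVOICES_PER_MONTH * 12 * RETENTION_YEARS

# Convert KB to TB in decimal units (1 TB = 1,000,000,000 KB).
max_total_tb = total_invoices * MAX_INVOICE_KB / 1_000_000_000

print(f"{total_invoices:,} invoices over {RETENTION_YEARS} years")  # 600,000,000
print(f"at most about {max_total_tb:.0f} TB before replication")     # 60
```

The total volume is modest next to the interval data; the harder requirement is serving any one of several hundred million small documents with sub-second latency.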

 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Pouring the Foundation: Data Management in the Energy Industry

  • 1. Pouring the Foundation: The Journey to Big Data Management at CenterPoint Energy CenterPoint Energy Proprietary Information
  • 2. Utility Industry Challenges & Pressures (© Hortonworks Inc. 2011 – 2016. All Rights Reserved)
      • Aging Assets & Workforce
      • Data Growth
      • Regulatory Pressure
      • Alternative / Distributed Energy
      • Driving Innovation
      In the power and utilities space, the Big Data challenge is centered on harnessing massive new influxes of information to meet business imperatives such as reliability & efficiency, safety & security, profitability, and an evolving intelligent grid serving an increasingly sophisticated customer base. Source: PennWell, “Big Data: Business Insight for Power and Utilities”, February 2016
      In addition to these challenges, customers are becoming more demanding!
  • 3. Smart Grid Explosion
      Big Data in the utilities sector can only get even bigger as the smart transformation of the industry accelerates. It is estimated that 680 million smart meters will be installed globally by 2017 – leading to 280 petabytes of data a year. Source: Capgemini Consulting, “Big Data BlackOut: Are Utilities Powering Up Their Data Analytics?”, 2015
      Figure (global smart meter installations): https://www.marsdd.com/wp-content/uploads/2014/08/MaRS-ConnectedWorld-AMI-Figure2-GlobalSmartMeterInstallations.jpg
  • 4. Utility Analytics Spending on the Rise
      Market analyst GTM Research predicts global utility company expenditure on data analytics will grow from $700m in 2012 to $3.8bn in 2020, with gas, electricity, and water suppliers in all regions of the world increasing their investment. Source: Engineering and Technology Magazine, “How utilities are profiting from Big Data analytics”, January 20, 2014
  • 5. Actionable Intelligence Transforms Energy & Utilities
      • Data sources: Asset Data, Customer Surveys, Weather & Environmental, Service Fleet GPS Data, Smart Meter Streams, Commodity Prices, Social Media, GIS Data, SCADA, Outage Histories, CIS Records, EDW
      • Use cases: Revenue Protection, Single View of Customer, Predictive Equipment Maintenance, Conservation Voltage Reduction, Next Best Action Programs
  • 6. Agenda: About CenterPoint, Business Challenge, Design, Smart Meter Use Cases, CNP Architecture, Other Hadoop Initiatives
  • 7. About CenterPoint
      • Publicly traded on the New York Stock Exchange
      • Headquartered in Houston, Texas
      • Over 5,000 square miles of electric transmission and distribution service area
      • Assets total $22 billion
      • Over 7,700 employees
      • CNP and its predecessor companies have been in business for over 140 years
      • Over 5.5 million metered customers
      • 2.4 million smart meters
      • 3,718 miles of transmission
      • 52,639 miles of distribution
      • Electric Transmission & Distribution
      • Natural Gas Distribution
      • Competitive Natural Gas Sales and Services
  • 8. Business Challenge: 1+ PB of Smart Meter Data
      • 2.4MM smart meters taking readings every 15 minutes, creating 230MM readings per day, or over 84 billion readings in a year.
      • Regulations require historical readings to be available for 10 years.
      • Uncompressed data growth of 8 TB per month and over 1 PB in a 10-year period.
      • Current DW technology is approaching end of life.
      • Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a high total cost of ownership.
      • Need a cost-effective solution for today's analytics, regulatory requirements, and preparation for future use cases.
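The volume figures on the Business Challenge slide can be checked with a few lines of arithmetic. This is only a sketch: the inputs are the slide's numbers, while the constant growth rate and the 365-day year are simplifying assumptions. At a flat 8 TB per month the ten-year total comes to 960 TB, so the slide's "over 1 PB" presumably allows for rounding or for growth in the meter population.

```python
# Inputs taken from the slide.
METERS = 2_400_000           # smart meters
READ_INTERVAL_MIN = 15       # minutes between readings
GROWTH_TB_PER_MONTH = 8      # uncompressed growth
RETENTION_YEARS = 10         # regulatory retention period

# Derived volumes (assumes a 365-day year and a constant growth rate).
readings_per_meter_per_day = 24 * 60 // READ_INTERVAL_MIN
readings_per_day = METERS * readings_per_meter_per_day
readings_per_year = readings_per_day * 365
retained_tb = GROWTH_TB_PER_MONTH * 12 * RETENTION_YEARS

print(f"{readings_per_meter_per_day} readings per meter per day")
print(f"{readings_per_day:,} readings per day")
print(f"{readings_per_year:,} readings per year")
print(f"{retained_tb} TB retained over {RETENTION_YEARS} years")
```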
  • 9. Vision for ADMP
      • Cost-effective, scalable data management platform
      • Data resides in the data tier that aligns with the response time required
      • Real-time reporting
      • Reliable
      • Support future advanced use cases: streaming, machine learning, cognitive computing, etc.
  • 10. Architecture (diagram; labels: Applications, Data Lake, ETL and Streaming, Data Sources: Traditional (OLTP, OLAP, RDBMS) and Unstructured)
  • 11. Data Flow
      • Interval data is loaded to SAP HANA 3 times a day using SAP Data Services
          • Intervals can be updated at any point, but the majority of the updates happen within 13 months
      • After 13 months, interval data is aged from SAP HANA to Hive using Sqoop
          • Interval data can still be updated occasionally after 13 months, e.g. after a meter firmware update
      • Master data is loaded into Hadoop using Sqoop
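The aging rule in the Data Flow slide can be sketched as a small routing function. The 13-month threshold comes from the slide; the function names, the whole-calendar-month definition of age, and the choice to treat exactly 13 months as already aged are assumptions made for illustration, not CenterPoint's actual logic.

```python
from datetime import date

HOT_TIER_MONTHS = 13  # from the slide: interval data stays in SAP HANA for 13 months


def months_old(reading_day: date, today: date) -> int:
    """Whole calendar months elapsed between reading_day and today (assumed definition)."""
    months = (today.year - reading_day.year) * 12 + (today.month - reading_day.month)
    if today.day < reading_day.day:
        months -= 1  # the current month is not yet complete
    return months


def storage_tier(reading_day: date, today: date) -> str:
    """Return the tier expected to hold the interval: 'hana' (hot) or 'hive' (aged)."""
    return "hive" if months_old(reading_day, today) >= HOT_TIER_MONTHS else "hana"


print(storage_tier(date(2015, 1, 15), date(2016, 2, 14)))  # 12 whole months old
print(storage_tier(date(2015, 1, 15), date(2016, 2, 15)))  # 13 whole months old
```

A late update, such as one following a meter firmware change, would then be directed at whichever tier this function names for the reading's date.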
  • 12. Hive Design
      • Transactional Hive table required for updates
      • Shell script used to move data from staging to the transactional target
          • Sqoop does not support inserts into a transactional table
      • Partitioned by day with 8 buckets on premise identification number
      • File size aligned with HDFS block size
      • Master data bucketed the same as interval data to take advantage of performance gains during joins
      • Data is sorted during the insert to the transactional table
          • If new data is inserted to a partition after the initial load, the partition is reloaded
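To illustrate why bucketing master data the same way as interval data helps joins, here is a minimal in-memory sketch of the layout described in the Hive Design slide: rows are grouped by day partition and by one of 8 buckets on the premise ID, then joined bucket by bucket. The modulo hash is a stand-in (Hive's real bucketing function depends on column type and version), integer premise IDs are assumed, and `rate_class` is a hypothetical master-data attribute.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # from the slide


def bucket_of(premise_id: int) -> int:
    # Stand-in hash for illustration only; not Hive's actual bucketing function.
    return premise_id % NUM_BUCKETS


def layout_intervals(rows):
    """Group (premise_id, day, kwh) rows into {(day, bucket): sorted rows}."""
    layout = defaultdict(list)
    for premise_id, day, kwh in rows:
        layout[(day, bucket_of(premise_id))].append((premise_id, day, kwh))
    for part in layout.values():
        part.sort()  # mirrors "data is sorted during the insert"
    return dict(layout)


def bucket_join(interval_layout, master_rows):
    """Join each interval bucket only against the master bucket with the same number."""
    master_by_bucket = defaultdict(dict)
    for premise_id, rate_class in master_rows:
        master_by_bucket[bucket_of(premise_id)][premise_id] = rate_class
    joined = []
    for (day, bucket), part in sorted(interval_layout.items()):
        lookup = master_by_bucket[bucket]  # no other master bucket is consulted
        for premise_id, d, kwh in part:
            if premise_id in lookup:
                joined.append((premise_id, d, kwh, lookup[premise_id]))
    return joined


intervals = [(17, "2016-01-01", 0.5), (9, "2016-01-01", 0.7), (17, "2016-01-02", 0.4)]
master = [(17, "RES"), (9, "COM")]
for row in bucket_join(layout_intervals(intervals), master):
    print(row)
```

Because both sides use the same bucket function and bucket count, each interval bucket needs to see only one eighth of the master data, which is the performance gain the slide refers to.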
  • 13. Smart Meter Use Cases: Forecasting Model Engine
      • How do weather and consumer behavior impact load?
      • Weather response functions
      • Short-term and long-term forecasts
      • Weather normalization
  • 14. Smart Meter Use Cases Continued: Diversion
      • Utilize interval and event data to detect and analyze any tamper or diversion attempt
  • 15. Smart Meter Use Cases Continued
      • Usage History Portal
          • Web front-end for internal and external customers to view interval data for a premise
      • Transformer Load Management
          • Identify at-risk transformers
          • Maximize usable life
      • Load Studies
          • Hourly loads by rate class used in rate cases to allocate cost to rate classes
          • Previously random samples were used
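The Load Studies item amounts to rolling 15-minute interval readings up to hourly totals per rate class, using every meter rather than a random sample. The sketch below shows that aggregation under stated assumptions: the record shape `(premise_id, timestamp, kwh)`, the premise-to-rate-class mapping, the class codes, and the convention that a timestamp marks the start of its interval are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_load_by_rate_class(intervals, rate_class_of):
    """Sum interval kWh into {(rate_class, hour_start): kwh}.

    intervals:     iterable of (premise_id, timestamp, kwh)
    rate_class_of: dict mapping premise_id to a rate class code
    """
    totals = defaultdict(float)
    for premise_id, ts, kwh in intervals:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        totals[(rate_class_of[premise_id], hour_start)] += kwh
    return dict(totals)


readings = [
    (1, datetime(2016, 1, 1, 0, 0), 0.25),
    (1, datetime(2016, 1, 1, 0, 15), 0.25),
    (1, datetime(2016, 1, 1, 0, 30), 0.25),
    (1, datetime(2016, 1, 1, 0, 45), 0.25),
    (2, datetime(2016, 1, 1, 0, 15), 0.5),
]
classes = {1: "RES", 2: "COM"}
for key, kwh in sorted(hourly_load_by_rate_class(readings, classes).items()):
    print(key, kwh)
```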
  • 16. Other Hadoop Initiatives: Document Storage
      • Historical invoices
          • 5 million gas & electric PDF invoices a month
          • 10 years of history required
          • Sub-second response time required by web front-end
          • Less than 100 KB
      • Historical mainframe reports
          • Mainframe is being decommissioned but business clients still need access to historical reports
          • Response time less than 10 seconds is acceptable
          • Reports are converted to text files and stored as blobs in Hive
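A rough upper bound on the invoice archive follows from the figures on the Document Storage slide. This sketch assumes that "less than 100 KB" is a per-invoice size limit, that the monthly volume stays constant, and that decimal units apply (1 TB = 10^9 KB); none of these is stated in the deck.

```python
# Inputs taken from the slide.
INVOICES_PER_MONTH = 5_000_000
RETENTION_YEARS = 10
MAX_INVOICE_KB = 100  # assumed to be a per-invoice upper bound

# Derived totals (constant monthly volume, decimal units).
total_invoices = INVOICES_PER_MONTH * 12 * RETENTION_YEARS
max_total_tb = total_invoices * MAX_INVOICE_KB / 1_000_000_000

print(f"{total_invoices:,} invoices retained")
print(f"at most {max_total_tb:.0f} TB of PDFs")
```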