Energy Usage Insights
with Hadoop & HBase
July 25, 2013
Scott Kuehn, Data Architect
Oren Benjamin, Senior Software Engineer
Our Utility Partners
Australia, New Zealand, France, Nova Scotia, UK
Energy Usage Insights
26 July 2013
Home Energy Report
Energy Savings
[Chart: energy saved (0.0% to 4.5%) vs. months since program start (1 to 38)]
Average steady-state savings: ~1.5% to 3.5%
Impact
$300,000,000
2,500,000,000 kWh
4,000,000,000 lbs CO2
Web Portal
Data Overview: Energy Usage Streams
meter  usage    cost   start                end
0001   719.23   57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
0001   742.61   59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
0002   0.2050          2013-01-01T00:00:00  2013-01-01T00:15:00
0002   0.2250          2013-01-01T00:15:00  2013-01-01T00:30:00
0002   0.2350          2013-01-01T00:30:00  2013-01-01T00:45:00
0002   0.2050          2013-01-01T00:45:00  2013-01-01T01:00:00
0002   0.2250          2013-01-01T01:00:00  2013-01-01T01:15:00

0001: Meter (bills)
0002: Smart meter (quarter-hourly reads)
Data Overview: Smart Meter
Data Overview: Entities
[Diagram: relationships among Customer, Account, Site, and Meter entities]
Data Overview: Size
» Billing data: 60M households
» Smart meter data: 15M households
» On disk: 5TB (raw)
» More smart meter data than all other data combined
Architecture: Usage Data Store
[Diagram: Customer, Account, Site, and Meter entities in the usage data store]
HBase + Hadoop Architecture v1.0
[Diagram components: MySQL report/AMI DBs, Sqoop, HDFS, M/R, HBase, batch workers, web servers]
[Data flows labeled: meter metadata, usage data]
HBase + Hadoop Architecture v2.0
[Diagram components: MySQL report/AMI DBs, HDFS file upload, HDFS, M/R, HBase, batch workers, web servers]
[Data flows labeled: meter metadata, metadata requests, usage data]
Data Schema: Kiji
Kiji Schema
»  Table layout definition
»  Schema management
»  Object serialization
»  Entity-centric data model
Supporting Projects
»  Kiji MR
»  Kiji Hive Adapter
»  Kiji REST
»  ...
Entity-centric Table: Row Key
Hash prefix   Utility company   Site ID
1 byte        4 bytes           8 bytes

"keys_format": {
  "encoding": "FORMATTED",
  "salt": { "hash_type": "MD5", "hash_size": 1 },
  "components": [
    { "name": "utility_company", "type": "INTEGER" },
    { "name": "site_id", "type": "LONG" }
  ]
}
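The 1 + 4 + 8 byte layout above can be sketched in plain Java. This is only an illustration of the byte layout, not Kiji's actual key encoder: the class and method names are hypothetical, and it assumes the salt is the first byte of an MD5 digest over the big-endian encoded components.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: mirrors the slide's key layout, not Kiji's real encoder. */
final class SiteRowKeySketch {

    /** Builds a 13-byte key: [1-byte hash prefix][4-byte utility company][8-byte site ID]. */
    static byte[] rowKey(int utilityCompany, long siteId) {
        // Encode the two components big-endian (ByteBuffer's default byte order).
        byte[] components = ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(utilityCompany)
                .putLong(siteId)
                .array();
        // Assumption: the salt is the first byte of the MD5 digest of the components.
        byte salt = md5(components)[0];
        return ByteBuffer.allocate(1 + components.length)
                .put(salt)
                .put(components)
                .array();
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable on this JVM", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = rowKey(7, 123456789L);
        System.out.println("key length = " + key.length);
    }
}
```

A hash prefix of this kind is typically used so that rows with sequential site IDs do not all land in the same region.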
Entity-centric Table: Site
A single row
Usage Data Column Family: stream:0, stream:1, stream:2, stream:3
  (example cells: 0.12 kWh, 1.3 Therm, 24 Therm, 356 kWh)
Insights Column Family: uua:0, bill_forecast:0
  (example UUA cell: June 18 - July 17; $25)
Insight Example: Rate Calculation
Insights: Jobs & Services
»  M/R jobs to compute insights in batch
»  Services to access pre-computed insights / compute insights on demand
»  Insight for a Site is calculated based on the data in the Site’s row
»  The calculated insight is saved back to the Site row
Insight Example: Rate Calculation
[Diagram: a Rate Calculation MapReduce job reads stream:0 … stream:n from the site row's usage data column family and writes the rate calculation into the insights column family, next to the bill forecast]
Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public void produce(KijiRowData siteRowData,
                      ProducerContext context) {
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }
}
Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public void produce(KijiRowData siteRowData,
                      ProducerContext context) {
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }

  @Override
  public String getOutputColumn() {
    return "rate_calculation";
  }
}
public class RateCalculationProducer extends KijiProducer {

  @Override
  public KijiDataRequest getDataRequest() {
    Configuration conf = getConf();
    long startTime = parseLong(conf.get(START_PARAM));

    return KijiDataRequest.builder()
        .withTimeRange(startTime, END_OF_TIME)
        .addColumns(ColumnsDef.create()
            .withMaxVersions(ALL_VERSIONS)
            .addFamily("usage_data"))
        .build();
  }

  @Override
  public void produce(KijiRowData siteRowData, ...
In practice: design decisions and challenges
»  ETL to an entity-centric schema
»  Bulk loading
»  Mixed workloads
In practice: ETL to entity-centric schema
»  Use bulk loading for performance
»  Make the ingest process idempotent
»  Introduce a read-log for utility company billing corrections
»  ETL steps:
1. Ingest all reads into a read-log table
2. Load reads into the corresponding Site row
[Diagram: Billing files → (1) M/R bulk load → Read-log table → (2) pivot, M/R bulk load → Site table]
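The two ETL steps can be sketched with in-memory maps standing in for the HBase tables. Everything here is illustrative: the class and field names are hypothetical, the (meter, start) identity of a read is an assumption made so that replays and billing corrections overwrite rather than append, and the meter-to-site mapping is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the two-step ETL; the real pipeline uses M/R bulk loads into HBase. */
final class ReadLogEtlSketch {

    /** One usage read; fields follow the columns of the sample billing data. */
    static final class Read {
        final String meter;
        final String start;
        final String end;
        final double usage;

        Read(String meter, String start, String end, double usage) {
            this.meter = meter;
            this.start = start;
            this.end = end;
            this.usage = usage;
        }
    }

    // Step 1 target. Assumption: a read is identified by (meter, start), so loading the
    // same file twice, or loading a corrected read, replaces the earlier entry.
    final Map<String, Read> readLog = new TreeMap<>();

    void ingest(List<Read> billingFile) {
        for (Read r : billingFile) {
            readLog.put(r.meter + "|" + r.start, r);
        }
    }

    /** Step 2: pivot the read log into one list of reads per site. */
    Map<String, List<Read>> pivotToSites(Map<String, String> meterToSite) {
        Map<String, List<Read>> siteRows = new TreeMap<>();
        for (Read r : readLog.values()) {
            String site = meterToSite.get(r.meter);
            if (site == null) {
                continue; // meter not attached to a known site; skipped in this sketch
            }
            siteRows.computeIfAbsent(site, s -> new ArrayList<>()).add(r);
        }
        return siteRows;
    }
}
```

Because ingest is a keyed overwrite, running it again with the same file leaves the read log unchanged, which is the property the idempotency bullet asks for.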
In practice: bulk loading
»  Bulk loaded files are not assigned sequence numbers
»  All compactions become major compactions
»  Solution: Find a temporary fix, monitor the HBase JIRA
In practice: Mixed workloads
[Diagram: clients of the Site table (reporting apps, web servers, M/R) and their workloads (ad-hoc reads and forecasts, batch insight calculations, bulk scans)]
In practice: Mixed workloads
»  Supporting mixed workloads requires adapting jobs and configurations
»  IO: Switch to bulkloading, enable direct HDFS reads
»  Major compactions: Disabled
»  Memory: increase heap and region sizes, use MSLAB
»  Verify performance by simulating nominal and high load scenarios
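As a rough illustration, the tuning bullets above might map onto settings like the following. The slides do not list the properties that were actually changed, so the mapping is an assumption: "direct HDFS reads" is read here as HDFS short-circuit local reads, and the region size value is a placeholder. The class name and the rendering helper are hypothetical; heap size is normally set separately through `HBASE_HEAPSIZE` in `hbase-env.sh`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that renders candidate settings as site-XML property entries. */
final class MixedWorkloadSettingsSketch {

    static Map<String, String> settings() {
        Map<String, String> s = new LinkedHashMap<>();
        // IO: assumed to mean short-circuit local reads that bypass the DataNode.
        s.put("dfs.client.read.shortcircuit", "true");
        // Major compactions: an interval of 0 turns off the periodic trigger.
        s.put("hbase.hregion.majorcompaction", "0");
        // Memory: MemStore-local allocation buffers to reduce heap fragmentation.
        s.put("hbase.hregion.memstore.mslab.enabled", "true");
        // Memory: larger regions; 10 GiB is a placeholder, not a value from the talk.
        s.put("hbase.hregion.max.filesize", "10737418240");
        return s;
    }

    static String toSiteXml(Map<String, String> settings) {
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            xml.append("<property><name>").append(e.getKey())
               .append("</name><value>").append(e.getValue())
               .append("</value></property>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(toSiteXml(settings()));
    }
}
```

Whatever values are chosen, the last bullet still applies: verify them under simulated nominal and high load before relying on them.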
Results Visualized
Animation of jobs in progress
Mixed Workload Success
[Chart labels: 9ms, 2ms]
»  Mean read time is ~2ms
»  Nearly 200 forecasts/sec on performance testing cluster
Recap
Opower
»  Save energy
»  Make money
»  Big (enough) data
Oren Benjamin
oren.benjamin@opower.com
We’re hiring.
http://opower.com/careers
Scott Kuehn
scott.kuehn@opower.com
Rate Calculation: Rate Engine
public interface RateEngine {
  /**
   * Compute the cost per usage read for the given Site
   * over the requested time interval.
   * @return a RateCalculation containing the result
   */
  RateCalculation calculate(Site site, List<UsageRead> usageReads);
}
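To make the interface concrete, here is a minimal flat-rate implementation. The slides do not show `Site`, `UsageRead`, or `RateCalculation`, so the nested types below are hypothetical stand-ins, and a single price per unit is a simplifying assumption; real utility tariffs are usually tiered or time-of-use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: every nested type is a stand-in for a class the slides do not show. */
final class FlatRateSketch {

    static final class Site {
        final long id;
        Site(long id) { this.id = id; }
    }

    static final class UsageRead {
        final double usage;
        UsageRead(double usage) { this.usage = usage; }
    }

    static final class RateCalculation {
        final List<Double> costPerRead;
        RateCalculation(List<Double> costPerRead) { this.costPerRead = costPerRead; }

        double total() {
            double sum = 0.0;
            for (double c : costPerRead) {
                sum += c;
            }
            return sum;
        }
    }

    interface RateEngine {
        RateCalculation calculate(Site site, List<UsageRead> usageReads);
    }

    /** Charges the same price for every unit of usage, regardless of site or time. */
    static final class FlatRateEngine implements RateEngine {
        private final double pricePerUnit;

        FlatRateEngine(double pricePerUnit) { this.pricePerUnit = pricePerUnit; }

        @Override
        public RateCalculation calculate(Site site, List<UsageRead> usageReads) {
            List<Double> costs = new ArrayList<>();
            for (UsageRead read : usageReads) {
                costs.add(read.usage * pricePerUnit);
            }
            return new RateCalculation(costs);
        }
    }

    public static void main(String[] args) {
        RateEngine engine = new FlatRateEngine(0.08); // illustrative price, not from the talk
        List<UsageRead> reads = Arrays.asList(new UsageRead(719.23), new UsageRead(742.61));
        System.out.println("total cost = " + engine.calculate(new Site(1L), reads).total());
    }
}
```

Keeping the engine behind an interface lets the same calculation run inside a batch producer or an on-demand service.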
Rate Calculation: Application Context
public class RateCalculationProducer extends KijiProducer {
  private ConfigurableApplicationContext appContext;
  private RateEngine rateEngine;

  @Override
  public void setup(KijiContext context) {
    String contextPath = getConf().get(CONTEXT_PATH_KEY);
    appContext = new XmlAppContext(contextPath);
    rateEngine = appContext.getBean(RateEngine.class);
  }

  @Override
  public void produce(KijiRowData siteRowData, …
