12. Data Overview: Size
26 July 2013
» Billing data: 60M households
» Smart meter data: 15M households
» On disk: 5TB (raw)
» More smart meter data than all other data combined
13. Architecture: Usage Data Store
[Diagram: relationships among Customer, Account, Site, and Meter entities]
15. HBase + Hadoop Architecture v1.0
[Diagram components: MySQL report/AMI DBs, Sqoop, HDFS, M/R, HBase, batch workers, web servers. Data shown: meter metadata, usage data.]
16. HBase + Hadoop Architecture v2.0
[Diagram components: MySQL report/AMI DBs, HDFS file upload, HDFS, M/R, HBase, batch workers, web servers. Data shown: meter metadata, metadata requests, usage data.]
17. Data Schema: Kiji
Kiji Schema
» Table layout definition
» Schema management
» Object serialization
» Entity-centric data model
Supporting Projects
» Kiji MR
» Kiji Hive Adapter
» Kiji REST
» ...
18. Entity-centric Table: Row Key
Hash prefix   Utility company   Site ID
1 byte        4 bytes           8 bytes
"keys_format": {
  "encoding": "FORMATTED",
  "salt": { "hash_type": "MD5", "hash_size": 1 },
  "components": [
    { "name": "utility_company", "type": "INTEGER" },
    { "name": "site_id", "type": "LONG" }
  ]
}
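The 1 + 4 + 8 byte layout above can be illustrated in plain Java. This is only a sketch of the idea, not Kiji's actual encoder: it assumes big-endian component encoding and a salt taken from the first byte of an MD5 digest over both components, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: mimics the 1 + 4 + 8 byte layout, not Kiji's real encoder. */
final class RowKeySketch {
  private RowKeySketch() {}

  /** Builds a 13-byte key: [1-byte MD5 salt][4-byte utility company][8-byte site ID]. */
  static byte[] encode(int utilityCompany, long siteId) {
    // Assumption: components are encoded big-endian and the salt hashes both of them.
    byte[] components = ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
        .putInt(utilityCompany)
        .putLong(siteId)
        .array();
    byte salt = md5(components)[0];  // "hash_size": 1 keeps a single byte of the digest
    return ByteBuffer.allocate(1 + components.length)
        .put(salt)
        .put(components)
        .array();
  }

  private static byte[] md5(byte[] input) {
    try {
      return MessageDigest.getInstance("MD5").digest(input);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required on every Java platform", e);
    }
  }
}
```

The salt spreads neighboring site IDs across the key space, while the key stays deterministic, so a point lookup for a known utility company and site ID can recompute it.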
19. Entity-centric Table: Site
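A single row
[Diagram: the usage data column family holds columns stream:0 through stream:3, with example reads of 0.12 kWh, 1.3 Therm, 24 Therm, and 356 kWh. The insights column family holds columns uua:0 and bill_forecast:0; example insight values shown: UUA; June 18 - July 17; $25.]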
21. Insights: Jobs & Services
» M/R jobs to compute insights in batch
» Services to access pre-computed insights / compute insights on demand
» Insight for a Site is calculated based on the data in the Site’s row
» The calculated insight is saved back to the Site row
22. Insight Example: Rate Calculation
[Diagram: for each site row, the Rate Calculation MapReduce job reads stream:0 … stream:n from the usage data column family; the insights column family holds rate calculation and bill forecast.]
23. Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public void produce(KijiRowData siteRowData, ProducerContext context) {
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }
}
24. Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public void produce(KijiRowData siteRowData, ProducerContext context) {
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }

  @Override
  public String getOutputColumn() {
    return "rate_calculation";
  }
}
25. Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {
  @Override
  public KijiDataRequest getDataRequest() {
    Configuration conf = getConf();
    long startTime = parseLong(conf.get(START_PARAM));
    return KijiDataRequest.builder()
        .withTimeRange(startTime, END_OF_TIME)
        .addColumns(ColumnsDef.create()
            .withMaxVersions(ALL_VERSIONS)
            .addFamily("usage_data"))
        .build();
  }

  @Override
  public void produce(KijiRowData siteRowData, ...
26. In practice
» ETL to an entity-centric schema
» Bulk loading
» Mixed workloads
Design decisions and challenges
27. In practice: ETL to entity-centric schema
meter  usage   cost   start                end
0001   719.23  57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
0001   742.61  59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
0002   0.2050         2013-01-01T00:00:00  2013-01-01T00:15:00
0002   0.2250         2013-01-01T00:15:00  2013-01-01T00:30:00
0002   0.2350         2013-01-01T00:30:00  2013-01-01T00:45:00
0002   0.2050         2013-01-01T00:45:00  2013-01-01T01:00:00
0002   0.2250         2013-01-01T01:00:00  2013-01-01T01:15:00
0001 – Meter (Bills)
0002 – Smart Meter (Quarter-hourly reads)
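The flat records above mix monthly bills and quarter-hourly reads, so an entity-centric load has to group them under their owning entity first. The following is a minimal in-memory sketch of that pivot. It groups by meter ID for simplicity (the deck keys rows by site, and the meter-to-site mapping is not shown), and all class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative pivot from flat read records to one time-ordered "row" per meter. */
final class PivotSketch {
  private PivotSketch() {}

  /** One flat input record; cost is null for smart meter reads, which carry usage only. */
  static final class Read {
    final String meter;
    final double usage;
    final Double cost;
    final LocalDateTime start;
    final LocalDateTime end;

    Read(String meter, double usage, Double cost, LocalDateTime start, LocalDateTime end) {
      this.meter = meter;
      this.usage = usage;
      this.cost = cost;
      this.start = start;
      this.end = end;
    }
  }

  /** Parses a whitespace-separated line: meter usage [cost] start end. */
  static Read parse(String line) {
    String[] f = line.trim().split("\\s+");
    boolean hasCost = f.length == 5;
    return new Read(
        f[0],
        Double.parseDouble(f[1]),
        hasCost ? Double.valueOf(f[2]) : null,
        LocalDateTime.parse(f[f.length - 2]),
        LocalDateTime.parse(f[f.length - 1]));
  }

  /** Groups reads by meter; each meter's reads are keyed (and sorted) by start time. */
  static Map<String, TreeMap<LocalDateTime, Read>> pivot(List<String> lines) {
    Map<String, TreeMap<LocalDateTime, Read>> rows = new TreeMap<>();
    for (String line : lines) {
      Read read = parse(line);
      rows.computeIfAbsent(read.meter, m -> new TreeMap<>()).put(read.start, read);
    }
    return rows;
  }
}
```

Keying each entity's reads by start time mirrors how a timestamped cell store keeps versions ordered, so a later scan over a time range touches one row rather than many.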
28. In practice: ETL to entity-centric schema
» Use bulk loading for performance
» Make ingest process idempotent
» Introduce a read-log for utility company billing corrections
» ETL Steps:
1. Ingest all reads into a read-log table
2. Load reads into the corresponding Site row
[Diagram: (1) billing files are bulk loaded via M/R into the read-log table; (2) reads are pivoted and bulk loaded via M/R into the Site table]
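One way to read the idempotency and correction points above: if every read is stored under a key derived from the read itself, re-ingesting the same file changes nothing, and a corrected read replaces the earlier value. The sketch below uses an in-memory map as a stand-in for the read-log table. The key choice (meter plus interval start) and all names are assumptions, and a real read-log might retain earlier versions rather than overwrite them.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative in-memory stand-in for a read-log table keyed by meter and interval start. */
final class ReadLogSketch {
  private final Map<String, Double> usageByKey = new TreeMap<>();

  /** Upserts one read; the same (meter, start) always lands in the same cell. */
  void ingest(String meter, String startIso, double usage) {
    usageByKey.put(meter + "|" + startIso, usage);
  }

  /** Number of distinct reads currently stored. */
  int size() {
    return usageByKey.size();
  }

  /** Latest usage value for the read, or null if it was never ingested. */
  Double usage(String meter, String startIso) {
    return usageByKey.get(meter + "|" + startIso);
  }
}
```

Ingesting the same read twice leaves one entry, and ingesting a corrected value for the same interval keeps one entry with the new value, which is the property that makes a failed load safe to rerun.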
29. In practice: bulk loading
» Bulk loaded files are not assigned sequence numbers
» All compactions become major compactions
» Solution: Find a temporary fix, monitor the HBase JIRA
30. In practice: Mixed workloads
[Diagram: the Site table serves reporting apps, web servers, and M/R jobs; the workloads shown are ad-hoc reads and forecasts, batch insight calculations, and bulk scans]
31. In practice: Mixed workloads
» Supporting mixed workloads requires adapting jobs and configurations
» IO: Switch to bulk loading, enable direct HDFS reads
» Major compactions: Disabled
» Memory: increase heap and region sizes, use MSLAB
» Verify performance by simulating nominal and high load scenarios
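For orientation, the tuning points above roughly correspond to the standard HBase and HDFS properties listed below. This is a sketch rather than the presenters' configuration: the values are placeholders, the mapping of "direct HDFS reads" to short-circuit local reads is an assumption, and the properties are held in a plain Java map (with a hypothetical class name) only to keep the example self-contained. In a deployment they would normally live in hbase-site.xml and hdfs-site.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative settings only; values are placeholders, not the presenters' configuration. */
final class MixedWorkloadSettings {
  private MixedWorkloadSettings() {}

  static Map<String, String> sketch() {
    Map<String, String> settings = new LinkedHashMap<>();
    // 0 turns off time-based major compactions so they can be triggered off-peak instead.
    settings.put("hbase.hregion.majorcompaction", "0");
    // MemStore-local allocation buffers reduce heap fragmentation under heavy writes.
    settings.put("hbase.hregion.memstore.mslab.enabled", "true");
    // Assumption: "direct HDFS reads" refers to short-circuit local reads.
    settings.put("dfs.client.read.shortcircuit", "true");
    // Larger regions; 10 GB is an arbitrary placeholder value.
    settings.put("hbase.hregion.max.filesize", String.valueOf(10L * 1024 * 1024 * 1024));
    return settings;
  }
}
```

Region server heap size is not an hbase-site.xml property; it is usually set through HBASE_HEAPSIZE in hbase-env.sh.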
36. Recap
Opower
» Save energy
» Make money
» Big (enough) data
Oren Benjamin
oren.benjamin@opower.com
We’re hiring.
http://opower.com/careers
Scott Kuehn
scott.kuehn@opower.com
37. Rate Calculation: Rate Engine
public interface RateEngine {
  /**
   * Compute the cost per usage read for the given Site
   * over the requested time interval.
   * @return a RateCalculation containing the result
   */
  RateCalculation calculate(Site site, List<UsageRead> usageReads);
}
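To make the contract concrete, here is a minimal sketch of one possible engine: a flat price per kWh. The deck does not show the real Site, UsageRead, or RateCalculation types, so the versions below are hypothetical stand-ins with only the fields this example needs, and the interface is repeated without its comment so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal types; the deck does not show the real ones. */
final class Site {
  final double pricePerKwh;  // assumption: a single flat rate per site
  Site(double pricePerKwh) { this.pricePerKwh = pricePerKwh; }
}

final class UsageRead {
  final double kwh;
  UsageRead(double kwh) { this.kwh = kwh; }
}

final class RateCalculation {
  final List<Double> costPerRead;
  RateCalculation(List<Double> costPerRead) { this.costPerRead = costPerRead; }

  /** Sum of the per-read costs. */
  double total() {
    double sum = 0.0;
    for (double cost : costPerRead) { sum += cost; }
    return sum;
  }
}

/** Same shape as the interface on the slide, repeated so the sketch compiles on its own. */
interface RateEngine {
  RateCalculation calculate(Site site, List<UsageRead> usageReads);
}

/** Illustrative engine: the cost of each read is its usage times the site's flat price. */
final class FlatRateEngine implements RateEngine {
  @Override
  public RateCalculation calculate(Site site, List<UsageRead> usageReads) {
    List<Double> costs = new ArrayList<>();
    for (UsageRead read : usageReads) {
      costs.add(read.kwh * site.pricePerKwh);
    }
    return new RateCalculation(costs);
  }
}
```

Keeping the engine behind an interface lets tiered or time-of-use pricing be swapped in without changing the code that calls it.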
38. Rate Calculation: Application Context
public class RateCalculationProducer extends KijiProducer {
  private ConfigurableApplicationContext appContext;
  private RateEngine rateEngine;

  @Override
  public void setup(KijiContext context) {
    String contextPath = getConf().get(CONTEXT_PATH_KEY);
    appContext = new XmlAppContext(contextPath);
    rateEngine = appContext.getBean(RateEngine.class);
  }

  @Override
  public void produce(KijiRowData siteRowData, …