HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower

•Télécharger en tant que PPT, PDF•

0 j'aime•3,262 vues

Opower is a fast moving energy management SaaS company that collects sensor data from nearly all of the major utilities in the United States–meaning from more than 45 million American households–along with major utilities in 5 countries throughout Europe and AsiaPac. Opower manages more than 100 billion meter reads, ranging from high frequency power data (AMI), smart thermostats data, and weather data. Currently all data at Opower is stored in HBase or Hadoop (and is notably not security sensitive). This discussion will discuss Opower’s HBase architecture, highlight potential and current uses of data in HBase, share the vision of Opower’s future projects and directions, and reveal how Opower’s big data management has allowed the company to help its utility clients save enough energy to power a city of nearly 200,000 people and save utility customers more than $70 million since only 2008!

Technologie Business

HBase to Save the Planet

Alex Newman
posix4e@apache.org
Architect, Drawn to Scale
Strategic Advisor, Opower

My life with HBase

Drawn to
Factset Cloudera Opower
Scale

About Opower

Opower is a customer engagement
platform for the utility industry

About Opower

Home energy reports
Customized utility bills
Energy efficiency programs for utilities

About Opower

Opower runs on analytics
Analytics run on Hadoop + HBase

Opower analysis relies on data
from a variety of sources

» Electric Utility Usage » Thermostat » Weather » Gas Utility Usage
Data data data Data

Data Storage & 4
Shared Energy
Processing Signature
Repository
3 1 2

Disaggregation OPOWER
Algorithms Platform

Opower’s first architecture could
not support their analytic vision
MySQL
Scalability?
Performance?
Data integration?

Opower’s first architecture could
not support their analytic vision
Analytic workflow instead of
analytic apps:
SQL -> CSV -> R -> too little, too slow

Problem #1
Data Lake Cost

Usage AMI Regional AMI Sensor Data Data Lake

Problem #2
Slower and slower queries
Smart-grid-scale data
Lots of supporting data: weather, demographics, etc.

Problem #3
It was taking lots of “magic”
Intense analytics
Strange schemas
Segmented queries

Hadoop + HBase at Opower

Opower determined that they needed
an entirely new data architecture

Hadoop + HBase at Opower
Early success:
HBase AMI

What rocked

The analytics team loved it!

What sucked

Hard on the ops team – still trying to
grok it

What sucked
NoSchema p1.
Creating Schema
Managing MetaData
Schema <=> Performance

What sucked
No secondary index
Aggregation is slow (Rollup/OLAP)
Poor Client Performance

It would be better if only …

Developers were not forced to know
how the data is stored, indexed, etc.

It would be better if only …

There were nicer APIs and better
query languages (SQL?)

It would be better if only …

Version migrations were easy
Hierarchical Tables

It would be better if only …

Real-time tuning

It would be better if only …

Did I mention HA?

In summary

HBase has helped Opower achieve their analytic
vision
But they’ve still got a long way to go
HBase still has a long way to go

Questions?

Alex Newman
posix4e@apache.org
Architect, Drawn to Scale
Strategic Advisor, Opower

Recommandé

HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.

HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.

HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.

Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon

HBaseCon 2015 General Session: The Evolution of HBase @ BloombergHBaseCon

HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...Cloudera, Inc.

HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon

HBaseCon 2015 General Session: State of HBaseHBaseCon

Recommandé

HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.

HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.

HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.

Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon

HBaseCon 2015 General Session: The Evolution of HBase @ BloombergHBaseCon

HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...Cloudera, Inc.

HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon

HBaseCon 2015 General Session: State of HBaseHBaseCon

HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon

HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack

HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseMichael Stack

HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.

MapR-DB Elasticsearch IntegrationMapR Technologies

Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...DataWorks Summit

HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...Michael Stack

HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon

HBaseConAsia2018 Track1-3: HBase at XiaomiMichael Stack

HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresCloudera, Inc.

DataStax | DataStax Enterprise Advanced Replication (Brian Hess & Cliff Gilmo...DataStax

Hadoop and HBase @eBayDataWorks Summit

Data Evolution in HBaseHBaseCon

HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...Cloudera, Inc.

A Survey of HBase Application ArchetypesHBaseCon

HBaseConAsia2018 Track3-3: HBase at China Life InsuranceMichael Stack

Keynote: The Future of Apache HBaseHBaseCon

HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...Michael Stack

HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit

A Durable Space from April 23 NISO Virtual Conference: Dealing with the Data ...National Information Standards Organization (NISO)

April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...National Information Standards Organization (NISO)

Contenu connexe

Tendances

HBaseCon 2015: Industrial Internet Case Study using HBase and TSDBHBaseCon

HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack

HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseMichael Stack

HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.

MapR-DB Elasticsearch IntegrationMapR Technologies

Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...DataWorks Summit

HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...Michael Stack

HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon

HBaseConAsia2018 Track1-3: HBase at XiaomiMichael Stack

HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresCloudera, Inc.

DataStax | DataStax Enterprise Advanced Replication (Brian Hess & Cliff Gilmo...DataStax

Hadoop and HBase @eBayDataWorks Summit

Data Evolution in HBaseHBaseCon

HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...Cloudera, Inc.

A Survey of HBase Application ArchetypesHBaseCon

HBaseConAsia2018 Track3-3: HBase at China Life InsuranceMichael Stack

Keynote: The Future of Apache HBaseHBaseCon

HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...Michael Stack

HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit

Tendances (20)

HBaseCon 2015: Industrial Internet Case Study using HBase and TSDB

HBase Read High Availability Using Timeline-Consistent Region Replicas

HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark

HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase

HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

MapR-DB Elasticsearch Integration

Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...

HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...

HBaseCon 2015: State of HBase Docs and How to Contribute

HBaseConAsia2018 Track1-3: HBase at Xiaomi

HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures

DataStax | DataStax Enterprise Advanced Replication (Brian Hess & Cliff Gilmo...

Hadoop and HBase @eBay

Data Evolution in HBase

HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...

A Survey of HBase Application Archetypes

HBaseConAsia2018 Track3-3: HBase at China Life Insurance

Keynote: The Future of Apache HBase

HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...

HBase Global Indexing to support large-scale data ingestion at Uber

En vedette

A Durable Space from April 23 NISO Virtual Conference: Dealing with the Data ...National Information Standards Organization (NISO)

April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...National Information Standards Organization (NISO)

IT서비스사업의 이해: SW CEO 아카데미 9차 강의Korea Advanced Institute of Science and Technology

HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...Cloudera, Inc.

HBaseCon 2013: 1500 JIRAs in 20 MinutesCloudera, Inc.

HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...Cloudera, Inc.

HBaseCon 2013: Rebuilding for Scale on Apache HBaseCloudera, Inc.

HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!Cloudera, Inc.

HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.

HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.

HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon

HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARNHBaseCon

HBaseCon 2012 | Scaling GIS In Three ActsCloudera, Inc.

HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponCloudera, Inc.

HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics Cloudera, Inc.

HBaseCon 2012 | HBase for the Worlds Libraries - OCLCCloudera, Inc.

Tales from the Cloudera FieldHBaseCon

Cross-Site BigTable using HBaseHBaseCon

HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.

HBaseCon 2012 | Building Mobile Infrastructure with HBaseCloudera, Inc.

En vedette (20)

A Durable Space from April 23 NISO Virtual Conference: Dealing with the Data ...

April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...

IT서비스사업의 이해: SW CEO 아카데미 9차 강의

HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...

HBaseCon 2013: 1500 JIRAs in 20 Minutes

HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...

HBaseCon 2013: Rebuilding for Scale on Apache HBase

HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!

HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase

HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...

HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase

HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN

HBaseCon 2012 | Scaling GIS In Three Acts

HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon

HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics

HBaseCon 2012 | HBase for the Worlds Libraries - OCLC

Tales from the Cloudera Field

Cross-Site BigTable using HBase

HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...

HBaseCon 2012 | Building Mobile Infrastructure with HBase

Similaire à HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower

Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.

How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah

Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.

Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Cloudera, Inc.

Data Science Day New York: The Platform for Big DataCloudera, Inc.

The power of hadoop in cloud computingJoey Echeverria

Anexinet Big Data SolutionsMark Kromer

Introduction to HadoopOvidiu Dimulescu

Bi on Big Data - Strata 2016 in LondonDremio Corporation

Hadoop as data refinerySteve Loughran

Hadoop as Data Refinery - Steve LoughranJAX London

Using hadoop to expand data warehousingDataWorks Summit

Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta

Processing Big Datacwensel

Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services

Big Data Analysis Starts with RRevolution Analytics

Big Data and HPCNetApp

Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA

Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...Cloudera, Inc.

Agile analytics applications on hadoopHortonworks

Similaire à HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower (20)

Business Intelligence and Data Analytics Revolutionized with Apache Hadoop

How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...

Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011

Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...

Data Science Day New York: The Platform for Big Data

The power of hadoop in cloud computing

Anexinet Big Data Solutions

Introduction to Hadoop

Bi on Big Data - Strata 2016 in London

Hadoop as data refinery

Hadoop as Data Refinery - Steve Loughran

Using hadoop to expand data warehousing

Introducing the Big Data Ecosystem with Caserta Concepts & Talend

Processing Big Data

Data & Analytics - Session 1 - Big Data Analytics

Big Data Analysis Starts with R

Big Data and HPC

Hadoop and NoSQL joining forces by Dale Kim of MapR

Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...

Agile analytics applications on hadoop

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.

Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.

2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.

Edc event vienna presentation 1 oct 2019Cloudera, Inc.

Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.

Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.

Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.

Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.

Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.

Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.

Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.

Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.

Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.

Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.

Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.

Extending Cloudera SDX beyond the PlatformCloudera, Inc.

Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.

Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.

Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.

Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx

Cloudera Data Impact Awards 2021 - Finalists

2020 Cloudera Data Impact Awards Finalists

Edc event vienna presentation 1 oct 2019

Machine Learning with Limited Labeled Data 4/3/19

Data Driven With the Cloudera Modern Data Warehouse 3.19.19

Introducing Cloudera DataFlow (CDF) 2.13.19

Introducing Cloudera Data Science Workbench for HDP 2.12.19

Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19

Leveraging the cloud for analytics and machine learning 1.29.19

Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19

Leveraging the Cloud for Big Data Analytics 12.11.18

Modern Data Warehouse Fundamentals Part 3

Modern Data Warehouse Fundamentals Part 2

Modern Data Warehouse Fundamentals Part 1

Extending Cloudera SDX beyond the Platform

Federated Learning: ML with Privacy on the Edge 11.15.18

Analyst Webinar: Doing a 180 on Customer 360

Build a modern platform for anti-money laundering 9.19.18

Introducing the data science sandbox as a service 8.30.18

Dernier

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science

Scaling API-first – The story of a global engineering organizationRadu Cotescu

GenCyber Cyber Security Day PresentationMichael W. Hawkins

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc

08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software

CNv6 Instructor Chapter 6 Quality of Servicegiselly40

Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies

Slack Application Development 101 Slidespraypatel2

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

🐬 The future of MySQL is Postgres 🐘RTylerCroy

Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays

Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC

08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls

Automating Google Workspace (GWS) & more with Apps Scriptwesley chun

What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge

A Year of the Servo Reboot: Where Are We Now?Igalia

Dernier (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx

Scaling API-first – The story of a global engineering organization

GenCyber Cyber Security Day Presentation

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

08448380779 Call Girls In Friends Colony Women Seeking Men

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

CNv6 Instructor Chapter 6 Quality of Service

Factors to Consider When Choosing Accounts Payable Services Providers.pptx

Slack Application Development 101 Slides

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

🐬 The future of MySQL is Postgres 🐘

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

Breaking the Kubernetes Kill Chain: Host Path Mount

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men

Automating Google Workspace (GWS) & more with Apps Script

What Are The Drone Anti-jamming Systems Technology?

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf

A Year of the Servo Reboot: Where Are We Now?

HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower

1. HBase to Save the Planet Alex Newman posix4e@apache.org Architect, Drawn to Scale Strategic Advisor, Opower

2. My life with HBase Drawn to Factset Cloudera Opower Scale

3. About Opower Opower is a customer engagement platform for the utility industry

4. About Opower Home energy reports Customized utility bills Energy efficiency programs for utilities

5. About Opower Opower runs on analytics Analytics run on Hadoop + HBase

6. Opower analysis relies on data from a variety of sources » Electric Utility Usage » Thermostat » Weather » Gas Utility Usage Data data data Data Data Storage & 4 Shared Energy Processing Signature Repository 3 1 2 Disaggregation OPOWER Algorithms Platform

7. Opower’s first architecture could not support their analytic vision MySQL Scalability? Performance? Data integration?

8. Opower’s first architecture could not support their analytic vision Analytic workflow instead of analytic apps: SQL -> CSV -> R -> too little, too slow

9. Problem #1 Data Lake Cost Usage AMI Regional AMI Sensor Data Data Lake

10. Problem #2 Slower and slower queries Smart-grid-scale data Lots of supporting data: weather, demographics, etc.

11. Problem #3 It was taking lots of “magic” Intense analytics Strange schemas Segmented queries

12. Hadoop + HBase at Opower Opower determined that they needed an entirely new data architecture

13. NexGen Architecture @ Opower

14. Hadoop + HBase at Opower Early success: HBase AMI

15. What rocked Endless, cheap scalability

16. What rocked The analytics team loved it!

17. What sucked Hard on the ops team – still trying to grok it

18. What sucked NoSchema p1. Creating Schema Managing MetaData Schema <=> Performance

19. What sucked HA Failover Snapshots

20. What sucked No secondary index Aggregation is slow (Rollup/OLAP) Poor Client Performance

21. It would be better if only … Developers were not forced to know how the data is stored, indexed, etc.

22. It would be better if only … There were nicer APIs and better query languages (SQL?)

23. It would be better if only … Version migrations were easy Hierarchical Tables

24. It would be better if only … Real-time tuning

25. It would be better if only … Did I mention HA?

26. In summary HBase has helped Opower achieve their analytic vision But they’ve still got a long way to go HBase still has a long way to go

27. Questions? Alex Newman posix4e@apache.org Architect, Drawn to Scale Strategic Advisor, Opower

Notes de l'éditeur

Name Email Address Title
WARNING THESE ARE MY WORDS, not FDS, Cloudera or OPower Factest 2005: - Maybe I was crazy to use it - Tens of databases 10 of query langagues, VMS moving towards commdity servers. Running into issues with scaling on environments like MySQL - They were used to code that crashed. In fact, I would say while I was on call, a service from one of the sites was down, at least once a week. Luckily they had redundancy in multiple sites, and multiple servers within those sites. The redundancy was added at a higher level, so generally, at least all of the times I remember, it was able to increase the availability and downtime wasn't actually an issue. - What was an issue was scale. - INteresting enough Hbase, even at that time, was a pretty highly available database. So what did they use it for - Time and Sales. This is the collection of all of the Quotes and Trades, for different securities. So to translate you put out quotes to buy or sell stocks at a certain price. If they overlapp, the echange registers a trade, and you just bought or sold a security. Not just stocks, but options and extremely high frequency data. - There was some value add on top of that, for calculating more complicated statistics on the fly through a home grown Web SASS thing - Cloudera: - Started off in kitchen focusing on building the packages that y’all know and love. When I entered it was all manual, when I left it was all automated. One could think of this as sortof like dev-opsie, meets, qa, meets release engineering, meets generic development - Moved into our first management tools team as a developer. Where we developed the cloudera manager. It was originally part of HUE and it became more springy. - Then I left Cloudera to be a founder in Drawn to Scale. We built a prototype and started pitching it for about 6 to 7 months. - While that was going on, I because the Lead Data Architect at OPower. And then more recently, after funding, I have returned to drawntoscale as a coder in the trenches, and have changed myself to a advisor to opower. The reason why I bring this up, is I have been working with HBase in production for about 5 years.
Opower helps people use energy more efficiently and ultimately save money on their energy bills.it vastly improves the overall customer experience by making energy use personally relevant. - Behavioral Science (Great marketing, understanding people, great hci) - Data Science (Analytics, Data Infastructure Teams) - Lobbying (Yep we do lobbying)
- OHow many of you get a bill - OPower White labeled websites. So this is the interface you probably use through your energy website to view how much power you use. Bill forecasting, etc. - Smart Thermostats - Gas and Electric - Social
- Analytics is used to understand who we should be targeting - Answering questions that our customers what answered. We can help them improve customer service, improve there marketing, etc. - Justifying our own existence. (Compliance)
- This is an old slide which doesn’t really include all the places we get data - Story about detecting broken thermostats
- But it had it’s up - Spring and MVC provided a very clear and systematic way for developers developer systems. - It was very easy to manage from an operations perspective.
- WE did this at FDS as well. Of course not with R, but specialized langauages. - IN fact our customers did as well, and they had a whole team of people to help customers do it.
So here is the data sizes we have, along with the costs with traditional hadoop systems. - We were a cisco shop but we ended up going with dell, mostly because of the 3.5 inch disks. It looks like cisco is wising up to this whole hadoop thing. - These numbers are for dell. So I think this is priced out assuming a 710, then a 810 and then a 910 for the RDBMS, and 510's for hadoop.
- A lot of this data just doesn’t work well with traditional databases. - An unnamed utility takes 3 days to mysqldump the ami data out. subsampling interpolation
- I should warn you, i drawn almost all of my drawings in xfig so if this isn’t clear I’m sorry. - Basically the utility data has to come in from a variety of different protocols, as we integrate into the utility pipeline. It then flows into hbase, it’s validated from hbase, and then imported into our existent workflow. - Some of that data, you could imagine for instance information about user is still stored in MySQL. - All of the data is in a HIVE data lake
All of our timeseries data in regards to high frequency data is being ported to being stored in HBase. Also soon things like bill forecasting, and a bunch of cool other stuff I probably should mention is being moved here. This includes data from the utilities, and data that users are enterring themselves. In additition thermostat data is moving here.
- We still need to improve effeciency - We are doubling the size of the cluster this year - We have a ton of room to grow.
- Having all of your data is a huge thing - Having a place to do m/r based R is great - No more running out of memory or being bounded to a single machine - Having a cheap scratch space
At cloudera i thought all we needed was cfengine, snmp and syslog. Frankly that would have made ops happy. But more and more I think we made the right decision and that these tools really aren’t the right answer. JuJu looks interesting. - cloudera of course built there own tool. - access and auth