HBase @ Salesforce
Lars Hofhansl
Architect, Father, Meditator, Aikido Black Belt
http://hadoop-hbase.blogspot.com
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be
deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other
financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any
litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our
relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our
service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to
larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is
included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent
fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor
Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions
based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these
forward-looking statements.
Safe Harbor
Why HBase?
• SAN
• RDBMS
• Transactions
Zookeeper?
Commodity Hardware?
HBase?
HDFS?
Unstructured Data?
A. Why HBase?
B. Interacting with the open source community
C. HBase at Salesforce
Size Matters*
New Salesforce customer:
• “How many rows do you have?”
• We will turn folks away if they have too many!
Data Storage is expensive:
• SAN storage
• Relational Database
• Too many rows → too expensive
* In a relational world
What if in the future we:
… and have cheaper storage?
… and never need to ask again about the number of rows?
… grow with the data by just adding more machines?
(Disclaimer: no transactions, no joins, no secondary indexes, …)
(A quick note about) Relational Databases
• We love them. They are core to our infrastructure.
• SQL and NoSQL (NoACID) are complementary.
• (Almost) everything we do is SQL based (see Phoenix – the SQL layer for HBase.)
The Search - Requirements
• Consistent
– “Eventually consistent stores are 100% consistent 99% of the time” – Ian Varley
• Scalable
– No “features” impeding horizontal scaling
• Persistent
– Duh...?
• Key lookups
• Range lookups
• Open source (ASL great, GPLv2 OK, GPLv3/AGPL not acceptable)
Enter HBase
“A Sparse, Consistent, Distributed,
Multidimensional, Persistent, Sorted Map”
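The "sorted map" description can be made concrete with a toy model. The sketch below is not HBase code and uses no HBase API; the class name `SortedMapTable`, its methods, and the row keys are hypothetical, chosen only to show why a sorted, sparse, versioned map supports both the key lookups and the range lookups listed in the requirements.

```python
import bisect


class SortedMapTable:
    """Toy in-memory stand-in for a sparse, sorted, multidimensional map.

    Hypothetical illustration only: no persistence, no distribution.
    """

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: {timestamp: value}}

    def put(self, row, column, value, ts):
        if row not in self._rows:
            bisect.insort(self._keys, row)  # keep row keys sorted
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Key lookup: newest version of one cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None  # sparse: missing cells simply do not exist
        return versions[max(versions)]

    def scan(self, start, stop):
        """Range lookup: row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


table = SortedMapTable()
table.put("acct#002", "d:name", "Bob", ts=1)
table.put("acct#001", "d:name", "Alice", ts=1)
table.put("acct#001", "d:name", "Alicia", ts=2)  # newer version of the cell
table.put("user#001", "d:email", "x@example.com", ts=1)
```

Because row keys are kept sorted, a range lookup is two binary searches plus a slice; the dimensions of the map are row, column, and timestamp.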
Salesforce and the HBase Community
To Fork or not to Fork – that is the question
Fork - pros
• Agility. No waiting for community review. Just get stuff done
• Freedom. Patches that might not be acceptable to the community
Fork - cons
• Lose out on community work
• Patches not useful to other parties
There is no right or wrong. It’s a matter of choice, taste, and requirements.
HBase Development @ Salesforce
• No fork of HBase.
• Internal HBase/HDFS branch for possible emergency fixes
• All fixes are cleaned and contributed back
• We switch to the next open source point release periodically
PMC member, 2 committers, release manager, contributors
HBASE-11042 HBASE-11040 HBASE-11037 HBASE-11030 HBASE-11029 HBASE-11024 HBASE-11022
HBASE-11010 HBASE-10996 HBASE-10989 HBASE-10988 HBASE-10987 HBASE-10982 HBASE-10969 HBASE-10847
HBASE-10805 HBASE-10722 HBASE-10706 HBASE-10642 HBASE-10594 HBASE-10562 HBASE-10551
HBASE-10546 HBASE-10505 HBASE-10501 HBASE-10489 HBASE-10470 HBASE-10420 HBASE-10416
HBASE-10383 HBASE-10363 HBASE-10320 HBASE-10317 HBASE-10286 HBASE-10284 HBASE-10281
HBASE-10279 HBASE-10259 HBASE-10257 HBASE-10250 HBASE-10181 HBASE-10117 HBASE-10076
HBASE-10058 HBASE-10057 HBASE-10015 HBASE-9993 HBASE-9971 HBASE-9956 HBASE-9915
HBASE-9865 HBASE-9834 HBASE-9807 HBASE-9799 HBASE-9789 HBASE-9778 HBASE-9751 HBASE-9749
HBASE-9732 HBASE-9731 HBASE-9711 HBASE-9658 HBASE-9584 HBASE-9566 HBASE-9534 HBASE-9429
HBASE-9428 HBASE-9377 HBASE-9356 HBASE-9344 HBASE-9301 HBASE-9266 HBASE-9231 HBASE-9221
HBASE-9186 HBASE-9158 HBASE-9103 HBASE-9097 HBASE-9049 HBASE-8971 HBASE-8945 HBASE-8930
HBASE-8912 HBASE-8858 HBASE-8809 HBASE-8767 HBASE-8702 HBASE-8698 HBASE-8684 HBASE-8671
HBASE-8636 HBASE-8525 HBASE-8503 HBASE-8355 HBASE-8316 HBASE-8229 HBASE-8188 HBASE-8166
HBASE-8151 HBASE-8110 HBASE-8108 HBASE-8055 HBASE-8008 HBASE-7999 HBASE-7947 HBASE-7945
HBASE-7817 HBASE-7801 HBASE-7729 HBASE-7725 HBASE-7717 HBASE-7709 HBASE-7702 HBASE-7681
HBASE-7617 HBASE-7602 HBASE-7578 HBASE-7550 HBASE-7499 HBASE-7497 HBASE-7483 HBASE-7466
HBASE-7465 HBASE-7455 HBASE-7438 HBASE-7435 HBASE-7432 HBASE-7431 HBASE-7417 HBASE-7415
HBASE-7371 HBASE-7336 HBASE-7293 HBASE-7279 HBASE-7270 HBASE-7252 HBASE-7240 HBASE-7215
HBASE-7214 HBASE-7180 HBASE-7177 HBASE-7166 HBASE-7165 HBASE-7091 HBASE-7069 HBASE-7051
HBASE-7047 HBASE-7021 HBASE-7010 HBASE-6996 HBASE-6974
HBASE-6949 HBASE-6946 HBASE-6912 HBASE-6889 HBASE-6879 HBASE-6868 HBASE-6865 HBASE-6863
HBASE-6797 HBASE-6796 HBASE-6784 HBASE-6765 HBASE-6757 HBASE-6755 HBASE-6711 HBASE-6707
HBASE-6690 HBASE-6667 HBASE-6638 HBASE-6637 HBASE-6621 HBASE-6582 HBASE-6580 HBASE-6579
HBASE-6573 HBASE-6571 HBASE-6570 HBASE-6569 HBASE-6568 HBASE-6561 HBASE-6523 HBASE-6522
HBASE-6505 HBASE-6504 HBASE-6496 HBASE-6495 HBASE-6441 HBASE-6439 HBASE-6427 HBASE-6426
HBASE-6421 HBASE-6406 HBASE-6355 HBASE-6347 HBASE-6326 HBASE-6296 HBASE-6293 HBASE-6291
HBASE-6178 HBASE-6138 HBASE-6113 HBASE-6112 HBASE-6110 HBASE-6087 HBASE-5961 HBASE-5955
HBASE-5909 HBASE-5884 HBASE-5871 HBASE-5865 HBASE-5782 HBASE-5775 HBASE-5774 HBASE-5682
HBASE-5670 HBASE-5659 HBASE-5641 HBASE-5609 HBASE-5604 HBASE-5574 HBASE-5569 HBASE-5548
HBASE-5547 HBASE-5541 HBASE-5526 HBASE-5523 HBASE-5509 HBASE-5497 HBASE-5460 HBASE-5455
HBASE-5440 HBASE-5431 HBASE-5368 HBASE-5350 HBASE-5348 HBASE-5318 HBASE-5304 HBASE-5266
HBASE-5229 HBASE-5203 HBASE-5118 HBASE-5096 HBASE-5088 HBASE-5084 HBASE-5070 HBASE-5058
HBASE-5005 HBASE-5001 HBASE-4998 HBASE-4981 HBASE-4979 HBASE-4945 HBASE-4886 HBASE-4874
HBASE-4870 HBASE-4838 HBASE-4805 HBASE-4800 HBASE-4691 HBASE-4682 HBASE-4673 HBASE-4657
HBASE-4626 HBASE-4605 HBASE-4583 HBASE-4561 HBASE-4559 HBASE-4556 HBASE-4536 HBASE-4517
HBASE-4488 HBASE-4454 HBASE-4439 HBASE-4404 HBASE-4387 HBASE-4347 HBASE-4336 HBASE-4335
HBASE-4334 HBASE-4331 HBASE-4296 HBASE-4283 HBASE-4263 HBASE-4242 HBASE-4241 HBASE-4197
HBASE-4178 HBASE-4171 HBASE-4102 HBASE-4071 HBASE-3661 HBASE-3645 HBASE-3584 HBASE-3443
HBASE-3433 HBASE-3387 HBASE-2947 HBASE-2196 HBASE-2195 HDFS-3979 HDFS-744
Managing HBase 0.94
Established monthly release train for 0.94
Contributed more than 300 features, bug fixes, and performance improvements
Reviewed thousands of open source patches
Committed hundreds of patches
Open Sourced Apache Phoenix – SQL skin on HBase
Salesforce High-level Architecture
Salesforce *is* a database
Salesforce is a Database
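[Diagram: a generic database. A Query (SQL) goes to the Query Parser; the Parsed Query goes to the Query Optimizer (Plan Generator, Plan Cost Estimator); the resulting Evaluation Plan goes to the Query Plan Evaluator, which runs against the Database. The System Catalog holds Stats, Tables, Columns, and Indexes.]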
Salesforce is a Database
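[Diagram: the same pipeline at Salesforce. A Query (SOQL) goes to the Query Parser; the Parsed Query goes to the Query Optimizer (Plan Generator, Plan Cost Estimator), which emits Hinted Oracle SQL for Oracle as the Database. The System Catalog holds Stats, Objects, Fields, and Indexes.]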
Salesforce is multi tenant
[Diagram: tenants grouped into pods: Tenant A-D, Tenant E-H, Tenant I-O, …]
pod = a database instance
• Oracle RAC
• AppServers
• Blob store servers
• Search servers
• Shared SAN storage
• SAN replication for DR
[Diagram: at the Primary Site, App Servers connect via SQL/JDBC to the Oracle Nodes of an Oracle RAC cluster backed by a SAN; SAN replication copies the data to a SAN at the Secondary Site.]
Finally: HBase @ Salesforce
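[Diagram: the Salesforce query pipeline (Query (SOQL), Query Parser, Parsed Query, Query Optimizer with Plan Generator and Plan Cost Estimator, System Catalog with Stats, Objects, Fields, and Indexes) emits Hinted Oracle SQL for Oracle. Next to Oracle, several HBase instances are reached through 1. External Objects and 2. Phoenix SQL.]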
Where does HBase Fit?
• Separate HBase per pod (close to 50 clusters)
• Logically co-located with Oracle
• Small clusters striped across five racks
• Each cluster’s master service on a different rack
• Identical cluster for DR
[Diagram: at the Primary Site, App Servers connect to the Oracle Cluster (Oracle Nodes, SAN) and, via SQL/JDBC through Phoenix, to the HBase Cluster (HBase Nodes). Decentralized HBase Replication feeds the DR HBase Cluster at the Secondary Site, which has its own SAN.]
Use Cases
1. Audit Trails (Entity History)
• Identity managed in RDBMS
• Indexed in HBase (Phoenix indexes)
• Historical, immutable data only
• No need to reason about updates, split identities, and transactions
2. Archiving (Data Lifecycle Management)
• Objects (rows) moved to HBase
• Identity managed in HBase after move
• Data immutable in HBase
• No Transactions
3. Live data in HBase (BigObjects)
• Mutable data (possibly)
• Everything managed in HBase
• Still no Transactions, yet
• Platform for other teams to use
Merrill Lynch Rationalization Data Governance, Audit & Archive
• First Salesforce Enterprise Customer
• On-platform archival compelling versus on-premise solution from Informatica
• Retention Requirements for 7 Years
Merrill Lynch
“Data Audit, Governance & Lifecycle management is critical for Merrill; for the entire banking & financial industry it has become a benchmark requirement”
Heating, ventilation, and air-conditioning in the EU
• Top 10 Platform Users
• Subject to highly variable data governance and retention requirements
• Significant SAP footprint driving business rules – need to connect that to Salesforce data for archival and data retention needs
• Massive service workforce generates significant data processing challenges
“The Salesforce.com Platform roadmap for Data Archive is critical for future data management needs”
Michael Roehr, CTO Vaillant
BMW Enriches Their Customer Perspective
• Sales Cloud available across all German dealership franchises
• All customer data subject to stringent, government-mandated protection, audit & retention
• Correlations with Car Builder App data enable more contextual customer interactions
• Car telemetry, used correctly, helps refine product evolution and customer needs alignment
“Data driven customer engagement is a key driver for our enhanced customer experience”
System Of Record (SOR)
SOR = HA + DR + Backup + M&M + Security
Highly Available, Disaster Recovery
• Five peer Zookeeper Quorum
• Five Quorum Journals (for fs edits)
• Five HMasters
• Three NameNodes (yes, three, we made a patch to run more than one standby)
• HBase Replication to identical hot standby pod in a different data center
– In the event of a disaster we fail a complete pod to the secondary site
• Weekly automated, unattended rolling restarts
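The choice of five peers for ZooKeeper and for the quorum journals follows from majority arithmetic: a majority-based ensemble stays available as long as more than half of its members are up. The helper names below are hypothetical and the sketch covers only that arithmetic; it does not apply to the HMasters or NameNodes, which are active/standby processes rather than voting peers.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
```

With five peers the ensemble needs three live members and survives two simultaneous failures, for example one node down for a rolling restart plus one unplanned failure. A fourth peer adds nothing over three, which is why odd sizes are the usual choice.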
Replication
Backup High-level Architecture
[Diagram: the Primary pod and the DR pod each run HBase (48h) on HDFS and each produce a Backup per tenant; Merkle Tree Verification is applied between the two.]
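The slides name Merkle tree verification but do not describe how it is implemented. As a minimal sketch of the general technique, assuming each per-tenant backup can be read as an ordered list of byte chunks, two copies can be compared by their Merkle roots. The function names, the chunking, and the choice of SHA-256 are assumptions for illustration, not the Salesforce implementation.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Return the Merkle root (32 bytes) of an ordered list of byte chunks."""
    if not chunks:
        return _h(b"")
    level = [_h(c) for c in chunks]  # leaf hashes
    while len(level) > 1:
        # Hash adjacent pairs; an unpaired last node is promoted unchanged.
        nxt = [_h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def backups_match(primary_chunks, dr_chunks) -> bool:
    """True if both copies hash to the same Merkle root."""
    return merkle_root(primary_chunks) == merkle_root(dr_chunks)
```

Comparing roots answers "are the copies identical?" with a single 32-byte comparison. A fuller version would keep the intermediate levels so that a mismatch can be narrowed down to the differing chunk, and would separate leaf and inner hashes before being used against untrusted input.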
Monitoring & Management (M&M)
• Nagios alerts
• Trending via OpenTSDB. Custom UI on top of the time series data.
• Rolling upgrades
– Eventually scheduled and unattended
• Absolutely no unscheduled downtime. Not even during a rack failure.
A. Why HBase?
B. Interacting with the open source community
C. HBase at Salesforce
Lars Hofhansl
http://hadoop-hbase.blogspot.com

 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

HBase at Salesforce.com

  • 1. HBase @ Salesforce Lars Hofhansl Architect, Father, Meditator, Aikido Blackbelt http://hadoop-hbase.blogspot.com
  • 2. Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. 
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements. Safe Harbor
  • 3. Why HBase? • SAN • RDBMS • Transactions
  • 5. A. Why HBase? B. Interacting with the open source community C. HBase at Salesforce
  • 6. Size Matters* New Salesforce customer: • “How many rows do you have?” • We will turn folks away if they have too many! Data Storage is expensive: • SAN storage • Relational Database • Too many rows → Too expensive * In a relational world
  • 7. What if in the future we: … and have cheaper storage? … and never need to ask again about the number of rows? … grow with the data by just adding more machines? (Disclaimer: no transactions, no joins, no 2nd’ary indexes, …)
  • 8. (A quick note about) Relational Databases • We love them. They are core to our infrastructure. • SQL and NoSQL NoACID are complementary. • (Almost) everything we do is SQL based (see Phoenix – the SQL layer for HBase.)
  • 9. The Search - Requirements • Consistent – “Eventually consistent stores are 100% consistent 99% of the time” – Ian Varley • Scalable – No “features” impeding horizontal scaling • Persistent – Duh...? • Key lookups • Range lookups • Open source (ASL great, GPLv2 OK, GPLv3/AGPL not acceptable)
  • 10. Enter HBase “A Sparse, Consistent, Distributed, Multidimensional, Persistent, Sorted Map”
  • 11. Salesforce and the HBase Community
  • 12. To Fork or not to Fork – that is the question Fork - pros • Agility. No waiting for community review. Just get stuff done • Freedom. Patches that might not be acceptable to the community Fork - cons • Lose out on community work • Patches not useful to other parties There is no right or wrong. It’s a matter of choice, taste, and requirements.
  • 13. HBase Development @ Salesforce • No fork of HBase. • Internal HBase/HDFS branch for possible emergency fixes • All fixes are cleaned and contributed back • We switch to the next open source point release periodically
  • 14. PMC member, 2 committers, release manager, contributors HBASE-11042 HBASE-11040 HBASE-11037 HBASE-11030 HBASE-11029 HBASE-11024 HBASE-11022 HBASE-11010 HBASE-10996 HBASE-10989 HBASE-10988 HBASE-10987 HBASE-10982 HBASE-10969 HBASE-10847 HBASE-10805 HBASE-10722 HBASE-10706 HBASE-10642 HBASE-10594 HBASE-10562 HBASE-10551 HBASE-10546 HBASE-10505 HBASE-10501 HBASE-10489 HBASE-10470 HBASE-10420 HBASE-10416 HBASE-10383 HBASE-10363 HBASE-10320 HBASE-10317 HBASE-10286 HBASE-10284 HBASE-10281 HBASE-10279 HBASE-10259 HBASE-10257 HBASE-10250 HBASE-10181 HBASE-10117 HBASE-10076 HBASE-10058 HBASE-10057 HBASE-10015 HBASE-9993 HBASE-9971 HBASE-9956 HBASE-9915 HBASE-9865 HBASE-9834 HBASE-9807 HBASE-9799 HBASE-9789 HBASE-9778 HBASE-9751 HBASE-9749 HBASE-9732 HBASE-9731 HBASE-9711 HBASE-9658 HBASE-9584 HBASE-9566 HBASE-9534 HBASE-9429 HBASE-9428 HBASE-9377 HBASE-9356 HBASE-9344 HBASE-9301 HBASE-9266 HBASE-9231 HBASE-9221 HBASE-9186 HBASE-9158 HBASE-9103 HBASE-9097 HBASE-9049 HBASE-8971 HBASE-8945 HBASE-8930 HBASE-8912 HBASE-8858 HBASE-8809 HBASE-8767 HBASE-8702 HBASE-8698 HBASE-8684 HBASE-8671 HBASE-8636 HBASE-8525 HBASE-8503 HBASE-8355 HBASE-8316 HBASE-8229 HBASE-8188 HBASE-8166 HBASE-8151 HBASE-8110 HBASE-8108 HBASE-8055 HBASE-8008 HBASE-7999 HBASE-7947 HBASE-7945 HBASE-7817 HBASE-7801 HBASE-7729 HBASE-7725 HBASE-7717 HBASE-7709 HBASE-7702 HBASE-7681 HBASE-7617 HBASE-7602 HBASE-7578 HBASE-7550 HBASE-7499 HBASE-7497 HBASE-7483 HBASE-7466 HBASE-7465 HBASE-7455 HBASE-7438 HBASE-7435 HBASE-7432 HBASE-7431 HBASE-7417 HBASE-7415 HBASE-7371 HBASE-7336 HBASE-7293 HBASE-7279 HBASE-7270 HBASE-7252 HBASE-7240 HBASE-7215 HBASE-7214 HBASE-7180 HBASE-7177 HBASE-7166 HBASE-7165 HBASE-7091 HBASE-7069 HBASE-7051 HBASE-7047 HBASE-7021 HBASE-7010 HBASE-6996 HBASE-6974
  • 15. PMC member, 2 committers, release manager, contributors HBASE-6949 HBASE-6946 HBASE-6912 HBASE-6889 HBASE-6879 HBASE-6868 HBASE-6865 HBASE-6863 HBASE-6797 HBASE-6796 HBASE-6784 HBASE-6765 HBASE-6757 HBASE-6755 HBASE-6711 HBASE-6707 HBASE-6690 HBASE-6667 HBASE-6638 HBASE-6637 HBASE-6621 HBASE-6582 HBASE-6580 HBASE-6579 HBASE-6573 HBASE-6571 HBASE-6570 HBASE-6569 HBASE-6568 HBASE-6561 HBASE-6523 HBASE-6522 HBASE-6505 HBASE-6504 HBASE-6496 HBASE-6495 HBASE-6441 HBASE-6439 HBASE-6427 HBASE-6426 HBASE-6421 HBASE-6406 HBASE-6355 HBASE-6347 HBASE-6326 HBASE-6296 HBASE-6293 HBASE-6291 HBASE-6178 HBASE-6138 HBASE-6113 HBASE-6112 HBASE-6110 HBASE-6087 HBASE-5961 HBASE-5955 HBASE-5909 HBASE-5884 HBASE-5871 HBASE-5865 HBASE-5782 HBASE-5775 HBASE-5774 HBASE-5682 HBASE-5670 HBASE-5659 HBASE-5641 HBASE-5609 HBASE-5604 HBASE-5574 HBASE-5569 HBASE-5548 HBASE-5547 HBASE-5541 HBASE-5526 HBASE-5523 HBASE-5509 HBASE-5497 HBASE-5460 HBASE-5455 HBASE-5440 HBASE-5431 HBASE-5368 HBASE-5350 HBASE-5348 HBASE-5318 HBASE-5304 HBASE-5266 HBASE-5229 HBASE-5203 HBASE-5118 HBASE-5096 HBASE-5088 HBASE-5084 HBASE-5070 HBASE-5058 HBASE-5005 HBASE-5001 HBASE-4998 HBASE-4981 HBASE-4979 HBASE-4945 HBASE-4886 HBASE-4874 HBASE-4870 HBASE-4838 HBASE-4805 HBASE-4800 HBASE-4691 HBASE-4682 HBASE-4673 HBASE-4657 HBASE-4626 HBASE-4605 HBASE-4583 HBASE-4561 HBASE-4559 HBASE-4556 HBASE-4536 HBASE-4517 HBASE-4488 HBASE-4454 HBASE-4439 HBASE-4404 HBASE-4387 HBASE-4347 HBASE-4336 HBASE-4335 HBASE-4334 HBASE-4331 HBASE-4296 HBASE-4283 HBASE-4263 HBASE-4242 HBASE-4241 HBASE-4197 HBASE-4178 HBASE-4171 HBASE-4102 HBASE-4071 HBASE-3661 HBASE-3645 HBASE-3584 HBASE-3443 HBASE-3433 HBASE-3387 HBASE-2947 HBASE-2196 HBASE-2195 HDFS-3979 HDFS-744
  • 17. Established monthly release train for 0.94
  • 18. Contributed >300 features, bug fixes, and perf improvements
  • 19. Reviewed 1000’s of open source patches
  • 21. Open Sourced Apache Phoenix – SQL skin on HBase
  • 23. Salesforce *is* a database
  • 24. Salesforce is a Database Query Parser Query (SQL) Parsed Query Query Optimizer Plan Generator Plan Cost Estimator Evaluation Plan Query Plan Evaluator System Catalog Database Stats Tables Columns Indexes
  • 25. Salesforce is a Database Query Parser Query (SOQL) Parsed Query Query Optimizer Plan Generator Plan Cost Estimator System Catalog Oracle Hinted Oracle SQL Database Stats Objects Fields Indexes
  • 28. pod = a database instance •Oracle RAC •AppServers •Blob store servers •Search servers •Shared SAN storage •SAN replication for DR App Server App Server App Server App Server … Oracle Node Oracle Node Oracle Node Oracle Node… Oracle RAC cluster Primary Site Secondary Site SAN replication SAN SAN SQL/JDBC
  • 29. Finally: HBase @ Salesforce
  • 30. Oracle Hinted Oracle SQL Query Parser Query (SOQL) Parsed Query Query Optimizer Plan Generator Plan Cost Estimator System Catalog Database Stats Objects Fields Indexes 1. External Objects 2. Phoenix SQL HBaseHBaseHBaseHBase Where does HBase Fit?
  • 31. Where does HBase Fit? •Separate HBase per pod (close to 50 clusters) •Logically co-located with Oracle •Small clusters striped across five racks •Each cluster’s master service on a different rack •Identical cluster for DR App Server App Server App Server App Server … Oracle Node Oracle Node HBase Node HBase Node… Oracle Cluster HBase Node HBase Node HBase Node … Primary Site Secondary Site DR HBase Cluster Decentralized HBase Replication SQL/JDBC via Phoenix HBase Cluster … SAN SAN
  • 33. 1. Audit Trails (Entity History) • Identity managed in RDBMS • Indexed in HBase (Phoenix indexes) • Historical, immutable data only • No need to reason about updates, split identities, and transactions
  • 34. 2. Archiving (Data Lifecycle Management) • Objects (rows) moved to HBase • Identity managed in HBase after move • Data immutable in HBase • No Transactions
  • 35. 3. Live data in HBase (BigObjects) • Mutable data (possibly) • Everything managed in HBase • Still no Transactions, yet • Platform for other team to use
  • 36. Merrill Lynch Rationalization Data Governance, Audit & Archive • First Salesforce Enterprise Customer • On-platform archival compelling versus on-premise solution from Informatica • Retention Requirements for 7 Years Merrill Lynch “Data Audit, Governance & Lifecycle management is critical for Merrill for the entire banking & financial industry has become a benchmark requirement
  • 37. Heating, ventilation, and air-conditioning in the EU • Top 10 Platform Users • Subject to highly variable data governance and retention requirements • Significant SAP footprint driving business rules – need to connect that to Salesforce data for archival and data retention needs • Massive service workforce generates significant data processing challenges “The Salesforce.com Platform roadmap for Data Archive is critical for future data management needs” Michael Roehr, CTO Vailliant
  • 38. BMW Enriches Their Customer Perspective • Sales Cloud available across all German Dealership Franchises • All customer data subject to stringent, government-mandated protection, audit & retention • Correlations with Car Builder App data enable more contextual customer interactions • Car telemetry, used correctly, helps refine product evolution and customer needs alignment “Data driven customer engagement is a key driver for our enhance customer experience
  • 39. System Of Record (SOR) SOR = HA + DR + Backup + M&M + Security
  • 41. Highly Available, Disaster Recovery • Five-peer ZooKeeper quorum • Five Quorum Journals (for fs edits) • Five HMasters • Three NameNodes (yes, three, we made a patch to run more than one standby) • HBase Replication to identical hot standby pod in a different data center – In the event of a disaster we fail a complete pod to the secondary site • Weekly automated, unattended rolling restarts
  • 42. Replication Backup High-level Architecture Primary pod HBase 48h HDFS Backup per tenant DR pod HBase 48h HDFS Merkle Tree Verification Backup per tenant
  • 43. Monitoring & Management (M&M) • Nagios alerts • Trending via OpenTSDB. Custom UI on top of the time series data. • Rolling upgrades – Eventually scheduled and unattended • Absolutely no unscheduled downtime. Not even during a rack failure.
  • 44. A. Why HBase? B. Interacting with the open source community C. HBase at Salesforce
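
Slide 10 describes HBase as "a Sparse, Consistent, Distributed, Multidimensional, Persistent, Sorted Map", and slide 9 lists key lookups and range lookups as requirements. The following is a minimal in-memory sketch of just the sorted, sparse, multidimensional part of that description. It is not HBase's implementation or API; the class name `SortedMap` and its methods are hypothetical, chosen only for illustration.

```python
import bisect


class SortedMap:
    """Toy sorted map keyed by (row, column, timestamp). Hypothetical, not an HBase API."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._vals = {}   # key tuple -> value; only cells that were written exist (sparse)

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so the newest version sorts first
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, row, column):
        """Key lookup: newest value stored for (row, column), or None."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._vals[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Range lookup: yield cells with start_row <= row < stop_row, in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._vals[self._keys[i]]
            i += 1
```

Because the keys are kept in sorted order, a range lookup is one binary search followed by a sequential read, which is why a sorted map supports both lookup styles from slide 9.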

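Slide 42 names "Merkle Tree Verification" between the primary pod and the DR pod but does not say how it is implemented. As a generic sketch of the technique only, assuming each side can produce its rows as an ordered list of byte strings, two copies can be compared by computing a Merkle root on each side and checking the roots for equality. The function names `merkle_root` and `in_sync` are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """Return the Merkle root (32 bytes) over an ordered list of byte-string rows."""
    level = [_h(r) for r in rows]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def in_sync(primary_rows, dr_rows):
    """True when both copies hash to the same Merkle root."""
    return merkle_root(primary_rows) == merkle_root(dr_rows)
```

Comparing roots only answers whether the two copies match. A fuller version would keep the intermediate levels so that a mismatch can be narrowed down to the differing rows by descending the tree.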
Editor's notes

  1. Spent time with StumbleUpon, Facebook, many others. This is a great community.
  2. Salesforce is seeing an increasing shift in the center of gravity of customer data. Driving this forward across verticals such as banking and financial services requires data audit, driven by post-2008 regulatory requirements and Sarbanes-Oxley requirements. As this data is generated in a transactional environment, we use HBase as our historical and immutable storage.
  3. Their use of the Salesforce.com platform to drive their entire business helps keep their dynamic and highly mobile workforce in touch with their data. Given their operating environment in Germany, they are required to deliver a complete data audit and use Field History for this. They are also required to keep all customer data for at least 15 years, which is why Archive is so key for them.
  4. Across Germany we've had a successful deployment in each franchise to establish new baselines in customer interactions with BMW customers, leases, and service interactions. Looking beyond this use case, the capability of marrying together the customer data generated for the BMW Car Builder application with cleansed and anonymized telemetrics data is pushing Salesforce to deliver the concepts and tools to allow BMW to absorb the full spectrum of their customer event data stream and take business actions on it. Imagine how I would feel as a prospective customer if I walked into a dealership and they had better-informed knowledge of who I am and my likely preferences. We are using the notion of BigObjects to absorb, store, and act on the data that is behind the Internet of Customers.