1© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
How to use this presentation
• Covered topics: Accumulo architecture, operational maintenance, fault handling
• Intended audience: developers, supporters, and PMs who are conversant in multi-component systems, e.g. those involved in web services.
• Presumes familiarity with RDBMS
• Expected running time: 40-60 minutes
• License: CC-BY-SA 2.0
• Please let me know if you find it useful and what it could use: busbey@cloudera.com
Introduction to Apache Accumulo
Scaling a web application made easier
Sean Busbey // Software Engineer
Let’s talk about Apache Accumulo…
But in the context of a specific use case
• I really like technology that solves a problem.
• Keep in mind that this won’t be exhaustive.
• YMMV; proofs of concept with metrics are better than slides.
Who am I?
• Apache Accumulo PMC
• Apache HBase committer
• Software Engineer on Cloudera’s storage team
That is to say, I work for a vendor and no longer have operational scale problems of my own.
We’ll focus on an application that enables conversations centered on cute cats.
Simple sharing model built with privacy controls
• User defines a group that may see their posting
• User posts a picture to a given group
• Members of the group may write short messages
Straightforward web architecture
Relational Data Model
• User table: will map user names to identifiers used elsewhere.
• Group table: will track ownership and descriptive name.
• Group membership table: will allow users to add and remove members.
• Topic table: tracks distribution group, owner, and topical image.
• Comment table: individual comments from users.
First growth: robustness
Second growth: application scale out
Scaling reads: what goes into this page?
Database reads eventually become a bottleneck
Scale by de-normalizing in favor of reads
Change to writes - original
Change to writes – de-normalized
Generally known as the fan-out pattern.
The trick is to not get crushed by the writes
• Each poster now does a write for each member of the group a post goes to.
• Removing access is now a much larger delete query.
• Most databases are geared toward few writes and many reads; are we screwed?
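To make the write amplification concrete, here is a minimal sketch of fan-out in Python. The dictionaries stand in for the group membership table and the de-normalized per-reader table; the names `group_members`, `feeds`, `post_comment`, and `read_page` are made up for illustration and do not come from the deck.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the group membership table and the
# de-normalized per-reader table.
group_members = {"cat-fans": ["alice", "bob", "carol"]}
feeds = defaultdict(list)  # reader id -> list of (discussion id, author, text)


def post_comment(group_id, discussion_id, author, text):
    """Fan out: write one copy of the comment into every group member's feed."""
    writes = 0
    for reader in group_members[group_id]:
        feeds[reader].append((discussion_id, author, text))
        writes += 1
    return writes


def read_page(reader):
    """A page read touches only the reader's own feed; no joins are needed."""
    return feeds[reader]


# One logical post turns into one write per group member.
writes = post_comment("cat-fans", "d1", "alice", "So fluffy!")
```

The read path becomes a single lookup, while the write count grows with group size, which is the trade the bullets above describe.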
Recall our access pattern
Basically one of these consumer boxes.
Lines up very well with sharding
• Divide the query space up by e.g. a hash of user id into n shards.
• Store a copy of the table on each shard, but just for user ids that hash to that shard.
• Reads and writes are spread across instances.
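A rough sketch of hash-based shard routing, assuming string user ids and a fixed shard count. `NUM_SHARDS`, `shard_for`, and `route` are hypothetical names chosen for this example.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to keep the mapping stable across restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(user_ids):
    """Group user ids by the shard that would hold their rows."""
    shards = {i: [] for i in range(NUM_SHARDS)}
    for uid in user_ids:
        shards[shard_for(uid)].append(uid)
    return shards
```

With modulo hashing like this, changing the shard count remaps most user ids to a different shard, which is one reason adding capacity to a sharded deployment means rewriting data.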
Database shards Layout
What were the nice-to-haves for the RDBMS again?
• No longer leveraging the relational data model.
• Now running, backing up, and failing over as many database instances as there are shards.
• Robustness within a shard has to be managed.
• Sharding is essentially static; adding more resources with growth is still painful.
Now we have some context for Accumulo.
Our goal is to end up with less operational overhead.
“The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.”
Accumulo PMC via https://accumulo.apache.org/
Accumulo-based App Layout
In Accumulo, you address cells rather than records
[Diagram: each cell pairs a Key with a Value]
Keys are multi-dimensional
[Diagram: Key = Row, Column (Family, Qualifier, Visibility), Time; paired with a Value]
Accumulo doesn’t assume a schema
• All key and value components, save time, are byte[]
• The application is responsible for serialization
• It is common to use different serialization for the values in different columns.
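As an illustration of "everything is bytes except time", the sketch below models a key as a plain Python tuple type and leaves serialization to the application. This is not the Accumulo client API; the `Key` class, the encoder helpers, and the `likes` column are all assumptions made for the example.

```python
import struct
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative model of a cell key; not the Accumulo client class."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int  # the one component that is not a byte array


def encode_text(value: str) -> bytes:
    """Serialize text columns as UTF-8."""
    return value.encode("utf-8")


def encode_count(value: int) -> bytes:
    """Serialize counter columns as a fixed-width big-endian unsigned 64-bit integer."""
    return struct.pack(">Q", value)


def decode_count(raw: bytes) -> int:
    return struct.unpack(">Q", raw)[0]


# A hypothetical "likes" column uses a different serialization than text columns.
key = Key(encode_text("reader-42"), encode_text("d1"), encode_text("likes"),
          encode_text("cat-fans"), 1)
cell = (key, encode_count(7))
```

The store only ever sees byte arrays here; which encoder applies to which column is knowledge that lives in the application.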
Mapping records to cells
• Treat a row as a database record
  • Essentially each column is a record field
• Treat each cell as a database record
  • Need to uniquely identify each record
  • Useful if you generally need the whole row and not a subset of columns
  • Can then treat each row as a shard of database records.
Let’s use a concrete example.
We already know our reads are within a shard.
Mapping our data into cells
• Row: reader id
• Column Family: discussion id
• Column Qualifier: comment order
• Visibility: group id
• Value: author, image url, and comment
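The mapping above could be sketched as follows. JSON for the value and a zero-padded decimal string for the comment order are choices made for this example only; the deck does not say how the application serializes either, and `make_cell` and `cells_for_comment` are hypothetical helpers.

```python
import json


def make_cell(reader_id, discussion_id, comment_order, group_id,
              author, image_url, comment):
    """Return one (key, value) pair laid out as in the mapping above."""
    key = (
        reader_id.encode("utf-8"),               # Row
        discussion_id.encode("utf-8"),           # Column Family
        f"{comment_order:08d}".encode("utf-8"),  # Column Qualifier, zero-padded so it sorts numerically
        group_id.encode("utf-8"),                # Visibility
    )
    value = json.dumps(
        {"author": author, "image_url": image_url, "comment": comment},
        sort_keys=True,
    ).encode("utf-8")
    return key, value


def cells_for_comment(readers, **fields):
    """One cell per reader in the group (the fan-out write)."""
    return [make_cell(reader, **fields) for reader in readers]
```

Calling `cells_for_comment(["alice", "bob"], discussion_id="d1", comment_order=3, group_id="cat-fans", author="carol", image_url="http://example.invalid/cat.jpg", comment="So fluffy!")` yields two cells that differ only in their row.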
We end up with something close to our original.
Note the use of visibility
Visibility enforcement
• At scan time, our application will pass in the groups for the current user.
• Accumulo will filter out any cells that don’t match those groups.
• Group removal is once again a simple update in the group management system.
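A toy version of scan-time filtering, under a deliberate simplification: each cell carries exactly one group label, whereas real Accumulo visibilities are boolean expressions over labels. The `scan` function and the sample cells are made up for illustration.

```python
def scan(cells, authorizations):
    """Return only the cells whose visibility label is in the caller's authorizations.

    Simplification: each cell carries a single label. Real Accumulo
    visibilities are boolean expressions over labels.
    """
    auths = set(authorizations)
    # key layout: (row, family, qualifier, visibility)
    return [(key, value) for key, value in cells if key[3] in auths]


cells = [
    (("bob", "d1", "0001", "cat-fans"), "So fluffy!"),
    (("bob", "d2", "0001", "dog-fans"), "Good boy"),
]
```

If bob is removed from `dog-fans`, the application simply stops passing that label at scan time; the second cell is no longer returned and nothing has to be deleted.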
Sparse column storage
• We are creating lots of columns: one per discussion per group member.
• Accumulo only stores columns that exist in a given row.
All cells sorted according to key
• Total ordering based on lexicographic sort of the raw byte arrays of key components.
• Time is sorted most-recent-first
• Reads are done on a contiguous range of cells.
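The ordering and range read can be sketched with a sorted Python list and `bisect`. Keys here are `(row, family, qualifier, visibility, timestamp)` tuples of bytes plus an integer; `sort_key` and `scan_row_family` are hypothetical names, and a real scan would of course not rebuild the key list on every call.

```python
import bisect


def sort_key(cell):
    """Order by row, family, qualifier, visibility, then newest timestamp first."""
    row, family, qualifier, visibility, timestamp = cell[0]
    return (row, family, qualifier, visibility, -timestamp)


def scan_row_family(sorted_cells, row, family):
    """Return the contiguous slice of cells for one row and column family."""
    keys = [sort_key(c) for c in sorted_cells]
    start = bisect.bisect_left(keys, (row, family))
    # family + b"\x00" is the smallest byte string that sorts after family
    end = bisect.bisect_left(keys, (row, family + b"\x00"))
    return sorted_cells[start:end]


cells = sorted(
    [
        ((b"bob", b"d1", b"0002", b"cat-fans", 5), b"second"),
        ((b"bob", b"d1", b"0001", b"cat-fans", 3), b"first v1"),
        ((b"bob", b"d1", b"0001", b"cat-fans", 9), b"first v2"),
        ((b"bob", b"d2", b"0001", b"cat-fans", 4), b"other"),
        ((b"alice", b"d1", b"0001", b"cat-fans", 1), b"alice copy"),
    ],
    key=sort_key,
)
page = scan_row_family(cells, b"bob", b"d1")
```

Because the cells for one reader and discussion sit next to each other, the page read is one slice, and the newest version of a cell comes before older ones.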
When sorted our data looks like this….
And the scan for a page is roughly…
Lexicoders
• Turning different kinds of data into sortable bytes is painful
• Accumulo ships implementations for several common Java types
• It also ships helpers for e.g. reversing the sort order and building compound keys.
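To show why sortable encodings take care, here is a hand-rolled illustration in Python. It is not the byte format Accumulo's lexicoders produce; `encode_int`, `decode_int`, and `encode_reversed` are names invented for this sketch.

```python
import struct


def encode_int(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order.

    Flipping the sign bit moves negative numbers below positive ones when
    the bytes are compared as unsigned values.
    """
    return struct.pack(">Q", (value + (1 << 63)) & 0xFFFFFFFFFFFFFFFF)


def decode_int(raw: bytes) -> int:
    return struct.unpack(">Q", raw)[0] - (1 << 63)


def encode_reversed(raw: bytes) -> bytes:
    """Invert every byte so that larger values sort first (fixed-width inputs only)."""
    return bytes(0xFF - b for b in raw)
```

A plain two's-complement encoding would sort -1 after every positive number; the sign-bit flip is what makes a byte-wise comparison agree with numeric order.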
Inefficiencies in our data model
• Row: reader id
• Column Family: discussion id
• Column Qualifier: comment order
• Visibility: group id
• Value: author, image url, and comment
Two categories of data
Image cells:
• Row: reader id
• Column Family: discussion id
• Column Qualifier: image
• Visibility: group id
• Value: author, image url
Comment cells:
• Row: reader id
• Column Family: discussion id
• Column Qualifier: text
• Visibility: group id
• Value: author, comment
And now our data looks like this
And the scan for a page covers less data
Our simplified diagram
Slightly less simplified
Back to the data model
[Diagram: Key = Row, Column (Family, Qualifier, Visibility), Time; paired with a Value]
Rows are grouped into Tablets
• A Tablet is defined by a start and end row
• All cells for a given row must be in the same Tablet.
Tablets are assigned to Tablet Servers
• At any given point in time, a Tablet is serviced by a single Tablet Server
• That server is responsible for client reads and writes to all hosted Tablets
• Finding the proper server is handled by the Accumulo libraries
• Proper key design means I/O load gets spread across multiple machines
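The lookup the client library performs can be pictured as a search over sorted split points. The split points and server names below are hypothetical, and the sketch assumes each Tablet's range excludes its start row and includes its end row.

```python
import bisect

# Hypothetical split points and server names. Each Tablet covers the rows
# greater than the previous split point and up to and including its own.
split_points = [b"g", b"n", b"t"]
# One more Tablet than split points; here each Tablet sits on its own server.
tablet_servers = ["tserver-1", "tserver-2", "tserver-3", "tserver-4"]


def tablet_index(row: bytes) -> int:
    """Find the Tablet whose row range contains the given row."""
    return bisect.bisect_left(split_points, row)


def server_for(row: bytes) -> str:
    """Return the server currently hosting the Tablet for this row."""
    return tablet_servers[tablet_index(row)]
```

Rows that share a prefix land in the same Tablet, so a key design that puts all hot rows under one prefix concentrates load on one server.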
Tablet assignment is not static
• Assignments tend toward a steady state
• But Tablets can move when new resources are added or a server fails
Remember our RDBMS scaling?
New RDBMS shard
1. Provision hardware for service
2. Rewrite data under new sharding
3. Update application services
• Doing this without an outage is hard work (and well paid if you can get it)
New Accumulo Tablet Server
1. Provision hardware for service
2. Add server to cluster
3. Tablets automatically migrate from busier nodes to new node
• No outage from client perspective.
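As a rough picture of step 3, the sketch below evens out Tablet counts after a server joins. Balancing purely by Tablet count is an assumption made for this example and is not a description of Accumulo's balancer; `rebalance` and the server names are hypothetical.

```python
def rebalance(assignments, new_server):
    """Move Tablets from the busiest servers to a new one until counts are roughly even.

    assignments maps server name -> list of Tablet ids. The input is not
    modified; a new mapping is returned.
    """
    assignments = {server: list(tablets) for server, tablets in assignments.items()}
    assignments[new_server] = []
    while True:
        busiest = max(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[busiest]) - len(assignments[new_server]) <= 1:
            return assignments
        assignments[new_server].append(assignments[busiest].pop())


before = {"tserver-1": ["t1", "t2", "t3"], "tserver-2": ["t4", "t5", "t6"]}
after = rebalance(before, "tserver-3")
```

No row changes Tablet in this process; only the Tablet-to-server mapping changes, which is why the data does not have to be rewritten under a new sharding.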
All distributed systems have communication failures
In the face of such a failure you can either
• remain available on remaining nodes to all clients
• provide a consistent view of updates to a subset of clients
Now you know the basics of CAP
Remember that you can’t give up partition tolerance
Remember our RDBMS robustness?
Accumulo is a CP system
• Tablet Servers ensure that updates have been written to a distributed write-ahead log before acknowledging
• Tablet Server failures are automatically detected
• Newly assigned hosts for recovered Tablets then replay edits up until the last ack before serving new requests
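The log-then-ack rule and replay on recovery can be sketched with a toy class. A plain Python list stands in for the distributed write-ahead log, the step numbers in the comments are the sketch's own, and none of this reflects Accumulo's actual implementation.

```python
class TabletServerSketch:
    """Toy model of log-then-ack; not Accumulo's implementation."""

    def __init__(self, wal):
        self.wal = wal      # shared, durable log (a plain list here)
        self.memory = {}    # in-memory state, lost if the server fails

    def write(self, key, value):
        self.wal.append((key, value))  # 1. record the edit in the log
        self.memory[key] = value       # 2. apply it in memory
        return "ack"                   # 3. only now acknowledge the client

    @classmethod
    def recover(cls, wal):
        """Build a replacement server by replaying every logged edit in order."""
        server = cls(wal)
        for key, value in wal:
            server.memory[key] = value
        return server


wal = []
first = TabletServerSketch(wal)
first.write("bob/d1/0001", "So fluffy!")
first.write("bob/d1/0001", "So very fluffy!")
del first  # simulate losing the server and its in-memory state
replacement = TabletServerSketch.recover(wal)
```

Because every acknowledged write is in the log before the ack goes out, the replacement ends up with the same state the failed server had acknowledged.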
Write goals
• Low latency ack
• Don’t lose acked writes in face of node failure
[Diagram: client write path, built up in numbered steps 1 to 3]
Recovery timing
• Tunable time to detection; detecting failures faster increases network load
• Size of outstanding write ahead logs
[Diagram: client write path, numbered steps 1 to 4]
Accumulo-based App Layout
What’s the catch?
Gaps
• Still requires application updates to use the API; no interactive SQL bindings*
• No Disaster Recovery; coming in the next minor release
Thank you.
Mr. Mean photo from mockup is © 2004 Flickr user aznewbeginning; cc-by-sa 2.0 https://flic.kr/p/4uzdRc

Introduction to Apache Accumulo

  • 1. 1© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 How to use this presentation • Covered topics: Accumulo architecture, operational maintenance, fault handling • Intended Audience: Developers, supporters, PMs who are conversant in multi-component systems, i.e. involved in web services. • Presumes familiarity with RDBMS • Expected running time: 40 - 60 minutes • License: CC-BY-SA 2.0 • Please let me know if you find it useful and what it could use: busbey@cloudera.com
  • 2. Introduction to Apache Accumulo Scaling a web application made easier Sean Busbey // Software Engineer
  • 3. 3© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Let’s talk about Apache Accumulo…
  • 4. 4© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 But in the context of a specific use case •I really like technology that solves a problem. •Keep in mind that this won’t be exhaustive. •YMMV, proof-of-concepts with metrics are better than slides.
  • 5. 5© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Who am I? • Apache Accumulo PMC • Apache HBase committer • Software Engineer on Cloudera’s storage team
  • 6. 6© 2015 Cloudera licensed CC-BY-SA 2.0 That is to say, I work for a vendor and no longer have operational scale problems of my own.
  • 7. We’ll focus on an application that enables conversations centered on cute cats.
  • 8. 8© 2015 Cloudera licensed CC-BY-SA 2.0
  • 9. 9© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Simple sharing model built with privacy controls •User defines a group that may see their posting •User posts a picture to a given group •Members of the group may write short messages
  • 10. 10© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Straight forward web architecture
  • 11. 11© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Relational Data Model Will map user names to identifiers used elsewhere. Will track ownership and descriptive name. Will allow users to add and remove members. User table Group table Group membership table
  • 12. 12© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Relational Data Model Tracks distribution group, owner, and topical image. Individual comments from users. Topic table Comment table
  • 13. 13© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 First growth: robustness
  • 14. 14© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 First growth: robustness
  • 15. 15© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Second growth: application scale out
  • 16. 16© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Scaling reads: what goes into this page?
  • 17. 17© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Database reads eventually become a bottleneck
  • 18. 18© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Scale by de-normalizing in favor of reads
  • 19. 19© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Change to writes - original
  • 20. 20© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Change to writes – de-normalized
  • 21. Generally known as the fan-out pattern. 21© 2015 Cloudera licensed CC-BY-SA 2.0
  • 22. 22© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 The trick is to not get crushed by the writes •Each poster now does a write for each member of the group a post goes to. •Removing access is now a much larger delete query. •Most databases are geared toward few writes and many reads; are we screwed?
  • 23. 23© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Recall our access pattern
  • 24. Basically one of these consumer boxes. 24© 2015 Cloudera licensed CC-BY-SA 2.0
  • 25. 25© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Lines up very well with sharding •Divide the query space up by e.g. a hash of user id into n shards. •Store a copy of table on each shard, but just for user ids that hash to that shard. •Reads and writes are spread across instances.
  • 26. 26© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Database shards Layout
  • 27. 27© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 What were the nice-to-haves for the RDBMS again? • No longer leveraging relational data model. • Now running, backing up, and failing over num shards number of database instances. • Robustness in a shard has to be managed. • Sharding is essentially static; adding more resources with growth still painful.
  • 28. 28© 2015 Cloudera licensed CC-BY-SA 2.0 Now we have some context for Accumulo. Our goal is to end up with less operational overhead.
  • 29. 29© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  • 30. 30© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Accumulo-based App Layout
  • 31. 31© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  • 32. 32© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 In Accumulo, you address cells rather than records (diagram: Key, Value)
  • 33. 33© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Keys are multi-dimensional (diagram: Key = Row, Column, Time; Value)
  • 34. 34© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Keys are multi-dimensional (diagram: Key = Row, Column [Family, Qualifier, Visibility], Time; Value)
  • 35. 35© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Accumulo doesn’t assume a schema •All key and value components, save time, are byte[] •The application is responsible for serialization •Common to use different serialization for the values in different columns.
  • 36. 36© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Mapping records to cells •Treat a row as a database • Essentially each column is a record field •Treat each cell as a database record • Need to uniquely identify each record • Useful if you generally need the whole row and not a subset of columns • Can then treat each row as a shard of database records.
  • 37. 37© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Let’s use a concrete example.
  • 38. 38© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Already know our reads are within a shard.
  • 39. 39© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Mapping our data into cells (table: Row = reader id; Column Family = discussion id; Column Qualifier = comment order; Visibility = group id; Value = author, image url, and comment)
  • 40. 40© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 We end up with something close to our original.
  • 41. 41© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Note the use of visibility
  • 42. 42© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Visibility enforcement •At scan time, our application will pass in the groups for the current user. •Accumulo will filter any cells that don’t match those groups. • Group removal is a simple update in the group management system again.
  • 43. 43© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Sparse column storage •We are creating lots of columns: per discussion per group member. •Accumulo only stores columns that exist in a given row.
  • 44. 44© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  • 45. 45© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 All cells sorted according to key • Total ordering based on lex-sort of raw byte arrays of key components. • Time is sorted most-recent-first • Reads are done on a contiguous range of cells.
  • 46. 46© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 When sorted our data looks like this….
  • 47. 47© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 And the scan for a page is roughly…
  • 48. 48© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Lexicoders • Turning different kinds of data into sortable bytes is painful • Accumulo ships implementations for several common Java types • Also for e.g. reversing the sort order and building compound keys.
  • 49. 49© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Inefficiencies in our data model (table: Row = reader id; Column Family = discussion id; Column Qualifier = comment order; Visibility = group id; Value = author, image url, and comment)
  • 50. 50© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Two categories of data (image table: Row = reader id; Column Family = discussion id; Column Qualifier = image; Visibility = group id; Value = author, image url) (comment table: Row = reader id; Column Family = discussion id; Column Qualifier = text; Visibility = group id; Value = author, comment)
  • 51. 51© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 And now our data looks like this
  • 52. 52© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 And the scan for a page covers less data
  • 53. 53© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  • 54. 54© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Our simplified diagram
  • 55. 55© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Slightly less simplified
  • 56. 56© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Back to the data model (diagram: Key = Row, Column [Family, Qualifier, Visibility], Time; Value)
  • 57. 57© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Back to the data model (diagram: Key = Row, Column [Family, Qualifier, Visibility], Time; Value)
  • 58. 58© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Rows are grouped into Tablets • Tablet is defined by a start and end row • All cells for a given row must be in the same Tablet.
  • 59. 59© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Tablets are assigned to Tablet Servers • At any given point in time, a Tablet is serviced by a single Tablet Server
  • 60. 60© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Slightly less simplified
  • 61. 61© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Tablets are assigned to Tablet Servers • At any given point in time, a Tablet is serviced by a single Tablet Server • That server is responsible for client reads and writes to all hosted Tablets • Finding the proper server is handled by the Accumulo libraries • Proper key design means I/O load gets spread across multiple machines
  • 62. 62© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  • 63. 63© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Tablet assignment is not static • Assignments tend to have a steady state • But can move in the event of new resources or failure
  • 64. 64© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Remember our RDBMS scaling?
  • 65. 65© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 New RDBMS shard 1. Provision hardware for service 2. Rewrite data under new sharding 3. Update application services • Doing this without an outage is hard work (and well paid if you can get it)
  • 66. 66© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 New Accumulo Tablet Server 1. Provision hardware for service 2. Add server to cluster 3. Tablets automatically migrate from busier nodes to new node • No outage from client perspective.
  • 67. 67© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  • 68. 68© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 All distributed systems have communication failures In the face of such a failure you can either • remain available on remaining nodes to all clients • provide a consistent view of updates to a subset of clients
  • 69. 69© 2015 Cloudera licensed CC-BY-SA 2.0 Now you know the basics of CAP Remember that you can’t give up partition tolerance
  • 70. 70© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Remember our RDBMS robustness?
  • 71. 71© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Accumulo is a CP system • Tablet Servers ensure that updates have been written to a distributed write-ahead-log before acknowledging • Tablet Server failures are automatically detected • Newly assigned hosts for recovered Tablets then replay edits up until last ack before serving new requests
  • 72. 72© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
  • 73. 73© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Client write
  • 74. 74© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Write goals • Low latency ack • Don’t lose acked writes in face of node failure
  • 75. 75© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Client write 1
  • 76. 76© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Client write 1 2
  • 77. 77© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Client write 1 2 3
  • 78. 78© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
  • 79. 79© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
  • 80. 80© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
  • 81. 81© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
  • 82. 82© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
  • 83. 83© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Recovery timing • Tunable time to detection – increases network load • Size of outstanding write ahead logs
  • 84. 84© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Client write 1 2 3 4
  • 85. 85© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Accumulo-based App Layout
  • 86. 86© 2015 Cloudera licensed CC-BY-SA 2.0 What’s the catch?
  • 87. 87© 2015 Cloudera, Inc. licensed CC-BY-SA 2.0 Gaps • Still requires application updates to use API – no interactive SQL bindings* • No Disaster Recovery – coming in next minor release
  • 88. Thank you. Mr. Mean photo from mockup is © 2004 Flickr user aznewbeginning; cc-by-sa 2.0 https://flic.kr/p/4uzdRc

Editor's notes

  1. Accumulo is a distributed key value store based on the BigTable paper.
  2. Technology introduction talks are often driven from the perspective of what and how an application works rather than why. So this evening I’m going to start with a common use case and then talk about how Accumulo’s implementation addresses issues for that use. This means you’ll have to bear with me as we establish how we might end up with a set of problems that Accumulo helps solve.
  3. Involved in Accumulo for a while. Recently I’ve been working more on HBase, another distributed key value store based on BigTable.
  4. I do still have the scale problems of our supported customers and the project communities I work within. I also spent my years prior to Cloudera building scalable systems I can’t talk about. The upshot is that we’ll have to come up with a convincing contrivance.
  5. That cat’s name is actually Mr. Mean. The photo is from https://flic.kr/p/4uzdRc © 2004 Flickr user aznewbeginning; cc-by-sa 2.0
  6. Make sure you can see my sweet website mockup. The photo is from https://flic.kr/p/4uzdRc © 2004 Flickr user aznewbeginning; cc-by-sa 2.0
  7. Because Cute Cat Conversations Dot Com cares about giving users control over their pictures, we require that when a user removes someone from a group that person doesn’t still see old conversations. So groups are both a distribution and an authorization mechanism
  8. These are logical components. You’re probably running them on a single node.
  9. All nice, tidy, and easy to reason about. We can straightforwardly look at how we can add conversations, comments, and manage groups with a minimal number of updates.
  10. Our cat conversations get a little popular, we need to make sure one component failure doesn’t sink us.
  11. So you add another application server, and you set up a replication for your database. Depending on the specifics of the database you use and how much time you invest, you might have to deal with brief outages while you handle fail over yourself.
  12. For most applications, as you gain users and activity there’ll be substantial gains from just doing sessionized load balancing against more application servers. In most cases, this will also buy you more robustness at the application layer. More importantly, you still have an easy-to-reason-about relational model for your data and a relatively easy-to-administer data store.
  13. You’ll need to do some filtering and ordering in addition, but you’ll need to hit these two joins because you must turn the current user into a set of groups. You might break this into two queries, one for a set of discussions and one for a set of comments, but you’re still going to have latency because of having to look across tables or round trip to the application server.
  14. Now we can get everything we need for a given user’s page with a single table. (We could also just set the image url to null for the rows that are comments)
  15. However, originally each new comment or conversation start only required updating a single row. Like when this comment was added.
  16. Now that same single comment involves writing 3x as many rows. If you look above, the same applies for posting a new conversation image.
  17. When read latency is important, this is an established optimization technique. It’s called fan-out because the writes from a given producer form a “fan” as they connect to a subset of the potential consumer lists. This pattern comes up whenever you’re going to have a large number of readers in a time-sensitive context, like the web, that receive updates from a smaller set of producers. Think social sharing sites like Pinterest, Twitter, or even Google Plus.
  18. Generally, I think if you get to the point of implementing fan-out you should look seriously at moving to one of the distributed key-value stores. But traditional RDBMS isn’t done just yet.
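The fan-out write described in the notes above can be sketched in a few lines. This is a toy in-memory model, not code from the talk: the `GROUPS` mapping, the `timelines` store, and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical group membership: group id -> member user ids.
GROUPS = {"cat-fans": ["alice", "bob", "carol"]}

# De-normalized store: one list of entries per reader.
timelines = defaultdict(list)


def post_comment(group_id, discussion_id, author, text):
    """Fan a single comment out to every member of the group."""
    members = GROUPS[group_id]
    for reader in members:
        timelines[reader].append(
            {"group": group_id, "discussion": discussion_id,
             "author": author, "text": text}
        )
    return len(members)  # number of rows written for one comment


def page_for(reader):
    """Reads touch only the reader's own entries; no joins are needed."""
    return timelines[reader]


writes = post_comment("cat-fans", "d1", "alice", "So fluffy!")
```

One comment costs one write per group member, which is the write amplification the slides warn about, while building a page becomes a single lookup.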
  19. When we want to build the page for a given user, we just need the rows corresponding to them.
  20. As you can see, under this set up, each of the application servers will talk to some set of databases depending on what users they are servicing. When writes happen, they will need to be broadcast to every shard that contains a user in the appropriate group.
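A minimal sketch of the hash-based sharding described above, assuming a fixed shard count and a SHA-256 hash of the user id. `NUM_SHARDS`, `shard_for`, and `shards_for_write` are illustrative names, not part of any real system in the talk.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user id to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_write(member_ids, num_shards=NUM_SHARDS):
    """A post must reach every shard that holds at least one group member."""
    return sorted({shard_for(m, num_shards) for m in member_ids})


members = ["alice", "bob", "carol"]
targets = shards_for_write(members)
```

Because the shard is computed modulo the shard count, changing `NUM_SHARDS` moves most users to a different shard. That is one way to see why the slides call this kind of sharding essentially static.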
  21. Hopefully it’s been ~15-20 minutes. The cost for that lower operational overhead is that Accumulo is going to make us think about our data organization more.
  22. The headline description from the project itself. We’ll break down the pieces of this description and how they end up easing the pain in our current scaled up application.
  23. As you can see, under this set up, each of the application servers will talk to some set of databases depending on what users they are servicing. When writes happen, they will need to be broadcast to every shard that contains a user in the appropriate group.
  24. It’s important that we start with the fundamental limitation of Accumulo: it’s a key value store and does not provide a relational model.
  25. You read and write values given a particular key.
  26. Keys are made up of a row, a column, and timestamp.
  27. A Column, in turn, is actually made up of 3 parts. A family, or general grouping of similar columns, a qualifier that specifies which coordinate within the family, and a visibility. We’ll cover how some of these key-parts are treated specially in a little bit. Generally, you can just think of it like a big multi-dimensional map.
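As a rough picture of the "big multi-dimensional map", the sketch below models a key as a five-part tuple and the table as a plain dictionary. It is a toy model only: it does not use the Accumulo client API, and the sample row, column, and value bytes are invented.

```python
from collections import namedtuple

# Toy model of a cell key; in Accumulo every component except the timestamp is bytes.
Key = namedtuple("Key", ["row", "family", "qualifier", "visibility", "timestamp"])

# The "big multi-dimensional map": full key -> value.
table = {}


def put(row, family, qualifier, visibility, timestamp, value):
    """Write one cell, addressed by every component of its key."""
    table[Key(row, family, qualifier, visibility, timestamp)] = value


def get(row, family, qualifier, visibility, timestamp):
    """Read one cell; returns None when no such cell exists."""
    return table.get(Key(row, family, qualifier, visibility, timestamp))


put(b"reader42", b"discussion7", b"0001", b"group3", 1000, b"alice|cat.jpg|So fluffy!")
```

How the value bytes are laid out (here a made-up pipe-separated string) is entirely up to the application.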
  28. We’ll cover this last bit more in a few minutes.
  29. This is our read-oriented de-normalized conversation table
  30. When we want to build the page for a given user, we just need the rows corresponding to them. So we’ll take the cell-per-record approach, and use the reader id as a shard indicator in the row id.
  31. Mind you, this is just a first pass.
  32. Note that we’d set each cell’s visibility to be the group the message went to.
  33. If storage is at a premium, we can handle deleting cells we know a user won’t see in an offline way.
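The scan-time filtering can be sketched as below. Real Accumulo visibilities are boolean expressions over labels; this sketch assumes the simpler case used in the talk, where each cell carries exactly one group label. The `scan` function and sample cells are hypothetical.

```python
def scan(cells, authorizations):
    """Return only cells whose visibility label is in the caller's authorizations."""
    allowed = set(authorizations)
    return [(key, value) for key, value in cells if key[3] in allowed]


# Each key is (row, family, qualifier, visibility); the values are made up.
cells = [
    (("bob", "d1", "0001", "cat-fans"), "alice: So fluffy!"),
    (("bob", "d2", "0001", "dog-fans"), "carol: Good boy"),
]

before = scan(cells, ["cat-fans", "dog-fans"])
# Removing bob from dog-fans changes only what the application passes in at scan time.
after = scan(cells, ["cat-fans"])
```

No stored cell is rewritten when membership changes, which is why group removal goes back to being a single update in the group management system.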
  34. Because Accumulo only deals with cells at its core, it doesn’t presume that a column being present in one row means it will be present in another. It stores nothing when a column doesn’t exist. This means we can have extremely wide tables that are only sparsely populated; perfect for the fan-out of our cat conversations.
  35. Accumulo asserts a total ordering across all keys.
  36. Sort is done key-component wise with decreasing priority across: row, family, qualifier, visibility, and finally time.
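The ordering in the note above can be written as a sort key: the four byte-string components ascending, then the timestamp descending. The following is a sketch with invented sample keys, and `scan_row` is a hypothetical helper showing that one row is a contiguous slice of the sorted data.

```python
import bisect


def sort_key(key):
    """Order by row, family, qualifier, visibility ascending, then newest timestamp first."""
    row, family, qualifier, visibility, timestamp = key
    return (row, family, qualifier, visibility, -timestamp)


keys = [
    (b"bob", b"d1", b"text", b"g1", 100),
    (b"alice", b"d2", b"text", b"g1", 50),
    (b"bob", b"d1", b"text", b"g1", 200),
    (b"bob", b"d1", b"image", b"g1", 100),
]
ordered = sorted(keys, key=sort_key)


def scan_row(sorted_keys, row):
    """A scan for one row reads a single contiguous run of the sorted keys."""
    rows = [k[0] for k in sorted_keys]
    start = bisect.bisect_left(rows, row)
    end = bisect.bisect_right(rows, row)
    return sorted_keys[start:end]
```

Python compares `bytes` values lexicographically by unsigned byte, which matches the raw-byte ordering the slide describes.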
  37. A common difficulty for building on Accumulo is that you need an increased awareness of how parts of Accumulo will interact with your chosen data layout. Rather than something you can reason about once there are issues (like adding an index to an RDBMS), you need to work it out at the time of application design. To be performant, we need to make sure that our access pattern for a given user will be a small number of these sequential ranges. That means we have to understand how our chosen keys lay out for Accumulo scans. This layout and scan makes me think of two issues for our application.
  38. Out of the box, Accumulo will give you lexicoders for all the primitive types as well as Java Strings, Date, and BigInteger objects. It will also let you build a sortable representation of a list of encoded values.
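To see why sortable encodings take some care, here is a minimal sketch of the idea behind a lexicoder for signed 64-bit integers, plus a reverse-order wrapper. It illustrates the technique only and is not Accumulo's actual byte format; the function names are made up.

```python
import struct


def encode_long(value):
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    # Adding 2**63 shifts the signed range onto 0..2**64-1 (same effect as flipping the sign bit).
    return struct.pack(">Q", value + (1 << 63))


def decode_long(data):
    """Invert encode_long."""
    return struct.unpack(">Q", data)[0] - (1 << 63)


def encode_reverse(data):
    """Invert every byte to reverse the sort order of a fixed-width encoding."""
    return bytes(0xFF - b for b in data)


values = [5, -3, 0, 1000, -1000]
by_bytes = sorted(values, key=encode_long)
by_bytes_reversed = sorted(values, key=lambda v: encode_reverse(encode_long(v)))
```

A plain big-endian two's complement encoding would not work here, because negative numbers have the high bit set and would sort after the positive ones.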
  39. This is how we’re laying things out again. First entry is always just the image; later entries never need the image because they got it at the start of the scan. Recall earlier when I mentioned that Accumulo just treats the bytes as-is and it’s common for applications to use multiple meanings for a cell value depending on the column. So let’s remove the placeholders in our values and instead make it explicit when a cell is the image for the start of a conversation and when it’s a comment.
  40. By default, Accumulo will only keep a single version of a given key around; it decides which one to keep based on whichever is newest according to the timestamp in the key. To simplify our current data model, we’re going to configure it to keep an arbitrary number of versions. This will allow us to leave the “comment order” out of our key entirely. We can either set the time based on the posting client or we can rely on the order Accumulo receives updates. We’d always receive them when reading most-recent-first.
  41. We’ve complicated the mapping from our original database. We’re relying on the way scans work in Accumulo to simplify how we interact with our dataset
  42. By relying on the timestamp and multiple cell versions, we do end up with most-recent-first ordering on comments. On the downside, we’ll have to reverse for display. On the plus side, we can easily do things like previews of most-recent-comment.
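A small sketch of what relying on timestamps and multiple versions looks like from the application side. The `scan_versions` helper and the sample comments are hypothetical; `max_versions=1` stands in for the default behaviour of keeping only the newest version.

```python
def scan_versions(cells, max_versions=None):
    """Return the versions of one column newest-first, keeping at most max_versions."""
    newest_first = sorted(cells, key=lambda cell: cell[0], reverse=True)
    return newest_first if max_versions is None else newest_first[:max_versions]


# (timestamp, value) pairs written under one row, family, qualifier, and visibility.
comments = [(100, "First!"), (300, "Still adorable"), (200, "So fluffy!")]

default_view = scan_versions(comments, max_versions=1)  # newest version only
all_newest_first = scan_versions(comments)              # version limit lifted
display_order = list(reversed(all_newest_first))        # oldest first for the page
preview = all_newest_first[0]                           # most recent comment
```

The reversal for display is the downside mentioned in the note; the cheap preview of the most recent comment is the upside.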
  43. At its simplest, this just means that Accumulo will scale across many machines. Unlike our manual database sharding, this should be transparent to you.
  44. Our diagram is a bit of an oversimplification
  45. We can add in a bit of detail. The requests from our clients are going to be served within the cluster by a set of Tablet Servers. Unfortunately, it won’t make much sense to talk about them without first going back to our data model for a second.
  46. When I said earlier that we’ll treat the row like a database shard, I wasn’t just talking for our application. Internally, Accumulo manages cells in groups of rows.
  47. Practically, this means that the row is the atomic unit of parallelizability within an Accumulo system. In our case, we don’t expect one user to be in so many other people’s cat conversation groups that a single machine couldn’t handle their stream. In other use cases we may have to account for this in our key design.
  48. If you look closely, you can see the tablets!
  49. In particular, this means that we should probably use a hash on our user ids to ensure we don’t get a contiguous block of group members all going to the same server. Besides having to know about the contiguous group members issue, we don’t need to embed any other knowledge about the way sharding is handled into our application.
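The sketch below puts the last few notes together: rows fall into tablets defined by split points, each tablet is hosted by one server, and a hash prefix on the user id scatters neighbouring ids across tablets. The split points, server names, and row id format are all assumptions for illustration; in a real deployment the lookup is done by the Accumulo client libraries.

```python
import bisect
import hashlib

# Assumed split points: tablet i holds rows after SPLITS[i-1] up to and including SPLITS[i].
SPLITS = [b"4", b"8", b"c"]
# Assumed assignment of the four resulting tablets to servers.
TABLET_TO_SERVER = ["tserver-a", "tserver-b", "tserver-a", "tserver-c"]


def tablet_for(row):
    """Find the index of the tablet whose row range contains this row."""
    return bisect.bisect_left(SPLITS, row)


def row_id(user_id):
    """Prefix the user id with a short hash so neighbouring ids scatter across tablets."""
    prefix = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:4]
    return (prefix + "_" + user_id).encode("utf-8")


def server_for(user_id):
    """Every cell in a user's row lands on the one server hosting that row's tablet."""
    return TABLET_TO_SERVER[tablet_for(row_id(user_id))]
```

The application only needs to know how to build `row_id`; which server currently hosts the tablet is not something it has to track.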
  50. Not having the logic in our application server also means that maintenance tasks like expanding our cluster are easier.
  51. Accumulo is horizontally scalable and tested at very large cluster sizes.
  52. Adding new hardware resources is equivalent to adding a new shard.
  53. Hard engineering work means expensive.
  54. That’s it. Once the server comes online, Accumulo’s internal coordination service will recognize that there are more physical resources available on the cluster and safely migrate Tablets from busier servers over to the new one.
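As a rough illustration of tablets migrating to a new server, the sketch below moves tablets one at a time from the busiest server until tablet counts are roughly even. This is a toy count-based balancer under assumed names, not Accumulo's actual balancing logic.

```python
def rebalance(assignments, new_server):
    """Move tablets one at a time from the busiest server to the new one."""
    # Work on a copy so the caller's view of the cluster is left untouched.
    assignments = {server: list(tablets) for server, tablets in assignments.items()}
    assignments[new_server] = []
    moves = []
    while True:
        busiest = max(assignments, key=lambda s: len(assignments[s]))
        # Stop once the new server is within one tablet of the busiest server.
        if len(assignments[busiest]) - len(assignments[new_server]) <= 1:
            break
        tablet = assignments[busiest].pop()
        assignments[new_server].append(tablet)
        moves.append((tablet, busiest, new_server))
    return assignments, moves


before = {"tserver-a": ["t1", "t2", "t3"], "tserver-b": ["t4", "t5", "t6"]}
after, moves = rebalance(before, "tserver-c")
```

Only the assignment changes; no row keys are rewritten, which is the contrast with re-sharding an RDBMS.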
  55. Accumulo has no single point of failure and safely recovers from partial failures.
  56. This is the CAP theorem, in brief. You can’t choose to give up “partition tolerance.”
  57. This is the CAP theorem, in brief. You can’t choose to give up “partition tolerance.”
  58. We had some fault tolerance. If fortunate we had automatic failover. If _very_ fortunate we had those without data loss. If we lost more nodes in a particular shard than we had replication set up for, that set of users was just out of luck until someone got paged. Whether this storage system favored availability or consistency is very implementation dependent. Most that I have dealt with chose availability because the replication was not synchronous.
  59. Remember zooming in here?
  60. Can’t write directly to persistent storage, because that’s all sorted.
  61. First, we write to a distributed write ahead log. These logs are append-only and written to other nodes via an underlying distributed file system. They are only used in the event that the node fails before we can update persistent storage.
  62. Once we are assured that there is a safe copy for recovery, we write the update into our buffer of accepted writes.
  63. Then we ack the client and the write is visible to the world.
  64. Now it’s possible that after that ack we’ll have a failure.
  65. Like this.
  66. After a tunable timeout, there’s a coordinator system that will notice the node is down.
  67. It will have the remaining Tablet Servers load the write ahead logs from the down server and
  68. When they’re done, then the Tablets from the down server will be reassigned. This assignment is a lightweight RPC. It just tells the Tablet Server to take ownership of the Tablets, perform any recovery out of distributed storage, and then serve client requests.
  69. Can’t write directly to persistent storage, because that’s all sorted.
  70. In order to keep the size of write ahead logs down, the Tablet Server occasionally flushes buffered writes out into newly sorted files on persistent storage.
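The write path, flush, and recovery steps from the last several notes can be modelled in a few lines. This is a toy model only: plain Python lists stand in for the distributed write-ahead log and for sorted files on persistent storage, and the class and method names are invented.

```python
class TabletServer:
    """Toy write path: log first, then buffer, then ack."""

    def __init__(self, shared_log, shared_files):
        self.log = shared_log      # stands in for the distributed write-ahead log
        self.files = shared_files  # stands in for sorted files on persistent storage
        self.buffer = {}           # in-memory buffer of accepted writes

    def write(self, key, value):
        self.log.append((key, value))  # 1. durable copy for recovery
        self.buffer[key] = value       # 2. apply to the in-memory buffer
        return "ack"                   # 3. acknowledge the client

    def flush(self):
        """Write buffered data as a new sorted file, then drop the log entries it covers."""
        self.files.append(sorted(self.buffer.items()))
        self.buffer.clear()
        self.log.clear()

    def recover(self):
        """A newly assigned server replays logged edits before serving requests."""
        for key, value in self.log:
            self.buffer[key] = value

    def read(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for sorted_file in reversed(self.files):  # newest file first
            for k, v in sorted_file:
                if k == key:
                    return v
        return None


log, files = [], []
first = TabletServer(log, files)
first.write("bob/d1", "So fluffy!")
first.flush()
first.write("bob/d2", "Mr. Mean!")
# The first server dies here: its buffer is lost, but the log and files survive.
second = TabletServer(log, files)
second.recover()
```

After the flush, the log holds only the one unflushed write, so recovery replays a single edit. That is the link between the size of outstanding write-ahead logs and recovery time.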
  71. Now that we’ve covered the internals of recovery, we can see that in addition to easier migration paths, we also have better robustness guarantees because our shards will move themselves around as failures occur, allowing for a more graceful degradation in the face of failures.
  72. What are the big gaps still?
  73. In open source. There’s a private company that has modified Presto. Replication in currently named 1.7.0