Accumulo: A Quick Introduction
James Salter
25 July 2013
About Me
• James Salter
• Former: PhD, University of Surrey
▫ Resource discovery in peer-to-peer networks
▫ Recommender systems
• Current: Applied Researcher
▫ Data mining algorithms, information fusion
▫ Hadoop
▫ Large graphs
▫ “other interesting things”
Outline
• What is Accumulo?
• Comparison with Relational Databases
• Architecture
• Potential Applications
Apache Hadoop
• Framework for distributed computing
• Clusters of commodity machines
• MapReduce
▫ Best-known sub-project
▫ Batch processing of bulk data
▫ (Potentially) large files of output
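The slide only names MapReduce. The sketch below shows the programming model it refers to: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. It runs in a single Python process and uses word count as an illustrative job. It is not Hadoop's API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (Hadoop does this between the two phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine every value emitted for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in one process; Hadoop spreads them over a cluster."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    # A real job writes reducer output to (potentially large) files in HDFS.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["the cat sat", "the dog sat"]))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

Because the mapper and reducer are independent per record and per key, the framework can run many copies of each on different machines. That is what makes the model suit batch processing of bulk data.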
What is Accumulo?
• A distributed key/value store
▫ Runs in parallel across a Hadoop cluster
• Very scalable
▫ Trillions of records, tens of petabytes of data
• Cell-level security
▫ Every data item carries a security label
• Open-source implementation of Google’s BigTable design
▫ Originally developed by the NSA
▫ Now a top-level Apache project
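To make "every data item has a security label" concrete, here is a minimal Python model. An Accumulo key has a row ID, column family, column qualifier, column visibility and timestamp. The visibility is a boolean expression over labels such as `A&(B|C)`, and a scan returns a cell only if the reader's authorizations satisfy it. Accumulo's Java API has its own `ColumnVisibility` and `Authorizations` classes. The code below is a simplified stand-in for illustration, not those classes. The label character set, the sample family `info` and the labels `public`, `pii`, `sales` and `support` are assumptions.

```python
import re
from dataclasses import dataclass

# Simplification (assumption): labels are letters, digits and underscores only.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


@dataclass(frozen=True)
class Key:
    """The parts of an Accumulo key (simplified: no delete marker)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def _tokenize(expr: str) -> list[str]:
    tokens: list[str] = []
    pos = 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr: str, auths: set[str]) -> bool:
    """Evaluate a visibility expression such as "A&(B|C)" against a reader's labels."""
    expr = expr.strip()
    if not expr:
        return True  # an empty label means the cell is readable by everyone
    tokens = _tokenize(expr)
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"incomplete expression {expr!r}")
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError(f"missing ')' in {expr!r}")
            pos += 1
            return inner
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r} in {expr!r}")
        return token in auths

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_term()
        level_op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            if level_op is not None and op != level_op:
                # Mixing & and | at one level is ambiguous, so demand parentheses.
                raise ValueError(f"mix of & and | needs parentheses in {expr!r}")
            level_op = op
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"trailing input in {expr!r}")
    return value


def scan(cells: dict[Key, str], auths: set[str]) -> list[tuple[Key, str]]:
    """Return only the cells the reader may see, in key order."""
    def order(key: Key) -> tuple:
        # Row, family, qualifier, visibility, then newest timestamp first.
        return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)

    ordered = sorted(cells.items(), key=lambda item: order(item[0]))
    return [(key, value) for key, value in ordered if visible(key.visibility, auths)]


if __name__ == "__main__":
    cells = {
        Key("Alice", "info", "Birthday", "public", 1): "12/03/45",
        Key("Alice", "info", "Phone", "pii&(sales|support)", 1): "794838",
    }
    print([k.qualifier for k, _ in scan(cells, {"public"})])  # ['Birthday']
```

The filtering happens on the server side during the scan, so a reader without the right labels never receives the hidden cells.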
Relational schema to Accumulo
CustName  Birthday  Phone
Alice     12/03/45  794838
Bob       09/09/67
Mary      23/04/83  975838

CustName  ItemID  Quantity
Alice     17      1
Alice     89      5
Bob       92      1
Mary      12      1

ItemID  ItemName
12      DVD
17      Magazine
89      Ticket
92      Shirt

CustName  Birthday  Phone   DVD  Magazine  Ticket  Shirt
Alice     12/03/45  794838       1         5
Bob       09/09/67                                 1
Mary      23/04/83  975838  1
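The wide table at the bottom of the slide is what you get by joining the three relational tables: customers to purchases on CustName, then purchases to items on ItemID. Each item name becomes its own column. Here is a minimal Python sketch of that join using the slide's data. The dictionary layout and function names are assumptions made for illustration.

```python
# Data copied from the three relational tables on the slide.
CUSTOMERS = {
    "Alice": {"Birthday": "12/03/45", "Phone": "794838"},
    "Bob": {"Birthday": "09/09/67"},  # no phone number: a NULL in the relational table
    "Mary": {"Birthday": "23/04/83", "Phone": "975838"},
}
PURCHASES = [("Alice", 17, 1), ("Alice", 89, 5), ("Bob", 92, 1), ("Mary", 12, 1)]
ITEMS = {12: "DVD", 17: "Magazine", 89: "Ticket", 92: "Shirt"}
COLUMNS = ["Birthday", "Phone", "DVD", "Magazine", "Ticket", "Shirt"]


def denormalize(customers, purchases, items):
    """Join customers -> purchases -> items into one row per customer."""
    wide = {name: dict(attrs) for name, attrs in customers.items()}  # copy, don't mutate input
    for name, item_id, quantity in purchases:
        # Each item name becomes a column holding the purchased quantity.
        wide[name][items[item_id]] = quantity
    return wide


def render(wide):
    """Print the wide table; columns a customer lacks show up as empty cells."""
    print("CustName", *COLUMNS, sep="\t")
    for name, row in wide.items():
        print(name, *(row.get(column, "") for column in COLUMNS), sep="\t")


if __name__ == "__main__":
    render(denormalize(CUSTOMERS, PURCHASES, ITEMS))
```

Most cells in the result are empty. That sparsity is what the key/value layout on the next slide exploits.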
Relational schema to Accumulo
Row,Column        Value
{Alice,Birthday}  12/03/45
{Alice,Phone}     794838
{Alice,Magazine}  1
{Alice,Ticket}    5
{Bob,Birthday}    09/09/67
{Bob,Shirt}       1
...               ...
• Nulls are not stored
• Easy to add new columns, e.g. {Bob,Book}
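The two annotations on this slide follow directly from storing each non-empty cell as its own {row,column} → value entry. The Python sketch below flattens the wide table into such entries. Empty cells produce nothing, and a new column is just one more entry with no schema change. The `None`-for-NULL encoding and the function name are assumptions for illustration.

```python
COLUMNS = ("Birthday", "Phone", "DVD", "Magazine", "Ticket", "Shirt")
# The wide table from the slide, with None marking each empty (NULL) cell.
WIDE_ROWS = {
    "Alice": ("12/03/45", "794838", None, 1, 5, None),
    "Bob": ("09/09/67", None, None, None, None, 1),
    "Mary": ("23/04/83", "975838", 1, None, None, None),
}


def to_key_values(rows, columns):
    """Flatten a wide table into {(row, column): value}, skipping NULLs."""
    cells = {}
    for row, values in rows.items():
        for column, value in zip(columns, values):
            if value is not None:  # nulls are simply not stored
                cells[(row, column)] = value
    return cells


if __name__ == "__main__":
    cells = to_key_values(WIDE_ROWS, COLUMNS)
    cells[("Bob", "Book")] = 1  # a brand-new column: no schema change, just another entry
    for (row, column), value in sorted(cells.items()):
        print(f"{{{row},{column}}}", value)
```

The 18 cells of the wide table shrink to 9 stored entries. The saving grows with the number of distinct columns, which is why this layout suits very sparse data.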
Table Structure
• Tables contain key/value pairs sorted by key
• Split into tablets, distributed across a cluster
▫ Each tablet holds a contiguous range of the table’s keyspace
Key               Value
{Alice,Birthday}  12/03/45
{Alice,Magazine}  1
{Alice,Phone}     794838
{Alice,Ticket}    5

Key               Value
{Bob,Birthday}    09/09/67
{Bob,Shirt}       1
...               ...
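The split shown on the slide can be modelled with a sorted list of split rows. Tablet *i* holds the rows after split *i-1* up to and including split *i*, and the last tablet is unbounded. Finding the tablet for a row is then a binary search. Below is a minimal Python sketch using the slide's entries. Splitting on row IDs only, and the function names, are assumptions for illustration.

```python
from bisect import bisect_left

# (row, column) -> value, as listed on the slide.
CELLS = {
    ("Alice", "Birthday"): "12/03/45",
    ("Alice", "Phone"): "794838",
    ("Alice", "Magazine"): 1,
    ("Alice", "Ticket"): 5,
    ("Bob", "Birthday"): "09/09/67",
    ("Bob", "Shirt"): 1,
}


def tablet_for_row(splits: list[str], row: str) -> int:
    """Tablet i holds rows in (splits[i-1], splits[i]]; splits must be sorted."""
    return bisect_left(splits, row)


def build_tablets(cells, splits):
    """Sort all entries by key, then cut the sorted run into contiguous tablets."""
    tablets = [[] for _ in range(len(splits) + 1)]
    for key in sorted(cells):
        tablets[tablet_for_row(splits, key[0])].append((key, cells[key]))
    return tablets


if __name__ == "__main__":
    # One split point at row "Alice" gives the two tablets drawn on the slide.
    for number, tablet in enumerate(build_tablets(CELLS, ["Alice"]), start=1):
        print(f"tablet {number}:", tablet)
```

Since every tablet covers one key range, a client can route a lookup or a range scan to the right tablet without asking every server.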
Tablet Server
• Hosts one or more tablets
▫ Not necessarily for the same table
• Tablets store references to ISAM (Indexed Sequential Access Method) files in HDFS
▫ Key/values stored in ISAM files
[Diagram: one Tablet Server hosting three tablets (Table A, RowIDs g-n; Table F, RowIDs a-c; Table J, RowIDs x-zz), each backed by ISAM files in HDFS]
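"Indexed sequential" means each file stores entries sorted by key in blocks, plus a small index of block boundaries. A read can jump straight to the right block and then continue sequentially. A tablet may reference several such files, so a scan merges them in key order. Accumulo's actual file format is called RFile. The Python class below is a toy model of the idea, not its API, and the block size and all names are hypothetical.

```python
import heapq
from bisect import bisect_left
from typing import Iterator

Entry = tuple[tuple[str, str], str]  # ((row, column), value)


class IsamFile:
    """Toy indexed-sequential file: sorted entries in fixed-size blocks plus a block index."""

    def __init__(self, entries: list[Entry], block_size: int = 2) -> None:
        ordered = sorted(entries)
        self.blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
        # The index keeps the last key of every block, so a seek can skip whole blocks.
        self.index = [block[-1][0] for block in self.blocks]

    def seek(self, start: tuple[str, str]) -> Iterator[Entry]:
        """Jump to the first block that can contain `start`, then read sequentially."""
        for block in self.blocks[bisect_left(self.index, start):]:
            for key, value in block:
                if key >= start:
                    yield key, value


def scan_tablet(files: list[IsamFile], start: tuple[str, str] = ("", "")) -> list[Entry]:
    """A tablet's data may be spread over several files; merge them in key order."""
    return list(heapq.merge(*(f.seek(start) for f in files)))


if __name__ == "__main__":
    file_a = IsamFile([(("Alice", "Birthday"), "12/03/45"), (("Bob", "Shirt"), "1")])
    file_b = IsamFile([(("Alice", "Phone"), "794838"), (("Alice", "Ticket"), "5"),
                       (("Bob", "Birthday"), "09/09/67")])
    for key, value in scan_tablet([file_a, file_b], start=("Bob", "")):
        print(key, value)
```

Each file is already sorted, so the merge needs only one pass with a small heap and never re-sorts the data.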
Master
• Detects Tablet Server failures
▫ Migrates tablets to other Tablet Servers
• Responsible for load balancing
▫ Assigns tablets to Tablet Servers
▫ Instructs Tablet Servers to migrate tablets
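The Master's three duties on this slide can be sketched as bookkeeping over a tablet → server map. New tablets go to the least-loaded live server. Tablets on a failed server are reassigned. A balancing pass moves tablets from the busiest server to the idlest until loads differ by at most one. This is a hypothetical Python model, not Accumulo's balancer. The least-loaded policy, tie-breaking by name and all identifiers are assumptions.

```python
from collections import Counter


class Master:
    """Toy master: tracks which tablet server hosts each tablet (names are hypothetical)."""

    def __init__(self, servers: list[str]) -> None:
        self.servers: set[str] = set(servers)
        self.assignment: dict[str, str] = {}  # tablet -> tablet server

    def _loads(self) -> Counter:
        # Number of tablets per live server; missing servers count as 0.
        return Counter(s for s in self.assignment.values() if s in self.servers)

    def _least_loaded(self) -> str:
        if not self.servers:
            raise RuntimeError("no live tablet servers")
        loads = self._loads()
        return min(sorted(self.servers), key=lambda s: loads[s])

    def assign(self, tablet: str) -> str:
        server = self._least_loaded()
        self.assignment[tablet] = server
        return server

    def server_failed(self, server: str) -> list[str]:
        """Drop a dead server and migrate its tablets to live ones."""
        self.servers.discard(server)
        orphaned = [t for t, s in self.assignment.items() if s == server]
        for tablet in orphaned:
            del self.assignment[tablet]
        for tablet in orphaned:
            self.assign(tablet)
        return orphaned

    def add_server(self, server: str) -> None:
        self.servers.add(server)

    def balance(self) -> list[tuple[str, str, str]]:
        """Move tablets until no two servers differ by more than one tablet."""
        moves = []
        while len(self.servers) > 1:
            loads = self._loads()
            ordered = sorted(self.servers)
            busiest = max(ordered, key=lambda s: loads[s])
            idlest = min(ordered, key=lambda s: loads[s])
            if loads[busiest] - loads[idlest] <= 1:
                break
            tablet = min(t for t, s in self.assignment.items() if s == busiest)
            self.assignment[tablet] = idlest
            moves.append((tablet, busiest, idlest))
        return moves


if __name__ == "__main__":
    master = Master(["ts1", "ts2"])
    for tablet in ["t1", "t2", "t3", "t4"]:
        master.assign(tablet)
    master.server_failed("ts2")  # t2 and t4 migrate to ts1
    master.add_server("ts3")
    print(master.balance())  # [('t1', 'ts1', 'ts3'), ('t2', 'ts1', 'ts3')]
```

Because the data lives in HDFS rather than on the tablet server, "migrating" a tablet here only changes which server owns it. No key/values are copied.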
Potential Applications
• Massive datastore
▫ Interactive retrieval of MapReduce results
• Graph database/graph mining
▫ Data input to Google Pregel clones (e.g. Giraph)
• Machine learning/classification
▫ Good for storing sparse feature vectors
• Not well suited to JOIN-heavy applications
▫ Limited joins are possible with the Intersecting Iterator
▫ Otherwise, combine with Hive, Impala, etc.
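The Intersecting Iterator gets its limited join by working on a document-partitioned index. Each row is a partition, the column family is a term and the column qualifier is a document ID. Every document lives in exactly one partition, so "documents containing all of these terms" can be answered inside each partition with no cross-machine join. Accumulo ships this as the Java class `IntersectingIterator`. The Python below only models the data layout and the per-partition intersection. The sample documents, the crc32 partitioning and the function names are assumptions.

```python
import zlib

DOCS = {
    "d1": "hadoop accumulo graph",
    "d2": "accumulo security",
    "d3": "graph mining accumulo",
    "d4": "hadoop mapreduce",
}


def build_index(docs: dict[str, str], partitions: int) -> dict[str, dict[str, set[str]]]:
    """Row = partition, column family = term, column qualifier = document ID."""
    index: dict[str, dict[str, set[str]]] = {}
    for doc_id, text in docs.items():
        # A deterministic hash spreads documents (not terms) over partitions.
        row = f"p{zlib.crc32(doc_id.encode()) % partitions}"
        for term in text.split():
            index.setdefault(row, {}).setdefault(term, set()).add(doc_id)
    return index


def intersect(index: dict[str, dict[str, set[str]]], terms: list[str]) -> list[str]:
    """Within each partition, keep documents that contain every query term."""
    if not terms:
        return []
    hits: list[str] = []
    for row in sorted(index):  # partitions are independent, so they could be scanned in parallel
        postings = [index[row].get(term, set()) for term in terms]
        hits.extend(sorted(set.intersection(*postings)))
    return hits


if __name__ == "__main__":
    print(sorted(intersect(build_index(DOCS, partitions=2), ["accumulo", "graph"])))
```

The price of this layout is that a query must visit every partition. In return, no partition ever needs data held by another.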
Conclusion
• Accumulo is a key-value datastore
• Data layout very different from Relational DBs
• Distributed architecture on top of Hadoop
• Many uses aside from “just” a simple store