SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
Accumulo 1.4 & 1.5



    Keith Turner
What is Accumulo?
A re-implementation of Big Table
●   Distributed Database
●   Does not support SQL
●   Data is arranged by
    ●   Row
    ●   Column
    ●   Time
●   No need to declare columns, types
Massively Distributed
Key Structure

                          Key                            Value
   Row          Column          Column      Time Stamp   Value
                Family          Qualifier
 Primary       Secondary      Uniqueness    Versioning   Value
Partitioning   Partitioning     Control      Control
Component      Component
Tablet Organization
System Processes
Tablet Data Flow
Rfile Multi-level index
1.3 Rfile Index
●   Index is an array of keys
●   One key per data block
●   Complete index loaded into memory when file
    opened
●   Complete index kept in memory when writing
    file
●   Slow load times
●   Memory exhaustion
1.3 RFile

        Index




Data             Data    Data
Block            Block   Block
Non multi-level 9G file

Locality group          : <DEFAULT>
   Start block           : 0
   Num   blocks          : 182,691
   Index level 0 size    : 8,038,404 bytes
   First key             : 00000008611ae5d6 0e94:74e1 [] 1310491465052 false
   Last key              : 7ffffff7daec0b4e 43ee:1c1e [] 1310491475396 false
   Num entries           : 172,014,468
   Column families       : <UNKNOWN>

Meta block     : BCFile.index
   Raw size             : 2,148,491 bytes
   Compressed size      : 1,485,697 bytes
   Compression type     : lzo

Meta block     : RFile.index
   Raw size             : 8,038,470 bytes
   Compressed size      : 5,664,704 bytes
   Compression type     : lzo
1.4 RFile

        Index Level 1




        Index Level 0




Data        Data           Data     Data    Data
Block       Block          Block    Block   Block
File layout


                    Index           Index           Index Index
        Data Data         Data Data       Data Data
                      L0              L0              L0    L1




●   Index blocks written as data is written
●   complete index not kept in memory during write
●   Index block for each level kept in memory
Multi-level 9G file

Locality group        : <DEFAULT>
   Start block         : 0
   Num   blocks        : 187,070
   Index level 1       : 4,573 bytes 1 blocks
   Index level 0       : 10,431,126 bytes 82 blocks
   First key           : 00000008611ae5d6 0e94:74e1 [] 1310491465052 false
   Last key            : 7ffffff7daec0b4e 43ee:1c1e [] 1310491475396 false
   Num entries         : 172,014,468
   Column families     : <UNKNOWN>

Meta block     : BCFile.index
   Raw size             : 5 bytes
   Compressed size      : 17 bytes
   Compression type     : lzo
Meta block     : RFile.index
   Raw size             : 4,652 bytes
   Compressed size      : 3,745 bytes
   Compression type     : lzo
Test Setup
●   Files in FS cache for test (cat test.rf > /dev/null)
●   Too much variability when some of file was in
    FS cache and some was not
●   Easier to force all of file into cache than out
Open, seek 9G file

Multilevel Index   Cache   Avg Time
       N             N     139.48 ms
       N             Y      37.55 ms
       Y             N       8.48 ms
       Y             Y       3.90 ms
Randomly seeking 9G file

Multilevel Index   Cache   Avg Time
       N             N      2.08 ms
       N             Y      2.32 ms
       Y             N      4.33 ms
       Y             Y      2.14 ms
Configuration
●   Index block size configurable per table
●   Make index block size large and it will behave
    like old rfile
●   Enabled index cache by default in 1.4
Cache hit rate on monitor page
Fault tolerant concurrent table operations
Problematic situations
●   Master dies during create table
●   Create and delete table run concurrently
●   Client dies during bulk ingest
●   Table deleted while writing or scanning
Create table fault
●   Master dies during
    ●   Could leave accumulo metadata in bad state
●   Master creates table, but then dies before
    notifying client
    ●   If client retries, it gets table exist exception
FATE
          Fault Tolerant Executor
●   If process dies, previously submitted operations
    continue to execute on restart.
●   Serializes operation in zookeeper before
    execution
●   Master uses FATE to execute table operations
Bulk import test
●   Create table with summation aggregator
●   Bulk import files of +1 or -1, graph ensures final
    sum is 0
●   Concurrently merge, split, compact, and
    consistency check table
●   Exit point verifies 0
Adampotent
●   Idempotent f(f(x)) = f(x)
●   Adampotent f(f'(x)) = f(x)
    ●   f'(x) denotes partial execution of f(x)
REPO
    Repeatable Persisted Operation
public interface Repo<T> extends Serializable {
     long isReady(long tid, T environment) throws Exception;
     Repo<T> call(long tid, T environment) throws Exception;
     void undo(long tid, T environment) throws Exception;
}



●   call() returns next op, null if done
●   call() and undo() must be adampotent
●   undo() should clean up partial execution of
    isReady() or call()
FATE interface

long startTransaction();
void seedTransaction(long tid, Repo op);
TStatus waitForCompletion(long tid);
Exception getException(long tid);
void delete(long tid);
Client calling FATE on master

Start Tx



           Seed Tx



                     Wait Tx




                               Delete Tx
FATE transaction states

             NEW




                     FAILED
                       IN
     IN            PROGRESS
  PROGRESS




                     FAILED
SUCCESSFUL
FATE Persistent store

public interface TStore {
    long reserve();
    Repo top(long tid);
    void push(long tid, Repo repo);
    void pop(long tid);
    Tstatus getStatus(long tid);
    void setStatus(long tid, Tstatus status);
        .
        .
        .
}
Seeding
void seedTransaction(long tid, Repo repo){
    if(store.getStatus(tid) == NEW){
         if(store.top(tid) == null)
              store.push(tid, repo);
         store.setStatus(tid, IN_PROGRESS);
    }
}
FATE transaction execution
                                    Mark
                                   success
Reserve           Execute
  tx               top op

                              Save
           Push             Exception
            op
                                          Mark
                                          failed
                                        in progress




     Pop           Undo             Mark
      op          top op             fail
Concurrent table ops
●   Per-table read/write lock in zookeeper
●   Zookeeper recipe with slight modification :
    ●   store fate tid as lock data
    ●   Use persistent sequential instead of ephemeral
        sequential
Concurrent delete table
●   Table deleted during read or write 1.3
    ●   Most likely, scan hangs forever
    ●   Rarely, permission exception thrown (seen in test)
●   In 1.4 check if table exists when :
    ●   Nothing in !METADATA
    ●   See a permission exception
Rogue threads
●   Bulk import fate op pushes work to tservers
●   Threads on tserver execute after fate op done
●   Master deletes WID in ZK, waits until counts 0
                        Tablet server worker thread


             Synchronized                             Synchronized

                WID                            Do
     Start                  cnt[WID]++                 cnt[WID]--
               Exist?                         work




                                              exit
Create Table Ops
1. CreateTable
 ●   Allocates table id
2. SetupPermissions
3. PopulateZookeeper
 ●   Reentrantly lock table in isReady()
 ●   Relate table name to table id
4. CreateDir
5. PopulateMetadata
6. FinishCreateTable
Random walk concurrent test
●   Perform all table operation on 5 well known
    tables
●   Run multiple test clients concurrently
●   Ensure accumulo or test client does not crash
    or hang
●   Too chaotic to verify data correctness
●   Found many bugs
Future Work
●   Allow multiple processes to run fate operations
●   Stop polling
●   Hulk Smash Tolerant
Other 1.4 Features
Tablet Merging
●   Splits exists forever; data is often aged off
●   Admin requested, not automatic
●   Merge by range or size
Merging Minor Compaction
●   1.3: minor compactions always add new files:




●   1.4: limits the total of new files:
Merging Minor Compactions
●   Slows down minor compactions
●   Memory fills up
●   Creates back-pressure on ingest
●   Prevents “unable to open enough files to scan”
Table Cloning
●   Create a new table using the same read-only
    files
●   Fast
●   Testing
    ●   At scale, with representative data
    ●   Compaction with a broken iterator: no more data
●   Offline and Copy
    ●   create an consistent snapshot in minutes
Range Operations
●   Compact Range
    ●   Compact all tablets that fall within a row range down
        to a single file
    ●   Useful for tail-insert indexes
    ●   Data purge
●   Delete Range
    ●   Delete all the content within a row range
    ●   Uses split and will delete whole tablets for efficiency
    ●   Useful for data purge
Logical Time for Bulk Import
●   Bulk files created on different machines will get
    different actual times (hours)
●   Bulk files always contain the timestamp created
    by the client
●   A single bulk import request can set a
    consistent time across all files
●   Implemented using iterators
Roadmap
1.4.1 Improvements
●   Snappy, LZ4
●   HDFS Kerberos compatability
●   Bloom filter improvements
●   Map Reduce directly over Accumulo files
●   Server side row select iterator
●   Bulk ingest support for map only jobs
●   BigTop support
●   Wikisearch example improvements
Possible 1.5 features
Performance
●   Multiple namenode support
●   WAL performance improvements
●   In-memory map locality groups
●   Timestamp visibility filter optimization
●   Prefix encoding
●   Distributed FATE ops
API
●   Stats API
●   Automatic deployment of iterators
●   Tuplestream support
●   Coprocessor integration
●   Support other client languages
●   Client session migration
Admin/Security
●   Kerberos support/pluggable authentication
●   Administration monitor page
●   Data center replication
●   Rollback/snapshot backup/recovery
Reliability
●   Decentralize master operations
●   Tablet server state model
Testing
●   Mini-cluster test harness
●   Continuous random walk testing

Contenu connexe

Tendances

Kernel Recipes 2013 - Kernel for your device
Kernel Recipes 2013 - Kernel for your deviceKernel Recipes 2013 - Kernel for your device
Kernel Recipes 2013 - Kernel for your deviceAnne Nicolas
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopTamas K Lengyel
 
Uccn1003 -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...
Uccn1003  -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...Uccn1003  -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...
Uccn1003 -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...Shu Shin
 
Replication
ReplicationReplication
ReplicationPerforce
 
Uccn1003 -may10_-_lab_09_-_wireshark_analysis_live_capture
Uccn1003  -may10_-_lab_09_-_wireshark_analysis_live_captureUccn1003  -may10_-_lab_09_-_wireshark_analysis_live_capture
Uccn1003 -may10_-_lab_09_-_wireshark_analysis_live_captureShu Shin
 
Introduction to PostgreSQL for System Administrators
Introduction to PostgreSQL for System AdministratorsIntroduction to PostgreSQL for System Administrators
Introduction to PostgreSQL for System AdministratorsJignesh Shah
 
DEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them all
DEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them allDEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them all
DEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them allFelipe Prado
 
Linux fundamental - Chap 01 io
Linux fundamental - Chap 01 ioLinux fundamental - Chap 01 io
Linux fundamental - Chap 01 ioKenny (netman)
 
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Sneeker Yeh
 
Hadoop on a personal supercomputer
Hadoop on a personal supercomputerHadoop on a personal supercomputer
Hadoop on a personal supercomputerPaul Dingman
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesAnne Nicolas
 
Windows internals Essentials
Windows internals EssentialsWindows internals Essentials
Windows internals EssentialsJohn Ombagi
 

Tendances (15)

Submission
SubmissionSubmission
Submission
 
Kernel Recipes 2013 - Kernel for your device
Kernel Recipes 2013 - Kernel for your deviceKernel Recipes 2013 - Kernel for your device
Kernel Recipes 2013 - Kernel for your device
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 
Uccn1003 -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...
Uccn1003  -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...Uccn1003  -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...
Uccn1003 -may10_-_lab_04_-_intro_to_layer-1_network_devices-updated_30_june2...
 
Replication
ReplicationReplication
Replication
 
Uccn1003 -may10_-_lab_09_-_wireshark_analysis_live_capture
Uccn1003  -may10_-_lab_09_-_wireshark_analysis_live_captureUccn1003  -may10_-_lab_09_-_wireshark_analysis_live_capture
Uccn1003 -may10_-_lab_09_-_wireshark_analysis_live_capture
 
Introduction to PostgreSQL for System Administrators
Introduction to PostgreSQL for System AdministratorsIntroduction to PostgreSQL for System Administrators
Introduction to PostgreSQL for System Administrators
 
SystemV vs systemd
SystemV vs systemdSystemV vs systemd
SystemV vs systemd
 
DEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them all
DEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them allDEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them all
DEF CON 27- ITZIK KOTLER and AMIT KLEIN - gotta catch them all
 
Linux fundamental - Chap 01 io
Linux fundamental - Chap 01 ioLinux fundamental - Chap 01 io
Linux fundamental - Chap 01 io
 
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
 
Hadoop on a personal supercomputer
Hadoop on a personal supercomputerHadoop on a personal supercomputer
Hadoop on a personal supercomputer
 
Chap 17 advfs
Chap 17 advfsChap 17 advfs
Chap 17 advfs
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
 
Windows internals Essentials
Windows internals EssentialsWindows internals Essentials
Windows internals Essentials
 

Similaire à Accumulo 1.4 Features and Roadmap

Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityRamp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityPythian
 
Raft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdfRaft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdffengxun
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2Dvir Volk
 
OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...
OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...
OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...NETWAYS
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRed Hat Developers
 
From Zero to Hero @ PyGrunn 2014
From Zero to Hero @ PyGrunn 2014From Zero to Hero @ PyGrunn 2014
From Zero to Hero @ PyGrunn 2014meij200
 
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and ChallengesBIOVIA
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopDataWorks Summit
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems Baruch Osoveskiy
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWRpasalapudi
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...Fred de Villamil
 
VLDB Administration Strategies
VLDB Administration StrategiesVLDB Administration Strategies
VLDB Administration StrategiesMurilo Miranda
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java codeAttila Balazs
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersJonathan Levin
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandraVinay Kumar Chella
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 

Similaire à Accumulo 1.4 Features and Roadmap (20)

Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityRamp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
 
Raft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdfRaft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdf
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...
OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...
OSMC 2009 | Windows monitoring - Going where no man has gone before... by Mic...
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech Talk
 
From Zero to Hero @ PyGrunn 2014
From Zero to Hero @ PyGrunn 2014From Zero to Hero @ PyGrunn 2014
From Zero to Hero @ PyGrunn 2014
 
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
Percona FT / TokuDB
Percona FT / TokuDBPercona FT / TokuDB
Percona FT / TokuDB
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWR
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
VLDB Administration Strategies
VLDB Administration StrategiesVLDB Administration Strategies
VLDB Administration Strategies
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java code
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 

Plus de Aaron Cordova

Apache Accumulo and the Data Lake
Apache Accumulo and the Data LakeApache Accumulo and the Data Lake
Apache Accumulo and the Data LakeAaron Cordova
 
Large Scale Accumulo Clusters
Large Scale Accumulo ClustersLarge Scale Accumulo Clusters
Large Scale Accumulo ClustersAaron Cordova
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache AccumuloAaron Cordova
 
Text Indexing in Accumulo
Text Indexing in AccumuloText Indexing in Accumulo
Text Indexing in AccumuloAaron Cordova
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name NodeAaron Cordova
 

Plus de Aaron Cordova (7)

Apache Accumulo and the Data Lake
Apache Accumulo and the Data LakeApache Accumulo and the Data Lake
Apache Accumulo and the Data Lake
 
Large Scale Accumulo Clusters
Large Scale Accumulo ClustersLarge Scale Accumulo Clusters
Large Scale Accumulo Clusters
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 
MapReduce and NoSQL
MapReduce and NoSQLMapReduce and NoSQL
MapReduce and NoSQL
 
Text Indexing in Accumulo
Text Indexing in AccumuloText Indexing in Accumulo
Text Indexing in Accumulo
 
Accumulo on EC2
Accumulo on EC2Accumulo on EC2
Accumulo on EC2
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name Node
 

Dernier

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Dernier (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Accumulo 1.4 Features and Roadmap

  • 1. Accumulo 1.4 & 1.5 Keith Turner
  • 2. What is Accumulo? A re-implementation of Big Table ● Distributed Database ● Does not support SQL ● Data is arranged by ● Row ● Column ● Time ● No need to declare columns, types
  • 4. Key Structure Key Value Row Column Column Time Stamp Value Family Qualifier Primary Secondary Uniqueness Versioning Value Partitioning Partitioning Control Control Component Component
  • 9. 1.3 Rfile Index ● Index is an array of keys ● One key per data block ● Complete index loaded into memory when file opened ● Complete index kept in memory when writing file ● Slow load times ● Memory exhaustion
  • 10. 1.3 RFile Index Data Data Data Block Block Block
  • 11. Non multi-level 9G file Locality group : <DEFAULT> Start block : 0 Num blocks : 182,691 Index level 0 size : 8,038,404 bytes First key : 00000008611ae5d6 0e94:74e1 [] 1310491465052 false Last key : 7ffffff7daec0b4e 43ee:1c1e [] 1310491475396 false Num entries : 172,014,468 Column families : <UNKNOWN> Meta block : BCFile.index Raw size : 2,148,491 bytes Compressed size : 1,485,697 bytes Compression type : lzo Meta block : RFile.index Raw size : 8,038,470 bytes Compressed size : 5,664,704 bytes Compression type : lzo
  • 12. 1.4 RFile Index Level 1 Index Level 0 Data Data Data Data Data Block Block Block Block Block
  • 13. File layout Index Index Index Index Data Data Data Data Data Data L0 L0 L0 L1 ● Index blocks written as data is written ● complete index not kept in memory during write ● Index block for each level kept in memory
  • 14. Multi-level 9G file Locality group : <DEFAULT> Start block : 0 Num blocks : 187,070 Index level 1 : 4,573 bytes 1 blocks Index level 0 : 10,431,126 bytes 82 blocks First key : 00000008611ae5d6 0e94:74e1 [] 1310491465052 false Last key : 7ffffff7daec0b4e 43ee:1c1e [] 1310491475396 false Num entries : 172,014,468 Column families : <UNKNOWN> Meta block : BCFile.index Raw size : 5 bytes Compressed size : 17 bytes Compression type : lzo Meta block : RFile.index Raw size : 4,652 bytes Compressed size : 3,745 bytes Compression type : lzo
  • 15. Test Setup ● Files in FS cache for test (cat test.rf > /dev/null) ● Too much variability when some of file was in FS cache and some was not ● Easier to force all of file into cache than out
  • 16. Open, seek 9G file Multilevel Index Cache Avg Time N N 139.48 ms N Y 37.55 ms Y N 8.48 ms Y Y 3.90 ms
  • 17. Randomly seeking 9G file Multilevel Index Cache Avg Time N N 2.08 ms N Y 2.32 ms Y N 4.33 ms Y Y 2.14 ms
  • 18. Configuration ● Index block size configurable per table ● Make index block size large and it will behave like old rfile ● Enabled index cache by default in 1.4
  • 19. Cache hit rate on monitor page
  • 20. Fault tolerant concurrent table operations
  • 21. Problematic situations ● Master dies during create table ● Create and delete table run concurrently ● Client dies during bulk ingest ● Table deleted while writing or scanning
  • 22. Create table fault ● Master dies during ● Could leave accumulo metadata in bad state ● Master creates table, but then dies before notifying client ● If client retries, it gets table exist exception
  • 23. FATE Fault Tolerant Executor ● If process dies, previously submitted operations continue to execute on restart. ● Serializes operation in zookeeper before execution ● Master uses FATE to execute table operations
  • 24. Bulk import test ● Create table with summation aggregator ● Bulk import files of +1 or -1, graph ensures final sum is 0 ● Concurrently merge, split, compact, and consistency check table ● Exit point verifies 0
  • 25. Adampotent ● Idempotent f(f(x)) = f(x) ● Adampotent f(f'(x)) = f(x) ● f'(x) denotes partial execution of f(x)
  • 26. REPO Repeatable Persisted Operation public interface Repo<T> extends Serializable { long isReady(long tid, T environment) throws Exception; Repo<T> call(long tid, T environment) throws Exception; void undo(long tid, T environment) throws Exception; } ● call() returns next op, null if done ● call() and undo() must be adampotent ● undo() should clean up partial execution of isReady() or call()
  • 27. FATE interface long startTransaction(); void seedTransaction(long tid, Repo op); TStatus waitForCompletion(long tid); Exception getException(long tid); void delete(long tid);
  • 28. Client calling FATE on master Start Tx Seed Tx Wait Tx Delete Tx
  • 29. FATE transaction states NEW FAILED IN IN PROGRESS PROGRESS FAILED SUCCESSFUL
  • 30. FATE Persistent store public interface TStore { long reserve(); Repo top(long tid); void push(long tid, Repo repo); void pop(long tid); Tstatus getStatus(long tid); void setStatus(long tid, Tstatus status); . . . }
  • 31. Seeding void seedTransaction(long tid, Repo repo){ if(store.getStatus(tid) == NEW){ if(store.top(tid) == null) store.push(tid, repo); store.setStatus(tid, IN_PROGRESS); } }
  • 32. FATE transaction execution Mark success Reserve Execute tx top op Save Push Exception op Mark failed in progress Pop Undo Mark op top op fail
  • 33. Concurrent table ops ● Per-table read/write lock in zookeeper ● Zookeeper recipe with slight modification : ● store fate tid as lock data ● Use persistent sequential instead of ephemeral sequential
  • 34. Concurrent delete table ● Table deleted during read or write 1.3 ● Most likely, scan hangs forever ● Rarely, permission exception thrown (seen in test) ● In 1.4 check if table exists when : ● Nothing in !METADATA ● See a permission exception
  • 35. Rogue threads ● Bulk import fate op pushes work to tservers ● Threads on tserver execute after fate op done ● Master deletes WID in ZK, waits until counts 0 Tablet server worker thread Synchronized Synchronized WID Do Start cnt[WID]++ cnt[WID]-- Exist? work exit
  • 36. Create Table Ops 1. CreateTable ● Allocates table id 2. SetupPermissions 3. PopulateZookeeper ● Reentrantly lock table in isReady() ● Relate table name to table id 4. CreateDir 5. PopulateMetadata 6. FinishCreateTable
  • 37. Random walk concurrent test ● Perform all table operation on 5 well known tables ● Run multiple test clients concurrently ● Ensure accumulo or test client does not crash or hang ● Too chaotic to verify data correctness ● Found many bugs
  • 38. Future Work ● Allow multiple processes to run fate operations ● Stop polling ● Hulk Smash Tolerant
  • 40. Tablet Merging ● Splits exists forever; data is often aged off ● Admin requested, not automatic ● Merge by range or size
  • 41. Merging Minor Compaction ● 1.3: minor compactions always add new files: ● 1.4: limits the total of new files:
  • 42. Merging Minor Compactions ● Slows down minor compactions ● Memory fills up ● Creates back-pressure on ingest ● Prevents “unable to open enough files to scan”
  • 43. Table Cloning ● Create a new table using the same read-only files ● Fast ● Testing ● At scale, with representative data ● Compaction with a broken iterator: no more data ● Offline and Copy ● create an consistent snapshot in minutes
  • 44. Range Operations ● Compact Range ● Compact all tablets that fall within a row range down to a single file ● Useful for tail-insert indexes ● Data purge ● Delete Range ● Delete all the content within a row range ● Uses split and will delete whole tablets for efficiency ● Useful for data purge
  • 45. Logical Time for Bulk Import ● Bulk files created on different machines will get different actual times (hours) ● Bulk files always contain the timestamp created by the client ● A single bulk import request can set a consistent time across all files ● Implemented using iterators
  • 47. 1.4.1 Improvements ● Snappy, LZ4 ● HDFS Kerberos compatability ● Bloom filter improvements ● Map Reduce directly over Accumulo files ● Server side row select iterator ● Bulk ingest support for map only jobs ● BigTop support ● Wikisearch example improvements
  • 49. Performance ● Multiple namenode support ● WAL performance improvements ● In-memory map locality groups ● Timestamp visibility filter optimization ● Prefix encoding ● Distributed FATE ops
  • 50. API ● Stats API ● Automatic deployment of iterators ● Tuplestream support ● Coprocessor integration ● Support other client languages ● Client session migration
  • 51. Admin/Security ● Kerberos support/pluggable authentication ● Administration monitor page ● Data center replication ● Rollback/snapshot backup/recovery
  • 52. Reliability ● Decentralize master operations ● Tablet server state model
  • 53. Testing ● Mini-cluster test harness ● Continuous random walk testing