SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Outline
                Introduction
                      Design
             Implementation
                     Results
                 Conclusions




Bigtable: A Distributed Storage System for
             Structured Data

                 Alvanos Michalis


                    April 6, 2009




            Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                         Introduction
                               Design
                      Implementation
                              Results
                          Conclusions

1   Introduction
       Motivation
2   Design
       Data model
3   Implementation
       Building blocks
       Tablets
       Compactions
       Refinements
4   Results
       Hardware Environment
       Performance Evaluation
5   Conclusions
       Real applications
       Lessons
       End
                     Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction
                                 Design
                                          Motivation
                        Implementation
                                Results
                            Conclusions


Google!

     Lots of Different kinds of data!
          Crawling system URLs, contents, links, anchors, page-rank etc
          Per-user data: preferences, recent queries/ search history
          Geographic data, images etc ...
     Many incoming requests
     No commercial system is big enough
          Scale is too large for commercial databases
          May not run on their commodity hardware
          No dependence on other vendors
          Optimizations
          Better Price/Performance
          Building internally means the system can be applied across
          many projects for low incremental cost

                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction
                                 Design
                                          Motivation
                        Implementation
                                Results
                            Conclusions


Google goals



     Fault-tolerant, persistent
     Scalable
          1000s of servers
          Millions of reads/writes, efficient scans
     Self-managing
     Simple!




                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                            Introduction
                                  Design
                                           Data model
                         Implementation
                                 Results
                             Conclusions


Bigtable




  Definition
  A Bigtable is a sparse, distributed, persistent multidimensional
  sorted map.

  The map is indexed by a row key, column key, and a timestamp;
  each value in the map is an uninterpreted array of bytes.

  (row:string, column:string, time:int64) -> string
                        Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                            Introduction
                                  Design
                                           Data model
                         Implementation
                                 Results
                             Conclusions


Rows




       The row keys in a table are arbitrary strings
       Every read or write of data under a single row key is atomic
       maintains data in lexicographic order by row key


                        Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                         Introduction
                               Design
                                        Data model
                      Implementation
                              Results
                          Conclusions


Column Families




     Grouped into sets called column families
     All data stored in a column family is usually of the same type
     A column family must be created before data can be stored
     under any column key in that family
     A column key is named using the following syntax:
     family:qualifier
                     Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction
                                Design
                                         Data model
                       Implementation
                               Results
                           Conclusions


Timestamps


     Each cell in a Bigtable can contain multiple versions of the
     same data; these versions are indexed by timestamp (64-bit
     integers).
     Applications that need to avoid collisions must generate
     unique timestamps themselves.
     To make the management of versioned data less onerous, they
     support two per-column-family settings that tell Bigtable to
     garbage-collect cell versions automatically.




                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction   Building blocks
                                 Design   Tablets
                        Implementation    Compactions
                                Results   Refinements
                            Conclusions


Infrastructure


      Google WorkQueue (scheduler)
      GFS: large-scale distributed file system
          Master: responsible for metadata
          Chunk servers: responsible for r/w large chunks of data
          Chunks replicated on 3 machines; master responsible
      Chubby: lock/file/name service
          Coarse-grained locks; can store small amount of data in a lock
          5 replicas; need a majority vote to be active




                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction   Building blocks
                                Design   Tablets
                       Implementation    Compactions
                               Results   Refinements
                           Conclusions


SSTable




     Lives in GFS
     Immutable, sorted file of key-value pairs
     Chunks of data plus an index
     Index is of block ranges, not values

                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                         Introduction   Building blocks
                               Design   Tablets
                      Implementation    Compactions
                              Results   Refinements
                          Conclusions


Tablet Design




     Large tables broken into tablets at row boundaries
         Tablets hold contiguous rows
         Approx 100 200 MB of data per tablet
     Approx 100 tablets per machine
         Fast recovery
         Load-balancing
     Built out of multiple SSTables
                     Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction   Building blocks
                                 Design   Tablets
                        Implementation    Compactions
                                Results   Refinements
                            Conclusions


Tablet Location




      Like a B+-tree, but fixed at 3 levels
      How can we avoid creating a bottleneck at the root?
          Aggressively cache tablet locations
          Lookup starts from leaf (bet on it being correct); reverse on
          miss
                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction   Building blocks
                                 Design   Tablets
                        Implementation    Compactions
                                Results   Refinements
                            Conclusions


Tablet Assignment
     Each tablet is assigned to one tablet server at a time. The
     master keeps track of the set of live tablet servers, and the
     current assignment of tablets to tablet servers.
     Bigtable uses Chubby to keep track of tablet servers. When a
     tablet server starts, it creates, and acquires an exclusive lock
     on, a uniquely-named file in a specific Chubby directory.
     Tablet server stops serving its tablets if loses its exclusive lock
     The master is responsible for detecting when a tablet server is
     no longer serving its tablets, and for reassigning those tablets
     as soon as possible.
     When a master is started by the cluster management system,
     it needs to discover the current tablet assignments before it
     can change them.
                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction   Building blocks
                                Design   Tablets
                       Implementation    Compactions
                               Results   Refinements
                           Conclusions


Serving a Tablet




      Updates are logged
      Each SSTable corresponds to a batch of updates or a
      snapshot of the tablet taken at some earlier time
      Memtable (sorted by key) caches recent updates
      Reads consult both memtable and SSTables
                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction   Building blocks
                                 Design   Tablets
                        Implementation    Compactions
                                Results   Refinements
                            Conclusions


Compactions

  As write operations execute, the size of the memtable increases.
      Minor compaction convert the memtable into an SSTable
           Reduce memory usage
           Reduce log traffic on restart
      Merging compaction
           Periodically executed in the background
           Reduce number of SSTables
           Good place to apply policy keep only N versions
      Major compaction
           Merging compaction that results in only one SSTable
           No deletion records, only live data
           Reclaim resources.

                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction   Building blocks
                                Design   Tablets
                       Implementation    Compactions
                               Results   Refinements
                           Conclusions


Refinements (1/2)


     Group column families together into an SSTable. Segregating
     column families that are not typically accessed together into
     separate locality groups enables more efficient reads.
     Can compress locality groups, using Bentley and McIlroy’s
     scheme and a fast compression algorithm that looks for
     repetitions.
     Bloom Filters on locality groups allows to ask whether an
     SSTable might contain any data for a specified row/column
     pair. Drastically reduces the number of disk seeks required -
     for non-existent rows or columns do not need to touch disk.


                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction   Building blocks
                                Design   Tablets
                       Implementation    Compactions
                               Results   Refinements
                           Conclusions


Refinements (2/2)


     Caching for read performance ( two levels of caching)
         Scan Cache: higher-level cache that caches the key-value pairs
         returned by the SSTable interface to the tablet server code.
         Block Cache: lower-level cache that caches SSTables blocks
         that were read from GFS.
     Commit-log implementation
     Speeding up tablet recovery (log entries)
     Exploiting immutability



                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction
                                Design   Hardware Environment
                       Implementation    Performance Evaluation
                               Results
                           Conclusions


Hardware Environment


     Tablet servers were configured to use 1 GB of memory and to
     write to a GFS cell consisting of 1786 machines with two 400
     GB IDE hard drives each.
     Each machine had two dual-core Opteron 2 GHz chips
     Enough physical memory to hold the working set of all
     running processes
     Single gigabit Ethernet link
     Two-level tree-shaped switched network with 100-200 Gbps
     aggregate bandwidth at the root.



                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction
                                Design   Hardware Environment
                       Implementation    Performance Evaluation
                               Results
                           Conclusions


Results Per Tablet Server




  Number of 1000-byte values read/written per second.


                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction
                                Design   Hardware Environment
                       Implementation    Performance Evaluation
                               Results
                           Conclusions


Results Aggregate Rate




  Number of 1000-byte values read/written per second.

                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction
                                Design   Hardware Environment
                       Implementation    Performance Evaluation
                               Results
                           Conclusions


Single tablet-server performance


      The tablet server executes 1200 reads per second ( 75
      MB/s), enough to saturate the tablet server CPUs because of
      overheads in networking stack
      Random and sequential writes perform better than random
      reads (commit log and uses group commit)
      No significant difference between random writes and
      sequential writes (same commit log)
      Sequential reads perform better than random reads (block
      cache)



                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                          Introduction
                                Design   Hardware Environment
                       Implementation    Performance Evaluation
                               Results
                           Conclusions


Scaling


      Aggregate throughput increases dramatically performance of
      random reads from memory increases
      However, performance does not increase linearly
      Drop in per-server throughput
          Imbalance in load: Re-balancing is throttled to reduce the
          number of tablet movement and the load generated by
          benchmarks shifts around as the benchmark progresses
          The random read benchmark: transfer one 64KB block over
          the network for every 1000-byte read and saturates shared 1
          Gigabit links



                      Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                            Introduction
                                           Real applications
                                  Design
                                           Lessons
                         Implementation
                                           End
                                 Results
                             Conclusions


Timestamps




     Google Analytics
     Google Earth
     Personalized Search

                        Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                           Introduction
                                          Real applications
                                 Design
                                          Lessons
                        Implementation
                                          End
                                Results
                            Conclusions


Lessons learned
      Large distributed systems are vulnerable to many types of
      failures, not just the standard network partitions and fail-stop
      failures
          Memory and network corruption
          Large clock skew
          Extended and asymmetric network partitions
          Bugs in other systems (Chubby !)
          ...
      Delay adding new features until it is clear how the new
      features will be used
      A practical lesson: the importance of proper system-level
      monitoring
      Keep It Simple!
                       Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline
                  Introduction
                                 Real applications
                        Design
                                 Lessons
               Implementation
                                 End
                       Results
                   Conclusions


END!




 QUESTIONS ?




           Alvanos Michalis      Bigtable: A Distributed Storage System for Structured Data

Contenu connexe

Tendances

Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremGrisha Weintraub
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Improving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTokImproving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTokAlluxio, Inc.
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And HbaseEdward Yoon
 
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )SANG WON PARK
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
AWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonAWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonRoberto Gaiser
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoCLOUDIAN KK
 
Grid Computing Systems and Resource Management
Grid Computing Systems and Resource ManagementGrid Computing Systems and Resource Management
Grid Computing Systems and Resource ManagementSouparnika Patil
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 
Server system architecture
Server system architectureServer system architecture
Server system architectureFaiza Hafeez
 

Tendances (20)

Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Improving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTokImproving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTok
 
GFS & HDFS Introduction
GFS & HDFS IntroductionGFS & HDFS Introduction
GFS & HDFS Introduction
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And Hbase
 
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
AWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonAWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparison
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in Tokyo
 
Grid Computing Systems and Resource Management
Grid Computing Systems and Resource ManagementGrid Computing Systems and Resource Management
Grid Computing Systems and Resource Management
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Big table
Big tableBig table
Big table
 
Server system architecture
Server system architectureServer system architecture
Server system architecture
 

En vedette

Bigtable
BigtableBigtable
Bigtablenextlib
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonGrisha Weintraub
 
Big table
Big tableBig table
Big tablePSIT
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentationvanjakom
 
Cloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, DemoCloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, DemoAndreas Koop
 
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYSupport Driven
 
I've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveI've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveSimon Willison
 
Innovation Case Study BitTorrent
Innovation Case Study BitTorrentInnovation Case Study BitTorrent
Innovation Case Study BitTorrentPetter S. Rønning
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storageparry prabhu
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storageNagamalleswararao Tadikonda
 
Privacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storagePrivacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storageMustaq Syed
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud Girish Chandra
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.pptPrivacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.pptGirish Chandra
 

En vedette (20)

google Bigtable
google Bigtablegoogle Bigtable
google Bigtable
 
Bigtable
BigtableBigtable
Bigtable
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and Comparison
 
Google's BigTable
Google's BigTableGoogle's BigTable
Google's BigTable
 
Big table
Big tableBig table
Big table
 
Bigtable
BigtableBigtable
Bigtable
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentation
 
Cloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, DemoCloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, Demo
 
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
 
I've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveI've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you have
 
Big table
Big tableBig table
Big table
 
Innovation Case Study BitTorrent
Innovation Case Study BitTorrentInnovation Case Study BitTorrent
Innovation Case Study BitTorrent
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storage
 
Big table
Big tableBig table
Big table
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storage
 
Bigtable
BigtableBigtable
Bigtable
 
Privacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storagePrivacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storage
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.pptPrivacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
 
Ppt 1
Ppt 1Ppt 1
Ppt 1
 

Similaire à Bigtable: A Distributed Storage System for Structured Data

Snowflake Cloning.pdf
Snowflake Cloning.pdfSnowflake Cloning.pdf
Snowflake Cloning.pdfVishnuGone
 
Bigtable
BigtableBigtable
Bigtableptdorf
 
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...Zilliz
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataVipin Batra
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoDave Stokes
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06temp2004it
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06mrlonganh
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018Dave Stokes
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...Bill Wilder
 

Similaire à Bigtable: A Distributed Storage System for Structured Data (20)

Scaling Up vs. Scaling-out
Scaling Up vs. Scaling-outScaling Up vs. Scaling-out
Scaling Up vs. Scaling-out
 
Snowflake Cloning.pdf
Snowflake Cloning.pdfSnowflake Cloning.pdf
Snowflake Cloning.pdf
 
Bigtable
BigtableBigtable
Bigtable
 
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
 
NOSQL
NOSQLNOSQL
NOSQL
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
 
gfs-sosp2003
gfs-sosp2003gfs-sosp2003
gfs-sosp2003
 
gfs-sosp2003
gfs-sosp2003gfs-sosp2003
gfs-sosp2003
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Gfs论文
Gfs论文Gfs论文
Gfs论文
 
The google file system
The google file systemThe google file system
The google file system
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
 
Sybase IQ Big Data
Sybase IQ Big DataSybase IQ Big Data
Sybase IQ Big Data
 
Sybase IQ ve Big Data
Sybase IQ ve Big DataSybase IQ ve Big Data
Sybase IQ ve Big Data
 

Plus de elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

Plus de elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Dernier

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Dernier (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Bigtable: A Distributed Storage System for Structured Data

  • 1. Outline Introduction Design Implementation Results Conclusions Bigtable: A Distributed Storage System for Structured Data Alvanos Michalis April 6, 2009 Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 2. Outline Introduction Design Implementation Results Conclusions 1 Introduction Motivation 2 Design Data model 3 Implementation Building blocks Tablets Compactions Refinements 4 Results Hardware Environment Performance Evaluation 5 Conclusions Real applications Lessons End Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 3. Outline Introduction Design Motivation Implementation Results Conclusions Google! Lots of Different kinds of data! Crawling system URLs, contents, links, anchors, page-rank etc Per-user data: preferences, recent queries/ search history Geographic data, images etc ... Many incoming requests No commercial system is big enough Scale is too large for commercial databases May not run on their commodity hardware No dependence on other vendors Optimizations Better Price/Performance Building internally means the system can be applied across many projects for low incremental cost Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 4. Outline Introduction Design Motivation Implementation Results Conclusions Google goals Fault-tolerant, persistent Scalable 1000s of servers Millions of reads/writes, efficient scans Self-managing Simple! Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 5. Outline Introduction Design Data model Implementation Results Conclusions Bigtable Definition A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. (row:string, column:string, time:int64) -> string Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 6. Outline Introduction Design Data model Implementation Results Conclusions Rows The row keys in a table are arbitrary strings Every read or write of data under a single row key is atomic maintains data in lexicographic order by row key Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 7. Outline Introduction Design Data model Implementation Results Conclusions Column Families Grouped into sets called column families All data stored in a column family is usually of the same type A column family must be created before data can be stored under any column key in that family A column key is named using the following syntax: family:qualifier Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 8. Outline Introduction Design Data model Implementation Results Conclusions Timestamps Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp (64-bit integers). Applications that need to avoid collisions must generate unique timestamps themselves. To make the management of versioned data less onerous, they support two per-column-family settings that tell Bigtable to garbage-collect cell versions automatically. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 9. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Infrastructure Google WorkQueue (scheduler) GFS: large-scale distributed file system Master: responsible for metadata Chunk servers: responsible for r/w large chunks of data Chunks replicated on 3 machines; master responsible Chubby: lock/file/name service Coarse-grained locks; can store small amount of data in a lock 5 replicas; need a majority vote to be active Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 10. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions SSTable Lives in GFS Immutable, sorted file of key-value pairs Chunks of data plus an index Index is of block ranges, not values Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 11. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Tablet Design Large tables broken into tablets at row boundaries Tablets hold contiguous rows Approx 100 200 MB of data per tablet Approx 100 tablets per machine Fast recovery Load-balancing Built out of multiple SSTables Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 12. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Tablet Location Like a B+-tree, but fixed at 3 levels How can we avoid creating a bottleneck at the root? Aggressively cache tablet locations Lookup starts from leaf (bet on it being correct); reverse on miss Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 13. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Tablet Assignment Each tablet is assigned to one tablet server at a time. The master keeps track of the set of live tablet servers, and the current assignment of tablets to tablet servers. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. Tablet server stops serving its tablets if loses its exclusive lock The master is responsible for detecting when a tablet server is no longer serving its tablets, and for reassigning those tablets as soon as possible. When a master is started by the cluster management system, it needs to discover the current tablet assignments before it can change them. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 14. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Serving a Tablet Updates are logged Each SSTable corresponds to a batch of updates or a snapshot of the tablet taken at some earlier time Memtable (sorted by key) caches recent updates Reads consult both memtable and SSTables Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 15. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Compactions As write operations execute, the size of the memtable increases. Minor compaction convert the memtable into an SSTable Reduce memory usage Reduce log traffic on restart Merging compaction Periodically executed in the background Reduce number of SSTables Good place to apply policy keep only N versions Major compaction Merging compaction that results in only one SSTable No deletion records, only live data Reclaim resources. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 16. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Refinements (1/2) Group column families together into an SSTable. Segregating column families that are not typically accessed together into separate locality groups enables more efficient reads. Can compress locality groups, using Bentley and McIlroy’s scheme and a fast compression algorithm that looks for repetitions. Bloom Filters on locality groups allows to ask whether an SSTable might contain any data for a specified row/column pair. Drastically reduces the number of disk seeks required - for non-existent rows or columns do not need to touch disk. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 17. Outline Introduction Building blocks Design Tablets Implementation Compactions Results Refinements Conclusions Refinements (2/2) Caching for read performance ( two levels of caching) Scan Cache: higher-level cache that caches the key-value pairs returned by the SSTable interface to the tablet server code. Block Cache: lower-level cache that caches SSTables blocks that were read from GFS. Commit-log implementation Speeding up tablet recovery (log entries) Exploiting immutability Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 18. Outline Introduction Design Hardware Environment Implementation Performance Evaluation Results Conclusions Hardware Environment Tablet servers were configured to use 1 GB of memory and to write to a GFS cell consisting of 1786 machines with two 400 GB IDE hard drives each. Each machine had two dual-core Opteron 2 GHz chips Enough physical memory to hold the working set of all running processes Single gigabit Ethernet link Two-level tree-shaped switched network with 100-200 Gbps aggregate bandwidth at the root. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 19. Outline Introduction Design Hardware Environment Implementation Performance Evaluation Results Conclusions Results Per Tablet Server Number of 1000-byte values read/written per second. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 20. Outline Introduction Design Hardware Environment Implementation Performance Evaluation Results Conclusions Results Aggregate Rate Number of 1000-byte values read/written per second. Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 21. Outline Introduction Design Hardware Environment Implementation Performance Evaluation Results Conclusions Single tablet-server performance The tablet server executes 1200 reads per second ( 75 MB/s), enough to saturate the tablet server CPUs because of overheads in networking stack Random and sequential writes perform better than random reads (commit log and uses group commit) No significant difference between random writes and sequential writes (same commit log) Sequential reads perform better than random reads (block cache) Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 22. Outline Introduction Design Hardware Environment Implementation Performance Evaluation Results Conclusions Scaling Aggregate throughput increases dramatically performance of random reads from memory increases However, performance does not increase linearly Drop in per-server throughput Imbalance in load: Re-balancing is throttled to reduce the number of tablet movement and the load generated by benchmarks shifts around as the benchmark progresses The random read benchmark: transfer one 64KB block over the network for every 1000-byte read and saturates shared 1 Gigabit links Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 23. Outline Introduction Real applications Design Lessons Implementation End Results Conclusions Timestamps Google Analytics Google Earth Personalized Search Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 24. Outline Introduction Real applications Design Lessons Implementation End Results Conclusions Lessons learned Large distributed systems are vulnerable to many types of failures, not just the standard network partitions and fail-stop failures Memory and network corruption Large clock skew Extended and asymmetric network partitions Bugs in other systems (Chubby !) ... Delay adding new features until it is clear how the new features will be used A practical lesson: the importance of proper system-level monitoring Keep It Simple! Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 25. Outline Introduction Real applications Design Lessons Implementation End Results Conclusions END! QUESTIONS ? Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data