SlideShare une entreprise Scribd logo
1  sur  91
Télécharger pour lire hors ligne
MVCC Unmasked!

          March 2013


     This presentation explains how Multiversion Concurrency Control (MVCC) is
     implemented in Postgres, and highlights optimizations which minimize the
     downsides of MVCC.
                                                        http://momjian.us/presentations


© 2013 EnterpriseDB. All rights reserved.         1
Why Unmask MVCC?!
          u  Predict concurrent query behavior
          u  Manage MVCC performance effects
          u  Understand storage space reuse




© 2013 EnterpriseDB. All rights reserved.   2
Outline!
          u    Overview of EnterpriseDB – who are we and how are we involved in
                the PostgreSQL community
          u    Introduction to MVCC
          u    (MVCC Implementation)
          u    MVCC Clean-up Requirements & Behavior
          u    Available PostgreSQL Resources




© 2013 EnterpriseDB. All rights reserved.   3
Who is EnterpriseDB?!




    u     Largest commercial entity behind PostgreSQL!
    u     Enables open-source cost benefits along with the products, resources
           and support of a full commercial database company!
    u     Enhanced Postgres Plus products for enterprise-class applications:!
                •    Security!
                •    Performance!
                •    Oracle compatibility!
                •    Monitoring, management & replication tools!

    u     Global provider for all your PostgreSQL needs!
    !

© 2013 EnterpriseDB. All rights reserved.                  4
What is MVCC?!

          Multiversion Concurrency Control (MVCC) allows Postgres to offer
          high concurrency even during significant database read/write
          activity.

          MVCC specifically offers behavior where “readers never block
          writers, and writers never block readers.”

          This presentation explains how MVCC is implemented in Postgres
          and highlights optimizations which minimize the downsides of
          MVCC.




© 2013 EnterpriseDB. All rights reserved.   5
Which other databases support MVCC?!
          u    Oracle
          u    DBs (partial)
          u    MySQL with InnoDB
          u    Informix
          u    Firebird
          u    MSSQL (optional, disabled by default)




© 2013 EnterpriseDB. All rights reserved.    6
MVCC Behavior!

           Cre                         40
                                                INSERT
           Exp


           Cre                         40
           Exp                          47      DELETE


           Cre                         64    old (delete)
           Exp                          78
                                                UPDATE
           Cre                         78
                                             new (insert)
           Exp




© 2013 EnterpriseDB. All rights reserved.                   7
MVCC Snapshots!
          MVCC snapshots which tuples are visible for SQL statements. A
          snapshot is recorded at the start of each SQL statement in READ
          COMMITTED transaction isolation mode, and at transaction start in
          SERIALIZABLE transaction isolation mode. In fact, it is frequency
          of taking new snapshots that controls the transactions isolation
          behavior.

          When a new snapshot is taken, the following information is
          gathered:

                   •  The highest-numbered committed transaction !
                   •  The transaction numbers currently executing!

          Using this snapshot information, Postgres can determine if a
          transaction’s actions should be visible to an executing statement.

© 2013 EnterpriseDB. All rights reserved.        8
MVCC Snapshots Determine Row Visibility!
           Create-Only

        Cre                30                                  Sequential Scan
                                            Visible
        Exp
        Cre                50               Invisible
        Exp                                                            Snapshot
        Cre              110                Invisible                  The highest-numbered
        Exp                                                            committed transaction: 100

                                                                       Open Transactions: 25, 50, 75
        Cre             30
                                            Invisible
        Exp             80                                             For simplicity, assume all other
                                                                       transactions are committed.
        Cre             30
                                            Visible
        Exp             75
        Cre            30
        Exp            110                  Visible

        Internally, the creation xid is stored in the system column ‘xmin’, and expire in ‘xmax’.

© 2013 EnterpriseDB. All rights reserved.                  9
Confused Yet?!
          Source code comment in src/backend/utils/time/tqal.c:




Mao [Mike Olson] says 17 march 1993: the tests in this routine are correct; if you
think they’re not, you’re wrong, and you should think about it again. i know, it
happened to me.


© 2013 EnterpriseDB. All rights reserved.   10
Implementation Details!



          All queries were generated on an unmodified version of Postgres.
          The contrib module pageinspect was installed to show internal
          heap page information and pg_freespacemap was installed to
          show free space map information.




© 2013 EnterpriseDB. All rights reserved.   11
Setup!




© 2013 EnterpriseDB. All rights reserved.   12
Setup!




© 2013 EnterpriseDB. All rights reserved.   13
Insert Using Xmin!




           All the queries used in this presentation are available at
           http://momjian.us/main/writings/pgsql/mvcc.sql!


© 2013 EnterpriseDB. All rights reserved.      14
Just Like INSERT!

           Cre                         40
                                                INSERT
           Exp


           Cre                         40
           Exp                          47      DELETE


           Cre                         64    old (delete)
           Exp                          78
                                                UPDATE
           Cre                         78
                                             new (insert)
           Exp




© 2013 EnterpriseDB. All rights reserved.                   15
DELETE USING Xmax!




© 2013 EnterpriseDB. All rights reserved.   16
DELETE Using Xmax!




© 2013 EnterpriseDB. All rights reserved.   17
Just Like DELETE!

           Cre                         40
                                                INSERT
           Exp


           Cre                         40
           Exp                          47      DELETE


           Cre                         64    old (delete)
           Exp                          78
                                                UPDATE
           Cre                         78
                                             new (insert)
           Exp




© 2013 EnterpriseDB. All rights reserved.                   18
UPDATE Using Xmin and Xmax!




© 2013 EnterpriseDB. All rights reserved.   19
UPDATE Using Xmin and Xmax!




© 2013 EnterpriseDB. All rights reserved.   20
Just Like UPDATE!

           Cre                         40
                                                INSERT
           Exp


           Cre                         40
           Exp                          47      DELETE


           Cre                         64    old (delete)
           Exp                          78
                                                UPDATE
           Cre                         78
                                             new (insert)
           Exp




© 2013 EnterpriseDB. All rights reserved.                   21
Aborted Transaction IDs Remain!




© 2013 EnterpriseDB. All rights reserved.   22
Aborted IDs can remain because Transaction Status is recorded centrally
                                                                           !
                         Pg_clog
          XID            Status flags
                            0 0 0 1 0 0 1 0
                    028
                            1 0 1 0 0 0 0 0
                    024
                            1 0 1 0 0 1 0 0
                    020
                            0 0 0 0 0 0 1 1
                    016
                            0 0 0 1 0 1 1 0
                    012
                            1 0 1 0 0 0 1 0
                    008
                            1 0 1 0 0 0 0 0
                    004
                            1 0 0 1 0 0 1 0
                    000

                                              Transaction roll back marks the transaction
                                              ID as aborted. All sessions will ignore such
                                              transactions; it is not necessary to revisit
                                              each row to undo the transaction

© 2013 EnterpriseDB. All rights reserved.         23
Row Locks Using Xmax!




© 2013 EnterpriseDB. All rights reserved.   24
Row Locks Using Xmax!




© 2013 EnterpriseDB. All rights reserved.   25
Multi-Statement Transactions!


          Multi-statement transactions require extra tracking because each
          statement has its own visibility rules. For example, a cursor’s
          contents must remain unchanged even if later statements in the
          same transaction modify rows. Such tracking is implemented using
          system command id columns cmin/cmax, which is internally
          actually is a single column.




© 2013 EnterpriseDB. All rights reserved.   26
INSERT Using Cmin!




© 2013 EnterpriseDB. All rights reserved.   27
INSERT Using Cmin!




© 2013 EnterpriseDB. All rights reserved.   28
DELETE Using Cmin!




© 2013 EnterpriseDB. All rights reserved.   29
DELETE Using Cmin!




© 2013 EnterpriseDB. All rights reserved.   30
DELETE Using Cmin!




  A cursor had to be used because the rows were created and deleted in
  this transaction and therefore never visible outside this transaction.

© 2013 EnterpriseDB. All rights reserved.   31
Update Using Cmin!




© 2013 EnterpriseDB. All rights reserved.   32
Update Using Cmin!




© 2013 EnterpriseDB. All rights reserved.   33
Modifying Rows from Different Transactions!




© 2013 EnterpriseDB. All rights reserved.   34
Modifying Rows from Different Transactions!




© 2013 EnterpriseDB. All rights reserved.   35
Modifying Rows from Different Transactions!




© 2013 EnterpriseDB. All rights reserved.   36
Combo Command Id!


          Because cmin and cmax are internally a single system column, it is
          impossible to simply record the status of a row that is created and
          expired in the same multi-statement transaction. For that reason, a
          special combo command id is created that references a local
          memory hash that contains the actual cmin and cmax values.




© 2013 EnterpriseDB. All rights reserved.   37
Update Using Combo Command Ids!




© 2013 EnterpriseDB. All rights reserved.   38
UPDATE Using Combo Command Ids!




© 2013 EnterpriseDB. All rights reserved.   39
Update Using Combo Command Ids!




© 2013 EnterpriseDB. All rights reserved.   40
Update Using Combo Command Ids!




The last query uses /contrib/pageinspect, which allows visibility of internal heap
page structures and all stored rows, including those not visible in the current
snapshot. (Bit 0x0020 is internally called HEAP_COMBOCID.)

© 2013 EnterpriseDB. All rights reserved.   41
MVCC Implementation Summary!

            xmin:                  creation transaction number, set by INSERT and
                                   UPDATE

            xmax:                  expire transaction number, set by UPDATE and
                                   DELETE; also used for explicit row locks

cmin/cmax:                         used to identify the command number that created or
                                   expired the tuple; also used to store combo command
                                   ids when the tuple is created and expired in the same
                                   transaction, and for explicit row locks




© 2013 EnterpriseDB. All rights reserved.               42
Traditional Cleanup Requirements!



          Traditional single-row-version (non-MVCC) database systems
          require storage space cleanup:
          u  deleted rows
          u  rows created by aborted transactions




© 2013 EnterpriseDB. All rights reserved.   43
MVCC Cleanup Requirements!


          MVCC has additional cleanup requirements:
          u  The creation of a new row during UPDATE (rather than
              replacing the existing row); the storage space taken by the old
              row must eventually be recycled.
          u  The delayed cleanup of deleted rows (cleanup cannot occur
              until there are no transactions for which the row is visible)

          Postgres handles both traditional and MVCC-specific cleanup
          requirements.




© 2013 EnterpriseDB. All rights reserved.   44
Cleanup Behavior!


          Fortunately, Postgres cleanup happens automatically:
          u  On-demand cleanup of a single heap page during row access,
              specifically when a page is accessed by SELECT, UPDATE and
              DELETE
          u  In bulk by an autovacuum processes that runs in the
              background

          Cleanup can also be initiated manually by VACUUM.




© 2013 EnterpriseDB. All rights reserved.   45
Aspects of Cleanup!


          Cleanup involves recycling space taken by several entities:
          u  Heap tuples/rows (the largest)
          u  Heap item pointers (the smallest)
          u  Index entries




© 2013 EnterpriseDB. All rights reserved.   46
Internal Heap Page!




© 2013 EnterpriseDB. All rights reserved.   47
Indexes Points to Items, Not Tuples!




© 2013 EnterpriseDB. All rights reserved.   48
Heap Tuple Space Recycling!




          Indexes prevent item pointers from being recycled.


© 2013 EnterpriseDB. All rights reserved.   49
VACUUM Later Recycle Items!




          VACUUM performs index cleanup, then can mark “dead” items as
          “unused”.
© 2013 EnterpriseDB. All rights reserved.   50
Cleanup of Deleted Rows!




© 2013 EnterpriseDB. All rights reserved.   51
Cleanup Deleted Rows!




© 2013 EnterpriseDB. All rights reserved.   52
Cleanup of Deleted Rows!




     In normal, multi-user usage, cleanup might have been delayed
     because other open transactions in the same database might still
     need to view the expired rows. However, the behavior would be the
     same, just delays.

© 2013 EnterpriseDB. All rights reserved.   53
Cleanup of Deleted Rows!




© 2013 EnterpriseDB. All rights reserved.   54
Same as this Slide!




© 2013 EnterpriseDB. All rights reserved.   55
Cleanup of Deleted Rows!




© 2013 EnterpriseDB. All rights reserved.   56
Same as this Slide!




© 2013 EnterpriseDB. All rights reserved.   57
Free Space Map (FSM)!




          VACUUM also updates the free space map (FSM), which records
          pages containing significant free space. This information is used to
          provide target pages for INSERTs and some UPDATEs (those
          crossing page boundaries). Single-page vacuum does not update
          the free space map.



© 2013 EnterpriseDB. All rights reserved.   58
Another Free Space Map Example!




© 2013 EnterpriseDB. All rights reserved.   59
Another Free Space Map Example!




© 2013 EnterpriseDB. All rights reserved.   60
Another Free Space Map Example!




© 2013 EnterpriseDB. All rights reserved.   61
VACUUM Also Removes End-of-File Pages!




             VACUUM FULL shrinks the table file to its minimum size, but
             requires an exclusive table lock


© 2013 EnterpriseDB. All rights reserved.   62
Optimized Single-Page Cleanup of Old UPDATE Rows!



          The storage space taken by old UPDATE tuples can be reclaimed
          just like deleted rows. However, certain UPDATE rows can even
          have their items reclaimed, i.e. it is possible to reuse certain old
          UPDATE items, rath than making them as “dead” and requiring
          VACUUM to reclaim them after removing referencing index entries.
          Specifically, such item reuse is possible with special HOT update
          (head-only tuple) chains, where the chain is on a single heap page
          and all indexed values in the chain are identical.




© 2013 EnterpriseDB. All rights reserved.   63
Single-Page Cleanup of HOT UPDATE Rows!


          HOT update items can be freed (marked “unused”) if they are in the
          middle of the chain, i.e. not at the beginning or end of the chain.
          At the head of the chain is a special “Redirect” item pointers that
          are referenced by indexes; this is possible because all indexed
          values are identical in a HOT/redirect chain.
          Index creation with HOT chains is complex because the chains
          might contain inconsistent values for the newly indexed columns.
          This is handled by indexing just the end of the HOT chain and
          allowing the index to be used only by transactions that start after
          the index has been created. (Specifically, post-index-creation
          transactions cannot see the inconsistent HOT chain values due to
          MVCC visibility rules; they only see the end of the chain.)




© 2013 EnterpriseDB. All rights reserved.   64
Initial Single-Row State!




© 2013 EnterpriseDB. All rights reserved.   65
UPDATE Adds a New Row!




          No index entry added because indexes only point to the head of
          the HOT chain.


© 2013 EnterpriseDB. All rights reserved.   66
Redirect Allows Indexes To Remain Valid!




© 2013 EnterpriseDB. All rights reserved.   67
UPDATE Replaces Another Old Row!




© 2013 EnterpriseDB. All rights reserved.   68
All OLD UPDATE Row Versions Eventually Removed!




          This cleanup was performed by another operation on the same
          page.


© 2013 EnterpriseDB. All rights reserved.   69
Cleanup of Old Updated Rows!




© 2013 EnterpriseDB. All rights reserved.   70
Cleanup of Old Updated Rows!




© 2013 EnterpriseDB. All rights reserved.   71
Cleanup of Old Updated Rows!




          No index entry added because indexes only point to the head of
          the HOT chain.


© 2013 EnterpriseDB. All rights reserved.   72
Same as this Slide!




© 2013 EnterpriseDB. All rights reserved.   73
Cleanup of Old Updated Rows!




© 2013 EnterpriseDB. All rights reserved.   74
Same as this Slide!




© 2013 EnterpriseDB. All rights reserved.   75
Cleanup of Old Updated Rows!




© 2013 EnterpriseDB. All rights reserved.   76
Same as this Slide!




© 2013 EnterpriseDB. All rights reserved.   77
VACUUM Does Not Remove the Redirect!




© 2013 EnterpriseDB. All rights reserved.   78
Cleanup Using Manual VACUUM!




© 2013 EnterpriseDB. All rights reserved.   79
Cleanup Using Manual VACUUM!




© 2013 EnterpriseDB. All rights reserved.   80
The Indexed UPDATE Problem!


          The updating of any indexed columns prevents the use of “redirect”
          items because the chain must be usable by all indexes, i.e. a
          redirect/HOT UPDATE cannot require additional index entries due
          to an indexed value change.
          In such cases, item pointers can only be marked as “dead”, like
          DELETE does.
          No previously shown UPDATE queries modified indexed columns.




© 2013 EnterpriseDB. All rights reserved.   81
Index mvcc_demo Column!




© 2013 EnterpriseDB. All rights reserved.   82
UPDATE of an Indexed Column!




© 2013 EnterpriseDB. All rights reserved.   83
UPDATE of an Indexed Column!




© 2013 EnterpriseDB. All rights reserved.   84
UPDATE of an Indexed Column!




© 2013 EnterpriseDB. All rights reserved.   85
UPDATE of an Indexed Column!




© 2013 EnterpriseDB. All rights reserved.   86
UPDATE of an Indexed Column!




© 2013 EnterpriseDB. All rights reserved.   87
Cleanup Summary!




          Cleanup is possible only when there are no active transactions for
          which the tuples are visible.
          HOT items are UPDATE chains that span a single page and
          contain identical indexed column values.
          In normal usage, single-page cleanup performs the majority of the
          unnecessary index entries, and updates the free space map (FSM).


© 2013 EnterpriseDB. All rights reserved.   88
Conclusion!




          All the queries used in this presentation are available at:
          http://momjian.us/main/writings/pgsql/mvcc.sql
          http://momjian.us/main/presentations/


© 2013 EnterpriseDB. All rights reserved.            89
Postgres Resources!
u    Upcoming training and events
      •  Postgres training, NY – March 21st !
           –  http://pgday.nycpug.org/training/ !
      •  PGDay, NY – March 22nd!
           –  http://pgday.nycpug.org/!
      •  PGCon, Ottawa, CA – May 21-24!
           –  http://www.pgcon.org/2013/!


u    EnterpriseDB live and on-demand training
      •  http://www.enterprisedb.com/store/products/dba-training !


u    Resources & Community
      •  Whitepapers, datasheets, podcasts, videos, community!
      •  http://www.enterprisedb.com/resources-community !


u    Postgres products, support and services
      •  sales@enterprisedb.com or visit www.enterprisedb.com !
What are your Questions??!




© 2013 EnterpriseDB. All rights reserved.

Contenu connexe

En vedette

The Power of MySQL Explain
The Power of MySQL ExplainThe Power of MySQL Explain
The Power of MySQL ExplainMYXPLAIN
 
Como migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designerComo migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designerAlex Bernal
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structurezhaolinjnu
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQLVu Hung Nguyen
 
Transactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsTransactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsVlad Mihalcea
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016Vlad Mihalcea
 
High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016Vlad Mihalcea
 
InnoDB Internal
InnoDB InternalInnoDB Internal
InnoDB Internalmysqlops
 
MySQL EXPLAIN Explained-Norvald H. Ryeng
MySQL EXPLAIN Explained-Norvald H. RyengMySQL EXPLAIN Explained-Norvald H. Ryeng
MySQL EXPLAIN Explained-Norvald H. Ryeng郁萍 王
 
SQL Transactions - What they are good for and how they work
SQL Transactions - What they are good for and how they workSQL Transactions - What they are good for and how they work
SQL Transactions - What they are good for and how they workMarkus Winand
 
Mysql Explain Explained
Mysql Explain ExplainedMysql Explain Explained
Mysql Explain ExplainedJeremy Coates
 
Explaining the MySQL Explain
Explaining the MySQL ExplainExplaining the MySQL Explain
Explaining the MySQL ExplainMYXPLAIN
 
Powerful Explain in MySQL 5.6
Powerful Explain in MySQL 5.6Powerful Explain in MySQL 5.6
Powerful Explain in MySQL 5.6MYXPLAIN
 
High Performance Hibernate JavaZone 2016
High Performance Hibernate JavaZone 2016High Performance Hibernate JavaZone 2016
High Performance Hibernate JavaZone 2016Vlad Mihalcea
 

En vedette (16)

The Power of MySQL Explain
The Power of MySQL ExplainThe Power of MySQL Explain
The Power of MySQL Explain
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
Como migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designerComo migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designer
 
Explain
ExplainExplain
Explain
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structure
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQL
 
Transactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsTransactions and Concurrency Control Patterns
Transactions and Concurrency Control Patterns
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016
 
High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016
 
InnoDB Internal
InnoDB InternalInnoDB Internal
InnoDB Internal
 
MySQL EXPLAIN Explained-Norvald H. Ryeng
MySQL EXPLAIN Explained-Norvald H. RyengMySQL EXPLAIN Explained-Norvald H. Ryeng
MySQL EXPLAIN Explained-Norvald H. Ryeng
 
SQL Transactions - What they are good for and how they work
SQL Transactions - What they are good for and how they workSQL Transactions - What they are good for and how they work
SQL Transactions - What they are good for and how they work
 
Mysql Explain Explained
Mysql Explain ExplainedMysql Explain Explained
Mysql Explain Explained
 
Explaining the MySQL Explain
Explaining the MySQL ExplainExplaining the MySQL Explain
Explaining the MySQL Explain
 
Powerful Explain in MySQL 5.6
Powerful Explain in MySQL 5.6Powerful Explain in MySQL 5.6
Powerful Explain in MySQL 5.6
 
High Performance Hibernate JavaZone 2016
High Performance Hibernate JavaZone 2016High Performance Hibernate JavaZone 2016
High Performance Hibernate JavaZone 2016
 

Similaire à Mv unmasked.w.code.march.2013

XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...The Linux Foundation
 
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...tutorialsruby
 
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...tutorialsruby
 
Sangam 18 - Database Development: Return of the SQL Jedi
Sangam 18 - Database Development: Return of the SQL JediSangam 18 - Database Development: Return of the SQL Jedi
Sangam 18 - Database Development: Return of the SQL JediConnor McDonald
 
Running your Java EE 6 Applications in the Cloud
Running your Java EE 6 Applications in the CloudRunning your Java EE 6 Applications in the Cloud
Running your Java EE 6 Applications in the CloudArun Gupta
 
Boston meetup : MySQL Innodb Cluster - May 1st 2017
Boston meetup : MySQL Innodb Cluster - May 1st  2017Boston meetup : MySQL Innodb Cluster - May 1st  2017
Boston meetup : MySQL Innodb Cluster - May 1st 2017Frederic Descamps
 
MySQL NoSQL Document Store
MySQL NoSQL Document StoreMySQL NoSQL Document Store
MySQL NoSQL Document StoreMark Swarbrick
 
Z Garbage Collector
Z Garbage CollectorZ Garbage Collector
Z Garbage CollectorDavid Buck
 
JFokus 2011 - Running your Java EE 6 apps in the Cloud
JFokus 2011 - Running your Java EE 6 apps in the CloudJFokus 2011 - Running your Java EE 6 apps in the Cloud
JFokus 2011 - Running your Java EE 6 apps in the CloudArun Gupta
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory AnalysisMoabi.com
 
Running your Java EE 6 applications in the Cloud
Running your Java EE 6 applications in the CloudRunning your Java EE 6 applications in the Cloud
Running your Java EE 6 applications in the CloudArun Gupta
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllThomas Wuerthinger
 
Create custom looping procedures in sql server 2005 tech republic
Create custom looping procedures in sql server 2005   tech republicCreate custom looping procedures in sql server 2005   tech republic
Create custom looping procedures in sql server 2005 tech republicKaing Menglieng
 
Regular Expressions with full Unicode support
Regular Expressions with full Unicode supportRegular Expressions with full Unicode support
Regular Expressions with full Unicode supportMartinHanssonOracle
 
Running your Java EE 6 applications in the Cloud (FISL 12)
Running your Java EE 6 applications in the Cloud (FISL 12)Running your Java EE 6 applications in the Cloud (FISL 12)
Running your Java EE 6 applications in the Cloud (FISL 12)Arun Gupta
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory AnalysisMoabi.com
 
What's new in CQ 5.3? Top 10 features.
What's new in CQ 5.3? Top 10 features.What's new in CQ 5.3? Top 10 features.
What's new in CQ 5.3? Top 10 features.David Nuescheler
 

Similaire à Mv unmasked.w.code.march.2013 (20)

XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
 
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
 
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
exploit-writing-tutorial-part-5-how-debugger-modules-plugins-can-speed-up-bas...
 
Sangam 18 - Database Development: Return of the SQL Jedi
Sangam 18 - Database Development: Return of the SQL JediSangam 18 - Database Development: Return of the SQL Jedi
Sangam 18 - Database Development: Return of the SQL Jedi
 
Running your Java EE 6 Applications in the Cloud
Running your Java EE 6 Applications in the CloudRunning your Java EE 6 Applications in the Cloud
Running your Java EE 6 Applications in the Cloud
 
Boston meetup : MySQL Innodb Cluster - May 1st 2017
Boston meetup : MySQL Innodb Cluster - May 1st  2017Boston meetup : MySQL Innodb Cluster - May 1st  2017
Boston meetup : MySQL Innodb Cluster - May 1st 2017
 
MySQL NoSQL Document Store
MySQL NoSQL Document StoreMySQL NoSQL Document Store
MySQL NoSQL Document Store
 
Z Garbage Collector
Z Garbage CollectorZ Garbage Collector
Z Garbage Collector
 
JFokus 2011 - Running your Java EE 6 apps in the Cloud
JFokus 2011 - Running your Java EE 6 apps in the CloudJFokus 2011 - Running your Java EE 6 apps in the Cloud
JFokus 2011 - Running your Java EE 6 apps in the Cloud
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
 
Running your Java EE 6 applications in the Cloud
Running your Java EE 6 applications in the CloudRunning your Java EE 6 applications in the Cloud
Running your Java EE 6 applications in the Cloud
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them All
 
Create custom looping procedures in sql server 2005 tech republic
Create custom looping procedures in sql server 2005   tech republicCreate custom looping procedures in sql server 2005   tech republic
Create custom looping procedures in sql server 2005 tech republic
 
Tom Kyte at Hotsos 2015
Tom Kyte at Hotsos 2015Tom Kyte at Hotsos 2015
Tom Kyte at Hotsos 2015
 
Regular Expressions with full Unicode support
Regular Expressions with full Unicode supportRegular Expressions with full Unicode support
Regular Expressions with full Unicode support
 
Introducing CQ 5.1
Introducing CQ 5.1Introducing CQ 5.1
Introducing CQ 5.1
 
MySQL Replication
MySQL ReplicationMySQL Replication
MySQL Replication
 
Running your Java EE 6 applications in the Cloud (FISL 12)
Running your Java EE 6 applications in the Cloud (FISL 12)Running your Java EE 6 applications in the Cloud (FISL 12)
Running your Java EE 6 applications in the Cloud (FISL 12)
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis
 
What's new in CQ 5.3? Top 10 features.
What's new in CQ 5.3? Top 10 features.What's new in CQ 5.3? Top 10 features.
What's new in CQ 5.3? Top 10 features.
 

Plus de EDB

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube EDB
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EDB
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLEDB
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLEDB
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?EDB
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresEDB
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!EDB
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesEDB
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 

Plus de EDB (20)

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQL
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQL
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJ
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos données
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - Italiano
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 

Mv unmasked.w.code.march.2013

  • 1. MVCC Unmasked! March 2013 This presentation explains how Multiversion Concurrency Control (MVCC) is implemented in Postgres, and highlights optimizations which minimize the downsides of MVCC. http://momjian.us/presentations © 2013 EnterpriseDB. All rights reserved. 1
  • 2. Why Unmask MVCC?! u  Predict concurrent query behavior u  Manage MVCC performance effects u  Understand storage space reuse © 2013 EnterpriseDB. All rights reserved. 2
  • 3. Outline! u  Overview of EnterpriseDB – who are we and how are we involved in the PostgreSQL community u  Introduction to MVCC u  (MVCC Implementation) u  MVCC Clean-up Requirements & Behavior u  Available PostgreSQL Resources © 2013 EnterpriseDB. All rights reserved. 3
  • 4. Who is EnterpriseDB?! u  Largest commercial entity behind PostgreSQL! u  Enables open-source cost benefits along with the products, resources and support of a full commercial database company! u  Enhanced Postgres Plus products for enterprise-class applications:! •  Security! •  Performance! •  Oracle compatibility! •  Monitoring, management & replication tools! u  Global provider for all your PostgreSQL needs! ! © 2013 EnterpriseDB. All rights reserved. 4
  • 5. What is MVCC?! Multiversion Concurrency Control (MVCC) allows Postgres to offer high concurrency even during significant database read/write activity. MVCC specifically offers behavior where “readers never block writers, and writers never block readers.” This presentation explains how MVCC is implemented in Postgres and highlights optimizations which minimize the downsides of MVCC. © 2013 EnterpriseDB. All rights reserved. 5
  • 6. Which other databases support MVCC?! u  Oracle u  DBs (partial) u  MySQL with InnoDB u  Informix u  Firebird u  MSSQL (optional, disabled by default) © 2013 EnterpriseDB. All rights reserved. 6
  • 7. MVCC Behavior! Cre 40 INSERT Exp Cre 40 Exp 47 DELETE Cre 64 old (delete) Exp 78 UPDATE Cre 78 new (insert) Exp © 2013 EnterpriseDB. All rights reserved. 7
  • 8. MVCC Snapshots! MVCC snapshots which tuples are visible for SQL statements. A snapshot is recorded at the start of each SQL statement in READ COMMITTED transaction isolation mode, and at transaction start in SERIALIZABLE transaction isolation mode. In fact, it is frequency of taking new snapshots that controls the transactions isolation behavior. When a new snapshot is taken, the following information is gathered: •  The highest-numbered committed transaction ! •  The transaction numbers currently executing! Using this snapshot information, Postgres can determine if a transaction’s actions should be visible to an executing statement. © 2013 EnterpriseDB. All rights reserved. 8
  • 9. MVCC Snapshots Determine Row Visibility! Create-Only Cre 30 Sequential Scan Visible Exp Cre 50 Invisible Exp Snapshot Cre 110 Invisible The highest-numbered Exp committed transaction: 100 Open Transactions: 25, 50, 75 Cre 30 Invisible Exp 80 For simplicity, assume all other transactions are committed. Cre 30 Visible Exp 75 Cre 30 Exp 110 Visible Internally, the creation xid is stored in the system column ‘xmin’, and expire in ‘xmax’. © 2013 EnterpriseDB. All rights reserved. 9
  • 10. Confused Yet?! Source code comment in src/backend/utils/time/tqal.c: Mao [Mike Olson] says 17 march 1993: the tests in this routine are correct; if you think they’re not, you’re wrong, and you should think about it again. i know, it happened to me. © 2013 EnterpriseDB. All rights reserved. 10
  • 11. Implementation Details! All queries were generated on an unmodified version of Postgres. The contrib module pageinspect was installed to show internal heap page information and pg_freespacemap was installed to show free space map information. © 2013 EnterpriseDB. All rights reserved. 11
  • 12. Setup! © 2013 EnterpriseDB. All rights reserved. 12
  • 13. Setup! © 2013 EnterpriseDB. All rights reserved. 13
  • 14. Insert Using Xmin! All the queries used in this presentation are available at http://momjian.us/main/writings/pgsql/mvcc.sql! © 2013 EnterpriseDB. All rights reserved. 14
  • 15. Just Like INSERT! Cre 40 INSERT Exp Cre 40 Exp 47 DELETE Cre 64 old (delete) Exp 78 UPDATE Cre 78 new (insert) Exp © 2013 EnterpriseDB. All rights reserved. 15
  • 16. DELETE USING Xmax! © 2013 EnterpriseDB. All rights reserved. 16
  • 17. DELETE Using Xmax! © 2013 EnterpriseDB. All rights reserved. 17
  • 18. Just Like DELETE! Cre 40 INSERT Exp Cre 40 Exp 47 DELETE Cre 64 old (delete) Exp 78 UPDATE Cre 78 new (insert) Exp © 2013 EnterpriseDB. All rights reserved. 18
  • 19. UPDATE Using Xmin and Xmax! © 2013 EnterpriseDB. All rights reserved. 19
  • 20. UPDATE Using Xmin and Xmax! © 2013 EnterpriseDB. All rights reserved. 20
  • 21. Just Like UPDATE! Cre 40 INSERT Exp Cre 40 Exp 47 DELETE Cre 64 old (delete) Exp 78 UPDATE Cre 78 new (insert) Exp © 2013 EnterpriseDB. All rights reserved. 21
  • 22. Aborted Transaction IDs Remain! © 2013 EnterpriseDB. All rights reserved. 22
  • 23. Aborted IDs can remain because Transaction Status is recorded centrally ! Pg_clog XID Status flags 0 0 0 1 0 0 1 0 028 1 0 1 0 0 0 0 0 024 1 0 1 0 0 1 0 0 020 0 0 0 0 0 0 1 1 016 0 0 0 1 0 1 1 0 012 1 0 1 0 0 0 1 0 008 1 0 1 0 0 0 0 0 004 1 0 0 1 0 0 1 0 000 Transaction roll back marks the transaction ID as aborted. All sessions will ignore such transactions; it is not necessary to revisit each row to undo the transaction © 2013 EnterpriseDB. All rights reserved. 23
  • 24. Row Locks Using Xmax! © 2013 EnterpriseDB. All rights reserved. 24
  • 25. Row Locks Using Xmax! © 2013 EnterpriseDB. All rights reserved. 25
  • 26. Multi-Statement Transactions! Multi-statement transactions require extra tracking because each statement has its own visibility rules. For example, a cursor’s contents must remain unchanged even if later statements in the same transaction modify rows. Such tracking is implemented using system command id columns cmin/cmax, which is internally actually is a single column. © 2013 EnterpriseDB. All rights reserved. 26
  • 27. INSERT Using Cmin! © 2013 EnterpriseDB. All rights reserved. 27
  • 28. INSERT Using Cmin! © 2013 EnterpriseDB. All rights reserved. 28
  • 29. DELETE Using Cmin! © 2013 EnterpriseDB. All rights reserved. 29
  • 30. DELETE Using Cmin! © 2013 EnterpriseDB. All rights reserved. 30
  • 31. DELETE Using Cmin! A cursor had to be used because the rows were created and deleted in this transaction and therefore never visible outside this transaction. © 2013 EnterpriseDB. All rights reserved. 31
  • 32. Update Using Cmin! © 2013 EnterpriseDB. All rights reserved. 32
  • 33. Update Using Cmin! © 2013 EnterpriseDB. All rights reserved. 33
  • 34. Modifying Rows from Different Transactions! © 2013 EnterpriseDB. All rights reserved. 34
  • 35. Modifying Rows from Different Transactions! © 2013 EnterpriseDB. All rights reserved. 35
  • 36. Modifying Rows from Different Transactions! © 2013 EnterpriseDB. All rights reserved. 36
  • 37. Combo Command Id! Because cmin and cmax are internally a single system column, it is impossible to simply record the status of a row that is created and expired in the same multi-statement transaction. For that reason, a special combo command id is created that references a local memory hash that contains the actual cmin and cmax values. © 2013 EnterpriseDB. All rights reserved. 37
  • 38. Update Using Combo Command Ids! © 2013 EnterpriseDB. All rights reserved. 38
  • 39. UPDATE Using Combo Command Ids! © 2013 EnterpriseDB. All rights reserved. 39
  • 40. Update Using Combo Command Ids! © 2013 EnterpriseDB. All rights reserved. 40
  • 41. Update Using Combo Command Ids! The last query uses /contrib/pageinspect, which allows visibility of internal heap page structures and all stored rows, including those not visible in the current snapshot. (Bit 0x0020 is internally called HEAP_COMBOCID.) © 2013 EnterpriseDB. All rights reserved. 41
  • 42. MVCC Implementation Summary! xmin: creation transaction number, set by INSERT and UPDATE xmax: expire transaction number, set by UPDATE and DELETE; also used for explicit row locks cmin/cmax: used to identify the command number that created or expired the tuple; also used to store combo command ids when the tuple is created and expired in the same transaction, and for explicit row locks © 2013 EnterpriseDB. All rights reserved. 42
  • 43. Traditional Cleanup Requirements! Traditional single-row-version (non-MVCC) database systems require storage space cleanup: u  deleted rows u  rows created by aborted transactions © 2013 EnterpriseDB. All rights reserved. 43
  • 44. MVCC Cleanup Requirements! MVCC has additional cleanup requirements: u  The creation of a new row during UPDATE (rather than replacing the existing row); the storage space taken by the old row must eventually be recycled. u  The delayed cleanup of deleted rows (cleanup cannot occur until there are no transactions for which the row is visible) Postgres handles both traditional and MVCC-specific cleanup requirements. © 2013 EnterpriseDB. All rights reserved. 44
  • 45. Cleanup Behavior! Fortunately, Postgres cleanup happens automatically: u  On-demand cleanup of a single heap page during row access, specifically when a page is accessed by SELECT, UPDATE and DELETE u  In bulk by an autovacuum processes that runs in the background Cleanup can also be initiated manually by VACUUM. © 2013 EnterpriseDB. All rights reserved. 45
  • 46. Aspects of Cleanup! Cleanup involves recycling space taken by several entities: u  Heap tuples/rows (the largest) u  Heap item pointers (the smallest) u  Index entries © 2013 EnterpriseDB. All rights reserved. 46
  • 47. Internal Heap Page! © 2013 EnterpriseDB. All rights reserved. 47
  • 48. Indexes Points to Items, Not Tuples! © 2013 EnterpriseDB. All rights reserved. 48
  • 49. Heap Tuple Space Recycling! Indexes prevent item pointers from being recycled. © 2013 EnterpriseDB. All rights reserved. 49
  • 50. VACUUM Later Recycle Items! VACUUM performs index cleanup, then can mark “dead” items as “unused”. © 2013 EnterpriseDB. All rights reserved. 50
  • 51. Cleanup of Deleted Rows! © 2013 EnterpriseDB. All rights reserved. 51
  • 52. Cleanup Deleted Rows! © 2013 EnterpriseDB. All rights reserved. 52
  • 53. Cleanup of Deleted Rows! In normal, multi-user usage, cleanup might have been delayed because other open transactions in the same database might still need to view the expired rows. However, the behavior would be the same, just delays. © 2013 EnterpriseDB. All rights reserved. 53
  • 54. Cleanup of Deleted Rows! © 2013 EnterpriseDB. All rights reserved. 54
  • 55. Same as this Slide! © 2013 EnterpriseDB. All rights reserved. 55
  • 56. Cleanup of Deleted Rows! © 2013 EnterpriseDB. All rights reserved. 56
  • 57. Same as this Slide! © 2013 EnterpriseDB. All rights reserved. 57
  • 58. Free Space Map (FSM)! VACUUM also updates the free space map (FSM), which records pages containing significant free space. This information is used to provide target pages for INSERTs and some UPDATEs (those crossing page boundaries). Single-page vacuum does not update the free space map. © 2013 EnterpriseDB. All rights reserved. 58
  • 59. Another Free Space Map Example! © 2013 EnterpriseDB. All rights reserved. 59
  • 60. Another Free Space Map Example! © 2013 EnterpriseDB. All rights reserved. 60
  • 61. Another Free Space Map Example! © 2013 EnterpriseDB. All rights reserved. 61
  • 62. VACUUM Also Removes End-of-File Pages! VACUUM FULL shrinks the table file to its minimum size, but requires an exclusive table lock © 2013 EnterpriseDB. All rights reserved. 62
  • 63. Optimized Single-Page Cleanup of Old UPDATE Rows! The storage space taken by old UPDATE tuples can be reclaimed just like deleted rows. However, certain UPDATE rows can even have their items reclaimed, i.e. it is possible to reuse certain old UPDATE items, rath than making them as “dead” and requiring VACUUM to reclaim them after removing referencing index entries. Specifically, such item reuse is possible with special HOT update (head-only tuple) chains, where the chain is on a single heap page and all indexed values in the chain are identical. © 2013 EnterpriseDB. All rights reserved. 63
  • 64. Single-Page Cleanup of HOT UPDATE Rows! HOT update items can be freed (marked “unused”) if they are in the middle of the chain, i.e. not at the beginning or end of the chain. At the head of the chain is a special “Redirect” item pointers that are referenced by indexes; this is possible because all indexed values are identical in a HOT/redirect chain. Index creation with HOT chains is complex because the chains might contain inconsistent values for the newly indexed columns. This is handled by indexing just the end of the HOT chain and allowing the index to be used only by transactions that start after the index has been created. (Specifically, post-index-creation transactions cannot see the inconsistent HOT chain values due to MVCC visibility rules; they only see the end of the chain.) © 2013 EnterpriseDB. All rights reserved. 64
  • 65. Initial Single-Row State! © 2013 EnterpriseDB. All rights reserved. 65
  • 66. UPDATE Adds a New Row! No index entry added because indexes only point to the head of the HOT chain. © 2013 EnterpriseDB. All rights reserved. 66
  • 67. Redirect Allows Indexes To Remain Valid! © 2013 EnterpriseDB. All rights reserved. 67
  • 68. UPDATE Replaces Another Old Row! © 2013 EnterpriseDB. All rights reserved. 68
  • 69. All OLD UPDATE Row Versions Eventually Removed! This cleanup was performed by another operation on the same page. © 2013 EnterpriseDB. All rights reserved. 69
  • 70. Cleanup of Old Updated Rows! © 2013 EnterpriseDB. All rights reserved. 70
  • 71. Cleanup of Old Updated Rows! © 2013 EnterpriseDB. All rights reserved. 71
  • 72. Cleanup of Old Updated Rows! No index entry added because indexes only point to the head of the HOT chain. © 2013 EnterpriseDB. All rights reserved. 72
  • 73. Same as this Slide! © 2013 EnterpriseDB. All rights reserved. 73
  • 74. Cleanup of Old Updated Rows! © 2013 EnterpriseDB. All rights reserved. 74
  • 75. Same as this Slide! © 2013 EnterpriseDB. All rights reserved. 75
  • 76. Cleanup of Old Updated Rows! © 2013 EnterpriseDB. All rights reserved. 76
  • 77. Same as this Slide! © 2013 EnterpriseDB. All rights reserved. 77
  • 78. VACUUM Does Not Remove the Redirect! © 2013 EnterpriseDB. All rights reserved. 78
  • 79. Cleanup Using Manual VACUUM! © 2013 EnterpriseDB. All rights reserved. 79
  • 80. Cleanup Using Manual VACUUM! © 2013 EnterpriseDB. All rights reserved. 80
  • 81. The Indexed UPDATE Problem! The updating of any indexed columns prevents the use of “redirect” items because the chain must be usable by all indexes, i.e. a redirect/HOT UPDATE cannot require additional index entries due to an indexed value change. In such cases, item pointers can only be marked as “dead”, like DELETE does. No previously shown UPDATE queries modified indexed columns. © 2013 EnterpriseDB. All rights reserved. 81
  • 82. Index mvcc_demo Column! © 2013 EnterpriseDB. All rights reserved. 82
  • 83. UPDATE of an Indexed Column! © 2013 EnterpriseDB. All rights reserved. 83
  • 84. UPDATE of an Indexed Column! © 2013 EnterpriseDB. All rights reserved. 84
  • 85. UPDATE of an Indexed Column! © 2013 EnterpriseDB. All rights reserved. 85
  • 86. UPDATE of an Indexed Column! © 2013 EnterpriseDB. All rights reserved. 86
  • 87. UPDATE of an Indexed Column! © 2013 EnterpriseDB. All rights reserved. 87
  • 88. Cleanup Summary! Cleanup is possible only when there are no active transactions for which the tuples are visible. HOT items are UPDATE chains that span a single page and contain identical indexed column values. In normal usage, single-page cleanup performs the majority of the unnecessary index entries, and updates the free space map (FSM). © 2013 EnterpriseDB. All rights reserved. 88
  • 89. Conclusion! All the queries used in this presentation are available at: http://momjian.us/main/writings/pgsql/mvcc.sql http://momjian.us/main/presentations/ © 2013 EnterpriseDB. All rights reserved. 89
  • 90. Postgres Resources! u  Upcoming training and events •  Postgres training, NY – March 21st ! –  http://pgday.nycpug.org/training/ ! •  PGDay, NY – March 22nd! –  http://pgday.nycpug.org/! •  PGCon, Ottawa, CA – May 21-24! –  http://www.pgcon.org/2013/! u  EnterpriseDB live and on-demand training •  http://www.enterprisedb.com/store/products/dba-training ! u  Resources & Community •  Whitepapers, datasheets, podcasts, videos, community! •  http://www.enterprisedb.com/resources-community ! u  Postgres products, support and services •  sales@enterprisedb.com or visit www.enterprisedb.com !
  • 91. What are your Questions??! © 2013 EnterpriseDB. All rights reserved.