Using Parallel Recovery in PostgreSQL 17
1.
Parallel Recovery in PostgreSQL
Koichi Suzuki
May 31st, 2023
2.
© Copyright EnterpriseDB Corporation, 2023. All rights reserved.
Agenda
• Recovery Overview
• Possibility of recovery parallelism
• Additional rules and synchronization
• Implementation architecture
• Current achievement
• Remaining work
3.
Recovery Overview
• WAL (Write-Ahead Log) plays an essential role in backup and recovery.
• WAL is also used in log-shipping replication.
• Recovery is done by a dedicated startup process.
(Diagram: data updates flush buffers to the database data and write WAL at checkpoints; WAL is copied to the archive and shipped to a database in recovery mode, which reads each WAL record in series and replays each WAL record in series.)
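The serial behavior described above can be sketched as a single loop: read one record, replay it, repeat in LSN order. This is an illustrative sketch, not the actual PostgreSQL code; `WalRecord`, `replay_record`, and `recover` are made-up names standing in for the real reader and redo machinery.

```c
#include <stddef.h>

typedef struct { unsigned long lsn; } WalRecord;

/* Stand-in for the resource manager's redo call on one record. */
static unsigned long replayed_upto = 0;
static void replay_record(const WalRecord *rec) { replayed_upto = rec->lsn; }

/* The startup process's serial loop: read each record, replay it, repeat.
 * Every record is replayed strictly in LSN order by one process. */
unsigned long recover(const WalRecord *wal, size_t nrecs) {
    for (size_t i = 0; i < nrecs; i++)
        replay_record(&wal[i]);
    return replayed_upto;   /* LSN reached at the end of recovery */
}
```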
4.
Single replay thread, why?
● Single-threaded replay guarantees consistent replay.
  ○ WAL order is the order of writes to the database.
  ○ Replaying in the write order is simple and the safest.
5.
Current WAL replay architecture
WAL record types:
● Categorized by "resource manager."
  ○ One per type of data object: heap, transaction, B-tree, GIN, ...
  ○ Each resource manager provides a replay function for its records.
● Replay ("redo" in the source) reads the resource manager ID (type RmgrId) of each WAL record and calls the replay function for that resource.
  ○ StartupXLOG() in xlog.c
● Replay is done by the dedicated startup process.
● StartupXLOG() handles much more besides:
  ○ determining when the postmaster can begin to accept connections,
  ○ feeding commit/abort back to the primary,
  ○ recovery target handling, and more.
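The dispatch idea above can be sketched as a table of replay callbacks indexed by resource manager ID. The enum values, record layout, and function names here are illustrative, not the actual rmgr table in the PostgreSQL source.

```c
typedef unsigned char RmgrId;
enum { RM_XLOG = 0, RM_XACT = 1, RM_HEAP = 2, RM_NRMGRS = 3 };

typedef struct { RmgrId rmid; unsigned long lsn; } WalRecord;
typedef void (*redo_fn)(const WalRecord *);

static int heap_replayed = 0;
static void heap_redo(const WalRecord *rec) { (void)rec; heap_replayed++; }
static void noop_redo(const WalRecord *rec) { (void)rec; }

/* One replay callback per resource manager, indexed by RmgrId.
 * Adding a new resource manager means adding one table entry;
 * the dispatch loop itself never changes. */
static redo_fn rmgr_table[RM_NRMGRS] = { noop_redo, noop_redo, heap_redo };

void redo_record(const WalRecord *rec) {
    rmgr_table[rec->rmid](rec);   /* dispatch on resource manager ID */
}
```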
6.
Current list of RmgrIds (as of v14/15)
XLOG, Transaction, Storage, CLOG, Database, Tablespace, MultiXact, RelMap, Standby, Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist, BRIN, CommitTs, ReplicationOrigin, Generic, LogicalMessage
When a new resource manager (for example, a new access method or index) is added, a new RmgrId and its replay function are added. No changes are needed in the current redo framework.
7.
Structure of WAL (simplified)
● WAL is a sequence of segment files starting from initdb; each segment is a fixed size, 16 MB by default.
● The LSN is the 64-bit position of a WAL record from the very beginning of the stream.
● Each WAL segment has its own header.
● A WAL record may span multiple WAL segment files.
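Because the LSN is a 64-bit byte position and segments are a fixed size, the containing segment and the offset inside it fall out of simple division. A minimal sketch (function names are illustrative; the real macros live in the xlog headers):

```c
#include <stdint.h>

#define WAL_SEG_SIZE (16u * 1024 * 1024)   /* default segment size, 16 MB */

/* Which segment file contains this LSN, and where inside it. */
uint64_t lsn_to_segno(uint64_t lsn)  { return lsn / WAL_SEG_SIZE; }
uint32_t lsn_to_offset(uint64_t lsn) { return (uint32_t)(lsn % WAL_SEG_SIZE); }
```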
8.
Motivation for parallelism in recovery
Motivation:
• Save restoration time from backup.
• Improve replication lag in log-shipping replication.
• At least, WAL records for different database pages can be replayed in parallel.
Challenges (for example):
○ There are many constraints which must be followed during WAL replay.
○ When replaying COMMIT/ABORT, all the WAL records for the transaction must already have been replayed.
○ Some WAL records carry replay information for multiple pages.
9.
Basic constraints in parallel replay
1. For each database page, replay must be done in LSN order.
2. WAL for different pages can be replayed by different worker processes in parallel.
3. WAL not associated with a database page is replayed in LSN order by a dedicated worker (the transaction worker).
4. For some WAL records, such as a timeline change, all preceding WAL records must have been replayed first.
5. If a WAL record is associated with multiple pages, it is replayed by one of the workers.
   ○ To maintain the first constraint, some synchronization is needed (mentioned later).
6. Before replaying commit/abort/prepare, it must be confirmed that all the WAL records for the transaction have been replayed (mentioned later).
10.
Worker processes for parallel recovery
All the worker processes are the startup process or its children.
● Reader worker (dedicated)
  ○ The startup process itself. Reads each WAL record and queues it to the distributor worker.
● Distributor worker (dedicated)
  ○ Parses each WAL record and determines which worker process handles it.
  ○ If a WAL record is associated with block data, it is assigned to one of the block workers based on the hash value of the block.
  ○ Adds each WAL record to each worker's queue in LSN order.
  ○ Could be merged into the reader worker.
● Transaction worker (dedicated)
  ○ Replays all WAL records without block information.
11.
Worker processes for parallel recovery (cont.)
● Block worker (multiple, configurable)
  ○ Replays assigned WAL records.
  ○ Handling of multi-page WAL is described later.
● Invalid-page worker (dedicated)
  ○ A target database page may be missing or inconsistent because the table might have been dropped by subsequent WAL.
  ○ Current recovery registers such a page as invalid and corrects it when replaying the corresponding DELETE/DROP WAL.
  ○ A dedicated process for now, to reuse existing code whose hash functions rely on the memory context belonging to a process.
  ○ May receive queued records from the transaction and block workers.
  ○ Could be modified to use shared memory, in which case this worker is not needed.
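The distributor's routing rule (previous slide) can be sketched as a hash of the block identity modulo the worker count: all WAL for the same page always lands on the same block worker, whose queue then preserves per-page LSN order. The `BlockId` layout and hash constants are illustrative; the real patch keys on the full relation/block identity.

```c
#include <stdint.h>

/* Illustrative block identity; the real code uses the full relation
 * file locator plus block number. */
typedef struct { uint32_t relid; uint32_t blockno; } BlockId;

/* Route a block-associated WAL record to a block worker. The same block
 * always maps to the same worker, so each worker's FIFO queue keeps
 * per-page replay in LSN order. */
int pick_block_worker(const BlockId *blk, int n_block_workers) {
    uint32_t h = blk->relid * 2654435761u ^ blk->blockno * 40503u;
    return (int)(h % (uint32_t)n_block_workers);
}
```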
12.
Worker processes
(Diagram: configured with three block workers, all children of the startup process.)
13.
Shared data among workers
● WAL records and their parsed data are shared among workers through dynamic shared memory (DSM).
  ○ Allocated by the reader worker (startup) before the other workers are forked.
  ○ Handling of multi-page WAL is described later.
● Other shared data structures are accommodated in the same DSM.
14.
Shared data among workers (cont.)
● Buffer for WAL and its parsed result.
● Status of outstanding transactions:
  ○ latest LSN assigned to each block worker.
● Status of each worker.
● Queues to assign WAL to workers.
● History of outstanding WAL waiting for replay:
  ○ used to advance the replayed LSN.
● Data protection uses spinlocks:
  ○ no I/O or wait event may occur while a spinlock is held;
  ○ a worker should hold only one spinlock at a time (exceptions may have to be allowed, though).
15.
WAL buffer is allocated in a cyclic manner
● The WAL buffer is allocated cyclically to avoid fragmentation.
● Each WAL record is eventually replayed and its data freed.
● Allocating the WAL buffer cyclically means memory fragmentation is not a concern.
● A buffer is allocated at the start of the main free area.
● When a buffer is freed, it is concatenated with the surrounding free area and, if necessary, the main free area info is updated.
● A free area between allocated buffers is ignored until it is concatenated to the main free area.
16.
WAL buffer is allocated in a cyclic manner (cont.)
● If sufficient main free area is not available, the allocator waits until other workers replay WAL and free buffers.
● If all workers are waiting for the buffer, replay aborts and suggests enlarging the shared buffer (configurable through GUC).
● With three block workers and ten queues per worker, 4 MB of buffer looks sufficient. More may be needed with more block workers and greater queue depth.
(Diagram: the allocated area grows from the allocation start; when it reaches the tail of the buffer, allocation wraps back to the head of the main free area.)
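A minimal sketch of the cyclic scheme, under a simplifying assumption the slides do not make explicit here: buffers are freed in the same order and sizes they were allocated, so the tail simply chases the head. The real allocator handles out-of-order frees by merging free areas; that, and the waiting logic, are omitted.

```c
#include <stddef.h>

#define BUF_SIZE 1024   /* illustrative; real size comes from preplay_buffers */

typedef struct {
    size_t head;    /* next allocation offset (start of main free area) */
    size_t tail;    /* oldest live allocation offset */
    size_t live;    /* bytes currently allocated */
} CyclicBuf;

/* Returns the allocation offset, or -1 if it does not fit right now
 * (the caller would wait for other workers to replay and free WAL). */
long cyclic_alloc(CyclicBuf *b, size_t sz) {
    if (b->live + sz > BUF_SIZE) return -1;
    if (b->head + sz > BUF_SIZE) b->head = 0;   /* wrap to the start */
    long off = (long)b->head;
    b->head += sz;
    b->live += sz;
    return off;
}

/* Free in replay (allocation) order: the tail follows the head, so no
 * fragmentation accumulates. */
void cyclic_free(CyclicBuf *b, size_t sz) {
    if (b->tail + sz > BUF_SIZE) b->tail = 0;
    b->tail += sz;
    b->live -= sz;
}
```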
17.
Transaction status
● Composed of the latest WAL LSN assigned to each block worker, for each outstanding transaction.
● When a worker replays that LSN, the entry is cleared.
● When the transaction worker replays commit/abort/prepare, it checks that all the WAL for the transaction has been replayed.
● If not, it waits until the relevant workers replay all the assigned WAL (mentioned later).
(Diagram: for each outstanding transaction, an array of LSNs, one per block worker; commit/abort/prepare waits until all the workers replay the corresponding WAL.)
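The commit gate described above can be sketched as a per-transaction array of outstanding LSNs, one slot per block worker. Names and layout are illustrative, not the patch's actual structures; a real version would live in shared memory under a spinlock.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_BLOCK_WORKERS 3   /* illustrative; configurable in the patch */

typedef struct {
    /* Latest LSN this transaction assigned to each block worker;
     * 0 means nothing is outstanding for that worker. */
    uint64_t assigned_lsn[N_BLOCK_WORKERS];
} TxnStatus;

/* Called when a block worker replays a record of this transaction. */
void txn_mark_replayed(TxnStatus *t, int worker, uint64_t lsn) {
    if (t->assigned_lsn[worker] == lsn)
        t->assigned_lsn[worker] = 0;   /* latest work for this worker done */
}

/* Commit/abort/prepare may proceed only when nothing is outstanding;
 * otherwise the transaction worker waits for the block workers. */
bool txn_can_commit(const TxnStatus *t) {
    for (int i = 0; i < N_BLOCK_WORKERS; i++)
        if (t->assigned_lsn[i] != 0)
            return false;
    return true;
}
```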
18.
Worker status
● Latest assigned WAL LSN.
● Latest replayed WAL LSN.
● Flags to indicate:
  ○ the worker is waiting for sync from other workers,
  ○ the worker failed,
  ○ the worker terminated.
19.
WAL history
● The recovered point can advance only when all preceding WAL has been replayed.
● If any earlier WAL record has not been replayed, advancing the recovered point must wait until it is. The WAL history tracks this status.
(Diagram: the recovered point advances past replayed records and stops at the first record not yet replayed.)
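The advance rule can be sketched as a scan over the history: the recovered point moves forward past replayed entries and stops at the first gap. A fixed-size array stands in for the patch's actual (queue-like) history structure.

```c
#include <stdbool.h>
#include <stdint.h>

#define HIST_MAX 16   /* illustrative capacity */

typedef struct {
    uint64_t lsn[HIST_MAX];    /* outstanding WAL records, in LSN order */
    bool replayed[HIST_MAX];
    int count;
} WalHistory;

/* Highest LSN such that it and every earlier entry are replayed. */
uint64_t recovered_point(const WalHistory *h) {
    uint64_t point = 0;
    for (int i = 0; i < h->count; i++) {
        if (!h->replayed[i])
            break;             /* a not-yet-replayed record stops the advance */
        point = h->lsn[i];
    }
    return point;
}
```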
20.
Synchronization among workers
• Dedicated data is queued to ask for a sync message.
• The sync message is sent via socket (Unix-domain UDP). Each worker has its own socket to receive syncs.
• For multi-page WAL, another mechanism is implemented for performance.
21.
Waiting for all preceding WAL replay
● The distributor worker sends a queue entry asking for synchronization.
● The distributor then reads from the target worker's socket.
● When the target worker receives the synchronization request, it writes synchronization data to the socket.
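The socket handshake above can be sketched with a datagram pair: the waiter blocks in `read()` on its own socket until the peer writes one byte. `socketpair()` stands in here for the per-worker Unix-domain UDP sockets the deck describes; the function names are illustrative.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Signal the peer: one datagram byte means "sync reached". */
int sync_send(int fd) {
    char c = 'S';
    return (int)write(fd, &c, 1);
}

/* Block until the peer signals; each worker reads only its own socket. */
int sync_wait(int fd) {
    char c;
    return (int)read(fd, &c, 1);
}
```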
22.
Synchronization for multi-page WAL
● A multi-page WAL record is assigned to all the block workers chosen by the hash value of each page.
● The WAL record carries an array of assigned workers and a count of remaining workers (initially the number of assigned block workers).
● When a block worker picks up the WAL record, it checks whether the remaining count is one (only itself).
  ○ If so, the worker replays the WAL and then writes a sync to all the other assigned workers.
  ○ If not, the worker waits for a sync from that last worker.
● This guarantees that WAL replay for each page is in LSN order.
23.
Synchronization for multi-page WAL (cont.)
(Diagram: the dispatcher worker assigns a multi-page WAL record to all the block workers, one per page; earlier arrivals wait while the last block worker to pick up the WAL replays it and syncs the others.)
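The "last worker replays" rule can be sketched as a countdown on the shared record. This sketch is single-threaded and illustrative; a real implementation would decrement the counter atomically (or under a spinlock) and deliver the sync over the workers' sockets.

```c
#include <stdbool.h>

typedef struct {
    int remaining;   /* initialized to the number of assigned block workers */
    bool replayed;
} MultiPageWal;

/* Called when a block worker picks the record off its queue.
 * Returns true if the caller was the last worker and did the replay;
 * otherwise the caller must wait for a sync from the last worker. */
bool multipage_arrive(MultiPageWal *w) {
    if (w->remaining == 1) {     /* "only myself" remains */
        w->replayed = true;      /* replay here, then sync the others */
        w->remaining = 0;
        return true;
    }
    w->remaining--;
    return false;
}
```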
24.
Current status of the work and achievement
25.
Current Status
● Code is almost done.
● Still contains much debug code.
● Runs under gdb control.
● Parallelism appears to be working without issue.
● Existing redo functions run without modification.
26.
Affected major source files
A (added): src/backend/access/transam/parallel_replay.c, src/include/access/parallel_replay.h — main parallel recovery logic; each worker's loop.
M (modified): src/backend/access/transam/xlog.c, src/backend/access/transam/xlogreader.c, src/backend/access/transam/xlogutils.c, src/include/access/xlog.h, src/include/access/xlogreader.h, src/include/access/xlogutils.h — read and analyze WAL into the shared buffer; pass WAL to the distributor worker.
M (modified): src/backend/postmaster/postmaster.c, src/backend/postmaster/startup.c, src/backend/storage/lmgr/proc.c, src/backend/utils/activity/backend_status.c, src/backend/utils/adt/pgstatfuncs.c, src/include/miscadmin.h, src/include/postmaster/postmaster.h, src/include/storage/proc.h, src/include/utils/backend_status.h — fork worker processes.
27.
Affected major source files (cont.)
M (modified): src/backend/utils/misc/guc.c — additional GUC parameters.
M (modified): src/backend/utils/error/elog.c, src/backend/utils/activity/backend_status.c, src/backend/utils/misc/ps_status.c — add workers to log and status.
28.
Additional major GUC parameters
parallel_replay (bool, default off) — enable parallel recovery.
num_preplay_workers (int, default 5) — number of parallel replay workers, including the reader, distributor, transaction, and invalid-page workers.
num_preplay_queues (int, default 128) — number of queue elements used to pass WAL to the various workers.
num_preplay_max_txn (int, default max_connections) — number of transactions that can run in parallel during recovery.
preplay_buffers (int, default calculated) — shared buffer size, allocated using dynamic shared memory.
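Putting the parameters above together, a hypothetical postgresql.conf fragment for trying the patch might look like this (parameter names are from the table above; whether they survive as final GUC names is up to the patch, and the values are only illustrative):

```ini
# Parallel recovery (patch GUCs; names/values illustrative)
parallel_replay = on
num_preplay_workers = 5      # reader + distributor + txn + invalid-page + block
num_preplay_queues = 128     # queue elements for passing WAL to workers
num_preplay_max_txn = 100    # defaults to max_connections if unset
# preplay_buffers is calculated when not set explicitly
```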
29.
Remaining work
30.
Remaining work
● Clean up test code.
● Clean up structures, which still carry some unused data from past trial-and-error attempts.
● Validate consistent recovery point determination.
● Determine the end of WAL read.
● Port to version 16 or later.
● Run without gdb and measure the performance gain:
  ○ archive recovery,
  ○ log-shipping replication.
31.
Issues for further discussion/development
32.
Issues for discussion
● Synchronization:
  ○ Currently uses Unix-domain sockets directly. May need a wrapper for portability.
● Queueing:
  ○ Currently uses POSIX mqueue directly. It is very fast and simple, and supports many-to-one queueing.
  ○ Is it portable to other environments, such as Windows?
● Bringing this into PG core:
  ○ Large amount of code: around 7k lines with debug code, maybe around 4k lines without.
  ○ Needs an expert's help to divide this into manageable smaller pieces for the commitfest.
33.
Resources
Source repo: https://github.com/koichi-szk/postgres.git (branch: parallel_replay_14_6)
Test tools/environment: https://github.com/koichi-szk/parallel_recovery_test.git (branch: main)
PostgreSQL wiki: https://wiki.postgresql.org/wiki/Parallel_Recovery
These are all public now. Please let me know if you are interested.
34.
Thank you very much
Koichi Suzuki
koichi.suzuki@enterprisedb.com
koichi.dbms@gmail.com
35.
Challenges in testing
● The startup process starts before the postmaster begins to accept connections.
● There is no direct means to pause it and report its PID for attaching gdb.
● Instead, this information is written to a debug file; the process waits until gdb is attached and a signal file is touched.
(Diagram: the PID information is pasted into a terminal; each terminal runs gdb for one worker.)