1
#IDUG
A DB2 DBA's Guide to pureScale
Kelly Schlamb
Executive IT Specialist, IBM Canada Ltd.
Session Code: G05
Wednesday, May 14, 2014, 8:00am - 9:00am | Platform: DB2 for LUW
If you’ve been considering moving forward with the pureScale feature then I’m sure you’re
asking yourself some questions about what’s involved. Sure, DB2 is DB2, but how similar
is a pureScale environment to a non-pureScale one? Do your current operational
practices still apply? Is there anything new that you have to know or do to manage the
system? In this presentation you’ll be assured by the many similarities, but you will also
learn what’s new or slightly different in the pureScale world.
Biography: Kelly Schlamb has worked with DB2 for LUW for over 19 years in various roles
at IBM. He is currently an Executive IT Specialist within the Worldwide Information
Management Technical Sales organization focusing on DB2 High Availability, DB2
pureScale, and IBM PureData System for Transactions. Prior to this, Kelly was a long time
member of the DB2 Kernel development group in the IBM Toronto Lab. He started on the
Buffer Pool Services & Storage Management development team, working on various new
features and capabilities in this area. This included leading the group that introduced
Automatic Storage and adding various enhancements to it in the DB2 releases that followed.
Subsequently, Kelly spent over five years as the technical lead and manager of the DB2
pureScale Recovery development team. This team was responsible for designing and
implementing logging, backup/restore, crash recovery, rollforward, and transaction
management in the DB2 pureScale environment.
2
#IDUG
Topics Covered Today
• pureScale Overview
• Prerequisites and Preparing for pureScale
• Configuration
• Backup, Logging & Recovery
• Storage Management
• Monitoring
2
These are the topics that we'll be covering in today's presentation.
3
#IDUG
pureScale Overview
3
<no speaker notes>
4
#IDUG
DB2 pureScale
Scalability, Performance, and Always Available Transactions
• DB2 pureScale
• Robust infrastructure for OLTP workloads
• Provides improved availability, performance and scalability
• Application transparency
• Scales to >100 members
• Leverages z/OS cluster technology
• Highlights of pureScale enhancements
in DB2 10.5
• Rich disaster recovery options, now including
integrated HADR support
• Backup and restore between pureScale and non-pureScale environments
• Online database fix pack updates (in addition to system & OS updates)
• Add members online for additional capacity
• Included in Advanced Workgroup and Advanced Enterprise editions
4
pureScale is a DB2 feature that reduces the risk and cost of business growth by providing
extreme capacity, continuous availability, and application transparency. DB2 pureScale uses
the same architecture as the undisputed gold standard of reliability – System Z. This is a
shared data, active/active architecture and businesses trust this architecture to run their most
critical systems.
5
#IDUG
Main thing to remember… pureScale is DB2
• pureScale looks and feels very much like "regular DB2"
• Same code base shared by DB2, DPF, and pureScale
• In DB2 10.1 and 10.5, pureScale is just an installable feature of DB2
• Immediate productivity from DBAs and application developers
• Single system view for utilities
• Act and behave exactly like they do in non-pureScale
• Backup, restore, rollforward, reorg, load, …
• Applications don’t need to know about or care about the fact that there are
multiple members
• In general, can run SQL statements or commands on any member
• SQL, data access methods, and isolation levels are the same
• Backup/recovery processes are the same
• Database security is managed in the same way
• Environment (even the CFs) still managed by database manager and database
configuration parameters
5
I've sometimes had people comment to me that by introducing pureScale into their
environment, their DB2 DBAs have to learn a whole new database platform. That couldn't
be further from the truth. As the slide title says, pureScale is DB2. pureScale is just a
deployment option of DB2, like the Database Partitioning Feature (DPF) is. It's all one code
base and as part of the installation of DB2 you choose whether it's going to be a pureScale
environment or not.
All of the skills that a DBA has with DB2 are immediately transferable to pureScale. It has
the same look and feel, the utilities work in the same way, the same SQL and commands
exist, security is the same, etc, etc.
Even though there are new concepts in pureScale (which will be discussed over the next few
slides) things have been made to work as seamlessly as possible within the DB2 framework.
For instance, you have the concept of the CF (Cluster Caching Facility) in pureScale and
this new structure can be configured to suit your needs. However, rather than introducing a
new interface for configuring the CF, the existing database manager configuration and
database configuration methods are used. So, if you know how to view and change these
configuration parameters, you'll be easily able to do the same for the new CF-related ones.
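For illustration only (PRODDB is a hypothetical database name and the value shown is arbitrary), the CF-related parameters are read and changed with the same commands as any other configuration parameter:
db2 get dbm cfg
db2 update dbm cfg using CF_DIAGLEVEL 3
db2 get db cfg for proddb
The database manager configuration holds the CF server parameters and the database configuration holds the CF structure parameters, as covered later in the Configuration section.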
6
#IDUG
DB2 pureScale Architecture
• Multiple DB2 members for scalable and
available database environment
• Client application connects into any DB2
member to execute transactions
• Automatic workload balancing
• Shared storage for database data and
transaction logs
• Cluster caching facilities (CF) provide
centralized global locking and page
cache management for highest levels of
availability and scalability
• Duplexed, for no single point of failure
• High speed, low latency interconnect for
efficient and scalable communication
between members and CFs
• DB2 Cluster Services provides integrated
failure detection, recovery automation
and the clustered file system
[Diagram: clients connect over the cluster interconnect to multiple members (each with Cluster Services), with a primary and secondary CF, and shared storage holding the database and per-member logs; together these form the DB2 pureScale cluster (instance)]
6
DB2 pureScale keeps your critical systems available all the time, giving you uninterrupted access to your data, making sure
that your business is up all the time, and your line of business gets the high level of availability they need, through planned
and unplanned outages. DB2 pureScale provides continuous availability of data through the use of the highly reliable
cluster caching facility (for redundancy purposes and no single point of failure, there are two of these "CFs"). The CF
provides centralized locking and cache management and is very different from architectures in competitive products that
use a distributed model. With the distributed locking architecture, lock ownership is distributed across multiple nodes and
recovery from a node failure requires lock redistribution across the surviving nodes. Also during node failure and recovery,
I/O is frozen until the database system is able to determine what data pages need recovery. This can be a relatively lengthy
process, which has a significant impact on data availability. None of this is required with DB2 pureScale because that
information is centrally managed and is not impacted by DB2 member failures (and there are two CFs, in case a node
hosting a CF fails or needs to be brought down for maintenance).
This slide describes the architecture of a DB2 pureScale cluster (also referred to as a pureScale instance). It is an
active/active data sharing environment in which multiple nodes – called DB2 members – handle the transactional workload
of the system and they also have equal and shared access to a single copy of the database on disk. Clients are able to
connect into any of the members and there are default workload balancing capabilities that will distribute the workload
across all of the members. If a member fails, client connections are automatically rerouted to healthy members. The
members communicate with the aforementioned CFs for the purpose of global locking and data page caching. This
communication takes place over a high speed, low latency interconnect. Specifically, it is done via the RDMA (Remote
Direct Memory Access) protocol over either InfiniBand or RDMA-capable 10 Gigabit Ethernet.
Integrated into DB2 pureScale are various other IBM products, including the General Parallel File System (GPFS), Reliable
Scalable Cluster Technology (RSCT), and Tivoli System Automation for Multiplatforms (TSA MP). Collectively, they are
known within pureScale as DB2 Cluster Services. These products provide the clustered file system on which the database
data and transaction logs reside, as well as the cluster domain management, heart-beating, and recovery automation.
Together, all of these components allow for very fast recovery when node failures occur. These products are fully integrated
into DB2 pureScale, in that they are all installed together as part of a single installation process, they are configured as part
of installation, and they are managed and maintained together all within pureScale.
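As a small aside, the current topology and state of all of these pieces (the members, the CFs, the hosts they run on, and their status) can be viewed with the db2instance command, run as the instance owner:
db2instance -list
This command is informational only and does not change anything in the cluster.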
7
#IDUG
What in pureScale is new to a DB2 DBA?
• CFs provide centralized locking and cache management
• Global Lock Manager and Group Buffer Pool
• Other components automatically installed and
configured as part of pureScale install
• RSCT provides heart beating, domain management
• TSA defines resources and dependencies, drives recovery
• GPFS provides a clustered file system on which shared data resides
• Online, automated member/CF recovery
• Automatic workload balancing across all members
• Multiple log streams are maintained, one per member
• Automatically merged when necessary (e.g. rollforward, group crash recovery)
• Monitoring includes members and CFs
• Storage management primarily done through DB2 interfaces to GPFS
• Cluster management done through DB2 interfaces to RSCT/TSA
7
This slide talks to some of the new components and concepts, as well as some slight
differences between pureScale and traditional DB2 environments. As you can see here – and
will learn more about later – they're really not that big a deal from a skills perspective.
pureScale is a multiple-server environment made up of members and CFs. Within each CF
are components such as the group buffer pool and the global lock manager.
DB2 keeps track of changes to data in the database through its transaction logs and in a
pureScale environment each member has its own set of logs (referred to as a log stream).
And if DB2 needs to perform some sort of operation that requires multiple log streams, it
will automatically merge them as necessary, invisible to the user.
In addition to the database manager itself, a pureScale environment also includes other IBM
technologies such as RSCT (heart beating, domain management), TSA (recovery
automation), and GPFS (the clustered file system). These components are all a part of a
single integrated install (as is the case with fix packs as well). And rather than having to
learn the commands and operations for these different products, we've abstracted out many
of the common operations into a DB2 command (db2cluster). Through this command you
can configure/manage/view the storage and cluster manager.
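A sketch of what that looks like in practice (option names should be confirmed against the db2cluster documentation for your level): the -cfs scope addresses the clustered file system and the -cm scope addresses the cluster manager.
db2cluster -cfs -list -filesystem
db2cluster -cm -list -alert
The first command lists the DB2-managed GPFS file systems; the second lists any outstanding cluster manager alerts.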
8
#IDUG
Centralized Lock Management
• Local lock manager (LLM) exists on each member
• Responsible for granting locks locally to individual application transactions
• Global lock manager (GLM) exists on the CF
• Grants locks to members upon request (if not already held by another member, or
currently held by the member in a compatible mode)
• Maintains global lock state (what member has what lock, in what mode, who's waiting)
• When member needs a lock it doesn't already hold, the LLM coordinates with the
GLM to get it
• Via fast RDMA requests and lock negotiation
• Locks can be:
• Logical/transactional (e.g. row locks and table locks)
• Physical (e.g. page locks)
8
In a pureScale environment there are two locking components that work together to manage data access across
the cluster. The first is the Local Lock Manager (LLM) that resides on each member and the second is the
Global Lock Manager (GLM) that resides on the CF. The LLM handles things at a local level within a member
and understands what locks are being held and requested by applications/transactions within a member. The
GLM has a cluster-wide understanding of the locks being held and requested within the cluster. It doesn’t
know specifics about applications or transactions; it only knows which members are involved.
If a transaction running on a member requires a lock (e.g. a row lock) then the LLM will first see if the
member is already holding a lock that can satisfy the request. If it doesn’t then it must request it from the
GLM. Communication between the LLM and GLM involves “set lock state” requests and responses, as well as
notifications and negotiations if the GLM can’t immediately satisfy a request. The messages and responses take
place using RDMA. RDMA (Remote Direct Memory Access) allows one server to reach into the memory of
another server and read or write bytes of data. No interrupt processing takes place, there is no context
switching, and the target server spends no CPU cycles on the operation. This is increasingly important as the
size of your cluster grows.
In pureScale, locks are categorized as either logical or physical. Logical locks are simply transactional locks,
such as the row or table locks you might get while executing a transaction. These are the types of locks you
have in non-pureScale. Physical locks are unique to pureScale and are used for concurrency control on
"physical" objects, most notably pages (sometimes you'll hear them referred to as "P-Locks"). These physical
locks are not transactional in nature. That means that one of these locks can be given up prior to the end of a
transaction (i.e. you don't have to commit the transaction to give up the lock). You'll see more about page
negotiation (a.k.a. page reclaim) via the page locks in the next few slides.
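P-Locks themselves are not exposed through the monitoring interfaces, but logical lock activity is monitored exactly as in non-pureScale. For example, against a hypothetical database PRODDB (the MON_GET_LOCKS columns selected here are illustrative):
db2pd -db proddb -locks
SELECT application_handle, lock_object_type, lock_mode, member
  FROM TABLE(MON_GET_LOCKS(NULL, -2)) AS t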
9
#IDUG
Physical Locks vs. Logical Locks
• Locks that are held on rows, tables, etc. are referred to
as logical locks
• Associated with applications and transactions
• Physical locks (P-Locks) are used for serialization of a physical resource, like a page
• For example, a member will hold a P-Lock in X mode when it is in the middle of updating a
page and can release the lock when it is done
• P-Locks are not transactional in nature
• Associated with a member
• Requested as needed, freed when no longer needed
• Typically held until end of transaction, but protocol doesn’t strictly require them being
held until commit or rollback time
• Can be negotiated away if another member wants them
• Use of P-Locks is mostly invisible within the database
• Not included in monitor metrics
• Cannot result in deadlocks
• Simply an internal database primitive that is built on the locking infrastructure
9
Physical Locks (P-Locks) are not something that you typically hear much about, as their use
by DB2 is mostly invisible to users. However, they are commonly talked about when
describing pureScale internals (such as page reclaim/negotiation) and so they are explained
here.
A physical lock is used for concurrency/serialization of physical resources, such as a page.
They are not transactional in nature (unlike logical locks, which are locks that are associated
with rows, tables, etc.). They are requested and freed as necessary.
Although internal, the “physical lock” concept and term is commonly used when describing
page access within a pureScale database and therefore it is described here.
10
#IDUG
Two Levels of Data Page Caching
• Local buffer pools (LBP) exist on each member
• LBPs cache modified pages and clean pages
• A single group buffer pool (GBP) exists in the CF per database
• Global cache, containing modified pages written from member LBPs
• Page registry tracks which pages exist in the member LBPs
• Allows for invalidation of old versions of a page on other members when a page is
updated and those changes have been committed
• Pages are written from the LBP to the GBP at:
• Transaction commit time (referred to as "force at commit")
• During page reclaim (a.k.a. page negotiation)
• Member has modified a page, another member wants to use that page
• Pages can be read by members very quickly from the GBP
• Avoids costly reads from disk
• Speeds up member recovery in case of a failure
10
There are two levels of data page caching that occur within a DB2 pureScale cluster. The first is the caching of
data in the buffer pools found on the members. These are referred to as local buffer pools (LBP) in a pureScale
environment. These are the buffer pools you're familiar with in non-pureScale DB2.
With pureScale, there is also a global buffer pool (GBP), which resides within the CF (and is duplexed across
both the primary and secondary CF).
When transactions are executing on a member, pages are being read into the local buffer pool and
modifications are made locally. When a transaction commits, all of the pages that were modified by the
transaction get sent to the GBP (on both the primary and secondary CF). This is pureScale's "force at commit"
protocol. When this happens, older copies of the page that might exist in local buffer pools on other members
are invalidated. This is accomplished via the page registry. This registry resides with the GBP and keeps track
of all of the pages that exist in the local buffer pools and the GBP. RDMA is used to do this invalidation in a
very efficient and scalable manner.
Pages may also be written to the GBP under other circumstances. For instance, during page negotiation. This is
when a page has been modified on one member (as part of an uncommitted transaction that updates a row, for
example) and another member wants to update the same page (perhaps to update a different row on that page).
The page is locked exclusively (X) by the first member and when the request comes in from the second
member, the lock is used to negotiate the page away from the first member. As previously mentioned, page
locks are not transactional in nature and so they don't need to be held until commit time. Therefore,
concurrency is not an issue.
For those pages that are in the GBP, if a member wants to read or update them, they can be quickly read from
the GBP into an LBP using RDMA. Reading it from the GBP is orders of magnitude faster than reading a page
from disk (tens of microseconds vs. single digit milliseconds).
Various monitor metrics can be used to monitor the caching activity that is occurring at both levels in the LBPs
and GBP.
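As a hedged example of those metrics, GBP activity for data pages can be sampled from MON_GET_BUFFERPOOL, where -2 means all members (column names as recalled; verify against your release):
SELECT member, bp_name, pool_data_gbp_l_reads, pool_data_gbp_p_reads
  FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t
A GBP logical read that is not followed by a GBP physical read was satisfied from the group buffer pool rather than from disk.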
11
#IDUG
Page Reclaim
[Diagram: members 0-3, each with a local buffer pool (LBP) holding copies of page P, and the CF holding the GLM and the GBP registry; page reclaim proceeds as follows:]
1. An uncommitted update modifies page P on member 0 (page locked X)
2. A transaction on member 3 wants to modify page P and requests the page lock in X
3. Page P is negotiated away via the page lock; page P is written to the GBP and the lock is given up
4. Old copies of page P in other members' LBPs are invalidated
5. The X lock on page P is granted to member 3
6. With the lock acquired, member 3 finds its copy of page P invalid, so the latest copy is requested from the GBP
All lock requests, page transfers, and invalidations are done using Remote Direct Memory Access
(RDMA). This allows for high performance and high scalability, even as the cluster size grows
11
Although difficult to fully describe through graphics on a single slide, the intention here is to
give you an idea of how page reclaim (a.k.a. page negotiation) works. The different steps are
described with the text and the graphics on this slide.
As mentioned at the bottom of the slide, all of the communication between the members and
CFs – including lock requests, writing pages to the GBP, reading pages from the GBP, and
page invalidation (sometimes referred to as silent invalidation or cross invalidation) – is
done via Remote Direct Memory Access (RDMA). RDMA allows one server to reach into
the memory of another server and read or write bytes of data (it's actually accomplished
using the network adapter cards). With RDMA, no interrupt processing takes place, there is
no context switching, and the target server spends no CPU cycles on the operation. This is
increasingly important as the size of your cluster grows.
pureScale's exploitation of RDMA is unique and is one of the reasons that pureScale
performs so well and can scale up without needing to make applications cluster aware.
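Reclaim activity can be observed per table; the following is a sketch using the MON_GET_PAGE_ACCESS_INFO table function (column names should be double-checked for your release):
SELECT tabschema, tabname, page_reclaims_x, page_reclaims_s, reclaim_wait_time
  FROM TABLE(MON_GET_PAGE_ACCESS_INFO(NULL, NULL, -2)) AS t
  ORDER BY reclaim_wait_time DESC
Tables near the top of this list are the ones whose pages are being negotiated between members most heavily.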
12
#IDUG
pureScale Client Configuration
• Workload Balancing (WLB)
• Application requests balanced across all members or
subsets of members
• Takes server load of members into consideration
• Connection-level or transaction-level balancing
• Client Affinity
• Direct different groups of clients or workloads to specific
members in the cluster
• Consolidate separate workloads/applications on same
database infrastructure
• Define list of members for failover purposes
• Automatic Client Reroute (ACR)
• Client automatically connected to healthy member in case
of member failure
• May be seamless in that no error messages returned
to client
• Application may have to re-execute the transaction
12
Database applications running in a DB2 pureScale environment can use the DB2 transaction-level or connection-level workload balancing
(WLB) functionality. WLB balances application requests among all members of the DB2 pureScale cluster. When WLB is enabled the DB2
clients distribute workload or application requests based on the capacity (that is, the priority or weight) values in a server list that the DB2
pureScale server returns. These capacity values indicate the current load on a DB2 pureScale member. A member with a capacity value
below that of the other members in the server list is considered busier than other members.
The “db2pd –serverlist” command can be used to see the relative load (priority or weight) of a member. A member (A) having a higher
value compared with another member (B) indicates to the client that more work should be directed at member A. CPU load average (over
recent time) and memory utilization (based on swap space and paging activity) are used to determine the relative load of a member.
The client affinities feature allows you to define an ordered list of DB2 pureScale members to which a DB2 client can connect; different
clients can implement a different ordered list. In certain situations, you might want to direct application requests from a DB2 client to a
particular DB2 pureScale member on the list. If that DB2 pureScale member goes down because of a planned or unplanned outage, the DB2
client can direct the client application requests to another DB2 pureScale member on the list. If that member is unavailable, the DB2 client
can work through the list of all DB2 pureScale members to find an available member. This feature is typically used in an environment
where the applications and data are inherently segregated and particular servers are targeted to service requests of particular applications.
With client affinities, you can also control whether application requests fail back to the failed primary server after it comes back online.
The primary server is the DB2 pureScale member that the application originally connected to. If you set up the client in this manner, you
can choose how often the DB2 client should check whether the primary server is back online.
Automatic client reroute (ACR) is a feature in DB2 clients that takes application requests that are directed toward an offline DB2 pureScale
member and reroutes them to active DB2 pureScale members. ACR is automatically enabled with WLB or client affinities so no additional
steps are required to specify which member the application should connect to upon encountering an outage. In some cases, after an outage,
clients are seamlessly routed to another DB2 pureScale member, and no error messages are returned to the application because the failure is
transparent to the application. For more details on when failures are seamless, see the DB2 Information Center information about seamless
client reroute. However, in some situations, the DB2 client cannot replay the statement execution environment on the new connection after
automatic client reroute occurs. In such a situation, the transaction is rolled back, and SQLCODE -30108 (or SQLCODE -4498 for Java
applications) is returned to the application after the connection is rerouted from the failing member to a surviving member. If this occurs,
applications must replay the statement execution environment and redo the statements, but applications do not have to explicitly reconnect
to the database because ACR automatically reconnects the applications.
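For CLI/.NET and other non-Java clients, these behaviours are typically switched on in db2dsdriver.cfg. The snippet below is a minimal sketch only; the host name, port, and database name are hypothetical and the available parameters vary by client level:
<configuration>
 <dsncollection>
  <dsn alias="PRODDB" name="PRODDB" host="memberhost1" port="50000"/>
 </dsncollection>
 <databases>
  <database name="PRODDB" host="memberhost1" port="50000">
   <wlb>
    <parameter name="enableWLB" value="true"/>
   </wlb>
   <acr>
    <parameter name="enableACR" value="true"/>
    <parameter name="enableSeamlessACR" value="true"/>
   </acr>
  </database>
 </databases>
</configuration>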
13
#IDUG
Online System and Database Maintenance
• Transparently perform maintenance to the
cluster in an online rolling fashion
• DB2 pureScale fix packs (DB2 10.5)
• System updates such as operating system
fixes, firmware updates, etc.
• No outage experienced by applications
• DB2 fix pack install involves a single installFixPack command to be run on
each member/CF
• Quiesces member
• Existing transactions allowed to finish
• New transactions sent to other members
• Installs binaries
• Updates instance
• Member still behaves as if running on previous fix pack level
• Unquiesces member
• Final installFixPack command to complete and commit updates
• Instance now running at new fix pack level
CF
CF
13
System maintenance, such as hardware updates, operating system fixes, firmware updates,
etc. can be performed in an online rolling fashion.
Previously, when applying DB2 pureScale fix packs it was necessary to completely stop the
entire cluster so that the fix pack could be applied to all of the members and CFs. This is no
longer the case in DB2 10.5 and you can perform this kind of maintenance without bringing
the cluster down.
It is termed a "rolling update" because you can perform the maintenance on one host at a
time, rolling through each of the members and CFs. During the application of the fix pack to
a particular host, only one member or CF is offline but at least one other member and CF
should still be online and so the cluster itself is still online and able to perform work on
behalf of applications.
The newly enhanced installFixPack command is used to do the update of the binaries on an
individual host by host basis. The installFixPack command also has new options for
committing the changes and for doing a pre-commit check first.
14
#IDUG
Disaster Recovery Options for pureScale
• HADR
• Replication between a database in a primary pureScale cluster
and a standby pureScale cluster with a matching member topology
• Active/passive DR
• Storage Replication
• Synchronous or asynchronous disk-based replication between a database in a primary
pureScale cluster and a standby pureScale cluster with a matching member topology
• Active/passive DR
• Q Replication / InfoSphere Change Data Capture (CDC)
• Logical replication between a pureScale database and a pureScale or non-pureScale
standby (bidirectional supported)
• Can be active/active DR
• Geographically Dispersed pureScale Cluster (GDPC)
• Single pureScale cluster "stretched" over two sites with half of members/CFs at each site
• Active/active DR
14
These are various different disaster recovery options that are currently supported in
pureScale. Given the variety of the options, there is a solution to meet the needs of all
pureScale users.
15
#IDUG
Prerequisites and Preparing for pureScale
15
<no speaker notes>
16
#IDUG
Setting Up Your DB2 pureScale Environment
Like the Boy Scouts always say… "Be Prepared"
• The DB2 Information Center provides
plenty of information including
• Pre-requisites for hardware
and software
• Preparation and planning steps
• Pre-installation "cheat sheets"
• Installation instructions
16
In this next section we'll cover some topics around installation and instance/database
movement to pureScale. When preparing for and then subsequently performing these tasks,
it is highly recommended that you read through the relevant sections of the Information
Center. As shown in the screen capture on the slide, there is a great deal of information on
these topics in there.
17
#IDUG
DB2 pureScale Supported Hardware and OS
• Servers: IBM POWER6, POWER7/7+, BladeCenter HS22/HS23, Flex System nodes, and IBM and non-IBM rack mounted servers
• Linux distributions: RHEL 5.9, RHEL 6.1, SLES 10 SP4, SLES 11 SP2
• High speed, low latency interconnect: InfiniBand or 10 GE (RoCE)
• GPFS compatible storage (ideally storage that supports SCSI-3 PR fast I/O fencing)
17
pureScale is supported on both IBM Power and Intel-x86 based systems. This chart shows
the hardware and software requirements at a high level (current as of April 2014). For more
information on the specific hardware requirements, please see the following sections of the
Information Center:
Power/AIX:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r
0054850.html
Intel x86/Linux:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r
0057441.html
18
18
#IDUG
Installing DB2 pureScale
• Components installed
• Database members
• Cluster caching facilities (CFs)
• DB2 cluster services, which includes TSA, RSCT, and GPFS
• Methods
• DB2 Setup Wizard (db2setup): User friendly GUI for interactive deployment
• Allows setup of a DB2 pureScale environment across multiple hosts
• Ensures that all necessary components are installed and configured on all hosts according to best
practices
• db2_install command: Command line install process (deprecated)
• Response file: Automated install
• Install and instance creation includes
• Installing binaries
• Occurs across all hosts – does not require installation separately on each one
• All components are installed and configured as part of the single install process
• Cluster domain created
• sqllib_shared file system created and mounted on all hosts
When installing DB2 pureScale it will do the installation and setup of the database
member(s), CF(s), and DB2 Cluster Services (which includes TSA, RSCT, and GPFS).
You can install via the db2setup tool (interactive setup wizard GUI or response file) or the
db2_install command. Note that the db2_install command has been deprecated and so it
might be removed in a future release. It is suggested to use db2setup with a response file as
an alternative.
If you want to setup your instance with more than one CF and member to start with then the
GUI is a good method to use. It is also very easy to use.
Install processing involves installing the binaries, which are copied to all hosts in the cluster,
creating the cluster domain, and creating the shared file system on which the instance shared
files go (assuming you're performing an install that is also creating an instance at the same
time).
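For instance, a response file deployment might be launched as follows (the paths and response file name are hypothetical):
./db2setup -r /tmp/purescale.rsp -l /tmp/db2setup.log
The response file would carry the host names, netnames, and member/CF roles gathered during planning.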
19
#IDUG
Check Prerequisites using db2prereqcheck
• Checks whether your system meets prerequisites for the installation of a specific
version of DB2 before installing it
• Including the pureScale prerequisites
• Includes OS level, Linux distribution, AIX technology level, C library, and uDAPL
• Prerequisites contained within an XML file
• Contains prerequisites for DB2 9.8, 10.1, and 10.5
• XML file located in <installPath>/cfg/DB2prereqs.xml
• Examples
db2prereqcheck -i -p
Checks pureScale requirements for the latest version of DB2 described in the XML file
db2prereqcheck -p -v 10.5.0.0
Checks pureScale requirements for DB2 10.5.0.0
db2prereqcheck -u -v 10.5.0.0
Checks only the pureScale uDAPL requirements for DB2 10.5.0.0
19
The db2prereqcheck command checks whether your system meets the prerequisites for the
installation of a specific version of DB2 for LUW, including the prerequisites for the
pureScale feature. By using this command, you can determine whether your system satisfies
the prerequisites before you start the installation process. The prerequisites checked include
the operating system level, Linux distribution, AIX Technology Level, C library and
runtime patches, uDAPL, and other DB2 pureScale specific requirements. Note that it isn't
exhaustive. Some things may still need to be checked manually.
The db2prereqcheck command uses a resource XML file that contains the prerequisites,
covering DB2 9.8 FP2 up to the latest fix pack of DB2 10.5. The file will be updated with
each version and fix pack. The default path of the XML file is located in <DB2 installation
path>/cfg/DB2prereqs.xml. You should never modify the contents of the XML file.
You use the –p option to check the prerequisites for pureScale. You can specify the –i option
to specify that you want to check the prerequisites for the latest DB2 version that is defined
in the resource XML file. If you want to check for a specific version then you can specify
that via the –v option.
20
#IDUG
Pre-Installation Cheat Sheets
AIX: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0056077.html
Linux: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0057204.html
20
Within the pre-installation checklist sections of the Information Center (links shown here)
there are a couple of cheat sheets that can be used to help identify and record installation-
related information prior to the installation process.
21
#IDUG
Converting an Existing DB2 Instance to pureScale
• Update the instance to a DB2 pureScale instance using one of
db2iupdt (command line) or db2isetup (GUI)
• e.g. db2iupdt -d -m coralpib154 -mnet coralpib154-ib0
-cf coralpib153 -cfnet coralpib153-ib0
-instance_shared_dev /dev/hdisk2
-tbdev /dev/hdisk4 demoin1
> cat sqllib/db2nodes.cfg
0 coralpib154.torolab.ibm.com 0 coralpib154-ib0 - MEMBER
128 coralpib153.torolab.ibm.com 0 coralpib153-ib0 - CF
• If using db2iupdt, instance will start with one CF and
one member
• Add additional members and a second CF using db2iupdt -add
21
To convert the instance to a DB2 pureScale instance, you can use either the db2iupdt
command or db2isetup (GUI). You must run them as root.
The example on this slide shows how you could use db2iupdt. Note that you can specify
either “-instance_shared_dev” or “-instance_shared_dir”. If you have not yet created a GPFS
file system for the instance shared file system (sqllib_shared) then specify the first option
and a disk device. If you have already created a GPFS file system for it using
db2cluster_prepare then use the second option along with the path name.
At this point, after having used db2iupdt, the instance will have one CF and one member.
Typically you will want two CFs and at least two members, so add members and a CF as
necessary.
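For example, growing the instance created above might look like the following (the additional host names and netnames are hypothetical and follow the same pattern as the slide's example); these commands are also run as root:
db2iupdt -add -m coralpib155 -mnet coralpib155-ib0 demoin1
db2iupdt -add -cf coralpib156 -cfnet coralpib156-ib0 demoin1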
22
#IDUG
Moving Existing Databases to pureScale
• DB2 pureScale database prerequisites
• All table spaces must be managed by Automatic Storage
• Database and logs must reside on GPFS
• Can set up file systems in advance and move the database to them
• If pureScale not yet installed, use db2cluster_prepare command to create DB2 managed
GPFS file systems
• Run db2checkSD to verify that a database is ready to be moved
to pureScale
• Verifies that unsupported features are not used, all table spaces are automatic
storage, etc.
• For example: db2checkSD SAMPLE -l db2checkSD.log
• Able to backup a non-pureScale database and restore into pureScale
• Source and target version of DB2 must both be at 10.5
pureScale
22
In pureScale, the database and logs must reside on a GPFS file system (so that it can be
shared across multiple hosts). The database needs to be moved to GPFS before performing
the upgrade – and can be done at any point in time prior to this. The db2cluster_prepare tool
can be used to create a DB2 managed GPFS file system in advance of installing pureScale.
Create as many file systems as is necessary (for database storage paths and the log files).
Note that the GPFS licensing that comes included with pureScale only allows its use with
pureScale. You are not licensed to create GPFS file systems for non-pureScale purposes.
You can run the db2checkSD command against a database before moving it to pureScale to
determine whether you must take actions before using the database in a DB2 pureScale
environment. The db2checkSD command generates a script file containing information
about any issues found.
In DB2 10.5 you can backup and restore between non-pureScale and pureScale instances (at
the same version level – you cannot restore a DB2 10.1 non-pureScale backup image into a
DB2 10.5 pureScale instance).
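A hedged sketch of that preparation flow (the device name is hypothetical, and SAMPLE is just the example database from the slide):
db2cluster_prepare -instance_shared_dev /dev/hdisk2
db2checkSD SAMPLE -l db2checkSD.log
The first command creates a DB2-managed GPFS file system ahead of the pureScale install; the second reports anything in the database that would block the move.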
23
#IDUG
Migrating Databases to Automatic Storage
• Enable database for automatic storage by creating a storage group
• Existing table spaces are not impacted at this point
• Drop temporary table spaces and recreate as AS
• Convert DMS table spaces to AS using one of
• ALTER TABLESPACE
• Redirected restore of table spaces
• Note that SMS cannot be converted to AS
• Fundamentally different architectures
• Must rebuild the table space (e.g. ADMIN_MOVE_TABLE)
• If SYSCATSPACE is SMS then two options exist
• Rebuild the database (could use db2look and db2move)
• Might be able to use Transportable Schema feature to move all existing table spaces into
a new automatic storage database (with AS SYSCATSPACE)
23
DB2 pureScale requires that all databases are using automatic storage (AS) exclusively. That means that every table space
in a pureScale database must be using automatic storage.
This slide is for those who plan on moving a database to pureScale but the database isn't using automatic storage yet. If the
database was created in DB2 9.1 or any release since then, automatic storage is on by default – even if no table spaces are
using it. For those databases that were created earlier than that or where automatic storage was specifically not enabled at
database creation time, it is very simple to enable it. Starting with DB2 10.1, this can be done by creating a storage group
for the database (which becomes the default storage group for the database).
For all temporary table spaces, you must drop them and recreate them as automatic storage.
DMS table spaces can be converted to automatic storage very easily – and in an online way. The ALTER TABLESPACE
statement is used to do the conversion and to subsequently rebalance the data from the old non-AS containers to the new
AS ones. Alternatively, if a restore is being performed for a database where automatic storage is enabled, you can do a
redirected restore of the non-AS table spaces, converting them to AS in the process.
SMS table spaces cannot be converted to automatic storage as they have fundamentally different architectures under the
covers. For these table spaces you must rebuild them, potentially using something like online table move
(ADMIN_MOVE_TABLE).
However, if your system catalog table space (SYSCATSPACE) is SMS then you can't just rebuild it. In this case you must
recreate the database, potentially using tools like db2look and db2move to help make it easier. Another alternative – if none
of the documented restrictions come into play – is to create a new empty database, which will be enabled for automatic
storage, and then use DB2's transportable schema feature. With this feature, table spaces can be moved from the existing
database (via a backup image) into the new database. See the Information Center for details on this feature and things that
might restrict using this as an option here.
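A minimal sequence for the DMS conversion path described above, with hypothetical storage group, path, and table space names:
CREATE STOGROUP sg_default ON '/db2fs/path1', '/db2fs/path2';
ALTER TABLESPACE ts_dms_data MANAGED BY AUTOMATIC STORAGE;
ALTER TABLESPACE ts_dms_data REBALANCE;
The REBALANCE moves the data from the old DMS containers onto the automatic storage paths online, in the background.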
24
#IDUG
Configuration
24
<no speaker notes>
25
#IDUG
Database Member Process Model
• DB2 engine address space
• i.e. a db2sysc process and its threads
• Watched by the watchdog (db2wdog) process
• Similar to single partition DB2 server
• Contains local buffer pools, lock list, database heap,
log buffer, sort heap, etc.
• Various EDUs such as agents, page cleaners, prefetchers,
log writer, etc.
• pureScale-specific EDUs exist, including some
related to lock management, CF interactions, etc.
• Use db2pd –edus to see list of EDUs running on
a member
• No concept of a “catalog node/partition”
• All data (including system catalog tables) accessible
by all members
[Diagram: a member's db2sysc address space containing db2 agents and other threads, buffer pool(s), the log buffer, dbheap, and other heaps, within a cluster of members (each with Cluster Services) plus primary and secondary CFs]
25
A database member in pureScale is very much like a single instance in a single partition
DB2 server in that there is a db2sysc process and within it various threads are running (e.g.
agents, page cleaners). Within the memory space of the process are the typical memory
heaps and structures that you would find in DB2 including buffer pools, the lock list,
database heap, etc.
There are also various new EDUs (Engine Dispatchable Units – or threads) that have been
introduced to support a pureScale environment. Most notably are those associated with lock
management and CF interactions. It isn't necessary to get into all of the details around these
new EDUs here, but for more information you can look at the “DB2 Process Model” section
in the Information Center.
The “db2pd –edus” command can be used to display all of the EDUs associated with the
db2sysc process for a member.
In a DPF environment there is the concept of a catalog node/partition. However, this does
not exist in pureScale as each member has equal access to the shared data on disk.
26
26
#IDUG
Configuring Members
• Database manager and database configuration parameters are
categorized as either having a global or local (per-member) scope
• Global parameters
• Same value must be used across all members
• Used for parameters for which per-member values would have little value
and/or may create functional issues
• e.g. CF_DB_MEM_SZ, ALT_COLLATE, PAGE_AGE_TRGT_GCR
UPDATE DATABASE MANAGER CONFIGURATION
USING <parameter name> <parameter value>
UPDATE DATABASE CONFIGURATION FOR <database name>
USING <parameter name> <parameter value>
In pureScale there are some database configuration parameters that have a global scope and
others that have a local per-member scope. Even for per-member parameters, if a specific
member number isn’t provided as part of the UPDATE DATABASE CONFIGURATION or
UPDATE DATABASE MANAGER CONFIGURATION command then it is applied
globally (i.e. to each member).
Typically, those parameters that are global are those for which per-member values would
have little value and/or might create a functional issue.
Global database configuration parameters are stored in the global database configuration file
(which is located in the global database path).
27
27
#IDUG
Configuring Members (cont.)
• Local (per-member) parameters
• Allows for per-member settings where different resources per member, or
affinitized workload tuning may be useful
• e.g. instance memory, database heap size
• Parameter updates are applied to all members by default (second example below)
• Kept consistent unless otherwise overridden
• Use the MEMBER option to override a value for a specific member:
UPDATE DATABASE CONFIGURATION FOR <database name>
MEMBER <member num> USING <parameter name> <parameter value>
• Can still apply a change to all members at the same time:
UPDATE DATABASE CONFIGURATION FOR <database name>
USING <parameter name> <parameter value>
Local database configuration parameters can be set on a per-member basis (i.e. they can
have different values across the different members). This allows for member-specific
settings where affinitized workload tuning might be useful. For example, setting the size of
the sort heap or the package cache. This is done by specifying the MEMBER option and a
member number on the UPDATE DATABASE CONFIGURATION command.
By default, if the MEMBER option is not specified as part of the UPDATE DATABASE
CONFIGURATION command then all members will be updated. This provides single-system
view database configuration management.
Per-member database configuration parameters are stored in the local database directories
and there is one configuration file per member.
Note that the GET DATABASE CONFIGURATION command returns information only for
the member on which it is executed.
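For example, with a hypothetical database PRODDB, the sort heap could be set everywhere and then overridden on a single member:
db2 update db cfg for proddb using SORTHEAP 20000
db2 update db cfg for proddb member 2 using SORTHEAP 40000
The first command applies the value on every member; the second overrides it on member 2 only.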
28
#IDUG
Cluster Caching Facility (CF)
• Software technology that assists in global buffer
coherency management and global locking
• Shared lineage with System z
Parallel Sysplex
• Software based
• Services provided include
• Group Bufferpool (GBP)
• Global Lock Management (GLM)
• Shared Communication Area (SCA)
• Members duplex GBP, GLM, SCA
state to both a primary and secondary
• Done synchronously
• Set up automatically, by default
• Having a secondary CF is optional (but recommended)
[Diagram: primary and secondary CFs serving the cluster's members; within the Cluster Caching Facility (CF) are the GBP, GLM, SCA, and CF worker threads]
28
This slide describes what a CF (Cluster Caching Facility) is in a DB2 pureScale cluster. Its
primary responsibilities are to manage locking across the cluster (via the GLM) and to
manage data page access across the members in the cluster (via the GBP). But it has various
other roles and responsibilities as well.
It is recommended (especially for production systems) to have two CFs defined (a primary
and a secondary) so that there is no single point of failure in case a planned or unplanned
outage occurs.
29
29
#IDUG
Configuring the CF
• CF configuration is managed through traditional DB2 configuration interfaces
• GET/UPDATE DB/DBM CFG commands and APIs
• Parameter values applied to both primary and secondary CFs
• CFs should be hosted on equally configured hardware/LPARs, so no need for different values
• Two configuration categories
• CF server configuration (maintained through DBM configuration)
• CF structure configuration (maintained through DB configuration)
• CF Server Configuration
• CF_DIAGLEVEL: Specifies types of diagnostic errors that will be recorded in the cfdiag.*.log file
Default: 2 (all errors)
• CF_DIAGPATH: Fully qualified path for the cfdiag.*.log files
Default: NULL (<INSTHOME>/sqllib/db2dump/ $m)
• CF_NUM_WORKERS: Number of worker threads started by the CF
Default: AUTOMATIC (# cores - 1)
• CF_MEM_SZ: Controls total amount of memory used by the CF
Default: AUTOMATIC (typically 70-90% of machine memory)
• CF_NUM_CONNS: Initial size of CF connection pool for each member
Default: AUTOMATIC (based on # members, workers, etc.)
The CFs are configured using the traditional DB2 configuration interfaces. There are two configuration
categories: CF server and CF structure. The CF server configuration can be updated using UPDATE DBM
CFG and the CF structure configuration can be updated using UPDATE DB CFG.
The CF server configuration parameters are listed on this slide.
The default value for the CF_DIAGPATH in DB2 9.8 was NULL, which meant to use the value of
DIAGPATH. And by default that was "<INSTHOME>/sqllib/db2dump". In DB2 10.1, the default for both
DIAGPATH and CF_DIAGPATH have changed to "<INSTHOME>/sqllib/db2dump/ $m" (which evaluates to
"<INSTHOME>/sqllib/db2dump/DIAG#####" where "####" is the member/CF number). Starting in DB2
10.1, the CF diagnostic data directory path writes to a private db2diag.log for each CF by default. To revert to
the behavior of previous releases, in which the diagnostic data for the CF is written to the same directory,
specify CF_DIAGPATH with a pathname and no token.
The default CF_NUM_WORKERS value (if AUTOMATIC) is the number of logical CPUs (cores) – 1. Note
that on Power, each hardware thread is seen to the OS as a CPU core. For very small CF configurations,
recovery time performance can be helped by having 2 free hardware threads on the CF (i.e.
CF_NUM_WORKERS = (logical CPUs – 2)).
The default for CF_MEM_SZ value (if AUTOMATIC) is 70%-90% of the total available memory on the CF
(depends on whether CF and members co-exist).
When you set CF_NUM_CONNS to AUTOMATIC (the default), DB2 creates an initial number of CF
connections for each member with each CF at start time. This initial number is based on the number of worker
threads, number of connections per worker thread, and the number of members in the cluster.
30
30
#IDUG
Configuring the CF (cont.)
• CF Structure Configuration
• CF_DB_MEM_SZ: Total amount of memory used by the CF (includes GBP+GLM+SCA)
Default: AUTOMATIC (based on CF_MEM_SZ and # active databases settings)
• CF_GBP_SZ: Amount of memory used by the Group Buffer Pool (GBP) in the CF
Default: AUTOMATIC (remainder of memory from CF_DB_MEM_SZ)
• CF_LOCK_SZ: Amount of memory used by the Global Lock Manager (GLM) in the CF
Default: AUTOMATIC (15% of CF_DB_MEM_SZ)
• CF_SCA_SZ: Amount of memory used by the Shared Communication Area (SCA) in the CF
Default: AUTOMATIC (5-20% of CF_DB_MEM_SZ)
• CF_CATCHUP_TRGT: Target time in minutes for catch up to bring a newly restarted CF into peer state
Default: 15
The CF structure configuration parameters are listed on this slide. These parameters specify
the size of the various structures that get created in the CF when a database is activated.
The cluster caching facility structure memory used for Group Buffer Pool (GBP), lock usage
(GLM), and Shared Communication Area (SCA) is allocated for the cluster caching facility
during the first database activation on any member and remains allocated until deactivation
on the last member. These parameters have a default value set to AUTOMATIC. When set
to AUTOMATIC, DB2 computes appropriate sizes for these parameters during database
activation. Because these values are closely related and dependent on one another, manually
setting at least one of the parameters causes none of the parameters to be calculated during
activation even if some parameters remain set to AUTOMATIC. Their values are what the
most recent automatically calculated values were.
The ONLINE option is also supported for structure parameters. Any updates to CF memory
parameters are applied immediately. Update requests are synchronous and are not returned
until the new value is set by the CF server.
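As an illustration of an online change to a structure parameter (PRODDB is hypothetical and the value, in 4 KB pages, is arbitrary; remember that manually sizing one structure parameter stops the others from being automatically recalculated):
db2 update db cfg for proddb using CF_GBP_SZ 4194304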
31
#IDUG
Castout
• The process of writing dirty pages from the GBP
out to disk is called castout
• Similar in concept to page cleaning
• Two purposes
• Maintain a specific recovery window by ensuring that
no pages in the GBP are older than a certain age
• To keep GBP from getting full, allowing free space for
new pages being stored there
• Page writes not performed by the CFs directly
• Members sent pages to castout from CF via RDMA
(into private buffers on the members)
• Specialized page cleaner threads running on the
members write the pages out to disk
• Configuring castout:
• NUM_IOCLEANERS: Number of castout page
cleaner threads per member (default is AUTOMATIC)
• PAGE_AGE_TRGT_GCR: Age of pages in GBP
before castout to disk (default 240 seconds)
[Diagram: the CF's GBP sends pages via RDMA into local castout buffers on each member, where castout page cleaner threads write them out to disk]
31
Castout is the process of writing dirty pages from the GBP out to disk. If you're familiar with page cleaning in
non-pureScale DB2 where the page cleaners write out dirty pages from the buffer pool(s) to disk then it's easy
to understand what castout is. However, rather than cleaning the pages out to disk from the local buffer pools
(which is still done via page cleaning in pureScale), castout is the writing of the pages from the GBP to disk.
Castout is important for two reasons: 1) To write out dirty pages to disk and to ensure there are enough clean
directory entries and data elements in the GBP to use for new page registrations and writes. 2) To maintain a
specific recovery window (for Group Restart) by ensuring that no pages in the GBP are older than a certain
age. This reduces the number of log records that must be replayed in the case of a Group Crash Recovery
during Group Restart processing.
The page writes are not actually performed by the CFs directly. Instead, pages are passed from the CF to the
members via RDMA and it is the members that do the write I/O. The local buffer pools are not used during this
process. Instead, dedicated memory buffers within the members are used. Special page cleaner threads called
"castout engines" are used to do the I/O.
The number of threads used in castout per member is determined by the NUM_IOCLEANERS database
configuration parameter. When this is set to a value (or DB2 derives a value for AUTOMATIC) then there are
this many regular page cleaners and this many castout page cleaners created at database activation time.
PAGE_AGE_TRGT_GCR will be covered in more detail later.
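A hedged example of adjusting the castout-related parameters for a hypothetical database PRODDB (a smaller PAGE_AGE_TRGT_GCR shortens the group crash recovery window at the cost of more castout I/O):
db2 update db cfg for proddb using NUM_IOCLEANERS AUTOMATIC PAGE_AGE_TRGT_GCR 120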
32
#IDUG
Backup, Logging and Recovery
32
<no speaker notes>
33
#IDUG
pureScale Recovery Basics
• Data is shared, but each member maintains its own set of logs
• Commonly referred to as a log stream
• Logs must be on the clustered file system
• Members only write to their own log stream, but can read from others
• For example: during merged log recovery
• Failures may require member
recovery or group recovery
for database
• Single system view backup,
restore, and rollforward
DB2 DB2 DB2
Shared
Data
Logs Logs Logs
Clustered
File System
33
In a pureScale database there is a single partition of data that each member has access to and
can modify. However, each member has its own dedicated set of log files, commonly
referred to as a log stream. Like the data in the database, the logs must reside on a GPFS
clustered file system – and it’s recommended that the logs be on their own file system,
separate from the data. Each member will only ever write to its own set of log files but it
may need to read the logs from other members. For instance, during a database rollforward
or group crash recovery where log merging takes place, the log files from all members are
read by a single member to perform the operation (log merging will be described in more
detail later on in this presentation).
Different types of failures can occur while a pureScale database is up and running (e.g.
power outage to the cluster, hardware failure on one machine). Recovery from failures may
involve just recovering and replaying through the logs of one member or of all the members
in the cluster. This too will be discussed in more detail later on.
Unlike in DPF where commands like BACKUP and RESTORE are performed on individual
nodes (database partitions), in pureScale you just execute these commands from any
member and work is performed against the entire database.
34
#IDUG
Backup and Restore
• No differences in command syntax or usage in pureScale
• Continue to utilize autonomic settings for buffers and parallelism
• Can continue to take advantage of compression, including logs, etc.
• Single system view backup and restore operations are executed on and performed
by a single member
• Unlike DPF where backup needs to be done on each database partition (remember,
there’s only one data partition in pureScale)
• BACKUP command can be executed from any member
• Can be as simple as: db2 backup database proddb
• RESTORE of an image can be done on any member
• Can be as simple as: db2 restore database proddb on …
• Can set UTIL_HEAP_SZ by member if dedicated backup member
• 3rd party backup products require that backups/log archives can be retrieved by
any host in the cluster
• Use proxy nodes with TSM
34
There are no differences in the command syntax of BACKUP and RESTORE or in how they are used between
non-pureScale DB2 and DB2 with pureScale. If you have experience running these commands outside of
pureScale then you’ll know how to use them in pureScale.
If you are familiar with backup and restore in a DPF environment, you will know that you have to backup each
database partition (although we make this easy through the “single system view” feature of backup by allowing
you to backup all partitions using a single command – but a backup image is in fact generated for each of the
partitions). In pureScale, you only have to run the BACKUP command on one member (any of them) and the
resulting image will contain everything for that database. And if you ever have to restore the database, you can
execute the RESTORE command on any of the members as well… it doesn’t have to be the one where the
backup image was generated.
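For instance, with hypothetical paths and timestamp, an online compressed backup and a later restore could be as simple as:
db2 backup db proddb online to /db2backup compress include logs
db2 restore db proddb from /db2backup taken at 20140514080000
Either command can be issued on any member of the cluster.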
35
#IDUG
Backup and Restore (cont.)
• Single backup image gets generated for the database and includes
• Description of members (topology) at time of backup
• Global/cluster metadata files (e.g. recovery history file, global config files)
• Member-specific metadata files for every member (e.g. member LFH files, local
config files)
• Log files from each active member for online backups
• Table space data
• Only one copy/partition of data exists, regardless of the number of members
[Diagram: BACKUP DATABASE TESTDB issued against a two-member cluster produces a single image, TESTDB.0.db2inst.DBPART000.20120922191905.001, covering the shared data, each member's logs and local configuration, and the global configuration]
35
The backup image that gets generated contains everything associated with the database
including the data itself, a description of the topology of the database, global/cluster
metadata files, and per-member metadata files. It doesn’t matter on which member the
database is backed up or restored since we have everything we need within the backup
image.
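If you want to confirm what a particular image contains, the db2ckbkp utility can be pointed at it. A minimal sketch, using the image name shown on the slide (the exact header fields displayed vary by release):
$> db2ckbkp -H TESTDB.0.db2inst.DBPART000.20120922191905.001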
36
#IDUG
Topology-Changing Backup and Restore
• Backup and restore between topologies with differing numbers of members
[Diagram: an online backup image taken on a 4-member instance (two CFs) is restored to a 2-member instance (to a subset of members), and a backup image taken on a 3-member instance is restored to a 4-member instance (to a superset of members)]
36
You can restore a pureScale database backup to a different number of members. Also, you
can restore a non-DB2 pureScale backup image to a DB2 pureScale instance (and vice-
versa). The next slide covers the latter.
All of this applies to snapshot backups as well (keeping in mind that snapshot backups are
typically online, but can be taken offline).
37
#IDUG
Backup and Restore To/From pureScale
• Backup and restore from pureScale to non-pureScale (and vice-versa)
[Diagram: a backup image taken on a 4-member pureScale instance (two CFs) is restored to a non-pureScale instance, and a backup image taken on a non-pureScale instance is restored to a 3-member pureScale instance]
37
You can also restore a non-DB2 pureScale backup image to a DB2 pureScale instance (and
vice-versa). In the case of restoring from non-DB2 pureScale to DB2 pureScale, per the DB2
pureScale prerequisites the database must be using automatic storage for all of the table
spaces (restore will fail otherwise). The target DB2 pureScale storage must be on GPFS but
it does not matter what kind of file system was being used on the original non-pureScale
source system.
The top example on this slide shows a situation where we are moving a database from a
pureScale instance to a non-pureScale instance. The bottom example shows the reverse of
this.
38
#IDUG
Set Write Suspend and Snapshot Backups
• SET WRITE SUSPEND and SET WRITE RESUME
• Suspends and resumes DB2 writes to the database
• Executed on one member, writes are suspended across all members
• Can use snapshot as a backup image, clone, or standby database
• Works in conjunction with db2inidb command
• Additional GPFS and db2cluster steps are required on top of SET WRITE
SUSPEND/RESUME
• See Information Center for full list of steps
• State available through database configuration parameter: SUSPEND_IO
• Values are YES, NO, or IN_PROGRESS
• History file record is not generated for backup
38
The SET WRITE SUSPEND and SET WRITE RESUME commands are used as part of the
process for taking snapshot (a.k.a. split mirror or flash copy) backups of the database. When
the SET WRITE SUSPEND command is issued on a member it distributes the request to all
of the members in the cluster, stopping writes to the database across all of those members.
SET WRITE RESUME works in a similar way to reverse the process, allowing writes to be
done against the database again.
In a pureScale environment there are other steps that must be followed to perform a snapshot
backup and subsequently use that copy of the database for the purposes of a clone, standby
image, or as a backup. These steps are documented in various sections in the Information
Center.
In older versions of DB2, you could tell if writes were suspended by looking at table spaces
states. Now, there is a database configuration parameter called SUSPEND_IO. This is an
informational parameter (i.e. it cannot be explicitly set using UPDATE DB CFG) and will
show one of YES, NO, or IN_PROGRESS.
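A minimal sketch of the manual flow, assuming a hypothetical database PRODDB and leaving out the storage-vendor and GPFS/db2cluster specific steps that the Information Center describes:
$> db2 set write suspend for database
   (take the storage-level snapshot here, along with the required GPFS/db2cluster steps)
$> db2 set write resume for database
On the system where the copy is brought up, db2inidb initializes it for its intended use, for example:
$> db2inidb proddb as snapshot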
39
#IDUG
BACKUP DATABASE PRODDATA USE SNAPSHOT SCRIPT '/scripts/snapshot.sh'
RESTORE DATABASE PRODDATA USE SNAPSHOT SCRIPT '/scripts/snapshot.sh'
TAKEN AT 20140307183200
Snapshot Backup Scripts
• Allows for integrated snapshot backup capabilities
for those storage devices not supported by
DB2 Advanced Copy Services (ACS)
• Custom script implements the DB2 ACS API
• Users or storage vendors can write their own scripts
• Write operations to the database are automatically suspended and resumed by DB2
during the backup process
• Benefits include
• Wider storage support
• Avoids need for manual snapshot backup process in pureScale
• Manually running SET WRITE SUSPEND, SET WRITE RESUME, db2inidb, and storage
vendor commands can be error prone
• History file record is generated
39
If you were performing a snapshot operation in DB2 10.1, you either had to use storage
hardware that provided a vendor library that supported the DB2 ACS API (for non-
pureScale only), or you had to write your own script -- which included having to suspend
and resume writes to the database and call the underlying storage commands to take the
snapshot. There are some drawbacks to writing a script like this. For instance, they can be
difficult and error-prone to write, especially in regards to suspending and resuming database
operations. Also, they do not generate a history file entry, so you cannot monitor the
progress and success of the snapshot operation.
In DB2 10.5, these trade-offs have been eliminated. The DB2 ACS API is now wrapped in
the library for DB2 ACS. The library invokes a custom script to perform the snapshot
operation. DB2 takes over the error-prone actions like issuing the SET WRITE SUSPEND,
SET WRITE RESUME, and db2inidb commands at the correct time. At the same time,
because the DB2 ACS API is being used as part of a true DB2 backup operation, an entry is
made in the recovery history file for every snapshot operation, allowing you to monitor
successful and unsuccessful backups.
We document the API requirements within the Information Center and it is possible for DB2
users to write their own scripts. However, it is suggested that people reach out to their
storage vendor to have them write and provide a script instead.
40
#IDUG
Logging Configuration Parameters
• In pureScale, each member maintains its own set of log files
• Most logging related parameters are global in scope
• The following parameters have member (local) scope
• BLK_LOG_DSK_FUL, LOGBUFSZ, MAX_LOG, NUM_LOG_SPAN
• Log paths are global but each member has its own subdirectory within it
• <logPath>/NODE0000/LOGSTREAM####
• Applies to the log path, mirror log path, and overflow log path
• Default log path is in the global database directory
• <dbPath>/<instance>/NODE0000/SQL#####/LOGSTREAM####
40
In a pureScale cluster, there is a single partition of data but each member maintains its own
set of log files.
Most of the logging-related parameters are global in scope. This means that you cannot set
them to different values on different members. However, there are a few (shown above) that
have a member scope and can be updated to different values across members (UPDATE DB
CFG FOR <dbName> MEMBER <#> …)
As previously mentioned, each member has its own set of log files (i.e. log stream). You
specify a single log path for the cluster but a subdirectory is created under that log path for
each log stream. The same is true for the mirror log path and the overflow log path
configuration parameters.
Prior to pureScale, the default log path was <dbPath>/<instance>/SQL#####/SQLOGDIR.
This was changed in 10.1 to what is shown on the slide (even for non-pureScale instances of
DB2).
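For example (PRODDB and the values are illustrative), a member-scoped parameter can be set differently on one member while a global parameter is set once for the whole cluster:
$> db2 update db cfg for proddb member 1 using LOGBUFSZ 4096
$> db2 update db cfg for proddb using LOGARCHMETH1 DISK:/archivePath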
41
#IDUG
Log File Management
• Log files are archived independently on each member
• Archive targets have member-specific directories/name space
• <ArchPath>/<instance>/<databaseName>/NODE0000/LOGSTREAM####
• Member performing a log merge operation (e.g. rollforward) retrieves logs from all
members as needed
[Diagram: members 0 and 1 each write to their own log stream under the shared log path and archive their log files independently to the archive path]

/logPath/
    NODE0000/
        LOGSTREAM0000/          (member 0)
            S0000000.LOG
            S0000001.LOG
            S0000002.LOG
            ...
        LOGSTREAM0001/          (member 1)
            S0000000.LOG
            S0000001.LOG
            S0000002.LOG
            ...

/archivePath/
    <instance>/
        <databaseName>/
            NODE0000/
                LOGSTREAM0000/
                    C0000000/
                        S0000000.LOG
                        S0000001.LOG
                        S0000002.LOG
                LOGSTREAM0001/
                    C0000000/
                        S0000000.LOG
                        S0000001.LOG
                        S0000002.LOG
41
In a pureScale cluster, there is a single partition of data but each member maintains its own
set of log files. Assuming that log archiving is enabled, as log files are filled on a member,
they are archived independently of what is happening on the other members. If the archive
location is a disk location then there will be a directory per log stream
(LOGSTREAM####). In the case of an archive location like TSM, there is a database
partition number that makes up part of the name space and in pureScale this value represents
the log stream (member) number.
For log merge operations (such as rollforward), when reading archived log files that are
owned by other members, a member might need to retrieve log files into its own log path or
overflow log path. A set of subdirectories is created in the member’s log path for retrieved
log files.
42
#IDUG
Member vs. Group Crash Recovery
• Two types of crash recovery (determined by DB2, based on state of cluster)
• Member crash recovery (MCR)
• When one or more members fail with at least one CF remaining available
• Only requires one log stream (per recovering member) to perform recovery
• All data available during MCR (except in flight data)
• Group crash recovery (GCR)
• Simultaneous failure on both CFs
• Similar to crash recovery without pureScale except that log streams are merged
• Database will open for connections when recovery completes
[Diagram: member crash recovery - one member fails while at least one CF and the other members remain up, so only the failed member's log stream is replayed; group crash recovery - both CFs fail, so the log streams of all members are merged and replayed]
42
This slide summarizes the differences between the two types of crash recovery in DB2
pureScale: member crash recovery (MCR) and group crash recovery (GCR).
MCR occurs when one or more members fail with at least one CF remaining available. It
only requires reading and recovering through one log stream (per member requiring
recovery). The database is online and accessible on other members. All data is available on
those other members except for in-flight data (data that was in the process of being
inserted/updated/deleted when the member went down).
GCR occurs when there are simultaneous failures of both CFs. This is similar to crash
recovery in non-pureScale environments in that the database is offline and inaccessible.
However, in pureScale all of the log streams are read and merged before replay occurs. Once
this is done the database is open for business. Note that for a GCR to occur, it requires a
double failure – both CFs (which should be on different hosts) coming down – and this
should be a very rare occurrence.
43
#IDUG
Tuning Crash Recovery in pureScale
• Force at Commit protocol means that pages are typically being "persisted" to GBP
much more quickly than to disk in non-pureScale
• Member crash recovery typically very fast as a result
• Group crash recovery impacted by rate of castout (page cleaning) from GBP to disk
• SOFTMAX database configuration parameter deprecated in DB2 10.5
• Replaced by PAGE_AGE_TRGT_MCR and PAGE_AGE_TRGT_GCR
• SOFTMAX=0 means use new parameters (set by default for new databases)
• PAGE_AGE_TRGT_MCR
• Target duration (in seconds) for changed pages to be kept in the local buffer pool before
being persisted to disk or to the group buffer pool (GBP)
• PAGE_AGE_TRGT_GCR
• Target duration (in seconds) for changed pages to be kept in the GBP before being
persisted (castout) to disk
43
Member crash recovery requires reading through the log files for the member that failed, redoing log records in the member's recovery
window to ensure that committed changes persisted, and undoing log records to rollback transactions that were still running at the time of
failure. Member crash recovery is typically very fast. One of the reasons for this is pureScale's "force at commit" protocol. When
transactions are executing on a member, pages are being read into the local buffer pool and modifications are made locally. When a
transaction commits, all of the pages that were modified by the transaction get sent to the GBP (on both the primary and secondary CF).
This means that during redo processing of member crash recovery, for all transactions that have committed we are going to find that the
updates have already been made and there is no need to actually redo the work. Plus, we are quite likely going to find the pages we need to
look at in the GBP and the member can get them very quickly via RDMA – which is much faster than reading pages from disk. For both of
these reasons, member crash recovery is typically very fast and there's not a lot of configuration needed to control it (for longer
transactions, PAGE_AGE_TRGT_MCR comes into play more – see below).
Externally, a group crash recovery is like crash recovery in non-pureScale in that it redoes all committed transaction updates that have not
yet been written to disk (remember, if we're doing a GCR then we've lost the contents of the GBP and that’s why GCR is needed) and it
undoes all of the work associated with in-flight (uncommitted) transactions. Internally, though, there are a few differences. For one, the logs
from all of the members’ log streams are merged into a single logical log stream that is replayed during the redo phase (this merging takes
place in memory and does not get written to disk). And at run-time, while individual members maintain their own MinBuffLSN values,
there is a cluster-wide global MinBuffLSN that is also maintained. Also, there is a concept of a cluster-wide current LSN, which is
essentially the highest LSN that has been consumed across the cluster. These values can then be used to determine the range of log records
in the merged log stream that corresponds to pages that haven’t been written out to disk yet (i.e. dirty pages that are currently sitting in the
GBP or in local buffer pools). In pureScale, it is this range within the merged log stream that gets compared to SOFTMAX (deprecated) /
PAGE_AGE_TRGT_GCR to determine when to persist the changes to disk.
When the global MinBuffLSN falls outside of that SOFTMAX / PAGE_AGE_TRGT_GCR range, the old pages in the GBP need to be
written out to disk. This is performed by castout threads (also known as castout page cleaners or castout engines) running on the members.
They read the old pages from the CF and write them out to disk.
PAGE_AGE_TRGT_MCR configures the target duration (in seconds) for changed pages to be kept in a local buffer pool before being
persisted to disk or to the group buffer pool (GBP) (via page cleaning). PAGE_AGE_TRGT_MCR applies to non-pureScale DB2 as well.
In pureScale, the default value for PAGE_AGE_TRGT_MCR is 120 seconds.
PAGE_AGE_TRGT_GCR configures the target duration (in seconds) for changed pages to be kept in the GBP before being persisted
(castout) to disk. PAGE_AGE_TRGT_GCR is applicable to pureScale only. The default value for PAGE_AGE_TRGT_GCR is 240
seconds.
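As a sketch only (PRODDB is a placeholder and the values shown are simply the documented defaults, not a tuning recommendation), the new parameters are set with a normal UPDATE DB CFG:
$> db2 update db cfg for proddb using PAGE_AGE_TRGT_MCR 120 PAGE_AGE_TRGT_GCR 240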
44
#IDUG
Rollforward
• ROLLFORWARD DATABASE command can be executed on any member
• All processing performed by that single member
• An interrupted or stopped rollforward can be started again from any member
• Applies to both database rollforward and table space rollforward
• Log files retrieved from archive location and
merged for replay purposes
[Diagram: ROLLFORWARD DATABASE issued on one member; log files from log stream 0 and log stream 1 are retrieved from the archive location as needed, merged, and redone against the shared database]
44
Database and table space-level rollforward are both supported.
When issuing a ROLLFORWARD DATABASE command it can be done from any
member. All processing will be performed by that single member and if it happens to get
interrupted or stopped then it can be started again from any other member (or the same one,
it doesn’t matter).
Logs are merged for replay purposes to ensure that all of the work done across the cluster is
replayed and in the correct order. Log files will be retrieved from the archive location if they
are not local.
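For illustration (the database name and timestamp are placeholders), both an end-of-logs and a point-in-time rollforward can be issued from whichever member you happen to be on:
$> db2 rollforward database proddb to end of logs and complete
$> db2 rollforward database proddb to 2014-03-12-14.30.00.000000 using utc time and stop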
45
#IDUG
Rollforward (cont.)
• Rollforward to point-in-time, end-of-logs, or end-of-backup
• Point-in-time operation stops when it encounters the first log record from any
of the log streams whose timestamp is greater than the specified time stamp
• Important that member clocks are synchronized as closely as possible
• For table space rollforward, point-in-time must be greater than or equal to the
minimum recovery time (MRT) for the table space
• Rollforward status shows logs replayed per member (log stream)
Input database alias = TESTDB
Number of members have returned status = 3
Member Rollforward Next log Log files processed Last committed transaction
ID status to be read
------ ----------- ------------ ------------------------- --------------------------
0 DB working S0000014.LOG S0000005.LOG-S0000013.LOG 2014-03-12-14.39.23.000000 UTC
1 DB working S0000004.LOG S0000002.LOG-S0000003.LOG 2014-03-12-14.39.18.000000 UTC
2 DB working S0000011.LOG S0000001.LOG-S0000010.LOG 2014-03-12-14.39.25.000000 UTC
45
You can rollforward a database to a point-in-time, to end-of-logs, or to end-of-backup. A
point-in-time operation will stop when it encounters the first log record from any of the log
streams whose timestamp is greater than the specified timestamp. The caveat here is that the
member clocks should be synchronized as closely as possible across the cluster. If one of the
members is significantly ahead of the rest, it is possible that the rollforward will not replay
as far forward as you would like. The use of NTP (Network Time Protocol) with
pureScale ensures that the machine clocks are kept in very close synch across the cluster.
When a rollforward completes, or when querying the rollforward status of a database, you
will see the status from each member log stream. This includes the log files processed and
the last committed transaction encountered. This is similar to what is shown in a DPF
environment when rolling forward multiple database partitions.
46
#IDUG
Storage Management
46
<no speaker notes>
47
#IDUG
IBM General Parallel File System (GPFS) and pureScale
• GPFS is a scalable, highly-available, high performance file
system optimized for multi-petabyte storage management
• Shipped with, installed and configured as part of pureScale
• DB2 pureScale license includes GPFS for use with cluster
• GPFS provides concurrent access from all hosts in the instance to instance
meta data and database files on disk
• Provision shared LUNs from storage administrator
• System administrator changes ownership of LUNs to DB2 instance owner
• Instance owner can create GPFS file systems with the storage LUNs
• Best practice:
• One (or more) GPFS file system for automatic storage and table spaces
• One GPFS file system for database metadata (database path) and log files
• db2cluster -cfs command used for typical GPFS
management activities
47
The IBM General Parallel File System – or GPFS – is a scalable, highly-available, high performance file system optimized for multi-petabyte storage
management. It has historically been used extensively in high performance computing (HPC) environments. As previously mentioned, it is shipped with,
installed, configured, and updated as part of pureScale.
GPFS provides concurrent access from all of the hosts in the instance to the instance’s meta data files (sqllib_shared) and to the database’s files (storage
paths, table space containers, and logs).
Physical disks within a disk subsystem are not usually directly accessible by host systems, such as DB2 database servers, and they are not directly visible to
DBAs. Storage administrators provision units of storage as logical unit numbers (LUNs), which appear to host systems as SCSI disks. A LUN, however, is a
completely virtual entity that can map to any combination of physical disks. An individual LUN can be a single RAID array, a portion of a RAID array, a
single physical disk, a portion of a disk, or a meta-array spanning multiple RAID arrays.
In pureScale, these LUNs must be made visible to all of the hosts in the cluster so that the GPFS file systems that are created on them can be visible to all
hosts.
To start with, these shared LUNs must be created. You will need to ask the storage administrator to do something called LUN masking or LUN zoning and
make these LUNs visible to all of the hosts in the instance (unlike in ESE where only one host uses a particular LUN).
The file systems are then created on these LUNs and they are mounted so that they are equally visible on all of the hosts.
GPFS has its own volume manager and so it does not use the logical volume manager on AIX. Therefore, the system administrator does not need to create
logical volumes. Instead, GPFS consumes these LUNs directly.
However, the system administrator does have to change the ownership of the LUNs/devices (e.g. /dev/hdisk1) to the instance owner ID. Once that is done, the
DBA can use the db2cluster command to create the file systems, mount them, and so on.
Some enterprises might not actually be comfortable doing this. In their opinion, anything that smells like a file system is something that needs to be handled
by a system administrator. That is fine. The system administrator doesn’t have to change the permissions on the LUNs to create the file system. He will just
have to create it as root or a super user ID using either GPFS commands or the db2cluster command. Finally, he can then change the permissions on the file
system to that of the instance owner so that it can be used by DB2.
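As a rough sketch (the file system name, device, and mount point below are illustrative), once the instance owner has been given ownership of the shared LUN, a file system can be created and mounted with db2cluster:
$> db2cluster -cfs -create -filesystem db2data -disk /dev/hdisk5 -mount /db2fs/db2data
$> db2cluster -cfs -mount -filesystem db2data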
48
#IDUG
Instance Directory Structure
Member/local sqllib directories (one per host):
    …
    ../sqllib/bin/
    ../sqllib/bnd/
    ../sqllib/lib64/
    …
    ../sqllib/ctrl/
    ../sqllib/db2dump/
    ../sqllib/db2nodes.cfg
    ../sqllib/db2systm
    ../sqllib/function/
    …
    ../sqllib/adm/
    …

Shared directory on the cluster file system: <shared file system>/<instance>/
    …
    ../sqllib_shared/ctrl/
    ../sqllib_shared/db2dump/
    ../sqllib_shared/db2nodes.cfg
    ../sqllib_shared/db2systm
    ../sqllib_shared/function/
    …

Local install directory on each host (e.g. /opt/IBM/db2/V10.5):
    …
    ../bin/
    ../bnd/
    ../lib64/
    …
48
This is the instance directory structure (a.k.a. the “sqllib” directory structure). This is just
showing some of the main subdirectories and files that are found in there. If you do an “ls -l”
command in this directory you will see a lot more.
First off, you will notice that it looks an awful lot like the directory structure in non-
pureScale instances of DB2. In those cases, you have some directories and files that are local
within the sqllib directory and some that point back to the installation directory. In
pureScale, each member has its own sqllib directory and there is a combination of local
directories and files, links that point back to the local installation directory (remember that
the pureScale binaries get installed locally on each host as part of the installation process),
and links that point to the sqllib_shared directory which gets created on the cluster shared
file system (these are common across all hosts in the cluster).
Having a structure like this provides many benefits. Common information in the
sqllib_shared directory provides for single-system view instance management but having
local binaries and libraries allows for things like rolling fix pack updates.
49
#IDUG
Database Directory Layout
• Database directory resides in <dbpath>/<instance_name>/NODE0000
Partition-Global Directory: <dbpath>/<instance_name>/NODE0000/
    sqldbdir/
    SQL00001/                  <-- database token
        db2event/              <-- global event monitors
        db2rhist.asc/bak       <-- history files
        LOGSTREAM####/         <-- log directory (one per member)
        SQLOGCTL.GLFH.1,2      <-- global LFH files
        SQLDBCONF              <-- global configuration file
        SQLSGF.1,2             <-- storage group control files
        SQLSPCS.1,2            <-- table space control files
        MEMBER0000/
        MEMBER0001/
        MEMBER0002/
        …
    SQL00002/
    …
    SQL0000X/
    …
    DBNAME1/
        T#######/              <-- AS table space containers for DBNAME1
    DBNAME2/
        T#######/              <-- AS table space containers for DBNAME2
    …

Member-Local Directory (one per member):
    MEMBER####/                <-- Member-local directory
        db2event/              <-- local event monitors
        SQLDBCONF              <-- local DB config
        SQLBP.1,2              <-- local BP config
        SQLOGCTL.LFH.1,2       <-- local LFH files
        SQLOGMIR.LFH           <-- mirror LFH file
        SQLINSLK,SQLTMPLK      <-- lock files
49
The layout of the database directory structure was changed to accommodate multi-member
pureScale databases and this applies to non-pureScale DB2 as well.
There is a partition-global directory that contains metadata files that are global to the
database and this is found at <dbpath>/<instance_name>/NODE0000/SQL##### (remember
that there is just a single database partition in a pureScale database and it is called partition 0
– hence the files are found in NODE0000). There is an SQL##### directory for each
database in the instance (where the number is assigned when the database is created).
There are also member-specific metadata files. These are found within the MEMBER####
directories under <dbpath>/<instance_name>/NODE0000/SQL#####. Each member has a
corresponding MEMBER#### directory (e.g. member 0 is in MEMBER0000). The instance
database lock files, SQLINSLK and SQLTMPLK, help to ensure that a database is used by
only one instance of the database manager on each member.
50
#IDUG
Storage Management
• Storage management in pureScale is mainly performed through GPFS
• E.g. Add storage to database by adding disks to existing file system(s)
• db2cluster -cfs is used to perform cluster file system management operations
• Supported options
• create: Create a shared file system
• add: Add disks to existing shared file system
• remove: Remove disks from existing shared file system
• delete: Delete shared file system
• set: Set various configuration options
• list: Returns details about configuration and tie-breaker disk
• verify: Verifies configuration of file system cluster
• mount/unmount: Mounts/unmounts a file system
• rebalance: Restripes the data on disk across all disks in the file system
• enter/exit -maintenance: Puts host into/takes host out of
maintenance mode
50
In pureScale, storage management is mainly performed through GPFS. For instance, you
would typically create a GPFS file system (or more than one) to place a database on. If you
require additional space to be added to the database then you wouldn't create new file
systems and add them. Instead, you would add new storage into the existing file system(s).
But rather than making you learn GPFS commands, you can use DB2's db2cluster command
with the -cfs option (there is also a -cm option used to manage the cluster manager).
The options of the db2cluster command that you can use depend on your authorization level.
Some options can only be specified by the DB2 cluster services administrator, others can
only be specified if you are part of the SYSADM, SYSCTL or SYSMAINT group, and a
smaller subset of commands can be run by any user ID on the system.
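For example (the file system and device names are illustrative), growing an existing file system and spreading the data across the new disk might look like this:
$> db2cluster -cfs -add -filesystem db2sddata -disk /dev/hdisk6
$> db2cluster -cfs -rebalance -filesystem db2sddata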
51
#IDUG
Example Commands
• List all file systems in cluster
$> db2cluster -cfs -list -filesystem
FILE SYSTEM NAME MOUNT_POINT
--------------------------------- -------------------------
db2fs1 /db2sd_20140112025840
db2sddata /db2sdfs/db2sd_data
db2sdlogs /db2sdfs/db2sd_log
• List disks associated with a specific file system
$> db2cluster -cfs -list -filesystem db2sddata -disk
PATH ON LOCAL HOST OTHER KNOWN PATHS
--------------------------------- -------------------------
/dev/hdisk5
• List configuration of a specific file system
$> db2cluster -cfs -list -filesystem db2sddata -configuration
OPTION VALUE
blockSize 1048576
defaultMountPoint /db2sdfs/db2sd_data
…
51
These are examples of the "db2cluster -cfs -list" command.
The first example shows the command to list all of the file systems in the cluster. The
second example shows all of the disks associated with the db2sddata file system. The third
example shows the configuration of the db2sddata file system.
52
#IDUG
Monitoring
52
<no speaker notes>
53
#IDUG
Instance and Host Status

db2nodes.cfg:
0 host0 0 - MEMBER
1 host1 0 - MEMBER
2 host2 0 - MEMBER
3 host3 0 - MEMBER
4 host4 0 - CF
5 host5 0 - CF

> db2start
08/24/2008 00:52:59 0 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:00 1 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:01 2 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:01 3 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.

> db2instance -list

Instance status:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT
0 MEMBER STARTED host0 host0 NO
1 MEMBER STARTED host1 host1 NO
2 MEMBER STARTED host2 host2 NO
3 MEMBER STARTED host3 host3 NO
4 CF PRIMARY host4 host4 NO
5 CF PEER host5 host5 NO

Host status:
HOST_NAME STATE INSTANCE_STOPPED ALERT
host0 ACTIVE NO NO
host1 ACTIVE NO NO
host2 ACTIVE NO NO
host3 ACTIVE NO NO
host4 ACTIVE NO NO
host5 ACTIVE NO NO

[Diagram: clients connect to a single database view spanning four DB2 members (host0-host3) and two CFs (host4, host5) accessing the shared data]
53
The db2instance -list command is the most important status reporting tool in DB2
pureScale.
The output splits status information into two sections, instance status and host status.
The instance status section tells you the state of each member and CF as well as the current
host that the member or CF resides on.
The host status section tells you the state of the hosts that the members and CFs are running
on.
Alert conditions (which impact the availability of a host or functionality of a member or CF)
are reported through this status interface.
54
#IDUG
Managing and Monitoring pureScale Using Optim Tooling
• Database Administration: Data Studio
  • Ability to perform common administration tasks across members and CF
  • Integrated navigation through shared data instances
• System Monitoring: Data Studio Web Console, Optim Performance Manager
  • Integrated alerting and notification
  • Seamless view of status and statistics across all members and CFs
• Query Tuning: Optim Query Workload Tuner
  • Full support for query, statistics, and tuning advice for applications on pureScale systems
• Application Development: Data Studio
  • Full support for developing Java, C, and .NET applications against a DB2 pureScale environment
• Configuration Tracking and Client Management: Optim Configuration Manager
  • Full support for tracking and reporting of configuration changes across clients and servers
• DB2 High Performance Unload
  • Support for high speed unload utility
54
This slide shows the type of system management and monitoring that you can do against a
pureScale system using the IBM Optim tools suite and related products.
55
#IDUG
Monitoring the CF Environment
• ENV_CF_SYS_RESOURCES admin view
• Shows system resources used by the CFs on the system
• Includes physical memory, virtual memory, CPU load
• MON_GET_CF table function
• Shows CF status information, including current, configured, and target sizes for CF
memory and structures
• DB2_CF admin view
• Returns information about host name, state, and whether alerts exist for the CFs
• Similar to what you would see in the output of db2instance -list
• db2pd options available as well (e.g. -cfinfo, -cfpool)
SELECT ID, NAME, VALUE, UNIT FROM SYSIBMADM.ENV_CF_SYS_RESOURCES ORDER BY ID
SELECT ID, CURRENT_HOST, STATE, ALERT FROM SYSIBMADM.DB2_CF ORDER BY ID
SELECT ID, HOST_NAME, DB_NAME, CURRENT_CF_MEM_SIZE, CURRENT_CF_GBP_SIZE,
CURRENT_CF_LOCK_SIZE, CURRENT_CF_SCA_SIZE
FROM TABLE(MON_GET_CF(CAST(NULL AS INTEGER))) AS CFINFO ORDER BY ID
States include: STOPPED,
RESTARTING, BECOMING_PRIMARY,
PRIMARY, CATCHUP, PEER, ERROR
Here are some of the routines that can be used to monitor the CFs in the cluster, including some example queries (not all
available columns are shown in these queries).
The ENV_CF_SYS_RESOURCES admin view returns a list of system resources used by the CFs on the system. For
example, physical memory, virtual memory, and CPU load.
The MON_GET_CF table function shows information such as the current, configured, and target sizes for the CF and its
structures (e.g. GBP, GLM, SCA). The current memory size represents the amount of a type of memory (e.g. GBP)
currently in use by the system. The configured memory size represents the total amount of that type of memory that is
currently configured by the database as the maximum. The value for current memory can never exceed that of configured
memory. The target memory size represents a new configured maximum value for that type of memory. Usually, the target
size is the same as the configured size. However, if the target and configured sizes differ, that means that that particular
type of memory is undergoing an online change in its configured size. The process of allocating memory takes place over
time. At any point during this resizing process, the configured memory represents the maximum amount of that type of
memory that can be used at that specific point in time. Eventually, the configured memory becomes the same as target
memory.
The DB2_CF admin view returns information about the host name, the state of the CF (e.g. STOPPED, RESTARTING,
BECOMING_PRIMARY, PRIMARY, CATCHUP, PEER, ERROR) and whether there are any alerts for the CF. This is
similar to what you would see in the output of the "db2instance -list" command. The
DB2_GET_INSTANCE_INFO table function (not shown here) returns the same information as the DB2_MEMBER and
DB2_CF administrative views, but enables you to filter the information returned by passing input parameters, such as the
current host.
The db2pd –cfinfo command displays information about the CF that can be useful when diagnosing performance problems
or just when generally looking at the state of the CF.
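As a quick sketch, the db2pd options mentioned on the slide are run from any member; the exact fields in the output vary by release and fix pack:
$> db2pd -cfinfo
$> db2pd -cfpool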
56
#IDUG
Monitoring the Member Environment
• ENV_GET_SYSTEM_RESOURCES table function
• Returns OS, CPU, memory, and other information related to members on the system
• Only returned from members that are active
• ENV_GET_DB2_SYSTEM_RESOURCES table function
• Returns CPU usage (user and system) for DB2 processes running under the
specified members
• DB2_MEMBER admin view
• Returns information about host name, state, and whether alerts exist for the members
• Similar to what you would see in the output of db2instance -list
SELECT MEMBER, HOST_NAME, OS_NAME, OS_VERSION, OS_RELEASE
FROM TABLE(SYSPROC.ENV_GET_SYSTEM_RESOURCES(-2)) ORDER BY MEMBER
SELECT MEMBER, DB2_PROCESS_NAME, DB2_PROCESS_ID, CPU_USER, CPU_SYSTEM
FROM TABLE(SYSPROC.ENV_GET_DB2_SYSTEM_RESOURCES(-2)) ORDER BY MEMBER
SELECT ID, HOME_HOST, CURRENT_HOST, STATE, ALERT FROM SYSIBMADM.DB2_MEMBER
ORDER BY ID
States include: STARTED, STOPPED,
RESTARTING, WAITING_FOR_FAILBACK, ERROR
The ENV_GET_SYSTEM_RESOURCES table function returns operating system, CPU,
memory, and other information that is related to members on the system. Data is returned
only from members where the database that issued the command is active.
The ENV_GET_DB2_SYSTEM_RESOURCES table function returns CPU usage and DB2
process information for specified members in the current instance. The main process of
interest will be the DB2 system controller process (db2sysc).
The DB2_MEMBER admin view returns information about the host names (the "home host" on
which the member resides when everything is healthy, and the "current host" that the
member is actually running on, which will be different if things aren't healthy with this
member), the state of the member (e.g. STARTED, STOPPED, RESTARTING,
WAITING_FOR_FAILBACK, and ERROR), and whether there are any alerts for the
member. This is similar to what you would see in the output of the "db2instance -list"
command. The DB2_GET_INSTANCE_INFO table function (not shown here) returns
the same information as the DB2_MEMBER and DB2_CF administrative views, but enables
you to filter the information returned by passing input parameters, such as the current host.
57
#IDUG
Viewing Alerts
• Existence of alerts shown in output of db2instance -list, DB2_MEMBER,
and DB2_CF
• Very detailed alert messages available, including the impact and action to take
• Two options to list the alerts:
• Example output:
SELECT MESSAGE, ACTION, IMPACT FROM SYSIBMADM.DB2_INSTANCE_ALERTS
db2cluster -cm -list -alert
MESSAGE
----------------------------------------------------------------------------------------------------
Could not restart light DB2 member '0' on hosts 'hostA'. Check the db2diag.log for messages
concerning a restart light or database crash recovery failure on the indicated hosts for DB2 member
'0'.
ALERT_ACTION
----------------------------------------------------------------------------------------------------
This alert must be cleared manually with the command: 'db2cluster -clear -alert -member 0'
IMPACT
----------------------------------------------------------------------------------------------------
DB2 member '0' will not be able to restart light on host 'hostC' until this alert has been cleared.
There is an ALERT column in the output of the db2instance -list command and in the
DB2_MEMBER and DB2_CF admin views. If a member, CF, or host has a value of YES
for an alert then you can query more information about it. This can be done using the
db2cluster -cm -list -alert command, or you can query the
DB2_INSTANCE_ALERTS admin view. For each alert there is a message, an alert action,
and an impact. Some alerts will be cleared automatically whenever the root cause of the
problem has been resolved and others may require manual intervention. The text will state
what the manual action is, if any.
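A minimal sketch of the flow, assuming member 0 is the one named in the alert text:
$> db2cluster -cm -list -alert
$> db2cluster -cm -clear -alert -member 0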
58
#IDUG
Monitoring Logging
• Each member has its own log stream and therefore monitoring is typically
viewed per-member and not in aggregate
• The various admin views, table functions, and procedures that return
logging information allow you to query for all members or return
information for all members
• SYSIBMADM.LOG_UTILIZATION
• SYSIBMADM.SNAPDB
• SYSIBMADM.SNAPDETAILLOG
• MON_GET_UNIT_OF_WORK
• MON_GET_WORKLOAD
58
In a pureScale database, there is a single shared copy of the database but each member has
its own log stream. Monitoring is typically viewed per-member and not in aggregate (similar
to DPF).
There are various administrative views, table functions, and stored procedures that are
related to monitoring. All of the ones listed here report some level of logging activity. Some
of these routines accept a MEMBER or DBPARTITIONNUM parameter that allows you to
get the information for a specific member, the currently connected member, or all
members. Others return a row per member or database partition number as part of the result
set.
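As one hedged example (column names are as documented for DB2 10.5; verify against your level), log space consumed by in-flight units of work can be viewed per member with MON_GET_UNIT_OF_WORK, where -2 requests data from all members:
SELECT MEMBER, APPLICATION_HANDLE, UOW_LOG_SPACE_USED
FROM TABLE(MON_GET_UNIT_OF_WORK(NULL, -2)) AS T
ORDER BY MEMBER, UOW_LOG_SPACE_USED DESC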
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 

NA14G05 - A DB2 DBAs Guide to pureScale.pdf

  • 5. 5 #IDUG Main thing to remember… pureScale is DB2 • pureScale looks and feels very much like "regular DB2" • Same code base shared by DB2, DPF, and pureScale • In DB2 10.1 and 10.5, pureScale is just an installable feature of DB2 • Immediate productivity from DBAs and application developers • Single system view for utilities • Act and behave exactly like they do in non-pureScale • Backup, restore, rollforward, reorg, load, … • Applications don’t need to know about or care about the fact that are multiple members • In general, can run SQL statements or command on any member • SQL, data access methods, and isolation levels are the same • Backup/recovery processes are the same • Database security is managed in the same way • Environment (even the CFs) still managed by database manager and database configuration parameters 5 I've sometimes had people comment to me that by introducing pureScale into their environment, their DB2 DBAs have to learn a whole new database platform. That couldn't be further from the truth. As the slide title says, pureScale is DB2. pureScale is just a deployment option of DB2, like the Database Partitioning Feature (DPF) is. It's all one code base and as part of the installation of DB2 you choose whether it's going to be a pureScale environment or not. All of the skills that a DBA has with DB2 is immediately transferable to pureScale. It has the same look and feel, the utilities work in the same way, the same SQL and commands exist, security is the same, etc, etc. Even though there are new concepts in pureScale (which will be discussed over the next few slides) things have been made to work as seamlessly as possible within the DB2 framework. For instance, you have the concept of the CF (Cluster Caching Facility) in pureScale and this new structure can be configured to suit your needs. However, rather than introducing a new interface for configuring the CF, the existing database manager configuration and database configuration methods are used. So, if you know how to view and change these configuration parameters, you'll be easily able to do the same for the new CF-related ones.
  • 6. 6 #IDUG DB2 pureScale Architecture • Multiple DB2 members for scalable and available database environment • Client application connects into any DB2 member to execute transactions • Automatic workload balancing • Shared storage for database data and transaction logs • Cluster caching facilities (CF) provide centralized global locking and page cache management for highest levels of availability and scalability • Duplexed, for no single point of failure • High speed, low latency interconnect for efficient and scalable communication between members and CFs • DB2 Cluster Services provides integrated failure detection, recovery automation and the clustered file system Shared Storage Database Logs Logs Logs Logs Cluster Interconnect Member CS Member CS Member CS Member CS Primary CF CFCS Secondary CF CFCS Clients DB2 pureScale Cluster (Instance) 6 DB2 pureScale keeps your critical systems available all the time, giving you uninterrupted access to your data, making sure that your business is up all the time, and your line of business gets the high level of availability they need, through planned and unplanned outages. DB2 pureScale provides continuous availability of data through the use of the highly reliable cluster caching facility (for redundancy purposes and no single point of failure, there are two of these "CFs"). The CF provides centralized locking and cache management and is very different from architectures in competitive products that use a distributed model. With the distributed locking architecture, lock ownership is distributed across multiple nodes and recovery from a node failure requires lock redistribution across the surviving nodes. Also during node failure and recovery, I/O is frozen until the database system is able to determine what data pages need recovery. This can be a relatively lengthy process, which has a significant impact on data availability. None of this is required with DB2 pureScale because that information is centrally managed and is not impacted by DB2 member failures (and there are two CFs, in case a node hosting a CF fails or needs to be brought down for maintenance). This slide describes the architecture of a DB2 pureScale cluster (also referred to as a pureScale instance). It is an active/active data sharing environment in which multiple nodes – called DB2 members – handle the transactional workload of the system and they also have equal and shared access to a single copy of the database on disk. Clients are able to connect into any of the members and there are default workload balancing capabilities that will distribute the workload across all of the members. If a member fails, client connections are automatically rerouted to healthy members. The members communicate with the aforementioned CFs for the purpose of global locking and data page caching. This communication takes place over a high speed, low latency interconnect. Specifically, it is done via the RDMA (Remote Direct Memory Access) protocol over either InfiniBand or RDMA-capable 10 Gigabit Ethernet. Integrated into DB2 pureScale are various other IBM products, including the General Parallel File System (GPFS), Reliable Scalable Cluster Technology (RSCT), and Tivoli Systems Automation for Multiplatform (TSA MP). Collectively, they are known within pureScale as DB2 Cluster Services. These products provide the clustered file system on which the database data and transaction logs reside, as well as the cluster domain management, heart-beating, and recovery automation. 
Together, all of these components allow for very fast recovery when node failures occur. These products are fully integrated into DB2 pureScale, in that they are all installed together as part of a single installation process, they are configured as part of installation, and they are managed and maintained together all within pureScale.
  • 7. 7 #IDUG What in pureScale is new to a DB2 DBA? • CFs provide centralized locking and cache management • Global Lock Manager and Group Buffer Pool • Other components automatically installed and configured as part of pureScale install • RSCT provides heart beating, domain management • TSA defines resources and dependencies, drives recovery • GPFS provides a clustered file system on which shared data resides • Online, automated member/CF recovery • Automatic workload balancing across all members • Multiple log streams are maintained, one per member • Automatically merged when necessary (e.g. rollforward, group crash recovery) • Monitoring includes members and CFs • Storage management primarily done through DB2 interfaces to GPFS • Cluster management done through DB2 interfaces to RSCT/TSA 7 This slide talks to some of the new components and concepts, as well as some slight differences between pureScale and traditional DB2 environments. As you can see here – and will learn more about later – they're really not that big a deal from a skills perspective. pureScale is a multiple-server environment made up of members and CFs. Within each CF are components such as the group buffer pool and the global lock manager. DB2 keeps track of changes to data in the database through its transaction logs and in a pureScale environment each member has its own set of logs (referred to as a log stream). And if DB2 needs to perform some sort of operation that requires multiple log streams, it will automatically merge them as necessary, invisible to the user. In addition to the database manager itself, a pureScale environment also includes other IBM technologies such as RSCT (heart beating, domain management), TSA (recovery automation), and GPFS (the clustered file system). These components are all a part of a single integrated install (as is the case with fix packs as well). And rather than having to learn the commands and operations for these different products, we've abstracted out many of the common operations into a DB2 command (db2cluster). Through this command you can configure/manage/view the storage and cluster manager.
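As a small illustration of that single db2cluster entry point, the commands below show how a DBA might check on the cluster manager and the clustered file system without using the native RSCT, TSA, or GPFS tools directly. This is only a sketch: the options shown are the commonly documented ones, but verify them against your own DB2 level before relying on them.

db2cluster -cm -list -alert          (list any outstanding cluster manager alerts)
db2cluster -cfs -list -filesystem    (list the GPFS file systems under DB2 management)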
  • 8. 8 #IDUG Centralized Lock Management • Local lock manager (LLM) exists on each member • Responsible for granting locks locally to individual application transactions • Global lock manager (GLM) exists on the CF • Grants locks to members upon request (if not already held by another member, or currently held by the member in a compatible mode) • Maintains global lock state (what member has what lock, in what mode, who's waiting) • When member needs a lock it doesn't already hold, the LLM coordinates with the GLM to get it • Via fast RDMA requests and lock negotiation • Locks can be: • Logical/transactional (e.g. row locks and table locks) • Physical (e.g. page locks) 8 In a pureScale environment there are two locking components that work together to manage data access across the cluster. The first is the Local Lock Manager (LLM) that resides on each member and the second is the Global Lock Manager (GLM) that resides on the CF. The LLM handles things at a local level within a member and understands what locks are being held and requested by applications/transactions within a member. The GLM has a cluster-wide understanding of the locks being held and requested within the cluster. It doesn’t know specifics about applications or transactions, it only knows members are involved. If a transaction running on a member requires a lock (e.g. a row lock) then the LLM will first see if the member is already holding a lock that can satisfy the request. If it doesn’t then it must request it from the GLM. Communication between the LLM and GLM involve “set lock state” requests and responses, as well as notifications and negotiations if the GLM can’t immediately satisfy a request. The messages and responses take place using RDMA. RDMA (Remote Direct Memory Access) allows one server to reach into the memory of another server and read or write bytes of data. No interrupt processing takes place, there is no context switching, and the target server spends no CPU cycles on the operation. This is increasingly important as the size of your cluster grows. In pureScale, locks are categorized as either logical or physical. Logical locks are simply transactional locks, such as the row or table locks you might get while executing a transaction. These are the types of locks you have in non-pureScale. Physical locks are unique to pureScale and are used for concurrency control on "physical" objects, most notably pages (sometimes you'll hear them referred to as "P-Locks"). These physical locks are not transactional in nature. That means that one of these locks can be given up prior to the end of a transaction (i.e. you don't have to commit the transaction to give up the lock). You'll see more about page negotiation (a.k.a. page reclaim) via the page locks in the next few slides.
  • 9. 9 #IDUG Physical Locks vs. Logical Locks • Locks that are held on rows, tables, etc. are referred to as logical locks • Associated with applications and transactions • Physical locks (P-Locks) are used for serialization of a physical resource, like a page • For example, a member will hold a P-Lock in X mode when it is in the middle of updating a page and can release the lock when it is done • P-Locks are not transactional in nature • Associated with a member • Requested as needed, freed when no longer needed • Typically held until end of transaction, but protocol doesn’t strictly require them being held until commit or rollback time • Can be negotiated away if another member wants them • Use of P-Locks is mostly invisible within the database • Not included in monitor metrics • Cannot result in deadlocks • Simply an internal database primitive that is built on the locking infrastructure 4 Give me your lunch money! 9 Physical Locks (P-Locks) are not something that you typically hear much about, as their use by DB2 is mostly invisible to users. However, they are commonly talked about when describing pureScale internals (such as page reclaim/negotiation) and so they are explained here. A physical lock is used for concurrency/serialization of physical resources, such as a page. They are not transactional in nature (unlike logical locks, which are locks that are associated with rows, tables, etc.). They are requested and freed as necessary. Although internal, the “physical lock” concept and term is commonly used when describing page access within a pureScale database and therefore it is described here.
  • 10. 10 #IDUG Two Levels of Data Page Caching • Local buffer pools (LBP) exist on each member • LBPs cache modified pages and clean pages • A single group buffer pool (GBP) exists in the CF per database • Global cache, containing modified pages written from member LBPs • Page registry tracks which pages exist in the member LBPs • Allows for invalidation of old versions of page on other members when a page is updated and those changes have been committed • Pages are written from the LBP to the GBP at: • Transaction commit time (referred to as "force at commit") • During page reclaim (a.k.a. page negotiation) • Member has modified a page, another member wants to use that page • Pages can be read by members very quickly from the GBP • Avoids costly reads from disk • Speeds up member recovery in case of a failure 10 There are two levels of data page caching that occur within a DB2 pureScale cluster. This first is the caching of data in the buffer pools found on the members. These are referred to as local buffer pools (LBP) in a pureScale environment. These are the buffer pools you're familiar with in non-pureScale DB2. With pureScale, there is also a global buffer pool (GBP), which resides within the CF (and is duplexed across both the primary and secondary CF). When transactions are executing on a member, pages are being read into the local buffer pool and modifications are made locally. When a transaction commits, all of the pages that were modified by the transaction get sent to the GBP (on both the primary and secondary CF). This is pureScale's "force at commit" protocol. When this happens, older copies of the page that might exist in local buffer pools on other members are invalidated. This is accomplished via the page registry. This registry resides with the GBP and keeps track of all of the pages that exist in the local buffer pools and the GBP. RDMA is used to do this invalidation in a very efficient and scalable manner. Pages may also be written to the GBP under other circumstances. For instance, during page negotiation. This is when a page has been modified on one member (as part of an uncommitted transaction that updates a row, for example) and another member wants to update the same page (perhaps to update a different row on that page). The page is locked exclusively (X) by the first member and when the request comes in from the second member, the lock is used to negotiate the page away from the first member. As previously mentioned, page locks are not transactional in nature and so they don't need to be held until commit time. Therefore, concurrency is not an issue. For those pages that are in the GBP, if a member wants to read or update them, they can be quickly read from the GBP into an LBP using RDMA. Reading it from the GBP is orders of magnitude faster than reading a page from disk (tens of microseconds vs. single digit milliseconds). Various monitor metrics can be used to monitor the caching activity that is occurring at both levels in the LBPs and GBP.
  • 11. 11 #IDUG Page Reclaim Member 0 CF P LBP Member 1 LBP Member 2 LBP Member 3 LBP P P P 1. Uncommitted update modifies page P on M0 (page locked X) 2. Transaction on M3 wants to modify page P; requests page lock in X GBP Registry GLM P: M1-X P: M0, M1, M3 3. Page P negotiated away (via page lock); page P written to GBP; lock given up P P 4. Old copies of page P are invalidated 5. X lock on page P granted to M3 P: M3-X 6. With lock acquired, page P found to be invalid, so latest copy is requested from GBP P P All lock requests, page transfers, and invalidation is done using Remote Direct Memory Access (RDMA). This allows for high performance and high scalability, even as the cluster size grows 11 Although difficult to fully describe through graphics on a single slide, the intention here is to give you an idea of how page reclaim (a.k.a. page negotiation) works. The different steps are described with the text and the graphics on this slide. As mentioned at the bottom of the slide, all of the communication between the members and CFs – including lock requests, writing pages to the GBP, reading pages from the GBP, and page invalidation (sometimes referred to as silent invalidation or cross invalidation) – is done via Remote Direct Memory Access (RDMA). RDMA allows one server to reach into the memory of another server and read or write bytes of data (it's actually accomplished using the network adapter cards). With RDMA, no interrupt processing takes place, there is no context switching, and the target server spends no CPU cycles on the operation. This is increasingly important as the size of your cluster grows. pureScale's exploitation of RDMA is unique and is one of the reasons that pureScale performs so well and can scale up without needing to make applications cluster aware.
  • 12. 12 #IDUG pureScale Client Configuration • Workload Balancing (WLB) • Application requests balanced across all members or subsets of members • Takes server load of members into consideration • Connection-level or transaction-level balancing • Client Affinity • Direct different groups of clients or workloads to specific members in the cluster • Consolidate separate workloads/applications on same database infrastructure • Define list of members for failover purposes • Automatic Client Reroute (ACR) • Client automatically connected to healthy member in case of member failure • May be seamless in that no error messages returned to client • Application may have to re-execute the transaction X X 12 Database applications running in a DB2 pureScale environment can use the DB2 transaction-level or connection-level workload balancing (WLB) functionality. WLB balances application requests among all members of the DB2 pureScale cluster. When WLB is enabled the DB2 clients distribute workload or application request based on the capacity (that is, the priority or weight) values in a server list that the DB2 pureScale server returns. These capacity values indicate the current load on a DB2 pureScale member. A member with a capacity value below that of the other members in the server list is considered busier than other members. The “db2pd –serverlist” command can be used to see the relative load (priority or weight) of a member. A member (A) having a higher value compared with another member (B) indicates to the client that more work should be directed at member A. CPU load average (over recent time) and memory utilization (based on looking at swap space and paging) is used to determine the relative load of a member. The client affinities feature allows you to define an ordered list of DB2 pureScale members to which a DB2 client can connect; different clients can implement a different ordered list. In certain situations, you might want to direct application requests from a DB2 client to a particular DB2 pureScale member on the list. If that DB2 pureScale member goes down because of a planned or unplanned outage, the DB2 client can direct the client application requests to another DB2 pureScale member on the list. If that member is unavailable, the DB2 client can work through the list of all DB2 pureScale members to find an available member. This feature is typically used in an environment where the applications and data are inherently segregated and particular servers are targeted to service requests of particular applications. With client affinities, you can also control whether application requests fail back to the failure primary server after it comes back online. The primary server is the DB2 pureScale member that the application originally connected to. If you set up the client in this manner, you can choose how often the DB2 client should check whether the primary server is back online. Automatic client reroute (ACR) is a feature in DB2 clients that takes application requests that are directed toward an offline DB2 pureScale member and reroutes them to active DB2 pureScale members. ACR is automatically enabled with WLB or client affinities so no additional steps are required to specify which member the application should connect to upon encountering an outage. In some cases, after an outage, clients are seamlessly routed to another DB2 pureScale member, and no error messages are returned to the application because the failure is transparent to the application. 
For more details on when failures are seamless, see the DB2 Information Center information about seamless client reroute. However, in some situations, the DB2 client cannot replay the statement execution environment on the new connection after automatic client reroute occurs. In such a situation, the transaction is rolled back, and SQLCODE -30108 (or SQLCODE -4498 for Java applications) is returned to the application after the connection is rerouted from the failing member to a surviving member. If this occurs, applications must replay the statement execution environment and redo the statements, but applications do not have to explicitly reconnect to the database because ACR automatically reconnects the applications.
  • 13. 13 #IDUG Online System and Database Maintenance • Transparently perform maintenance to the cluster in an online rolling fashion • DB2 pureScale fix packs (DB2 10.5) • System updates such as operating system fixes, firmware updates, etc. • No outage experienced by applications • DB2 fix pack install involves a single installFixPack command to be run on each member/CF • Quiesces member • Existing transactions allowed to finish • New transactions sent to other members • Installs binaries • Updates instance • Member still behaves as if running on previous fix pack level • Unquiesces member • Final installFixPack command to complete and commit updates • Instance now running at new fix pack level CF CF 13 System maintenance, such as hardware updates, operating system fixes, firmware updates, etc. can be performed in an online rolling fashion. Previously, when applying DB2 pureScale fix packs it was necessary to completely stop the entire cluster so that the fix pack could be applied to all of the members and CFs. This is no longer the case in DB2 10.5 and you can perform this kind of maintenance without bringing the cluster down. It is termed a "rolling update" because you can perform the maintenance on one host at a time, rolling through each of the members and CFs. During the application of the fix pack to a particular host, only one member or CF is offline but at least one other member and CF should still be online and so the cluster itself is still online and able to perform work on behalf of applications. The newly enhanced installFixPack command is used to do the update of the binaries on an individual host by host basis. The installFixPack command also has new options for committing the changes and for doing a pre-commit check first.
  • 14. 14 #IDUG Disaster Recovery Options for pureScale • HADR • Replication between a database in a primary pureScale cluster and a standby pureScale cluster with a matching member topology • Active/passive DR • Storage Replication • Synchronous or asynchronous disk-based replication between a database in a primary pureScale cluster and a standby pureScale cluster with a matching member topology • Active/passive DR • Q Replication / InfoSphere Change Data Capture (CDC) • Logical replication between a pureScale database and a pureScale or non-pureScale standby (bidirectional supported) • Can be active/active DR • Geographically Dispersed pureScale Cluster (GDPC) • Single pureScale cluster "stretched" over two sites with half of members/CFs at each site • Active/active DR 14 These are various different disaster recovery options that are currently supported in pureScale. Given the variety of the options, there is a solution to meet the needs of all pureScale users.
  • 15. 15 #IDUG Prerequisites and Preparing for pureScale 15 <no speaker notes>
  • 16. 16 #IDUG Setting Up Your DB2 pureScale Environment Like the Boy Scouts always say… "Be Prepared" • The DB2 Information Center provides plenty of information including • Pre-requisites for hardware and software • Preparation and planning steps • Pre-installation "cheat sheets" • Installation instructions 16 In this next section we'll cover some topics around installation and instance/database movement to pureScale. When preparing for and then subsequently performing these tasks, it is highly recommended that you read through the relevant sections of the Information Center. As shown in the screen capture on the slide, there is a great deal of information on these topics in there.
  • 17. 17 #IDUG DB2 pureScale Supported Hardware and OS GPFS compatible storage (ideally storage that supports SCSI-3 PR fast I/O fencing) IBM and Non-IBM Rack Mounted Servers OR RHEL 5.9 RHEL 6.1 SLES 10 SP4 SLES 11 SP2 BladeCenter H22/HS23 High speed, low latency interconnect • InfiniBand • 10 GE (RoCE) POWER6 POWER7/7+ Flex Flex 17 pureScale is supported on both IBM Power and Intel-x86 based systems. This chart shows the hardware and software requirements at a high level (current as of April 2014). For more information on the specific hardware requirements, please see the following sections of the Information Center: Power/AIX: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r 0054850.html Intel x86/Linux: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r 0057441.html
  • 18. 18 18 #IDUG Installing DB2 pureScale • Components installed • Database members • Cluster caching facilities (CFs) • DB2 cluster services, which includes TSA, RSCT, and GPFS • Methods • DB2 Setup Wizard (db2setup): User friendly GUI for interactive deployment • Allows setup of a DB2 pureScale environment across multiple hosts • Ensures that all necessary components are installed and configured on all hosts according to best practices • db2_install command: Command line install process (deprecated) • Response file: Automated install • Install and instance creation includes • Installing binaries • Occurs across all hosts – does not require installation separately on each one • All components are installed and configured as part of the single install process • Cluster domain created • sqllib_shared file system created and mounted on all hosts When installing DB2 pureScale it will do the installation and setup of the database member(s), CF(s), and DB2 Cluster Services (which includes TSA, RSCT, and GPFS). You can install via the db2setup tool (interactive setup wizard GUI or response file) or the db2_install command. Note that the db2_install command has been deprecated and so it might be removed in a future release. It is suggested to use db2setup with a response file as an alternative. If you want to setup your instance with more than one CF and member to start with then the GUI is a good method to use. It is also very easy to use. Install processing involves installing the binaries, which are copied to all hosts in the cluster, creating the cluster domain, and created the shared file system on which the instance shared files go (assuming you're performing an install that is also creating an instance at the same time).
  • 19. 19 #IDUG Check Prerequisites using db2prereqcheck • Checks whether your system meets the prerequisites for the installation of a specific version of DB2 before installing it • Including the pureScale prerequisites • Includes OS level, Linux distribution, AIX technology level, C library, and uDAPL • Prerequisites contained within an XML file • Contains prerequisites for DB2 9.8, 10.1, and 10.5 • File located at <installPath>/cfg/DB2prereqs.xml • Examples
db2prereqcheck -i -p (check pureScale requirements for the latest version of DB2 described in the XML file)
db2prereqcheck -p -v 10.5.0.0 (check pureScale requirements for DB2 10.5.0.0)
db2prereqcheck -u -v 10.5.0.0 (check only the pureScale uDAPL requirements for DB2 10.5.0.0)
19 The db2prereqcheck command checks whether your system meets the prerequisites for the installation of a specific version of DB2 for LUW, including the prerequisites for the pureScale feature. By using this command, you can determine whether your system satisfies the prerequisites before you start the installation process. The prerequisites checked include the operating system level, Linux distribution, AIX Technology Level, C library and runtime patches, uDAPL, and other DB2 pureScale-specific requirements. Note that the check isn't exhaustive; some things may still need to be verified manually. The db2prereqcheck command uses a resource XML file that contains the prerequisites, covering DB2 9.8 FP2 up to the latest fix pack of DB2 10.5. The file is updated with each version and fix pack. The default path of the XML file is <DB2 installation path>/cfg/DB2prereqs.xml. You should never modify the contents of the XML file. Use the -p option to check the prerequisites for pureScale. You can specify the -i option to indicate that you want to check the prerequisites for the latest DB2 version that is defined in the resource XML file. If you want to check a specific version, you can specify it via the -v option.
  • 20. 20 #IDUG Pre-Installation Cheat Sheets AIX: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0056077.html Linux: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0057204.html 20 Within the pre-installation checklist sections of the Information Center (links shown here) there are a couple of cheat sheets that can be used to help identify and record installation- related information prior to the installation process.
  • 21. 21 #IDUG Converting an Existing DB2 Instance to pureScale • Update the instance to a DB2 pureScale instance using either db2iupdt (command line) or db2isetup (GUI) • e.g. db2iupdt -d -m coralpib154 -mnet coralpib154-ib0 -cf coralpib153 -cfnet coralpib153-ib0 -instance_shared_dev /dev/hdisk2 -tbdev /dev/hdisk4 demoin1
> cat sqllib/db2nodes.cfg
0 coralpib154.torolab.ibm.com 0 coralpib154-ib0 - MEMBER
128 coralpib153.torolab.ibm.com 0 coralpib153-ib0 - CF
• If using db2iupdt, the instance will start with one CF and one member • Add additional members and a second CF using db2iupdt -add
21 To convert the instance to a DB2 pureScale instance, you can use either the db2iupdt command or db2isetup (GUI). You must run them as root. The example on this slide shows how you could use db2iupdt. Note that you can specify either "-instance_shared_dev" or "-instance_shared_dir". If you have not yet created a GPFS file system for the instance shared file system (sqllib_shared) then specify the first option and a disk device. If you have already created a GPFS file system for it using db2cluster_prepare then use the second option along with the path name. At this point, after having used db2iupdt, the instance will have one CF and one member. Typically you will want two CFs and at least two members, so add members and a CF as necessary.
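Building on the conversion example above, here is a hedged sketch of growing the cluster with db2iupdt -add. The host names are hypothetical and the -ib0 netnames assume an InfiniBand interconnect; check the exact syntax for your release before running it:

db2iupdt -add -m coralpib155 -mnet coralpib155-ib0 demoin1    (add a second member; hypothetical host)
db2iupdt -add -cf coralpib156 -cfnet coralpib156-ib0 demoin1  (add a second CF; hypothetical host)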
  • 22. 22 #IDUG Moving Existing Databases to pureScale • DB2 pureScale database prerequisites • All table spaces must be managed by Automatic Storage • Database and logs must reside on GPFS • Can setup file systems in advance and move database to it • If pureScale not yet installed, use db2cluster_prepare command to create DB2 managed GPFS file systems • Run db2checkSD to verify that a database is ready to be moved to pureScale • Verifies that unsupported features are not used, all table spaces are automatic storage, etc. • For example: db2checkSD SAMPLE -l db2checkSD.log • Able to backup a non-pureScale database and restore into pureScale • Source and target version of DB2 must both be at 10.5 pureScale 22 In pureScale, the database and logs must reside on a GPFS file system (so that it can be shared across multiple hosts). The database needs to be moved to GPFS before performing the upgrade – and can be done at any point in time prior to this. The db2cluster_prepare tool can be used to create a DB2 managed GPFS file system in advance of installing pureScale. Create as many file systems as is necessary (for database storage paths and the log files). Note that the GPFS licensing that comes included with pureScale only allows it's use with pureScale. You are not licensed to create GPFS file systems for non-pureScale purposes. You can run the db2checkSD command against a database before moving it to pureScale to determine whether you must take actions before using the database in a DB2 pureScale environment. The db2checkSD command generates a script file containing information about any issues found. In DB2 10.5 you can backup and restore between non-pureScale and pureScale instances (at the same version level – you cannot restore a DB2 10.1 non-pureScale backup image into a DB2 10.5 pureScale instance).
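For example, a sketch of preparing a DB2 managed GPFS file system before pureScale is installed and then verifying a database, assuming a spare shared disk /dev/hdisk5 and a database named PRODDB (both hypothetical; confirm the db2cluster_prepare options against your release):

db2cluster_prepare -instance_shared_dev /dev/hdisk5
db2checkSD PRODDB -l db2checkSD.log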
  • 23. 23 #IDUG Migrating Databases to Automatic Storage • Enable database for automatic storage by creating a storage group • Existing table spaces are not impacted at this point • Drop temporary table spaces and recreate as AS • Convert DMS table spaces to AS using one of • ALTER TABLESPACE • Redirected restore of table spaces • Note that SMS cannot be converted to AS • Fundamentally different architectures • Must rebuild the table space (e.g. ADMIN_MOVE_TABLE) • If SYSCATSPACE is SMS then two options exist • Rebuild the database (could use db2look and db2move) • Might be able to use Transportable Schema feature to move all existing table spaces into a new automatic database (with AS SYSCATSPACE) 23 DB2 pureScale requires that all databases are using automatic storage (AS) exclusively. That means that every table space in a pureScale database must be using automatic storage. This slide is for those who plan on moving a database to pureScale but the database isn't using automatic storage yet. If the database was created in DB2 9.1 or any release since then, automatic storage is on by default – even if no table spaces are using it. For those databases that were created earlier than that or where automatic storage was specifically not enabled at database creation time, it is very simple to enable it. Starting with DB2 10.1, this can be done by creating a storage group for the database (which becomes the default storage group for the database). For all temporary table spaces, you must drop them and recreate them as automatic storage. DMS table spaces can be converted to automatic storage very easily – and in an online way. The ALTER TABLESPACE statement is used to do the conversion and to subsequently rebalance the data from the old non-AS containers to the new AS ones. Alternatively, if a restore is being performed for a database where automatic storage is enabled, you can do a redirected restore of the non-AS table spaces, converting them to AS in the process. SMS table spaces cannot be converted to automatic storage as they have fundamentally different architectures under the covers. For these table spaces you must rebuild them, potentially using something like online table move (ADMIN_MOVE_TABLE). However, if your system catalog table space (SYSCATSPACE) is SMS then you can't just rebuild it. In this case you must recreate the database, potentially using tools like db2look and db2move to help make it easier. Another alternative – if none of the documented restrictions come into play – is to create a new empty database, which will be enabled for automatic storage, and then use DB2's transportable schema feature. With this feature, table spaces can be moved from the existing database (via a backup image) into the new database. See the Information Center for details on this feature and things that might restrict using this as an option here.
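The DMS-to-automatic-storage conversion described above boils down to a few SQL statements. A minimal sketch with hypothetical storage paths and table space name; ALTER followed by REBALANCE is the documented two-step pattern, but validate it against your own layout first:

CREATE STOGROUP sg_data ON '/db2/data1', '/db2/data2'    -- enables automatic storage for the database
ALTER TABLESPACE ts_dms MANAGED BY AUTOMATIC STORAGE     -- converts the DMS table space to automatic storage
ALTER TABLESPACE ts_dms REBALANCE                        -- moves data from the old containers to the storage group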
  • 25. 25 #IDUG Database Member Process Model • DB2 engine address space • i.e. a db2sysc process and its threads • Watched by the watchdog (db2wdog) process • Similar to single partition DB2 server • Contains local buffer pools, lock list, database heap, log buffer, sort heap, etc. • Various EDUs such as agents, page cleaners, prefetchers, log writer, agents, etc. • pureScale-specific EDUs exist, including some related to lock management, CF interactions, etc. • Use db2pd –edus to see list of EDUs running on a member • No concept of a “catalog node/partition” • All data (including system catalog tables) accessible by all members db2 agents and other threads bufferpool(s) log buffer, dbheap, and other heaps Pri CF CF CS Member CS Member CS Member CS Sec CF CF CS Member 25 A database member in pureScale is very much like a single instance in a single partition DB2 server in that there is a db2sysc process and within it various threads are running (e.g. agents, page cleaners). Within the memory space of the process are the typical memory heaps and structures that you would find in DB2 including buffer pools, the lock list, database heap, etc. There are also various new EDUs (Engine Dispatchable Units – or threads) that have been introduced to support a pureScale environment. Most notably are those associated with lock management and CF interactions. It isn't necessary to get into all of the details around these new EDUs here, but for more information you can look at the “DB2 Process Model” section in the Information Center. The “db2pd –edus” command can be used to display all of the EDUs associated with the db2sysc process for a member. In a DPF environment there is the concept of a catalog node/partition. However, this does not exist in pureScale as each member has equal access to the shared data on disk.
  • 26. 26 26 #IDUG Configuring Members • Database manager and database configuration parameters are categorized as either having a global or local (per-member) scope • Global parameters • Same value must be used across all members • Used for parameters for which per-member values would have little value and/or may create functional issues • e.g. CF_DB_MEM_SZ, ALT_COLLATE, PAGE_AGE_TRGT_GCR UPDATE DATABASE MANAGER CONFIGURATION USING <parameter name> <parameter value> UPDATE DATABASE CONFIGURATION FOR <database name> USING <parameter name> <parameter value> In pureScale there are some database configuration parameters that have a global scope and others that have a local per-member scope. Even for per-member parameters, if a specific member number isn’t provided as part of the UPDATE DATABASE CONFIGURATION or UPDATE DATABASE MANAGER CONFIGURATION command then it is applied globally (i.e. to each member). Typically, those parameters that are global are those for which per-member values would have little value and/or might create a functional issue. Global database configuration parameters are stored in the global database configuration file (which is located in the global database path).
  • 27. 27 #IDUG Configuring Members (cont.) • Local (per-member) parameters • Allows for per-member settings where different resources per member, or affinitized workload tuning, may be useful • e.g. instance memory, database heap size • Parameter updates are applied to all members by default (second example below) • Kept consistent unless otherwise overridden • Use the MEMBER option to override a value for a specific member:
UPDATE DATABASE CONFIGURATION FOR <database name> MEMBER <member num> USING <parameter name> <parameter value>
• Can still apply a change to all members at the same time:
UPDATE DATABASE CONFIGURATION FOR <database name> USING <parameter name> <parameter value>
Local database configuration parameters can be set on a per-member basis (i.e. they can have different values across the different members). This allows for member-specific settings where affinitized workload tuning might be useful, for example, setting the size of the sort heap or the package cache. This is done by specifying the MEMBER option and a member number on the UPDATE DATABASE CONFIGURATION command. By default, if the MEMBER option is not specified as part of the UPDATE DATABASE CONFIGURATION command then all members will be updated. This provides single-system view database configuration management. Per-member database configuration parameters are stored in the local database directories and there is one configuration file per member. Note that the GET DATABASE CONFIGURATION command returns information only for the member on which it is executed.
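As a concrete illustration of the MEMBER clause described above (the database name, member number, and values are hypothetical):

UPDATE DATABASE CONFIGURATION FOR proddb USING DBHEAP 10000             (apply the same value to all members)
UPDATE DATABASE CONFIGURATION FOR proddb MEMBER 1 USING DBHEAP 20000    (override the value on member 1 only)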
  • 28. 28 #IDUG Cluster Caching Facility (CF) • Software technology that assists in global buffer coherency management and global locking • Shared lineage with System z Parallel Sysplex • Software based • Services provided include • Group Bufferpool (GBP) • Global Lock Management (GLM) • Shared Communication Area (SCA) • Members duplex GBP, GLM, SCA state to both a primary and secondary • Done synchronously • Set up automatically, by default • Having a secondary CF is optional (but recommended) [Diagram: members connected to a primary and secondary CF; each CF contains the GBP, GLM, SCA, and CF worker threads] 28 This slide describes what a CF (Cluster Caching Facility) is in a DB2 pureScale cluster. Its primary responsibilities are to manage locking across the cluster (via the GLM) and to manage data page access across the members in the cluster (via the GBP), but it has various other roles and responsibilities as well. It is recommended (especially for production systems) to have two CFs defined (a primary and a secondary) so that there is no single point of failure in case a planned or unplanned outage occurs.
  • 29. 29 #IDUG Configuring the CF • CF configuration is managed through traditional DB2 configuration interfaces • GET/UPDATE DB/DBM CFG commands and APIs • Parameter values applied to both primary and secondary CFs • CFs should be hosted on equally configured hardware/LPARs, so no need for different values • Two configuration categories • CF server configuration (maintained through DBM configuration) • CF structure configuration (maintained through DB configuration) • CF Server Configuration (DBM CFG parameters):
CF_DIAGLEVEL: Specifies the types of diagnostic errors that will be recorded in the cfdiag.*.log file. Default: 2 (all errors)
CF_DIAGPATH: Fully qualified path for the cfdiag.*.log files. Default: NULL (<INSTHOME>/sqllib/db2dump/ $m)
CF_NUM_WORKERS: Number of worker threads started by the CF. Default: AUTOMATIC (# cores - 1)
CF_MEM_SZ: Controls the total amount of memory used by the CF. Default: AUTOMATIC (typically 70-90% of machine memory)
CF_NUM_CONNS: Initial size of the CF connection pool for each member. Default: AUTOMATIC (based on # members, workers, etc.)
The CFs are configured using the traditional DB2 configuration interfaces. There are two configuration categories: CF server and CF structure. The CF server configuration can be updated using UPDATE DBM CFG and the CF structure configuration can be updated using UPDATE DB CFG. The CF server configuration parameters are listed on this slide. The default value for CF_DIAGPATH in DB2 9.8 was NULL, which meant to use the value of DIAGPATH, and by default that was "<INSTHOME>/sqllib/db2dump". In DB2 10.1, the default for both DIAGPATH and CF_DIAGPATH changed to "<INSTHOME>/sqllib/db2dump/ $m" (which evaluates to "<INSTHOME>/sqllib/db2dump/DIAG####" where "####" is the member/CF number). Starting in DB2 10.1, the CF diagnostic data directory path writes to a private db2diag.log for each CF by default. To revert to the behavior of previous releases, in which the diagnostic data for the CF is written to the same directory, specify CF_DIAGPATH with a path name and no token. The default CF_NUM_WORKERS value (if AUTOMATIC) is the number of logical CPUs (cores) - 1. Note that on Power, each hardware thread is seen by the OS as a CPU core. For very small CF configurations, recovery time performance can be helped by having 2 free hardware threads on the CF (i.e. CF_NUM_WORKERS = logical CPUs - 2). The default CF_MEM_SZ value (if AUTOMATIC) is 70%-90% of the total available memory on the CF (depending on whether CF and members co-exist). When you set CF_NUM_CONNS to AUTOMATIC (the default), DB2 creates an initial number of CF connections for each member with each CF at start time. This initial number is based on the number of worker threads, the number of connections per worker thread, and the number of members in the cluster.
  • 30. 30 #IDUG Configuring the CF (cont.) • CF Structure Configuration (DB CFG parameters):
CF_DB_MEM_SZ: Total amount of memory used by the CF (includes GBP+GLM+SCA). Default: AUTOMATIC (based on CF_MEM_SZ and # active databases settings)
CF_GBP_SZ: Amount of memory used by the Group Buffer Pool (GBP) in the CF. Default: AUTOMATIC (remainder of memory from CF_DB_MEM_SZ)
CF_LOCK_SZ: Amount of memory used by the Global Lock Manager (GLM) in the CF. Default: AUTOMATIC (15% of CF_DB_MEM_SZ)
CF_SCA_SZ: Amount of memory used by the Shared Communication Area (SCA) in the CF. Default: AUTOMATIC (5-20% of CF_DB_MEM_SZ)
CF_CATCHUP_TRGT: Target time in minutes for catch-up to bring a newly restarted CF into peer state. Default: 15
The CF structure configuration parameters are listed on this slide. These parameters specify the size of the various structures that get created in the CF when a database is activated. The cluster caching facility structure memory used for the Group Buffer Pool (GBP), lock usage (GLM), and Shared Communication Area (SCA) is allocated for the cluster caching facility during the first database activation on any member and remains allocated until deactivation on the last member. These parameters have a default value set to AUTOMATIC. When set to AUTOMATIC, DB2 computes appropriate sizes for these parameters during database activation. Because these values are closely related and dependent on one another, manually setting at least one of the parameters causes none of the parameters to be calculated during activation even if some parameters remain set to AUTOMATIC; their values are what the most recent automatically calculated values were. The ONLINE option is also supported for structure parameters. Any updates to CF memory parameters are applied immediately. Update requests are synchronous and are not returned until the new value is set by the CF server.
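Because the CF is tuned through the ordinary configuration interfaces, a change is just another UPDATE CFG call. A hedged sketch with hypothetical values and database name; in most cases the AUTOMATIC defaults described above are the right choice, and remember that manually setting one structure parameter stops DB2 from recalculating the others:

UPDATE DBM CFG USING CF_NUM_WORKERS 6
UPDATE DB CFG FOR proddb USING CF_GBP_SZ 4000000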
  • 31. 31 #IDUG Castout • The process of writing dirty pages from the GBP out to disk is called castout • Similar in concept to page cleaning • Two purposes • Maintain a specific recovery window by ensuring that no pages in the GBP are older than a certain age • To keep the GBP from getting full, allowing free space for new pages being stored there • Page writes not performed by the CFs directly • Members are sent pages to castout from the CF via RDMA (into private buffers on the members) • Specialized page cleaner threads running on the members write the pages out to disk • Configuring castout: • NUM_IOCLEANERS: Number of castout page cleaner threads per member (default is AUTOMATIC) • PAGE_AGE_TRGT_GCR: Age of pages in GBP before castout to disk (default 240 seconds) [Diagram: the CF passes GBP pages via RDMA into local castout buffers on each member, where castout page cleaner threads write them to disk] 31 Castout is the process of writing dirty pages from the GBP out to disk. If you're familiar with page cleaning in non-pureScale DB2, where the page cleaners write out dirty pages from the buffer pool(s) to disk, then it's easy to understand what castout is. However, rather than cleaning the pages out to disk from the local buffer pools (which is still done via page cleaning in pureScale), castout is the writing of the pages from the GBP to disk. Castout is important for two reasons: 1) To write out dirty pages to disk and to ensure there are enough clean directory entries and data elements in the GBP to use for new page registrations and writes. 2) To maintain a specific recovery window (for Group Restart) by ensuring that no pages in the GBP are older than a certain age. This reduces the number of log records that must be replayed in the case of a Group Crash Recovery during Group Restart processing. The page writes are not actually performed by the CFs directly. Instead, pages are passed from the CF to the members via RDMA and it is the members that do the write I/O. The local buffer pools are not used during this process. Instead, dedicated memory buffers within the members are used. Special page cleaner threads called "castout engines" are used to do the I/O. The number of threads used in castout per member is determined by the NUM_IOCLEANERS database configuration parameter. When this is set to a value (or DB2 derives a value for AUTOMATIC) then there are this many regular page cleaners and this many castout page cleaners created at database activation time. PAGE_AGE_TRGT_GCR will be covered in more detail later.
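For instance, a site that wants a shorter group crash recovery window might lower the GBP page age target, accepting more castout I/O in return. A hedged example with a hypothetical database name and value:

UPDATE DB CFG FOR proddb USING PAGE_AGE_TRGT_GCR 120
UPDATE DB CFG FOR proddb USING NUM_IOCLEANERS AUTOMATIC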
  • 32. 32 #IDUG Backup, Logging and Recovery 32 <no speaker notes>
  • 33. 33 #IDUG pureScale Recovery Basics • Data is shared, but each member maintains its own set of logs • Commonly referred to as a log stream • Logs must be on the clustered file system • Members only write to their own log stream, but can read from others • For example: during merged log recovery • Failures may require member recovery or group recovery for the database • Single system view backup, restore, and rollforward [Diagram: three DB2 members, each with its own logs, all sharing data on the clustered file system] 33 In a pureScale database there is a single partition of data that each member has access to and can modify. However, each member has its own dedicated set of log files, commonly referred to as a log stream. Like the data in the database, the logs must reside on a GPFS clustered file system – and it’s recommended that the logs be on their own file system, separate from the data. Each member will only ever write to its own set of log files but it may need to read the logs from other members. For instance, during a database rollforward or group crash recovery where log merging takes place, the log files from all members are read by a single member to perform the operation (log merging will be described in more detail later on in this presentation). Different types of failures can occur while a pureScale database is up and running (e.g. power outage to the cluster, hardware failure on one machine). Recovery from failures may involve just recovering and replaying through the logs of one member or of all the members in the cluster. This too will be discussed in more detail later on. Unlike in DPF where commands like BACKUP and RESTORE are performed on individual nodes (database partitions), in pureScale you just execute these commands from any member and work is performed against the entire database.
  • 34. 34 #IDUG Backup and Restore • No differences in command syntax or usage in pureScale • Continue to utilize autonomic settings for buffers and parallelism • Can continue to take advantage of compression, including logs, etc. • Single system view backup and restore operations are executed on and performed by a single member • Unlike DPF where backup needs to be done on each database partition (remember, there’s only one data partition in pureScale) • BACKUP command can be executed from any member • Can be as simple as: db2 backup database proddb • RESTORE of an image can be done on any member • Can be as simple as: db2 restore database proddb on … • Can set UTIL_HEAP_SZ by member if dedicated backup member • 3rd party backup products require that backups/log archives can be retrieved by any host in the cluster • Use proxy nodes with TSM 34 There are no differences in the command syntax of BACKUP and RESTORE or in how they are used between non-pureScale DB2 and DB2 with pureScale. If you have experience running these commands outside of pureScale then you’ll know how to use them in pureScale. If you are familiar with backup and restore in a DPF environment, you will know that you have to backup each database partition (although we make this easy through the “single system view” feature of backup by allowing you to backup all partitions using a single command – but a backup image is in fact generated for each of the partitions). In pureScale, you only have to run the BACKUP command on one member (any of them) and the resulting image will contain everything for that database. And if you ever have to restore the database, you can execute the RESTORE command on any of the members as well… it doesn’t have to be the one where the backup image was generated.
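As an illustrative sketch (the database name, target path, and timestamp are assumptions), both operations are single commands issued from whichever member is convenient:
db2 backup database proddb online to /backups compress include logs   # run from any member
db2 restore database proddb from /backups taken at 20140514080000     # can be run on a different member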
  • 35. 35 #IDUG Backup and Restore (cont.) • Single backup image gets generated for the database and includes • Description of members (topology) at time of backup • Global/cluster metadata files (e.g. recovery history file, global config files) • Member-specific metadata files for every member (e.g. member LFH files, local config files) • Log files from each active member for online backups • Table space data • Only one copy/partition of data exists, regardless of the number of members [Diagram: BACKUP DATABASE TESTDB issued on one of two members produces a single image, TESTDB.0.db2inst.DBPART000.20120922191905.001, covering the shared data, each member's logs and local config, and the global config] 35 The backup image that gets generated contains everything associated with the database including the data itself, a description of the topology of the database, global/cluster metadata files, and per-member metadata files. It doesn’t matter on which member the database is backed up or restored since we have everything we need within the backup image.
  • 36. 36 #IDUG Topology-Changing Backup and Restore • Backup and restore between topologies with differing numbers of members Data Member 0 Member 1 Member 2 Member 3 CF CF 4 member instance Data Member 0 Member 1 CF CF 2 member instance Backup Image Online Backup Restore Data Member 0 Member 1 Member 2 Member 3 CF CF 4 member instance Data Member 0 Member 1 Member 2 CF CF 3 member instance Backup Restore Backup Image To superset of members To subset of members 36 You can restore a pureScale database backup to a different number of members. Also, you can restore a non-DB2 pureScale backup image to a DB2 pureScale instance (and vice versa). The next slide covers the latter. All of this applies to snapshot backups as well (keeping in mind that snapshot backups are typically online, but can be taken offline).
  • 37. 37 #IDUG Backup and Restore To/From pureScale • Backup and restore from pureScale to non-pureScale (and vice-versa) Data Member 0 Member 1 Member 2 Member 3 CF CF 4 member pureScale instance Data DB2 Non-pureScale instance Data Member 0 Member 1 Member 2 CF CF 3 member pureScale instance Data DB2 Non-pureScale instance Backup Restore Backup Image Backup Restore Backup Image To non-pureScale To pureScale 37 You can also restore a non-DB2 pureScale backup image to a DB2 pureScale instance (and vice-versa). In the case of restoring from non-DB2 pureScale to DB2 pureScale, per the DB2 pureScale prerequisites the database must be using automatic storage for all of the table spaces (restore will fail otherwise). The target DB2 pureScale storage must be on GPFS but it does not matter what kind of file system was being used on the original non-pureScale source system. The top example on this slide shows a situation where we are moving a database from a pureScale instance to a non-pureScale instance. The bottom example shows the reverse of this.
  • 38. 38 #IDUG Set Write Suspend and Snapshot Backups • SET WRITE SUSPEND and SET WRITE RESUME • Suspends and resumes DB2 writes to the database • Executed on one member, writes are suspended across all members • Can use snapshot as a backup image, clone, or standby database • Works in conjunction with db2inidb command • Additional GPFS and db2cluster steps are required on top of SET WRITE SUSPEND/RESUME • See Information Center for full list of steps • State available through database configuration parameter: SUSPEND_IO • Values are YES, NO, or IN_PROGRESS • History file record is not generated for backup 38 The SET WRITE SUSPEND and SET WRITE RESUME commands are used as part of the process for taking snapshot (a.k.a. split mirror or flash copy) backups of the database. When the SET WRITE SUSPEND command is issued on a member it distributes the request to all of the members in the cluster, stopping writes to the database across all of those members. SET WRITE RESUME works in a similar way to reverse the process, allowing writes to be done against the database again. In a pureScale environment there are other steps that must be followed to perform a snapshot backup and subsequently use that copy of the database for the purposes of a clone, standby image, or as a backup. These steps are documented in various sections in the Information Center. In older versions of DB2, you could tell if writes were suspended by looking at table spaces states. Now, there is a database configuration parameter called SUSPEND_IO. This is an informational parameter (i.e. it cannot be explicitly set using UPDATE DB CFG) and will show one of YES, NO, or IN_PROGRESS.
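A minimal sketch of the surrounding commands (the database name is an assumption, and the additional GPFS/db2cluster snapshot steps from the Information Center are deliberately elided):
db2 connect to proddb
db2 set write suspend for database
# ...take the storage-level snapshot and run the documented GPFS/db2cluster steps here...
db2 set write resume for database
db2 get db cfg for proddb | grep -i suspend    # shows the informational SUSPEND_IO state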
  • 39. 39 #IDUG Snapshot Backup Scripts • Allows for integrated snapshot backup capabilities for those storage devices not supported by DB2 Advanced Copy Services (ACS) • Custom script implements the DB2 ACS API • Users or storage vendors can write their own scripts • Write operations to the database are automatically suspended and resumed by DB2 during the backup process • Benefits include • Wider storage support • Avoids need for manual snapshot backup process in pureScale • Manually running SET WRITE SUSPEND, SET WRITE RESUME, db2inidb, and storage vendor commands can be error prone • History file record is generated • Examples: BACKUP DATABASE PRODDATA USE SNAPSHOT SCRIPT '/scripts/snapshot.sh' RESTORE DATABASE PRODDATA USE SNAPSHOT SCRIPT '/scripts/snapshot.sh' TAKEN AT 20140307183200 39 If you were performing a snapshot operation in DB2 10.1, you either had to use storage hardware that provided a vendor library that supported the DB2 ACS API (for non-pureScale only), or you had to write your own script -- which included having to suspend and resume writes to the database and call the underlying storage commands to take the snapshot. There are some drawbacks to writing a script like this. For instance, they can be difficult and error-prone to write, especially in regards to suspending and resuming database operations. Also, they do not generate a history file entry, so you cannot monitor the progress and success of the snapshot operation. In DB2 10.5, these trade-offs have been eliminated. The DB2 ACS API is now wrapped in the library for DB2 ACS. The library invokes a custom script to perform the snapshot operation. DB2 takes over the error-prone actions like issuing the SET WRITE SUSPEND, SET WRITE RESUME, and db2inidb commands at the correct time. At the same time, because the DB2 ACS API is being used as part of a true DB2 backup operation, an entry is made in the recovery history file for every snapshot operation, allowing you to monitor successful and unsuccessful backups. We document the API requirements within the Information Center and it is possible for DB2 users to write their own scripts. However, it is suggested that people reach out to their storage vendor to have them write and provide a script instead.
  • 40. 40 #IDUG Logging Configuration Parameters • In pureScale, each member maintains its own set of log files • Most logging related parameters are global in scope • The following parameters have member (local) scope • BLK_LOG_DSK_FUL, LOGBUFSZ, MAX_LOG, NUM_LOG_SPAN • Log paths are global but each member has its own subdirectory within it • <logPath>/NODE0000/LOGSTREAM#### • Applies to the log path, mirror log path, and overflow log path • Default log path is in the global database directory • <dbPath>/<instance>/NODE0000/SQL#####/LOGSTREAM#### 40 In a pureScale cluster, there is a single partition of data but each member maintains its own set of log files. Most of the logging related parameters are global in scope. This means that you cannot set them to different values on different members. However, there are a few (shown above) that have a member scope and can be updated to different values across members (UPDATE DB CFG FOR <dbName> MEMBER <#> …). As previously mentioned, each member has its own set of log files (i.e. log stream). You specify a single log path for the cluster but a subdirectory is created under that log path for each log stream. The same is true for the mirror log path and the overflow log path configuration parameters. Prior to pureScale, the default log path was <dbPath>/<instance>/SQL#####/SQLOGDIR. This was changed in 10.1 to what is shown on the slide (even for non-pureScale instances of DB2).
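For instance (a sketch; the database name and values are assumptions), a member-scoped parameter can differ per member while a global one is set once for the whole cluster:
db2 update db cfg for proddb member 1 using MAX_LOG 50    # member scope
db2 update db cfg for proddb member 2 using MAX_LOG 75
db2 update db cfg for proddb using LOGARCHMETH1 DISK:/archivePath    # global scope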
  • 41. 41 #IDUG Log File Management • Log files are archived independently on each member • Archive targets have member-specific directories/name space • <ArchPath>/<instance>/<databaseName>/NODE0000/LOGSTREAM#### • Member performing a log merge operation (e.g. rollforward) retrieves logs from all members as needed DB2 Member DB2 Member ARCHIVE (/archivePath) LOG PATH (/logPath) /logPath/ NODE0000/ LOGSTREAM0000/ S0000000.LOG S0000001.LOG S0000002.LOG ... LOGSTREAM0001/ S0000000.LOG S0000001.LOG S0000002.LOG ... /archivePath/ <instance>/ <databaseName>/ NODE0000/ LOGSTREAM0000/ C0000000/ S0000000.LOG S0000001.LOG S0000002.LOG LOGSTREAM0001/ C0000000/ S0000000.LOG S0000001.LOG S0000002.LOG Member 0 Member 1 41 In a pureScale cluster, there is a single partition of data but each member maintains its own set of log files. Assuming that log archiving is enabled, as log files are filled on a member, they are archived independently of what is happening on the other members. If the archive location is a disk location then there will be a directory per log stream (LOGSTREAM####). In the case of an archive location like TSM, there is a database partition number that makes up part of the name space and in pureScale this value represents the log stream (member) number. For log merge operations (such as rollforward), when reading archived log files that are owned by other members, a member might need to retrieve log files into its own log path or overflow log path. A set of subdirectories is created in the member’s log path for retrieved log files.
  • 42. 42 #IDUG Member vs. Group Crash Recovery • Two types of crash recovery (determined by DB2, based on state of cluster) • Member crash recovery (MCR) • When one or more members fail with at least one CF remaining available • Only requires one log stream (per recovering member) to perform recovery • All data available during MCR (except in-flight data) • Group crash recovery (GCR) • Simultaneous failure on both CFs • Similar to crash recovery without pureScale except that log streams are merged • Database will open for connections when recovery completes [Diagrams: MCR replays only the failed member's log stream while the rest of the cluster stays online; GCR, after both CFs fail, merges all members' log streams and replays them before the database reopens] 42 This slide summarizes the differences between the two types of crash recovery in DB2 pureScale: member crash recovery (MCR) and group crash recovery (GCR). MCR occurs when one or more members fail with at least one CF remaining available. It only requires reading and recovering through one log stream (per member requiring recovery). The database is online and accessible on other members. All data is available on those other members except for in-flight data (data that was in the process of being inserted/updated/deleted when the member went down). GCR occurs when there are simultaneous failures of both CFs. This is similar to crash recovery in non-pureScale environments in that the database is offline and inaccessible. However, in pureScale all of the log streams are read and merged before replay occurs. Once this is done the database is open for business. Note that for a GCR to occur, it requires a double failure – both CFs (which should be on different hosts) coming down – and this should be a very rare occurrence.
  • 43. 43 #IDUG Tuning Crash Recovery in pureScale • Force at Commit protocol means that pages are typically being "persisted" to GBP much more quickly than to disk in non-pureScale • Member crash recovery typically very fast as a result • Group crash recovery impacted by rate of castout (page cleaning) from GBP to disk • SOFTMAX database configuration parameter deprecated in DB2 10.5 • Replaced by PAGE_AGE_TRGT_MCR and PAGE_AGE_TRGT_GCR • SOFTMAX=0 means use new parameters (set by default for new databases) • PAGE_AGE_TRGT_MCR • Target duration (in seconds) for changed pages to be kept in the local buffer pool before being persisted to disk or to the group buffer pool (GBP) • PAGE_AGE_TRGT_GCR • Target duration (in seconds) for changed pages to be kept in the GBP before being persisted (castout) to disk 43 Member crash recovery requires reading through the log files for the member that failed, redoing log records in the member's recovery window to ensure that committed changes persisted, and undoing log records to rollback transactions that were still running at the time of failure. Member crash recovery is typically very fast. One of the reasons for this is pureScale's "force at commit" protocol. When transactions are executing on a member, pages are being read into the local buffer pool and modifications are made locally. When a transaction commits, all of the pages that were modified by the transaction get sent to the GBP (on both the primary and secondary CF). This means that during redo processing of member crash recovery, for all transactions that have committed we are going to find that the updates have already been made and there is no need to actually redo the work. Plus, we are quite likely going to find the pages we need to look at in the GBP and the member can get them very quickly via RDMA – which is much faster than reading pages from disk. For both of these reasons, member crash recovery is typically very fast and there's not a lot of configuration needed to control it (for longer transactions, PAGE_AGE_TRGT_MCR comes into play more – see below). Externally, a group crash recovery is like crash recovery in non-pureScale in that it redoes all committed transaction updates that have not yet been written to disk (remember, if we're doing a GCR then we've lost the contents of the GBP and that's why GCR is needed) and it undoes all of the work associated with in-flight (uncommitted) transactions. Internally, though, there are a few differences. For one, the logs from all of the members' log streams are merged into a single logical log stream that is replayed during the redo phase (this merging takes place in memory and does not get written to disk). And at run-time, while individual members maintain their own MinBuffLSN values, there is a cluster-wide global MinBuffLSN that is also maintained. Also, there is a concept of a cluster-wide current LSN, which is essentially the highest LSN that has been consumed across the cluster. These values can then be used to determine the range of log records in the merged log stream that corresponds to pages that haven't been written out to disk yet (i.e. dirty pages that are currently sitting in the GBP or in local buffer pools). In pureScale, it is this range within the merged log stream that gets compared to SOFTMAX (deprecated) / PAGE_AGE_TRGT_GCR to determine when to persist the changes to disk. When the global MinBuffLSN falls outside of that SOFTMAX / PAGE_AGE_TRGT_GCR range, the old pages in the GBP need to be written out to disk.
This is performed by castout threads (also known as castout page cleaners or castout engines) running on the members. They read the old pages from the CF and write them out to disk. PAGE_AGE_TRGT_MCR configures the target duration (in seconds) for changed pages to be kept in a local buffer pool before being persisted to disk or to the group buffer pool (GBP) (via page cleaning). PAGE_AGE_TRGT_MCR applies to non-pureScale DB2 as well. In pureScale, the default value for PAGE_AGE_TRGT_MCR is 120 seconds. PAGE_AGE_TRGT_GCR configures the target duration (in seconds) for changed pages to be kept in the GBP before being persisted (castout) to disk. PAGE_AGE_TRGT_GCR is applicable to pureScale only. The default value for PAGE_AGE_TRGT_GCR is 240 seconds.
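A small sketch of checking and setting these values (the database name is an assumption; 120 and 240 simply restate the defaults quoted above, and SOFTMAX=0 indicates the new parameters are in effect):
db2 get db cfg for proddb | grep -E "SOFTMAX|PAGE_AGE_TRGT"
db2 update db cfg for proddb using PAGE_AGE_TRGT_MCR 120 PAGE_AGE_TRGT_GCR 240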
  • 44. 44 #IDUG Rollforward • ROLLFORWARD DATABASE command can be executed on any member • All processing performed by that single member • An interrupted or stopped rollforward can be started again from any member • Applies to both database rollforward and table space rollforward • Log files retrieved from archive location and merged for replay purposes [Diagram: one member runs ROLLFORWARD DATABASE, retrieving log stream 0 and log stream 1 from the archive location, merging them, and redoing the changes against the shared data] 44 Database and table space-level rollforward are both supported. When issuing a ROLLFORWARD DATABASE command it can be done from any member. All processing will be performed by that single member and if it happens to get interrupted or stopped then it can be started again from any other member (or the same one, it doesn’t matter). Logs are merged for replay purposes to ensure that all of the work done across the cluster is replayed and in the correct order. Log files will be retrieved from the archive location if they are not local.
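For example (a sketch; the database name is assumed), the status can be queried and the rollforward completed from any member, not necessarily the one that started it:
db2 rollforward database proddb query status
db2 rollforward database proddb to end of logs and complete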
  • 45. 45 #IDUG Rollforward (cont.) • Rollforward to point-in-time, end-of-logs, or end-of-backup • Point-in-time operation stops when it encounters the first log record from any of the log streams whose timestamp is greater than the specified time stamp • Important that member clocks are synchronized as closely as possible • For table space rollforward, point-in-time must be greater than or equal to the minimum recovery time (MRT) for the table space • Rollforward status shows logs replayed per member (log stream) Input database alias = TESTDB Number of members have returned status = 3 Member ID Rollforward status Next log to be read Log files processed Last committed transaction ------ ----------- ------------ ------------------------- -------------------------- 0 DB working S0000014.LOG S0000005.LOG-S0000013.LOG 2014-03-12-14.39.23.000000 UTC 1 DB working S0000004.LOG S0000002.LOG-S0000003.LOG 2014-03-12-14.39.18.000000 UTC 2 DB working S0000011.LOG S0000001.LOG-S0000010.LOG 2014-03-12-14.39.25.000000 UTC 45 You can rollforward a database to a point-in-time, to end-of-logs, or to end-of-backup. A point-in-time operation will stop when it encounters the first log record from any of the log streams whose timestamp is greater than the specified timestamp. The caveat here is that the member clocks should be synchronized as closely as possible across the cluster. If one of the members is actually significantly ahead of the rest then it’s possible that you might not actually get as far as you would like. The use of NTP (Network Time Protocol) with pureScale ensures that the machine clocks are kept in very close synch across the cluster. When a rollforward completes, or when querying the rollforward status of a database, you will see the status from each member log stream. This includes the log files processed and the last committed transaction encountered. This is similar to what is shown in a DPF environment when rolling forward multiple database partitions.
  • 47. 47 #IDUG IBM General Parallel File System (GPFS) and pureScale • GPFS is a scalable, highly-available, high performance file system optimized for multi-petabyte storage management • Shipped with, installed and configured as part of pureScale • DB2 pureScale license includes GPFS for use with cluster • GPFS provides concurrent access from all hosts in the instance to instance meta data and database files on disk • Provision shared LUNs from storage administrator • System administrator changes ownership of LUNs to DB2 instance owner • Instance owner can create GPFS file systems with the storage LUNs • Best practice: • One (or more) GPFS file system for automatic storage and table spaces • One GPFS file system for database metadata (database path) and log files • db2cluster -cfs command used for typical GPFS management activities 47 The IBM General Parallel File System – or GPFS – is a scalable, highly-available, high performance file system optimized for multi-petabyte storage management. It has historically been used extensively in high performance computing (HPC) environments. As previously mentioned, it is shipped with, installed, configured, and updated as part of pureScale. GPFS provides concurrent access from all of the hosts in the instance to the instance’s meta data files (sqllib_shared) and to the database’s files (storage paths, table space containers, and logs). Physical disks within a disk subsystem are not usually directly accessible by host systems, such as DB2 database servers, and they are not directly visible to DBAs. Storage administrators provision units of storage as logical unit numbers (LUNs), which appear to host systems as SCSI disks. A LUN, however, is a completely virtual entity that can map to any combination of physical disks. An individual LUN can be a single RAID array, a portion of a RAID array, a single physical disk, a portion of a disk, or a meta of multiple RAID arrays. In pureScale, these LUNs must be made visible to all of the hosts in the cluster so that the GPFS file systems that are created on them can be visible to all hosts. To start with, these shared LUNs must be created. You will need to ask the storage administrator to do something called LUN masking or LUN zoning and make these LUNs visible to all of the hosts in the instance (unlike in ESE where only one host uses a particular LUN). The file systems are then created on these LUNs and they are mounted so that they are equally visible on all of the hosts. GPFS has its own volume manager and so it does not use the logical volume manager on AIX. Therefore, the system administrator does not need to create logical volumes. Instead, GPFS consumes these LUNs directly. However, the system administrator does have to change the ownership of the LUNs/devices (e.g. /dev/hdisk1) to the instance owner ID. Once that is done, the DBA can use the db2cluster command to create the file systems, mount them, and so on. Some enterprises might not actually be comfortable doing this. In their opinion, anything that smells like a file system is something that needs to be handled by a system administrator. That is fine. The system administrator doesn’t have to change the permissions on the LUNs to create the file system. He will just have to create it as root or a super user ID using either GPFS commands or the db2cluster command. Finally, he can then change the permissions on the file system to that of the instance owner so that it can be used by DB2.
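As a sketch (the file system name, device names, and mount point are assumptions; the available db2cluster -cfs options are listed on a later slide), once the instance owner owns the shared LUNs, a GPFS file system can be created and mounted with:
db2cluster -cfs -create -filesystem mydatafs -disk /dev/hdisk10,/dev/hdisk11 -mount /db2fs/mydatafs
db2cluster -cfs -mount -filesystem mydatafs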
  • 48. 48 #IDUG Instance Directory Structure • Member/local sqllib directory (one per host; repeated identically on every host): … ../sqllib/bin/ ../sqllib/bnd/ ../sqllib/lib64/ … ../sqllib/ctrl/ ../sqllib/db2dump/ ../sqllib/db2nodes.cfg ../sqllib/db2systm ../sqllib/function/ … ../sqllib/adm/ … • Shared portion on the cluster file system: <shared file system>/<instance>/ … ../sqllib_shared/ctrl/ ../sqllib_shared/db2dump/ ../sqllib_shared/db2nodes.cfg ../sqllib_shared/db2systm ../sqllib_shared/function/ … • Local install directory on each host (e.g. /opt/IBM/db2/V10.5): … ../bin/ ../bnd/ ../lib64/ … 48 This is the instance directory structure (a.k.a. the “sqllib” directory structure). This is just showing some of the main subdirectories and files that are found in there. If you do an “ls -l” command in this directory you will see a lot more. First off, you will notice that it looks an awful lot like the directory structure in non-pureScale instances of DB2. In those cases, you have some directories and files that are local within the sqllib directory and some that point back to the installation directory. In pureScale, each member has its own sqllib directory and there is a combination of local directories and files, links that point back to the local installation directory (remember that the pureScale binaries get installed locally on each host as part of the installation process), and links that point to the sqllib_shared directory which gets created on the cluster shared file system (these are common across all hosts in the cluster). Having a structure like this provides many benefits. Common information in the sqllib_shared directory provides for single-system view instance management but having local binaries and libraries allows for things like rolling fix pack updates.
  • 49. 49 #IDUG Database Directory Layout • Database directory resides in <dbpath>/<instance_name>/NODE0000 • Partition-Global Directory: sqldbdir/ SQL00001/ <-- database token db2event/ <-- global event monitors db2rhist.asc/bak <-- history files LOGSTREAM####/ <-- log directory (one per member) SQLOGCTL.GLFH.1,2 <-- global LFH files SQLDBCONF <-- global configuration file SQLSGF.1,2 <-- storage group control files SQLSPCS.1,2 <-- table space control files MEMBER0000 MEMBER0001 … MEMBER0002 … SQL00002/ … SQL0000X/ … DBNAME1 T#######/ <-- AS table space containers for DBNAME1 DBNAME2 T#######/ <-- AS table space containers for DBNAME2 … • Member-Local Directory: MEMBER####/ <-- Member-local directory db2event/ <-- local event monitors SQLDBCONF <-- local DB config SQLBP.1,2 <-- local BP config SQLOGCTL.LFH.1,2 <-- local LFH files SQLOGMIR.LFH <-- mirror LFH file SQLINSLK,SQLTMPLK <-- lock files 49 The layout of the database directory structure was changed to accommodate multi-member pureScale databases and this applies to non-pureScale DB2 as well. There is a partition-global directory that contains metadata files that are global to the database and this is found at <dbpath>/<instance_name>/NODE0000/SQL##### (remember that there is just a single database partition in a pureScale database and it is called partition 0 – hence the files are found in NODE0000). There is an SQL##### directory for each database in the instance (where the number is assigned when the database is created). There are also member-specific metadata files. These are found within the MEMBER#### directories under <dbpath>/<instance_name>/NODE0000/SQL#####. Each member has a corresponding MEMBER#### directory (e.g. member 0 is in MEMBER0000). The instance database lock files, SQLINSLK and SQLTMPLK, help to ensure that a database is used by only one instance of the database manager on each member.
  • 50. 50 #IDUG Storage Management • Storage management in pureScale is mainly performed through GPFS • E.g. Add storage to database by adding disks to existing file system(s) • db2cluster -cfs is used to perform cluster file system management operations • Supported options • create: Create a shared file system • add: Add disks to existing shared file system • remove: Remove disks from existing shared file system • delete: Delete shared file system • set: Set various configuration options • list: Returns details about configuration and tie-breaker disk • verify: Verifies configuration of file system cluster • mount/unmount: Mounts/unmounts a file system • rebalance: Restripes the data on disk across all disks in the file system • enter/exit -maintenance: Puts host into/takes host out of maintenance mode 50 In pureScale, storage management is mainly performed through GPFS. For instance, you would typically create a GPFS file system (or more than one) to place a database on. If you require additional space to be added to the database then you wouldn't create new file systems and add them. Instead, you would add new storage into the existing file system(s). But rather than making you learn GPFS commands, you can use DB2's db2cluster command with the -cfs option (there is also a -cm option used to manage the cluster manager). The options of the db2cluster command that you can use depend on your authorization level. Some options can only be specified by the DB2 cluster services administrator, others can only be specified if you are part of the SYSADM, SYSCTL or SYSMAINT group, and a smaller subset of commands can be run by any user ID on the system.
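A hedged sketch of that add-instead-of-create flow (the file system and device names are assumptions): add the new LUN to the existing file system, then restripe existing data across all of its disks:
db2cluster -cfs -add -filesystem db2sddata -disk /dev/hdisk9
db2cluster -cfs -rebalance -filesystem db2sddata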
  • 51. 51 #IDUG Example Commands • List all file systems in cluster $> db2cluster -cfs -list -filesystem FILE SYSTEM NAME MOUNT_POINT --------------------------------- ------------------------- db2fs1 /db2sd_20140112025840 db2sddata /db2sdfs/db2sd_data db2sdlogs /db2sdfs/db2sd_log • List disks associated with a specific file system $> db2cluster -cfs -list -filesystem db2sddata -disk PATH ON LOCAL HOST OTHER KNOWN PATHS --------------------------------- ------------------------- /dev/hdisk5 • List configuration of a specific file system $> db2cluster -cfs -list -filesystem db2sddata -configuration OPTION VALUE blockSize 1048576 defaultMountPoint /db2sdfs/db2sd_data … 51 These are examples of the "db2cluster -cfs -list" command. The first example shows the command to list all of the file systems in the cluster. The second example shows all of the disks associated with the db2sddata file system. The third example shows the configuration of the db2sddata file system.
  • 53. 53 #IDUG Instance and Host Status • db2nodes.cfg: 0 host0 0 - MEMBER 1 host1 0 - MEMBER 2 host2 0 - MEMBER 3 host3 0 - MEMBER 4 host4 0 - CF 5 host5 0 - CF • > db2start 08/24/2008 00:52:59 0 0 SQL1063N DB2START processing was successful. 08/24/2008 00:53:00 1 0 SQL1063N DB2START processing was successful. 08/24/2008 00:53:01 2 0 SQL1063N DB2START processing was successful. 08/24/2008 00:53:01 3 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. • Instance status: > db2instance -list ID TYPE STATE HOME_HOST CURRENT_HOST ALERT 0 MEMBER STARTED host0 host0 NO 1 MEMBER STARTED host1 host1 NO 2 MEMBER STARTED host2 host2 NO 3 MEMBER STARTED host3 host3 NO 4 CF PRIMARY host4 host4 NO 5 CF PEER host5 host5 NO • Host status: HOST_NAME STATE INSTANCE_STOPPED ALERT host0 ACTIVE NO NO host1 ACTIVE NO NO host2 ACTIVE NO NO host3 ACTIVE NO NO host4 ACTIVE NO NO host5 ACTIVE NO NO [Diagram: clients connect to a single database view spanning four DB2 members on host0-host3 and two CFs on host4 and host5, all attached to the shared data] 53 The db2instance -list command is the most important status reporting tool in DB2 pureScale. The output splits status information into two sections, instance status and host status. The instance status section tells you the state of each member and CF as well as the current host that the member or CF resides on. The host status section tells you the state of the hosts that the members and CFs are running on. Alert conditions (which impact the availability of a host or functionality of a member or CF) are reported through this status interface.
  • 54. 54 #IDUG Managing and Monitoring pureScale Using Optim Tooling • Configuration Tracking and Client Management: Optim Configuration Manager • Full support for tracking and reporting of configuration changes across clients and servers • Application Development: Data Studio • Full support for developing Java, C, and .NET applications against a DB2 pureScale environment • Query Tuning: DB2 High Performance Unload • Support for high speed unload utility; Optim Query Workload Tuner • Full support for query, statistics, and tuning advice for applications on pureScale systems • System Monitoring: Data Studio Web Console and Optim Performance Manager • Integrated alerting and notification • Seamless view of status and statistics across all members and CFs • Database Administration: Data Studio • Ability to perform common administration tasks across members and CF • Integrated navigation through shared data instances 54 This slide shows the type of system management and monitoring that you can do against a pureScale system using the IBM Optim tools suite and related products.
  • 55. 55 #IDUG Monitoring the CF Environment • ENV_CF_SYS_RESOURCES admin view • Shows system resources used by the CFs on the system • Includes physical memory, virtual memory, CPU load • MON_GET_CF table function • Shows CF status information, including current, configured, and target sizes for CF memory and structures • DB2_CF admin view • Returns information about host name, state, and whether alerts exist for the CFs • Similar to what you would see in the output of db2instance -list • db2pd options available as well (e.g. -cfinfo, -cfpool) SELECT ID, NAME, VALUE, UNIT FROM SYSIBMADM.ENV_CF_SYS_RESOURCES ORDER BY ID SELECT ID, CURRENT_HOST, STATE, ALERT FROM SYSIBMADM.DB2_CF ORDER BY ID SELECT ID, HOST_NAME, DB_NAME, CURRENT_CF_MEM_SIZE, CURRENT_CF_GBP_SIZE, CURRENT_CF_LOCK_SIZE, CURRENT_CF_SCA_SIZE FROM TABLE(MON_GET_CF(CAST(NULL AS INTEGER))) AS CFINFO ORDER BY ID States include: STOPPED, RESTARTING, BECOMING_PRIMARY, PRIMARY, CATCHUP, PEER, ERROR Here are some of the routines that can be used to monitor the CFs in the cluster, including some example queries (not all available columns are shown in these queries). The ENV_CF_SYS_RESOURCES admin view returns a list of system resources used by the CFs on the system. For example, physical memory, virtual memory, and CPU load. The MON_GET_CF table function shows information such as the current, configured, and target sizes for the CF and its structures (e.g. GBP, GLM, SCA). The current memory size represents the amount of a type of memory (e.g. GBP) currently in use by the system. The configured memory size represents the total amount of that type of memory that is currently configured by the database as the maximum. The value for current memory can never exceed that of configured memory. The target memory size represents a new configured maximum value for that type of memory. Usually, the target size is the same as the configured size. However, if the target and configured sizes differ, that means that that particular type of memory is undergoing an online change in its configured size. The process of allocating memory takes place over time. At any point during this resizing process, the configured memory represents the maximum amount of that type of memory that can be used at that specific point in time. Eventually, the configured memory becomes the same as target memory. The DB2_CF admin view returns information about the host name, the state of the CF (e.g. STOPPED, RESTARTING, BECOMING_PRIMARY, PRIMARY, CATCHUP, PEER, ERROR) and whether there are any alerts for the CF. This is similar to what you would see in the output of the db2instance -list command. The DB2_GET_INSTANCE_INFO table function (not shown here) returns the same information as the DB2_MEMBER and DB2_CF administrative views, but enables you to filter the information returned by passing input parameters, such as the current host. The db2pd -cfinfo command displays information about the CF that can be useful when diagnosing performance problems or just when generally looking at the state of the CF.
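A hedged command-line sketch of the db2pd options mentioned above (the CF ID used here is an assumption; db2instance -list shows the actual IDs in your cluster):
db2pd -cfinfo                     # overall CF status
db2pd -cfinfo 128 gbp lock sca    # structure details for a specific CF
db2pd -cfpool                     # CF connection pool information on the local member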
  • 56. 56 #IDUG Monitoring the Member Environment • ENV_GET_SYSTEM_RESOURCES table function • Returns OS, CPU, memory, and other information related to members on the system • Only returned from members that are active • ENV_GET_DB2_SYSTEM_RESOURCES table function • Returns CPU usage (user and system) for DB2 processes running under the specified members • DB2_MEMBER admin view • Returns information about host name, state, and whether alerts exist for the members • Similar to what you would see in the output of db2instance -list SELECT MEMBER, HOST_NAME, OS_NAME, OS_VERSION, OS_RELEASE FROM TABLE(SYSPROC.ENV_GET_SYSTEM_RESOURCES(-2)) ORDER BY MEMBER SELECT MEMBER, DB2_PROCESS_NAME, DB2_PROCESS_ID, CPU_USER, CPU_SYSTEM FROM TABLE(SYSPROC.ENV_GET_DB2_SYSTEM_RESOURCES(-2)) ORDER BY MEMBER SELECT ID, HOME_HOST, CURRENT_HOST, STATE, ALERT FROM SYSIBMADM.DB2_MEMBER ORDER BY ID States include: STARTED, STOPPED, RESTARTING, WAITING_FOR_FAILBACK, ERROR The ENV_GET_SYSTEM_RESOURCES table function returns operating system, CPU, memory, and other information that is related to members on the system. Data is returned only from members where the database that issued the command is active. The ENV_GET_DB2_SYSTEM_RESOURCES table function returns CPU usage and DB2 process information for specified members in the current instance. The main process of interest will be the DB2 system controller process (db2sysc). The DB2_MEMBER admin view returns information about the host name (the "home host" on which the member will reside when everything is healthy, and the "current host" that the member is actually running on (which will be different if things aren't healthy with this member)), the state of the member (e.g. STARTED, STOPPED, RESTARTING, WAITING_FOR_FAILBACK, and ERROR) and whether there are any alerts for the member. This is similar to what you would see in the output of the db2instance -list command. The DB2_GET_INSTANCE_INFO table function (not shown here) returns the same information as the DB2_MEMBER and DB2_CF administrative views, but enables you to filter the information returned by passing input parameters, such as the current host.
  • 57. 57 #IDUG Viewing Alerts • Existence of alerts shown in output of db2instance -list, DB2_MEMBER, and DB2_CF • Very detailed alert messages available, including the impact and action to take • Two options to list the alerts: SELECT MESSAGE, ACTION, IMPACT FROM SYSIBMADM.DB2_INSTANCE_ALERTS or db2cluster -cm -list -alert • Example output: MESSAGE ---------------------------------------------------------------------------------------------------- Could not restart light DB2 member '0' on hosts 'hostA'. Check the db2diag.log for messages concerning a restart light or database crash recovery failure on the indicated hosts for DB2 member '0'. ALERT_ACTION ---------------------------------------------------------------------------------------------------- This alert must be cleared manually with the command: 'db2cluster -clear -alert -member 0' IMPACT ---------------------------------------------------------------------------------------------------- DB2 member '0' will not be able to restart light on host 'hostC' until this alert has been cleared. There is an ALERT column in the output of the db2instance -list command and in the DB2_MEMBER and DB2_CF admin views. If a member, CF, or host has a value of YES for an alert then you can query more information about it. This can be done using the db2cluster -cm -list -alert command, or you can query from the DB2_INSTANCE_ALERTS admin view. For each alert there is a message, an alert action, and an impact. Some alerts will be cleared automatically whenever the root cause of the problem has been resolved and others may require manual intervention. The text will state what the manual action is, if any.
  • 58. 58 #IDUG Monitoring Logging • Each member has its own log stream and therefore monitoring is typically viewed per-member and not in aggregate • The various admin views, table functions, and procedures that return logging information allow you to query for all members or return information for all members • SYSIBMADM.LOG_UTILIZATION • SYSIBMADM.SNAPDB • SYSIBMADM.SNAPDETAILLOG • MON_GET_UNIT_OF_WORK • MON_GET_WORKLOAD 58 In a pureScale database, there is a single shared copy of the database but each member has its own log stream. Monitoring is typically viewed per-member and not in aggregate (similar to DPF). There are various administrative views, table functions, and stored procedures that are related to monitoring. All of the ones listed here report some level of logging activity. Some of these things accept a MEMBER or DBPARTITIONNUM parameter that allows you to get the information for a specific member, the currently connected to member, or all members. Others return a row per member or database partition number as part of the result set.
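For example (a sketch; column names follow the LOG_UTILIZATION admin view documentation, so verify them against your DB2 level, and in pureScale the partition number column corresponds to the member's log stream), per-member log space usage can be checked with:
db2 "SELECT DBPARTITIONNUM, LOG_UTILIZATION_PERCENT, TOTAL_LOG_USED_KB, TOTAL_LOG_AVAILABLE_KB FROM SYSIBMADM.LOG_UTILIZATION ORDER BY DBPARTITIONNUM"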