Iod 2013 Jackman Schwenger

Scientific Research with DBaaS on
IBM PureApplication System &
PureData System for Transactions
IPT – 1961A
Tom Jackman, DRI
Maria N. Schwenger, IBM
Vikram Khatri, IBM

© 2013 IBM Corporation

Please note
IBM’s statements regarding its plans, directions, and intent are subject to
change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a
commitment, promise, or legal obligation to deliver any material, code or
functionality. Information about potential future products may not be
incorporated into any contract. The development, release, and timing of any
future features or functionality described for our products remains at our sole
discretion.

Performance is based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput or performance
that any user will experience will vary depending upon many factors, including
considerations such as the amount of multiprogramming in the user’s job
stream, the I/O configuration, the storage configuration, and the workload
processed. Therefore, no assurance can be given that an individual user will
achieve results similar to those stated here.

Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries
in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are
provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to
any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is
provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or
otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect
of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable
license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may
have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials
is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue
growth or other results.

© Copyright IBM Corporation 2012. All rights reserved.
• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
IBM, the IBM logo, ibm.com, WebSphere, DB2, PureSystems, PureData and PureApplication System are trademarks or registered
trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM
trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate
U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be
registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and
trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.

Assumptions
What we expect you to know
• You have a good understanding of cloud computing concepts
• You have a reasonable working-level knowledge of relational
database design, principles, and architecture
o Some knowledge of the DB2 database and its features (e.g., DB2
HADR, DB2 pureScale)

• You are familiar with the IBM PureSystems family
o You are aware of the value of pattern-based deployments in
IBM PureSystems

• Application architecture knowledge preferred, but not essential
• Knowledge of DBaaS principles is highly appreciated!

Agenda
What is this presentation all about?
• The Nature of Scientific Data
o One client’s perspective
o Scientific Data (SD) vs Business Data (BD)
o High reliability and availability for SD management

• DataBase-as-a-Service (DBaaS)
o Why DBaaS and why now?
o Scientific research and DBaaS
o DBaaS in PureSystems

About Desert Research Institute (DRI)
Applied research addressing environmental issues globally

Non-profit research arm of the Nevada System of Higher Education
• More than 550 scientists, engineers and technicians
• Campuses in Reno and Las Vegas
• 60 specialized labs & research facilities (e.g., Virtual Reality lab)
Non-tenured, entrepreneurial faculty
• 300 research projects happening on all continents
• $459 million in sponsored research projects since 2000

The Story
Emergence of innovation-based economy
• Disruption by knowledge-based technology
• Non-traditional science institute (DRI) adapting
• Academia-Government-Industry partnerships
• Catalyzing change with IBM PureSystems
• New science, new engineering, new model



Cooperating on shared values: innovation clustering
[Figure: Government, Academia, Industry, and Society]
• Government: empowering, responsive, fiscally prudent
• Academia: diffusive, relevant, sustainable
• Industry: differentiated, competitive, profitable

Applied Innovation Center for Advanced Analytics
Supporting Nevada’s Economic Development with Innovation Services

● High Performance Computing
● Data Science & Engineering
● Cyber-physical Systems
● Advanced Visualization

DATA
acquiring, computing, processing,
archiving, correlating, visualizing,
exploring, analyzing, mining, …

Why is Scientific Data Important to You?
• SD has the characteristics of Big Data
• SD is your facilities data
• Your BD will become more like SD
• To remain competitive, you need research data
• SD is relevant to your region/planet/solar system/galaxy/universe

Bob Violino, New IDC Research Shows Impact of Big Data on High Performance Computing Systems, October 28, 2013
Gary M. Johnson, Convergence: HPC, Big Data & Enterprise Computing, October 28, 2013

The Evolution of Scientific Investigation
[Figure: methods of investigation by era]
• Ancient Greece: Observation
• Renaissance – Enlightenment: Observation, Experimentation
• Industrial Revolution – Atomic Age: Theory
• Electronics Age: Theory, Computation
• Data and Communications Age: Observation, Experimentation, Theory, Computation, Telemetry

SD Management
Structured, semi-structured or unstructured
Heterogeneous (sources, units, types, dimensions)
Reliance on arrays and other complex data structures
Large data objects; sensitive to I/O & network performance
Distributed data repositories
Repositories are open, or not
Datasets are cleansed, and not
Many protocols, too few (persistent) standards


Increasing need for rigorous data provenance



SD is Heterogeneous
Structures
• raster
• vector
• point
• relational
• human-derived
o documents
o lab notes
o social

Atomic Types * #
• array
• image
• table
• tuple
• string
• reference

Popular Formats
• HDF5
• netCDF
• SEG-Y
• FITS
• Shapefile
• XML
• 3DXML
• JSON

* Structures can be composed of type float, double, integer, fixed-point, categorical,
binary, string
# Data may be noisy and have associated uncertainties

Sources of SD
• In situ sensing
o Sensor arrays
o RFID
o Smart meters
o Surveillance
• Remote sensing
o Active
o Passive
o Aircraft
o Orbital craft (satellite)
• Computed/Simulated
o Forecasts
o Earth models
o Hydro models
o Brain simulations
• Machine-derived
o Seismograms
o Tomograms
o Gene sequencers
o Accelerators
• Human-derived (text, media)
[Figure: sensor node block diagram. Labels: sensors, actuators, ADC, DAC, I/O, μP, RAM, ROM, NVM, Tx, Rx]

Patterns of SD Database Design
• Design 0: File based approaches
o Ad hoc management system lacking high availability
• Design 1: RDBMS
o Data is relational or can be made relational
• Design 2: Metadata in RDBMS
o Only metadata abstraction is kept in relational database
• Design 3: Metadata in RDBMS with file pointers
o Metadata is kept in relational database
o File pointers to non-relational data also included in RDBMS
• Design 4: ETL subsets into a working RDBMS
o Spatially register, temporally synchronize, and coherently fuse
data extractions for use in a “working” database
• Design 5: NoSQL DBMSs
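
Design 3 can be illustrated with a minimal sketch: metadata rows live in a relational table, and each row carries a pointer to the file that holds the bulk data. This is an illustration only, using SQLite from the Python standard library rather than DB2; the table name, columns, instruments, and file paths are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one metadata row per dataset, plus a pointer
# (path or URI) to the non-relational file holding the bulk data.
SCHEMA = """
CREATE TABLE dataset (
    id          INTEGER PRIMARY KEY,
    instrument  TEXT NOT NULL,
    variable    TEXT NOT NULL,
    start_utc   TEXT NOT NULL,   -- ISO 8601 timestamp
    file_format TEXT NOT NULL,   -- e.g. HDF5, netCDF
    file_uri    TEXT NOT NULL    -- pointer to the file, not the data itself
)
"""


def build_catalog(rows):
    """Create an in-memory catalog and load the metadata rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO dataset "
        "(instrument, variable, start_utc, file_format, file_uri) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def find_files(conn, variable, since_utc):
    """Return file pointers for one variable recorded at or after since_utc."""
    cur = conn.execute(
        "SELECT file_uri FROM dataset "
        "WHERE variable = ? AND start_utc >= ? ORDER BY start_utc",
        (variable, since_utc),
    )
    return [uri for (uri,) in cur]


# Illustrative rows only; the instruments and paths are made up.
EXAMPLE_ROWS = [
    ("lidar-1", "aerosol", "2013-01-05T00:00:00Z", "HDF5", "/archive/lidar/a001.h5"),
    ("lidar-1", "aerosol", "2013-03-09T00:00:00Z", "HDF5", "/archive/lidar/a002.h5"),
    ("gauge-7", "streamflow", "2013-02-01T00:00:00Z", "netCDF", "/archive/hydro/s001.nc"),
]

catalog = build_catalog(EXAMPLE_ROWS)
recent_aerosol = find_files(catalog, "aerosol", "2013-02-01T00:00:00Z")
```

The query touches only the small metadata table; an analysis application would then open just the files whose pointers come back.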

Accessing Applications for SD
SD access patterns:
• Large and bursty
• Coupled to data analysis applications
o Data integration
o Feature extraction, segmentation
o Interpolation, regression, kriging
o Correlation
− ~O(N^2) complexity
o Pattern discovery
− naively, ~O(N^4) complexity
o Classification,

Access to software applications and hardware
processors needs to be part of the design
[Figure: two arrangements of Data and APP, one with a network between them]
Where is each of these located?

Full Service Cloud
minimal data movement
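
One way to read the ~O(N^2) note on correlation is that correlating every pair among N data series requires N(N-1)/2 comparisons. The sketch below assumes that reading, which the slide does not spell out; the series names and values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def all_pairs_correlation(series):
    """Correlate every pair of named series: N*(N-1)/2 pairs for N series."""
    return {
        (a, b): pearson(series[a], series[b])
        for a, b in combinations(sorted(series), 2)
    }


# Made-up example series.
series = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "humidity": [8.0, 6.0, 4.0, 2.0],
    "pressure": [2.0, 4.0, 6.0, 8.0],
}
pairs = all_pairs_correlation(series)
```

With 3 series there are 3 pairs; with 1,000 series there would be 499,500, which is why the location of the data relative to the application matters.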

Jim Gray’s Rules for Database-centric
Computing
1. Scientific computing is increasingly data intensive
2. The applications need a scale-out architecture
3. Bring computations to data, rather than the other way around
4. Design the database environment around 20
queries
5. Be agile, be modular, design for change
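
Rule 3 can be sketched by computing the same mean two ways: pulling every row to the client, or asking the database for the aggregate. This is a toy illustration with SQLite and synthetic readings, not a DB2 or PureData example; the table and function names are hypothetical.

```python
import sqlite3


def make_db(n_sensors=4, n_samples=1000):
    """Build a small in-memory table of synthetic sensor readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (sensor_id INTEGER, value REAL)")
    rows = [
        (s, float(s * 10 + i % 5))
        for s in range(n_sensors)
        for i in range(n_samples)
    ]
    conn.executemany("INSERT INTO reading VALUES (?, ?)", rows)
    return conn


def mean_client_side(conn, sensor_id):
    """Move every row to the client, then filter and average there."""
    rows = conn.execute("SELECT sensor_id, value FROM reading").fetchall()
    values = [v for (s, v) in rows if s == sensor_id]
    return sum(values) / len(values), len(rows)


def mean_in_database(conn, sensor_id):
    """Let the database filter and aggregate; only the result moves."""
    rows = conn.execute(
        "SELECT AVG(value) FROM reading WHERE sensor_id = ?", (sensor_id,)
    ).fetchall()
    return rows[0][0], len(rows)


conn = make_db()
client_mean, client_rows_moved = mean_client_side(conn, 2)
db_mean, db_rows_moved = mean_in_database(conn, 2)
```

Both calls give the same mean, but the first transfers all 4,000 rows and the second transfers one.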

Examples of SD Databases
• Sloan Digital Sky Survey (SDSS)
o 1) 5 band photometric, 2) redshift surveys
o 5 Tpx images, 120 TB processed, 35 TB catalog
o Public data resource with JHU as lead institution
o Rich application portfolio
http://www.sdss.org

• 1000 Genomes Project
o Part of the Bionimbus scientific cloud
(Note ~0.5 TB/genome, ~1 TB/patient)
o Inst. for Genomics & Systems Biology at UChicago
o Human diversity project using Next Gen Sequencing (NGS)

Both SDSS and 1000 Genomes are member projects
in the Open Science Data Cloud (OSDC).

Cloud-based, High-Availability, Distributed SD

Scientific
The Contextual Enterprise
[Figure: adapted from IBM GTO 2013]
• Structured, Repeatable, Linear: Data Warehouse
o Data: Transaction, Client app, OLTP
• Unstructured, Exploratory, Dynamic: Hadoop & Streams
o Data: Sensor, RFID, Text
• Content Accumulation and Integration

In Summary
• SD is similar to Big Data – heterogeneous, multi-contextual
• There is no uniform infrastructure in science
o Solutions must be flexible and generally interoperable
• SD needs BD reliability and accessibility
• SD access is not generally transactional
o More typically involves large data extractions for analysis
• There are alternative approaches to reliable SD management
o RDBMS can be a practical approach to reliable SD access when
coupled with application delivery
• As businesses embrace Big Data, they face similar challenges

What is DBaaS for science?
Why DBaaS for science?
How can DBaaS for science be implemented?

Why DBaaS for scientific research?
Optimization & integration for delivering higher value
Today, scientific research is starting to rethink its participation and
possible new collaborations in the different phases of the data lifecycle:
Data Collection → Data Integration → Data Analytics → Data Presentation

• Scientific research is mainly based on HPC practices
o Often deals with unstructured data & file based processing
o Traditionally has not embraced high-availability business solutions
o Capital cost and funding are significant issues

• Scientific research is just starting to adopt RDBMS processing (where feasible)
o Processes less data, and only relevant data, producing results faster
o Improved consumability - forced to integrate with other (e.g., commercial,
portal) applications to deliver the value

File vs. data driven processing
[Figure: two processing paths compared. Labels: VM 1 ... VM N, TXT, DB2, and data sizes MB, GB, TB]
File based processing
• Parallel or sequential (!!!) file reads
Files loaded into PureData
• Single call to the database (parallelism)
• Only relevant data set is returned to the user

What is Database as a Service (DBaaS)?
On the PureSystems family (private cloud)
• Delivery of database functionality as a service
o Defines the architectural and operational approaches of a new service-oriented delivery
o Often defined as “Database in a Cloud”

Characteristics of DBaaS architecture:
• Self-service interaction models to reduce complexity of database
service delivery - on-demand usage, rapid self-provisioning and
management of database instances
• Multi-tenancy capabilities
• Elasticity of workloads
• Multiple levels of high availability
• Automated resource management and monitoring
• Metering of database usage (to allow a charge-back functionality)
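
The metering characteristic can be sketched as a small charge-back calculation over usage records. Everything here is hypothetical: the record fields, the tenant names, and the rates are invented for illustration and do not describe how PureSystems meters usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One metered interval for one tenant (hypothetical fields)."""
    tenant: str
    cpu_core_hours: float
    storage_gb_days: float


# Hypothetical rates, in arbitrary currency units.
RATE_PER_CORE_HOUR = 0.05
RATE_PER_GB_DAY = 0.01


def charge_back(records):
    """Sum metered usage per tenant and convert it to a cost."""
    totals = {}
    for r in records:
        cost = (
            r.cpu_core_hours * RATE_PER_CORE_HOUR
            + r.storage_gb_days * RATE_PER_GB_DAY
        )
        totals[r.tenant] = totals.get(r.tenant, 0.0) + cost
    return {tenant: round(cost, 2) for tenant, cost in totals.items()}


# Made-up usage for two research groups sharing one system.
records = [
    UsageRecord("hydrology", cpu_core_hours=100, storage_gb_days=500),
    UsageRecord("hydrology", cpu_core_hours=20, storage_gb_days=0),
    UsageRecord("genomics", cpu_core_hours=400, storage_gb_days=2000),
]
bill = charge_back(records)
```

Keeping the tenant on every record is what ties metering to multi-tenancy: costs can be split between groups that share the same hardware.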

Why DBaaS? Why now?
The 4 Vs: Volume, Variety, Velocity, Veracity
• Database sprawl and infrastructure growth are overwhelming
o With the growth of data, database infrastructure management has become
hugely expensive and complicated, and has introduced many risks

• Self-service technology is needed
o Today we need “IT on demand” for fast business response while keeping up
with compliance, reducing risk, and maintaining proper security

• Cost savings from virtualization & smart IaaS are “a must”
o Database needs/volumes grow while IT budgets are shrinking

• Data driven business decisions are the only way to go
o The business wants the data delivered faster, more simply and more reliably

• Cost-effectively scaling the data layer
o Companies are looking to replace the traditional expensive
database/infrastructure model for scaling an enterprise level of SLAs

New Technical Concepts in DBaaS
• DB Instance: A live database instance
• DB Image: Similar to a HV/VM image, but for databases
o Database backup includes the metadata to reconstitute a deployment

• DB Clone: The act of creating a DB instance from a DB image
• DB Pattern: A saved set of provisioning parameters to encourage
standardization on the application group side
• Workload Standard: A package that allows a level of customization
for a DB under the virtual application or DB2 Service for Cloud
o Allows configuration of the OS, DB2 instance, DB2 database
o Linked with a workload such as OLTP, data mart, etc.
• DBaaS: Defines the architectural and operational approaches of a
new service-oriented delivery of database functionality (as a service)
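
The relationships between these concepts can be sketched as a small data model: a pattern holds provisioning parameters and a workload standard, an image records what is needed to reconstitute a deployment, and cloning creates a new instance from an image. All class, field, and function names below are hypothetical; this is not the PureApplication System or PureData System API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class WorkloadStandard:
    """Customization package linked with a workload type."""
    name: str
    workload: str                      # e.g. "OLTP" or "data mart"
    db_config: dict = field(default_factory=dict)


@dataclass(frozen=True)
class DBPattern:
    """A saved set of provisioning parameters."""
    name: str
    cpu_cores: int
    disk_gb: int
    standard: WorkloadStandard


@dataclass(frozen=True)
class DBImage:
    """A backup plus the metadata needed to reconstitute a deployment."""
    source_db: str
    pattern: DBPattern


@dataclass
class DBInstance:
    """A live database instance."""
    instance_id: int
    db_name: str
    pattern: DBPattern
    cloned_from: Optional[str] = None


_ids = count(1)


def provision(db_name, pattern):
    """Create a new instance from a pattern."""
    return DBInstance(next(_ids), db_name, pattern)


def take_image(instance):
    """Capture an image of an existing instance."""
    return DBImage(source_db=instance.db_name, pattern=instance.pattern)


def clone(image, new_name):
    """DB Clone: create a new instance from an image."""
    return DBInstance(
        next(_ids), new_name, image.pattern, cloned_from=image.source_db
    )


oltp = WorkloadStandard("lab-oltp", "OLTP")
small = DBPattern("small-oltp", cpu_cores=4, disk_gb=100, standard=oltp)
prod = provision("fielddata", small)
dev = clone(take_image(prod), "fielddata_dev")
```

Because the image carries the pattern, the clone inherits the same provisioning parameters and workload standard as its source, which is how patterns encourage standardization.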

New operational approaches in DBaaS
• Single click provisioning of databases from patterns
• Linked with a workload such as OLTP, Data mart, etc.
• Database can be provisioned via cloning (from backup)
• The database might be part of an application pattern
• A database might be provisioned from another system - integration
between PureApplication System and PureData System for Transactions

o Use a Workload Standard to enforce your best practices

• Logs and monitoring are available straight in the console
o Use context links to navigate for troubleshooting, management and
monitoring

• New considerations on upgrades – system and workload upgrades
• Use of command line – only when feasible

Where is the database?
A Maximo deployment from pattern

Workload standards and database patterns

Single click database
deployment

DB2 HADR pattern in Virtual System
on PureApplication System

Match editions

Match versions

Deploy PureData database as part of application
pattern from PureApplication

New option added when
PureData is registered

Manage Logging (Database Service Console)

Database Service Console

OS logs
DB2 logs
Agent logs

Hover the cursor over a file; an arrow link
pops up. Click it to download the log file.

Pre-integrated DB2 Monitoring
See detailed DB2 metrics from the Workload Console

Launches a new browser tab or window, in context,
at the Database Overview page.

Further Drill Down: Detailed DB2 metrics

Can drill down & focus on “popular” problems
• Inflight Database Memory Dashboard
• Inflight Rogue Query Dashboard
• Inflight I/O Dashboard
• Inflight Locking Dashboard
• Inflight Logging Dashboard
• Inflight Utilities Dashboard
• Inflight Throughput Dashboard

IBM PureSystems & DBaaS
The ideal Platform as a Service (PaaS) for databases
• DBaaS provides deep built-in integration of application and
database server capabilities in a simple but powerful combination
intended to simplify the way applications and databases are designed,
deployed, run and managed.
• DBaaS offers single-click, pattern-based development and
deployment via IBM-provided database patterns and workloads, which
speeds up the deployment of new applications and databases and
enforces the creation of reusable assets for consistent enterprise
interactions.
• The ability to create custom patterns and workloads provides an
optimized way of establishing and enforcing enterprise standards.
• Pattern-based management simplifies database development
and deployment, while the built-in best practices yield
optimized deployments right out of the box.
• DBaaS provides a simplified way of database development even for
complex tasks such as creating high availability and disaster recovery
(HADR) or DB2 cluster setups.

What is new in DBaaS on PureApplication System
DBaaS 1.1.0.8 - Sept 2013
• Added support for DB2 v10.5 (AKA Kepler) and DB2 BLU (for data mart)
o IBM DB2 for BLU Acceleration Pattern was added
• Added HADR for OLTP (HA in same rack with auto failover) (not related to HADR in vSys)
• Increased max VM size to 16 cores and 2TB disk
• Allow manual scaling up for existing DBaaS VM (CPU/Memory/Disk)
• DB2 versions available on IPAS:
o a choice of DB2 10.5.0.1 (DB2 10.5 FP1)

NOTE: DBaaS 1.1.0.8 is available separately on Fix Central (9/26/13) from where it
can be downloaded and imported as needed
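
The limits listed above (16 cores, 2 TB disk) and the manual scale-up option can be sketched as a simple request check. This is a hypothetical helper, not part of DBaaS; the dictionary keys are invented, no memory limit is checked because the slide does not state one, and 1 TB is taken as 1024 GB as an assumption.

```python
MAX_CORES = 16            # maximum VM size stated for DBaaS 1.1.0.8
MAX_DISK_GB = 2 * 1024    # 2 TB, assuming 1 TB = 1024 GB


def validate_scale_up(current, requested):
    """Return a list of problems; an empty list means the request passes.

    Both arguments are dicts with the hypothetical keys
    "cores", "memory_gb", and "disk_gb".
    """
    problems = []
    for key in ("cores", "memory_gb", "disk_gb"):
        if requested[key] < current[key]:
            problems.append(f"{key}: request is smaller than current size")
    if requested["cores"] > MAX_CORES:
        problems.append(f"cores: {requested['cores']} exceeds {MAX_CORES}")
    if requested["disk_gb"] > MAX_DISK_GB:
        problems.append(f"disk_gb: {requested['disk_gb']} exceeds {MAX_DISK_GB}")
    return problems


current_vm = {"cores": 4, "memory_gb": 16, "disk_gb": 500}
ok_request = {"cores": 8, "memory_gb": 32, "disk_gb": 1000}
bad_request = {"cores": 32, "memory_gb": 8, "disk_gb": 4096}

ok_problems = validate_scale_up(current_vm, ok_request)
bad_problems = validate_scale_up(current_vm, bad_request)
```

The second request fails three checks: memory shrinks, and both cores and disk exceed the stated maximums.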

Two key takeaways
How does DBaaS apply to your business?
1) Explore the value that SD might provide to your business
• Scientific research is motivated to collaborate more than ever
• SD is Big Data
• The PureSystems family provides an easy way to collaborate

2) Explore the value of DBaaS for your organization
• Rapid transformation in data delivery is required by businesses today
and is touching every side of our society
o Even more conservative environments like scientific
research have to adapt to the new requirements to
stay relevant
• IBM PureSystems provide an ideal platform for efficient
database provisioning and management
• Use the patterns of expertise
o They deliver real value in time and resource savings
for applications and databases alike.
• Embrace the change DBaaS brings to you and your
organization
o Simplicity means automation, less risk, and more reliable
and cost-effective data delivery for your business

Thank You
Your feedback is important!
• Access the Conference Agenda Builder to
complete your session surveys
o Any web or mobile browser at

http://iod13surveys.com/surveys.html

o Any Agenda Builder kiosk onsite

Questions?
Thomas Jackman
DRI/AIC
Technical Lead for Analysis & Computation
thomas.jackman@dri.edu

Maria Nichole Schwenger
IBM
PureSystems Technical Specialist
schwenge@us.ibm.com

Learn More about IBM Cloud
Visit the EXPO
• Cloud Booth
• SoftLayer Booth
• Connected Car

Cloud Sessions
• Business Leadership Forums
o Connected Car is Mobile, Social, Cloud,
Big Data – Tues, 10-11 a.m. in S. Pacific I
o Social, Mobile, Analytics, Cloud, and
Beyond for the Automotive Industry - Tues, 4:30-5:45 p.m. in S. Pacific B
• Technology Forums
o Forty unique Cloud Sessions across 72
time slots – check your event guide for
details!

Online
• ibm.com/cloud
• twitter.com/ibmcloud
• youtube.com/ibmcloud

DB2 deployment options in PureApplication System
• Virtual systems using DB2 hypervisor-edition images
o Ability to create custom patterns
o Traditional configuration and administration model
o Provides patterns for common topologies
o Automated provisioning of images into patterns
• DBaaS (Database-as-a-Service) using Database Patterns (virtual applications)
o Simplified interaction model
o Highly standardized and automated
o Integrated life cycle management
o Patterns are solutions derived from standardized industry best practices
o Shared between users/teams
• Connections to existing remote or existing local databases - option for both Virtual
Applications and Virtual Systems
