2. Please note
IBM’s statements regarding its plans, directions, and intent are subject to
change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a
commitment, promise, or legal obligation to deliver any material, code or
functionality. Information about potential future products may not be
incorporated into any contract. The development, release, and timing of any
future features or functionality described for our products remains at our sole
discretion.
Performance is based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput or performance
that any user will experience will vary depending upon many factors, including
considerations such as the amount of multiprogramming in the user’s job
stream, the I/O configuration, the storage configuration, and the workload
processed. Therefore, no assurance can be given that an individual user will
achieve results similar to those stated here.
4. Assumptions
What we expect you to know
• You have a good understanding of cloud computing concepts
• You have a reasonable working knowledge of relational database design, principles, and architecture
o Some knowledge of DB2 and its features (e.g., DB2 HADR, DB2 pureScale)
• You are familiar with the IBM PureSystems family
o You are aware of the value of pattern-based deployments in the IBM PureSystems family
• Application architecture knowledge is preferred, but not essential
• Knowledge of DBaaS principles is highly appreciated!
5. Agenda
What is this presentation all about?
• The Nature of Scientific Data
o One client’s perspective
o Scientific Data (SD) vs Business Data (BD)
o High reliability and availability for SD management
• DataBase-as-a-Service (DBaaS)
o Why DBaaS and why now?
o Scientific research and DBaaS
o DBaaS in PureSystems
6. About Desert Research Institute (DRI)
Applied research addressing environmental issues globally
Non-profit research arm of the Nevada System of Higher Education
More than 550 scientists, engineers and technicians
Campuses in Reno and Las Vegas
60 specialized labs & research facilities (e.g., Virtual Reality lab)
Non-tenured, entrepreneurial faculty
300 research projects happening on all continents
$459 million in sponsored research projects since 2000
7. The Story
Emergence of innovation-based economy
Disruption by knowledge-based technology
Non-traditional science institute (DRI) adapting
Academia-Government-Industry partnerships
Catalyzing change with IBM PureSystems
New science, new engineering, new model
Cooperating on shared values: innovation clustering among Government, Academia, and Industry, with Society at the center
Government: empowering, responsive, fiscally prudent
Academia: diffusive, relevant, sustainable
Industry: differentiated, competitive, profitable
8. Applied Innovation Center for Advanced Analytics
Supporting Nevada’s Economic Development with Innovation Services
● High Performance Computing
● Data Science & Engineering
● Cyber-physical Systems
● Advanced Visualization
DATA
acquiring, computing, processing,
archiving, correlating, visualizing,
exploring, analyzing, mining, …
9. Why is Scientific Data Important to You?
• SD has the characteristics of Big Data
• SD is your facilities data
• Your BD will become more like SD
• To remain competitive, you need research data
• SD is relevant to your region/planet/solar system/galaxy/universe
Bob Violino, "New IDC Research Shows Impact of Big Data on High Performance Computing Systems," October 28, 2013
Gary M. Johnson, "Convergence: HPC, Big Data & Enterprise Computing," October 28, 2013
10. The Evolution of Scientific Investigation
Ancient Greece: Observation
Renaissance – Enlightenment: Observation, Experimentation
Industrial Revolution – Atomic Age: Observation, Experimentation, Theory
Electronics Age: Observation, Experimentation, Theory, Computation
Data and Communications Age: Observation, Experimentation, Theory, Computation, Telemetry
11. SD Management
Structured, semi-structured or unstructured
Heterogeneous (sources, units, types, dimensions)
Reliance on arrays and other complex data structures
Large data objects; sensitive to I/O & network performance
Distributed data repositories
Repositories are open, or not
Datasets are cleansed, and not
Many protocols, too few (persistent) standards
Increasing need for rigorous data provenance
12. SD is Heterogeneous
Structures: raster, vector, point, relational, human-derived, documents, lab notes, social
Atomic Types*#: array, image, table, tuple, string, reference
Popular Formats: HDF5, netCDF, SEG-Y, FITS, Shapefile, XML, 3DXML, JSON
* Structures can be composed of type float, double, integer, fixed-point, categorical, binary, string
# Data may be noisy and have associated uncertainties
13. Sources of SD
In situ sensing
o Sensor arrays
o RFID
o Smart meters
o Surveillance
Remote sensing
o Active
o Passive
o Aircraft
o Orbital craft (satellite)
Computed/Simulated
o Forecasts
o Earth models
o Hydro models
o Brain simulations
Machine-derived
o Seismograms
o Tomograms
o Gene sequencers
o Accelerators
Human-derived (text, media)
[Figure: in situ sensor node with sensors, ADC, μP, RAM, ROM, NVM, Rx/Tx, I/O, DAC, and actuators]
14. Patterns of SD Database Design
Design 0: File based approaches
Design 1: RDBMS
Data is relational or can be made relational
Design 2: Metadata in RDBMS
Ad hoc management system lacking high availability
Only metadata abstraction is kept in relational database
Design 3: Metadata in RDBMS with file pointers
Metadata is kept in relational database
File pointers to non-relational data also included in RDBMS
Design 4: ETL subsets into a working RDBMS
Spatially register, temporally synchronize, and coherently fuse
data extractions for use in a “working” database
Design 5: NoSQL DBMSs
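To make Design 3 concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the RDBMS (the talk itself uses DB2). The table name `dataset`, its columns, and the `/archive/...` paths are hypothetical. Only searchable metadata and a pointer to each bulk file live in the database; the analysis application opens the files it gets back.

```python
import sqlite3

# Hypothetical "Design 3" schema: relational metadata plus a file pointer.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dataset (
        id         INTEGER PRIMARY KEY,
        instrument TEXT NOT NULL,
        start_time TEXT NOT NULL,  -- ISO 8601, so string order == time order
        fmt        TEXT NOT NULL,  -- e.g. 'netCDF', 'SEG-Y'
        file_uri   TEXT NOT NULL   -- pointer to the non-relational data, not the data
    )
""")
conn.executemany(
    "INSERT INTO dataset (instrument, start_time, fmt, file_uri) VALUES (?, ?, ?, ?)",
    [
        ("radiometer", "2013-06-01T00:00:00", "netCDF", "/archive/rad/20130601.nc"),
        ("radiometer", "2013-06-02T00:00:00", "netCDF", "/archive/rad/20130602.nc"),
        ("seismometer", "2013-06-01T00:00:00", "SEG-Y", "/archive/seis/20130601.sgy"),
    ],
)

def files_for(instrument, since):
    """Search the metadata only; return the file pointers an analysis app should open."""
    rows = conn.execute(
        "SELECT file_uri FROM dataset WHERE instrument = ? AND start_time >= ? "
        "ORDER BY start_time",
        (instrument, since),
    )
    return [uri for (uri,) in rows]

print(files_for("radiometer", "2013-06-02T00:00:00"))
```

The database answers "which files?" quickly and reliably. The large arrays themselves still depend on whatever availability the file store provides, which is the limitation that separates Designs 2 and 3 from Design 1.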
15. Accessing Applications for SD
SD access patterns:
•Large and bursty
•Coupled to data analysis applications
o Data integration
o Feature extraction, segmentation
o Interpolation, regression, kriging
o Correlation
− ~O(N²) complexity
o Pattern discovery
− naively, ~O(N⁴) complexity
o Classification,
Access to software applications and hardware processors needs to be part of the design.
[Figure: Data and APP connected over a network. Where are each of these located? Full Service Cloud: minimal data movement]
16. Jim Gray’s Rules for Database-centric Computing
1. Scientific computing is increasingly data intensive
2. The applications need a scale-out architecture
3. Bring computations to data, rather than the other way around
4. Design the database environment around 20 queries
5. Be agile, be modular, design for change
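Rule 3 can be illustrated with a minimal sketch, again using sqlite3 as a stand-in database and a hypothetical `reading` table. Both functions compute the same per-station mean, but the first pulls every row to the client while the second sends the computation (`AVG ... GROUP BY`) to the data and receives one row per station.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (station TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    [("reno", 20.0), ("reno", 22.0), ("vegas", 30.0), ("vegas", 34.0)],
)

def mean_by_station_client_side():
    # Data moves to the computation: every row crosses the connection.
    sums, counts = {}, {}
    for station, temp in conn.execute("SELECT station, temp_c FROM reading"):
        sums[station] = sums.get(station, 0.0) + temp
        counts[station] = counts.get(station, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

def mean_by_station_in_database():
    # Computation moves to the data: only one result row per station comes back.
    rows = conn.execute("SELECT station, AVG(temp_c) FROM reading GROUP BY station")
    return dict(rows)

print(mean_by_station_in_database())
```

With four rows the difference is invisible; with billions of rows on a remote server, the second form avoids moving almost all of the data.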
17. Examples of SD Databases
Sloan Digital Sky Survey (SDSS)
o 1) 5-band photometric, 2) redshift surveys
o 5 Tpx images, 120 TB processed, 35 TB catalog
o Public data resource with JHU as lead institution
o Rich application portfolio
http://www.sdss.org
1000 Genomes Project
o Part of the Bionimbus scientific cloud (Note ~0.5 TB/genome, ~1 TB/patient)
o Inst. for Genomics & Systems Biology at UChicago
o Human diversity project using Next Gen Sequencing (NGS)
Both SDSS and 1000 Genomes are member projects
in the Open Science Data Cloud (OSDC).
18. Cloud-based, High-Availability, Distributed SD
[Figure: Scientific data in The Contextual Enterprise (adapted from IBM GTO 2013)
Structured, repeatable, linear data (transaction, client app, OLTP): Data Warehouse
Unstructured, exploratory, dynamic data (sensor, RFID, text): Hadoop & Streams
Both feed Content Accumulation and Integration]
19. In Summary
SD is similar to Big Data – heterogeneous, multi-contextual
There is no uniform infrastructure in science
Solutions must be flexible and generally interoperable
SD needs BD reliability and accessibility
SD access is not generally transactional
More typically involves large data extractions for analysis
There are alternative approaches to reliable SD management
RDBMS can be a practical approach to reliable SD access when
coupled with application delivery
As businesses embrace Big Data, they face similar challenges
What is DBaaS for science?
Why DBaaS for science?
How can DBaaS for science be implemented?
20. Why DBaaS for scientific research?
Optimization & integration for delivering higher values
Today, scientific research is starting to rethink its participation, and possible new collaborations, in the different phases of the data lifecycle:
Data Collection → Data Integration → Data Analytics → Data Presentation
• Scientific research is mainly based on HPC practices
o Often deals with unstructured data & file-based processing
o Traditionally has not embraced high-availability business solutions
o Capital cost and funding are significant issues
• Scientific research is just starting to adopt RDBMS processing (where feasible)
o Process less data, and only relevant data, producing results faster
o Improved consumability: research is forced to integrate with other (e.g., commercial, portal) applications to deliver value
21. File-based vs. data-driven processing
[Figure: VMs 1 to N running DB2, with data sizes ranging from MB to GB to TB]
File-based processing
• Parallel or sequential (!!!) file reads
Data-driven processing (files loaded into PureData)
• Single call to the database (parallelism)
• Only the relevant data set is returned to the user
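The contrast on this slide can be sketched in a few lines. This is an illustration under assumptions: sqlite3 stands in for DB2 on PureData (it shows the single-call, only-relevant-rows behavior, not DB2's parallelism), and the in-memory CSV with columns `ts, station, value` is hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a large text file (hypothetical columns: ts, station, value).
RAW = "ts,station,value\n1,reno,0.5\n2,vegas,0.7\n3,reno,0.9\n4,elko,0.1\n"

def file_based(station):
    # Every record is read and parsed, even though most are then discarded.
    reader = csv.DictReader(io.StringIO(RAW))
    return [(int(r["ts"]), float(r["value"])) for r in reader if r["station"] == station]

# Data-driven: load the file into the database once and index the filter column...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (ts INTEGER, station TEXT, value REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [(int(r["ts"]), r["station"], float(r["value"]))
     for r in csv.DictReader(io.StringIO(RAW))],
)
conn.execute("CREATE INDEX obs_station ON obs (station)")

def data_driven(station):
    # ...then a single call returns only the relevant rows.
    return conn.execute(
        "SELECT ts, value FROM obs WHERE station = ? ORDER BY ts", (station,)
    ).fetchall()

print(data_driven("reno"))
```

The load cost is paid once; every later question is a query rather than another full pass over the files.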
22. What is Database as a Service (DBaaS)?
On the PureSystems family (private cloud)
Delivery of database functionality as a service
Defines the architectural and operational approaches of a new service-oriented delivery
Often defined as “Database in a Cloud”
Characteristics of DBaaS architecture:
Self-service interaction models to reduce complexity of database
service delivery - on-demand usage, rapid self-provisioning and
management of database instances
Multi-tenancy capabilities
Elasticity of workloads
Multiple levels of high availability
Automated resource management and monitoring
Metering of database usage (to allow a charge-back functionality)
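Metering for charge-back comes down to aggregating usage records per tenant and pricing them. The following is a minimal sketch only: the `UsageRecord` fields, the tenant names, and both rates are hypothetical and do not describe how PureSystems meters usage.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class UsageRecord:
    tenant: str
    cores: int
    hours: float
    storage_gb: float  # storage allocated to this database for the period

# Hypothetical rates; each organization sets its own charge-back model.
RATE_PER_CORE_HOUR = 0.05
RATE_PER_GB = 0.10

def charge_back(records):
    """Sum metered compute and storage usage into a charge per tenant."""
    bill = defaultdict(float)
    for r in records:
        bill[r.tenant] += r.cores * r.hours * RATE_PER_CORE_HOUR + r.storage_gb * RATE_PER_GB
    return dict(bill)

usage = [
    UsageRecord("lab-a", cores=4, hours=10, storage_gb=100),
    UsageRecord("lab-b", cores=2, hours=5, storage_gb=50),
    UsageRecord("lab-a", cores=2, hours=10, storage_gb=0),
]
print(charge_back(usage))
```

Keeping metering as raw records and pricing them separately lets the rates change without re-collecting usage data.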
23. Why DBaaS? Why now?
The 4 Vs: Volume, Variety, Velocity, Veracity
• Database sprawl and infrastructure growth are overwhelming
o With the growth of data, database infrastructure management has become hugely expensive and complicated, and has introduced many risks
• Self-service technology is needed
o Today we need “IT on demand” for fast business response while keeping up with compliance, reducing risk, and maintaining proper security
• Cost savings from virtualization & smart IaaS are “a must”
o Database needs/volumes grow while IT budgets shrink
• Data-driven business decisions are the only way to go
o The business wants data delivered faster, more simply, and more reliably
• Cost-effectively scaling the data layer
o Companies are looking to replace the traditional, expensive database/infrastructure model for scaling to enterprise-level SLAs
24. New Technical Concepts in DBaaS
• DB Instance: A live database instance
• DB Image: Similar to a HV/VM image, but for databases
o A database backup that includes the metadata to reconstitute a deployment
• DB Clone: The act of creating a DB instance from a DB image
• DB Pattern: A saved set of provisioning parameters to encourage
standardization on the application group side
• Workload Standard: A package that allows a level of customization
for a DB under the virtual application or DB2 Service for Cloud
o Allows configuration of the OS, DB2 instance, DB2 database
o Linked with a workload such as OLTP, Datamart, etc.
• DBaaS: Defines the architectural and operational approaches of a new service-oriented delivery of database functionality (as a service)
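The relationships among these concepts (a pattern captures parameters, an image is a backup plus that metadata, and a clone turns an image back into a live instance) can be modeled in a minimal sketch. Every class, field, and function here is hypothetical and is not the PureSystems API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DBPattern:
    """A saved set of provisioning parameters (hypothetical fields)."""
    name: str
    workload: str      # e.g. "OLTP" or "Datamart"
    db2_version: str
    cpu: int
    memory_gb: int

@dataclass(frozen=True)
class DBImage:
    """A database backup plus the metadata needed to reconstitute a deployment."""
    backup_path: str
    pattern: DBPattern

@dataclass
class DBInstance:
    """A live database instance."""
    name: str
    pattern: DBPattern
    restored_from: Optional[str] = None

def clone(image: DBImage, name: str) -> DBInstance:
    """DB Clone: create a new instance from an image, reusing its saved parameters."""
    return DBInstance(name=name, pattern=image.pattern, restored_from=image.backup_path)

p = DBPattern("oltp-small", "OLTP", "10.5.0.1", cpu=2, memory_gb=8)
test_db = clone(DBImage("/backups/orders.img", p), "orders-test")
print(test_db)
```

Because the image carries the pattern, a clone inherits the standardized settings instead of having them re-entered by hand.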
25. New operational approaches in DBaaS
• Single click provisioning of databases from patterns
• Linked with a workload such as OLTP, Data mart, etc.
• Database can be provisioned via cloning (from backup)
• The database might be a part of application pattern
• A database might be provisioned from another system: integration between PureApplication System and PureData System for Transactions
o Use a Workload Standard to enforce your best practices
• Logs and monitoring are available straight in the console
o Use context links to navigate for troubleshooting, management and
monitoring
• New considerations on upgrades – system and workload upgrades
• Use of command line – only when feasible
26. Where is the database?
A Maximo deployment from pattern
28. DB2 HADR pattern in Virtual System on PureApplication System
Match editions
Match versions
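The "match editions, match versions" guidance can be expressed as a pre-deployment check. This minimal sketch uses hypothetical `DB2Node` records and simply compares strings; it is stricter than DB2 itself, which allows primary and standby to differ briefly during a rolling fix pack update.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DB2Node:
    host: str
    edition: str   # e.g. "Enterprise Server Edition"
    version: str   # version.release.modification.fixpack, e.g. "10.5.0.1"

def hadr_mismatches(primary: DB2Node, standby: DB2Node) -> list:
    """Return a description of each edition/version mismatch (empty list = OK)."""
    problems = []
    if primary.edition != standby.edition:
        problems.append(f"edition: {primary.edition} != {standby.edition}")
    if primary.version != standby.version:
        problems.append(f"version: {primary.version} != {standby.version}")
    return problems

primary = DB2Node("db-a", "Enterprise Server Edition", "10.5.0.1")
standby = DB2Node("db-b", "Enterprise Server Edition", "10.1.0.2")
print(hadr_mismatches(primary, standby))
```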
29. Deploy PureData database as part of application pattern from PureApplication
New option added when
PureData is registered
30. Manage Logging (Database Service Console)
Database Service Console
OS logs
DB2 logs
Agent logs
Hover the cursor over a file; an arrow link pops up.
Click it to download the log file.
31. Pre-integrated DB2 Monitoring
See detailed DB2 metrics from the Workload Console
Launches a new browser tab/window in context, on the Database Overview page.
33. IBM PureSystems & DBaaS
The ideal Platform as a Service (PaaS) for databases
• DBaaS provides deep, built-in integration of application and database server capabilities in a simple but powerful combination, intended to simplify the way applications and databases are designed, deployed, run, and managed.
• DBaaS offers single-click, pattern-based development and deployment through IBM-provided database patterns and workloads. This speeds up the deployment of new applications and databases and enforces the creation of reusable assets for consistent enterprise interactions.
• The ability to create custom patterns and workloads provides an optimized way to establish and enforce enterprise standards.
• Pattern-based management simplifies database development and deployment, while the built-in best practices deliver optimized deployments right out of the box.
• DBaaS simplifies database development even for complex tasks such as creating high availability and disaster recovery (HADR) or DB2 cluster setups.
34. What is new in DBaaS on PureApplication System
DBaaS 1.1.0.8 - Sept 2013
• Added support for DB2 v10.5 (AKA Kepler) and DB2 BLU (for data mart)
o IBM DB2 for BLU Acceleration Pattern was added
• Added HADR for OLTP (HA in same rack with auto failover) (not related to HADR in vSys)
• Increased max VM size to 16 cores and 2TB disk
• Allow manual scaling up for existing DBaaS VM (CPU/Memory/Disk)
• DB2 versions available on IPAS:
o a choice of DB2 10.5.0.1 (DB2 10.5 FP1)
o a choice of DB2 10.1.0.2 (DB2 10.1 FP2)
o a choice of DB2 9.7.0.8 (DB2 9.7 FP8)
NOTE: DBaaS 1.1.0.8 is available separately on Fix Central (9/26/13) from where it
can be downloaded and imported as needed
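The version list above follows DB2's version.release.modification.fixpack numbering, so 10.5.0.1 is 10.5 FP1. A minimal sketch decodes that string and checks a scale-up request against the 1.1.0.8 limits of 16 cores and 2 TB of disk. The function names are hypothetical, and 2 TB is assumed to mean 2048 GB.

```python
MAX_CORES = 16       # maximum VM size in DBaaS 1.1.0.8
MAX_DISK_GB = 2048   # "2TB disk", assuming 1 TB = 1024 GB

def fix_pack_label(vrmf: str) -> str:
    """Turn 'version.release.modification.fixpack' into e.g. 'DB2 10.5 FP1'."""
    version, release, _modification, fixpack = vrmf.split(".")
    return f"DB2 {version}.{release} FP{fixpack}"

def scale_up_allowed(cores: int, disk_gb: int) -> bool:
    """Hypothetical pre-check for a manual scale-up request."""
    return 0 < cores <= MAX_CORES and 0 < disk_gb <= MAX_DISK_GB

for v in ("10.5.0.1", "10.1.0.2", "9.7.0.8"):
    print(v, "->", fix_pack_label(v))
```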
35. Two key takeaways
How does DBaaS apply to your business?
1) Explore the value SD might provide to your business
• Scientific research is motivated to collaborate more than ever
• SD is Big Data
• Rapid transformation in data delivery is required by businesses today and is touching every side of our society
o Even more conservative environments, like scientific research, have to adapt to the new requirements to stay relevant
2) Explore the value of DBaaS for your organization
• The PureSystems family provides an easy way to collaborate
• IBM PureSystems provide an ideal platform for efficient database provisioning and management
• Use the patterns of expertise
o They deliver real value in time and resource savings for applications and databases alike
• Embrace the change DBaaS brings to you and your organization
o Simplicity means automation, less risk, and more reliable, cost-effective data delivery for your business
36. Thank You
Your feedback is important!
• Access the Conference Agenda Builder to
complete your session surveys
o Any web or mobile browser at
http://iod13surveys.com/surveys.html
o Any Agenda Builder kiosk onsite
Questions?
Thomas Jackman, DRI/AIC
Technical Lead for Analysis & Computation
thomas.jackman@dri.edu
Maria Nichole Schwenger, IBM
PureSystems Technical Specialist
schwenge@us.ibm.com
37. Learn More about IBM Cloud
Visit the EXPO
Cloud Booth
SoftLayer Booth
Connected Car
Cloud Sessions
Business Leadership Forums
Connected Car is Mobile, Social, Cloud,
Big Data – Tues, 10-11 a.m. in S. Pacific I
Social, Mobile, Analytics, Cloud, and Beyond for the Automotive Industry – Tues, 4:30-5:45 p.m. in S. Pacific B
Online
Technology Forums
ibm.com/cloud
twitter.com/ibmcloud
youtube.com/ibmcloud
Forty unique Cloud Sessions across 72
time slots – check your event guide for
details!
39. DB2 deployment options in PureApplication system
Virtual systems using DB2 hypervisor-edition images
Ability to create custom patterns
Traditional configuration and administration model
Provides patterns for common topologies
Automated provisioning of images into patterns
DBaaS (Database-as-a-Service) using Database Patterns (virtual applications)
Simplified interaction model
Highly standardized and automated
Integrated life cycle management
Patterns are solutions derived from standardized industry best practices
Shared between users/teams
Connections to existing remote or local databases: an option for both Virtual Applications and Virtual Systems