Derek Laufenberg
derek.laufenberg@actian.com
262-754-4792

1
“It was the best of times ….” With apologies to Dickens, today there are many choices in
data management, and it is truly a “best of times” moment for choice. That choice is a
double-edged sword: databases are not created equal, and neither are problems.
Database designs have inherent tradeoffs forced by the problem the data management system
(DMS) was intended to solve. Selecting the wrong technology can doom a project at worst, or
end up costing it millions over the lifetime of the application.

3
This talk isn't going to identify a “best database” between these two technologies. As we
will see, best is determined by the fit to the particular problem being solved. What I hope
you will gain from our time today is a better understanding of the core components, design
tradeoffs, and intended use cases, so you can make better choices on your next data
management project.

4
Credit – LandScape according to 451 Group, 2012.
Introduction
Databases have been around for over 50 years. From the beginning of electronic
computation, data storage has always been fraught with challenges: what to store, the
format in which to store it, how to retrieve it later, how to protect it, and how to share it.
The challenges of persistence are persistent even today, after 60-odd years of computing.
Being on the technical side of database sales for over 14 years, I've learned that “one size
doesn't fit all” when it comes to data management. Different problems often demand
different approaches. The last 5 or so years have given us an explosion in NewSQL and NoSQL
technologies, all aimed at better solving some part of the persistence problem.

5
Great summary - http://en.wikipedia.org/wiki/A_Tale_of_Two_Cities
http://www.sparknotes.com/lit/twocities/

I chose “A tale of two databases” as the title for today's talk, with apologies to Dickens, as a
motivator to look at two very different database products within the Actian portfolio.
Actian has a large offering of data management and integration products, and I encourage
you to check out our website for the larger picture, but for this discussion we're going to
focus on and look under the covers of only two products, Versant and ParAccel SMP (aka
Vectorwise), to see how they tick, and what makes one an operational DB and the other a
powerful analytic database.
Both are enterprise databases, each with thousands of deployments, but what I find interesting
as a systems engineer is where they share design concepts and the key areas where they
differ.

6
Architecture Overview
The flavor and color of a city is conveyed through its architecture and inhabitants. Without
straining the analogy, the style of a database is also understood through its architecture and
the components from which it's made. Like any pair of modern cities, our two database
protagonists have much in common, but there are important differences which should guide a
systems designer's choice of technology.
Cast of Characters – Our Cities
•Versant Object Database
•ParAccel SMP aka Vectorwise
A Tale of Two Cities is a story about, well, two cities, London and Paris, during the French
Revolution. The cities serve as the main characters, with their politics, geography, and
inhabitants providing the details and coloring for the story. A third major character or theme
in the Dickens novel is water.

7
Daily life for early cities centered around the water. They were built on water to provide
economic advantage and improve the quality of life. Water is life. Uncontrolled or
uncontained, water can also be the ruin of a city. Early city inhabitants weren't always too
careful with what was put into that life-giving river or lake. Fortunately, today we know how
our water cycles work and are much more careful, even reclaiming the once mistreated
bodies of water.
In our modern-day story, our water is data. It flows, it changes, and it has a life cycle all its
own. Data is life for companies today. How it is managed, shaped, and used by a company
greatly affects its overall prosperity.
Today a company's information is just as important. As with water, care must be taken to both
store it and let it flow, creating value from its huge potential.

** kite boarder pictured is the author enjoying water’s potential on Lake Michigan

8
If our protagonists are the databases, our story needs some form of antagonists which our
technical heroes can overcome.
Data management projects have different concerns and the tools used for the project must
match the concerns.

9
10 Duplication prohibited

19.06.2013

Model driven: thinking about your problem domain in classes, modelling in OO
Complex models in OO
Application types often fall into one of these broad categories. Data driven: common
rules used by many applications or reports.
Aggregations found in reporting or data warehouses are a particular strength of Vectorwise.

10 Copyright © 2013 Actian Corporation

10
11
12
Vectorwise is typically deployed at the heart of the BI/Reporting system to provide high
speed reporting. Actian partners with the leading BI & Reporting vendors.

13
Please forgive the marketing here, but the cost-effective commodity hardware shows how
well Vectorwise's redesigned query engine takes advantage of the new CPU and multi-core
designs. More on this later.

14

Versant on the other hand is all about dealing with really complicated problem domains.
The class diagram above just shows a few classes. Typical applications have hundreds, even
thousands of classes.


15
A picture can explain the complexity better.
This is actually a map of a schema: the SID (Shared Information and Data) model.
It has deep inheritance, sometimes 15 levels or more. Collections are all over, and most of
them are polymorphic.

16
With those typical use cases in mind, let's see how these technologies approach the data
management problem.

17
Databases share some common structures when viewed at a high level. The common
elements come from the fact that they are solving the same problems using different
means or with a different focus. But the common structures vary greatly in their
implementation and tradeoffs, and those tradeoffs make one system excel at fast execution
of ad hoc queries and the other at navigating a complex telecommunications network.

18
The data models employed by these two systems again have some similarities, albeit with
different naming conventions and a few wrinkles in how their respective schemas are defined.
Both systems support the basic data types: chars, ints, floats, and strings, with minor variances
in width. In both systems, these basic types are used to compose more complex
structures, tables or classes, which on the surface look pretty similar. The Vectorwise data
model is based on the SQL standard and supports most of the SQL types. The data definition
language (DDL) and data manipulation language (DML) are both SQL. SQL is used to create table
definitions and to insert, update, or delete. We won't be going into the SQL details here as most
people are familiar with the model, but let's compare it to the Versant model, because here
we see some major differences.

19
The two data models diverge in the object database's need to support abstractions
commonly found in object-oriented programming languages. These concepts include
pointers, type inheritance, and collections. This doesn't imply that these
concepts can't be expressed in an RDB like Vectorwise. In fact, ORM tools like JPA or
Hibernate help manage the persistence problem by hiding the relational nature and SQL from
the application developer. However, this hiding isn't without considerable cost in operational
friction, also known as impedance mismatch in the OODB literature.

20
We see here the central SQL focus for Vectorwise.

21
With Versant, we see the application client built with object management resources: a cache,
a transaction manager, and transport over the network.
Part of the friction comes from dealing with the OO concepts mentioned above. The Versant
back end supports these abstractions innately, which is best understood with an example.

22
23

Our Versant Object Database Server, together with the respective client API, stores the
objects, instances of application classes, directly in database storage.
Typically, objects have references to other objects of varying types, including base class or
interface types. Once stored, this network of objects, or any part of it, can be retrieved later by
queries, followed by navigation across object references in the respective language.
Only the objects accessed during a transaction are loaded into the client-side cache.
Once a method is called on a reference to an object that is not yet loaded, that object is
retrieved from the server by a lookup based on its type-independent logical object ID.
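
To make the on-demand loading concrete, here is a minimal Python sketch of a client-side cache keyed by logical object ID. It is not the Versant client API: the names `Server`, `Session`, `Ref`, `fetch`, and `load` are invented for this illustration, and the "server" is just an in-memory dictionary.

```python
class Server:
    """Stands in for the object server: maps a logical object ID to stored state."""
    def __init__(self, objects):
        self.objects = objects
        self.fetches = 0            # counts round trips, for illustration only

    def fetch(self, loid):
        self.fetches += 1
        return dict(self.objects[loid])


class Session:
    """Client-side cache keyed by logical object ID; only touched objects load."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def load(self, loid):
        if loid not in self.cache:              # cache miss: ask the server
            self.cache[loid] = self.server.fetch(loid)
        return self.cache[loid]

    def ref(self, loid):
        return Ref(self, loid)


class Ref:
    """A reference that knows only the object ID until it is first used."""
    def __init__(self, session, loid):
        self._session = session
        self._loid = loid

    def get(self, field):
        return self._session.load(self._loid)[field]


server = Server({
    101: {"name": "Alice", "dept": 201},
    201: {"name": "Engineering"},
    202: {"name": "Sales"},
})
session = Session(server)
employee = session.ref(101)                     # nothing loaded yet
dept = session.ref(employee.get("dept"))        # loads object 101
dept_name = dept.get("name")                    # loads object 201
dept_name_again = dept.get("name")              # served from the cache
```

Object 202 is never touched, so it never leaves the server, and the repeated access to the department costs no extra round trip.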


24

The persistent capable class model of the application corresponds to the schema of the
database.
Persistent capable classes are marked in the source code or listed in a configuration
file. Our tools read this information and generate the additional code that connects simple
classes to our database system.
We add the enhancer step. The enhancer takes the byte code of the application classes and
adds the code that makes classes persistent capable or persistent aware, respectively.
In the source code above, the marked lines create a database connection and
control a transaction.
Please note that only the Employee instance is made persistent explicitly. But because the
Department and Phone instances are reachable from the Employee instance, they are made
persistent as well.
This is ‘persistence by reachability’.
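
The idea can be sketched in a few lines of Python. This is an illustration of the reachability walk only, not how the Versant enhancer or server implements it; the class shapes and the `reachable` helper are assumptions made for the example.

```python
class Phone:
    def __init__(self, number):
        self.number = number


class Department:
    def __init__(self, name):
        self.name = name


class Employee:
    def __init__(self, name, department, phones):
        self.name = name
        self.department = department
        self.phones = phones


# Hypothetical stand-in for "classes marked as persistent capable".
PERSISTENT_TYPES = (Employee, Department, Phone)


def reachable(root):
    """Return every persistent capable object reachable from root (root included)."""
    found, seen, stack = [], set(), [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, PERSISTENT_TYPES):
            found.append(obj)
            stack.extend(vars(obj).values())    # follow the object's fields
        elif isinstance(obj, (list, tuple, set)):
            stack.extend(obj)                   # follow collection members
    return found


dept = Department("Engineering")
emp = Employee("Alice", dept, [Phone("555-0100"), Phone("555-0101")])

# Only the Employee is handed over explicitly; the rest is discovered.
to_store = reachable(emp)
```

Here `to_store` holds the employee, the department, and both phones, even though only `emp` was named.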

For the example, we'll use JPA, Java's de facto persistence standard, as our database binding
language. With JPA we can highlight Versant's implementation details behind OO language
abstractions. The DDL and DML for Versant are Java and the JPA API. This is truly a NoSQL
interface to the database.


25
Annotations within the Java code, coupled with an added compilation step, extract the
schema and give the Java application a direct line into the database.
With JPA, the persistent class's byte code is modified to support change tracking, data
marshaling, cascading persistence, and on-demand object loading logic. Annotations indicate
which classes are destined for the database and support the nuances of how attributes should be
stored. Interestingly, with V/JPA, you need far fewer attribute annotations because the
database better understands OO concepts like inheritance and collections.


Change tracking - We know which objects were modified in the current transaction, and
we store them at commit.
Transparent lazy loading - By default, objects are loaded only once they are de-referenced,
that is, when a method is called on them.
Persistence by reachability - New objects get stored if they are reachable from any already
persisted object. Only the root object of a network of objects needs to be persisted explicitly.
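
Change tracking can be sketched in Python by intercepting attribute writes. This only illustrates the idea; in the Java products the equivalent hooks are added by byte code enhancement, and the `Transaction` and `Tracked` classes below are invented for the example.

```python
class Transaction:
    def __init__(self):
        self.dirty = []     # objects modified since the last commit
        self.stored = []    # stands in for what was written to the database

    def mark_dirty(self, obj):
        if not any(o is obj for o in self.dirty):
            self.dirty.append(obj)

    def commit(self):
        self.stored.extend(self.dirty)   # write only the modified objects
        self.dirty = []


class Tracked:
    """Records itself with the transaction whenever an attribute is assigned."""
    def __init__(self, txn, **fields):
        object.__setattr__(self, "_txn", txn)
        for key, value in fields.items():
            object.__setattr__(self, key, value)   # initial state is not a change

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self._txn.mark_dirty(self)


txn = Transaction()
a = Tracked(txn, name="Alice", salary=100)
b = Tracked(txn, name="Bob", salary=90)

a.salary = 110
a.salary = 120          # a second change does not register the object twice
dirty_before_commit = list(txn.dirty)
txn.commit()
```

Only `a` is written at commit; `b` was read but never modified, so it is left alone.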
JPA is an ORM tool. JSR 220 was principally the work of the RDB community to eliminate
the development friction found when using Java and JDBC to store complicated object
models.
Hiding the persistence implementation from the developer leads to more consistent and
simpler programming. Object-relational mapping details are still needed, and
many of the JPA annotations identify the special handling required to map a
class into one or more tables. Versant has adopted JPA as the latest binding on top of its
object database. Because the back end understands the object model natively, many mapping
annotations aren't required.


26
27
Communications
Communications for both these systems are similar. A Java application for Vectorwise would
use JDBC to query and return data sets, which could then be used to construct the objects
if required by the application's object model.
A JPA O/RM layer could be used here to hide data-set-to-object translations if desired, but
that isn't really Vectorwise's nature. A more typical use would be a BI application accessing
the contents.
Versant JPA uses an internal protocol built with RPC against the object server to load or
update objects within the JPA programming interface. Objects are marshaled in a binary
form and instantiated in the JVM for use by the application. In some cases, incomplete
objects (hollow objects) are created inside the VM, but the lazy loading protocol ensures
they will be fully loaded prior to use by the application.

28
Transactions are central to the operation of both systems. They are the means through
which all data flows in and out of the server. Data creation, updates, deletions, and even
schema manipulation itself are bounded by a transaction.
In 1983, Härder and Reuter coined the term ACID [1] to describe transactions.
Both Versant and Vectorwise (ParAccel SMP) are ACID databases; however, they go about it
through different mechanisms. This brings us to our next comparison, locking and versioning.

[1] Haerder, T.; Reuter, A. (1983). "Principles of transaction-oriented database
recovery". ACM Computing Surveys 15 (4): 287. doi:10.1145/289.291

29
Locking vs Multiversioning
Versant uses a two-phase locking protocol, which gathers locks on all the objects being used to
ensure no two transactions attempt to write to the same data (object). This is
implemented with a lock table and a transaction graph. Shared (read) locks are collected
as the transaction works with data. They are followed by update (semi-exclusive)
or write (exclusive) locks when the transaction attempts to change the data. Deadlocks are
detected, and a timeout prevents a transaction from waiting forever.
With this approach, updates are done in place on the existing data. Very likely the same
physical pages in memory and on disk that the object was read from are updated. The locks
ensure transaction serialization.
I should mention that Versant supports both pessimistic and optimistic locking schemes.
Even optimistic locking takes read and write locks temporarily as objects are read or the
transaction commits.
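
As a rough illustration of how a lock table arbitrates between readers and writers, here is a small Python sketch with shared ("S") and exclusive ("X") modes only. It is a simplification under stated assumptions: the update (semi-exclusive) mode, waiting, deadlock detection, and timeouts described above are left out, and the `LockTable` class is hypothetical rather than Versant's implementation.

```python
class LockTable:
    """Grants shared (S) and exclusive (X) locks per object ID."""
    def __init__(self):
        self.holders = {}       # object ID -> {transaction: mode}

    def acquire(self, txn, oid, mode):
        """Return True if the lock is granted, False if the caller would have to wait."""
        held = self.holders.setdefault(oid, {})
        others = {t: m for t, m in held.items() if t != txn}
        if mode == "S":
            granted = all(m == "S" for m in others.values())
        else:                   # "X" is compatible with no other holder
            granted = not others
        if granted and held.get(txn) != "X":   # never downgrade an X lock
            held[txn] = mode
        return granted

    def release_all(self, txn):
        """Shrinking phase: all locks are dropped together at commit or rollback."""
        for held in self.holders.values():
            held.pop(txn, None)


locks = LockTable()
t1_read = locks.acquire("T1", "obj-7", "S")            # granted
t2_read = locks.acquire("T2", "obj-7", "S")            # readers share
t1_write_blocked = locks.acquire("T1", "obj-7", "X")   # T2 still reads: not granted
locks.release_all("T2")                                # T2 commits
t1_write = locks.acquire("T1", "obj-7", "X")           # upgrade now succeeds
```

The "two phases" are visible in the shape of the calls: a transaction only acquires locks while it works and releases them all at once when it ends.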
The counterpart in Vectorwise is a multiversion concurrency control (MVCC) system,
whereby each transaction sees a consistent database at a given point in time: a snapshot
controlled by the transaction ID. A given transaction won't see a half-completed transaction
operating on the same data because other transactions don't overwrite the original data;
they create a new version with a later transaction ID to prevent contaminating earlier
transactions. No locks or wait graphs need be maintained. Deleted and updated entries
need to be purged if space is a concern.
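
The snapshot rule can be sketched in Python as an append-only version list per key. This is a toy model of the general MVCC idea, not Vectorwise's storage format; `VersionedStore` and its methods are invented for the example, and every write is treated as its own committed transaction.

```python
class VersionedStore:
    def __init__(self):
        self.versions = {}          # key -> list of (commit transaction ID, value)
        self.next_txn_id = 1

    def begin(self):
        """A snapshot is the ID of the last transaction committed so far."""
        return self.next_txn_id - 1

    def write(self, key, value):
        """Commit a new version; older versions are left untouched."""
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.versions.setdefault(key, []).append((txn_id, value))
        return txn_id

    def read(self, key, snapshot):
        """Return the newest version committed at or before the snapshot."""
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot]
        return visible[-1] if visible else None

    def purge(self, oldest_active_snapshot):
        """Drop versions that no active snapshot can still see."""
        for key, entries in self.versions.items():
            keep_from = 0
            for i, (txn_id, _) in enumerate(entries):
                if txn_id <= oldest_active_snapshot:
                    keep_from = i
            self.versions[key] = entries[keep_from:]


store = VersionedStore()
store.write("balance", 100)                 # transaction 1
reader = store.begin()                      # snapshot taken after transaction 1
store.write("balance", 250)                 # transaction 2 does not overwrite

seen_by_reader = store.read("balance", reader)
seen_by_new = store.read("balance", store.begin())
store.purge(oldest_active_snapshot=store.begin())
```

The earlier reader keeps seeing 100 without any lock, a new transaction sees 250, and the purge step is where the space used by the superseded version is reclaimed.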

We have two different means of managing concurrency and serialization of transactions. The
Versant method is historically similar to that of RDBMSs which support row and table locking.
Vectorwise's MVCC increases throughput at the expense of data growth and the propagation
events it needs.
If you require strict serialization of transactions, or want to limit growth, the locking model
will suit your needs. If analytic speed and concurrent reads are your core concerns,
MVCC will be faster, at the possible cost of stale data.
We are starting to see why Vectorwise is used for analytic, read-heavy reporting and Versant
finds itself used for operational processing.

30
One major difference between these technologies is in how they physically store
data, both on disk and in memory. Of particular interest to me is Vectorwise's
columnar approach, which is designed for pure analytical efficiency. This is in contrast to the
underlying storage model used by Versant, which is similar to what is found in many
database systems.
Versant follows the older N-ary Storage Model (NSM) design, but there are some interesting
tricks it uses to optimize performance for networked object graphs.
[DeWitt] [Zukowski]
NSM = N-ary Storage Model: a row contains all columns
DSM = Decomposed Storage Model: N attributes split into N vertical storage elements
PAX = Partition Attributes Across: multiple columns stored on a page, with attributes stored
vertically within the page
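
The three layouts are easiest to compare side by side on the same data. The following Python sketch uses plain lists to stand in for pages; the sample rows and the `pax` helper are made up for the illustration.

```python
rows = [
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Carol", 41),
    (4, "Dave", 38),
]

# NSM: each record keeps all of its attributes together.
nsm = [list(r) for r in rows]

# DSM: one vertical storage element per attribute.
dsm = [list(col) for col in zip(*rows)]


def pax(rows, rows_per_page):
    """PAX: rows are grouped into pages; inside a page, values are stored per column."""
    pages = []
    for i in range(0, len(rows), rows_per_page):
        chunk = rows[i:i + rows_per_page]
        pages.append([list(col) for col in zip(*chunk)])
    return pages


pax_pages = pax(rows, rows_per_page=2)

# An aggregate over one attribute touches a single element in DSM,
# one mini-column per page in PAX, and every record in NSM.
ages_from_dsm = dsm[2]
ages_from_pax = [age for page in pax_pages for age in page[2]]
ages_from_nsm = [record[2] for record in nsm]
```

All three hold the same values; what differs is how much unrelated data sits next to the column a query actually needs.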

Vectorwise
Block size must be set prior to table creation.

31
Versant does allow for variability on the page, multiple types of objects or variable length
structures.
The min/max stats help reduce the columnar blocks that need be evaluated for a query.
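
A minimal Python sketch of block skipping with min/max statistics follows. The block size, the sample values, and the function names are assumptions for the example and say nothing about Vectorwise's actual block format.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def scan_greater_than(blocks, threshold):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block["max"] <= threshold:
            continue                # no value in this block can match: skip it
        blocks_read += 1
        matches.extend(v for v in block["values"] if v > threshold)
    return matches, blocks_read


order_totals = [5, 8, 3, 9, 12, 15, 11, 14, 40, 42, 39, 47]
blocks = build_blocks(order_totals, block_size=4)
matches, blocks_read = scan_greater_than(blocks, threshold=30)
```

With these values only the last of the three blocks is read. The benefit depends on the data being at least loosely clustered; on randomly ordered values most blocks would span the threshold and still have to be scanned.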

32
Compression of data both on disk and in RAM reduces the I/O bottlenecks that large data
systems confront today. By decompressing into the CPU's cache, VW takes advantage of the
processor's own bandwidth instead of waiting on slower memory or disk.
Column structure works really well for compression. Similar data is grouped together,
allowing VW to pick an optimal compression strategy. Here optimal means not just storage
density, but also ease of decompression into the CPU cache for later processing.
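
To see why grouping similar values helps, consider run-length encoding, one simple lightweight scheme. The Python sketch below is only an illustration of the principle; it is not a claim about which compression schemes VW actually selects.

```python
def rle_encode(column):
    """Collapse consecutive repeats into [value, count] runs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand runs back into the original column; a simple, branch-light loop."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


country = ["DE"] * 5 + ["FR"] * 3 + ["US"] * 4      # one column, similar values adjacent
encoded = rle_encode(country)
decoded = rle_decode(encoded)
```

Twelve stored values shrink to three runs, and decoding is cheap. The same values interleaved with other attributes in a row layout would offer no such runs to exploit.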

33
34
Although Versant uses a traditional layout, where objects are located on a given data page,
there are some tricks it uses to efficiently locate connected objects.
Common to most database storage systems are the concepts of volumes and pages. A
volume is a collection of pages, and Versant can have as many volumes as needed for the
database. A volume is mapped to a file and can be located on anything from raw devices to
storage area network (SAN) drives.
Pages are further broken down into slots used to store object instances. Multiple object
instances may be stored on a page and accessed through the object's slot location. Larger
objects will span contiguous pages. Page size in Versant is a modest 16K bytes; this is often
large enough to hold many objects and still small enough not to waste too much space on
deleted objects. Normally, objects of the same type are stored in the same page in the next
available slot, but as an optimization, it is possible to co-locate a parent and its children on
the same page. This extra effort results in extremely efficient object loading when the
parent is frequently used with its children.

35

The LOID is used to identify an object and represent references, but how is it used to locate
an object internally?
Central to accessing any object is the address translation of the LOID to a physical
volume:page:slot triple. This triple identifies the object's location on disk, and the
translation is done by a multi-level hash table. It is highly optimized and cached in memory
since it is used for accessing every object.
On the client side, the red object is already loaded in the client cache. It contains references to
two other objects, shown in grey, that are not yet loaded. If the application now calls a
method on either of these, the LOID of that object is looked up in the client-side hash
table. It has no address there, so the LOID is sent to the server, where a lookup is done in the
Association Table (AT). That lookup provides the physical object location in the respective
data volume.
The physical page is loaded into the server cache, and the object is sent back to the client
and instantiated in client memory.
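
The server-side half of that lookup can be sketched in Python with a dictionary standing in for the address translation structure. The real structure is a multi-level hash table; the single-level dictionary, the `ObjectServer` and `Volume` classes, and the sample LOIDs below are simplifications invented for the example.

```python
class Volume:
    def __init__(self):
        self.pages = {}                     # page number -> list of slots

    def put(self, page, slot, obj):
        slots = self.pages.setdefault(page, [])
        while len(slots) <= slot:           # grow the page's slot list as needed
            slots.append(None)
        slots[slot] = obj


class ObjectServer:
    def __init__(self):
        self.volumes = {}
        self.address_table = {}             # LOID -> (volume, page, slot)

    def store(self, loid, volume, page, slot, obj):
        self.volumes.setdefault(volume, Volume()).put(page, slot, obj)
        self.address_table[loid] = (volume, page, slot)

    def fetch(self, loid):
        """Translate the logical ID to a physical location, then read the slot."""
        volume, page, slot = self.address_table[loid]
        return self.volumes[volume].pages[page][slot]


server = ObjectServer()
# A parent and its child are deliberately placed on the same page.
server.store(loid=9001, volume="vol0", page=12, slot=3,
             obj={"type": "Node", "name": "router-a"})
server.store(loid=9002, volume="vol0", page=12, slot=4,
             obj={"type": "Port", "name": "eth0"})

address = server.address_table[9002]
port = server.fetch(9002)
```

Because references carry the LOID rather than a physical address, an object can be moved to another page by updating one table entry, without touching any object that points to it.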


36
One final point about the LOID in Versant. LOID references are designed to be cross-database
references. Here we have an application using 4 objects, but they come from two
different databases.
This gives the application designer great flexibility in deciding how to partition data. The
application simply connects to all the databases involved in the cross-database references.
Transactions then use a two-phase commit protocol.

37
What good is a database without a means to find answers to our particular questions or to
efficiently service an application's demand for data? As with the other components we've
looked at, there are some similarities between these two technologies, but also some big
differences.

Indexes in Vectorwise are typically not needed. Often, VW is set up so the compressed DB
lives entirely in memory, and the automatic page-level indexes and the redesigned query
engine mean that scanning the data performs well enough that no index tuning
is required.
Versant, on the other hand, allows nearly any attribute or even collection to be indexed.
Versant's query engine will then use the index automatically or with hints supplied by the
user.

38
Circa 2003
SQL vs. C benchmark on TPC-H
The difference between the database and the custom C program is huge. Why is the
overhead of using a database so high, and what's being left on the table?
This difference started the X100 project, an attempt to reclaim the 100-fold loss in performance.
We've seen the storage model change for VW, but let's look further at the query
processing.

39
Each level of the data handling was studied for performance loss.
Compiler optimizations are easier to take advantage of in smaller units. They often don't get
fully exploited in large programs.
Modern CPUs have better instruction sets and larger on-chip caches, which can be used for
vector processing.
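
The contrast between tuple-at-a-time and vector-at-a-time execution can be shown structurally in Python. This sketch only illustrates the shape of the two approaches; Python itself will not show the CPU-level gains, and the function names and vector size are assumptions for the example rather than X100 internals.

```python
def multiply(a, b):
    """A primitive invoked once per row in the tuple-at-a-time style."""
    return a * b


def tuple_at_a_time(prices, quantities):
    total = 0
    for p, q in zip(prices, quantities):
        total += multiply(p, q)             # one call, and its overhead, per row
    return total


def multiply_vector(a_vec, b_vec):
    """A primitive invoked once per vector: a small, tight loop over many values."""
    return [a * b for a, b in zip(a_vec, b_vec)]


def vector_at_a_time(prices, quantities, vector_size=4):
    total, calls = 0, 0
    for i in range(0, len(prices), vector_size):
        products = multiply_vector(prices[i:i + vector_size],
                                   quantities[i:i + vector_size])
        calls += 1                          # call overhead is paid once per vector
        total += sum(products)
    return total, calls


prices = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
quantities = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

row_total = tuple_at_a_time(prices, quantities)
vec_total, primitive_calls = vector_at_a_time(prices, quantities)
```

Both produce the same total, but the vectorized version makes 3 primitive calls instead of 10. In a compiled engine, the small loop inside each primitive is the unit the compiler can optimize well, and a vector sized to fit the CPU cache keeps the working data close to the processor.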

40
The result of the work: roughly a 40x improvement.

41
The work on Ingres is critical to Vectorwise (ParAccel SMP), as the main interface to
Ingres and Vectorwise is the same. It is not until the optimizer, which processes the
SQL query and generates the X100 algebra, that the two products separate.
After VW generates the result set, it is the Ingres components that make it available
to the application.

Ailamaki, DeWitt, Hill, Skounakis – Weaving Relations for Cache Performance

42
Taken together, these performance features define VW query processing.
This is great for reporting, where your data isn't changing frequently.

43
With Versant, queries are typically used to locate the beginning of a graph or top-level
objects. Once the starting point is identified, the connected objects are frequently retrieved
by the application as required (lazy loading) or automatically with a default fetch group. The
group loading saves round trips to the server and is much more efficient on the network.

On the Versant side, queries are written in OQL or JPQL. This example is JPQL. The Book has a
simple collection, authors, and we want to find books with an author named “Smith”.
Notice the syntax is a little SQL-like, but we operate directly on the collection
Book.authors, using “auth” as a working variable.
On execution, the Book extent would be searched for all the books with a Smith author.
This would end up scanning all the books and evaluating the authors collections, returning
the object IDs for the matching books.
ResultList holds the objects, and the rest of the Java program would process that list.
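
What the server does for such a query, absent an index, can be mimicked in a few lines of Python. The classes, sample data, and `books_by_author` helper are invented for the illustration, and the JPQL in the comment is one plausible form of the query, not necessarily the text shown on the slide.

```python
# One plausible JPQL form of the query (an assumption, not the slide's text):
#   SELECT b FROM Book b, IN(b.authors) auth WHERE auth.name = 'Smith'

class Author:
    def __init__(self, name):
        self.name = name


class Book:
    def __init__(self, title, authors):
        self.title = title
        self.authors = authors          # a plain collection of Author objects


# The "extent": all stored instances of Book.
book_extent = [
    Book("Object Stores", [Author("Smith"), Author("Jones")]),
    Book("Column Stores", [Author("Lee")]),
    Book("Query Engines", [Author("Smith")]),
]


def books_by_author(extent, name):
    """Scan the extent and evaluate each book's authors collection."""
    return [b for b in extent if any(auth.name == name for auth in b.authors)]


result_list = books_by_author(book_extent, "Smith")
titles = [b.title for b in result_list]
```

The variable `auth` plays the same role as the working variable in the JPQL: it ranges over the members of each book's collection, with no link table or join in sight.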

44
The thing about relationships is they don't change often. By baking them into the server's
data structures and making them cheap to evaluate, Versant avoids join operations, which
can be quite costly. If you look at typical ORM code, you see a fair amount of join activity
whenever collection classes are involved.
Following a few links down a list can end up as a very expensive group of joins, whereas
managing the references with LOIDs allows for direct navigation to the object. The server
takes advantage of this in query expressions that involve paths or collections, like the
example.
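
The difference between joining and navigating can be sketched in Python. The example is deliberately simplified: the relational side scans an unindexed link table, which a real RDBMS would usually avoid with an index, and all table contents, identifiers, and helper names are invented for the illustration.

```python
# Relational style: the relationship lives in a separate link table.
authors = {10: "Smith", 11: "Jones", 12: "Lee"}
book_author = [(1, 10), (1, 11), (2, 12)]       # (book ID, author ID)


def authors_via_join(book_id):
    """Scan the link table, then probe the author table for each match."""
    names, rows_examined = [], 0
    for b_id, a_id in book_author:
        rows_examined += 1
        if b_id == book_id:
            names.append(authors[a_id])
    return names, rows_examined


# Object style: the book holds the identifiers of its authors directly.
objects = {
    "loid:1": {"title": "Object Stores", "authors": ["loid:10", "loid:11"]},
    "loid:10": {"name": "Smith"},
    "loid:11": {"name": "Jones"},
}


def authors_via_navigation(book_loid):
    """Follow each stored reference straight to its target object."""
    book = objects[book_loid]
    names = [objects[ref]["name"] for ref in book["authors"]]
    return names, len(book["authors"])


join_names, join_rows = authors_via_join(1)
nav_names, nav_lookups = authors_via_navigation("loid:1")
```

Both return the same authors. The navigation cost grows with the number of references followed, while the join cost grows with the size of the tables involved (or of their indexes), which is why deep paths through an object model favor the reference-based approach.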

45
Closing Comments
This brings us to the end of our tale, and I hope you enjoyed our time together as much as I
did. Each of the components we've examined should have given you insight into the design
tradeoffs made by the different engineering teams. Taken as a whole, they provide a
consistent, powerful framework for solving hard real-world problems. Each of these
products has thousands of users who rely on it for business-critical
applications. The engineers who built those applications made strategic choices for the
data management system at the heart of their project.

46
47

Database Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTODatabase Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTO
Database Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTO✔ Eric David Benari, PMP
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...✔ Eric David Benari, PMP
 
Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...
Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...
Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...✔ Eric David Benari, PMP
 
Database Camp 2016 @ United Nations, NYC - Amir Orad, CEO, Sisense
Database Camp 2016 @ United Nations, NYC - Amir Orad, CEO, SisenseDatabase Camp 2016 @ United Nations, NYC - Amir Orad, CEO, Sisense
Database Camp 2016 @ United Nations, NYC - Amir Orad, CEO, Sisense✔ Eric David Benari, PMP
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph✔ Eric David Benari, PMP
 
Database Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, Couchbase
Database Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, CouchbaseDatabase Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, Couchbase
Database Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, Couchbase✔ Eric David Benari, PMP
 
MariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UN
MariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UNMariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UN
MariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UN✔ Eric David Benari, PMP
 
MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...
MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...
MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...✔ Eric David Benari, PMP
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...✔ Eric David Benari, PMP
 

Plus de ✔ Eric David Benari, PMP (10)

SVP of Couchbase: The Exciting World of NoSQL: Scaling NoSQL Data, N1QL vs. S...
SVP of Couchbase: The Exciting World of NoSQL: Scaling NoSQL Data, N1QL vs. S...SVP of Couchbase: The Exciting World of NoSQL: Scaling NoSQL Data, N1QL vs. S...
SVP of Couchbase: The Exciting World of NoSQL: Scaling NoSQL Data, N1QL vs. S...
 
Database Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTO
Database Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTODatabase Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTO
Database Camp 2016 @ United Nations, NYC - Javier de la Torre, CEO, CARTO
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
 
Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...
Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...
Database Camp 2016 @ United Nations, NYC - Minerva Tantoco, CTO of the City o...
 
Database Camp 2016 @ United Nations, NYC - Amir Orad, CEO, Sisense
Database Camp 2016 @ United Nations, NYC - Amir Orad, CEO, SisenseDatabase Camp 2016 @ United Nations, NYC - Amir Orad, CEO, Sisense
Database Camp 2016 @ United Nations, NYC - Amir Orad, CEO, Sisense
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Database Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, Couchbase
Database Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, CouchbaseDatabase Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, Couchbase
Database Camp 2016 @ United Nations, NYC - Bob Wiederhold, CEO, Couchbase
 
MariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UN
MariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UNMariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UN
MariaDB 10.2 & MariaDB 10.1 by Michael Monty Widenius at Database Camp 2016 @ UN
 
MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...
MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...
MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My! Scott Bonn...
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
 

Dernier

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

NoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases

  • 2. “It was the best of times…” With apologies to Dickens, today there are many choices in data management; it is truly a “best of times” moment for choice. That choice is a double-edged sword: databases are not created equal, and neither are the problems they solve. Database designs have inherent tradeoffs forced by the problem the DMS was intended to solve. Selecting the wrong technology can doom a project at worst, or end up costing it millions over the lifetime of the application.
  • 3. This talk isn't going to identify a “best database” between these two technologies; as we will see, best is determined by the fit to the particular problem being solved. What I hope you will gain from our time today is a better understanding of the core components, design tradeoffs, and intended use cases, so you can make better choices on your next data management project.
  • 4. Credit – LandScape according to 451 Group, 2012. Introduction Databases have been around for over 50 years. From the beginning of electronic computation, data storage has always been fraught with challenges: what to store, the format in which to store it, how to retrieve it later, how to protect it, and how to share it. The challenges of persistence are persistent even today, after 60-odd years of computing. Being on the technical side of database sales for over 14 years, I've learned that “one size doesn't fit all” when it comes to data management. Different problems often demand different approaches. The last 5 or so years have given us an explosion in NewSQL and NoSQL technologies, all aimed at better solving some part of the persistence problem.
  • 5. Great summary - http://en.wikipedia.org/wiki/A_Tale_of_Two_Cities http://www.sparknotes.com/lit/twocities/ I chose “A tale of two databases” as the title for today's talk, with apologies to Dickens, as a motivator to look at two very different database products within the Actian portfolio. Actian has a large offering of data management and integration products, and I encourage you to check out our website for the larger picture, but for this discussion we're going to focus on, and look under the covers of, only two products: Versant and ParAccel SMP (aka Vectorwise). We will see how they tick, and what makes one an operational database and the other a powerful analytic database. Both are enterprise databases, each with thousands of deployments, but what I find interesting as a systems engineer is where they share design concepts and the key areas where they differ.
  • 6. Architecture Overview The flavor and color of a city is conveyed through its architecture and inhabitants; without straining the analogy, the style of a database is also understood through its architecture and the components from which it's made. Like any pair of modern cities, our two database protagonists have much in common, but there are important differences which should guide a systems designer's choice of technology. Cast of Characters – Our Cities • Versant Object Database • ParAccel SMP aka Vectorwise. A Tale of Two Cities is a story about, well, two cities, London and Paris, during the French Revolution. The cities serve as the main characters, with their politics, geography and inhabitants providing the details and coloring for the story. A third major character or theme in the Dickens novel is water.
  • 7. Daily life for early cities centered around the water. They were built on water to provide economic advantage and improve the quality of life. Water is life. Uncontrolled or uncontained, water can also be the ruin of a city. Early city inhabitants weren't always too careful with what was put into that life-giving river or lake. Fortunately, today we know how our water cycles work and are much more careful, even reclaiming the once mistreated bodies of water. In our modern-day story, our water is data. It flows, it changes, and it has a life cycle all its own. Data is life for companies today. How it is managed, shaped, and used by a company greatly affects its overall prosperity. Today a company's information is just as important. As with water, care must be taken both to store it and to let it flow, creating value from its huge potential. ** kite boarder pictured is the author enjoying water's potential on Lake Michigan
  • 8. If our protagonists are the databases, our story needs some form of antagonists for our technical heroes to overcome. Data management projects have different concerns, and the tools used for a project must match its concerns.
  • 9. Duplication prohibited 19.06.2013. Model driven: thinking about your problem domain in classes, modelling in OO, complex models in OO. Application types can often fall into one of these two broad categories. Data driven: common rules used by many applications or reports. Aggregations found in reporting or data warehouses are a particular strength of Vectorwise. Copyright © 2013 Actian Corporation
  • 10.
  • 11.
  • 12. Vectorwise is typically deployed at the heart of the BI/Reporting system to provide high speed reporting. Actian partners with the leading BI & Reporting vendors.
  • 13. Please forgive the marketing here, but the cost-effective commodity hardware shows how well Vectorwise's re-designed query engine takes advantage of the new CPU and multi-core designs. More on this later.
  • 14. Versant, on the other hand, is all about dealing with really complicated problem domains. The class diagram above shows just a few classes. Typical applications have hundreds, even thousands, of classes.
  • 15. A picture can explain the complexity better. This is actually a map of the schema for SID, the Shared Information and Data model. Deep inheritance, sometimes 15 levels or more. Collections all over, most of them polymorphic.
  • 16. With those typical use cases in mind, let's see how these technologies approach the data management problem.
  • 17. Databases share some common structures when viewed at a high level. The common elements come from the fact that they are solving the same problems by different means or with a different focus. But these common structures vary greatly in their implementation and tradeoffs, which is what makes one system excel at fast execution of ad hoc queries and another at navigating a complex telecommunications network.
  • 18. The data models employed by these two systems again have some similarities, albeit with different naming conventions and a few wrinkles in how their respective schemas are defined. Both systems support the basic data types: chars, ints, floats, and strings, with minor variances in width. In both systems, these basic types are used to compose more complex structures, tables or classes, which on the surface look pretty similar. The Vectorwise data model is based on the SQL standard and supports most of the SQL types. The data definition language (DDL) and data manipulation language (DML) are SQL: SQL is used to create table definitions and to insert, update, or delete rows. We won't be going into the SQL details here, as most people are familiar with the model, but let's compare it to the Versant model, because here we see some major differences.
  • 19. Where the two data models diverge is in the object database's need to support abstractions commonly found in object-oriented programming languages: pointers, type inheritance, and collections. This doesn't imply that these concepts can't be expressed in an RDB like Vectorwise; in fact, ORM tools like JPA or Hibernate help manage the persistence problem by hiding the relational nature and SQL from the application developer. However, this hiding isn't without considerable cost in operational friction, also known as impedance mismatch in the OODB literature.
  • 20. We see here the central SQL focus for Vectorwise.
  • 21. With Versant, we see the application client built with object management resources: cache, transaction manager, and transport over the network. Part of the friction comes from dealing with the OO concepts mentioned above. The Versant backend supports these abstractions innately, which is best understood with an example.
  • 22.
  • 23. Our Versant Object Database Server, together with the respective client API, stores the objects (instances of application classes) directly in database storage. Typically, objects have references to other objects of varying types, such as base class or interface types. Once stored, this network of objects, or any part of it, can be retrieved later by queries, followed by navigation across object references in the respective language. Only the objects accessed during a transaction are loaded into the client-side cache. Once a method is called on a reference to a not-yet-loaded object, that object is retrieved from the server by a lookup based on its type-independent logical object ID.
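The on-demand loading described here can be sketched in a few lines of Python. This is only an illustration of the idea, not Versant's client API: the names `ObjectServer`, `ClientCache`, `HollowRef`, and `Loid` are hypothetical, and a real client would also handle transactions, locking, and network transport.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loid:
    """Hypothetical logical object ID: identifies an object independent of its type."""
    value: int


class ObjectServer:
    """Stand-in for the database server: maps logical object IDs to stored state."""

    def __init__(self):
        self._store = {}
        self.fetch_count = 0  # counts round trips, for illustration only

    def put(self, loid, state):
        self._store[loid] = dict(state)

    def fetch(self, loid):
        self.fetch_count += 1
        return dict(self._store[loid])


class ClientCache:
    """Client-side cache: holds only the objects touched so far."""

    def __init__(self, server):
        self._server = server
        self._loaded = {}

    def ref(self, loid):
        # Creating a reference does not contact the server.
        return HollowRef(self, loid)

    def resolve(self, loid):
        # Load from the server on first access, then serve from the cache.
        if loid not in self._loaded:
            self._loaded[loid] = self._server.fetch(loid)
        return self._loaded[loid]


class HollowRef:
    """A reference that loads its target only when a field is actually read."""

    def __init__(self, cache, loid):
        self._cache = cache
        self.loid = loid

    def get(self, field):
        value = self._cache.resolve(self.loid)[field]
        # A stored Loid becomes another hollow reference, so navigation stays lazy.
        return self._cache.ref(value) if isinstance(value, Loid) else value
```

Navigating from an employee to its department would trigger one fetch per object touched, and repeated reads of an already loaded object would be served from the cache.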
  • 24. The persistent-capable class model of the application corresponds to the schema of the database. Persistent-capable classes are marked in the source code or listed in a configuration file. Our tools read this information and generate the additional code that connects simple classes to our database system. We add the enhancer step: the enhancer takes the byte code of the application classes and adds the code that makes classes persistent capable and persistent aware, respectively. In the source code above, the marked lines create a database connection and control a transaction. Please note that only the Employee instance is made persistent explicitly, but because the Department and Phone instances are reachable from the Employee instance, they are made persistent as well. This is “persistence by reachability”. For the example, we'll use Java's de facto persistence standard, JPA, as our database binding language. With JPA we can highlight Versant's implementation details behind OO language abstractions. The DDL and DML for Versant are Java and the JPA API. This is truly a NoSQL interface to the database.
  • 25. Annotations within the Java code, coupled with an added compilation step to extract the schema, give the Java application a direct line into the database. With JPA, the persistent class's byte code is modified to support change tracking, data marshaling, cascading persistence, and on-demand object loading logic. Annotations indicate which classes are destined for the database and support the nuances of how attributes should be stored. Interestingly, with V/JPA you need far fewer attribute annotations because the database better understands OO concepts like inheritance and collections.
  • 26. Change tracking: we know which objects were modified in the current transaction, and we store them at commit. Transparent lazy loading: by default, objects are loaded only once they are de-referenced, that is, when a method is called on them. Persistence by reachability: new objects get stored if they are reachable from any already persisted object, so only the root object of a network of objects needs to be persisted explicitly. JPA is an ORM tool; JSR 220 was principally the work of the RDB community to eliminate the development friction found when using Java and JDBC to store complicated object models. Hiding the persistence implementation leads to more consistent and simpler programming for the developer. Object-relational mapping details are still needed, and many of the JPA annotations are used to identify special handling required for mapping a class into one or more tables. Versant has adopted JPA as the latest binding on top of its object database. Because the back end understands the object model natively, many mapping annotations aren't required.
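A minimal sketch of change tracking and persistence by reachability, written in Python rather than JPA so it runs without a database. Everything here is an assumption for illustration: `Persistent`, `Session`, and the simulated `stored` dictionary are hypothetical names, and a real enhancer would inject the tracking code into byte code instead of relying on a base class.

```python
class Persistent:
    """Illustrative base class: marks the object dirty on every attribute write."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_"):
            object.__setattr__(self, "_dirty", True)


class Department(Persistent):
    def __init__(self, name):
        self.name = name


class Phone(Persistent):
    def __init__(self, number):
        self.number = number


class Employee(Persistent):
    def __init__(self, name, department, phones):
        self.name = name
        self.department = department
        self.phones = phones


def _references(obj):
    """Yield persistent objects referenced directly or through a collection."""
    for value in vars(obj).values():
        items = value if isinstance(value, (list, tuple, set)) else [value]
        for item in items:
            if isinstance(item, Persistent):
                yield item


class Session:
    """Simulated transaction scope; `stored` stands in for the database."""

    def __init__(self):
        self.stored = {}
        self._roots = []

    def persist(self, obj):
        # Only the root of an object network is registered explicitly.
        self._roots.append(obj)

    def commit(self):
        """Write every reachable object that is new or modified; return them."""
        written, seen, stack = [], set(), list(self._roots)
        while stack:
            obj = stack.pop()
            if id(obj) in seen:
                continue
            seen.add(id(obj))
            if id(obj) not in self.stored or getattr(obj, "_dirty", False):
                self.stored[id(obj)] = obj
                written.append(obj)
            object.__setattr__(obj, "_dirty", False)
            stack.extend(_references(obj))
        return written
```

Persisting only an `Employee` and committing would also write its `Department` and `Phone` objects; a later commit writes only the objects changed since the previous one.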
  • 27.
  • 28. Communications Communications for both these systems is similar. A Java application for Vectorwise would use JDBC to query and return data sets, which could then be used to construct objects if required by the application's object model. A JPA O/RM layer could be used here to hide the dataset-to-object translation if desired, but that isn't really Vectorwise's nature; a more typical use would be a BI application accessing the contents. Versant JPA uses an internal protocol built with RPC against the object server to load or update objects within the JPA programming interface. Objects are marshaled in a binary form and instantiated in the JVM for use by the application. In some cases incomplete objects, called hollow objects, are created inside the VM, but the lazy loading protocol ensures they will be fully loaded prior to use by the application.
  • 29. Transactions are central to the operation of both systems. They are the means through which all data flows in and out of the server. Data creation, updates, deletions, and even schema manipulation itself are bounded by a transaction. In 1983, Haerder and Reuter coined the term ACID [1] to describe transactions. Both Versant and ParAccel are ACID databases; however, they go about it through different mechanisms. This brings us to our next comparison, locking and versioning. [1] Haerder, T.; Reuter, A. (1983). "Principles of transaction-oriented database recovery". ACM Computing Surveys 15 (4): 287. doi:10.1145/289.291
  • 30. Locking vs Multiversioning Versant uses a two-phase locking protocol, which gathers locks on all the objects being used to ensure no two transactions attempt to write to the same data (object). This is implemented with a lock table and a transaction wait graph. Shared (read) locks are collected as the transaction works with data. They are then followed by update (semi-exclusive) or write (exclusive) locks when the transaction attempts to change the data. Deadlocks are detected, and a timeout prevents a transaction from waiting forever. With this approach, updates are done in place on the existing data; very likely the same physical pages in memory and on disk that the object was read from are the ones updated. The locks ensure transaction serialization. I should mention that Versant supports both pessimistic and optimistic locking schemes. Even optimistic locking takes read and write locks temporarily, as objects are read or as the transaction commits. The counterpart in Vectorwise is a multiversion concurrency control (MVCC) system, whereby each transaction sees a consistent database at a given point in time: a snapshot controlled by the transaction ID. A given transaction won't see a half-completed transaction operating on the same data, because other transactions don't overwrite the original data; they create a new version with a later transaction ID, which keeps earlier transactions from being contaminated. No locks or wait graphs need be maintained. Deleted and updated entries need to be purged if space is a concern.
  • 31. We have two different means of managing concurrency and serialization of transactions. The Versant method is historically similar to that of RDBMSs that support row and table locking. Vectorwise's MVCC increases throughput at the expense of data growth and the propagation events it requires. If you require strict serialization of transactions, or want to limit growth, the locking model will suit your needs. If analytic speed and concurrent reads are your core concern, MVCC will be faster, at the possible cost of stale data. We are starting to see why Vectorwise is used for analytic, read-heavy reporting and Versant finds itself used for operational processing.
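
To make the snapshot idea concrete, here is a minimal Python sketch of multiversion reads. It is a toy model and not how Vectorwise is implemented: the names VersionedStore, Transaction, and purge are made up for illustration, a simple commit counter stands in for the transaction ID, and write-write conflict detection is left out entirely.

```python
class VersionedStore:
    """Toy multiversion store: committed writes append versions, never overwrite."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the most recent commit

    def begin(self):
        # A new transaction sees the database as of the latest commit.
        return Transaction(self, snapshot_ts=self._clock)

    def purge(self, oldest_snapshot_ts):
        # Drop versions that no snapshot at or after oldest_snapshot_ts can see.
        for key, chain in self._versions.items():
            keep_from = 0
            for i, (commit_ts, _) in enumerate(chain):
                if commit_ts <= oldest_snapshot_ts:
                    keep_from = i
            self._versions[key] = chain[keep_from:]


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.pending = {}  # uncommitted writes, invisible to other transactions

    def read(self, key):
        if key in self.pending:  # a transaction sees its own writes
            return self.pending[key]
        # Otherwise take the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.store._versions.get(key, [])):
            if commit_ts <= self.snapshot_ts:
                return value
        return None

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        # Publish all pending writes as new versions under one commit timestamp.
        self.store._clock += 1
        for key, value in self.pending.items():
            self.store._versions.setdefault(key, []).append(
                (self.store._clock, value)
            )
        self.pending.clear()
```

A reader that began before a writer committed keeps seeing the old value, without any lock being taken. The old versions pile up until purge is called, which is the data growth mentioned above.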
  • 32. One major difference between these technologies is in how they physically store data, both on disk and in memory. Of particular interest to me is Vectorwise's columnar approach, which is designed for pure analytical efficiency. This contrasts with the underlying storage model used by Versant, which is similar to what is found in many database systems. Versant's model is the older design, the N-ary Storage Model, but there are some interesting tricks it uses to optimize performance for networked object graphs. [DeWitt] [Zukowski] NSM = N-ary Storage Model: a row contains all columns. DSM = Decomposed Storage Model: N attributes are split into N vertical storage elements. PAX = Partition Attributes Across: multiple columns are stored on a page, but attributes are stored vertically within it. The Vectorwise block size must be set prior to table creation.
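
The difference between NSM and DSM is easier to see with a few rows laid out both ways. This is only an illustrative Python sketch with made-up sample data and hypothetical function names; real pages hold bytes, not Python lists.

```python
# Three sample records: (id, author, year). The data is invented for illustration.
rows = [
    (1, "Dickens", 1859),
    (2, "Smith", 2003),
    (3, "Jones", 2013),
]

# NSM: each record is stored whole, one record after another on the page.
nsm_page = [value for row in rows for value in row]

# DSM: one vertical array per attribute.
column_names = ("id", "author", "year")
dsm_columns = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def average_year_nsm(page, record_width=3, year_offset=2):
    # The years are interleaved with every other attribute, so a real system
    # must read whole records to reach them; here we step across the page.
    years = page[year_offset::record_width]
    return sum(years) / len(years)


def average_year_dsm(columns):
    # The years sit together in one contiguous array; nothing else is touched.
    years = columns["year"]
    return sum(years) / len(years)
```

Both functions return the same answer. The point is what has to be read to get it: an analytic query over one attribute touches one array in the DSM layout, while in NSM it drags every record through memory.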
  • 33. Versant does allow for variability on the page: multiple types of objects or variable-length structures. The min/max stats help reduce the number of columnar blocks that need to be evaluated for a query.
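
Here is a small Python sketch of how per-block min/max statistics let a scan skip blocks. The Block class and scan_greater_than function are hypothetical names for illustration, and the predicate is limited to a simple greater-than test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A block of one column's values, with min/max kept alongside it."""
    values: List[int]

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_than(blocks, threshold):
    """Return matching values and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # If even the largest value fails the predicate, skip the whole block.
        if block.max_value <= threshold:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read
```

With three blocks holding [1, 5, 9], [10, 12, 20], and [21, 30, 25], a scan for values above 15 never opens the first block.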
  • 34. Compression of data, both on disk and in RAM, reduces the I/O bottlenecks that large data systems confront today. By decompressing into the CPU's cache, VW takes advantage of processor I/O. Column structure works really well for compression: similar data is grouped together, allowing VW to pick an optimal compression strategy. Here optimal means not just storage density, but also ease of decompression into the CPU cache for later processing.
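
To show why grouping similar data helps, here is a run-length encoding sketch in Python. Run-length encoding is my choice of a simple, well-known scheme for illustration; these notes do not say which compression schemes VW actually picks, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    column = []
    for value, count in pairs:
        column.extend([value] * count)
    return column
```

A column of country codes such as four "DE", three "US", and one "FR" shrinks from eight entries to three pairs, and decoding is a cheap loop. The same values scattered across rows with other attributes in between would offer no runs to collapse.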
  • 35.
  • 36. Although Versant uses a traditional layout, where objects are located on a given data page, there are some tricks it uses to efficiently locate connected objects. Common to most database storage systems are the concepts of volumes and pages. A volume is a collection of pages, and Versant can have as many volumes as needed for the database. A volume is mapped to a file and can be located on anything from raw devices to storage area network (SAN) drives. Pages are further broken down into slots used to store object instances. Multiple object instances may be stored on a page and accessed through the object's slot location. Larger objects will span contiguous pages. Page size in Versant is a modest 16 KB; this is often large enough to hold many objects and still small enough not to waste too much space on deleted objects. Normally, objects of the same type get stored in the same page in the next available slot, but as an optimization, it is possible to co-locate a parent and its children on the same page. This extra effort results in extremely efficient object loading when the parent is frequently used with its children.
  • 37. Duplication prohibited 19.06.2013 The LOID is used to identify an object and to represent references, but how is it used to locate an object internally? Central to accessing any object is the address translation of the LOID to a physical volume:page:slot triple. This triple identifies the object's location on disk, and the translation is done through a multi-level hash table. It is highly optimized and cached in memory, since it is used for accessing every object. On the client side, the red object is already loaded in the client cache. It contains references to two other objects, shown in grey, that are not yet loaded. If the application now calls a method on either of these, the LOID of that object is looked up in the client-side hash table. It has no address there, so the LOID is sent to the server, where a lookup is done in the Association Table (AT). That lookup provides the physical object location in the respective data volume. The physical page is loaded into the server cache, and the object is sent back to the client and instantiated in client memory. Copyright © 2013 Actian Corporation
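
The lookup path just described can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Versant's code: a plain dictionary stands in for the multi-level hash table, and the Server, Client, and PhysicalAddress names, along with the sample LOIDs, are invented for illustration.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    volume: int
    page: int
    slot: int


class Server:
    def __init__(self):
        self.address_table = {}  # LOID -> PhysicalAddress (stand-in for the AT)
        self.volumes = {}        # PhysicalAddress -> stored object
        self.page_cache = set()  # (volume, page) pairs loaded into server memory

    def store(self, loid, address, obj):
        self.address_table[loid] = address
        self.volumes[address] = obj

    def fetch(self, loid):
        # Translate the LOID to its volume:page:slot triple, load the page,
        # then return the object found in that slot.
        address = self.address_table[loid]
        self.page_cache.add((address.volume, address.page))
        return self.volumes[address]


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # LOID -> object already instantiated on the client
        self.round_trips = 0

    def get(self, loid):
        # Only a LOID missing from the client cache costs a trip to the server.
        if loid not in self.cache:
            self.round_trips += 1
            self.cache[loid] = self.server.fetch(loid)
        return self.cache[loid]
```

Following a reference is then just `client.get(book["author_loid"])`: the first call goes to the server, and later calls for the same LOID are answered from the client cache.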
  • 38. One final point about the LOID in Versant. LOID references are designed to be cross-database references. Here we have an application using 4 objects, but they come from two different databases. This gives the application designer great flexibility in deciding how to partition data. The application simply connects to all the databases involved in the cross-database references. Transactions use a two-phase commit protocol.
  • 39. What good is a database without a means to find answers to our particular questions, or to efficiently service an application's demand for data? As with the other components we've looked at, there are some similarities between these two technologies, but also some big differences. Indexes in Vectorwise are typically not needed. Often, VW is set up so the compressed database lives entirely in memory, and the automatic per-page indexes and the redesigned query engine are enough that scanning the data without indexes performs well, so no index tuning is required. Versant, on the other hand, allows nearly any attribute or even collection to be indexed. Versant's query engine will then use the index automatically or with hints supplied by the user.
  • 40. Circa 2003: SQL vs C benchmark on TPC-H. The difference between the database and the custom C program is huge. Why is the overhead of using a database so high? What's being left on the table? This difference started the X100 project, an attempt to reclaim the 100-times loss in performance. We've seen the storage model change for VW, but let's look further, at the query processing.
  • 41. Each level of the data handling was studied for performance loss. Compiler optimizations are easier to take advantage of in smaller units; they often don't get fully exploited in large programs. Modern CPUs have better instruction sets and larger on-chip caches, which can be used for vector processing.
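
The shape of vector-at-a-time processing can be sketched as follows. This Python version shows only the structure, small primitives applied to a vector of values per call instead of one row per call; it will not reproduce the speedups, which come from compiled loops and CPU caches. The function names, the discount calculation, and the vector size of 1024 are all assumptions for illustration.

```python
def tuple_at_a_time(prices, discounts):
    """One pass through the full expression logic for every single row."""
    total = 0.0
    for price, discount in zip(prices, discounts):
        total += price * (1 - discount)
    return total


# Small primitives: each does one simple thing to a whole vector of values.
# In a compiled engine these are the tight loops a compiler optimizes well.
def one_minus(vector):
    return [1 - x for x in vector]


def multiply(left, right):
    return [a * b for a, b in zip(left, right)]


def vector_at_a_time(prices, discounts, vector_size=1024):
    """Process the columns in cache-sized vectors, one primitive call each."""
    total = 0.0
    for start in range(0, len(prices), vector_size):
        price_vec = prices[start:start + vector_size]
        discount_vec = discounts[start:start + vector_size]
        total += sum(multiply(price_vec, one_minus(discount_vec)))
    return total
```

Both functions compute the same total, up to floating-point rounding. The second spends its per-call overhead once per vector rather than once per row, and keeps each vector small enough to stay in cache between primitives.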
  • 42. The result of the work: some 40x improvement.
  • 43. The work on Ingres is critical to Vectorwise (ParAccel SMP), as the main interface to both Ingres and Vectorwise is the same. It is not until the optimizer, which processes the SQL query and generates the X100 algebra, that the two components separate. After VW generates the result set, it is the Ingres components that make it available to the application. Ailamaki, DeWitt, Hill, Skounakis: Weaving Relations for Cache Performance. NSM = N-ary Storage Model: a row contains all columns. DSM = Decomposed Storage Model: N attributes are split into N vertical storage elements. PAX = Partition Attributes Across: multiple columns are stored on a page, but attributes are stored vertically within it.
  • 44. Taking all the performance features of VW query processing together, this is great for reporting, where your data isn't changing frequently.
  • 45. With Versant, queries are typically used to locate the beginning of a graph or top-level objects. Once the starting point is identified, the connected objects are frequently retrieved by the application as required (lazy loading) or automatically with default fetching. Loading objects as a group saves round trips to the server and is much more efficient on the network. On the Versant side, querying is done via OQL or JPQL. This example is JPQL. The Book has a simple collection, Authors, and we want to find an Author named "Smith". Notice the syntax is a little SQL-like, but we operate directly on the collection Book.authors, using "auth" as a working variable. On execution, the Book extent would be searched for all the books with a Smith author. This would end up scanning all the books and evaluating their Authors collections, returning the object IDs for the matching books. ResultList holds the objects, and the rest of the Java program would process that list.
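
The JPQL text from the slide is not reproduced in these notes, so I won't guess at it. What the notes do describe is how the query is evaluated, and that can be emulated in memory with a short Python sketch. The Book and Author classes, the last_name field, and the books_with_author function are hypothetical stand-ins, not the Versant API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    last_name: str


@dataclass
class Book:
    title: str
    authors: List[Author] = field(default_factory=list)


def books_with_author(book_extent, last_name):
    """Scan every book and test its authors collection directly, no join."""
    return [
        book
        for book in book_extent
        # "auth" plays the role of the working variable over Book.authors.
        if any(auth.last_name == last_name for auth in book.authors)
    ]
```

The returned list plays the role of ResultList: the rest of the program walks it and navigates from each book to whatever connected objects it needs.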
  • 46. The thing about relationships is that they don't change often. By baking them into the server's data structures and making them cheap to evaluate, Versant avoids join operations, which can be quite costly. If you look at typical ORM code, you see a fair amount of join activity whenever collection classes are involved. Following a few links down a list can end up as a very expensive group of joins, whereas managing the references with LOIDs allows direct navigation to the object. The server takes advantage of this in query expressions that involve paths or collections, like the example.
  • 47. Closing Comments This brings us to the end of our tale, and I hope you enjoyed our time together as much as I did. Each of the components we've examined should have given you insight into the design tradeoffs made by the different engineering teams. Taken as a whole, they provide a consistent, powerful framework for solving hard real-world problems. Each of these products has thousands of users who rely on it for business-critical applications. The engineers who built those applications made strategic choices for the data management system at the heart of their project.
  • 48.