Database Choices

@LynnLangit
May 2014 – Techorama

Databases Now -> a Menu of Choices

Why Change? -> “Small” Big Data
• Your data - BEHAVIORAL
• Your data - TRANSACTIONAL
• PUBLIC data
• PREMIUM data

Current Data Questions
• “Should we evaluate Hadoop?”
• “How much data is Big Data?”
• “What are the limits of SQL Server?”
• “Which NoSQL databases (if any) should we consider?”
• “How safe is the cloud really?”
• “How do we mine the data for usable information?”

DEMO - About Open Source
Free
• Rapid iteration, innovation
• Can start up for free (on premise)
• Can ‘rent’ for cheap or free on the cloud
• Can use with the command line for free
• Some vendors offer free online training
• Ex. www.neo4j.org
Not Free
• Constant releases
• Can be deceptively hard to set up (time is money)
• Don’t forget to turn it off if on the cloud!
• GUI tools, support, training cost $$$
• Ex. www.neo4j.com

Database Choices – The first level of choice
Data
A. Hadoop
B. NoSQL
C. Relational
On Premise or In the Cloud

How you ‘get’ Hadoop
A. Open source
• Roll your own
B. Commercial distribution
• Cloudera
• MapR
• Hortonworks
• More…
C. Rent it via the cloud
• AWS
• HDInsight

Demo - Cloudera Hadoop Enterprise

Example Comparison: RDBMS vs. Hadoop
Attribute           | Traditional RDBMS       | Hadoop / MapReduce
Data Size           | Gigabytes (Terabytes)   | Petabytes and greater
Access              | Interactive and Batch   | Batch – NOT Interactive
Updates             | Read / Write many times | Write once, Read many times
Structure           | Static Schema           | Dynamic Schema
Integrity           | High (ACID)             | Low
Scaling             | Nonlinear               | Linear
Query Response Time | Can be near immediate   | Has latency (due to batch processing)

Database Choices
On Premise
• RDBMS
• NoSQL
• Hadoop
In Cloud
• RDBMS
• NoSQL
• Hadoop

An Aside…SQL Server 2012++ ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
Feature                                | 2012 | 2014
SSAS Tabular Models                    | X    | X
NC Columnstore Index                   | X    | X
Clustered (writable) Columnstore Index |      | X
In-memory OLTP                         |      | X

But wait…
is there a RELATIONAL database that scales, that is cheap, that runs in the cloud?

DEMO - AWS Redshift
• About $1k per Terabyte per year - relational

So many NoSQL options
• More than just the Elephant in the room
• More than 150 types of NoSQL databases

Flavors of NoSQL
• Key/Value – Volatile
• Key/Value – Persistent
• Wide-Column
• Document
• Graph

Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS DynamoDB
– Riak
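
To make "just keys and values, no schema" concrete, here is a minimal sketch in Python. It does not use DynamoDB or Riak; it assumes a plain dict as a stand-in for a volatile store and the standard-library shelve module as a stand-in for a persistent one. The key names and values are hypothetical.

```python
import os
import shelve
import tempfile

# Volatile key/value store: a plain in-memory dict, lost when the process exits.
volatile = {}
volatile["user:42"] = {"name": "Ada", "plan": "free"}  # the value is an opaque blob
volatile["cart:42"] = ["sku-1", "sku-7"]               # different shape, no schema

# Persistent key/value store: shelve (standard library) writes values to disk.
path = os.path.join(tempfile.mkdtemp(), "kv_demo")
with shelve.open(path) as store:
    store["user:42"] = {"name": "Ada", "plan": "free"}

# Reopen the file to show the value survived the close.
with shelve.open(path) as store:
    persisted = store["user:42"]

print(volatile["cart:42"])
print(persisted)
```

The only access path is the key; there is no query over the values, which is the trade-off this flavor makes for simplicity and speed.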

DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud

File (BLOB) Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS

DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 -> (Archiving) into AWS Glacier

Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models

Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
• HBase
• Cassandra
• xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
• SQL Server 2012 Columnstore index
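
The sparse versus dense distinction can be sketched in a few lines of Python. This is an illustration with plain dictionaries, not how HBase, Cassandra or SQL Server store data internally; the row keys, column names and values are hypothetical.

```python
# Column-family (sparse): each row key maps to only the columns it actually has.
column_family = {
    "row1": {"profile:name": "Ada", "profile:city": "Ghent"},
    "row2": {"profile:name": "Bob", "stats:logins": 17},
}

# Column-store (dense): every column holds one value per row, in row order.
column_store = {
    "name": ["Ada", "Bob", "Cy"],
    "sales": [120, 80, 200],
}

def columns_of(row_key):
    """Return the column names present for one sparse row."""
    return sorted(column_family[row_key])

def total(column):
    """Aggregate one dense column without touching the others."""
    return sum(column_store[column])

print(columns_of("row2"))
print(total("sales"))
```

In the sparse layout a missing column costs nothing; in the dense layout an aggregate reads a single contiguous column, which is why column-stores suit analytic queries.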

DEMO – Google Cloud Datastore

DEMO – SQL Server ‘NoSQL’
• SQL Server Columnstore Index
• SQL Server SSAS Tabular Model

Document Database
• Document-oriented (collection of JSON documents) w/ semi-structured data
– Encodings include BSON, JSON, XML…
• Binary forms
– PDF, Microsoft Office documents (Word, Excel…)
• Examples:
– MongoDB
– Couchbase
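
A small sketch of what "semi-structured" means in practice, using only Python's json module rather than MongoDB or Couchbase. The "players" collection, its fields and the helper functions are hypothetical.

```python
import json

# Two documents in one hypothetical "players" collection; fields differ per document.
raw = [
    '{"_id": 1, "name": "Ada", "scores": [10, 25]}',
    '{"_id": 2, "name": "Bob", "guild": {"name": "Owls", "rank": 3}}',
]
collection = [json.loads(doc) for doc in raw]

def find(collection, field, value):
    """Return documents that have the top-level field and whose value equals `value`."""
    return [doc for doc in collection if field in doc and doc[field] == value]

def guild_names(collection):
    """Read a nested field, skipping documents that lack it."""
    return [doc["guild"]["name"] for doc in collection if "guild" in doc]

print(find(collection, "name", "Ada"))
print(guild_names(collection))
```

Each document carries its own structure, so code that reads the collection has to tolerate missing or nested fields instead of relying on a fixed schema.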

Graph Databases
• A lot of many-to-many relationships
• Recursive self-joins
• When your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4j
– AlgebraixData
– Google Freebase
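
"Quickly finding connections" can be illustrated with a breadth-first search over an adjacency list. This is a plain-Python sketch of the idea, not Neo4j's API or query language; the "knows" data and the function name are hypothetical.

```python
from collections import deque

# Hypothetical "knows" relationships, stored as an adjacency list.
knows = {
    "ann": ["bob", "cy"],
    "bob": ["ann", "dee"],
    "cy": ["ann", "dee"],
    "dee": ["bob", "cy", "eve"],
    "eve": ["dee"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest chain from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(shortest_path(knows, "ann", "eve"))
```

In a relational schema the same question needs one self-join per hop; a graph store follows the relationships directly, which is the case the slide describes.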

Cloud-hosted, partially managed RDBMS
• AWS RDS
– SQL Server
– MySQL
– PostgreSQL
– Oracle
• Google
– MySQL
• Microsoft
– SQL Azure

DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models

NoSQL Applied
Use case           | Type        | Example
Log Files          | Columnstore | HBase
Product Catalogs   | Key/Value   | DynamoDB
Social Games       | Document    | MongoDB
Social aggregators | Graph       | Neo4j
Line-of-Business   | RDBMS       | SQL Server

Cloud Offerings– RDBMS AND NoSQL
Category                | AWS                          | Google                              | Microsoft
Managed RDBMS           | RDS – all major RDBMS        | Cloud SQL                           | SQL Azure
NoSQL buckets           | S3 or Glacier                | Cloud Storage                       | Azure Blobs
NoSQL Key-Value         | DynamoDB                     | Cloud Datastore                     | Azure Tables
Streaming or ML         | Kinesis                      | Prospective Search & Prediction API | StreamInsight
NoSQL Document or Graph | MongoDB on EC2, Neo4j on EC2 | None, Freebase                      | MongoDB on Microsoft Cloud, Neo4j on Microsoft Cloud
Hadoop (HBase)          | Elastic MapReduce (S3 & EC2) | None                                | HDInsight
Dremel/Warehousing      | Redshift                     | BigQuery                            | None
Cloud ETL               | Data Pipelines               | None                                | None

But wait…
how do I query NoSQL data?

Example – translate ANSI SQL to MapReduce
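
As a rough illustration of the translation (not the example from the original slide), take the query SELECT product, SUM(amount) FROM sales GROUP BY product. The sketch below runs the same logic as map, shuffle and reduce steps in plain Python, with no Hadoop involved; the table, its rows and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows: (product, amount)
sales = [("apple", 3), ("pear", 5), ("apple", 4), ("fig", 1), ("pear", 2)]

def map_phase(rows):
    """Map: emit one (key, value) pair per row; the key is the GROUP BY column."""
    for product, amount in rows:
        yield product, amount

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: apply the aggregate (SUM) to each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}

totals = reduce_phase(shuffle(map_phase(sales)))
print(totals)
```

The GROUP BY column becomes the map output key and the aggregate function becomes the reducer; on a real cluster the map and reduce steps run in parallel across many machines.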

Can Excel help?
• Connector to Hadoop
• Power BI
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Data Mining w/Predixion

NoSQL To-Do List
Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
• Windows Azure Data Market, other public data markets

www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe -> Teach a Kid (Ages 10+)

