April 10-12 | Chicago, IL
NoSQL: An Analysis
Andrew J. Brust, Founder and CEO, Blue Badge Insights
Meet Andrew
CEO and Founder, Blue Badge Insights
Big Data blogger for ZDNet
Microsoft Regional Director, MVP
Co-chair VSLive! and 17 years as a speaker
Founder, Microsoft BI User Group of NYC
• http://www.msbinyc.com
Co-moderator, NYC .NET Developers Group
• http://www.nycdotnetdev.com
“Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News
brustblog.com, Twitter: @andrewbrust
Andrew’s New Blog (bit.ly/bigondata)
Read all about it!
Agenda
Why NoSQL?
Concepts
NoSQL Categories
Provisioning, market, applicability
Take-aways
Why NoSQL?
NoSQL Data Fodder
Addresses
Preferences
Notes
Friends, Followers
Documents
“Web Scale”
This is the term used to justify NoSQL
The scenario: simple needs, but “made up for in volume”
• Millions of concurrent users
Think of sites like Amazon or Google
Think of non-transactional tasks like loading
catalog data to display product page, or
environment preferences
NoSQL Common Traits
Non-relational
Non-schematized/schema-free
Open source
Distributed
Eventual consistency
“Web scale”
Developed at big Internet companies
CONCEPTS
Consistency
CAP Theorem
• A distributed database can fully guarantee only two of the following three attributes at once:
consistency, availability and partition tolerance
NoSQL typically does not offer “ACID” guarantees
• Atomicity, consistency, isolation and durability
Instead offers “eventual consistency”
Similar to DNS propagation
Things like inventory, account balances should be consistent
• Imagine updating a server in Seattle that stock was depleted
• Imagine not updating the server in NY
• Customer in NY goes to order 50 pieces of the item
• Order processed even though no stock
Things like catalog information don’t have to be, at least not immediately
• If a new item is entered into the catalog, it’s OK for some customers to see it
even before the other customers’ server knows about it
But catalog info must come up quickly
• Therefore don’t lock data in one location while waiting to update the other
Therefore, OK to sacrifice consistency for speed, in some cases
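The Seattle/New York scenario above can be sketched in a few lines. This is only an in-memory illustration, not how any particular NoSQL product replicates data; the `Replica` class and the `can_fulfil` and `replicate` helpers are hypothetical names invented for the example.

```python
class Replica:
    """One regional copy of the inventory (hypothetical, in-memory only)."""

    def __init__(self, name, stock):
        self.name = name
        self.stock = dict(stock)


def can_fulfil(replica, item, quantity):
    """Check stock using only what this replica currently knows."""
    return replica.stock.get(item, 0) >= quantity


def replicate(source, target):
    """Propagate the source's state to the target (the 'eventual' part)."""
    target.stock.update(source.stock)


seattle = Replica("Seattle", {"widget": 50})
new_york = Replica("New York", {"widget": 50})

# Stock is depleted, but only the Seattle replica hears about it.
seattle.stock["widget"] = 0

# New York still holds the stale value, so the order is wrongly accepted.
accepted_before_sync = can_fulfil(new_york, "widget", 50)

# Once replication catches up, both replicas agree and the order is refused.
replicate(seattle, new_york)
accepted_after_sync = can_fulfil(new_york, "widget", 50)

print(accepted_before_sync, accepted_after_sync)  # True False
```

The window between the update and the `replicate` call is where eventual consistency is acceptable for catalog data but not for inventory or account balances.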
Consistency
[Figure: CAP Theorem diagram showing Consistency, Availability and Partition Tolerance, with Relational and NoSQL databases positioned among them]
Indexing
Most NoSQL databases are indexed by key
Some allow so-called “secondary” indexes
Often the primary key indexes are clustered
HBase uses HDFS (the Hadoop Distributed File System), which is
append-only
• Writes are logged
• Logged writes are batched
• File is re-created and sorted
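The three steps above (log the write, batch the logged writes, re-create a sorted file) can be sketched as a toy append-only store. This is a simplified illustration of the general pattern, not HBase's actual implementation; the `AppendOnlyStore` class, its method names and the `flush_threshold` parameter are assumptions made for the example.

```python
class AppendOnlyStore:
    """Toy append-only store: nothing is ever edited in place."""

    def __init__(self, flush_threshold=3):
        self.log = []        # every write is logged first, in arrival order
        self.buffer = {}     # logged writes waiting to be flushed as a batch
        self.files = []      # immutable sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.log.append((key, value))
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered batch out as a new sorted file."""
        if self.buffer:
            self.files.append(sorted(self.buffer.items()))
            self.buffer = {}

    def compact(self):
        """Re-create a single sorted file; newer values win."""
        merged = {}
        for sorted_file in self.files:  # oldest first, so later files overwrite
            merged.update(sorted_file)
        self.files = [sorted(merged.items())]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for sorted_file in reversed(self.files):  # newest file first
            for stored_key, value in sorted_file:
                if stored_key == key:
                    return value
        return None


store = AppendOnlyStore(flush_threshold=2)
store.put("row-202", "Jane")
store.put("row-101", "Andrew")   # second write triggers a flush
store.put("row-101", "Andy")     # an "update" is just another appended write
store.flush()
store.compact()
print(store.files)
```

After compaction the two files collapse into one sorted file that holds only the latest value for each key.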
Queries
Typically no query language
Instead, create procedural program
Sometimes SQL is supported
Sometimes MapReduce code is used…
MapReduce
This is not Hadoop’s MapReduce, but it’s conceptually related
Map step: pre-processes data
Reduce step: summarizes/aggregates data
Will show a MapReduce code sample for Mongo soon
Will demo map code on CouchDB
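Before the database-specific demos, here is a minimal sketch of the map and reduce steps in plain Python. It does not use MongoDB's or CouchDB's MapReduce API; the `map_reduce` driver, the function names and the sample order data are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: one record per order.
orders = [
    {"customer": 101, "price": 300},
    {"customer": 202, "price": 2500},
    {"customer": 101, "price": 120},
]


def map_order(order):
    """Map step: pre-process one record into (key, value) pairs."""
    yield order["customer"], order["price"]


def reduce_totals(key, values):
    """Reduce step: summarize all values emitted for one key."""
    return sum(values)


def map_reduce(records, mapper, reducer):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


totals = map_reduce(orders, map_order, reduce_totals)
print(totals)  # {101: 420, 202: 2500}
```

In a real deployment the map step would run on each server that holds part of the data, and only the much smaller mapped output would travel over the network to be reduced.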
Sharding
A partitioning pattern where separate servers store partitions
Fan-out queries supported
Partitions may be duplicated, so replication also provided
• Good for disaster recovery
Since “shards” can be geographically distributed, sharding can act like a
CDN
Good for keeping data close to processing
• Reduces network traffic when MapReduce splitting takes place
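A rough sketch of the sharding pattern, using in-memory dictionaries in place of separate servers. Routing by a hash of the key is an assumption made for this example (the slides do not say how rows are assigned to partitions), and the `ShardedStore` class and its methods are hypothetical.

```python
import hashlib


class ShardedStore:
    """Each dict in self.shards stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Assumed routing rule: a stable hash of the key picks the shard.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A query that is not by key must visit every shard and merge results.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


store = ShardedStore(shard_count=3)
store.put(101, {"First_Name": "Andrew", "Last_Name": "Brust"})
store.put(202, {"First_Name": "Jane", "Last_Name": "Doe"})

print(store.get(101))
print(store.fan_out(lambda row: row["Last_Name"] == "Doe"))
```

Replication would add copies of each shard on other servers; that part is left out of the sketch.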
NOSQL CATEGORIES
Key-Value Stores
The most common; not necessarily the most popular
Has rows, each with something like a big dictionary/associative array
• Schema may differ from row to row
Common on cloud platforms
• e.g. Amazon SimpleDB, Azure Table Storage
MemcacheDB, Voldemort, Couchbase, DynamoDB (AWS), Dynomite,
Redis and Riak
Key-Value Stores
Database
Table: Customers
  Row ID: 101
    First_Name: Andrew
    Last_Name: Brust
    Address: 123 Main Street
    Last_Order: 1501
  Row ID: 202
    First_Name: Jane
    Last_Name: Doe
    Address: 321 Elm Street
    Last_Order: 1502
Table: Orders
  Row ID: 1501
    Price: 300 USD
    Item1: 52134
    Item2: 24457
  Row ID: 1502
    Price: 2500 GBP
    Item1: 98456
    Item2: 59428
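The Customers and Orders tables above map naturally onto dictionaries of dictionaries. The sketch below uses plain Python dicts rather than any real key-value product's client; the `Nickname` key and the `last_order_for` helper are hypothetical additions for illustration.

```python
# Each table is keyed by row ID; each row is itself a dictionary.
customers = {
    101: {"First_Name": "Andrew", "Last_Name": "Brust",
          "Address": "123 Main Street", "Last_Order": 1501},
    202: {"First_Name": "Jane", "Last_Name": "Doe",
          "Address": "321 Elm Street", "Last_Order": 1502},
}
orders = {
    1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
    1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
}

# Schema may differ from row to row: one row gains a key, the other is untouched.
customers[202]["Nickname"] = "JD"  # hypothetical extra key


def last_order_for(customer_id):
    """There are no joins: read one row by key, then follow its stored key."""
    return orders[customers[customer_id]["Last_Order"]]


print(last_order_for(101))
```

The "join" between a customer and an order is done by the application, one keyed lookup at a time.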
Wide Column Stores
Has tables with declared column families
• Each column family has “columns” which are KV pairs that can vary from row to row
These are the most foundational for large sites
• BigTable (Google)
• HBase (Originally part of Yahoo-dominated Hadoop project)
• Cassandra (Facebook)
• Calls column families “super columns” and tables “super column families”
They are the most “Big Data”-ready
• Especially HBase + Hadoop
Wide Column Stores
Table: Customers
  Row ID: 101
    Super Column: Name
      Column: First_Name: Andrew
      Column: Last_Name: Brust
    Super Column: Address
      Column: Number: 123
      Column: Street: Main Street
    Super Column: Orders
      Column: Last_Order: 1501
  Row ID: 202
    Super Column: Name
      Column: First_Name: Jane
      Column: Last_Name: Doe
    Super Column: Address
      Column: Number: 321
      Column: Street: Elm Street
    Super Column: Orders
      Column: Last_Order: 1502
Table: Orders
  Row ID: 1501
    Super Column: Pricing
      Column: Price: 300 USD
    Super Column: Items
      Column: Item1: 52134
      Column: Item2: 24457
  Row ID: 1502
    Super Column: Pricing
      Column: Price: 2500 GBP
    Super Column: Items
      Column: Item1: 98456
      Column: Item2: 59428
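As a rough mental model, the Customers table above is three levels of nesting: row, then column family (shown as "Super Column"), then column. The sketch below is plain Python rather than the HBase or Cassandra API; the `read_family` helper and the `Previous_Order` column are hypothetical.

```python
# row ID -> column family -> column name -> value
customers = {
    101: {
        "Name": {"First_Name": "Andrew", "Last_Name": "Brust"},
        "Address": {"Number": 123, "Street": "Main Street"},
        "Orders": {"Last_Order": 1501},
    },
    202: {
        "Name": {"First_Name": "Jane", "Last_Name": "Doe"},
        "Address": {"Number": 321, "Street": "Elm Street"},
        "Orders": {"Last_Order": 1502},
    },
}


def read_family(table, row_id, family):
    """Return only the columns of one family for one row."""
    return table[row_id].get(family, {})


# The families are declared up front, but the columns inside a family
# are key-value pairs that can vary from row to row.
customers[202]["Orders"]["Previous_Order"] = 1400  # hypothetical column

print(read_family(customers, 101, "Name"))
print(read_family(customers, 202, "Orders"))
```

Both rows share the same set of families, while the columns inside `Orders` now differ between them.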
Demo
Wide Column Stores
Document Stores
Have “databases,” which are akin to tables
Have “documents,” akin to rows
• Documents are typically JSON objects
• Each document has properties and values
• Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained
JSON objects, which allows for hierarchical storage)
• Can have attachments as well
Old versions are retained
• So Doc Stores work well for content management
Some view doc stores as specialized KV stores
Most popular with developers, startups, VCs
The biggies:
• CouchDB
• Derivatives
• MongoDB
Document Store Application Orientation
Documents can each be addressed by URIs
CouchDB supports full REST interface
Very geared towards JavaScript and JSON
• Documents are JSON objects
• CouchDB/MongoDB use JavaScript as native language
In CouchDB, “view functions” also have unique URIs and they return
HTML
• So you can build entire applications in the database
Document Stores
Database: Customers
  Document ID: 101
    First_Name: Andrew
    Last_Name: Brust
    Address:
      Number: 123
      Street: Main Street
    Orders:
      Most_recent: 1501
  Document ID: 202
    First_Name: Jane
    Last_Name: Doe
    Address:
      Number: 321
      Street: Elm Street
    Orders:
      Most_recent: 1502
Database: Orders
  Document ID: 1501
    Price: 300 USD
    Item1: 52134
    Item2: 24457
  Document ID: 1502
    Price: 2500 GBP
    Item1: 98456
    Item2: 59428
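Customer 101 from the example above can be written as a JSON document with two sub-documents and a link into the Orders database. The sketch uses only Python's standard `json` module and an in-memory dictionary standing in for the databases; the `resolve_most_recent_order` helper is hypothetical, and `_id` follows the field name that CouchDB and MongoDB use for document IDs.

```python
import json

customer_doc = {
    "_id": "101",
    "First_Name": "Andrew",
    "Last_Name": "Brust",
    "Address": {"Number": 123, "Street": "Main Street"},   # sub-document
    "Orders": {"Most_recent": "1501"},                     # link by document ID
}
order_doc = {"_id": "1501", "Price": "300 USD", "Item1": 52134, "Item2": 24457}

# In-memory stand-in for the two databases (not a real client library).
databases = {
    "Customers": {customer_doc["_id"]: customer_doc},
    "Orders": {order_doc["_id"]: order_doc},
}


def resolve_most_recent_order(customer):
    """Follow the stored document ID into the Orders database."""
    return databases["Orders"][customer["Orders"]["Most_recent"]]


# Documents travel as JSON text and come back with the hierarchy intact.
as_json = json.dumps(customer_doc, indent=2)
round_tripped = json.loads(as_json)

print(as_json)
print(resolve_most_recent_order(round_tripped))
```

The address lives inside the customer document instead of in a separate table, which is the hierarchical storage the slides describe.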
Demo
Document Stores
Graph Databases
Great for social network applications and others where relationships are
important
Nodes and edges
• Edge like a join
• Nodes like rows in a table
Nodes can also have properties and values
Neo4j is a popular graph db
Graph Databases
[Figure: example graph inside a database]
Person nodes: Joe Smith, Jane Doe, Andrew Brust, George Washington
Edge labels: Sent invitation to; Commented on photo by; Friend of; Address; Placed order; Item 1; Item 2
Address node: Street: 123 Main Street; City: New York; State: NY; Zip: 10014
Item node: ID: 52134; Type: Dress; Color: Blue
Item node: ID: 24457; Type: Shirt; Color: Red
Order node: ID: 252; Total Price: 300 USD
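The nodes-and-edges model can be sketched with a dictionary of nodes and a list of labeled edges. The node properties and edge labels come from the example graph, but which node each edge connects is not recoverable from the slide text, so every edge endpoint below is an assumption. The `follow` helper is hypothetical and this is not the Neo4j API.

```python
# Nodes carry properties and values, like rows in a table.
nodes = {
    "Joe Smith": {"kind": "Person"},
    "Jane Doe": {"kind": "Person"},
    "Andrew Brust": {"kind": "Person"},
    "George Washington": {"kind": "Person"},
    "Address": {"Street": "123 Main Street", "City": "New York",
                "State": "NY", "Zip": "10014"},
    "Order 252": {"ID": 252, "Total Price": "300 USD"},
    "Item 52134": {"ID": 52134, "Type": "Dress", "Color": "Blue"},
    "Item 24457": {"ID": 24457, "Type": "Shirt", "Color": "Red"},
}

# Edges are (source, label, target). Endpoints are ASSUMED for illustration.
edges = [
    ("Joe Smith", "Sent invitation to", "Jane Doe"),
    ("Jane Doe", "Commented on photo by", "George Washington"),
    ("Andrew Brust", "Friend of", "Joe Smith"),
    ("Andrew Brust", "Address", "Address"),
    ("Andrew Brust", "Placed order", "Order 252"),
    ("Order 252", "Item 1", "Item 52134"),
    ("Order 252", "Item 2", "Item 24457"),
]


def follow(start, label):
    """Traverse one hop: the graph equivalent of a join."""
    return [target for source, edge_label, target in edges
            if source == start and edge_label == label]


# Two hops: person -> order -> items.
ordered_items = []
for order in follow("Andrew Brust", "Placed order"):
    for label in ("Item 1", "Item 2"):
        ordered_items.extend(nodes[item]["Type"] for item in follow(order, label))

print(ordered_items)  # ['Dress', 'Shirt']
```

Each hop replaces what would be a join in a relational query, which is why relationship-heavy data suits this model.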
PROVISIONING, MARKET, APPLICABILITY
NoSQL + BI
NoSQL databases are bad for ad hoc query and data warehousing
BI applications involve models; models rely on schema
Extract, transform and load (ETL) may be your friend
Wide-column stores, however, are good for “Big Data”
• See next slide
Wide-column stores and column-oriented databases are similar
technologically
NoSQL + Big Data
Big Data and NoSQL are interrelated
Typically, wide-column stores are used in Big Data scenarios
Prime example:
• HBase and Hadoop
Why?
• Lack of indexing not a problem
• Consistency not an issue
• Fast reads very important
• Distributed file systems important too
• Commodity hardware and disk assumptions also important
• Not Web scale but massive scale-out, so similar concerns
Going “NoSQL-Like” on the MS Cloud
Azure Table Storage (a key-value store)
SQL Azure XML columns (supports variable schema, hierarchy)
SQL Azure Federation (a sharding implementation)
OData (HTTP/JSON data APIs)
Running NoSQL database products using Azure VMs…
NoSQL on Windows Azure
Platform as a Service
• Cloudant: https://cloudant.com/azure/
• MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/
MongoDB, DIY:
• On an Azure Worker Role:
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles
• On a Windows VM:
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer
• On a Linux VM:
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial
http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
NoSQL on Windows Azure
Others, DIY (Linux VMs):
• Couchbase:
http://blog.couchbase.com/couchbase-server-new-windows-azure
• CouchDB: http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure
• Riak:
http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/
• Redis: http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx
• Cassandra: http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
And With MS On-Premise Technologies
SQL Server 2008/2008R2/2012 “Beyond Relational” Features
• Sparse columns (like Wide Column Stores)
• Geospatial (geometry, geography data types)
• FILESTREAM, FileTable (like Document Store attachments)
• Full Text Search, Semantic Similarity Search
• HierarchyID (can simulate Graph Database functionality)
SQL Server Parallel Data Warehouse Edition (PDW)
• Distributed architecture (like MapReduce/Hadoop)
• PolyBase in PDW v2 (interfaces PDW and HDFS)
TAKE-AWAYS
Compromises
Eventual consistency
Write buffering
Often, only primary keys can be indexed
Queries must be written as programs
Tooling
• Productivity (= money)
Summing Up
• Line of Business -> Relational
• Large, public (consumer)-facing sites -> NoSQL
• Complex data structures -> Relational
• Big Data -> NoSQL
• Transactional -> Relational
• Content Management -> NoSQL
• Enterprise -> Relational
• Consumer Web -> NoSQL
Thank you
• andrew.brust@bluebadgeinsights.com
• @andrewbrust on twitter
• Want to get on Blue Badge Insights’ list? Text “bluebadge” to 22828
Win a Microsoft Surface Pro!
Complete an online SESSION EVALUATION
to be entered into the draw.
Draw closes April 12, 11:59pm CT
Winners will be announced on the PASS BA
Conference website and on Twitter.
Go to passbaconference.com/evals or follow the QR code link displayed on
session signage throughout the conference venue.
Your feedback is important and valuable. All feedback will be used to improve
and select sessions for future events.

Contenu connexe

Tendances

Big Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-LandBig Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-LandAndrew Brust
 
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisCloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisAndrew Brust
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooAndrew Brust
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms Andrew Brust
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and HowBigBlueHat
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012Andrew Brust
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedAnant Kumar
 

Tendances (20)

Big Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-LandBig Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-Land
 
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisCloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Relational vs. Non-Relational
Relational vs. Non-RelationalRelational vs. Non-Relational
Relational vs. Non-Relational
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data Hullabaloo
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
 
Rdbms vs. no sql
Rdbms vs. no sqlRdbms vs. no sql
Rdbms vs. no sql
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and How
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme based
 

En vedette

Azure ml screen grabs
Azure ml screen grabsAzure ml screen grabs
Azure ml screen grabsAndrew Brust
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
Town of Ladysmith Economic Development Plan 2013
Town of Ladysmith Economic Development Plan 2013Town of Ladysmith Economic Development Plan 2013
Town of Ladysmith Economic Development Plan 2013ladysmithdowntown
 
NoSQL and SQL Databases
NoSQL and SQL DatabasesNoSQL and SQL Databases
NoSQL and SQL DatabasesGaurav Paliwal
 
No SQL Databases (a thorough analysis)
No SQL Databases (a thorough analysis)No SQL Databases (a thorough analysis)
No SQL Databases (a thorough analysis)catprasanna
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases MongoDB
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
NoSQL Databases for Implementing Data Services – Should I Care?
NoSQL Databases for Implementing Data Services – Should I Care?NoSQL Databases for Implementing Data Services – Should I Care?
NoSQL Databases for Implementing Data Services – Should I Care?Guido Schmutz
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDsDean Chen
 
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15Dave Segleau
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and consFabio Fumarola
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenLorenzo Alberton
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Zohar Elkayam
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 

En vedette (20)

Azure ml screen grabs
Azure ml screen grabsAzure ml screen grabs
Azure ml screen grabs
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
Town of Ladysmith Economic Development Plan 2013
Town of Ladysmith Economic Development Plan 2013Town of Ladysmith Economic Development Plan 2013
Town of Ladysmith Economic Development Plan 2013
 
NoSQL and SQL Databases
NoSQL and SQL DatabasesNoSQL and SQL Databases
NoSQL and SQL Databases
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
No SQL Databases (a thorough analysis)
No SQL Databases (a thorough analysis)No SQL Databases (a thorough analysis)
No SQL Databases (a thorough analysis)
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
NoSQL Databases for Implementing Data Services – Should I Care?
NoSQL Databases for Implementing Data Services – Should I Care?NoSQL Databases for Implementing Data Services – Should I Care?
NoSQL Databases for Implementing Data Services – Should I Care?
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDs
 
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
Oracle NoSQL Database -- Big Data Bellevue Meetup - 02-18-15
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 

Similaire à NoSQL: An Analysis

SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveIBM Cloud Data Services
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database ChoicesLynn Langit
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesKyle Banerjee
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singhMayank Singh
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLbalwinders
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
 
Framing the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLFraming the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLInside Analysis
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldKaren Lopez
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social WebBogdan Gaza
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology LandscapeShivanandaVSeeri
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsDustin Vannoy
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseEric Bragas
 
Introducción a NoSQL
Introducción a NoSQLIntroducción a NoSQL
Introducción a NoSQLMongoDB
 
Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Derek Jacoby
 

Similaire à NoSQL: An Analysis (20)

SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
mongodb_DS.pptx
mongodb_DS.pptxmongodb_DS.pptx
mongodb_DS.pptx
 
NoSQL
NoSQLNoSQL
NoSQL
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database Choices
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 
Framing the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLFraming the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQL
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database World
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology Landscape
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
Introducción a NoSQL
Introducción a NoSQLIntroducción a NoSQL
Introducción a NoSQL
 
Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Untangling - fall2017 - week 8
Untangling - fall2017 - week 8
 

Plus de Andrew Brust

Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackAndrew Brust
 
Hadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionHadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionAndrew Brust
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012Andrew Brust
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataAndrew Brust
 
Grasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmGrasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmAndrew Brust
 
Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Andrew Brust
 

Plus de Andrew Brust (6)

Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
 
Hadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionHadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in Action
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s Data
 
Grasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmGrasping The LightSwitch Paradigm
Grasping The LightSwitch Paradigm
 
Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis
 



NoSQL: An Analysis

  • 1. April 10-12 | Chicago, IL NoSQL: An Analysis Andrew J. Brust, Founder and CEO, Blue Badge Insights
  • 2. Please silence cell phones
  • 3. Meet Andrew CEO and Founder, Blue Badge Insights Big Data blogger for ZDNet Microsoft Regional Director, MVP Co-chair VSLive! and 17 years as a speaker Founder, Microsoft BI User Group of NYC • http://www.msbinyc.com Co-moderator, NYC .NET Developers Group • http://www.nycdotnetdev.com “Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News brustblog.com, Twitter: @andrewbrust
  • 4. Andrew’s New Blog (bit.ly/bigondata)
  • 8. NoSQL Data Fodder Addresses Preferences Notes Friends, Followers Documents
  • 9. “Web Scale” This is the term used to justify NoSQL Scenario is simple needs but “made up for in volume” • Millions of concurrent users Think of sites like Amazon or Google Think of non-transactional tasks like loading catalog data to display a product page, or environment preferences
  • 10. NoSQL Common Traits Non-relational Non-schematized/schema-free Open source Distributed Eventual consistency “Web scale” Developed at big Internet companies
  • 12. Consistency CAP Theorem • Databases may only excel at two of the following three attributes: consistency, availability and partition tolerance NoSQL does not offer “ACID” guarantees • Atomicity, consistency, isolation and durability Instead offers “eventual consistency” Similar to DNS propagation
  • 13. Consistency Things like inventory, account balances should be consistent • Imagine updating a server in Seattle that stock was depleted • Imagine not updating the server in NY • Customer in NY goes to order 50 pieces of the item • Order processed even though no stock Things like catalog information don’t have to be, at least not immediately • If a new item is entered into the catalog, it’s OK for some customers to see it even before the other customers’ server knows about it But catalog info must come up quickly • Therefore don’t lock data in one location while waiting to update the other Therefore, OK to sacrifice consistency for speed, in some cases
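The Seattle/New York scenario can be sketched as a toy simulation. This is only an illustration: the `Replica` class, the `replicate` function and the `"widget"` item are hypothetical names, and no particular NoSQL product replicates data exactly this way.

```python
class Replica:
    """A toy replica holding stock counts per item (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.stock = {}
        self.pending = []  # local updates not yet sent to other replicas

    def write(self, item, quantity):
        # Apply locally and queue for later propagation (no cross-site lock).
        self.stock[item] = quantity
        self.pending.append((item, quantity))

    def read(self, item):
        return self.stock.get(item, 0)


def replicate(source, target):
    """Push queued updates from source to target so the replicas converge."""
    for item, quantity in source.pending:
        target.stock[item] = quantity
    source.pending.clear()


seattle = Replica("Seattle")
new_york = Replica("New York")

# Both sites start with 50 units of the item.
seattle.stock["widget"] = 50
new_york.stock["widget"] = 50

seattle.write("widget", 0)             # stock is depleted in Seattle
stale_read = new_york.read("widget")   # New York has not heard about it yet
replicate(seattle, new_york)           # propagation happens some time later
converged_read = new_york.read("widget")

print(stale_read, converged_read)
```

Between the write and the call to `replicate`, an order placed against the New York replica would be accepted against stock that no longer exists, which is why inventory is a poor fit for eventual consistency while catalog data is not.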
  • 15. Indexing Most NoSQL databases are indexed by key Some allow so-called “secondary” indexes Often the primary key indexes are clustered HBase uses HDFS (the Hadoop Distributed File System), which is append-only • Writes are logged • Logged writes are batched • File is re-created and sorted
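The logged, batched, re-sorted write path described for HBase can be sketched roughly as follows. This is a simplified model under assumed names (`AppendOnlyTable`, `batch_size`), not HBase's actual implementation, which involves considerably more machinery.

```python
import bisect


class AppendOnlyTable:
    """Toy model: log each write, batch in memory, then rewrite a sorted file."""

    def __init__(self, batch_size=3):
        self.log = []          # every write is appended here first
        self.buffer = {}       # batched writes not yet flushed
        self.sorted_file = []  # key-sorted list of (key, value) pairs
        self.batch_size = batch_size

    def put(self, key, value):
        self.log.append((key, value))
        self.buffer[key] = value
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # The "file" is never edited in place: it is re-created and sorted.
        merged = dict(self.sorted_file)
        merged.update(self.buffer)
        self.sorted_file = sorted(merged.items())
        self.buffer = {}

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        keys = [k for k, _ in self.sorted_file]
        index = bisect.bisect_left(keys, key)  # sorted keys allow binary search
        if index < len(keys) and keys[index] == key:
            return self.sorted_file[index][1]
        return None


table = AppendOnlyTable(batch_size=2)
table.put("b", 1)
table.put("a", 2)   # second write fills the batch and triggers a flush
table.put("c", 3)   # stays in the buffer until the next flush
```

Because the flushed data is sorted by key, the primary-key index is effectively clustered, which matches the observation on the slide.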
  • 16. Queries Typically no query language Instead, create procedural program Sometimes SQL is supported Sometimes MapReduce code is used…
  • 17. MapReduce This is not Hadoop’s MapReduce, but it’s conceptually related Map step: pre-processes data Reduce step: summarizes/aggregates data Will show a MapReduce code sample for Mongo soon Will demo map code on CouchDB
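As a rough illustration of the two steps, here is a minimal in-memory sketch in Python. It is not the Mongo or CouchDB sample from the talk; the `orders` records and the function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the prices are invented for illustration.
orders = [
    {"customer": "Andrew", "price": 300},
    {"customer": "Jane", "price": 2500},
    {"customer": "Andrew", "price": 120},
]


def map_order(order):
    """Map step: pre-process one record into (key, value) pairs."""
    yield order["customer"], order["price"]


def reduce_prices(customer, prices):
    """Reduce step: summarize all values emitted for one key."""
    return sum(prices)


def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # group mapped values by key
    return {key: reducer(key, values) for key, values in groups.items()}


totals = map_reduce(orders, map_order, reduce_prices)
print(totals)
```

In a distributed database the map calls would run on the servers that hold the data and only the grouped values would travel to the reducers; the sketch runs everything in one process.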
  • 18. Sharding A partitioning pattern where separate servers store partitions Fan-out queries supported Partitions may be duplicated, so replication also provided • Good for disaster recovery Since “shards” can be geographically distributed, sharding can act like a CDN Good for keeping data close to processing • Reduces network traffic when MapReduce splitting takes place
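A minimal sketch of the pattern, assuming hash-based placement of keys (one of several possible partitioning schemes). `ShardedStore` and the sample rows are hypothetical, and each "server" is just a dictionary.

```python
import hashlib


class ShardedStore:
    """Toy sharded store: each shard stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash decides which shard owns the key.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        # A key lookup touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A fan-out query asks every shard and merges the answers.
        results = []
        for shard in self.shards:
            results.extend(v for v in shard.values() if predicate(v))
        return results


store = ShardedStore(shard_count=3)
store.put(101, {"name": "Andrew", "city": "New York"})
store.put(202, {"name": "Jane", "city": "Seattle"})
store.put(303, {"name": "Joe", "city": "New York"})

new_yorkers = sorted(row["name"] for row in store.fan_out(lambda r: r["city"] == "New York"))
print(new_yorkers)
```

Replication, which the slide mentions alongside sharding, would mean keeping more than one copy of each shard; the sketch leaves that out.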
  • 20. Key-Value Stores The most common; not necessarily the most popular Has rows, each with something like a big dictionary/associative array • Schema may differ from row to row Common on cloud platforms • e.g. Amazon SimpleDB, Azure Table Storage MemcacheDB, Voldemort, Couchbase, DynamoDB (AWS), Dynomite, Redis and Riak
  • 21. Key-Value Stores (example database) Table: Customers. Row ID 101: First_Name: Andrew, Last_Name: Brust, Address: 123 Main Street, Last_Order: 1501. Row ID 202: First_Name: Jane, Last_Name: Doe, Address: 321 Elm Street, Last_Order: 1502. Table: Orders. Row ID 1501: Price: 300 USD, Item1: 52134, Item2: 24457. Row ID 1502: Price: 2500 GBP, Item1: 98456, Item2: 59428.
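The example database on this slide maps naturally onto nested dictionaries. The sketch below is only a model of the idea, not the API of any product listed above; `get_row` is a hypothetical helper, and the extra `Twitter` key is added purely to show that rows need not share a schema.

```python
database = {
    "Customers": {
        101: {"First_Name": "Andrew", "Last_Name": "Brust",
              "Address": "123 Main Street", "Last_Order": 1501},
        202: {"First_Name": "Jane", "Last_Name": "Doe",
              "Address": "321 Elm Street", "Last_Order": 1502},
    },
    "Orders": {
        1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
        1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
    },
}


def get_row(table, row_id):
    """Look a row up by table name and row key (the only index available)."""
    return database[table][row_id]


# There is no join: following Last_Order is a second key lookup in code.
customer = get_row("Customers", 101)
last_order = get_row("Orders", customer["Last_Order"])

# Schema may differ from row to row: only row 101 gets this extra key.
database["Customers"][101]["Twitter"] = "@andrewbrust"

print(last_order["Price"])
```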
  • 22. Wide Column Stores Has tables with declared column families • Each column family has “columns” which are KV pairs that can vary from row to row These are the most foundational for large sites • BigTable (Google) • HBase (Originally part of Yahoo-dominated Hadoop project) • Cassandra (Facebook) • Calls column families “super columns” and tables “super column families” They are the most “Big Data”-ready • Especially HBase + Hadoop
  • 23. Wide Column Stores (example database) Table: Customers. Row ID 101: Super Column Name (Column First_Name: Andrew; Column Last_Name: Brust), Super Column Address (Column Number: 123; Column Street: Main Street), Super Column Orders (Column Last_Order: 1501). Row ID 202: Super Column Name (Column First_Name: Jane; Column Last_Name: Doe), Super Column Address (Column Number: 321; Column Street: Elm Street), Super Column Orders (Column Last_Order: 1502). Table: Orders. Row ID 1501: Super Column Pricing (Column Price: 300 USD), Super Column Items (Column Item1: 52134; Column Item2: 24457). Row ID 1502: Super Column Pricing (Column Price: 2500 GBP), Super Column Items (Column Item1: 98456; Column Item2: 59428).
  • 24. Demo Wide Column Stores
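The distinguishing rule of this model is that column families are declared up front while the columns inside them are free-form per row. A minimal sketch of that rule, using a hypothetical `WideColumnTable` class rather than the API of BigTable, HBase or Cassandra:

```python
class WideColumnTable:
    """Toy table: fixed column families, arbitrary columns within each."""

    def __init__(self, name, column_families):
        self.name = name
        self.column_families = set(column_families)  # declared with the table
        self.rows = {}

    def put(self, row_id, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"undeclared column family: {family}")
        # Columns are plain key-value pairs and can vary from row to row.
        self.rows.setdefault(row_id, {}).setdefault(family, {})[column] = value

    def get(self, row_id, family, column):
        return self.rows[row_id][family][column]


customers = WideColumnTable("Customers", ["Name", "Address", "Orders"])
customers.put(101, "Name", "First_Name", "Andrew")
customers.put(101, "Address", "Street", "Main Street")
customers.put(202, "Name", "First_Name", "Jane")  # row 202 has no Address yet

try:
    customers.put(101, "Payments", "Card", "hypothetical")
    rejected = False
except KeyError:
    rejected = True  # "Payments" was never declared as a column family

print(customers.get(101, "Address", "Street"), rejected)
```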
  • 25. Document Stores Have “databases,” which are akin to tables Have “documents,” akin to rows • Documents are typically JSON objects • Each document has properties and values • Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e. contained JSON objects - Allows for hierarchical storage) • Can have attachments as well Old versions are retained • So Doc Stores work well for content management Some view doc stores as specialized KV stores Most popular with developers, startups, VCs The biggies: • CouchDB • Derivatives • MongoDB
  • 26. Document Store Application Orientation Documents can each be addressed by URIs CouchDB supports full REST interface Very geared towards JavaScript and JSON • Documents are JSON objects • CouchDB/MongoDB use JavaScript as native language In CouchDB, “view functions” also have unique URIs and they return HTML • So you can build entire applications in the database
  • 27. Document Stores (example) Database: Customers. Document ID 101: First_Name: Andrew, Last_Name: Brust, Address: (Number: 123, Street: Main Street), Orders: (Most_recent: 1501). Document ID 202: First_Name: Jane, Last_Name: Doe, Address: (Number: 321, Street: Elm Street), Orders: (Most_recent: 1502). Database: Orders. Document ID 1501: Price: 300 USD, Item1: 52134, Item2: 24457. Document ID 1502: Price: 2500 GBP, Item1: 98456, Item2: 59428.
  • 28. Demo Document Stores
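The customer document from this slide can be written as a JSON object with sub-documents. The sketch assumes that `Address` and `Orders` are nested objects and that `Most_recent` is a link into the Orders database; `most_recent_order` is a hypothetical helper, not a CouchDB or MongoDB call.

```python
import json

# Each "database" holds documents keyed by document ID.
customers = {
    "101": {
        "First_Name": "Andrew",
        "Last_Name": "Brust",
        "Address": {"Number": 123, "Street": "Main Street"},  # sub-document
        "Orders": {"Most_recent": 1501},                      # link by ID
    },
}
orders = {
    "1501": {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
}


def most_recent_order(customer_id):
    """Follow the link from a customer document to an order document."""
    customer = customers[customer_id]
    order_id = str(customer["Orders"]["Most_recent"])
    return orders[order_id]


# Documents are JSON, so they serialize and parse without any schema.
as_json = json.dumps(customers["101"], indent=2)
round_tripped = json.loads(as_json)

print(as_json)
print(most_recent_order("101")["Price"])
```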
  • 29. Graph Databases Great for social network applications and others where relationships are important Nodes and edges • Edge like a join • Nodes like rows in a table Nodes can also have properties and values Neo4j is a popular graph db
  • 30. Graph Databases (diagram) Person nodes: Joe Smith, Jane Doe, Andrew Brust, George Washington. Address node: Street: 123 Main Street, City: New York, State: NY, Zip: 10014. Order node: ID: 252, Total Price: 300 USD. Item nodes: ID: 52134, Type: Dress, Color: Blue; ID: 24457, Type: Shirt, Color: Red. Edge labels: Sent invitation to, Commented on photo by, Friend of, Address, Placed order, Item 1, Item 2.
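Nodes with properties and labeled edges can be modeled with two small structures. The node properties and edge labels below come from the diagram, but which nodes each edge connects is an assumption made for the example, and `neighbors` is a hypothetical helper rather than Neo4j's API.

```python
# Nodes are like rows: an ID plus properties and values.
nodes = {
    "andrew": {"name": "Andrew Brust"},
    "jane": {"name": "Jane Doe"},
    "order252": {"ID": 252, "Total Price": "300 USD"},
    "item52134": {"ID": 52134, "Type": "Dress", "Color": "Blue"},
    "item24457": {"ID": 24457, "Type": "Shirt", "Color": "Red"},
}

# Edges are like joins: (source, label, target).
# The endpoints chosen here are assumed, not read off the slide.
edges = [
    ("andrew", "Friend of", "jane"),
    ("andrew", "Placed order", "order252"),
    ("order252", "Item 1", "item52134"),
    ("order252", "Item 2", "item24457"),
]


def neighbors(node_id, label):
    """Return the targets reachable from node_id over edges with this label."""
    return [target for source, edge_label, target in edges
            if source == node_id and edge_label == label]


# Traverse two hops: the orders Andrew placed, then the first item on each.
ordered_items = [
    nodes[item]["Type"]
    for order in neighbors("andrew", "Placed order")
    for item in neighbors(order, "Item 1")
]

print(neighbors("andrew", "Friend of"), ordered_items)
```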
  • 32. NoSQL + BI NoSQL databases are bad for ad hoc query and data warehousing BI applications involve models; models rely on schema Extract, transform and load (ETL) may be your friend Wide-column stores, however, are good for “Big Data” • See next slide Wide-column stores and column-oriented databases are similar technologically
  • 33. NoSQL + Big Data Big Data and NoSQL are interrelated Typically, Wide-Column stores used in Big Data scenarios Prime example: • HBase and Hadoop Why? • Lack of indexing not a problem • Consistency not an issue • Fast reads very important • Distributed file systems important too • Commodity hardware and disk assumptions also important • Not Web scale but massive scale-out, so similar concerns
  • 34. Going “NoSQL-Like” on the MS Cloud Azure Table Storage (a key-value store) SQL Azure XML columns (supports variable schema, hierarchy) SQL Azure Federation (a sharding implementation) OData (HTTP/JSON data APIs) Running NoSQL database products using Azure VMs…
  • 35. NoSQL on Windows Azure Platform as a Service • Cloudant: https://cloudant.com/azure/ • MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/ MongoDB, DIY: • On an Azure Worker Role: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles • On a Windows VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer • On a Linux VM: http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
  • 36. NoSQL on Windows Azure Others, DIY (Linux VMs): • Couchbase: http://blog.couchbase.com/couchbase-server-new-windows-azure • CouchDB: http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure • Riak: http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/ • Redis: http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx • Cassandra: http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
  • 37. And With MS On-Premise Technologies SQL Server 2008/2008R2/2012 “Beyond Relational” Features • Sparse columns (like Wide Column Stores) • Geospatial (geometry, geography data types) • FILESTREAM, FileTable (like Document Store attachments) • Full Text Search, Semantic Similarity Search • HierarchyID (can simulate Graph Database functionality) SQL Server Parallel Data Warehouse Edition (PDW) • Distributed architecture (like MapReduce/Hadoop) • PolyBase in PDW v2 (interfaces PDW and HDFS)
  • 39. Compromises Eventual consistency Write buffering Only primary keys can be indexed Queries must be written as programs Tooling • Productivity (= money)
  • 40. Summing Up • Line of Business -> Relational • Large, public (consumer)-facing sites -> NoSQL • Complex data structures -> Relational • Big Data -> NoSQL • Transactional -> Relational • Content Management -> NoSQL • Enterprise->Relational • Consumer Web -> NoSQL
  • 41. Thank you • andrew.brust@bluebadgeinsights.com • @andrewbrust on Twitter • Want to get on Blue Badge Insights’ list? Text “bluebadge” to 22828
  • 42. Win a Microsoft Surface Pro! Complete an online SESSION EVALUATION to be entered into the draw. Draw closes April 12, 11:59pm CT Winners will be announced on the PASS BA Conference website and on Twitter. Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue. Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
  • 43. Thank you!

Editor's notes

  1. http://www.chegg.com/textbooks/foundations-of-sql-server-2008-r2-business-intelligence-2nd-edition-9781430233244-1430233249 http://www.chegg.com/textbooks/smart-business-intelligence-solutions-with-microsoft-sql-server-2008-1st-edition-9780735625808-0735625808