Database Choices
@LynnLangit
May 2014 – Techorama
Databases Now -> a Menu of Choices
Why Change? -> "Small" Big Data
• Your data - BEHAVIORAL
• Your data - TRANSACTIONAL
• PUBLIC data
• PREMIUM data
Current Data Questions
• “Should we evaluate Hadoop?”
• “How much data is Big Data?”
• “What are the limits of SQL Server?”
• “Which NoSQL databases (if any) should we consider?”
• “How safe is the cloud really?”
• “How do we mine the data for usable information?”
DEMO - About Open Source
• Free
– Rapid iteration, innovation
– Can start up for free (on premise)
– Can ‘rent’ for cheap or free on the cloud
– Can use with the command line for free
– Some vendors offer free online training
– Ex. www.neo4j.org
• Not Free
– Constant releases
– Can be deceptively hard to set up (time is money)
– Don’t forget to turn it off if on the cloud!
– GUI tools, support, training cost $$$
– Ex. www.neo4j.com
Database Choices – The first level of choice
Data
• A. Hadoop
• B. NoSQL
• C. Relational
On Premise or In the Cloud
Working with Hadoop
About Hadoop MapReduce
HDFS
How you ‘get’ Hadoop
• A. Open source
– Roll your own
• B. Commercial distribution
– Cloudera
– MapR
– Hortonworks
– More…
• C. Rent it via the cloud
– AWS
– HDInsight
Demo - Cloudera Hadoop Enterprise
Demo – AWS MapReduce
Example Comparison: RDBMS vs. Hadoop
Attribute | Traditional RDBMS | Hadoop / MapReduce
Data Size | Gigabytes (Terabytes) | Petabytes and greater
Access | Interactive and Batch | Batch – NOT Interactive
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
Query Response Time | Can be near immediate | Has latency (due to batch processing)
Database Choices
On Premise
• RDBMS
• NoSQL
• Hadoop
In Cloud
• RDBMS
• NoSQL
• Hadoop
An Aside…SQL Server 2012++ ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
Feature | 2012 | 2014
SSAS Tabular Models | X | X
NC Columnstore Index | X | X
Clustered (writable) Columnstore Index | - | X
In-memory OLTP | - | X
But wait…
is there a
RELATIONAL database
that scales,
that is cheap,
that runs in the cloud?
DEMO - AWS Redshift
• About $1k per Terabyte per year - relational
So many NoSQL options
• More than just the Elephant in the room
• More than 150 types of NoSQL databases
Flavors of NoSQL
• Key/Value – Volatile
• Key/Value – Persistent
• Wide-Column
• Document
• Graph
Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS Dynamo DB
– Riak
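To make the "just keys and values, persistent or volatile" idea concrete, here is a minimal sketch in Python. It does not use DynamoDB or Riak; it assumes the standard-library `shelve` module as a stand-in for a persistent store and a plain dict as the volatile one. The key names (`session:42`, `product:1001`) and the helper functions are hypothetical.

```python
import os
import shelve
import tempfile

# Volatile key/value store: a plain in-memory dict, lost when the process exits.
volatile_store = {}
volatile_store["session:42"] = {"user": "alice", "cart": ["book", "pen"]}
volatile_store["session:43"] = "any value works; there is no schema"


def put_persistent(path, key, value):
    """Write one key/value pair to an on-disk shelf (hypothetical helper)."""
    with shelve.open(path) as db:
        db[key] = value


def get_persistent(path, key, default=None):
    """Read one value back by key from the on-disk shelf (hypothetical helper)."""
    with shelve.open(path) as db:
        return db.get(key, default)


# Persistent key/value store: the value survives closing and reopening the file.
store_path = os.path.join(tempfile.mkdtemp(), "kv_demo")
put_persistent(store_path, "product:1001", {"name": "widget", "price": 9.99})
restored = get_persistent(store_path, "product:1001")
print(restored["name"])
```

Both stores are accessed only by key, which is the trade-off this flavor makes: lookups are simple and fast, but there is no query language over the values.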
DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud
File (BLOB) Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3, with archiving into AWS Glacier
Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models
Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
• HBase
• Cassandra
• xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
• SQL Server 2012 Columnstore index
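The sparse versus dense distinction above can be illustrated with plain Python data structures. This is only a sketch of the two layouts, not how HBase, Cassandra, or SQL Server store data internally; all row keys, column names, and helper functions are hypothetical.

```python
# Column-family style (sparse): each row key holds only the columns it has.
column_family = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob", "last_login": "2014-05-01", "plan": "pro"},
}

# Column-store style (dense): every column is a full-length array,
# so an aggregate only has to scan the columns it needs.
column_store = {
    "order_id": [1, 2, 3, 4],
    "region": ["EU", "US", "EU", "US"],
    "amount": [10.0, 25.0, 5.0, 40.0],
}


def columns_used(family):
    """Return the set of all column names that appear in any row."""
    names = set()
    for columns in family.values():
        names.update(columns)
    return names


def total_by(store, group_col, value_col):
    """Sum value_col per distinct value of group_col, reading only two columns."""
    totals = {}
    for key, value in zip(store[group_col], store[value_col]):
        totals[key] = totals.get(key, 0.0) + value
    return totals


print(sorted(columns_used(column_family)))
print(total_by(column_store, "region", "amount"))
```

In the sparse layout, rows are free to carry different columns; in the dense layout, every row has a value in every column, which is what makes column scans and compression effective.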
DEMO – Google Cloud Datastore
DEMO – SQL Server ‘NoSQL’
• SQL Server Columnstore Index
• SQL Server SSAS Tabular Model
Document Database
• Document-oriented (collection of JSON documents) w/semi-structured data
– Encodings include BSON, JSON, XML…
• Binary forms
– PDF, Microsoft Office documents (Word, Excel…)
• Examples:
– MongoDB
– Couchbase
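As a rough illustration of a "collection of JSON documents" with semi-structured data, the sketch below uses only Python's standard `json` module rather than a MongoDB or Couchbase driver. The documents, the `_id` field, and the `find` helper are hypothetical and chosen only to show that documents in one collection need not share the same fields.

```python
import json

# A "collection" is a list of JSON documents; documents need not share fields.
raw_collection = """
[
  {"_id": 1, "player": "ann", "score": 1200, "badges": ["gold"]},
  {"_id": 2, "player": "bob", "score": 800},
  {"_id": 3, "player": "cy", "score": 1500, "guild": {"name": "owls", "rank": 2}}
]
"""
collection = json.loads(raw_collection)


def find(docs, predicate):
    """Return every document for which predicate(doc) is true."""
    return [doc for doc in docs if predicate(doc)]


# Query on a field every document has.
high_scores = find(collection, lambda d: d.get("score", 0) > 1000)

# Query on a nested field that only some documents have.
in_guild = find(collection, lambda d: d.get("guild", {}).get("name") == "owls")

print([d["player"] for d in high_scores])
print([d["player"] for d in in_guild])
```

A real document database adds indexing and a query language on top, but the data model is essentially this: nested, self-describing records grouped into collections.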
Demo - MongoDB
Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4j
– AlgebraixData
– Google Freebase
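Finding a connection between two objects is the kind of question a graph model answers directly. The sketch below is a plain breadth-first search over an adjacency list in Python; it is not Neo4j's query language or API, and the people and relationships in `graph` are made up for illustration.

```python
from collections import deque

# Hypothetical "knows" graph stored as an adjacency list.
graph = {
    "ann": ["bob", "cy"],
    "bob": ["ann", "dee"],
    "cy": ["ann", "dee"],
    "dee": ["bob", "cy", "eve"],
    "eve": ["dee"],
}


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(graph, "ann", "eve"))
```

In a relational schema the same question would typically need one self-join per hop, which is why many-to-many, variable-depth relationships are a common reason to consider a graph database.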
DEMO – Neo4J
Cloud-hosted, partially managed RDBMS
• AWS RDS
– SQL Server
– MySQL
– PostgreSQL
– Oracle
• Google
– MySQL
• Microsoft
– SQLAzure
DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
NoSQL Applied
• Log Files
– Columnstore
– HBase
• Product Catalogs
– Key/Value
– DynamoDB
• Social Games
– Document
– MongoDB
• Social aggregators
– Graph
– Neo4j
• Line-of-Business
– RDBMS
– SQL Server
Cloud Offerings– RDBMS AND NoSQL
Category | AWS | Google | Microsoft
Managed RDBMS | RDS – all major RDBMS | Cloud SQL | SQL Azure
NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Blobs
NoSQL Key-Value | DynamoDB | Cloud Datastore | Azure Tables
Streaming or ML | Kinesis | Prospective Search & Prediction API | StreamInsight
NoSQL Document or Graph | MongoDB on EC2; Neo4j on EC2 | None; Freebase | MongoDB on Microsoft Cloud; Neo4j on Microsoft Cloud
Hadoop (HBase) | Elastic MapReduce (S3 & EC2) | None | HDInsight
Dremel/Warehousing | RedShift | BigQuery | None
Cloud ETL | Data Pipelines | None | None
But wait…
how do I query
NoSQL data?
Example – translate ANSI SQL to MapReduce
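One way to picture such a translation is a `GROUP BY` count expressed as map, shuffle, and reduce steps. The sketch below runs in plain Python on a single machine and does not use Hadoop; the `orders` rows, the SQL statement, and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical rows of an "orders" table.
orders = [
    {"order_id": 1, "region": "EU"},
    {"order_id": 2, "region": "US"},
    {"order_id": 3, "region": "EU"},
    {"order_id": 4, "region": "APAC"},
]

# SQL being translated (assumed example):
#   SELECT region, COUNT(*) FROM orders GROUP BY region;


def map_phase(rows):
    """Map: emit one (key, 1) pair per row; the key is the GROUP BY column."""
    for row in rows:
        yield row["region"], 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(orders)))
print(counts)
```

On a cluster, the map and reduce functions run in parallel across many nodes and the framework performs the shuffle, but the division of work follows the same three steps.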
Can Excel help?
• Connector to Hadoop
• Power BI
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Data Mining w/Predixion
Demo – Excel Power Query
NoSQL To-Do List
Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
• Windows Azure Data Market, other public data markets
www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe -> Teach a Kid (Ages 10+)
A Big Thank You To Our Sponsors
Gold Partners
Silver & Track Partners
Platinum Partners

Database Choices

  • 2. Databases Now -> a Menu of Choices
  • 3. Why Change? -> “Small” Big Data • Your data - BEHAVIORAL • Your data - TRANSACTIONAL • PUBLIC data • PREMIUM data
  • 4. Current Data Questions • “Should we evaluate Hadoop?” • “How much data is Big Data?” • “What are the limits of SQL Server?” • “Which NoSQL databases (if any) should we consider?” • “How safe is the cloud really?” • “How do we mine the data for usable information?”
  • 6. DEMO - About Open Source
      – Free: rapid iteration and innovation; can start up for free (on premise); can ‘rent’ for cheap or free on the cloud; can use with the command line for free; some vendors offer free online training (ex. www.neo4j.org)
      – Not free: constant releases; can be deceptively hard to set up (time is money); don’t forget to turn it off if on the cloud!; GUI tools, support, and training cost $$$ (ex. www.neo4j.com)
  • 7. Database Choices – The first level of choice Data A. Hadoop B. NoSQL C. Relational On Premise or In the Cloud
  • 10. How you ‘get’ Hadoop
      – A. Open source: roll your own
      – B. Commercial distribution: Cloudera, MapR, Hortonworks, more…
      – C. Rent it via the cloud: AWS, HDInsight
  • 11. Demo - Cloudera Hadoop Enterprise
  • 13. Demo – AWS MapReduce
  • 14. Example Comparison: RDBMS vs. Hadoop (Traditional RDBMS / Hadoop MapReduce)
      – Data Size: Gigabytes (Terabytes) / Petabytes and greater
      – Access: Interactive and Batch / Batch, NOT Interactive
      – Updates: Read / Write many times / Write once, Read many times
      – Structure: Static Schema / Dynamic Schema
      – Integrity: High (ACID) / Low
      – Scaling: Nonlinear / Linear
      – Query Response Time: Can be near immediate / Has latency (due to batch processing)
  • 15. Database Choices • On Premise: RDBMS, NoSQL, Hadoop • In Cloud: RDBMS, NoSQL, Hadoop
  • 16. An Aside…SQL Server 2012++ ‘NoSQL’ • SQL Server 2012 Columnstore Index • SQL Server 2012 Tabular Model (SSAS)
      – SSAS Tabular Models: 2012 and 2014
      – NC Columnstore Index: 2012 and 2014
      – Clustered (writable) Columnstore Index: 2014 only
      – In-memory OLTP: 2014 only
  • 17. But wait… is there a RELATIONAL database that scales, that is cheap, that runs in the cloud?
  • 18. DEMO - AWS Redshift • About $1k per Terabyte per year - relational
  • 19. So many NoSQL options • More than just the Elephant in the room • 150+ types of NoSQL databases
  • 21. Key / Value Database • Just keys and values – No schema • Persistent or Volatile • Examples – AWS DynamoDB – Riak
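The key/value model on slide 21 can be sketched in a few lines of plain Python. This is only an illustration, not how DynamoDB or Riak work internally: the `KeyValueStore` class, its JSON file format, and the sample keys are all hypothetical. It shows the two points from the slide: values are looked up by key with no schema, and a store can be volatile (memory only) or persistent (written to disk).

```python
import json
import os
import tempfile


class KeyValueStore:
    """Hypothetical minimal key/value store: values are looked up by key, no schema."""

    def __init__(self, path=None):
        # path=None means volatile (memory only); a path means persistent (JSON file).
        self.path = path
        self.data = {}
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        if self.path:
            # Rewrite the whole file on every put: simple, not efficient.
            with open(self.path, "w") as f:
                json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)


# Volatile store: contents disappear with the object.
volatile = KeyValueStore()
volatile.put("session:42", {"user": "lynn", "cart": [101, 205]})

# Persistent store: contents survive in a file (temporary directory used here).
db_path = os.path.join(tempfile.mkdtemp(), "kv.json")
persistent = KeyValueStore(db_path)
persistent.put("sku:101", {"name": "widget", "price": 9.99})

# A new instance pointed at the same file sees the saved value.
reopened = KeyValueStore(db_path)
print(reopened.get("sku:101"))
```

Note that the two values have different shapes; nothing in the store declares or checks a structure, which is the "no schema" trade-off.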
  • 22. DEMO - AWS DynamoDB • Key/Value store on the AWS cloud
  • 23. File (BLOB) Storage Buckets in the Cloud • Amazon – S3 or Glacier • Google – Cloud Storage • Microsoft Azure BLOBS
  • 24. DEMO - Battle of the Buckets • Google Cloud Storage VS. • Windows Azure BLOBS VS. • AWS S3 -> (archiving) into AWS Glacier
  • 25. Column Database • Wide, sparse column sets • Schema-light • Examples: – HBase w/Hadoop – Google Cloud Datastore – SQL Server Columnstore Indexes or SSAS Tabular Models
  • 26. Types of Column Databases • Column-families – Non-relational – Sparse – Examples: • HBase • Cassandra • xVelocity (SQL 2012 Tabular) • Column-stores – Relational – Dense – Example: • SQL Server 2012 Columnstore index
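The difference between the two column database types on slide 26 is easier to see with data in hand. The sketch below uses plain Python structures and made-up sample data; it is not the storage format of HBase, Cassandra, or SQL Server, only a picture of "sparse column families" versus "dense column stores".

```python
# Column-family layout (HBase/Cassandra style): each row key maps to only the
# columns that row actually has, so rows can be wide and sparse.
column_family = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob", "last_login": "2014-05-01"},
}

# Column-store layout (columnstore-index style): a relational table stored one
# dense column at a time, so every column holds a value for every row.
column_store = {
    "order_id": [1, 2, 3, 4],
    "region": ["EU", "US", "EU", "US"],
    "amount": [20.0, 35.5, 12.5, 40.0],
}


def columns_of(row_key):
    """Return the column names present for one row of the column family."""
    return sorted(column_family[row_key])


def total_by_region(region):
    """Aggregate by reading only the two columns involved, not whole rows."""
    pairs = zip(column_store["region"], column_store["amount"])
    return sum(amount for r, amount in pairs if r == region)


print(columns_of("user:1"))
print(columns_of("user:2"))
print(total_by_region("EU"))
```

The aggregate touches `region` and `amount` and never reads `order_id`, which is why column stores suit analytic queries over a few columns of many rows.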
  • 27. DEMO – Google Cloud Datastore
  • 28. DEMO – SQL Server ‘NoSQL’ • SQL Server Columnstore Index • SQL Server SSAS Tabular Model
  • 29. Document Database • Document-oriented (collection of JSON documents) w/semi-structured data – Encodings include BSON, JSON, XML… • Binary forms (PDF, Microsoft Office documents such as Word, Excel…) • Examples: – MongoDB – Couchbase
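As a rough illustration of slide 29, a document collection can be modeled as a list of JSON documents whose fields differ from one document to the next. The `find` helper and the sample documents are hypothetical and use only the standard library; real document databases such as MongoDB have their own, much richer query APIs.

```python
import json

# A "collection" of semi-structured documents: same collection, different fields.
raw = """[
  {"_id": 1, "type": "player", "name": "Ann", "level": 7, "items": ["sword", "map"]},
  {"_id": 2, "type": "player", "name": "Bob", "guild": {"name": "Owls", "rank": 2}},
  {"_id": 3, "type": "match", "players": [1, 2], "winner": 1}
]"""
collection = json.loads(raw)


def find(docs, **criteria):
    """Return documents whose top-level fields equal all given criteria.

    Documents that lack a field simply do not match; no schema is enforced.
    """
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


players = find(collection, type="player")
print([p["name"] for p in players])
print(find(collection, winner=1))
```

Nested values (the `guild` sub-document, the `items` array) travel with their parent document, which is what removes many of the joins a relational design would need.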
  • 31. Graph Databases • a lot of many-to-many relationships • recursive self-joins • when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data • Examples: – Neo4j – AlgebraixData – Google Freebase
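"Quickly finding connections" on slide 31 usually comes down to graph traversal. Below is a minimal sketch, assuming a small made-up `follows` graph held as an adjacency list, that finds the shortest chain of connections with breadth-first search. It is plain Python, not the Neo4j API or its query language.

```python
from collections import deque

# Hypothetical social graph: who follows whom (directed edges).
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
    "eve": [],
    "fay": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest path from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(follows, "ann", "fay"))
print(shortest_path(follows, "fay", "ann"))
```

In a relational schema the same question needs a recursive self-join whose depth is not known in advance; a graph store keeps the edges directly, so each hop is a lookup.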
  • 33. Cloud-hosted, partially managed RDBMS • AWS RDS – SQL Server – MySQL – PostgreSQL – Oracle • Google – MySQL • Microsoft – SQL Azure
  • 34. DEMO - AWS RDS • SQL Server, MySQL or Oracle • Essential to understand pricing models
  • 35. NoSQL Applied (use case -> database type -> example)
      – Log Files -> Columnstore -> HBase
      – Product Catalogs -> Key/Value -> DynamoDB
      – Social Games -> Document -> MongoDB
      – Social aggregators -> Graph -> Neo4j
      – Line-of-Business -> RDBMS -> SQL Server
  • 36. Cloud Offerings – RDBMS AND NoSQL (AWS / Google / Microsoft)
      – Managed RDBMS: RDS (all major RDBMS) / Cloud SQL / SQL Azure
      – NoSQL buckets: S3 or Glacier / Cloud Storage / Azure Blobs
      – NoSQL Key-Value: DynamoDB / Cloud Datastore / Azure Tables
      – Streaming or ML: Kinesis / Prospective Search & Prediction API / StreamInsight
      – NoSQL Document or Graph: MongoDB on EC2, Neo4j on EC2 / None, Freebase / MongoDB on Microsoft Cloud, Neo4j on Microsoft Cloud
      – Hadoop (HBase): Elastic MapReduce (S3 & EC2) / None / HDInsight
      – Dremel/Warehousing: RedShift / BigQuery / None
      – Cloud ETL: Data Pipelines / None / None
  • 37. But wait… how do I query NoSQL data?
  • 38. Example – translate ANSI SQL to MapReduce
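The slide's own example is not reproduced in this transcript, so the following is a stand-in sketch rather than the presenter's demo. It assumes a hypothetical `orders` table and shows how the query `SELECT region, SUM(amount) FROM orders GROUP BY region` breaks down into map, shuffle, and reduce steps, simulated in a single Python process (a real Hadoop job distributes these phases across a cluster).

```python
from collections import defaultdict

# Hypothetical input records, standing in for rows of an "orders" table.
orders = [
    {"region": "EU", "amount": 20},
    {"region": "US", "amount": 35},
    {"region": "EU", "amount": 12},
    {"region": "US", "amount": 40},
    {"region": "APAC", "amount": 5},
]


def map_phase(record):
    """Map: emit (key, value) pairs; the GROUP BY column becomes the key."""
    yield record["region"], record["amount"]


def shuffle(pairs):
    """Shuffle: collect all values that share a key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: apply the aggregate (SUM) to each key's values."""
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


result = run_job(orders)
print(result)
```

Swapping the aggregate (COUNT, MAX, and so on) changes only `reduce_phase`; adding a WHERE clause would become a filter inside `map_phase`.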
  • 39. Can Excel help? • Connector to Hadoop • Power BI • Data Quality Services • Master Data Services • Integration with Azure Data Market • Data Mining w/Predixion
  • 40. Demo – Excel Power Query
  • 41. NoSQL To-Do List
      – Understand types of NoSQL databases • Use NoSQL when business needs call for it • Use the right type of NoSQL for your business problem
      – Try out NoSQL on the cloud • Quick and cheap for behavioral data • Mashup cloud datasets • Good for specialized use cases, e.g. dev, test, training environments
      – Learn NoSQL access technologies & services • New query languages, e.g. MapReduce, R, Infer.NET • New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc… • Windows Azure Data Market, other public data markets
  • 42. www.TeachingKidsProgramming.org • Free Courseware (Java, Small Basic or C# [on Pluralsight]) • Do a Recipe -> Teach a Kid (Ages 10++) • recipes)
  • 43. A Big Thank You To Our Sponsors • Gold Partners • Silver & Track Partners • Platinum Partners

Editor's Notes

  1. http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
  2. http://hortonworks.com/technology/hortonworksdataplatform/ More about HBase, from the O’Reilly ‘Getting Ready for BigData’ report: “Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.” http://www.cloudera.com/
  4. http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-live.html
  5. http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
  6. Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  7. http://nosql-database.org/ http://hadoop.apache.org/ & http://www.mongodb.org/ Wikipedia - http://en.wikipedia.org/wiki/NoSQL List of noSQL databases – http://nosql-database.org/ The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
  8. http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/ http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  9. http://en.wikipedia.org/wiki/Project_Voldemort http://aws.amazon.com/ http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Introduction.html http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  10. http://code.google.com Access via REST APIs. Very cheap, but not much functionality included. Lots of code to write for application development. But… can be a good backup solution.
  11. http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html http://stage.hypertable.com/index.php/documentation/architecture/ http://code.google.com/appengine/ http://code.google.com/appengine/articles/datastore/overview.html
  12. http://cwebbbi.wordpress.com/2012/02/14/so-what-is-the-bi-semantic-model/ http://www.databasejournal.com/features/mssql/understanding-new-column-store-index-of-sql-server-2012.html http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html http://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
  13. https://developers.google.com/datastore/docs/concepts/overview http://googledevelopers.blogspot.com/2014/01/get-started-with-google-cloud-platform.html
  14. http://en.wikipedia.org/wiki/MongoDB http://www.mongodb.org/downloads http://www.mongodb.org/display/DOCS/Drivers
  15. http://en.wikipedia.org/wiki/MongoDB & http://try.mongodb.org/ http://www.mongodb.org/downloads http://www.mongodb.org/display/DOCS/Drivers
  16. http://www.infinitegraph.com/what-is-a-graph-database.html and http://www.neo4j.org/ http://en.wikipedia.org/wiki/Graph_database http://www.freebase.com/
  17. http://www.neo4j.org/learn/try
  18. For Google - http://code.google.com For AWS - https://console.aws.amazon.com/console/home
  19. Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  20. http://rickosborne.org/download/SQL-to-MongoDB.pdf
  21. http://www.microsoft.com/en-us/bi/default.aspx http://dennyglee.com/ Demos -   http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  22. Lynn