NoSQL
Databases
         &
Managing Big Data
Talking about
What is BIG Data
NoSQL
MongoDB
Future of BIG Data
@spf13

                  AKA
Steve Francia
15+ years building
the internet

  Father, husband,
  skateboarder



Chief Solutions Architect @
responsible for drivers,
integrations, web & docs
Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic

Well Funded: Sequoia, Union Square, Flybridge
What is
   BIG
    data   ?
2000
Google Inc
Today announced it has released
the largest search engine on the
Internet.

Google’s new index, comprising
more than 1 billion URLs
2008
Our indexing system for processing
links indicates that
we now count 1 trillion unique URLs

(and the number of individual web
pages out there is growing by
several billion pages per day).
Data Growth (chart, millions of URLs)
2000: 1, 2001: 4, 2002: 10, 2003: 24, 2004: 55, 2005: 120, 2006: 250, 2007: 500, 2008: 1,000
An unprecedented
amount of data is
being created and is
accessible
What good is it if
we can’t utilize this
data?
?
What is
NoSQL
What is NoSQL?




Key / Value   Column   Graph   Document
Key-Value Stores
A mapping from a key to a value
The store doesn't know anything about the
key or value
The store doesn't know anything about the
insides of the value
Operations :
•Set, get, or delete a key-value pair
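The three operations above can be sketched with a plain in-memory map. This is only an illustration, not how any particular key-value product is built; the class name `KeyValueStore` and the key `user:42` are made up for the example. The point it shows is that the store treats the value as an opaque blob and the client does all the interpreting.

```javascript
// Minimal in-memory key-value store (illustrative only).
// The store never looks inside a value; it only maps keys to values.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const kv = new KeyValueStore();

// The value is an opaque string as far as the store is concerned.
kv.set("user:42", JSON.stringify({ name: "Ada", city: "New York" }));

// Interpreting the value is the client's job, not the store's.
const user = JSON.parse(kv.get("user:42"));
console.log(user.city);

kv.delete("user:42");
console.log(kv.get("user:42"));
```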
Column-Oriented
            Stores
Like a relational store, but flipped around: all
data for a column is kept together
An index provides a means to get a column
value for a record
Operations:
 •Get, insert, delete records; updating fields
Streaming column data in and out of Hadoop
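To make "flipped around" concrete, here is a small sketch that turns row-shaped records into one array per column. The sample records and the helper names `toColumns` and `columnValue` are assumptions for illustration; real column stores add compression, on-disk layout and much more.

```javascript
// Row-oriented layout: one object per record.
const rows = [
  { id: 1, name: "10gen HQ", zip: "10011" },
  { id: 2, name: "Cafe", zip: "10012" },
  { id: 3, name: "Library", zip: "10011" },
];

// Column-oriented layout: one array per column; position i belongs to record i.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = new Array(records.length).fill(null);
      columns[field][i] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);

// Scanning a single column touches only that column's data.
const zipCounts = {};
for (const zip of columns.zip) {
  zipCounts[zip] = (zipCounts[zip] || 0) + 1;
}

// An index from record id to position yields a column value for a record.
const positionById = new Map(columns.id.map((id, i) => [id, i]));
function columnValue(id, field) {
  return columns[field][positionById.get(id)];
}

console.log(zipCounts);
console.log(columnValue(2, "name"));
```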
Graph Databases
Stores vertex-to-vertex edges
Operations:
 •Getting and setting edges
 •Sometimes possible to annotate vertices
 or edges
Query languages support finding paths
between vertices, subject to various
constraints
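A rough sketch of these ideas: edges stored per vertex, plus a breadth-first search that finds a path subject to one simple constraint (a maximum hop count). The `Graph` class, the `maxHops` parameter and the vertex names are invented for the example and do not correspond to any specific graph database or query language.

```javascript
// Tiny directed graph: stores vertex-to-vertex edges (illustrative only).
class Graph {
  constructor() {
    this.edges = new Map(); // vertex -> Set of neighbouring vertices
  }
  setEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }
  getEdges(from) {
    return [...(this.edges.get(from) || [])];
  }
  // Breadth-first search: returns a shortest path by edge count,
  // or null if no path exists within maxHops edges.
  findPath(start, goal, maxHops = Infinity) {
    const queue = [[start]];
    const visited = new Set([start]);
    while (queue.length > 0) {
      const path = queue.shift();
      const last = path[path.length - 1];
      if (last === goal) return path;
      if (path.length - 1 >= maxHops) continue; // constraint: hop limit reached
      for (const next of this.getEdges(last)) {
        if (!visited.has(next)) {
          visited.add(next);
          queue.push([...path, next]);
        }
      }
    }
    return null;
  }
}

const g = new Graph();
g.setEdge("alice", "bob");
g.setEdge("bob", "carol");
g.setEdge("carol", "dave");
g.setEdge("alice", "erin");

console.log(g.findPath("alice", "dave"));
console.log(g.findPath("alice", "dave", 2));
```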
Document Stores
The store is a container for documents
Documents are made up of named fields
   (think object/array/dict/hash...)
Can query on any document field(s)
Operations:
•Insert and delete documents
•Update fields within documents
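The operations listed above can be sketched as a toy in-memory container. Everything here (the `DocumentStore` class, its method names, the integer `_id`) is an assumption made for illustration and is not MongoDB's API; it only shows that documents need no shared schema and can be queried on any field.

```javascript
// Toy document store: a container for documents made of named fields.
class DocumentStore {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }
  // Query by equality on any combination of top-level fields.
  find(query) {
    return this.docs.filter((doc) =>
      Object.entries(query).every(([field, value]) => doc[field] === value)
    );
  }
  // Update named fields on every matching document; returns the match count.
  update(query, fields) {
    const matched = this.find(query);
    matched.forEach((doc) => Object.assign(doc, fields));
    return matched.length;
  }
  // Delete matching documents; returns how many were removed.
  remove(query) {
    const matched = new Set(this.find(query));
    this.docs = this.docs.filter((doc) => !matched.has(doc));
    return matched.size;
  }
}

const store = new DocumentStore();
store.insert({ name: "10gen HQ", city: "New York" });
store.insert({ name: "Branch", city: "London", floor: 3 }); // extra field, no schema change
store.update({ city: "London" }, { floor: 4 });
console.log(store.find({ city: "London" }));
```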
                                                                                              MySQL
Data Model     Columns              Key:Value            Columns        Documents             Relational
Consistency    Eventual / Quorum    Eventual / Quorum    Strong         Strong                Strong
Availability   Multi-Master         Multi-Master         Single Master  Single Master         Single Master
Partitioning   Hash                 Hash                 Range          Range or Hash         N/A
Query          Thrift, CQL          Native Drivers (6)   REST, Thrift   Native Drivers (12)   SQL
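The Partitioning row contrasts hash and range partitioning. The sketch below shows the difference in the simplest possible terms; the shard count, the string hash and the boundary keys `"h"` and `"q"` are arbitrary assumptions, and none of the compared systems is claimed to use this exact scheme.

```javascript
const SHARD_COUNT = 3;

// Hash partitioning: a hash of the key picks the shard. Keys spread evenly,
// but neighbouring keys usually land on different shards.
function hashShard(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % SHARD_COUNT;
}

// Range partitioning: each shard owns a contiguous key range, so a range scan
// touches few shards. Upper bounds are exclusive; the last shard is open-ended.
const RANGE_UPPER_BOUNDS = ["h", "q"]; // shard 0: < "h", shard 1: < "q", shard 2: the rest
function rangeShard(key) {
  const k = String(key);
  const index = RANGE_UPPER_BOUNDS.findIndex((bound) => k < bound);
  return index === -1 ? RANGE_UPPER_BOUNDS.length : index;
}

for (const key of ["apple", "apricot", "mango", "zebra"]) {
  console.log(key, "hash shard:", hashShard(key), "range shard:", rangeShard(key));
}
```

With range partitioning, "apple" and "apricot" share a shard because they are adjacent in key order; with hash partitioning their placement depends only on the hash.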
Introduction to
MongoDB
What do we want in
       an ideal world?
•Horizontal scaling
  •cloud compatible
  •works with standard
  servers
•Fast
•Development is easy
  •Features
  •The Right Data Model
  •Schema Agility
MongoDB philosophy
 Keep functionality when we can (key/value
 stores are great, but we need more)
 Non-relational (no joins) makes scaling
 horizontally practical
 Document data models are good
 Database technology should run anywhere:
 virtualized, cloud, bare metal, etc.
Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files,
i.e. read-through, write-through
memory caching.
Database Landscape
Scalability & Performance


                            Memcached
                                             MongoDB



                                                   RDBMS


                                Depth of Functionality
“MongoDB has the best
features of key/value
stores, document
databases and
relational databases
in one.”
        John Nunemaker
Relational made normalized
     data look like this
Category
 • Name
 • Url
Article
 • Name
 • Slug
 • Publish date
 • Text
User
 • Name
 • Email Address
Tag
 • Name
 • Url
Comment
 • Comment
 • Date
 • Author
Document databases make
normalized data look like this
Article
 • Name
 • Slug
 • Publish date
 • Text
 • Author
   Comment[]
    • Comment
    • Date
    • Author
   Tag[]
    • Value
   Category[]
    • Value
User
 • Name
 • Email Address
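One way to picture the two shapes side by side: start from normalized, id-linked rows and fold the comments and tags into the article. The sample data and the helper `toDocument` are hypothetical and only meant to illustrate the embedding idea from the two diagrams.

```javascript
// Normalized (relational-style) data: one list of rows per entity, linked by ids.
const articles = [{ id: 1, name: "Hello", slug: "hello", authorId: 7 }];
const comments = [
  { articleId: 1, comment: "Nice post", author: "Fred" },
  { articleId: 1, comment: "Thanks", author: "nosh" },
];
const articleTags = [
  { articleId: 1, value: "tech" },
  { articleId: 1, value: "database" },
];

// Embedded (document-style) form: comments and tags live inside the article,
// so reading the article back needs no joins.
function toDocument(article) {
  return {
    name: article.name,
    slug: article.slug,
    authorId: article.authorId, // the user stays a separate entity
    comments: comments
      .filter((c) => c.articleId === article.id)
      .map(({ comment, author }) => ({ comment, author })),
    tags: articleTags
      .filter((t) => t.articleId === article.id)
      .map((t) => t.value),
  };
}

const articleDoc = toDocument(articles[0]);
console.log(JSON.stringify(articleDoc, null, 2));
```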
MongoDB
Start with an
              (or array, hash, dict, e

place1 = {
    name    : "10gen HQ",
    address : "578 Broadway 7th Floor",
    city    : "New York",
    zip     : "10011",
    tags    : [ "business", "awesome" ]
}
Inserting the record
    Initial Data Load

> db.places.insert(place1)
Querying
{
    name    : "10gen HQ",
    address : "578 Broadway 7th Floor",
    city    : "New York",
    zip     : "10011",
    tags    : [ "business", "awesome" ]
}

> db.places.findOne({ zip: "10011", tags: "awesome" })

> db.places.find({tags: "business" })
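Note that both queries match `tags` against a single string even though `tags` is an array: in MongoDB, an equality condition on an array field matches when any element equals the value. The plain JavaScript sketch below imitates that rule over an in-memory array; `matches`, `find` and `findOne` here are stand-ins written for this example, not the driver or shell functions, and the second sample document is invented.

```javascript
// Returns true when every field in the query matches the document.
// A scalar query value matches an array field if the array contains it.
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

const places = [
  { name: "10gen HQ", zip: "10011", tags: ["business", "awesome"] },
  { name: "Corner Deli", zip: "10011", tags: ["food"] }, // hypothetical second place
];

// Rough in-memory equivalents: find returns all matches, findOne the first or null.
const find = (query) => places.filter((doc) => matches(doc, query));
const findOne = (query) => places.find((doc) => matches(doc, query)) ?? null;

console.log(findOne({ zip: "10011", tags: "awesome" }));
console.log(find({ tags: "business" }));
```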
Nested Documents
{ _id     : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  name    : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city    : "New York",
  zip     : "10011",
  tags    : [ "business", "awesome" ],
  tips    : [{
      author : "Fred",
      date   : "Sat Apr 25 2010 20:51:03",
      text   : "Best Place Ever!"
  }]
}
Updating
> db.places.update(
  {name : "10gen HQ"},
  { $push :
     { tips :
         { author : "nosh",
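           date : new Date(2011, 5, 26),  // June 26, 2011 (months are zero-based); bare 6/26/2011 would be evaluated as division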
           text : "Office hours are great!"
         }
     }
  }
)
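What `$push` does to the matched document can be imitated in plain JavaScript: it appends the value to the named array, creating the array if the field is missing and failing if the field holds a non-array. The function `applyPush` is a hypothetical helper for this illustration and covers only the simple form used above (no modifiers such as `$each`).

```javascript
// Apply a { $push: { field: value } } style update to one in-memory document.
function applyPush(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) {
      doc[field] = []; // a missing field becomes a new array
    } else if (!Array.isArray(doc[field])) {
      throw new TypeError(`cannot $push to non-array field "${field}"`);
    }
    doc[field].push(value);
  }
  return doc;
}

const place = {
  name: "10gen HQ",
  tips: [{ author: "Fred", text: "Best Place Ever!" }],
};

applyPush(place, {
  $push: {
    tips: {
      author: "nosh",
      date: new Date(2011, 5, 26), // June 26, 2011
      text: "Office hours are great!",
    },
  },
});

console.log(place.tips.length);
console.log(place.tips[1].author);
```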
MongoDB
Use Cases
CMS / Blog
Needs:
• Business needed modern data store for rapid development and
  scale

Solution:
• Use PHP & MongoDB

Results:
• Real time statistics
• All data, images, etc. stored together:
  easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver

Solution:
• Use MongoDB instead of Oracle

Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites

Solution:
• Use MongoDB to replace Google Analytics / Omniture options

Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100 million)
• MongoDB for archive (2+ billion)
Results:
• No more alter table statements taking over 2 months to run
• Sharding enabled horizontal scale
• Very happily looking at other places to use MongoDB
Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in
  RDBMS

Solution:
• Switched from MySQL to MongoDB

Results:
•   Massive simplification of code base
•   Rapid builds, halving time to market (and cost)
•   Eliminated need for external caching system
•   50x+ performance improvement over MySQL
Tons more
   MongoDB casts a wide net

  people keep coming up with
 new and brilliant ways to use it
In Good Company




      and 1000s more
The Future of
BIG data
What is BIG?
  BIG today is
normal tomorrow
Data Growth (chart, millions of URLs)
2000: 1, 2001: 4, 2002: 10, 2003: 24, 2004: 55, 2005: 120, 2006: 250, 2007: 500, 2008: 1,000, 2009: 2,150, 2010: 4,400, 2011: 9,000
2012
Generating over
250 million
tweets per day
MongoDB enables
us to scale with
the redefinition
of BIG.
MongoDB
High Performance
Easy Development
         { author : "steve",
           date : new Date(),
           text : "About MongoDB...",
           tags : ["tech", "database"] }
Horizontally Scalable
http://spf13.com
                           http://github.com/s
                           @spf13




Question
    download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
NoSQL databases and managing big data

  • 1. NoSQL Databases & Managing Big Data
  • 2. Talking about What is BIG Data NoSQL MongoDB Future of BIG Data
  • 3. @spf13 AKA Steve Francia 15+ years building the internet Father, husband, skateboarder Chief Solutions Architect @ responsible for drivers, integrations, web & docs
  • 4. Company behind MongoDB Offices in NYC, Palo Alto, London & Dublin 100+ employees Support, consulting, training Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic Well Funded: Sequoia, Union Square, Flybridge
  • 5. What is BIG data ?
  • 6. 2000 Google Inc Today announced it has released the largest search engine on the Internet. Google’s new index, comprising more than 1 billion URLs
  • 7. 2008 Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
  • 8. Data Growth 1,000 1000 750 500 500 250 250 120 55 4 10 24 1 0 2000 2001 2002 2003 2004 2005 2006 2007 2008 Millions of URLs
  • 9. An unprecedented amount of data is being created and is accessible
  • 10. What good is it if we can’t utilize this data?
  • 12. What is NoSQL? Key / Value Column Graph Document
  • 13. Key-Value Stores A mapping from a key to a value The store doesn't know anything about the the key or value The store doesn't know anything about the insides of the value Operations : •Set, get, or delete a key-value pair
  • 14. Column-Oriented Stores Like a relational store, but flipped around: all data for a column is kept together An index provides a means to get a column value for a record Operations: •Get, insert, delete records; updating fields Streaming column data in and out of Hadoop
  • 15. Graph Databases Stores vertex-to-vertex edges Operations: •Getting and setting edges •Sometimes possible to annotate vertices or edges Query languages support finding paths between vertices, subject to various constraints
  • 16. Document Stores The store is a container for documents Documents are made up of named fields (think object/array/dict/hash...) Can query on any document field(s) Operations: •Insert and delete documents •Update fields within documents
  • 17. MySQL Data Model Columns Key:Value Columns Documents Relational Eventual / Eventual / Consistency Strong Strong Strong Quorum Quorum Multi- Multi- Single Single Single Availability Master Master Master Master Master Range or Partitioning Hash Hash Range N/A Hash Thrift, Native Rest, Native Query SQL CQL Drivers (6) Thrift Drivers (12)
  • 19. What do we want in an ideal world?
  • 20. What do we want in an ideal world? •Horizontal scaling •cloud compatible •works with standard servers •Fast •Development is easy •Features •The Right Data Model •Schema Agility
  • 21. MongoDB philosophy Keep functionality when we can (key/value stores are great, but we need more) Non-relational (no joins) makes scaling horizontally practical Document data models are good Database technology should run anywhere virtualized, cloud, metal, etc
  • 22. Under the hood Written in C++ Runs nearly everywhere Data serialized to BSON Extensive use of memory-mapped files i.e. read-through write-through memory caching.
  • 23. Database Landscape Scalability & Performance Memcached MongoDB RDBMS Depth of Functionality
  • 24. “ MongoDB has the best features of key/value stores, document databases and relational databases in one. John Nunemaker
  • 25. Relational made normalized data look like this Category • Name • Url Article User • Name Tag • Name • Slug • Name • Email Address • Publish date • Url • Text Comment • Comment • Date • Author
  • 26. Document databases make normalized data look like this Article • Name • Slug • Publish date User • Text • Name • Author • Email Address Comment[] • Comment • Date • Author Tag[] • Value Category[] • Value
  • 28. Start with an object (or array, hash, dict, etc.) place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] }
  • 29. Inserting the record Initial Data Load > db.places.insert(place1)
  • 30. Querying { name : "10gen HQ", address : "134 5th Avenue 3rd Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] } > db.places.findOne({ zip: "10011", tags: "awesome" }) > db.places.find({tags: "business" })
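
The two queries on the querying slide rely on a matching rule worth spelling out: a query such as `{ tags: "awesome" }` matches a document whose `tags` field is an array containing that value. The sketch below imitates only that simple equality case in plain JavaScript, without a database. The `matches`, `find`, and `findOne` functions are hypothetical stand-ins for the shell calls, and the second document is made up.

```javascript
// True when every field in the query equals the document's field, or,
// for array fields, when the array contains the queried value.
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

// find returns every matching document; findOne returns the first or null.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}
function findOne(collection, query) {
  return collection.find((doc) => matches(doc, query)) || null;
}

const places = [
  {
    name: "10gen HQ",
    address: "134 5th Avenue 3rd Floor",
    city: "New York",
    zip: "10011",
    tags: ["business", "awesome"],
  },
  // Hypothetical second document, added to show a partial match.
  { name: "Corner Deli", city: "New York", zip: "10011", tags: ["business"] },
];

const awesome = findOne(places, { zip: "10011", tags: "awesome" });
const businesses = find(places, { tags: "business" });
```
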
  • 31. Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], tips : [{ author : "Fred", date : "Sat Apr 25 2010 20:51:03", text : "Best Place Ever!" }] }
  • 32. Updating > db.places.update( {name : "10gen HQ"}, { $push : { tips : { author : "nosh", date : "6/26/2011", text : "Office hours are great!" } } } )
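
The effect of the `$push` update above can be sketched in plain JavaScript: the value is appended to the named array field, and the field is created as an array if the document does not have it yet. `applyPush` and `updateFirst` are hypothetical helpers that only imitate this behavior on in-memory objects; they are not the database's implementation.

```javascript
// Append each value in pushSpec to the matching array field of doc.
function applyPush(doc, pushSpec) {
  for (const [field, value] of Object.entries(pushSpec)) {
    if (doc[field] === undefined) doc[field] = []; // create the array on first push
    if (!Array.isArray(doc[field])) {
      throw new Error(`cannot $push to non-array field "${field}"`);
    }
    doc[field].push(value);
  }
  return doc;
}

// Apply the update to the first document whose fields equal the query.
// Returns the number of documents modified (0 or 1).
function updateFirst(collection, query, update) {
  const doc = collection.find((d) =>
    Object.entries(query).every(([field, value]) => d[field] === value)
  );
  if (!doc) return 0;
  if (update.$push) applyPush(doc, update.$push);
  return 1;
}

const places = [{ name: "10gen HQ", city: "New York", zip: "10011" }];

const modified = updateFirst(
  places,
  { name: "10gen HQ" },
  { $push: { tips: { author: "nosh", date: "6/26/2011", text: "Office hours are great!" } } }
);
```
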
  • 34. CMS / Blog Needs: • Business needed a modern data store for rapid development and scale Solution: • Use PHP & MongoDB Results: • Real-time statistics • All data, images, etc. stored together: easy access, easy deployment, easy high availability • No need for complex migrations • Enabled very rapid development and growth
  • 35. Photo Meta-Data Problem: • Business needed more flexibility than Oracle could deliver Solution: • Use MongoDB instead of Oracle Results: • Developed application in one sprint cycle • 500% cost reduction compared to Oracle • 900% performance improvement compared to Oracle
  • 36. Customer Analytics Problem: • Deal with massive data volume across all customer sites Solution: • Use MongoDB to replace Google Analytics / Omniture options Results: • Less than one week to build prototype and prove business case • Rapid deployment of new features
  • 37. Archiving Why MongoDB: • Existing application built on MySQL • Lots of friction with RDBMS-based archive storage • Needed a more scalable archive storage backend Solution: • Keep MySQL for active data (100 million) • MongoDB for archive (2+ billion) Results: • No more alter table statements taking over 2 months to run • Sharding enabled horizontal scale • Very happily looking at other places to use MongoDB
  • 38. Online Dictionary Problem: • MySQL could not scale to handle their 5B+ documents Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Eliminated need for external caching system • 20x performance improvement over MySQL
  • 39. E-commerce Problem: • Multi-vertical E-commerce impossible to model (efficiently) in RDBMS Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Rapidly build, halving time to market (and cost) • Eliminated need for external caching system • 50x+ performance improvement over MySQL
  • 40. Tons more MongoDB casts a wide net: people keep coming up with new and brilliant ways to use it
  • 41. In Good Company and 1000s more
  • 42. The Future of BIG data
  • 43. What is BIG? BIG today is normal tomorrow
  • 44. Data Growth (chart, millions of URLs by year): 2000: 1; 2001: 4; 2002: 10; 2003: 24; 2004: 55; 2005: 120; 2006: 250; 2007: 500; 2008: 1,000; 2009: 2,150; 2010: 4,400; 2011: 9,000
  • 45. Data Growth (chart repeated from slide 44: millions of URLs by year, 2000 to 2011)
  • 47. MongoDB enables us to scale with the redefinition of BIG.
  • 48. MongoDB: High Performance, Easy Development, Horizontally Scalable { author : "steve", date : new Date(), text : "About MongoDB...", tags : ["tech", "database"] }
  • 49. http://spf13.com http://github.com/s @spf13 Questions? Download at mongodb.org We’re hiring! Contact us at jobs@10gen.com

Editor's notes

  2. 10, 15, 10, 5
  12. memcache, redis, membase; mongodb, couch; cassandra, riak; neo4j, flockdb
  21. By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling becomes easier.
  45. One site is generating nearly as many URLs as the entire internet 6 years ago.