What Should I Know about NoSQL?

Cris J. Holdorph
Software Architect
Unicon, Inc.

Jasig Conference
Westminster, CO
May 24, 2011




© Copyright Unicon, Inc., 2008. Some rights reserved. This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/
Lethal SQL




Agenda
1. Definitions
2. History
3. Projects
4. Example Case Studies




Definitions




Definitions
●
    RDBMS
●
    SQL
●
    CRUD
●
    ACID
    –   Atomicity, Consistency, Isolation, Durability
●
    BASE
    –   Basically Available, Soft state, Eventual
        consistency


Definitions
●
    Big Data
●
    Sharding
●
    Cloud Computing
●
    Distributed File System
●
    Key Value Store




History




Map Reduce
●
    Patented software framework introduced by Google
    in 2004 to support distributed computing on large
    data sets on clusters of computers.
●
    Named after the map and reduce
    functions of functional programming (although their
    purpose in the framework differs from the originals)
●
    Map
    –   The master node takes the input, partitions it up into
        smaller sub-problems, and distributes those to worker nodes
●
    Reduce
    –   The master node then takes the answers to all the sub-
        problems and combines them in some way to get the output
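The Map and Reduce steps above can be sketched as a single-process word count. This is only an illustration of the idea, not Google's or Hadoop's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and the loop over chunks stands in for distributing work to worker nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of input text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single answer."""
    return key, sum(values)


def word_count(chunks):
    intermediate = []
    for chunk in chunks:  # in a real cluster, each chunk goes to a worker node
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already partitioned into three sub-problems.
counts = word_count(["no sql", "not only sql", "not the rdbms"])
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the map work could run on separate machines without coordination; only the grouping step needs to see all intermediate pairs.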
What does NoSQL Stand For?
●
    NoSQL
●
    No SQL
●
    Not SQL
●
    Not Only SQL
●
    Not the RDBMS
●
    Wikipedia:
    –   Carlo Strozzi used the term "NoSQL" in 1998 to
        name his lightweight, open-source relational
        database that did not expose an SQL interface.

History
●
    Some techniques have existed for over 25
     years
●
    Teradata has been selling its product for more than 20
      years
●
    The relational model behind the RDBMS dates back to 1970




CAP Theorem
●
    A conjecture made by Eric Brewer at the
      Symposium on Principles of Distributed
      Computing (2000)
●
    States that it is only possible to achieve 2 of 3
    –   Consistency (all nodes see the same data at the
        same time)
    –   Availability (node failures do not prevent survivors
        from continuing to operate)
    –   Partition Tolerance (the system continues to
        operate despite arbitrary message loss)

CAP
●
    Consistent and Available
    –   ACID systems, MySQL cluster, Oracle Coherence,
        Drizzle
●
    Consistent and Partition Tolerant
    –   SCLA (strongly consistent, loosely available)
    –   HBase, Bigtable
●
    Available and Partition Tolerant
    –   BASE systems (CouchDB, SimpleDB, MongoDB)
●
    Cassandra (sits between SCLA and BASE
    systems)
Projects




Hadoop
●
    Open-source software for reliable, scalable,
     distributed computing (Hadoop website)
    –   Hadoop Common
    –   HDFS
    –   MapReduce
●
    Initially created in early 2006 to support the
     search engine project Nutch
●
    Inspired by Google's papers on the Google File System
      (Oct 2003) and MapReduce (2004)

Hadoop Related Projects
●
    HBase
    –   A scalable, distributed database that supports
        structured data storage for large tables
●
    Hive
    –   A data warehouse infrastructure that provides
        data summarization and ad hoc querying
●
    Pig
    –   A high-level data-flow language and execution
        framework for parallel computation
●
    Cassandra
    –   Uses Hadoop for MapReduce
Who Uses Hadoop
●
    eBay (532 nodes, search optimization)
●
    Facebook (1100x8 node cluster, 300x8 node cluster, more on
    this later)
●
    GumGum (Ken Weiner, 20+ node cluster on Amazon EC2)
●
    Hulu (log storage analysis)
●
    Last.fm (44x2 nodes log analysis, 20x2 nodes profile analysis)
●
    LinkedIn (120x2x4 nodes, 520x2x4 nodes, "People you may
    know")
●
    Twitter (more on this later)
●
    Yahoo! (100,000 CPUs running Hadoop, more on this later)



CouchDB
●
    Apache open source document-oriented database
    written in Erlang (a concurrent programming language)
●
    Designed to scale horizontally
●
    Stores documents (one or more field/value pairs
    expressed as JSON)
●
    ACID semantics
●
    Map/Reduce views and indexes (written in server-side
    JavaScript)
●
    Bidirectional replication (with conflict resolution)
●
    REST API

[Figure: sketch from http://couchdb.apache.org/img/sketch.png]
CouchDB Sample Document

{
  "Subject": "I like Plankton",
  "Author": "Rusty",
  "PostedDate": "5/23/2006",
  "Tags": ["plankton", "baseball", "decisions"],
  "Body": "I decided today that I don't like baseball. I like plankton."
}




         http://couchdb.apache.org/docs/intro.html
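Since CouchDB exposes documents over a REST API, storing the sample document comes down to an HTTP PUT of its JSON body to a database/document URL. The sketch below only builds that request using Python's standard library and does not send it. The server address, database name (`blog`), and document id (`plankton-post`) are assumptions for illustration and do not come from the slides.

```python
import json
import urllib.request

# Hypothetical values: a local CouchDB on its default port, a database
# named "blog", and a document id chosen for this example.
COUCH_URL = "http://localhost:5984"
DB_NAME = "blog"
DOC_ID = "plankton-post"

doc = {
    "Subject": "I like Plankton",
    "Author": "Rusty",
    "PostedDate": "5/23/2006",
    "Tags": ["plankton", "baseball", "decisions"],
    "Body": "I decided today that I don't like baseball. I like plankton.",
}


def build_put_request(base_url, db, doc_id, document):
    """Build (but do not send) a PUT request that would store the document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_request(COUCH_URL, DB_NAME, DOC_ID, doc)
print(request.get_method(), request.full_url)

# To send it against a running CouchDB instance (the database must exist):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```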

Who uses CouchDB?
●
    Ubuntu One – cloud storage service
    –   http://ubuntuone.com/
●
    "I Play WoW" Facebook app
    –   http://blog.socklabs.com/2008/12/24/iplaywow_monthly_actives.html

●
    Wego - travel site
    –   http://www.wego.com/




Cassandra
●
    Fault tolerant (replication; failed nodes can
    be replaced with no downtime)
●
    Decentralized (every node in the cluster is
    identical, no bottlenecks)
●
    Supports either synchronous or
    asynchronous update replication
●
    Supports more than simple key/value pairs
●
    Elastic (read/write throughput increases
    linearly as machines are added)
●
    Durable (suitable for applications that can't
    afford to lose data)
Cassandra
●
    Initially developed by Facebook for Inbox
    Search (until replaced by HBase)
●
    Key-value store in which each value can itself hold
    multiple values
●
    Some inspiration from Amazon's Dynamo
    (another key-value store)




Who uses Cassandra?
●
    Facebook (previously)
●
    Twitter
●
    Digg
●
    Cisco




MongoDB
●
    Name is derived from "humongous"
●
    Document oriented database written in C++
●
    Manages collections of JSON-like documents
●
    Binaries available for Windows, Linux, OS X,
    Solaris
●
    Supports dates, regular expressions, code,
    binary data (all BSON types)
●
    Cursors for query results
●
    Any field can be queried at any time
MongoDB
●
    Queries can include user-defined JavaScript
    functions
●
    Master/Slave (only master supports writes,
    slaves can be read from)
●
    Scales horizontally using sharding
●
    Support for Map/Reduce




Who uses MongoDB?
●
    New York Times
●
    Shutterfly
●
    Foursquare
●
    SourceForge
●
    Intuit




Google Big Table
●
    Built on GFS (Google File System)
●
    Can be used with Google App Engine
●
    Maps two arbitrary strings (row key and column key) and a timestamp to a value
●
    Designed to scale into the petabyte range
●
    Designed to scale across hundreds or
    thousands of machines
●
    Portions of a table (tablets) can be
    compressed
●
    HBase was modeled after BigTable
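The "two strings and a timestamp" mapping can be pictured as a dictionary keyed by (row, column), where each cell keeps several timestamped versions. The following is a toy in-memory sketch of that data model only. It says nothing about how Bigtable or HBase store data on disk, and the class name `TinyTable`, its methods, and the sample keys are all assumptions made for this example.

```python
class TinyTable:
    """Toy model of a (row, column, timestamp) -> value map."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = {}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


# Hypothetical row and column keys, with integer timestamps.
table = TinyTable()
table.put("com.example.www", "contents:html", 1, "<html>v1</html>")
table.put("com.example.www", "contents:html", 5, "<html>v2</html>")

latest = table.get("com.example.www", "contents:html")
as_of_3 = table.get("com.example.www", "contents:html", timestamp=3)
print(latest, as_of_3)
```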
Who uses Big Table?
●
    Google Reader
●
    Google Maps
●
    Google Book Search
●
    Google Earth
●
    Blogger.com
●
    Google Code
●
    Orkut
●
    YouTube
●
    Gmail
Amazon SimpleDB
●
    Written in Erlang
●
    Used with Amazon EC2 and Amazon S3
●
    Easy access to lookup and query functions
●
    Omits the less frequently used complex database
    functions
●
    No need to predefine the data formats that will be stored
●
    Scalable (with size limitations)
     –   10 GB per domain, up to 250 domains
●
    Fast/Reliable
●
    Supports eventually consistent read and consistent read
●
    Potentially Inexpensive
SimpleDB Data Model




[Figure: data model diagram from
http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/DataModel.html]
SimpleDB Data Model
●
    Customer Account (Amazon Web Services account)
●
    Domains (similar to tables, or spreadsheet tabs)
●
    Items (similar to rows)
●
    Attributes (similar to columns)
●
    Values (similar to cells)
     –   Unlike a spreadsheet, however, multiple values can be
         associated with a cell
●
    One domain can contain different types of data
    (some attributes not filled in)
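The domain / item / attribute / value hierarchy above can be mimicked with nested dictionaries. This is an in-memory sketch of the data model only and does not call AWS. The `Domain` class and its methods are hypothetical helpers that loosely echo the PutAttributes and GetAttributes operation names, and the sample items are invented for illustration.

```python
class Domain:
    """Toy stand-in for a SimpleDB domain: items -> attributes -> values."""

    def __init__(self, name):
        self.name = name
        # item name -> {attribute name -> list of values}
        self.items = {}

    def put_attributes(self, item_name, pairs):
        """Add (attribute, value) pairs; an attribute may hold several values."""
        item = self.items.setdefault(item_name, {})
        for attribute, value in pairs:
            values = item.setdefault(attribute, [])
            if value not in values:
                values.append(value)

    def get_attributes(self, item_name):
        """Return a copy of the item's attributes (empty if the item is unknown)."""
        return {a: list(v) for a, v in self.items.get(item_name, {}).items()}


# Hypothetical data: two items with different attributes in the same domain.
products = Domain("products")
products.put_attributes("item1", [("color", "red"), ("color", "blue"), ("size", "M")])
products.put_attributes("item2", [("author", "Rusty")])

item1 = products.get_attributes("item1")
item2 = products.get_attributes("item2")
print(item1, item2)
```

Here `item1` holds two values in its "color" cell, while `item2` has no "color" or "size" at all, which mirrors the point that one domain can contain different types of data.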


SimpleDB API Summary
●
    CreateDomain
●
    DeleteDomain
●
    ListDomains
●
    PutAttributes
●
    BatchPutAttributes
●
    DeleteAttributes
●
    BatchDeleteAttributes
●
    GetAttributes
●
    Select
●
    DomainMetadata
Who uses SimpleDB?
●
    Netflix
●
    Other Amazon EC2 customers...




memcached
●
    General purpose distributed memory caching system
●
    Often used to cache data in RAM that would otherwise be
    fetched from an external data source
●
    Evicts least recently used (LRU) entries when the cache is full
●
    Can be distributed across multiple machines
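The LRU behavior can be illustrated with a small fixed-capacity cache. This is a single-process sketch of least-recently-used eviction in general, not memcached's actual implementation or client API. The `LRUCache` class, its `get`/`set` methods, and the capacity of 2 are assumptions chosen for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        """Store a value, evicting the least recently used entry if needed."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" is now more recently used than "b"
cache.set("c", 3)   # cache is full, so "b" is evicted
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

In typical use, an application checks the cache first and falls back to the external data source only on a miss, then stores the result so the next lookup is served from RAM.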




Who uses memcached?
●
    YouTube
●
    Zynga
●
    Facebook
●
    Twitter




Terracotta
●
    JVM in-memory distributed cache / store
●
    The object store can be persistent
●
    Distribution between nodes is handled through
    Terracotta server
●
    Supports multiple Terracotta servers
●
    Nodes only receive data they need/reference




Who uses Terracotta?
●
    Sakai (thanks to John Wiley & Sons)
●
    PartyGaming (PartyPoker.com)
●
    Adobe
●
    Pearson




Example Case Studies




Yahoo!
●
    Hadoop
    –   http://developer.yahoo.com/blogs/hadoop
    –   More than 100,000 CPUs in >36,000 computers
        running Hadoop
    –   Our biggest cluster: 4000 nodes (2*4cpu boxes w
        4*1TB disk & 16GB RAM)
    –   Used to support research for Ad Systems and Web
        Search
    –   Also used to do scaling tests to support
        development of Hadoop on larger clusters
    –   >60% of Hadoop Jobs within Yahoo are Pig jobs
Twitter
●
    How Twitter Uses NoSQL
    –   http://goo.gl/Bwxoe
●
    Scribe
    –   Syslog stopped scaling
●
    Hadoop
    –   Needs to store more data per day than it can reliably write to a
        single hard drive
●
    Pig
    –   Used for interacting with Hadoop
●
    HBase
    –   People Search
●
    FlockDB
    –   Social Graph Analysis
Netflix
●
    NoSQL at Netflix
    –   http://goo.gl/SDcsZ
●
    SimpleDB
    –   Highly durable, with writes automatically replicated across
        availability zones within a region
    –   Love it when others do heavy lifting for us
●
    Hadoop/HBase
    –   Convenient, high-performance column-oriented distributed
        database solution
    –   HBase makes it really easy to grow your cluster and re-distribute
        load across nodes at runtime
●
    Cassandra
    –   Adding more servers, without the need to re-shard
Facebook
●
    http://goo.gl/J9EVW
●
    350 million users sending over 15 billion person-to-person messages
    per month
●
    Chat service supports over 300 million users who send over 120 billion
    messages per month
●
    Two patterns emerged
     –   A short set of temporal data that tends to be volatile
     –   An ever-growing set of data that rarely gets accessed
●
    Evaluated clusters of MySQL, Apache Cassandra, Apache HBase, and a
    couple of other systems
     –   MySQL proved not to handle the long tail of data well (as
         indexes and data grow large, performance suffers)
     –   Cassandra's eventual consistency model proved to be a difficult
         pattern to reconcile for our new Messages infrastructure.
“There is a learning curve and an
operational overhead. Still, the scalability,
availability and performance advantages of
the NoSQL persistence model are evident
and are paying for themselves already, and
will be central to our long-term cloud
strategy.”
           Yury Izrailevsky, Netflix



Questions & Answers




         Cris J. Holdorph
         Software Architect
         Unicon, Inc.

         Twitter: @holdorph

         holdorph@unicon.net
         www.unicon.net

Contenu connexe

Tendances

Using mruby in the nosql database Avocadodb
Using mruby in the nosql database AvocadodbUsing mruby in the nosql database Avocadodb
Using mruby in the nosql database Avocadodbavocadodb
 
Drupal Migration
Drupal MigrationDrupal Migration
Drupal Migration永对 陈
 
GeoNetwork workshop introduction mapwindow conference 2012 Velp
GeoNetwork workshop introduction mapwindow conference 2012 VelpGeoNetwork workshop introduction mapwindow conference 2012 Velp
GeoNetwork workshop introduction mapwindow conference 2012 Velppvangenuchten
 
LDAP at Lightning Speed
 LDAP at Lightning Speed LDAP at Lightning Speed
LDAP at Lightning SpeedC4Media
 
Big data for cio 2015
Big data for cio 2015Big data for cio 2015
Big data for cio 2015Zohar Elkayam
 

Tendances (6)

Using mruby in the nosql database Avocadodb
Using mruby in the nosql database AvocadodbUsing mruby in the nosql database Avocadodb
Using mruby in the nosql database Avocadodb
 
Drupal Migration
Drupal MigrationDrupal Migration
Drupal Migration
 
MySQL - NDB Cluster
MySQL - NDB ClusterMySQL - NDB Cluster
MySQL - NDB Cluster
 
GeoNetwork workshop introduction mapwindow conference 2012 Velp
GeoNetwork workshop introduction mapwindow conference 2012 VelpGeoNetwork workshop introduction mapwindow conference 2012 Velp
GeoNetwork workshop introduction mapwindow conference 2012 Velp
 
LDAP at Lightning Speed
 LDAP at Lightning Speed LDAP at Lightning Speed
LDAP at Lightning Speed
 
Big data for cio 2015
Big data for cio 2015Big data for cio 2015
Big data for cio 2015
 

Similaire à No SQL Technologies

HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemAdam Marcus
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem yarapavan
 
Introduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStackIntroduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStackOpenStack_Online
 
Ceph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade inCeph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade inCeph Community
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Ceph: A decade in the making and still going strong
Ceph: A decade in the making and still going strongCeph: A decade in the making and still going strong
Ceph: A decade in the making and still going strongPatrick McGarry
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETNikos Kormpakis
 
Ceph Day Seoul - Ceph: a decade in the making and still going strong
Ceph Day Seoul - Ceph: a decade in the making and still going strong Ceph Day Seoul - Ceph: a decade in the making and still going strong
Ceph Day Seoul - Ceph: a decade in the making and still going strong Ceph Community
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the moveCodemotion
 
OSOM Operations in the Cloud
OSOM Operations in the CloudOSOM Operations in the Cloud
OSOM Operations in the Cloudmstuparu
 
OSOM - Operations in the Cloud
OSOM - Operations in the CloudOSOM - Operations in the Cloud
OSOM - Operations in the CloudMarcela Oniga
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring dataJimmy Ray
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialPHP Support
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Community
 
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's CephCeph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's CephCeph Community
 

Similaire à No SQL Technologies (20)

HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL Ecosystem
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem
 
Drop acid
Drop acidDrop acid
Drop acid
 
Introduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStackIntroduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStack
 
Ceph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade inCeph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade in
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
DEVIEW 2013
DEVIEW 2013DEVIEW 2013
DEVIEW 2013
 
Ceph: A decade in the making and still going strong
Ceph: A decade in the making and still going strongCeph: A decade in the making and still going strong
Ceph: A decade in the making and still going strong
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
 
Ceph Day Seoul - Ceph: a decade in the making and still going strong
Ceph Day Seoul - Ceph: a decade in the making and still going strong Ceph Day Seoul - Ceph: a decade in the making and still going strong
Ceph Day Seoul - Ceph: a decade in the making and still going strong
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the move
 
OSOM Operations in the Cloud
OSOM Operations in the CloudOSOM Operations in the Cloud
OSOM Operations in the Cloud
 
OSOM - Operations in the Cloud
OSOM - Operations in the CloudOSOM - Operations in the Cloud
OSOM - Operations in the Cloud
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring data
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js Tutorial
 
PostgreSQL and MySQL
PostgreSQL and MySQLPostgreSQL and MySQL
PostgreSQL and MySQL
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
 
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's CephCeph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's Ceph
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 

Plus de Cris Holdorph

Programming for Performance
Programming for PerformanceProgramming for Performance
Programming for PerformanceCris Holdorph
 
Clustering Made Easier: Using Terracotta with Hibernate and/or EHCache
Clustering Made Easier: Using Terracotta with Hibernate and/or EHCacheClustering Made Easier: Using Terracotta with Hibernate and/or EHCache
Clustering Made Easier: Using Terracotta with Hibernate and/or EHCacheCris Holdorph
 
Developing JSR 286 Portlets
Developing JSR 286 PortletsDeveloping JSR 286 Portlets
Developing JSR 286 PortletsCris Holdorph
 
Adding Performance Testing to a Software Development Project
Adding Performance Testing to a Software Development ProjectAdding Performance Testing to a Software Development Project
Adding Performance Testing to a Software Development ProjectCris Holdorph
 
Sakai and IMS LIS Integration
Sakai and IMS LIS IntegrationSakai and IMS LIS Integration
Sakai and IMS LIS IntegrationCris Holdorph
 
Clustering Sakai with Terracotta
Clustering Sakai with TerracottaClustering Sakai with Terracotta
Clustering Sakai with TerracottaCris Holdorph
 
Introduction to Terracotta
Introduction to TerracottaIntroduction to Terracotta
Introduction to TerracottaCris Holdorph
 

Plus de Cris Holdorph (7)

Programming for Performance
Programming for PerformanceProgramming for Performance
Programming for Performance
 
Clustering Made Easier: Using Terracotta with Hibernate and/or EHCache
Clustering Made Easier: Using Terracotta with Hibernate and/or EHCacheClustering Made Easier: Using Terracotta with Hibernate and/or EHCache
Clustering Made Easier: Using Terracotta with Hibernate and/or EHCache
 
Developing JSR 286 Portlets
Developing JSR 286 PortletsDeveloping JSR 286 Portlets
Developing JSR 286 Portlets
 
Adding Performance Testing to a Software Development Project
Adding Performance Testing to a Software Development ProjectAdding Performance Testing to a Software Development Project
Adding Performance Testing to a Software Development Project
 
Sakai and IMS LIS Integration
Sakai and IMS LIS IntegrationSakai and IMS LIS Integration
Sakai and IMS LIS Integration
 
Clustering Sakai with Terracotta
Clustering Sakai with TerracottaClustering Sakai with Terracotta
Clustering Sakai with Terracotta
 
Introduction to Terracotta
Introduction to TerracottaIntroduction to Terracotta
Introduction to Terracotta
 

Dernier

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Dernier (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

No SQL Technologies

  • 1. What Should I Know about NoSQL? Cris J. Holdorph Software Architect Unicon, Inc. Jasig Conference Westminster, CO May 24, 2011 © Copyright Unicon, Inc., 2008. Some rights reserved. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/
  • 4. Agenda 1. Definitions 2. History 3. Projects 4. Example Case Studies 4
  • 6. Definitions ● RDBMS ● SQL ● CRUD ● ACID – Atomicity, Consistency, Isolation, Durability ● BASE – Basically Available, Soft state, Eventual consistency 6
  • 8. Definitions ● Big Data ● Sharding ● Cloud Computing ● Distributed File System ● Key Value Store 8
  • 10. Map Reduce ● Patented software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers. ● Naming originally inspired by map and reduce functions of functional programming (but their purpose is not the same as it was there) ● Map – The master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes ● Reduce – The master node then takes the answers to all the sub-problems and combines them in some way to get the output 10
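The map and reduce steps described on this slide can be sketched as a word count running in a single process. This is only an illustration under stated assumptions: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real framework would run the map calls on separate worker nodes rather than in a loop.

```python
from collections import defaultdict


def map_phase(chunk):
    """Worker-side map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk stands in for one sub-problem handed to a worker node;
    # here the "workers" simply run one after another.
    emitted = []
    for chunk in chunks:
        emitted.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


counts = word_count(["I like plankton", "I like baseball"])
```

The grouping step between map and reduce is what lets each key be combined independently of the others.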
  • 11. What does NoSQL Stand For? ● NoSQL ● No SQL ● Not SQL ● Not Only SQL ● Not the RDBMS ● Wikipedia: – Carlo Strozzi used the term "NoSQL" in 1998 to name his lightweight, open-source relational database that did not expose an SQL interface. 11
  • 12. History ● Some techniques have existed for over 25 years ● Teradata has been selling a product for more than 20 years ● RDBMS dates back to 1970 12
  • 13. CAP Theorem ● A conjecture made by Eric Brewer at the Symposium on Principles of Distributed Computing (2000) ● States that it is only possible to achieve 2 of 3 – Consistency (all nodes see the same data at the same time) – Availability (node failures do not prevent survivors from continuing to operate) – Partition Tolerance (the system continues to operate despite arbitrary message loss) 13
  • 14. CAP ● Consistent and Available – ACID systems, MySQL cluster, Oracle Coherence, Drizzle ● Consistent and Partition Tolerant – SCLA (strongly consistent, loosely available) – HBase, Bigtable ● Available and Partition Tolerant – BASE systems (CouchDB, SimpleDB, MongoDB) ● Cassandra (sits between SCLA/BASE systems) 14
  • 15. Projects 15
  • 16. Hadoop ● Open-source software for reliable, scalable, distributed computing (Hadoop website) – Hadoop Common – HDFS – MapReduce ● Initially created in early 2006 to support the search engine project Nutch ● Inspired by Google's File System (Oct 2003) and MapReduce (Dec 2004) papers 16
  • 17. Hadoop Related Projects ● HBase – A scalable, distributed database that supports structured data storage for large tables ● Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying ● Pig – A high-level data-flow language and execution framework for parallel computation ● Cassandra – uses Hadoop for MapReduce 17
  • 18. Who Uses Hadoop ● eBay (532 nodes, search optimization) ● Facebook (1100x8 node cluster, 300x8 node cluster, more on this later) ● GumGum (Ken Weiner, 20+ node cluster on Amazon EC2) ● Hulu (log storage analysis) ● Last.fm (44x2 nodes log analysis, 20x2 nodes profile analysis) ● LinkedIn (120x2x4 nodes, 520x2x4 nodes, "People you may know") ● Twitter (more on this later) ● Yahoo! (100,000 CPUs running Hadoop, more on this later) 18
  • 19. CouchDB ● Apache open source document-oriented database written in Erlang (a concurrent programming language) ● Designed to scale horizontally ● Stores documents (one or more field/value pairs expressed as JSON) ● ACID Semantics ● Map/Reduce Views and Indexes (written in server-side JavaScript) ● Bi-directional replication (with conflict resolution) ● REST API 19
  • 21. CouchDB Sample Document "Subject": "I like Plankton" "Author": "Rusty" "PostedDate": "5/23/2006" "Tags": ["plankton", "baseball", "decisions"] "Body": "I decided today that I don't like baseball. I like plankton." http://couchdb.apache.org/docs/intro.html 21
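The sample document above can be written out as a structured JSON object, together with a view-style map function over it. This is a minimal sketch under stated assumptions: CouchDB views are written in JavaScript, so the Python generator tags_view below is a hypothetical analogue, not CouchDB's API.

```python
import json

# The sample document from the slide, as a Python dict.
doc = {
    "Subject": "I like Plankton",
    "Author": "Rusty",
    "PostedDate": "5/23/2006",
    "Tags": ["plankton", "baseball", "decisions"],
    "Body": "I decided today that I don't like baseball. I like plankton.",
}


def tags_view(document):
    """Hypothetical map function: emit one (key, value) row per tag."""
    for tag in document.get("Tags", []):
        yield tag, document["Subject"]


# Serialize the document the way it would travel over a JSON-based REST API.
as_json = json.dumps(doc, indent=2)

# Rows sorted by key, similar in spirit to how a view index is ordered.
rows = sorted(tags_view(doc))
```

Emitting one row per tag is what would make a lookup such as "all posts tagged plankton" a simple key query against the view.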
  • 22. Who uses CouchDB? ● Ubuntu One – cloud storage service – http://ubuntuone.com/ ● "I Play WoW" facebook app – http://blog.socklabs.com/2008/12/24/iplaywow_monthly_actives.html ● Wego - travel site – http://www.wego.com/ 22
  • 23. Cassandra ● Fault Tolerant (replication, failed nodes can be replaced with no downtime) ● Decentralized (every node in the cluster is identical, no bottlenecks) ● Supports either Synchronous or Asynchronous update replication ● Supports more than simple key/value pairs ● Elastic (read/write throughput increases linearly as machines are added) ● Durable (suitable for applications that can't afford to lose data) 23
  • 24. Cassandra ● Initially developed by Facebook for Inbox Search (until replaced by HBase) ● Key-value store where values can be multiple values ● Some inspiration from Amazon's Dynamo (another key-value store) 24
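The idea of a key-value store whose values hold multiple values can be pictured as a map from row key to a set of named columns. The class below is a toy, in-memory sketch: ToyColumnStore and its methods are hypothetical names for illustration and do not correspond to any Cassandra client API.

```python
from collections import defaultdict


class ToyColumnStore:
    """Hypothetical in-memory stand-in: each row key maps to many named columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # Writing a column never disturbs the other columns of the same row.
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        # .get() avoids creating an empty row as a side effect of reading.
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)


store = ToyColumnStore()
store.insert("user:42", "name", "Rusty")
store.insert("user:42", "email", "rusty@example.org")
whole_row = store.get("user:42")
one_column = store.get("user:42", "name")
```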
  • 25. Who uses Cassandra? ● Facebook (previously) ● Twitter ● Digg ● Cisco 25
  • 26. MongoDB ● Name is derived from "humongous" ● Document-oriented database written in C++ ● Manages collections of JSON-like documents ● Binaries available for Windows, Linux, OS X, Solaris ● Supports dates, regular expressions, code, binary data (all BSON types) ● Cursors for query results ● Any field can be queried at any time 26
  • 27. MongoDB ● Queries can include user-defined JavaScript functions ● Master/Slave (only master supports writes, slaves can be read from) ● Scales horizontally using sharding ● Support for Map/Reduce 27
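Horizontal scaling through sharding comes down to routing each document to one of several servers based on a shard key. The sketch below uses simple hash-based routing as an assumption for illustration; it is not a description of how MongoDB itself assigns data to shards, and the shard names and the shard_for function are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key, shards=SHARDS):
    """Pick a shard deterministically from a stable hash of the shard key."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Every client that applies the same function routes a key to the same shard.
placement = {user: shard_for(user) for user in ["alice", "bob", "carol", "dave"]}
```

A plain modulo scheme like this moves most keys when the number of shards changes, which is one reason real systems use more elaborate placement strategies.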
  • 28. Who uses MongoDB? ● New York Times ● Shutterfly ● Foursquare ● SourceForge ● Intuit 28
  • 29. Google Big Table ● Built on GFS (Google File System) ● Can be used with Google App Engine ● Maps two arbitrary strings (row key and column key) and a timestamp to a value ● Designed to scale into the petabyte range ● Designed to scale across hundreds or thousands of machines ● Portions of a table (tablets) can be compressed ● HBase was modeled after BigTable 29
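The mapping from two strings and a timestamp to a stored value can be sketched as a small versioned table. This is an in-memory toy under stated assumptions: ToyTable and its methods are hypothetical names, and the example row and column keys are made up for illustration.

```python
class ToyTable:
    """Hypothetical versioned map: (row key, column key, timestamp) -> value."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = ToyTable()
table.put("com.example.www", "contents", 3, "<html>v3</html>")
table.put("com.example.www", "contents", 5, "<html>v5</html>")
latest = table.get("com.example.www", "contents")
as_of_4 = table.get("com.example.www", "contents", timestamp=4)
```

Keeping the timestamp in the key lets several versions of the same cell coexist, so a reader can ask for the state as of an earlier time.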
  • 30. Who uses Big Table? ● Google Reader ● Google Maps ● Google Book Search ● Google Earth ● Blogger.com ● Google Code ● Orkut ● YouTube ● Gmail 30
  • 31. Amazon SimpleDB ● Written in Erlang ● Used with Amazon EC2 and Amazon S3 ● Easy access to lookup and query functions ● No support for the less-used complex database functions ● No need to pre-define the data formats that will be stored ● Scalable (with size limitations) – 10 GB per domain, up to 250 domains ● Fast/Reliable ● Supports both eventually consistent reads and consistent reads ● Potentially Inexpensive 31
  • 33. SimpleDB Data Model ● Customer Account (amazon web services account) ● Domains (similar to tables, or spreadsheet tabs) ● Items (similar to rows) ● Attributes (similar to columns) ● Values (similar to cells) – Unlike a spreadsheet, however, multiple values can be associated with a cell ● One domain can contain different types of data (some attributes not filled in) 33
  • 34. SimpleDB API Summary ● CreateDomain ● DeleteDomain ● ListDomains ● PutAttributes ● BatchPutAttributes ● DeleteAttributes ● BatchDeleteAttributes ● GetAttributes ● Select ● DomainMetadata 34
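The data model and a few of the operations listed above can be mimicked with an in-memory toy to show how domains, items, attributes and multi-valued cells relate. This is a sketch under stated assumptions: ToyDomain and its method names are hypothetical, loosely modeled on PutAttributes, GetAttributes and DeleteAttributes, and are not the AWS SDK.

```python
from collections import defaultdict


class ToyDomain:
    """Hypothetical stand-in for one domain: item -> attribute -> set of values."""

    def __init__(self, name):
        self.name = name
        self._items = defaultdict(lambda: defaultdict(set))

    def put_attributes(self, item_name, attributes):
        # Repeating an attribute name adds another value to the same "cell".
        for attr, value in attributes:
            self._items[item_name][attr].add(value)

    def get_attributes(self, item_name):
        if item_name not in self._items:
            return {}
        return {a: sorted(v) for a, v in self._items[item_name].items()}

    def delete_attributes(self, item_name, attr):
        if item_name in self._items:
            self._items[item_name].pop(attr, None)


domain = ToyDomain("products")
domain.put_attributes("item1", [("color", "red"), ("color", "blue"), ("size", "M")])
domain.put_attributes("item2", [("color", "green")])  # no "size": schemas can differ
item1 = domain.get_attributes("item1")
```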
  • 35. Who uses SimpleDB? ● Netflix ● Other Amazon EC2 customers... 35
  • 36. memcached ● General purpose distributed memory caching system ● Often used to cache data in RAM that would otherwise have to be fetched from an external data source ● LRU eviction (when the cache is full) ● Can be distributed across multiple machines 36
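LRU eviction on a full cache can be sketched with an ordered dictionary. This is a single-process illustration under stated assumptions: the LRUCache class is hypothetical and is not memcached's implementation or client API.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # cache miss: the caller would go to the external source
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touching "a" makes "b" the least recently used key
cache.set("c", 3)  # the cache is full, so "b" is evicted
```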
  • 37. Who uses memcached? ● YouTube ● Zynga ● Facebook ● Twitter 37
  • 38. Terracotta ● JVM in-memory distributed cache / store ● The object store can be persistent ● Distribution between nodes is handled through Terracotta server ● Supports multiple Terracotta servers ● Nodes only receive data they need/reference 38
  • 39. Who uses Terracotta? ● Sakai (thanks to John Wiley & Sons) ● PartyGaming (PartyPoker.com) ● Adobe ● Pearson 39
  • 41. Yahoo! ● Hadoop – http://developer.yahoo.com/blogs/hadoop – More than 100,000 CPUs in >36,000 computers running Hadoop – Our biggest cluster: 4000 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM) – Used to support research for Ad Systems and Web Search – Also used to do scaling tests to support development of Hadoop on larger clusters – >60% of Hadoop Jobs within Yahoo are Pig jobs 41
  • 42. Twitter ● How Twitter Uses NoSQL – http://goo.gl/Bwxoe ● Scribe – Syslog stopped scaling ● Hadoop – Needs to store more data per day than it can reliably write to a single hard drive ● Pig – Used for interacting with Hadoop ● HBase – People Search ● FlockDB – Social Graph Analysis 42
  • 43. Netflix ● NoSQL at Netflix – http://goo.gl/SDcsZ ● SimpleDB – Highly durable, with writes automatically replicated across availability zones within a region – Love it when others do heavy lifting for us ● Hadoop/HBase – Convenient, high-performance column-oriented distributed database solution – HBase makes it really easy to grow your cluster and re-distribute load across nodes at runtime ● Cassandra – Adding more servers, without the need to re-shard 43
  • 44. Facebook ● http://goo.gl/J9EVW ● 350 million users sending over 15 billion person-to-person messages per month ● Chat service supports over 300 million users who send over 120 billion messages per month ● Two patterns emerged – A short set of temporal data that tends to be volatile – An ever-growing set of data that rarely gets accessed ● Evaluated clusters of MySQL, Apache Cassandra, Apache HBase, and a couple of other systems – MySQL proved not to handle the long tail of data well (as indexes and data grow large, performance suffers) – Cassandra's eventual consistency model proved to be a difficult pattern to reconcile for our new Messages infrastructure. 44
  • 45. “There is a learning curve and an operational overhead. Still, the scalability, availability and performance advantages of the NoSQL persistence model are evident and are paying for themselves already, and will be central to our long-term cloud strategy.” Yury Izrailevsky, Netflix 45
  • 46. Questions & Answers Cris J. Holdorph Software Architect Unicon, Inc. Twitter: @holdorph holdorph@unicon.net www.unicon.net 46