  1. NOSQL, NO? Introductory presentation
  2. RELATIONAL  SQL  ACID  Relational algebra  Optimal for ad-hoc queries  Tables, Columns, Rows  Sharding can be difficult  Metadata separate from data  Normalized data  Optimized storage
  3. POPULAR RDBMS  MySQL  Informix  SQL Server  Progress  Oracle  Pervasive  Postgres  Sybase  DB2  Access  Interbase, Firebird …
  4. SQL  Unified language to create and query both data and metadata  Similar to English  Verbose(!)  Can get complex for non-trivial queries  Does not expose execution plan – you say what you want it to return, not how
  5. SQL EXAMPLES  If you can say what you mean, you can query the existing data  Results are near-instant when querying based on primary key select * from valute where id=1 and sid=42  Results are fast when querying based on non-unique index select valuta from valute where ((id=1 and sid=42)) and (valute.firma_id=123 and valute.firma__sid=1)  Very readable for trivial queries select r.customer,sum(rs.iznos) sveukupno from racuni r join racuni_stavke rs on where order by rs.ordinal
  6. SQL EXAMPLES  Not so readable for non-trivial queries select "MP" tip_prometa, mprac.broj broj_racuna, mprac_stavke.kolicina kolicina, (mprac.tecaj*mprac_stavke.kolicina*mprac_stavke.rabat_iznos) rabat_iznos, (round(mprac_stavke.cijena - mprac_stavke.rabat_iznos - mprac_stavke.rabat2_iznos - mprac_stavke.rabat3_iznos - mprac_stavke.porez1 - mprac_stavke.porez2 - mprac_stavke.porez_potrosnja,6)*mprac_stavke.kolicina) iznos, (mprac_stavke.kolicina* ifnull((select sum(pn_cijena*kolicina)/sum(kolicina) from mprac_skl left join skl_stavke on mprac_skl.skl_id=skl_stavke.skl_id and mprac_skl.skl__sid=skl_stavke.skl__sid where and mprac_skl.mprac__sid=mprac.sid and skl_stavke.artikl_id=mprac_stavke.artikl_id and skl_stavke.artikl__sid=mprac_stavke.artikl__sid ),0) ) iznos_nabavno, ifnull( (select sum(mprac_stavke.kolicina*ambalaze.naknada_kom) from artikli_ambalaze left join ambalaze on and ambalaze.sid=artikli_ambalaze.ambalaza__sid where and artikli_ambalaze.artikl__sid=artikli.sid and ambalaze.kalkulacija="N" ),0) naknada, radnici_komercijalisti.ime racun_komercijalist_ime, (select naziv from skladista where skladista.tip_skladista="M" and pj_id=mprac.pj_id limit 1) skladiste_naziv , pj.naziv pj_naziv, mprac.datum, cast(concat("(",if(DayOfWeek(mprac.datum)=1,7,DayOfWeek(mprac.datum)-1),") ", if(DayOfWeek(mprac.datum)=1,"1 Nedjelja", if(DayOfWeek(mprac.datum)=2,"2 Ponedjeljak", if(DayOfWeek(mprac.datum)=3,"3 Utorak", if(DayOfWeek(mprac.datum)=4,"4 Srijeda", if(DayOfWeek(mprac.datum)=5,"5 Èetvratk", if(DayOfWeek(mprac.datum)=6,"6 Petak", if(DayOfWeek(mprac.datum)=7,"7 Subota","")))))))) as char(15)) dan_u_tjednu, cast(month(mprac.datum) as unsigned) mjesec, cast(week(mprac.datum) as unsigned) tjedan, cast(quarter(mprac.datum) as unsigned) kvartal, cast(year(mprac.datum) as unsigned) godina, cast(if(tipovi_komitenata.tip="F",trim(concat(partneri.ime," ",partneri.prezime)),partneri.naziv) as char(200)) kupac_naziv, partneri_mjesta.postanski_broj kupac_mjesto, 
partneri_mjesta.mjesto kupac_mjesto_naziv, partneri_grupe_mjesta.naziv …
  7. RDBMS SCALING  Vertical scaling • Better CPU, more CPUs • More RAM • More disks • SAN  Partitioning  Sharding
  8. PARTITIONING  With many rows and heavy usage, partitioning is a must  What to partition • Tables • Indexes • Views  Typical cases • Monthly data • Alphabetical keys
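The two typical cases above (monthly data, alphabetical keys) can be sketched as a routing rule that picks a partition for each row. This is a minimal illustration, not how any particular RDBMS implements partitioning; the partition naming scheme, the letter ranges and the `_other` fallback are assumptions made for the example.

```python
from datetime import date

def monthly_partition(table, day):
    """Name of the monthly partition that holds rows dated `day`."""
    return f"{table}_{day.year:04d}_{day.month:02d}"

def alphabetical_partition(table, key,
                           ranges=(("a", "h"), ("i", "p"), ("q", "z"))):
    """Name of the partition chosen by the first letter of `key`."""
    first = key[:1].lower()
    for low, high in ranges:
        if low <= first <= high:
            return f"{table}_{low}_{high}"
    # Keys that do not start with a-z (digits, empty strings, ...) go here.
    return f"{table}_other"

print(monthly_partition("racuni", date(2011, 3, 15)))
print(alphabetical_partition("partneri", "Zagreb"))
```

A query restricted to one month or one letter range then only has to touch one partition instead of the whole table.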
  9. RDBMS SHARDING  Sharding means using several databases, each holding part of the data (500 clients on one server, another 500 on another)  Requires changing application code: connect(calculate_server_from(sharding_key))  Impossible to join data from different databases in a single query, so choose your sharding key wisely  Very difficult to repartition your databases based on a new key
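A minimal sketch of what `calculate_server_from(sharding_key)` might look like, assuming simple hash-modulo placement. The server names are hypothetical, and `keys_that_move` is a helper added only to illustrate why repartitioning is painful; neither comes from the slides.

```python
import zlib

# Hypothetical server names, used only for illustration.
SHARD_SERVERS = ["db0.example.internal", "db1.example.internal"]

def calculate_server_from(sharding_key, servers=SHARD_SERVERS):
    """Map a sharding key (for example a client id) to one server."""
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    index = zlib.crc32(sharding_key.encode("utf-8")) % len(servers)
    return servers[index]

def keys_that_move(keys, old_servers, new_servers):
    """List the keys whose server changes when the server list changes."""
    return [key for key in keys
            if calculate_server_from(key, old_servers)
            != calculate_server_from(key, new_servers)]

clients = [f"client-{i}" for i in range(1000)]
grown = SHARD_SERVERS + ["db2.example.internal"]
print(len(keys_that_move(clients, SHARD_SERVERS, grown)), "of 1000 clients move")
```

With modulo placement, going from two servers to three relocates roughly two thirds of the keys (assuming evenly spread hash values), and every relocated client's rows have to be copied to a different database.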
  10. RDBMS METADATA  Metadata: data describing other data  RDBMS structures are explicitly defined, and each data type is optimized for storage  Lots of constraints  Can get slow with a lot of data
  11. NOSQL  “Not SQL”, “Not only SQL”  Core NoSQL databases were invented mostly because RDBMSs made life very hard for huge, heavy-traffic web databases  NoSQL databases are the ones significantly different from relational databases
  12. NOSQL TYPES  Wide Column Store / Column Families  Document Store  Key Value / Tuple Store  Graph Databases  Object Databases  XML Databases  Multivalue Databases
  13. 4 MAIN DATA MODELS  Key-Value Stores  BigTable Clones (aka "ColumnFamily")  Document Databases  Graph Databases Source:
  14. KEY/VALUE STORES  Lineage: Amazon's Dynamo paper and Distributed HashTables.  Data model: A global collection of key-value pairs.  Example: Voldemort, Dynomite, Tokyo Cabinet Source:
  15. BIGTABLE CLONES  Lineage: Google's BigTable paper.  Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.  Example: HBase, Hypertable, Cassandra Source:
  16. DOCUMENT DATABASES  Lineage: Inspired by Lotus Notes.  Data model: Collections of documents, which contain key-value collections (called "documents").  Example: CouchDB, MongoDB, Riak Source:
  17. GRAPH DATABASES  Lineage: Draws from Euler and graph theory.  Data model: Nodes & relationships, both of which can hold key-value pairs  Example: AllegroGraph, InfoGrid, Neo4j Source:
  18. POPULAR NOSQL  Hadoop / HBase  MemcacheDB  Cassandra  Voldemort  Amazon SimpleDB  Hypertable  MongoDB  Cloudata  CouchDB  IBM Lotus/Domino  Redis
  19. NOSQL CHARACTERISTICS  Almost infinite horizontal scaling  Very fast  Performance doesn’t deteriorate with growth (much)  No fixed table schemas  No join operations  Ad-hoc queries difficult or impossible  Structured storage  Almost everything happens in RAM
  20. REAL-WORLD USE  Cassandra • Facebook (original developer, used it till late 2010) • Twitter • Digg • Reddit • Rackspace • Cisco  BigTable • Google (HBase is an open-source clone)  MongoDB • Foursquare • Craigslist • • SourceForge • GitHub
  21. WHY NOSQL?  Handles huge databases (I know, I said it before)  Redundancy, data is pretty safe on commodity hardware  Super flexible queries using map/reduce  Rapid development (no fixed schema, yeah!)  Very fast for common use cases
  22. PERFORMANCE  RDBMS uses buffer to ensure ACID properties  NoSQL does not guarantee ACID and is therefore much faster  We don’t need ACID everywhere!  I used MySQL and switched to MongoDB for my analytics app • Data processing (every minute) is 4x faster with MongoDB, despite being a lot more detailed (due to much simpler development)
  23. SCALING  Simple web application with not much traffic • Application server, database server all on one machine
  24. SCALING  More traffic comes in • Application server • Database server
  25. SCALING  Even more traffic comes in • Load balancer • Application server x2 • Database server
  26. SCALING  Even more traffic comes in • Load balancer x N • easy • Application server x N • easy • Database server x N • hard for SQL databases
  27. SQL SLOWDOWN  Not linear!  /scaling-sql-and-nosql-databases-in-the- cloud
  28. NOSQL SCALING  Need more storage? • Add more servers!  Need higher performance? • Add more servers!  Need better reliability? • Add more servers!
  29. SCALING SUMMARY  You can scale SQL databases (Oracle, MySQL, SQL Server…) • This will cost you dearly • If you don’t have a lot of money, you will reach limits quickly  You can scale NoSQL databases • Very easy horizontal scaling • Lots of open-source solutions • Scaling is one of the basic incentives for design, so it is well handled • Scaling is the cause of trade-offs causing you to have to use map/reduce
  30. RAM  Why map/reduce? I just need some simple queries. Tomorrow I will need some other queries….  SQL databases are optimized for very efficient disk access, but need RAM caching for significant scaling (MySQL+memcached)  NoSQL databases are designed to keep the whole working set in RAM
  31. WORKING SET  In real-world use the working set is much smaller than the complete database • For analytics, 99% of queries will concern the last 30 days  As you need RAM only for the working set, you can use commodity servers or VPSs, and just add more as your app becomes more popular
  32. WORKING SET WOES  Foursquare has millions of users and a working set the same size as the database  They used a single 66GB Amazon EC2 High-Memory Quadruple Extra Large Instance (with cheese) for millions of users  When their RAM usage reached 65GB, they decided to shard  Too late, the server had started swapping to disk  Disk is much slower than RAM - 100x slowdown  Server could not keep up due to swapping  11-hour outage (ouch!)
  33. MAP/REDUCE  Google’s framework for processing highly distributable problems across huge datasets using a large number of computers  Let’s define large number of computers • Cluster if all of them have same hardware • Grid unless Cluster (if !Cluster for old-style programmers)
  34. MAP/REDUCE  Process split into two phases • Map • Take the input, partition it, and delegate the parts to other machines • Other machines can repeat the process, leading to a tree structure • Each machine returns results to the machine that gave it the task • Reduce • Collect results from the machines you gave the tasks to • Combine the results and return them to the requester • Slower than sequential data processing, but massively parallel • Sort a petabyte of data in a few hours • Input, Map, Shuffle, Reduce, Output
  35. MAP/REDUCE EXAMPLE  You need to write two functions  Count different words in a set of documents
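The two functions for the word-count task can be sketched as follows. This is a single-process illustration of the Input, Map, Shuffle, Reduce, Output pipeline, not the code from the original slide; the function names and the tiny `map_reduce` driver are assumptions made for the example, and a real framework would run the map and reduce calls on many machines.

```python
from collections import defaultdict

def map_words(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: combine all counts emitted for one word."""
    return word, sum(counts)

def map_reduce(documents):
    """Run map, shuffle (group by key) and reduce in a single process."""
    grouped = defaultdict(list)
    for document in documents:
        for word, count in map_words(document):
            grouped[word].append(count)  # shuffle: same word, same bucket
    return dict(reduce_counts(word, counts)
                for word, counts in grouped.items())

print(map_reduce(["to be or not to be", "To do"]))
```

Because each `map_words` call looks at one document and each `reduce_counts` call looks at one word, both phases can be spread across machines without coordination beyond the shuffle.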
  36. MONGODB  Document store  Basic support for dynamic (ad hoc) queries  Query by example (nice!)
  37. MONGODB  Conditional Operators • <, <=, >, >= • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $size, $type  Regular expressions
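Query by example means the query is itself a document describing what matching documents look like. The sketch below mimics that idea in plain Python for a handful of the operators listed above; it does not use a MongoDB driver and is not MongoDB's implementation. The `matches` and `find` helpers and the sample data are hypothetical.

```python
import operator

# Only a few operators are covered in this sketch.
COMPARISONS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}

def matches(document, query):
    """True if `document` satisfies every condition in `query`."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$exists":
                    if (field in document) != operand:
                        return False
                elif field not in document:
                    return False  # nothing to compare against
                elif not COMPARISONS[op](document[field], operand):
                    return False  # unknown operators raise KeyError above
        elif document.get(field) != condition:
            return False  # plain value: field must equal it
    return True

def find(collection, query):
    """Return the documents in `collection` that match `query`."""
    return [doc for doc in collection if matches(doc, query)]

users = [
    {"name": "Ana", "age": 31, "city": "Zagreb"},
    {"name": "Ivan", "age": 24, "city": "Split"},
    {"name": "Maja", "age": 42},
]
print(find(users, {"city": "Zagreb"}))
print(find(users, {"age": {"$gte": 30}}))
print(find(users, {"city": {"$exists": False}}))
```

Note that the third user has no `city` field at all, which is fine in a document store because there is no fixed schema.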
  38. MONGODB  Data is stored as BSON (binary JSON) • Makes it very well suited for languages with native JSON support  Map/Reduce written in JavaScript • Slow! There is a single thread of execution in JavaScript  Master/slave replication (auto failover with replica sets)  Sharding built-in  Uses memory-mapped files for data storage  Performance over features  On 32-bit systems, limited to ~2.5 GB  An empty database takes up 192 MB  GridFS to store big data + metadata (not actually an FS) Source:
  39. CASSANDRA  Written in: Java  Protocol: Custom, binary (Thrift)  Tunable trade-offs for distribution and replication (N, R, W)  Querying by column, range of keys  BigTable-like features: columns, column families  Writes are much faster than reads (!) • Constant write time regardless of database size  Map/reduce possible with Apache Hadoop Source:
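The N, R, W trade-off can be made concrete with a little arithmetic, assuming the usual Dynamo-style reading of the letters: N replicas per item, R replicas that must answer a read, W replicas that must acknowledge a write. The function names below are hypothetical, chosen for this sketch.

```python
def _check(n, r, w):
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= R <= N and 1 <= W <= N")

def reads_see_latest_write(n, r, w):
    """True when every read set overlaps every write set (R + W > N)."""
    _check(n, r, w)
    return r + w > n

def tolerated_failures(n, r, w):
    """Replicas that may be down while reads and writes still succeed."""
    _check(n, r, w)
    return {"reads": n - r, "writes": n - w}

for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(f"N=3 R={r} W={w}:",
          reads_see_latest_write(3, r, w), tolerated_failures(3, r, w))
```

Small R and W give fast operations that survive more failures but may return stale data; raising them so that R + W > N buys consistency at the cost of latency and availability.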
  40. HBASE  Written in: Java  Main point: Billions of rows X millions of columns  Modeled after BigTable  Map/reduce with Hadoop  Query predicate push down via server side scan and get filters  Optimizations for real time queries  A high performance Thrift gateway  HTTP supports XML, Protobuf, and binary  Cascading, hive, and pig source and sink modules  No single point of failure  While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs. HBase is a column-oriented key/value store and allows for low-latency reads and writes.  Random access performance is like MySQL Source:
  41. REDIS  Written in: C/C++  Main point: Blazing fast  Disk-backed in-memory database  Master-slave replication  Simple values or hash tables by keys  Has sets (also union/diff/inter)  Has lists (also a queue; blocking pop)  Has hashes (objects of multiple fields)  Sorted sets (high score table, good for range queries)  Has transactions (!)  Values can be set to expire (as in a cache)  Pub/Sub lets one implement messaging (!) Source:
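The "values can be set to expire" idea can be sketched in a few lines of plain Python. This is neither Redis's implementation nor its client API; the `ExpiringStore` class, its method names and the lazy-expiry-on-read strategy are assumptions made for this illustration.

```python
import time

class ExpiringStore:
    """In-memory key/value store whose values can carry a time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._data = {}      # key -> (value, deadline or None)

    def set(self, key, value, ttl=None):
        """Store `value`; with `ttl` (seconds) it expires after that long."""
        deadline = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, deadline)

    def get(self, key, default=None):
        """Return the value, or `default` if it is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, deadline = entry
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]  # lazy expiry: drop the key on access
            return default
        return value

now = [0.0]  # fake clock so the example does not have to sleep
store = ExpiringStore(clock=lambda: now[0])
store.set("session:42", "logged-in", ttl=10)
print(store.get("session:42"))
now[0] = 10.0
print(store.get("session:42"))
```

This is the behavior that makes such a store usable as a cache: stale entries disappear without the application having to delete them.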
  42. COUCHDB  Written in: Erlang  Main point: DB consistency, ease of use  Bi-directional (!) replication, continuous or ad-hoc, with conflict detection, thus, master-master replication. (!)  MVCC - write operations do not block reads  Previous versions of documents are available  Crash-only (reliable) design  Needs compacting from time to time  Views: embedded map/reduce  Formatting views: lists & shows  Server-side document validation possible  Authentication possible  Real-time updates via _changes (!)  Attachment handling  CouchApps (standalone JS apps) Source:
  43. HADOOP  Apache project  A framework that allows for the distributed processing of large data sets across clusters of computers  Designed to scale up from single servers to thousands of machines  Designed to detect and handle failures at the application layer, instead of relying on hardware for it
  44. HADOOP  Created by Doug Cutting, who named it after his son's toy elephant  Hadoop-related Apache projects • Cassandra • HBase • Pig  Hive was a Hadoop subproject, but is now a top-level Apache project  Used by many large & famous organizations •  Scales to hundreds or thousands of computers, each with several processor cores  Designed to efficiently distribute large amounts of work across a set of machines  Hundreds of gigabytes of data constitute the low end of Hadoop-scale  Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes
  45. HADOOP  See with-apache-hadoop-pig  Uses Java, but allows streaming so other languages can easily send and accept data items to/from Hadoop
  46. HADOOP  Uses distributed file system (HDFS) • Designed to hold very large amounts of data (terabytes or even petabytes) • Files are stored in a redundant fashion across multiple machines to ensure their durability to failure and high availability to very parallel applications • Data organized into directories and files • Files are divided into blocks (64 MB by default) and distributed across nodes  Design of HDFS is based on the design of the Google File System
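The block arithmetic above can be sketched as follows. The 64 MB block size comes from the slide; the replication factor of 3, the node names and the round-robin placement are assumptions for illustration only, and real HDFS placement is more involved than this toy rule.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default quoted on the slide

def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file of `file_size` bytes."""
    return -(-file_size // block_size)  # ceiling division on integers

def place_blocks(file_size, nodes, replication=3, block_size=BLOCK_SIZE):
    """Toy placement: block i goes to `replication` consecutive nodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return [
        [nodes[(block + copy) % len(nodes)] for copy in range(replication)]
        for block in range(block_count(file_size, block_size))
    ]

file_size = 200 * 1024 * 1024  # a 200 MB file
nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
for index, replicas in enumerate(place_blocks(file_size, nodes)):
    print(f"block {index}: {replicas}")
```

Because every block lives on several nodes, losing one machine loses no data, and different machines can process different blocks of the same file in parallel.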
  47. HIVE  A petabyte-scale data warehouse system for Hadoop  Easy data summarization, ad-hoc queries  Query the data using a SQL-like language called HiveQL  Hive compiler generates map-reduce jobs for most queries
  48. PIG  Platform for analyzing large data sets  High-level language for expressing data analysis programs  Compiler produces sequences of Map-Reduce programs  Textual language called Pig Latin • Ease of programming • System optimizes task execution automatically • Users can create their own functions
  49. PIG LATIN  Pig Latin – high level Map/Reduce programming  Equivalent to SQL for RDBMS systems.  Pig Latin can be extended using Java User Defined Functions  “Word Count” script in Pig Latin
  52. SUMMARY  NoSQL is a great problem solver if you need it  Choose your NoSQL platform carefully, as each is designed for a specific purpose  Get used to Map/Reduce  It’s not a sin to use NoSQL alongside a (yes)SQL database  I am really happy to work with MongoDB instead of MySQL