North Bay Ruby Meetup 101911

  1. Devs, Ops, & Data
     A gentle introduction to the world of Data
     Engine Yard
  2. Agenda
     • Context
     • RDBMS
     • Do’s and Don’ts
     • NoSQL
     • Survey
     • General Advice
     • Questions
  3. I work with Data!
     • Data Engineer @ Engine Yard
     • Organizer of the DFW Big Data Group (@dfwbigdata)
     • Previous life
     • Sr Web Developer
     • Data Architect
     • Student: MS in CSCI & MS in Info Mgmt from WashU STL
     My Team is Hiring!
  4. The Universe
     Relational vs. NoSQL/CoSQL vs. “The rest”
  5. Relational World
     • ACID Properties
       - Atomicity
         • Either all of a transaction’s actions are committed or none are
       - Consistency
         • Any transaction the database performs will take it from one consistent state to another
       - Isolation
         • Operations cannot access data that has been modified by a transaction that has not yet completed
       - Durability
         • The DBMS can recover committed transaction updates after any kind of system failure (hardware or software)
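The atomicity bullet above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module rather than any database named in the talk, and the `accounts` table, the account names, and the `transfer` helper are all hypothetical. The credit is applied first on purpose, so that a failing debit shows the earlier write being rolled back too.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # succeeds, both updates commit
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
print(ok, failed, balances(conn))
```

After the failed transfer, bob's balance does not include the 500 credit even though that statement ran first: the whole transaction was rolled back.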
  6. Relational World
     • Relational Model
       - How data should be formatted (Normalized)
     • Unified Language for Querying
       - SQL
     • Data
       - Tabular, structured, relatively centralized
     • Scale vertically
       - Go up until you can't go any higher
       - Sharding goes against the model!
     • Established theory & algorithms
  7. RDBMS Performance
     • Size of your data matters
       - Migrations
       - Schema changes
     • Hardware matters
       - Disk IO
       - RAM
     • Restores are not Magic
     • Debug your queries
       - EXPLAIN
       - slow query logs
     • Indexes!
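As a rough sketch of the "debug your queries" and "indexes" points, the snippet below asks SQLite for its query plan before and after adding an index. SQLite's `EXPLAIN QUERY PLAN` stands in for whatever EXPLAIN facility your RDBMS offers, and the `users` table, its columns, and the index name are made up for the example.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

before = query_plan(conn, lookup, args)  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, args)   # search using the new index

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan of `users` and the second a search that uses `idx_users_email`.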
  8. A bit of context
     • CAP
       - Consistency
         • All clients have a consistent view of the data
       - Availability
         • Clients have access to read & write data
       - Partition Tolerance
         • The system won’t fail if individual nodes can’t communicate
     • Horizontal Scaling
     • Data interaction is DB-specific
     • Data
       - Unstructured
       - Large quantities
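Horizontal scaling usually means spreading keys across several nodes. The sketch below shows one simple way to do that, hash-modulo routing; it is an illustration rather than how any particular store in this talk works, and the node names and the `shard_for` helper are hypothetical.

```python
import hashlib

def shard_for(key, shards):
    """Pick a shard for `key` using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to keep routing stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# Hypothetical node names, for illustration only.
NODES = ["db-node-0", "db-node-1", "db-node-2"]

# Group a handful of keys by the node that would own them.
placement = {}
for user_id in range(9):
    key = f"user:{user_id}"
    placement.setdefault(shard_for(key, NODES), []).append(key)

for node, keys in sorted(placement.items()):
    print(node, keys)
```

Modulo routing remaps most keys whenever the number of nodes changes, which is one reason Dynamo-style systems use consistent hashing instead.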
  9. A bit of context
     Data Model (columns) by Consistency (rows):

     Consistency          Key/Value        Document  Column-Oriented               Graph
     Single Master        Membase, Redis*  MongoDB   -                             Neo4j
     Multi-Master/Dynamo  Riak             CouchDB   Cassandra, HBase, Hypertable  -
  10. A bit of context
      http://blog.nahurst.com/visual-guide-to-nosql-systems
  11. Survey of NoSQL Stores (Redis)
      • Disk-backed in-memory database
      • Datatype Server (awesome!)
      • Has notion of transactions
      • Pros:
        - Blazing fast, easy to set up
      • Cons:
        - May not be best for large databases
      • Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
        - Stock prices. Analytics. Real-time data collection. Real-time communication.
  12. Survey of NoSQL Stores (CouchDB)
      • Document Oriented DB (Erlang)
      • Flexible replication (master-master, master-slave)
      • Pros:
        - DB consistency, ease of use
        - MVCC: write operations do not block reads
      • Cons:
        - Needs compacting from time to time
      • Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
        - CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
  13. Survey of NoSQL Stores (Riak)
      • Dynamo-based key/value store
      • Pros:
        - Fault tolerance
        - Distributed
        - Scalable
      • Cons:
        - Learning curve is a bit steep
        - Multi-site replication in commercial version only
      • Best used: If you want something Cassandra-like (Dynamo-like) but simpler.
        - Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt.
  14. Survey of NoSQL Stores (MongoDB)
      • Document Oriented DB (binary JSON)
      • Memory Mapped Files, Schema-less
      • Pros:
        - Easy to get started
        - 2 Ruby ORMs (Mongoid, MongoMapper)
      • Cons:
        - Cluster reconfiguration is tricky
      • Best used: If you need dynamic queries. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
        - For most things that you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
  15. Survey of NoSQL Stores (Cassandra)
      • Column-based DB
        - Facebook
      • Pros:
        - Best of BigTable (column families) and Dynamo
        - Querying by column, range of keys
      • Cons:
        - Bloat and complexity (Java)
      • Best used: When you write more than you read (logging). If every component of the system must be in Java.
        - Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.
  16. Survey of NoSQL Stores (Neo4j)
      • Graph DB written in Java
      • Pros:
        - Native way to describe relationships
        - Advanced path-finding with multiple algorithms
      • Cons:
        - ?
      • Best used: For graph-style data. Neo4j is quite different from the others in this sense.
        - Social relations, public transport links, road maps, network topologies.
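To give a feel for what path-finding over graph-style data involves, here is a minimal breadth-first search in plain Python. It does not use Neo4j or its API; the adjacency dictionary of transit stops is invented for the example, and a graph database would run this kind of traversal natively over stored relationships.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return the path with the fewest hops from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers to rebuild the route.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical public transport links (stop -> directly connected stops).
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],
}

print(shortest_path(links, "A", "E"))
print(shortest_path(links, "A", "F"))
```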
  17. Survey of NoSQL Stores (Hadoop)
      • MapReduce Framework
      • Distributed FS, task tracker, ... (full Ecosystem)
      • Pros:
        - Process extremely large data volumes
      • Cons:
        - Bloat and complexity (Java), steep learning curve
      • Best used: When you write more than you read (logging). If every component of the system must be in Java.
        - Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.
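The MapReduce programming model itself is small enough to sketch in a few lines. The word count below runs in a single Python process and only shows the shape of the map, shuffle, and reduce steps; a real framework distributes these phases across machines and a distributed file system. All function names here are illustrative.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(records):
    """Run map, shuffle, and reduce over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big deal", "data data"])
print(counts)
```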
  18. “The rest”
      • Full Text Search
        - Awesome search capabilities
        - Helps reduce the load on your database
      • Caches
        - Very fast! Use in any application that needs low-latency data access
        - Membase
          • Memcached-compatible, but with persistence and clustering
          • All nodes are identical (master-master replication)
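A common way to put a cache in front of a database is the cache-aside pattern: check the cache first and fall back to the slow store only on a miss. The sketch below keeps entries in a local dictionary purely for illustration; in practice the dictionary would be replaced by a client for a cache server such as the ones mentioned above. The `CacheAside` class and `slow_lookup` function are hypothetical.

```python
import time

class CacheAside:
    """Read-through helper: serve from cache, load from the backend on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader    # function that fetches from the slow store
        self._ttl = ttl_seconds  # how long a cached value stays valid
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Missing or expired: load from the backend and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

backend_calls = []

def slow_lookup(key):
    """Stand-in for an expensive database query."""
    backend_calls.append(key)
    return key.upper()

cache = CacheAside(slow_lookup, ttl_seconds=30)
first = cache.get("report")   # miss: goes to the backend
second = cache.get("report")  # hit: served from the cache
print(first, second, cache.hits, cache.misses, backend_calls)
```

The time-to-live bounds how stale a cached value can get, which ties back to the consistency question of what your reads are allowed to see.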
  19. Keep in mind
      • Data and query models
      • Durability needs
      • Scalability needs
      • Partition needs
        - data on multiple servers?
      • Consistency
        - reads!
      • Server performance
      • Analytical workload
  20. Advice
      • Give them a try!
        - Fast to set up
      • Data Models matter
        - Going from RDBMS to NoSQL will require a conversion step, so plan for it
      • No silver bullet
        - Your app will likely use different data stores for different purposes
  21. Questions?
