Non-Relational Databases: This hurts. I like it.

1. Non-Relational Databases: This hurts. I like it. Christopher Groskopf / bouvard / @onyxfish

3. First! A Hypothetical

4. I want to query space.

6. 100,000 stars

7. 3.5 years of constant observation

8. Sensitive measurements

9. How would you store this data so that your researchers can analyze it effectively?

10. (Hint: It is probably not sqlite on a thumb drive.)

11. The Relational Model

13. Enforces data integrity

14. Minimizes repetition

15. Proven

18. Joins rapidly become a bottleneck

19. Difficult to scale up

20. Gets in the way of parallelization

21. Optimization may mitigate the benefits of normalization

22. The Non-Relational Model

24. Master ↔ Master replication

25. Scales well

26. Map/Reduce means everything runs in parallel

28. Integrity-enforcement migrates to code

29. Limited ORM tooling

30. Significant learning curve

31. Proven only in a subset of cases
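A minimal sketch of the Map/Reduce idea from slide 26, using the star-observation hypothetical from the opening slides. The shard layout, star names, and brightness values are all made up for illustration, and the thread pool merely stands in for separate nodes. The map step here also combines locally, which is a common variant rather than the only way to do it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical observations: (star_id, brightness) pairs, split into shards
# as if each shard lived on a different node.
SHARDS = [
    [("star-1", 10.0), ("star-2", 4.0)],
    [("star-1", 12.0), ("star-3", 7.0)],
    [("star-2", 6.0), ("star-1", 11.0)],
]


def map_shard(shard):
    """Map step (with a local combine): per-star (sum, count) for one shard."""
    partial = defaultdict(lambda: (0.0, 0))
    for star_id, brightness in shard:
        total, count = partial[star_id]
        partial[star_id] = (total + brightness, count + 1)
    return dict(partial)


def reduce_partials(partials):
    """Reduce step: merge the per-shard partials into one mean per star."""
    merged = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for star_id, (total, count) in partial.items():
            merged_total, merged_count = merged[star_id]
            merged[star_id] = (merged_total + total, merged_count + count)
    return {star: total / count for star, (total, count) in merged.items()}


def mean_brightness(shards):
    # No shard depends on another, so the map calls can run in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_shard, shards))
    return reduce_partials(partials)
```

Because the reduce step only ever sees small partial results, adding more shards adds more map workers without changing the code.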

32. Second! Platforms

34. Often, they offer Master ↔ Master replication

35. In most cases they store schema-less data

36. Typically they scale by “automatic” sharding

37. Sometimes they offer “eventual consistency”

38. For the most part they are fast

39. Generally they are targeted at web applications

40. Frequently we can't define what they are

42. Imagine if Memcache was your database

43. That is more or less what an NRDB is

44. Except that everything is permanently “cached” to disk

45. And only the most common result sets are held in RAM (it could be all of them)

46. In most cases this is faster than computing fresh results based on indices (that is, SQL)
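The Memcache analogy on slides 42 to 46 can be sketched as a toy key-value store: every value is written to disk, and only the most recently used ones stay in RAM. `TinyStore`, its one-JSON-file-per-key layout, and the `ram_slots` parameter are hypothetical names invented for this sketch; no real NRDB is implemented this way, and keys are assumed to be safe file names.

```python
import json
import os
import tempfile
from collections import OrderedDict


class TinyStore:
    """Toy store: all values persisted to disk, hot values also kept in RAM."""

    def __init__(self, directory, ram_slots=2):
        self.directory = directory
        self.ram_slots = ram_slots
        self.ram = OrderedDict()  # least recently used entry comes first
        self.disk_reads = 0       # counts how often RAM missed

    def _path(self, key):
        return os.path.join(self.directory, key + ".json")

    def _remember(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_slots:
            self.ram.popitem(last=False)  # evict the least recently used

    def put(self, key, value):
        # Always "cache" to disk first, then keep a copy in RAM.
        with open(self._path(key), "w") as handle:
            json.dump(value, handle)
        self._remember(key, value)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        self.disk_reads += 1
        with open(self._path(key)) as handle:
            value = json.load(handle)
        self._remember(key, value)
        return value


def demo():
    """Store three values with two RAM slots, then read a hot and a cold key."""
    with tempfile.TemporaryDirectory() as directory:
        store = TinyStore(directory, ram_slots=2)
        for key, value in [("a", [1]), ("b", [2]), ("c", [3])]:
            store.put(key, value)
        hot = store.get("c")   # served from RAM
        cold = store.get("a")  # evicted earlier, so read back from disk
        return hot, cold, store.disk_reads
```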

48. Berkeley DB

49. BigTable

50. Cassandra

51. CouchDB

52. HyperTable

53. MongoDB

54. Project Voldemort

55. SimpleDB

56. Tokyo Cabinet

58. Berkeley DB ->

59. BigTable ->

60. Cassandra ->

61. CouchDB ->

62. HyperTable ->

63. MongoDB ->

64. Project Voldemort ->

65. SimpleDB ->

67. This is not a fad.

69. Unstructured data

70. Massive datasets (broad > deep)

71. Fuzzy and/or fault tolerant data

72. Versioned data

73. Logging

74. When eventual consistency is good enough

75. If you are storing a JSON or XML string in your SQL database: I Have Your Medicine

77. Deeply hierarchical datasets

78. Data integrity that must be enforced by a DBA

79. High security applications where the database must enforce that security (LAN/WAN facing)

80. Transactional data (banking, analytics, etc.)

81. Usage is highly unpredictable, combinatorial, or likely to change suddenly

82. Third! Voter's Daily and CouchDB

86. Understood by the Gov2.0 community

87. Reusable / Educational / Transparent

89. “Speaks” JSON

90. “Thinks” JavaScript (optionally, Python)

91. RESTful API

92. Pre-collates Views (on insert) for fast reads

93. Supports Master ↔ Master replication

94. “Futon” management interface

95. Written in Erlang

96. An Example JSON Document
{
  "_id": "2006-12-06T00:00:00Z - C-SPAN House Ways and Means Committee Schedule Scraper",
  "_rev": "1-2ca577e0a4a25ad2704fdf5a20161f9f",
  "datetime": "2006-12-06T00:00:00Z",
  "end_datetime": null,
  "title": "Hearing on Patient Safety and Quality Issues in End Stage Renal Disease Treatment",
  "description": null,
  "branch": "Legislative",
  "entity": "House of Representatives",
  "source_url": "http://www3.capwiz.com/c-span/dbq/officials/schedule.dbq?committee=hways&command=committee_schedules&chambername=House&chamber=H&period=",
  "source_text": "DECEMBER 06, 2006 000a0009Hearing on Patient Safety and Quality Issues in End Stage Renal Disease Treatment ",
  "access_datetime": "2009-09-28T04:19:02Z",
  "parser_name": "C-SPAN House Ways and Means Committee Schedule Scraper",
  "parser_version": "0.1"
}
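A rough sketch of what slide 92 means by pre-collating views for fast reads, written in Python rather than JavaScript. It illustrates the idea only and is not CouchDB's actual implementation: `ToyView`, `by_datetime`, and the second document are hypothetical, and only three fields of the example document above are used.

```python
import bisect


class ToyView:
    """Keeps (key, doc_id, value) rows sorted as documents are inserted."""

    def __init__(self, map_function):
        self.map_function = map_function
        self.rows = []

    def on_insert(self, doc):
        # Run the map function once, at insert time, and file each row in order.
        # Values are assumed to be strings so that rows stay comparable.
        for key, value in self.map_function(doc):
            bisect.insort(self.rows, (key, doc["_id"], value))

    def range(self, start_key, end_key):
        """Return rows with start_key <= key < end_key without rescanning docs."""
        low = bisect.bisect_left(self.rows, (start_key,))
        high = bisect.bisect_left(self.rows, (end_key,))
        return self.rows[low:high]


def by_datetime(doc):
    """Map function: emit the event's ISO datetime as key and its title as value."""
    if doc.get("datetime") and doc.get("title"):
        yield doc["datetime"], doc["title"]


def build_demo_view():
    view = ToyView(by_datetime)
    view.on_insert({
        "_id": "2006-12-06T00:00:00Z - C-SPAN House Ways and Means Committee Schedule Scraper",
        "datetime": "2006-12-06T00:00:00Z",
        "title": "Hearing on Patient Safety and Quality Issues in End Stage Renal Disease Treatment",
    })
    # Hypothetical second document, only here to show the range query filtering.
    view.on_insert({
        "_id": "hypothetical-2009-event",
        "datetime": "2009-01-15T00:00:00Z",
        "title": "Hypothetical hearing",
    })
    return view
```

A read such as `build_demo_view().range("2006-01-01", "2007-01-01")` is then a binary search over rows that already exist, which is the trade the slide describes: more work on insert, less on read.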

100. Harnessing “high availability” requires a large up-front investment of development time

101. Map/Reduce and SQL shouldn't even be used in the same sentence (GQL is a stupid name)

102. Schema-less data is fantastic

103. Integrity checking in code is not so bad (that is what abstraction is for)

104. Doing Joins in code is actually very liberating
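Slides 103 and 104 can be illustrated together: a small validation function standing in for a schema, and a hash join done in application code. This is a sketch under assumptions. The required-field list is a guess based on the example document, and the separate "parser" documents with a `name` field are hypothetical, not part of Voter's Daily.

```python
# Assumed "schema": fields an event document must carry, with expected types.
REQUIRED_FIELDS = {
    "_id": str,
    "datetime": str,
    "title": str,
    "parser_name": str,
}


def validate_event(doc):
    """Integrity check in code: raise ValueError on a missing or mistyped field."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            raise ValueError("missing field: " + field)
        if not isinstance(doc[field], expected_type):
            raise ValueError("wrong type for field: " + field)
    return doc


def join_events_to_parsers(events, parsers):
    """Join in code: attach the matching parser document to each event."""
    parsers_by_name = {parser["name"]: parser for parser in parsers}
    joined = []
    for event in events:
        validate_event(event)
        # A missing parser yields None, much like an unmatched row in a LEFT JOIN.
        parser = parsers_by_name.get(event["parser_name"])
        joined.append({**event, "parser": parser})
    return joined


# Hypothetical data for illustration.
EVENTS = [
    {"_id": "e1", "datetime": "2006-12-06T00:00:00Z", "title": "Hearing",
     "parser_name": "C-SPAN House Ways and Means Committee Schedule Scraper"},
    {"_id": "e2", "datetime": "2006-12-07T00:00:00Z", "title": "Markup",
     "parser_name": "Unknown Scraper"},
]
PARSERS = [
    {"name": "C-SPAN House Ways and Means Committee Schedule Scraper",
     "version": "0.1"},
]
```

The join is just a dictionary lookup, so the application decides what an unmatched row means instead of the database deciding for it.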

106. But you ought to learn one anyway

107. It's not just for Twitter and bleeding edge startups

108. Amazon, Facebook, Google, IBM, and Microsoft all get this

109. Sometimes it is simply the right tool for the job

111. CouchDB & Map/Reduce Emulator: http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html

112. NASA's Kepler Mission: http://kepler.nasa.gov/

113. ReadWriteWeb on NRDBs: http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php

114. Voter's Daily: http://github.com/bouvard/votersdaily
