  1. NoSQL Databases
     Yousof Alsatom, Wirtschaftsinformatik Master Program, Humboldt-Universität zu Berlin, 2012
  2. Agenda
     • Relational database model
       • Advantages & Disadvantages
     • NoSQL
       • Basic concepts, techniques and patterns in comparison with RDBMS
       • Consistency
       • Partitioning
       • Storage layout
  3. Agenda
     • NoSQL data model
       • Key – Value
         • Dynamo
       • Bigtable – column family
         • Google Bigtable
       • Document databases
         • CouchDB
       • GraphDB
         • Neo4j
     • Conclusion
  4. Database and DBMS
     • In essence, a database is a collection of data that exists over a long period of time, often many years.
     • Commonly, the term database refers to a collection of data that is managed by a Database Management System (DBMS).
     • A DBMS is a (powerful) tool for creating and managing large amounts of data efficiently and allowing it to persist safely over long periods of time.
  5. Relational Model
     • A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. [techtarget.com]
     Photo: Edgar Frank "Ted" Codd (August 19, 1923 – April 18, 2003), IBM
  6. Relational Database
     • A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. [Wikipedia]
  7. Example, Project Management System [Qian Sha, 2003]
  8. Example, Project Management System [Qian Sha, 2003]
  9. Example, Project Management System [Qian Sha, 2003]
     • Possible queries
       • Give me all employees who are working on project X
       • Give me the percentage of progress for project Y
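The two queries on slide 9 can be sketched against a small relational schema. The schema of the original example is only shown in figures that are not part of this transcript, so every table and column name below (`employee`, `project`, `assignment`, `task`, `done`) is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema: the slides show the real one only as a figure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assignment (employee_id INTEGER REFERENCES employee(id),
                             project_id  INTEGER REFERENCES project(id));
    CREATE TABLE task (id INTEGER PRIMARY KEY,
                       project_id INTEGER REFERENCES project(id),
                       done INTEGER);  -- 1 = finished, 0 = open
    INSERT INTO employee VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO project  VALUES (1, 'X'), (2, 'Y');
    INSERT INTO assignment VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO task VALUES (1, 2, 1), (2, 2, 1), (3, 2, 0), (4, 2, 0);
""")


def employees_on(project_name):
    """All employees working on the given project (requires two joins)."""
    rows = conn.execute("""
        SELECT e.name FROM employee e
        JOIN assignment a ON a.employee_id = e.id
        JOIN project p    ON p.id = a.project_id
        WHERE p.name = ? ORDER BY e.name""", (project_name,))
    return [name for (name,) in rows]


def progress_percent(project_name):
    """Share of finished tasks of the given project, in percent."""
    (pct,) = conn.execute("""
        SELECT 100.0 * SUM(t.done) / COUNT(*) FROM task t
        JOIN project p ON p.id = t.project_id
        WHERE p.name = ?""", (project_name,)).fetchone()
    return pct
```

Both questions need joins across several tables, which is the operation the later slides identify as hard to distribute.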
  10. Relational Database, Advantages
     • Reliability
     • ACID
       • Atomicity: all or nothing
       • Consistency
       • Isolation: concurrent execution of transactions results in a system state that could have been obtained if the transactions were executed serially
       • Durability: once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
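Atomicity ("all or nothing") can be illustrated with a money transfer in SQLite. This is a minimal sketch: the `account` table, the names and the amounts are assumptions chosen for the example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # Violates the CHECK constraint if src cannot afford the amount.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When the second update fails, the credit already applied to the receiver is rolled back as well, so no partial transfer is ever visible.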
  11. Relational Database, Limitations
     • Scalability
       • Users can scale a relational database by running it on a more powerful, and more expensive, computer.
       • To scale beyond a certain point, though, it must be distributed across multiple servers.
       • Relational databases don't work easily in a distributed manner because joining their tables across a distributed system is difficult. [Jeremy Zawodny]
     • Complexity
       • All data has to be converted into tables, which can be complex and slow (example: Wikipedia)
       • SQL can work only with structured data [Prof. Stefan Edlich, Beuth University of Applied Sciences in Berlin]
  12. Relational Database, Limitations
     Spandauer Str. 1, Berlin
  13. Problem!
     Diversity? Connectivity? Data size?
  15. NoSQL
     • Does not use the relational model (nor the SQL language)
     • No schema, allowing fields to be added to any record without controls
     • Open source
     • Designed to work on large clusters
     • Based on the needs of 21st-century web properties
  16. NoSQL, History
     • Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
     • Johan Oskarsson organized a meetup "for folks interested in distributed structured data storage" and called it NoSQL. The event was held on June 11th, 2009, in San Francisco.
  17. NoSQL
     • Consistency
       • Uses eventual consistency (a consistency model also used in parallel programming)
       • Weak consistency
     • Partitioning
       • Automatic partitioning (data is growing)
     • Storage layout
       • Row-based storage layout
       • Columnar storage layout
       • …
  18. NoSQL
     • Data model
       • Key / Value
       • Bigtable
       • DocumentDB
       • GraphDB
  19. Key / Value
  20. Hash Table
     • Type: unsorted associative array
     • Invented: 1953
     • Time complexity in big O notation (n: number of entries, k: number of buckets):
                Average       Worst case
       Space    O(n)          O(n)
       Search   O(1 + n/k)    O(n)
       Insert   O(1)          O(n)
       Delete   O(1 + n/k)    O(n)
     Source: Wikipedia, http://en.wikipedia.org/wiki/Hash_tables
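The complexities in the table follow from how a hash table with separate chaining works: a key is hashed to one of k buckets, and only that bucket's chain (n/k entries on average) is scanned. The class below is a minimal sketch for illustration, not a production structure. Its `insert` scans the chain to overwrite an existing key, so it costs O(1 + n/k) rather than the O(1) quoted for a blind append.

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining with k buckets."""

    def __init__(self, k=8):
        self.buckets = [[] for _ in range(k)]

    def _bucket(self, key):
        # The hash decides which of the k chains may hold the key.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:          # overwrite an existing entry
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                return
        raise KeyError(key)
```

The worst case O(n) arises when all keys land in the same bucket, which degenerates the table into a single linked list.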
  21. Key – Value
     • Amazon's infrastructure is made up of tens of thousands of servers and network components located in many datacenters around the world.
     • Availability and reliability are the most important factors for Amazon
     • Dynamo aims to achieve high availability at the cost of weaker consistency
     Figure: Service-oriented architecture of Amazon's platform
     Source: Dynamo: Amazon's Highly Available Key-value Store. September 2007.
  22. Key – Value, Dynamo History
     • Giuseppe DeCandia and his co-authors argue against RDBMSs at Amazon
     • They admit that advances have been made to scale and partition RDBMSs but state that such setups remain difficult to configure and operate (2006)
     • Dynamo was published in 2007
  23. Dynamo, Consistent Hashing
     Data is partitioned and replicated using consistent hashing
     • Goal: scalability and availability
     • The output range of a hash function is treated as a fixed circular space or "ring"
       • Ordered (each new node is assigned a random position on the ring)
       • Keys are assigned by walking the ring clockwise
       • The departure or arrival of a node affects only its immediate neighbors
     • Each node becomes responsible for the region in the ring between it and its predecessor node on the ring.
     • "Virtual nodes": each node can be responsible for more than one virtual node.
     Source: Dynamo: Amazon's Highly Available Key-value Store. September 2007.
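The ring described on slide 23 can be sketched in a few lines. This is an illustration under stated assumptions, not Dynamo's implementation: the class name `HashRing`, the `vnodes` parameter and the `"node#i"` naming of virtual nodes are invented here, and positions are derived from MD5 of the node name instead of being drawn at random.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on the ring using MD5 (128-bit space)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=8):
        self.vnodes = vnodes
        self._positions = []   # sorted ring positions of all virtual nodes
        self._owner = {}       # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is placed on the ring several times.
        for i in range(self.vnodes):
            pos = ring_position("%s#%d" % (node, i))
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove(self, node):
        self._positions = [p for p in self._positions
                           if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[i % len(self._positions)]]
```

Because existing positions never change, adding a node can only move keys onto the new node, and removing it hands exactly those keys back. This is the incremental scalability the slide refers to.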
  24. Dynamo, Vector Clock
     • Data versioning: Dynamo uses vector clocks in order to capture causality between different versions of the same object.
     • A vector clock is a list of (node, counter) pairs.
     • Every version of every object is associated with one vector clock.
     • If the counters on the first object's clock are less-than-or-equal to all of the nodes in the second clock, then the first is an ancestor of the second and can be forgotten.
     Figure labels: Object, Node, Clock
     Source: Dynamo: Amazon's Highly Available Key-value Store. September 2007.
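The ancestor rule on slide 24 translates directly into a comparison of counters. In this sketch a clock is a plain dictionary from node name to counter. The function names and the node names `Sx`, `Sy`, `Sz` used in the usage note are illustrative assumptions.

```python
def descends(a, b):
    """True if clock a has seen everything clock b has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a, b):
    """Classify clock a relative to clock b."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if b_ge_a:
        return "ancestor"      # a is older than b and can be forgotten
    if a_ge_b:
        return "descendant"    # a supersedes b
    return "conflict"          # concurrent versions, need reconciliation


def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated
```

For example, two versions written through different nodes after a common version `{"Sx": 2}` get the clocks `{"Sx": 2, "Sy": 1}` and `{"Sx": 2, "Sz": 1}`. Neither dominates the other, so `compare` reports a conflict that has to be reconciled on read.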
  25. Dynamo, Overview
     Source: http://de.wikipedia.org/wiki/Amazon_Dynamo
  26. Dynamo, Sloppy Quorum
     • Handling failures: sloppy quorum
       • A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. [Wikipedia]
     • Sloppy quorum
       • Read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.
     • Example:
       • A is down …
       • D stores the replica together with metadata hinting that it was intended for A
       • When A comes back, D will attempt to deliver the replica to A
     Source: Dynamo: Amazon's Highly Available Key-value Store. September 2007.
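The A/D example on slide 26 can be reproduced with a small helper that picks the first N healthy nodes and records, for each stand-in, which unavailable node it covers. This is a simplified sketch: the function name, its arguments and the flat `ring_order` list are assumptions, and real preference lists also account for virtual nodes.

```python
def pick_replicas(ring_order, n, healthy):
    """Return the first n healthy nodes in ring order.

    Each result is a (node, hint) pair. hint is None for a node that belongs
    to the regular top-n list, otherwise the name of the down node it stands
    in for (the "hinted handoff" metadata).
    """
    intended = ring_order[:n]
    down = [node for node in intended if node not in healthy]
    chosen = []
    for node in ring_order:
        if len(chosen) == n:
            break
        if node not in healthy:
            continue
        hint = None if node in intended else down.pop(0)
        chosen.append((node, hint))
    return chosen
```

With `ring_order = ["A", "B", "C", "D", "E"]`, `n = 3` and A unavailable, the write goes to B, C and D, and D keeps the hint `"A"` so that it can hand the replica back once A recovers.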
  27. Dynamo, Gossip-based membership protocol and failure detection
     • A gossip-based protocol propagates membership changes and maintains an eventually consistent view of membership.
  28. Key – Value, Dynamo
     Problem | Technique | Advantage
     Partitioning | Consistent hashing | Incremental scalability
     High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates.
     Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available.
     Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background.
     Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.
     Source: Dynamo: Amazon's Highly Available Key-value Store. September 2007.
  29. Key – Value, Dynamo
     • Query model
       • get(key): returns objects, context
         • Context: metadata such as the object version; it is useful in case of conflict
       • put(key, context, object)
       • The key is hashed with the MD5 algorithm
  30. Other Key / Value NoSQL tools
     Riak makes data highly available for use in read- and write-intensive web applications.
  31. Bigtable
  32. Bigtable
     • Bigtable is described as "a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers" [Google Labs]
     • Bigtable is a
       • distributed,
       • persistent, multi-dimensional sorted map.
     • The map is indexed by a row key, column key, and a timestamp
     • Each value in the map is an uninterpreted array of bytes.
     • (row:string, column:string, time:int64) → string
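The mapping (row:string, column:string, time:int64) → string can be mimicked with an in-memory dictionary. The class below is only a toy model of the data model under that assumption. It says nothing about how Bigtable stores or distributes data, and the name `TinyBigtable` and its methods are invented for this sketch.

```python
class TinyBigtable:
    """Toy model of the (row, column, timestamp) -> bytes map."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Newest value at or before timestamp (newest overall if None)."""
        for ts, value in self._cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, row_prefix):
        """Return (row, column) keys in lexicographic order for a row prefix."""
        return [key for key in sorted(self._cells)
                if key[0].startswith(row_prefix)]
```

Keeping several timestamped versions per cell is what lets a reader ask for the state of a page as of an earlier point in time.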
  33. Google's Bigtable
     • As of 2006, it was used by more than sixty projects at Google, including:
       • Web indexing
       • Google Earth
       • Google Analytics
       • Orkut
       • Google Docs
  34. Google's Bigtable, Data Model
     • Example: storing CNN web pages
     • The row name is the reversed URL
     • The contents column family contains the page contents
     • The anchor column family contains the text of any anchors that reference the page
     Figure labels: Row, Column Family
     Source: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
  35. Google's Bigtable, Data Model
     • CNN's home page is referenced by both the Sports Illustrated and the MY-look home pages.
     • The row contains columns named anchor:cnnsi.com and anchor:my.look.ca.
     • t3: timestamp
     Figure labels: Row, Column Family
     Source: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
  36. Google's Bigtable, Data Model
     • Tablet: rows from the same domain, kept in lexicographic order
       com.google.docs
       com.google.mail
       com.google.play
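Reversing the host name is what makes rows from the same domain sort next to each other. A minimal sketch (the helper name `row_key` is an assumption):

```python
def row_key(hostname):
    """Reverse the dot-separated parts: www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(hostname.split(".")))


hosts = ["mail.google.com", "www.cnn.com", "docs.google.com", "play.google.com"]

# Sorting the reversed names groups all google.com hosts together,
# so they are likely to end up in the same tablet.
sorted_keys = sorted(row_key(h) for h in hosts)
```

Sorting the original host names instead would scatter `docs`, `mail` and `play` across the key space.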
  37. Google's Bigtable, Data Model
     • Notes
       • There is no fixed number of rows or columns
       • Every value also has an associated timestamp
       • Each value is addressed by the triple (domain-name, column-name, timestamp)
  38. Google's Bigtable, Query Model
     • Writing to a table
  39. Google's Bigtable, Query Model
     • Reading from a table
  40. Google's Bigtable, More
     • Example with Eclipse: http://www.kobu.com/appeng/index-en.htm
     • Bigtable as a web service: http://bigtable.appspot.com/
     • Performance and benchmarking: Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
  41. Other Bigtable NoSQL tools
     Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables.
  42. Document Databases
  43. Document Databases
     • Storing, retrieving, and managing document-oriented, or semi-structured, data
     • Documents encapsulate and encode data (or information) in some standard formats or encodings.
     • Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
     Source: Wikipedia, http://en.wikipedia.org/wiki/Document-oriented_database
  44. CouchDB
     • Distributed database system
     • Originally, each document was saved as XML
     • JavaScript functions select and aggregate documents (JSON is used for serialization)
     • Current release: 1.2 (April 2012)
     • Started in 2005
     • Initiated by Damien Katz
  45. CouchDB, Overview
     • Implemented in Erlang
     • Erlang
       • Functional language
       • It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications.
       • Code example (factorial, with the base case added):
         fac(0) -> 1;
         fac(N) when N > 0, is_integer(N) -> N * fac(N-1).
  46. CouchDB, Overview
     • Documents consist of named fields
       • Each field has a key/name and a value.
       • A field name has to be unique within a document
       • A value may be a string (of arbitrary length), number, boolean, date, an ordered list or an associative map; a document can refer to another document
     • Example, wiki article (document):
         "Title": "CouchDB",
         "Last editor": "",
         "Last modified": "9/23/2010",
         "Categories": ["Database", "NoSQL", "Document Database"],
         "Body": "CouchDB is a ...",
         "Reviewed": false
  47. CouchDB, Overview
     • Each document has an id: a 128-bit value
     • Version number: a 32-bit value
     • B-trees index the documents (id, version, some metadata)
  48. CouchDB
     • CouchDB uses a B-tree storage engine for all internal data, documents, and views.
     • With MapReduce views, looking up a key or a key range has complexity O(log N)
     Source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater
  49. CouchDB, Revisions
     • What if you want to change a field in a specific document?
       • Load the document
       • Change it in JSON or in the corresponding object of your programming language
       • To update or delete a document, CouchDB expects you to include a _rev
       • When CouchDB confirms the changes, it generates a new _rev
     • This revision system is also called Multi-Version Concurrency Control (MVCC)
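The load, change, save-with-`_rev` cycle can be modelled without a running server. The store below is a hypothetical in-memory stand-in, not CouchDB's API: `TinyDocStore`, `Conflict` and the random revision suffix are assumptions made for this sketch (CouchDB derives its revision identifiers differently).

```python
import uuid


class Conflict(Exception):
    """Raised when the caller's _rev is not the document's current revision."""


class TinyDocStore:
    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def put(self, doc):
        """Store doc and return its new _rev; reject stale updates."""
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise Conflict(doc["_id"])
        generation = 1
        if current is not None:
            generation = int(current["_rev"].split("-")[0]) + 1
        stored = dict(doc, _rev="%d-%s" % (generation, uuid.uuid4().hex))
        self._docs[doc["_id"]] = stored
        return stored["_rev"]
```

A writer that still holds an old `_rev` is rejected instead of silently overwriting a newer version. It has to reload the document and reapply its change, which is how updates are coordinated without locks.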
  50. CouchDB, Locking Mechanism
     • Multi-Version Concurrency Control (MVCC)
     • Documents in CouchDB are versioned, much as they would be in a version control system such as Subversion
     Source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater
  51. CouchDB, Views
     Two example documents:
       {
         "_id": "hello-world",
         "_rev": "43FBA4E7AB",
         "title": "Hello World",
         "body": "Well hello and welcome to my new blog...",
         "date": "2009/01/15 15:52:20"
       }
       {
         "_id": "bought-a-cat",
         "_rev": "4A3BBEE711",
         "title": "Bought a Cat",
         "body": "I went to the pet store earlier and brought home a little kitty...",
         "date": "2009/02/17 21:13:39"
       }
     Map function of the view (emits one row per document that has a date and a title):
       function(doc) {
         if (doc.date && doc.title) {
           emit(doc.date, doc.title);
         }
       }
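What the map function on slide 51 produces can be emulated in Python. This sketch only mirrors the idea of a view (rows sorted by emitted key, queryable by key range). The helper names are assumptions and the code does not talk to CouchDB.

```python
def map_doc(doc):
    """Python counterpart of the slide's map function: emit (date, title)."""
    if doc.get("date") and doc.get("title"):
        yield doc["date"], doc["title"]


def build_view(docs, map_fn):
    """Apply map_fn to every document and sort the rows by key."""
    rows = [(key, value, doc["_id"])
            for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: (row[0], row[2]))


def query(view, startkey=None, endkey=None):
    """Return the rows whose key lies in [startkey, endkey]."""
    return [row for row in view
            if (startkey is None or row[0] >= startkey)
            and (endkey is None or row[0] <= endkey)]
```

Because the dates are stored as `YYYY/MM/DD hh:mm:ss` strings, sorting the keys lexicographically also sorts the posts chronologically, which is why a range query such as "all posts from February 2009" works on the view.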
  52. CouchDB, Attachment
     • CouchDB documents can have attachments, just like an email message can have attachments.
     • An attachment is identified by
       • Name
       • MIME type (or Content-Type), any data
       • Number of bytes the attachment contains
     • Example:
         curl -vX PUT http:// 6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689 --data-binary @artwork.jpg -H "Content-Type: image/jpg"
     • Retrieve the attachment:
         http:// artwork.jpg
  53. CouchDB, Replication
     • CouchDB replication is a mechanism to synchronize databases.
     • Replication synchronizes two databases locally or remotely.
  54. CouchDB, Replication
     • Create the target database (this is not automatic):
         curl -X PUT http://-replica
     • Perform the replication:
         curl -vX POST http:// -d '{"source":"albums","target":"albums-replica"}'
     • What we did is a local replication; it is useful for backups or to be able to roll back
     • It is important to note that replication replicates the database only as it was at the point in time when replication was started.
  55. Other Document Database tools
     • MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database, written in C++.
  56. Graph Database
     http://www.herr-rau.de/wordpress/2006/06/your-website-as-a-graph.htm
  57. Graph Databases
     • A graph database uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary [Wikipedia].
  58. Graph Databases
     Source: Renzo Angles and Claudio Gutierrez, University of Chile: Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, February 2008.
  59. Graph Databases, Data model properties
     • Graph databases are often faster for associative data sets
     • They scale more naturally to large data sets as they do not typically require expensive join operations.
     • As they depend less on a rigid schema, they are more suitable for managing ad-hoc and changing data with evolving schemas.
     • Graph databases are a powerful tool for graph-like queries
       • Computing the shortest path between two nodes in the graph.
       • Other graph-like queries can be performed over a graph database in a natural way (for example, graph diameter computations or community detection).
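A shortest-path query of the kind mentioned on slide 59 can be sketched with a breadth-first search over an adjacency list, where each node refers directly to its neighbors. The graph representation (a dictionary of lists) and the function name are assumptions for illustration. A graph database would run such a traversal inside its own engine.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}        # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Follow the back-pointers from goal to start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

The search only follows edges outward from the start node and never consults a global index or joins tables, which matches the index-free adjacency property quoted on slide 57.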
  60. Graph Databases, Neo4j
     • Neo4j is an open-source graph database, implemented in Java.
     • The developers describe Neo4j as an "embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables".
     • Neo4j version 1.0 was released in February 2010.
     • Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden.
  61. Neo4j, Node & Relation
     • A Graph contains Nodes and Relationships
     • "A Graph --records data in--> Nodes --which have--> Properties"
     • "Nodes --are organized by--> Relationships --which also have--> Properties"
  62. Neo4j, Traversal
     • Query a Graph with a Traversal
     • "A Traversal --navigates--> a Graph; it --identifies--> Paths --which order--> Nodes"
     • A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like "what music do my friends like that I don't yet own," or "if this power supply goes down, what web services are affected?"
  63. Neo4j, Indexes
     • Indexes look up Nodes or Relationships
     • "An Index --maps from--> Properties --to either--> Nodes or Relationships"
     • Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like "find the Account for username master-of-graphs."
  64. Neo4j, Database
     • Neo4j is a Graph Database
     • "A Graph Database --manages a--> Graph and --also manages related--> Indexes"
  65. Neo4j Hello World example
       // In Neo4j 1.x, write operations must run inside a transaction.
       Transaction tx = graphDb.beginTx();
       try {
           firstNode = graphDb.createNode();
           firstNode.setProperty( "message", "Hello, " );
           secondNode = graphDb.createNode();
           secondNode.setProperty( "message", "World!" );

           relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
           relationship.setProperty( "message", "brave Neo4j " );
           tx.success();
       } finally {
           tx.finish();
       }
  66. Neo4j & Java & Eclipse
     Tutorial: http://technoracle.blogspot.de/2012/05/third-neo4j-tutorial-getting-started.html
       import org.neo4j.graphdb.GraphDatabaseService;
       import org.neo4j.graphdb.Node;
       import org.neo4j.graphdb.Relationship;
       import org.neo4j.graphdb.factory.GraphDatabaseFactory;

       String DB_PATH = "/Users/neo4j-1.8";
       GraphDatabaseService graphDb;
       Node myFirstNode;
       Node mySecondNode;
       Relationship myRelationship;
       graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
       // In Neo4j 1.x, the write operations below must run inside a transaction (graphDb.beginTx()).
       // RelTypes is an enum defined by the application.
       myFirstNode = graphDb.createNode();
       myFirstNode.setProperty( "name", "Duane Nickull, I Braineater" );
       mySecondNode = graphDb.createNode();
       mySecondNode.setProperty( "name", "Randy Rampage, Annihilator" );
       myRelationship = myFirstNode.createRelationshipTo( mySecondNode, RelTypes.KNOWS );
       myRelationship.setProperty( "relationship-type", "knows" );
  67. Other Graph Database tools
     • BigData RDF
       • SPARQL
       • RDFS+ inference
  68. Conclusion
  69. NoSQL, BASE
     • NoSQL is characterized by BASE:
       • Basically Available: Use replication to reduce the likelihood of data unavailability and use sharding, or partitioning the data among many different storage servers, to make any remaining failures partial. The result is a system that is always available, even if subsets of the data become unavailable for short periods of time.
       • Soft state: While ACID systems assume that data consistency is a hard requirement, NoSQL systems allow data to be inconsistent and relegate designing around such inconsistencies to application developers.
       • Eventually consistent: Although applications must deal with instantaneous inconsistency, NoSQL systems ensure that at some future point in time the data assumes a consistent state. In contrast to ACID systems that enforce consistency at transaction commit, NoSQL guarantees consistency only at some undefined future time.
  70. ACID vs. BASE
     Source: noSQL Databases, Prof. Walter Kriha, Stuttgart Media University
  71. Statistics
     • The worldwide NoSQL market is expected to reach $3.4 billion by 2018 at a CAGR of 21% between 2013 and 2018. The NoSQL market will generate $14 billion in revenues over the period 2013 – 2018.
     • CAGR: compound annual growth rate
       • CAGR = (V(tn) / V(t0))^(1 / (tn - t0)) - 1
       • V(t0): start value, V(tn): finish value
       • tn - t0: number of years
     Source: http://www.marketresearchmedia.com/2010/11/11/nosql-market/
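The compound annual growth rate relates a start value V(t0) and a finish value V(tn) over tn - t0 years. A small sketch of the arithmetic, with function names chosen for this example:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (V(tn) / V(t0)) ** (1 / (tn - t0)) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def value_after(start_value, rate, years):
    """Value reached after growing at a constant annual rate."""
    return start_value * (1.0 + rate) ** years
```

Applied to the slide's forecast, `3.4 / 1.21 ** 5` gives an implied 2013 market size of roughly 1.3 billion dollars, since 21% annual growth over the five years from 2013 to 2018 multiplies the starting value by about 2.6.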
  72. When to use?
     Figure (from Neo4j): data size versus data complexity for Key-Value, Bigtable, Doc-DB and GraphDB
  73. When to use?
     http://paolodedios.com/blog/2010/5/19/the-visual-guide-to-nosql-systems.html
  74. Who uses NoSQL
     FlockDB, Dynamo, Cassandra, Bigtable
  75. Resources
     http://www.stu-dentdiaries.com/2010_05_01_archive.html
  76. Resources, Books
  77. Papers
     1. DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati, Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian, Swaminathan; Vosshall, Peter; Vogels, Werner: Dynamo: Amazon's Highly Available Key-value Store. September 2007.
     2. Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
     3. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data. 2006.
     4. Renzo Angles and Claudio Gutierrez, University of Chile: Survey of Graph Database Models. ACM Computing Surveys, Vol. 40, No. 1, Article 1, February 2008.
  78. Papers
     5. D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó, S. Gómez-Villamor, N. Martínez-Bazán, and J.L. Larriba-Pey, Universitat Politècnica de Catalunya: Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark. 2010.
     6. Chad Vicknair, Michael Macias: A Comparison of a Graph Database and a Relational Database, A Data Provenance Perspective. ACMSE '10, April 15-17, 2010, Oxford, MS, USA.
     7. Bradford Stephens: HBase vs. Cassandra: NoSQL Battle!, 2009. http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/comment-page-1/, last accessed February 2011.
     8. Qian Sha, Bachelor of Economics, Capital University of Economics and Business: On-Line Project Management System. 2003.
     Neal Leavitt: Will NoSQL Databases Live Up to Their Promise? 2010.
  79. Papers
     9. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and Lewin, D.: Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (El Paso, Texas, United States, May 04 - 06, 1997). STOC '97. ACM Press, New York, NY, 654-663.
     10. Lamport, L.: Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7), pp. 558-565, 1978.
     11. André Allavena, Alan Demers, John E. Hopcroft: Correctness of a Gossip Based Membership Protocol. NY 2005, ACM 1-58113-994-2/05/0007.
  80. Resources, Web links
     • Introduction to data structures for GraphDB, Shunya Kimura: http://www.slideshare.net/skimura/graphdatabase-data-structure
     • Compare NoSQL databases: http://nosql.findthebest.com/
     • Oracle white paper, Sep. 2011: Oracle NoSQL Database
     • CouchDB: http://www.couchbase.com/
     • Open source implementation of Bigtable: HBase, http://hbase.apache.org/
     • http://www.db-class.org/course/video/preview_list (Stanford University)
     • http://technirvanaa.wordpress.com/tag/nosql-disadvantages/ (March 2011)
     • http://www.kavistechnology.com/blog/?p=1577 (March 2010)
     • http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012 (Survey 2012)
     • http://www.couchbase.com/why-nosql/nosql-database
     • CouchDB wiki: http://wiki.apache.org/couchdb/
     • http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (very good)
     • http://neo4j.org/
     • http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
     • Neo4j documentation: http://components.neo4j.org/neo4j/1.8.M05/apidocs/
     • SQL Databases v. NoSQL Databases, Michael Stonebraker, MIT, 2010
  81. Do you want to know more?
     • What The Heck Are You Actually Using NoSQL For? http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
     • Nice tutorials for CouchDB: http://couchapp.org/page/videos
  82. CouchDB, Example
     • Download CouchDB from: http://couchdb.apache.org/
     • Example source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater (http://guide.couchdb.org/draft/tour.html#figure/4)
     • GO -> http://