riak
            A friendly key/value store for the web.



Dev Nation Portland 2010


                                  A primer by Bruce Williams
  My name is
Bruce Williams.
I’m addicted to the bleeding edge.
2001 - Present Day


wayyy before it was
a viable job choice.
But I use other
languages, too.
especially from other paradigms.
Photo by oddsteph - http://flic.kr/p/6vWPBU




Let’s assume Java is one of the baseball bats.


                                              Choose the Right Weapon
Based in the D.C. area.


            (but I’m not.)
You may find the following
 conspicuously missing in
        this talk:
Sorry!
I will not be presenting a paper on
Dynamo, the CAP theorem, vector
clocks, Merkle trees, etc.
These are explained elsewhere by my algorithmic betters.
I will not be dwelling
 on performance or
     redundancy.
Expect some vague statements like “very fast” and “very robust.”
 I will not try to
convince you that
  “NoSQL” is the
    messiah.
It’s an alternative that makes sense in some situations.
I will not be conducting
       a large-scale
     comparison of
competing technologies.
But I’d love to hear about what you use, and why.
What is Riak?
NoSQL
 and of the Dynamo
    persuasion.
Open Source
      & a commercial
       “EnterpriseDS”
      version with some
     proprietary pieces
Key/Value Store
      With some metadata.
Schema-less
Great for sparse data, but requires more discipline.
Datatype Agnostic
Content-Type is King.
Language Agnostic
               REST & PBC

    Erlang, Javascript, Java,
       PHP, Python, Ruby, ...
Distributed
  It’s [mostly] Erlang, what
       did you expect?
Masterless
   All nodes are equal
Scalable
or “easy to scale.”
Eventually
Consistent
     and CAP tunable.
Uses Map/Reduce
      and “Link.”
Getting
Up & Running
http://riak.basho.com
              hg & git
A Quick Local Cluster
         $ ./riak1/bin/riak start
         $ ./riak2/bin/riak start
         $ ./riak3/bin/riak start



Start three “nodes”

  $ ./riak2/bin/riak-admin join riak1@127.0.0.1
  $ ./riak3/bin/riak-admin join riak1@127.0.0.1




Join them into a cluster
Your Data
Object
                 Content Type
                 Body
                 + Links


The thing you’re storing.
Key

pic1

can be user-defined or automatically generated




The identifier for the object.
Bucket

images: pic1, pic2, pic3

“pic1” is unique within “images”


The type or category of object.
Addressability

[Diagram: bucket “images” containing object “pic1”, labeled <images/pic1>]




Refer to objects by bucket and key.
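
Over the REST interface, the bucket and key show up directly in the URL. A minimal sketch, assuming a local node on HTTP port 8098 and a `/riak/<bucket>/<key>` path layout (both are assumptions, so check them against your Riak version). `object_url` is a hypothetical helper, not part of riak-client:

```ruby
require 'erb'

# Hypothetical helper: builds the REST URL for an object from bucket and key.
# Host, port, and the "/riak" prefix are assumptions about a default local node.
def object_url(bucket, key, host: '127.0.0.1', port: 8098)
  escaped = [bucket, key].map { |part| ERB::Util.url_encode(part) }
  "http://#{host}:#{port}/riak/#{escaped.join('/')}"
end

puts object_url('images', 'pic1')
```

Escaping each part separately keeps a key that contains a slash or space from changing the shape of the path.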
Example
require 'riak'

client = Riak::Client.new
client.bucket('images').new('pic1').tap do |pic1|
  pic1.content_type = 'image/jpeg'
  pic1.data = File.read('/path/to/jpg')
  pic1.store
end




        $ gem install riak-client
Example
client.bucket('people').new('bruce').tap do |bruce|
  bruce.data = {
    name: 'Bruce Williams',
    email: 'bruce@codefluency.com'
  }
  bruce.store
end

puts client['people']['bruce'].data['name']




“application/json” is the default for riak-client
Links

[Diagram: object “pic1” (bucket “images”) and object “bruce” (bucket “people”) joined by a link, annotated “stored here” and “can also be ‘tagged’”]


         Connect objects
Example
 client['people']['bruce'].tap do |bruce|
   bruce.links << client['images']['pic1'].to_link('avatar')
   bruce.store
 end




client['people']['bruce'].walk(:tag => 'avatar')
Hooks

pre-commit
reject or transform an object to be committed

post-commit
notify external services, build your own indexes
Where does it go?
The Ring
A 160-bit integer space
broken into equal sized partitions.
It looks kinda like this (it’s just more functional)
   Photo by marchdoe - http://flickr.com/photos/marchdoe/457741149
Each partition is managed
by a vnode (virtual node).
Each vnode runs on
 a [physical] node.
[Diagram: the ring, with its partitions shared among nodes 1, 2, 3 and 4]



Each node owns an equal share of
     vnodes (& partitions)
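
To make the ring a little more concrete, here is a rough sketch in plain Ruby (no Riak involved). The partition count of 64, the node names, and hashing a plain "bucket/key" string are all assumptions for illustration; Riak's real key encoding and ownership assignment differ in the details:

```ruby
require 'digest/sha1'

RING_SIZE  = 2**160                        # the 160-bit integer space
PARTITIONS = 64                            # assumed partition count for this sketch
NODES      = %w[node1 node2 node3 node4]   # hypothetical node names

PARTITION_WIDTH = RING_SIZE / PARTITIONS   # every partition is the same size

# Hash a bucket/key pair to a point on the ring.
# Assumption: SHA-1 over a plain "bucket/key" string.
def ring_position(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# Index of the partition that a ring position falls into.
def partition_for(position)
  position / PARTITION_WIDTH
end

# Round-robin assignment, so each node owns an equal share of vnodes.
def owner_of(partition)
  NODES[partition % NODES.size]
end

position  = ring_position('images', 'pic1')
partition = partition_for(position)
puts "images/pic1 -> partition #{partition}, owned by #{owner_of(partition)}"
```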
Replication

n_val = 3 (3 is the default)




Objects are written to multiple
          partitions.
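
A small sketch of what "multiple partitions" can look like, assuming replicas land on consecutive partitions around the ring starting from the one the key hashes to. The ring of 8 partitions is an assumption chosen to keep the output readable, and `replica_partitions` is a hypothetical name:

```ruby
PARTITION_COUNT = 8   # assumed small ring for readability
N_VAL = 3             # the default from the slide

# Starting at the partition the key hashes to, take the next n_val
# partitions around the ring, wrapping at the end.
def replica_partitions(first_partition, n_val = N_VAL, partition_count = PARTITION_COUNT)
  (0...n_val).map { |offset| (first_partition + offset) % partition_count }
end

p replica_partitions(2)  # => [2, 3, 4]
p replica_partitions(7)  # => [7, 0, 1]
```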
Availability

When node “2” fails, the others pick up the slack.

Uses Hinted Handoff to deal with node failures.
Persistence

dets, ets, fs, gb_trees, innostore, bitcask, multi, +





Supports pluggable backends
CAP Tuning
GET
r
how many replicas need to agree (default: 2)
PUT
r
how many replicas need to agree when retrieving an
existing object before the write (default: 2)

w
how many replicas to write to before returning a
successful response (default: 2).

dw
how many replicas to commit to durable storage
before returning a successful response (default: 0)
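
The arithmetic behind these knobs can be sketched in a few lines of plain Ruby. This is only an illustration of the quorum idea, not Riak's implementation; the reply symbols and method names are hypothetical:

```ruby
# True when enough replicas answered successfully to satisfy the quorum.
def quorum_met?(responses, required)
  responses.count { |reply| reply == :ok } >= required
end

# With n replicas, any read set of size r and any write set of size w
# share at least one replica whenever r + w > n.
def quorums_overlap?(n:, r:, w:)
  r + w > n
end

replies = [:ok, :ok, :timeout]   # hypothetical replies from n_val = 3 replicas

puts quorum_met?(replies, 2)               # r = 2 is satisfied
puts quorum_met?(replies, 3)               # r = 3 is not
puts quorums_overlap?(n: 3, r: 2, w: 2)    # the defaults overlap
puts quorums_overlap?(n: 3, r: 1, w: 1)    # faster, but a read may miss the last write
```

Lower values trade consistency for latency and availability; higher values do the opposite.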
(Map|Link)*Reduce
Map

obj → your function → [result, ...]


Map functions take one piece of data
as input, and produce zero or more
         results as output.
Data-locality is important in Riak.
Map phases are run where the data is
               stored.

 You can have multiple map phases.

  The input to a map definition is a
   series of [bucket, key] names.

                        unlike CouchDB
Link

obj → link walk, using a pattern → [linked_obj, ...]

A special kind of map phase; links
matching a pattern are “walked” to
    find objects to be output.
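
As a toy model of a link phase, the sketch below keeps objects in a Ruby hash and follows links that match a pattern. The in-memory `STORE`, the `Link` struct, and the `walk` signature are all hypothetical stand-ins, not the riak-client API:

```ruby
# Each link is a bucket/key/tag triple stored on the source object.
Link = Struct.new(:bucket, :key, :tag)

# Hypothetical in-memory stand-in for the cluster, keyed by [bucket, key].
STORE = {
  %w[people bruce] => { data: { 'name' => 'Bruce' },
                        links: [Link.new('images', 'pic1', 'avatar')] },
  %w[images pic1]  => { data: 'jpeg bytes', links: [] }
}

# Follow the links on one object that match the pattern.
# A nil field in the pattern matches anything.
def walk(store, bucket, key, to_bucket: nil, tag: nil)
  store.fetch([bucket, key])[:links]
       .select { |link| to_bucket.nil? || link.bucket == to_bucket }
       .select { |link| tag.nil? || link.tag == tag }
       .map { |link| [link.bucket, link.key] }
end

p walk(STORE, 'people', 'bruce', tag: 'avatar')
p walk(STORE, 'people', 'bruce', tag: 'friend')
```

The output of the walk is again a list of bucket/key pairs, which is why a link phase can feed straight into a map phase.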
Reduce

[obj, ...] → your function → [result]

Reduce functions combine the output
of many "map" step evaluations into
one result.
The reduce phase occurs on the “coordinating node.”

Reduces may be run multiple times as more input comes in (e.g., re-reduce)
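
Re-reduce is the part that tends to bite, so here is a sketch in plain Ruby (no cluster needed). The sample people and their ages are made up for illustration. The point is that a reduce function takes a list and returns a list, so that it can safely be fed its own output:

```ruby
# Map: one object in, zero or more results out.
map_age = ->(person) { person.key?('age') ? [person['age']] : [] }

# Reduce: a list in, a list out, so it can consume its own results.
reduce_sum = ->(values) { [values.inject(0, :+)] }

people = [
  { 'name' => 'bruce',   'age' => 30 },   # hypothetical sample data
  { 'name' => 'melissa', 'age' => 28 },
  { 'name' => 'nobody' }                  # no age: the map phase emits nothing
]

mapped = people.flat_map { |person| map_age.call(person) }

# Reduce everything in one pass...
all_at_once = reduce_sum.call(mapped)

# ...or reduce batches as they arrive, then re-reduce the partial results.
partials   = mapped.each_slice(1).flat_map { |batch| reduce_sum.call(batch) }
re_reduced = reduce_sum.call(partials)

p all_at_once
p re_reduced
```

Both paths give the same answer because summing is associative. A reduce that, say, counts its inputs would not survive re-reduce unchanged.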
Example
bruce = client['people']['bruce']
melissa = client['people']['melissa']




          let’s assume these have ages

 addy = client['addresses'].new('123fake')
 addy.data = {
   street: '123 Fake St',
   city: 'Portland', state: 'OR', zip: '97214'
 }
 addy.links << bruce.to_link('resident')
 addy.links << melissa.to_link('resident')
 addy.store
Example
Riak::MapReduce.new(client).add(addy).
  link(tag: 'resident').
  map("function (v) { return [Riak.mapValuesJson(v)[0]['age'] || 0] }").
  reduce(function: 'Riak.reduceSum', keep: true).
  run




           We should get an array with one value
Hurdles
No range queries.
Sorry, Cassandra fans

Things like time series data require creative approaches,
like bucket and key naming, etc.
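
One such approach, sketched in plain Ruby: pick a predictable key format so that the keys for a time range can be computed instead of queried. The "one object per sensor per hour" scheme and the helper names are hypothetical:

```ruby
# Hypothetical naming scheme: one object per sensor per hour,
# keyed as "<sensor>:<UTC hour>".
def reading_key(sensor, time)
  "#{sensor}:#{time.getutc.strftime('%Y-%m-%dT%H')}"
end

# Every key between two times, computed rather than looked up.
def reading_keys(sensor, from, to)
  keys = []
  hour = Time.at(from.to_i - from.to_i % 3600).utc   # round down to the hour
  while hour <= to
    keys << reading_key(sensor, hour)
    hour += 3600
  end
  keys
end

from = Time.utc(2010, 5, 1, 22, 30)
to   = Time.utc(2010, 5, 2, 0, 15)
puts reading_keys('sensor1', from, to)
```

Each computed key can then be fetched directly by bucket and key, which sidesteps both range queries and key listing.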
Don’t list keys.
ever, if you can avoid it.



  Processing an entire
bucket is more expensive
 than you might think.

          because it lists keys
Watch your encoding.

MapReduce Javascript
phases need your data
to be in valid Unicode.
        you’ll get a “bad encoding” error
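
A defensive sketch for the Ruby side, assuming you would rather store a replacement character than hit that error later. `to_valid_utf8` is a hypothetical helper, and `String#scrub` needs Ruby 2.1 or newer:

```ruby
# Returns a copy of the text that is valid UTF-8, with any invalid
# byte sequences replaced by U+FFFD (the replacement character).
def to_valid_utf8(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : utf8.scrub("\uFFFD")
end

good = "caf\u00e9"
bad  = "caf\xE9".b   # Latin-1 bytes, not valid UTF-8

puts to_valid_utf8(good).valid_encoding?
puts to_valid_utf8(bad).valid_encoding?
puts to_valid_utf8(bad)
```

Scrubbing throws information away. If you know the real source encoding, converting with `String#encode` before storing is the better fix.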
Easy Questions?
                        @wbruce
              Thanks!

Riak: A friendly key/value store for the web.

  • 1. riak A friendly key/value store for the web. ION EV NAT 010D D 2 N TLA POR A primer by Bruce Williams
  • 2. D PO EV R N TLA A N TI D O N My name is Bruce Williams. ct ed di g ad din I’m lee a nd e b th e. to e dg
  • 3. D PO EV R N TLA A N TI D O N 2001 - Present Day wa yyy before it was a viable job choice.
  • 4. D PO EV R N TLA A N TI D O N But I use other languages, too. rom . y f ms all ig ci ad es pe ar rp o the
  • 5. D PO EV R N TLA A N TI D O N Photo by oddsteph - http://flic.kr/p/6vWPBU me su of as e Le t’s on ll a is ba J av base t he ats. b Choose the Right Weapon
  • 6. D PO EV R N TLA A N TI D O N Based in the D.C. area. (but I’m not.)
  • 7. You may find the following conspicuously missing in this talk: r y! o r S
  • 8. D PO EV R N TLA I will not be A N TI D O N presenting a paper on Dynamo, the CAP theorem, vector clocks, merkle trees, etc. These are explained elsewhere by my alg orithmic betters.
  • 9. D PO EV R N TLA A N TI D O N I will not be dwelling on performance or redundancy. Expect some vague statements like “very fa st” and “very robust.”
  • 10. D PO EV R N TLA A N TI D O N I will not try to convince you that “NoSQL” is the messiah. I t’s an alternative that m akes sense in some situations.
  • 11. D PO EV R N TLA A N TI D O N I will not be conducting a large-scale comparison of competing technologies. b ut I’d love to hear abou t what you use, and why
  • 13. NoSQL, and of the Dynamo persuasion.
  • 14. Open Source, plus a commercial “EnterpriseDS” version with some proprietary pieces.
  • 15. Key/Value Store: with some metadata.
  • 16. Schema-less: great for sparse data, but requires more discipline.
  • 17. Datatype Agnostic: Content-Type is King.
  • 18. Language Agnostic: REST & PBC. Erlang, JavaScript, Java, PHP, Python, Ruby, ...
  • 19. Distributed: it’s [mostly] Erlang, what did you expect?
  • 20. Masterless: all nodes are equal.
  • 21. Scalable, or “easy to scale.”
  • 22. Eventually Consistent, and CAP tunable.
  • 23. Uses Map/Reduce and “Link.”
  • 25. http://riak.basho.com
  • 26. hg & git
  • 27. A Quick Local Cluster. Start three “nodes”:
        $ ./riak1/bin/riak start
        $ ./riak2/bin/riak start
        $ ./riak3/bin/riak start
      Join them into a cluster:
        $ ./riak2/bin/riak-admin join riak1@127.0.0.1
        $ ./riak3/bin/riak-admin join riak1@127.0.0.1
  • 29. Object: the thing you’re storing. Content Type, Body + Links.
  • 30. Key: the identifier for the object (e.g. pic1). Can be user-defined or automatically generated.
  • 31. Bucket: the type or category of object. The images bucket holds pic1, pic2, pic3; “pic1” is unique within “images.”
  • 32. Addressability: refer to objects by bucket and key, e.g. <images/pic1>.
  • 33. Example ($ gem install riak-client):
        require 'riak'
        client = Riak::Client.new
        client.bucket('images').new('pic1').tap do |pic1|
          pic1.content_type = 'image/jpeg'
          pic1.data = File.read('/path/to/jpg')
          pic1.store
        end
  • 34. Example (“application/json” is the default for riak-client):
        client.bucket('people').new('bruce').tap do |bruce|
          bruce.data = { name: 'Bruce Williams', email: 'bruce@codefluency.com' }
          bruce.store
        end
        puts client['people']['bruce'].data['name']
  • 35. Links: connect objects, e.g. people/bruce to images/pic1. A link is stored on the object it points from, and can also be “tagged.”
  • 36. Example:
        client['people']['bruce'].tap do |bruce|
          bruce.links << client['images']['pic1'].to_link('avatar')
          bruce.store
        end
        client['people']['bruce'].walk(:tag => 'avatar')
  • 37. Hooks. pre-commit: reject or transform an object to be committed. post-commit: notify external services, build your own indexes.
  • 39. The Ring: a 160-bit integer space
  • 40. The Ring: broken into equal-sized partitions.
  • 41. The Ring: it looks kinda like this (it’s just more functional). Photo by marchdoe - http://flickr.com/photos/marchdoe/457741149
  • 42. The Ring: each partition is managed by a vnode (virtual node),
  • 43. The Ring: each vnode runs on a [physical] node.
  • 44. The Ring: each node (1, 2, 3, 4 in the diagram) owns an equal share of vnodes (& partitions).
  • 45. Replication: objects are written to multiple partitions. n_val = 3 (3 is the default).
  • 46. Availability: uses Hinted Handoff to deal with node failures. When node “2” fails, the others pick up the slack.
  • 47. Persistence: supports pluggable backends (dets, ets, fs, gb_trees, innostore, bitcask, plus multi).
  • 49. GET:
        r: how many replicas need to agree (default: 2)
  • 50. PUT:
        r: how many replicas need to agree when retrieving an existing object before the write (default: 2)
        w: how many replicas to write to before returning a successful response (default: 2)
        dw: how many replicas to commit to durable storage before returning a successful response (default: 0)
  • 52. Map: obj -> your function -> [result, ...]. Map functions take one piece of data as input, and produce zero or more results as output.
  • 53. Data-locality is important in Riak. Map phases are run where the data is stored. You can have multiple map phases. The input to a map definition is a series of [bucket, key] names (unlike CouchDB).
  • 54. Link: obj -> link walk, using a pattern -> [linked_obj, ...]. A special kind of map phase; links matching a pattern are “walked” to find objects to be output.
  • 55. Reduce: [obj, ...] -> your function -> [result]. Reduce functions combine the output of many “map” step evaluations into one result.
  • 56. The reduce phase occurs on the “coordinating node.” Reduces may be run multiple times as more input comes in (e.g., re-reduce).
  • 57. Example:
        # let's assume these have ages
        bruce = client['people']['bruce']
        melissa = client['people']['melissa']
        addy = client['addresses'].new('123fake')
        addy.data = { street: '123 Fake St', city: 'Portland', state: 'OR', zip: '97214' }
        addy.links << bruce.to_link('resident')
        addy.links << melissa.to_link('resident')
        addy.store
  • 58. Example (we should get an array with one value):
        Riak::MapReduce.new(client).add(addy).
          link(tag: 'resident').
          map("function (v) { return [Riak.mapValuesJson(v)[0]['age'] || 0] }").
          reduce(function: 'Riak.reduceSum', keep: true).
          run
  • 60. No range queries. (Sorry, Cassandra fans.) Things like time series data require creative approaches, like bucket and key naming, etc.
  • 61. Don’t list keys. Ever, if you can avoid it. Processing an entire bucket is more expensive than you might think, because it lists keys.
  • 62. Watch your encoding. MapReduce JavaScript phases need your data to be in valid Unicode, or you’ll get a “bad encoding” error.
  • 64. Thanks! @wbruce
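
The Ring and Replication slides above (39 to 45) can be sketched in a few lines of plain Ruby. This is a simplified illustration, not Riak's actual code: the partition count of 64, the four node names, the "bucket/key" string that gets hashed, and the round-robin ownership rule are all assumptions made for the example, and the helper names (ring_position, partition_for, owner_of, preference_list) are hypothetical. Only the 160-bit space, the equal-sized partitions, and n_val = 3 come from the slides.

```ruby
require 'digest'

RING_SIZE      = 2**160                        # the 160-bit integer space (SHA-1 output size)
NUM_PARTITIONS = 64                            # assumption: number of equal-sized partitions
PARTITION_SIZE = RING_SIZE / NUM_PARTITIONS
NODES          = %w[riak1 riak2 riak3 riak4].freeze # hypothetical physical nodes
N_VAL          = 3                             # replicas per object (the default per the slides)

# Hash a bucket/key pair to a position on the ring.
# Assumption: hashing the string "bucket/key"; Riak's own key encoding differs.
def ring_position(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# Index of the partition that contains the hashed position.
def partition_for(bucket, key)
  ring_position(bucket, key) / PARTITION_SIZE
end

# Assumption: vnodes are dealt out round-robin so each node owns an equal share.
def owner_of(partition)
  NODES[partition % NODES.size]
end

# The n_val consecutive partitions (wrapping around the ring) that hold replicas.
def preference_list(bucket, key, n_val = N_VAL)
  first = partition_for(bucket, key)
  (0...n_val).map do |i|
    partition = (first + i) % NUM_PARTITIONS
    { partition: partition, node: owner_of(partition) }
  end
end

preference_list('images', 'pic1').each do |entry|
  puts "partition #{entry[:partition]} lives on #{entry[:node]}"
end
```

Because ownership alternates between nodes, three consecutive partitions land on three different nodes in this sketch, which is why losing one node still leaves two replicas reachable.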