SlideShare une entreprise Scribd logo
1  sur  80
Télécharger pour lire hors ligne
Riak Search
    A Full-Text Search
   and Indexing Engine
         based on Riak


   Erlang Factory· London· June 2010

               Basho Technologies
          Rusty Klophaus - @rklophaus
Why did we build it?
What are the major goals?
  How does it work?




                            2
Part One
Why did we build
 Riak Search?




                   3
Riak is
a scalable, highly-available, networked,
    open-source key/value store.




                                           4
Writing to a Key/Value Store




                 Key/Value




       CLIENT                  RIAK
                                      5
Writing to a Key/Value Store




                     Object




       CLIENT                  RIAK
                                      6
Querying a Key/Value Store




                       Key


                     Object

       CLIENT                 RIAK
                                     7
Querying Riak via LinkWalking

               Key + Instructions
                                     Walk to
                                     Related
                                      Keys


                  Object(s)

       CLIENT                       RIAK
                                               8
Querying Riak via Map/Reduce

            Key(s) + JS Functions
                                      Map

                                      Map

                                     Reduce
             Computed Value(s)

       CLIENT                       RIAK
                                              9
Key/Value Stores
       like
Key-Based Queries




                    10
Query by Secondary Index


            where Category == "Shoes"




                           WTF!? I'm a
                           KV store!



      CLIENT                            RIAK
                                               11
Full-Text Query


                  "Converse AND Shoes"




                                This is
                              getting old.



       CLIENT                             RIAK
                                                 12
These kinds of queries
   need an Index.

*Market Opportunity!*



                         13
Part Two
What are the major
goals of Riak Search?




                        14
An application built on Riak.




              Your
                                Riak
          Application




                                       15
Hrm... I need an index.




              Your
                             Riak
          Application     Index
                          Object




                                    16
Hrm... I need an index with more features.




       Your
                         ???           Riak
  Application




                                              17
Lucene should do the trick...




       Your
                      Lucene    Riak
   Application




                                       18
...shard to add more storage capacity...




      Your
   Application
                 Lucene   Lucene   Lucene   Riak




                                                   19
...replicate to add more throughput.




                 Lucene   Lucene   Lucene
      Your
   Application
                 Lucene   Lucene   Lucene   Riak
                 Lucene   Lucene   Lucene




                                                   20
...replicate to add more throughput.




                 Lucene   Lucene   Lucene
      Your
   Application
                 Lucene   Lucene   Lucene   Riak
                 Lucene   Lucene   Lucene




                  Operations nightmare!

                                                   21
What do we really want?




       Your       Riak-ified
                               Riak
   Application      Lucene




                                      22
What do we really want?




       Your          Riak
                            Riak
   Application     Search




                                   23
Functionality? Be like Lucene (and more).

• Lucene Syntax
• Leverages Java Lucene Analyzers
• Solr Endpoints
• Integration via Riak Post-Commit Hook (Index)
• Integration via Riak Map/Reduce (Query)
• Near-Realtime
• Schema-less


                                                  24
Operations? Be like Riak.

• No special nodes
• Add nodes, get more compute and storage
• Automatically load balance
• Replicas for durability and performance
• Index and query in parallel
• Swappable storage backends




                                            25
Part Three
How do we do it?




                   26
A Gentle Introduction to
  Document Indexing




                           27
The Inverted Index

    Document           Inverted Index



                           day, 1
                           dog, 1
  #1
       Every dog has
       his day.
                           every, 1
                           has, 1
                           his, 1




                                        28
The Inverted Index

   Documents            Combined Inverted Index
                                and, 4
 #1   Every dog has
      his day.
                                bag, 3
                                bark, 2
                                bite, 2
      The dog's bark
 #2
                                cat, 3
      is worse than
      his bite.                 cat, 4
                                day, 1
                                dog, 1
 #3
      Let the cat out
      of the bag.               dog, 2
                                dog, 4
                                every, 1
#4    It's raining
      cats and dogs.            has, 1
                                ...


                                                  29
At Query Time...

                   "dog AND cat"




                       AND



           dog                     cat
                                         30
At Query Time...

                   AND


           dog            cat


         dog, 1
                         cat, 3
         dog, 2
                         cat, 4
         dog, 4


                                  31
At Query Time...
                      Result: 4




                        AND
                 (Merge Intersection)




             1
                                        3
             2
                                        4
             4


                                            32
At Query Time...
                   Result: 1, 2, 3, 4




                           OR
                      (Merge Union)




             1
                                        3
             2
                                        4
             4


                                            33
Complex Behavior from Simple Structures




                                          34
Storage Approaches...




                        35
Riak Search uses
Consistent Hashing
 to store data on
     Partitions



                     36
Introduction to Consistent Hashing and Partitions

                Partitions = 10
                Number of Nodes = 5
                Partitions per Node = 2
                Replicas (NVal) = 2




                                                    37
Introduction to Consistent Hashing and Partitions


             Object




                                                    38
Document Partitioning
        vs.
  Term Partitioning




                        39
...and the
Resulting Tradeoffs




                      40
Document Partitioning @ Index Time


     #1   Every dog has
          his day.




                                     41
Document Partitioning @ Query Time

                          "dog OR cat"




                                         42
Term Partitioning @ Index Time
                                 day, 1
                                 dog, 1
     #1   Every dog has
          his day.
                                 every, 1
                                 has, 1
                                 his, 1




                                            43
Term Partitioning @ Index Time



                       dog, 1
     day, 1                      has, 1




     his, 1                      every, 1



                                            44
Term Partitioning @ Query Time

                           "dog OR cat"




                                          45
Tradeoffs...


    Document Partitioning       Term Partitioning

   + Lower Latency Queries   - Higher Latency Queries
   - Lower Throughput        + Higher Throughput
   - Lots of Disk Seeks      - Hotspots in Ring
                               (the "Obama" problem)




                                                        46
Riak Search: Term Partitioning

Term-partitioning is the most viable approach for our beta
clients’ needs: high throughput on Really Big Datasets.
Optimizations:
     • Term splitting to reduce hot spots
     • Bloom filters & caching to save query-time bandwidth
     • Batching to save query-time & index-time bandwidth
Support for either approach eventually.




                                                             47
Diving Deeper:
The Lifecycle of a Query




                           48
Parse the Query




                  49
The Query




   meeting AND (face OR phone)




                                 50
The Query as an Erlang Term (Parse w/ Leex and Yecc)


       [{land, [
           {term,"meeting",[]},
           {lor,[
               {term,"face",[]},
               {term,"phone",[]}
           ]}
       ]}]




                                                       51
The Query as a Graph
              #land

        #term          #lor

    "meeting"          #term   #term

                   "phone"     "face"




                                        52
Plan the Query




                 53
System Catalog



                 Catalog   Catalog
             Catalog           Catalog
            Catalog            Catalog
                 Catalog Catalog




                                         54
System Catalog


                          Term Weight
 Term            TermID        &
                          File Offset




                                        55
Consult the System Catalog for Term/Node Weights
               #land

        #term          #lor

    "meeting"         #term         #term

 23 @ node B        "phone"          "face"

                 17 @ node A      13 @ node C


                                                   56
Use Term Weights to Plan the Query

               #land

        #term          #lor

    "meeting"        #term            #term

 23 @ node B        "phone"            "face"

                17 @ node A          13 @ node C

                                                   57
Use Term Weights to Plan the Query

               #land

        #term          #lor

    "meeting"        #term            #term

 23 @ node B        "phone"            "face"

                17 @ node A          13 @ node C

                                                   58
Use Term Weights to Plan the Query
               #node@B

                 #land

       #term             #node@A

   "meeting"               #lor
                #term                 #term
23 @ node B
              "phone"                 "face"

                                     13 @ node C
           17 @ node A
                                                   59
The Node-Assigned Query as an Erlang Term
  [{node,
      {land, [
          {node,
               {lor, [
                   {term,{"email","body","face"}, [
                       {node_weight,'node_c@127.0.0.1', 13}
                   ]},
                   {term,{"email","body", "phone"}, [
                       {node_weight,'node_a@127.0.0.1', 17}
                   ]}
               ]},
               'node_a@127.0.0.1'
          },
          {term, {"email","body","meeting"}, [
               {node_weight,'node_b@127.0.0.1', 23}
          ]}
      ]},
      'node_b@127.0.0.1'
  }]



                                                              60
Execute the Query




                    61
Spawn the Query Processes
               #node@B

                #land

       #term             #node@A

    "meeting"               #lor

               #term               #term

             "phone"               "face"

                                            62
Spawn the Query Processes
               #node@B

                #land

       #term             #node@A

    "meeting"               #lor

               #term               #term

             "phone"               "face"

                                            63
Spawn the Query Processes
               #node@B

                #land

       #term             #node@A

    "meeting"               #lor

               #term               #term

             "phone"               "face"

                                            64
Spawn the Query Processes
               #node@B

                #land

       #term             #node@A

    "meeting"               #lor

               #term               #term

             "phone"               "face"

                                            65
Spawn the Query Processes
               #node@B

                #land

       #term             #node@A

    "meeting"               #lor

               #term               #term

             "phone"               "face"

                                            66
Spawn the Query Processes & Stream the Results
               #node@B

                 #land

       #term             #node@A

    "meeting"              #lor

                #term               #term

              "phone"               "face"

                                                 67
Spawn the Query Processes & Stream the Results
               #node@B

                 #land

       #term             #node@A

    "meeting"              #lor

                #term               #term

              "phone"               "face"

                                                 68
Spawn the Query Processes & Stream the Results
               #node@B

                 #land

       #term             #node@A

    "meeting"              #lor

                #term               #term

              "phone"               "face"

                                                 69
Terminate When Finished
               #node@B

                   #land

       #term               #node@A

    'disconnect'            #lor

               #term                 #term

            'disconnect'           'disconnect'


                                                  70
Message Format




                 71
The Message Format

      Message ::
          {results, [Result]} |
          {results, disconnect}

      Result ::
          {DocID, Properties}

      DocID ::
          term()

      Properties ::
          proplist()

                                  72
The Message Format




     {results,       [
         {375,       []},
         {961,       [{color, "red"}]},
         {155,       [{pos, [1,2,5]}]}
     ]}




                                          73
Yay for Erlang!

• Clean lines between load balancing and logic, single-
  and multi-node look the same
• Easy to create new operators, rapid development of
  experimental features
• Linked processes make cleanup a breeze
• Significant code reduction over early Java prototypes




                                                          74
Part Four
 Review




            75
Riak Search turns this...


                  "Converse AND Shoes"




                              WTF!? I'm a
                               KV store!



        CLIENT                             RIAK
                                                  76
...into this...


                  "Converse AND Shoes"




                                Gladly!



           CLIENT                         RIAK
                                                 77
...into this...


                  "Converse AND Shoes"




                    Keys or Objects




           CLIENT                        RIAK
                                                78
...while keeping operations easy.




        Your            Riak
                                    Riak
    Application       Search




                                           79
Thanks! Questions?

                        Search Team:

                        John Muellerleile - @jrecursive

                        Rusty Klophaus - @rklophaus

                        Kevin Smith - @kevsmith



Currently working with a small set of Beta users.
Open-source release planned for Q3.
www.basho.com

Contenu connexe

En vedette

Spring Cleaning for Your Smartphone
Spring Cleaning for Your SmartphoneSpring Cleaning for Your Smartphone
Spring Cleaning for Your SmartphoneLookout
 
Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)thetechnicalweb
 
In Pursuit of the Holy Grail: Building Isomorphic JavaScript Apps
In Pursuit of the Holy Grail: Building Isomorphic JavaScript AppsIn Pursuit of the Holy Grail: Building Isomorphic JavaScript Apps
In Pursuit of the Holy Grail: Building Isomorphic JavaScript AppsSpike Brehm
 
Interoperability With RabbitMq
Interoperability With RabbitMqInteroperability With RabbitMq
Interoperability With RabbitMqAlvaro Videla
 
Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBWilliam Candillon
 
Erlang plus BDB: Disrupting the Conventional Web Wisdom
Erlang plus BDB: Disrupting the Conventional Web WisdomErlang plus BDB: Disrupting the Conventional Web Wisdom
Erlang plus BDB: Disrupting the Conventional Web Wisdomguest3933de
 
Shrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLPShrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLPlucenerevolution
 
Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...
Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...
Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...C4Media
 
AST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptAST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptIngvar Stepanyan
 
Erlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputErlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputPaolo Negri
 
Pregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph ProcessingPregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph ProcessingChris Bunch
 
TEW4 Yatce deprecated slides
TEW4 Yatce deprecated slidesTEW4 Yatce deprecated slides
TEW4 Yatce deprecated slidesUENISHI Kota
 
Kostis Sagonas: Cool Tools for Modern Erlang Program Developmen
Kostis Sagonas: Cool Tools for Modern Erlang Program DevelopmenKostis Sagonas: Cool Tools for Modern Erlang Program Developmen
Kostis Sagonas: Cool Tools for Modern Erlang Program DevelopmenKonstantin Sorokin
 

En vedette (18)

Spring Cleaning for Your Smartphone
Spring Cleaning for Your SmartphoneSpring Cleaning for Your Smartphone
Spring Cleaning for Your Smartphone
 
NkSIP: The Erlang SIP application server
NkSIP: The Erlang SIP application serverNkSIP: The Erlang SIP application server
NkSIP: The Erlang SIP application server
 
Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)
 
In Pursuit of the Holy Grail: Building Isomorphic JavaScript Apps
In Pursuit of the Holy Grail: Building Isomorphic JavaScript AppsIn Pursuit of the Holy Grail: Building Isomorphic JavaScript Apps
In Pursuit of the Holy Grail: Building Isomorphic JavaScript Apps
 
Interoperability With RabbitMq
Interoperability With RabbitMqInteroperability With RabbitMq
Interoperability With RabbitMq
 
Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDB
 
Erlang plus BDB: Disrupting the Conventional Web Wisdom
Erlang plus BDB: Disrupting the Conventional Web WisdomErlang plus BDB: Disrupting the Conventional Web Wisdom
Erlang plus BDB: Disrupting the Conventional Web Wisdom
 
Shrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLPShrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLP
 
Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...
Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...
Scaling Gilt: from Monolithic Ruby Application to Distributed Scala Micro-Ser...
 
AST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptAST - the only true tool for building JavaScript
AST - the only true tool for building JavaScript
 
Erlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputErlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughput
 
Pregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph ProcessingPregel: A System for Large-Scale Graph Processing
Pregel: A System for Large-Scale Graph Processing
 
Streams for the Web
Streams for the WebStreams for the Web
Streams for the Web
 
Simple made easy
Simple made easySimple made easy
Simple made easy
 
TEW4 Yatce deprecated slides
TEW4 Yatce deprecated slidesTEW4 Yatce deprecated slides
TEW4 Yatce deprecated slides
 
Microxchg Microservices
Microxchg MicroservicesMicroxchg Microservices
Microxchg Microservices
 
Kostis Sagonas: Cool Tools for Modern Erlang Program Developmen
Kostis Sagonas: Cool Tools for Modern Erlang Program DevelopmenKostis Sagonas: Cool Tools for Modern Erlang Program Developmen
Kostis Sagonas: Cool Tools for Modern Erlang Program Developmen
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 

Similaire à Riak Search - Erlang Factory London 2010

Riak Search - Berlin Buzzwords 2010
Riak Search - Berlin Buzzwords 2010Riak Search - Berlin Buzzwords 2010
Riak Search - Berlin Buzzwords 2010Rusty Klophaus
 
Riak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoopRiak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoopRusty Klophaus
 
Riak - From Small to Large
Riak - From Small to LargeRiak - From Small to Large
Riak - From Small to LargeRusty Klophaus
 
Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products Lili Wu
 
Essential action script 3.0
Essential action script 3.0Essential action script 3.0
Essential action script 3.0UltimateCodeX
 
Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)Iulian Dragos
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing RiakKevin Smith
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing RiakKevin Smith
 
Personalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in AmericaPersonalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in AmericaAdrian Trenaman
 
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Lucidworks
 

Similaire à Riak Search - Erlang Factory London 2010 (10)

Riak Search - Berlin Buzzwords 2010
Riak Search - Berlin Buzzwords 2010Riak Search - Berlin Buzzwords 2010
Riak Search - Berlin Buzzwords 2010
 
Riak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoopRiak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoop
 
Riak - From Small to Large
Riak - From Small to LargeRiak - From Small to Large
Riak - From Small to Large
 
Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products
 
Essential action script 3.0
Essential action script 3.0Essential action script 3.0
Essential action script 3.0
 
Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
 
Personalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in AmericaPersonalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in America
 
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
 

Plus de Rusty Klophaus

Everybody Polyglot! - Cross-Language RPC with Erlang
Everybody Polyglot! - Cross-Language RPC with ErlangEverybody Polyglot! - Cross-Language RPC with Erlang
Everybody Polyglot! - Cross-Language RPC with ErlangRusty Klophaus
 
Winning the Erlang Edit•Build•Test Cycle
Winning the Erlang Edit•Build•Test CycleWinning the Erlang Edit•Build•Test Cycle
Winning the Erlang Edit•Build•Test CycleRusty Klophaus
 
Querying Riak Just Got Easier - Introducing Secondary Indices
Querying Riak Just Got Easier - Introducing Secondary IndicesQuerying Riak Just Got Easier - Introducing Secondary Indices
Querying Riak Just Got Easier - Introducing Secondary IndicesRusty Klophaus
 
Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010Rusty Klophaus
 
Riak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRiak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRusty Klophaus
 
Riak from Small to Large
Riak from Small to LargeRiak from Small to Large
Riak from Small to LargeRusty Klophaus
 
Getting Started with Riak - NoSQL Live 2010 - Boston
Getting Started with Riak - NoSQL Live 2010 - BostonGetting Started with Riak - NoSQL Live 2010 - Boston
Getting Started with Riak - NoSQL Live 2010 - BostonRusty Klophaus
 

Plus de Rusty Klophaus (7)

Everybody Polyglot! - Cross-Language RPC with Erlang
Everybody Polyglot! - Cross-Language RPC with ErlangEverybody Polyglot! - Cross-Language RPC with Erlang
Everybody Polyglot! - Cross-Language RPC with Erlang
 
Winning the Erlang Edit•Build•Test Cycle
Winning the Erlang Edit•Build•Test CycleWinning the Erlang Edit•Build•Test Cycle
Winning the Erlang Edit•Build•Test Cycle
 
Querying Riak Just Got Easier - Introducing Secondary Indices
Querying Riak Just Got Easier - Introducing Secondary IndicesQuerying Riak Just Got Easier - Introducing Secondary Indices
Querying Riak Just Got Easier - Introducing Secondary Indices
 
Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010
 
Riak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRiak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared State
 
Riak from Small to Large
Riak from Small to LargeRiak from Small to Large
Riak from Small to Large
 
Getting Started with Riak - NoSQL Live 2010 - Boston
Getting Started with Riak - NoSQL Live 2010 - BostonGetting Started with Riak - NoSQL Live 2010 - Boston
Getting Started with Riak - NoSQL Live 2010 - Boston
 

Dernier

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Dernier (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Riak Search - Erlang Factory London 2010

  • 1. Riak Search A Full-Text Search and Indexing Engine based on Riak Erlang Factory· London· June 2010 Basho Technologies Rusty Klophaus - @rklophaus
  • 2. Why did we build it? What are the major goals? How does it work? 2
  • 3. Part One Why did we build Riak Search? 3
  • 4. Riak is a scalable, highly-available, networked, open-source key/value store. 4
  • 5. Writing to a Key/Value Store Key/Value CLIENT RIAK 5
  • 6. Writing to a Key/Value Store Object CLIENT RIAK 6
  • 7. Querying a Key/Value Store Key Object CLIENT RIAK 7
  • 8. Querying Riak via LinkWalking Key + Instructions Walk to Related Keys Object(s) CLIENT RIAK 8
  • 9. Querying Riak via Map/Reduce Key(s) + JS Functions Map Map Reduce Computed Value(s) CLIENT RIAK 9
  • 10. Key/Value Stores like Key-Based Queries 10
  • 11. Query by Secondary Index where Category == "Shoes" WTF!? I'm a KV store! CLIENT RIAK 11
  • 12. Full-Text Query "Converse AND Shoes" This is getting old. CLIENT RIAK 12
  • 13. These kinds of queries need an Index. *Market Opportunity!* 13
  • 14. Part Two What are the major goals of Riak Search? 14
  • 15. An application built on Riak. Your Riak Application 15
  • 16. Hrm... I need an index. Your Riak Application Index Object 16
  • 17. Hrm... I need an index with more features. Your ??? Riak Application 17
  • 18. Lucene should do the trick... Your Lucene Riak Application 18
  • 19. ...shard to add more storage capacity... Your Application Lucene Lucene Lucene Riak 19
  • 20. ...replicate to add more throughput. Lucene Lucene Lucene Your Application Lucene Lucene Lucene Riak Lucene Lucene Lucene 20
  • 21. ...replicate to add more throughput. Lucene Lucene Lucene Your Application Lucene Lucene Lucene Riak Lucene Lucene Lucene Operations nightmare! 21
  • 22. What do we really want? Your Riak-ified Riak Application Lucene 22
  • 23. What do we really want? Your Riak Riak Application Search 23
  • 24. Functionality? Be like Lucene (and more). • Lucene Syntax • Leverages Java Lucene Analyzers • Solr Endpoints • Integration via Riak Post-Commit Hook (Index) • Integration via Riak Map/Reduce (Query) • Near-Realtime • Schema-less 24
  • 25. Operations? Be like Riak. • No special nodes • Add nodes, get more compute and storage • Automatically load balance • Replicas for durability and performance • Index and query in parallel • Swappable storage backends 25
  • 26. Part Three How do we do it? 26
  • 27. A Gentle Introduction to Document Indexing 27
  • 28. The Inverted Index Document Inverted Index day, 1 dog, 1 #1 Every dog has his day. every, 1 has, 1 his, 1 28
  • 29. The Inverted Index Documents Combined Inverted Index and, 4 #1 Every dog has his day. bag, 3 bark, 2 bite, 2 The dog's bark #2 cat, 3 is worse than his bite. cat, 4 day, 1 dog, 1 #3 Let the cat out of the bag. dog, 2 dog, 4 every, 1 #4 It's raining cats and dogs. has, 1 ... 29
  • 30. At Query Time... "dog AND cat" AND dog cat 30
  • 31. At Query Time... AND dog cat dog, 1 cat, 3 dog, 2 cat, 4 dog, 4 31
  • 32. At Query Time... Result: 4 AND (Merge Intersection) 1 3 2 4 4 32
  • 33. At Query Time... Result: 1, 2, 3, 4 OR (Merge Union) 1 3 2 4 4 33
  • 34. Complex Behavior from Simple Structures 34
  • 36. Riak Search uses Consistent Hashing to store data on Partitions 36
  • 37. Introduction to Consistent Hashing and Partitions Partitions = 10 Number of Nodes = 5 Partitions per Node = 2 Replicas (NVal) = 2 37
  • 38. Introduction to Consistent Hashing and Partitions Object 38
  • 39. Document Partitioning vs. Term Partitioning 39
  • 41. Document Partitioning @ Index Time #1 Every dog has his day. 41
  • 42. Document Partitioning @ Query Time "dog OR cat" 42
  • 43. Term Partitioning @ Index Time day, 1 dog, 1 #1 Every dog has his day. every, 1 has, 1 his, 1 43
  • 44. Term Partitioning @ Index Time dog, 1 day, 1 has, 1 his, 1 every, 1 44
  • 45. Term Partitioning @ Query Time "dog OR cat" 45
  • 46. Tradeoffs... Document Partitioning Term Partitioning + Lower Latency Queries - Higher Latency Queries - Lower Throughput + Higher Throughput - Lots of Disk Seeks - Hotspots in Ring (the "Obama" problem) 46
  • 47. Riak Search: Term Partitioning Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets. Optimizations: • Term splitting to reduce hot spots • Bloom filters & caching to save query-time bandwidth • Batching to save query-time & index-time bandwidth Support for either approach eventually. 47
  • 50. The Query meeting AND (face OR phone) 50
  • 51. The Query as an Erlang Term (Parse w/ Leex and Yecc) [{land, [ {term,"meeting",[]}, {lor,[ {term,"face",[]}, {term,"phone",[]} ]} ]}] 51
  • 52. The Query as a Graph #land #term #lor "meeting" #term #term "phone" "face" 52
  • 54. System Catalog Catalog Catalog Catalog Catalog Catalog Catalog Catalog Catalog 54
  • 55. System Catalog Term Weight Term TermID & File Offset 55
  • 56. Consult the System Catalog for Term/Node Weights #land #term #lor "meeting" #term #term 23 @ node B "phone" "face" 17 @ node A 13 @ node C 56
  • 57. Use Term Weights to Plan the Query #land #term #lor "meeting" #term #term 23 @ node B "phone" "face" 17 @ node A 13 @ node C 57
  • 58. Use Term Weights to Plan the Query #land #term #lor "meeting" #term #term 23 @ node B "phone" "face" 17 @ node A 13 @ node C 58
  • 59. Use Term Weights to Plan the Query #node@B #land #term #node@A "meeting" #lor #term #term 23 @ node B "phone" "face" 13 @ node C 17 @ node A 59
  • 60. The Node-Assigned Query as an Erlang Term [{node, {land, [ {node, {lor, [ {term,{"email","body","face"}, [ {node_weight,'node_c@127.0.0.1', 13} ]}, {term,{"email","body", "phone"}, [ {node_weight,'node_a@127.0.0.1', 17} ]} ]}, 'node_a@127.0.0.1' }, {term, {"email","body","meeting"}, [ {node_weight,'node_b@127.0.0.1', 23} ]} ]}, 'node_b@127.0.0.1' }] 60
  • 62. Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 62
  • 63. Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 63
  • 64. Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 64
  • 65. Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 65
  • 66. Spawn the Query Processes #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 66
  • 67. Spawn the Query Processes & Stream the Results #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 67
  • 68. Spawn the Query Processes & Stream the Results #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 68
  • 69. Spawn the Query Processes & Stream the Results #node@B #land #term #node@A "meeting" #lor #term #term "phone" "face" 69
  • 70. Terminate When Finished #node@B #land #term #node@A 'disconnect' #lor #term #term 'disconnect' 'disconnect' 70
  • 72. The Message Format Message :: {results, [Result]} | {results, disconnect} Result :: {DocID, Properties} DocID :: term() Properties :: proplist() 72
  • 73. The Message Format {results, [ {375, []}, {961, [{color, "red"}]}, {155, [{pos, [1,2,5]}]} ]} 73
  • 74. Yay for Erlang! • Clean lines between load balancing and logic, single- and multi-node look the same • Easy to create new operators, rapid development of experimental features • Linked processes make cleanup a breeze • Significant code reduction over early Java prototypes 74
  • 76. Riak Search turns this... "Converse AND Shoes" WTF!? I'm a KV store! CLIENT RIAK 76
  • 77. ...into this... "Converse AND Shoes" Gladly! CLIENT RIAK 77
  • 78. ...into this... "Converse AND Shoes" Keys or Objects CLIENT RIAK 78
  • 79. ...while keeping operations easy. Your Riak Riak Application Search 79
  • 80. Thanks! Questions? Search Team: John Muellerleile - @jrecursive Rusty Klophaus - @rklophaus Kevin Smith - @kevsmith Currently working with a small set of Beta users. Open-source release planned for Q3. www.basho.com