SlideShare a Scribd company logo
1 of 31
Finding stuff under the Couch
   with CouchDB-Lucene



           Martin Rehfeld
        @ RUG-B 01-Apr-2010
CouchDB

•   JSON document store
•   all documents in a given database reside in
    one large pool and may be retrieved using
    their ID ...
•   ... or through Map & Reduce based indexes
So how do you do full
    text search?
You potentially could
 achieve this with just
Map & Reduce functions
But that would mean
implementing an actual
   search engine ...
... and this has been done
           before.
Enter Lucene
Apache Lucene is a high-
performance, full-featured text search
engine library written entirely in Java.
It is a technology suitable for nearly
any application that requires full-text
search, especially cross-platform.
                  Courtesy of The Apache Foundation
Lucene Features
•   ranked searching
•   many powerful query types: phrase queries,
    wildcard queries, proximity queries, range
    queries and more
•   fielded searching (e.g., title, author, contents)
•   boolean operators
•   sorting by any field
•   allows simultaneous update and searching
CouchDB Integration
•   couchdb-lucene
    (ready to run Lucene plus
    CouchDB interface)

•   Search interface via
    http_db_handlers, usually
    _fti


•   Indexer interface via
    CouchDB
    update_notification
    facility and fulltext design
    docs
Sample design document,
          i.e., _id: „_design/search“


{
    "fulltext": {
      "by_name": {
      "defaults": { "store":"yes" },
      "index":"function(doc) { var ret=new
Document(); ret.add(doc.name); return ret }"
    }
    }
}
Sample design document,
          i.e., _id: „_design/search“

                     Name of the index
{
    "fulltext": {
      "by_name": {
      "defaults": { "store":"yes" },
      "index":"function(doc) { var ret=new
Document(); ret.add(doc.name); return ret }"
    }
    }
}
Sample design document,
          i.e., _id: „_design/search“

                     Name of the index
{
    "fulltext": {              Default options
      "by_name": {             (can be overridden per field)
      "defaults": { "store":"yes" },
      "index":"function(doc) { var ret=new
Document(); ret.add(doc.name); return ret }"
    }
    }
}
Sample design document,
          i.e., _id: „_design/search“

                       Name of the index
{
    "fulltext": {                Default options
      "by_name": {               (can be overridden per field)
      "defaults": { "store":"yes" },
      "index":"function(doc) { var ret=new
Document(); ret.add(doc.name); return ret }"
    }
    }     Index function
}
Sample design document,
          i.e., _id: „_design/search“

                       Name of the index
{
    "fulltext": {                 Default options
      "by_name": {                (can be overridden per field)
      "defaults": { "store":"yes" },
      "index":"function(doc) { var ret=new
Document(); ret.add(doc.name); return ret }"
    }
    }     Index function Builds and returns documents to
}                        be put into Lucene‘s index (may
                         return an array of multiple
                         documents)
Querying the index
http://localhost:5984/your-couch-db/_fti/
your-design-document-name/your-index-name?

 q=
   
   
   
   
   query string

 sort=	 	      	 
     comma-separated fields to sort on

 limit=	 	     	 
     max number of results to return

 skip=
    
   
   
   offset
 include_docs=

       include CouchDB documents in
 
 
   
   
   
   
   response
A full stack example
CouchDB Person
         Document
{
    "_id": "9db68c69726e486b811859937fbb6b09",
    "_rev": "1-c890039865e37eb8b911ff762162772e",
    "name": "Martin Rehfeld",
    "email": "martin.rehfeld@glnetworks.de",
    "notes": "Talks about CouchDB Lucene"
}
Objectives

•   Search for people by name
•   Search for people by any field‘s content
•   Querying from Ruby
•   Paginating results
Index Function
function(doc) {
  // first check if doc is a person document!
  ...
  var ret=new Document();
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
  ret.add(doc.name, {field:“name“, store:“yes“});
  ret.add(doc.email, {field:“email“, store:“yes“});
  return ret;
}
Index Function
function(doc) {
  // first check if doc is a person document!
  ...
  var ret=new Document();


                      }   content added to
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
                          „default“ field
  ret.add(doc.name, {field:“name“, store:“yes“});
  ret.add(doc.email, {field:“email“, store:“yes“});
  return ret;
}
Index Function
function(doc) {
  // first check if doc is a person document!
  ...
  var ret=new Document();


                      }   content added to
  ret.add(doc.name);
  ret.add(doc.email);
  ret.add(doc.notes);
                          „default“ field
  ret.add(doc.name, {field:“name“, store:“yes“});
  ret.add(doc.email, {field:“email“, store:“yes“});
  return ret;
                                content added to
}
                                named fields
Field Options
name           description                 available options

          the field name to index
field                                          user-defined
                   under
                                      date, double, float, int, long,
type       the type of the field
                                                string
        whether the data is stored.
store   The value will be returned               yes, no
           in the search result
                                          analyzed,
        whether (and how) the data analyzed_no_norms, no,
index
                is indexed              not_analyzed,
                                   not_analyzed_no_norms
Querying the Index I
http://localhost:5984/mydb/_fti/search/
global?q=couchdb
 {
     "q": "default:couchdb",
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,
     "total_rows": 1,
     "search_duration": 0,
     "fetch_duration": 8,
     "rows":    [
       {
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields":        {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de",
         }
       }
     ]
 }
Querying the Index I
http://localhost:5984/mydb/_fti/search/
global?q=couchdb
                                  default field
 {
     "q": "default:couchdb",      is queried
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,
     "total_rows": 1,
     "search_duration": 0,
     "fetch_duration": 8,
     "rows":    [
       {
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields":        {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de",
         }
       }
     ]
 }
Querying the Index I
http://localhost:5984/mydb/_fti/search/
global?q=couchdb
                                  default field
 {
     "q": "default:couchdb",      is queried Content of fields
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,                              with store:“yes“
     "total_rows": 1,
     "search_duration": 0,                     option are returned
     "fetch_duration": 8,
     "rows":    [                              with the query
       {
                                               results
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields":        {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de",
         }
       }
     ]
 }
Querying the Index II
http://localhost:5984/mydb/_fti/search/
global?q=name:rehfeld
 {
     "q": "name:rehfeld",
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,
     "total_rows": 1,
     "search_duration": 0,
     "fetch_duration": 8,
     "rows":    [
       {
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields":        {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de",
         }
       }
     ]
 }
Querying the Index II
http://localhost:5984/mydb/_fti/search/
global?q=name:rehfeld
 {
     "q": "name:rehfeld",                       name field
     "etag": "119e498956048ea8",
     "skip": 0,
     "limit": 25,
                                                is queried
     "total_rows": 1,
     "search_duration": 0,
     "fetch_duration": 8,
     "rows":    [
       {
         "id": "9db68c69726e486b811859937fbb6b09",
         "score": 4.520571708679199,
         "fields":        {
           "name": "Martin Rehfeld",
           "email": "martin.rehfeld@glnetworks.de",
         }
       }
     ]
 }
Querying from Ruby

class Search
  include HTTParty

 base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search"
 format :json

  def self.query(options = {})
    index = options.delete(:index)
    get("/#{index}", :query => options)
  end
end
Controller / Pagination
class SearchController < ApplicationController
  HITS_PER_PAGE = 10

  def index
    result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE))
    @hits = WillPaginate::Collection.create(params[:page] || 1, HITS_PER_PAGE,
                                            result['total_rows']) do |pager|
      pager.replace(result['rows'])
    end
  end

private

  def skip
    params[:page] ? (params[:page].to_i - 1) * HITS_PER_PAGE : 0
  end
end
Resources

•   http://couchdb.apache.org/
•   http://lucene.apache.org/java/docs/index.html
•   http://github.com/rnewson/couchdb-lucene
•   http://lucene.apache.org/java/3_0_1/
    queryparsersyntax.html
Q &A



!
    Martin Rehfeld

    http://inside.glnetworks.de
    martin.rehfeld@glnetworks.de

    @klickmich

More Related Content

What's hot

What's hot (9)

Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
분산저장시스템 개발에 대한 12가지 이야기
분산저장시스템 개발에 대한 12가지 이야기분산저장시스템 개발에 대한 12가지 이야기
분산저장시스템 개발에 대한 12가지 이야기
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
논문투고 체크리스트 A to z (에디티지제공)
논문투고 체크리스트 A to z (에디티지제공)논문투고 체크리스트 A to z (에디티지제공)
논문투고 체크리스트 A to z (에디티지제공)
 
Memcached
MemcachedMemcached
Memcached
 
NGSIv1 を知っている開発者向けの NGSIv2 の概要 (Orion 1.13.0対応)
NGSIv1 を知っている開発者向けの NGSIv2 の概要 (Orion 1.13.0対応)NGSIv1 を知っている開発者向けの NGSIv2 の概要 (Orion 1.13.0対応)
NGSIv1 を知っている開発者向けの NGSIv2 の概要 (Orion 1.13.0対応)
 
View Inheritance in Odoo 16
View Inheritance in Odoo 16View Inheritance in Odoo 16
View Inheritance in Odoo 16
 
AWS Summit Seoul 2023 | AWS에서 OpenTelemetry 기반의 애플리케이션 Observability 구축/활용하기
AWS Summit Seoul 2023 | AWS에서 OpenTelemetry 기반의 애플리케이션 Observability 구축/활용하기AWS Summit Seoul 2023 | AWS에서 OpenTelemetry 기반의 애플리케이션 Observability 구축/활용하기
AWS Summit Seoul 2023 | AWS에서 OpenTelemetry 기반의 애플리케이션 Observability 구축/활용하기
 
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
 

Viewers also liked

Couch Db In 60 Minutes
Couch Db In 60 MinutesCouch Db In 60 Minutes
Couch Db In 60 Minutes
George Ang
 
Couch db@nosql+taiwan
Couch db@nosql+taiwanCouch db@nosql+taiwan
Couch db@nosql+taiwan
Kenzou Yeh
 

Viewers also liked (20)

Real World CouchDB
Real World CouchDBReal World CouchDB
Real World CouchDB
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the Web
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
(R)évolutions, l'innovation entre les lignes
(R)évolutions, l'innovation entre les lignes(R)évolutions, l'innovation entre les lignes
(R)évolutions, l'innovation entre les lignes
 
MySQL Indexes
MySQL IndexesMySQL Indexes
MySQL Indexes
 
Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiques
 
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Clo...
 
ZendCon 2011 Learning CouchDB
ZendCon 2011 Learning CouchDBZendCon 2011 Learning CouchDB
ZendCon 2011 Learning CouchDB
 
Apache CouchDB
Apache CouchDBApache CouchDB
Apache CouchDB
 
Couch Db In 60 Minutes
Couch Db In 60 MinutesCouch Db In 60 Minutes
Couch Db In 60 Minutes
 
Couch db
Couch dbCouch db
Couch db
 
Couch db@nosql+taiwan
Couch db@nosql+taiwanCouch db@nosql+taiwan
Couch db@nosql+taiwan
 
CouchDB at New York PHP
CouchDB at New York PHPCouchDB at New York PHP
CouchDB at New York PHP
 
Couch db
Couch dbCouch db
Couch db
 
CouchDB
CouchDBCouchDB
CouchDB
 
CouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental ComplexityCouchApps: Requiem for Accidental Complexity
CouchApps: Requiem for Accidental Complexity
 
CouchDB Vs MongoDB
CouchDB Vs MongoDBCouchDB Vs MongoDB
CouchDB Vs MongoDB
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
 
CouchDB
CouchDBCouchDB
CouchDB
 
Search engine-optimization-starter-guide-fr
Search engine-optimization-starter-guide-frSearch engine-optimization-starter-guide-fr
Search engine-optimization-starter-guide-fr
 

Similar to CouchDB-Lucene

10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
DATAVERSITY
 
03 form-data
03 form-data03 form-data
03 form-data
snopteck
 
MongoDb and NoSQL
MongoDb and NoSQLMongoDb and NoSQL
MongoDb and NoSQL
TO THE NEW | Technology
 

Similar to CouchDB-Lucene (20)

10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)Postman Collection Format v2.0 (pre-draft)
Postman Collection Format v2.0 (pre-draft)
 
03 form-data
03 form-data03 form-data
03 form-data
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlin
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Building Your First MongoDB App
Building Your First MongoDB AppBuilding Your First MongoDB App
Building Your First MongoDB App
 
Intro to MongoDB and datamodeling
Intro to MongoDB and datamodeling Intro to MongoDB and datamodeling
Intro to MongoDB and datamodeling
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
 
Power tools in Java
Power tools in JavaPower tools in Java
Power tools in Java
 
Peggy elasticsearch應用
Peggy elasticsearch應用Peggy elasticsearch應用
Peggy elasticsearch應用
 
Academy PRO: Elasticsearch. Data management
Academy PRO: Elasticsearch. Data managementAcademy PRO: Elasticsearch. Data management
Academy PRO: Elasticsearch. Data management
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
 
MongoDb and NoSQL
MongoDb and NoSQLMongoDb and NoSQL
MongoDb and NoSQL
 
Webinar: Simplifying Persistence for Java and MongoDB
Webinar: Simplifying Persistence for Java and MongoDBWebinar: Simplifying Persistence for Java and MongoDB
Webinar: Simplifying Persistence for Java and MongoDB
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

CouchDB-Lucene

  • 1. Finding stuff under the Couch with CouchDB-Lucene Martin Rehfeld @ RUG-B 01-Apr-2010
  • 2. CouchDB • JSON document store • all documents in a given database reside in one large pool and may be retrieved using their ID ... • ... or through Map & Reduce based indexes
  • 3. So how do you do full text search?
  • 4. You potentially could achieve this with just Map & Reduce functions
  • 5. But that would mean implementing an actual search engine ...
  • 6. ... and this has been done before.
  • 7. Enter Lucene Apache Lucene is a high- performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Courtesy of The Apache Foundation
  • 8. Lucene Features • ranked searching • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more • fielded searching (e.g., title, author, contents) • boolean operators • sorting by any field • allows simultaneous update and searching
  • 9. CouchDB Integration • couchdb-lucene (ready to run Lucene plus CouchDB interface) • Search interface via http_db_handlers, usually _fti • Indexer interface via CouchDB update_notification facility and fulltext design docs
  • 10. Sample design document, i.e., _id: „_design/search“ { "fulltext": { "by_name": { "defaults": { "store":"yes" }, "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }" } } }
  • 11. Sample design document, i.e., _id: „_design/search“ Name of the index { "fulltext": { "by_name": { "defaults": { "store":"yes" }, "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }" } } }
  • 12. Sample design document, i.e., _id: „_design/search“ Name of the index { "fulltext": { Default options "by_name": { (can be overridden per field) "defaults": { "store":"yes" }, "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }" } } }
  • 13. Sample design document, i.e., _id: „_design/search“ Name of the index { "fulltext": { Default options "by_name": { (can be overridden per field) "defaults": { "store":"yes" }, "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }" } } Index function }
  • 14. Sample design document, i.e., _id: „_design/search“ Name of the index { "fulltext": { Default options "by_name": { (can be overridden per field) "defaults": { "store":"yes" }, "index":"function(doc) { var ret=new Document(); ret.add(doc.name); return ret }" } } Index function Builds and returns documents to } be put into Lucene‘s index (may return an array of multiple documents)
  • 15. Querying the index http://localhost:5984/your-couch-db/_fti/ your-design-document-name/your-index-name? q= query string sort= comma-separated fields to sort on limit= max number of results to return skip= offset include_docs= include CouchDB documents in response
  • 16. A full stack example
  • 17. CouchDB Person Document { "_id": "9db68c69726e486b811859937fbb6b09", "_rev": "1-c890039865e37eb8b911ff762162772e", "name": "Martin Rehfeld", "email": "martin.rehfeld@glnetworks.de", "notes": "Talks about CouchDB Lucene" }
  • 18. Objectives • Search for people by name • Search for people by any field‘s content • Querying from Ruby • Paginating results
  • 19. Index Function function(doc) { // first check if doc is a person document! ... var ret=new Document(); ret.add(doc.name); ret.add(doc.email); ret.add(doc.notes); ret.add(doc.name, {field:“name“, store:“yes“}); ret.add(doc.email, {field:“email“, store:“yes“}); return ret; }
  • 20. Index Function function(doc) { // first check if doc is a person document! ... var ret=new Document(); } content added to ret.add(doc.name); ret.add(doc.email); ret.add(doc.notes); „default“ field ret.add(doc.name, {field:“name“, store:“yes“}); ret.add(doc.email, {field:“email“, store:“yes“}); return ret; }
  • 21. Index Function function(doc) { // first check if doc is a person document! ... var ret=new Document(); } content added to ret.add(doc.name); ret.add(doc.email); ret.add(doc.notes); „default“ field ret.add(doc.name, {field:“name“, store:“yes“}); ret.add(doc.email, {field:“email“, store:“yes“}); return ret; content added to } named fields
  • 22. Field Options name description available options the field name to index field user-defined under date, double, float, int, long, type the type of the field string whether the data is stored. store The value will be returned yes, no in the search result analyzed, whether (and how) the data analyzed_no_norms, no, index is indexed not_analyzed, not_analyzed_no_norms
  • 23. Querying the Index I http://localhost:5984/mydb/_fti/search/ global?q=couchdb { "q": "default:couchdb", "etag": "119e498956048ea8", "skip": 0, "limit": 25, "total_rows": 1, "search_duration": 0, "fetch_duration": 8, "rows": [ { "id": "9db68c69726e486b811859937fbb6b09", "score": 4.520571708679199, "fields": { "name": "Martin Rehfeld", "email": "martin.rehfeld@glnetworks.de", } } ] }
  • 24. Querying the Index I http://localhost:5984/mydb/_fti/search/ global?q=couchdb default field { "q": "default:couchdb", is queried "etag": "119e498956048ea8", "skip": 0, "limit": 25, "total_rows": 1, "search_duration": 0, "fetch_duration": 8, "rows": [ { "id": "9db68c69726e486b811859937fbb6b09", "score": 4.520571708679199, "fields": { "name": "Martin Rehfeld", "email": "martin.rehfeld@glnetworks.de", } } ] }
  • 25. Querying the Index I http://localhost:5984/mydb/_fti/search/ global?q=couchdb default field { "q": "default:couchdb", is queried Content of fields "etag": "119e498956048ea8", "skip": 0, "limit": 25, with store:“yes“ "total_rows": 1, "search_duration": 0, option are returned "fetch_duration": 8, "rows": [ with the query { results "id": "9db68c69726e486b811859937fbb6b09", "score": 4.520571708679199, "fields": { "name": "Martin Rehfeld", "email": "martin.rehfeld@glnetworks.de", } } ] }
  • 26. Querying the Index II http://localhost:5984/mydb/_fti/search/ global?q=name:rehfeld { "q": "name:rehfeld", "etag": "119e498956048ea8", "skip": 0, "limit": 25, "total_rows": 1, "search_duration": 0, "fetch_duration": 8, "rows": [ { "id": "9db68c69726e486b811859937fbb6b09", "score": 4.520571708679199, "fields": { "name": "Martin Rehfeld", "email": "martin.rehfeld@glnetworks.de", } } ] }
  • 27. Querying the Index II http://localhost:5984/mydb/_fti/search/ global?q=name:rehfeld { "q": "name:rehfeld", name field "etag": "119e498956048ea8", "skip": 0, "limit": 25, is queried "total_rows": 1, "search_duration": 0, "fetch_duration": 8, "rows": [ { "id": "9db68c69726e486b811859937fbb6b09", "score": 4.520571708679199, "fields": { "name": "Martin Rehfeld", "email": "martin.rehfeld@glnetworks.de", } } ] }
  • 28. Querying from Ruby class Search include HTTParty base_uri "localhost:5984/#{CouchPotato::Config.database_name}/_fti/search" format :json def self.query(options = {}) index = options.delete(:index) get("/#{index}", :query => options) end end
  • 29. Controller / Pagination class SearchController < ApplicationController HITS_PER_PAGE = 10 def index result = Search.query(params.merge(:skip => skip, :limit => HITS_PER_PAGE)) @hits = WillPaginate::Collection.create(params[:page] || 1, HITS_PER_PAGE, result['total_rows']) do |pager| pager.replace(result['rows']) end end private def skip params[:page] ? (params[:page].to_i - 1) * HITS_PER_PAGE : 0 end end
  • 30. Resources • http://couchdb.apache.org/ • http://lucene.apache.org/java/docs/index.html • http://github.com/rnewson/couchdb-lucene • http://lucene.apache.org/java/3_0_1/ queryparsersyntax.html
  • 31. Q &A ! Martin Rehfeld http://inside.glnetworks.de martin.rehfeld@glnetworks.de @klickmich

Editor's Notes

  1. short recap of what CouchDB is
  2. some (very) limited examples are actually floating around
  3. mapping all documents, split them into words, push through a stemmer, and cross-index them with the documents containing them
  4. ... multiple times, in fact
  5. add all searchable content to the default field, add fields for searching by individual field or using contents in view
  6. the stored field contents can be used to render search results without touching CouchDB
  7. the stored field contents can be used to render search results without touching CouchDB
  8. could be as simple as that (using the httparty gem &amp; Couch Potato) sans error handling
  9. using the Search class in an controller + pagination; utilizing the will_paginate gem