SlideShare une entreprise Scribd logo
1  sur  67
Télécharger pour lire hors ligne
Ferret
A Ruby Search Engine
  Brian Sam-Bodden
Agenda

• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
Agenda

• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in you App
Agenda

• Ferret in Rails
• Resources
What is Ferret?

• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired on the         Search Engine

• Port to Ruby by David Balmain
What is Ferret?

• Initially a 100% pure Ruby port
• Since 0.9 many core functions are
  implemented in C

• Fast! Now Faster than Lucene ;-)
Concepts
Concepts

• Index : Sequence of documents
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
Fields of a Document in
        an Index
Fields of a Document in
           an Index
• Fields are individually searchable units
  that are:
Fields of a Document in
           an Index
• Fields are individually searchable units
  that are:
  • Stored: The original Terms of the fields are store
Fields of a Document in
           an Index
• Fields are individually searchable units
  that are:
  • Stored: The original Terms of the fields are store
  • Indexed: Inverted to rapidly find all Documents
    containing any of the Terms
Fields of a Document in
           an Index
• Fields are individually searchable units
  that are:
  • Stored: The original Terms of the fields are store
  • Indexed: Inverted to rapidly find all Documents
    containing any of the Terms

  • Tokenized: Individual Terms extracted are
    indexed
Fields of a Document in
           an Index
• Fields are individually searchable units
  that are:
  • Stored: The original Terms of the fields are store
  • Indexed: Inverted to rapidly find all Documents
    containing any of the Terms

  • Tokenized: Individual Terms extracted are
    indexed

  • Vectored: Frequency and location of Terms are
    stored
It’s all about Indexing

• Indexing is the processing of a source
  document into plain text tokens that Ferret
  can manipulate
• For any non-plaintext sources such as PDF,
  Word, Excel you need to:
  • Extract
  • Analyze
Installing Ferret
Installing Ferret



gem install ferret
Installing Ferret
Installing Ferret
Installing Ferret



    }
Installing Ferret



    }   Pick the latest version
        for your platform
The Recipe
The Recipe

1. Create some Documents
The Recipe

1. Create some Documents

2. Create an Index
The Recipe

1. Create some Documents

2. Create an Index

3. Adding Documents to the Index
The Recipe

1. Create some Documents

2. Create an Index

3. Adding Documents to the Index

4. Perform some Queries
Example Documents
 Create some Documents
Example Documents
  Create some Documents




 “Any String is a Document”
Example Documents
 Create some Documents
Example Documents
   Create some Documents




[“This”, “is also”, “a document”]
Example Documents
 Create some Documents
Example Documents
 Create some Documents
Ferret::Index::Index
     Create an Index
Ferret::Index::Index
            Create an Index

• Indexes are encapsulated by the class
Ferret::Index::Index
             Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
Ferret::Index::Index
             Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
Ferret::Index::Index
             Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
Ferret::Index::Index
              Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
 ➡ index = Ferret::I.new(:path = > ‘/somepath’)
Ferret::Index::Index
              Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
 ➡ index = Ferret::I.new(:path = > ‘/somepath’)
• Or, completely in Memory
Ferret::Index::Index
              Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
 ➡ index = Ferret::I.new(:path = > ‘/somepath’)
• Or, completely in Memory
 ➡ index = Ferret::I.new()
Ferret::Index::Index
     Adding Documents to the Index

• Index provides the add_document
  method

• It also provides the << alias
• Adding documents is then as easy as:
 ➡ index << “This is a document”
 ➡ index << {:first => “Bob”, :last => “Smith”}
Ferret::Index::Index
   Perform some Queries
Ferret::Index::Index
         Perform some Queries

• Index provides the search and
  search_each methods
Ferret::Index::Index
          Perform some Queries

• Index provides the search and
  search_each methods

• search method takes a query and a an
  optional set of parameters:
Ferret::Index::Index
           Perform some Queries

• Index provides the search and
  search_each methods

• search method takes a query and a an
  optional set of parameters:
 ➡ search(query, options = {})
Ferret::Index::Index
          Perform some Queries

• Index provides the search and
  search_each methods

• search method takes a query and a an
  optional set of parameters:
 ➡ search(query, options = {})
• The search_each method provides an
  iterator block
Ferret::Index::Index
            Perform some Queries

• Index provides the search and
  search_each methods

• search method takes a query and a an
  optional set of parameters:
 ➡ search(query, options = {})
• The search_each method provides an
  iterator block
 ➡ search_each(query, options = {}) {|doc, score| ... }
Playing with Ferret in irb
Playing with Ferret in irb
Ferret Query Language

• Ferret own Query Language, FQL is a
  powerful way to specify search queries

• FQL supports many query types,
  including:

     • Term         • Range
     • Phrase       • Wild
     • Field        • Fuzz
     • Boolean
Index.explain

• The explain method of Index describes
  how a document score against a query
 • Very useful for debugging
 • and for learning how Ferret works
Index.explain
Ferret in your App
Application


                   Database             Web


                                                                   User
                                          Manual
              File System
                                           Input


                                                      Get User’s             Present
                        Gather Data                                       Search Results
                                                        Query



                              Index
                            Documents                        Search Index
Ferret




                                              Index
Ferret in Rails

• Acts As Ferret is an ActiveRecord
  extension

• Available as a plugin
• Provides a simplified interface to
  Ferret
• Maintained by Jens Kramer
Ferret in Rails

• Adding an index to an ActiveRecord
  model is as simple as:
Ferret in Rails

• Adding an index to an ActiveRecord
  model is as simple as:
Ferret in Rails
• Simple model has two searchable
  fields title and body:
Ferret in Rails

• After a quick rake db:migrate we now
  have some data to play with
• Fire up the Rails Console and let’s see
  what acts_as_ferret can do for our
  models
Ferret in Rails
Want more?

• Ferret is improving constantly
• Acts As Ferret seems to catch up
  quickly

• Real-life usage seems to require some
  good engineering on your part

  • Background indexing
  • Hot swap of indexes?
Want more?

• We only covered the simplest
  constructs in Ferret

• Ferret’s API provides enough
  flexibility for the most demanding
  searching needs
Online Resources

• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
In-Print Resources
Thanks!

Contenu connexe

Tendances

Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Consuming External Content and Enriching Content with Apache Camel
Consuming External Content and Enriching Content with Apache CamelConsuming External Content and Enriching Content with Apache Camel
Consuming External Content and Enriching Content with Apache Cameltherealgaston
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced editionAlexander Tokarev
 
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerDoing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerLucidworks
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
 
NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?Eberhard Wolff
 
Do you need an external search platform for Adobe Experience Manager?
Do you need an external search platform for Adobe Experience Manager?Do you need an external search platform for Adobe Experience Manager?
Do you need an external search platform for Adobe Experience Manager?therealgaston
 
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...gagravarr
 
Apache Tika end-to-end
Apache Tika end-to-endApache Tika end-to-end
Apache Tika end-to-endgagravarr
 
Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)DevDays
 
CARA MEMBUAT REFERENSI DAN SITASI PADA NASKAH
CARA MEMBUAT REFERENSI DAN SITASI PADA NASKAHCARA MEMBUAT REFERENSI DAN SITASI PADA NASKAH
CARA MEMBUAT REFERENSI DAN SITASI PADA NASKAHRelawan Jurnal Indonesia
 
W3C Web Annotation WG Update (I Annotate 2016)
W3C Web Annotation WG Update (I Annotate 2016)W3C Web Annotation WG Update (I Annotate 2016)
W3C Web Annotation WG Update (I Annotate 2016)Robert Sanderson
 
What's new with Apache Tika?
What's new with Apache Tika?What's new with Apache Tika?
What's new with Apache Tika?gagravarr
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerMark Tabladillo
 

Tendances (20)

Django
DjangoDjango
Django
 
Ld4 l triannon
Ld4 l triannonLd4 l triannon
Ld4 l triannon
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Consuming External Content and Enriching Content with Apache Camel
Consuming External Content and Enriching Content with Apache CamelConsuming External Content and Enriching Content with Apache Camel
Consuming External Content and Enriching Content with Apache Camel
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
 
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerDoing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters Kluwer
 
Thinking restfully
Thinking restfullyThinking restfully
Thinking restfully
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, Lucidworks
 
NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?
 
Do you need an external search platform for Adobe Experience Manager?
Do you need an external search platform for Adobe Experience Manager?Do you need an external search platform for Adobe Experience Manager?
Do you need an external search platform for Adobe Experience Manager?
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
 
Apache Tika end-to-end
Apache Tika end-to-endApache Tika end-to-end
Apache Tika end-to-end
 
Apache tika
Apache tikaApache tika
Apache tika
 
Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)
 
CARA MEMBUAT REFERENSI DAN SITASI PADA NASKAH
CARA MEMBUAT REFERENSI DAN SITASI PADA NASKAHCARA MEMBUAT REFERENSI DAN SITASI PADA NASKAH
CARA MEMBUAT REFERENSI DAN SITASI PADA NASKAH
 
W3C Web Annotation WG Update (I Annotate 2016)
W3C Web Annotation WG Update (I Annotate 2016)W3C Web Annotation WG Update (I Annotate 2016)
W3C Web Annotation WG Update (I Annotate 2016)
 
What's new with Apache Tika?
What's new with Apache Tika?What's new with Apache Tika?
What's new with Apache Tika?
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
 

Similaire à Ferret A Ruby Search Engine

Ferret
FerretFerret
Ferretkalog
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with LuceneWO Community
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Penny coventry fiddler-spsbe23
Penny coventry fiddler-spsbe23Penny coventry fiddler-spsbe23
Penny coventry fiddler-spsbe23BIWUG
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCampGokulD
 
Domain Specific Development using T4
Domain Specific Development using T4Domain Specific Development using T4
Domain Specific Development using T4Joubin Najmaie
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
Fhir dev days 2017 fhir profiling - overview and introduction v07
Fhir dev days 2017   fhir profiling - overview and introduction v07Fhir dev days 2017   fhir profiling - overview and introduction v07
Fhir dev days 2017 fhir profiling - overview and introduction v07DevDays
 
Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...
Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...
Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...Albert Hoitingh
 
#SEASPC: Information Architecture and Enterprise Search - Better Together
#SEASPC: Information Architecture and Enterprise Search - Better Together#SEASPC: Information Architecture and Enterprise Search - Better Together
#SEASPC: Information Architecture and Enterprise Search - Better TogetherAgnes Molnar
 
Crossref LIVE US Online
Crossref LIVE US OnlineCrossref LIVE US Online
Crossref LIVE US OnlineCrossref
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache StanbolAlkuvoima
 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.netWillem Meints
 
Content Registration - Crossref LIVE Hannover
Content Registration - Crossref LIVE HannoverContent Registration - Crossref LIVE Hannover
Content Registration - Crossref LIVE HannoverCrossref
 
Rapid API Development ArangoDB Foxx
Rapid API Development ArangoDB FoxxRapid API Development ArangoDB Foxx
Rapid API Development ArangoDB FoxxMichael Hackstein
 

Similaire à Ferret A Ruby Search Engine (20)

Ferret
FerretFerret
Ferret
 
Ferret
FerretFerret
Ferret
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Tthornton code4lib
Tthornton code4libTthornton code4lib
Tthornton code4lib
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Penny coventry fiddler-spsbe23
Penny coventry fiddler-spsbe23Penny coventry fiddler-spsbe23
Penny coventry fiddler-spsbe23
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCamp
 
Domain Specific Development using T4
Domain Specific Development using T4Domain Specific Development using T4
Domain Specific Development using T4
 
Apache solr
Apache solrApache solr
Apache solr
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Fhir dev days 2017 fhir profiling - overview and introduction v07
Fhir dev days 2017   fhir profiling - overview and introduction v07Fhir dev days 2017   fhir profiling - overview and introduction v07
Fhir dev days 2017 fhir profiling - overview and introduction v07
 
Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...
Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...
Dutch Information Worker User Group - January 2022 - eDiscovery and Microsoft...
 
Fedora4
Fedora4Fedora4
Fedora4
 
#SEASPC: Information Architecture and Enterprise Search - Better Together
#SEASPC: Information Architecture and Enterprise Search - Better Together#SEASPC: Information Architecture and Enterprise Search - Better Together
#SEASPC: Information Architecture and Enterprise Search - Better Together
 
Crossref LIVE US Online
Crossref LIVE US OnlineCrossref LIVE US Online
Crossref LIVE US Online
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache Stanbol
 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.net
 
Content Registration - Crossref LIVE Hannover
Content Registration - Crossref LIVE HannoverContent Registration - Crossref LIVE Hannover
Content Registration - Crossref LIVE Hannover
 
Rapid API Development ArangoDB Foxx
Rapid API Development ArangoDB FoxxRapid API Development ArangoDB Foxx
Rapid API Development ArangoDB Foxx
 

Plus de elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

Plus de elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Dernier

💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service
💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service
💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort ServiceApsara Of India
 
Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...
Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...
Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...Pooja Nehwal
 
Call Girls In Lajpat Nagar__ 8448079011 __Escort Service In Delhi
Call Girls In Lajpat Nagar__ 8448079011 __Escort Service In DelhiCall Girls In Lajpat Nagar__ 8448079011 __Escort Service In Delhi
Call Girls In Lajpat Nagar__ 8448079011 __Escort Service In DelhiRaviSingh594208
 
UDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICE
UDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICEUDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICE
UDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICEApsara Of India
 
High Class Call Girls in Bangalore 📱9136956627📱
High Class Call Girls in Bangalore 📱9136956627📱High Class Call Girls in Bangalore 📱9136956627📱
High Class Call Girls in Bangalore 📱9136956627📱Pinki Misra
 
Best VIP Call Girls Noida Sector 18 Call Me: 8264348440
Best VIP Call Girls Noida Sector 18 Call Me: 8264348440Best VIP Call Girls Noida Sector 18 Call Me: 8264348440
Best VIP Call Girls Noida Sector 18 Call Me: 8264348440soniya singh
 
💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service
💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service
💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort ServiceApsara Of India
 
Top 10 Makeup Brands in India for women
Top 10  Makeup Brands in India for womenTop 10  Makeup Brands in India for women
Top 10 Makeup Brands in India for womenAkshitaBhatt19
 
Call girls in Vashi Services : 9167673311 Free Delivery 24x7 at Your Doorstep
Call girls in Vashi Services :  9167673311 Free Delivery 24x7 at Your DoorstepCall girls in Vashi Services :  9167673311 Free Delivery 24x7 at Your Doorstep
Call girls in Vashi Services : 9167673311 Free Delivery 24x7 at Your DoorstepPooja Nehwal
 
💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...
💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...
💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...Apsara Of India
 
Hi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura Bhopalpura
Hi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura BhopalpuraHi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura Bhopalpura
Hi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura BhopalpuraApsara Of India
 
Call Girls In Karol Bagh__ 8448079011 Escort Service in Delhi
Call Girls In Karol Bagh__ 8448079011 Escort Service in DelhiCall Girls In Karol Bagh__ 8448079011 Escort Service in Delhi
Call Girls In Karol Bagh__ 8448079011 Escort Service in DelhiRaviSingh594208
 
Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️
Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️
Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️soniya singh
 
💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service
💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service
💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt ServiceApsara Of India
 
Moscow City People project Roman Kurganov
Moscow City People project Roman KurganovMoscow City People project Roman Kurganov
Moscow City People project Roman KurganovRomanKurganov
 
WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)Delhi Call girls
 
Call Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy Girls
Call Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy GirlsCall Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy Girls
Call Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy GirlsPooja Nehwal
 
💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service
💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service
💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort ServiceApsara Of India
 

Dernier (20)

💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service
💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service
💕COD Call Girls In Kurukshetra 08168329307 Pehowa Escort Service
 
Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...
Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...
Call Numbe 9892124323, Vashi call girls, Juhu Call Girls, Powai Call Girls Se...
 
Call Girls In Lajpat Nagar__ 8448079011 __Escort Service In Delhi
Call Girls In Lajpat Nagar__ 8448079011 __Escort Service In DelhiCall Girls In Lajpat Nagar__ 8448079011 __Escort Service In Delhi
Call Girls In Lajpat Nagar__ 8448079011 __Escort Service In Delhi
 
Siegfried Hottelmann: An Opportunistic Migrant, Part 1
Siegfried Hottelmann: An Opportunistic Migrant, Part 1Siegfried Hottelmann: An Opportunistic Migrant, Part 1
Siegfried Hottelmann: An Opportunistic Migrant, Part 1
 
UDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICE
UDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICEUDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICE
UDAIPUR CALL GIRLS 96O287O969 CALL GIRL IN UDAIPUR ESCORT SERVICE
 
High Class Call Girls in Bangalore 📱9136956627📱
High Class Call Girls in Bangalore 📱9136956627📱High Class Call Girls in Bangalore 📱9136956627📱
High Class Call Girls in Bangalore 📱9136956627📱
 
Best VIP Call Girls Noida Sector 18 Call Me: 8264348440
Best VIP Call Girls Noida Sector 18 Call Me: 8264348440Best VIP Call Girls Noida Sector 18 Call Me: 8264348440
Best VIP Call Girls Noida Sector 18 Call Me: 8264348440
 
💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service
💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service
💞5✨ Hotel Karnal Call Girls 08168329307 Noor Mahal Karnal Escort Service
 
Top 10 Makeup Brands in India for women
Top 10  Makeup Brands in India for womenTop 10  Makeup Brands in India for women
Top 10 Makeup Brands in India for women
 
Call girls in Vashi Services : 9167673311 Free Delivery 24x7 at Your Doorstep
Call girls in Vashi Services :  9167673311 Free Delivery 24x7 at Your DoorstepCall girls in Vashi Services :  9167673311 Free Delivery 24x7 at Your Doorstep
Call girls in Vashi Services : 9167673311 Free Delivery 24x7 at Your Doorstep
 
💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...
💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...
💗📲09602870969💕-Royal Escorts in Udaipur Call Girls Service Udaipole-Fateh Sag...
 
Hi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura Bhopalpura
Hi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura BhopalpuraHi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura Bhopalpura
Hi Profile Escorts In Udaipur 09602870969 Call Girls in Sobaghpura Bhopalpura
 
Call Girls In Karol Bagh__ 8448079011 Escort Service in Delhi
Call Girls In Karol Bagh__ 8448079011 Escort Service in DelhiCall Girls In Karol Bagh__ 8448079011 Escort Service in Delhi
Call Girls In Karol Bagh__ 8448079011 Escort Service in Delhi
 
Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️
Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️
Call Girls in mahipalpur Delhi 8264348440 ✅ call girls ❤️
 
💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service
💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service
💞Call Girls In Sonipat 08168329307 Sonipat Kundli GTK Bypass EsCoRt Service
 
Russian Call Girls Rohini Sector 25 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Rohini Sector 25 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Rohini Sector 25 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Rohini Sector 25 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Moscow City People project Roman Kurganov
Moscow City People project Roman KurganovMoscow City People project Roman Kurganov
Moscow City People project Roman Kurganov
 
WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Bhangel Sector 102 ( Noida)
 
Call Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy Girls
Call Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy GirlsCall Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy Girls
Call Girls In Vashi Call Girls Pooja 📞 9892124323 ✅Book Hot And Sexy Girls
 
💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service
💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service
💞Sexy Call Girls In Ambala 08168329307 Shahabad Call Girls Escort Service
 

Ferret A Ruby Search Engine

  • 1. Ferret A Ruby Search Engine Brian Sam-Bodden
  • 2. Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret
  • 3. Agenda • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in you App
  • 4. Agenda • Ferret in Rails • Resources
  • 5. What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired on the Search Engine • Port to Ruby by David Balmain
  • 6. What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)
  • 8. Concepts • Index : Sequence of documents
  • 9. Concepts • Index : Sequence of documents • Document : Sequence of fields
  • 10. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms
  • 11. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
  • 12. Fields of a Document in an Index
  • 13. Fields of a Document in an Index • Fields are individually searchable units that are:
  • 14. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are store
  • 15. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are store • Indexed: Inverted to rapidly find all Documents containing any of the Terms
  • 16. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are store • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed
  • 17. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are store • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
  • 18. It’s all about Indexing • Indexing is the processing of a source document into plain text tokens that Ferret can manipulate • For any non-plaintext sources such as PDF, Word, Excel you need to: • Extract • Analyze
  • 24. Installing Ferret } Pick the latest version for your platform
  • 26. The Recipe 1. Create some Documents
  • 27. The Recipe 1. Create some Documents 2. Create an Index
  • 28. The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index
  • 29. The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index 4. Perform some Queries
  • 30. Example Documents Create some Documents
  • 31. Example Documents Create some Documents “Any String is a Document”
  • 32. Example Documents Create some Documents
  • 33. Example Documents Create some Documents [“This”, “is also”, “a document”]
  • 34. Example Documents Create some Documents
  • 35. Example Documents Create some Documents
  • 36. Ferret::Index::Index Create an Index
  • 37. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class
  • 38. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index
  • 39. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience
  • 40. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent
  • 41. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path = > ‘/somepath’)
  • 42. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path = > ‘/somepath’) • Or, completely in Memory
  • 43. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path = > ‘/somepath’) • Or, completely in Memory ➡ index = Ferret::I.new()
  • 44. Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << “This is a document” ➡ index << {:first => “Bob”, :last => “Smith”}
  • 45. Ferret::Index::Index Perform some Queries
  • 46. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods
  • 47. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and a an optional set of parameters:
  • 48. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and a an optional set of parameters: ➡ search(query, options = {})
  • 49. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and a an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block
  • 50. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and a an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc, score| ... }
  • 53. Ferret Query Language • Ferret own Query Language, FQL is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wild • Field • Fuzz • Boolean
  • 54. Index.explain • The explain method of Index describes how a document score against a query • Very useful for debugging • and for learning how Ferret works
  • 56. Ferret in your App Application Database Web User Manual File System Input Get User’s Present Gather Data Search Results Query Index Documents Search Index Ferret Index
  • 57. Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Kramer
  • 58. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
  • 59. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
  • 60. Ferret in Rails • Simple model has two searchable fields title and body:
  • 61. Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
  • 63. Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?
  • 64. Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs
  • 65. Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret