Search in the Biblical Domain
Brian Seagraves (Bible.org)
What is “Search”?
•   Information/Document Retrieval
•   Basic Definition:
    •   Finding previously indexed documents that are
        related to some user-supplied terms.
•   Advanced Definition:
    •   Finding relevant content for some query by
        understanding the contextual meaning of
        terms in the search index and query.
    •   Semantic Search
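The basic definition can be illustrated with a toy inverted index, the data structure at the core of engines such as Lucene. This is a minimal Python sketch, not how Solr actually stores its index: the verse ids and the simple lowercase tokenizer are assumptions for the example, and multiple query terms are combined with OR.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of documents containing any query term (OR semantics)."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return sorted(hits)

# Illustrative ids; the texts are Genesis 1:1 (NET) and John 11:35.
docs = {
    1: "In the beginning God created the heavens and the earth.",
    2: "Jesus wept.",
}
index = build_index(docs)
print(search(index, "God wept"))  # [1, 2]
```

The "advanced" definition goes beyond this exact-term lookup; the later slides on synonyms, multiple translations and tagging are ways of getting closer to it.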
Types and Sources of Content

• The Bible and its verses
• Articles, Journals, and other extra-biblical
  content
• The web
Information Retrieval Engines
• Sphinx - http://sphinxsearch.com
• Lucene - http://lucene.apache.org/
 • Solr - http://lucene.apache.org/solr/
• MySQL Fulltext Search - to a limited degree
Solr
• Open Source
• Full-text search
• Hit Highlighting
• Facets
• Written in Java
• REST-like HTTP/XML and JSON APIs
Solr Documents

• A document represents a distinct piece of
  content that can be stored/retrieved
 • Bible Verse
 • Journal Article
 • Commentary Chapter/Section
 • Web Page
Solr Documents
•   Documents have one or more Fields
•   Fields have types
     •   Integer
     •   Float
     •   String
     •   Text
     •   Date
     •   and More!
Solr Fields

• Field Types can have:
 • Tokenizers
    • Split content into chunks/tokens
 • Filters
    • Remove or transform tokens (e.g., drop stop
      words, lowercase, stem)
Solr Fields
• The “String” Field Type
• <fieldType
  name="string"
  class="solr.StrField" />
• No Filter; No Tokenizer
 • Field content won’t be split or changed
Sample Schema
<fieldtype name="html_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <!-- SynonymFilterFactory requires a synonyms file; synonyms.txt is an assumed name -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
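To see roughly what the html_text index chain does to a verse, here is a Python approximation of it, plus a string-typed field for contrast. Each function only mimics its Solr counterpart: the stop-word list is a small assumed set, the word-delimiter step just splits on punctuation, and `toy_stem` strips a plural "s" instead of running the Porter algorithm. RemoveDuplicatesTokenFilterFactory is left out because it only drops duplicate tokens at the same position, which this simplified chain never produces.

```python
import re

# Assumed, tiny stop-word list (Solr's default English list is longer).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}

def strip_html(text):
    """Crude stand-in for HTML stripping: replace anything tag-like with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def stop_filter(tokens):
    """Drop stop words; case-sensitive, like StopFilterFactory's default."""
    return [t for t in tokens if t not in STOP_WORDS]

def word_delimiter(tokens):
    """Simplified word delimiter: split tokens on non-alphanumeric characters."""
    parts = []
    for token in tokens:
        parts.extend(p for p in re.split(r"[^A-Za-z0-9]+", token) if p)
    return parts

def lowercase(tokens):
    return [t.lower() for t in tokens]

def toy_stem(tokens):
    """NOT Porter stemming: only strips a plural 's' (but not 'ss')."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
            for t in tokens]

def analyze_html_text(text):
    """Approximate the html_text index analyzer: tokenizer first, then filters in order."""
    tokens = strip_html(text).split()  # HTML strip + whitespace tokenizer
    for step in (stop_filter, word_delimiter, lowercase, toy_stem):
        tokens = step(tokens)
    return tokens

def analyze_string(value):
    """A solr.StrField value is indexed as one unchanged term."""
    return [value]

print(analyze_html_text("<p>In the beginning God created the heavens and the earth.</p>"))
# ['in', 'beginning', 'god', 'created', 'heaven', 'earth']
print(analyze_string("John 3:16"))  # ['John 3:16']
```

Because the stop filter here is case-sensitive and runs before lowercasing, the capitalized "In" survives while "the" is dropped. The same ordering effect is worth checking in the real chain; StopFilterFactory also accepts ignoreCase="true".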
Sample Schema (cont.)
<fieldtype
 name="sint"
 class="solr.SortableIntField"
 omitNorms="true" />
<fieldtype
 name="string"
 class="solr.StrField"
 sortMissingLast="true"
 omitNorms="true"/>
Sample Schema (cont.)
<fields>
  <field name="id" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="abbr" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="book" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="chapter" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="verse" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="ot_nt" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="net" type="text" indexed="false" stored="true" multiValued="false" />
  <field name="all_index" type="html_text" indexed="true" stored="false" />
</fields>

<copyField source="net" dest="all_index" />
<uniqueKey>id</uniqueKey>
<defaultSearchField>all_index</defaultSearchField>
<solrQueryParser defaultOperator="OR" />
Put Data in Solr
• Remember, Solr communicates using XML
  over HTTP
• No in-place (partial) updates - re-add the whole
  document; it replaces any document with the same uniqueKey
• To add, POST XML to update handler
 • http://localhost:8080/solr/bible/update
Add XML
<add>
 <doc>
   <field name="id">1</field>
   <field name="net">In the beginning God created the heavens and the earth.</field>
 </doc>
</add>
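Building this XML by hand gets error-prone once text contains characters such as & or <. A minimal Python sketch using only the standard library: it generates the `<field name="...">` form that Solr's XML update handler expects, escapes text automatically, and shows where a separate `<commit/>` message fits, since added documents are typically not searchable until a commit (unless autoCommit is configured). `build_add_xml` and `post_xml` are illustrative helper names, not part of any Solr client library, and posting needs a running Solr at the slide's URL.

```python
import urllib.request
import xml.etree.ElementTree as ET

UPDATE_URL = "http://localhost:8080/solr/bible/update"  # URL from the slide

def build_add_xml(docs):
    """Build a Solr <add> message; each doc is a dict of field name -> value."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # ElementTree escapes &, < and > in the text for us.
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")

def post_xml(body, url=UPDATE_URL):
    """POST an XML message to the update handler and return the raw response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()

message = build_add_xml([{"id": 1, "net": "In the beginning God created the heavens and the earth."}])
# With Solr running: post_xml(message), then post_xml("<commit/>")
```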
PHP API
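• No XML!
• $options = array('hostname' => 'localhost',
                   'port'     => 8080,
                   'path'     => '/solr/bible'); // from the update URL
  $client = new SolrClient($options);
  $doc = new SolrInputDocument();
  $doc->addField('id', 1); // id is a sint field: must be an integer
  $doc->addField('net', 'In the beginning God created the heavens and the earth.');
  $client->addDocument($doc);
  $client->commit(); // make the new document searchable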
Querying Solr

• HTTP GET Request
• http://localhost:8080/solr/bible3/select?q=god
• Path to Solr: http://localhost:8080/solr | Core: bible3 | Handler: select | Query: q=god
•   Returns XML by default

•   Can return JSON and more (e.g., wt=json)
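A small Python helper makes the URL anatomy explicit and takes care of encoding. It is a sketch: `select_url` is a hypothetical helper name, while `q` and `wt` are the standard Solr request parameters for the query and the response format.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8080/solr"  # path to Solr, as on the slide

def select_url(core, query, **params):
    """Build a GET URL for a core's select handler; extra params (e.g. wt) are appended."""
    query_string = urlencode({"q": query, **params})
    return f"{SOLR_BASE}/{core}/select?{query_string}"

print(select_url("bible3", "god"))             # http://localhost:8080/solr/bible3/select?q=god
print(select_url("bible3", "god", wt="json"))  # http://localhost:8080/solr/bible3/select?q=god&wt=json
```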
Querying Solr

•   Queries the defaultSearchField by default

    •   <defaultSearchField>all_index</defaultSearchField>

•   Can query other fields by using the syntax field:value

    •   http://localhost:8080/solr/bible3/select?q=id:27974

•   Multiple queries / Booleans
    •   http://localhost:8080/solr/bible3/select?q=god%20AND%20book:40
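Spaces and colons in a query should be URL-encoded before they go into a GET request, and with `defaultOperator="OR"` in the schema an explicit AND matters: "god book:40" would match verses satisfying either clause. A minimal Python sketch of composing field-qualified boolean queries; `field_clause` and `all_of` are hypothetical helpers, and the quoting only handles spaces and double quotes, not every character the query parser treats as special.

```python
from urllib.parse import urlencode

def field_clause(field, value):
    """Return a field:value clause, quoting the value when it contains spaces."""
    if " " in value:
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"

def all_of(*clauses):
    """Join clauses with an explicit AND (the schema's default operator is OR)."""
    return " AND ".join(clauses)

q = all_of("god", field_clause("book", "40"))
print(q)                    # god AND book:40
print(urlencode({"q": q}))  # q=god+AND+book%3A40
```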
Search Multiple Translations (Fields)
•   Let’s add some fields: kjv and kjv_index

•   Add some copy field directives:
    <copyField source="kjv" dest="all_index" />
    <copyField source="kjv" dest="kjv_index" />

•   Query: “Shew Thyself”

    •   0 Results in the NET
        http://localhost:8080/solr/bible3/select?q=shew%20theyself
    •   360 Results in the Combined index/field
        http://localhost:8080/solr/bible4/select?q=shew%20theyself
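The effect of the copyField directives can be sketched in Python: each source field's raw value is appended to its destination fields at index time, so a KJV-only word such as "shew" becomes findable through the combined field. The verse texts below are made-up placeholders, not quotations from the NET or KJV, and real matching would go through the html_text analyzer rather than this simple word check.

```python
import re

# copyField directives from the slides, as (source, dest) pairs.
COPY_FIELDS = [("net", "all_index"), ("kjv", "all_index"), ("kjv", "kjv_index")]

def as_list(value):
    return value if isinstance(value, list) else [value]

def apply_copy_fields(doc, copy_fields=COPY_FIELDS):
    """Return a new doc where each source field's values are appended to its dest field.
    Values are copied from the original doc, so copies never chain."""
    out = {name: list(as_list(value)) for name, value in doc.items()}
    for source, dest in copy_fields:
        if source in doc:
            out.setdefault(dest, []).extend(as_list(doc[source]))
    return out

def field_contains(doc, field, term):
    """True if any value of the field contains the term as a whole word (case-insensitive)."""
    return any(term.lower() in re.findall(r"[a-z]+", v.lower()) for v in doc.get(field, []))

# Made-up sample wording, not quoted from either translation.
verse = {"id": 1, "net": "Go, present yourself", "kjv": "Go, shew thyself"}
indexed = apply_copy_fields(verse)
print(field_contains(indexed, "net", "shew"))        # False
print(field_contains(indexed, "all_index", "shew"))  # True
```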
Search Multiple Translations
• + Quasi-synonym term/phrase injection
• + Less variation across translations leads to stronger
  possible matches
• + Matches verses when the source translation isn’t
  known
• - No control over which translation gets more weight
• - No control over scoring of matches
Search Multiple Translations
•   Another way: Dismax
•   Can score a document (verse) match based on scores/matches
    from multiple fields.
•   net_index^1 kjv_index^1
    •   Not exponents - weights
    •   We’re searching the net_index and kjv_index fields, each with
        a boost/weight of 1.
•   net_index^6 kjv_index^.5
    •   NET matches are weighted 12× relative to KJV matches (6 vs. .5)
•   tie=.1: for each query term, a verse gets its best field score
    plus 0.1 × its other field scores
•   http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score

•   http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
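The tie parameter is easiest to see with numbers. A minimal Python sketch of how dismax combines per-field scores for a query term; the raw field scores are invented inputs, and the sketch treats a qf boost as a plain multiplier on the field score while ignoring queryNorm, coord and the mm (minimum match) parameter, so real Solr scores will differ.

```python
def dismax_term_score(field_scores, boosts, tie):
    """Score one query term across fields like a DisjunctionMaxQuery:
    the best boosted field score plus `tie` times the sum of the other boosted scores."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)

def dismax_score(per_term_field_scores, boosts, tie):
    """Add up the per-term scores (one disjunction-max clause per query term)."""
    return sum(dismax_term_score(fs, boosts, tie) for fs in per_term_field_scores)

# Hypothetical raw field scores for the term "respect" in one verse.
respect = {"net_index": 2.0, "kjv_index": 1.0}
print(dismax_term_score(respect, {"net_index": 1, "kjv_index": 1}, tie=0.1))    # about 2.1
print(dismax_term_score(respect, {"net_index": 6, "kjv_index": 0.5}, tie=0.1))  # about 12.05
```

With tie=0 only the best field counts (a pure "max"); with tie=1 the fields are simply summed, so small tie values favor verses that match well in one translation while still rewarding agreement across translations.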
Scoring
•   $$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^{2} \cdot \mathrm{norm}(t,d) \right)$$
•   Basic Factors
    •   Term frequency in the document (↑ is better)
    •   Number of documents in the corpus containing the term (↓ is better)
    •   Length of the matching field (↓ is better)
        •   “Jesus Wept” - John 11:35
        •   http://localhost:8080/solr/bible3/select?q=wept
•   http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
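The factors in the formula can be computed directly. A minimal Python sketch of the classic Lucene DefaultSimilarity choices (tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), length norm = 1/√(number of terms), coord = matched terms / query terms, queryNorm = 1/√(sum of squared idf weights)); it ignores boosts and the lossy one-byte encoding Lucene uses for norms, and the corpus is a toy in which only the first "verse" is real text (John 11:35).

```python
import math

def tf(freq):
    """DefaultSimilarity term frequency: square root of the raw count."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """DefaultSimilarity inverse document frequency: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    """Shorter fields score higher; index-time boosts and byte encoding are ignored."""
    return 1.0 / math.sqrt(num_terms)

def classic_score(query_terms, doc, corpus):
    """Score one document (a list of tokens) against a query, following the formula above."""
    terms = set(query_terms)
    idfs = {t: idf(sum(1 for d in corpus if t in d), len(corpus)) for t in terms}
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    matched = [t for t in terms if t in doc]
    coord = len(matched) / len(terms)
    total = sum(tf(doc.count(t)) * idfs[t] ** 2 * length_norm(len(doc)) for t in matched)
    return coord * query_norm * total

# Toy corpus of tokenized "verses"; the second and third are invented filler.
corpus = [["jesus", "wept"],
          ["and", "the", "people", "wept", "for", "him", "many", "days"],
          ["in", "the", "beginning"]]
for doc in corpus:
    print(round(classic_score(["wept"], doc, corpus), 3))  # prints 0.707, then 0.354, then 0.0
```

Both matching documents contain "wept" once, so the two-word verse wins purely on the length norm, which is the "Jesus Wept" effect the slide demonstrates.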
Topic Tagging
• Use a topically-tagged Bible/concordance to mark up
  each verse, or just key verses
• Helpful for “theme” based queries.
 • “Social Justice” - the phrase itself finds no good matches
 • “Satan” - Many Names
   • Name Tagging in general can be very helpful
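Name tagging can be sketched as an alias table applied at index time, with the resulting tags stored in a multivalued field so that a query for the tag matches every alias. The alias table, the `person:satan` tag format and the sample texts are assumptions for illustration; a real system would take its tags from a tagged Bible or concordance, which can also handle cases a word list cannot, such as one name referring to different people in different passages.

```python
import re

# Hypothetical alias table; a real one would come from a name/topic-tagged Bible.
NAME_TAGS = {
    "satan": "person:satan",
    "devil": "person:satan",
}

def tags_for_text(text, aliases=NAME_TAGS):
    """Return the distinct tags triggered by words in a verse (done at index time)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({aliases[w] for w in words if w in aliases})

def verses_with_tag(tagged_verses, tag):
    """Return ids of verses carrying a tag; tagged_verses maps verse id -> list of tags."""
    return sorted(vid for vid, tags in tagged_verses.items() if tag in tags)

# Made-up sample texts keyed by illustrative verse ids.
texts = {1: "the devil tempted him", 2: "Satan stood up", 3: "Jesus wept"}
tagged = {vid: tags_for_text(text) for vid, text in texts.items()}
print(verses_with_tag(tagged, "person:satan"))  # [1, 2]
```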
Searching Strong’s

• Add a field for Strong’s: strongs_index
•   Example value (one verse, space-separated):
    1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756
    2316 3498 235 2198

• Most of the benefits of text searching
 • “Word” frequency
 • Document vs. corpus frequency of search terms
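Because the field is just space-separated numbers, a whitespace tokenizer is enough, and the usual term-frequency and document-frequency statistics fall out directly. This Python sketch uses the example value above; the helper names are hypothetical. If Hebrew and Greek numbers share one field, prefixing them (for example H2316 vs. G2316, an assumed convention) would keep identical numbers from the two lexicons from being counted as the same term.

```python
from collections import Counter

def strongs_counts(field_value):
    """Whitespace-tokenize a strongs_index value and count each Strong's number (tf)."""
    return Counter(field_value.split())

def strongs_doc_freq(verses, number):
    """How many verses contain the Strong's number at least once (df)."""
    return sum(1 for value in verses if number in value.split())

verse = "1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198"
counts = strongs_counts(verse)
print(counts["2316"], counts["1510"])  # 4 2
```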
Searching Articles
• Similar approach to text-based queries
 • Stem words
 • Use Synonyms
 • Remove Stop Words
• Without manual tagging, there’s no automatic way
  to index/search by Bible Reference
Searching Articles

• Article contains reference: “John 3”
• User searches for “John 3:16” or “John 2-4”
• Results: no meaningful matches; at most,
  documents that merely contain the term “John”
Searching Articles
• Solr-based Solutions:
 • Identify and index references and their
    constituent verses using a grammar.
 • John 1:1-3 -> John 1:1; John 1:2; John 1:3
 • Store in a multivalued field - each
    reference is a “term”
 • Must also parse and expand references in
    queries in order to match
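A minimal Python sketch of the expansion step, using a regular expression in place of a full reference grammar. It handles "Book C", "Book C:V" and "Book C:V-W" within one chapter; chapter ranges such as "John 2-4", cross-chapter ranges and book-name abbreviations are left out. Expanding a whole chapter needs verse counts, so `VERSE_COUNTS` is a hypothetical, partial table. Since each expanded reference must be indexed as a single term, the target field would be an untokenized (string-typed) multivalued field.

```python
import re

# "Book C", "Book C:V" or "Book C:V-W" within one chapter; numbered books like "1 John" allowed.
REF_RE = re.compile(r"^\s*((?:[1-3]\s+)?[A-Za-z]+)\s+(\d+)(?::(\d+)(?:-(\d+))?)?\s*$")

# Hypothetical, partial verse-count table needed to expand whole-chapter references.
VERSE_COUNTS = {("John", 1): 51, ("John", 3): 36}

def expand_reference(ref, verse_counts=VERSE_COUNTS):
    """Expand one reference into the individual verse terms it covers."""
    match = REF_RE.match(ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    book, chapter = match.group(1), int(match.group(2))
    if match.group(3) is None:  # whole chapter
        first, last = 1, verse_counts[(book, chapter)]
    else:
        first = int(match.group(3))
        last = int(match.group(4)) if match.group(4) else first
    return [f"{book} {chapter}:{verse}" for verse in range(first, last + 1)]

article_terms = set(expand_reference("John 3"))   # indexed with the article
query_terms = set(expand_reference("John 3:16"))  # parsed from the user's query
print(expand_reference("John 1:1-3"))  # ['John 1:1', 'John 1:2', 'John 1:3']
print(query_terms & article_terms)     # {'John 3:16'}
```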
Searching Articles
•   Relational database-based solution:
    •   Assign an id to every verse
    •   Store: id, articleId, verseId
    •   Parse user query to ids.
    •   SELECT articleId, COUNT(id) AS hits
        FROM article_verse -- hypothetical table name
        WHERE verseId IN (ID_LIST)
        GROUP BY articleId
        ORDER BY hits DESC
        •   Higher count -> Article is more likely to be about
            that reference than articles with a lower count
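The relational approach can be tried end to end with SQLite from Python's standard library. In this sketch the column names follow the slide, but the table name `article_verse`, the verse ids and the sample rows are invented, and SQLite stands in for whatever database is actually used. Parameter placeholders keep the parsed id list out of the SQL text.

```python
import sqlite3

def make_demo_db():
    """In-memory database with one row per verse occurrence (invented sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article_verse ("
                 "id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER)")
    occurrences = [(10, 101), (10, 101), (10, 102), (20, 101), (30, 500)]
    conn.executemany("INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)",
                     occurrences)
    return conn

def rank_articles(conn, verse_ids):
    """Count matching verse occurrences per article, highest count first."""
    placeholders = ",".join("?" * len(verse_ids))
    sql = ("SELECT articleId, COUNT(id) AS hits FROM article_verse "
           f"WHERE verseId IN ({placeholders}) "
           "GROUP BY articleId ORDER BY hits DESC, articleId")
    return conn.execute(sql, list(verse_ids)).fetchall()

conn = make_demo_db()
print(rank_articles(conn, [101, 102]))  # [(10, 3), (20, 1)]
```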
Searching Articles
• Relational database-based solution:
 • Large number of rows.
 • 15,000 Journal articles have > 9,000,000 rows
     (verse occurrences)
 •   Can store id, articleId, verseId, count
     • Then SUM() the counts for each articleId.
     • Negligibly faster.
     • Only approx. 3,000,000 rows
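The pre-aggregated variant can be sketched the same way: collapse the occurrence rows into one row per (article, verse) with a count, then SUM() the counts per article. Table names and sample rows are invented; the slide's "count" column is named `occurrences` here only to keep it visually distinct from the COUNT() function.

```python
import sqlite3

def make_demo_db():
    """In-memory database with one row per verse occurrence (invented sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article_verse ("
                 "id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER)")
    conn.executemany("INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)",
                     [(10, 101), (10, 101), (10, 102), (20, 101), (30, 500)])
    return conn

def build_counts(conn):
    """Collapse occurrences into one row per (article, verse) with an occurrence count."""
    conn.execute("CREATE TABLE article_verse_count (id INTEGER PRIMARY KEY, "
                 "articleId INTEGER, verseId INTEGER, occurrences INTEGER)")
    conn.execute("INSERT INTO article_verse_count (articleId, verseId, occurrences) "
                 "SELECT articleId, verseId, COUNT(*) FROM article_verse "
                 "GROUP BY articleId, verseId")

def rank_articles(conn, verse_ids):
    """Sum the stored counts per article, highest total first."""
    placeholders = ",".join("?" * len(verse_ids))
    sql = ("SELECT articleId, SUM(occurrences) AS hits FROM article_verse_count "
           f"WHERE verseId IN ({placeholders}) "
           "GROUP BY articleId ORDER BY hits DESC, articleId")
    return conn.execute(sql, list(verse_ids)).fetchall()

conn = make_demo_db()
build_counts(conn)
print(rank_articles(conn, [101, 102]))  # [(10, 3), (20, 1)]
```

The ranking is identical to the one-row-per-occurrence version; only the row count shrinks, which matches the slide's observation that the speed gain is small.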
Heterogeneous Indexes
•   Not all content is created equal.
•   Content quality, and its effect on the quality of
    your results, becomes a factor when you move
    from one resource to more than one
    •   One Bible, One website, One Journal
•   Apply a field or document boost to help
    normalize results
•   Some content gets bumped up and some down
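One way to apply a document boost is at index time, through the `boost` attribute on `<doc>` in the XML update message, which Solr versions of this talk's era accept. Boosts are folded into field norms, so they have no effect on fields declared with omitNorms="true" (such as the sint and string types in the sample schema), and later Solr releases dropped index-time boosts in favor of query-time boosting. In this Python sketch the `source` field and the boost values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source boosts; real values would be tuned by inspecting results.
SOURCE_BOOSTS = {"bible": 2.0, "journal": 1.0, "website": 0.5}

def build_boosted_add(docs, boosts=SOURCE_BOOSTS):
    """Build an <add> message with an index-time boost on each <doc>, chosen by its source."""
    add = ET.Element("add")
    for fields in docs:
        boost = boosts.get(fields.get("source"), 1.0)  # unknown sources stay neutral
        doc = ET.SubElement(add, "doc", boost=str(boost))
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")

message = build_boosted_add([{"id": 7, "source": "website"}])
print(message)
# <add><doc boost="0.5"><field name="id">7</field><field name="source">website</field></doc></add>
```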
Search in the Biblical Domain - BibleTech: 2011
Exploring Multimodal Embeddings with Milvus
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Search in the Biblical Domain - BibleTech: 2011

  • 1. Search in the Biblical Domain Brian Seagraves (Bible.org)
  • 8. What is “Search”?
      • Information/Document Retrieval
      • Basic Definition:
          • Finding previously seen documents that are related to some user-supplied terms.
      • Advanced Definition:
          • Finding relevant content for some query by understanding the contextual meaning of terms in the search index and query.
          • Semantic Search
  • 12. Types and Sources of Content
      • The Bible and its verses
      • Articles, journals, and other extra-biblical content
      • The web
  • 17. Information Retrieval Engines
      • Sphinx - http://sphinxsearch.com
      • Lucene - http://lucene.apache.org/
      • Solr - http://lucene.apache.org/solr/
      • MySQL Fulltext Search (with limitations)
  • 24. Solr
      • Open Source
      • Full-text search
      • Hit Highlighting
      • Facets
      • Java
      • REST-like HTTP/XML and JSON APIs
  • 30. Solr Documents
      • A document represents a distinct piece of content that can be stored/retrieved, e.g.:
          • Bible Verse
          • Journal Article
          • Commentary Chapter/Section
          • Web Page
  • 39. Solr Documents
      • Documents have one or more Fields
      • Fields have types:
          • Integer
          • Float
          • String
          • Text
          • Date
          • and more!
  • 45. Solr Fields
      • Field Types can have:
          • Filters
              • Remove or transform tokens after tokenization (e.g. drop stop words, lowercase, stem)
          • Tokenizers
              • Split content into chunks/tokens
  • 50. Solr Fields
      • The “String” Field Type
          • <fieldType name="string" class="solr.StrField" />
          • No Filter; No Tokenizer
          • Field content won’t be split or changed
  • 51. Sample Schema
      <fieldtype name="html_text" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
          <!-- SynonymFilterFactory needs a synonyms file; "synonyms.txt" is an assumed name -->
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldtype>
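A toy Python sketch of what the index-time analyzer above does to a verse, step by step. The stop-word list, the punctuation split and the suffix stripper are simplified stand-ins invented for illustration (the suffix stripper is not the Porter algorithm). It keeps the schema's filter order, so a capitalized stop word such as "In" survives: Solr's StopFilterFactory is case-sensitive unless ignoreCase="true" is set, and here it runs before LowerCaseFilterFactory. RemoveDuplicatesTokenFilterFactory is left out because it only drops identical tokens at the same position, which this toy pipeline never produces.

```python
import re

# Hypothetical stop-word list; Solr ships its own English list.
STOP_WORDS = {"the", "and", "in", "of"}


def html_strip_whitespace_tokenize(text):
    """Stand-in for HTMLStripWhitespaceTokenizerFactory: drop tags, split on whitespace."""
    return re.sub(r"<[^>]+>", " ", text).split()


def stop_filter(tokens):
    """Case-sensitive, like StopFilterFactory without ignoreCase="true"."""
    return [t for t in tokens if t not in STOP_WORDS]


def word_delimiter(tokens):
    """Rough stand-in for WordDelimiterFilterFactory: keep alphanumeric runs."""
    return [part for t in tokens for part in re.findall(r"[A-Za-z0-9]+", t)]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def crude_stem(token):
    """Toy suffix stripper standing in for EnglishPorterFilterFactory (NOT Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze_for_index(text):
    """Apply the steps in the same order as the index analyzer in the schema."""
    tokens = html_strip_whitespace_tokenize(text)
    tokens = stop_filter(tokens)
    tokens = word_delimiter(tokens)
    tokens = lowercase(tokens)
    return [crude_stem(t) for t in tokens]


def analyze_as_string_field(text):
    """A StrField, by contrast, indexes the whole value as one unchanged term."""
    return [text]


print(analyze_for_index("In the beginning God created the heavens and the earth."))
# ['in', 'beginn', 'god', 'creat', 'heaven', 'earth']
```

The same verse indexed into a "string" field would stay a single term, which is why StrField suits ids and abbreviations but not free-text search.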
  • 52. Sample Schema (cont.)
      <fieldtype name="sint" class="solr.SortableIntField" omitNorms="true" />
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  • 53. Sample Schema (cont.)
      <fields>
        <field name="id" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="abbr" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="book" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="chapter" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="verse" type="sint" indexed="true" stored="true" multiValued="false" />
        <field name="ot_nt" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="net" type="text" indexed="false" stored="true" multiValued="false" />
        <field name="all_index" type="html_text" indexed="true" stored="false" />
      </fields>
      <copyField source="net" dest="all_index" />
      <uniqueKey>id</uniqueKey>
      <defaultSearchField>all_index</defaultSearchField>
      <solrQueryParser defaultOperator="OR" />
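  • 58. Put Data in Solr
      • Remember, Solr communicates using XML over HTTP
      • No partial updates: to change a document, add the whole document again; an add with the same uniqueKey (id) replaces the old one
      • To add, POST XML to the update handler:
          • http://localhost:8080/solr/bible/update
      • Then POST <commit/> so the changes become searchable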
  • 59. Add XML
      <add>
        <doc>
          <field name="id">1</field>
          <field name="net">In the beginning God created the heavens and the earth.</field>
        </doc>
      </add>
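Solr's XML update format puts each value in a <field name="..."> element inside <doc>. A small Python sketch that builds such an <add> message with the standard library, so characters like & are escaped automatically; build_add_xml is a hypothetical helper, and actually POSTing the body to the update handler (and sending <commit/>) is left out so the sketch runs without a server.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Return a Solr <add> message for a list of {field_name: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field_el = ET.SubElement(doc_el, "field", name=name)
            field_el.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


body = build_add_xml(
    [{"id": 1, "net": "In the beginning God created the heavens and the earth."}]
)
print(body)
```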
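  • 60. PHP API
      • No hand-written XML!
      $options = array('hostname' => 'localhost', 'port' => 8080, 'path' => 'solr/bible'); // assumed connection settings
      $client = new SolrClient($options);
      $doc = new SolrInputDocument();
      $doc->addField('id', 1); // the schema declares id as an integer (sint)
      $doc->addField('net', 'In the beginning God created the heavens and the earth.');
      $client->addDocument($doc);
      $client->commit(); // make the new document searchable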
  • 66. Querying Solr
      • HTTP GET Request
          • http://localhost:8080/solr/bible3/select?q=god
          • Path to Solr: http://localhost:8080/solr; Core: bible3; Handler: select; Query: q=god
      • Returns XML by default
      • Can return JSON and more (via the wt parameter, e.g. wt=json)
  • 73. Querying Solr
      • Queries the defaultSearchField by default
          • <defaultSearchField>all_index</defaultSearchField>
      • Can query other fields using the syntax field:value
          • http://localhost:8080/solr/bible3/select?q=id:27974
      • Multiple queries / Booleans
          • http://localhost:8080/solr/bible3/select?q=god%20AND%20book:40
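When queries are built in code, spaces and colons in q need URL encoding. A minimal Python sketch using urllib.parse.urlencode; the base URL and core name follow the slides, and select_url is a hypothetical helper. Solr decodes + as a space and %3A as a colon, so these URLs carry the same queries as the slide's examples.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8080/solr"  # host/port as used in the slides


def select_url(core, query, **params):
    """Build a select URL for a core; extra params (e.g. wt="json") are appended."""
    qs = urlencode({"q": query, **params})
    return f"{SOLR_BASE}/{core}/select?{qs}"


print(select_url("bible3", "god AND book:40", wt="json"))
# http://localhost:8080/solr/bible3/select?q=god+AND+book%3A40&wt=json
```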
  • 79. Search Multiple Translations (Fields)
      • Let’s add some fields: kjv and kjv_index
      • Add some copyField directives:
          <copyField source="kjv" dest="all_index" />
          <copyField source="kjv" dest="kjv_index" />
      • Query: “Shew Thyself”
          • 0 results in the NET: http://localhost:8080/solr/bible3/select?q=shew%20theyself
          • 360 results in the combined index/field: http://localhost:8080/solr/bible4/select?q=shew%20theyself
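The copyField idea can be modeled with a toy in-memory index in Python: each destination field receives the text of every source copied into it, and a query (OR semantics, as with defaultOperator="OR") matches a document if any query term appears in the searched field. The documents, the net_index destination and the text fragments are made up for illustration; real Solr would also run each field's analyzer.

```python
# Made-up documents: verse id -> stored translation fields.
DOCS = {
    1: {"net": "show yourself to the world", "kjv": "shew thyself to the world"},
    2: {"net": "he went up to the feast", "kjv": "he went up unto the feast"},
}

# Hypothetical copyField directives: source field -> destination fields.
COPY_FIELDS = {"net": ["net_index", "all_index"], "kjv": ["kjv_index", "all_index"]}


def apply_copy_fields(doc, copy_fields):
    """Append each source field's text to its destination fields."""
    out = dict(doc)
    for source, dests in copy_fields.items():
        if source not in doc:
            continue
        for dest in dests:
            out[dest] = (out.get(dest, "") + " " + doc[source]).strip()
    return out


def search(docs, field, query):
    """Return ids of docs where any query term occurs in the given field (OR)."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if any(term in doc.get(field, "").lower().split() for term in terms)
    )


INDEXED = {doc_id: apply_copy_fields(doc, COPY_FIELDS) for doc_id, doc in DOCS.items()}
print(search(INDEXED, "net_index", "shew thyself"))  # []
print(search(INDEXED, "all_index", "shew thyself"))  # [1]
```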
  • 85. Search Multiple Translations
      • + Quasi-synonym term/phrase injection (other translations' wording acts like synonyms)
      • + Less variation across translations leads to stronger possible matches
      • + Matches verses when the source translation isn’t known
      • - No control over which translation gets more weight
      • - No control over scoring of matches
  • 86. Search Multiple Translations
      • Another way: Dismax
      • Can score a document (verse) match based on scores/matches from multiple fields.
          • net_index^1 kjv_index^1
              • The ^ values are weights (boosts), not exponents: here the net_index and kjv_index fields are each searched with a boost of 1.
          • net_index^6 kjv_index^.5
      • tie: a verse's score is its best field score plus tie × the scores of its other matching fields
      • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score
      • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
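The tie parameter controls how dismax combines per-field scores: a document gets its highest field score plus tie times the sum of its other field scores, so tie=0 keeps only the best field and tie=1 adds them all. A simplified Python sketch of that combination; the per-field scores are made-up numbers, and multiplying each field's score by its qf boost only approximates how Lucene applies boosts (they also pass through query normalization).

```python
def dismax_score(field_scores, boosts, tie):
    """Combine per-field scores the way a disjunction-max query does (simplified)."""
    boosted = [
        score * boosts[field] for field, score in field_scores.items() if field in boosts
    ]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


# Made-up raw scores for one verse in each field.
verse_scores = {"net_index": 0.8, "kjv_index": 1.2}

print(dismax_score(verse_scores, {"net_index": 1, "kjv_index": 1}, tie=0.1))
print(dismax_score(verse_scores, {"net_index": 6, "kjv_index": 0.5}, tie=0.1))
```

The first call gives about 1.28 (the KJV match leads), the second about 4.86: boosting net_index lets a NET match dominate even though its raw score was lower.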
  • 95. Scoring
      • $\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t,d) \right)$
      • Basic Factors
          • Term frequency in a document (↑ is better)
          • Term frequency in the corpus, i.e. how many documents contain the term (↓ is better)
          • Length of the matching document (↓ is better)
              • “Jesus Wept” - John 11:35
              • http://localhost:8080/solr/bible3/select?q=wept
      • http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
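The formula's factors can be computed by hand. A Python sketch using the classic Lucene DefaultSimilarity definitions: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = lengthNorm = 1 / sqrt(numTerms), coord = matched terms / query terms, queryNorm = 1 / sqrt(sum of squared idf). Query and field boosts are ignored, and real Lucene stores the norm in a single lossy byte, so actual scores differ slightly. The three token lists are made up for illustration.

```python
import math

# Made-up, already-analyzed token lists standing in for three verses.
SHORT = ["jesus", "wept"]
LONG = ["and", "he", "wept", "over", "the", "city", "and", "said"]
OTHER = ["in", "the", "beginning"]
CORPUS = [SHORT, LONG, OTHER]


def tf(freq):
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_terms, corpus):
    """coord * queryNorm * sum over matched terms of tf * idf^2 * norm (no boosts)."""
    idfs = {
        t: idf(sum(1 for d in corpus if t in d), len(corpus)) for t in query_terms
    }
    query_norm = 1.0 / math.sqrt(sum(v * v for v in idfs.values()))
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)
    norm = length_norm(len(doc_terms))
    total = sum(tf(doc_terms.count(t)) * idfs[t] ** 2 * norm for t in matched)
    return coord * query_norm * total


print(score(["wept"], SHORT, CORPUS))  # shorter verse: higher score
print(score(["wept"], LONG, CORPUS))
```

Both verses contain "wept" once, so tf and idf are equal; only the length norm separates them, which is why a two-word verse like “Jesus wept” ranks first.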
  • 110. Topic Tagging
      • Use a topically tagged Bible/concordance to mark up each verse, or just key verses
      • Helpful for theme-based queries:
          • “Social Justice” - plain text search finds no good matches
          • “Satan” - referred to by many names
      • Name tagging in general can be very helpful
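Topic tags can be indexed like any other multivalued field and looked up by a normalized topic name. A Python sketch with made-up verse ids, tags and aliases (TOPIC_ALIASES and both helpers are hypothetical); in Solr the tags would live in a multivalued string field and the alias step could be handled by a synonym list.

```python
from collections import defaultdict

# Made-up tags from a hypothetical topical concordance: verse id -> topics.
VERSE_TAGS = {
    101: {"satan", "temptation"},
    102: {"satan"},
    103: {"justice", "poverty"},
}

# Made-up aliases mapping query wording to a canonical topic tag.
TOPIC_ALIASES = {"satan": "satan", "devil": "satan", "social justice": "justice"}


def build_tag_index(verse_tags):
    """Invert verse -> tags into tag -> sorted verse ids."""
    index = defaultdict(set)
    for verse_id, tags in verse_tags.items():
        for tag in tags:
            index[tag].add(verse_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def verses_for_topic(query, tag_index):
    """Normalize the query to a topic tag and return the tagged verses."""
    tag = TOPIC_ALIASES.get(" ".join(query.lower().split()))
    return tag_index.get(tag, []) if tag else []


TAG_INDEX = build_tag_index(VERSE_TAGS)
print(verses_for_topic("Devil", TAG_INDEX))  # [101, 102]
```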
  • 116. Searching Strong’s
      • Add a field for Strong’s: strongs_index
          • 1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198
      • Most of the benefits of text searching
          • “Word” frequency
          • Document vs. corpus frequency of search terms
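Strong's numbers can be indexed as ordinary whitespace-separated terms (for example with a field type that uses only solr.WhitespaceTokenizerFactory and no stemming or stop filters), after which the usual frequency statistics apply. A Python sketch of those statistics; the first verse is the sequence from the slide, the other two are made up, and both helper names are hypothetical.

```python
from collections import Counter

# Strong's numbers per verse; only the first sequence comes from the slide.
VERSES = [
    "1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198",
    "2316 25 2889",  # made up
    "3056 1510",  # made up
]


def term_frequency(verse, number):
    """How often a Strong's number occurs within one verse."""
    return Counter(verse.split())[number]


def document_frequency(verses, number):
    """How many verses contain the Strong's number at least once."""
    return sum(1 for verse in verses if number in verse.split())


print(term_frequency(VERSES[0], "2316"))  # 4
print(document_frequency(VERSES, "2316"))  # 2
```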
  • 122. Searching Articles
      • Similar approach to text-based queries
          • Stem words
          • Use synonyms
          • Remove stop words
      • Without manual tagging, there’s no automatic way to index/search by Bible reference
  • 126. Searching Articles
      • Article contains reference: “John 3”
      • User searches for “John 3:16” or “John 2-4”
      • Results: at best, no meaningful matches (apart from documents that happen to match the query term “John”)
  • 132. Searching Articles
      • Solr-based solutions:
          • Identify and index references and their constituent verses using a grammar.
              • John 1:1-3 -> John 1:1; John 1:2; John 1:3
          • Store in a multivalued field - each reference is a “term”
          • Must also parse and expand references in queries in order to match
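A minimal Python sketch of the expansion step for single-chapter references such as “John 1:1-3”; the regular expression is a hypothetical, deliberately small grammar. Chapter-only references (“John 3”) and ranges that cross chapters would need a table of verse counts per chapter, which is not included here. The same function would be applied to user queries so that indexed articles and queries produce the same terms.

```python
import re

# Hypothetical grammar: optional leading 1-3, book words, chapter:verse[-verse].
REFERENCE = re.compile(
    r"^\s*(?P<book>(?:[1-3]\s+)?[A-Za-z]+(?:\s+[A-Za-z]+)*)\s+"
    r"(?P<chapter>\d+):(?P<start>\d+)(?:\s*-\s*(?P<end>\d+))?\s*$"
)


def expand_reference(ref):
    """Expand 'Book C:V' or 'Book C:V-W' into one term per verse."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    book = " ".join(match.group("book").split())
    chapter = int(match.group("chapter"))
    start = int(match.group("start"))
    end = int(match.group("end") or start)
    if end < start:
        raise ValueError(f"range runs backwards: {ref!r}")
    return [f"{book} {chapter}:{verse}" for verse in range(start, end + 1)]


print(expand_reference("John 1:1-3"))  # ['John 1:1', 'John 1:2', 'John 1:3']
```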
  • 139. Searching Articles
      • Relational database-based solution:
          • Assign an id to every verse
          • Store: id, articleId, verseId
          • Parse the user query into verse ids.
          • SELECT articleId, COUNT(id) AS hits FROM article_verse WHERE verseId IN (ID_LIST) GROUP BY articleId ORDER BY hits DESC (table name illustrative)
          • Higher count -> the article is more likely to be about that reference than articles with a lower count
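The relational approach can be tried with Python's built-in sqlite3 module. A sketch assuming a hypothetical table named article_verse with the three columns from the slide and made-up rows; the verse id list is passed as bound parameters rather than pasted into the SQL, and an index on verseId keeps the IN lookup cheap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_verse ("
    "id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER)"
)
conn.execute("CREATE INDEX idx_article_verse_verse ON article_verse (verseId)")

# Made-up data: one row per verse occurrence in an article.
conn.executemany(
    "INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)",
    [(10, 1), (10, 2), (10, 2), (20, 2), (30, 5)],
)


def rank_articles(conn, verse_ids):
    """Return (articleId, hits) pairs, most references to the verses first."""
    if not verse_ids:
        return []
    placeholders = ",".join("?" for _ in verse_ids)
    sql = (
        "SELECT articleId, COUNT(id) AS hits FROM article_verse "
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


print(rank_articles(conn, [1, 2, 3]))  # [(10, 3), (20, 1)]
```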
  • 147. Searching Articles
      • Relational database-based solution:
          • Large number of rows.
              • 15,000 journal articles produce > 9,000,000 rows (verse occurrences)
          • Can store id, articleId, verseId, count
              • Then SUM() the counts for each articleId.
              • Only negligibly faster.
              • Only approx. 3,000,000 rows
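The count-per-pair variant can be sketched the same way: collapse repeated (articleId, verseId) rows into one row carrying a count, then SUM() the counts per article. The table names, the cnt column (standing in for the slide's count) and the data are made up; in this toy data the five occurrences shrink to four rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article_verse (
        id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER);
    CREATE TABLE article_verse_count (
        id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER, cnt INTEGER);
    """
)

# Made-up data: one row per verse occurrence in an article.
conn.executemany(
    "INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)",
    [(10, 1), (10, 2), (10, 2), (20, 2), (30, 5)],
)

# Collapse repeated (articleId, verseId) pairs into a single row with a count.
conn.execute(
    "INSERT INTO article_verse_count (articleId, verseId, cnt) "
    "SELECT articleId, verseId, COUNT(*) FROM article_verse "
    "GROUP BY articleId, verseId"
)


def rank_articles_by_sum(conn, verse_ids):
    """Return (articleId, hits) pairs using the pre-aggregated counts."""
    if not verse_ids:
        return []
    placeholders = ",".join("?" for _ in verse_ids)
    sql = (
        "SELECT articleId, SUM(cnt) AS hits FROM article_verse_count "
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


print(conn.execute("SELECT COUNT(*) FROM article_verse_count").fetchone()[0])  # 4
print(rank_articles_by_sum(conn, [1, 2, 3]))  # [(10, 3), (20, 1)]
```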
  • 153. Heterogeneous Indexes
      • Not all content is created equal.
      • Content quality, and its effect on the quality of your results, becomes a factor when you move from one resource to more than one
          • One Bible, one website, one journal
      • Apply a field or document boost to help normalize results
          • Some content gets bumped up and some down
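The effect of a per-source boost on ranking can be shown with plain arithmetic. A Python sketch with made-up sources, boosts and raw scores (SOURCE_BOOSTS and rerank are hypothetical); in Solr the boost would be applied at index time (document or field boost) or at query time rather than by re-sorting results afterwards, so this only illustrates how some content gets bumped up and some down.

```python
# Hypothetical per-source boosts: > 1 bumps a source up, < 1 pushes it down.
SOURCE_BOOSTS = {"bible": 1.5, "journal": 1.0, "web": 0.6}

# Made-up results: (doc_id, source, raw_score).
RESULTS = [("web-7", "web", 2.0), ("jrnl-3", "journal", 1.5), ("verse-1", "bible", 1.1)]


def rerank(results, boosts, default=1.0):
    """Multiply each raw score by its source boost and sort best first."""
    boosted = [
        (doc_id, raw * boosts.get(source, default)) for doc_id, source, raw in results
    ]
    boosted.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in boosted]


print(rerank(RESULTS, SOURCE_BOOSTS))  # ['verse-1', 'jrnl-3', 'web-7']
```

Without boosts the web page would rank first on raw score alone; the boosts reverse that order.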
