Solr: BEYOND THE BASICS!
Script: Ian Barber (phpir.com)
Art: the internet!
Editor: twitter.com/ianbarber
Lettering: ian.barber@gmail.com
http://joind.in/2899
PREVIOUSLY...
My site search was slow and the results were bad, but Solr saved me!

[Background art: fragments of the tf-idf formula, tf_{i,j} = n_{i,j} / ∑_k n_{k,j}, multiplied by an idf term whose denominator is |{d : t_i ∈ d}|]
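The formula fragments on this slide come from tf-idf term weighting. As a rough illustration only, here is a minimal Python sketch using the textbook definition, idf_i = log(|D| / |{d : t_i ∈ d}|); the log and |D| are assumed from that standard definition, not taken from the slide. The function name `tf_idf` and the sample documents are made up for this example, and Solr's underlying Lucene scoring applies its own variations and normalization, so this does not reproduce what Solr computes.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(i, j) = n(i, j) / sum_k n(k, j)      (term count over document length)
    idf(i)   = log(|D| / |{d : t_i in d}|)  (textbook form, assumed here)
    """
    # Naive tokenisation: lowercase and split on whitespace.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Document frequency: number of documents each term appears in.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)

    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return weights


# Hypothetical three-document corpus.
docs = ["solr search server", "slow site search", "solr saved me"]
scores = tf_idf(docs)

# Terms of the first document, highest weight first.
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(term, round(weight, 3))
```

A term that occurs in fewer documents gets a larger idf, so "server" outranks "search" within the first document even though both occur once there.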
Security comes first!
/etc/solr/solr.xml

Core          Core

CONF          CONF

  /var/solr/data

   /var/solr/lib
<solr sharedLib="/var/solr/lib"
      persistent="true">
  <cores adminPath="/admin/cores">
    <core default="true"
          instanceDir="main"
          name="main">
    </core>
  </cores>
</solr>


solr.xml
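To make the structure of this file concrete, the following is a small Python sketch that parses the solr.xml shown above and lists what it declares: the shared library directory, the core admin path, and each core's name and instance directory. The helper name `describe_cores` is hypothetical and not part of Solr; the XML is copied from the slide, with the attributes joined onto single lines.

```python
import xml.etree.ElementTree as ET

# The solr.xml from the slide, inlined so the example is self-contained.
SOLR_XML = """\
<solr sharedLib="/var/solr/lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="true" instanceDir="main" name="main">
    </core>
  </cores>
</solr>
"""


def describe_cores(xml_text):
    """Summarise the top-level settings and cores declared in a solr.xml."""
    root = ET.fromstring(xml_text)
    cores_el = root.find("cores")
    return {
        "sharedLib": root.get("sharedLib"),
        "adminPath": cores_el.get("adminPath"),
        "cores": [
            {
                "name": core.get("name"),
                "instanceDir": core.get("instanceDir"),
                "default": core.get("default") == "true",
            }
            for core in cores_el.findall("core")
        ],
    }


info = describe_cores(SOLR_XML)
print("shared lib:", info["sharedLib"])
print("admin path:", info["adminPath"])
for core in info["cores"]:
    print("core:", core["name"], "in", core["instanceDir"], "default:", core["default"])
```

A check like this could run before deployment to catch a malformed file or a missing core entry, though that workflow is a suggestion rather than something the slides prescribe.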
[Slide art: rotated, overlapping excerpts from the example schema.xml and solrconfig.xml. Legible fragments include the <schema name="example" version="1.2"> header and its version comments, field type definitions (StrField, BoolField, multiValued), the abortOnConfigurationError setting, and the <lib dir="./lib" /> directive.]
                                                                                                                                                 lud
                                                                                                                                                      ed.      ple                    tio         -->                              y
                                          e             e                          e"      las
                                                                                                s                <lisen                                            tare                                                   dir
                                     < f i s =" t r u                         tru                                be b dir
                                                                                                                                  ir=                                ely                   n t                                 ect to th
                                          rm                          e : " ean" c                         ld   <!-            = " . "/ "..
                                                                                                                                         > /                 ut es         mat                   o a                               ory       e
                                     tNo                         typ bool                             hou ou
                                                                                                           f         - I        eld /.
                                                                                                                                     .        ../       rib                     ch                    dir                               .



                                olr’s secret plan!
                               omi                           n                                      s           nd arf Fa y i d           ./d t distatt /                           the                    ect
                                                       lea            ="                      ata                                                                                         reg                  ory
                                                  boo e name                             e d                      Bin              ir         i
                                                                                                                                           irs st/"        " r                                  ex                  , o
                                               -
                                         <!- ldTyp                                   Th                       r.                        gF ti
                                                                                                                                        op                      ege                                 (an                 nly
                                                              e"  />         yp e.                   ="  sol                   is  sin        on        reg
                                                                                                                                                             ex=    x="
                                                                                                                                                                         apa                            cho                    the
                                                e                          t                      ss                       tM                       (wi
                                           < f i s =" t r u d a t a                          cla                      sor                                th      "ap
                                                                                                                                                                      ach che-s
                                                                                                                                                                                                             red
                                                                                                                                                                                                                   on
                                                                                                                                                                                                                                    fil
                                                                                                                                                                                                                                         es
                                            N orm inary                - >           a ry"                     and                                           or
                                                                                                                                                                 wit       e-s       olr                               bot
                                     o mit !--B                  s - e =" b i n                       L ast                                                           hou      olr        -ce                               h e
                                              <             ing                                  ing                                                                       t a     -cl         ll-                              nds
                                                   d  Str pe nam                           M iss                                                                                 reg uster
                                                                                                                                                                                                    d.
                                                                                                                                                                                                        *.                          )
                                             ode eldty                              s ort                                                                                            ex)         ing         jar
                                        enc <fi                                al                                                                                                           is       -          " /
                                                                      pt ion                                                                                                                     use d.*.           >
                                                                    o                                                                                                                                d a        jar
                                                            The                                                                                                                                           nd
                                                                                                                                                                                                              not
                                                                                                                                                                                                                    " /
                                                                                                                                                                                                                        -
                                                   < !--                                                                                                                                                           hin ->
                                                                                                                                                                                                                       g i
                                                                                                                                                                                                                            s
<listener event="firstSearcher"
          class="solr.QuerySenderListener">
 <arr name="queries">
  <lst>
   <str name="q">solr rocks</str>
   <str name="start">0</str>
   <str name="rows">10</str>
  </lst>
  <lst>
   <str name="q">from solrconfig.xml</str>
  </lst>
 </arr>
</listener>
                                cache
                               warming!
Query
Index Configuration
Request Handlers
Search Components

Content Type    fields
Section         field types
Search types
The CMS!
[Diagram: page layout showing Title, Lead Para, Date, Body, Permalink, Category, Author, Tags]
Scientific
analysis!
How do we turn our text into tokens?
Field type, storage, tokenisation, filters, and copy fields.
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer
      class="solr.WhitespaceTokenizerFactory" />
    <filter
      class="solr.StopFilterFactory"/>
    <filter
      class="solr.WordDelimiterFilterFactory"/>
    <filter
     class="solr.LowerCaseFilterFactory"/>
    <filter
     class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>


Schema.xml
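The analyzer above is a pipeline: one tokenizer, then each filter applied in order to the token stream. The sketch below is only a toy model of that idea in Python, not how Solr implements it. It covers the whitespace tokenizer, stop filter and lowercase filter and leaves out word delimiting and stemming. The stop word list and all function names are assumptions made for illustration.

```python
# Toy model of an analyzer chain: a tokenizer followed by filters in order.
# Solr's real factories differ in detail; this only shows the data flow.

STOP_WORDS = {"a", "an", "the", "of"}  # hypothetical stop list


def whitespace_tokenize(text):
    """Split on runs of whitespace and keep punctuation attached."""
    return text.split()


def stop_filter(tokens):
    """Drop tokens found in the stop list (case-sensitive in this sketch)."""
    return [t for t in tokens if t not in STOP_WORDS]


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(text, filters=(stop_filter, lowercase_filter)):
    """Run the tokenizer, then each filter in the order given."""
    tokens = whitespace_tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("O'Reilly's wi-fi guide!"))
```

Because the stop filter runs before the lowercase filter here, a capitalised "The" would slip through, which is one reason filter order matters.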
Tokenizer     Tokens
Original      O’Reilly’s wi-fi guide!
Standard      O | Reilly | S | wi | FI | GUIDE
Keyword       O’Reilly’s wi-fi guide!
Whitespace    O’Reilly’s | wi-fi | guide!
doc 1: “My Phrase?”

Stored     “My Phrase?”
Indexed    my      -> doc 1
           phrase  -> doc 1
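The stored/indexed split above can be sketched as two separate structures: the stored value is kept verbatim for display, while the indexed side maps each token back to the documents containing it. This is a minimal illustration under assumed names (`build_index`, `tokens`), with a deliberately crude tokenizer, and it is not Lucene's actual data layout.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (stored, indexed) for a dict of doc id -> text."""
    stored = {}
    indexed = defaultdict(set)
    for doc_id, text in docs.items():
        stored[doc_id] = text           # kept as-is, returned in results
        for tok in tokens(text):
            indexed[tok].add(doc_id)    # token -> documents containing it
    return stored, dict(indexed)


if __name__ == "__main__":
    stored, indexed = build_index({"doc 1": "My Phrase?"})
    print(stored["doc 1"])
    print(sorted(indexed.items()))
```

A field with stored="false" would simply skip the `stored` dictionary: it stays searchable but cannot be returned in results.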
Ian Barber     -> AN PRPR
Iain Barbour   -> AN PRPR

<fieldtype name="phonetic"
 class="solr.TextField">
 <analyzer>
  <tokenizer
    class="solr.StandardTokenizerFactory"/>
  <filter
    class="solr.DoubleMetaphoneFilterFactory"
    inject="false"/>
 </analyzer>
</fieldtype>
<filter
 class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" catenateWords="1"
 generateNumberParts="1" catenateNumbers="1"
 catenateAll="0" splitOnCaseChange="1"
/>


delimiters
O | Reilly | S | OReillys | wi | FI | wifi | GUIDE
precision versus recall
<filter
class="solr.SnowballPorterFilterFactory"
language="English"
protected="protwords.txt" />




stemming
O | Reilli | S | OReilli | wi | FI | wifi | GUID
Je ne parle pas anglais! (I don't speak English!)
[Diagram: Title, Lead Para, Body]
<fieldType name="tdate"
  class="solr.TrieDateField" omitNorms="true"
  precisionStep="6" positionIncrementGap="0" />

<fieldType name="lowercase"
  class="solr.TextField">
  <analyzer>
    <tokenizer
        class="solr.KeywordTokenizerFactory" />
    <filter
         class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>


Schema.xml
[Diagram: Permalink, Date, Category, Tags, Author]
<fields>
<field name="permalink" type="lowercase"
                        required="true" />
<field name="category" type="lowercase" />
<field name="tag" type="lowercase"
                        multiValued="true" />
<field name="title" type="text" required="true"/>
<field name="body" type="text" required="true" />
<field name="author" type="lowercase"
             stored="false" multiValued="true" />
<field name="date" type="tdate"
                            multiValued="true" />
<field name="lead_para" type="text" />
<field name="phonetic" type="phonetic" />
<field name="text" type="text" stored="false"
                            multiValued="true" />


</fields>

Schema.xml
<!-- Copy Fields -->
<copyField source="permalink" dest="text" />
<copyField source="category" dest="text" />
<copyField source="title" dest="text" />
<copyField source="lead_para" dest="text" />
<copyField source="body" dest="text" />
<copyField source="author" dest="text" />
<copyField source="category" dest="phonetic" />
<copyField source="title" dest="phonetic" />
<copyField source="lead_para" dest="phonetic" />
<copyField source="body" dest="phonetic" />
<copyField source="author" dest="phonetic" />

<!-- ID -->
<uniqueKey>permalink</uniqueKey>
from solr import SolrConnection

s = SolrConnection('http://localhost:8080/solr/main')
doc = dict(
  permalink = "http://fooweb.com/strategy/DCPO",
  category = "strategy",
  title = "DPCO: A Framework For Synergy",
  # adjacent string literals are joined, so the text stays one value
  body = "DPCO, or Dynamic Performance Class "
         "Organisation is a ISO90210 quality oriented "
         "management process [...]",
  author = "Sean Alison",
  date = "2011-03-01T00:00:00Z",
  source_site = "fooweb.com",
)

s.add(doc)
s.commit()

simpleadd.py
<add>
 <doc>
  <field name="body">
    DPCO, or Dynamic Performance Class [...]
  </field>
  <field name="category">strategy</field>
  <field name="permalink">
    http://fooweb.com/strategy/DCPO
  </field>
  <field name="source_site">fooweb.com</field>
  <field name="title">
    DPCO: A Framework For Synergy
  </field>
  <field name="date">2011-03-01T00:00:00Z
  </field>
  <field name="author">Sean Alison</field>
 </doc>
</add>
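The XML update message above is simple enough to build by hand. As a hedged sketch, assuming nothing beyond Python's standard library, the helper below (`build_add_xml` is a made-up name) turns a dictionary into the same `<add><doc>` shape and repeats the `<field>` element for multi-valued fields.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialise one document dict as a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list or tuple becomes one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml({
        "permalink": "http://fooweb.com/strategy/DCPO",
        "category": "strategy",
        "tag": ["synergy", "process"],
    }))
```

ElementTree escapes characters such as `&` and `<` in field values, which is the main thing that goes wrong when the message is assembled with string concatenation.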
time for the
  gadgets!
<requestHandler name="/dataimport"
  class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">
      db-data-config.xml
    </str>
  </lst>
</requestHandler>

Solrconfig.xml
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver"
  url="jdbc:mysql://localhost:3306/cms"
  user="root" password="password" />
<document>
 <entity name="story"
    query="SELECT s.id, s.content,
       CONCAT(u.first_name, ' ', u.last_name) as
       author [...] s.status_id = 1"
    deltaImportQuery="SELECT s.id, s.content
       [...] AND s.id = ${dataimporter.delta.id}"
    deltaQuery="SELECT id FROM stories WHERE
       modified > '${dataimporter.last_index_time}'"
    transformer=
      "TemplateTransformer,HTMLStripTransformer"
  >
  <field column="permalink" name="permalink"
     template="http://fooweb.com/${story.slug}" />
  <field column="publish_date" name="date" />
  <field column="content" name="body"
                          stripHTML="true" />
  <field column="source_site" template="cms" />
  [...]
  <entity
    name="topic"
    query="SELECT [...] st.item_id=${story.id}">
    <field column="category" />
  </entity>
 </entity>
</document>
</dataConfig>

Data-config.xml
<response>
 <str name="command">full-import</str>
 <str name="status">busy</str>
 <str name="importResponse">
           A command is still running...</str>
 <lst name="statusMessages">
  <str name="Time Elapsed">0:0:14.979</str>
  <str name="Total Requests made">5523</str>
  <str name="Total Rows Fetched">5522</str>
  <str name="Total Documents Processed">
                                    2760</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">
                     2011-03-02 15:48:00</str>
 </lst>
</response>

     http://SOLR:8080/solr/main/dataimport
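The import is driven by a `command` parameter on the handler URL, and the status response above is plain XML that is easy to poll. A minimal sketch, assuming the handler is mounted at `/dataimport` as configured earlier. The helper names are hypothetical and no request is actually sent here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def dataimport_url(base, command):
    """Build a DataImportHandler URL for a command such as
    full-import, delta-import or status."""
    return base + "/dataimport?" + urlencode({"command": command})


def parse_status(xml_text):
    """Flatten the top-level <str> values and the statusMessages list
    of a status response into one dict of strings."""
    root = ET.fromstring(xml_text)
    info = {el.get("name"): (el.text or "").strip()
            for el in root.findall("str")}
    messages = root.find("lst[@name='statusMessages']")
    if messages is not None:
        for el in messages.findall("str"):
            info[el.get("name")] = (el.text or "").strip()
    return info


if __name__ == "__main__":
    print(dataimport_url("http://localhost:8080/solr/main", "full-import"))
    sample = ('<response><str name="status">busy</str>'
              '<lst name="statusMessages">'
              '<str name="Total Documents Processed">2760</str>'
              '</lst></response>')
    print(parse_status(sample))
```

A caller could fetch the `status` URL in a loop and stop once `status` is no longer `busy`.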
The SOLR
  CELL!
<requestHandler name="/update/extract"
  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

Solrconfig.xml
<fieldtype name="ignored" stored="false"
indexed="false" multiValued="true"
class="solr.StrField" />




Schema.xml
<dynamicField
   name="ignored_*"
   type="ignored"
   indexed="false"
   stored="false"
/>

Can it be... schema free?!

Dynamic Fields
$ curl -v "http://localhost:8080/solr/main/update/extract?literal.source_site=files&literal.permalink=http://fooweb.com/arch.pdf&commit=true&fmap.content=body&fmap.Author=author" \
    --data-binary @arch.pdf \
    -H 'Content-Type:application/pdf'
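The extract request packs two kinds of parameters into the URL: `literal.*` sets a field value directly, and `fmap.*` renames a field produced by the extractor. A small sketch of building that URL with proper escaping follows. `extract_url` is an assumed helper name, and the code only constructs the string without posting the file.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def extract_url(base, literals, field_map, commit=True):
    """Build an /update/extract URL from literal values and field mappings."""
    params = []
    for name, value in literals.items():
        params.append(("literal." + name, value))   # fixed field values
    for source, dest in field_map.items():
        params.append(("fmap." + source, dest))     # extracted -> schema field
    if commit:
        params.append(("commit", "true"))
    return base + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    url = extract_url(
        "http://localhost:8080/solr/main",
        {"source_site": "files",
         "permalink": "http://fooweb.com/arch.pdf"},
        {"content": "body", "Author": "author"},
    )
    print(url)
    print(parse_qs(urlsplit(url).query))
```

Encoding the permalink matters once document URLs contain `&` or `?`, which would otherwise be read as extra request parameters.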
Lucidimagination.com/blog/2009/03/09/nutch-solr




               A crawler!
# skip some protocols
-^(https|telnet|file|ftp|mailto):
-[?*!@=]

# allow urls in defined domain
+^http://([a-z0-9-A-Z]*\.)*fooweb.com/

# skip URLs with slash-delimited segment that
# repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# deny anything else
-.


regex-urlfilter.txt
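The filter file is read top to bottom and the first pattern that matches a URL decides its fate: `+` keeps it, `-` drops it. The sketch below imitates that first-match-wins behaviour in Python so the rules can be tried out. It is an approximation rather than Nutch's code, the helper names are invented, and the host character class is written with the hyphen last.

```python
import re

RULES_TEXT = r"""
# skip some protocols
-^(https|telnet|file|ftp|mailto):
-[?*!@=]
# allow urls in defined domain
+^http://([a-zA-Z0-9-]*\.)*fooweb.com/
# skip URLs with a slash-delimited segment that repeats 3+ times
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
# deny anything else
-.
"""


def parse_rules(text):
    """Turn rule lines into (allow, compiled_pattern) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(url, rules):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # nothing matched: treat as rejected


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in ("http://subsite.fooweb.com/page",
                "https://fooweb.com/",
                "http://fooweb.com/search?q=x",
                "http://example.com/"):
        print(url, accepts(url, rules))
```

Under first-match-wins, rule order matters: with the order shown, the allow rule accepts any fooweb.com URL before the loop-breaking rule is ever consulted.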
<mapping>
 <fields>
 <field dest="body" source="content" />
 <field dest="source_site" source="site" />
 <field dest="title" source="title" />
 <field dest="ignored_host" source="host" />
 <field dest="ignored_segment"
                         source="segment" />
 <field dest="ignored_boost" source="boost" />
 <field dest="ignored_digest"
                          source="digest" />
 <field dest="date" source="tstamp" />
 <field dest="permalink" source="url" />
 </fields>
 <uniqueKey>permalink</uniqueKey>



</mapping>

Solrindex-mapping.xml
$ echo "http://subsite.fooweb.com" > urls/seed.txt
$ bin/nutch inject /var/nutch/crawldb urls

$ bin/nutch generate /var/nutch/crawldb /var/nutch/segments
$ export SEGMENT=/var/nutch/segments/`ls -tr /var/nutch/segments|tail -1`
$ bin/nutch fetch $SEGMENT -noParsing
$ bin/nutch parse $SEGMENT
$ bin/nutch updatedb /var/nutch/crawldb $SEGMENT -filter -normalize
$ bin/nutch invertlinks /var/nutch/linkdb -dir /var/nutch/segments

$ bin/nutch solrindex http://localhost:8080/solr/main /var/nutch/crawldb /var/nutch/linkdb/ /var/nutch/segments/*
solr goes to
   work!
he has dismax!
<requestHandler name="dismax"
      class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
      text^0.5 category^1.5 title^2 body^1
      permalink^10.0 author^1.8 tag^1.3
    </str>
    <str name="pf">
      text^0.2 title^4 author^1.8 body^1
    </str>
    <str name="mm">3&lt;60%</str>



  </lst>
</requestHandler>

Solrconfig.xml
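The `mm` (minimum should match) value `3<60%` reads as: with up to three optional clauses all of them must match, and with more than three only 60% (rounded down) must match. The sketch below covers just this single conditional-percentage form. It is an illustration under that assumption, not a port of Solr's full `mm` parser, and the function name is made up.

```python
def min_should_match(spec, clause_count):
    """Evaluate an mm spec of the form 'N<P%' for a number of optional clauses."""
    bound, _, rest = spec.partition("<")
    if clause_count <= int(bound):
        # At or below the bound every clause is required.
        return clause_count
    percent = int(rest.rstrip("%"))
    # Above the bound take the percentage, rounding down.
    return clause_count * percent // 100


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, "clauses ->", min_should_match("3<60%", n), "must match")
```

In the XML config the `<` has to be written as `&lt;`, which is why the handler shows `3&lt;60%`.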
from solr import SolrConnection

url = 'http://localhost:8080/solr/main'
s = SolrConnection(url)

response = s.query('idie manager')
for hit in response.results:
  print(hit['title'])
  print(hit['body'])


$ python simplequery.py
Overview of the IDIE manager
To help with those implementing IDIE [...]
IDIE: The 801g Of Talent Management
Inspiration-Direction-Influence [...]
<str name="bf">
   recip(ms(NOW,date),3.16e-11,1,1)
</str>




FunctionQuery(1.0/(3.16E-11*float(ms(const(1299450070912),date(date)))+1.0)), product of:
    0.9974636 = 1.0/(3.16E-11*float(ms(const(1299450070912),date(date)=1299369600000))+1.0)
    1.0 = boost
    0.03730806 = queryNorm
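The boost function `recip(x,m,a,b)` evaluates to `a / (m*x + b)`, and with `x` as the document age in milliseconds and `m` = 3.16e-11 (roughly one over the number of milliseconds in a year) a fresh document scores close to 1 and a year-old one about 0.5. The numbers in the explain output above can be checked with a few lines of Python. The variable names are only for this sketch.

```python
def recip(x, m, a, b):
    """Reciprocal function as used in the bf parameter: a / (m*x + b)."""
    return a / (m * x + b)


NOW_MS = 1299450070912   # const(...) from the explain output
DOC_MS = 1299369600000   # date(date)=... from the explain output

if __name__ == "__main__":
    age_ms = NOW_MS - DOC_MS
    print(age_ms)                          # age of the document in ms
    print(recip(age_ms, 3.16e-11, 1, 1))   # boost for this document

    one_year_ms = 365.25 * 24 * 3600 * 1000
    print(recip(one_year_ms, 3.16e-11, 1, 1))  # boost after about a year
```

The first result agrees with the 0.9974636 shown in the explain output to the precision Solr prints.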
going beyond
    just
   search
  results!
$solr = new Apache_Solr_Service(
              'localhost', 8080, '/solr/main');
$query = "badly drawn";
$p = array(
   'facet' => "true",
   'facet.field' => 'category',
   'facet.mincount' => 1,
);

$r = $solr->search($query, 0, 5, $p);
foreach(
    $r->facet_counts->facet_fields->category
    as $cat => $count) {
  echo $cat, " ", $count, PHP_EOL;
}
$query = "";
$p = array(
   'q.alt' => "*:*",
   "facet" => "true",
   "facet.date" => 'date',
   "facet.date.start" => "NOW/YEAR-6MONTHS",
   "facet.date.end" => "NOW/YEAR",
   "facet.date.gap" => "+1MONTH",
   "fq" => "category:Reviews",
);

$r = $solr->search($query, 0, 0, $p);
foreach($r->facet_counts->facet_dates->date
                         as $date => $count) {
  echo $date, " ", $count, PHP_EOL;
}
$query = "";
$p = array(
   'q.alt' => "*:*",
   'facet' => "true",
   'facet.mincount' => 1,
   "facet.query" => array("title:gig",
                          "title:album"),
   "fq" => "category:Reviews",
);
$r = $solr->search($query, 0, 0, $p);
foreach($r->facet_counts->facet_queries as
                           $query => $count) {
   echo $query, " ", $count, PHP_EOL;
}
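Whatever client library is used, the facet options above end up as ordinary request parameters, with `facet.query` simply repeated once per query. A hedged sketch of that encoding using only the Python standard library follows. `facet_query_string` is an invented helper and nothing is sent to a server.

```python
from urllib.parse import urlencode, parse_qs


def facet_query_string(params):
    """Encode parameters, repeating the key for list values (doseq)."""
    return urlencode(params, doseq=True)


if __name__ == "__main__":
    qs = facet_query_string({
        "q.alt": "*:*",
        "facet": "true",
        "facet.mincount": 1,
        "facet.query": ["title:gig", "title:album"],
        "fq": "category:Reviews",
    })
    print(qs)
    print(parse_qs(qs)["facet.query"])
```

The response then carries one count per `facet.query` string, which is what the loop over `facet_queries` prints.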
What fields to facet?
How to facet?
What facets to show?
<requestHandler name="mlt"
         class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="defType">mlt</str>
    <str name="mlt">true</str>
    <str name="mlt.fl">body title</str>
    <str name="mlt.match.include">
      false
    </str>
  </lst>
</requestHandler>




Solrconfig.xml
$solr = new Apache_Solr_Service
         ('localhost', 8080, '/solr/main');
$query = "Losing my backpacking virginity";
$p = array('qt' => "mlt");
$results = $solr->search($query, 0, 3, $p);
foreach($results->response->docs as $doc) {
  echo $doc->title, PHP_EOL;
}


$ php mltquery.php
Backpacking across USA social media way
Safe solo travel on New York holidays
Cracking The Big Apple's Big 10
Thanks!
script: Ian barber (phpir.com)
Art: the internet!
Editor: twitter.com/ianbarber
lettering: ian.barber@gmail.com
http://joind.in/2899
Some useful links!

http://wiki.apache.org/solr
http://nutch.apache.org/
http://lucidimagination.com/blog/
http://robotlibrarian.billdueber.com/
http://code.google.com/p/solr-php-client
http://pypi.python.org/pypi/solrpy
https://www.packtpub.com/solr-1-4-enterprise-search-server/book

http://github.com/ianbarber/SolrBTB-Talk
Bonus
content!
<searchComponent name="spellcheck"
  class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">
    textSpell
  </str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
    <str name="spellcheckIndexDir">
       /var/lib/solr/spellchecker
    </str>
  </lst>



</searchComponent>

Solrconfig.xml
<fieldType name="textSpell"
  class="solr.TextField"
  positionIncrementGap="100"
  omitNorms="true">
  <analyzer>
    <tokenizer
      class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory"
      ignoreCase="true"
      words="stopwords.txt" />
    <filter
      class="solr.LowerCaseFilterFactory" />
    <filter
     class="solr.StandardFilterFactory" />



  </analyzer>
</fieldType>

Schema.xml
[...]
     <int name="ps">10</int>
     <int name="qs">5</int>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.extendedResults">false</str>
     <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

Dismax handler
$solr = new Apache_Solr_Service('localhost',
                        8080, '/solr/main');
$p = array(
        'spellcheck' => 'true',
        'spellcheck.collate' => 'true');
$results = $solr->search("roose", 0, 5, $p);
echo "Did you mean "
   . $results->spellcheck->suggestions->collation, PHP_EOL;


$ php spellquery.php
Did you mean rose
include_once "Apache/Solr/Service.php";
$solr = new Apache_Solr_Service(
          'localhost', 8080, '/solr/main');
$query = "album review";
$p = array('sort' => 'title_sort desc');
$res = $solr->search($query, 0, 10, $p);
foreach($res->response->docs as $doc) {
  echo $doc->title, PHP_EOL;
}


<field name="title_sort" type="lowercase"
indexed="true" stored="false" />


<copyField source="title"
                  dest="title_sort" />
http://code.google.com/p/solr-php-client


$ php sortquery.php
Zola Jesus album review - Stridulum II
Zero 7 album review - Record
Zebra and Giraffe
Young Knives video interview part 2
Young Knives - Road to V winners on tour
You Me At Six @ Wembley Arena, London
You Me At Six - Hold Me Down
Yet again... Good Shoes @ ULU, London
Yelle: North American tour review
Yelle: interview with a French pop artiste
<highlighting>
<fragmenter name="regex"
  class="[..]highlight.RegexFragmenter">
<lst name="defaults">
 <int name="hl.fragsize">70</int>
 <float name="hl.regex.slop">0.5</float>
 <str name="hl.regex.pattern">
         [-\w ,/\n"']{20,200}</str>
</lst>
</fragmenter>
<formatter name="html"
 class="[...]highlight.HtmlFormatter"
 default="true">
<lst name="defaults">
 <str name="hl.simple.pre"><![CDATA[<em>]]></str>
 <str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
</highlighting>
$so = new Apache_Solr_Service('localhost',
8080, '/solr/main');
$q = "album review";
$r =$so->search($q,0,5,array('hl'=>"true"));
foreach($r->response->docs as $doc) {
  echo $r->highlighting->{$doc->permalink}
         ->title[0], PHP_EOL;
}


$ php highlightquery.php
Fenech Soler <em>album</em> <em>review</em>
Weezer - Hurley <em>album</em> <em>review</em>
Feeder <em>album</em> <em>review</em> - Renegades
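The `hl.simple.pre` and `hl.simple.post` settings are just the strings wrapped around each matched term. As a rough illustration only, the Python below wraps whole-word matches in a title the same way. Solr's highlighter works on analysed tokens (so stemmed variants also match), which this toy version with its assumed `highlight` helper does not attempt.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive occurrence of a term."""
    if not terms:
        return text
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(pattern,
                  lambda m: pre + m.group(0) + post,
                  text,
                  flags=re.IGNORECASE)


if __name__ == "__main__":
    print(highlight("Weezer - Hurley album review", ["album", "review"]))
```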
Replication   sharding   caching




  The masters of scaling
        are here!
from solr import SolrConnection

url = 'http://localhost:8080/solr/main'
s = SolrConnection(url)
response = s.query('ISO90210')
if(response.results.numFound == '0'):
  print("No results found!")


$ python simplefail.py
No results found!
                                    IS SOLR
                                   DEFEATED?
http://solrurl:8080/solr/main/admin/analysis.jsp
/solr/select/?q="iso 90210"&debugQuery=true



<lst name="debug">
 <str name="rawquerystring">"iso 90210"</str>
 <str name="querystring">"iso 90210"</str>
 <str name="parsedquery">
+DisjunctionMaxQuery((body:"iso 90210")~0.01) DisjunctionMaxQuery((body:"iso 90210")~0.01)</str>
/solr/select/?q=iso 90210&debugQuery=true


<lst name="debug">
 <str name="rawquerystring">iso 90210</str>
 <str name="querystring">iso 90210</str>
 <str name="parsedquery">
+((DisjunctionMaxQuery((body:iso)~0.01) DisjunctionMaxQuery((body:90210)~0.01))~2) DisjunctionMaxQuery((body:"iso 90210")~0.01)</str>
 <str name="parsedquery_toString">+(((body:iso)~0.01 (body:90210)~0.01)~2) (body:"iso 90210")~0.01</str>
&explainOther=90210
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
 0.0 = no match on required clause (body:"iso 90210")
   0.0 = weight(body:"iso 90210" in 0), product of:
     0.6953707 = queryWeight(body:"iso 90210"), product of:
       3.8325815 = idf(body: iso=1 90210=1)
       0.18143663 = queryNorm
     0.0 = fieldWeight(body:"iso 90210" in 0), product of:
       0.0 = tf(phraseFreq=0.0)
       3.8325815 = idf(body: iso=1 90210=1)
       0.15625 = fieldNorm(field=body, doc=0)
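Each "product of" line in the explain output is literally a multiplication of the indented values beneath it, so the failure can be traced with a calculator. The short Python check below uses only the numbers printed above, with variable names chosen for this sketch. The phrase frequency of zero is what forces the whole score to zero.

```python
# Values copied from the explain output above.
idf = 3.8325815
query_norm = 0.18143663
tf = 0.0             # tf(phraseFreq=0.0): the phrase never occurs in the field
field_norm = 0.15625

query_weight = idf * query_norm          # queryWeight = idf * queryNorm
field_weight = tf * idf * field_norm     # fieldWeight = tf * idf * fieldNorm
score = query_weight * field_weight      # weight = queryWeight * fieldWeight

if __name__ == "__main__":
    print(round(query_weight, 7))
    print(field_weight)
    print(score)
```

The query side is healthy (about 0.695), so the problem lies in how the document text was tokenised at index time, not in the query.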
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
      text^0.5 category^1.5 title^2 body^1
      permalink^10.0 author^1.8 tag^1.3
    </str>
    <str name="pf">
      text^0.2 title^4 author^1.8 body^1
    </str>
    <str name="mm">3&lt;60%</str>
    <int name="ps">10</int>
    <int name="qs">5</int>
</lst>

Solrconfig.xml
from solr import SolrConnection

url = 'http://localhost:8080/solr/main'
s = SolrConnection(url)
response = s.query('ISO90210')
if(response.results.numFound == '0'):
  print("No results found!")


$ python simplefail.py
DPCO: A Framework For Synergy
DPCO, or Dynamic Performance Class Organisation is a ISO90210 quality [...]

Solr: Beyond the Basics

  • 1. Solr: BEYOND THE BASICS! script: Ian barber (phpir.com) Art: the internet! Editor: twitter.com/ianbarber lettering: ian.barber@gmail.com http://joind.in/2899
  • 2. P REVIOUSLY.... My site search was slow and the results were bad, but Solr ∑knk,j ni,j saved me! tfi,j ∑k nk,j x id ∈ d }| fi,j | {d:t i
  • 4. /etc/solr/solr.xml Core Core CONF CONF /var/solr/data /var/solr/lib
  • 5. <solr sharedLib="/var/solr/lib" persistent="true"> <cores adminPath="/admin/cores"> <core default="true" instanceDir="main" name="main"> </core> </cores> </solr> S olr.xml
  • 6. y pla dis for <co u sed nfi nl y ch <!- g> s o ear - S an d i th e s et e ma of env iro enc this sch ure to nd nme ounte 1. 2"> t his nat x a nt, red 'fals n=" me of the ynt a con an e' r sio na fl ect em a s y fig you m sev if you ve e re ch d b ure ay ere le" th to e s lue d. wan con wa p xam ame" i s t his o r t h t iVa t s olr fig nt so e mul ura l e =" n ge r f to tio r to nam bute " han mbe are You kee n e c ma d c nu s may rro ontin at tri sh oul s ion fi eld als p w ork r. ue ver . lt --> -Dsol -- ns ons , all fau o s ing In wo es. licati o o lr' s i i cat xist t aul e by d e <ab r.a bor et th eve a p rking pos App def tru or n i rod s S ppl e abo rtO tOnCo tOn i uct after " i y a d not by Con s to f o ne ion n. 1.2 d b se d, nCo n nfi figur fig f it c tio sion=" ould ha nge ute di , fal roduce gur ati ura alse han dle e c ed tio usi oll ver sh be rib duc int <! ati o n r i It ly att tro ribute ide -- li onE nErro nEr ror g by s m is- i cs. normal alued e in t nti b d rro r> r >$ {so =fa set ant not iV but ons at fie ire lr. lse tin g t sem mul t ttr i i d cti ves abo he 1. 0: ue d a dPosit sol rco and can rtO nCo sys l use tem iVa FreqAn s ss" nfi be nfi e t mul Term t e i e "cla g.x ml the m t use gur pro per na tur 1. 1: mi t ds . tr ibu Th l sch ema or o r d t o i ati onE ty: : o t fiel t " a tions. he re a he .xm eso nst rro 1.2 tex e nam fini t in t l ( lve any ruc t S r:t rue o r h e " e m ine s es All ie: olr }< / ep t f ns . T ield d deter c las dir ect Ana lyz "pl ugi to exc --> iti o y f tes jav a ori ers ns" loa e fin used b tribu to it If es , R equ spe cif d a n J e d e at ref er m .a ". and est ied ars es > d typ t o b other r" rb ati hich/lib pat hs Han in p l l e . l e ad w " d <ty -- fie abe d any Typ th "so . d v r ch ire are dle you <! t a l an fi eld wi ag e st ore reshol e s nc i i wh cto res rs, r us e e g ck d/ h ue lud ry olv etc j ibu t th n rti sis pa exe ssT val ed exi ed ... 
a ttr ior of s sta y ind compre ) to as sts rel ). av me ana l but nal lds --> if you in ati ve beh ass na solr. zed , io fie <!-- <li " b had you the C l ch e. an aly an opt rived A d e tru dir= use r i nst ins e ir t =" "./ . apa not pport h e d claL as op lib d t anc eDi tan ceD org is u n t ). -->s ing ssp tio " / he r, ir. t ype ield s led i s Mis ath n b y i > fol low all d er rt - <!- , t --> iel xtF f enab aract " s< o l his tse ing fil S trF and Te ( i ch i eld !-- ib di is " lf ad ue syn tax es fou T he l d si on (in S trF in Whe r=" ../ st= "tr use ds ... nd - e e . tha ful any in <!- t rFi ompres n siz s olr t n a re n gLa ./c . for fil - S ts c ai s=" ssi gex ont 4 es i ert cla s dir rtM i isase6rib/ inc fo lim ed a c g" soec B sp ext lud ing und i e r in -> d" i l w l tor n as eci rac n t exc "st " - Fie l b ed i hy fie d i tion/ all me= lse r.Bool -> - e i w ich jar he di S na v ie nc l "fa sol <!- -li /retr com n a ddi ib" / s i r n a ector --> ldType "/> or =" tb d lud ed. ple tio --> y e e e" las s <lisen tare dir < f i s =" t r u tru be b dir ir= ely n t ect to th rm e : " ean" c ld <!- = " . "/ ".. > / ut es mat o a ory e tNo typ bool hou ou f - I eld /. . ../ rib ch dir . olr’s secret plan! omi n s nd arf Fa y i d ./d t distatt / the ect lea =" ata reg ory boo e name e d Bin ir i irs st/" " r ex , o - <!- ldTyp Th r. gF ti op ege (an nly e" /> yp e. =" sol is sin on reg ex= x=" apa cho the e t ss tM (wi < f i s =" t r u d a t a cla sor th "ap ach che-s red on fil es N orm inary - > a ry" and or wit e-s olr bot o mit !--B s - e =" b i n L ast hou olr -ce h e < ing ing t a -cl ll- nds d Str pe nam M iss reg uster d. *. ) ode eldty s ort ex) ing jar enc <fi al is - " / pt ion use d.*. > o d a jar The nd not " / - < !-- hin -> g i s
  • 7. <listener event="firstSearcher" class="solr.QuerySenderListener"> <arr name="queries"> <lst> <str name="q">solr rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst> <lst> <str name="q">from solrconfig.xml</str> </lst> </arr> </listener> cache warming!
  • 8. Query Index Configuration Request Handlers search components
  • 9. Content fields Type section field types search types
  • 11. TITLE LEAD PARA DATE BODY
  • 12. permalink Category Author Tags
  • 13. Scientific analysis! how do we turn our text into tokens? Field Type, Storage, Tokenisation, Filters, and copy fields.
  • 14. <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory" /> <filter class="solr.StopFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.SnowballPorterFilterFactory"/> </analyzer> </fieldType> S chema.xml
  • 15. ORIGINAL STANDARD O Reilly S O’Reilly’s wi FI wi-fi guide! GUIDE keyword Whitespace O’Reilly’s O’Reilly’s wi-fi wi-fi guide! guide!
  • 16. doc 1 “My Phrase?” stored INDEXED my doc 1 “My Phrase?” phrase doc 1
  • 17. Ian barber IAIN BARBOUR AN PRPR AN PRPR <fieldtype name="phonetic" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/> </analyzer> </fieldtype>
  • 18. <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" generateNumberParts="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" /> delimiters O Reilly S OReillys wi FI wifi GUIDE
  • 21. Je ne parle pas anglais!
  • 23. <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0" /> <fieldType name="lowercase" class="solr.TextField"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory" /> <filter class="solr.LowerCaseFilterFactory" /> </analyzer> </fieldType> S chema.xml
  • 24. permalink Date category tags author
  • 25. <fields> <field name="permalink" type="lowercase" required="true" /> <field name="category" type="lowercase" /> <field name="tag" type="lowercase" multiValued="true" /> <field name="title" type="text" required="true"/> <field name="body" type="text" required="true" /> <field name="author" type="lowercase" stored="false" multiValued="true" /> <field name="date" type="tdate" multiValued="true" /> <field name="lead_para" type="text" /> <field name="phonetic" type="phonetic" /> <field name="text" type="text" stored="false" multiValued="true" /> S </fields> chema.xml
  • 26. <!-- Copy Fields --> <copyField source="permalink" dest="text" /> <copyField source="category" dest="text" /> <copyField source="title" dest="text" /> <copyField source="lead_para" dest="text" /> <copyField source="body" dest="text" /> <copyField source="author" dest="text" /> <copyField source="category" dest="phonetic" /> <copyField source="title" dest="phonetic" /> <copyField source="lead_para" dest="phonetic" /> <copyField source="body" dest="phonetic" /> <copyField source="author" dest="phonetic" /> <!-- ID --> <uniqueKey>permalink</uniqueKey>
  • 27. from solr import * s=SolrConnection( 'http://localhost:8080/solr/main') doc = dict( permalink = "http://fooweb.com/strategy/ DCPO", category = "strategy", title = "DPCO: A Framework For Synergy", body = "DPCO, or Dynamic Performance Class Organisation is a ISO90210 quality oriented management process [...]", author = "Sean Alison", date = "2011-03-01T00:00:00Z", source_site = "fooweb.com", ) s s.add(doc) s.commit() impleadd.py
  • 28. <add> <doc> <field name="body"> DPCO, or Dynamic Performance Class [...] </field> <field name="category">strategy</field> <field name="permalink"> http://fooweb.com/strategy/DCPO </field> <field name="source_site">fooweb.com</field> <field name="title"> DPCO: A Framework For Synergy </field> <field name="date">2011-03-01T00:00:00Z </field> <field name="author">Sean Alison</field> </doc> </add>
  • 29. time for the gadgets!
  • 30. <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport. DataImportHandler"> <lst name="defaults"> <str name="config"> db-data-config.xml </str> </lst> </requestHandler> S olrconfig.xml
  • 31. <dataConfig> D ata-config.xml <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/cms" user="root" password="password" /> <document> <entity name="story" query="SELECT s.id, s.content, CONCAT (u.first_name, ' ', u.last_name) as author [...] s.status_id = 1" deltaImportQuery="SELECT s.id, s.content [...] AND s.id = ${dataimporter.delta.id}" deltaQuery="SELECT id FROM stories WHERE modified > ${dataimporter.last_index_time}" transformer= "TemplateTransformer,HTMLStripTransformer" >
  • 32. <field column="permalink" name="permalink" template="http://fooweb.com/${story.slug}" /> <field column="publish_date" name="date" /> <field column="content" name="body" stripHTML="true" /> <field column="source_site" template="cms" /> [...] <entity name="topic" query="SELECT [...] st.item_id=${story.id}"> <field column="category" /> </entity> </entity> </document> </dataConfig>
  • 33. <response> <str name="command">full-import</str> <str name="status">busy</str> <str name="importResponse"> A command is still running...</str> <lst name="statusMessages"> <str name="Time Elapsed">0:0:14.979</str> <str name="Total Requests made">5523</str> <str name="Total Rows Fetched">5522</str> <str name="Total Documents Processed"> 2760</str> <str name="Total Documents Skipped">0</str> <str name="Full Dump Started"> 2011-03-02 15:48:00</str> </lst> </response> http://SOLR:8080/solr/main/dataimport
  • 34. The SOLR CELL!
  • 35. <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction. ExtractingRequestHandler"> <lst name="defaults"> <str name="uprefix">ignored_</str> </lst> </requestHandler> S olrconfig.xml
  • 36. <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" /> S chema.xml
  • 37. <dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" /> can it be... schema free?! D ynamic Fields
  • 38. $  curl  -­‐v   “http://localhost:8080/solr/main/update/extract? literal.source_site=files &literal.permalink=http://fooweb.com/arch.pdf &commit=true &fmap.content=body &fmap.Author=author —data-­‐binary  @arch.pdf   -­‐H  ‘Content-­‐Type:application/pdf’
  • 40. # skip some protocols -^(https|telnet|file|ftp|mailto): -[?*!@=] # allow urls in defined domain +^http://([a-z0-9-A-Z]*.)*fooweb.com/ # skip URLs with slash-delimited segment that repeats 3+ times, to break loops -.*(/[^/]+)/[^/]+1/[^/]+1/ # deny anything else -. r egex-urlfilter.txt
  • 41. <mapping> <fields> <field dest="body" source="content" /> <field dest="source_site" source="site" /> <field dest="title" source="title" /> <field dest="ignored_host" source="host" /> <field dest="ignored_segment" source="segment" /> <field dest="ignored_boost" source="boost" /> <field dest="ignored_digest" source="digest" /> <field dest="date" source="tstamp" /> <field dest="permalink" source="url" /> </fields> <uniqueKey>permalink</uniqueKey> S </mapping> olrindex-mapping.xml
  • 42. $  echo  "http://subsite.fooweb.com"  >  urls/seed.txt $  bin/nutch  inject  /var/nutch/crawldb  urls $  bin/nutch  generate  /var/nutch/crawldb                                            /var/nutch/segments $  export  SEGMENT=/var/nutch/segments/`ls  -­‐tr                                    /var/nutch/segments|tail  -­‐1` $  bin/nutch  fetch  $SEGMENT  -­‐noParsing $  bin/nutch  parse  $SEGMENT $  bin/nutch  updatedb  $SEGMENT  -­‐filter  -­‐normalize $  bin/nutch  invertlinks  /var/nutch/linkdb                                        -­‐dir  /var/nutch/segments $  bin/nutch  solrindex  http://localhost:8080/solr/ main  /var/nutch/crawldb  /var/nutch/linkdb/  /var/ nutch/segments/*
  • 43. solr goes to work!
  • 45. <requestHandler name="dismax" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="defType">dismax</str> <str name="echoParams">explicit</str> <float name="tie">0.01</float> <str name="qf"> text^0.5 category^1.5 title^2 body^1 permalink^10.0 author^1.8 tag^1.3 </str> <str name="pf"> text^0.2 title^4 author^1.8 body^1 </str> <str name="mm">3&lt;60%</str> S </lst> </requestHandler> olrconfig.xml
  • 46. from solr import * url = 'http://localhost:8080/solr/main' s = SolrConnection(url) response = s.query('idie manager') for hit in response.results: print hit['title'] print hit['body'] $  python  simplequery.py   Overview  of  the  IDIE  manager To  help  with  those  implementing  IDIE  [...] IDIE:  The  801g  Of  Talent  Management Inspiration-­‐Direction-­‐Influence  [...]
  • 47. <str name="bf"> recip(ms(NOW,date),3.16e-11,1,1) </str> FunctionQuery(1.0/(3.16E-11*float(ms(const (1299450070912),date(date)))+1.0)), product of: 0.9974636 = 1.0/(3.16E-11*float(ms(const (1299450070912),date(date)=1299369600000)) +1.0) 1.0 = boost 0.03730806 = queryNorm
  • 48. going beyond just search results!
  • 49. $solr = new Apache_Solr_Service( 'localhost', 8080, '/solr/main'); $query = "badly drawn"; $p = array( 'facet' => "true", 'facet.field' => 'category', 'facet.mincount' => 1, ); $r = $solr->search($query, 0, 5, $p); foreach( $r->facet_counts->facet_fields->category as $cat => $count) { echo $cat, " ", $count, PHP_EOL;
  • 50. $query = ""; $p = array( 'q.alt' => "*:*", "facet" => "true", "facet.date" => 'date', "facet.date.start" => "NOW/YEAR-6MONTHS", "facet.date.end" => "NOW/YEAR", "facet.date.gap" => "+1MONTH", "fq" => "category: Reviews", ); $r = $solr->search($query, 0, 0, $p); foreach($r->facet_counts->facet_dates->date as $date => $count) { echo $date, " ", $count, PHP_EOL; }
  • 51. $query = ""; $p = array( 'q.alt' => "*:*", 'facet' => "true", 'facet.mincount' => 1, "facet.query" => array("title:gig", "title:album"), "fq" => "category:Reviews", ); $r = $solr->search($query, 0, 0, $p); foreach($r->facet_counts->facet_queries as $query => $count) { echo $query, " ", $count, PHP_EOL; }
  • 52. What Fields to facet? how to facet? what facets to show?
  • 53. <requestHandler name="mlt" class="solr.MoreLikeThisHandler"> <lst name="defaults"> <str name="defType">mlt</str> <str name="mlt">true</str> <str name="mlt.fl">body title</str> <str name="mlt.match.include"> false </str> </lst> </requestHandler> S olrconfig.xml
  • 54. $solr = new Apache_Solr_Service ('localhost', 8080, '/solr/main'); $query = "Losing my backpacking virginity"; $p = array('qt' => "mlt"); $results = $solr->search($query, 0, 3, $p); foreach($results->response->docs as $doc) { echo $doc->title, PHP_EOL; } $  php  mltquery.php   Backpacking  across  USA  social  media  way Safe  solo  travel  on  New  York  holidays Cracking  The  Big  Apple's  Big  10
  • 55. Thanks! Script: Ian Barber (phpir.com) Art: the internet! Editor: twitter.com/ianbarber Lettering: ian.barber@gmail.com http://joind.in/2899
  • 58. <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType"> textSpell </str> <lst name="spellchecker"> <str name="name">default</str> <str name="field">spell</str> <str name="buildOnCommit">true</str> <str name="spellcheckIndexDir"> /var/lib/solr/spellchecker </str> </lst> </searchComponent> Solrconfig.xml
  • 59. <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> <filter class="solr.LowerCaseFilterFactory" /> <filter class="solr.StandardFilterFactory" /> </analyzer> </fieldType> Schema.xml
  • 60. [...] <int name="ps">10</int> <int name="qs">5</int> <str name="spellcheck.onlyMorePopular">true</str> <str name="spellcheck.extendedResults">false</str> <str name="spellcheck.count">1</str> </lst> <arr name="last-components"> <str>spellcheck</str> </arr> </requestHandler> Dismax handler
  • 61. $solr = new Apache_Solr_Service('localhost', 8080, '/solr/main'); $p = array( 'spellcheck' => 'true', 'spellcheck.collate' => 'true'); $results = $solr->search("roose", 0, 5, $p); echo "Did you mean " . $results->spellcheck->suggestions->collation, PHP_EOL; $ php spellquery.php Did you mean rose
  • 62. include_once "Apache/Solr/Service.php"; $solr = new Apache_Solr_Service( 'localhost', 8080, '/solr/main'); $query = "album review"; $p = array('sort' => 'title_sort desc'); $res = $solr->search($query, 0, 10, $p); foreach($res->response->docs as $doc) { echo $doc->title, PHP_EOL; } <field name="title_sort" type="lowercase" indexed="true" stored="false" /> <copyField source="title" dest="title_sort" />
  • 63. http://code.google.com/p/solr-php-client $ php sortquery.php Zola Jesus album review - Stridulum II Zero 7 album review - Record Zebra and Giraffe Young Knives video interview part 2 Young Knives - Road to V winners on tour You Me At Six @ Wembley Arena, London You Me At Six - Hold Me Down Yet again... Good Shoes @ ULU, London Yelle: North American tour review Yelle: interview with a French pop artiste
  • 64. <highlighting> <fragmenter name="regex" class="[..]highlight.RegexFragmenter"> <lst name="defaults"> <int name="hl.fragsize">70</int> <float name="hl.regex.slop">0.5</float> <str name="hl.regex.pattern"> [-\w ,/\n\"']{20,200}</str> </lst> </fragmenter> <formatter name="html" class="[...]highlight.HtmlFormatter" default="true"> <lst name="defaults"> <str name="hl.simple.pre"><![CDATA[<em>]]></str> <str name="hl.simple.post"><![CDATA[</em>]]></str> </lst> </formatter> </highlighting>
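To see what kind of text `hl.regex.pattern` accepts, the character class can be tried outside Solr. This is only a sketch: it assumes the pattern is the one from Solr's stock example config (`[-\w ,/\n\"']{20,200}`), the sample text is made up, and Solr's RegexFragmenter also applies `hl.fragsize` and `hl.regex.slop`, which this does not model.

```python
import re

# Runs of 20 to 200 word characters, spaces, commas, slashes, newlines,
# hyphens and quotes. Sentence punctuation such as "." or "!" ends a run.
pattern = re.compile(r"""[-\w ,/\n"']{20,200}""")

# Hypothetical sample text, not taken from the slides.
text = ("Fenech Soler album review. The debut is a glossy, confident "
        "pop record, full of hooks! Short one.")

fragments = pattern.findall(text)
for fragment in fragments:
    print(repr(fragment))
```

The trailing "Short one" is dropped because it is shorter than the 20-character minimum.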
  • 65. $so = new Apache_Solr_Service('localhost', 8080, '/solr/main'); $q = "album review"; $r = $so->search($q, 0, 5, array('hl' => "true")); foreach($r->response->docs as $doc) { echo $r->highlighting->{$doc->permalink}->title[0], PHP_EOL; } $ php highlightquery.php Fenech Soler <em>album</em> <em>review</em> Weezer - Hurley <em>album</em> <em>review</em> Feeder <em>album</em> <em>review</em> - Renegades
  • 66. Replication sharding caching The masters of scaling are here!
  • 67. from solr import * url = 'http://localhost:8080/solr/main' s = SolrConnection(url) response = s.query('ISO90210') if(response.results.numFound == '0'): print "No results found!" $ python simplefail.py No results found! IS SOLR DEFEATED?
  • 69. /solr/select/?q="iso 90210"&debugQuery=true <lst name="debug"> <str name="rawquerystring">"iso 90210"</str> <str name="querystring">"iso 90210"</str> <str name="parsedquery"> +DisjunctionMaxQuery((body:"iso 90210")~0.01) DisjunctionMaxQuery((body:"iso 90210")~0.01)</str>
  • 70. /solr/select/?q=iso 90210&debugQuery=true <lst name="debug"> <str name="rawquerystring">iso 90210</str> <str name="querystring">iso 90210</str> <str name="parsedquery"> +((DisjunctionMaxQuery((body:iso)~0.01) DisjunctionMaxQuery((body:90210)~0.01))~2) DisjunctionMaxQuery((body:"iso 90210")~0.01)</str> <str name="parsedquery_toString">+(((body:iso)~0.01 (body:90210)~0.01)~2) (body:"iso 90210")~0.01</str>
  • 71. &explainOther=90210 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s) 0.0 = no match on required clause (body:"iso 90210") 0.0 = weight(body:"iso 90210" in 0), product of: 0.6953707 = queryWeight(body:"iso 90210"), product of: 3.8325815 = idf(body: iso=1 90210=1) 0.18143663 = queryNorm 0.0 = fieldWeight(body:"iso 90210" in 0), product of: 0.0 = tf(phraseFreq=0.0) 3.8325815 = idf(body: iso=1 90210=1) 0.15625 = fieldNorm(field=body, doc=0)
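The numbers in the explain output above multiply out exactly as the tree says. A minimal sketch that redoes the arithmetic, with every value copied from the slide; the variable names are illustrative.

```python
# Values copied from the explain output.
idf = 3.8325815
query_norm = 0.18143663
tf = 0.0             # phraseFreq=0.0: the phrase never occurs in the document
field_norm = 0.15625

# queryWeight is the product of idf and queryNorm.
query_weight = idf * query_norm

# fieldWeight is the product of tf, idf and fieldNorm.
field_weight = tf * idf * field_norm

# The clause weight is the product of the two.
score = query_weight * field_weight

print(query_weight, field_weight, score)
```

Because `tf` is zero, the required phrase clause scores zero no matter how large the other factors are, which is why the document is reported as a non-match.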
  • 72. <str name="echoParams">explicit</str> <float name="tie">0.01</float> <str name="qf"> text^0.5 category^1.5 title^2 body^1 permalink^10.0 author^1.8 tag^1.3 </str> <str name="pf"> text^0.2 title^4 author^1.8 body^1 </str> <str name="mm"> 3&lt;60%</str> <int name="ps">10</int> <int name="qs">5</int> </lst> Solrconfig.xml
  • 73. from solr import * url = 'http://localhost:8080/solr/main' s = SolrConnection(url) response = s.query('ISO90210') if(response.results.numFound == '0'): print "No results found!" $ python simplefail.py DPCO: A Framework For Synergy DPCO, or Dynamic Performance Class Organisation is a ISO90210 quality [...]
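The `mm` value `3<60%` (written `3&lt;60%` in XML) reads as: with three or fewer optional clauses, all of them must match; with more than three, 60% must match, rounded down. A rough sketch of that rule for this single conditional form only; the function name and defaults are illustrative and it does not parse the full `mm` syntax.

```python
def min_should_match(optional_clauses, threshold=3, percent=60):
    """Required clause count for an mm spec of the form 'threshold<percent%'."""
    if optional_clauses <= threshold:
        # At or below the threshold, every clause is required.
        return optional_clauses
    # Above it, the percentage applies and is rounded down.
    return optional_clauses * percent // 100

for n in (1, 2, 3, 4, 5, 10):
    print(n, "clauses ->", min_should_match(n), "required")
```

With the two-term query `iso 90210` both terms are required, which agrees with the `~2` in the parsed query shown on the earlier slide.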