Introductions…
  Who the hell am I?
     Jay Hill, Lucid Imagination
     7 years Lucene experience
     4 years Solr experience
     Author of Lucid Training
     SME for Lucid Certification
  Who the hell are you?
     New to search?
     New to Lucene/Solr?
     Battle-tested veterans?
  


© Lucid Imagination, Inc.
  
We'll Leave Time For Q&A
  Who's doing what?
     Solr 3.1?
     Solr 1.4.1?
     Nightly build?
     Solr 1.3 or older?
  Are there any specific problems you're having?
  Meanwhile, interrupt, ask questions as we go, etc.
  	
  




  
A Brief Word About Lucid Imagination
  Lucid Imagination:
     The commercial company supporting Lucene/Solr open source search.
     Founded by
          Yonik Seeley – Creator of Solr
          Erik Hatcher – Co-author, Lucene In Action
          Grant Ingersoll – Apache PMC Chair
          Marc Krellenstein – Lucid CTO
     Staff includes 9 Lucene/Solr committers
     Training, certification, support, LucidWorks Enterprise
  



  
Lucid Customers (That I've Worked With)




  
…On To The Sinning!




  
Sins As Anti-Patterns?
  "Sorta kinda"
     Specify Nothing (Sloth)
     Creeping Featuritis (Greed)
     Blowhard Jamboree (Pride)
     Boat Anchor (Lust)
     Not Invented Here (Envy)
     Phatware (Gluttony)
     Emperor's New Clothes (Wrath)
  




  
Sins Can Contradict One Another!
  You'll notice that many of the "sins" we see will be the exact opposite of others
  Just as some of us tend towards laziness, others towards excess
  Sometimes you -
     "Look before you leap."
  Other times,
     "He who hesitates is lost."
  In Solr (or any search app), one size never fits all
  


"I don't know and I don't care."
  




  
Sloth
  "We aren't really into open source."
     Lack of commitment to Solr and/or the search application itself
  Not developing in-house Solr expertise
  Not paying enough attention to JVM settings, garbage collection, and RAM allocation.
  




  
Sloth
  Neglecting to get familiar with the source code
     It is open source after all!
  Not taking the time to understand the main parts of Solr:
     Request Handlers
     Search components
     Query parsers
            Extend QParserPlugin class
     ValueSource & ValueSourceParser – custom functions
            New pseudo-fields in 4.x
     Response writers
  

  
Sloth
  Not keeping up with new features and developments in Lucene and Solr
    CHANGES.txt – use "diff" to keep up on changes
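The "diff CHANGES.txt" habit can be sketched in a few lines of Python. This is only an illustration using the standard-library difflib module; the function name and the sample text are made up for the example, and in practice you would read the two CHANGES.txt files from the releases you are comparing.

```python
import difflib


def new_change_entries(old_text, new_text):
    """Return the lines that appear in the newer CHANGES.txt but not the older one."""
    diff = list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="CHANGES.txt (old)",
        tofile="CHANGES.txt (new)",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two diff lines are the "---" / "+++" file headers; skip them.
    return [line[1:] for line in diff[2:] if line.startswith("+")]


# Hypothetical excerpts, not real CHANGES.txt content.
old_changes = "Release 1.4\n* faceting improvements\n"
new_changes = "Release 3.1\n* spatial search\nRelease 1.4\n* faceting improvements\n"

for entry in new_change_entries(old_changes, new_changes):
    print(entry)
```

Running something like this after each release gives a short reading list of what is new, which is easier to keep up with than rereading the whole file.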
  



  
Sloth
  New features in Solr 3.1:
     Solr spatial
     Edismax query parser
          NOT experimental!
     Dynamic metadata extraction via UIMA
     Numeric range faceting (like date faceting)
     Lucene RAMDirectoryFactory available
     Faceting performance improvements
     Spellcheck and Terms components now work for distributed search
     Suggester component – better autosuggest!
          Can add custom dict., phrases, etc.
  
  
Sloth
  New features coming in Solr 4.x:
     Lucene DocumentsWriterPerThread (DWPT)
          Moving towards "real time"
     UpdateHandler upgrade to work with real-time
     Field collapsing/grouping
     Pivot facets
     SolrCloud (Zookeeper)
     Fuzzy queries 100 times faster
     Pseudo fields via functions
     Relevancy function queries: tf, idf, docFreq, norm, …
  


  
Sloth: The Path To Salvation
  Commit to the project and to learning Solr
  Stay up to date on Solr changes
  Stay current with ongoing releases
  Get familiar with the source code
  Spend some time to understand the main configuration files:
     solrconfig.xml
     schema.xml
  Read through the entire Solr Wiki once every so often
  Develop in-house Solr expertise
  



  
Save a penny, lose a customer.
  




  
Greed
  Skimping on resources such as:
     RAM
        "Here's a quarter buddy, go buy some RAM!"
     Storage space
  You will get what you pay for!
     …on the other hand, not every company has "deep pockets"
  




  
Greed
  Trying to "squeeze by", indexing to, and searching on, the same server
  [Diagram labels: Indexing; Shards (Indexers); Slave/Searchers; Load Balancer; Searches]
  
  
Greed
  Not making the effort to find the right balance between precision and recall
     Recall: What fraction of the relevant documents in the collection were returned by the system?
     Precision: What fraction of the returned results are relevant to the information need?
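The two definitions above reduce to simple set arithmetic. The sketch below assumes you already have, for one test query, the IDs the system returned and the IDs a domain expert judged relevant; the function name and the document IDs are invented for illustration.

```python
def precision_recall(returned_ids, relevant_ids):
    """Compute (precision, recall) for one query from two collections of doc IDs."""
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = returned & relevant  # relevant documents that were actually returned

    # Precision: what fraction of the returned results are relevant?
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: what fraction of the relevant documents were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 results returned, 3 documents known to be relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```

Returning more documents tends to raise recall and lower precision, which is why the balance has to be chosen per application rather than assumed.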
  




  
Greed
  A few thoughts about relevance:
     Get feedback from domain experts
     Is it better to have lots of results with less precision, or fewer, more targeted results?
     Different sites will have very different requirements
  




  
Greed: The Path To Salvation
  Pry open your wallet – don't be cheap
  You don't have to push the envelope
  Find the right balance between recall and precision
  Don't push for more results over precision – unless that is a clear requirement (sometimes it is)
  




"What could possibly go wrong?"
  




  
Pride
  Reinventing the wheel
     "Why don't we just write our own search libraries?"
     Nobody has a use case like us – right?
     "We need to change the scoring algorithms."
  




  
Pride
  Thinking you can "do it all" in Solr
     Solr is rarely a good choice as a SOR (system of record)
  Consider other tools to work with Solr:
     Nutch
     Mahout
     OpenNLP
     Google Connector Framework
     Your own code
  




  
Pride
  Stubbornly refusing to use resources such as the mailing lists:
     Solr user list:
         solr-user@lucene.apache.org
     Solr developer list:
         dev@lucene.apache.org
     Lucene user list:
         java-user@lucene.apache.org
  LucidFind: http://www.lucidimagination.com/search/
  	
  



  
Pride
  "I will not yield!"
     Trying to "win battles" on the mailing lists
     Good Karma – be a good citizen in the community
  




  
Pride: The Path To Salvation
  Ask for help when needed
  Let the business needs define the project – don't let the tail wag the dog
  Get a feel for the Solr community and respect the experience of others
  Your situation, while possibly unique, is probably not completely dissimilar to others. Learn from the pioneers and Solr veterans
  




"Someone stop me!"
  




  
Lust
  Obsessing over unimportant details too early in the project
     Agile approach is well suited to Solr development – iterate!
  Trying to "push the envelope"
     Necessary sometimes, but it's not called the "bleeding edge" without reason
     "Ease in" to major changes
  Too much attention to JVM settings
            Solr experts are not usually JVM/GC experts
  



  
Lust
  "Anti-greed" – Committing too many resources to Solr
     Make sure the OS has plenty of RAM to cache files, etc.
  "If one is good, a dozen must be better!"
     As much as possible, try to get a sense of what your query volume will be, and don't just throw money at building a monstrous farm of searchers
     Solr has proven to be much more efficient than some large, commercial search solutions
  




  
Lust
  Blood from a turnip:
     Trying some absurd new technique, "just because"
  RAMDirectoryFactory – not a secret way to faster indexing/searching
     No disk-backed persistence
     Usually not worth it
     …but you never know…
  Research first before going "extreme"
  

  
Lust
  No need to index millions of docs for development
  Better to work with small sets of data while getting started.
  Don't worry too much about field types as you get started. Get data in the index, then analyze and refine.
  




  
Lust: The Path To Salvation
  Use an agile approach – start simply, build your application slowly, iterate
  Deal with the low-hanging fruit first
  Measure twice, cut once
  Don't miss the forest for the trees – no need to obsess over details in the early stages
  Do some due diligence before trying unorthodox approaches
  Get a small sample of data indexed w/o worrying about type, then iterations of refinement
  



"If we had some bacon we could have some bacon and eggs – if we had some eggs."
  




  
Envy
  Adding "cool" features you see on other sites, but don't really need
     Keep it "lean and mean", especially to start
     Resist the urge to include the "kitchen sink"
  




  
Envy
  You too can master dismax!
     Don't be afraid of dismax/edismax
     Lots of controls to learn, but also lots of power
     Flexibility to search multiple fields
     Boost different fields
     Boost phrase fields (pf) higher than query fields (qf)
     Use boost queries (bq) and function queries (bf)
     Most intimidating params:
            tie
            mm
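To make the dismax controls listed above less intimidating, here is a rough sketch of a request that sets each of them. The parameter names (defType, qf, pf, bq, bf, tie, mm) are the standard dismax/edismax ones; the field names (title, body, in_stock, publish_date), the boost values, and the helper name are assumptions made up for the example. In short, tie controls how much lower-scoring field matches add to the best field's score (0 means only the best field counts, 1 means all fields are summed), and mm sets how many of the optional query clauses must match.

```python
from urllib.parse import urlencode


def build_query_string(user_query):
    """Build a URL query string for an edismax request (hypothetical schema)."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # qf: fields to search, with per-field boosts.
        "qf": "title^2 body",
        # pf: phrase fields, boosted higher than the query fields.
        "pf": "title^10 body^5",
        # bq: boost query, nudges matching documents up without filtering.
        "bq": "in_stock:true^1.5",
        # bf: boost function, here favouring recently published documents.
        "bf": "recip(ms(NOW,publish_date),3.16e-11,1,1)",
        # tie: 0.0 = score of best field only, 1.0 = sum of all field scores.
        "tie": "0.1",
        # mm: minimum-should-match, here 75% of the optional clauses.
        "mm": "75%",
    }
    return urlencode(params)


query_string = build_query_string("red shoes")
print(query_string)
```

Starting from a small set like this and changing one parameter at a time, while watching the effect on results, is usually the quickest way to get comfortable with tie and mm.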
  

  
Envy
  Spatial search – seems complicated, but major sites make it look easy
  Now, in Solr 3.1 – it is easy!
  You can:
     Store spatial data in your index
     Filter by distance
     Sort by distance
     Boost/bias by distance
     Facet by distance
  Also consider: Search-based navigation such as "Show me in-stock items only"
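As a rough illustration of "filter by distance" and "sort by distance", the sketch below assembles request parameters using the geofilt filter and the geodist() function together with the sfield, pt, and d parameters. The field name store, the coordinates, and the helper name are assumptions for the example; your schema needs a suitable location field for this to work.

```python
from urllib.parse import urlencode


def nearby_params(query, lat, lon, radius_km):
    """Parameters for a search limited to, and sorted by, distance from a point."""
    return {
        "q": query,
        "sfield": "store",        # hypothetical location field in the schema
        "pt": f"{lat},{lon}",     # centre point as "lat,lon"
        "d": str(radius_km),      # radius in kilometres
        "fq": "{!geofilt}",       # filter: keep only documents within d of pt
        "sort": "geodist() asc",  # sort: nearest documents first
    }


params = nearby_params("coffee", 45.15, -93.85, 5)
print(urlencode(params))
```

The same pattern extends to search-based navigation: an extra filter such as in_stock:true (again a hypothetical field) narrows results without changing the user's query.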
  

  
Envy: The Path To Salvation
  Focus on your requirements, don't try to add "bells and whistles" you don't need
  Don't be hesitant to dive into the power of dismax/edismax
  Take advantage of new features such as Solr spatial, if those features will add value to the end user experience
  




"A fat stomach never breeds fine thoughts."
  




  
Gluttony
  "Staying fit and trim" is usually good practice when designing and running Solr applications
     Once again – keep it "lean and mean"
  A lot of these issues cross over into the "Sloth" category
     The effort needed to keep your configuration and data efficiently managed is not considered important
  Don't lose control of your configuration files
     Remove unnecessary elements
     Version control all configuration files
  


  
Gluttony
  Slim down those "bloated" queries:
           q="red shoes"&accountId=(12343 OR 338899
            OR 554443 OR 243445 OR 55442 OR 3330899
            OR 59927 OR 3888999 OR 549 OR 440293579 OR
            34201 OR 339917 OR 300191 OR 339338 OR
            109823 OR 679176 OR 31407815 OR 3001756
            OR 134322 OR 311123 OR 987888 OR 997181 OR 771819 OR
            100292 OR 3389474 OR 5505759 OR 2459577 OR 4499957 OR
            1996571 OR 559590 OR 220299 OR 4404872 OR 151510 OR
            66017 OR 666 OR 113459 OR 890575 OR 505725 OR 330393 OR
            349940 OR 4094994 OR 1245995 OR 2459959 OR 4255909 OR
                 899955 OR 7878899 OR 100999 … ∞ )
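The slide does not say how to slim such a query down. One common option, offered here only as an assumption, is to keep the user's text in q and move the account restriction into a filter query (fq), which Solr can cache and reuse independently of the main query. The helper name below is made up, and accountId is taken from the example above.

```python
def split_query(user_query, account_ids):
    """Separate the user's search text from the account restriction."""
    # Deduplicate and sort so the same set of IDs always produces the same
    # filter string, which gives the filter cache a chance to be reused.
    id_clause = " OR ".join(str(i) for i in sorted(set(account_ids)))
    return {
        "q": user_query,
        "fq": f"accountId:({id_clause})",
    }


params = split_query('"red shoes"', [338899, 12343, 12343])
print(params)
```

If the ID lists are still very long, it may be worth asking whether the grouping could be indexed as a single field value instead, so the filter becomes one term rather than hundreds.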
  

  
Gluttony
  Stay in shape – Flex Your Solr Muscles!
     Keep up on new features
     Training, when appropriate
     Certification
     Contribute!
     Follow the user lists
     Refactor when new features can help
     Keep up to date on new releases
  




  
Gluttony: The Path To Salvation
  Keep configuration files clean and trim. Remove unused elements
  Periodically review queries to make sure they are efficient
  Refactor when necessary – keep your application fit and trim
  




"Hope is the denial of reality."
  




  
Wrath
  Wrath - usually synonymous with anger, but…
  Let's use an older definition here:
     "A vehement denial of the truth, both to others and in the form of self-denial and impatience."
  Step back every now and then and look objectively at your application
  




  
Wrath
  Resist the push to rush to production…
  




  
Wrath
  Ignoring new Solr releases
     OK to wait until a release is proven
     But getting too far behind makes upgrading more painful with each release
  We don't have time to do it right, but we always have time to fix it
  




  
Wrath
  Ignoring complaints about results relevance
  Disregarding feedback from stakeholders
  Remember – the point of your search application is to support the business, not to "build cool stuff"
  Not taking advantage of log files
     Consider mining log files, storing data in relational DB for generating reports
     Capturing user queries and query counts can be extremely useful
                Can also be used for query-based autosuggest (not just indexed terms)
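Capturing user queries and their counts can start very small. The sketch below assumes request log lines that contain a params={...} section with a q parameter, as in the sample lines; the exact log format depends on your Solr version and logging setup, so treat the regular expression and the sample lines as assumptions. It also does not handle parameter values that themselves contain a closing brace.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Assumed shape of the request parameters inside a log line: params={a=1&q=...}
PARAMS = re.compile(r"params=\{([^}]*)\}")


def count_queries(log_lines):
    """Count how often each user query (the q parameter) appears in the log."""
    counts = Counter()
    for line in log_lines:
        match = PARAMS.search(line)
        if not match:
            continue  # not a request line
        for pair in match.group(1).split("&"):
            name, _, value = pair.partition("=")
            if name == "q":
                # Decode URL escapes and normalise case so variants group together.
                counts[unquote_plus(value).strip().lower()] += 1
    return counts


# Hypothetical log lines for illustration.
sample_log = [
    "INFO: [] webapp=/solr path=/select params={q=red+shoes&rows=10} hits=12 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=Red+Shoes} hits=12 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={rows=5&q=solr} hits=3 status=0 QTime=1",
    "INFO: Opening new searcher",
]

for query, count in count_queries(sample_log).most_common():
    print(count, query)
```

The resulting counts could be loaded into a relational table for reporting, or used as the source for query-based autosuggest.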
  


  
Wrath: The Path To Salvation
  Keep your version of Solr up to date
     OK to wait "awhile", but don't skip versions
  Seek and embrace feedback from business and domain experts
  Constantly gauge and improve relevance as an ongoing task
  Avoid the push to release too soon (as best you can)
  Take advantage of log files to understand what users are doing, and what is not working well
  




  
Search, and you shall find!
  

Contenu connexe

Similaire à The Seven Deadly Sins of Solr - By Jay Hill

Solr: Search at the Speed of Light
Solr: Search at the Speed of LightSolr: Search at the Speed of Light
Solr: Search at the Speed of LightErik Hatcher
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher lucenerevolution
 
Cognitum Ontorion: Knowledge Representation and Reasoning System
Cognitum Ontorion: Knowledge Representation and Reasoning SystemCognitum Ontorion: Knowledge Representation and Reasoning System
Cognitum Ontorion: Knowledge Representation and Reasoning SystemCognitum
 
Technologies for startup
Technologies for startupTechnologies for startup
Technologies for startupDzung Nguyen
 
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014gmalouf678
 
7 New Tools Java Developers Should Know
7 New Tools Java Developers Should Know7 New Tools Java Developers Should Know
7 New Tools Java Developers Should KnowTakipi
 
[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...
[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...
[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...CODE BLUE
 
Node.js Deeper Dive
Node.js Deeper DiveNode.js Deeper Dive
Node.js Deeper DiveJustin Reock
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionAnant Corporation
 
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMoved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMilen Dyankov
 
Planning JavaScript and Ajax for larger teams
Planning JavaScript and Ajax for larger teamsPlanning JavaScript and Ajax for larger teams
Planning JavaScript and Ajax for larger teamsChristian Heilmann
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrLucidworks (Archived)
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with BlackfireMarko Mitranić
 
API Description Languages
API Description LanguagesAPI Description Languages
API Description LanguagesAkana
 
API Description Languages
API Description LanguagesAPI Description Languages
API Description LanguagesAkana
 
Becoming an IBM Connections Developer
Becoming an IBM Connections DeveloperBecoming an IBM Connections Developer
Becoming an IBM Connections DeveloperRob Novak
 
BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)
BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)
BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)Clancy Childs
 
Benchmarking Web Application Scanners for YOUR Organization
Benchmarking Web Application Scanners for YOUR OrganizationBenchmarking Web Application Scanners for YOUR Organization
Benchmarking Web Application Scanners for YOUR OrganizationDenim Group
 

Similaire à The Seven Deadly Sins of Solr - By Jay Hill (20)

Solr: Search at the Speed of Light
Solr: Search at the Speed of LightSolr: Search at the Speed of Light
Solr: Search at the Speed of Light
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher
 
Cognitum Ontorion: Knowledge Representation and Reasoning System
Cognitum Ontorion: Knowledge Representation and Reasoning SystemCognitum Ontorion: Knowledge Representation and Reasoning System
Cognitum Ontorion: Knowledge Representation and Reasoning System
 
Technologies for startup
Technologies for startupTechnologies for startup
Technologies for startup
 
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
 
7 New Tools Java Developers Should Know
7 New Tools Java Developers Should Know7 New Tools Java Developers Should Know
7 New Tools Java Developers Should Know
 
[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...
[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...
[CB19] Spyware, Ransomware and Worms. How to prevent the next SAP tragedy by ...
 
Node.js Deeper Dive
Node.js Deeper DiveNode.js Deeper Dive
Node.js Deeper Dive
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
 
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMoved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
 
Planning JavaScript and Ajax for larger teams
Planning JavaScript and Ajax for larger teamsPlanning JavaScript and Ajax for larger teams
Planning JavaScript and Ajax for larger teams
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for Solr
 
Solr @ eBay Kleinanzeigen
Solr @ eBay KleinanzeigenSolr @ eBay Kleinanzeigen
Solr @ eBay Kleinanzeigen
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 
API Description Languages
API Description LanguagesAPI Description Languages
API Description Languages
 
API Description Languages
API Description LanguagesAPI Description Languages
API Description Languages
 
Becoming an IBM Connections Developer
Becoming an IBM Connections DeveloperBecoming an IBM Connections Developer
Becoming an IBM Connections Developer
 
BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)
BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)
BYO/DIY Analytics Platform (MeasureCamp Presentation by Clancy Childs)
 
Benchmarking Web Application Scanners for YOUR Organization
Benchmarking Web Application Scanners for YOUR OrganizationBenchmarking Web Application Scanners for YOUR Organization
Benchmarking Web Application Scanners for YOUR Organization
 

Plus de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
The Seven Deadly Sins of Solr - By Jay Hill

  • 1.
  • 2. Introductions…
      Who the hell am I?
        Jay Hill, Lucid Imagination
        7 years Lucene experience
        4 years Solr experience
        Author of Lucid Training
        SME for Lucid Certification
      Who the hell are you?
        New to search?
        New to Lucene/Solr?
        Battle-tested veterans?
      © Lucid Imagination, Inc.
  • 3. We'll Leave Time For Q&A
      Who's doing what?
        Solr 3.1?
        Solr 1.4.1?
        Nightly build?
        Solr 1.3 or older?
      Are there any specific problems you're having?
      Meanwhile, interrupt, ask questions as we go, etc.
  • 4. A Brief Word About Lucid Imagination
      Lucid Imagination:
        The commercial company supporting Lucene/Solr open source search.
        Founded by
          Yonik Seeley – Creator of Solr
          Erik Hatcher – Co-author, Lucene In Action
          Grant Ingersoll – Apache PMC Chair
          Marc Krellenstein – Lucid CTO
        Staff includes 9 Lucene/Solr committers
        Training, certification, support, LucidWorks Enterprise
  • 5. Lucid Customers (That I've Worked With)
  • 6. …On To The Sinning!
  • 7. Sins As Anti-Patterns?
      "Sorta kinda"
        Specify Nothing (Sloth)
        Creeping Featuritis (Greed)
        Blowhard Jamboree (Pride)
        Boat Anchor (Lust)
        Not Invented Here (Envy)
        Phatware (Gluttony)
        Emperor's New Clothes (Wrath)
  • 8. Sins Can Contradict One Another!
      You'll notice that many of the "sins" we see will be the exact opposite of others
      Just as some of us tend towards laziness, others towards excess
      Sometimes you - "Look before you leap."
      Other times, "He who hesitates is lost."
      In Solr (or any search app), one size never fits all
  • 9. "I don't know and I don't care."
  • 10. Sloth
      "We aren't really into open source."
        Lack of commitment to Solr and/or the search application itself
      Not developing in-house Solr expertise
      Not paying enough attention to JVM settings, garbage collection, and RAM allocation.
  • 11. Sloth
      Neglecting to get familiar with the source code
        It is open source after all!
      Not taking the time to understand the main parts of Solr:
        Request Handlers
        Search components
        Query parsers
          Extend QParserPlugin class
          ValueSource & ValueSourceParser – custom functions
          New pseudo-fields in 4.x
        Response writers
  • 12. Sloth
      Not keeping up with new features and developments in Lucene and Solr
        CHANGES.txt – use "diff" to keep up on changes
  • 13. Sloth
      New features in Solr 3.1:
        Solr spatial
        Edismax query parser
          NOT experimental!
        Dynamic metadata extraction via UIMA
        Numeric range faceting (like date faceting)
        Lucene RAMDirectoryFactory available
        Faceting performance improvements
        Spellcheck and Terms components now work for distributed search
        Suggester component – better autosuggest!
          Can add custom dict., phrases, etc.
  • 14. Sloth
      New features coming in Solr 4.x:
        Lucene DocumentsWriterPerThread (DWPT)
          Moving towards "real time"
          UpdateHandler upgrade to work with real-time
        Field collapsing/grouping
        Pivot facets
        SolrCloud (ZooKeeper)
        Fuzzy queries 100 times faster
        Pseudo fields via functions
        Relevancy function queries: tf, idf, docFreq, norm, …
  • 15. Sloth: The Path To Salvation
      Commit to the project and to learning Solr
      Stay up to date on Solr changes
      Stay current with ongoing releases
      Get familiar with the source code
      Spend some time to understand the main configuration files:
        solrconfig.xml
        schema.xml
      Read through the entire Solr Wiki once every so often
      Develop in-house Solr expertise
  • 16. Save a penny, lose a customer.
  • 17. Greed
      Skimping on resources such as:
        RAM
          "Here's a quarter buddy, go buy some RAM!"
        Storage space
      You will get what you pay for!
        …on the other hand, not every company has "deep pockets"
  • 18. Greed
      Trying to "squeeze by", indexing to, and searching on, the same server
      [Diagram labels: Indexing; Shards (Indexers); Slave/Searchers; Load Balancer; Searches]
  • 19. Greed
      Not making the effort to find the right balance between precision and recall
        Recall: What fraction of the relevant documents in the collection were returned by the system?
        Precision: What fraction of the returned results are relevant to the information need?
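
The two definitions on slide 19 reduce to simple set arithmetic. The following is a minimal sketch, assuming you have the document IDs a query returned and a hand-judged list of relevant IDs for the same query. The function name and the sample IDs are illustrative and do not come from the talk.

```python
def precision_recall(returned_ids, relevant_ids):
    """Return (precision, recall) for a single query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant documents
    """
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents returned, 3 documents known to be relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Tracking both numbers for a fixed set of test queries is one way to see whether a configuration change trades precision for recall or the other way round.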
  • 20. Greed
      A few thoughts about relevance:
        Get feedback from domain experts
        Is it better to have lots of results with less precision, or fewer, more targeted results?
        Different sites will have very different requirements
  • 21. Greed: The Path To Salvation
      Pry open your wallet – don't be cheap
      You don't have to push the envelope
      Find the right balance between recall and precision
      Don't push for more results over precision – unless that is a clear requirement (sometimes it is)
  • 22. "What could possibly go wrong?"
  • 23. Pride
      Reinventing the wheel
        "Why don't we just write our own search libraries?"
        Nobody has a use case like us – right?
        "We need to change the scoring algorithms."
  • 24. Pride
      Thinking you can "do it all" in Solr
        Solr is rarely a good choice as a SOR (system of record)
      Consider other tools to work with Solr:
        Nutch
        Mahout
        OpenNLP
        Google Connector Framework
        Your own code
  • 25. Pride
      Stubbornly refusing to use resources such as the mailing lists:
        Solr user list: solr-user@lucene.apache.org
        Solr developer list: dev@lucene.apache.org
        Lucene user list: java-user@lucene.apache.org
      LucidFind: http://www.lucidimagination.com/search/
  • 26. Pride
      "I will not yield!"
        Trying to "win battles" on the mailing lists
        Good Karma – be a good citizen in the community
  • 27. Pride: The Path To Salvation
      Ask for help when needed
      Let the business needs define the project – don't let the tail wag the dog
      Get a feel for the Solr community and respect the experience of others
      Your situation, while possibly unique, is probably not completely dissimilar to others. Learn from the pioneers and Solr veterans
  • 28. "Someone stop me!"
  • 29. Lust
      Obsessing over unimportant details too early in the project
        Agile approach is well suited to Solr development – iterate!
      Trying to "push the envelope"
        Necessary sometimes, but it's not called the "bleeding edge" without reason
        "Ease in" to major changes
      Too much attention to JVM settings
        Solr experts are not usually JVM/GC experts
  • 30. Lust
      "Anti-greed" – Committing too many resources to Solr
        Make sure the OS has plenty of RAM to cache files, etc.
      "If one is good, a dozen must be better!"
        As much as possible, try to get a sense of what your query volume will be, and don't just throw money at building a monstrous farm of searchers
        Solr has proven to be much more efficient than some large, commercial search solutions
  • 31. Lust
      Blood from a turnip:
        Trying some absurd new technique, "just because"
      RAMDirectoryFactory – not a secret way to faster indexing/searching
        No disk-backed persistence
        Usually not worth it
        …but you never know…
      Research first before going "extreme"
  • 32. Lust
      No need to index millions of docs for development
      Better to work with small sets of data while getting started.
      Don't worry too much about field types as you get started. Get data in the index, then analyze and refine.
  • 33. Lust: The Path To Salvation
      Use an agile approach – start simply, build your application slowly, iterate
      Deal with the low-hanging fruit first
      Measure twice, cut once
      Don't miss the forest for the trees – no need to obsess over details in the early stages
      Do some due diligence before trying unorthodox approaches
      Get a small sample of data indexed w/o worrying about type, then iterations of refinement
  • 34. "If we had some bacon we could have some bacon and eggs – if we had some eggs."
  • 35. Envy
      Adding "cool" features you see on other sites, but don't really need
        Keep it "lean and mean", especially to start
        Resist the urge to include the "kitchen sink"
  • 36. Envy
      You too can master dismax!
        Don't be afraid of dismax/edismax
        Lots of controls to learn, but also lots of power
          Flexibility to search multiple fields
          Boost different fields
          Boost phrase fields (pf) higher than query fields (qf)
          Use boost queries (bq) and function queries (bf)
        Most intimidating params:
          tie
          mm
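
To make the parameters named on slide 36 concrete, here is a hedged sketch that assembles an edismax request as a URL query string. The parameter names (`defType`, `qf`, `pf`, `bq`, `bf`, `tie`, `mm`) are standard dismax/edismax parameters. The field names (`title`, `body`, `in_stock`, `pub_date`) and all boost values are assumptions chosen for illustration, so substitute fields from your own schema.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_query):
    """Build request parameters for a hypothetical edismax search."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Query fields: search both fields, weighting title matches higher.
        "qf": "title^2 body",
        # Phrase fields: boosted higher than qf, as the slide suggests.
        "pf": "title^10 body^5",
        # Boost query: favor in-stock documents without excluding the rest.
        "bq": "in_stock:true^3",
        # Boost function: favor recent documents (assumes a date field).
        "bf": "recip(ms(NOW,pub_date),3.16e-11,1,1)",
        # tie: how much non-best field scores add to the best field's score.
        "tie": "0.1",
        # mm: up to 2 terms, all must match; beyond that, 75% must match.
        "mm": "2<75%",
    }


query_string = urlencode(edismax_params("red shoes"))
print("/select?" + query_string)
```

Starting with only `qf` and adding one parameter at a time makes it easier to see what each control contributes to the ranking.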
  • 37. Envy
      Spatial search – seems complicated, but major sites make it look easy
      Now, in Solr 3.1 – it is easy!
      You can:
        Store spatial data in your index
        Filter by distance
        Sort by distance
        Boost/bias by distance
        Facet by distance
      Also consider: Search-based navigation such as "Show me in-stock items only"
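
As an illustration of the "filter by distance" and "sort by distance" items on slide 37, the sketch below builds request parameters using the `geofilt` filter and the `geodist()` function from Solr's spatial support. It assumes the schema has a latitude/longitude field, here hypothetically named `store`. The coordinates and radius are made-up sample values.

```python
from urllib.parse import urlencode, parse_qs


def nearby_params(lat, lon, radius_km, field="store"):
    """Parameters for 'everything within radius_km of a point, nearest first'."""
    return {
        "q": "*:*",
        # Shared spatial parameters, read by both geofilt and geodist().
        "sfield": field,            # the spatial field (assumed name)
        "pt": f"{lat},{lon}",       # center point as "lat,lon"
        "d": str(radius_km),        # distance in kilometers
        # Filter by distance.
        "fq": "{!geofilt}",
        # Sort by distance, nearest first.
        "sort": "geodist() asc",
    }


spatial_query = urlencode(nearby_params(45.15, -93.85, 5))
print("/select?" + spatial_query)
```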
  • 38. Envy: The Path To Salvation
      Focus on your requirements, don't try to add "bells and whistles" you don't need
      Don't be hesitant to dive into the power of dismax/edismax
      Take advantage of new features such as Solr spatial, if those features will add value to the end user experience
  • 39. "A fat stomach never breeds fine thoughts."
  • 40. Gluttony
      "Staying fit and trim" is usually good practice when designing and running Solr applications
        Once again – keep it "lean and mean"
      A lot of these issues cross over into the "Sloth" category
        The effort needed to keep your configuration and data efficiently managed is not considered important
      Don't lose control of your configuration files
        Remove unnecessary elements
        Version control all configuration files
  • 41. Gluttony
      Slim down those "bloated" queries:
        q="red shoes"&accountId=(12343 OR 338899 OR 554443 OR 243445 OR 55442 OR 3330899 OR 59927 OR 3888999 OR 549 OR 440293579 OR 34201 OR 339917 OR 300191 OR 339338 OR 109823 OR 679176 OR 31407815 OR 3001756 OR 134322 OR 311123 OR 987888 OR 997181 OR 771819 OR 100292 OR 3389474 OR 5505759 OR 2459577 OR 4499957 OR 1996571 OR 559590 OR 220299 OR 4404872 OR 151510 OR 66017 OR 666 OR 113459 OR 890575 OR 505725 OR 330393 OR 349940 OR 4094994 OR 1245995 OR 2459959 OR 4255909 OR 899955 OR 7878899 OR 100999 … ∞ )
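
Slide 41 shows the problem but does not prescribe a remedy. One common option, offered here as an assumption and not as the speaker's recommendation, is to keep the user's text in `q` and move the account restriction into a filter query (`fq`), which Solr caches separately and leaves out of relevance scoring. The field name `accountId` comes from the slide, and the helper name is hypothetical.

```python
def build_params(user_query, account_ids):
    """Split a request into a scored main query and a cacheable filter."""
    # De-duplicate and sort so the same set of accounts always yields the
    # same fq string, which gives Solr's filter cache a chance to reuse it.
    ids = " OR ".join(str(i) for i in sorted(set(account_ids)))
    return {
        "q": user_query,              # what the user actually searched for
        "fq": f"accountId:({ids})",   # restriction only, no effect on scoring
    }


params = build_params('"red shoes"', [549, 666, 12343, 549])
print(params)
```

This only relocates the list. If it keeps growing, it may be worth indexing a coarser field (for example a group or tenant ID) so the filter becomes a single term.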
  • 42. Gluttony
      Stay in shape – Flex Your Solr Muscles!
        Keep up on new features
        Training, when appropriate
        Certification
        Contribute!
        Follow the user lists
        Refactor when new features can help
        Keep up to date on new releases
  • 43. Gluttony: The Path To Salvation
      Keep configuration files clean and trim. Remove unused elements
      Periodically review queries to make sure they are efficient
      Refactor when necessary – keep your application fit and trim
  • 44. "Hope is the denial of reality."
  • 45. Wrath
      Wrath - usually synonymous with anger, but…
      Let's use an older definition here:
        "A vehement denial of the truth, both to others and in the form of self-denial and impatience."
      Step back every now and then and look objectively at your application
  • 46. Wrath
      Resist the push to rush to production…
  • 47. Wrath
      Ignoring new Solr releases
        OK to wait until a release is proven
        But getting too far behind makes upgrading more painful with each release
      We don't have time to do it right, but we always have time to fix it
  • 48. Wrath
      Ignoring complaints about results relevance
      Disregarding feedback from stakeholders
      Remember – the point of your search application is to support the business, not to "build cool stuff"
      Not taking advantage of log files
        Consider mining log files, storing data in relational DB for generating reports
        Capturing user queries and query counts can be extremely useful
        Can also be used for query-based autosuggest (not just indexed terms)
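
The log-mining idea on slide 48 can start very small. The sketch below counts user queries from request log lines. It assumes each request line contains a `params={...}` segment with URL-style parameters. The sample lines are invented for illustration, so check the pattern against your own logs before relying on it.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed log shape: "... params={q=red+shoes&rows=10} hits=42 ...".
# A parameter value containing "}" (e.g. local params) would need a
# more careful pattern than this one.
PARAMS_RE = re.compile(r"params=\{([^}]*)\}")


def count_queries(log_lines):
    """Count how often each normalized q value appears in the log."""
    counts = Counter()
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue  # not a request line
        q_values = parse_qs(match.group(1)).get("q")
        if q_values:
            counts[q_values[0].strip().lower()] += 1
    return counts


sample_log = [
    "INFO: [] webapp=/solr path=/select params={q=red+shoes&rows=10} hits=42 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=Red+Shoes} hits=42 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=blue+hat&fq=in_stock:true} hits=7 status=0 QTime=5",
    "INFO: SolrCore registered",
]
top_queries = count_queries(sample_log).most_common(2)
print(top_queries)
```

The resulting counts could be loaded into a relational table for reporting, or used as the source for query-based autosuggest, as the slide proposes.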
  • 49. Wrath: The Path To Salvation
      Keep your version of Solr up to date
        OK to wait "awhile", but don't skip versions
      Seek and embrace feedback from business and domain experts
      Constantly gauge and improve relevance as an ongoing task
      Avoid the push to release too soon (as best you can)
      Take advantage of log files to understand what users are doing, and what is not working well
  • 50. ¡Búsqueda, y usted encontrará! ("Search, and you shall find!")