SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
How	
  Solr	
  powers	
  search	
  
 on	
  America’s	
  largest	
  
       flash	
  sale	
  site.

                         or

            “Personalized	
  search	
  of	
  a
	
  fast-­‐moving	
  time-­‐sensitive	
  data-­‐set”	
  

    Ade	
  Trenaman	
  -­‐	
  Team	
  Galactus
        atrenaman@gilt.com
          @adrian_trenaman
          http://tech.gilt.com
    http://slideshare.net/trenaman




                         1
adrian.trenaman@gmail.com|@adrian_trenaman	
  




                   Committer on Apache Karaf.
           Consultant/Contributor ActiveMQ, Camel, CXF.

                   Now: Tech Lead, Gilt Groupe

                                  2
Gilt:	
  Exclusive	
  Gorgeous	
  Stuff	
  on	
  Sale	
  at	
  Noon!




                                 3
Stampede!




    4
...	
  and	
  for	
  guys	
  ....




               5
...	
  and	
  for	
  baby	
  &	
  kids	
  ...




                     6
...	
  and	
  for	
  your	
  beautiful	
  home	
  ...




                         7
...	
  and	
  food	
  &	
  fine	
  wine	
  ...




                     8
...	
  cool	
  
     things	
  
      to	
  do	
  
    in	
  your	
  
     city	
  ...



9
...	
  or	
  somewhere	
  else	
  in	
  the	
  world.




                         10
The	
  Gilt	
  noon	
  ‘attack	
  of	
  self	
  denial’
Email	
  &	
  Mobile	
  notification	
  of	
  today’s	
  sales	
  goes	
  to	
  members	
  
at	
  11:45	
  EST...
   ...	
  Sales	
  go	
  live	
  at	
  Noon	
  EST.
Stampede.




                                               11
http://gilt.com/apacheconeu2012




     $25
       ... our treat to you.



                12
PDX,	
  NYC,	
  DUB	
  -­‐	
  tech.gilt.com




                    13
The problem



     14
Gilt	
  Data	
  Model	
  101
A Haiku:

Simple relations,                        Product         Silk Charmeuse
                                                         Wrap Dress
Solr makes hard; weeping, I
denormalize now.                             1
Ade, Sinsheim, 2012                      *
                                                         ... in white
                                          Look
                                             1
                                         *
                               *   *                     ... in size 8
                      Sale                SKU        8

                                    15
QED Architecture Pattern: Query, Enrich, Deliver
                                                                   search response: (lookId, skuId*)* as JSON


      Browser                       <<play>>                                            Solr

                               web-search                                           Solr




                                                      Search API
                               web-search                                            Solr




                                             enrich
     client-side               controllers                                          plugins
                                                                                     plugins &
      rendering                                                                     components


                   HTML/JSON


 note: search API for                Product                                Inventory          Targeting
 isolation, plugins to
 perform personalization             Service                                 Service            Service
 & inventory sorting,
 hydration of results




           Motivating principle: “Do One Thing Well”
                                             16
Search	
  MVP

Provide	
  search	
  listings	
  of	
  ‘product	
  looks’	
  
  Move	
  sold-­‐out	
  inventory	
  to	
  bottom	
  in	
  realtime
  Facet	
  by	
  Size	
  /	
  Color	
  /	
  Brand	
  /	
  Category
  Search	
  by	
  store	
  (Home,	
  Women’s,	
  ...)
  Provide	
  targeted	
  search	
  results
  Respect	
  sale	
  start	
  /	
  end	
  time	
  in	
  listings
  Auto-­‐suggest
  Auto-­‐complete

                                             17
Search Space

      18
Search	
  space:	
  what	
  to	
  index?

      Taxonomy!                 Brand, Product Name!
                                Price!



                                Color!

                                Inventory Status!




                                Description!




SKU Attributes /
Material / Origin!
                     19
Data,	
  data,	
  data	
  gonna	
  hate	
  ya,	
  hate	
  ya,	
  hate	
  ya
   Not	
  all	
  product	
  data	
  is	
  as	
  clean	
  as	
  you’d	
  like.	
  
      Description	
  contains	
  distracting	
  tokens	
  (‘also	
  available	
  in	
  
      blue,	
  black	
  (patent),	
  green	
  and	
  leather’)
      Colors	
  are	
  often	
  poorly	
  named:	
  ‘blk’,	
  ‘black’,	
  ‘priest’,	
  
      ‘black	
  /	
  white’,	
  ‘nite’,	
  ‘night’,	
  ‘100’,	
  ‘multi’,	
  ‘patent’.	
  
      Brand	
  names	
  can	
  mislead:	
  ‘Thomas	
  Pink	
  Shirts’,
      ‘White	
  +	
  Warren	
  Black	
  Dress’	
  

   Trust	
  nothing.	
  
   Try	
  everything.
      Be	
  ready	
  for	
  surprises...
                                              20
Example:	
  synonym	
  leakage
Naive	
  and	
  excited,	
  we	
  configured	
  the	
  Princeton	
  Synonym	
  
database	
  with	
  Solr	
  :)
   Search	
  for	
  “black	
  dress”	
  yields	
  “Jet	
  Set	
  Tote”	
  :(
   Learning:	
  don’t	
  blindly	
  rely	
  on	
  synonyms.




                                       huh?

                        Doh!
      (black → ‘jet black’ → jet) (dress → set)
                                          21
Index	
  by	
  looks	
  or	
  by	
  SKU?	
  
Turmoil:	
  we	
  list	
  looks,	
  but	
  filter	
  by	
  SKU	
  attributes	
  (e.g.	
  size).
Bad	
  Idea:	
  index	
  looks,	
  with	
  size	
  as	
  a	
  multi-­‐value	
  field:
	
  	
  	
  	
  	
  	
  product_look_sizes	
  =	
  "S",	
  "M",	
  "L",	
  "XL"

...	
  filtering	
  with	
  product_look_sizes:M	
  might	
  return	
  a	
  
product	
  that	
  has	
  inventory	
  for	
  S,	
  L,	
  XL	
  but	
  no	
  inventory	
  for	
  "M".	
  
Really	
  Bad	
  Experience.
Better	
  idea:	
  denormalize.	
  Index	
  by	
  SKU	
  with	
  a	
  single-­‐valued	
  
field	
  size	
  field:
	
  	
  	
  	
  	
  	
  sku_size	
  =	
  "M"

...	
  now,	
  when	
  someone	
  indexes	
  by	
  "M",	
  they	
  only	
  SKUs	
  in	
  
medium.	
  Bazinga!
                                                         22
Index	
  by	
  looks	
  or	
  by	
  SKU?	
  (cont’)
But	
  we	
  list	
  looks,	
  not	
  SKUS!!
          The	
  last	
  thing	
  a	
  customer	
  wants	
  is	
  to	
  see	
  8	
  pictures	
  of	
  the	
  
          same	
  dress	
  in	
  8	
  different	
  sizes
Group	
  by	
  'product	
  look	
  ID'	
  using	
  Solr	
  grouping:	
  works	
  a	
  treat!
<str	
  name="group">	
  	
  	
  	
  	
  	
  	
  	
  	
  true	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  </str>
<str	
  name="group.field">	
  	
  	
  sku_look_id_s	
  </str>
<str	
  name="group.limit">	
  	
  	
  -­‐1	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  </str>
<str	
  name="group.ngroups">	
  true	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  </str>



Initial	
  worries	
  about	
  performance	
  are	
  (so-­‐far)	
  unfounded.
“Premature	
  optimization	
  is	
  the	
  root	
  of	
  all	
  evil.”
We	
  -­‐may-­‐	
  change	
  to	
  block-­‐join	
  queries	
  when	
  we	
  move	
  to	
  Solr4.
                                                                            23
Solr Internals & Extensions I
                                                  Solr
            request




                                                            INDEX
                                 group




                                                         Index contains SKUs
search response: (lookId, skuId*)* as JSON




                                             24
Realtime Inventory
     Ordering


        25
Real-­‐time	
  inventory	
  ordering
Crucial:	
  sold-­‐out	
  looks	
  should	
  move	
  to	
  bottom	
  of	
  list;	
  i.e.	
  	
  sort	
  
by	
  ‘inventory	
  status’.	
  
Given:	
  an	
  asynchronously	
  updated	
  caching	
  'inventory	
  status'	
  
API.
	
  	
  	
  InventoryStatus	
  status	
  =	
  getInventory(sku)

...	
  we	
  created	
  a	
  custom	
  inventory()	
  function	
  in	
  Solr	
  that	
  
assigns	
  a	
  numeric	
  value	
  (0/1)	
  to	
  each	
  SKU	
  in	
  a	
  result	
  [O(n)]
Queries	
  can	
  now	
  be	
  sorted	
  using:	
  
	
  	
  	
  	
  order	
  by	
  inventory(sku),	
  score




                                                          26
Real-­‐time	
  inventory	
  ordering	
  (cont’)
public	
  class	
  InventoryFunction	
  extends	
  ValueSource	
  {
	
  	
  public	
  DocValues	
  getValues(...)	
  throws	
  IOException	
  {
	
  	
  	
  	
  return	
  new	
  DocValues()	
  {


	
  	
  	
  	
  	
  	
  private	
  float	
  inventory(int	
  doc)	
  {
	
  	
  	
  	
  	
  	
  	
  	
  InventoryStatus	
  skuInventoryStatus	
  =	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  inventory.getSkuInventory(getSkuId(doc)).getStatus();
	
  	
  	
  	
  	
  	
  	
  	
  return	
  (skuInventoryStatus	
  ==	
  InventoryStatus.SOLD_OUT)	
  ?	
  0	
  :	
  1;
	
  	
  	
  	
  	
  	
  }


	
  	
  	
  	
  	
  	
  public	
  float	
  floatVal(int	
  doc)	
  {	
  return	
  inventory(doc);	
  }
	
  	
  	
  	
  	
  	
  public	
  double	
  doubleVal(int	
  doc)	
  {	
  return	
  inventory(doc);	
  }
	
  	
  	
  	
  	
  	
  :	
  :	
  :	
  
	
  	
  	
  	
  };
	
  	
  }
}                                                               27
Solr Internals & Extensions II
                                                                       Solr
            request




                                                  inventory function
                                                                              INDEX
                                 group


                                         sort


search response: (lookId, skuId*)* as JSON

                                                Inventory
                                                 Service



                                                 28
Targeting &
Personalization


       29
Targeting
Sale-­‐targeting	
  is	
  an	
  important	
  tool	
  for	
  Gilt	
  merch:	
  
            Groups	
  (Noir	
  members,	
  employees,	
  ...),	
  geo-­‐IP,	
  weather,	
  
            inclusion/exclusion	
  lists
Search	
  listings	
  and	
  auto-­‐suggest	
  must	
  respect	
  sale	
  targeting.	
  
Solution:
            Store	
  the	
  saleId	
  that	
  the	
  SKU	
  is	
  available	
  in	
  the	
  index
            Pass	
  the	
  user’s	
  GUID	
  to	
  Solr
            Filter	
  results	
  using	
  a	
  custom	
  filter
	
  	
  	
  	
  	
  	
  if	
  (!	
  allowed(saleId,	
  userGuid))	
  {	
  
	
  	
  	
  	
  	
  	
  	
  	
  //	
  remove	
  result	
  from	
  listing
	
  	
  	
  	
  	
  	
  }	
  
                                                                  30
Solr Internals & Extensions III
                                                                       Solr
            request




                                                  inventory function
                                                                                                   INDEX




                                                                                 targeting filter
                                 group


                                         sort


search response: (lookId, skuId*)* as JSON

                                                Inventory                     Targeting
                                                 Service                       Service



                                                 31
Timing is Everything



         32
Our	
  index	
  is	
  time-­‐sensitive
Sales	
  go	
  live	
  at	
  Noon	
  EST:	
  
      Do	
  not	
  want	
  products	
  to	
  be	
  searchable	
  if	
  the	
  sale	
  is	
  not	
  yet	
  
      active.	
  
      Do	
  want	
  products	
  to	
  be	
  searchable	
  at	
  exactly	
  Noon	
  EST
We	
  index	
  Solr	
  every	
  n	
  minutes	
  (n	
  ==	
  15)
Need	
  to	
  encode	
  a	
  sense	
  of	
  time	
  in	
  the	
  index.




                                                  33
Time-­‐sensitive	
  search
	
  1st	
  idea:	
  index	
  a	
  SKUs	
  start/end	
  availability	
  and	
  then	
  filter	
  on	
  
that.
Use	
  Solr’s	
  ‘dynamic	
  fields’:
	
  	
  	
  	
  for	
  each	
  store	
  (men	
  /	
  women	
  /	
  home	
  /	
  kids	
  /	
  ...)	
  {	
  
	
  	
  	
  	
  	
  	
  //	
  get	
  the	
  nearest	
  upcoming	
  availbility	
  window	
  &	
  index	
  it.
	
  	
  	
  	
  	
  	
  sku_availability_start_<store>	
  =	
  startTime
	
  	
  	
  	
  	
  	
  sku_availability_end_<store>	
  =	
  endTime
	
  	
  	
  	
  	
  	
  sku_availability_sale_<store>	
  =	
  saleId
	
  	
  	
  	
  }

Then,	
  naively	
  filter	
  using	
  (see	
  next	
  slide	
  for	
  improvement):	
  
fq=+sku_availability_start_<store>:[*	
  TO	
  NOW]	
  	
  	
  
	
  	
  	
  +sku_availability_end_<store>:[NOW+1HOUR	
  TO	
  *]


                                                                34
All	
  praise	
  and	
  thanks	
  to	
  the	
  chump.
    Filtering	
  by	
  time	
  for	
  every	
  
    request	
  is	
  	
  expensive	
  :(
    Posed	
  question	
  to	
  The	
  Chump	
  
    at	
  Lucene	
  Revolution	
  2012.
    Solution:	
  rounding	
  time	
  
    bounds	
  to	
  nearest	
  hour	
  /	
  
    minute	
  means	
  we	
  get	
  a	
  	
  
    cached	
  query	
  filter.	
  
[*	
  TO	
  NOW]	
  →	
  [*	
  TO	
  NOW/HOUR]

    Super	
  Chump.	
  Chump	
  is	
  
    Champ.	
  Chump-­‐tastic.	
  
    Chump-­‐nominal.	
  Awe-­‐chump.	
  
                                       35
Time-­‐sensitive	
  search	
  (cont’)
  e.g.	
  Consider	
  a	
  SKU	
  available	
  in	
  women	
  &	
  home:
  	
  	
  	
  	
  sku_availability_start_home	
  =	
  32141232	
  //	
  Time

  	
  	
  	
  	
  sku_availability_end_home	
  =	
  32161232	
  //	
  Time

  	
  	
  	
  	
  sku_availability_sale_home	
  =	
  12121221	
  //	
  SaleID

  	
  	
  	
  	
  sku_availability_start_women	
  =	
  32141232	
  //	
  Time

  	
  	
  	
  	
  sku_availability_end_women	
  =	
  32161232	
  //	
  Time

  	
  	
  	
  	
  sku_availability_sale_women	
  =	
  223423424	
  //	
  SaleID

  Now,	
  can	
  search	
  in	
  Home	
  store	
  using:	
  
fq=+sku_availability_start_home:[*	
  TO	
  NOW/HOUR]	
  
+sku_availability_end_home:[NOW/HOUR+1HOUR	
  TO	
  *]


                                                    36
Problems	
  in	
  the	
  space-­‐time	
  continuum	
  (cont’)
At	
  index	
  time,	
  we	
  used	
  a	
  heuristic	
  to	
  determine	
  the	
  'best	
  
availability	
  window'	
  when	
  a	
  SKU	
  is	
  in	
  2	
  sales	
  in	
  the	
  same	
  store
      “Earliest	
  active	
  window,	
  or	
  soonest	
  starting	
  window”:	
  some	
  
      windows	
  ignored
      Targeting	
  problems:	
  might	
  remove	
  a	
  SKU	
  in	
  a	
  restricted	
  
      sale,	
  even	
  if	
  the	
  SKU	
  is	
  visible	
  in	
  another	
  Sale	
  :(

               ‘Best Availability Window’

               Sale A (targeted to ‘Noir’)

                                             Sale B (untargeted)

                                                        Sale C (weekend specials)
                                 Danger Zone

                                                  37
Problems	
  in	
  the	
  space-­‐time	
  continuum	
  (cont’)
          Also,	
  some	
  sales	
  are	
  restricted	
  to	
  the	
  channel	
  the	
  product	
  is	
  in!
                       'Channels':	
  e.g.	
  mobile,	
  iPad,	
  ...
          Our	
  'best	
  availability	
  window'	
  code	
  was	
  already	
  creaking;	
  	
  
          Introducing	
  more	
  dynamic	
  fields	
  (channel	
  x	
  store)	
  stank.
	
  	
  	
  	
  sku_availability_start_web_home	
  =	
  32141232	
  //	
  Time

	
  	
  	
  	
  sku_availability_end_web_home	
  =	
  32161232	
  //	
  Time

	
  	
  	
  	
  sku_availability_sale_web_home	
  =	
  12121221	
  //	
  SaleID

	
  	
  	
  	
  sku_availability_start_ipad_home	
  =	
  32141232	
  //	
  Time

	
  	
  	
  	
  sku_availability_end_ipad_home	
  =	
  32161232	
  //	
  Time

	
  	
  	
  	
  sku_availability_sale_ipad_home	
  =	
  12121221	
  //	
  SaleID

	
  	
  	
  	
  sku_availability_start_mobile_home	
  =	
  32141232	
  //	
  Time

	
  	
  	
  	
  :	
  :	
  :	
  :	
                           38
Eureka!
      Stop	
  indexing	
  SKUs;	
  instead:	
  index	
  'the	
  availability	
  of	
  a	
  SKU	
  at	
  a	
  
      moment	
  in	
  time'
	
  	
  	
  	
  sku_name	
  =	
  ...
	
  	
  	
  	
  sku_color	
  =	
  ...
	
  	
  	
  	
  sku_availibility_start	
  =	
  
	
  	
  	
  	
  sku_availibility_end	
  =	
  
	
  	
  	
  	
  sku_store	
  =	
  home	
  
	
  	
  	
  	
  sku_channels	
  =	
  mobile,	
  ipad
	
  	
  	
  	
  sku_sale_id	
  =	
  

      The	
  filter	
  becomes	
  much	
  simpler:	
  
      fq	
  =	
  +sku_availability_start:[*	
  TO	
  NOW/HOUR]	
  
      	
  	
  	
  	
  	
  +sku_availability_end:[NOW/HOUR+1HOUR	
  TO	
  *]	
  	
  	
  	
  

      Code	
  fell	
  away.	
  Just	
  had	
  to	
  to	
  post-­‐filter	
  out	
  multiple	
  SKU	
  
      availabilities	
  in	
  results	
  [O(n)]
                                                              39
Solr Internals & Extensions IV
                                                                        Solr
            request
                        availability-component




                                                   inventory function
                                                                                                                INDEX




                                                                                              targeting filter
                                                                           de-dupe filter
                                 group


                                         sort


search response: (lookId, skuId*)* as JSON

                                                 Inventory                                 Targeting
                                                  Service                                   Service



                                                  40
Faceting



   41
Mixed-­‐logic	
  multi-­‐select	
  faceting
Facet	
  filters	
  typically	
  use	
  'AND'	
  logic
size	
  =	
  36	
  AND	
  brand	
  =	
  'Timber	
  Island'	
  AND	
  color	
  =	
  'RED'

Prefer	
  mixed	
  logic,	
  where	
  'OR'	
  is	
  used	
  for	
  filters	
  on	
  the	
  same	
  
facet.
size	
  =	
  36	
  OR	
  size	
  =	
  38	
  AND	
  brand	
  =	
  'Timber	
  Island'	
  
	
  	
  AND	
  color	
  =	
  'RED'	
  OR	
  color='BLUE'

Use	
  ‘multi-­‐select’	
  faceting	
  technique	
  to	
  ensure	
  that	
  facet	
  values	
  
don’t	
  disappear.
       Tag	
  the	
  facet	
  filter-­‐query:
       fq:{!tag=brand_fq}brand_facet:"Timber	
  Island"

       Ignore	
  the	
  effects	
  of	
  the	
  facet	
  filter-­‐query	
  when	
  faceting:
       <str	
  name="facet.field">{!ex=brand_fq}brand_facet</str>
                                                         42
‘Disappearing	
  facet	
  problem’
To	
  get	
  facets	
  relevant	
  to	
  the	
  query,	
  we	
  set	
  facet.mincount	
  to	
  1.
<str	
  name="facet.mincount">1</str>	
  


           However:	
  if	
  subsequent	
  facet	
  filters	
  reduce	
  the	
  facet	
  count	
  
           to	
  zero,	
  then	
  the	
  facet's	
  'disappear'	
  :(
	
  Consider	
  a	
  search	
  for	
  'shoes',	
  returning	
  the	
  following	
  facets
brand:	
  Ade	
  -­‐>	
  10,	
  Eric	
  -­‐>	
  5,	
  	
  	
  color:	
  black	
  -­‐>	
  8,	
  red	
  -­‐>	
  7

Then	
  filter	
  on	
  red	
  (and	
  assume	
  that	
  only	
  Ade	
  shoes	
  are	
  red).	
  We	
  
want	
  to	
  have:	
  
brands:	
  Ade	
  -­‐>	
  7,	
  Eric	
  -­‐>	
  0,	
  	
  	
  colors:	
  black	
  -­‐>	
  0,	
  red	
  -­‐>	
  7

However,	
  if	
  facet.mincount	
  ==	
  1,	
  we	
  get:	
  	
  
brands:	
  Ade	
  -­‐>	
  7	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  colors:	
  red	
  -­‐>	
  7
                                                                                 43
‘Disappearing	
  facet	
  problem’	
  (cont’)
We	
  want	
  to	
  say	
  'Eric	
  and	
  black	
  are	
  relevant	
  to	
  the	
  query,	
  but	
  has	
  
zero	
  results	
  due	
  to	
  filtering'.	
  
       Solution:	
  For	
  each	
  facet	
  overlay	
  the	
  original	
  counts	
  with	
  the	
  
       values	
  of	
  the	
  new	
  count	
  if	
  present,	
  or	
  zero	
  otherwise.	
  	
  	
  
<str	
  name="facet.field">{!ex=brand_fq,size_fq,taxonomy_fq,color_fq	
  
key=all_brand_facets}brand_facet</str>
<str	
  name="facet.field">{!ex=brand_fq,size_fq,taxonomy_fq,color_fq	
  
key=all_color_facets}color_facet</str>

This	
  means	
  we	
  get:	
  
all_brand_facets:	
  	
  Ade	
  -­‐>	
  10,	
  Eric	
  -­‐>	
  5,	
  	
  	
  	
  brand_facet:	
  Ade	
  -­‐>	
  7
all_color_facets:	
  black	
  -­‐>	
  8,	
  red	
  -­‐>	
  7,	
  	
  	
  	
  	
  color_facet:	
  red	
  -­‐>	
  7

Merge	
  to	
  get:	
  
brand_facet:	
  Ade	
  -­‐>	
  7,	
  Eric	
  -­‐>	
  0,	
  	
  color_facet:	
  red	
  -­‐>	
  7,	
  black	
  -­‐>	
  0
                                                          44
Color	
  Me	
  Happy

   Need	
  to	
  map	
  SKU's	
  color	
  
   to	
  a	
  simple	
  set	
  of	
  colors	
  
   for	
  faceting.
   Used	
  a	
  synonym	
  file	
  to	
  
   map	
  'urobilin'	
  to	
  'yellow'
   Use	
  a	
  stopwords	
  file	
  to	
  
   remove	
  unknown	
  colors
   It	
  works,	
  but	
  it's	
  brittle;	
  
   want	
  to	
  move	
  to	
  a	
  solution	
  
   based	
  on	
  color	
  analysis	
  of	
  
   swatches
                                                   45
Hierarchical	
  taxonomy	
  faceting
Our	
  taxonomy	
  is	
  hierarchical.	
  Solr	
  faceting	
  is	
  not	
  :(
Encoded	
  taxonomy	
  hierarchy	
  using	
  'paths'
	
  	
  "Gilt::Men::Clothing::Pants"
Our	
  search	
  API	
  converts	
  the	
  paths	
  into	
  a	
  tree	
  for	
  rendering	
  
purposes.
Works	
  but	
  (a)	
  feels	
  jenky	
  (b)	
  ordering	
  is	
  alphabetic
Prefer	
  in	
  future	
  to	
  use	
  unique	
  taxonomy	
  key,	
  and	
  use	
  that	
  the	
  
filter	
  through	
  an	
  ordered	
  tree	
  in	
  our	
  AP



                                               46
Hierarchical	
  size	
  faceting
Size	
  ordering	
  is	
  non	
  trivial:	
  there	
  is	
  a	
  two-­‐level	
  hierarchy	
  &	
  
ordering	
  is	
  difficult:	
  
      00,	
  0,	
  2,	
  4,	
  6,	
  8,	
  10,	
  12	
  rather	
  than	
  0,	
  00,	
  10,	
  12,	
  2,	
  4,	
  6,	
  8
      XS,	
  S,	
  M,	
  L,	
  XL,	
  XXL	
  rather	
  than	
  M,	
  L,	
  S,	
  XL,	
  XS,	
  XXL
End	
  up	
  encoding	
  a	
  size	
  ordinal	
  into	
  the	
  size	
  facet	
  label
      "Women's	
  Apparel::[[0000000003]]00"
      "Women's	
  Apparel::[[0000000004]]0"
      "Women's	
  Apparel::[[0000000005]]2"

OK,	
  but	
  again,	
  feels	
  hacky.	
  
“If	
  only	
  there	
  was	
  a	
  way	
  to	
  annotate	
  facets	
  with	
  meaningful	
  
data”
                                                         47
Solr Internals & Extensions (V)
                                                                        Solr
            request
                        availability-component




                                                   inventory function
                                                                                                                INDEX




                                                                                              targeting filter
                                                                           de-dupe filter
                                 group
                        facet




                                         sort


search response: (lookId, skuId*)* as JSON

                                                 Inventory                                 Targeting
                                                  Service                                   Service



                                                  48
Lesson	
  I:	
  Solr	
  Makes	
  You	
  Happy




                     49
Lesson	
  II:	
  Solr	
  makes	
  the	
  business	
  Love	
  You	
  
Gilt	
  Search	
  released	
  in	
  A/B	
  test	
  to	
  30%.	
  
      Initial	
  deployment	
  can	
  handle	
  >	
  180,000	
  RPM	
  (5:1	
  ratio	
  of	
  
      auto-­‐complete	
  to	
  search)
      Members	
  who's	
  visit	
  included	
  search	
  are	
  4x	
  more	
  likely	
  to	
  
      buy.
      Search	
  generates	
  2-­‐4%	
  incremental	
  revenue
Business	
  pleads	
  with	
  us	
  to	
  end	
  A/B	
  test	
  and	
  release	
  to	
  100%
We've	
  iterated	
  to	
  drive	
  more	
  traffic	
  (&	
  succeeded)	
  with	
  no	
  loss	
  of	
  
conversion.
      From	
  subtle	
  to	
  suBtle.
                                                 50
Lesson	
  III:	
  Excelsior
 Power	
  all	
  listings	
  (keyword,	
  
 category,	
  sale,	
  brand)	
  via	
  
 Solr
 Faceting	
  on	
  SKU	
  attributes	
  
 (e.g.	
  ‘age’,	
  ‘gender’)
 Solr	
  4
 Hack	
  Debridement:	
  
   Improve	
  hierarchical	
  
   faceting	
  &	
  facet	
  ordering
   ‘Double	
  facet’	
  feels	
  wrong

                                             51
Thanks!

      Ade	
  Trenaman
      Tech	
  Lead,	
  Gilt

Twitter:	
  adrian_trenaman
LinkedIn:	
  adrian.trenaman

  atrenaman@gilt.com




              52
http://gilt.com/apacheconeu2012




     $25    ... enjoy!



                53

Contenu connexe

Tendances

Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Kouhei Sutou
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature DataWorks Summit
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopHortonworks
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture Ramez Al-Fayez
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Flink Forward
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Flink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapKostas Tzoumas
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderSadayuki Furuhashi
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...Andrew Lamb
 
Hive on Spark の設計指針を読んでみた
Hive on Spark の設計指針を読んでみたHive on Spark の設計指針を読んでみた
Hive on Spark の設計指針を読んでみたRecruit Technologies
 

Tendances (20)

Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture
 
Models for hierarchical data
Models for hierarchical dataModels for hierarchical data
Models for hierarchical data
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
Hive on Spark の設計指針を読んでみた
Hive on Spark の設計指針を読んでみたHive on Spark の設計指針を読んでみた
Hive on Spark の設計指針を読んでみた
 

Similaire à Personalized Search on the Largest Flash Sale Site in America

Solr nyc meetup - May 14 2015 - Adrian Trenaman
Solr nyc meetup - May 14 2015 - Adrian TrenamanSolr nyc meetup - May 14 2015 - Adrian Trenaman
Solr nyc meetup - May 14 2015 - Adrian TrenamanAdrian Trenaman
 
Lucene revolution 2013 adrian trenaman
Lucene revolution 2013   adrian trenamanLucene revolution 2013   adrian trenaman
Lucene revolution 2013 adrian trenamanAdrian Trenaman
 
Cross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlowCross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlowChristian Hassa
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalogMongoDB
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSONChris Saxon
 
Beyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQLBeyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQLVaticle
 
Elixir, GraphQL and Vue.js
Elixir, GraphQL and Vue.jsElixir, GraphQL and Vue.js
Elixir, GraphQL and Vue.jsJeroen Visser
 
From Android NDK To AOSP
From Android NDK To AOSPFrom Android NDK To AOSP
From Android NDK To AOSPMin-Yih Hsu
 
How to integrate Splunk with any data solution
How to integrate Splunk with any data solutionHow to integrate Splunk with any data solution
How to integrate Splunk with any data solutionJulian Hyde
 
Automated UI test on mobile - with Cucumber/Calabash
Automated UI test on mobile - with Cucumber/CalabashAutomated UI test on mobile - with Cucumber/Calabash
Automated UI test on mobile - with Cucumber/CalabashNiels Frydenholm
 
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubJoining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubData Con LA
 
AEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkAEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkPaolo Mottadelli
 
Using Mongo At Shopwiki
Using Mongo At ShopwikiUsing Mongo At Shopwiki
Using Mongo At ShopwikiAvery Rosen
 
Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...Maxime Beugnet
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher lucenerevolution
 
Building complex UI on Android
Building complex UI on AndroidBuilding complex UI on Android
Building complex UI on AndroidMaciej Witowski
 
Introduction to Optimization Group
Introduction to Optimization GroupIntroduction to Optimization Group
Introduction to Optimization GroupTom_Thompson
 
Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...
Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...
Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...BigchainDB
 

Similaire à Personalized Search on the Largest Flash Sale Site in America (20)

Solr nyc meetup - May 14 2015 - Adrian Trenaman
Solr nyc meetup - May 14 2015 - Adrian TrenamanSolr nyc meetup - May 14 2015 - Adrian Trenaman
Solr nyc meetup - May 14 2015 - Adrian Trenaman
 
Lucene revolution 2013 adrian trenaman
Lucene revolution 2013   adrian trenamanLucene revolution 2013   adrian trenaman
Lucene revolution 2013 adrian trenaman
 
Cross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlowCross mobile testautomation mit Xamarin & SpecFlow
Cross mobile testautomation mit Xamarin & SpecFlow
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalog
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSON
 
Beyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQLBeyond SQL - Comparing SQL to TypeQL
Beyond SQL - Comparing SQL to TypeQL
 
Elixir, GraphQL and Vue.js
Elixir, GraphQL and Vue.jsElixir, GraphQL and Vue.js
Elixir, GraphQL and Vue.js
 
From Android NDK To AOSP
From Android NDK To AOSPFrom Android NDK To AOSP
From Android NDK To AOSP
 
How to integrate Splunk with any data solution
How to integrate Splunk with any data solutionHow to integrate Splunk with any data solution
How to integrate Splunk with any data solution
 
Automated UI test on mobile - with Cucumber/Calabash
Automated UI test on mobile - with Cucumber/CalabashAutomated UI test on mobile - with Cucumber/Calabash
Automated UI test on mobile - with Cucumber/Calabash
 
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubJoining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
 
AEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkAEM (CQ) eCommerce Framework
AEM (CQ) eCommerce Framework
 
Using Mongo At Shopwiki
Using Mongo At ShopwikiUsing Mongo At Shopwiki
Using Mongo At Shopwiki
 
Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...Simplifying & accelerating application development with MongoDB's intelligent...
Simplifying & accelerating application development with MongoDB's intelligent...
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher
 
Building complex UI on Android
Building complex UI on AndroidBuilding complex UI on Android
Building complex UI on Android
 
BDD from QA side
BDD from QA sideBDD from QA side
BDD from QA side
 
Introduction to Optimization Group
Introduction to Optimization GroupIntroduction to Optimization Group
Introduction to Optimization Group
 
Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...
Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...
Artificial Intelligence (AI) DAOs (decentralized autonomous organizations) - ...
 

Plus de Adrian Trenaman

Why i love computer science
Why i love computer scienceWhy i love computer science
Why i love computer scienceAdrian Trenaman
 
JavaOne 2015: Scaling micro services at Gilt
JavaOne 2015: Scaling micro services at GiltJavaOne 2015: Scaling micro services at Gilt
JavaOne 2015: Scaling micro services at GiltAdrian Trenaman
 
GeeCON Microservices 2015 scaling micro services at gilt
GeeCON Microservices 2015   scaling micro services at giltGeeCON Microservices 2015   scaling micro services at gilt
GeeCON Microservices 2015 scaling micro services at giltAdrian Trenaman
 
Scaling micro services at gilt
Scaling micro services at giltScaling micro services at gilt
Scaling micro services at giltAdrian Trenaman
 
Implementing WebServices with Camel and CXF in ServiceMix
Implementing WebServices with Camel and CXF in ServiceMixImplementing WebServices with Camel and CXF in ServiceMix
Implementing WebServices with Camel and CXF in ServiceMixAdrian Trenaman
 
WebServices in ServiceMix with CXF
WebServices in ServiceMix with CXFWebServices in ServiceMix with CXF
WebServices in ServiceMix with CXFAdrian Trenaman
 
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010Adrian Trenaman
 
Oop2008 RESTful services with GWT and Apache CXF
Oop2008 RESTful services with GWT and Apache CXFOop2008 RESTful services with GWT and Apache CXF
Oop2008 RESTful services with GWT and Apache CXFAdrian Trenaman
 
ApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXF
ApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXFApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXF
ApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXFAdrian Trenaman
 
Apache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterprise
Apache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterpriseApache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterprise
Apache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterpriseAdrian Trenaman
 
20100907 fuse-community-evening-adrian-trenaman-no-logo
20100907 fuse-community-evening-adrian-trenaman-no-logo20100907 fuse-community-evening-adrian-trenaman-no-logo
20100907 fuse-community-evening-adrian-trenaman-no-logoAdrian Trenaman
 
An Introduction to Apache ServiceMix 4 - FUSE ESB
An Introduction to Apache ServiceMix 4 - FUSE ESBAn Introduction to Apache ServiceMix 4 - FUSE ESB
An Introduction to Apache ServiceMix 4 - FUSE ESBAdrian Trenaman
 

Plus de Adrian Trenaman (12)

Why i love computer science
Why i love computer scienceWhy i love computer science
Why i love computer science
 
JavaOne 2015: Scaling micro services at Gilt
JavaOne 2015: Scaling micro services at GiltJavaOne 2015: Scaling micro services at Gilt
JavaOne 2015: Scaling micro services at Gilt
 
GeeCON Microservices 2015 scaling micro services at gilt
GeeCON Microservices 2015   scaling micro services at giltGeeCON Microservices 2015   scaling micro services at gilt
GeeCON Microservices 2015 scaling micro services at gilt
 
Scaling micro services at gilt
Scaling micro services at giltScaling micro services at gilt
Scaling micro services at gilt
 
Implementing WebServices with Camel and CXF in ServiceMix
Implementing WebServices with Camel and CXF in ServiceMixImplementing WebServices with Camel and CXF in ServiceMix
Implementing WebServices with Camel and CXF in ServiceMix
 
WebServices in ServiceMix with CXF
WebServices in ServiceMix with CXFWebServices in ServiceMix with CXF
WebServices in ServiceMix with CXF
 
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
 
Oop2008 RESTful services with GWT and Apache CXF
Oop2008 RESTful services with GWT and Apache CXFOop2008 RESTful services with GWT and Apache CXF
Oop2008 RESTful services with GWT and Apache CXF
 
ApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXF
ApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXFApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXF
ApacheCon EU 2009 Tales from the front line - ActiveMQ ServiceMix and CXF
 
Apache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterprise
Apache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterpriseApache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterprise
Apache coneu 2009-adrian-trenaman-adopting-open-source-in-the-enterprise
 
20100907 fuse-community-evening-adrian-trenaman-no-logo
20100907 fuse-community-evening-adrian-trenaman-no-logo20100907 fuse-community-evening-adrian-trenaman-no-logo
20100907 fuse-community-evening-adrian-trenaman-no-logo
 
An Introduction to Apache ServiceMix 4 - FUSE ESB
An Introduction to Apache ServiceMix 4 - FUSE ESBAn Introduction to Apache ServiceMix 4 - FUSE ESB
An Introduction to Apache ServiceMix 4 - FUSE ESB
 

Dernier

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Dernier (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Personalized Search on the Largest Flash Sale Site in America

  • 1. How  Solr  powers  search   on  America’s  largest   flash  sale  site. or “Personalized  search  of  a  fast-­‐moving  time-­‐sensitive  data-­‐set”   Ade  Trenaman  -­‐  Team  Galactus atrenaman@gilt.com @adrian_trenaman http://tech.gilt.com http://slideshare.net/trenaman 1
  • 2. adrian.trenaman@gmail.com|@adrian_trenaman   Committer on Apache Karaf. Consultant/Contributor ActiveMQ, Camel, CXF. Now: Tech Lead, Gilt Groupe 2
  • 3. Gilt:  Exclusive  Gorgeous  Stuff  on  Sale  at  Noon! 3
  • 5. ...  and  for  guys  .... 5
  • 6. ...  and  for  baby  &  kids  ... 6
  • 7. ...  and  for  your  beautiful  home  ... 7
  • 8. ...  and  food  &  fine  wine  ... 8
  • 9. ...  cool   things   to  do   in  your   city  ... 9
  • 10. ...  or  somewhere  else  in  the  world. 10
  • 11. The  Gilt  noon  ‘attack  of  self  denial’ Email  &  Mobile  notification  of  today’s  sales  goes  to  members   at  11:45  EST... ...  Sales  go  live  at  Noon  EST. Stampede. 11
  • 12. http://gilt.com/apacheconeu2012 $25 ... our treat to you. 12
  • 13. PDX,  NYC,  DUB  -­‐  tech.gilt.com 13
  • 15. Gilt  Data  Model  101 A Haiku: Simple relations, Product Silk Charmeuse Wrap Dress Solr makes hard; weeping, I denormalize now. 1 Ade, Sinsheim, 2012 * ... in white Look 1 * * * ... in size 8 Sale SKU 8 15
  • 16. QED Architecture Pattern: Query, Enrich, Deliver search response: (lookId, skuId*)* as JSON Browser <<play>> Solr web-search Solr Search API web-search Solr enrich client-side controllers plugins plugins & rendering components HTML/JSON note: search API for Product Inventory Targeting isolation, plugins to perform personalization Service Service Service & inventory sorting, hydration of results Motivating principle: “Do One Thing Well” 16
  • 17. Search  MVP Provide  search  listings  of  ‘product  looks’   Move  sold-­‐out  inventory  to  bottom  in  realtime Facet  by  Size  /  Color  /  Brand  /  Category Search  by  store  (Home,  Women’s,  ...) Provide  targeted  search  results Respect  sale  start  /  end  time  in  listings Auto-­‐suggest Auto-­‐complete 17
  • 19. Search  space:  what  to  index? Taxonomy! Brand, Product Name! Price! Color! Inventory Status! Description! SKU Attributes / Material / Origin! 19
  • 20. Data,  data,  data  gonna  hate  ya,  hate  ya,  hate  ya Not  all  product  data  is  as  clean  as  you’d  like.   Description  contains  distracting  tokens  (‘also  available  in   blue,  black  (patent),  green  and  leather’) Colors  are  often  poorly  named:  ‘blk’,  ‘black’,  ‘priest’,   ‘black  /  white’,  ‘nite’,  ‘night’,  ‘100’,  ‘multi’,  ‘patent’.   Brand  names  can  mislead:  ‘Thomas  Pink  Shirts’, ‘White  +  Warren  Black  Dress’   Trust  nothing.   Try  everything. Be  ready  for  surprises... 20
  • 21. Example:  synonym  leakage Naive  and  excited,  we  configured  the  Princeton  Synonym   database  with  Solr  :) Search  for  “black  dress”  yields  “Jet  Set  Tote”  :( Learning:  don’t  blindly  rely  on  synonyms. huh? Doh! (black → ‘jet black’ → jet) (dress → set) 21
  • 22. Index  by  looks  or  by  SKU?   Turmoil:  we  list  looks,  but  filter  by  SKU  attributes  (e.g.  size). Bad  Idea:  index  looks,  with  size  as  a  multi-­‐value  field:            product_look_sizes  =  "S",  "M",  "L",  "XL" ...  filtering  with  product_look_sizes:M  might  return  a   product  that  has  inventory  for  S,  L,  XL  but  no  inventory  for  "M".   Really  Bad  Experience. Better  idea:  denormalize.  Index  by  SKU  with  a  single-­‐valued   field  size  field:            sku_size  =  "M" ...  now,  when  someone  indexes  by  "M",  they  only  SKUs  in   medium.  Bazinga! 22
  • 23. Index  by  looks  or  by  SKU?  (cont’) But  we  list  looks,  not  SKUS!! The  last  thing  a  customer  wants  is  to  see  8  pictures  of  the   same  dress  in  8  different  sizes Group  by  'product  look  ID'  using  Solr  grouping:  works  a  treat! <str  name="group">                  true                    </str> <str  name="group.field">      sku_look_id_s  </str> <str  name="group.limit">      -­‐1                        </str> <str  name="group.ngroups">  true                    </str> Initial  worries  about  performance  are  (so-­‐far)  unfounded. “Premature  optimization  is  the  root  of  all  evil.” We  -­‐may-­‐  change  to  block-­‐join  queries  when  we  move  to  Solr4. 23
  • 24. Solr Internals & Extensions I Solr request INDEX group Index contains SKUs search response: (lookId, skuId*)* as JSON 24
  • 25. Realtime Inventory Ordering 25
  • 26. Real-­‐time  inventory  ordering Crucial:  sold-­‐out  looks  should  move  to  bottom  of  list;  i.e.    sort   by  ‘inventory  status’.   Given:  an  asynchronously  updated  caching  'inventory  status'   API.      InventoryStatus  status  =  getInventory(sku) ...  we  created  a  custom  inventory()  function  in  Solr  that   assigns  a  numeric  value  (0/1)  to  each  SKU  in  a  result  [O(n)] Queries  can  now  be  sorted  using:          order  by  inventory(sku),  score 26
  • 27. Real-­‐time  inventory  ordering  (cont’) public  class  InventoryFunction  extends  ValueSource  {    public  DocValues  getValues(...)  throws  IOException  {        return  new  DocValues()  {            private  float  inventory(int  doc)  {                InventoryStatus  skuInventoryStatus  =                      inventory.getSkuInventory(getSkuId(doc)).getStatus();                return  (skuInventoryStatus  ==  InventoryStatus.SOLD_OUT)  ?  0  :  1;            }            public  float  floatVal(int  doc)  {  return  inventory(doc);  }            public  double  doubleVal(int  doc)  {  return  inventory(doc);  }            :  :  :          };    } } 27
  • 28. Solr Internals & Extensions II Solr request inventory function INDEX group sort search response: (lookId, skuId*)* as JSON Inventory Service 28
  • 30. Targeting Sale-­‐targeting  is  an  important  tool  for  Gilt  merch:   Groups  (Noir  members,  employees,  ...),  geo-­‐IP,  weather,   inclusion/exclusion  lists Search  listings  and  auto-­‐suggest  must  respect  sale  targeting.   Solution: Store  the  saleId  that  the  SKU  is  available  in  the  index Pass  the  user’s  GUID  to  Solr Filter  results  using  a  custom  filter            if  (!  allowed(saleId,  userGuid))  {                  //  remove  result  from  listing            }   30
  • 31. Solr Internals & Extensions III Solr request inventory function INDEX targeting filter group sort search response: (lookId, skuId*)* as JSON Inventory Targeting Service Service 31
  • 33. Our  index  is  time-­‐sensitive Sales  go  live  at  Noon  EST:   Do  not  want  products  to  be  searchable  if  the  sale  is  not  yet   active.   Do  want  products  to  be  searchable  at  exactly  Noon  EST We  index  Solr  every  n  minutes  (n  ==  15) Need  to  encode  a  sense  of  time  in  the  index. 33
  • 34. Time-­‐sensitive  search  1st  idea:  index  a  SKUs  start/end  availability  and  then  filter  on   that. Use  Solr’s  ‘dynamic  fields’:        for  each  store  (men  /  women  /  home  /  kids  /  ...)  {              //  get  the  nearest  upcoming  availbility  window  &  index  it.            sku_availability_start_<store>  =  startTime            sku_availability_end_<store>  =  endTime            sku_availability_sale_<store>  =  saleId        } Then,  naively  filter  using  (see  next  slide  for  improvement):   fq=+sku_availability_start_<store>:[*  TO  NOW]            +sku_availability_end_<store>:[NOW+1HOUR  TO  *] 34
  • 35. All  praise  and  thanks  to  the  chump. Filtering  by  time  for  every   request  is    expensive  :( Posed  question  to  The  Chump   at  Lucene  Revolution  2012. Solution:  rounding  time   bounds  to  nearest  hour  /   minute  means  we  get  a     cached  query  filter.   [*  TO  NOW]  →  [*  TO  NOW/HOUR] Super  Chump.  Chump  is   Champ.  Chump-­‐tastic.   Chump-­‐nominal.  Awe-­‐chump.   35
  • 36. Time-­‐sensitive  search  (cont’) e.g.  Consider  a  SKU  available  in  women  &  home:        sku_availability_start_home  =  32141232  //  Time        sku_availability_end_home  =  32161232  //  Time        sku_availability_sale_home  =  12121221  //  SaleID        sku_availability_start_women  =  32141232  //  Time        sku_availability_end_women  =  32161232  //  Time        sku_availability_sale_women  =  223423424  //  SaleID Now,  can  search  in  Home  store  using:   fq=+sku_availability_start_home:[*  TO  NOW/HOUR]   +sku_availability_end_home:[NOW/HOUR+1HOUR  TO  *] 36
  • 37. Problems  in  the  space-­‐time  continuum  (cont’) At  index  time,  we  used  a  heuristic  to  determine  the  'best   availability  window'  when  a  SKU  is  in  2  sales  in  the  same  store “Earliest  active  window,  or  soonest  starting  window”:  some   windows  ignored Targeting  problems:  might  remove  a  SKU  in  a  restricted   sale,  even  if  the  SKU  is  visible  in  another  Sale  :( ‘Best Availability Window’ Sale A (targeted to ‘Noir’) Sale B (untargeted) Sale C (weekend specials) Danger Zone 37
  • 38. Problems  in  the  space-­‐time  continuum  (cont’) Also,  some  sales  are  restricted  to  the  channel  the  product  is  in! 'Channels':  e.g.  mobile,  iPad,  ... Our  'best  availability  window'  code  was  already  creaking;     Introducing  more  dynamic  fields  (channel  x  store)  stank.        sku_availability_start_web_home  =  32141232  //  Time        sku_availability_end_web_home  =  32161232  //  Time        sku_availability_sale_web_home  =  12121221  //  SaleID        sku_availability_start_ipad_home  =  32141232  //  Time        sku_availability_end_ipad_home  =  32161232  //  Time        sku_availability_sale_ipad_home  =  12121221  //  SaleID        sku_availability_start_mobile_home  =  32141232  //  Time        :  :  :  :   38
  • 39. Eureka! Stop  indexing  SKUs;  instead:  index  'the  availability  of  a  SKU  at  a   moment  in  time'        sku_name  =  ...        sku_color  =  ...        sku_availibility_start  =          sku_availibility_end  =          sku_store  =  home          sku_channels  =  mobile,  ipad        sku_sale_id  =   The  filter  becomes  much  simpler:   fq  =  +sku_availability_start:[*  TO  NOW/HOUR]            +sku_availability_end:[NOW/HOUR+1HOUR  TO  *]         Code  fell  away.  Just  had  to  to  post-­‐filter  out  multiple  SKU   availabilities  in  results  [O(n)] 39
  • 40. Solr Internals & Extensions IV Solr request availability-component inventory function INDEX targeting filter de-dupe filter group sort search response: (lookId, skuId*)* as JSON Inventory Targeting Service Service 40
  • 41. Faceting 41
  • 42. Mixed-­‐logic  multi-­‐select  faceting Facet  filters  typically  use  'AND'  logic size  =  36  AND  brand  =  'Timber  Island'  AND  color  =  'RED' Prefer  mixed  logic,  where  'OR'  is  used  for  filters  on  the  same   facet. size  =  36  OR  size  =  38  AND  brand  =  'Timber  Island'      AND  color  =  'RED'  OR  color='BLUE' Use  ‘multi-­‐select’  faceting  technique  to  ensure  that  facet  values   don’t  disappear. Tag  the  facet  filter-­‐query: fq:{!tag=brand_fq}brand_facet:"Timber  Island" Ignore  the  effects  of  the  facet  filter-­‐query  when  faceting: <str  name="facet.field">{!ex=brand_fq}brand_facet</str> 42
  • 43. ‘Disappearing  facet  problem’ To  get  facets  relevant  to  the  query,  we  set  facet.mincount  to  1. <str  name="facet.mincount">1</str>   However:  if  subsequent  facet  filters  reduce  the  facet  count   to  zero,  then  the  facet's  'disappear'  :(  Consider  a  search  for  'shoes',  returning  the  following  facets brand:  Ade  -­‐>  10,  Eric  -­‐>  5,      color:  black  -­‐>  8,  red  -­‐>  7 Then  filter  on  red  (and  assume  that  only  Ade  shoes  are  red).  We   want  to  have:   brands:  Ade  -­‐>  7,  Eric  -­‐>  0,      colors:  black  -­‐>  0,  red  -­‐>  7 However,  if  facet.mincount  ==  1,  we  get:     brands:  Ade  -­‐>  7                              colors:  red  -­‐>  7 43
  • 44. ‘Disappearing  facet  problem’  (cont’) We  want  to  say  'Eric  and  black  are  relevant  to  the  query,  but  has   zero  results  due  to  filtering'.   Solution:  For  each  facet  overlay  the  original  counts  with  the   values  of  the  new  count  if  present,  or  zero  otherwise.       <str  name="facet.field">{!ex=brand_fq,size_fq,taxonomy_fq,color_fq   key=all_brand_facets}brand_facet</str> <str  name="facet.field">{!ex=brand_fq,size_fq,taxonomy_fq,color_fq   key=all_color_facets}color_facet</str> This  means  we  get:   all_brand_facets:    Ade  -­‐>  10,  Eric  -­‐>  5,        brand_facet:  Ade  -­‐>  7 all_color_facets:  black  -­‐>  8,  red  -­‐>  7,          color_facet:  red  -­‐>  7 Merge  to  get:   brand_facet:  Ade  -­‐>  7,  Eric  -­‐>  0,    color_facet:  red  -­‐>  7,  black  -­‐>  0 44
  • 45. Color  Me  Happy Need  to  map  SKU's  color   to  a  simple  set  of  colors   for  faceting. Used  a  synonym  file  to   map  'urobilin'  to  'yellow' Use  a  stopwords  file  to   remove  unknown  colors It  works,  but  it's  brittle;   want  to  move  to  a  solution   based  on  color  analysis  of   swatches 45
  • 46. Hierarchical  taxonomy  faceting Our  taxonomy  is  hierarchical.  Solr  faceting  is  not  :( Encoded  taxonomy  hierarchy  using  'paths'    "Gilt::Men::Clothing::Pants" Our  search  API  converts  the  paths  into  a  tree  for  rendering   purposes. Works  but  (a)  feels  jenky  (b)  ordering  is  alphabetic Prefer  in  future  to  use  unique  taxonomy  key,  and  use  that  the   filter  through  an  ordered  tree  in  our  AP 46
  • 47. Hierarchical  size  faceting Size  ordering  is  non  trivial:  there  is  a  two-­‐level  hierarchy  &   ordering  is  difficult:   00,  0,  2,  4,  6,  8,  10,  12  rather  than  0,  00,  10,  12,  2,  4,  6,  8 XS,  S,  M,  L,  XL,  XXL  rather  than  M,  L,  S,  XL,  XS,  XXL End  up  encoding  a  size  ordinal  into  the  size  facet  label "Women's  Apparel::[[0000000003]]00" "Women's  Apparel::[[0000000004]]0" "Women's  Apparel::[[0000000005]]2" OK,  but  again,  feels  hacky.   “If  only  there  was  a  way  to  annotate  facets  with  meaningful   data” 47
  • 48. Solr Internals & Extensions (V) Solr request availability-component inventory function INDEX targeting filter de-dupe filter group facet sort search response: (lookId, skuId*)* as JSON Inventory Targeting Service Service 48
  • 49. Lesson  I:  Solr  Makes  You  Happy 49
  • 50. Lesson  II:  Solr  makes  the  business  Love  You   Gilt  Search  released  in  A/B  test  to  30%.   Initial  deployment  can  handle  >  180,000  RPM  (5:1  ratio  of   auto-­‐complete  to  search) Members  who's  visit  included  search  are  4x  more  likely  to   buy. Search  generates  2-­‐4%  incremental  revenue Business  pleads  with  us  to  end  A/B  test  and  release  to  100% We've  iterated  to  drive  more  traffic  (&  succeeded)  with  no  loss  of   conversion. From  subtle  to  suBtle. 50
  • 51. Lesson  III:  Excelsior Power  all  listings  (keyword,   category,  sale,  brand)  via   Solr Faceting  on  SKU  attributes   (e.g.  ‘age’,  ‘gender’) Solr  4 Hack  Debridement:   Improve  hierarchical   faceting  &  facet  ordering ‘Double  facet’  feels  wrong 51
  • 52. Thanks! Ade  Trenaman Tech  Lead,  Gilt Twitter:  adrian_trenaman LinkedIn:  adrian.trenaman atrenaman@gilt.com 52
  • 53. http://gilt.com/apacheconeu2012 $25 ... enjoy! 53