© Cloudera, Inc. All rights reserved.

Real-Time Analytics with Solr
Yonik Seeley
10/15/2015
  
My Background

• Creator of Solr
• Cloudera Engineer
• LucidWorks Co-Founder
• Lucene/Solr committer, PMC member
• Apache Software Foundation member
• M.S. in Computer Science, Stanford
  
Solr for Analytics
  
Search and Hadoop

• Search is a key component of many big data problems
• Many analytics use cases start with search
• Adding analytics to full-text search has proven to be more effective than vice-versa
• External integrations are challenging for "real-time" (i.e. interactive) results
  
	
  
	
  
Solr in Hadoop

• Top Hadoop vendors who have integrated search have all chosen Apache Solr
    • For example: Cloudera, Hortonworks, MapR, IBM, ...
• Historical focus on interactive response times
• Historical focus on faceted search / guided navigation
• High performance indexes
    • originally for "full-text" search, but just as great for meta-data!
  
Inverted Index

Documents
  0  Little Red Riding Hood
  1  Robin Hood
  2  Little Women

Term       Documents
aardvark   (not shown)
hood       0, 1
little     0, 2
red        0
riding     0
robin      1
women      2
zoo        (not shown)
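A minimal Python sketch of the structure pictured above, assuming whitespace tokenization and lowercasing (the slide does not say which analyzer is used). The names `build_inverted_index` and `search_all` are illustrative only and are not Lucene or Solr APIs.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of ids of docs containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in sorted(set(text.lower().split())):
            index[term].append(doc_id)  # doc ids arrive in increasing order
    return dict(index)

def search_all(index, *terms):
    """Return ids of documents containing every term (an AND query)."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["Little Red Riding Hood", "Robin Hood", "Little Women"]
index = build_inverted_index(docs)
```

Looking up a term is a dictionary access, and combining terms is a set intersection over their postings, which is why search over the index does not need to scan documents.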
  
Columnar Storage (DocValues)

[Figure: field values a1..a4, b1..b4, c1..c4 shown in two layouts. Stored Fields (row oriented) keep each document's values together: a1 b1 c1, a2 b2 c2, ... DocValues (column oriented) keep each field's values together: a1 a2 a3 a4, b1 b2 b3 b4, ...]

• Fast linear scan
    • Read only the data you need
• Fast random access
    • docid -> value(s)
• High degree of locality
• Compressed
    • prefix, delta, table, gcd, etc
• Mostly "Off-Heap"
    • Memory mapped from index
• Row vs Column configurable per field!
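The difference between the two layouts can be sketched in a few lines of Python. This is a toy model under stated assumptions: the field names and values are placeholders, and plain lists stand in for the compressed, memory-mapped structures that DocValues actually use.

```python
# Three documents, each with fields a, b, c (values are placeholders).
rows = [
    {"a": 1, "b": 10, "c": 100},
    {"a": 2, "b": 20, "c": 200},
    {"a": 3, "b": 30, "c": 300},
]

# Row-oriented ("stored fields"): one record per document.
stored_fields = rows

# Column-oriented (DocValues-like): one array per field, indexed by docid.
doc_values = {field: [row[field] for row in rows] for field in rows[0]}

def sum_field_rows(field):
    """Visits every whole record even though only one field is needed."""
    return sum(record[field] for record in stored_fields)

def sum_field_columns(field):
    """Linear scan over a single column."""
    return sum(doc_values[field])

def value_for(docid, field):
    """Random access: docid -> value."""
    return doc_values[field][docid]
```

Both sum functions return the same number; the columnar version simply touches less data, which is the point of the "read only the data you need" bullet.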
  
Multi-Segment Index

  _0.fnm
  _0.fdt
  _0.fdx
  [...]
  _0_1.del

  _1.fnm
  _1.fdt
  _1.fdx
  [...]

  segments_3

• Each segment is a self-contained "index"
• Segments are never changed once written
• Per-segment caching very effective
• Point-in-time searcher
• getting new view means writing & including additional segment
• turns a weakness into a strength
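A toy Python model of the point-in-time idea, assuming each segment is just an immutable tuple of documents. The `Index` and `Searcher` classes are hypothetical illustrations, not Lucene classes.

```python
class Index:
    """Toy index: a list of immutable segments (tuples of documents)."""

    def __init__(self):
        self.segments = []

    def commit(self, docs):
        # A commit writes a new segment; existing segments are not modified.
        self.segments.append(tuple(docs))

    def open_searcher(self):
        # A searcher captures the segment list as it exists right now.
        return Searcher(tuple(self.segments))

class Searcher:
    def __init__(self, segments):
        self.segments = segments

    def count(self):
        return sum(len(seg) for seg in self.segments)

index = Index()
index.commit(["doc0", "doc1"])
old_view = index.open_searcher()
index.commit(["doc2"])
new_view = index.open_searcher()
```

The old searcher still sees two documents while the new one sees three, and both share the first segment object, so anything cached for that segment stays valid across views.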
  
Faceted Search

• Breaks search results into buckets
• Generally provides bucket counts
• Allows user to filter / "drill into" results
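The three bullets above can be sketched in Python over an in-memory result set. The sample documents and the function names `facet_counts` and `drill_into` are made up for illustration.

```python
from collections import Counter

results = [
    {"name": "trail runner", "brand": "Acme", "color": "red"},
    {"name": "road runner", "brand": "Acme", "color": "blue"},
    {"name": "hiker", "brand": "Peak", "color": "red"},
]

def facet_counts(docs, field):
    """Break a result set into buckets by field value, with a count per bucket."""
    return Counter(doc[field] for doc in docs)

def drill_into(docs, field, value):
    """Filter the result set down to a single bucket."""
    return [doc for doc in docs if doc[field] == value]

by_brand = facet_counts(results, "brand")
acme_only = drill_into(results, "brand", "Acme")
```

After drilling into a bucket, the facet counts can be recomputed on the narrower result set, which is what guided navigation does on each click.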
  
New Facet Module
  
Facet Module Goals

• Integration
• Performance
• Ease of use

[Diagram labels: Faceting, Search, Statistics; New Facet Module / JSON Facet API; Search, Joins, Grouping, Field Collapsing, Highlighting, Nested Documents, Geosearch]
  
Slice and Dice with Facet commands

• Domain: A set of documents
• Facet command: create sub-domains / "facet buckets"

[Diagram: a Domain is passed to Facet Command A; the resulting Domains are passed to Facet Commands B and C, which produce further Domains]
  
Facet Functions / Statistics

• Facet function calculates something over a domain
• Can sort domains by facet functions!

[Diagram: a tree of Domains split by Facet Commands A, B, and C, with facet functions attached to domains: sum(x) and unique(y) on three of them, plus min(units) and avg(price)]
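A minimal sketch of the domain model in Python, assuming a domain is a list of dicts. `terms_command` and `avg` are illustrative stand-ins for a facet command and a facet function; they are not Solr code.

```python
from collections import defaultdict

def terms_command(domain, field):
    """Facet command: split a domain into sub-domains keyed by a field's value."""
    buckets = defaultdict(list)
    for doc in domain:
        buckets[doc[field]].append(doc)
    return dict(buckets)

def avg(domain, field):
    """Facet function: one number computed over a domain."""
    return sum(doc[field] for doc in domain) / len(domain)

root = [
    {"style": "Hiking", "price": 150.0},
    {"style": "Hiking", "price": 120.0},
    {"style": "Running", "price": 100.0},
]

sub_domains = terms_command(root, "style")
# Sort the sub-domains by a facet function instead of by document count.
ranked = sorted(sub_domains, key=lambda v: avg(sub_domains[v], "price"), reverse=True)
```

Because each sub-domain is itself a domain, another facet command can be applied to it, which is how the tree in the diagram is built.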
  
Facet functions

• Calculate (and Sort) by things other than document count

Function    Example                         Description
sum         sum(sales)                      Summation of numeric values
avg         avg(popularity)                 Average of numeric values
sumsq       sumsq(rent)                     Sum of squares
min         min(salary)                     Minimum value
max         max(mul(popularity,boost))      Maximum value
unique      unique(state)                   Number of unique values (calc distinct)
hll         hll(state)                      Number of unique values using HyperLogLog algorithm
percentile  percentile(salary, 25, 50, 75)  Calculates percentiles via t-digest algorithm
topdocs     topdocs("another query",5)      (in progress) Returns the top documents for another query
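For reference, exact in-memory equivalents of several functions in the table can be written in Python. This is only a sketch of what each function computes: Solr's `percentile` uses the approximate t-digest algorithm, whereas the version here is exact, and `hll` and `topdocs` are not modelled.

```python
import statistics

def facet_stats(values):
    """Exact, in-memory versions of several facet functions from the table."""
    return {
        "sum": sum(values),
        "avg": sum(values) / len(values),
        "sumsq": sum(v * v for v in values),
        "min": min(values),
        "max": max(values),
        "unique": len(set(values)),
    }

def percentiles(values, *ps):
    """Exact percentiles (inclusive method); a stand-in for approximate t-digest."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return [cuts[p - 1] for p in ps]  # each p must be between 1 and 99

salaries = [10, 20, 30, 40, 50]
stats = facet_stats(salaries)
quartiles = percentiles(salaries, 25, 50, 75)
```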
  
Simple request and response

curl http://localhost:8983/solr/query -d '
q=widgets&
json.facet=
{
  x : "avg(price)" ,
  y : "unique(brand)"
}
'

[…]
"facets" : {
  "count" : 314,
  "x" : 102.5,
  "y" : 28
}

Notes: q=widgets is the root domain, defined by docs matching the query; "count" is the count of docs in the bucket.
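The same request can be assembled from Python using only the standard library. This sketch builds a form-encoded body with the `q` and `json.facet` parameters from the slide and reads values out of the response shown; `build_facet_body` is a hypothetical helper name, and no request is actually sent.

```python
import json
from urllib.parse import urlencode, parse_qs

def build_facet_body(query, facets):
    """Form-encode q and json.facet for a POST to the query endpoint."""
    return urlencode({"q": query, "json.facet": json.dumps(facets)})

body = build_facet_body("widgets", {"x": "avg(price)", "y": "unique(brand)"})

# The "facets" section of the response shown on the slide.
response = {"facets": {"count": 314, "x": 102.5, "y": 28}}
average_price = response["facets"]["x"]
distinct_brands = response["facets"]["y"]
```

The resulting `body` string can be POSTed to the `/solr/query` URL from the slide with any HTTP client.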
  
All-JSON request example

$ curl http://localhost:8983/solr/query -d '
{
  query : "widgets",              // our JSON parser accepts comments (C-style too)
  filter : "inStock:true",        // bare strings can appear unquoted
  offset: 0,
  limit: 5,
  sort: "price desc",
  fields: ["id","name","price"],  /* could have also used "id,name,price" */
  facet : {
    x : "avg(price)",
    y : "unique(brand)"
  }
}
'
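Clients that cannot emit the relaxed syntax (comments, unquoted strings) can send the equivalent strict JSON. A small Python sketch, using only the keys that appear in the request above:

```python
import json

request = {
    "query": "widgets",
    "filter": "inStock:true",
    "offset": 0,
    "limit": 5,
    "sort": "price desc",
    "fields": ["id", "name", "price"],
    "facet": {"x": "avg(price)", "y": "unique(brand)"},
}

# Strict JSON: every key and string quoted, no comments.
body = json.dumps(request, indent=2)
```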
  
Bucketing Facet Types

• Terms Facet
    • Creates new domains (facet buckets) based on values in a field
• Range Facet
    • Creates multiple buckets based on date ranges or numeric ranges
• Query Facet
    • Creates a single bucket of documents that match any given query
• Unlimited nesting: Any facet types may have any number of sub-facets
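The three bucketing types can be sketched in Python over an in-memory list of documents. This is an illustration under simplifying assumptions: ranges are integer-valued, and a query is represented by a predicate function. The function names are hypothetical.

```python
def terms_facet(docs, field):
    """One bucket per distinct value of a field."""
    buckets = {}
    for doc in docs:
        buckets.setdefault(doc[field], []).append(doc)
    return buckets

def range_facet(docs, field, start, end, gap):
    """One bucket per [lo, lo + gap) interval between start and end (integers)."""
    buckets = {lo: [] for lo in range(start, end, gap)}
    for doc in docs:
        if doc[field] in range(start, end):
            lo = start + (doc[field] - start) // gap * gap
            buckets[lo].append(doc)
    return buckets

def query_facet(docs, predicate):
    """A single bucket holding the docs that match a query (here a predicate)."""
    return [doc for doc in docs if predicate(doc)]

docs = [
    {"color": "red", "price": 5},
    {"color": "red", "price": 25},
    {"color": "blue", "price": 12},
]
# Nesting: apply another facet inside each bucket of the first.
nested = {color: range_facet(bucket, "price", 0, 30, 10)
          for color, bucket in terms_facet(docs, "color").items()}
```

Since every bucket is just another list of documents, any of the three functions can be applied again inside it, which is what unlimited nesting amounts to.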
  
Terms facet example

json.facet={
  shoes : {
    type : terms,
    field : shoe_style,
    sort : {x : desc},
    facet : {
      x : "avg(price)",
      y : "unique(brand)"
    }
  }
}

"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking",
        "count" : 34,
        "x" : 135.25,
        "y" : 17
      },
      {
        "val" : "Running",
        "count" : 45,
        "x" : 110.75,
        "y" : 24
      },

Note: x and y are calculated per-bucket.
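Reading such a response from client code is straightforward. The Python sketch below hard-codes the two buckets visible on the slide as parsed JSON; `bucket_table` is an illustrative helper, not part of any Solr client library.

```python
# The "facets" section from the slide, as parsed JSON (only the two buckets shown).
facets = {
    "count": 472,
    "shoes": {
        "buckets": [
            {"val": "Hiking", "count": 34, "x": 135.25, "y": 17},
            {"val": "Running", "count": 45, "x": 110.75, "y": 24},
        ]
    },
}

def bucket_table(facets, name):
    """Flatten one terms facet into (val, count, x, y) rows, keeping server order."""
    return [(b["val"], b["count"], b["x"], b["y"]) for b in facets[name]["buckets"]]

rows = bucket_table(facets, "shoes")
# sort : {x : desc} means buckets arrive ordered by average price, not by count.
is_sorted_by_x = all(a[2] >= b[2] for a, b in zip(rows, rows[1:]))
```

Note that "Hiking" comes first even though "Running" has more documents, because the request sorts by `x` rather than by count.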
  
Sub-facet example

json.facet={
  shoes:{
    type : terms,
    field : shoe_style,
    sort : {x : desc},
    facet : {
      x : "avg(price)",
      y : "unique(brand)",
      colors : {
        type : terms,
        field : color
      }
    }
  }
}

"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking",
        "count" : 34,
        "x" : 135.25,
        "y" : 17,
        "colors" : {
          "buckets" : [
            { "val" : "brown",
              "count" : 12 },
            { "val" : "black",
              "count" : 10
            }, […]
          ]
        } // end of colors sub-facet
      }, // end of Hiking bucket
      {
        "val" : "Running",
        "count" : 45,
        "x" : 110.75,
        "y" : 24,
        "colors" : {
          "buckets" : […]
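Nested responses like this one can be flattened with a short recursive walk. The Python sketch below assumes the response has been parsed into dicts; the elided parts of the slide are left out (the "Running" colors are represented as an empty list purely as a placeholder), and `walk_buckets` is a hypothetical helper.

```python
def walk_buckets(facet_result, path=()):
    """Yield (path_of_vals, count) for every bucket at every nesting level."""
    for name, value in facet_result.items():
        if isinstance(value, dict) and "buckets" in value:
            for bucket in value["buckets"]:
                here = path + (bucket["val"],)
                yield here, bucket["count"]
                yield from walk_buckets(bucket, here)

# Trimmed to the values visible on the slide; elided buckets are omitted.
facets = {
    "count": 472,
    "shoes": {"buckets": [
        {"val": "Hiking", "count": 34, "x": 135.25, "y": 17,
         "colors": {"buckets": [{"val": "brown", "count": 12},
                                {"val": "black", "count": 10}]}},
        {"val": "Running", "count": 45, "x": 110.75, "y": 24,
         "colors": {"buckets": []}},
    ]},
}

flat = list(walk_buckets(facets))
```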
  
BI Use Case
  
Fantasy ($1045)
  Top Authors
    $423  George R.R. Martin
    $347  Brandon Sanderson
    $155  JK Rowling
  Top Books
    $252  A Game of Thrones
    $113  Emperor of Thorns
    $101  Nine Princes in Amber
    $82   Steel Heart

Sci-Fi ($898)
  Top Authors
    $321  Iain M Banks
    $218  Neal Asher
    $155  Neal Stephenson
  Top Books
    $113  Gridlinked
    $101  Use of Weapons
    $93   Snow Crash
    $82   The Skinner

Mystery ($645)
  Top Authors
    $191  James Patterson
    $145  Patricia Cornwell
    $126  John Grisham
  Top Books
    $85   One for the Money
    $77   Angels & Daemons
    $64   Shutter Island
    $35   The Firm

Filter By
  State
    $852  NJ (14 stores)
    $658  NY (11 stores)
    $421  CT (8 stores)
  Chain
    $984  Amazoon (14 stores)
    $734  Houses&Royalty (9 stores)
    $387  Books-r-us (7 stores)
  Store
    $108  Amazoon Branchburg
    $93   Books-r-us Bridgewater
    $87   H&R NYC

Number of Books
  Chain
    201K  Houses&Royalty
    183K  Amazoon
    98K   Books-r-us
  Store
    193K  H&R NYC
    77K   Books-r-us Bridgewater
    68K   Amazoon Branchburg
  
	
  
Implementation

date_breakout : {
  type : range,
  field : sale_date,
  start : ...,
  end : ...,
  gap : "+1MONTH",

  facet : {
    top_genres : {
      type : terms,
      field : genre,
      sort : "revenue desc",
      limit : 4,
      facet : {
        revenue : "sum(sales)"
      }
    },
    by_chain : {
      type : terms,
      field : chain,
      facet : {
        revenue : "sum(sales)"
      }
    }
  }
}

[Diagram: Range Facet (sale_date) contains Terms Facet (genre) and Terms Facet (chain), each computing sum(sales)]
  
Implementation (continued)

[Dashboard panel repeated from the BI use case: Fantasy, Sci-Fi, and Mystery, each with Top Authors and Top Books by revenue]

top_genres : {
  type : terms,
  field : genre,
  facet : {
    rev : "sum(sales)",
    top_authors : {
      type : terms,
      field : author,
      sort : "rev desc",
      limit : 3,
      facet : { rev : "sum(sales)" }
    },
    top_books : {
      type : terms,
      field : title,
      sort : "rev desc",
      limit : 4,
      facet : { rev : "sum(sales)" }
    }
  }
}

[Diagram: Terms Facet (genre) contains Terms Facet (author) and Terms Facet (title); each level computes sum(sales)]
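Nested facet requests like the one above are easy to generate from code. The Python sketch below introduces a small hypothetical builder, `terms`, that returns plain dicts mirroring the request keys on the slide (`type`, `field`, `sort`, `limit`, `facet`); it is not part of any Solr client library.

```python
import json

def terms(field, sort=None, limit=None, **sub_facets):
    """Build a terms facet dict; keyword arguments become its sub-facets."""
    facet = {"type": "terms", "field": field}
    if sort is not None:
        facet["sort"] = sort
    if limit is not None:
        facet["limit"] = limit
    if sub_facets:
        facet["facet"] = sub_facets
    return facet

revenue = "sum(sales)"
top_genres = terms(
    "genre",
    rev=revenue,
    top_authors=terms("author", sort="rev desc", limit=3, rev=revenue),
    top_books=terms("title", sort="rev desc", limit=4, rev=revenue),
)
json_facet = json.dumps({"top_genres": top_genres})
```

The `json_facet` string is what would be sent as the `json.facet` parameter.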
  
Performance
  
Tested Facet Request

Legacy (stats component & pivot facets)

  facet=true&stats=true
  &stats.field={!tag=stat1+mean=true}field2
  &facet.pivot={!stats=stat1}field1
  &f.field1.limit=10

JSON Facet API (New Facet Module)

  json.facet={
    f : {
      type : terms,
      field : field1,
      facet:{ mean:"avg(field2)" }
    }
  }
  
Nested Documents
  
Indexing Nested Documents

id : book1
title : The Way of Kings
author : Brandon Sanderson
    id : book1_review1
    review_author : Yonik
    stars : 5
    comment : A great start to what ...

    id : book1_review2
    review_author : Dan
    stars : 3
    comment : This book was too long

id : book2
title : Snow Crash
author : Neal Stephenson
    id : book2_review1
    review_author : Yonik
    stars : 5
    comment : Ahead of its time ...

Lucene index view (flat)
    book1_review1
    book1_review2
    book1
    book2_review1
    book2

• Group indexed as a "block"
• atomic
• internal document ids contiguous
• enables quick and inexpensive joins
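Why contiguous ids make joins cheap can be sketched in Python using the flat order shown above, where children come first and the parent is last in each block. This is a simplified model: the parent flags are derived from the toy ids here, whereas a real index would keep a set of parent document positions.

```python
# Flat index order from the slide: children first, parent last in each block.
index = ["book1_review1", "book1_review2", "book1", "book2_review1", "book2"]
is_parent = [doc_id.count("_") == 0 for doc_id in index]  # toy rule for this example

def parent_of(child_pos):
    """The parent is the first parent document at or after the child's position."""
    pos = child_pos
    while not is_parent[pos]:
        pos += 1
    return pos

def children_of(parent_pos):
    """Children are the contiguous run of non-parents just before the parent."""
    pos = parent_pos - 1
    while pos >= 0 and not is_parent[pos]:
        pos -= 1
    return list(range(pos + 1, parent_pos))
```

Both directions of the join are short scans over neighbouring positions, with no lookup by key.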
  
Indexing Nested Documents (JSON format)

{ id : book1, type : book, title : "The Way of Kings", author : "Brandon Sanderson",
  genre : fantasy, pubyear : 2010, publisher : Tor,
  _childDocuments_ : [
    { id : book1_review1, type : review, review_dt:"2015-01-03T14:30:00Z",
      stars : 5, review_author : Yonik,
      comment : "A great start to what looks like an epic series!"
    } ,
    { id : book1_review2, type : review, review_dt:"2015-03-15T12:00:00Z",
      stars : 3, review_author : Dan,
      comment : "This book was too long."
    }
  ]
}
  
Block Join Queries

Find reviews mentioning "epic", limiting to reviews for books published by Tor

  q=comment:epic
  fq={!child of="type:book"}publisher:Tor
  sort=review_dt desc

Find books published by Tor with a review mentioning "epic"

  q=publisher:Tor
  fq={!parent which="type:book"}comment:epic
  sort=pubyear desc
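The meaning of the two queries can be illustrated in Python over the nested book from the indexing example. This is a semantic sketch only: reviews are held in a list under each book, word matching is a plain substring test, and the function names are hypothetical.

```python
books = [
    {"id": "book1", "publisher": "Tor", "pubyear": 2010, "reviews": [
        {"id": "book1_review1",
         "comment": "A great start to what looks like an epic series!"},
        {"id": "book1_review2", "comment": "This book was too long."},
    ]},
]

def mentions(review, word):
    return word in review["comment"].lower()

def reviews_of_matching_books(books, book_pred, review_pred):
    """Like q=<review query> with fq={!child of=...}<book query>: returns reviews."""
    return [r["id"] for b in books if book_pred(b)
            for r in b["reviews"] if review_pred(r)]

def books_with_matching_review(books, book_pred, review_pred):
    """Like q=<book query> with fq={!parent which=...}<review query>: returns books."""
    return [b["id"] for b in books
            if book_pred(b) and any(review_pred(r) for r in b["reviews"])]

is_tor = lambda b: b["publisher"] == "Tor"
says_epic = lambda r: mentions(r, "epic")
```

The first function returns child documents and the second returns parent documents, which is the only difference between the `{!child}` and `{!parent}` forms.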
  
Block Join Faceting (child to parent)

• Find the number of books I (Yonik) reviewed, broken out by Genre

q=review_author:Yonik&
json.facet={
  genres : {
    type : terms,
    field : genre,
    domain : { blockParent : "type:book" }
  }
}
  
Block Join Faceting (parent to child)

• Find the top reviewers for sci-fi and fantasy books

q=genre:(sci-fi OR fantasy)&
json.facet={
  top_reviewers : {
    type: terms,
    field: review_author,
    domain: { blockChildren : "type:book" }
  }
}
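Both domain switches can be mimicked in Python to show what the facets count. The data is a reduced version of the earlier nested-document example; the genre given to book2 is an assumption made for illustration (the slides do not state it), and the function names are hypothetical.

```python
from collections import Counter

books = [
    {"id": "book1", "genre": "fantasy", "reviewers": ["Yonik", "Dan"]},
    {"id": "book2", "genre": "sci-fi", "reviewers": ["Yonik"]},  # genre assumed
]

def genres_reviewed_by(books, reviewer):
    """Child to parent: match reviews, then facet on the parent books' genre."""
    parents = [b for b in books if reviewer in b["reviewers"]]
    return Counter(b["genre"] for b in parents)

def top_reviewers(books, genres):
    """Parent to child: match books, then facet on their reviews' author."""
    children = [r for b in books if b["genre"] in genres for r in b["reviewers"]]
    return Counter(children).most_common()
```

In the first case the query matches reviews but the buckets count books; in the second the query matches books but the buckets count reviews.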
  
Thank you
yonik@cloudera.com
  

More Related Content

What's hot

OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGOLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
Lucidworks
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Lucidworks
 

What's hot (20)

Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and Graph
 
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGOLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
SQL Now! How Optiq brings the best of SQL to NoSQL data.
SQL Now! How Optiq brings the best of SQL to NoSQL data.SQL Now! How Optiq brings the best of SQL to NoSQL data.
SQL Now! How Optiq brings the best of SQL to NoSQL data.
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks Fusion
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using Optiq
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Advanced Apache Spark Meetup Spark SQL + DataFrames + Catalyst Optimizer + Da...
Advanced Apache Spark Meetup Spark SQL + DataFrames + Catalyst Optimizer + Da...Advanced Apache Spark Meetup Spark SQL + DataFrames + Catalyst Optimizer + Da...
Advanced Apache Spark Meetup Spark SQL + DataFrames + Catalyst Optimizer + Da...
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
 
Why is data independence (still) so important? Optiq and Apache Drill.
Why is data independence (still) so important? Optiq and Apache Drill.Why is data independence (still) so important? Optiq and Apache Drill.
Why is data independence (still) so important? Optiq and Apache Drill.
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
 

Viewers also liked

Webinar Google Analytics Real Time MA 22-11-11
Webinar Google Analytics Real Time MA 22-11-11Webinar Google Analytics Real Time MA 22-11-11
Webinar Google Analytics Real Time MA 22-11-11
Watt
 

Viewers also liked (20)

Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
 
Solr for Analytics
Solr for AnalyticsSolr for Analytics
Solr for Analytics
 
Large Scale ETL for Hadoop and Cloudera Search using Morphlines
Large Scale ETL for Hadoop and Cloudera Search using MorphlinesLarge Scale ETL for Hadoop and Cloudera Search using Morphlines
Large Scale ETL for Hadoop and Cloudera Search using Morphlines
 
Webinar Google Analytics Real Time MA 22-11-11
Webinar Google Analytics Real Time MA 22-11-11Webinar Google Analytics Real Time MA 22-11-11
Webinar Google Analytics Real Time MA 22-11-11
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Cloudera Search Webinar: Big Data Search, Bigger Insights
Cloudera Search Webinar: Big Data Search, Bigger InsightsCloudera Search Webinar: Big Data Search, Bigger Insights
Cloudera Search Webinar: Big Data Search, Bigger Insights
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
 
Build a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
Build a Great Application in Minutes!: Presented by Stefan Olafsson, TwigkitBuild a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
Build a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Webinar: Natural Language Search with Solr
Webinar: Natural Language Search with SolrWebinar: Natural Language Search with Solr
Webinar: Natural Language Search with Solr
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
 
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics PlatformHow We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
 
Nested Types in Impala
Nested Types in ImpalaNested Types in Impala
Nested Types in Impala
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
 







Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera

  • 1. Real-Time Analytics with Solr
      Yonik Seeley, 10/15/2015
      © Cloudera, Inc. All rights reserved.
  • 2. My Background
      • Creator of Solr
      • Cloudera Engineer
      • LucidWorks Co-Founder
      • Lucene/Solr committer, PMC member
      • Apache Software Foundation member
      • M.S. in Computer Science, Stanford
  • 3. Solr for Analytics
  • 4. Search and Hadoop
      • Search is a key component of many big data problems
      • Many analytics use cases start with search
      • Adding analytics to full-text search has proven to be more effective than vice versa
      • External integrations are challenging for "real-time" (i.e. interactive) results
  • 5. Solr in Hadoop
      • Top Hadoop vendors who have integrated search have all chosen Apache Solr
          • For example: Cloudera, Hortonworks, MapR, IBM, ...
      • Historical focus on interactive response times
      • Historical focus on faceted search / guided navigation
      • High-performance indexes
          • Originally for "full-text" search, but just as great for metadata!
  • 6. Inverted Index
      Documents:
        0  Little Red Riding Hood
        1  Robin Hood
        2  Little Women
      Term -> document ids:
        aardvark  (no postings shown)
        hood      0, 1
        little    0, 2
        red       0
        riding    0
        robin     1
        women     2
        zoo       (no postings shown)
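The term-to-document mapping on this slide can be sketched in a few lines of Python. This is a toy model using whitespace tokenization and lowercasing (both assumptions; Lucene's analyzers and on-disk posting lists are far more elaborate), and `build_inverted_index` and `search_all` are hypothetical helper names.

```python
from collections import defaultdict

# The three documents from the slide, keyed by document id.
docs = {
    0: "Little Red Riding Hood",
    1: "Robin Hood",
    2: "Little Women",
}


def build_inverted_index(documents):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search_all(index, *terms):
    """Return ids of documents containing every term (posting-list intersection)."""
    result = None
    for term in terms:
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


index = build_inverted_index(docs)
print(index["hood"])
print(search_all(index, "little", "hood"))
```

A query for several terms only touches the posting lists of those terms, which is why lookups stay fast regardless of how many other terms the index holds.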
  • 7. Columnar Storage (DocValues)
      [Diagram: Stored Fields (row oriented: a1 b1 c1, a2 b2 c2, ...) vs. DocValues (column oriented: a1 a2 a3 a4, b1 b2 b3 b4, c1 c2 c3 c4)]
      • Fast linear scan
          • Read only the data you need
      • Fast random access
          • docid -> value(s)
      • High degree of locality
      • Compressed
          • prefix, delta, table, gcd, etc.
      • Mostly "off-heap"
          • Memory mapped from index
      • Row vs. column configurable per field!
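To make the row-versus-column distinction concrete, here is a minimal Python sketch. The field names (`price`, `units`) and the `to_columns` helper are made up for illustration, and plain lists stand in for the compressed, memory-mapped structures DocValues actually uses.

```python
# Row-oriented layout ("stored fields"): one record per document.
rows = [
    {"id": "a", "price": 10.0, "units": 3},
    {"id": "b", "price": 20.0, "units": 1},
    {"id": "c", "price": 30.0, "units": 7},
]


def to_columns(records):
    """Pivot row records into one list per field, indexed by document position.

    Assumes every record carries the same fields, so positions line up.
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Linear scan: an aggregate reads only the one column it needs.
total_price = sum(columns["price"])

# Random access: docid -> value is a single list lookup.
units_of_doc_1 = columns["units"][1]
```

Summing `price` never touches `id` or `units`, which is the "read only the data you need" property the slide highlights.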
  • 8. Multi-Segment Index
      [Diagram: index files _0.fnm, _0.fdt, _0.fdx, [...], _0_1.del, _1.fnm, _1.fdt, _1.fdx, […], segments_3]
      • Each segment is a self-contained "index"
      • Segments are never changed once written
      • Per-segment caching very effective
      • Point-in-time searcher
          • Getting a new view means writing & including an additional segment
          • Turns a weakness into a strength
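The point-in-time behaviour can be illustrated with a toy model. The class names `SegmentedIndex` and `Searcher` are hypothetical and this is not how Lucene is implemented; the sketch only shows why immutable segments make a snapshot cheap: a searcher keeps the tuple of segments that existed when it was opened.

```python
class Searcher:
    """A point-in-time view over the segments that existed when it was opened."""

    def __init__(self, segments):
        self._segments = segments  # tuple of tuples, never mutated

    def count(self):
        return sum(len(segment) for segment in self._segments)


class SegmentedIndex:
    """Toy index in which a write appends a new immutable segment."""

    def __init__(self):
        self._segments = ()

    def add_segment(self, docs):
        # Existing segments are reused untouched; only the new one is written.
        self._segments = self._segments + (tuple(docs),)

    def open_searcher(self):
        return Searcher(self._segments)


index = SegmentedIndex()
index.add_segment(["doc0", "doc1"])
old_view = index.open_searcher()

index.add_segment(["doc2"])
new_view = index.open_searcher()

print(old_view.count(), new_view.count())
```

Because the first segment is shared by both views, anything cached for it stays valid after the second write, which matches the slide's remark that per-segment caching is very effective.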
  • 9. Faceted Search
      • Breaks search results into buckets
      • Generally provides bucket counts
      • Allows user to filter / "drill into" results
  • 10. New Facet Module
  • 11. Facet Module Goals
      [Diagram: New Facet Module / JSON Facet API, shown alongside Search, Faceting, Statistics, Joins, Grouping, Field Collapsing, Highlighting, Nested Documents, Geosearch]
      • Integration
      • Performance
      • Ease of use
  • 12. Slice and Dice with Facet commands
      [Diagram: a Domain feeds Facet Command A, which produces sub-domains; those feed Facet Commands B and C, which produce further domains]
      • Domain: a set of documents
      • Facet command: creates sub-domains / "facet buckets"
  • 13. Facet Functions / Statistics
      [Diagram: the domain tree from slide 12, with functions such as sum(x), unique(y), min(units), avg(price) attached to domains]
      • A facet function calculates something over a domain
      • Can sort domains by facet functions!
  • 14. Facet functions
      • Calculate (and sort) by things other than document count

      Function    Example                         Description
      sum         sum(sales)                      Summation of numeric values
      avg         avg(popularity)                 Average of numeric values
      sumsq       sumsq(rent)                     Sum of squares
      min         min(salary)                     Minimum value
      max         max(mul(popularity,boost))      Maximum value
      unique      unique(state)                   Number of unique values (calc distinct)
      hll         hll(state)                      Number of unique values using the HyperLogLog algorithm
      percentile  percentile(salary, 25, 50, 75)  Calculates percentiles via the t-digest algorithm
      topdocs     topdocs("another query",5)      (in progress) Returns the top documents for another query
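As a rough guide to what the exact (non-approximate) functions in this table compute, here is a plain-Python sketch over a small in-memory domain. The `facet_functions` helper and the sample documents are invented for illustration; `hll` and `percentile` are left out because they rely on approximation algorithms not reproduced here.

```python
def facet_functions(domain, numeric_field, distinct_field):
    """Compute count, sum, avg, sumsq, min, max and unique over a list of docs."""
    values = [doc[numeric_field] for doc in domain if numeric_field in doc]
    distinct = {doc[distinct_field] for doc in domain if distinct_field in doc}
    return {
        "count": len(domain),
        "sum": sum(values),
        "avg": sum(values) / len(values) if values else None,
        "sumsq": sum(v * v for v in values),
        "min": min(values, default=None),
        "max": max(values, default=None),
        "unique": len(distinct),
    }


# A made-up domain: the set of documents a facet function runs over.
domain = [
    {"sales": 10.0, "state": "NJ"},
    {"sales": 20.0, "state": "NY"},
    {"sales": 30.0, "state": "NJ"},
]

stats = facet_functions(domain, "sales", "state")
print(stats)
```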
  • 15. Simple request and response
      curl http://localhost:8983/solr/query -d '
      q=widgets&
      json.facet=
      {
        x : "avg(price)",
        y : "unique(brand)"
      }
      '
      […]
      "facets" : {
        "count" : 314,
        "x" : 102.5,
        "y" : 28
      }
      • The root domain is defined by the docs matching the query
      • "count" is the count of docs in the bucket
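The same request can be issued from Python using only the standard library. This is a sketch under assumptions: the URL is the one on the slide, `build_body` and `run_query` are hypothetical helper names, and the `wt=json` parameter is added here (it is not on the slide) so the response is JSON regardless of the server's default. `run_query` needs a running Solr instance and is not called below.

```python
import json
import urllib.request
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr/query"  # endpoint shown on the slide


def build_body(query, facets):
    """Form-encode q and json.facet the way `curl -d` sends them."""
    params = {"q": query, "json.facet": json.dumps(facets), "wt": "json"}
    return urlencode(params).encode("utf-8")


def run_query(query, facets, url=SOLR_URL):
    """POST the request and return the "facets" section of the response."""
    with urllib.request.urlopen(url, data=build_body(query, facets)) as response:
        return json.load(response).get("facets")


facets = {"x": "avg(price)", "y": "unique(brand)"}
body = build_body("widgets", facets)
print(body.decode("utf-8"))
```

Building the facet block as a dictionary and serializing it with `json.dumps` avoids hand-escaping quotes, at the cost of always emitting strict JSON rather than the relaxed syntax the slides use.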
  • 16. All-JSON request example
      $ curl http://localhost:8983/solr/query -d '
      {
        query : "widgets",   // our JSON parser accepts comments (C-style too)
        filter : "inStock:true",   // bare strings can appear unquoted
        offset : 0,
        limit : 5,
        sort : "price desc",
        fields : ["id","name","price"],   /* could have also used "id,name,price" */
        facet : {
          x : "avg(price)",
          y : "unique(brand)"
        }
      }
      '
  • 17. Bucketing Facet Types
      • Terms Facet
          • Creates new domains (facet buckets) based on values in a field
      • Range Facet
          • Creates multiple buckets based on date ranges or numeric ranges
      • Query Facet
          • Creates a single bucket of documents that match any given query
      • Unlimited nesting: any facet type may have any number of sub-facets
  • 18. Terms facet example
      Request:
      json.facet={
        shoes : {
          type : terms,
          field : shoe_style,
          sort : {x : desc},
          facet : {
            x : "avg(price)",
            y : "unique(brand)"
          }
        }
      }
      Response (x and y are calculated per bucket):
      "facets": {
        "count" : 472,
        "shoes": {
          "buckets" : [
            {
              "val" : "Hiking",
              "count" : 34,
              "x" : 135.25,
              "y" : 17
            },
            {
              "val" : "Running",
              "count" : 45,
              "x" : 110.75,
              "y" : 24
            },
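What the terms facet computes can be mimicked in memory: group the domain by a field's value, evaluate the facet functions per bucket, then sort buckets by one of them. The sketch below is an illustration only; `terms_facet` is a hypothetical helper hard-wired to `avg(price)` and `unique(brand)`, and the sample documents are made up.

```python
from collections import defaultdict


def terms_facet(domain, field, sort_key="x"):
    """Bucket docs by `field`; x = avg(price), y = unique(brand); sort descending."""
    groups = defaultdict(list)
    for doc in domain:
        groups[doc[field]].append(doc)

    buckets = []
    for value, docs in groups.items():
        prices = [doc["price"] for doc in docs]
        buckets.append({
            "val": value,
            "count": len(docs),
            "x": sum(prices) / len(prices),
            "y": len({doc["brand"] for doc in docs}),
        })
    buckets.sort(key=lambda bucket: bucket[sort_key], reverse=True)
    return {"count": len(domain), "buckets": buckets}


# Made-up documents standing in for the docs matching the query.
shoes = [
    {"shoe_style": "Running", "price": 100.0, "brand": "A"},
    {"shoe_style": "Hiking", "price": 150.0, "brand": "A"},
    {"shoe_style": "Running", "price": 110.0, "brand": "A"},
    {"shoe_style": "Hiking", "price": 120.0, "brand": "B"},
    {"shoe_style": "Running", "price": 90.0, "brand": "C"},
]

result = terms_facet(shoes, "shoe_style")
print(result)
```

Each bucket is itself a domain, so a sub-facet is just the same function applied again to the documents inside a bucket.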
  • 19. Sub-facet example
      Request:
      json.facet={
        shoes : {
          type : terms,
          field : shoe_style,
          sort : {x : desc},
          facet : {
            x : "avg(price)",
            y : "unique(brand)",
            colors : {
              type : terms,
              field : color
            }
          }
        }
      }
      Response:
      "facets": {
        "count" : 472,
        "shoes": {
          "buckets" : [
            {
              "val" : "Hiking",
              "count" : 34,
              "x" : 135.25,
              "y" : 17,
              "colors" : {
                "buckets" : [
                  { "val" : "brown", "count" : 12 },
                  { "val" : "black", "count" : 10 },
                  […]
                ]
              }  // end of colors sub-facet
            },  // end of Hiking bucket
            {
              "val" : "Running",
              "count" : 45,
              "x" : 110.75,
              "y" : 24,
              "colors" : {
                "buckets" : […]
  • 20. BI Use Case
  • 21. [Dashboard mockup]
      Fantasy ($1045)
        Top Authors: $423 George R.R. Martin; $347 Brandon Sanderson; $155 JK Rowling
        Top Books: $252 A Game of Thrones; $113 Emperor of Thorns; $101 Nine Princes in Amber; $82 Steel Heart
      Sci-Fi ($898)
        Top Authors: $321 Iain M Banks; $218 Neal Asher; $155 Neal Stephenson
        Top Books: $113 Gridlinked; $101 Use of Weapons; $93 Snow Crash; $82 The Skinner
      Mystery ($645)
        Top Authors: $191 James Patterson; $145 Patricia Cornwell; $126 John Grisham
        Top Books: $85 One for the Money; $77 Angels & Daemons; $64 Shutter Island; $35 The Firm
      Filter By
        State: $852 NJ (14 stores); $658 NY (11 stores); $421 CT (8 stores)
        Chain: $984 Amazoon (14 stores); $734 Houses&Royalty (9 stores); $387 Books-r-us (7 stores)
        Store: $108 Amazoon Branchburg; $93 Books-r-us Bridgewater; $87 H&R NYC
      Number of Books
        Chain: 201K Houses&Royalty; 183K Amazoon; 98K Books-r-us
        Store: 193K H&R NYC; 77K Books-r-us Bridgewater; 68K Amazoon Branchburg
  • 22. Implementation
      [Diagram: Range Facet (sale_date) -> Terms Facet (genre) -> sum(sales); Range Facet (sale_date) -> Terms Facet (chain) -> sum(sales)]
      date_breakout : {
        type : range,
        field : sale_date,
        start : ...,
        end : ...,
        gap : "+1MONTH",
        facet : {
          top_genres : {
            type : terms,
            field : genre,
            sort : "revenue desc",
            limit : 4,
            facet : {
              revenue : "sum(sales)"
            }
          },
          by_chain : {
            type : terms,
            field : chain,
            facet : {
              revenue : "sum(sales)"
            }
          }
        }
      }
  • 23. Implementation (continued)
      [Dashboard panels repeated from slide 21: Fantasy, Sci-Fi and Mystery, each with Top Authors and Top Books]
      [Diagram: Terms Facet (genre) -> sum(sales); Terms Facet (genre) -> Terms Facet (author) -> sum(sales); Terms Facet (genre) -> Terms Facet (title) -> sum(sales)]
      top_genres : {
        type : terms,
        field : genre,
        facet : {
          rev : "sum(sales)",
          top_authors : {
            type : terms,
            field : author,
            sort : "rev desc",
            limit : 3,
            facet : { rev : "sum(sales)" }
          },
          top_books : {
            type : terms,
            field : title,
            sort : "rev desc",
            limit : 4,
            facet : { rev : "sum(sales)" }
          }
        }
      }
  • 24. Performance
  • 25. Tested Facet Request
      Legacy (stats component & pivot facets):
      facet=true&stats=true
      &stats.field={!tag=stat1+mean=true}field2
      &facet.pivot={!stats=stat1}field1
      &f.field1.limit=10
      JSON Facet API (New Facet Module):
      json.facet={
        f : {
          type : terms,
          field : field1,
          facet : { mean : "avg(field2)" }
        }
      }
  • 26-27. [Slides with no transcribed text]
  • 28. Nested Documents
  • 29. Indexing Nested Documents
      id : book1
      title : The Way of Kings
      author : Brandon Sanderson
          id : book1_review1
          review_author : Yonik
          stars : 5
          comment : A great start to what ...

          id : book1_review2
          review_author : Dan
          stars : 3
          comment : This book was too long

      id : book2
      title : Snow Crash
      author : Neal Stephenson
          id : book2_review1
          review_author : Yonik
          stars : 5
          comment : Ahead of its time ...

      Lucene index view (flat): book1_review1, book1_review2, book1, book2_review1, book2
      • Group indexed as a "block"
          • Atomic
          • Internal document ids contiguous
          • Enables quick and inexpensive joins
  • 30. Indexing Nested Documents (JSON format)
      {
        id : book1, type : book, title : "The Way of Kings", author : "Brandon Sanderson",
        genre : fantasy, pubyear : 2010, publisher : Tor,
        _childDocuments_ : [
          { id : book1_review1, type : review, review_dt : "2015-01-03T14:30:00Z",
            stars : 5, review_author : Yonik,
            comment : "A great start to what looks like an epic series!"
          },
          { id : book1_review2, type : review, review_dt : "2015-03-15T12:00:00Z",
            stars : 3, review_author : Dan,
            comment : "This book was too long."
          }
        ]
      }
  • 31. Block Join Queries
      Find reviews mentioning "epic", limiting to reviews for books published by Tor:
        q=comment:epic
        fq={!child of="type:book"}publisher:Tor
        sort=review_dt desc
      Find books published by Tor with a review mentioning "epic":
        q=publisher:Tor
        fq={!parent which="type:book"}comment:epic
        sort=pubyear desc
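The two queries can be emulated over the flat, block-ordered index from slide 29, where each parent closes the block of its children. The sketch below is a toy model of the semantics, not of Solr's implementation: `blocks`, `child_of` and `parent_which` are hypothetical helpers named after the query parsers, substring matching stands in for full-text search, and book2's publisher is a placeholder value because the slides do not give one.

```python
# Flat index in block order: children first, then the parent that closes the block.
index = [
    {"id": "book1_review1", "type": "review",
     "comment": "A great start to what looks like an epic series!"},
    {"id": "book1_review2", "type": "review", "comment": "This book was too long."},
    {"id": "book1", "type": "book", "publisher": "Tor"},
    {"id": "book2_review1", "type": "review", "comment": "Ahead of its time ..."},
    {"id": "book2", "type": "book", "publisher": "ExamplePress"},  # placeholder value
]


def is_book(doc):
    return doc["type"] == "book"


def is_tor(doc):
    return doc.get("publisher") == "Tor"


def mentions_epic(doc):
    return "epic" in doc.get("comment", "").lower()


def blocks(docs, is_parent):
    """Yield (parent, children) for each contiguous block."""
    children = []
    for doc in docs:
        if is_parent(doc):
            yield doc, children
            children = []
        else:
            children.append(doc)


def child_of(docs, is_parent, parent_matches):
    """Children whose parent matches, like {!child of=...}."""
    return [kid for parent, kids in blocks(docs, is_parent)
            if parent_matches(parent) for kid in kids]


def parent_which(docs, is_parent, child_matches):
    """Parents with at least one matching child, like {!parent which=...}."""
    return [parent for parent, kids in blocks(docs, is_parent)
            if any(child_matches(kid) for kid in kids)]


# Reviews mentioning "epic", limited to reviews of books published by Tor.
epic_tor_reviews = [d["id"] for d in child_of(index, is_book, is_tor) if mentions_epic(d)]

# Books published by Tor that have a review mentioning "epic".
tor_books_with_epic = [d["id"] for d in parent_which(index, is_book, mentions_epic) if is_tor(d)]
```

Because a block is contiguous, one pass over the index is enough to pair every child with its parent, which is what makes the join inexpensive.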
  • 32. Block Join Faceting (child to parent)
      • Find the number of books I (Yonik) reviewed, broken out by genre
      q=review_author:Yonik&
      json.facet={
        genres : {
          type : terms,
          field : genre,
          domain : { blockParent : "type:book" }
        }
      }
  • 33. Block Join Faceting (parent to child)
      • Find the top reviewers for sci-fi and fantasy books
      q=genre:(sci-fi OR fantasy)&
      json.facet={
        top_reviewers : {
          type : terms,
          field : review_author,
          domain : { blockChildren : "type:book" }
        }
      }
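Both faceting directions amount to switching the domain across the block boundary before counting terms. The following sketch shows that idea on the example books; `to_parents` and `to_children` are hypothetical stand-ins for the `blockParent` and `blockChildren` domain changes, and the "sci-fi" genre for book2 is an assumption based on the dashboard slide listing Snow Crash under Sci-Fi.

```python
from collections import Counter

# Flat block-ordered index: reviews precede the book they belong to.
index = [
    {"id": "book1_review1", "type": "review", "review_author": "Yonik"},
    {"id": "book1_review2", "type": "review", "review_author": "Dan"},
    {"id": "book1", "type": "book", "genre": "fantasy"},
    {"id": "book2_review1", "type": "review", "review_author": "Yonik"},
    {"id": "book2", "type": "book", "genre": "sci-fi"},  # genre assumed
]


def is_book(doc):
    return doc["type"] == "book"


def to_parents(docs, child_matches):
    """Map matching children to their parents (the blockParent direction)."""
    parents, hit = [], False
    for doc in docs:
        if is_book(doc):
            if hit:
                parents.append(doc)
            hit = False
        elif child_matches(doc):
            hit = True
    return parents


def to_children(docs, parent_matches):
    """Map matching parents to their children (the blockChildren direction)."""
    children, pending = [], []
    for doc in docs:
        if is_book(doc):
            if parent_matches(doc):
                children.extend(pending)
            pending = []
        else:
            pending.append(doc)
    return children


# Child to parent: books reviewed by Yonik, counted by genre.
genres = Counter(
    book["genre"]
    for book in to_parents(index, lambda d: d.get("review_author") == "Yonik")
)

# Parent to child: reviewers of sci-fi and fantasy books, counted by author.
top_reviewers = Counter(
    review["review_author"]
    for review in to_children(index, lambda d: d.get("genre") in {"sci-fi", "fantasy"})
)
```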
  • 34. Thank you
      yonik@cloudera.com