Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Trey Grainger
Director of Engineering, Search & Recommendations
2015.10.15
About Me
Trey Grainger
Director of Engineering, Search & Recommendations
•  Joined CareerBuilder in 2007 as a Software Engineer
•  MBA, Management of Technology – Georgia Tech
•  BA, Computer Science, Business, & Philosophy – Furman University
•  Mining Massive Datasets (in progress) - Stanford University
Fun outside of CB:
•  Co-author of Solr in Action, plus a handful of research papers
•  Frequent conference speaker
•  Founder of Celiaccess.com, the gluten-free search engine
•  Lucene/Solr contributor
Agenda
•  Introduction
•  Defining the problem – the need for Semantic Search
•  Building an Intent Engine
    - Type-ahead prediction
    - Spelling Correction
    - Entity / Entity-type Resolution
    - Semantic Query Parsing
    - Query Augmentation
    - The Knowledge Graph
•  Conclusion
At CareerBuilder, Solr Powers...
Search by the Numbers
•  100 million+ searches per day
•  30+ software developers, data scientists + analysts
•  500+ search servers
•  1.5 billion+ documents indexed and searchable
•  1 global search technology platform
Powering 50+ Search Experiences Including:
...and many more
What’s the problem we’re trying to solve today?

User’s Query:
machine learning research and development Portland, OR software engineer AND hadoop, java

Traditional Query Parsing:
(machine AND learning AND research AND development AND portland)
 OR (software AND engineer AND hadoop AND java)

Semantic Query Parsing:
"machine learning" AND "research and development" AND "Portland, OR"
AND "software engineer" AND hadoop AND java

Semantically Expanded Query:
("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
AND ("research and development"^10 OR "r&d")
AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
AND ("software engineer"^10 OR "software developer")
AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
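The expansion step can be sketched in a few lines of Python. This is a minimal illustration, not CareerBuilder's implementation: the `EXPANSIONS` table, the `quote` and `expand` helpers, and the fixed boost of 10 are assumptions that simply mirror the example query, and the geo filter is left out.

```python
# Hypothetical expansion table: each recognized phrase maps to related terms.
# The entries mirror the slide's example; a real system would derive them from data.
EXPANSIONS = {
    "machine learning": ["data scientist", "data mining", "artificial intelligence"],
    "research and development": ["r&d"],
    "software engineer": ["software developer"],
    "hadoop": ["big data", "hbase", "hive"],
    "java": ["j2ee"],
}


def quote(term):
    """Quote any term that is not a single alphanumeric word, so it matches as a phrase."""
    return term if term.isalnum() else f'"{term}"'


def expand(phrases, boost=10):
    """Build a boolean query: the original phrase is boosted, related terms are OR'ed in."""
    clauses = []
    for phrase in phrases:
        alternatives = [f"{quote(phrase)}^{boost}"]
        alternatives += [quote(t) for t in EXPANSIONS.get(phrase, [])]
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses)


query = expand(["machine learning", "software engineer", "hadoop", "java"])
print(query)
```

Boosting the user's own phrase keeps exact matches on top while the related terms widen recall.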
  
But we also really want “things”, not “strings”…
Job Level, Job Title, Company
Job Title, Company, School + Degree
Knowledge Graph and Intent Engine
[Diagram labels: Search Box; Intent Engine; Type-ahead Prediction; Spelling Correction; Entity / Entity Type Resolution; Semantic Query Parsing; Query Augmentation; Knowledge Graph; Relevancy Engine (“re-expressing intent”); Query Re-writing; Machine-learned Ranking; Search Results; User Feedback (Clarifying Intent)]
Type-ahead Predictions
Semantic Autocomplete
•  Shows top terms for any search
•  Breaks out job titles, skills, companies, related keywords, and other categories
•  Understands abbreviations, alternate forms, misspellings
•  Supports full Boolean syntax and multi-term autocomplete
•  Enables fielded search on entities, not just keywords
Spelling Correction*
*Google “Solr Spell Check Component”
Entity / Entity-type Resolution
Differentiating related terms
Synonyms:          cpa     =>  certified public accountant
                   rn      =>  registered nurse
                   r.n.    =>  registered nurse
Ambiguous Terms*:  driver  =>  driver (trucking)  ~80% likelihood
                   driver  =>  driver (software)  ~20% likelihood
Related Terms:     r.n.    =>  nursing, bsn
                   hadoop  =>  mapreduce, hive, pig
*differentiated based upon user and query context
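One way to picture context-based differentiation of an ambiguous term is to start from the prior likelihoods and reweight them by the other terms in the query. The sketch below is only an illustration under stated assumptions: the priors come from the slide, while the context term sets, the `CONTEXT_BOOST` multiplier, and the `resolve` function are invented here and are not the method used in the talk.

```python
# Prior likelihoods from the slide; context terms and the boost are invented for illustration.
SENSES = {
    "driver (trucking)": {"prior": 0.8, "context": {"cdl", "truck", "delivery"}},
    "driver (software)": {"prior": 0.2, "context": {"linux", "kernel", "c++"}},
}
CONTEXT_BOOST = 10.0  # assumed multiplier applied once per matching context term


def resolve(other_query_terms):
    """Return normalized sense likelihoods given the other terms in the query."""
    terms = {t.lower() for t in other_query_terms}
    raw = {
        sense: info["prior"] * CONTEXT_BOOST ** len(terms & info["context"])
        for sense, info in SENSES.items()
    }
    total = sum(raw.values())
    return {sense: weight / total for sense, weight in raw.items()}


print(resolve([]))                   # no context: falls back to the priors
print(resolve(["Linux", "kernel"]))  # software context outweighs the trucking prior
```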
  
	
  
Building a Taxonomy of Entities
Many ways to generate this:
•  Topic Modelling
•  Clustering of documents
•  Statistical Analysis of interesting phrases
•  Buy a dictionary (often doesn’t work for domain-specific search problems)
•  …
Our strategy:
Generate a model of domain-specific phrases by mining query logs for commonly searched phrases within the domain [1]
[1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
Entity-type Recognition
Build classifiers trained on external data sources (Wikipedia, DBPedia, WordNet, etc.), as well as from our own domain.
The subject for a future talk / research paper…
[Figure labels, entities: java developer; registered nurse; emergency room; director; Portland, OR; part-time]
[Figure labels, entity types: job title; skill; job level; location; work type]
Semantic Query Parsing
Query Parsing: The whole is greater than the sum of the parts
project manager     vs.  "project" AND "manager"
building architect  vs.  "building" AND "architect"
software architect  vs.  "software" AND "architect"

Consider:  a "software architect" designs and builds software
           a "building architect" uses software to design architecture
User’s Query:
machine learning research and development Portland, OR software engineer AND hadoop java
≠
Traditional Query Parsing:
(machine AND learning AND research AND development AND portland)
 OR (software AND engineer AND hadoop AND java)
Identifying the correct phrase (not just the parts) is crucial here!
Probabilistic Query Parser
Goal: given a query, predict which combinations of keywords should be combined together as phrases
Example:
senior java developer hadoop
Possible Parsings:
senior, java, developer, hadoop
"senior java", developer, hadoop
"senior java developer", hadoop
"senior java developer hadoop"
"senior java", "developer hadoop"
senior, "java developer", hadoop
senior, java, "developer hadoop"
Input: senior hadoop developer java ruby on rails perl
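The candidate parsings can be enumerated mechanically and then ranked. The sketch below is a toy under stated assumptions: the `PHRASE_PROB` values and the `UNSEEN` floor are invented, and scoring a parsing as the product of its phrase probabilities is a stand-in, since the slides do not describe the actual model. For the four-word example it generates all eight contiguous segmentations, which include the seven listed above.

```python
from itertools import combinations
from math import prod

# Hypothetical phrase probabilities, e.g. estimated from query logs.
# These numbers are invented for illustration only.
PHRASE_PROB = {
    "senior": 0.05, "java": 0.08, "developer": 0.07, "hadoop": 0.03,
    "java developer": 0.04, "senior java developer": 0.01,
}
UNSEEN = 1e-6  # assumed floor for phrases never observed


def parsings(tokens):
    """Yield every way to split tokens into contiguous phrases."""
    n = len(tokens)
    for k in range(n):
        # Choose k cut points among the n - 1 gaps between tokens.
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [" ".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]


def score(parsing):
    """Score a parsing as the product of its phrase probabilities."""
    return prod(PHRASE_PROB.get(p, UNSEEN) for p in parsing)


tokens = "senior java developer hadoop".split()
candidates = list(parsings(tokens))
best = max(candidates, key=score)
print(len(candidates), best)
```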
Semantic Search Architecture – Query Parsing
1) Generate the previously discussed taxonomy of domain-specific phrases
   •  You can mine query logs or actual text of documents for significant phrases within your domain [1]
2) Feed these phrases to SolrTextTagger [2] (uses Lucene FST for high-throughput term lookups)
3) Use SolrTextTagger to perform entity extraction on incoming queries (tagging documents is also possible)
4) Also invoke probabilistic parser to dynamically identify unknown phrases from a corpus of data (language model)
5) Shown on next slides: Pass extracted entities to a Query Augmentation phase to rewrite the query with enhanced semantic understanding
[1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
[2] https://github.com/OpenSextant/SolrTextTagger
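The tagging step (steps 2 and 3) amounts to finding known phrases inside an incoming query. The sketch below shows the idea with a greedy longest-match lookup against a plain Python dictionary. It does not call SolrTextTagger and does not use an FST; the `KNOWN_PHRASES` table and the `tag` function are hypothetical stand-ins.

```python
# Hypothetical phrase dictionary mapping known phrases to entity types.
KNOWN_PHRASES = {
    "machine learning": "skill",
    "software engineer": "job title",
    "registered nurse": "job title",
    "java": "skill",
    "hadoop": "skill",
}
MAX_WORDS = max(len(p.split()) for p in KNOWN_PHRASES)


def tag(query):
    """Greedy longest-match tagging: returns (phrase, entity_type or None) pairs."""
    tokens = query.lower().split()
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at position i first, then shorter ones.
        for size in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if candidate in KNOWN_PHRASES:
                tagged.append((candidate, KNOWN_PHRASES[candidate]))
                i += size
                break
        else:
            # No known phrase starts here; keep the token untagged.
            tagged.append((tokens[i], None))
            i += 1
    return tagged


print(tag("machine learning software engineer hadoop java perl"))
```

Untagged tokens such as `perl` are the ones a probabilistic parser (step 4) would still need to examine.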
Query Augmentation
Semantic Search Architecture – Query Augmentation
[Diagram labels: Search Behavior, Application Behavior, etc.; Job Title Classifier, Skills Extractor, Job Level Classifier, etc.; Semantic Query Augmentation; Known keyword phrases (java developer, machine learning, registered nurse); FST; Knowledge Graph in +; Query Enrichment; Document Enrichment]

Keywords: machine learning

Related Phrases
machine learning:
 { data mining .9,
   matlab .8,
   data scientist .75,
   artificial intelligence .7,
   neural networks .55 }

Common Job Titles
machine learning:
 { software engineer .65,
   data manager .3,
   data scientist .25,
   hadoop engineer .2 }

Related Occupations
machine learning:
 { 15-1031.00  .58  Computer Software Engineers, Applications
   15-1011.00  .55  Computer and Information Scientists, Research
   15-1032.00  .52  Computer Software Engineers, Systems Software }

Modified Query:
keywords:((machine learning)^10 OR
 { AT_LEAST_2: ("data mining"^0.9, matlab^0.8,
   "data scientist"^0.75, "artificial intelligence"^0.7,
   "neural networks"^0.55)) }
 { BOOST_TO_TOP: ( job_title:(
   "software engineer" OR "data manager" OR
   "data scientist" OR "hadoop engineer")) }
Knowledge Graph
Serves as a “data science toolkit” API that allows dynamically navigating and pivoting through
multiple levels of relationships between items in our domain. Compare the relationships of skills to
keywords, job titles to skills to keywords, skills to government occupation codes, skills to experience
level, etc.
	
  
Knowledge Graph API
Core similarity engine, exposed via API
Any product can leverage our core relationship scoring engine to score any list of entities against any other list
Full domain support
Keywords, job titles, skills, companies, job levels, locations, and all other taxonomies.
Intersections, overlaps, & relationship scoring, many levels deep
Users can either provide a list of items to score, or else have the system dynamically discover the most related items (or both).
So how does it work?
Foreground vs. Background Analysis
Every term is scored against its context. The more commonly the term appears within its foreground context versus its background context, the more relevant it is to the specified foreground context.
countFG(x) - totalDocsFG * probBG(x)
z = --------------------------------------------------------
sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
{ "type":"keywords", "values":[
{ "value":"hive", "relatedness":0.9773, "popularity":369 },
{ "value":"java", "relatedness":0.9236, "popularity":15653 },
{ "value":".net", "relatedness":0.5294, "popularity":17683 },
{ "value":"bee", "relatedness":0.0, "popularity":0 },
{ "value":"teacher", "relatedness":-0.2380, "popularity":9923 },
{ "value":"registered nurse", "relatedness":-0.3802, "popularity":27089 } ] }
We are essentially boosting terms which are more related to some known feature
(and ignoring terms which are equally likely to appear in the background corpus)
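The z-score formula can be written directly as a small function. In the sketch below the parameter names follow the formula (`countFG`, `totalDocsFG`, `probBG`), while the example counts are invented and the background probability is assumed to be strictly between 0 and 1. The slides do not say how the raw z-score is mapped onto the relatedness values shown in the response, so that step is left out.

```python
from math import sqrt


def z_score(count_fg, total_docs_fg, count_bg, total_docs_bg):
    """z-score of a term's foreground count against its background probability."""
    prob_bg = count_bg / total_docs_bg        # probBG(x)
    expected = total_docs_fg * prob_bg        # count expected if the term were unrelated
    return (count_fg - expected) / sqrt(total_docs_fg * prob_bg * (1 - prob_bg))


# Invented counts: 1,000 foreground documents out of a 1,000,000-document background.
related = z_score(count_fg=300, total_docs_fg=1000, count_bg=2000, total_docs_bg=1_000_000)
neutral = z_score(count_fg=10, total_docs_fg=1000, count_bg=10_000, total_docs_bg=1_000_000)
unrelated = z_score(count_fg=1, total_docs_fg=1000, count_bg=50_000, total_docs_bg=1_000_000)
print(related, neutral, unrelated)
```

A term that is over-represented in the foreground scores strongly positive, one that is as common as in the background scores near zero, and an under-represented one scores negative.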
Foreground Query: "Hadoop" (+ / - mark the positive and negative ends of the relatedness scale)
Knowledge Graph – Potential Use Cases
Cross-walk between Types
•  Have an ID field, but want to enable free text search on the most associated entity with that ID?
•  Have a “state” (geo) search box, but want to accept any free-text location and map it to the right state?
•  Have an old classification taxonomy and want to know how the values from the old system now map into the new values?
Build User Profiles from Search Logs
•  If someone searches for “Java”, and then “JQuery”, and then “CSS”, and then “JSP”, what do those have in common?
•  What if they search for “Java”, and then “C++”, and then “Assembly”?
Discover Relationships Between Anything
•  If I want to become a data scientist and know Python, what libraries should I learn?
•  If my last job was mid-level software engineer and my current job is Engineering Lead, what are my most likely next roles?
Traverse arbitrarily deep, Sort on anything
•  Build an instant co-occurrence matrix, sort the top values by their relatedness, and then add in any number of additional dimensions (RAM permitting).
Data Cleansing
•  Have dirty taxonomies and need to figure out which items don’t belong?
•  Need to understand the conceptual cohesion of a document (vs spammy or off-topic content)?
2014-2015 Publications & Presentations
Books:
Solr in Action - A comprehensive guide to implementing scalable search using Apache Solr
Research papers:
●  Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific jargon - 2014
●  Towards a Job title Classification System - 2014
●  Augmenting Recommendation Systems Using a Model of Semantically-related Terms
Extracted from User Behavior - 2014
●  sCooL: A system for academic institution name normalization - 2014
●  PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems - 2014
●  SKILL: A System for Skill Identification and Normalization – 2015
●  Carotene: A Job Title Classification System for the Online Recruitment Domain - 2015
●  WebScalding: A Framework for Big Data Web Services - 2015
●  A Pipeline for Extracting and Deduplicating Domain-Specific Knowledge Bases - 2015
●  Macau: Large-Scale Skill Sense Disambiguation in the Online Recruitment Domain - 2015
●  Improving the Quality of Semantic Relationships Extracted from Massive User Behavioral Data – 2015
●  Query Sense Disambiguation Leveraging Large Scale User Behavioral Data - 2015
Speaking Engagements:
●  Over a dozen in the last year: Lucene/Solr Revolution 2014, WSDM 2014, Atlanta Solr Meetup, Atlanta Big Data Meetup, Second International Symposium on Big Data and Data Analytics, RecSys 2014, IEEE Big Data Conference 2014 (x2), AAAI/IAAI 2015, IEEE Big Data 2015 (x6), Lucene/Solr Revolution 2015
So What’s Next?
[The Semantic Search Architecture – Query Augmentation diagram from the earlier slide is repeated here.]
This Piece: How do you construct the best possible queries?
The answer… Learning to Rank (Machine-learned Ranking)
That can be a topic for next time…
Additional References:
Contact Info
Yes, WE ARE HIRING @ CareerBuilder. Come talk with me if you are interested…
Trey Grainger
trey.grainger@careerbuilder.com
@treygrainger
http://solrinaction.com
Conference discount (43% off): lusorevcftw
Other presentations:
http://www.treygrainger.com

Contenu connexe

Tendances

Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systemsTrey Grainger
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Trey Grainger
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Lucidworks
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comSimon Hughes
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemTrey Grainger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Lucidworks
 
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Lucidworks
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchTrey Grainger
 
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...Lucidworks
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Lucidworks
 

Tendances (20)

Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Haystacks slides
Haystacks slidesHaystacks slides
Haystacks slides
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
 
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered Search
 
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
 

Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by Trey Grainger, CareerBuilder

  • 1. Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine Trey Grainger Director of Engineering, Search & Recommendations 2015.10.15
  • 2. About Me: Trey Grainger, Director of Engineering, Search & Recommendations
      • Joined CareerBuilder in 2007 as a Software Engineer
      • MBA, Management of Technology – Georgia Tech
      • BA, Computer Science, Business, & Philosophy – Furman University
      • Mining Massive Datasets (in progress) - Stanford University
      Fun outside of CB:
      • Co-author of Solr in Action, plus a handful of research papers
      • Frequent conference speaker
      • Founder of Celiaccess.com, the gluten-free search engine
      • Lucene/Solr contributor
  • 3. Agenda
      • Introduction
      • Defining the problem – the need for Semantic Search
      • Building an Intent Engine
          - Type-ahead prediction
          - Spelling Correction
          - Entity / Entity-type Resolution
          - Semantic Query Parsing
          - Query Augmentation
          - The Knowledge Graph
      • Conclusion
  • 4. At CareerBuilder, Solr Powers...
  • 5. Search by the Numbers
      Powering 50+ Search Experiences, including:
      • 100 million+ searches per day
      • 30+ Software Developers, Data Scientists + Analysts
      • 500+ Search Servers
      • 1.5 billion+ documents indexed and searchable
      • 1 Global Search Technology platform
      ...and many more
  • 6. What’s the problem we’re trying to solve today?
      User’s Query:
        machine learning research and development Portland, OR software engineer AND hadoop, java
      Traditional Query Parsing:
        (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java)
      Semantic Query Parsing:
        "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java
      Semantically Expanded Query:
        ("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
        AND ("research and development"^10 OR "r&d")
        AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
        AND ("software engineer"^10 OR "software developer")
        AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
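The step from traditional to semantic parsing on slide 6 can be sketched roughly as follows. This is a minimal illustration, not the parser used at CareerBuilder: it assumes a small in-memory phrase dictionary and a greedy longest-match rule. The names `KNOWN_PHRASES`, `EXPANSIONS`, `segment`, `to_query`, and `expand` are hypothetical. Terms are lower-cased for matching, and the geo filter clause from the slide is left out.

```python
# Hypothetical dictionaries, populated from the slide's own example.
KNOWN_PHRASES = {
    "machine learning",
    "research and development",
    "portland, or",
    "software engineer",
}
EXPANSIONS = {
    "machine learning": ["data scientist", "data mining", "artificial intelligence"],
    "research and development": ["r&d"],
    "software engineer": ["software developer"],
    "hadoop": ["big data", "hbase", "hive"],
    "java": ["j2ee"],
}
OPERATORS = {"and", "or"}
MAX_PHRASE_WORDS = 3  # assumption: longest phrase in the dictionary


def segment(query):
    """Greedy longest-match split of a query into known phrases and single terms."""
    words = query.split()
    terms, i = [], 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).lower().rstrip(",")
            if n == 1 or candidate in KNOWN_PHRASES:
                # Bare Boolean operators are dropped; phrases containing
                # "or"/"and" (e.g. "portland, or") were already matched above.
                if candidate and candidate not in OPERATORS:
                    terms.append(candidate)
                i += n
                break
    return terms


def quote(term):
    """Quote multi-word terms so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def to_query(terms):
    """Semantic query: every recognized phrase or term is required."""
    return " AND ".join(quote(t) for t in terms)


def expand(terms):
    """Expanded query: boost the original term, OR in related terms."""
    clauses = []
    for t in terms:
        related = EXPANSIONS.get(t, [])
        if related:
            alternatives = " OR ".join(quote(r) for r in related)
            clauses.append(f"({quote(t)}^10 OR {alternatives})")
        else:
            clauses.append(quote(t))
    return " AND ".join(clauses)


terms = segment(
    "machine learning research and development Portland, OR "
    "software engineer AND hadoop, java"
)
print(to_query(terms))
print(expand(terms))
```

The greedy rule tries the longest candidate first, which is why "Portland, OR" is recognized as a location phrase before the bare "OR" can be read as an operator.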
  • 7. But we also really want “things”, not “strings”…
      [Figure labels: Job Level, Job Title, Company, School + Degree]
  • 8. Knowledge Graph and Intent Engine
      [Diagram labels: Search Box; Intent Engine: Type-ahead Prediction, Spelling Correction, Entity / Entity Type Resolution, Semantic Query Parsing, Query Augmentation, Knowledge Graph; Relevancy Engine (“re-expressing intent”); Machine-learned Ranking; Query Re-writing; Search Results; User Feedback (Clarifying Intent)]
  • 10. Semantic Autocomplete
      • Shows top terms for any search
      • Breaks out job titles, skills, companies, related keywords, and other categories
      • Understands abbreviations, alternate forms, misspellings
      • Supports full Boolean syntax and multi-term autocomplete
      • Enables fielded search on entities, not just keywords
  • 11. Spelling Correction*
      *Google “Solr Spell Check Component”
  • 12.
  • 13. Entity / Entity-type Resolution
  • 14. Differentiating related terms
      Synonyms:
        cpa   => certified public accountant
        rn    => registered nurse
        r.n.  => registered nurse
      Ambiguous Terms*:
        driver => driver (trucking)  ~80% likelihood
        driver => driver (software)  ~20% likelihood
      Related Terms:
        r.n.   => nursing, bsn
        hadoop => mapreduce, hive, pig
      *differentiated based upon user and query context
  • 15. Building a Taxonomy of Entities
      Many ways to generate this:
      • Topic Modelling
      • Clustering of documents
      • Statistical Analysis of interesting phrases
      • Buy a dictionary (often doesn’t work for domain-specific search problems)
      • …
      Our strategy: Generate a model of domain-specific phrases by mining query logs for commonly searched phrases within the domain [1]
      [1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
  • 16. Entity-type Recognition
      Build classifiers trained on external data sources (Wikipedia, DBPedia, WordNet, etc.), as well as from our own domain. The subject for a future talk / research paper…
      [Diagram: example terms matched to entity types]
      Example terms: java developer; registered nurse; emergency room; director; Portland, OR; part-time
      Entity types: job title; skill; job level; location; work type
  • 18. Query Parsing: The whole is greater than the sum of the parts
      project manager     vs.  "project" AND "manager"
      building architect  vs.  "building" AND "architect"
      software architect  vs.  "software" AND "architect"
      Consider:
        a "software architect" designs and builds software
        a "building architect" uses software to design architecture
      User’s Query:
        machine learning research and development Portland, OR software engineer AND hadoop java
      ≠
      Traditional Query Parsing:
        (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java)
      Identifying the correct phrase (not just the parts) is crucial here!
  • 19.
  • 20. Probabilistic Query Parser
      Goal: given a query, predict which combinations of keywords should be combined together as phrases
      Example: senior java developer hadoop
      Possible Parsings:
        senior, java, developer, hadoop
        "senior java", developer, hadoop
        "senior java developer", hadoop
        "senior java developer hadoop"
        "senior java", "developer hadoop"
        senior, "java developer", hadoop
        senior, java, "developer hadoop"
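A minimal Python sketch of the idea on this slide, under stated assumptions: it enumerates every way of grouping adjacent query terms into phrases (for four terms there are 2^3 = 8 such groupings) and picks the highest-scoring one. The function names, the `phrase_prob` table, and the product-of-probabilities scoring are hypothetical illustrations; the slides do not describe the actual model behind the parser.

```python
from itertools import product


def candidate_parsings(query):
    """Return every split of the query into phrases of adjacent terms."""
    tokens = query.split()
    if not tokens:
        return []
    parsings = []
    # Each gap between adjacent tokens is either a phrase boundary or not.
    for breaks in product([True, False], repeat=len(tokens) - 1):
        phrases, current = [], [tokens[0]]
        for token, is_break in zip(tokens[1:], breaks):
            if is_break:
                phrases.append(" ".join(current))
                current = [token]
            else:
                current.append(token)
        phrases.append(" ".join(current))
        parsings.append(phrases)
    return parsings


def best_parsing(query, phrase_prob, unknown=1e-6):
    """Pick the parsing whose phrases have the highest joint probability.

    phrase_prob is a hypothetical phrase -> probability table (for example,
    estimated from query logs); unseen phrases get a small default value.
    """
    def score(parsing):
        result = 1.0
        for phrase in parsing:
            result *= phrase_prob.get(phrase, unknown)
        return result

    return max(candidate_parsings(query), key=score)


# Made-up probabilities, for illustration only.
phrase_prob = {
    "senior": 0.05, "java": 0.04, "developer": 0.05, "hadoop": 0.03,
    "java developer": 0.02, "senior java developer": 0.0005,
}
print(best_parsing("senior java developer hadoop", phrase_prob))
```

Exhaustive enumeration grows exponentially with query length, so a real parser would likely prune candidates or use dynamic programming; the sketch only shows the search space.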
  • 21. Input: senior hadoop developer java ruby on rails perl
  • 22. Semantic Search Architecture – Query Parsing
      1) Generate the previously discussed taxonomy of domain-specific phrases
          • You can mine query logs or actual text of documents for significant phrases within your domain [1]
      2) Feed these phrases to SolrTextTagger [2] (uses Lucene FST for high-throughput term lookups)
      3) Use SolrTextTagger to perform entity extraction on incoming queries (tagging documents is also possible)
      4) Also invoke probabilistic parser to dynamically identify unknown phrases from a corpus of data (language model)
      5) Shown on next slides: Pass extracted entities to a Query Augmentation phase to rewrite the query with enhanced semantic understanding
      [1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
      [2] https://github.com/OpenSextant/SolrTextTagger
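To make the tagging step concrete, here is a small Python sketch that tags known phrases in a query using greedy longest-match lookup against an in-memory dictionary. This is a stand-in for illustration only: it does not call SolrTextTagger or use a Lucene FST, and the function name, the `known_phrases` table, and its entity types are assumptions.

```python
def tag_entities(query, known_phrases, max_len=5):
    """Greedy longest-match tagging of known phrases in a query.

    known_phrases maps a lowercase phrase to a (hypothetical) entity type.
    Returns (phrase, entity_type, start_token, end_token) tuples.
    """
    tokens = query.lower().split()
    tags = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so "machine learning" beats "machine".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in known_phrases:
                match = (candidate, known_phrases[candidate], i, i + n)
                break
        if match:
            tags.append(match)
            i = match[3]  # continue after the matched phrase
        else:
            i += 1  # unknown token: leave it for the probabilistic parser
    return tags


# Hypothetical taxonomy entries, e.g. mined from query logs.
known_phrases = {
    "machine learning": "skill",
    "software engineer": "job_title",
    "registered nurse": "job_title",
    "hadoop": "skill",
    "java": "skill",
}
for tag in tag_entities("machine learning software engineer hadoop java", known_phrases):
    print(tag)
```

Tokens that match nothing are skipped here; in the architecture on the slide they would be handed to the probabilistic parser in step 4.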
  • 24. Semantic Search Architecture – Query Augmentation
      Keywords: machine learning
      Known keyword phrases (FST + Knowledge Graph): java developer, machine learning, registered nurse
      Inputs: Search Behavior, Application Behavior, etc.; Job Title Classifier, Skills Extractor, Job Level Classifier, etc.
      Related Phrases
        machine learning: { data mining .9, matlab .8, data scientist .75, artificial intelligence .7, neural networks .55 }
      Common Job Titles
        machine learning: { software engineer .65, data manager .3, data scientist .25, hadoop engineer .2 }
      Related Occupations
        machine learning: {
          15-1031.00  .58  Computer Software Engineers, Applications
          15-1011.00  .55  Computer and Information Scientists, Research
          15-1032.00  .52  Computer Software Engineers, Systems Software }
      Semantic Query Augmentation
      Modified Query:
        keywords:((machine learning)^10 OR
          { AT_LEAST_2: ("data mining"^0.9, matlab^0.8, "data scientist"^0.75, "artificial intelligence"^0.7, "neural networks"^0.55) })
        { BOOST_TO_TOP: ( job_title:( "software engineer" OR "data manager" OR "data scientist" OR "hadoop engineer")) }
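The augmentation step can be sketched in Python as follows, assuming the related phrases and weights are already available as a dictionary. The sketch emits a plain boosted OR clause; the AT_LEAST_2 and BOOST_TO_TOP operators on the slide are pseudo-syntax and are not reproduced. The function names and the `min_weight` cutoff are hypothetical.

```python
def quote(term):
    """Wrap multi-word terms in quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def augment(keyword, related, original_boost=10, min_weight=0.5):
    """Expand a keyword with related phrases, boosted by relatedness weight.

    related maps phrase -> weight; phrases below min_weight are dropped.
    The original keyword keeps the highest boost so exact matches rank first.
    """
    clauses = [f"{quote(keyword)}^{original_boost}"]
    ranked = sorted(related.items(), key=lambda item: item[1], reverse=True)
    for phrase, weight in ranked:
        if weight >= min_weight:
            clauses.append(f"{quote(phrase)}^{weight}")
    return "(" + " OR ".join(clauses) + ")"


# Weights copied from the "Related Phrases" box on the slide.
related = {
    "data mining": 0.9,
    "matlab": 0.8,
    "data scientist": 0.75,
    "artificial intelligence": 0.7,
    "neural networks": 0.55,
}
print(augment("machine learning", related, min_weight=0.7))
```

Keeping the original term at a much higher boost than any expansion term mirrors the `^10` versus sub-1.0 weights in the slide's modified query.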
  • 29. Knowledge Graph API
      Serves as a “data science toolkit” API that allows dynamically navigating and pivoting through multiple levels of relationships between items in our domain. Compare the relationships of skills to keywords, job titles to skills to keywords, skills to government occupation codes, skills to experience level, etc.
      • Core similarity engine, exposed via API: Any product can leverage our core relationship scoring engine to score any list of entities against any other list
      • Full domain support: Keywords, job titles, skills, companies, job levels, locations, and all other taxonomies.
      • Intersections, overlaps, & relationship scoring, many levels deep: Users can either provide a list of items to score, or else have the system dynamically discover the most related items (or both).
  • 30. So how does it work? Foreground vs. Background Analysis
      Every term is scored against its context. The more commonly the term appears within its foreground context versus its background context, the more relevant it is to the specified foreground context.
      z = (countFG(x) - totalDocsFG * probBG(x)) / sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
      Foreground Query: "Hadoop"
      {
        "type":"keywords",
        "values":[
          { "value":"hive", "relatedness":0.9773, "popularity":369 },
          { "value":"java", "relatedness":0.9236, "popularity":15653 },
          { "value":".net", "relatedness":0.5294, "popularity":17683 },
          { "value":"bee", "relatedness":0.0, "popularity":0 },
          { "value":"teacher", "relatedness":-0.2380, "popularity":9923 },
          { "value":"registered nurse", "relatedness":-0.3802, "popularity":27089 }
        ]
      }
      We are essentially boosting terms which are more related to some known feature (and ignoring terms which are equally likely to appear in the background corpus).
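The z-score on this slide can be computed directly. The Python sketch below follows the slide's formula term by term; the function and parameter names are my own. Note that the "relatedness" values in the sample response lie between -1 and 1, so the API presumably normalizes the raw z-score in some way the slide does not show; that step is not attempted here.

```python
import math


def relatedness_z(count_fg, total_docs_fg, prob_bg):
    """z-score of a term's foreground count against its background rate.

    count_fg:      foreground documents containing the term (countFG(x))
    total_docs_fg: total documents in the foreground set (totalDocsFG)
    prob_bg:       probability of the term in the background corpus (probBG(x))
    """
    expected = total_docs_fg * prob_bg
    variance = total_docs_fg * prob_bg * (1 - prob_bg)
    if variance == 0:
        # The term never (or always) appears in the background: no signal.
        return 0.0
    return (count_fg - expected) / math.sqrt(variance)


# A term in 60 of 100 foreground docs, with a 50% background rate.
print(relatedness_z(60, 100, 0.5))
# The same term in only 40 of 100 foreground docs scores negative.
print(relatedness_z(40, 100, 0.5))
```

A positive score means the term is over-represented in the foreground set, a negative score means it is under-represented, and a score near zero means it is about as common as in the background corpus.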
  • 31. Knowledge Graph – Potential Use Cases
      Cross-walk between Types
      • Have an ID field, but want to enable free text search on the most associated entity with that ID?
      • Have a “state” (geo) search box, but want to accept any free-text location and map it to the right state?
      • Have an old classification taxonomy and want to know how the values from the old system now map into the new values?
      Build User Profiles from Search Logs
      • If someone searches for “Java”, and then “JQuery”, and then “CSS”, and then “JSP”, what do those have in common?
      • What if they search for “Java”, and then “C++”, and then “Assembly”?
      Discover Relationships Between Anything
      • If I want to become a data scientist and know Python, what libraries should I learn?
      • If my last job was mid-level software engineer and my current job is Engineering Lead, what are my most likely next roles?
      Traverse arbitrarily deep, Sort on anything
      • Build an instant co-occurrence matrix, sort the top values by their relatedness, and then add in any number of additional dimensions (RAM permitting).
      Data Cleansing
      • Have dirty taxonomies and need to figure out which items don’t belong?
      • Need to understand the conceptual cohesion of a document (vs spammy or off-topic content)?
  • 32. 2014-2015 Publications & Presentations
      Books:
      ● Solr in Action - A comprehensive guide to implementing scalable search using Apache Solr
      Research papers:
      ● Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific jargon - 2014
      ● Towards a Job title Classification System - 2014
      ● Augmenting Recommendation Systems Using a Model of Semantically-related Terms Extracted from User Behavior - 2014
      ● sCooL: A system for academic institution name normalization - 2014
      ● PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems - 2014
      ● SKILL: A System for Skill Identification and Normalization - 2015
      ● Carotene: A Job Title Classification System for the Online Recruitment Domain - 2015
      ● WebScalding: A Framework for Big Data Web Services - 2015
      ● A Pipeline for Extracting and Deduplicating Domain-Specific Knowledge Bases - 2015
      ● Macau: Large-Scale Skill Sense Disambiguation in the Online Recruitment Domain - 2015
      ● Improving the Quality of Semantic Relationships Extracted from Massive User Behavioral Data - 2015
      ● Query Sense Disambiguation Leveraging Large Scale User Behavioral Data - 2015
      Speaking Engagements:
      ● Over a dozen in the last year: Lucene/Solr Revolution 2014, WSDM 2014, Atlanta Solr Meetup, Atlanta Big Data Meetup, Second International Symposium on Big Data and Data Analytics, RecSys 2014, IEEE Big Data Conference 2014 (x2), AAAI/IAAI 2015, IEEE Big Data 2015 (x6), Lucene/Solr Revolution 2015
  • 34. Semantic Search Architecture – Query Augmentation
      [Query augmentation diagram repeated from slide 24]
      This Piece: How do you construct the best possible queries?
      The answer… Learning to Rank (Machine-learned Ranking)
      That can be a topic for next time…
  • 35. Knowledge Graph and Intent Engine
      [Architecture diagram repeated from slide 8]
  • 37. Contact Info
      Yes, WE ARE HIRING @ [logo]. Come talk with me if you are interested…
      Trey Grainger
      trey.grainger@careerbuilder.com
      @treygrainger
      http://solrinaction.com
      Conference discount (43% off): lusorevcftw
      Other presentations: http://www.treygrainger.com