Semtech bizsemanticsearchtutorial
1. • Barbara Starr
– Basics of What semantic search is, what tools
and techniques are used
• Bill Slawski
– Strategy for SEO
– Case based examples and analysis
2. • Pursued a doctorate in Artificial Intelligence from
South Africa in the 80's.
• Recruited to build intelligent/predictive trading
systems on Wall Street
• Migrated to government-based contracts, several
of which turned into real world products like
– Siri (PAL from DARPA)
– Watson (AQUAINT; IBM Watson Labs was a team member)
• From the vantage of a semantic technologist, I
keenly watched the evolution of the Semantic Web.
• “Shocked into the real world” when working as a
consultant @ Overstock.
– RDFa on 900,000 item pages 2 days before Google adopted it
– UPC and identifier “miner”
• Today – Consultant for companies such as GS1
US, Columnist, Strategist, …
5. • Based on concept of “citations” and very easily gamed
• Probabilistic or Statistical (Not Symbolic)
• Keyword Based Search Engine (Not Concept Based or
Ontology Based)
• “Link juice”?
• Other odd vernacular that became standard jargon in the “SEO” community
6. SIRI
“Amazing fact: same amount
of computing to answer one
Google Search query as all the
computing done –
in flight and on the ground
-- for the entire Apollo program!”
“Moore's law is the observation
that, over the history of
computing hardware, the
number of transistors in a
dense integrated circuit doubles
approximately every two years.”
Source: Wikipedia
7. “A new form of Web
content that is meaningful
to computers will unleash a
revolution of new
possibilities”
• Tim Berners Lee
• James Hendler
• Ora Lassila
http://www.cs.umd.edu/~golbeck/LBSC690/SemanticWeb.html
8. What they want
When they want it (Now)
Accurate (Reliable & Informative)
Available
Search engines must satisfy consumer needs, else:
9.
10. “Def. Semantic Search is any retrieval method where
– User intent and resources are represented in a semantic model
• A set of concepts or topics that generalize over tokens/phrases
• Additional structure such as a hierarchy among concepts, relationships among
concepts etc.
– Semantic representations of the query and the user intent are exploited
in some part of the retrieval process”
Peter Mika, Sr. Research Scientist, Yahoo Labs ⎪ June 19, 2014
11. Inevitable passage of
Semantic Web adoption
(or some version thereof)
– culminating in
schema.org
http://semanticweb.com/semtech-2011-coverage-the-rdfaseo-wave-how-to-catch-it-and-why_b20458
12. “Things, not strings” – May 16, 2012
Understanding “things” helps Google
understand what things are in the world
and what users are searching for
June 2012 – Twitter announces Twitter Cards
Pinterest Rich Pins
13. • Directly extracting on page metadata to create enhanced displays
• Searching directly on consumed metadata
• Provide direct answers to queries by searching on consumed, verified and validated
information
Rich Snippets 2009
SearchMonkey 2008
• Aggregate answers or deduce them (like a timeline of events)
• Expose more relevant answers in the long tail of search
• Assist in interpreting a user query
• Detect relevancy signals, i.e., what content to show to what audience
• Use it in conjunction with machine learning techniques, e.g., to train other components
• …
Long tail: tiles; peanut butter and jelly in stripes?
14. Search is changing
• Semantic, Predictive, Personalised, Conversational
– Search over documents
– Search over Data
• Rise of Answer Engines (Direct answers proliferating)
• Data Quality is imperative
Becoming Less like a search Engine
and more like a personal Assistant
15. SIRI
Google Now
Cortana
AiAgents
(create your own)
Runs cross platform
16. “Answer box”
Organic Search Results
Search Over Data
Knowledge Panel
Search Over Documents
20. • Microsoft has given a fairly concise definition of the entity
recognition and disambiguation process:
– The objective of an Entity Recognition and Disambiguation
system is to recognize mentions of entities in a given text,
disambiguate them, and map them to the entities in a given
entity collection or knowledge base.
• In Google’s case, that means recognizing entities on web
pages or web documents and mapping them back to
specific entities in their Knowledge Graph
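The two steps in that definition (recognize mentions, then disambiguate and map them to a knowledge base) can be sketched as a toy example. This is only an illustration under stated assumptions, not how Microsoft or Google implement it: the `KNOWLEDGE_BASE` entries, their `kb:` identifiers, and the context-word overlap heuristic are all invented here.

```python
import re

# Hypothetical miniature knowledge base: entity id -> label and context words.
# The ids and context terms are invented for illustration only.
KNOWLEDGE_BASE = {
    "kb:jaguar_animal": {"label": "jaguar", "context": {"cat", "jungle", "prey", "species"}},
    "kb:jaguar_car": {"label": "jaguar", "context": {"car", "engine", "sedan", "dealer"}},
    "kb:apollo_program": {"label": "apollo program", "context": {"nasa", "moon", "mission"}},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_entities(text):
    """Return {mention label: entity id} for mentions found in the text."""
    tokens = tokenize(text)
    token_set = set(tokens)
    normalized = " ".join(tokens)

    # Step 1, recognition: find every label that occurs as a whole phrase.
    candidates = {}
    for entity_id, entry in KNOWLEDGE_BASE.items():
        if re.search(r"\b" + re.escape(entry["label"]) + r"\b", normalized):
            candidates.setdefault(entry["label"], []).append(entity_id)

    # Step 2, disambiguation: keep the candidate whose context words
    # overlap most with the words on the page (ties keep the first one).
    links = {}
    for label, entity_ids in candidates.items():
        links[label] = max(
            entity_ids,
            key=lambda eid: len(KNOWLEDGE_BASE[eid]["context"] & token_set),
        )
    return links
```

Calling `link_entities("I took my Jaguar to the dealer for an engine check")` maps the mention to the car entry, because "dealer" and "engine" appear on the page. Real systems use far richer signals, but the recognize-then-map shape is the same.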
21. Implicit entity graph derived/inferred from the text on a web page (Technology – NLP or IR or …)
Explicit entities obtained from structured markup on a web page (Technology – Semantic Web)
May need to map to external ontologies like schema.org or some other ontology
22. Make it search engine/machine friendly and tell them (explicitly)
what “things” are on your web page
• Make the information on your website available to Google (and the major search and social
engines), and make it easy for computers to read and discover.
• Use schema.org (and/or the preferred vocabulary/ontology of the search or social engine you are
optimizing for, e.g., for Facebook use RDFa and Open Graph). Google, Yahoo, Bing, Yandex =>
schema.org
• Pick a markup format (syntax) and stick with it
– Microdata
– Microformats
– RDFa
– RDFa Lite
– JSON-LD
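As a minimal sketch of what "telling them explicitly" can look like with JSON-LD, the snippet below builds a schema.org `Product` description and wraps it in a script tag. The helper names `product_jsonld` and `to_script_tag` are hypothetical, and the product values (including the all-zero GTIN) are placeholders, not real data.

```python
import json


def product_jsonld(name, color, gtin13):
    """Build a schema.org Product description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "color": color,    # schema.org Product property
        "gtin13": gtin13,  # global identifier; placeholder value in this sketch
    }


def to_script_tag(data):
    """Serialize the dict and wrap it for embedding in an HTML page."""
    body = json.dumps(data, indent=2)
    # Escape "</" so the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Placeholder product values, for illustration only.
    print(to_script_tag(product_jsonld("Example striped spread", "brown", "0000000000000")))
```

The same data could be expressed in Microdata or RDFa; the sketch uses JSON-LD only because it keeps the markup separate from the visible HTML.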
23. • Recall some of Google’s Mission/Objective Statements or goals
– “Organizing the world's information to make it universally accessible and useful”
– “To help with that we have built the knowledge graph”
– Give an identity to every “thing” in the world
• The knowledge graph
– Contains information and entities and their relationships
– Helps in Resolving ambiguities when processing queries
You can explicitly disambiguate your content by providing a Freebase MID
(machine identifier) in your markup
25. Google Plus in enhanced displays and
the Knowledge Graph
• Authorship
• Local businesses
• Knowledge Carousel
• ………
26. With Schema.org (and JSON-LD in this case)
• Note the sameAs statement
• The MID makes it easier to match or reconcile the “thing”
https://www.youtube.com/watch?v=W9pRpSW_KqA&src_vid=0oOwrBEeQss&feature=iv&annotation_id=annotation_1139520055 Ref: Google I/O 2014
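A hedged sketch of the `sameAs` idea: attach one or more identifier URLs to an existing schema.org description so a consumer can reconcile it with an entity it already knows. The helper `with_same_as` is hypothetical, and the identifier URL in the usage example is a placeholder, not a real Freebase MID.

```python
def with_same_as(entity, identifier_urls):
    """Return a copy of a JSON-LD dict with identifier URLs added to sameAs."""
    merged = dict(entity)
    existing = merged.get("sameAs", [])
    if isinstance(existing, str):
        # schema.org allows a single URL or a list; normalize to a list.
        existing = [existing]
    # dict.fromkeys removes duplicates while preserving order.
    merged["sameAs"] = list(dict.fromkeys(list(existing) + list(identifier_urls)))
    return merged


if __name__ == "__main__":
    org = {"@context": "https://schema.org", "@type": "Organization", "name": "Example Org"}
    # Placeholder identifier URL; substitute the real identifier for your entity.
    print(with_same_as(org, ["https://example.org/id/PLACEHOLDER-MID"]))
```

The function returns a new dict rather than mutating its input, so the same base description can be reused with different identifier sets.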
27. The Knowledge Graph Powers:
• Rich snippets in Events
• Event listings in Google Maps
• Notifications in Google Now
https://www.youtube.com/watch?v=XXw8g-FbemI Ref: Google I/O 2014
31. Rich snippets make your data more visible in Search Engine Results Pages
Which would you rather click on?
No Rich Snippets With Rich Snippets
Lower Bounce Rate
32. More visibility in verticals, recipes & images via markup
In Search Engine Results Pages & Search Verticals
Your product is not visible if no “color” attribute is populated
33. You want peanut butter and jelly in stripes?
Allows unique and interesting content to surface
34. “Google Plus”
Key Point -
Corollary: If you don’t exist as an entity, you do not exist in the Knowledge Graph or in “Search Over Data”
The cost of that: anonymity and irrelevance!
37. Google’s Structured Data Markup Helper
• Generates JSON-LD or microdata
• E-mail and web page markup
Data Highlighter
https://support.google.com/webmasters/answer/99170?hl=en&ref_topic=1088472
“Google can present your data more attractively
-- and in new ways -- in search results and in other
products such as the Google Knowledge Graph.”
List provided on schema.rdfs.org
WordPress plugin and HTML code http://schema.rdfs.org/tools.html
47. • Microdata reveal
• JSON-LD sniffer
• Semantic inspector
• META SEO inspector
• Green Turtle RDFa
List maintained by Aaron Bradley:
http://www.seoskeptic.com/structured-data-markup-validation-testing-tools/
Written Explanation of Walkthrough
http://searchengineland.com/see-entities-web-page-tools-help-194710
GRUFF
48.
49. • AlchemyAPI (with Freebase mappings of entities since July 2013)
• OpenCalais
• Semantic Verses
• Aylien, launched in Feb 2014, which provides mappings to Freebase and schema.org
• Smartlogic
• Lexalytics
• Text-Processing
• Stanford NER
• TextRazor
51. Ensure you supply rich, high-quality data,
mapped to search filters for maximum visibility
Not visible if no “color” attribute populated
Fill in the gaps
52. • Supply rich, consistent data in any
format you submit, and ensure it is validated,
verified and fresh
• Send consistent signals
• Provide global identifiers whenever possible
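One way to "fill in the gaps" before submitting data is a simple completeness check. The sketch below is an illustration only: the attribute list in `REQUIRED_FILTER_ATTRIBUTES` is an assumption made for this example, not a list published by any search engine, and `missing_attributes` is a hypothetical helper.

```python
# Assumed set of attributes that search filters might rely on.
REQUIRED_FILTER_ATTRIBUTES = ("color", "brand", "gtin13")


def missing_attributes(record, required=REQUIRED_FILTER_ATTRIBUTES):
    """Return the required attributes that are absent or blank in a record."""
    return [key for key in required if not str(record.get(key) or "").strip()]


if __name__ == "__main__":
    # Placeholder product record with a blank color and no identifier.
    shirt = {"name": "Example shirt", "brand": "ExampleBrand", "color": " "}
    print(missing_attributes(shirt))
```

A record with a non-empty result would not show up when a shopper filters on one of the missing attributes, which is the situation the "color" example on the slide warns about.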
57. • “Query logs record the actual usage of search systems and their analysis has proven critical to
improving search engine functionality. Yet, despite the deluge of information, query log analysis
often suffers from the sparsity of the query space.
[...] we propose a new model for query log data called the entity-aware
click graph. In this representation, we decompose queries into entities and modifiers, and
measure their association with clicked pages. We demonstrate the benefits of this approach on
the crucial task of understanding which websites fulfill similar user needs, showing that using this
representation we can achieve a higher precision than other query log-based approaches.”
Measuring website similarity using an entity-aware click graph
2012 publication: Peter Mika, Hugo Zaragoza, Pablo N. Mendes, Roi Blanco
http://dl.acm.org/citation.cfm?id=2398500
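The quoted idea (split each query into an entity and a modifier, then look at which pages get clicked) can be illustrated with a much simplified sketch. This is not the model from the paper: the entity list, the substring-based decomposition, and the cosine similarity over modifier counts are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict

# Hypothetical entity dictionary; a real system would use a large knowledge base.
KNOWN_ENTITIES = ["brad pitt", "paris", "iphone"]


def decompose(query):
    """Split a query into (entity, modifier); entity is None if none is found."""
    padded = " " + " ".join(query.lower().split()) + " "
    for entity in sorted(KNOWN_ENTITIES, key=len, reverse=True):
        needle = " " + entity + " "
        if needle in padded:
            modifier = " ".join(padded.replace(needle, " ", 1).split())
            return entity, modifier
    return None, padded.strip()


def site_modifier_vectors(click_log):
    """Count, per clicked site, how often each modifier led to a click."""
    vectors = defaultdict(Counter)
    for query, site in click_log:
        entity, modifier = decompose(query)
        if entity is not None and modifier:
            vectors[site][modifier] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Because the vectors are keyed by modifier rather than by full query, two sites that are clicked for "brad pitt movies" and "paris movies" look similar even though the queries share no entity. That is the sparsity benefit the abstract describes, shown here only in toy form.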
58. Need to understand the question in order to answer it
• Entity Mention Queries: Common structure to entity mention queries:
query = <entity> + <intent>
• Queries that return facts as an answer
• What form does the question take? (Question forms)
Where was X born?
When was X born?
Who invented X?
Where was X invented?
What is the X of Y?
Flights from ?x to ?y
Revisit old problems/solutions at scale (Parameterized Queries, Form Based Queries,
Query Template, Template Based Query)
Takeaway: Create Content that will provide great answers to these kinds of questions
(for entities relevant to your audience)
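The question forms above can be treated as templates with slots. As a minimal sketch, the regular expressions below map a question to an intent plus its X and Y slots; the intent names such as `birthplace` and the function `parse_question` are hypothetical labels chosen for this example.

```python
import re

# One pattern per question form from the slide; intent names are invented labels.
QUESTION_TEMPLATES = [
    (re.compile(r"^where was (?P<x>.+) born\??$"), "birthplace"),
    (re.compile(r"^when was (?P<x>.+) born\??$"), "birthdate"),
    (re.compile(r"^who invented (?P<x>.+?)\??$"), "inventor"),
    (re.compile(r"^where was (?P<x>.+) invented\??$"), "place_of_invention"),
    (re.compile(r"^what is the (?P<x>.+?) of (?P<y>.+?)\??$"), "attribute_lookup"),
    (re.compile(r"^flights from (?P<x>.+?) to (?P<y>.+?)\??$"), "flight_search"),
]


def parse_question(question):
    """Return the intent and slot values for a question, or None if no form matches."""
    normalized = " ".join(question.lower().split())
    for pattern, intent in QUESTION_TEMPLATES:
        match = pattern.match(normalized)
        if match:
            return {"intent": intent, **match.groupdict()}
    return None
```

For example, `parse_question("What is the capital of France?")` yields the intent `attribute_lookup` with `x` set to "capital" and `y` set to "france". Content that states such facts clearly for the entities your audience cares about is what these templates would be matched against.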
59.
60. • Social Graphs
• Interest Graphs
• Mobile Social graphs
• Attraction graphs
• Engagement graphs
• Attention Graphs
• Intent graph
• User Query Graph
• ……..
61. Takeaway: Write engaging content around your audience's interests
(Find ways – “Big Data” - to determine their interests)
62. Anatomy of a Google Search Results Page (Revisited)
Search Over Data
Search Over Documents