SMX West Barbara Starr Mac Version - Schema 201 for Real World Success
1. Schema 201
Real World Markup for Success: From a Search Engine Perspective.
By: Barbara Starr
Twitter: @BarbaraStarr
Email: bstarr@algebraixdata.com
2. Meta Information
ME
• Pursued a doctorate in Artificial Intelligence from South Africa in the 80's.
• Recruited to build intelligent/predictive trading systems on Wall Street
• Migrated to government-based contracts, several of which turned into real world products like
– SIRI (PAL from DARPA)
– WATSON (Acquaint - IBM Watson Labs was a team member)
• From the vantage of a semantic technologist, I keenly watched the evolution of the Semantic Web.
• “Shocked into the real world” when working as a consultant @ Overstock
• Today - Educator, Consultant, Developer, Strategist
My favorite author: Isaac Asimov
Favorite book: I, Robot
Favorite character: MULTIVAC
Linkedin: http://www.linkedin.com/in/barbarastarr
3. Additional Metainformation
For the purpose of this talk:
MY ROBOT or Artificially Intelligent Entity or Search Engine
Metadata same-as Structured Markup same-as Semantic Markup
5. SEARCH ENGINE POINT OF VIEW
Sorry guys, but it’s back to me. He is a headless browser (like Googlebot and Bingbot) and does not talk to humans!
Structured markup for real world success! Just some background first!
6. SEARCH ENGINE POINT OF VIEW
There are many means by which I can exploit structured markup!
When listening to me, bear in mind that if you make me happy, you will be too!
7. SEARCH ENGINE POINT OF VIEW
I can directly extract information from structured markup to enhance SERP displays.
Searchmonkey 2008
tiles
RICH SNIPPETS 2009
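As a rough illustration of what "directly extracting information from structured markup" can look like, here is a minimal sketch that pulls itemprop values out of flat schema.org microdata using only Python's standard library. The sample HTML, the class name MicrodataExtractor and the render_snippet helper are hypothetical, and real crawlers handle nesting, multiple items and other syntaxes that this sketch ignores.

```python
from html.parser import HTMLParser


class MicrodataExtractor(HTMLParser):
    """Collect itemprop name/value pairs from flat (non-nested) microdata."""

    def __init__(self):
        super().__init__()
        self.item_type = None      # value of the first itemtype attribute seen
        self.properties = {}       # itemprop name -> extracted value
        self._pending_prop = None  # itemprop whose text content we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs and self.item_type is None:
            self.item_type = attrs["itemtype"]
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # <meta itemprop="..." content="..."> carries its value inline
            self.properties[prop] = attrs["content"]
        else:
            # otherwise the value is the element's text content
            self._pending_prop = prop

    def handle_data(self, data):
        if self._pending_prop is not None and data.strip():
            self.properties[self._pending_prop] = data.strip()
            self._pending_prop = None


def render_snippet(props):
    """Hypothetical 'rich snippet' line built from the extracted properties."""
    return " | ".join(f"{name}: {value}" for name, value in props.items())


# Illustrative page fragment (assumption, not taken from the slides)
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Trail Running Shoe</span>
  <span itemprop="color">Blue</span>
  <meta itemprop="sku" content="TRS-001">
</div>
"""

extractor = MicrodataExtractor()
extractor.feed(SAMPLE_HTML)
snippet = render_snippet(extractor.properties)
print(extractor.item_type)
print(snippet)
```

The point of the sketch is only that marked-up values arrive as named fields, so the robot does not have to guess which text on the page is the product name or the color.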
9. SEARCH ENGINE POINT OF VIEW
I can provide direct answers to queries by searching on consumed, verified and validated information.
10. SEARCH ENGINE POINT OF VIEW
I can even aggregate answers or deduce them (like a timeline of events).
I can also leverage it to expose more relevant answers in the long tail of search.
11. SEARCH ENGINE POINT OF VIEW
I can detect relevancy signals, i.e. what content to show to what audience.
I can use it to assist in interpreting a user query.
I can even use it in conjunction with machine learning techniques, e.g. to train other components.
Penn Treebank tagset
12. SEARCH ENGINE POINT OF VIEW
I can combine it with computer vision techniques.
I can leverage metadata for better image search.
SIRI
I can enhance user’s shopping experience.
13. SEARCH ENGINE POINT OF VIEW
I could really use this stuff. And it is like the Tower of Babel out there!
Multiple conflicting vocabularies that I will have to align internally and multiple syntax formats as well.
Microdata
Microformats
RDFa
Goodrelations for e-commerce
I’m a Search Engine Robot
Prior to Schema.org
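To make the "align internally" burden concrete, here is a small sketch of the kind of lookup table a consumer would need when the same fact arrives under different vocabularies. The ALIGNMENT table, the internal field names and the align function are hypothetical illustrations; only the source property names (schema.org name and description, GoodRelations gr:name and gr:description, hProduct fn and description) come from those vocabularies.

```python
# Hypothetical alignment table: (vocabulary, source property) -> internal field.
ALIGNMENT = {
    ("schema.org", "name"): "title",
    ("schema.org", "description"): "description",
    ("goodrelations", "gr:name"): "title",
    ("goodrelations", "gr:description"): "description",
    ("hproduct", "fn"): "title",
    ("hproduct", "description"): "description",
}


def align(vocabulary, record):
    """Translate one extracted record into internal field names.

    Returns (aligned, unmapped): properties with a known mapping, and
    properties the table does not cover yet.
    """
    aligned, unmapped = {}, {}
    for prop, value in record.items():
        internal = ALIGNMENT.get((vocabulary, prop))
        if internal is None:
            unmapped[prop] = value
        else:
            aligned[internal] = value
    return aligned, unmapped


# The same product name expressed in two vocabularies ends up in one field.
from_gr, leftover = align("goodrelations", {"gr:name": "Blue Shoe", "gr:category": "Footwear"})
from_hp, _ = align("hproduct", {"fn": "Blue Shoe"})
print(from_gr, leftover, from_hp)
```

Every additional vocabulary or syntax adds rows to a table like this, which is the maintenance cost the robot is complaining about.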
14. Timeline of RDFa and Semantic Web Adoption
As of Semtech 2011 June 2 – Schema.org announced
Inevitable passage of Semantic Web adoption – culminating in schema.org
16. SEARCH ENGINE POINT OF VIEW
A Search Engine alliance has the power to MANDATE vocabulary and syntax!
Align and consume many vocabularies that may not be of interest to search engines?
Rather mandate vocabulary and syntax - microdata
17. Bringing Order from Chaos
On subjects Search Engines are interested in!
With great:
• Tools
• Mappings
• And more
• From the W3C
18. SEARCH ENGINE POINT OF VIEW
“Know” rather than “recognize”
Symbolic reasoning vs probabilistic reasoning!
INTRODUCING THE KNOWLEDGE GRAPH
19. SEARCH ENGINE POINT OF VIEW
And speaking of the knowledge graph or knowledge carousel!
Folks finding answers on my page never even have to click through to yours!
I can even now start to derive associations or relationships between entities.
20. SEARCH ENGINE POINT OF VIEW
I find it so helpful that I would really like to be able to keep all that validated, verified information to myself!
21. SEARCH ENGINE POINT OF VIEW
I find it so helpful that I would really like to be able to keep all that validated, verified information to myself!
Check out this great data highlighter. The information is available only to me and not to any other search bots!
Can you believe I have been accused of hijacking structured markup?
22. SEARCH ENGINE POINT OF VIEW
How do I make this information findable and visible to users?
I could use your assistance as follows!
23. SEARCH ENGINE POINT OF VIEW
Mark up as much information as you can.
Ensure the following match:
• on page markup
• data in any feeds you submit
• information visible to the user/human!
Enrich your content/data.
Rich markup sends rich signals to search engines.
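The three-way match described above (on-page markup, submitted feed data, user-visible text) can be checked mechanically before a page goes live. The following is a minimal sketch under simple assumptions: markup and feed values are already extracted into dictionaries, the visible page text is a plain string, and "visible" means a case-insensitive substring match. The function name find_mismatches and the sample data are hypothetical.

```python
def find_mismatches(markup, feed, visible_text):
    """Return {field: [issues]} for fields that disagree across the three sources."""
    problems = {}
    page = visible_text.lower()
    for field in sorted(set(markup) | set(feed)):
        in_markup = markup.get(field)
        in_feed = feed.get(field)
        issues = []
        if in_markup != in_feed:
            issues.append(f"markup={in_markup!r} feed={in_feed!r}")
        # Prefer the on-page markup value when checking what the human sees.
        value = in_markup if in_markup is not None else in_feed
        if str(value).lower() not in page:
            issues.append("not visible to the user")
        if issues:
            problems[field] = issues
    return problems


# Illustrative data (assumption): the feed and the markup disagree on color,
# and the color is not shown anywhere in the visible text.
markup = {"name": "Trail Running Shoe", "color": "Blue", "price": "89.99"}
feed = {"name": "Trail Running Shoe", "color": "Navy", "price": "89.99"}
visible_text = "Trail Running Shoe, now only $89.99"

problems = find_mismatches(markup, feed, visible_text)
print(problems)
```

A field that is marked up but never shown to the human reader is exactly the situation a search engine may treat as cloaking, so a check like this covers both the matching and the visibility advice.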
24. SEARCH ENGINE POINT OF VIEW
As an example, look at the filters that show up on the left hand side in Google Shopping.
Clearly, if you do not populate the “color” attribute, it is not possible for your product to show up in that filter.
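The filter logic is easy to see in a toy example. This sketch assumes products are plain dictionaries and that a filter is a simple equality test on one attribute; the function name filter_by_attribute and the sample catalog are hypothetical and say nothing about how Google Shopping is actually implemented.

```python
def filter_by_attribute(products, attribute, wanted):
    """Keep only products whose attribute is populated and equals the wanted value."""
    return [
        product
        for product in products
        if (product.get(attribute) or "").lower() == wanted.lower()
    ]


# Illustrative catalog (assumption): "Shoe B" is blue in reality,
# but its color attribute was never populated.
products = [
    {"name": "Shoe A", "color": "Blue"},
    {"name": "Shoe B"},
    {"name": "Shoe C", "color": "Red"},
]

blue = filter_by_attribute(products, "color", "blue")
print([product["name"] for product in blue])
```

"Shoe B" can never appear under the color filter, no matter how relevant it is, because there is no value for the filter to match against.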
25. SEARCH ENGINE POINT OF VIEW
This same type of logic also applies to the various verticals (however at a higher level in the “search taxonomy” so to speak).
For example, searching in the recipe vertical, if you have not entered recipe information, your results will be “filtered out” from that SERP result set.
26. SEARCH ENGINE POINT OF VIEW
Adding context in search verticals really helps me serve up relevant information (seriously increases my recall), as does geospatial information.
Google’s “SearchVerticals”
Notice any correlations? I would advise you to!
Consumed information - Structured Data Dashboard
Consequently, drilling down into a query using more and more filters enables me to better refine my understanding of the intent of your search.
27. Visibility and misperceived information exposure
outweighs “Risk” as the exposure is controllable
Reward RISK
Visibility overpowers Risk
In fact, if correctly done, Risk is completely controllable!
Determining what data to expose is optional, controllable and a business dependent decision.
28. Quantifiable, Measurable, Avoidable
• Fine line between visibility & exposing information?
• Completely controllable
• Completely Avoidable
• Business dependent solution
29. SEARCH ENGINE POINT OF VIEW
My social counterparts have been leveraging structured markup (RDFa) for their Open Graph protocol for quite some time.
They are also leveraging it in their newly released graph search!
Not only that, they are even building an entity graph not dissimilar from my knowledge graph!
The Open Graph Protocol enables you to integrate your Web pages into the social graph
Example of crowdsourced entity graph info source - places
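For readers who have not looked at Open Graph markup, it lives in meta tags whose property attribute starts with "og:". The sketch below reads those tags with Python's standard library; the class name OpenGraphParser, the sample page and the example.com URLs are hypothetical, while og:title, og:type, og:url and og:image are the basic properties defined by the protocol.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}  # e.g. {"og:title": "..."}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            self.og[prop] = attrs["content"]


# Illustrative page head (assumption, not taken from the slides)
SAMPLE_HEAD = """
<head>
  <title>Blue Trail Running Shoe</title>
  <meta property="og:title" content="Blue Trail Running Shoe" />
  <meta property="og:type" content="website" />
  <meta property="og:url" content="https://example.com/shoes/blue-trail" />
  <meta property="og:image" content="https://example.com/img/blue-trail.jpg" />
</head>
"""

og_parser = OpenGraphParser()
og_parser.feed(SAMPLE_HEAD)
print(og_parser.og)
```

These few properties are what let a shared page show up as a titled, illustrated object rather than a bare link.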
31. SEARCH ENGINE POINT OF VIEW
Mark up everything you can (within reason and your business priorities).
Make sure that everything you mark up is also visible to the human end user. If not, you are cloaking!
32. SEARCH ENGINE POINT OF VIEW
Make sure your information is fresh and there are no stale links.
Don’t try to spam me. You will not only run the risk of a penalty, but you will also lose my trust. (The latter is an important signal in and of itself!)
33. SEARCH ENGINE POINT OF VIEW
Ensure your data is of the highest possible quality (cleaned and scrubbed) and richly attributed. That will ensure your maximum visibility in my verticals and search filters.
34. SEARCH ENGINE POINT OF VIEW
Mark up information not yet consumed by search engines to get the advantage of extra lift when it is adopted.
Check the list to see what is coming out next! Schema.org is dynamic and is growing!
35. SEARCH ENGINE POINT OF VIEW
Ensure your images are enhanced and also marked up.
Stay tuned for way more to come in the not too distant future!
36. Bye for now
By Barbara Starr
Twitter: @BarbaraStarr
Linkedin: http://www.linkedin.com/in/barbarastarr
E-mail: bstarr@algebraixdata.com