Brave New World of Search Technologies

1
Brave New Search World
Ran Hock
Online Strategies
ran@onstrat.com

2
• The nature of “search” is changing
radically.
• Structure is being created from
(relatively) unstructured data.
• The “Semantic Web” is becoming an
actuality.
• Natural Language Processing (NLP)
and other technologies are being
extensively applied to search and
search-related activities.

3
• These technologies are making the following
kinds of things happen:
– “Knowledge graphs”
– “Entity” identification in numerous
applications
– Natural language search statements
– Actual searching of images (not just of
image metadata)
• These advances are coming not just from
Google but from numerous services,
especially for “news” search.

4
Some Themes/Perspectives
• What is happening is more evolutionary than
revolutionary. Many, but not all, of the "pieces" of
the technology have been around for a while.
• Structure is being derived from chaos (though not total chaos).
We are going from words to meaning.
• Google isn’t the only player here.
• We can take real advantage of the developments.
• Using what you already know about “search” is
important.

5
Unstructuredness of Data
• Part of the “organization of knowledge” problem
• Particularly acute for textual material
• To a computer, a “word” is a string of characters
bounded by spaces or punctuation and has no
“meaning”.
• When we are searching for something, we are
searching for meaningful things, not character
strings.
• Meaning can be derived from context by the use
of NLP.
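A minimal sketch of the "string of characters" point above, assuming a toy regex tokenizer (the `tokenize` helper and the two sample sentences are illustrative, not from the talk). To the computer, both sentences share the same token, even though a reader sees a company in one and a fruit in the other.

```python
import re

def tokenize(text):
    # A "word" is just a run of characters bounded by spaces or punctuation.
    return re.findall(r"[A-Za-z0-9']+", text)

sentence_a = "Apple released a new phone."
sentence_b = "She ate an apple."

tokens_a = tokenize(sentence_a.lower())
tokens_b = tokenize(sentence_b.lower())

# The shared string says nothing about meaning (company vs. fruit).
shared = set(tokens_a) & set(tokens_b)
print(shared)
```

Telling the two senses apart requires looking at the surrounding words, which is where NLP comes in.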

6
Where We Were Recently
• Boolean Logic
– Actually a precursor/example of Artificial
Intelligence (AI) applied to “search”.
– Still a part of search AI
• Boolean is (from our infancy) a central
aspect of how we think, a part of our
“consciousness”
• Old approach: Searching by concepts

7
“Old” (circa 1975 – 2???)
search strategy
(searching by “concepts”)
OR
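The concept-based strategy can be sketched as follows, assuming the usual convention that synonyms within a concept are joined with OR and the concepts themselves are joined with AND (the AND step, the `build_boolean_query` helper, and the sample terms are assumptions for illustration).

```python
def build_boolean_query(concepts):
    """Join synonyms within a concept with OR, and concepts with AND."""
    groups = []
    for terms in concepts:
        # Quote multi-word terms so they are treated as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

concepts = [
    ["teenagers", "adolescents", "youth"],  # concept 1 and its synonyms
    ["diet", "nutrition"],                  # concept 2 and its synonyms
]
query = build_boolean_query(concepts)
print(query)
```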

8
(cont.)
• Ranking of web search results was/is
based on a wide range of factors (ca. 200),
called “signals”
• User-controlled field searching (intitle:
etc.)
• Etc.

9
The “Newer” Technologies
• Semantic Web Technologies
• Artificial Intelligence (AI) used at a broad
level and utilizing various AI subfields
• AI - Expert Systems approaches
• AI - Natural Language Processing (NLP)
• AI - NLP - Entity identification (extraction,
disambiguation, classification, etc.)
• AI - Machine Learning
• Big Data processing

10
Technologies:
The Semantic Web
• W3C “informal” definition – "The Semantic
Web is an extension of the current web in
which information is given well-defined
meaning, better enabling computers and
people to work in cooperation.”
(from Tim Berners-Lee et al., “The Semantic Web,” Scientific
American, May 2001.)

11
Technologies:
The Semantic Web
• Essence:
• “strings to things”
• “words to meaning”
• Technologically accomplished on webpages by
means of specialized XML-based markup languages, etc.
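As a rough illustration of "strings to things", the sketch below builds a small piece of structured markup for a person. It assumes the schema.org vocabulary in JSON-LD form, which is one widely used way to embed such data in a page and is not the XML syntax mentioned above. The name, affiliation, and email come from the title slide. The choice of properties is illustrative.

```python
import json

# A "thing" (a Person) described with typed properties instead of bare text.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Ran Hock",
    "affiliation": {"@type": "Organization", "name": "Online Strategies"},
    "email": "ran@onstrat.com",
}

# Serialized form that could be embedded in a page's markup.
markup = json.dumps(person, indent=2)
print(markup)
```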

12
Technologies:
The Semantic Web
• Idea born pre-1999
• In practice, also requires other technologies
such as Natural Language Processing, etc.
• 2006 - Berners-Lee and colleagues stated
that: "This simple idea…remains largely
unrealized".
• 2013 - more than four million Web domains
contained Semantic Web markup.

13
Technologies:
AI - Expert Systems
• Search results ranking has long used an
“expert systems” approach, mimicking what an
experienced researcher looks for:
– Words appearing in the title
– Number of times cited (linked-to)
– Proximity of words
– Words in the abstract
– Words in headings
– Etc.
• This will continue, more and more
automatically.
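A toy version of this "expert" ranking might look like the sketch below. The weights, the `score` and `min_gap` helpers, and the sample documents are all hypothetical. Real engines combine far more signals, and nothing here reflects any actual ranking formula.

```python
import math
from itertools import combinations

# Illustrative weights only, not taken from any real search engine.
WEIGHTS = {"title": 3.0, "links": 1.0, "proximity": 2.0}

def min_gap(body_words, terms):
    """Smallest distance (in words) between occurrences of two different terms."""
    positions = [[i for i, w in enumerate(body_words) if w == t] for t in terms]
    gaps = [abs(i - j)
            for a, b in combinations(positions, 2)
            for i in a for j in b]
    return min(gaps) if gaps else None

def score(doc, query_terms):
    terms = [t.lower() for t in query_terms]
    title_words = doc["title"].lower().split()
    body_words = doc["body"].lower().split()
    title_hits = sum(t in title_words for t in terms)   # words appearing in the title
    gap = min_gap(body_words, terms)
    proximity = 1.0 / gap if gap else 0.0               # closer terms score higher
    return (WEIGHTS["title"] * title_hits
            + WEIGHTS["links"] * math.log1p(doc["inlinks"])  # times cited (linked to)
            + WEIGHTS["proximity"] * proximity)

docs = [
    {"title": "Gardening tips",
     "body": "a solar lamp lights the garden and a wooden panel", "inlinks": 10},
    {"title": "Solar panel efficiency",
     "body": "solar panel efficiency has improved", "inlinks": 10},
]
ranked = sorted(docs, key=lambda d: score(d, ["solar", "panel"]), reverse=True)
print([d["title"] for d in ranked])
```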

14
Technologies:
Natural Language Processing
• A part of artificial intelligence and computational
linguistics
• Deals with helping computers “understand”
written and spoken languages
• Plays a key role in voice input for search,
natural language search statements,
translations, and more.

15
Technologies:
Google's syntactic systems
• predict part-of-speech tags for each word in
a given sentence,
• identify morphological features such as
gender and number,
• label relationships between words, such as
subject, object, modification, etc.,
• leverage large amounts of unlabeled data,
• incorporate neural net technology.
research.google.com/pubs/NaturalLanguageProcessing.html

16
Technologies:
Google’s semantic systems
• identify entities in free text,
• label them with types (such as person,
location, or organization),
• cluster mentions of those entities within and
across documents (co-reference resolution),
• incorporate multiple sources of knowledge
and information to aid with analysis of text.
research.google.com/pubs/NaturalLanguageProcessing.html

17
Technologies:
Entity Extraction
• A.k.a. named-entity recognition, entity identification
• Complementary to other natural language processing
• Identifies things, people, places, etc. within text (and
speech).
• Relates to the idea of concepts referred to earlier.
• Because “text” is based on language, “structure” is there
but the structure is not readily evident to a computer.

18
Technologies:
Entity Extraction
• Context-based connections allow
discernment of different meanings of a word.
• Entity extraction draws inferences based on
the logical content of the data.
• Entity extraction may be the single most
important tool for bringing structure to
unstructured data, specifically text.
• Also used for search query “suggestions”.
• An excellent example is found in Silobreaker.
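The idea of identifying entities and using context to pick between meanings can be sketched with a toy lookup table. The `GAZETTEER` and `AMBIGUOUS` tables and the `extract_entities` helper are hypothetical. Production systems (including those mentioned in these slides) use statistical models rather than hand-built lists.

```python
import re

# Hypothetical lookup of known names and their entity types.
GAZETTEER = {
    "paris": "LOCATION",
    "google": "ORGANIZATION",
    "tim berners-lee": "PERSON",
}

# For an ambiguous word, context cues that point to each possible type.
AMBIGUOUS = {
    "jaguar": {
        "ANIMAL": {"jungle", "prey", "cat"},
        "ORGANIZATION": {"car", "dealer", "engine"},
    },
}

def extract_entities(text):
    lowered = text.lower()
    words = set(re.findall(r"[a-z\-]+", lowered))
    found = []
    for name, label in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.append((name, label))
    for name, senses in AMBIGUOUS.items():
        if name in words:
            # Pick the sense whose cue words overlap most with the context.
            best = max(senses, key=lambda s: len(senses[s] & words))
            label = best if senses[best] & words else "UNKNOWN"
            found.append((name, label))
    return found

print(extract_entities("The jaguar stalked its prey in the jungle."))
print(extract_entities("The Jaguar dealer sold a car."))
```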

22
Technologies:
Machine Learning
Computers teaching themselves
Google RankBrain
• Used in processing search results, part of Google’s
Hummingbird search algorithm
• A way of interpreting a search statement in order to
find web pages that may not have the specific words
in the search statement.
• Uses patterns from other, seemingly unconnected
“complex” searches to find similarities with the current
search, then applies that information to surface the most
likely useful content.
• Google regards this as the third most important
signal.

23
Technologies:
Big Data
• The existence of “big data” collections provides
unprecedented opportunities for computational
approaches for computers to “understand” text.
• In neural-network experiments on identifying entities
in images, the accuracy of machine learning
algorithms improves vastly when used with large
pools of data.
• "...Google’s search engine queries a 100 petabyte
index that incorporates over 200 indicators and
whose algorithms change more than 500 times per
year."

24
Specific Applications of These
(and Other) Technologies
• Continued gradual incorporation of “expert”
techniques
• Natural language search statements
• Search by voice
• Image recognition and search: search of images,
search by image, and facial recognition
• Knowledge Graphs
• Entities in news search

25
Gradual Incorporation of
“Expert” Techniques
• An “ordinary” search isn’t what it used to be.
• Google has quietly taken over more of the
“old” “professional searcher” techniques and now
automatically adds not just word variants, but
synonyms.

26
• Suggested searches (based on known connections and
not just on your character string)
A “data-driven” approach - trillions of words, vs. “rules”. Not
just word variants.
• The old “synonyms” (~diet) option didn’t just go away. It is
now applied automatically. (Few people use the OR.)

27
• “Did you mean” is now more often
“Showing results for”

28
• “Fuzzy Logic” – As well as searching for
words that are “close”, Google may
drop some of your “concepts” for some
records
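Both behaviors can be approximated in a few lines. This is a sketch only: it uses Python's standard `difflib` for "close" words and a simple count for dropped concepts. The vocabulary, cutoff, and helper names are assumptions, and this is not how Google implements either feature.

```python
import difflib

# Hypothetical vocabulary of known words.
VOCABULARY = ["semantic", "search", "knowledge", "graph", "entity"]

def suggest(word, vocabulary=VOCABULARY):
    """Return the closest known word, or the input unchanged if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word

def relaxed_match(text, concepts, allowed_misses=1):
    """Accept a record even if it lacks up to `allowed_misses` of the concepts."""
    words = set(text.lower().split())
    hits = sum(1 for c in concepts if c in words)
    return hits >= len(concepts) - allowed_misses

print(suggest("knowlege"))
print(relaxed_match("semantic search engines", ["semantic", "search", "graph"]))
```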

29
Gradual Incorporation of “Expert”
Techniques
– If Google “thinks” you want specific facts and
“sees” a matching answer, you may get that
immediately.

30
Specific Applications:
Natural Language
Search Statements
• Don’t hesitate to use them!
• The above two searches give different (and
relevant) answers
• This is especially important for Google Now
and Siri!

31
Voice Search
• Apple (iOS) - Siri
• Google – Google Now
• Bing – Cortana (recently deceased?)
• These “expect” natural language, so
natural language will yield the best
results.

32
Image Recognition and Search:
Search of Images
Not much recent obvious change in Bing’s or
Google’s regular image search, but:
• “Categorization” (aspect of entity extraction) is
now shown on image search results pages
• Google, Microsoft (Bing) and Apple are heavy
into research on image identification and
classification.
• What’s happening/coming can be anticipated by
looking at Google Photos.

33
Search of Images
Bing Image Search

34
Search of Images

35
Search of Images
• In December 2015, Microsoft beat out 5 competitors
(including Google) in the ImageNet contest for
machine recognition of images
• Machines were trained to recognize images using a
“deep neural networking” method.
• Competitors must locate and identify objects from
100,000 photographs found in Flickr and search
engines and then place them in 1,000 object
categories.
• Microsoft, the winner, had an error rate of 3.5 percent
for classification and 9 percent for localization.
• Machine learning with neural networks is also used very
successfully for translation, as in Skype’s
new translation offering.

36
Image
Recognition and
Search: Search
by Image

37
Specific
Applications:
Image
Recognition and
Search: Entity and
Facial Recognition
in Google Photos

38
Knowledge Graphs
• Knowledge graphs do not originate with
Google (but Google has made the term
widely known).
• “Knowledge graph theory was initiated by
C. Hoede, a discrete mathematician at the
University of Twente and F.N. Stokman,
a mathematical sociologist at the
University of Groningen, both in the
Netherlands.” (ca 1982)
http://doc.utwente.nl/64931/1/memo1876.pdf

39
Google Knowledge Graph
• The Google Knowledge Graph, overall,
is a database about “things” and the
connections between those things.
• Delivers and summarizes key facts
about people, places, things.
• The selection of those facts is based on
connections regarding that entity and
related entities and on what other users
have asked about that entity.

40
• Launched May 2012
• At its heart, Google Knowledge Graph is a
database of facts.
• At that time it contained 18 billion facts
about, and connections between, 570 million objects.
• The kinds of things included vary with the
kind of entity.
• Content comes primarily from Wikipedia,
World Factbook, Freebase/Wikidata, plus
other sources.
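A "database of facts" with connections between things can be sketched as a store of subject-predicate-object triples. The `TinyKnowledgeGraph` class is hypothetical and says nothing about how Google's system is built. The sample facts are taken from the earlier slide on the origins of knowledge graph theory.

```python
from collections import defaultdict

class TinyKnowledgeGraph:
    """Toy store of (subject, predicate, object) facts."""

    def __init__(self):
        self._facts = defaultdict(list)

    def add_fact(self, subject, predicate, obj):
        self._facts[subject].append((predicate, obj))

    def facts_about(self, subject):
        return list(self._facts.get(subject, []))

    def connected(self, subject):
        # Objects that are themselves entities with facts of their own.
        return {obj for _, obj in self._facts.get(subject, []) if obj in self._facts}

kg = TinyKnowledgeGraph()
kg.add_fact("C. Hoede", "affiliated with", "University of Twente")
kg.add_fact("F.N. Stokman", "affiliated with", "University of Groningen")
kg.add_fact("University of Twente", "located in", "Netherlands")
kg.add_fact("University of Groningen", "located in", "Netherlands")

print(kg.facts_about("C. Hoede"))
print(kg.connected("C. Hoede"))
```

Following the connection from a person to a university and then to a country is the kind of link that lets a search engine show related facts alongside a result.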

43
• The key power of Google Knowledge
Graph lies in its utilization of
connections between entities as
searched for by other users.
• At present, its main weakness
is its heavy, unvetted reliance on
Wikipedia, which is not always right,
e.g., the Wikipedia article on Knowledge
Graph.

46
Bing’s Knowledge Graph
• Named “Snapshot”, it uses Bing’s Satori
technology
• Launched in June 2012
• Utilizes Wikipedia, Freebase, Qwiki,
LinkedIn, Britannica, etc.
• Builds into results interactive features
such as audio and video

49
News Applications
Examples of News Sites Effectively Using
These Technologies
• Silobreaker (example shown earlier)
• EMM

50
News Applications
EMM – European Media Monitor
• From the European Commission
• Computerized analysis of news trends
and story content
• Makes extensive use of NLP techniques
for entity extraction and clustering
• “Organizes” a vast quantity of
knowledge very efficiently.

54
So, how do we as researchers take
advantage of this?
• Get in the habit of using what's new (Siri,
Google Now, natural language).
Join the Evolution!
• Actually pay attention to Google Instant
(suggestions).
• Don't forsake the old. There are times when
you need to turn the auto-pilot off and take
charge.
• Ask questions you didn't bother asking
before [because you didn't think the search
engine would do it.]

55
So, how do we as researchers take
best advantage of this?
• Increase awareness of information quality
criteria
• Worry a bit:
– Worrisome: the general public's further reliance
on quick, single, local, Twitter-length answers
– Worrisome: localization
– Worrisome: “echo chambers”
– “Machines making decisions on our behalf”
• Enjoy the new.

56
Questions?
Ran Hock
Online Strategies
ran@onstrat.com
