SearchLove San Diego 2019 - Alexis Sanders - Quest is in the Name

| #searchlove | @alexisKsanders
quest is in the word.

the foundations of science, math,
philosophy, and intellectual
exploration boil down to one thing…
ironically, teaching programs struggle with
inquiry-based learning, falling back on fact-based
memorization – read for full thoughts: “The value
of asking questions” by keith g. kozminski

the (humble)
question…
questions are genuinely the building blocks
of learning and the root of search; inquiry-based
learning dates back to socrates

…and their answers.
the quality of our information becomes more
important as the quantity increases

which makes the process
of answering questions
fascinating!

now (more than ever),
as we have a massive
information source
according to howtogeek.com, it is estimated
that google held 15 exabytes of data (1 exabyte =
10^12 MB, i.e., a trillion MB)

the internet gives us:
ultimate diversity

map of known universe and the internet
https://www.cfa.harvard.edu/news/2011-16
https://internet-map.net/

we have access to info w/:
different opinions,
points of view,
backgrounds,
countries,
etc.

our ability to get answers
is limited only by our
imagination
the box
our potential

(and ability to ask the right
questions)

despite its beauty,
the internet
suffers from its:
• size
• low barrier to
entry

leading to:
info overload,
incorrect,
incomplete,
(ironically) ignorance, etc.
so... many... i-words. (sculpture by alicia
martin)

to find anything
useful at all, we
needed to filter w/ a
machine
(b/c time and computational speed)
although we are better at comprehending
and processing natural language questions
(for now…)
01110101 01110011
01100101 01101100
01100101 01110011
01110011
useful.

thus, we have the rise of
information retrieval.

information retrieval systems:
an automated process that
responds to a query
by examining documents and
returning relevant information,
sorted.
Modern Information Retrieval – Baeza-Yates
and Ribeiro-Neto (1999) defined IR as above

this implies that an optimal
information retrieval system
returns all relevant documents in
a prioritized order.
“searching health information in question-answering systems”
maria-dolores olvera-lobo and juncal gutierrez-artacho (2013)
Meadows 1993
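
a minimal sketch of the idea above: take a query, examine documents, and return the relevant ones in a prioritized order. the tokenizer, the tf-idf-style weighting, and the sample documents are all assumptions made for illustration; real systems are far more involved.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def rank(query, docs):
    """Return (doc_index, score) pairs for matching docs, best score first."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # document frequency: in how many docs does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    results = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        # rarer terms get a larger weight; repeated terms count more
        score = sum(
            term_freq[term] * math.log(1 + n_docs / doc_freq[term])
            for term in query_terms
            if term in term_freq
        )
        if score > 0:  # "relevant" here simply means "shares a query term"
            results.append((index, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# hypothetical mini-corpus
docs = [
    "lunar samples from the apollo mission",
    "baseball scores and baseball stats",
    "apollo mission timeline",
]
ranked = rank("baseball stats", docs)
```

note that the output is still a list of documents: the user has to open them and find the answer, which is exactly the assumption questioned next.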

however, this assumes:
• users want to see webpages
• users will evaluate the results
• the process is unidirectional (i.e., not interactive)
• query & page share the same language
“searching health information in question-answering systems”
maria-dolores olvera-lobo and juncal gutierrez-artacho (2013)

in reality:
• users want fast answers (to fact-based questions)
• users choose higher-ranked, first-page results
• search is haunted by confirmation bias
CTR by industry study:
https://twitter.com/AlexisKSanders/status/1001544770089553920

when put under pressure, we
either get diamonds… or crushed.
(yay evolution)

a natural evolutionary improvement:
a machine that directly
answers questions

a.k.a., question and
answering systems

this isn’t a new idea
(+58 years young…)
the legacy has roots in the socratic method; however,
automated q/a from databases started with BASEBALL
(’61) and LUNAR (’72), which answered closed-domain
questions about baseball and lunar samples from the apollo
mission, respectively.

so, what is a
question & answering (QA)
system?

QA is a computer science discipline
within the fields of information retrieval &
NLP which is concerned with building
systems that automatically answer
questions posed by humans in a natural
language. - wikipedia
https://en.wikipedia.org/wiki/Question_answering

an interactive human-computer process
that encompasses:
• understanding users’ informational needs
• typically expressed in a natural language query
• retrieving relevant documents, data, or knowledge
• extracting, qualifying, & prioritizing available answers
• presenting, explaining responses in an effective
manner
definition from mark maybury, new directions
in question answering (2004)

layman’s terms:
computer(s) answering
human questions
(in layman’s terms)
recursive much?

visual information from mark maybury, new
directions in question answering (2004)
Q/A draws on:
NLP
• question/document analysis
• information extraction
• language generation
• discourse analysis (i.e., ways in which language is used)
IR
• query formulation
• document retrieval
• document analysis
• id’ing relevant docs
• ordering docs
• relevancy feedback
human-computer interaction
• user modelling
• user preferences
• displays
• user interaction

QA process (at a very high level)
query processing
• query decomposition
• syntactic & semantic parsing
• question analysis
• translation
• classification
• expansion
• matching
• query reformulation
information retrieval (www + sources)
• document analysis
• retrieval
• id’ing relevant documents
• ordering
• relevancy feedback
answer processing
• answer analysis
• id’ing candidates
• extraction
• validation
• evaluation (rank)
answer display
• representation
the challenge is that people & machines
don’t process information in the same way…
an oldie, but a goodie… ☺
https://www.youtube.com/watch?v=gn4nRCC9TwQ

types of QA problems:
factoid
temporal
spatial
definitional
descriptional
biographical
opinionoid
multimedia / multimodal
multilingual
visual information from mark maybury, new
directions in question answering (2004)

visual and concept: https://chatbotslife.com/ultimate-guide-to-leveraging-nlp-machine-learning-for-you-chatbot-531ff2dd870c
common concepts discussed w/in research:
• open domain (ask anything) vs. closed domain (ask specific area of qs)
• deletion (answer by chopping existing lexical structures, like paraphrasing) vs. generative (making brand new content, generating sentences)
• difficulty: rules-based, hard, smart-machine, very hard
examples: mitsuku, worbot, watson, drqa, pizzabot, eagli, baseball, lunar, etc.

why care: well, search
engines care,
a lot…

we can see the
seeds of this
research:
• featured snippets
• PAA (people also ask)
• voice

shout out to bing for:
multi-perspective answers
chatbots integrated with SERPs (for seattle restaurants)

end goal:
a system that can
respond to any
question.
a mix of humans’ natural language processing
and a machine’s processing power.

“QAS are becoming a model for the
future of web search.”
“Question answering systems: a review on present
developments, challenges and trends” lorena
kodra and elina kajo mece (computer engineering
polytechnic university of Tirana) - 2017

there’s a ton of:
research,
datasets, and
competitions
being actively worked on around QAS.

“sentence compression by deletion with LSTMs” – '15
the goal of sentence compression is to generate a
shorter paraphrase of a sentence.
the deletion approach is standard
(i.e., words are removed, not reformulated).
“Sentence Compression by Deletion with LSTMs”
Katja Filippova, Enrique Alfonseca, Carlos A.
Colmenares, Lukasz Kaiser, Oriol Vinyals (2015)
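
a minimal sketch of what "compression by deletion" means: every token gets a keep/delete label and the compressed sentence is whatever survives. in the paper an LSTM predicts the labels; here the labels are written by hand, and the sentence, the labels, and the function name are assumptions for illustration only.

```python
def compress_by_deletion(tokens, keep_labels):
    """Keep tokens labelled 1 and drop tokens labelled 0, preserving order."""
    if len(tokens) != len(keep_labels):
        raise ValueError("need exactly one label per token")
    return [token for token, keep in zip(tokens, keep_labels) if keep]


# hypothetical example sentence and hand-written labels (1 = keep, 0 = delete)
tokens = "the quick brown fox jumped over the lazy dog".split()
keep_labels = [1, 0, 0, 1, 1, 0, 0, 0, 0]

compressed = " ".join(compress_by_deletion(tokens, keep_labels))
compression_ratio = sum(keep_labels) / len(tokens)
```

because words are only removed, never reordered or replaced, the output is always a subsequence of the input.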

tl;dr: google team introduced an evaluation
scheme for generative models for text (i.e.,
a way to grade machines, when they use
their own words).
“Eval all, trust a few, do wrong to none: Comparing
sentence generation models” Ondřej Cífka, Aliaksei
Severyn, Enrique Alfonseca, Katja Filippova (2018)

[figure: example sentence compressions]

not gwen-gwen!
there were of course difficulties…

sidebar: apparently
nose telescopes
actually exist…
one more (just for fun)

results:
• outperformed baseline
• indicate a compression model (which is not given syntactic
information explicitly in the form of features) may demonstrate
competitive performance
• some difficulties due to quotes, commas, dense
script, important context

“searchQA: a new Q&A dataset
augmented with context from a search
engine” – '17
launched searchQA (dataset of
Jeopardy! questions) w/140k
q-a pairs
“analyzing language learned by an active question
answering agent” by buck, bulian, ciaramita,
gajewski, gesmundo, houlsby, wang (2018)
140k
q-a pairs w/snippets

“identifying well-formed natural language questions” - '18
attempt to id' well-formed natural-
language questions with 25k qs classified
as: well-formed and
not well-formed.
“identifying well-formed natural language
questions” by manaal faruqui and dipanjan
das – Google AI 2018
well-formed not w-f
x25,000

achievement:
70.7% accuracy
errors resulted from deep semantics and syntax
(e.g., [what is the history of dirk bikes?] vs. dirt)
“identifying well-formed natural language
questions” by manaal faruqui and dipanjan
das – Google AI 2018

“ask the right questions” – '17/18
proposes a new framework to improve QA:
active question answering (AQA).
“ask the right questions: active question
reformulation with reinforcement learning” by
buck, bulian, ciaramita, gajewski, gesmundo,
houlsby, wang (2018)

inspired by humans
and our ability to ask the right questions.

it improves answers by
reformulating questions.
well-formed q = easy
poorly formed q = hard
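
a rough sketch of the reformulation loop: rewrite the question a few ways, send each rewrite to a QA system treated as a black box, and keep the most confident answer. the paper trains its reformulator with reinforcement learning; the rule-based rewrite, the filler-word list, and the toy QA system below are all stand-ins invented for illustration.

```python
FILLER = {"um", "so", "like", "please"}  # hypothetical filler-word list


def normalize(question):
    """Lowercase, trim trailing punctuation, and split into words."""
    return question.lower().strip("?!. ").split()


def reformulate(question):
    """Return candidate rewrites: the original plus a filler-free version."""
    trimmed = [word for word in normalize(question) if word not in FILLER]
    return [question, " ".join(trimmed)]


def toy_qa(question):
    """Stand-in for a black-box QA system: returns (answer, confidence)."""
    keywords = {"lunar", "samples"}
    words = normalize(question)
    hits = len(keywords & set(words))
    if hits == 0:
        return None, 0.0
    # noisier questions dilute the match, so confidence drops
    return "the apollo mission", hits / len(words)


def active_qa(question, qa_system=toy_qa):
    """Ask every reformulation and keep the most confident answer."""
    best = (None, 0.0, question)
    for candidate in reformulate(question):
        answer, confidence = qa_system(candidate)
        if confidence > best[1]:
            best = (answer, confidence, candidate)
    return best


answer, confidence, asked = active_qa(
    "um so like where did the lunar samples come from?"
)
```

here the cleaner rewrite wins because the toy system is more confident on it, which mirrors the slide: a well-formed question is easier to answer than a poorly formed one.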

how: evaluated against dataset of jeopardy!
questions (which are convoluted by design)

results:
• approach = effective
• agent able to learn non-trivial information
• suggests that machine comprehension tasks
involve “mostly pattern matching and relevant
modelling” (i.e., it’s not comprehending)

“adversarial examples for
evaluating reading comprehension
systems” - '17
• it’s unclear how much a
reading comprehension
system understands
language
• suggests it’s not capable of
significant understanding
“adversarial examples for evaluating reading
comprehension systems” Robin Jia, Percy
Liang, CS department 2017

and of course there’s the work
on the stanford question
answering dataset (SQuAD)
150k questions posed by
crowdworkers on a set of
wikipedia articles
squad is a reading comprehension dataset -
https://rajpurkar.github.io/SQuAD-explorer/
150k
SQuAD

if you have a squad, you want
BERT on it….
bidirectional encoder
representations from transformers
(not him →)
BERT is a new method of pre-training language
representations which obtains state-of-the-art results on
a wide array of Natural Language Processing (NLP)
tasks. - https://github.com/google-research/bert

look at that date…
every week has a
groundbreaking
result…
someone should create an ernie to start
competing with bert…

google ai blog - jan '19
intro’ed a new db of
300k q-a pairs,
"natural questions"
https://ai.googleblog.com/2019/01/natural-questions-new-corpus-and.html
https://ai.google.com/research/NaturalQuestions/
https://ai.google.com/research/NaturalQuestions/visualization
300k
natural questions

there’s also a comp.
guess which model
is first…
https://ai.google.com/research/NaturalQuestions/competition

https://ai.google.com/research/NaturalQuestions/
what an overachiever…

so, what do we do about it?
have a problem?
• n → don’t worry about it.
• y → can u do sth about it?
   • y → do it.
   • n → don’t worry about it.
sleep, enjoy hobbies,
live life, etc.

well, obvi:
o strive for first place,
o in a manner that supports
long-term stability,
o focus on building a loyal base,
o enjoy the ride.

how do we strive for first place?

we return to the SEO model,
for additional context see:
https://moz.com/blog/seo-cyborg
crawl, render, index, rank, connect
technical, content, signaling

get a checklist at:
moz.com/blog/seo-cyborg

+focus on
strategic content and
experiences

G is probably going to own these (eventually):
featured snippet:
factoid
temporal
descriptional
definitional
biographical
local features:
spatial
image search:
images
YouTube:
video

probably, they’ll also continue to go after
transactional opportunities
(expanding what they’re already doing with booking in hotels,
flights, and entertainment)

what are best bets?
o brand questions (they’re yours)
o niche, expertise questions
o opinionoid
o video
o interactive experiences
o seamless user experiences*
see checklist on seamlessness:
https://searchengineland.com/2019-in-search-find-your-seamlessness-309844

a final note:

even though we’re not at a point where
machines return our answers, the
general public acts as if we are.
shout out to ian madrigal for making these
hearings somewhat bearable…
https://twitter.com/iansmadrig/status/1072532767492182024
(cough)
(cough)

we see this behavior in the CTR on
top results.
https://twitter.com/AlexisKSanders/status/1001544770089553920/

we understand that search
engines are just returning the
most relevant document for
the query,

that the response is determined
(in part) by the question,
well, what is it G?

and that (even though it’s extraordinarily
impressive) search is not perfect.
https://www.seroundtable.com/google-pyramids-are-85-years-old-26839.html

with the power of
knowledge (of the
internet) comes
responsibilities.

suggested list of our responsibilities as educated
internet denizens:
□ being a gateway for quality information
□ attempting to be aware of our own biases
□ validating sources (making a good faith effort to)
□ educating others (on search’s fallibility & discerning fact from fiction)
□ not being a troll (remembering that people are on the other end)
□ reporting (and escalating) egregious errors
□ emphasizing credibility and security w/clients

recap:
• questions contribute to answers
• QAS are a potential strategic direction
for search engines
• established what SEOs can do
• our responsibility as internet citizens
It’s been a pleasure getting to know you.
thank you for your time and attention!

fin.

after a long sissy-
sophie day at her
favorite froyo loc
sophia's
aunt
waiting for santa to
arrive in town

merkle’s seo partners

thank you for participating!
@AlexisKSanders
/in/alexissanders

“deal or no deal? end-to-end learning for negotiation
dialogues” – '17
trained an end-to-end model for negotiation (i.e.,
the machine had to learn linguistic and reasoning skills)
“deal or no deal? end-to-end learning for
negotiation dialogues” mike lewis, denis yarats,
yann n. dauphin, devi parikh, dhruv batra (2017)

negotiation requires complex
communications and reasoning
skills.

results:
• agents demonstrated
compromise, holding out,
and deception w/o human
design
• can be improved through self-
play (practicing negotiating with
computers first)

and ultimately…
“the answer determines the
success of the question-
answering system.”
“when the answer comes into question in
question-answering: survey and open
issues” - 2011
