How Search Engines Leverage
Opinion-based Articles for Ranking
Rethinking Search: Corroboration of Web Answers
Koray Tuğberk GÜBÜR
Components for Re-ranking based on Opinionated Factoids
• Uncertain Inference
• Knowledge Base
• Corroboration of Web Answers
• Embarrassment Factor
• Truth Ranges
• Open Information Extraction
• External Databases
• Semantic Role Labeling
• Evidence Aggregation
• Information Literacy
Uncertain Inference
• Uncertain Inference was introduced by C. J. van Rijsbergen of the University of Glasgow.
• It focuses on “Query Inference” with “Context Understanding”.
• Query Path and Query Context (Context-Sensitive Search Elements) are used.
• The query is processed probabilistically for Question Generation.
• It requires a “Knowledge Base” for understanding the factual needs behind the query.
• “Uncertain facts” have a plausibility threshold that allows “Opinions” to exist in the results.
• Extract word sequences in News Titles.
How do Search Engines know facts?
Andrew Hogue
The Structured Search Engine
Uncertain Inference
How do Search Engines know facts?
Andrew Hogue – The Structured Search Engine
• Query Processing and Parsing is another topic.
• But to separate “wrong” facts from “true” facts, a high level of confidence and coverage is needed.
• Uncertain Inference follows users’ behaviors in “Adaptive Search”, or sometimes it uses “word sequences” in a mega corpus.
• Extract Entity-Attribute Pairs and their synonyms from News Articles.
Knowledge Base
• Different from the Knowledge Graph.
• Stores facts, or factual values, for the same entity-attribute pairs and triples.
• It is dynamic.
• A fact from today might be inaccurate information tomorrow.
• The Procedural Part of a Knowledge Base helps to update the connections between components.
• Understand which facts are approved by the search engine.
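The dynamic behavior described in the bullets above can be sketched as a small in-memory store that keeps every observed value for an entity-attribute pair together with the date it was observed. This is only an illustration: the class name `FactStore`, its methods, and the sample entity are hypothetical and do not describe how any search engine stores facts.

```python
from collections import defaultdict
from datetime import date


class FactStore:
    """Toy knowledge base: several dated values per (entity, attribute) pair."""

    def __init__(self):
        # (entity, attribute) -> list of (observed_on, value)
        self._facts = defaultdict(list)

    def add(self, entity, attribute, value, observed_on):
        self._facts[(entity, attribute)].append((observed_on, value))

    def current(self, entity, attribute):
        """Return the most recently observed value, or None if unknown."""
        history = self._facts.get((entity, attribute))
        if not history:
            return None
        return max(history, key=lambda item: item[0])[1]

    def history(self, entity, attribute):
        """Return all observed values, oldest first."""
        items = self._facts.get((entity, attribute), [])
        return [value for _, value in sorted(items, key=lambda item: item[0])]


# Hypothetical data: the same attribute observed at two points in time.
store = FactStore()
store.add("Example City", "population", 100_000, date(2020, 1, 1))
store.add("Example City", "population", 120_000, date(2023, 1, 1))
```

Keeping the full history, rather than overwriting, is what lets a value that was true yesterday be recognized as outdated today.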
Browsable Fact Repository
Corroboration of Web Answers
• One of the 10 best “Opinion Papers” in Information Retrieval.
• Directly connected to the concept of “Helpful Content”, or “Information Responsiveness”.
• “If even the main web sources have contradicting information for the same question, which one is the fact?”
• Corroboration of Web Answers focuses on “Truth Ranges” and “Answer Prominence” to choose answers from certain sources.
• Create your own truth range by auditing the ranking sources.
How do Search Engines know facts?
Corroboration of Web Answers
• Minji Wu and Amélie Marian focus on numeric values and measurement units to find real authorities.
• PageRank, Source Authority, First Answer, Closeness to First Answer, and De-duplication are used to determine a “Fact Range”, or “Truth Range”.
• The “Truth Range” changes from today to tomorrow according to the ranking sources.
• Use numeric values, metrics, dates, and measurement units to have higher precision.
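A minimal sketch of the idea in the bullets above: numeric answers are de-duplicated per source, scored by the summed weight of the sources that report them, and the well-supported values define a range. It is not the scoring from the Wu and Marian paper. The function names, the `min_share` threshold, the weights (standing in for signals such as PageRank or source authority), and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def corroborate(answers):
    """Score numeric answers by the summed weight of the sources reporting them.

    `answers` is a list of (source, value, weight) tuples.
    """
    seen = set()
    scores = defaultdict(float)
    for source, value, weight in answers:
        if (source, value) in seen:  # de-duplicate repeated claims per source
            continue
        seen.add((source, value))
        scores[value] += weight
    return dict(scores)


def truth_range(scores, min_share=0.2):
    """Return (low, high) over values holding at least `min_share` of the weight."""
    total = sum(scores.values())
    kept = [value for value, score in scores.items() if score / total >= min_share]
    return min(kept), max(kept)


# Hypothetical answers to one numeric question, e.g. a height in metres.
answers = [
    ("site-a", 8849, 0.9),
    ("site-b", 8848, 0.8),
    ("site-c", 8848, 0.5),
    ("site-c", 8848, 0.5),  # duplicate claim, ignored
    ("site-d", 9000, 0.1),
]
scores = corroborate(answers)
best = max(scores, key=scores.get)
low, high = truth_range(scores)
```

With these sample weights, the value backed by two sources wins, and the weakly supported outlier falls outside the range.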
How do Search Engines know facts?
Corroboration of Web Answers
• Google cited the research paper “Corroborating Answers from Multiple Web Sources” more than 40 times in the “Candidate Answer Passage” patent series.
• It has been used in Featured Snippets (Web Answers) since 2018.
• This brings us to the “Embarrassment Factor”.
• Use “safe” and “indirect” answers for contested issues.
How do Search Engines know facts?
Embarrassment Factor
• What is the Embarrassment Factor?
• Does a Search Engine feel shame?
• Can you make a search engine feel shame with your bad answer, or opinion?
• What happens if a featured snippet says that “Barack Obama is a communist”? Or “Global Warming is a hoax”, or “Vaccines are for controlling your brain”?
• Let’s remember “Truth Ranges”.
• Do not test the patience of search engine engineers. Do not take advantage of fundamental NLP understanding.
How do Search Engines know facts?
Truth Ranges
• Fuzzy Logic is used.
• Not every wrong is equal.
• Some facts are more factual than others.
• Some opinions are accepted as consensus.
• Upper and Lower Limits are used to determine “safe opinions”.
• Google created “Content Advisories” to help with “Information Consensus”.
• Stay in the consensus (reports with descriptive news), unless it is “satirical” (critiques with questions).
• Use “question-format” as a shield against algorithms if you are outside of the truth ranges.
Which one is more factual?
Source: Wesley Chai
Truth Ranges
• There are two different approaches in Linguistics to a “truth”, or “fact”.
• Words like “will”, “can”, “might”, and “may” decrease the certainty.
• Numeric Ranges, or Sentiment Magnitude and Direction, are used.
• The middle of the range is called the “Fixpoint”.
• Answers that are outside of the Range are filtered out.
• Find the balance between “precision” and “coverage” in news titles and intros.
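The effect of modal words on certainty can be illustrated with a naive scorer. This is a sketch under stated assumptions: the hedge list contains only the four words named above, the scoring formula is invented for illustration, and the matching is purely lexical (it would also count "can" used as a noun).

```python
import re

# Hedge list taken from the modal words named above; a real system would
# use a much larger lexicon and part-of-speech information.
HEDGES = {"will", "can", "might", "may"}


def certainty(sentence):
    """Return 1.0 for a sentence without hedges, lower for each hedge word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hedged = sum(1 for token in tokens if token in HEDGES)
    return 1.0 / (1 + hedged)


plain = certainty("The bridge collapsed on Monday")
hedged_once = certainty("The bridge might collapse")
hedged_twice = certainty("It may or may not happen")
```

A title audit could flag any headline whose score drops below a chosen threshold.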
How do Search Engines know facts?
Truth Ranges
• According to Fuzzy Logic:
• 1 > 5 and 1 > 10 are not equally wrong.
• One of them is more wrong than the other.
• For “Disagreeing Views”, “Corroboration” happens with inference.
• “Barack Obama was born in Hawaii”,
• “Barack Obama was born in Kenya”.
• A search engine might see “Barack Obama is a US Citizen” as a safe answer to give to avoid embarrassment.
• Use absolute truths to project a safe answer rather than giving a possibly wrong factoid.
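The "not equally wrong" point can be made concrete with a graded measure instead of a true/false check. The function below is an illustrative assumption, not a formula from the cited research: it scores how wrong the claim `a > b` is on a 0 to 1 scale, with `scale` as an arbitrary normalizing constant.

```python
def wrongness_greater_than(a, b, scale=10.0):
    """Degree (0..1) to which the claim `a > b` is wrong.

    0.0 means the claim holds. Otherwise the degree grows with the gap
    `b - a`, capped at 1.0. Equality sits on the boundary and scores 0.0.
    """
    if a > b:
        return 0.0
    return min(1.0, (b - a) / scale)


small_error = wrongness_greater_than(1, 5)   # gap of 4
large_error = wrongness_greater_than(1, 10)  # gap of 9
no_error = wrongness_greater_than(7, 5)      # claim holds
```

Both claims are false in classical logic, but the second one lies further from the truth, which is the distinction the slide draws.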
Journalists share organization’s trustworthiness
Source: Indiatimes
Source: Making Better Informed Trust Decisions with Generalized Fact-Finding
Truth Ranges
• Uncertainty is used as a measurement to filter factoids.
• Phrases like “I am sure”, or “45% possibility” create uncertainty.
• Intrinsic Ambiguities decrease trust in the source.
• “Who claims what” is the key point for fact-finding algorithms.
• Source Reliability, and “Variance” and “Mean” values, are used for “fixpoints”.
• Do not use “I am sure”, “Pretty sure”, “I think…”, “In my opinion…”, “It might”, or “It may”. Tell whether the “bomb exploded” or not. Tell “how many people died”; do not tell “With 45% possibility, over 20 people…”
• Compare your numbers, names, dates, and places for an event to your competitors.
“Safe Answers” are better.
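The role of source reliability, mean, and variance can be sketched as follows. This is a simplified illustration, not the algorithm from the cited fact-finding papers: the "fixpoint" is taken to be a reliability-weighted mean, and claims further than `k` standard deviations from it are dropped. The function names, `k`, and the sample claims are hypothetical.

```python
from statistics import pstdev


def fixpoint(claims):
    """Reliability-weighted mean of numeric claims.

    `claims` is a list of (value, reliability) pairs with reliability in (0, 1].
    """
    total = sum(reliability for _, reliability in claims)
    return sum(value * reliability for value, reliability in claims) / total


def within_range(claims, k=1.0):
    """Keep the values no further than k standard deviations from the fixpoint."""
    centre = fixpoint(claims)
    spread = pstdev([value for value, _ in claims])
    return [value for value, _ in claims if abs(value - centre) <= k * spread]


# Hypothetical casualty counts reported by four sources of differing reliability.
claims = [(20, 0.9), (21, 0.8), (22, 0.7), (60, 0.1)]
centre = fixpoint(claims)
kept = within_range(claims)
```

Because the unreliable source carries little weight, it barely moves the fixpoint, and its figure is filtered out as an outlier.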
Source: Making Better Informed Trust Decisions with Generalized Fact-Finding
CIUV: Collaborating Information Against Unreliable Views
Truth Ranges: Why do we need PageRank?
• Speed.
• Google and other search engines do not have time to process the text of the documents.
• News SEO has to prioritize “indexing”.
• A News Search Engine has to serve everything in the fastest way.
• Processing the text and checking accuracy is not possible in seconds, minutes, hours, or even days when a source publishes 100,000 words a day.
• Thus, Truth Ranges are a “long-term ranking factor” for news sources.
• Google gets angry when I give PageRank-related suggestions.
• Understand that some sources are prioritized, even if they scrape and use your original news story.
Groundedness - Unanimity
Source: Towards an axiomatic approach to truth discovery
Truth Ranges: Why do we need PageRank?
We guess that this news is quality…
Source: Corroborating Information from Disagreeing Views
Information Extraction (OIE)
An example of OIE
• Open Information Extraction was used by Wavii.
• Wavii was bought by Google for a reported $30 million.
• It is used to expand Google’s Knowledge Graph.
• OIE extracts triples and recognizes minor entities to structure a semantic network.
• Extract “predicates” from news articles. Create tuples from “predicates, nouns, and subjects”.
• Understand which fact, or factoid, is given first, and which later.
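Triple extraction can be illustrated with a deliberately small, rule-based sketch. Real Open Information Extraction systems learn relation phrases from text. Here the relation list, the function names, and the sample sentences are assumptions for illustration. The output keeps sentence order, so it also shows which fact is stated first.

```python
import re

# Assumed relation phrases; a real OIE system discovers these from the corpus.
RELATIONS = ["was born in", "lives in", "died in", "acquired", "signed"]
_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<predicate>"
    + "|".join(map(re.escape, RELATIONS))
    + r")\s+(?P<object>.+?)[.!?]?$"
)


def extract_triple(sentence):
    """Return (subject, predicate, object), or None if no known relation occurs."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    return (match["subject"], match["predicate"], match["object"])


def extract_triples(text):
    """Extract triples sentence by sentence, preserving their order in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    triples = [extract_triple(sentence) for sentence in sentences]
    return [triple for triple in triples if triple is not None]


text = "Google acquired Wavii. It rained. Barack Obama was born in Hawaii."
triples = extract_triples(text)
```

Sentences without a known relation are skipped, which is the main limitation of a fixed relation list.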
Open Information Extraction Example from the researchers.
Information Extraction (OIE): Rel-grams
Precision / Coverage
• Open Information Extraction extracts opinions and facts about certain concepts and named entities.
• It uses “tuples” of “predicate” and “noun”.
• It aggregates occurrences, standardizing the masked sections by comparing the different OIE iterations.
• Match “prepositions” to “interrogative” terms.
• Use “uncertain inference” to extract interrogative terms.
Information Extraction (OIE): Rel-grams
Word Connections and Sense Disambiguation
• OIE is used by Google to recognize and understand micro entities and knowledge on the web.
• OIE is helpful for processing the text in news sources to understand the latest changes in the real world and reflect them in the knowledge base.
• Open Information Extraction is different from Information Retrieval.
• The opinions and facts of web sources are compared to each other to find the higher groundedness.
• Update outdated facts on your website. An “X lives in P” declaration might be wrong if “X” is no longer alive. How many entities with a “died in” fact still live in your internal knowledge base?
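The outdated-fact audit in the last bullet can be sketched as a simple consistency check over stored triples. The triple format, the function name, and the placeholder entities are assumptions for illustration.

```python
def stale_residence_facts(triples):
    """Return "lives in" triples whose subject also has a "died in" triple."""
    deceased = {subject for subject, predicate, _ in triples if predicate == "died in"}
    return [
        triple
        for triple in triples
        if triple[1] == "lives in" and triple[0] in deceased
    ]


# Hypothetical internal knowledge base with one contradictory entity.
triples = [
    ("Person X", "lives in", "Place P"),
    ("Person X", "died in", "Place Q"),
    ("Person Y", "lives in", "Place P"),
]
stale = stale_residence_facts(triples)
```

Every triple returned is a candidate for rewriting, for example from "lives in" to "lived in".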
External Databases (Data Commons)
Structuring the Web
• Data Commons is an aggregation of unified databases for nearly every topic, industry, geography, and entity.
• It is a common fact repository that is open to the whole web.
• It is supported by Ramanathan V. Guha.
• It focuses on statistical data.
• Query external databases for “statistics” to create statistic-rich news articles.
External Databases (Data Commons)
How do Search Engines know facts?
• Google integrated the Data Commons Project into its own algorithms.
• The announcement was made by Prabhakar Raghavan.
• It helps to understand the accuracy and authority of an information source.
• A trustworthy news article propagates its trust to the next news article.
External Databases (Data Commons)
“As we may think”
“Google is planned to be third-part of your brain”
- Sergey Brin
“Google is designed as a Star Trek Computer to answer your needs.
It is not created for websites, it is created for users.”
- Larry Page
“They already hate Google, so what is the downside?”
- Craig Nevill-Manning
Semantic Role Labeling
Which news source reflected emotions?
• Word order changes, but the sentence’s meaning stays the same.
• The same opinion can be expressed in many different ways.
• XYZ corporation bought the stock.
• They sold the stock to XYZ corporation.
• The stock was bought by XYZ corporation.
• The purchase of the stock by XYZ corporation ...
• The stock purchase by XYZ corporation ...
• OIE provides an aggregation for tuples and relational n-grams to extract factual propositions.
• Semantic Role Labels help with standardization based on “predicates”.
• Match “emotions” to “causes” with shorter declarations; stay away from “nested declarations”.
Semantic Role Labeling as Dependency Parsing: Exploring
Latent Tree Structures Inside Arguments
Semantic Role Labeling
Agent – Predicate - Theme
• Predicates can take multiple
arguments.
• Semantic role labels are descriptions of
the semantic relation between the
predicate and its arguments.
• Semantic Roles are abstract
representations of the role that an
argument plays in the event described
by the predicate.
• Semantic Role Labeling assigns roles to
the constituents of a sentence.
• Selectional restrictions allow words to place semantic constraints on the semantic properties of their arguments.
• Understand “patterns of human mind”. Reflect these patterns in news articles, according to “macro-context”.
Semantic Role Labeling
Predicate is context.
• Let’s say the phrase “George Bush” appeared 500,000 times in News Titles.
• Google has to categorize them according to the news contexts.
• “Context-based Person Search” is used for this task.
• But News Search Engines have to be fast.
• There is no time for processing the text.
• But “SRL” is a quick process.
• Check the Semantic Role Label of the Entity: is it the agent, or is it the theme?
• Which instrument is used?
• Which goal is mentioned?
• Which propositional structure is used?
• For the sentence “George Bush signed military operation”, the “Relational Grams”, “Aggregated Tuples”, and “Semantic Role Labels” help a search engine to differentiate entities and contexts from each other.
• “Grouping entities” is not enough. Group “contexts”. “X and Love Life” and “X and Career” have different contexts. Connections should follow “identity” and context together. Analyze the “News Context” more than the “Entity” that appears.
Semantic Role Labeling
How do opinions differ in phrases?
• Beyond Classification:
• It helps to see the factual information.
• It is used to differentiate opinions from
each other.
• It measures the possibility of truth.
• It understands the representation of the
web source according to its connection to
others.
• Semantic Role Labeling is used by
semantic search engines to have
better entity associations.
• The suggested associations, or
graphs are accepted or rejected by
semantic network constructors.
• “Names in the News Title”
should match the Faces in the
News Image.
Source: Marina Santini, Brighton University
Source: Grounded Semantic Role Labeling
Question-Answer Pairs
Which evidence is correct?
• Question Generation and Answer
Pairing are NLP tasks for fact
extraction.
• Question generation involves query
parsing and processing.
• Answer pairing involves dense-context
retrieval and question-answer format
matching.
• But, it is not clear which answer is
more accurate.
• Thus, Question-Answer Coverage, Entity-oriented Search, and Semantic-Syntactic Parsing are used.
• Matching entities, attributes, queries, or phrases is not good enough as long as the information is not responsive.
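One simple way to decide between competing answers is to pool the evidence from every passage that supports the same answer. The sketch below is loosely in the spirit of evidence aggregation and does not reproduce the models of the paper cited on this slide. The normalization, the function name, and the scores are assumptions for illustration.

```python
from collections import defaultdict


def rerank_by_strength(candidates):
    """Sum passage scores per normalized answer and sort, strongest first.

    `candidates` is a list of (answer_text, passage_score) pairs, one per
    retrieved passage.
    """
    strength = defaultdict(float)
    for answer, score in candidates:
        strength[answer.strip().lower()] += score
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates: one answer has the single best passage, the other
# is supported by three weaker passages.
candidates = [("1969", 0.4), ("1968", 0.5), ("1969", 0.3), (" 1969", 0.2)]
ranked = rerank_by_strength(candidates)
```

The answer with the single highest passage score loses to the answer that is repeated across several passages.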
Source: Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Information Literacy - Consensus
Who said it?
• Google started to provide education on Information Literacy.
• It involves recognizing the information source before the information on the source.
• Google ranks News Sources for certain topics, contexts, and entities before ranking the news.
• The need for “fast indexing and serving” will always be more important than understanding the “truth” at the first stage.
• Thus, quality news sources have higher accuracy with more historical data and PageRank.
• Google has to assume that truth comes from the strength of repeated evidence from the most authoritative sources.
• Audit the “About the source” panels of your competitors, and create a review and third-party mention gap analysis.
Information Literacy - Consensus
Author Authority?
https://searchengineland.com/what-social-signals-do-google-bing-really-count-55389
• Danny Sullivan once asked Google
and Bing whether they use social
signals, or author names to
understand who is the real expert on
a topic.
• Both search engines said that they audit “author quality” and “author expertise” for different topics.
• Associate authoritative authors more strongly with your web source if they are writing for multiple web sources.
Information Literacy - Consensus
How do they use Knowledge Base?
Integrating Knowledge Graph and Natural Text for Language
Model Pre-training
• There are hundreds of different algorithms to understand authenticity and “true facts”.
• For a search engine engineer, there is no “lie” and “fact”.
• There are only “true facts” and “wrong facts”.
• And KELM-like algorithms work together to differentiate them from each other.
• Query the “Google Knowledge Graph API” to understand what it states for the same entity.
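A minimal sketch of such a query against the Knowledge Graph Search API, using only the standard library. It needs network access and your own API key to actually run `fetch_entities`. The helper names are hypothetical, and the sample payload below is hand-written to mirror the response shape (`itemListElement`, `result`, `resultScore`), not real API output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_url(query, api_key, limit=1, language="en"):
    """Build the request URL for an entity search."""
    params = {"query": query, "key": api_key, "limit": limit, "languages": language}
    return ENDPOINT + "?" + urlencode(params)


def fetch_entities(query, api_key, limit=1):
    """Call the API and return the parsed JSON (requires network and a valid key)."""
    with urlopen(build_url(query, api_key, limit)) as response:
        return json.load(response)


def summarize(payload):
    """Return (name, description, resultScore) for each returned entity."""
    rows = []
    for element in payload.get("itemListElement", []):
        result = element.get("result", {})
        rows.append(
            (result.get("name"), result.get("description"), element.get("resultScore"))
        )
    return rows


# Hand-written sample payload for illustration only.
sample = {
    "itemListElement": [
        {"result": {"name": "Example Entity", "description": "Example"}, "resultScore": 12.5}
    ]
}
url = build_url("Data Commons", "YOUR_API_KEY")
rows = summarize(sample)
```

Comparing the returned description with what your own pages state about the same entity shows where your statements diverge.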
Information Literacy - Cues
What makes you trustworthy?
• The research that Google cites mentions that there are “6 Cues for snap judgments about whom to trust”.
• These involve “images”, “brands”, “headlines – tonality”, “social cues”, “sponsors”, and “interactivity”.
• Google works with MediaWise to perform surveys and integrate the findings into its own algorithms.
• Create your own “audit templates” for news articles for these 6 different verticals. Mark up “MediaWise”.
Information Literacy – About this source
Why does your opinion matter?
• The story of “Web Answers” is too long.
• Context-terms, Topical Entries, Candidate Answer Passages, Context-scoring for Candidate Answer Passages, and many more concepts…
• Google Product Managers call these “word callouts”.
• Search Engine Engineers call them “representative answers”.
• Learn NLP.
Scoring Candidate Answer Passages
Some Google Designs
Machine learning to identify opinions in documents
• Identifying opinionated portions in documents.
• Relating opinionated portions inside the document and/or across other documents (e.g., that relate to the same story).
• To surface opinionated snippets or quotes to users of a news aggregation.
• To identify portions of a document that convey opinion.
• Google might rank a source for “report”, but not for “opinion”. Understand which vertical has a higher chance for your web source.
Some Google Designs
System and method for supporting editorial opinion in the
ranking of search results
“Editorial opinion” without “distorting facts” helps you rank,
especially for “first-person” experience stories, or reviews.
Some Google Designs
Embedded communication of link information
“Information in the improved link tags may allow one or more publishers of content and/or
documents to convey opinions about content and/or documents at one or more content
locations and/or one or more document locations. The link tags may also allow one or more
publishers to convey a weighting of the relative importance of one or more content locations
and/or one or more document locations. In some embodiment, at least a portion of the
information in the improved link tags may be encrypted, to allow one or more publishers to
restrict the audience that may view the information in the link tags….. The improved link tags may
allow the publishers to communicate additional information, such as opinions, about the content locations
and/or document locations.”
Categorize boilerplate/main content links according to their context.
“Joe Biden and Congress” might have a different “block-link” than “Joe Biden and Elections”.
Some Google Designs
Aspect-Based Sentiment Summarization
Use “key-points” with “sentiments” to summarize essence of news stories.
Topicality and Context Filters
Long- and Short-Term Solutions for SERP Construction in the News Vertical
Short-term Solutions for News Search Engines:
• Classify authoritative sources (PageRank,
Article Count, Unique Sentence Count,
Publication Frequency, Length, Citations,
Search Behaviors).
• Rank authoritative sources for different
topics.
• Classify and rank news web pages according
to their context, and topicality.
• Serve the most relevant news articles based
on trust and confidence.
Long-term Solutions for News Search Engines:
• Process text.
• Understand facts.
• Audit accuracy and comprehensiveness.
• Filter the sources, by re-assigning topical
relevance and authority.
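The short-term step of classifying and ranking sources can be sketched as a weighted sum over a few of the signals listed above. Everything numeric here is an assumption: the feature values are taken to be pre-scaled to the 0 to 1 range, and the weights and the two sources are invented for illustration.

```python
def authority_score(features, weights):
    """Weighted sum of a source's feature values, each already scaled to 0..1."""
    return sum(weights[name] * features[name] for name in weights)


def rank_sources(sources, weights):
    """Return source names ordered from highest to lowest authority score."""
    return sorted(
        sources,
        key=lambda name: authority_score(sources[name], weights),
        reverse=True,
    )


# Hypothetical weights and feature values for one topic.
weights = {"pagerank": 0.5, "citations": 0.3, "publication_frequency": 0.2}
sources = {
    "source-a": {"pagerank": 0.8, "citations": 0.6, "publication_frequency": 0.9},
    "source-b": {"pagerank": 0.4, "citations": 0.9, "publication_frequency": 0.5},
}
ranking = rank_sources(sources, weights)
```

Keeping one weight set per topic would allow the same source to rank differently across topics, as the second short-term bullet suggests.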
Samples from News SEO with Factoids
NaturalNews
Samples from News SEO with Factoids
Powerofpositivity
Samples from News SEO with Factoids
RealClearPolitics
Samples from News SEO with Factoids
BREITBART
Some Samples
Politico
What would you do if you were Google?
Which opinions should rank?
Thanks
Q&A

 
Search V Next Final
Search V Next FinalSearch V Next Final
Search V Next Final
 
Advanced Keyword Research SMX Toronto March 2013
Advanced Keyword Research SMX Toronto March 2013Advanced Keyword Research SMX Toronto March 2013
Advanced Keyword Research SMX Toronto March 2013
 
Smx toronto adv-kw-research-final
Smx toronto adv-kw-research-finalSmx toronto adv-kw-research-final
Smx toronto adv-kw-research-final
 
Enterprise Search and Findability in 2013
Enterprise Search and Findability in 2013Enterprise Search and Findability in 2013
Enterprise Search and Findability in 2013
 
Evolution of Search
Evolution of SearchEvolution of Search
Evolution of Search
 
Designing Big Content - Search Exchange 2013
Designing Big Content - Search Exchange 2013Designing Big Content - Search Exchange 2013
Designing Big Content - Search Exchange 2013
 
Evaluating Webpages
Evaluating WebpagesEvaluating Webpages
Evaluating Webpages
 
CSI: Clinical Site Intelligence
CSI: Clinical Site IntelligenceCSI: Clinical Site Intelligence
CSI: Clinical Site Intelligence
 
Class 1-become-an-online-sleuth
Class 1-become-an-online-sleuthClass 1-become-an-online-sleuth
Class 1-become-an-online-sleuth
 
What IA, UX and SEO Can Learn from Each Other
What IA, UX and SEO Can Learn from Each OtherWhat IA, UX and SEO Can Learn from Each Other
What IA, UX and SEO Can Learn from Each Other
 
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search LandscapeBearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
 

Dernier

Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 

Dernier (20)

Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 

Opinion-based Article Ranking for Information Retrieval Systems: Factoids and Facts

  • 1. How Search Engines Leverage Opinion-based Articles for Ranking Rethinking Search: Corroboration of Web Answers Koray Tuğberk GÜBÜR
  • 2. Components for Re-ranking Based on Opinionated Factoids: Uncertain Inference, Knowledge Base, Corroboration of Web Answers, Embarrassment Factor, Open Information Extraction, External Databases, Evidence Aggregation, Information Literacy, Semantic Role Labeling, Truth Ranges
  • 3. Uncertain Inference • Uncertain Inference was introduced by C. J. van Rijsbergen of the University of Glasgow. • It focuses on “Query Inference” with “Context Understanding”. • Query Path and Query Context (Context-Sensitive Search Elements) are used. • The query is processed with estimated probabilities for Question Generation. • It requires a “Knowledge Base” to understand the factual needs behind the query. • “Uncertain facts” have a plausibility threshold that allows “Opinions” to exist in the results. • Extract word sequences in News Titles. How do Search Engines know facts? Andrew Hogue, The Structured Search Engine
  • 4. Uncertain Inference How do Search Engines know facts? Andrew Hogue, The Structured Search Engine • Query Processing and Parsing is another topic. • But to separate “wrong” facts from “true” facts, a high level of confidence and coverage is needed. • Uncertain Inference follows users’ behaviors in “Adaptive Search”, or sometimes it uses “word-sequences” in a mega corpus. • Extract Entity-Attribute Pairs and their synonyms from News Articles.
  • 5. Knowledge Base • Different from the Knowledge Graph. • Stores facts, or factual values, for the same entity-attribute pairs and triples. • It is dynamic. • A fact from today might be inaccurate information tomorrow. • The Procedural Part of a Knowledge Base helps to update the connections between components. • Understand which facts are approved by the search engine. Browsable Fact Repository
  • 6. Corroboration of Web Answers • One of the ten best “Opinion Papers” in Information Retrieval. • Directly connected to the concept of “Helpful Content”, or “Information Responsiveness”. • “Even the main web sources have contradicting information for the same question; which one is the fact?” • Corroboration of Web Answers focuses on “Truth Ranges” and “Answer Prominence” to choose answers from certain sources. • Create your own truth range by auditing ranking resources. How do Search Engines know facts?
  • 7. Corroboration of Web Answers • Minji Wu and Amélie Marian focus on numeric values and measurement units to find real authorities. • PageRank, Source Authority, First Answer, Closeness to First Answer, and De-duplication are used to determine a “Fact Range”, or “Truth Range”. • The “Truth Range” changes from today to tomorrow according to the ranking sources. • Use numeric values, metrics, dates, and measurement units to have higher precision. How do Search Engines know facts?
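The corroboration idea on slide 7 can be illustrated with a minimal sketch. It is not the method from the cited paper: the function name `corroborate`, the 5% relative tolerance, and the per-source authority weights are all assumptions made for illustration. Numeric answers that lie close together are grouped, repeated answers from the same source are ignored (de-duplication), and the group with the highest summed authority gives the “Truth Range”.

```python
def corroborate(answers, tolerance=0.05):
    """Return (low, high, score) for the best-supported group of answers.

    answers: iterable of (source, value, authority) tuples.
    tolerance: relative distance to a group's mean that still counts as
               "the same answer" (an assumed, illustrative threshold).
    """
    groups = []
    for source, value, authority in answers:
        for group in groups:
            centre = sum(group["values"]) / len(group["values"])
            if abs(value - centre) <= tolerance * abs(centre):
                # De-duplication: a source only counts once per group.
                if source not in group["sources"]:
                    group["values"].append(value)
                    group["sources"].add(source)
                    group["score"] += authority
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append({"values": [value], "sources": {source}, "score": authority})
    best = max(groups, key=lambda g: g["score"])
    return min(best["values"]), max(best["values"]), best["score"]


# Hypothetical answers to "how tall is X (in metres)?"
sample = [
    ("a.example", 100.0, 0.9),
    ("b.example", 102.0, 0.7),
    ("c.example", 250.0, 0.4),
    ("a.example", 100.0, 0.9),  # duplicate from the same source, ignored
]
low, high, score = corroborate(sample)
```

With these made-up inputs the outlier from `c.example` forms its own low-scoring group, and the truth range is taken from the two sources that agree.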
  • 8. Corroboration of Web Answers • Google cited the research paper “Corroborating Answers from Multiple Web Sources” more than 40 times in the “Candidate Answer Passage” patent series. • It has been used in Featured Snippets (Web Answers) since 2018. • This brings us to the “Embarrassment Factor”. • Use “safe” and “indirect” answers for conflicted issues. How do Search Engines know facts?
  • 9. Embarrassment Factor • What is the Embarrassment Factor? • Can a search engine feel shame? • Can you make a search engine feel shame with your bad answer, or opinion? • What happens if a featured snippet says that “Barack Obama is a communist”? Or “Global Warming is a hoax”, or “Vaccines are for controlling your brain”? • Let’s remember “Truth Ranges”. • Do not test the patience of search engine engineers. Do not take advantage of fundamental NLP understanding. How do Search Engines know facts?
  • 10. Truth Ranges • Fuzzy Logic is used. • Not every wrong answer is equally wrong. • Some facts are more factual than others. • Some opinions are accepted as consensus. • Upper and lower limits are used to determine “safe opinions”. • Google created “Content Advisories” to support “Information Consensus”. • Stay within the consensus (reports with descriptive news), unless the piece is “satiric” (critiques with questions). • Use “question-format” as a shield against algorithms if you are outside of the truth ranges. Which one is more factual? Source: Wesley Chai
  • 11. Truth Ranges • There are two different approaches in Linguistics for a “truth”, or “fact”. • Words like “will”, “can”, “might”, and “may” decrease the certainty. • Numeric Ranges, or Sentiment Magnitude and Direction, are used. • The middle of the range is called the “Fixpoint”. • Answers that are outside of the range are filtered out. • Find the balance between “precision” and “coverage” in news titles and intros. How do Search Engines know facts?
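Two points from slide 11 lend themselves to a small sketch: counting the hedging words the slide lists, and filtering numeric answers against a range whose midpoint is treated as the fixpoint. The function names and the plain midpoint calculation are assumptions for illustration, not a description of any search engine's implementation.

```python
import re

# The hedging words named on the slide.
HEDGES = {"will", "can", "might", "may"}


def hedge_count(sentence):
    """Count how many hedging words appear in a sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(1 for token in tokens if token in HEDGES)


def filter_by_truth_range(values, lower, upper):
    """Return the range midpoint (the "fixpoint") and the values inside the range."""
    fixpoint = (lower + upper) / 2
    kept = [v for v in values if lower <= v <= upper]
    return fixpoint, kept


hedges = hedge_count("The storm may hit and might cause damage")
fixpoint, kept = filter_by_truth_range([95, 101, 250], 90, 110)
```

In this toy run the sentence contains two hedges, and the answer 250 falls outside the assumed range of 90 to 110, so it is filtered out.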
  • 12. Truth Ranges • According to Fuzzy Logic: • 1 > 5 and 1 > 10 are not equally wrong. • One of them is more wrong than the other. • For “Disagreeing Views”, “Corroboration” happens with inference. • “Barack Obama was born in Hawaii”, • “Barack Obama was born in Kenya”. • A search engine might see “Barack Obama is a US Citizen” as a safe answer to give to avoid embarrassment. • Use absolute truths to project a safe answer rather than giving a possibly wrong factoid. Journalists share organization’s trustworthiness Source: Indiatimes Source: Making Better Informed Trust Decisions with Generalized Fact-Finding
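The claim on slide 12 that “1 > 5” and “1 > 10” are not equally wrong can be made concrete with a graded truth value. The logistic membership function and the `scale` parameter below are assumptions chosen for illustration; the slides do not specify which membership function is used.

```python
import math


def degree_greater(a, b, scale=5.0):
    """Fuzzy truth degree in (0, 1) of the statement "a > b".

    0.5 means a equals b; values near 1 mean clearly true,
    values near 0 mean clearly false. `scale` controls how fast
    the degree changes and is an arbitrary illustrative choice.
    """
    return 1.0 / (1.0 + math.exp(-(a - b) / scale))


wrong = degree_greater(1, 5)         # false, but only by a margin of 4
more_wrong = degree_greater(1, 10)   # false by a margin of 9
```

Both statements score below 0.5, yet `more_wrong` is lower than `wrong`, which is the sense in which one of them is “more wrong” than the other.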
  • 13. Truth Ranges • Uncertainty is used as a measurement to filter factoids. • Phrases like “I am sure”, or “45% possibility”, create uncertainty. • Intrinsic ambiguities decrease the trust in the source. • “Who claims what” is the key point for fact-finding algorithms. • Source Reliability, together with “Variance” and “Mean” values, is used for “fixpoints”. • Do not use “I am sure”, “Pretty sure”, “I think…”, “In my opinion…”, “It might”, or “It may”. Tell whether the “bomb exploded” or not. Tell “how many people died”; do not tell “With 45% possibility, over 20 people…” • Compare your numbers, names, dates, and places for an event to your competitors. “Safe Answers” are better. Source: Making Better Informed Trust Decisions with Generalized Fact-Finding CIUV: Collaborating Information Against Unreliable Views
  • 14. Truth Ranges: Why do we need PageRank? • Speed. • Google and other search engines do not have time to process the text of the documents. • News SEO has to prioritize “indexing”. • A News Search Engine has to serve everything in the fastest way. • Processing the text and checking its accuracy is not possible in seconds, minutes, hours, or even days when a source publishes 100,000 words a day. • Thus, Truth Ranges are a “long-term ranking factor” for news sources. • Google gets angry when I give PageRank-related suggestions. • Understand that some sources are prioritized, even if they scrape and use your original news story. Groundedness - Unanimity Source: Towards an axiomatic approach to truth discovery
  • 15. Truth Ranges: Why do we need PageRank? We guess that this news is quality… Source: Corroborating Information from Disagreeing Views Source: Corroborating Information from Disagreeing Views
  • 16. Information Extraction (OIE) An example of OIE • Open Information Extraction is credited to WAVII. • WAVII was bought by Google for $30 Million. • It is used to expand Google’s Knowledge Graph. • OIE extracts triples and recognizes minor entities to structure a semantic network. • Extract “predicates” from news articles. Create tuples from “predicates, nouns, and subjects”. • Understand which fact, or factoid, is given first, or later. Open Information Extraction Example from the researchers.
  • 17. Information Extraction (OIE): Rel-grams Precision / Coverage • Open Information Extraction is to extract opinions, and facts about certain concepts, and named entities. • It uses “tuples” as “predicate” and “noun”. • Aggregates occurrences, standardizing the masked sections by comparing the different OIE iterations. • Match “prepositions” to “interrogative” terms. • Use “uncertain inference” to extract interrogative terms.
  • 18. Information Extraction (OIE): Rel-grams Word Connections and Sense Disambiguation • OIE is used by Google to recognize and understand micro entities and knowledge on the web. • OIE is helpful for processing the text in news sources to understand the latest changes in the real world and reflect them in the knowledge base. • Open Information Extraction is different from Information Retrieval. • The opinions and facts of web sources are compared to each other to find the one with higher groundedness. • Update outdated facts on your website. An “X lives in P” declaration might be wrong if “X” is not alive anymore. How many “died in” entities live in your internal knowledge base?
  • 19. External Databases (Data Commons) Structuring the Web • Data Commons is an aggregation of unified databases for nearly every topic, industry, geography, and entity. • It is a common fact repository that is open to the whole web. • It is supported by Ramanathan V. Guha. • It focuses on statistical data. • Query external databases for “statistics” to create statistic-rich news articles.
  • 20. External Databases (Data Commons) How do Search Engines know facts? • Google integrated the Data Commons Project into its own algorithms. • The announcement was made by Prabhakar Raghavan. • It helps to understand the accuracy and authority of an information source. • A trustworthy news article propagates its trust to the next news article.
  • 21. External Databases (Data Commons) “As we may think”
  • 22. External Databases (Data Commons) “As we may think” “Google is planned to be third part of your brain” - Sergey Brin “Google is designed as a Star Trek Computer to answer your needs. It is not created for websites, it is created for users.” - Larry Page “They already hate Google, so what is the downside?” - Craig Nevill-Manning
  • 23. Semantic Role Labeling Which news source reflected emotions? • The word order changes, but the sentence’s meaning stays the same. • The same opinion can be expressed in many different ways. • XYZ corporation bought the stock. • They sold the stock to XYZ corporation. • The stock was bought by XYZ corporation. • The purchase of the stock by XYZ corporation ... • The stock purchase by XYZ corporation ... • OIE provides an aggregation for tuples and relational n-grams to extract factual propositions. • Semantic Role Labels help with standardization based on “predicates”. • Match “emotions” to “causes” with shorter declarations; stay away from “nested declarations”. Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments
  • 24. Semantic Role Labeling Agent – Predicate - Theme • Predicates can take multiple arguments. • Semantic role labels are descriptions of the semantic relation between the predicate and its arguments. • Semantic Roles are abstract representations of the role that an argument plays in the event described by the predicate. • Semantic Role Labeling assigns roles to the constituents of a sentence. • Semantic selection restrictions allow words to place semantic constraints on the semantic properties of their arguments. • Understand “patterns of human mind”. Reflect these patterns in news articles according to the “macro-context”.
  • 25. Semantic Role Labeling Predicate is context. • Let’s say the phrase “George Bush” appeared 500,000 times in News Titles. • Google has to categorize them according to the news contexts. • “Context-based Person Search” is used for this task. • But News Search Engines have to be fast. • There is no time for processing the text. • But “SRL” is a quick process. • Check the Semantic Role Label of the Entity: is it the agent? Or is it the theme? • Which instrument is used? • Which goal is mentioned? • Which propositional structure is used? • For the sentence “George Bush signed military operation”, the “Relational Grams”, “Aggregated Tuples”, and “Semantic Role Labels” help a search engine to differentiate entities and contexts from each other. • “Grouping entities” is not enough. Group “contexts”. “X and Love Life” and “X and Career” have different contexts. Connections should follow “identity” and context together. Analyze the “News Context” more than the “Entity” that appears.
  • 26. Semantic Role Labeling How do opinions differ in phrases? • Beyond Classification: • It helps to see the factual information. • It is used to differentiate opinions from each other. • It measures the possibility of truth. • It understands the representation of the web source according to its connection to others. • Semantic Role Labeling is used by semantic search engines to have better entity associations. • The suggested associations, or graphs are accepted or rejected by semantic network constructors. • “Names in the News Title” should match the Faces in the News Image. Source: Marina Santini, Brighton University Source: Grounded Semantic Role Labeling
  • 27. Question-Answer Pairs Which evidence is correct? • Question Generation and Answer Pairing are NLP tasks for fact extraction. • Question generation involves query parsing and processing. • Answer pairing involves dense-context retrieval and question-answer format matching. • But it is not clear which answer is more accurate. • Thus, Question-Answer Coverage, Entity-oriented Search, and Semantic-Syntactic Parsing are used. • Matching entities, attributes, queries, or phrases is not good enough as long as the information is not responsive. Source: Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
  • 28. Information Literacy - Consensus Who said it? • Google started to provide education on Information Literacy. • It involves recognizing the information source before the information on the source. • Google ranks News Sources for certain topics, contexts, and entities before ranking the news. • The need for “fast indexing and serving” will always be more important than understanding the “truth” at the first stage. • Thus, quality news sources have higher accuracy with more historical data and PageRank. • Google has to assume that truth comes from the strength of repeated evidence from the most authoritative sources. • Audit the “About the source” panels of your competitors, and create a review and third-party mention gap analysis.
  • 29. Information Literacy - Consensus Author Authority? https://searchengineland.com/what-social-signals-do-google-bing-really-count-55389 • Danny Sullivan once asked Google and Bing whether they use social signals, or author names, to understand who the real expert on a topic is. • Both search engines said that they audit “author quality” and “author expertise” for different topics. • Associate authoritative authors more strongly with your web source if they are writing for multiple web sources.
  • 30. Information Literacy - Consensus How do they use Knowledge Base? Integrating Knowledge Graph and Natural Text for Language Model Pre-training • There are hundreds of different algorithms to understand authenticity and “true facts”. • For a search engine engineer, there is no “lie” and “fact”. • There are only “true facts” and “wrong facts”. • And KELM-like algorithms work together to differentiate them from each other. • Query the “Google Knowledge Graph API” to understand what they state for the same entity.
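The last suggestion on slide 30 can be sketched as follows, assuming the public Knowledge Graph Search API endpoint and its `query`, `key`, and `limit` parameters, and the `itemListElement` / `result` / `resultScore` fields of its JSON response. These names should be checked against Google's current documentation. The helper names and the `YOUR_API_KEY` placeholder are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Knowledge Graph Search API.
KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_url(query, api_key, limit=1):
    """Build the request URL for an entity lookup."""
    params = {"query": query, "key": api_key, "limit": limit}
    return KG_ENDPOINT + "?" + urlencode(params)


def summarize(response):
    """Reduce a parsed JSON response to (name, description, score) tuples."""
    rows = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        rows.append((result.get("name"), result.get("description"), item.get("resultScore")))
    return rows


def fetch_entity(query, api_key):
    """Perform the network call; requires a valid API key."""
    with urlopen(build_kg_url(query, api_key)) as resp:
        return summarize(json.load(resp))


url = build_kg_url("Barack Obama", "YOUR_API_KEY")
```

Comparing the returned name and description with the statements in your own article is one way to see what is stated for the same entity; `fetch_entity` is left uncalled here because it needs network access and a real key.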
  • 31. Information Literacy - Cues What makes you trustworthy? • The research that Google cites mentions “6 Cues for snap judgments about whom to trust”. • These involve “images”, “brands”, “headlines – tonality”, “social cues”, “sponsors”, and “interactivity”. • Google works with MediaWise to perform surveys and integrate the findings into its own algorithms. • Create your own “audit templates” for news articles for these 6 different verticals. Mark up “MediaWise”.
  • 32. Information Literacy – About this source Why does your opinion matter? • The story of “Web Answers” is too long. • Context-terms, Topical Entries, Candidate Answer Passages, Context-scoring for Candidate Answer Passages, and many more concepts… • Google Product Manager calls these “word callouts”. • Search Engine Engineers call them “representative answer”. • Learn NLP. Scoring Candidate Answer Passages
  • 33. Some Google Designs Machine learning to identify opinions in documents •Identifying opinionated portions in documents •Relating opinionated portions inside the document and/or across other documents (e.g., that relate to the same story) •To surface opinionated snippets or quotes to users of a news aggregation. •To identify portions of a document that convey opinion. •Google might rank a source for “report”, but not for “opinion”. Understand which vertical has a higher chance for your web source.
  • 34. Some Google Designs System and method for supporting editorial opinion in the ranking of search results “Editorial opinion” without “distorting facts” helps you for ranking. Especially for “first-person” experience stories, or reviews.
  • 35. Some Google Designs Embedded communication of link information “Information in the improved link tags may allow one or more publishers of content and/or documents to convey opinions about content and/or documents at one or more content locations and/or one or more document locations. The link tags may also allow one or more publishers to convey a weighting of the relative importance of one or more content locations and/or one or more document locations. In some embodiment, at least a portion of the information in the improved link tags may be encrypted, to allow one or more publishers to restrict the audience that may view the information in the link tags….. The improved link tags may allow the publishers to communicate additional information, such as opinions, about the content locations and/or document locations.” Categorize boilerplate/main content links according to their context. “Joe Biden and Congress” might have a different “block-link” than “Joe Biden and Elections”.
  • 36. Some Google Designs Aspect-Based Sentiment Summarization Use “key-points” with “sentiments” to summarize essence of news stories.
  • 37. Topicality and Context Filters Long and Short Term Solutions for SERP Construction in News Vertical Short-term Solutions for News Search Engines: • Classify authoritative sources (PageRank, Article Count, Unique Sentence Count, Publication Frequency, Length, Citations, Search Behaviors). • Rank authoritative sources for different topics. • Classify and rank news web pages according to their context and topicality. • Serve the most relevant news articles based on trust and confidence. Long-term Solutions for News Search Engines: • Process text. • Understand facts. • Audit accuracy and comprehensiveness. • Filter the sources by re-assigning topical relevance and authority.
  • 38. Samples from News SEO with Factoids NaturalNews
  • 39. Samples from News SEO with Factoids NaturalNews
  • 40. Samples from News SEO with Factoids Powerofpositivity
  • 41. Samples from News SEO with Factoids Powerofpositivity
  • 42. Samples from News SEO with Factoids RealClearPolitics
  • 43. Samples from News SEO with Factoids BREITBART
  • 45. What would you do if you were Google? Which opinions should rank?
  • 46. What would you do if you were Google? Which opinions should rank?
  • 47. What would you do if you were Google? Which opinions should rank?
  • 48. What would you do if you were Google? Which opinions should rank?
  • 49. What would you do if you were Google? Which opinions should rank?