Watson System

By: Devendra Chaplot, Priyank Chhipa, Pratik Kumar
What Computers Find Easier

• (ln(12,546,798 × π))^2 / 34,567.46 ≈ 0.00885
What Computers Find Easier

Select Payment where Owner = "David Jones" and Type(Product) = "Laptop"

Owner: David Jones    Serial Number: 45322190-AK    Type: Laptop
Serial Number: 45322190-AK    Invoice #: INV10895
Invoice #: INV10895    Vendor: MyBuy    Payment: $104.56

Exact symbol matching is easy:
"David Jones" = "David Jones"
"Dave Jones" ≠ "David Jones"
What Computers Find Hard

Computer programs are natively explicit, fast, and exacting in their calculation over numbers and symbols… but natural language is implicit, highly contextual, ambiguous, and often imprecise.

• Where was X born?
  – Structured: Person: A. Einstein — Birth Place: Ulm
  – Unstructured: "One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace."

• X ran this?
  – Structured: Person: J. Welch — Organization: GE
  – Unstructured: "If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE."
A Grand Challenge Opportunity

 Capture the imagination
– The next Deep Blue

 Engage the scientific community
– Envision new ways for computers to impact society & science
– Drive important and measurable scientific advances

 Be relevant to important problems
– Enable better, faster decision making over unstructured and structured content
– Business intelligence, knowledge discovery and management, government, compliance, publishing, legal, healthcare, business integrity, customer relationship management, web self-service, product support, etc.
Real Language is Real Hard

• Chess
– A finite, mathematically well-defined search space
– Limited number of moves and states
– Grounded in explicit, unambiguous mathematical rules

• Human Language
– Ambiguous, contextual and implicit
– Grounded only in human cognition
– Seemingly infinite number of ways to express the same meaning
Automatic Open-Domain Question Answering

A long-standing challenge in Artificial Intelligence: to emulate human expertise

 Given
– Rich natural language questions
– Over a broad domain of knowledge

 Deliver
– Precise answers: determine what is being asked & give a precise response
– Accurate confidences: determine the likelihood an answer is correct
– Consumable justifications: explain why the answer is right
– Fast response time: precision & confidence in under 3 seconds
You may have heard of IBM's Watson…

A. What is the computer system that played against human opponents on "Jeopardy!"… and won.

Why Jeopardy?

The game of Jeopardy! makes great demands on its players – from the range of topical knowledge covered to the nuances in language employed in the clues. The question IBM had for itself was: "Is it possible to build a computer system that could process big data and come up with sensible answers in seconds – so well that it could compete with human opponents?"
Some Basic Jeopardy! Clues

The type of thing being asked for is often indicated, but can range from specific to very vague.

• This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938
  – Category: ENDS IN "TH"
  – Answer: coelacanth

• When hit by electrons, a phosphor gives off electromagnetic energy in this form
  – Category: General Science
  – Answer: light (or photons)

• Secy. Chase just submitted this to me for the third time – guess what, pal. This time I'm accepting it
  – Category: Lincoln Blogs
  – Answer: his resignation
Lexical Answer Type

• We define a LAT to be a word in the clue that indicates the type of the answer, independent of assigning semantics to that word. For example, in the following clue the LAT is the string "maneuver."
  – Category: Oooh….Chess
  – Clue: Invented in the 1500s to speed up the game, this maneuver involves two pieces of the same color.
  – Answer: Castling
Lexical Answer Type

• About 12 percent of the clues do not indicate an explicit lexical answer type, but may refer to the answer with pronouns like "it," "these," or "this," or not refer to it at all. In these cases the type of answer must be inferred from the context. Here's an example:
  – Category: Decorating
  – Clue: Though it sounds "harsh," it's just embroidery, often in a floral pattern, done with yarn on cotton cloth.
  – Answer: crewel
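The simplest LAT cases above – a noun following a demonstrative like "this" – can be illustrated with a toy detector. This is a hedged sketch, not Watson's implementation: Watson used full parsing and a machine-learned confidence model, while this regex only handles the easy pattern.

```python
import re

def detect_lat(clue: str):
    """Return the noun following 'this'/'these' as a naive LAT guess."""
    match = re.search(r"\b(?:this|these)\s+([a-z]+)", clue.lower())
    return match.group(1) if match else None

clue = ("Invented in the 1500s to speed up the game, this maneuver "
        "involves two pieces of the same color.")
print(detect_lat(clue))  # -> "maneuver"
```

A clue with no demonstrative (like the "crewel" example's category-only typing) returns None, which is exactly the ~12% of cases where the type must be inferred from context.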
How we convert data into knowledge for Watson's use

Three types of knowledge:

1. Domain data (articles, books, documents)
   – Converted to indices for search/passage lookup
   – Redirects extracted for disambiguation
   – Frame cuts generated with frequencies to determine likely context
   – Pseudo-docs extracted for candidate answer generation

2. Training and test question sets with answer keys
   – Used to create the logistic regression model that Watson uses for merging scores

3. NLP resources (vocabularies, taxonomies, ontologies)
   – Named entity detection and relationship detection algorithms
   – Custom slot grammar parsers, Prolog rules for semantic analysis
Machine learning

• One of the core components of the system
  – Multiple models
  – 14,000+ training questions

• Every candidate answer gets hundreds of features/scores associated with it. These features/scores are passed through a previously trained ML model for candidate answer scoring.
• It's not just one model: there is a chain of models, each subsequent one utilizing scores produced by previously run models.
• Machine learning is also used in other parts of the system, such as LAT confidence analysis.
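The chain-of-models idea can be sketched as follows. This is illustrative only, not Watson's trained models: the feature names (`lat_match`, `keyword_overlap`, `passage_support`) and the weights are invented, and each stage is a plain logistic function whose output feeds the next stage as a feature.

```python
import math

def logistic_score(features, weights, bias=0.0):
    """Logistic model: weighted sum of features squashed to (0, 1)."""
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-picked weights for two chained stages.
stage1_weights = {"lat_match": 2.0, "keyword_overlap": 0.5}
stage2_weights = {"stage1_score": 3.0, "passage_support": 1.0}

candidate = {"lat_match": 1.0, "keyword_overlap": 0.4}
s1 = logistic_score(candidate, stage1_weights)
# Stage 2 consumes stage 1's output as one of its input features.
s2 = logistic_score({**candidate, "stage1_score": s1, "passage_support": 0.8},
                    stage2_weights)
```

In the real system each stage had hundreds of features and was trained on the 14,000+ question set; the chaining structure is the point here.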
NLP

• Used in many places (question analysis, evidence analysis, content pre-processing)

• Combines both rule-based and statistical approaches
• Full NLP stack (used in QA)
  – Tokenization
  – Named Entity Recognition
  – Deep parsing and Predicate Argument Structure creation
  – Lexical Answer Type (LAT) and Focus detection
  – Anaphora resolution
  – Semantic relationship extraction

• Various technologies and techniques are used (English Slot Grammar parser, R2 NED, machine learning for LAT confidence analysis, custom annotators written in Prolog and Java)
NLP Examples

• LAT and Focus
  – It's the Peter Benchley novel about a killer giant squid that menaces the coast of Bermuda

• Named Entity Recognition
  – It's the {Person::Peter Benchley} novel about a killer giant {Animal::squid} that menaces the {Location::coast of Bermuda}

• Anaphora Resolution
  – Columbus embarked on his first voyage to this continent in 1492. In the next two decades he led three more expeditions there.
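The {Type::Mention} notation above can be reproduced with a toy dictionary-based tagger. This is a far simpler stand-in for Watson's statistical NER (the entity dictionary here is invented for this one clue), but it shows the annotation format the slide uses.

```python
# Hypothetical entity dictionary, just for this clue.
ENTITY_DICT = {
    "Peter Benchley": "Person",
    "squid": "Animal",
    "coast of Bermuda": "Location",
}

def tag_entities(text: str) -> str:
    """Wrap each known mention in the {Type::Mention} notation."""
    for mention, etype in ENTITY_DICT.items():
        text = text.replace(mention, f"{{{etype}::{mention}}}")
    return text

clue = ("It's the Peter Benchley novel about a killer giant squid "
        "that menaces the coast of Bermuda")
print(tag_entities(clue))
```

Real NER must handle unseen mentions and ambiguity (is "Bermuda" a place or a shorts style?), which is why Watson combined statistical models with rules rather than lookup.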
NLP in evidence analysis and content pre-processing

• Why do NLP on evidence passages and ingested content?

• NLP in evidence analysis allows:
  – LAT-based scoring
  – Named-entity-alignment based scoring

• NLP in content pre-processing
  – Extracting and accumulating "knowledge" frames from the content
    • For instance, SVO frame cuts contain frequencies of Subject-Verb-Object occurrences in the content that Watson has ingested, e.g. squid menaces coast 809
  – These "knowledge" frames are then used to generate candidate answers
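Building SVO frame cuts amounts to counting subject-verb-object triples across the corpus. A minimal sketch, assuming the triples have already been extracted by a parser (which is the genuinely hard part Watson's slot grammar handled):

```python
from collections import Counter

def count_frames(triples):
    """Aggregate parsed (subject, verb, object) triples into frame counts."""
    return Counter(triples)

# Toy pre-parsed triples; real counts came from the full ingested corpus.
triples = [
    ("squid", "menaces", "coast"),
    ("squid", "menaces", "coast"),
    ("officials", "submit", "resignations"),
]
frames = count_frames(triples)
print(frames[("squid", "menaces", "coast")])  # -> 2
```

A high count for a frame like "squid menaces coast" is then evidence when generating or scoring candidate answers for a clue mentioning something menacing a coast.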
Broad Domain

We do NOT attempt to anticipate all questions and build databases. We do NOT try to build a formal model of the world.

[Chart: distribution of lexical answer types.] In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs <3% of the time, and the distribution has a very long tail (types such as "film", "capital", "song", "composer", "planet", "disease", …). For each of these types, thousands of different things may be asked. Even going for the head of the tail will barely make a dent.

*13% are non-distinct (e.g., it, this, these, or NA)

Our focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.
Automatic Learning for "Reading"

Volumes of Text → Syntactic Frames → Semantic Frames

– Inventors patent inventions (0.8)
– Officials submit resignations (0.7)
– People earn degrees at schools (0.9)
– Fluid is a liquid (0.6)
– Liquid is a fluid (0.5)
– Vessels sink (0.7)
– People sink 8-balls (0.5) (in pool: 0.8)
Evaluating Possibilities and Their Evidence

In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.

Candidate answers: Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood, …

Many candidate answers (CAs) are generated from many different searches. Each possibility is evaluated according to different dimensions of evidence. Just one piece of evidence is whether the CA is of the right type – in this case, a "liquid".

Is("Cytoplasm", "liquid") = 0.2
Is("organelle", "liquid") = 0.1
Is("vacuole", "liquid") = 0.2
Is("plasma", "liquid") = 0.7

"Cytoplasm is a fluid surrounding the nucleus…"
WordNet → Is_a(Fluid, Liquid) → ?
Learned → Is_a(Fluid, Liquid) → yes
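The type coercion step above – deciding whether "cytoplasm" is a "liquid" via an intermediate "fluid" link – can be sketched as reachability over an is-a graph. The graph below is invented for illustration; Watson combined WordNet with learned is-a relations (the "Learned → Is_a(Fluid, Liquid) → yes" step) and produced graded scores, not booleans.

```python
# Tiny hand-made is-a graph: child -> set of parents.
ISA = {
    "cytoplasm": {"fluid"},
    "plasma": {"liquid", "fluid"},
    "fluid": {"liquid"},  # the learned link: a fluid is a liquid
}

def is_type(candidate: str, lat: str, seen=None) -> bool:
    """Is `lat` reachable from `candidate` by following is-a edges?"""
    seen = seen if seen is not None else set()
    if candidate in seen:
        return False  # guard against cycles
    seen.add(candidate)
    parents = ISA.get(candidate, set())
    return lat in parents or any(is_type(p, lat, seen) for p in parents)

print(is_type("cytoplasm", "liquid"))     # -> True (cytoplasm -> fluid -> liquid)
print(is_type("mitochondria", "liquid"))  # -> False
```

The learned Fluid→Liquid edge is what lifts "cytoplasm" from untypeable to a plausible liquid, mirroring the slide's example.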
Different Types of Evidence: Keyword Evidence

Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.

Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.

Keyword matching aligns: "In May", "celebrated", "anniversary", "Portugal", "arrival in" / "arrived in", "India".

Evidence suggests "Gary" is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
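A toy keyword-overlap scorer makes the slide's point concrete: it is exactly the weak baseline that happily supports the wrong answer "Gary". The stopword list and scoring scheme here are invented simplifications.

```python
STOPWORDS = {"in", "the", "of", "this", "after", "he", "his", "a", "to"}

def keywords(text: str):
    """Lowercased content words, minus punctuation and stopwords."""
    return {w.strip(".,'").lower() for w in text.split()} - STOPWORDS

def keyword_score(clue: str, passage: str) -> float:
    """Fraction of the clue's content words found in the passage."""
    clue_kw = keywords(clue)
    return len(clue_kw & keywords(passage)) / len(clue_kw)

clue = ("In May 1898 Portugal celebrated the 400th anniversary of this "
        "explorer's arrival in India.")
passage = ("In May, Gary arrived in India after he celebrated his "
           "anniversary in Portugal.")
print(round(keyword_score(clue, passage), 2))
```

The passage scores highly despite getting the dates, the geography, and the answer completely wrong, which is why the learned models must down-weight keyword evidence relative to the deeper evidence on the next slide.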
Different Types of Evidence: Deeper Evidence

Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.

Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.

Search far and wide, explore many hypotheses, find and judge evidence with many inference algorithms:
– Temporal reasoning (date math): "May 1898" + "400th anniversary" ↔ "27th May 1498"
– Statistical paraphrasing: "celebrated the … arrival in" ↔ "landed in"
– Geospatial reasoning (Geo-KB): "India" ↔ "Kappad Beach"

This supports "Vasco da Gama" as the explorer. Stronger evidence can be much harder to find and score, and the evidence is still not 100% certain.
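The temporal reasoning step on this slide reduces to simple date math: does a "400th anniversary" celebrated in 1898 line up with an event in 1498? A minimal sketch (the real component handled partial dates, ranges, and calendar quirks):

```python
def anniversary_matches(event_year: int, celebration_year: int, nth: int) -> bool:
    """Check that celebration_year is exactly nth years after event_year."""
    return celebration_year - event_year == nth

print(anniversary_matches(1498, 1898, 400))  # -> True
```

This single check is what lets the 1498 passage outscore the keyword-heavy "Gary" passage, which mentions no year at all.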
DeepQA
The technology & architecture behind Watson
The Difference Between Search & DeepQA

Search engine:
– Decision maker has a question
– Distills it to 2-3 keywords
– Engine finds documents containing the keywords
– Delivers documents based on popularity
– Decision maker reads the documents and finds the answers

Expert (DeepQA):
– Decision maker asks a natural-language question
– Expert understands the question
– Produces possible answers & evidence
– Finds & analyzes evidence, computes confidence
– Delivers response, evidence & confidence
– Decision maker considers the answer & evidence
DeepQA: the technology & architecture behind Watson

Initial Question → Question & Topic Analysis → Question Decomposition

For each sub-question, in parallel:
Hypothesis Generation (Primary Search over Answer Sources → Candidate Answer Generation) →
Hypothesis & Evidence Scoring (Answer Scoring; Evidence Retrieval from Evidence Sources; Deep Evidence Scoring)

Then: Synthesis → Final Confidence Merging & Ranking (learned models help combine and weigh the evidence) → Answer & Confidence
DeepQA: the technology & architecture behind Watson

1. Initial question formulated: "The name of this monetary unit comes from the word for 'round'; earlier coins were often oval."
2. Watson performs question & topic analysis and determines what is being asked.
3. It decides whether the question needs to be subdivided (question decomposition).
DeepQA: the technology & architecture behind Watson

4. Watson then starts to generate hypotheses based on the decomposition and initial analysis – as many hypotheses as may be relevant to the initial question.
5. In creating these hypotheses, Watson consults numerous answer sources for potential answers.
7

Evidence Sources

Answer Sources
Initial
Question

Question
& Topic
Analysis

Primary
Search

Question
Decomposition

6

Candidate
Answer
Generation

Hypothesis
Generation

Watson then uses
algorithms to “score”
each potential
answer and assign a
confidence to that
answer…

Answer
Scoring

Evidence
Retrieval

Hypothesis
& Evidence
Scoring

Deep
Evidence
Scoring

Synthesis

Hypothesis and
Evidence Scoring

Hypothesis and Evidence
Scoring

Watson uses
Evidence
Sources to
validate it’s
hypothesis and
help score the
potential
answers
If the question
was
decomposed, W
atson brings
together
hypotheses
from sub-parts

8
DeepQA: the technology & architecture behind Watson

9. Using learned models on the merged hypotheses, Watson can weigh evidence based on prior "experiences".
10. Once Watson has ranked its answers, it provides them along with the confidence it has in each answer.
Step 0: Content Acquisition
• Content acquisition is a combination
of manual and automatic steps.
• The first step is to analyze example questions
from the problem space to produce a
description of the kinds of questions that must
be answered and a characterization of the
application domain.
• Analyzing example questions is primarily a
manual task, while domain analysis may be
informed by automatic or statistical
analyses, such as the LAT analysis.
Step 1: Question Analysis

The system attempts to understand what the question is asking and performs the initial analyses that determine how the question will be processed by the rest of the system.
• Question classification, e.g. puzzle/math
• Focus and Lexical Answer Type (LAT), e.g. "On this day" → LAT: date/day
• Relation detection, e.g. sea(India, x, west)
• Decomposition – divide and conquer
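The relation-detection output above, sea(India, x, west), can be illustrated with a hypothetical pattern-based detector. Watson derived such predicates from parse trees and Prolog rules; the single regex pattern and the tuple format below are invented for illustration.

```python
import re

def detect_relation(question: str):
    """Extract a sea(Country, x, direction) predicate from one toy pattern."""
    m = re.search(r"this sea lies to the (\w+) of (\w+)", question.lower())
    if m:
        direction, country = m.group(1), m.group(2)
        return ("sea", country.capitalize(), "x", direction)
    return None

print(detect_relation("This sea lies to the west of India"))
# -> ('sea', 'India', 'x', 'west')
```

Once a question is reduced to a predicate like this, structured geography sources can be queried directly, alongside the text-based candidate generation.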
Step 2: Hypothesis Generation

1. Primary search:
   – Keyword-based search over the answer sources
   – Top 250 results are considered for candidate answer (CA) generation
   – Empirical statistics: 85% of the time the answer is within the top 250 results

2. CA generation: generates CAs using the results of primary search.

3. Soft filtering:
   – Lightweight (less resource-intensive) scoring algorithms prune the larger set of initial candidates down to a smaller set
   – Reduces the number of CAs to approximately 100
   – Answers are not fully discarded; they may be reconsidered at the final stage
Step 2: Hypothesis Generation (continued)

4. Each CA plugged back into the question is considered a hypothesis, which the system has to prove correct with some threshold of confidence.
5. If the correct answer is not generated at this stage, the system has no hope of answering the question whatsoever.
   – Noise tolerance: tolerate noise in the early stages of the pipeline and drive up precision downstream
   – Favors recall over precision, with the expectation that the rest of the processing pipeline will tease out the correct answer, even if the set of candidates is quite large
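Soft filtering as described above – cheap scores prune candidates but never delete them – can be sketched like this. The candidates and scores are invented; the point is that the pruned-away candidates are parked, not discarded, so they can be reconsidered at the final stage.

```python
def soft_filter(candidates, k=3):
    """Keep top-k candidates by a cheap score; park the rest, don't delete."""
    ranked = sorted(candidates, key=lambda c: c["cheap_score"], reverse=True)
    return ranked[:k], ranked[k:]

cands = [{"answer": a, "cheap_score": s}
         for a, s in [("yen", 0.9), ("ruble", 0.4), ("peso", 0.7),
                      ("dinar", 0.2), ("krona", 0.6)]]
kept, parked = soft_filter(cands, k=3)
print([c["answer"] for c in kept])  # -> ['yen', 'peso', 'krona']
```

In the real system this step cut hundreds of candidates down to roughly 100 before the expensive deep evidence scoring ran.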
Step 3: Hypothesis & Evidence Scoring

Candidate answers that pass the soft filtering threshold undergo a rigorous evaluation process that involves two steps:

1. Evidence retrieval:
   – Gathers additional supporting evidence for each candidate answer, or hypothesis
   – e.g. passage search: gathering passages by adding the CA to the primary search query

2. Scoring:
   – Deep content analysis – includes many different components, or scorers, that consider different dimensions of the evidence
   – Produces a score that corresponds to how well the evidence supports a candidate answer for a given question
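The passage-search trick in step 1 is simply query expansion: append the candidate answer to the primary search query so that retrieved passages must relate the clue terms and the candidate together. A minimal sketch, with the search backend itself assumed:

```python
def passage_query(clue_keywords, candidate: str) -> str:
    """Build an evidence-retrieval query: clue keywords plus the candidate."""
    return " ".join(clue_keywords + [candidate])

q = passage_query(["Portugal", "400th", "anniversary", "explorer", "India"],
                  "Vasco da Gama")
print(q)  # -> "Portugal 400th anniversary explorer India Vasco da Gama"
```

Passages returned for this combined query are then handed to the many scorers, each judging a different dimension of how well the passage supports the candidate.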
Step 4: Final Merging and Ranking

• Merging:
  – Multiple candidate answers for a question may be equivalent despite very different surface forms.
  – Using an ensemble of matching, normalization and co-reference resolution algorithms, Watson identifies equivalent and related hypotheses.
  – Without merging, ranking algorithms would be comparing multiple surface forms that represent the same answer and trying to discriminate among them.
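A toy version of the merging step: normalize surface forms via an alias table and combine scores of equivalent candidates, so ranking never pits "JFK" against "John F. Kennedy" as rivals. The alias table and max-score merge policy are invented simplifications of Watson's matching/normalization/co-reference ensemble.

```python
# Hypothetical alias table for illustration.
ALIASES = {"jfk": "john f. kennedy", "j.f.k.": "john f. kennedy"}

def normalize(ans: str) -> str:
    key = ans.strip().lower()
    return ALIASES.get(key, key)

def merge(candidates):
    """Merge (answer, score) pairs whose normalized forms coincide."""
    merged = {}
    for ans, score in candidates:
        key = normalize(ans)
        merged[key] = max(merged.get(key, 0.0), score)  # keep the best score
    return merged

print(merge([("JFK", 0.4), ("John F. Kennedy", 0.7), ("Lincoln", 0.2)]))
# -> {'john f. kennedy': 0.7, 'lincoln': 0.2}
```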
Step 4: Final Merging and Ranking (continued)

• Ranking and confidence estimation:
  – After merging, the system must rank the hypotheses and estimate confidence based on their merged scores.
  – The ranking models are trained over a set of training questions with known answers.
  – Watson's metalearner uses multiple trained models to handle different question classes: for instance, certain scores that may be crucial to identifying the correct answer for a factoid question may not be as useful on puzzle questions.
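The per-class metalearner idea can be sketched as routing: the question class selects which trained model ranks the merged hypotheses. The "models" below are stand-in weight dictionaries and the feature names are invented; note how a wordplay-heavy feature matters for the puzzle model but not the factoid one, as the text describes.

```python
# Hypothetical per-class weight vectors standing in for trained models.
MODELS = {
    "factoid": {"type_match": 2.0, "passage_support": 1.0},
    "puzzle": {"type_match": 0.5, "wordplay": 2.5},
}

def rank(question_class, hypotheses):
    """Rank hypotheses with the model selected for this question class."""
    weights = MODELS[question_class]
    def score(h):
        return sum(weights.get(f, 0.0) * v for f, v in h["features"].items())
    return sorted(hypotheses, key=score, reverse=True)

hyps = [
    {"answer": "crewel", "features": {"wordplay": 0.9, "type_match": 0.3}},
    {"answer": "cruel", "features": {"wordplay": 0.2, "type_match": 0.8}},
]
print(rank("puzzle", hyps)[0]["answer"])  # -> "crewel"
```

The same two hypotheses rank differently under the factoid model, which is precisely why one global model would underperform across question classes.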
The Final Blow! (ctd.)

"I for one welcome our new computer overlords." – Ken Jennings
Watson – a Workload Optimized System

• 90 × IBM Power 750¹ servers; 10 racks including servers, networking, shared disk system, and cluster controllers
• 2,880 POWER7 cores; POWER7 3.55 GHz chip
• 500 GB per second on-chip bandwidth
• 10 Gb Ethernet network
• 15 terabytes of memory
• 20 terabytes of disk, clustered
• Can operate at 80 teraflops
• Runs IBM DeepQA software
• Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open-source components
• Linux provides a scalable, open platform, optimized to exploit POWER7 performance

¹ Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010.
Watson: Precision, Confidence & Speed

• Deep Analytics – We achieved champion levels of precision and confidence over a huge variety of expression.

• Speed – By optimizing Watson's computation for Jeopardy! on 2,880 POWER7 processing cores, we went from 2 hours per question on a single CPU to an average of just 3 seconds – fast enough to compete with the best.

• Results – In 55 real-time sparring games against former Tournament of Champions players last year, Watson put on a very competitive performance, winning 71%. In the final Exhibition Match against Ken Jennings and Brad Rutter, Watson won!
Potential Business Applications

Healthcare Analytics
• Analyzing: e-medical records, hospital reports
• For: clinical analysis; treatment protocol optimization
• Benefits: better management of chronic diseases; optimized drug formularies; improved patient outcomes

Social Media for Marketing
• Analyzing: call center logs, emails, online media
• For: buyer behavior, churn prediction
• Benefits: improved customer satisfaction and retention, better marketing campaigns, new revenue opportunities

Crime Analytics
• Analyzing: case files, police records, 911 calls…
• For: rapid crime solving & crime trend analysis
• Benefits: safer communities & optimized force deployment

Insurance Fraud
• Analyzing: insurance claims
• For: detecting fraudulent activity & patterns
• Benefits: reduced losses, faster detection, more efficient claims processes

Automotive Quality Insight
• Analyzing: tech notes, call logs, online media
• For: warranty analysis, quality assurance
• Benefits: reduced warranty costs, improved customer satisfaction, better marketing campaigns

Customer Care
• Analyzing: call center notes, SharePoint, multiple content repositories
• For: churn prediction, product/brand quality
• Benefits: improved consumer satisfaction, better marketing campaigns, new revenue opportunities or product/brand quality issues
References

• Ferrucci, David, et al. "Building Watson: An Overview of the DeepQA Project." AI Magazine 31.3 (2010): 59-79.
• Watson Systems: http://www-03.ibm.com/innovation/us/watson/
• Wikipedia: http://en.wikipedia.org/wiki/Watson_%28computer%2
• "Building Watson: A Brief Overview of the DeepQA Project" by Joel Farrell, IBM: http://www.medbiq.org/sites/default/files/presentations/2011/Farrell.ppt
References

• "What is Watson, really?" – Keyur Dalal (IBM), Vladimir Stemkovski (IBM) and Jeff Sumner (IBM): http://www-01.ibm.com/software/ebusiness/jstart/downloads/IOD2011.ppt
• Jeopardy! IBM Watson Day 1 (Feb 14, 2011): http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related
• Science Behind an Answer: http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html
  – Video: http://youtu.be/DywO4zksfXw
Questions?
Thank You!

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Watson System

  • 1. Watson System By : Devendra Chaplot Priyank Chhipa Pratik Kumar
  • 2. What Computers Find Easier • (ln(12,546,798 * π)) ^ 2 / 34,567.46 ≈ 0.00885
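The slide's point can be reproduced in one line of code: arithmetic over numbers is trivial for a computer. A minimal sketch (square of the natural log, then divide, which is the reading that yields the slide's 0.00885):

```python
import math

# The slide's computation: natural log of 12,546,798 * pi,
# squared, divided by 34,567.46.
x = 12_546_798
result = math.log(x * math.pi) ** 2 / 34_567.46
print(round(result, 5))  # 0.00885
```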
  • 3. What Computers Find Easier • Select Payment where Owner="David Jones" and Type(Product)="Laptop" • Traversing structured tables is easy: Owner "David Jones" → Serial Number 45322190-AK → Invoice # INV10895 → Payment $104.56 (Vendor: MyBuy) • Exact string matching is easy too: "David Jones" = "David Jones", but "Dave Jones" ≠ "David Jones"
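A tiny sketch of the slide's string-matching point: character-by-character comparison succeeds on identical strings but treats "Dave Jones" and "David Jones" as unrelated, because the computer has no background knowledge that Dave is a nickname for David:

```python
# Exact string comparison: easy for computers, but brittle.
owner_on_record = "David Jones"

print(owner_on_record == "David Jones")  # True: letters match exactly
print(owner_on_record == "Dave Jones")   # False: no nickname knowledge
```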
  • 4. What Computers Find Hard • Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols… but natural language is implicit, highly contextual, ambiguous and often imprecise. • "Where was X born?" – Structured: Person: A. Einstein, Birth Place: Ulm. Unstructured: "One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace." • "X ran this?" – Structured: Person: J. Welch, Organization: GE. Unstructured: "If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE."
  • 5. A Grand Challenge Opportunity • Capture the imagination – The Next Deep Blue • Engage the scientific community – Envision new ways for computers to impact society & science – Drive important and measurable scientific advances • Be Relevant to Important Problems – Enable better, faster decision making over unstructured and structured content – Business Intelligence, Knowledge Discovery and Management, Government, Compliance, Publishing, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support, etc.
  • 6. Real Language is Real Hard • Chess – A finite, mathematically well-defined search space – Limited number of moves and states – Grounded in explicit, unambiguous mathematical rules • Human Language – Ambiguous, contextual and implicit – Grounded only in human cognition – Seemingly infinite number of ways to express the same meaning
  • 7. Automatic Open-Domain Question Answering • A long-standing challenge in Artificial Intelligence: to emulate human expertise • Given – Rich natural language questions – Over a broad domain of knowledge • Deliver – Precise Answers: determine what is being asked & give a precise response – Accurate Confidences: determine the likelihood the answer is correct – Consumable Justifications: explain why the answer is right – Fast Response Time: precision & confidence in <3 seconds
  • 8. You may have heard of IBM's Watson… A. What is the computer system that played against human opponents on "Jeopardy!"… and won. Why Jeopardy? The game of Jeopardy! makes great demands on its players – from the range of topical knowledge covered to the nuances in language employed in the clues. The question IBM had for itself was: "Is it possible to build a computer system that could process big data and come up with sensible answers in seconds – so well that it could compete with human opponents?"
  • 9. Some Basic Jeopardy! Clues • This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938 – Category: ENDS IN "TH" – Answer: coelacanth • The type of thing being asked for is often indicated but can go from specific to very vague • When hit by electrons, a phosphor gives off electromagnetic energy in this form – Category: General Science – Answer: light (or photons) • Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it – Category: Lincoln Blogs – Answer: his resignation
  • 10. Lexical Answer Type • We define a LAT to be a word in the clue that indicates the type of the answer, independent of assigning semantics to that word. For example in the following clue, the LAT is the string “maneuver.” – Category: Oooh….Chess – Clue: Invented in the 1500s to speed up the game, this maneuver involves two pieces of the same color. – Answer: Castling
  • 11. Lexical Answer Type • About 12 percent of the clues do not indicate an explicit lexical answer type but may refer to the answer with pronouns like “it,” “these,” or “this” or not refer to it at all. In these cases the type of answer must be inferred by the context. Here’s an example: – Category: Decorating – Clue: Though it sounds “harsh,” it’s just embroidery, often in a floral pattern, done with yarn on cotton cloth. – Answer: crewel
  • 12. How we convert data into knowledge for Watson's use • Three types of knowledge: • Domain Data (articles, books, documents) – Converted to indices for search/passage lookup – Redirects extracted for disambiguation – Frame cuts generated with frequencies to determine likely context – Pseudo docs extracted for candidate answer generation • Training and test question sets w/answer keys – Used to create the logistic regression model that Watson uses for merging scores • NLP Resources (vocabularies, taxonomies, ontologies) – Named entity detection, relationship detection algorithms – Custom slot grammar parsers, Prolog rules for semantic analysis
  • 13. Machine learning • One of the core components of the system – Multiple models – 14,000+ training questions • Every candidate answer gets hundreds of features/scores associated with it. These features/scores are passed through a previously trained ML model for candidate answer scoring • It's not just one model. In fact there is a chain of models; each subsequent one utilizes scores produced by previously run models • Machine learning is also used in other parts of the system, such as LAT confidence analysis.
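The chained-models idea above can be sketched in a few lines. This is a toy illustration, not Watson's actual code: each candidate answer carries a feature vector, and each "model" in the chain (here just a hypothetical weighted sum standing in for a trained model) appends its output score as a new feature for the next model; the final score is used for ranking.

```python
# Toy sketch of chained scoring models (illustration only).
def make_linear_model(weights):
    """A stand-in 'model': a weighted sum over the features seen so far."""
    def model(features):
        return sum(w * f for w, f in zip(weights, features))
    return model

def run_chain(features, models):
    # Each model's score becomes an extra feature for the next model.
    for model in models:
        features = features + [model(features)]
    return features[-1]  # final score used for ranking

# Two hypothetical candidates with [type-match, passage-support] features.
chain = [make_linear_model([0.7, 0.3]),
         make_linear_model([0.2, 0.2, 0.6])]
good = run_chain([0.9, 0.8], chain)
bad = run_chain([0.1, 0.2], chain)
print(good > bad)  # the better-evidenced candidate ranks higher
```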
  • 14. NLP • Used in many places (Question Analysis, Evidence Analysis, Content Pre-processing) • Combines both rule and statistic based approaches • Full NLP stack (used in QA) – Tokenization – Named Entity Recognition – Deep Parsing and Predicate Argument Structure creation – Lexical Answer Type (LAT) and Focus detection – Anaphora resolution – Semantic Relationships extraction • Various technologies and techniques are used (English Slot Grammar parser, R2 NED, machine learning for LAT confidence analysis, custom annotators written in Prolog and Java)
  • 15. NLP Examples • LAT and Focus – It's the Peter Benchley novel about a killer giant squid that menaces the coast of Bermuda • Named Entity Recognition – It's the {Person::Peter Benchley} novel about a killer giant {Animal::squid} that menaces the {Location::coast of Bermuda} • Anaphora Resolution – Columbus embarked on his first voyage to this continent in 1492. In the next two decades he led three more expeditions there.
  • 16. NLP in evidence analysis and content pre-processing • Why do NLP on evidence passages and ingested content? • NLP in Evidence Analysis allows: – LAT based scoring – Named entities alignment based scoring • NLP in Content Pre-processing – Extracting and accumulating "knowledge" frames from the content • For instance, SVO frame cuts will contain frequencies of Subject-Verb-Object occurrences in the content that Watson has ingested, e.g. "squid menaces coast 809" – These "knowledge" frames are then used to generate candidate answers
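Accumulating SVO frame cuts amounts to counting how often each (subject, verb, object) triple occurs in the ingested text. A minimal sketch, assuming the triples have already been produced by a parser (the triples below are hypothetical examples in the spirit of the slide):

```python
from collections import Counter

# Hypothetical parser output: one (subject, verb, object) triple per clause.
parsed_sentences = [
    ("squid", "menaces", "coast"),
    ("squid", "menaces", "coast"),
    ("official", "submits", "resignation"),
]

# Frame cuts: frequency of each SVO triple across the corpus.
svo_counts = Counter(parsed_sentences)
print(svo_counts[("squid", "menaces", "coast")])  # 2
```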
  • 17. Broad Domain • We do NOT attempt to anticipate all questions and build databases. We do NOT try to build a formal model of the world. • In a random sample of 20,000 questions we found 2,500 distinct answer types*; the most frequent occurs <3% of the time, and the distribution has a very long tail (film, group, capital, woman, song, singer, show, composer, … all the way down to bay). And for each of these types 1000's of different things may be asked, so even going for the head of the tail will barely make a dent. • *13% are non-distinct (e.g. it, this, these, or NA) • Our focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.
  • 18. Automatic Learning for "Reading" Volumes of Text • Syntactic and semantic frames with learned confidences: – Inventors patent inventions (0.8) – Officials submit resignations (0.7) – People earn degrees at schools (0.9) – Fluid is a liquid (0.6) – Liquid is a fluid (0.5) – Vessels sink (0.7) – People sink 8-balls (0.5) (in pool/0.8)
  • 19. Evaluating Possibilities and Their Evidences • Clue: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus. • Many candidate answers (CAs) are generated from many different searches: Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood, … • Each possibility is evaluated according to different dimensions of evidence. Just one piece of evidence is whether the CA is of the right type, in this case a "liquid": Is("Cytoplasm", "liquid") = 0.2; Is("organelle", "liquid") = 0.1; Is("vacuole", "liquid") = 0.2; Is("plasma", "liquid") = 0.7 • "Cytoplasm is a fluid surrounding the nucleus…" Wordnet: Is_a(Fluid, Liquid)? Learned: Is_a(Fluid, Liquid) → yes.
  • 20. Different Types of Evidence: Keyword Evidence • Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India. • Passage: "In May, Gary arrived in India after he celebrated his anniversary in Portugal." • Keyword matching on "In May", "celebrated", "anniversary", "Portugal", "arrival/arrived in" and "India" suggests "Gary" is the answer – BUT the system must learn that keyword matching may be weak relative to other types of evidence
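A toy keyword-overlap scorer makes the slide's warning concrete: the misleading "Gary" passage shares many terms with the clue, so a naive bag-of-words score rates it highly even though it supports the wrong answer. (This scorer is an illustration, not one of Watson's actual algorithms.)

```python
# Naive keyword evidence: fraction of clue words found in the passage.
def keyword_score(question, passage):
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

clue = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India")
misleading = ("In May Gary arrived in India after he celebrated "
              "his anniversary in Portugal")

# High overlap despite supporting the wrong answer ("Gary").
print(keyword_score(clue, misleading))
```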
  • 21. Different Types of Evidence: Deeper Evidence • Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India. • Passage: "On 27th May 1498, Vasco da Gama landed in Kappad Beach." • Search far and wide, explore many hypotheses, find and judge evidence with many inference algorithms: – Temporal reasoning / date math: 400th anniversary of May 1898 ↔ 27th May 1498 – Statistical paraphrasing: "celebrated … arrival in" ↔ "landed in" – Geospatial reasoning / geo-KB: Kappad Beach ↔ India • Stronger evidence can be much harder to find and score, and the evidence is still not 100% certain.
  • 22. DeepQA The technology & architecture behind Watson
  • 23. The Difference Between Search & DeepQA • Search: the decision maker has a question and distills it to 2-3 keywords; the search engine finds documents containing the keywords and delivers documents based on popularity; the decision maker reads the documents and finds the answers. • Expert (DeepQA): the decision maker asks a natural language question; the system understands the question, produces possible answers & evidence, finds and analyzes evidence, computes confidence, and delivers a response with evidence & confidence; the decision maker considers the answer & evidence.
  • 24. DeepQA: the technology & architecture behind Watson • Initial Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search, Candidate Answer Generation, drawing on Answer Sources) → Hypothesis & Evidence Scoring (Answer Scoring, Evidence Retrieval, Deep Evidence Scoring, drawing on Evidence Sources) → Synthesis → Final Confidence Merging & Ranking (learned models help combine and weigh the evidence) → Answer & Confidence
  • 25. DeepQA: the technology & architecture behind Watson • (1) Initial question formulated: "The name of this monetary unit comes from the word for 'round'; earlier coins were often oval" • (2) Watson performs question analysis and determines what is being asked. • (3) It decides whether the question needs to be subdivided (question decomposition).
  • 26. DeepQA: the technology & architecture behind Watson • (4) Watson then starts to generate hypotheses based on decomposition and initial analysis… as many hypotheses as may be relevant to the initial question… • (5) In creating the hypotheses it will use, Watson consults numerous answer sources for potential answers…
  • 27. DeepQA: the technology & architecture behind Watson • (6) Watson then uses algorithms to "score" each potential answer and assign a confidence to that answer… • (7) Watson uses evidence sources to validate its hypotheses and help score the potential answers • (8) If the question was decomposed, Watson brings together hypotheses from sub-parts
  • 28. DeepQA: the technology & architecture behind Watson • (9) Using learned models on the merged hypotheses, Watson can weigh evidence based on prior "experiences" • (10) Once Watson has ranked its answers, it then provides its answers as well as the confidence it has in each answer.
  • 29. DeepQA: the technology & architecture behind Watson • The full pipeline, end to end: Initial Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis & Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence, with learned models helping combine and weigh the evidence.
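The pipeline stages above can be sketched schematically as composed functions. This is a hand-drawn illustration of the data flow, not IBM's code; the search function, scorer, and candidate answers below are all hypothetical stand-ins:

```python
# Schematic DeepQA flow: analysis -> candidates -> evidence scoring -> ranking.
def deepqa(question, search, scorers):
    analysis = {"text": question}                      # question & topic analysis
    candidates = search(analysis)                      # primary search + CA generation
    scored = []
    for ca in candidates:
        evidence = [s(analysis, ca) for s in scorers]  # hypothesis & evidence scoring
        scored.append((ca, sum(evidence) / len(evidence)))
    # Final merging & ranking: highest combined confidence first.
    return sorted(scored, key=lambda t: -t[1])

ranked = deepqa(
    "monetary unit named for 'round'",
    search=lambda a: ["yen", "mark"],                     # hypothetical candidates
    scorers=[lambda a, ca: 0.9 if ca == "yen" else 0.3],  # hypothetical scorer
)
print(ranked[0][0])  # yen
```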
  • 30. Step 0 : Content Acquisition • Content acquisition is a combination of manual and automatic steps. • The first step is to analyze example questions from the problem space to produce a description of the kinds of questions that must be answered and a characterization of the application domain. • Analyzing example questions is primarily a manual task, while domain analysis may be informed by automatic or statistical analyses, such as the LAT analysis.
  • 31. Step 1 : Question Analysis • The system attempts to understand what the question is asking and performs the initial analyses that determine how the question will be processed by the rest of the system. • Question Classification, e.g. puzzle/math • Focus and Lexical Answer Type (LAT), e.g. "On this day" → LAT: date/day • Relation Detection, e.g. sea(India, x, west) • Decomposition – divide and conquer.
  • 32. Step 2 : Hypothesis Generation • 1. Primary search: – Keyword based search – Top 250 results are considered for candidate answer (CA) generation – Empirical statistics: 85% of the time the answer is within the top 250 results • 2. CA generation: generates CAs using the results of the primary search • 3. Soft filtering: – Lightweight (less resource intensive) scoring algorithms prune the larger set of initial candidates down to a smaller set – Reduction in number of CAs to approx. 100 – Answers are not fully discarded; they may be reconsidered at the final stage.
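Soft filtering as described above is essentially a cheap top-k cut. A toy sketch (the candidate pool and the cheap scorer are hypothetical; the key property is that pruned answers are set aside rather than deleted, so they can be reconsidered later):

```python
# Soft filtering: prune a large candidate pool with a lightweight score.
def soft_filter(candidates, cheap_score, keep=100):
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    return ranked[:keep], ranked[keep:]   # (kept, deferred - not discarded)

# 250 primary-search candidates pruned to ~100, as on the slide.
pool = [f"cand{i}" for i in range(250)]
kept, deferred = soft_filter(pool, cheap_score=lambda c: -int(c[4:]), keep=100)
print(len(kept), len(deferred))  # 100 150
```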
  • 33. Step 2 : Hypothesis Generation • 4. Each CA plugged back into the question is considered a hypothesis which the system has to prove correct with some threshold of confidence. • 5. If the correct answer is not generated at this stage, the system has no hope of answering the question whatsoever. – Noise tolerance: tolerate noise in the early stages of the pipeline and drive up precision downstream – Favors recall over precision, with the expectation that the rest of the processing pipeline will tease out the correct answer, even if the set of candidates is quite large
  • 34. Step 3 : Hypothesis & Evidence Scoring • Candidate answers that pass the soft filtering threshold undergo a rigorous evaluation process that involves 2 steps: • 1. Evidence retrieval: gathers additional supporting evidence for each candidate answer, or hypothesis, e.g. passage search: gathering passages by adding the CA to the primary search query. • 2. Scoring: deep content analysis – includes many different components, or scorers, that consider different dimensions of the evidence and produce a score that corresponds to how well the evidence supports a candidate answer for a given question.
  • 35. Step 4 : Final Merging and Ranking • Merging: – Multiple candidate answers for a question may be equivalent despite very different surface forms. – Using an ensemble of matching, normalization and co-reference resolution algorithms, Watson identifies equivalent and related hypotheses. – Without merging, ranking algorithms would be comparing multiple surface forms that represent the same answer and trying to discriminate among them.
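Answer merging can be sketched as mapping surface forms to a canonical form and combining the scores of equivalent candidates. The alias table and the max-combination rule below are illustrative assumptions, not Watson's actual normalization ensemble:

```python
# Toy answer merging: canonicalize surface forms, combine scores.
ALIASES = {"jfk": "john f. kennedy", "kennedy": "john f. kennedy"}

def canonical(answer):
    a = answer.lower().strip()
    return ALIASES.get(a, a)

def merge(scored_answers):
    merged = {}
    for ans, score in scored_answers:
        key = canonical(ans)
        # Combine equivalent candidates (here: keep the best score).
        merged[key] = max(merged.get(key, 0.0), score)
    return merged

result = merge([("JFK", 0.6), ("John F. Kennedy", 0.8), ("Nixon", 0.3)])
print(result["john f. kennedy"])  # 0.8
```

Without this step, "JFK" and "John F. Kennedy" would compete against each other in ranking even though they are the same answer.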
  • 36. Step 4 : Final Merging and Ranking • Ranking and confidence estimation: – After merging, the system must rank the hypotheses and estimate confidence based on their merged scores. – These hypotheses are run over a set of training questions with known answers. – Watson's metalearner uses multiple trained models to handle different question classes since, for instance, certain scores that may be crucial to identifying the correct answer for a factoid question may not be as useful on puzzle questions.
  • 37. The Final Blow! (ctd.) • "I for one welcome our new computer overlords" – Jennings
  • 38. Watson – a Workload Optimized System • 90 x IBM Power 750 servers • 2,880 POWER7 cores • POWER7 3.55 GHz chip • 500 GB per sec on-chip bandwidth • 10 Gb Ethernet network • 15 Terabytes of memory • 20 Terabytes of disk, clustered • Can operate at 80 Teraflops • Runs IBM DeepQA software • Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components • Linux provides a scalable, open platform, optimized to exploit POWER7 performance • 10 racks include servers, networking, shared disk system, cluster controllers • Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010
  • 39. Watson: Precision, Confidence & Speed • Deep Analytics – We achieved champion-levels of precision and confidence over a huge variety of expression • Speed – By optimizing Watson's computation for Jeopardy! on 2,880 POWER7 processing cores we went from 2 hours per question on a single CPU to an average of just 3 seconds – fast enough to compete with the best. • Results – In real-time sparring against former Tournament of Champions players last year, Watson put on a very competitive performance, winning 71%. In the final exhibition match against Ken Jennings and Brad Rutter, Watson won!
  • 40. Potential Business Applications • Healthcare Analytics – Analyzing: e-medical records, hospital reports – For: clinical analysis; treatment protocol optimization – Benefits: better management of chronic diseases; optimized drug formularies; improved patient outcomes • Customer Care – Analyzing: call center logs, emails, online media – For: buyer behavior, churn prediction – Benefits: improve customer satisfaction and retention, marketing campaigns, find new revenue opportunities • Crime Analytics – Analyzing: case files, police records, 911 calls… – For: rapid crime solving & crime trend analysis – Benefits: safer communities & optimized force deployment • Insurance Fraud – Analyzing: insurance claims – For: detecting fraudulent activity & patterns – Benefits: reduced losses, faster detection, more efficient claims processes • Automotive Quality Insight – Analyzing: tech notes, call logs, online media – For: warranty analysis, quality assurance – Benefits: reduce warranty costs, improve customer satisfaction, marketing campaigns • Social Media for Marketing – Analyzing: call center notes, SharePoint, multiple content repositories – For: churn prediction, product/brand quality – Benefits: improve consumer satisfaction, marketing campaigns, find new revenue opportunities or product/brand quality issues
  • 41. References • Ferrucci, David, et al. "Building Watson: An Overview of the DeepQA Project." AI Magazine 31.3 (2010): 59-79. • Watson Systems: http://www-03.ibm.com/innovation/us/watson/ • Wikipedia: http://en.wikipedia.org/wiki/Watson_%28computer%2 • "Building Watson: A Brief Overview of the DeepQA Project" by Joel Farrell, IBM: http://www.medbiq.org/sites/default/files/presentations/2011/Farrell.ppt
  • 42. References • "What is Watson, really?" – http://www-01.ibm.com/software/ebusiness/jstart/downloads/IOD2011.ppt – Authors: Keyur Dalal (IBM), Vladimir Stemkovski (IBM) and Jeff Sumner (IBM) • Jeopardy! IBM Watson Day 1 (Feb 14, 2011) – http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related • Science Behind an Answer – http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html – Video: http://youtu.be/DywO4zksfXw

Editor's Notes

  1. Consider first what computers are good at. Here is an equation – the natural log of 12 million, 546 thousand, 798, times pi, squared, divided by 34,567.46. Anyone? Anyone know if it's greater or less than 1? Computer? <click> The answer is 0.00885. How about this? Select the payment where the owner is David Jones and the type of the product owned is a laptop. <click> The computer can easily traverse this information in tables or "structured databases" and go from row to column to row and find its way to the Payment field and get the answer. It does this sort of stuff really, really well, storing just the information it needs to answer these database queries. How about matching keywords? It does that really well too. Here, to figure out that David Jones is the same as David Jones, it compares each letter and knows they are the same, so it concludes they are equal. <click> In the next example, based on that simple letter matching algorithm, it does not think "Dave Jones" is the same as "David Jones". To make that very simple leap it would have to know that Dave is a nickname for David and some likelihood that they are the same. As far as the computer is concerned, with no additional knowledge, David and Dave are as different as Mary and Mauri.
  2. Computer programs are natively explicit and exacting in their calculations over numbers and symbols. But natural language – the words and phrases we humans use to communicate with one another – is implicit: the exact meaning is not completely and exactly indicated, but instead is highly dependent on the context – what has been said before, the topic, how it is being discussed – factually, figuratively, fictionally, etc. Moreover, natural language is often imprecise – it does not have to treat a subject with numerical precision… humans naturally interact and operate all the time with different degrees of uncertainty and fuzzy associations between words and concepts. We use huge amounts of background knowledge to reconcile and interpret what we read. Consider these examples… it is one thing to build a database table to exactly answer the question "Where is someone born?". The computer looks up the name in one column and is programmed to know that the other column contains the birth place. STRUCTURED information, like this database table, is designed for computers to make simple comparisons and is exactly as accurate as the data entered into the database. Natural language is created and used by humans for humans. A reason we call natural language "unstructured" is that it lacks the exact structure and meaning that computer programs typically use to answer questions. Understanding what is being represented is a whole other challenge for computer programs. Consider this sentence <read>. It implies that Albert Einstein was born in Ulm – but there is a whole lot the computer has to do to figure that out with any degree of certainty: it has to understand sentence structure, parts of speech, the possible meanings of words and phrases, and how they relate to the words and phrases in the question. What does a remembrance, a water color and an Otto have to do with where someone was born? Consider another question in the Jeopardy style… "X ran this?"
  And this potentially answer-bearing sentence. Read the sentence… Does this sentence answer the question for Jack Welch – what does "ran" have to do with leadership or painting? How would a computer confidently infer from this sentence that Jack Welch ran GE? It might be easier to deduce that he was at least a painter there.
  3. Among many things, IBM Research is interested in pursuing exploratory research and Grand Challenges. Consider Deep Blue, the first computer to win against a grand master chess player. The goals for a Grand Challenge project are to engage and inspire the scientific community, to have broader impact on science and society, to push the limits of computer technology, and ultimately to drive innovation into business applications relevant to IBM customers. As we will see in the case of the Jeopardy! Challenge, that means having impact on Business Intelligence, Knowledge Discovery and Knowledge Management, Compliance, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support and National Intelligence.
  4. Automatic Open-Domain Question Answering represents a very long-standing challenge in Computer Science, specifically in the areas of NLP, Information Retrieval and AI. <Go through slide.>
  5. Let's look at some examples of what we call basic factoids. <Read the fish example.> Notice that the type of thing being asked for is often indicated, but it can range from very specific to very vague. If you are starting to think we can just build a database of fish, think again. <Read the form example.> Imagine building a database of everything that can be a "form". Or consider this example. <Read the Lincoln example.> There are many questions that do not indicate the type at all.
  6. We do NOT approach the Jeopardy! Challenge by trying to anticipate all questions and building databases of answers. In a random sample of 20,000 Jeopardy! clues, we automatically identified the main subject or type being asked about. We found that in 13% of the sampled questions there was no clear indication at all of the type of answer; the players must rely almost entirely on context to figure out what sort of answer is required. The remaining 87% is what you see in this graph. It shows what we call a very long tail. There is no small-enough set of topics to focus on that covers enough ground: even focusing on the most frequent few (the head of the tail, to the left) will cover less than 10% of the content. Thousands of topics, from hats to insects to writers to diseases to vegetables, are all equally fair game. And for these thousands of types, thousands of different questions may be asked, each phrased in a huge variety of ways.
  So our primary approach and research interest is not to collect and organize databases. Rather, it is reusable Natural Language Processing (NLP) technology for automatically understanding naturally occurring human-language text. As-is, pre-existing structured knowledge in the form of databases or knowledge bases is used to help bridge meaning and interpret natural-language texts, but because of the broad domain and the expressive language used in the questions and in the content, pre-built databases are of very limited use in answering any significant number of questions. The focus, rather, is on natural-language understanding.
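The long-tail argument can be sanity-checked with a few lines of arithmetic. The frequencies below are invented purely to illustrate the shape of such a distribution; only the 13%/87% split and the "under 10%" claim come from the analysis described above:

```python
# Hypothetical answer-type frequencies over a sample of clues.
# Only the shape matters: a few frequent types, then a very long
# tail of thousands of rare types.
counts = [400, 350, 300, 250, 200] + [20] * 800  # 805 distinct types
total = sum(counts)

# Coverage achieved by the most frequent few types (the head).
head = sum(counts[:5]) / total
print(f"top 5 types cover {head:.1%} of clues")

# With a tail this long, even the top handful of types covers
# under 10%, so no small set of topic databases can cover the game.
```

Under these toy numbers the five most frequent types cover about 8.6% of the clues, matching the qualitative point in the notes.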
  7. Note that the "Text Mining" inference is very different from the WordNet TyCor: it mines ordinary language, where we find that people consider a FLUID a type of LIQUID, even though a strict taxonomy like WordNet correctly does not. For this domain, Jeopardy!, machine-learning techniques learn to trust the relations mined from natural-language text, which allows the system to boost Cytoplasm's TyCor score.
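A minimal sketch of this kind of type-coercion (TyCor) check, assuming two hand-built stand-ins for the evidence sources (a strict WordNet-like taxonomy plus looser is-a relations mined from text; the data, weights, and scoring scheme here are all invented for illustration, not Watson's actual algorithm):

```python
# Strict taxonomy (WordNet-like): child -> parent is-a links.
taxonomy = {"cytoplasm": "substance", "substance": "entity"}

# Looser is-a relations mined from ordinary text, with confidences.
text_mined = {("fluid", "liquid"): 0.8, ("cytoplasm", "fluid"): 0.7}

def taxonomy_isa(child, ancestor):
    """Follow strict is-a links upward through the taxonomy."""
    while child in taxonomy:
        child = taxonomy[child]
        if child == ancestor:
            return True
    return False

def tycor_score(candidate, lexical_answer_type):
    """Score whether a candidate answer matches the asked-for type,
    falling back to text-mined evidence when the strict taxonomy
    has no path."""
    if taxonomy_isa(candidate, lexical_answer_type):
        return 1.0
    # Chain text-mined links, e.g. cytoplasm -> fluid -> liquid,
    # multiplying confidences along the path.
    score, node = 1.0, candidate
    while node != lexical_answer_type:
        step = next(((p, c) for (ch, p), c in text_mined.items()
                     if ch == node), None)
        if step is None:
            return 0.0
        node, conf = step
        score *= conf
    return score

print(round(tycor_score("cytoplasm", "liquid"), 2))  # -> 0.56
```

The strict taxonomy alone would score "cytoplasm is a liquid" as 0.0; the text-mined relations supply the fluid-to-liquid link people use in ordinary language, which is the boost described in the notes.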