Aquaint kickoff-overview-prange
1. AQUAINT R&D Program
Advanced QUestion Answering for INTelligence
Dr. John D. Prange
AQUAINT Program Director
JPrange@nsa.gov
301-688-7092
http://www.ic-arda.org
3 December 2001
2. Outline
• Information Exploitation Thrust
• AQUAINT Program
– The Vision
– The Challenges
– The Plan of Attack
– The AQUAINT Team
• Intelligence Community Perspective on
Information Exploitation and AQUAINT
• Some Final Thoughts . . .
AQUAINT Kickoff – 3 December 2001
3. Information Exploitation (Info-X)
What Functions Does It Include?
• Information Retrieval
• Content Mark-up
• Data Filtering & Selection
• Content Transformation
• Information Discovery
• Analytic Knowledge
• Data Assessment and Interpretation
• Presentation and Visualization
• Data Synthesis and Fusion
• Reporting and Dissemination
• Information Understanding
Info-X is Focused on Content & Its Meaning!
4. We Need To Dramatically Improve Our Ability
to Find & Understand Information
With Each Passing Day . . .
• More “Hay”
• Lower No. of “Needles per Volume of Hay”
• Fewer Analysts
AND
• Less Time!
Analysis: Turning Raw Data into Reportable Intelligence
[Figure: Raw Data → Info-X functions (Information Retrieval; Content Markup; Data Filtering & Selection; Content Transformation; Information Discovery; Analytic Knowledge; Data Assessment and Interpretation; Presentation and Visualization; Data Synthesis and Fusion; Reporting and Dissemination; Information Understanding) → Intelligence Community Products (Reports)]
• “Finding the Needles in the Haystack”
• It Remains an Analyst Intensive Activity
“Barriers” to Deep Understanding of Content
• Lack of Control on Source Creation
• Variable, Multiple Topics & Domains
• Limited Knowledge, Reasoning & Representation Capabilities
• Multiple & Multi-Media Data
• Many Foreign Languages / Character Scripts
• Data Integrity / Use of Deception
• Goal / Objective of Originator
• Natural (vs. Artificial) Language
• Missing, Conflicting, Ambiguous Data
• Types, Sources, Quantities of Errors
• Degree of Interpretation & Judgement
• Image/Video Understanding
• Importance of Time Dimension
• Depth of Understanding Required
• Cross Document Analysis
• Importance of Context
• Role of Knowledge
• Formal vs. Informal Conversation
• Automated Information Extraction
• Lack of Automated Learning
Clearly . . . We MUST Reduce these “Barriers” &
Create “Cracks in this Wall”!
So much hay and so little time! But How . . .
8. Info-X R&D Programs:
The Ideal Build Process
ARDA Thrust: Information Exploitation
[Figure: build-process diagram with the following labeled elements]
• Customer Operational Problems / Needs
• Technical Needs
• Research Response
• Research Projects
• R&D Component Level Testing
• Customer’s Data
• Operational Capabilities
• End-to-end Operational Tests
9. Current Info-X R&D Programs
Full R&D Programs consisting of Three 2-Year Phases
• AQUAINT
Advanced QUestion & Answering for INTelligence
• VACE
Video Analysis and Content Extraction
• GI2Vis
Geospatial Intelligence Information Visualization
Exploratory R&D Programs consisting of 1-Year + Option Year Programs
• LEMUR
Statistical Language Modeling for Information Retrieval
• NDHB
Non-Linear Dynamics from Human Behavior
10. Outline
• Information Exploitation Thrust
• AQUAINT Program
– The Vision
– The Challenges
– The Plan of Attack
– The AQUAINT Team
• Intelligence Community Perspective on
Information Exploitation and AQUAINT
• Some Final Thoughts . . .
11. “Some look at things
that are and ask why.
I dream of things that
might be and ask why
not.”
Robert Kennedy
1925-1968
14. TREC QA Track Approach
• ARDA & DARPA co-sponsoring the Question Answering Track in
the NIST’s organized Text Retrieval Conference (TREC) Program.
(Starting with TREC-8 in Nov 1999)
• TREC-10 Results (Nov 2001):
– 500 factual questions; About 50 questions had no answer in the
TREC-10 Data sources; Used “Real” Questions
– Data source: approx. 3 GByte database of ~980K news stories
– 36 US & international organizations participated;
92 separate runs evaluated
– System output: top 5 regions (50 bytes) in a single story
believed to contain Answer to the given question
– Top System: 70% of the “Answers” found in their
top 5 50-byte Passages
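The TREC-10 result quoted above (a system returns its top 5 candidate passages per question, and a question counts as answered when one of them contains the answer) can be illustrated with a small scoring sketch. This is not NIST's scoring procedure: the function name `fraction_answered_in_top_k`, the case-insensitive substring match that stands in for an answer judgment, and the toy data are all assumptions made for the example.

```python
from typing import Dict, List


def fraction_answered_in_top_k(
    runs: Dict[str, List[str]],
    answer_key: Dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of questions whose answer string appears in one of the top-k passages.

    A substring match is an assumption standing in for a real answer judgment.
    """
    if not answer_key:
        return 0.0
    hits = 0
    for question_id, answer in answer_key.items():
        top_passages = runs.get(question_id, [])[:k]
        if any(answer.lower() in passage.lower() for passage in top_passages):
            hits += 1
    return hits / len(answer_key)


# Toy data (hypothetical, not taken from TREC).
runs = {
    "q1": ["the Eiffel Tower is in Paris, France", "Lyon is a city"],
    "q2": ["no useful text here"],
}
answer_key = {"q1": "Paris", "q2": "1889"}

print(fraction_answered_in_top_k(runs, answer_key))  # 0.5
```

With real data each passage would also be limited to the 50-byte region size mentioned on the slide; the sketch leaves that check out for brevity.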
15. Pilot Evaluations
TREC 10 QA Track
• The “List Task”
– Sample Questions:
• “Name 4 US cities that have a “Shubert” Theater”
• “Name 30 individuals who served as a cabinet officer under Ronald
Reagan”
– Evaluation Metric: (Number of distinct instances divided by the
target number of instances averaged over 25 questions)
• Top System among 18 runs: Achieved 76% Accuracy
• The “Context Task”
– Sample Series of Questions:
• “How many species of spiders are there?”
• “How many are poisonous to humans?”
• “What percentage of spider bites in the US are fatal?”
– Evaluation Metric: Same as Main Task (10 Series of Questions; 42
total Questions)
• Top System: Found answer for 34 of the 42 total questions (81%)
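The List Task metric stated above (distinct instances divided by the target number of instances, averaged over the questions) can be sketched as follows. The function names, the case-insensitive matching, and the placeholder city names are assumptions for illustration only; they are not the official TREC scoring code or data.

```python
from typing import Iterable, List, Tuple


def list_question_accuracy(
    returned: Iterable[str], correct: Iterable[str], target: int
) -> float:
    """Distinct correct instances returned, divided by the target number of instances."""
    correct_set = {item.strip().lower() for item in correct}
    distinct_hits = {item.strip().lower() for item in returned} & correct_set
    # Cap at the target so extra correct instances cannot push the score above 1.
    return min(len(distinct_hits), target) / target


def mean_list_accuracy(questions: List[Tuple[List[str], List[str], int]]) -> float:
    """Average the per-question accuracy over all list questions."""
    scores = [list_question_accuracy(r, c, t) for r, c, t in questions]
    return sum(scores) / len(scores)


# Toy data (hypothetical placeholders, not real answers).
questions = [
    # "Name 4 ..." style question: two distinct correct instances out of a target of 4.
    (["City A", "City B", "city a", "City Z"],
     ["City A", "City B", "City C", "City D", "City E"], 4),
    # "Name 2 ..." style question: both instances found.
    (["x", "y"], ["x", "y"], 2),
]

print(mean_list_accuracy(questions))  # 0.75
```

Note that the duplicate "city a" is counted once, which is what "distinct instances" requires.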
16. AQUAINT
Advanced QUestion & Answering for INTelligence
In a foreign news broadcast a team of analysts observe a previously
unknown individual conferring with the Foreign Minister. They suspect that
he/she is really a new senior advisor.
• Who is this advisor?
• What do we know about him/her?
• What are his/her views?
• What influence does he/she have on FM?
• Does this signal that other policy changes are coming?
• And still more questions ???
Overarching Context / Operational Requirement
18. AQUAINT Is Skipping
Ahead Two Generations
Multiple Key
Barriers to
Content
Understanding
Will Be
Aggressively
Attacked
Commercial World & Current R&D Efforts
Are Addressing the Next Generation
But Only Selected Content Understanding
Barriers Are Being Aggressively Attacked
19. Outline
• Information Exploitation Thrust
• AQUAINT Program
– The Vision
– The Challenges
– The Plan of Attack
– The AQUAINT Team
• Intelligence Community Perspective on
Information Exploitation and AQUAINT
• Some Final Thoughts . . .
20. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
2) Pursue QA Scenarios and not just isolated,
factually based QA
3) Support a collaborative, multiple analyst
environment
4) Sometimes SMALL things really matter and
other times BIG things don’t
5) Advanced QA must attack the “Data Chasm”
6) Time is of the Essence
21. Top 10 Challenges
7) Must extract, represent and preserve
information uncovered when searching for
answers
8) Rapidly increasing importance of Knowledge of
all types -- regardless of the approach
9) Expanding requirements for more advanced
learning and reasoning methods/approaches
10) Discovering the correct answer will be hard
enough; but crafting an appropriate, articulate,
succinct, explainable response will be even harder
22. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
23. Professional Information Analysts:
Target Audience for AQUAINT -- Who are They?
• For ARDA and AQUAINT they are:
– Intelligence Community and Military Analysts
• But there are other Potential Target
Audiences of “Professional Information
Analysts”:
– Investigative / “CNN-type” Reporters
– Financial Industry Analysts / Investors
– Historians / Biographers
– Lawyers / Law Clerks
– Law Enforcement Detectives
– And Others
24. Intelligence Community Analysts –
Who are they?
What Do We See
When We Focus
Directly In On Our
Intelligence
Analysts?
25. Some Observations about
Intelligence Analysts (IA’s)
MAJOR DIFFERENCES DO EXIST AMONG IA’s
• First: There are different levels of intelligence within
the IC -- Strategic, Operational, Tactical --
– ARDA is focusing on Strategic Level IA’s
• Second: There is no stereotypical analyst even within
our Strategic Level Intelligence Agencies.
– Clear, significant differences exist across the national IC agencies as
well as across the different “INT’s”
– Additional, significant differences are accentuated by total breadth
and variety of all IC reporting requirements.
– There are even significant differences between
IA’s within the same IC agency
• Third: There are significant skill level
differences among IA’s
– Yes, the most senior IA’s are exceptional
– But the junior IA’s aren’t bad either
26. Some Observations about
Intelligence Analysts (IA’s)
BUT UNIVERSAL SIMILARITIES CAN BE
IDENTIFIED ACROSS OUR IA’s
• We believe that these similarities are significant and
strong enough that:
– Taken collectively they highlight key differences between
Intelligence Analysts and the Emerging Casual Information
Consumer that is being fueled by the Information Revolution and
targeted by the commercial world
– A common set of critically important Info-X problems
for the IC can be identified and articulated
– Multi-agency R&D programs against these
common Info-X problems can be developed
to the benefit of all IC Agencies
27. Universal Similarities Across IA’s
1. IA’s are information professionals
2. IA’s are almost always subject matter experts within their
assigned task areas
3. IA’s track and follow a given event, scenario, problem, situation
for an extended period of time
4. Increasingly IA’s are performing all source analysis and
production
5. IA’s typically work with overwhelming volumes of data and
information, but that’s the good news
6. Increasingly IA’s must collaborate with other IA’s
7. IA’s are focused on their Mission and will do whatever it takes
to accomplish it
8. The Intelligence that IA’s produce is judged against the highest
standards (called the “Tenets of Intelligence”)
- Timeliness - Accuracy - Usability
- Completeness - Relevance
28. Universal Similarities Across IA’s
1. IA’s are information professionals --
That is, IA’s are not casual developers and consumers of information
2. IA’s are almost always subject matter experts within
their assigned task areas --
That is, IA’s have broad and deep knowledge of their subject area
and possess profound skills developed over 10’s of years of
experience
3. IA’s track and follow a given event, scenario,
problem, situation for an extended period of time --
That is, IA’s frequently have developed extensive working files
related to their investigation; IA’s information needs and queries carry
within them an extensive, non-expressed context and background
29. Universal Similarities Across IA’s
4. Increasingly IA’s are performing all source analysis
and production --
For example, the language analyst must use intercept from multiple
media, multiple languages and the imagery analyst must know how
to combine information from multiple INT’s.
5. IA’s typically work with overwhelming volumes of
data and information, but that’s the good news --
Raw data on which the IA developed information is based is often
“dirty”, “errorful”, “contradictory or conflicting”, “of questionable or
unknown validity”, “incomplete or missing”, “time sensitive”, “highly
fragmented”, etc.
6. Increasingly IA’s must collaborate with other IA’s --
These IA’s may be working in different organizations, different
agencies and they might not even know that each other would benefit
from collaboration.
30. Universal Similarities Across IA’s
7. IA’s are focused on their Mission and will do
whatever it takes to accomplish it --
That is, IA’s are highly adaptable and resourceful. They will develop
workable strategies and attacks regardless of the roadblocks that our
collection and processing “stovepipes” create and of the limitations
that our “brain dead” analytic tools offer.
8. The Intelligence that IA’s produce is judged against
the highest standards (called the “Tenets of
Intelligence”) --
– Timeliness
– Accuracy
– Usability
– Completeness
– Relevance
31. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
2) Pursue QA Scenarios and not just isolated,
factually based QA
32. Implications of QA Scenarios
• Requires handling a Full Range of Complexity & Continuity of
Questions
• Need to understand & track the analysts’ line of reasoning and
flow of argument
• QA System requires significantly greater insight into
knowledge, desires, past experiences, likes and dislikes of
“Questioner”
• Place much higher value on recognizing and capturing
“background” information
• Questioner/System dialogue is now more than just a
means for clarification
[Figure: question types arranged within an Overarching Context / Operational Requirement: Factoid Question? Why Questions? Interpretive Questions? Judgement Questions? Predictive Questions? Other Questions?]
33. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
2) Pursue QA Scenarios and not just isolated,
factually based QA
3) Support a collaborative, multiple analyst
environment
34. Collaboration within QA
• Standard Collaboration
(From an Analyst Perspective)
– Who else is working all or a portion of my task?
– What do they know that I don’t and vice versa?
– Can we share/work together?
• Non-Standard Discovery
(From a System Perspective)
– Identify previous QA Scenarios that have “similarity” to current QA
Scenario. Compare & Contrast
– Use / Build-on / Update previous results
– Uncover new data sources
– Borrow a successful “line of reasoning” or “argument flow”
– Alerts analyst to different interpretations or to
overlooked / undervalued data
[Figure: Question Understanding and Interpretation component. Inputs: QUESTION (Natural Statement of Question; Use of Multimedia Examples); Question & Requirement Context; Analyst Background Knowledge; Other Analysts; Knowledge Bases; Technical Databases. Functions: Query Assessment, Advisor, Collaboration; Question Clarification. The Collaboration function is marked as the Focus.]
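The system-side step "identify previous QA Scenarios that have similarity to the current QA Scenario" could be approximated, in its simplest form, by comparing sets of key terms. The sketch below uses Jaccard similarity, which is an illustrative choice and not a method named in the briefing; the scenario names, term sets, and the `min_similarity` threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_scenarios(
    current_terms: Set[str],
    previous: Dict[str, Set[str]],
    min_similarity: float = 0.3,  # assumed cut-off, chosen only for the example
) -> List[Tuple[str, float]]:
    """Return previous scenarios at or above the threshold, most similar first."""
    scored = [(name, jaccard(current_terms, terms)) for name, terms in previous.items()]
    matches = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


# Toy data (hypothetical scenarios).
current = {"foreign", "minister", "advisor", "policy"}
previous = {
    "scenario-A": {"foreign", "minister", "policy", "visit"},
    "scenario-B": {"economic", "sanctions"},
}

print(similar_scenarios(current, previous))  # [('scenario-A', 0.6)]
```

A real system would need a much richer representation of a scenario (context, line of reasoning, sources used) than a bag of terms; the sketch only shows the compare-and-rank shape of the step.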
35. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
2) Pursue QA Scenarios and not just isolated,
factually based QA
3) Support a collaborative, multiple analyst
environment
4) Sometimes SMALL things really matter and
other times BIG things don’t
36. “Small & Big” - Can we tell the difference?
• Sometimes SMALL differences can produce
significantly different results/interpretations:
– Stop Words
• “Books {by; for; about} kids”
– Attachments
• “The man saw the woman in the park with the telescope.”
– Co-reference
• “John {persuaded; promised} Bill to go. He just left.”
• “Mary took the pill from the bottle. She swallowed it.”
• Other times BIG differences can produce the same/
similar results:
– “Name the films in which Richard Harris starred.”
– “Richard Harris played a leading role in which movies?”
– “In what Hollywood productions did Richard Harris receive top
billing?”
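The stop-word example above can be made concrete: once stop words are dropped, the three "Books {by; for; about} kids" queries become indistinguishable, even though they ask for different things. The stop-word list and the function name `content_terms` below are assumptions for the illustration, not part of any particular retrieval system.

```python
# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"by", "for", "about", "the", "a", "of", "in"}


def content_terms(query: str, drop_stop_words: bool = True) -> frozenset:
    """Lower-case the query, split on whitespace, and optionally drop stop words."""
    tokens = query.lower().split()
    if drop_stop_words:
        tokens = [token for token in tokens if token not in STOP_WORDS]
    return frozenset(tokens)


queries = ["Books by kids", "Books for kids", "Books about kids"]

# With stop words removed, all three queries reduce to the same term set.
reduced = {content_terms(q) for q in queries}
# With stop words kept, the three queries stay distinct.
kept = {content_terms(q, drop_stop_words=False) for q in queries}

print(len(reduced))  # 1
print(len(kept))     # 3
```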
37. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
2) Pursue QA Scenarios and not just isolated,
factually based QA
3) Support a collaborative, multiple analyst
environment
4) Sometimes SMALL things really matter and
other times BIG things don’t
5) Advanced QA must attack the “Data Chasm”
38. Attacking the Data Chasm
Levels: Today → Level I → Level II → Level III → Future
Questions (increasing complexity)
• Single Factual Isolated Questions
• Multi-Valued Factual Questions
• Cross Media; Cross Document; Simple Judgement
• Full Context-Based Question Scenario
Data Chasm
• Increasing Volumes (Petabyte & up)
• Missing Data
• Reliability of Data
• Contradictory Data
• Synthesis Across “Documents”/Media
• MANY Heterogeneous Data Sources; All Types, Sizes, Locations
Answers (increasing complexity)
• 50/250 Byte Passage from Single Text Document
• Fixed Templates or Tabular Lists
• Variable Narrative Summary; Multi-Media Presentations; Simple Interpreted Results
• Fully Intersected; Automatically Generated; Variable Structure/Format; Full Context Responses
39. AQUAINT:
Data Types
• Structured / Semi-Structured
– KB’s
– DB’s
– “Tagged Data” (e.g. Web Data)
• Unstructured
– Technical / Abstract Data: Sensor; Geospatial; Economic; Other
– Visual: Video; Still Images
– Human Language, described along three dimensions:
Media: Text Documents; Speech; Multi-Media
Language: English; Foreign Language 1; Foreign Language 2; Foreign Language N
Genre: Newswire / News Broadcast; Technical; Formal / Informal Communication; Other
40. AQUAINT:
Data Types
[Figure: the data-type chart from the previous slide (Structured / Semi-Structured vs. Unstructured; Technical / Abstract, Visual and Human Language data; Media, Language and Genre dimensions), overlaid with the programs listed below]
DATA FOCUS OF RELATED QA PROGRAMS / ACTIVITIES
• Commercial “Ask Jeeves”
• DARPA’s DAML
• DARPA’s RKF
• DARPA’s TIDES & TDT
• TREC QA Track
• ARDA’s VACE
• ARDA’s GI2Vis
41. AQUAINT:
Phase I Data Dimensions
Data Dimension / Requirement / Example
1. Focused
– Requirement: Single media, single language, and single genre in an
unstructured data source
– Example: English newspaper/newswire articles (text)
2. Multiple Media
– Requirement: Two or more of the following: text (clean, degraded, and
speech recognition produced), raw speech, still imagery, video data,
abstract data (technical, geospatial), and related media
– Example: Question where the answer is a summarization of information
found in video clips & may contain a table of technical data extracted
from various sources (geospatial, text, etc.)
3. Cross Lingual
– Requirement: English questions with foreign language references and
passages. Foreign languages could be expressed using any number of
foreign character scripts and encoding schemes.
– Example: English question with answer derived from single media
(newswire) material in Chinese or Arabic and other languages.
42. AQUAINT:
Phase I Data Dimensions
Data Dimension / Requirement / Example
4. Multiple Genre
– Requirement: Formal and informal correspondence (various media),
formal dialog, informal conversations or discussions, technical/journal
articles, newswire/broadcast news; advertisements; product and technical
descriptions, government reports; public databases
– Example: Question with answer derived from formal correspondence and
journal articles
5. Structured & Unstructured
– Requirement: Tables, charts and maps, diagrams, linked data or directed
graph data, structured databases, structured transactions; large
knowledge bases; linked web/pages; and html/xml documents PLUS
unstructured data from one of the media, lingual or genre dimensions.
– Example: Question with answer derived from knowledge base and
substantiated with information from technical journal.
43. Top 10 Challenges
1) Satisfy QA requirements of the “Professional”
Information Analyst
2) Pursue QA Scenarios and not just isolated,
factually based QA
3) Support a collaborative, multiple analyst
environment
4) Sometimes SMALL things really matter and
other times BIG things don’t
5) Advanced QA must attack the “Data Chasm”
6) Time is of the Essence
44. Time: Our Achilles Heel?
• Real Difficulties Exist in:
– Extracting, correctly interpreting time references
& then creating manageable timelines
– Estimating & updating changing reliability
of information over time
– Processing information in time sequence
e.g. Tracking the details of an evolving event
over time -- A whole different set of problems
• And of course:
– We can’t forget all of the issues related to the
timeliness of the system’s response to our
question(s) -- we’ll need at least “near real
time responses”
[Figure: timeline running from March through August]
45. Top 10 Challenges
7) Must extract, represent and preserve
information uncovered when searching for
answers
46. QA Scenarios: A Different Paradigm?
• Current Analytic Paradigm:
– Sequentially “Filter Down” to the final result
(Data → Processing & Analysis → Results)
– Works when QA’s are independent, isolated activities
• A Different Paradigm may be useful when handling QA Scenarios:
– Cast a “wider net” while searching for “golden nuggets” (Answers)
– Automatically Extract, Represent, and Preserve “closely related”
background information within context of the QA Scenario
– How Wide to Cast the “Net”?
– What Info to Retain? In what form? For how long?
[Figure: Space of Data Objects and Sources, divided into Answers, Background, and Discarded]
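One way to read the "wider net" paradigm above is as a three-way split of the retrieved material (Answers, Background, Discarded) instead of a keep-or-discard filter. The sketch below only illustrates that split under assumed inputs: the relevance scores, both thresholds, and the function name `partition_results` are hypothetical and do not come from the AQUAINT program.

```python
from typing import List, Tuple


def partition_results(
    scored_items: List[Tuple[str, float]],
    answer_threshold: float = 0.8,      # assumed cut-off for "golden nuggets"
    background_threshold: float = 0.5,  # assumed cut-off for "closely related" material
) -> Tuple[List[str], List[str], List[str]]:
    """Split scored items into answers, retained background, and discarded items."""
    answers: List[str] = []
    background: List[str] = []
    discarded: List[str] = []
    for item, score in scored_items:
        if score >= answer_threshold:
            answers.append(item)
        elif score >= background_threshold:
            background.append(item)  # preserved for later questions in the scenario
        else:
            discarded.append(item)
    return answers, background, discarded


# Toy data (hypothetical item names and scores).
scored = [("report-17", 0.92), ("cable-03", 0.64), ("memo-88", 0.55), ("ad-41", 0.10)]
answers, background, discarded = partition_results(scored)

print(answers)     # ['report-17']
print(background)  # ['cable-03', 'memo-88']
print(discarded)   # ['ad-41']
```

Lowering `background_threshold` corresponds to casting a wider net; the slide's open questions (what to retain, in what form, for how long) are exactly the parts this sketch does not answer.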
47. Top 10 Challenges
7) Must extract, represent and preserve
information uncovered when searching for
answers
8) Rapidly increasing importance of Knowledge of
all types -- regardless of the approach
48. Complex QA:
The Need for Ever Increasing Knowledge -- Of All Types
DIMENSIONS OF THE QUESTION PART OF THE QA PROBLEM
• Axes: Scope; Judgement; Context
• Contrasts: Simple Factual Question vs. Advanced QA R&D Program
• Increasing Knowledge Requirements **
DIMENSIONS OF THE ANSWER PART OF THE QA PROBLEM
• Axes: Multiple Sources; Interpretation; Fusion
• Contrasts: Simple Answer, Single Source vs. Advanced QA R&D Program
• Increasing Knowledge Requirements **
** Knowledge Requirement would be better represented with a
whole “quiver of arrows” of different sizes, lengths and types
49. Top 10 Challenges
7) Must extract, represent and preserve information
uncovered when searching for answers
8) Rapidly increasing importance of Knowledge of all
types -- regardless of the approach
9) Expanding requirements for more advanced
learning and reasoning methods/approaches
50. Improved Reasoning & Learning
In a foreign news broadcast a team of analysts observe a previously
unknown individual conferring with the Foreign Minister. They suspect that
he/she is really a new senior advisor.
• Who is this advisor?
• What do we know about him/her?
• What are his/her views?
• What influence does he/she have on FM?
• Does this signal that other policy changes are coming?
• And still more questions ???
Overarching Context / Operational Requirement
(The slide marks part of this question set with a “FOCUS” label.)
51. Improved Reasoning & Learning
Advanced Reasoning:
• Use Multi-level Plans
• Create and evaluate chains of reasoning
• Reason across heterogeneous data sources
• Infer answers from data extracted from multiple sources when
the answer is not explicitly stated
• Utilize Link Analysis & Evidence Discovery
• Plus other strategies
Advanced Learning:
• Automatically learn new or modify existing reasoning strategies
[Figure: worked example for the New Senior Advisor. Sources: TV & Radio Broadcasts, Newspapers & Other Archives. Collected material: Past Positions and Views; Raw “Bio” Information (Education, Family, Travels, Other Activities). Links between the two strands: Cross Fertilization; Follow-up Leads. Outputs: Summarized “Views: Past & Present” Results; Summarized “Bio” Results.]
52. Top 10 Challenges
7) Must extract, represent and preserve information
uncovered when searching for answers
8) Rapidly increasing importance of Knowledge of all
types -- regardless of the approach
9) Expanding requirements for more advanced
learning and reasoning methods/approaches
10) Discovering the correct answer will be hard
enough; but crafting an appropriate, articulate,
succinct, explainable response will be even harder
53. Difficulties in Generating Answers
• Natural Language Generation continues to be a difficult, open
research area.
– Adding the requirement to generate multimedia answers makes this
problem even harder.
• Providing the ability to explain and/or justify answers also
continues to be a difficult, open research area.
– The more complex the line or chain of reasoning, the more complex
the explanation and/or justification
• QA Scenarios and differences across analysts add additional levels
of complexity. The Same Question asked within different scenarios
by different analysts could easily produce substantially:
– Different Answer content
– Different Answer format, structure, depth and/or breadth of coverage
– Or both
54. Outline
• Information Exploitation Thrust
• AQUAINT Program
– The Vision
– The Challenges
– The Plan of Attack
– The AQUAINT Team
• Intelligence Community Perspective on
Information Exploitation and AQUAINT
• Some Final Thoughts . . .
55. AQUAINT:
ARDA’s Plan of Attack
• ARDA’s newest major Info-X R&D Program
– Envisioned as a high risk, long term R&D Program:
• Phase I Fall 2001 - Fall 2003
• Phase II Fall 2003 - Fall 2005
• Phase III Fall/Winter 2005 - Fall/Winter 2007
• Focus on Final Objective from start
– Incrementally add media, data sources, & complexity of
questions & answers during each phase
• Each of AQUAINT’s 3 Phases:
– Use Zero-Based, Open BAA-styled Solicitations
– Focus on Key Research Objectives
– Be Closely Linked to Parallel System Integration/Testbed Efforts
& Data Collection/Preparation and Evaluation Efforts
56. AQUAINT:
R&D Focused on Three Functional Components
1) Question Understanding and Interpretation
• Inputs: QUESTION (Natural Statement of Question; Use of Multimedia
Examples); Question & Requirement Context; Analyst Background Knowledge
• Resources: Other Analysts; Knowledge Bases; Technical Databases
• Functions: Query Assessment, Advisor, Collaboration; Question Clarification
• Outputs: Queries; Question & Answer Context
2) Determine the Answer
• Translate Queries into Source Specific Retrieval Languages
(Multiple Source Specific Queries)
• Data: Partially Annotated & Structured Data; Multiple Sources;
Multiple Media; Multi-Lingual; Multiple Agencies
• Automatic Metadata Creation; KB Queries (Supplemental Use)
• Multiple Ranked Lists → Single, Merged Ranked List of Relevant “Documents”
• Relevant “Documents”; Relevant “Knowledge” (Supplemental Use)
• Relevant information extracted and combined where possible
• Accumulation of Knowledge across “Documents”
• Cross “Document” Summaries created
• Language/Media Independent Concept Representation
• Inconsistencies noted
• Proposed Conclusions and Inferences Generated
• Output: Results of Analysis
3) Answer Formulation
• Formulate Answer for Analyst in form they want
• Multimedia Navigation Tools for Analyst Review
• Proposed Answer → Analyst Feedback → Query Refinement based on
Analyst Feedback; Iterative Refinement of Results based on Analyst Feedback
• Output: FINAL ANSWER
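The step in which multiple ranked lists, one per source-specific query, are combined into a single merged ranked list of relevant "documents" can be sketched with reciprocal rank fusion. That technique is a substitute chosen here for illustration; the briefing does not say how the merge is to be done. The function name `merge_ranked_lists`, the constant `k = 60`, and the document identifiers are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def merge_ranked_lists(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists into one using reciprocal rank fusion.

    Each document scores the sum of 1 / (k + rank) over the lists it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy data: three hypothetical sources returning their own ranked lists.
text_results = ["d1", "d2", "d3"]
kb_results = ["d2", "d4"]
speech_results = ["d2", "d1"]

print(merge_ranked_lists([text_results, kb_results, speech_results]))
# ['d2', 'd1', 'd4', 'd3']
```

Rank-based fusion is convenient when the sources score documents on incomparable scales (text retrieval, knowledge-base lookups, speech transcripts), since only the positions in each list are used.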
57. AQUAINT:
Cross Cutting/Enabling Technologies R&D Areas
Specifically Solicited Research Areas include:
1) Advanced Reasoning for Question Answering
2) Sharable Knowledge Sources
3) Content Representation
4) Interactive Question Answering Sessions
5) Role of Context
6) Role of Knowledge
7) Deep, Human Language Processing and Understanding
58. AQUAINT:
Intermediate Goals
Increasing Complexity Levels of Questions & Answers
• Level 1 (Current): “Simple Factual QA’s”
• Level 2 (Near Term): “Template & Multi-valued QA’s”
• Level 3 (Mid Term): “Cross Media & Cross Document QA’s”
• Level 4 (Long Term): “Context-Based QA Scenarios”
59. AQUAINT:
Separate, Coordinated Activities
Annotated and ‘Ground Truthed’ Data
Component Level / End-to-End Testing & Evaluation
[Figure: process diagram. Recoverable labels:]
• QUESTION
• Question Understanding and Interpretation
• Information Retrieval Process
• Analysis & Synthesis Process: Determine the Answer
• Answer Formulation
• FINAL ANSWER
• AQUAINT Phase I Solicitation
• Separate Coordinated Activities
Cross Cutting/Enabling Technologies Research Issues
Component Integration and System Architecture Issues
60. AQUAINT:
User Testbed / System Integration
• Pull together best available system components
emerging from AQUAINT Program research efforts
– Couple AQUAINT components with existing GOTS and COTS software
• Develop end-to-end AQUAINT prototype(s) aimed at
specific Operational QA environments
• Government-led effort:
– Directly Linked into Sponsoring Agency’s Technology Insertion
Organizations
– Close, working relationship with working Analysts
– Provide external system development support
– MITRE/Bedford will lead External System Integration / Testbed efforts
– Plan to also utilize additional external researchers as Consultants /
Advisors
61. AQUAINT:
Data & Evaluation Issues
• Data
– Start by Using Existing Data Collections
• NIST’s TREC Text Corpora
• Linguistic Data Consortium (LDC) Human Language Corpora (e.g.
TDT, Switchboard, Call Home, Call Friend Corpora)
• Existing Knowledge Bases and Other Structured Databases
– Future Data Collection & Annotation and Question/Answer Key
Development will be a major effort
– Will likely use combined efforts of NIST and LDC
• Evaluation
– Build upon highly successful TREC Q&A Track Evaluations --
NIST has the lead and is currently developing a Phased Evaluation
Plan tied to AQUAINT Program Plans
– Cooperate to maximum extent possible with DARPA’s RKF
(Rapid Knowledge Formation) Program Evaluation Efforts
62. AQUAINT R&D Program
Workshops
• When:
Mon-Wed 3-5 December 2001
• Where:
Xerox Training & Conference
Facility, Leesburg, VA
• Mid-Year Workshops:
Progress Reviews; Primarily for
Program Participants
• Annual Workshops:
Major Workshop; Wider Audience;
Evaluation & Testbed Results
• Future Phase I Workshops
– May/June 2002: West Coast Site
– Dec 2002: Washington DC Area
– May/June 2003: West Coast Site
– Dec 2003: Washington DC Area
63. Reaching out to scientists
across the country…
• Northeast Regional Research Center
– Hosted by MITRE Corporation, Bedford, MA
• Western Regional Information Science Center
– Hosted by Pacific Northwest National Laboratory, Richland, WA
…bringing their
solutions home
64. Regional Research Centers
• Draw talent from national labs, academia, and
industry located in the region (Western or
Northeastern)
• Principle of organization is to attract highly
knowledgeable talent for short periods (weeks,
months) to focus on well-defined research problems
• Provide both real and virtual regional centers for
technical collaboration in solving Information
Technology problems of interest to the Intelligence
Community
Help from outside the fence
65. Northeast Regional Research Center
Hosted By MITRE, Bedford, MA
Administered by CIA
• Conduct a 6-8 week workshop on
an AQUAINT-related challenge in
Summer 2002
• 4-7 Sep 2001: Planning Workshop held at MITRE.
– Attended by Government Technical Leaders, MITRE, and invited
set of industrial, FFRDC and Academic researchers in the field
– Four Potential Challenge Problems identified; Formal Proposals
being developed for each Challenge Problem
• 16 Nov 2001: Best and final proposal submitted
• 5 Dec 2001: Final Selection made
66. Proposed NRRC Wkshp Challenge Problems
1. Temporal Issues
– Generate Sequence of events and activities along evolving
timeline, resolving multiple levels of time references across
series of documents/sources.
– Proposer: James Pustejovsky, Brandeis University
2. Re-Use of Accumulated Knowledge
– Investigate strategies for structuring and maintaining
previously generated knowledge for possible future use.
E.g. previous knowledge might include questions and
answers (original and amplified) as well as relevant and
background information retrieved and processed.
– Proposers: Marc Light, MITRE and Abraham Ittycheriah, IBM
67. Proposed NRRC Wkshp Challenge Problems
3. Multiple Perspectives
– Develop approaches for handling situations where
relevant information is obtained from multiple sources on
the same topic but generated from different perspectives
(e.g. cultural or political differences).
– Proposer: Jan Wiebe, University of Pittsburgh
4. Habitability
– How can a Question Answering system efficiently and
effectively inform a user what it can do, and fail gracefully
when the question is beyond the reasonable capabilities
of the system?
– Proposers: Joe Marks, Mitsubishi Electric Research Lab
and Christy Doran, MITRE
68. Outline
• Information Exploitation Thrust
• AQUAINT Program
– The Vision
– The Challenges
– The Plan of Attack
– The AQUAINT Team
• Intelligence Community Perspective on
Information Exploitation and AQUAINT
• Some Final Thoughts . . .
69. ARDA’s AQUAINT Partners
• Program Committee
• Active External Stakeholders
70. Supporting Roles
Evaluation
User Testbed
Data / Operational Scenarios
TBD ??
Other Support
71. AQUAINT Phase I Projects (Fall 01 - Fall 03)
Total End-to-End Systems (6)
72. Answering Questions through
Understanding and Analysis (AQUA)
BBN Technologies
OBJECTIVES
• Develop comprehensive system
• Use statistical language models, knowledge sources, and formal reasoning
• Develop proposition recognition algorithm
• Interpretation by entity relationship model
PLAN
• Apply Cross Document Entity Detection and Tracking (CEDT) algorithm to QA
• Questions will be interpreted in context.
• Related QA sessions of others in workgroup will be brought to user’s attention
• Answers will be drawn from across documents and sources
Principal Investigators: Ralph Weischedel / Scott Miller
Topic Area: Total System
Data Dimension: Focused (Text)
ARDA Contracting Agent: NSA
73. JAVELIN: Justification-based Answer
Valuation through Language Interpretation
Carnegie Mellon Univ. (Language Technologies Institute)
OBJECTIVES
• QA as planning by developing a glass box planning infrastructure
• Universal auditability by developing a detailed set of labeled dependencies that form a traceable network of reasoning steps
• Utility-based information fusion
PLAN
Address the full Q/A task:
• Question analysis - question typing, interpretation, refinement, clarification
• Information seeking - document retrieval, entity and relation extraction
• Multi-source information fusion - multi-faceted answers, redundancy and contradiction detection
Principal Investigator: Eric Nyberg
Co-PIs: Jamie Callan, Jaime Carbonell
Topic Area: Total System
Data Dimension: Multi-Lingual (Text) (English, Chinese, Japanese)
ARDA Contracting Agent: DIA
74. Integrating Robust Semantics, Event Detection,
Information Fusion, and Summarization for
Multimedia Question Answering
Columbia Univ. / Univ. of Colorado-Boulder
OBJECTIVES
• Use statistical semantic parser to produce a shallow, domain independent semantic representation
• Develop a dialogue interface for carrying on focused dialogue with users
• Adapt algorithms/components to handle spoken questions
• Recognize “atomic” events and then track related information
• Integrate summarization and language generation to produce brief, coherent, fluent answers
PLAN
Build an integrated system to:
• Answer difficult questions that require interacting with the user to refine context
• Locate conflicting or time-varying answers in heterogeneous text databases
• Present answers that require combining/summarizing information from multiple sources
Principal Investigators: Vasileios Hatzivassiloglou, Kathleen McKeown / Daniel Jurafsky, Wayne Ward, Jim Martin
Topic Area: Total System
Data Dimension: Multi-Media (Text/Voice)
ARDA Contracting Agent: DIA