AQUAINT R&D Program
Advanced QUestion Answering for INTelligence




             Dr. John D. Prange
          AQUAINT Program Director
                JPrange@nsa.gov
                   301-688-7092
              http://www.ic-arda.org
                 3 December 2001
Outline
   •  Information Exploitation Thrust
   •  AQUAINT Program
        –  The Vision
        –  The Challenges
        –  The Plan of Attack
        –  The AQUAINT Team

   •  Intelligence Community Perspective on
      Information Exploitation and AQUAINT

   •  Some Final Thoughts . . .
AQUAINT Kickoff – 3 December 2001
Information Exploitation (Info-X)
What Functions Does It Include?

   •  Data Filtering & Selection
   •  Content Data Mark-up
   •  Content Data Transformation
   •  Information Retrieval
   •  Information Discovery
   •  Presentation and Visualization
   •  Analytic Knowledge
   •  Information Understanding
   •  Assessment and Interpretation
   •  Synthesis and Fusion
   •  Reporting and Dissemination

Info-X is Focused on Content & Its Meaning!
We Need To Dramatically Improve Our Ability
          to Find & Understand Information

Analysis: Turning Raw Data into Reportable Intelligence

[Figure: the Info-X functions, from Data Filtering & Selection through Reporting and Dissemination, drawn as a pipeline from Raw Data (“Finding the Needles in the Haystack”) to Intelligence Community Products (reports), with a wall of “barriers” overlaid on it]

With Each Passing Day . . .
  •  More “Hay”
  •  Lower No. of “Needles per Volume of Hay”
  •  Fewer Analysts
     AND
  •  Less Time!

So much hay and so little time!

“Barriers” to Deep Understanding of Content
  •  Lack of Control on Creation
  •  Variable Knowledge Representation
  •  Limited Reasoning Capabilities
  •  Multiple Sources
  •  Topics & Domains
  •  Multiple & Multi-Media
  •  Data Integrity / Use of Deception
  •  Many Foreign Languages / Character Scripts
  •  Goal / Objective of Originator
  •  Natural (vs. Artificial) Language
  •  Image/Video Understanding
  •  Missing, Conflicting, Ambiguous Data
  •  Types, Sources, Quantities of Errors
  •  Degree of Interpretation & Judgement
  •  Importance of Time Dimension
  •  Depth of Understanding Required
  •  Cross Document Analysis
  •  Importance of Context
  •  Role of Knowledge
  •  Formal vs. Informal Conversation
  •  Automated Information Extraction
  •  Lack of Automated Learning

Clearly . . . It Remains an Analyst Intensive Activity

We MUST Reduce these “Barriers” &
Create “Cracks in this Wall”!

But How . . .
Info-X R&D Programs:
The Ideal Build Process

ARDA Thrust: Information Exploitation

[Figure: build process diagram]
   •  Center column, top to bottom: Operational Problems; Operational Capabilities; Technical Needs; Research Projects
   •  Left labels: Customer Needs (top); Research Response (bottom)
   •  Right labels: End-to-end Operational Tests; Customer’s Data; R&D Component Level Testing
Current Info-X R&D Programs

 Full R&D Programs consisting of Three 2-Year Phases:

 •  AQUAINT
      Advanced QUestion Answering for INTelligence
 •  VACE
      Video Analysis and Content Extraction
 •  GI2Vis
      Geospatial Intelligence Information Visualization

 Exploratory R&D Programs consisting of 1-Year Programs + Option Year:

 •  LEMUR
      Statistical Language Modeling for Information Retrieval
 •  NDHB
      Non-Linear Dynamics from Human Behavior

“Some look at things that are and ask why.
 I dream of things that might be and ask why not.”

                                            Robert Kennedy
                                               1925-1968



Traditional Information Retrieval (IR) Approach

[Figure: flow through a traditional IR system]
   •  Question ?
   •  System Specific Query, e.g. Boolean Key Word Equation
   •  Traditional Information Retrieval over a Data Source, e.g. Large Text Archive
   •  Ranked List of Hopefully “Relevant” Documents
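The traditional IR flow (a question rewritten as a Boolean keyword query, run against a text archive, returning a ranked document list) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy archive, and the term-frequency ranking are not taken from the slides or from any particular retrieval system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Hypothetical Boolean keyword query over a dict of doc_id -> text.

    all_of  : every term must occur (AND)
    any_of  : at least one term must occur (OR), if any are given
    none_of : no term may occur (NOT)
    Query terms are expected in lowercase. Matching documents are ranked
    by how often the positive query terms occur (an assumed ranking rule).
    """
    results = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        if not all(counts[t] for t in all_of):
            continue
        if any_of and not any(counts[t] for t in any_of):
            continue
        if any(counts[t] for t in none_of):
            continue
        score = sum(counts[t] for t in (*all_of, *any_of))
        results.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]


# Toy archive (made-up text, for illustration only).
toy_archive = {
    "d1": "The theater in New York opened a new show.",
    "d2": "A theater festival was held in Boston; the theater was full.",
    "d3": "Stock markets fell in New York.",
    "d4": "The theater in New York announced a stock offering.",
}

# Query: theater AND (york OR boston) AND NOT stock
ranked = boolean_search(
    toy_archive,
    all_of=("theater",),
    any_of=("york", "boston"),
    none_of=("stock",),
)
print(ranked)  # ['d2', 'd1']
```

Note that the output is a list of documents, not an answer: the analyst still has to read the "hopefully relevant" documents to find the needle.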
Next Generation Approaches:
                       Question Answering (QA) Systems

[Figure: QA system wrapped around traditional information retrieval; QA: Shallow Analysis]
   •  Single, Factoid Question ?
   •  Move Closer to the Question, e.g. Question Classification
   •  System Specific Query; often Tailored to Question Type
   •  Traditional Information Retrieval over a Single Data Source
   •  Ranked List of Hopefully “Relevant” Documents
   •  Move Closer to the Answer, e.g. Passage Retrieval
   •  “Answer”
AQUAINT Kickoff – 3 December 2001
TREC QA Track Approach
•  ARDA & DARPA are co-sponsoring the Question Answering Track in
   NIST’s Text Retrieval Conference (TREC) Program
   (starting with TREC-8 in Nov 1999).
•  TREC-10 Results (Nov 2001):
     –  500 factual questions; about 50 questions had no answer in the
        TREC-10 data sources; used “Real” questions
     –  Data source: approx. 3 GByte database of ~980K news stories
     –  36 US & international organizations participated;
        92 separate runs evaluated
     –  System output: top 5 regions (50 bytes) in a single story
        believed to contain the Answer to the given question
     –  Top System: 70% of the “Answers” found in its
        top 5 50-byte passages
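
The TREC-10 output format above (five 50-byte regions per question) can be scored roughly as sketched below. This is only an illustration under stated assumptions: the function names, question ids and passages are hypothetical, and simple substring matching stands in for the human judgement of answers used in the actual evaluation.

```python
# Hypothetical scoring sketch; names and data are illustrative, not TREC tooling.

def found_in_top_k(ranked_passages, gold_answers, k=5, max_bytes=50):
    """True if any of the first k passages, cut to max_bytes, contains a gold answer."""
    for passage in ranked_passages[:k]:
        # Enforce the 50-byte response limit before matching.
        snippet = passage.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
        if any(answer.lower() in snippet.lower() for answer in gold_answers):
            return True
    return False


def top_k_accuracy(runs, gold, k=5):
    """Fraction of questions whose answer appears in the system's top k passages."""
    hits = sum(found_in_top_k(runs.get(qid, []), answers, k)
               for qid, answers in gold.items())
    return hits / len(gold)


# Made-up gold answers and system output for three questions.
gold = {"q1": ["Paris"], "q2": ["1969"], "q3": ["Mount Everest"]}
runs = {
    "q1": ["the capital of France is Paris", "Lyon is a city"],
    # Correct passage is ranked sixth, so it falls outside the top 5.
    "q2": ["filler 1", "filler 2", "filler 3", "filler 4", "filler 5",
           "the landing was in 1969"],
    # Correct passage is ranked fifth, so it still counts.
    "q3": ["filler 1", "filler 2", "filler 3", "filler 4",
           "Mount Everest is the highest peak"],
}

score = top_k_accuracy(runs, gold)
print(f"{score:.2f}")  # 0.67
```

A figure such as the top system's 70% corresponds to this fraction computed over the full question set.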
Pilot Evaluations
                                    TREC 10 QA Track

   •  The “List Task”
       –  Sample Questions:
              •  “Name 4 US cities that have a “Shubert” Theater”
              •  “Name 30 individuals who served as a cabinet officer under Ronald
                 Reagan”
        –  Evaluation Metric: number of distinct instances divided by the
            target number of instances, averaged over 25 questions
              •  Top System among 18 runs: Achieved 76% Accuracy

   •  The “Context Task”
        –  Sample Series of Questions:
              •  “How many species of spiders are there?”
              •  “How many are poisonous to humans?”
              •  “What percentage of spider bites in the US are fatal?”
        –  Evaluation Metric: Same as Main Task (10 Series of Questions; 42
           total Questions)
              •  Top System: Found answer for 34 of the 42 total questions (81%)
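
The List Task metric described above (distinct instances divided by the target number, averaged over the questions) might be computed as in the following sketch. The function names and the sample data are hypothetical placeholders, and capping each score at 1.0 is an assumption the slide does not state.

```python
# Hypothetical sketch of the List Task metric; data below is placeholder only.

def list_question_score(returned, correct, target):
    """Distinct correct instances returned, divided by the target number of instances."""
    returned_set = {item.strip().lower() for item in returned}
    correct_set = {item.strip().lower() for item in correct}
    distinct_correct = returned_set & correct_set
    # Assumption: cap at the target so a score never exceeds 1.0.
    return min(len(distinct_correct), target) / target


def list_task_accuracy(questions):
    """Average the per-question scores over all list questions."""
    scores = [list_question_score(q["returned"], q["correct"], q["target"])
              for q in questions]
    return sum(scores) / len(scores)


questions = [
    # "city a" repeats "City A", so only three distinct correct instances count.
    {"target": 4,
     "returned": ["City A", "City B", "city a", "City C", "City X"],
     "correct": ["City A", "City B", "City C", "City D"]},
    {"target": 2,
     "returned": ["Person P", "Person Q"],
     "correct": ["Person P", "Person Q", "Person R"]},
]

accuracy = list_task_accuracy(questions)
print(accuracy)  # 0.875
```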

AQUAINT
                  Advanced QUestion & Answering for INTelligence

     In a foreign news broadcast a team of analysts observe a previously
  unknown individual conferring with the Foreign Minister. They suspect that
                     he/she is really a new senior advisor.

  Questions the analysts raise, all within an Overarching Context /
  Operational Requirement:
     •  Who is this advisor?
     •  What do we know about him/her?
     •  What are his/her views?
     •  What influence does he/she have on FM?
     •  Does this signal that other policy changes are coming?
     •  And still more questions ???

AQUAINT
                     Advanced QUestion & Answering for INTelligence

[Figure: Advanced QA. Labels recovered from the diagram:]
   •  Interpreting Complex QA Scenario within a Larger Context:
      Factoid, Why, Interpretive, Judgement, Predictive and Other
      Questions, all inside an Overarching Context / Operational
      Requirement
   •  Extend Traditional Information Retrieval: Ranked Lists of
      “Relevant” Data Objects from Multiple Heterogeneous Data Sources
   •  Deeper Automated Understanding: Extract & Analyze Results
   •  Provide Answers in a Form Analysts Want: Interpret Results
      & Formulate the Answers

AQUAINT Is Skipping
                                           Ahead Two Generations

                                                            Multiple Key
                                                            Barriers to
                                                            Content
                                                            Understanding
                                                            Will Be
                                                            Aggressively
                                                            Attacked




Commercial World & Current R&D Efforts
Are Addressing the Next Generation
But Only Selected Content Understanding
Barriers Are Being Aggressively Attacked
Outline
   •  Information Exploitation Thrust
   •  AQUAINT Program
        –  The Vision
        –  The Challenges
        –  The Plan of Attack
        –  The AQUAINT Team

   •  Intelligence Community Perspective on
      Information Exploitation and AQUAINT

   •  Some Final Thoughts . . .
Top 10 Challenges

     1) Satisfy QA requirements of the “Professional”
        Information Analyst
     2) Pursue QA Scenarios and not just isolated,
        factually based QA
     3) Support a collaborative, multiple analyst
        environment
     4) Sometimes SMALL things really matter and
        other times BIG things don’t
     5) Advanced QA must attack the “Data Chasm”
     6) Time is of the Essence
Top 10 Challenges

     7) Must extract, represent and preserve
        information uncovered when searching for
        answers
     8) Rapidly increasing importance of Knowledge of
        all types -- regardless of the approach
     9) Expanding requirements for more advanced
        learning and reasoning methods/approaches
     10) Discovering the correct answer will be hard
       enough; but crafting an appropriate, articulate,
       succinct, explainable response will be even harder

Top 10 Challenges

     1) Satisfy QA requirements of the “Professional”
        Information Analyst




Professional Information Analysts:
                Target Audience for AQUAINT -- Who are They?


  •  For ARDA and AQUAINT they are:
       –  Intelligence Community and Military Analysts

  •  But there are other Potential Target
     Audiences of “Professional Information
     Analysts”:
       –    Investigative / “CNN-type” Reporters
       –    Financial Industry Analysts / Investors
       –    Historians / Biographers
       –    Lawyers / Law Clerks
       –    Law Enforcement Detectives
       –    And Others


Intelligence Community Analysts –
                                    Who are they?




                                                    What Do We See
                                                    When We Focus
                                                    Directly In On Our
                                                    Intelligence
                                                    Analysts?



Some Observations about
                          Intelligence Analysts (IA’s)
          MAJOR DIFFERENCES DO EXIST AMONG IA’s
     •  First: There are different levels of intelligence within
        the IC -- Strategic, Operational, Tactical --
           –  ARDA is focusing on Strategic Level IA’s

     •  Second: There is no stereotypical analyst even within
        our Strategic Level Intelligence Agencies.
           –  Clear, significant differences exist across the national IC agencies as
              well as across the different “INT’s”
           –  Additional, significant differences are accentuated by total breadth
              and variety of all IC reporting requirements.
           –  There are even significant differences between
              IA’s within the same IC agency

     •  Third: There are significant skill level
        differences among IA’s
           –  Yes, the most senior IA’s are exceptional
           –  But the junior IA’s aren’t bad either
Some Observations about
                          Intelligence Analysts (IA’s)
                BUT UNIVERSAL SIMILARITIES CAN BE
                    IDENTIFIED ACROSS OUR IA’s
  •  We believe that these similarities are significant and
     strong enough that:
        –  Taken collectively they highlight key differences between
           Intelligence Analysts and the Emerging Casual Information
           Consumer that is being fueled by the Information Revolution and
           targeted by the commercial world
        –  A common set of critically important Info-X problems
           for the IC can be identified and articulated
        –  Multi-agency R&D programs against these
           common Info-X problems can be developed
           to the benefit of all IC Agencies



Universal Similarities Across IA’s
   1. IA’s are information professionals
   2. IA’s are almost always subject matter experts within their
       assigned task areas
   3. IA’s track and follow a given event, scenario, problem, situation
       for an extended period of time
   4. Increasingly IA’s are performing all source analysis and
       production
   5. IA’s typically work with overwhelming volumes of data and
       information, but that’s the good news
   6. Increasingly IA’s must collaborate with other IA’s
   7. IA’s are focused on their Mission and will do whatever it takes
       to accomplish it
   8. The Intelligence that IA’s produce is judged against the highest
       standards (called the “Tenets of Intelligence”)
              - Timeliness          - Accuracy    - Usability
              - Completeness        - Relevance
Universal Similarities Across IA’s

  1. IA’s are information professionals --
      That is, IA’s are not casual developers and consumers of information

  2. IA’s are almost always subject matter experts within
     their assigned task areas --
      That is, IA’s have broad and deep knowledge of their subject area
      and possess profound skills developed over 10’s of years of
      experience

  3. IA’s track and follow a given event, scenario,
     problem, situation for an extended period of time --
      That is, IA’s frequently have developed extensive working files
      related to their investigation; IA’s information needs and queries carry
      within them an extensive, non-expressed context and background

Universal Similarities Across IA’s

  4. Increasingly IA’s are performing all source analysis
     and production --
      For example, the language analyst must use intercept from multiple
      media, multiple languages and the imagery analyst must know how
      to combine information from multiple INT’s.

  5. IA’s typically work with overwhelming volumes of
     data and information, but that’s the good news --
      Raw data on which the IA-developed information is based is often
      “dirty”, “errorful”, “contradictory or conflicting”, “of questionable or
      unknown validity”, “incomplete or missing”, “time sensitive”, “highly
      fragmented”, etc.

  6. Increasingly IA’s must collaborate with other IA’s --
      These IA’s may be working in different organizations or different
      agencies, and they might not even know that they would benefit
      from collaborating with each other.
Universal Similarities Across IA’s

  7. IA’s are focused on their Mission and will do
     whatever it takes to accomplish it --
       That is, IA’s are highly adaptable and resourceful. They will develop
       workable strategies and attacks regardless of the roadblocks that our
       collection and processing “stovepipes” create and of the limitations
       that our “brain dead” analytic tools offer.

  8. The Intelligence that IA’s produce is judged against
     the highest standards (called the “Tenets of
     Intelligence”) --
        –    Timeliness
        –    Accuracy
        –    Usability
        –    Completeness
        –    Relevance
Top 10 Challenges
     1) Satisfy QA requirements of the “Professional”
        Information Analyst
     2) Pursue QA Scenarios and not just isolated,
        factually based QA




Implications of QA Scenarios
•  Requires handling a Full Range of Complexity & Continuity of
   Questions
•  Need to understand & track the analysts’ line of reasoning and
   flow of argument
•  QA System requires significantly greater insight into
   knowledge, desires, past experiences, likes and dislikes of
   “Questioner”
•  Place much higher value on recognizing and capturing
   “background” information
•  Questioner/System dialogue is now more than just a
   means for clarification

[Figure: Factoid, Why, Interpretive, Judgement, Predictive and Other Questions, all within an Overarching Context / Operational Requirement]

Top 10 Challenges

     1) Satisfy QA requirements of the “Professional”
        Information Analyst
     2) Pursue QA Scenarios and not just isolated,
        factually based QA
     3) Support a collaborative, multiple analyst
        environment




Collaboration within QA
  •  Standard Collaboration
      (From an Analyst Perspective)
       –  Who else is working all or a portion of my task?
       –  What do they know that I don’t and vice versa?
       –  Can we share/work together?
  •  Non-Standard Discovery
      (From a System Perspective)
       –  Identify previous QA Scenarios that have “similarity” to
          current QA Scenario. Compare & Contrast
       –  Use / Build-on / Update previous results
       –  Uncover new data sources
       –  Borrow a successful “line of reasoning” or “argument flow”
       –  Alerts analyst to different interpretations or to
          overlooked / undervalued data

[Figure: question-understanding flow. Labels recovered from the diagram: QUESTION ????; Natural Statement of Question; Use of Multimedia Examples; Question & Requirement Context; Analyst Background Knowledge; Clarification; Question Understanding and Interpretation; Query Assessment, Advisor, Collaboration; Other Analysts; Knowledge Bases; Technical Databases; Focus]

Top 10 Challenges

     1) Satisfy QA requirements of the “Professional”
        Information Analyst
     2) Pursue QA Scenarios and not just isolated,
        factually based QA
     3) Support a collaborative, multiple analyst
        environment
     4) Sometimes SMALL things really matter and
        other times BIG things don’t



“Small & Big” - Can we tell the difference?

  •  Sometimes SMALL differences can produce
     significantly different results/interpretations:
        –  Stop Words
             •  “Books {by; for; about} kids”
        –  Attachments
             •  “The man saw the woman in the park with the telescope.”
        –  Co-reference
             •  “John {persuaded; promised} Bill to go. He just left.”
             •  “Mary took the pill from the bottle. She swallowed it.”

  •  Other times BIG differences can produce the same/similar
     results:
        –  “Name the films in which Richard Harris starred.”
        –  “Richard Harris played a leading role in which movies?”
        –  “In what Hollywood productions did Richard Harris receive top
           billing?”
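
The “Stop Words” example above can be made concrete with a small sketch. It assumes a naive bag-of-words index with a hypothetical stop list (neither comes from the slides) and shows the three “Books ... kids” queries collapsing into one once the prepositions are dropped, even though each asks for something different.

```python
# Hypothetical stop list and indexing function, for illustration only.
STOP_WORDS = frozenset({"by", "for", "about", "the", "a", "an", "of", "in"})


def index_terms(query, stop_words=STOP_WORDS):
    """Lower-case the query, split on whitespace and drop stop words."""
    return frozenset(word for word in query.lower().split()
                     if word not in stop_words)


queries = ["Books by kids", "Books for kids", "Books about kids"]

# With stop-word removal all three queries reduce to {"books", "kids"}.
with_stopping = {index_terms(q) for q in queries}
# Keeping every word preserves the distinction between the three queries.
without_stopping = {index_terms(q, stop_words=frozenset()) for q in queries}

print(len(with_stopping), len(without_stopping))  # 1 3
```

The paraphrase case runs the other way: the three Richard Harris questions share few surface terms, so term overlap alone would treat them as different questions.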

Top 10 Challenges

     1) Satisfy QA requirements of the “Professional”
        Information Analyst
     2) Pursue QA Scenarios and not just isolated,
        factually based QA
     3) Support a collaborative, multiple analyst
        environment
     4) Sometimes SMALL things really matter and
        other times BIG things don’t
     5) Advanced QA must attack the “Data Chasm”

Attacking the Data Chasm
  •  Questions, by level:
       –  Today: Single Factual Isolated Questions
       –  Level I: Multi-Valued Factual Questions
       –  Level II: Cross Media; Cross Document; Simple Judgement
       –  Future Level III: Full Context-Based Question Scenario
  •  Data Chasm (lying between the Questions and the Answers):
       –  Missing Data
       –  Reliability of Data
       –  Contradictory Data
       –  Synthesis Across “Documents”/Media
       –  Increasing Volumes (Petabyte & up)
       –  MANY Heterogeneous Data Sources; All Types, Sizes, Locations
  •  Answers, by level:
       –  Today: 50/250 Byte Passage from Single Text Document
       –  Level I: Fixed Templates or Tabular Lists
       –  Level II: Variable Narrative Summary; Multi-Media
          Presentations; Simple Interpreted Results
       –  Future Level III: Fully Intersected; Automatically Generated;
          Variable Structure/Format; Full Context Responses

AQUAINT:
                                            Data Types

[Figure: data-type diagram. Groupings recovered from the layout:]
  •  Structured / Semi-Structured: KB’s; DB’s; “Tagged Data” (e.g. Web Data)
  •  Unstructured
  •  Visual Data: Video; Still Images
  •  Technical / Abstract: Sensor; Geospatial; Economic; Other
  •  Human Language, broken down by:
       –  Media: Text Documents; Speech; Multi-Media
       –  Language: English; Foreign Language 1; Foreign Language 2;
          Foreign Language N
       –  Genre: Newswire / News Broadcast; Technical;
          Formal / Informal Communication; Other

AQUAINT:
                                            Data Types

[Figure: the data-type diagram from the previous slide, repeated with an overlay.]
DATA FOCUS OF RELATED QA PROGRAMS / ACTIVITIES
  •  Commercial “Ask Jeeves”
  •  DARPA’s DAML
  •  DARPA’s RKF
  •  DARPA’s TIDES & TDT
  •  TREC QA Track
  •  ARDA’s VACE
  •  ARDA’s GI2Vis

AQUAINT:
                                    Phase I Data Dimensions

Data Dimension                        Requirement                         Example

1. Focused             Single media, single language, and           English newspaper/
                       single genre in an unstructured data         newswire articles (text)
                       source
2. Multiple Media      Two or more of the following: text (clean,   Question where the
                       degraded, and speech recognition             answer is a summarization
                       produced), raw speech, still imagery,        of information found in
                       video data, abstract data (technical,        video clips & may contain
                       geospatial), and related media               a table of technical data
                                                                    extracted from various
                                                                    sources (geospatial, text,
                                                                    etc.)
3. Cross Lingual       English questions with foreign language      English question with
                       references and passages. Foreign             answer derived from
                       languages could be expressed using any       single media (newswire)
                       number of foreign character scripts and      material in Chinese,
                       encoding schemes.                            Arabic or another
                                                                    language.
AQUAINT:
                                    Phase I Data Dimensions

Data Dimension                        Requirement                         Example

4. Multiple Genre      Formal and informal correspondence           Question with answer
                       (various media), formal dialog, informal     derived from formal
                       conversations or discussions, technical/     correspondence and
                       journal articles, newswire/broadcast news;   journal articles
                       advertisements; product and technical
                       descriptions, government reports; public
                       databases


5. Structured &        Tables, charts and maps, diagrams, linked    Question with answer
   Unstructured        data or directed graph data, structured      derived from knowledge
                       databases, structured transactions; large    base and substantiated
                       knowledge bases; linked web/pages; and       with information from
                       html/xml documents PLUS unstructured         technical journal.
                       data from one of the media, lingual or
                       genre dimensions.


AQUAINT Kickoff – 3 December 2001
Top 10 Challenges

     1) Satisfy QA requirements of the “Professional”
        Information Analyst
     2) Pursue QA Scenarios and not just isolated,
        factually based QA
     3) Support a collaborative, multiple analyst
        environment
     4) Sometimes SMALL things really matter and
        other times BIG things don’t
     5) Advanced QA must attack the “Data Chasm”
     6) Time is of the Essence
AQUAINT Kickoff – 3 December 2001
Time: Our Achilles Heel?
•  Real Difficulties Exist in:
     –  Extracting, correctly interpreting time references
        & then creating manageable timelines
     –  Estimating & updating changing reliability
        of information over time
     –  Processing information in time sequence
        e.g. Tracking the details of an evolving event
        over time -- A whole different set of problems

•  And of course:
     –  We can’t forget all of the issues related to the
        timeliness of the system’s response to our
        question(s) -- we’ll need at least “near real
        time responses”


              [Figure: timeline spanning March, April, May, June, July, August]
AQUAINT Kickoff – 3 December 2001
Top 10 Challenges
     7) Must extract, represent and preserve
        information uncovered when searching for
        answers




AQUAINT Kickoff – 3 December 2001
QA Scenarios: A Different Paradigm?
 •  Current Analytic Paradigm:
      –  Sequentially “Filter Down” to the final result
         (Data → Processing & Analysis → Results)
      –  Works when QA’s are independent, isolated activities

 •  A Different Paradigm may be useful when handling QA Scenarios:
      –  Cast a “wider net” while searching for “golden nuggets” (Answers)
      –  Automatically Extract, Represent, and Preserve “closely related”
         background information within context of the QA Scenario

 [Figure: Space of Data Objects and Sources, with regions labeled Answers,
  Background, and Discarded. Questions posed: How Wide to Cast the “Net”?
  What Info to Retain? In what form? For how long?]
AQUAINT Kickoff – 3 December 2001
Top 10 Challenges
      7) Must extract, represent and preserve
         information uncovered when searching for
         answers

      8) Rapidly increasing importance of Knowledge of
         all types -- regardless of the approach




AQUAINT Kickoff – 3 December 2001
Complex QA:
            The Need for Ever Increasing Knowledge -- Of All Types


   [Figure: two diagrams, each showing Increasing Knowledge Requirements **
    from the simple case toward the Advanced QA R&D Program]

   DIMENSIONS OF THE QUESTION PART OF THE QA PROBLEM
     –  Starting point: Simple Factual Question
     –  Axis labels: Scope, Judgement, Context

   DIMENSIONS OF THE ANSWER PART OF THE QA PROBLEM
     –  Starting point: Simple Answer, Single Source
     –  Axis labels: Multiple Sources, Interpretation, Fusion

                ** Knowledge Requirement would be better represented with a
                        whole “quiver of arrows” of different sizes, lengths and types
AQUAINT Kickoff – 3 December 2001
Top 10 Challenges

     7) Must extract, represent and preserve information
        uncovered when searching for answers

     8) Rapidly increasing importance of Knowledge of all
        types -- regardless of the approach

     9) Expanding requirements for more advanced
        learning and reasoning methods/approaches




AQUAINT Kickoff – 3 December 2001
Improved Reasoning & Learning
     In a foreign news broadcast a team of analysts observe a previously
  unknown individual conferring with the Foreign Minister. They suspect that
                     he/she is really a new senior advisor.

   Questions shown in the figure:
     –  Who is this advisor?
     –  What do we know about him/her?
     –  What are his/her views?
     –  What influence does he/she have on FM?
     –  Does this signal that other policy changes are coming?
     –  And still more questions ???

   [Figure labels: FOCUS; Overarching Context / Operational Requirement]
AQUAINT Kickoff – 3 December 2001
Improved Reasoning & Learning
Advanced Reasoning:
•  Use Multi-level Plans
•  Create and evaluate chains of reasoning
•  Reason across heterogeneous data sources
•  Infer answers from data extracted from multiple sources when
   the answer is not explicitly stated
•  Utilize Link Analysis & Evidence Discovery
•  Plus other strategies

Advanced Learning:
•  Automatically learn new or modify existing reasoning strategies

[Figure: New Senior Advisor at the center. Collected Views (TV & Radio
 Broadcasts, Newspapers & Other Archives) lead to Summarized Results
 (“Views: Past & Present”) and Follow-up Leads. Raw “Bio” Information
 (Education, Past Positions, Family, Travels, Other Activities) leads to
 Summarized Results (“Bio”) and Follow-up Leads. Cross Fertilization
 links the two sides.]
AQUAINT Kickoff – 3 December 2001
Top 10 Challenges

     7) Must extract, represent and preserve information
        uncovered when searching for answers
     8) Rapidly increasing importance of Knowledge of all
        types -- regardless of the approach
     9) Expanding requirements for more advanced
        learning and reasoning methods/approaches
     10) Discovering the correct answer will be hard
       enough; but crafting an appropriate, articulate,
       succinct, explainable response will be even harder


AQUAINT Kickoff – 3 December 2001
Difficulties in Generating Answers
•  Natural Language Generation continues to be a difficult, open
   research area.
     –  Adding the requirement to generate multimedia answers makes this
        problem even harder.
•  Providing the ability to explain and/or justify answers also
   continues to be a difficult, open research area.
     –  The more complex the line or chain of reasoning, the more complex
        the explanation and/or justification
•  QA Scenarios and differences across analysts add additional levels
   of complexity. The Same Question asked within different scenarios
   by different analysts could easily produce substantially:
     –  Different Answer content
     –  Different Answer format, structure, depth and/or breadth of coverage
     –  Or both

AQUAINT Kickoff – 3 December 2001
Outline
   •  Information Exploitation Thrust
   •  AQUAINT Program
        –  The Vision
        –  The Challenges
        –  The Plan of Attack
        –  The AQUAINT Team

   •  Intelligence Community Perspective on
      Information Exploitation and AQUAINT

   •  Some Final Thoughts . . .
AQUAINT Kickoff – 3 December 2001
AQUAINT:
                                    ARDA’s Plan of Attack

   •  ARDA’s newest major Info-X R&D Program
         –  Envisioned as a high risk, long term R&D Program:
              •  Phase I            Fall 2001 - Fall 2003
              •  Phase II           Fall 2003 - Fall 2005
              •  Phase III          Fall/Winter 2005 - Fall/Winter 2007

   •  Focus on Final Objective from start
         –  Incrementally add media, data sources, & complexity of
            questions & answers during each phase

   •  Each of AQUAINT’s 3 Phases:
         –  Use Zero-Based, Open BAA-styled Solicitations
         –  Focus on Key Research Objectives
         –  Be Closely Linked to Parallel System Integration/Testbed Efforts
            & Data Collection/Preparation and Evaluation Efforts


AQUAINT Kickoff – 3 December 2001
AQUAINT:
                       R&D Focused on Three Functional Components

   [Figure: data-flow diagram of the three functional components.
    Recoverable labels are grouped below.]

   Question Understanding and Interpretation
     –  QUESTION ????; Question & Requirement Context; Analyst Background
        Knowledge
     –  Natural Statement of Question; Use of Multimedia Examples
     –  Clarification; Query Assessment, Advisor, Collaboration; Other Analysts
     –  Query Refinement based on Analyst Feedback
     –  Queries; Question & Answer Context

   Determine the Answer
     –  KB Queries; Knowledge Bases; Technical Databases
     –  Translate Queries into Source Specific Retrieval Languages;
        Multiple Source Specific Queries
     –  Multiple Sources; Multiple Media; Multi-Lingual; Multiple Agencies
     –  Partially Annotated & Structured Data; Automatic Metadata Creation;
        Supplemental Use
     –  Multiple Ranked Lists; Single, Merged Ranked List of Relevant
        “Documents”; Relevant “Knowledge”
     •  Relevant information extracted and combined where possible;
     •  Accumulation of Knowledge across “Documents”
     •  Cross “Document” Summaries created;
     •  Language/Media Independent Concept Representation
     •  Inconsistencies noted;
     •  Proposed Conclusions and Inferences Generated
     –  Results of Analysis; Iterative Refinement of Results based on
        Analyst Feedback

   Answer Formulation
     •  Formulate Answer for Analyst in form they want
     •  Multimedia Navigation Tools for Analyst Review
     –  Proposed Answer; Analyst Feedback; FINAL ANSWER
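
The diagram above includes a step in which Multiple Ranked Lists are combined into a Single, Merged Ranked List of Relevant “Documents”. The slides do not say how that merge is performed. As an illustration only, the sketch below uses reciprocal rank fusion, a common rank-merging technique swapped in here as an assumption. The function name `merge_ranked_lists`, the constant `k`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def merge_ranked_lists(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Assumption: reciprocal rank fusion, where each list adds
    1 / (k + rank) to a document's score (rank starts at 1).
    This is an illustrative choice, not the program's stated method.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical per-source result lists (e.g. one per retrieval language).
    newswire = ["doc_a", "doc_b", "doc_c"]
    broadcast = ["doc_b", "doc_d", "doc_a"]
    knowledge_base = ["doc_b", "doc_c"]
    print(merge_ranked_lists([newswire, broadcast, knowledge_base]))
```

Documents that appear near the top of several source-specific lists rise in the merged list, so no single source dominates the result.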

AQUAINT Kickoff – 3 December 2001
AQUAINT:
                   Cross Cutting/Enabling Technologies R&D Areas


  Specifically Solicited Research Areas include:
  1) Advanced Reasoning for Question Answering
  2) Sharable Knowledge Sources
  3) Content Representation
  4) Interactive Question Answering Sessions
  5) Role of Context
  6) Role of Knowledge
  7) Deep, Human Language Processing and Understanding


AQUAINT Kickoff – 3 December 2001
AQUAINT:
                                       Intermediate Goals




                        Increasing Complexity Levels of Questions & Answers


        Level 1                  Level 2                Level 3                  Level 4
        “Simple             “Template &            “Cross Media &            “Context-Based
      Factual QA’s”         Multi-valued QA’s”   Cross Document QA’s”         QA Scenarios”




    Current                 Near Term             Mid Term              Long Term
AQUAINT Kickoff – 3 December 2001
AQUAINT:
                              Separate, Coordinated Activities

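   [Figure: the question-answering pipeline framed by the AQUAINT Phase I
    Solicitation and by Separate Coordinated Activities. Recoverable labels
    are listed below.]

   Above the pipeline:
     –  Annotated and ‘Ground Truthed’ Data
     –  Component Level / End-to-End Testing & Evaluation

   Pipeline:
     –  QUESTION ????
     –  Question Understanding and Interpretation
     –  Information Retrieval Process
     –  Analysis & Synthesis Process (Determine the Answer)
     –  Answer Formulation
     –  FINAL ANSWER

   Below the pipeline:
     –  Cross Cutting/Enabling Technologies Research Issues
     –  Component Integration and System Architecture Issues

   Side labels: Separate Coordinated Activities; AQUAINT Phase I Solicitation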
AQUAINT Kickoff – 3 December 2001
AQUAINT:
                           User Testbed / System Integration

•  Pull together best available system components
   emerging from AQUAINT Program research efforts
     –  Couple AQUAINT components with existing GOTS and COTS software

•  Develop end-to-end AQUAINT prototype(s) aimed at
   specific Operational QA environments
•  Government-led effort:
     –  Directly Linked into Sponsoring Agency’s Technology Insertion
        Organizations
     –  Close, working relationship with working Analysts
     –  Provide external system development support
     –  Mitre/Bedford will lead External System Integration / Testbed efforts
     –  Plan to also utilize additional external researchers as Consultants /
        Advisors


AQUAINT Kickoff – 3 December 2001
AQUAINT:
                                    Data & Evaluation Issues

•  Data
     –  Start by Using Existing Data Collections
          •  NIST’s TREC Text Corpora
          •  Linguistic Data Consortium (LDC) Human Language Corpora (e.g.
             TDT, Switchboard, Call Home, Call Friend Corpora)
          •  Existing Knowledge Bases and Other Structured Databases
     –  Future Data Collection & Annotation and Question/Answer Key
        Development will be a major effort
     –  Will likely use combined efforts of NIST and LDC
•  Evaluation
     –  Build upon highly successful TREC Q&A Track Evaluations --
        NIST has the lead and is currently developing a Phased Evaluation
        Plan tied to AQUAINT Program Plans
     –  Cooperate to maximum extent possible with DARPA’s RKF
        (Rapid Knowledge Formation) Program Evaluation Efforts
AQUAINT Kickoff – 3 December 2001
AQUAINT R&D Program
                                       Workshops

•  When:
     Mon-Wed 3-5 December 2001

•  Where:
     Xerox Training & Conference
     Facility, Leesburg, VA

•  Mid-Year Workshops:
     Progress Reviews; Primarily for
     Program Participants

•  Annual Workshops:
     Major Workshop; Wider Audience;
     Evaluation & Testbed Results

•  Future Phase I Workshops
     May/June 2002       West Coast Site
     Dec 2002            Washington DC Area
     May/June 2003       West Coast Site
     Dec 2003            Washington DC Area

AQUAINT Kickoff – 3 December 2001
Reaching out to scientists
                         across the country…
                                           Northeast Regional
                                            Research Center
                                            Hosted by MITRE
                                              Corporation
                                              Bedford, MA




  Western Regional Information
        Science Center
  Hosted by Pacific Northwest
      National Laboratory
         Richland, WA
                                    …bringing their
                                          solutions home
AQUAINT Kickoff – 3 December 2001
Regional Research Centers

   •  Draw talent from national labs, academia, and
      industry located in the region (Western or
      Northeastern)

   •  Principle of organization is to attract highly
      knowledgeable talent for short periods (weeks,
      months) to focus on well-defined research problems

   •  Provide both real and virtual regional centers for
      technical collaboration in solving Information
      Technology problems of interest to the Intelligence
      Community

                    Help from outside the fence
AQUAINT Kickoff – 3 December 2001
Northeast Regional Research Center

   Hosted By MITRE, Bedford, MA
   Administered by CIA

 •  Conduct a 6-8 week workshop on
    an AQUAINT-related challenge in
    Summer 2002
 •  4-7 Sep 2001: Planning Workshop held at MITRE.
       –  Attended by Government Technical Leaders, MITRE, and invited
          set of industrial, FFRDC and Academic researchers in the field
       –  Four Potential Challenge Problems identified; Formal Proposals
          being developed for each Challenge Problem

 •  16 Nov 2001: Best and final proposal submitted
 •  5 Dec 2001: Final Selection made
AQUAINT Kickoff – 3 December 2001
Proposed NRRC Wkshp Challenge Problems

    1.  Temporal Issues
         –     Generate Sequence of events and activities along evolving
               timeline, resolving multiple levels of time references across
               series of documents/sources.
         –     Proposer: James Pustejovsky, Brandeis University

    2.  Re-Use of Accumulated Knowledge
         –     Investigate strategies for structuring and maintaining
               previously generated knowledge for possible future use.
               E.g. previous knowledge might include questions and
               answers (original and amplified) as well as relevant and
               background information retrieved and processed.
         –     Proposers: Marc Light, MITRE and Abraham Ittycheriah, IBM

AQUAINT Kickoff – 3 December 2001
Proposed NRRC Wkshp Challenge Problems

    3.  Multiple Perspectives
         –     Develop approaches for handling situations where
               relevant information is obtained from multiple sources on
               the same topic but generated from different perspectives
               (e.g. cultural or political differences).
         –     Proposer: Jan Wiebe, University of Pittsburgh

    4.  Habitability
         –     How can a Question Answering system efficiently and
               effectively inform a user what it can do and fail gracefully
               when the question is beyond the reasonable capabilities
               of the system?
         –     Proposers: Joe Marks, Mitsubishi Electric Research Lab
               and Christy Doran, MITRE
AQUAINT Kickoff – 3 December 2001
Outline
   •  Information Exploitation Thrust
   •  AQUAINT Program
        –  The Vision
        –  The Challenges
        –  The Plan of Attack
        –  The AQUAINT Team

   •  Intelligence Community Perspective on
      Information Exploitation and AQUAINT

   •  Some Final Thoughts . . .
AQUAINT Kickoff – 3 December 2001
ARDA’s AQUAINT Partners



   [Figure: partner organizations grouped under two labels:
    Program Committee; Active External Stakeholders]
AQUAINT Kickoff – 3 December 2001
Supporting Roles




              Evaluation

                                         User Testbed

                   Data /
             Operational Scenarios
                                               TBD ??
                                            Other Support
AQUAINT Kickoff – 3 December 2001
AQUAINT Phase I Projects (Fall 01 - Fall 03)

                        Total End-to-End Systems (6)




AQUAINT Kickoff – 3 December 2001
Answering Questions through
                    Understanding and Analysis (AQUA)
                                       BBN Technologies

            Objectives
•  Develop Comprehensive system
•  Use statistical language models,
  knowledge sources, and formal
  reasoning
•  Develop proposition recognition
  algorithm
•  Interpretation by Entity relationship
    model


                                           PLAN
•  Apply Cross Document Entity Detection and Tracking (CEDT) algorithm to QA
•  Questions will be interpreted in context.
•  Related QA sessions of others in workgroup will be brought to user’s attention
•  Answers will be drawn from across documents and sources

Principal Investigators: Ralph Weischedel / Scott Miller Topic Area: Total System
ARDA Contracting Agent: NSA                              Data Dimension: Focused (Text)
AQUAINT Kickoff – 3 December 2001
JAVELIN: Justification-based Answer
                Valuation through Language Interpretation
                Carnegie Mellon Univ. (Language Technologies Institute)

          OBJECTIVES
   •  QA as planning by
     developing a glass box
     planning infrastructure
   •  Universal auditability by
     developing a detailed set of
     labeled dependencies that
     form a traceable network of
     reasoning steps
   •  Utility-based information
     fusion

                                                PLAN
   Address the full Q/A task:
   •  Question analysis - question typing, interpretation, refinement, clarification
   •  Information seeking - document retrieval, entity and relation extraction
   •  Multi-source information fusion - multi-faceted answers, redundancy and contradiction detection


   Principal Investigator: Eric Nyberg               Topic Area: Total System
   Co-PIs: Jamie Callan, Jaime Carbonell             Data Dimension: Multi-Lingual (Text)
   ARDA Contracting Agent: DIA                                       (English, Chinese, Japanese)
AQUAINT Kickoff – 3 December 2001
Integrating Robust Semantics, Event Detection,
                  Information Fusion, and Summarization for
                        Multimedia Question Answering
                    Columbia Univ. / Univ. of Colorado-Boulder
            OBJECTIVES
•  Use statistical semantic parser to
   produce a shallow, domain independent
   semantic representation
•  Develop a dialogue interface for
   carrying on focused dialogue with users
•  Adapt algorithms/components to
   handle spoken questions
•  Recognize “atomic” events and then
   track related information
•  Integrate summarization and language
   generation to produce brief, coherent,
   fluent answers.
                                             PLAN
Build an integrated system to:
•  Answer difficult questions that require interacting with the user to refine context
•  Locate conflicting or time-varying answers in heterogeneous text databases
•  Present answers that require combining/summarizing information from multiple sources
Principal Investigators: Vasileios Hatzivassiloglou, Kathleen McKeown / Daniel Jurafsky, Wayne Ward, Jim Martin
Topic Area: Total System
Data Dimension: Multi-Media (Text/Voice)
ARDA Contracting Agent: DIA
Aquaint kickoff-overview-prange

  • 1. AQUAINT R&D Program Advanced QUestion Answering for INTelligence Dr. John D. Prange AQUAINT Program Director JPrange@nsa.gov 301-688-7092 http://www.ic-arda.org 3 December 2001
  • 2. Outline •  Information Exploitation Thrust •  AQUAINT Program –  The Vision –  The Challenges –  The Plan of Attack –  The AQUAINT Team •  Intelligence Community Perspective on Information Exploitation and AQUAINT •  Some Final Thoughts . . . AQUAINT Kickoff – 3 December 2001
  • 3. Information Exploitation (Info-X): What Functions Does It Include? Data Filtering & Selection; Content Data Mark-up; Content Data Transformation; Information Retrieval; Information Discovery; Presentation and Visualization; Analytic Knowledge; Information Understanding; Assessment and Interpretation; Synthesis and Fusion; Reporting and Dissemination. Info-X is Focused on Content & Its Meaning! AQUAINT Kickoff – 3 December 2001
  • 4. We Need To Dramatically Improve Our Ability to Find & Understand Information. Analysis: Turning Raw Data into Reportable Intelligence (Raw Data, “Finding the Needles in the Haystack”, through the Info-X functions to Intelligence Community Products and Reports). It Remains an Analyst Intensive Activity. With Each Passing Day . . . •  More “Hay” •  Lower No. of “Needles per Volume of Hay” •  Fewer Analysts AND •  Less Time! “Barriers” to Deep Understanding of Content: Multiple Sources; Knowledge Representation; Lack of Control on Creation; Variable Topics & Domains; Limited Reasoning Capabilities; Multiple & Multi-Media; Data Integrity/Use of Deception; Many Foreign Languages/Character Scripts; Goal/Objective of Originator; Image/Video Understanding; Natural (vs. Artificial) Language; Missing, Conflicting, Ambiguous Data; Types, Sources, Quantities of Errors; Degree of Interpretation & Judgement; Importance of Time Dimension; Depth of Understanding Required; Cross Document Analysis; Importance of Context; Role of Knowledge; Formal vs. Informal Conversation; Automated Information Extraction; Lack of Automated Learning. Clearly . . . We MUST Reduce these “Barriers” & Create “Cracks in this Wall”! So much hay and so little time! But How . . . AQUAINT Kickoff – 3 December 2001
  • 5. We Need To Dramatically Improve Our Ability to Find & Understand Information. Analysis: Turning Raw Data into Reportable Intelligence. Raw Data (“Finding the Needles in the Haystack”) → Info-X functions (Data Filtering & Selection; Content Data Markup; Content Data Transformation; Information Retrieval; Information Discovery; Presentation and Visualization; Analytic Knowledge; Information Understanding; Assessment and Interpretation; Synthesis and Fusion; Reporting and Dissemination) → Intelligence Community Products (Reports). It Remains an Analyst Intensive Activity. AQUAINT Kickoff – 3 December 2001
  • 6. We Need To Dramatically Improve Our Ability to Find & Understand Information. Analysis: Turning Raw Data into Reportable Intelligence (Raw Data, “Finding the Needles in the Haystack”, through the Info-X functions to Reports). It Remains an Analyst Intensive Activity. With Each Passing Day . . . •  More “Hay” •  Lower No. of “Needles per Volume of Hay” •  Fewer Analysts AND •  Less Time! “Barriers” to Deep Understanding of Content: Multiple Sources; Knowledge Representation; Lack of Control on Creation; Variable Topics & Domains; Limited Reasoning Capabilities; Multiple & Multi-Media; Data Integrity/Use of Deception; Many Foreign Languages/Character Scripts; Goal/Objective of Originator; Image/Video Understanding; Natural (vs. Artificial) Language; Missing, Conflicting, Ambiguous Data; Types, Sources, Quantities of Errors; Degree of Interpretation & Judgement; Importance of Time Dimension; Depth of Understanding Required; Cross Document Analysis; Importance of Context; Role of Knowledge; Formal vs. Informal Conversation; Automated Information Extraction; Lack of Automated Learning. AQUAINT Kickoff – 3 December 2001
  • 7. We Need To Dramatically Improve Our Ability to Find & Understand Information. Analysis: Turning Raw Data into Reportable Intelligence (Raw Data, “Finding the Needles in the Haystack”, through the Info-X functions to Reports). It Remains an Analyst Intensive Activity. With Each Passing Day . . . •  More “Hay” •  Lower No. of “Needles per Volume of Hay” •  Fewer Analysts AND •  Less Time! “Barriers” to Deep Understanding of Content: Multiple Sources; Knowledge Representation; Lack of Control on Creation; Variable Topics & Domains; Limited Reasoning Capabilities; Multiple & Multi-Media; Data Integrity/Use of Deception; Many Foreign Languages/Character Scripts; Goal/Objective of Originator; Image/Video Understanding; Natural (vs. Artificial) Language; Missing, Conflicting, Ambiguous Data; Types, Sources, Quantities of Errors; Degree of Interpretation & Judgement; Importance of Time Dimension; Depth of Understanding Required; Cross Document Analysis; Importance of Context; Role of Knowledge; Formal vs. Informal Conversation; Automated Information Extraction; Lack of Automated Learning. Clearly . . . We MUST Reduce these “Barriers” & Create “Cracks in this Wall”! So much hay and so little time! But How . . . AQUAINT Kickoff – 3 December 2001
  • 8. Info-X R&D Programs: The Ideal Build Process ARDA Thrust: Information Exploitation End-to-end Customer Operational Problems Operational Needs Tests Operational Capabilities Customer’s Data Technical Needs Research R&D Response Component Research Projects Level Testing AQUAINT Kickoff – 3 December 2001
  • 9. Current Info-X R&D Programs. Full R&D Programs consisting of Three 2-Year Phases: •  AQUAINT: Advanced QUestion & Answering for INTelligence •  VACE: Video Analysis and Content Extraction •  GI2Vis: Geospatial Intelligence Information Visualization. Exploratory R&D Programs consisting of 1-Year + Option Year: •  LEMUR: Statistical Language Modeling for Information Retrieval •  NDHB: Non-Linear Dynamics from Human Behavior AQUAINT Kickoff – 3 December 2001
  • 10. Outline •  Information Exploitation Thrust •  AQUAINT Program –  The Vision –  The Challenges –  The Plan of Attack –  The AQUAINT Team •  Intelligence Community Perspective on Information Exploitation and AQUAINT •  Some Final Thoughts . . . AQUAINT Kickoff – 3 December 2001
  • 11. “Some look at things that are and ask why. I dream of things that might be and ask why not.” Robert Kennedy 1925-1968 AQUAINT Kickoff – 3 December 2001
  • 12. Traditional Information Retrieval (IR) Approach. Question → System Specific Query (e.g. Boolean Key Word Equation) → Traditional Information Retrieval over a Data Source (e.g. Large Text Archive) → Ranked List of Hopefully “Relevant” Documents AQUAINT Kickoff – 3 December 2001
  • 13. Next Generation Approaches: Question Answering (QA) Systems. Single, Factoid Question → Move Closer to the Question (e.g. Question Classification) → System Specific Query, often Tailored to Question Type → Traditional Information Retrieval over a Single Data Source → Ranked List of Hopefully “Relevant” Documents → QA Shallow Analysis: Move Closer to the Answer (e.g. Passage Retrieval) → “Answer” AQUAINT Kickoff – 3 December 2001
  • 14. TREC QA Track Approach •  ARDA & DARPA are co-sponsoring the Question Answering Track in the NIST-organized Text Retrieval Conference (TREC) Program (starting with TREC-8 in Nov 1999). •  TREC-10 Results (Nov 2001): –  500 factual questions; about 50 questions had no answer in the TREC-10 data sources; used “real” questions –  Data source: approx. 3 GByte database of ~980K news stories –  36 US & international organizations participated; 92 separate runs evaluated –  System output: top 5 regions (50 bytes) in a single story believed to contain the answer to the given question –  Top System: 70% of the “Answers” found in their top 5 50-byte passages AQUAINT Kickoff – 3 December 2001
  • 15. Pilot Evaluations TREC 10 QA Track •  The “List Task” –  Sample Questions: •  “Name 4 US cities that have a ‘Shubert’ Theater” •  “Name 30 individuals who served as a cabinet officer under Ronald Reagan” –  Evaluation Metric: number of distinct instances divided by the target number of instances, averaged over 25 questions •  Top System among 18 runs: Achieved 76% Accuracy •  The “Context Task” –  Sample Series of Questions: •  “How many species of spiders are there?” •  “How many are poisonous to humans?” •  “What percentage of spider bites in the US are fatal?” –  Evaluation Metric: Same as Main Task (10 Series of Questions; 42 total Questions) •  Top System: Found answer for 34 of the 42 total questions (81%) AQUAINT Kickoff – 3 December 2001
  • 16. AQUAINT Advanced QUestion & Answering for INTelligence. Overarching Context / Operational Requirement: In a foreign news broadcast a team of analysts observe a previously unknown individual conferring with the Foreign Minister. They suspect that he/she is really a new senior advisor. Questions: “Who is this advisor?” “What do we know about him/her?” “What are his/her views?” “What influence does he/she have on FM?” “Does this signal that other policy changes are coming?” And still more questions ??? AQUAINT Kickoff – 3 December 2001
  • 17. AQUAINT Advanced QUestion & Answering for INTelligence. Question types within a QA Scenario: Factoid Questions? Why Questions? Interpretive Questions? Predictive Questions? Judgement Questions? Other Questions? Diagram elements: Overarching Context / Operational Requirement; Interpreting Complex Questions within a Larger Context; Extend Traditional Information Retrieval; Multiple Heterogeneous Data Sources; Ranked Lists of “Relevant” Data Objects; Deeper Automated Understanding; Extract & Analyze Results; Advanced QA; Interpret Results & Formulate the Answers; Provide Answers in a Form Analysts Want; Answers AQUAINT Kickoff – 3 December 2001
  • 18. AQUAINT Is Skipping Ahead Two Generations Multiple Key Barriers to Content Understanding Will Be Aggressively Attacked Commercial World & Current R&D Efforts Are Addressing the Next Generation But Only Selected Content Understanding Barriers Are Being Aggressively Attacked
  • 19. Outline •  Information Exploitation Thrust •  AQUAINT Program –  The Vision –  The Challenges –  The Plan of Attack –  The AQUAINT Team •  Intelligence Community Perspective on Information Exploitation and AQUAINT •  Some Final Thoughts . . . AQUAINT Kickoff – 3 December 2001
  • 20. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst 2) Pursue QA Scenarios and not just isolated, factually based QA 3) Support a collaborative, multiple analyst environment 4) Sometimes SMALL things really matter and other times BIG things don’t 5) Advanced QA must attack the “Data Chasm” 6) Time is of the Essence AQUAINT Kickoff – 3 December 2001
  • 21. Top 10 Challenges 7) Must extract, represent and preserve information uncovered when searching for answers 8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach 9) Expanding requirements for more advanced learning and reasoning methods/approaches 10) Discovering the correct answer will be hard enough; but crafting an appropriate, articulate, succinct, explainable response will be even harder AQUAINT Kickoff – 3 December 2001
  • 22. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst AQUAINT Kickoff – 3 December 2001
  • 23. Professional Information Analysts: Target Audience for AQUAINT -- Who are They? •  For ARDA and AQUAINT they are: –  Intelligence Community and Military Analysts •  But there are other Potential Target Audiences of “Professional Information Analysts”: –  Investigative / “CNN-type” Reporters –  Financial Industry Analysts / Investors –  Historians / Biographers –  Lawyers / Law Clerks –  Law Enforcement Detectives –  And Others AQUAINT Kickoff – 3 December 2001
  • 24. Intelligence Community Analysts – Who are they? What Do We See When We Focus Directly In On Our Intelligence Analysts? AQUAINT Kickoff – 3 December 2001
  • 25. Some Observations about Intelligence Analysts (IA’s) MAJOR DIFFERENCES DO EXIST AMONG IA’s •  First: There are different levels of intelligence within the IC -- Strategic, Operational, Tactical -- –  ARDA is focusing on Strategic Level IA’s •  Second: There is no stereotypical analyst even within our Strategic Level Intelligence Agencies. –  Clear, significant differences exist across the national IC agencies as well as across the different “INT’s” –  Additional, significant differences are accentuated by the total breadth and variety of all IC reporting requirements. –  There are even significant differences between IA’s within the same IC agency •  Third: There are significant skill level differences among IA’s –  Yes, the most senior IA’s are exceptional –  But the junior IA’s aren’t bad either AQUAINT Kickoff – 3 December 2001
  • 26. Some Observations about Intelligence Analysts (IA’s) BUT UNIVERSAL SIMILARITIES CAN BE IDENTIFIED ACROSS OUR IA’s •  We believe that these similarities are significant and strong enough that: –  Taken collectively they highlight key differences between Intelligence Analysts and the Emerging Casual Information Consumer that is being fueled by the Information Revolution and targeted by the commercial world –  A common set of critically important Info-X problems for the IC can be identified and articulated –  Multi-agency R&D programs against these common Info-X problems can be developed to the benefit of all IC Agencies AQUAINT Kickoff – 3 December 2001
  • 27. Universal Similarities Across IA’s 1. IA’s are information professionals 2. IA’s are almost always subject matter experts within their assigned task areas 3. IA’s track and follow a given event, scenario, problem, situation for an extended period of time 4. Increasingly IA’s are performing all source analysis and production 5. IA’s typically work with overwhelming volumes of data and information, but that’s the good news 6. Increasingly IA’s must collaborate with other IA’s 7. IA’s are focused on their Mission and will do whatever it takes to accomplish it 8. The Intelligence that IA’s produce is judged against the highest standards (called the “Tenets of Intelligence”) - Timeliness - Accuracy - Usability - Completeness - Relevance AQUAINT Kickoff – 3 December 2001
  • 28. Universal Similarities Across IA’s 1. IA’s are information professionals -- That is, IA’s are not casual developers and consumers of information 2. IA’s are almost always subject matter experts within their assigned task areas -- That is, IA’s have broad and deep knowledge of their subject area and possess profound skills developed over 10’s of years of experience 3. IA’s track and follow a given event, scenario, problem, situation for an extended period of time -- That is, IA’s frequently have developed extensive working files related to their investigation; IA’s information needs and queries carry within them an extensive, non-expressed context and background AQUAINT Kickoff – 3 December 2001
  • 29. Universal Similarities Across IA’s 4. Increasingly IA’s are performing all source analysis and production -- For example, the language analyst must use intercept from multiple media, multiple languages and the imagery analyst must know how to combine information from multiple INT’s. 5. IA’s typically work with overwhelming volumes of data and information, but that’s the good news -- Raw data on which the IA developed information is based is often “dirty”, “errorful”, “contradictory or conflicting”, “of questionable or unknown validity”, “incomplete or missing”, “time sensitive”, “highly fragmented”, etc. 6. Increasingly IA’s must collaborate with other IA’s -- These IA’s may be working in different organizations, different agencies and they might not even know that each other would benefit from collaboration. AQUAINT Kickoff – 3 December 2001
  • 30. Universal Similarities Across IA’s 7. IA’s are focused on their Mission and will do whatever it takes to accomplish it -- That is, IA’s are highly adaptable and resourceful. They will develop workable strategies and attacks regardless of the roadblocks that our collection and processing “stovepipes” create and of the limitations that our “brain dead” analytic tools offer. 8. The Intelligence that IA’s produce is judged against the highest standards (called the “Tenets of Intelligence”) -- –  Timeliness –  Accuracy –  Usability –  Completeness –  Relevance AQUAINT Kickoff – 3 December 2001
  • 31. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst 2) Pursue QA Scenarios and not just isolated, factually based QA AQUAINT Kickoff – 3 December 2001
  • 32. Implications of QA Scenarios •  Requires handling a Full Range of Complexity & Continuity of Questions •  Need to understand & track the analysts’ line of reasoning and flow of argument •  QA System requires significantly greater insight into knowledge, desires, past experiences, likes and dislikes of “Questioner” •  Place much higher value on recognizing and capturing “background” information •  Questioner/System dialogue is now more than just a means for clarification [Figure: Factoid Question?, Why Questions?, Interpretive Questions?, Judgement Questions?, Predictive Questions? and Other Questions?, all set within an Overarching Context / Operational Requirement]
  • 33. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst 2) Pursue QA Scenarios and not just isolated, factually based QA 3) Support a collaborative, multiple analyst environment AQUAINT Kickoff – 3 December 2001
  • 34. Collaboration within QA •  Standard Collaboration (From an Analyst Perspective) –  Who else is working all or a portion of my task? –  What do they know that I don’t and vice versa? –  Can we share/work together? •  Non-Standard Discovery (From a System Perspective) –  Identify previous QA Scenarios that have “similarity” to current QA Scenario. Compare & Contrast –  Use / Build-on / Update previous results –  Uncover new data sources –  Borrow a successful “line of reasoning” or “argument flow” –  Alerts analyst to different interpretations or to overlooked / undervalued data [Figure: Question Understanding and Interpretation, with labels QUESTION ????; Natural Statement of Question; Use of Multimedia Examples; Question & Requirement Context; Analyst Background Knowledge; Other Analysts; Knowledge Bases; Technical Databases; Query Assessment, Advisor, Collaboration; Question Clarification; Focus]
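The "Non-Standard Discovery" item on slide 34 asks the system to identify previous QA Scenarios that are "similar" to the current one. The slide does not say how similarity would be measured; the sketch below assumes plain word-overlap (Jaccard) similarity as a stand-in, and the scenario names and texts are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_previous_scenarios(current: str, previous: dict) -> list:
    """Return (score, scenario_name) pairs, most similar first."""
    current_words = set(current.lower().split())
    scored = [
        (jaccard(current_words, set(text.lower().split())), name)
        for name, text in previous.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical stored scenarios, reduced to keyword strings.
previous = {
    "scenario_1": "foreign minister new senior advisor background",
    "scenario_2": "shipping traffic port activity summer",
}
current = "who is the new senior advisor to the foreign minister"
ranked = rank_previous_scenarios(current, previous)
```

A real system would need far richer representations than bags of words, since the slides stress that scenarios carry unstated context; the sketch only illustrates the retrieval step.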
  • 35. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst 2) Pursue QA Scenarios and not just isolated, factually based QA 3) Support a collaborative, multiple analyst environment 4) Some times SMALL things really matter and other times BIG things don’t AQUAINT Kickoff – 3 December 2001
  • 36. “Small & Big” - Can we tell the difference? •  Some times SMALL differences can produce significantly different results/interpretations: –  Stop Words •  “Books {by; for; about} kids” –  Attachments •  “The man saw the woman in the park with the telescope.” –  Co-reference •  “John {persuaded; promised} Bill to go. He just left.” •  “Mary took the pill from the bottle. She swallowed it.” •  Other times BIG differences can produce the same/ similar results: –  “Name the films in which Richard Harris starred.” –  “Richard Harris played a leading role in which movies?” –  “In what Hollywood productions did Richard Harris receive top billing?” AQUAINT Kickoff – 3 December 2001
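The stop-word example on slide 36 can be made concrete with a small sketch. It assumes a naive pipeline that lowercases a query and drops stop words before matching; the stop-word list here is a hypothetical, shortened one.

```python
# Hypothetical stop-word list; production lists are much longer.
STOP_WORDS = {"by", "for", "about", "the", "a", "an", "of", "in"}


def strip_stop_words(query: str) -> list:
    """Lowercase the query and drop every token found in STOP_WORDS."""
    return [tok for tok in query.lower().split() if tok not in STOP_WORDS]


queries = ["Books by kids", "Books for kids", "Books about kids"]
stripped = [strip_stop_words(q) for q in queries]

# After stripping, the three queries are indistinguishable, even though
# they ask about authorship, audience, and topic respectively.
all_identical = all(s == stripped[0] for s in stripped)
```

This is the "small things matter" half of the slide: one discarded preposition changes the meaning of the question entirely.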
  • 37. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst 2) Pursue QA Scenarios and not just isolated, factually based QA 3) Support a collaborative, multiple analyst environment 4) Some times SMALL things really matter and other times BIG things don’t 5) Advanced QA must attack the “Data Chasm” AQUAINT Kickoff – 3 December 2001
  • 38. Attacking the Data Chasm •  Questions: –  Today: Single, Isolated Factual Question –  Level I: Multi-Valued Factual Questions –  Level II: Cross Media, Cross Document, Simple Judgement Questions –  Future, Level III: Full Context-Based Scenario •  Data Chasm: –  Increasing Volumes (Petabyte & up) –  MANY Heterogeneous Data Sources; All Types, Sizes, Locations –  Missing Data –  Reliability of Data –  Contradictory Data –  Synthesis Across “Documents”/Media •  Answers (from Today toward Level III): –  50/250 Byte Passage from Single Text Document –  Fixed Templates; Variable or Tabular Lists –  Variable, Automatically Generated Structure/Format; Simple Interpreted Results –  Narrative Summary; Multi-Media Presentations; Fully Intersected; Full Context Responses
  • 39. AQUAINT: Data Types (running from Structured / Semi-Structured to Unstructured) •  KB’s •  DB’s •  “Tagged Data” (e.g. Web Data) •  Technical / Abstract Data: Sensor; Geospatial; Economic; Other •  Visual: Video; Still Images •  Human Language: –  Media: Text Documents; Speech; Multi-Media –  Language: English; Foreign Language 1; Foreign Language 2; Foreign Language N –  Genre: Newswire / News Broadcast; Technical; Formal / Informal Communication; Other
  • 40. AQUAINT: Data Types (running from Structured / Semi-Structured to Unstructured) •  KB’s •  DB’s •  “Tagged Data” (e.g. Web Data) •  Technical / Abstract Data: Sensor; Geospatial; Economic; Other •  Visual: Video; Still Images •  Human Language: –  Media: Text Documents; Speech; Multi-Media –  Language: English; Foreign Language 1; Foreign Language 2; Foreign Language N –  Genre: Newswire / News Broadcast; Technical; Formal / Informal Communication; Other •  DATA FOCUS OF RELATED QA PROGRAMS / ACTIVITIES: –  Commercial “Ask Jeeves” –  DARPA’s DAML –  DARPA’s RKF –  DARPA’s TIDES & TDT –  TREC QA Track –  ARDA’s VACE –  ARDA’s GI2Vis
  • 41. AQUAINT: Phase I Data Dimensions (Data Dimension; Requirement; Example) •  1. Focused –  Requirement: Single media, single language, and single genre in an unstructured data source –  Example: English newspaper/newswire articles (text) •  2. Multiple Media –  Requirement: Two or more of the following: text (clean, degraded, and speech recognition produced), raw speech, still imagery, video data, abstract data (technical, geospatial), and related media –  Example: Question where the answer is a summarization of information found in video clips & may contain a table of technical data extracted from various sources (geospatial, text, etc.) •  3. Cross Lingual –  Requirement: English questions with foreign language references and passages. Foreign languages could be expressed using any number of foreign character scripts and encoding schemes. –  Example: English question with answer derived from single media (newswire) material in Chinese or Arabic and other language.
  • 42. AQUAINT: Phase I Data Dimensions (Data Dimension; Requirement; Example) •  4. Multiple Genre –  Requirement: Formal and informal correspondence (various media), formal dialog, informal conversations or discussions, technical/journal articles, newswire/broadcast news; advertisements; product and technical descriptions, government reports; public databases –  Example: Question with answer derived from formal correspondence and journal articles •  5. Structured & Unstructured –  Requirement: Tables, charts and maps, diagrams, linked data or directed graph data, structured databases, structured transactions; large knowledge bases; linked web/pages; and html/xml documents PLUS unstructured data from one of the media, lingual or genre dimensions. –  Example: Question with answer derived from knowledge base and substantiated with information from technical journal.
  • 43. Top 10 Challenges 1) Satisfy QA requirements of the “Professional” Information Analyst 2) Pursue QA Scenarios and not just isolated, factually based QA 3) Support a collaborative, multiple analyst environment 4) Some times SMALL things really matter and other times BIG things don’t 5) Advanced QA must attack the “Data Chasm” 6) Time is of the Essence AQUAINT Kickoff – 3 December 2001
  • 44. Time: Our Achilles Heel? •  Real Difficulties Exist in: –  Extracting, correctly interpreting time references & then creating manageable timelines –  Estimating & updating changing reliability of information over time –  Processing information in time sequence, e.g. tracking the details of an evolving event over time -- A whole different set of problems •  And of course: –  We can’t forget all of the issues related to the timeliness of the system’s response to our question(s) -- we’ll need at least “near real time responses” [Timeline graphic: March, April, May, June, July, August]
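The first difficulty on slide 44, extracting time references and building a timeline, can be illustrated with a deliberately narrow sketch. It assumes explicit "day Month year" references only; the pattern, function name, and sample sentences are all hypothetical.

```python
import re
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches explicit references such as "14 May 2001" and nothing else.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTH_NAMES) + r") (\d{4})\b")


def build_timeline(sentences):
    """Return (date, sentence) pairs sorted chronologically.

    Sentences with no explicit date are skipped.
    """
    events = []
    for sentence in sentences:
        match = DATE_RE.search(sentence)
        if match:
            day, month, year = match.groups()
            events.append((date(int(year), MONTHS[month], int(day)), sentence))
    return sorted(events)


sentences = [
    "The delegation arrived on 14 May 2001.",
    "Talks were announced on 2 March 2001.",
    "A follow-up meeting is expected next week.",
]
timeline = build_timeline(sentences)
```

The third sentence is silently dropped because "next week" can only be resolved against the document's own date. Relative and vague references of that kind are where the real difficulty named on the slide begins.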
  • 45. Top 10 Challenges 7) Must extract, represent and preserve information uncovered when searching for answers AQUAINT Kickoff – 3 December 2001
  • 46. QA Scenarios: A Different Paradigm? •  Current Analytic Paradigm: –  Sequentially “Filter Down” to the final result –  Works when QA’s are independent, isolated activities •  A Different Paradigm may be useful when handling QA Scenarios: –  Cast a “wider net” while searching for “golden nuggets” (Answers) –  Automatically Extract, Represent, and Preserve “closely related” background information within context of the QA Scenario [Figure labels: Data; Answers; Space of Data Objects and Sources; How Wide to Cast the “Net”?; What Info to Retain? In what form? For how long?; Background Processing & Analysis; Discarded Results]
  • 47. Top 10 Challenges 7) Must extract, represent and preserve information uncovered when searching for answers 8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach AQUAINT Kickoff – 3 December 2001
  • 48. Complex QA: The Need for Ever Increasing Knowledge -- Of All Types •  DIMENSIONS OF THE QUESTION PART OF THE QA PROBLEM: Scope; Judgement; Context. Runs from Simple Factual Question to Advanced QA R&D Program, with Increasing Knowledge Requirements ** •  DIMENSIONS OF THE ANSWER PART OF THE QA PROBLEM: Multiple Sources; Interpretation; Fusion. Runs from Simple Answer, Single Source to Advanced QA R&D Program, with Increasing Knowledge Requirements ** •  ** Knowledge Requirement would be better represented with a whole “quiver of arrows” of different sizes, lengths and types
  • 49. Top 10 Challenges 7) Must extract, represent and preserve information uncovered when searching for answers 8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach 9) Expanding requirements for more advanced learning and reasoning methods/approaches AQUAINT Kickoff – 3 December 2001
  • 50. Improved Reasoning & Learning In a foreign news broadcast a team of analysts observe a previously unknown individual conferring with the Foreign Minister. They suspect that he/she is really a new senior advisor. FOCUS: •  Who is this advisor? •  What do we know about him/her? •  What are his/her views? •  What influence does he/she have on FM? •  Does this signal that other policy changes are coming? •  And still more questions ??? [All within an Overarching Context / Operational Requirement]
  • 51. Improved Reasoning & Learning Advanced Reasoning: •  Use Multi-level Plans •  Create and evaluate chains of reasoning •  Reason across heterogeneous data sources •  Infer answers from data extracted from multiple sources when the answer is not explicitly stated •  Utilize Link Analysis & Evidence Discovery •  Plus other strategies Advanced Learning: •  Automatically learn new or modify existing reasoning strategies [Figure labels: New Senior Advisor; TV & Radio Broadcasts, Newspapers & Other Archives; Collected Raw “Bio” Information (Education, Past Positions, Family, Travels, Other Activities); Views; Follow-up Leads; Cross Fertilization; Summarized “Bio” Results; Summarized “Views: Past & Present” Results]
  • 52. Top 10 Challenges 7) Must extract, represent and preserve information uncovered when searching for answers 8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach 9) Expanding requirements for more advanced learning and reasoning methods/approaches 10) Discovering the correct answer will be hard enough; but crafting an appropriate, articulate, succinct, explainable response will be even harder AQUAINT Kickoff – 3 December 2001
  • 53. Difficulties in Generating Answers •  Natural Language Generation continues to be a difficult, open research area. –  Adding the requirement to generate multimedia answers makes this problem even harder. •  Providing the ability to explain and/or justify answers also continues to be a difficult, open research area. –  The more complex the line or chain of reasoning, the more complex the explanation and/or justification •  QA Scenarios and differences across analysts add additional levels of complexity. The Same Question asked within different scenarios by different analysts could easily produce substantially: –  Different Answer content –  Different Answer format, structure, depth and/or breadth of coverage –  Or both AQUAINT Kickoff – 3 December 2001
  • 54. Outline •  Information Exploitation Thrust •  AQUAINT Program –  The Vision –  The Challenges –  The Plan of Attack –  The AQUAINT Team •  Intelligence Community Perspective on Information Exploitation and AQUAINT •  Some Final Thoughts . . . AQUAINT Kickoff – 3 December 2001
  • 55. AQUAINT: ARDA’s Plan of Attack •  ARDA’s newest major Info-X R&D Program –  Envisioned as a high risk, long term R&D Program: •  Phase I Fall 2001 - Fall 2003 •  Phase II Fall 2003 - Fall 2005 •  Phase III Fall/Winter 2005 - Fall/Winter 2007 •  Focus on Final Objective from start –  Incrementally add media, data sources, & complexity of questions & answers during each phase •  Each of AQUAINT’s 3 Phases: –  Use Zero-Based, Open BAA-styled Solicitations –  Focus on Key Research Objectives –  Be Closely Linked to Parallel System Integration/Testbed Efforts & Data Collection/Preparation and Evaluation Efforts AQUAINT Kickoff – 3 December 2001
  • 56. AQUAINT: R&D Focused on Three Functional Components •  Question Understanding and Interpretation: QUESTION ????; Natural Statement of Question; Use of Multimedia Examples; Question & Requirement Context; Analyst Background Knowledge; Other Analysts; Knowledge Bases; Technical Databases; Query Assessment, Advisor, Collaboration; Question Clarification •  Determine the Answer: Question & Answer Context; Translate Queries into Source Specific Retrieval Languages; Multiple Source Specific Queries; KB Queries; Automatic Metadata Creation; Partially Annotated & Structured Data; Supplemental Use; Multiple Sources; Multiple Media; Multi-Lingual; Multiple Agencies; Multiple Ranked Lists; Single, Merged Ranked List of Relevant “Documents”; Relevant “Documents”; Relevant “Knowledge”; Query Refinement based on Analyst Feedback; Results of Analysis: –  Relevant information extracted and combined where possible –  Accumulation of Knowledge across “Documents” –  Cross “Document” Summaries created –  Language/Media Independent Concept Representation –  Inconsistencies noted –  Proposed Conclusions and Inferences Generated •  Answer Formulation: –  Formulate Answer for Analyst in form they want –  Multimedia Navigation Tools for Analyst Review –  Proposed Answer; Analyst Feedback; Iterative Refinement of Results based on Analyst Feedback; FINAL ANSWER
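Slide 56 shows multiple source-specific ranked lists being combined into a single, merged ranked list of relevant "Documents". The slides do not name a merging method. As one illustration, the sketch below uses reciprocal rank fusion, a general rank-merging technique that is an assumption here, not AQUAINT's approach. The document IDs and source names are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) in every list where it appears,
    and the scores are summed. k=60 is a conventional default.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three source-specific retrieval runs.
text_hits = ["doc_a", "doc_b", "doc_c"]
speech_hits = ["doc_b", "doc_d"]
kb_hits = ["doc_b", "doc_a"]

merged = reciprocal_rank_fusion([text_hits, speech_hits, kb_hits])
```

Rank-based fusion is convenient in this setting because it needs no comparable relevance scores across sources, which text, speech, and knowledge-base retrieval are unlikely to share.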
  • 57. AQUAINT: Cross Cutting/Enabling Technologies R&D Areas Specifically Solicited Research Areas include: 1) Advanced Reasoning for Question Answering 2) Sharable Knowledge Sources 3) Content Representation 4) Interactive Question Answering Sessions 5) Role of Context 6) Role of Knowledge 7) Deep, Human Language Processing and Understanding AQUAINT Kickoff – 3 December 2001
  • 58. AQUAINT: Intermediate Goals Increasing Complexity Levels of Questions & Answers •  Level 1 (Current): “Simple Factual QA’s” •  Level 2 (Near Term): “Template & Multi-valued QA’s” •  Level 3 (Mid Term): “Cross Media & Cross Document QA’s” •  Level 4 (Long Term): “Context-Based QA Scenarios”
  • 59. AQUAINT: Separate, Coordinated Activities [Figure labels: QUESTION ????; Question Understanding and Interpretation; Determine the Answer (Information Retrieval Process; Analysis & Synthesis Process); Answer Formulation; FINAL ANSWER; AQUAINT Phase I Solicitation; Separate Coordinated Activities; Annotated and ‘Ground Truthed’ Data; Component Level / End-to-End Testing & Evaluation; Cross Cutting/Enabling Technologies Research Issues; Component Integration and System Architecture Issues]
  • 60. AQUAINT: User Testbed / System Integration •  Pull together best available system components emerging from AQUAINT Program research efforts –  Couple AQUAINT components with existing GOTS and COTS software •  Develop end-to-end AQUAINT prototype(s) aimed at specific Operational QA environments •  Government-led effort: –  Directly Linked into Sponsoring Agency’s Technology Insertion Organizations –  Close, working relationship with working Analysts –  Provide external system development support –  Mitre/Bedford will lead External System Integration / Testbed efforts –  Plan to also utilize additional external researchers as Consultants / Advisors AQUAINT Kickoff – 3 December 2001
  • 61. AQUAINT: Data & Evaluation Issues •  Data –  Start by Using Existing Data Collections •  NIST’s TREC Text Corpora •  Linguistic Data Consortium (LDC) Human Language Corpora (e.g. TDT, Switchboard, Call Home, Call Friend Corpora) •  Existing Knowledge Bases and Other Structured Databases –  Future Data Collection & Annotation and Question/Answer Key Development will be a major effort –  Will likely use combined efforts of NIST and LDC •  Evaluation –  Build upon highly successful TREC Q&A Track Evaluations -- NIST has lead and is currently developing a Phased Evaluation Plan tied to AQUAINT Program Plans –  Cooperate to maximum extent possible with DARPA’s RKF (Rapid Knowledge Formation) Program Evaluation Efforts AQUAINT Kickoff – 3 December 2001
  • 62. AQUAINT R&D Program Workshops •  When: Mon-Wed 3-5 December 2001 •  Where: Xerox Training & Conference Facility, Leesburg, VA •  Mid-Year Workshops: Progress Reviews; Primarily for Program Participants •  Annual Workshops: Major Workshop; Wider Audience; Evaluation & Testbed Results •  Future Phase I Workshops: –  May/June 2002: West Coast Site –  Dec 2002: Washington DC Area –  May/June 2003: West Coast Site –  Dec 2003: Washington DC Area
  • 63. Reaching out to scientists across the country… Northeast Regional Research Center Hosted by MITRE Corporation Bedford, MA Western Regional Information Science Center Hosted by Pacific Northwest National Laboratory Richland, WA …bringing their solutions home AQUAINT Kickoff – 3 December 2001
  • 64. Regional Research Centers •  Draw talent from national labs, academia, and industry located in the region (Western or Northeastern) •  Principle of organization is to attract highly knowledgeable talent for short periods (weeks, months) to focus on well-defined research problems •  Provide both real and virtual regional centers for technical collaboration in solving Information Technology problems of interest to the Intelligence Community Help from outside the fence AQUAINT Kickoff – 3 December 2001
  • 65. Northeast Regional Research Center Hosted By MITRE, Bedford, MA Administered by CIA •  Conduct a 6-8 week workshop on an AQUAINT-related challenge in Summer 2002 •  4-7 Sep 2001: Planning Workshop held at MITRE. –  Attended by Government Technical Leaders, MITRE, and invited set of industrial, FFRDC and Academic researchers in the field –  Four Potential Challenge Problems identified; Formal Proposals being developed for each Challenge Problem •  16 Nov 2001: Best and final proposal submitted •  5 Dec 2001: Final Selection made AQUAINT Kickoff – 3 December 2001
  • 66. Proposed NRRC Wkshp Challenge Problems 1.  Temporal Issues –  Generate Sequence of events and activities along evolving timeline, resolving multiple levels of time references across series of documents/sources. –  Proposer: James Pustejovsky, Brandeis University 2.  Re-Use of Accumulated Knowledge –  Investigate strategies for structuring and maintaining previously generated knowledge for possible future use. E.g. previous knowledge might include questions and answers (original and amplified) as well as relevant and background information retrieved and processed. –  Proposer: Marc Light, MITRE and Abraham Ittycheriah, IBM AQUAINT Kickoff – 3 December 2001
  • 67. Proposed NRRC Wkshp Challenge Problems 3.  Multiple Perspectives –  Develop approaches for handling situations where relevant information is obtained from multiple sources on the same topic but generated from different perspectives (e.g. cultural or political differences). –  Proposer: Jan Wiebe, University of Pittsburgh 4.  Habitability –  How can a Question Answering system efficiently and effectively inform a user what it can do, and fail gracefully when the question is beyond the reasonable capabilities of the system? –  Proposers: Joe Marks, Mitsubishi Electric Research Lab and Christy Doran, MITRE
  • 68. Outline •  Information Exploitation Thrust •  AQUAINT Program –  The Vision –  The Challenges –  The Plan of Attack –  The AQUAINT Team •  Intelligence Community Perspective on Information Exploitation and AQUAINT •  Some Final Thoughts . . . AQUAINT Kickoff – 3 December 2001
  • 69. ARDA’s AQUAINT Partners Program Committee Active External Active Stakeholders External Stakeholders AQUAINT Kickoff – 3 December 2001
  • 70. Supporting Roles Evaluation User Testbed Data / Operational Scenarios TBD ?? Other Support AQUAINT Kickoff – 3 December 2001
  • 71. AQUAINT Phase I Projects (Fall 01 - Fall 03) Total End-to-End Systems (6) AQUAINT Kickoff – 3 December 2001
  • 72. Answering Questions through Understanding and Analysis (AQUA) BBN Technologies Objectives •  Develop Comprehensive system •  Use statistical language models, knowledge sources, and formal reasoning •  Develop proposition recognition algorithm •  Interpretation by Entity relationship model PLAN •  Apply Cross Document Entity Detection and Tracking (CEDT) algorithm to QA •  Questions will be interpreted in context. •  Related QA sessions of others in workgroup will be brought to user’s attention •  Answers will be drawn from across documents and sources Principal Investigators: Ralph Weischedel / Scott Miller Topic Area: Total System ARDA Contracting Agent: NSA Data Dimension: Focused (Text) AQUAINT Kickoff – 3 December 2001
  • 73. JAVELIN: Justification-based Answer Valuation through Language Interpretation Carnegie Mellon Univ. (Language Technologies Institute) OBJECTIVES •  QA as planning by developing a glass box planning infrastructure •  Universal auditability by developing a detailed set of labeled dependencies that form a traceable network of reasoning steps •  Utility-based information fusion PLAN Address the full Q/A task: •  Question analysis - question typing, interpretation, refinement, clarification •  Information seeking - document retrieval, entity and relation extraction •  Multi-source information fusion - multi-faceted answers, redundancy and contradiction detection Principal Investigator: Eric Nyberg Co-PIs: Jamie Callan, Jaime Carbonell Topic Area: Total System Data Dimension: Multi-Lingual (Text) (English, Chinese, Japanese) ARDA Contracting Agent: DIA
  • 74. Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering Columbia Univ. / Univ. of Colorado-Boulder OBJECTIVES •  Use statistical semantic parser to produce a shallow, domain independent semantic representation •  Develop a dialogue interface for carrying on focused dialogue with users •  Adapt algorithms/components to handle spoken questions •  Recognize “atomic” events and then track related information •  Integrate summarization and language generation to produce brief, coherent, fluent answers. PLAN Build an integrated system to: •  Answer difficult questions that require interacting with the user to refine context •  Locate conflicting or time-varying answers in heterogeneous text databases •  Present answers that require combining/summarizing information from multiple sources Principal Investigators: Vasileios Hatzivassiloglou, Kathleen McKeown / Daniel Jurafsky, Wayne Ward, Jim Martin Topic Area: Total System Data Dimension: Multi-Media (Text/Voice) ARDA Contracting Agent: DIA