The proliferation of plans can result in debilitating information overload in public health and medical emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and
its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states and each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they are in alignment with one another. Human analysis of plans is time-consuming and difficult, so text analysis
software tools are needed that can help humans (a) compare plans to find gaps or discrepancies and (b) locate relevant sections of plans and display links to them. This research-in-progress describes two text
analysis tools being developed at the Los Alamos National Laboratory as part of E-SOS (Emergency Situation Overview and Synthesis): the Theme Awareness Tool (THEMAT) and Content Awareness Tool (CAT). Both tools were used to analyze pan flu plans from the White House, the US Department of Health and Human
Services, the Centers for Disease Control, and the 50 states.
Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning
1. LA-UR-09-02971
Approved for public release;
distribution is unlimited.
Title: Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning
Author(s): Linn Marks Collins, Jorge H. Roman, James E. Powell, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake (Los Alamos National Laboratory); Geoffrey Hoare, Rhonda White (Florida Department of Health)
Intended for: 6th International Conference on Information Systems for Crisis Response and Management, Special Session on Solutions for Information Overload, May 10-13, 2009, Göteborg, Sweden
Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC
for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance
of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the
published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests
that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National
Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not
endorse the viewpoint of a publication or guarantee its technical correctness.
2. Using Text Analysis to Reduce Information Overload
in Pandemic Influenza Planning
James E. Powell, Presenting Author
Los Alamos National Laboratory
Linn Marks Collins, Jorge H. Roman, Mark L. B. Martinez, Ketan K. Mane,
Xiang Yao, A. Shelly Spearing, Miriam E. Blake
Los Alamos National Laboratory
Geoffrey Hoare, Rhonda White
Florida Department of Health
Proceedings of the 6th International ISCRAM Conference – Gothenburg, Sweden, May 2009 J. Landgren and S. Jul, eds.
3. Problem – Information Overload
The proliferation of plans can result in debilitating information overload in public health and medical emergencies
Pandemic influenza (pan flu) example
• The US Department of Health and Human Services (HHS) and its Centers for Disease Control and Prevention (CDC) have pan flu plans intended to coordinate the 50 states’ responses
• Each of the 50 states has its own pan flu plan
• These plans and related implementing materials are often hundreds of pages long and get updated frequently
• Plans need to be analyzed, compared, and revised so that they are in alignment with one another
• Human analysis of plans is time-consuming and difficult
4. Solution – Text Analysis and Document Linking
Text analysis software tools are needed that can help humans
1. Compare plans to find gaps or discrepancies by:
• Extracting key concepts or themes from each plan
• Identifying unique and common themes
• Displaying the themes in a table and in a network visualization to facilitate comparison
2. Locate relevant sections of plans by:
• Conducting a federated search of indexed plans
• Displaying links to relevant plans
• Executing both tasks while users are writing sections of a plan, based on what they are writing
5. Research in Progress at the Los Alamos National Laboratory, Los Alamos, NM, US
Two text analysis tools are being developed as part of E-SOS: Emergency Situation Overview and Synthesis
• Project began in 2007
• Builds on several years of prior work by team members
E-SOS employs a number of tools
• Collaborative workspaces
• Semantic web technologies
• Digital library technologies
• Awareness tools
Two of the awareness tools are being used to analyze pan flu plans
• THEMAT (Theme Awareness Tool)
• CAT (Content Awareness Tool)
6. E-SOS – System Goals
Collaborative workspaces where users can report and discuss information
“Awareness tools” which display information that’s relevant to what users are currently reporting and discussing
Technologies that synthesize information from heterogeneous sources
7. Texts – Pan Flu Plans
The White House Implementation Plan for the National Strategy for Pandemic Influenza
US Health and Human Services Federal Guidance to Assist States in Improving State-Level Pandemic Influenza Operating Plans
The Centers for Disease Control and Prevention (CDC) Influenza Pandemic Operation Plan (OPLAN)
Pan flu plans for the states
• In the initial study (December 2008 – February 2009) the Florida Department of Health team provided the LANL team with the above pan flu plans as well as pan flu plans for six states in the US
• In the subsequent study (March 2009 – present) the LANL team expanded the project to include pan flu plans for all 50 states in the US, in keeping with its mission as a national laboratory
8. THEMAT – Methodology
Extract themes, called knowledge signatures (kSigs), from each document
Create sets of kSigs (taxonomies) for each document or set of documents
Compare taxonomies to identify unique and common themes
Compute the network analytic relationships among themes
Generate theme network (tNet) visualizations for each document or set of documents
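The first three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the THEMAT implementation: the slides do not describe how kSigs are computed, so the sketch substitutes a simple term-frequency heuristic. The function names (`extract_themes`, `compare_taxonomies`), the stop-word list, and the two sample texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "for", "is", "are", "with", "will"}

def extract_themes(text, top_n=5):
    """Return the top_n most frequent non-stop-words as a stand-in for kSigs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return {word for word, _ in counts.most_common(top_n)}

def compare_taxonomies(taxonomies):
    """Split themes into those common to every plan and those unique to one plan."""
    common = set.intersection(*taxonomies.values())
    unique = {}
    for name, themes in taxonomies.items():
        # Themes found in any other plan are not unique to this one.
        others = set().union(*(t for n, t in taxonomies.items() if n != name))
        unique[name] = themes - others
    return common, unique

# Invented sample texts, not excerpts from any real plan.
plans = {
    "federal": "Vaccine distribution and surveillance across countries; vaccine authorities.",
    "state": "Vaccine distribution and surveillance in county hospitals; vaccine clinics.",
}
taxonomies = {name: extract_themes(text, top_n=10) for name, text in plans.items()}
common, unique = compare_taxonomies(taxonomies)
print("common:", sorted(common))
print("unique:", {name: sorted(themes) for name, themes in unique.items()})
```

Printing the common and unique sets side by side mirrors the column display described in the slides; any real theme extractor could be swapped in for `extract_themes` without changing the comparison step.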
9. THEMAT – Results
A set of knowledge signatures (kSigs) was generated for each document
The common themes in pan flu plans were identified
The unique themes in pan flu plans were identified
The common and unique themes were displayed side-by-side in columns to facilitate comparison
The kSigs were highlighted in all documents
Theme networks (tNets) were created for each plan to display key concepts and their relationships
10. Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, Centers for Disease Control, and six states showing similarities and differences: for example, “authorities” and “countries” are important in the federal plans but not the state plans
11. Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, and Centers for Disease Control showing similarities and differences: for example, “global influenza preparedness” is important in the WH and HHS plans but not in the CDC plan
12. Interactive interface for finding the specific files where a knowledge signature (kSig) is located: in this case, “global influenza preparedness” can be found in Appendix C of the Health and Human Services pan flu plan and in Chapter 5 of the US White House pan flu plan
13. Interactive interface for locating highlighted knowledge signatures (kSigs) within a document: in this case, “global influenza preparedness” in the US White House pan flu plan
Navigation column with links to all of the pages and kSigs in a document
14. Interactive interface visualizing the theme network (tNet) for the US Centers for Disease Control pan flu plan, showing which themes are related to each other and the importance of each theme (most important to least important = red, blue, green, yellow)
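One way to picture what a theme network encodes is a co-occurrence graph. The slides do not say how THEMAT computes relationships or importance, so the sketch below makes two assumptions: themes are related when they appear in the same section of a plan, and a theme's importance is the number of other themes it connects to. The function names, theme list, and sample sections are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_theme_network(sections, themes):
    """Link two themes whenever they appear in the same section of a plan."""
    edges = defaultdict(int)
    for section in sections:
        text = section.lower()
        present = sorted(t for t in themes if t in text)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # edge weight = number of sections shared
    return dict(edges)

def rank_themes(edges):
    """Rank themes by degree (number of distinct neighbours), highest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(neighbours, key=lambda t: (-len(neighbours[t]), t))

# Invented sections, not excerpts from the CDC plan.
sections = [
    "Surveillance data guide vaccine allocation.",
    "Vaccine distribution depends on antiviral stockpiles.",
    "Surveillance and antiviral use are reported weekly.",
]
themes = ["surveillance", "vaccine", "antiviral", "distribution"]
edges = build_theme_network(sections, themes)
ranking = rank_themes(edges)
print(ranking)
```

A display layer could then map rank bands to colors, in the spirit of the red, blue, green, yellow scheme in the caption.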
15. Visualization of the relative number of themes in the plans and the degree of similarity among and between them: in this case, the CDC plan and the California plan are least similar to each other and appear on the X-axis; the Arizona plan is a few degrees closer to the California plan; the White House Plan is a few degrees closer to the CDC plan; and the White House plan contains more of the core themes than the other plans do
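The "degree of similarity" between plans can be illustrated with a set-overlap measure. The slides do not name the measure behind the visualization, so the sketch below assumes Jaccard similarity over theme sets. The plan names and theme sets are invented for illustration and are not results from the study.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity: shared themes divided by all themes in either set."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)

def least_similar_pair(theme_sets):
    """Return the pair of plan names with the lowest Jaccard similarity."""
    pairs = combinations(sorted(theme_sets), 2)
    return min(pairs, key=lambda p: jaccard(theme_sets[p[0]], theme_sets[p[1]]))

# Invented theme sets for three hypothetical plans (not data from the study).
theme_sets = {
    "plan_a": {"vaccine", "surveillance", "countries", "authorities"},
    "plan_b": {"vaccine", "surveillance", "hospitals"},
    "plan_c": {"schools", "hospitals", "vaccine"},
}
for x, y in combinations(sorted(theme_sets), 2):
    print(x, y, round(jaccard(theme_sets[x], theme_sets[y]), 2))
print("least similar:", least_similar_pair(theme_sets))
```

In a layout like the one described in the caption, the least similar pair would anchor the axis and the remaining plans would be placed according to their similarity to each anchor.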
16. THEMAT Analysis of Pan Flu Plans – Preliminary Conclusions
Displays of common and unique themes reveal similarities and differences in the plans
• Some of the differences reflect differences in the authorities and responsibilities of the government agencies that created the plans
• Some of the differences reflect inconsistencies in terminology, which is a potential human factors problem
Displays of knowledge signatures (kSigs) highlighted in the text reflect the original document structure
• A common structure for all of the documents and plans would make it easier for users to read through the highlighted kSigs and compare plans
Theme networks (tNets) provide a more compact view of themes than lists
• Visualizations are not as familiar as lists, so some users may find them difficult to understand
17. CAT – Methodology
CAT Components
Step 1: Create a repository of content at LANL
Step 2: Configure the CAT to include this repository as one of the targets of the federated search
Step 3: See what links to relevant content the CAT retrieves as users compose text about pan flu plans
Step 4: Revise the targets as needed
18. CAT – Results
A repository of content was created at LANL
The CAT was configured to include this repository as one of the targets of the federated search
The number of links returned depends on the number of targets
• One link per target is returned for each segment of parsed text (aka REST query)
• In order to increase the number of links, the number of targets has to be increased
The repository of pan flu documents was subsequently divided into a number of smaller targets in order to increase the number of links
This allows for finer-grained access
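The relationship between targets and links described above can be sketched as follows. This is not the CAT implementation: the slides do not show its API, so the `Target` class, the keyword-overlap scoring, and the sample documents and URLs are hypothetical. The sketch only illustrates why splitting one repository into smaller targets yields more links when each target returns at most one link per text segment.

```python
from dataclasses import dataclass

@dataclass
class Target:
    """A searchable collection; documents maps a (hypothetical) URL to its text."""
    name: str
    documents: dict

    def best_link(self, query):
        """Return the URL of the document sharing the most words with the query, or None."""
        terms = set(query.lower().split())
        best_url, best_score = None, 0
        for url, text in self.documents.items():
            score = len(terms & set(text.lower().split()))
            if score > best_score:
                best_url, best_score = url, score
        return best_url

def federated_search(segment, targets):
    """Query every target with one text segment; keep at most one link per target."""
    links = []
    for target in targets:
        url = target.best_link(segment)
        if url is not None:
            links.append((target.name, url))
    return links

# Invented documents and URLs, for illustration only.
docs = {
    "https://example.org/plans/state-a": "vaccine distribution clinics",
    "https://example.org/plans/state-b": "vaccine priority groups",
    "https://example.org/plans/federal": "vaccine stockpile distribution",
}
one_target = [Target("all_plans", docs)]
split_targets = [Target(url.rsplit("/", 1)[-1], {url: text}) for url, text in docs.items()]

segment = "vaccine distribution"
print(len(federated_search(segment, one_target)))     # a single target yields a single link
print(len(federated_search(segment, split_targets)))  # one link per matching target
```

With the repository held as one target, only the best-matching document is linked; after the split, every matching document can surface, which is the finer-grained access the slide refers to.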
19. CAT – Screen Shot
Results of a federated search of (1) one local repository of pandemic flu plans and (2) four websites with pandemic flu content
20. CAT Linking of Pan Flu Plans – Preliminary Conclusions
The resulting output, showing links to relevant texts, reflects the target structure
• Creating a collection of smaller, finer-grained targets optimizes the federated search capabilities of the CAT
Under these conditions, the CAT facilitates:
• Locating content that is related to the topic the user is writing about
• Following links to that content
• Creating a list of references for the topic
21. Future Work
Use another E-SOS tool, the Web Awareness Tool (WEBAT), to collect more pan flu content from the web
• Focus on adding pan flu plans from other countries
Use the THEMAT to analyze the additional content
Use the CAT to make the additional content a target of a federated search
Bridge the two tools in order to better support planning
Share the resulting text analysis with the appropriate federal agencies in the US
Share relevant portions of the text analysis with the appropriate state agencies in the US and request that the person(s) responsible for writing pan flu plans complete a questionnaire providing feedback