LA-UR-09-02971
Approved for public release;
distribution is unlimited.




                                           Title:     Using Text Analysis to Reduce Information Overload in
                                                      Pandemic Influenza Planning




                                     Author(s):       Linn Marks Collins, Jorge H. Roman, James E. Powell, Mark
                                                      L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly
                                                      Spearing, Miriam E. Blake
                                                      Los Alamos National Laboratory

                                                      Geoffrey Hoare, Rhonda White
                                                      Florida Department of Health


                                 Intended for:        6th International Conference on Information Systems for
                                                      Crisis Response and Management
                                                      Special Session on Solutions for Information Overload
                                                      May 10-13, 2009
                                                      Göteborg, Sweden




Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC
for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance
of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the
published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests
that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National
Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not
endorse the viewpoint of a publication or guarantee its technical correctness.


Form 836 (7/06)
Using Text Analysis to Reduce Information Overload
                                in Pandemic Influenza Planning
                                     James E. Powell, Presenting Author
                                       Los Alamos National Laboratory
            Linn Marks Collins, Jorge H. Roman, Mark L. B. Martinez, Ketan K. Mane,
                        Xiang Yao, A. Shelly Spearing, Miriam E. Blake
                                 Los Alamos National Laboratory
                                 Geoffrey Hoare, Rhonda White
                                  Florida Department of Health
      The proliferation of plans can result in debilitating information overload in public health and medical
      emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and
      its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states, and
      each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they
      are in alignment with one another. Human analysis of plans is time-consuming and difficult, so text analysis
      software tools are needed that can help humans (a) compare plans to find gaps or discrepancies and (b)
      locate relevant sections of plans and display links to them. This research-in-progress describes two text
      analysis tools being developed at the Los Alamos National Laboratory as part of E-SOS (Emergency Situation
      Overview and Synthesis): the Theme Awareness Tool (THEMAT) and the Content Awareness Tool (CAT). Both
      tools were used to analyze pan flu plans from the White House, the US Department of Health and Human
      Services, the Centers for Disease Control and Prevention, and the 50 states.




                                                                                                                   1
Proceedings of the 6th International ISCRAM Conference – Gothenburg, Sweden, May 2009 J. Landgren and S. Jul, eds. 
2


Problem – Information Overload
   The proliferation of plans can result in debilitating information
   overload in public health and medical emergencies
   Pandemic influenza (pan flu) example
   •   The US Department of Health and Human Services (HHS) and its Centers for
       Disease Control and Prevention (CDC) have pan flu plans intended to
       coordinate the 50 states’ responses
   •   Each of the 50 states has its own pan flu plan
   •   These plans and related implementing materials are often hundreds of pages
       long and get updated frequently
   •   Plans need to be analyzed, compared, and revised so that they are in
       alignment with one another
   •   Human analysis of plans is time-consuming and difficult
3


Solution – Text Analysis and Document Linking
      Text analysis software tools are needed that can help humans
 1.   Compare plans to find gaps or discrepancies by:
      •   Extracting key concepts or themes from each plan
      •   Identifying unique and common themes
      •   Displaying the themes in a table and in a network visualization to facilitate
          comparison
 2.   Locate relevant sections of plans by:
      •   Conducting a federated search of indexed plans
      •   Displaying links to relevant plans
      •   Executing both tasks while users are writing sections of a plan, based on what
          they are writing
4
Research in Progress at the
Los Alamos National Laboratory, Los Alamos, NM, US

   Two text analysis tools are being developed as part of E-SOS:
   Emergency Situation Overview and Synthesis
   •   Project began in 2007
   •   Builds on several years of prior work by team members
   E-SOS employs a number of tools
   •   Collaborative workspaces
   •   Semantic web technologies
   •   Digital library technologies
   •   Awareness tools
   Two of the awareness tools are being used to analyze pan flu plans
   •   THEMAT (Theme Awareness Tool)
   •   CAT (Content Awareness Tool)
5


E-SOS – System Goals

 Collaborative workspaces where users can report and discuss information
 “Awareness tools” which display information that’s relevant to what users are currently reporting and discussing
 Technologies that synthesize information from heterogeneous sources
6


Texts – Pan Flu Plans
   The White House Implementation Plan for the National Strategy for
   Pandemic Influenza
   US Health and Human Services Federal Guidance to Assist States in
   Improving State-Level Pandemic Influenza Operating Plans
   The Centers for Disease Control and Prevention (CDC) Influenza
   Pandemic Operation Plan (OPLAN)
   Pan flu plans for the states
   •   In the initial study (December 2008 – February 2009) the Florida Department of
       Health team provided the LANL team with the above pan flu plans as well as
       pan flu plans for six states in the US
   •   In the subsequent study (March 2009 – present) the LANL team expanded the
       project to include pan flu plans for all 50 states in the US, in keeping with its
       mission as a national laboratory
7


THEMAT – Methodology
   Extract themes, called knowledge signatures (kSigs), from each
   document
   Create sets of kSigs (taxonomies) for each document or set of
   documents
   Compare taxonomies to identify unique and common themes
   Compute the network analytic relationships among themes
   Generate theme network (tNet) visualizations for each document or
   set of documents
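
The comparison step can be illustrated with a minimal sketch. It assumes each plan's taxonomy is simply a set of kSig strings; the slides do not say how THEMAT represents taxonomies internally. The function name `compare_taxonomies` and the sample themes are hypothetical, apart from “authorities” and “countries”, which the deck reports as federal-only themes.

```python
from typing import Dict, Set, Tuple


def compare_taxonomies(
    taxonomies: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (common, unique).

    common: themes that appear in every plan.
    unique: for each plan, themes that appear in no other plan.
    """
    plans = list(taxonomies)
    common = set.intersection(*(taxonomies[p] for p in plans)) if plans else set()
    unique = {}
    for p in plans:
        # Union of the themes of all other plans
        others = set().union(*(taxonomies[q] for q in plans if q != p))
        unique[p] = taxonomies[p] - others
    return common, unique


# Hypothetical kSig sets, invented for illustration only
example = {
    "federal": {"vaccine", "surveillance", "authorities", "countries"},
    "state_a": {"vaccine", "surveillance", "schools"},
    "state_b": {"vaccine", "surveillance", "hospitals"},
}
common, unique = compare_taxonomies(example)
print(sorted(common))
print({plan: sorted(themes) for plan, themes in unique.items()})
```

Plain set operations are enough here because the comparison only asks which themes are shared and which are not; any weighting of themes would need a richer structure.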
8


THEMAT – Results
   A set of knowledge signatures (kSigs) was generated for each
   document
   The common themes in pan flu plans were identified
   The unique themes in pan flu plans were identified
   The common and unique themes were displayed side-by-side in
   columns to facilitate comparison
   The kSigs were highlighted in all documents
   Theme networks (tNets) were created for each plan to display key
   concepts and their relationships
9
Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human
Services, Centers for Disease Control, and six states showing similarities and differences: for
example, “authorities” and “countries” are important in the federal plans but not the state plans
10
Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human
Services, and Centers for Disease Control showing similarities and differences: for example,
“global influenza preparedness” is important in the WH and HHS plans but not in the CDC plan
11
Interactive interface for finding the specific files where a knowledge signature (kSig) is located:
in this case, “global influenza preparedness” can be found in Appendix C of the Health and
Human Services pan flu plan and in Chapter 5 of the US White House pan flu plan
12
Interactive interface for locating highlighted knowledge signatures (kSigs) within a document: in
this case, “global influenza preparedness” in the US White House pan flu plan




Navigation column with links to all of the pages and kSigs in a document
13
Interactive interface visualizing the theme network (tNet) for the US Centers for Disease
Control pan flu plan, showing which themes are related to each other and the importance of
each theme (most important to least important = red, blue, green, yellow)
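
One way to picture a theme network is sketched below. The slides do not describe how THEMAT computes relationships or importance, so this sketch makes two assumptions: two themes are related when they occur in the same section of text, and a theme's importance is its weighted degree in the resulting graph. The function names and sample sections are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def build_theme_network(sections: Iterable[str], themes: List[str]) -> Counter:
    """Count how often each pair of themes co-occurs in the same section."""
    edges: Counter = Counter()
    for text in sections:
        lowered = text.lower()
        # Simple substring match; a real tool would likely match phrases more carefully
        present = sorted(t for t in themes if t in lowered)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges: Counter) -> Counter:
    """Sum the co-occurrence weights on each theme's edges."""
    degree: Counter = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical section texts, invented for illustration only
sections = [
    "Vaccine distribution depends on surveillance data.",
    "Surveillance and antiviral stockpiles.",
    "Vaccine and antiviral priority groups.",
    "Surveillance informs vaccine allocation.",
]
themes = ["vaccine", "surveillance", "antiviral"]
edges = build_theme_network(sections, themes)
degree = weighted_degree(edges)
print(edges.most_common())
print(degree.most_common())
```

Ranking themes by `degree` would give one possible ordering for color-coding nodes from most to least important.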
14
Visualization of the relative number of themes in the plans and the degree of similarity among
and between them: in this case, the CDC plan and the California plan are least similar to
each other and appear on the X-axis; the Arizona plan is a few degrees closer to the California
plan; the White House Plan is a few degrees closer to the CDC plan; and the White House plan
contains more of the core themes than the other plans do
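
The slides do not name the similarity measure behind this visualization. As an illustration only, the sketch below uses the Jaccard index, a common measure of overlap between two sets, applied to hypothetical theme sets. The function names and plan contents are assumptions.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(taxonomies: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of plans."""
    names = sorted(taxonomies)
    return {
        (x, y): jaccard(taxonomies[x], taxonomies[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical theme sets, invented for illustration only
plans = {
    "plan_a": {"vaccine", "surveillance", "quarantine"},
    "plan_b": {"surveillance", "quarantine", "schools"},
    "plan_c": {"hospitals"},
}
similarity = pairwise_similarity(plans)
least_similar = min(similarity, key=similarity.get)
print(similarity)
print(least_similar)
```

The pair with the lowest score plays the role of the two least similar plans; other plans can then be placed by their similarity to each member of that pair.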
15
THEMAT Analysis of Pan Flu Plans –
Preliminary Conclusions
   Displays of common and unique themes reveal similarities and
   differences in the plans
   •   Some of the differences reflect differences in the authorities and responsibilities of
       the government agencies that created the plans
   •   Some of the differences reflect inconsistencies in terminology, which is a potential
       human factors problem
   Displays of knowledge signatures (kSigs) highlighted in the text reflect
   the original document structure
   •   A common structure for all of the documents and plans would make it easier for
       users to read through the highlighted kSigs and compare plans
   Theme networks (tNets) provide a more compact view of themes than
   lists
   •   Visualizations are not as familiar as lists, so some users may find them difficult to
       understand
16


CAT – Methodology
                                   CAT Components
 Step 1: Create a repository of content at LANL
 Step 2: Configure the CAT to include this repository as one of the targets of the federated search
 Step 3: See what links to relevant content the CAT retrieves as users compose text about pan flu plans
 Step 4: Revise the targets as needed
17


CAT – Results
   A repository of content was created at LANL
   The CAT was configured to include this repository as one of the
   targets of the federated search
   The number of links returned depends on the number of targets
   •   One link per target is returned for each segment of parsed text (aka REST
       query)
   •   In order to increase the number of links, the number of targets has to be
       increased
   The repository of pan flu documents was subsequently divided into a
   number of smaller targets in order to increase the number of links
   This allows for finer-grained access
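
The relationship between targets and links can be sketched as follows. This is not the CAT implementation: each target is modeled as a hypothetical function that returns at most one link per query, and the target names and `example.org` URLs are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A target takes one segment of parsed text and returns at most one link
Target = Callable[[str], Optional[str]]


def federated_links(
    segments: List[str], targets: Dict[str, Target]
) -> List[Tuple[str, str, str]]:
    """Query every target once per segment and collect (segment, target, link)."""
    links = []
    for segment in segments:
        for name, search in targets.items():
            link = search(segment)
            if link is not None:
                links.append((segment, name, link))
    return links


def make_target(name: str) -> Target:
    """Build a stand-in target that always returns one hypothetical link."""
    def search(segment: str) -> Optional[str]:
        return f"https://example.org/{name}?q={segment.replace(' ', '+')}"
    return search


segments = ["antiviral distribution", "school closures"]

# One large repository as a single target
one_target = {"all_plans": make_target("all_plans")}
# The same repository divided into smaller targets
split_targets = {n: make_target(n) for n in ("federal_plans", "state_plans_a", "state_plans_b")}

links_before = federated_links(segments, one_target)
links_after = federated_links(segments, split_targets)
print(len(links_before), len(links_after))
```

With one link per target per segment, the number of links is bounded by the number of segments times the number of targets, which is why dividing the repository raises the link count.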
18


CAT – Screen Shot




                    Results of a federated search
                    of (1) one local repository of
                    pandemic flu plans and (2) four
                    websites with pandemic flu content
19


CAT Linking of Pan Flu Plans – Preliminary Conclusions
   The resulting output, showing links to relevant texts, reflects the target
   structure
   •   Creating a collection of smaller, finer-grained targets optimizes the federated
       search capabilities of the CAT
   Under these conditions, the CAT facilitates:
   •   Locating content that is related to the topic the user is writing about
   •   Following links to that content
   •   Creating a list of references for the topic
20


Future Work
   Use another E-SOS tool, the Web Awareness Tool (WEBAT), to
   collect more pan flu content from the web
   •   Focus on adding pan flu plans from other countries
   Use the THEMAT to analyze the additional content
   Use the CAT to make the additional content a target of a federated
   search
   Bridge the two tools in order to better support planning
   Share the resulting text analysis with the appropriate federal
   agencies in the US
   Share relevant portions of the text analysis with the appropriate state
   agencies in the US and request that the person(s) responsible for
   writing pan flu plans complete a questionnaire providing feedback
21


Questions?
22


CAT – Drupal – Screen Shot


Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning

  • 1. LA-UR-09-02971 Approved for public release; distribution is unlimited. Title: Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning Author(s): Linn Marks Collins, Jorge H. Roman, James E. Powell, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake Los Alamos National Laboratory Geoffrey Hoare, Rhonda White Florida Department of Health Intended for: 6th International Conference on Information Systems for Crisis Response and Management Special Session on Solutions for Information Overload May 10-13, 2009 Göteborg, Sweden Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness. Form 836 (7/06)
  • 2. Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning James E. Powell, Presenting Author Los Alamos National Laboratory Linn Marks Collins, Jorge H. Roman, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake Los Alamos National Laboratory Geoffrey Hoare, Rhonda White Florida Department of Health The proliferation of plans can result in debilitating information overload in public health and medical  emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and  its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states and  each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they  are in alignment with one another. Human analysis of plans is time‐consuming and difficult, so text analysis  software tools are needed that can help humans (a) compare plans to find gaps or discrepancies  and (b)  locate relevant sections of plans and display links to them. This research‐in‐progress describes two text  analysis tools being developed at the Los Alamos National Laboratory as part of E‐SOS (Emergency Situation  Overview and Synthesis): the Theme Awareness Tool (THEMAT) and Content Awareness Tool (CAT). Both  tools were used to analyze pan flu plans from the White House, the US Department of Health and Human  Services, the Centers for Disease Control, and the 50 states. 1 Proceedings of the 6th International ISCRAM Conference – Gothenburg, Sweden, May 2009 J. Landgren and S. Jul, eds. 
  • 3. 2 Problem – Information Overload The proliferation of plans can result in debilitating information overload in public health and medical emergencies Pandemic influenza (pan flu) example The US Department of Health and Human Services (HHS) and its Centers for • Disease Control and Prevention (CDC) have pan flu plans intended to coordinate the 50 states’ responses Each of the 50 states has its own pan flu plan • These plans and related implementing materials are often hundreds of pages • long and get updated frequently Plans need to be analyzed, compared, and revised so that they are in • alignment with one another Human analysis of plans is time-consuming and difficult •
  • 4. 3 Solution – Text Analysis and Document Linking Text analysis software tools are needed that can help humans Compare plans to find gaps or discrepancies by: 1. Extracting key concepts or themes from each plan • Identifying unique and common themes • Displaying the themes in a table and in a network visualization to facilitate • comparison Locate relevant sections of plans by: 2. Conducting a federated search of indexed plans • Displaying links to relevant plans • Executing both tasks while users are writing sections of a plan, based on what • they are writing
  • 5. 4 Research in Progress at the Los Alamos National Laboratory, Los Alamos, NM, US Two text analysis tools are being developed as part of E-SOS: Emergency Situation Overview and Synthesis Project began in 2007 • Builds on several years of prior work by team members • E-SOS employs a number of tools Collaborative workspaces • Semantic web technologies • Digital library technologies • Awareness tools • Two of the awareness tools are being used to analyze pan flu plans THEMAT (Theme Awareness Tool) • CAT (Content Awareness Tool) •
  • 6. 5 E-SOS – System Goals Collaborative workspaces where users can report and discuss information “Awareness tools” which display information that’s relevant to what users are currently reporting and discussing Technologies that synthesize information from heterogeneous sources
  • 7. 6 Texts – Pan Flu Plans The White House Implementation Plan for the National Strategy for Pandemic Influenza US Health and Human Services Federal Guidance to Assist States in Improving State-Level Pandemic Influenza Operating Plans The Centers for Disease Control and Prevention (CDC) Influenza Pandemic Operation Plan (OPLAN) Pan flu plans for the states In the initial study (December 2008 – February 2009) the Florida Department of • Health team provided the LANL team with the above pan flu plans as well as pan flu plans for six states in the US In the subsequent study (March 2009 - present) the LANL team expanded the • project to include pan flu plans for all 50 states in the US, in keeping with its mission as a national laboratory
  • 8. 7 THEMAT - Methodology Extract themes, called knowledge signatures (kSigs), from each document Create sets of kSigs (taxonomies) for each document or set of documents Compare taxonomies to identify unique and common themes Compute the network analytic relationships among themes Generate theme network (tNet) visualizations for each document or set of documents
  • 9. 8 THEMAT – Results A set of knowledge signatures (kSigs) was generated for each document The common themes in pan flu plans were identified The unique themes in pan flu plans were identified The common and unique themes were displayed side-by-side in columns to facilitate comparison The kSigs were highlighted in all documents Theme networks (tNets) were created for each plan to display key concepts and their relationships
  • 10. Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human 9 Services, Centers for Disease Control, and six states showing similarities and differences: for example, “authorities” and “countries” are important in the federal plans but not the state plans
  • 11. 10 Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, and Centers for Disease Control showing similarities and differences: for example, “global influenza preparedness” is important in the WH and HHS plans but not in the CDC plan
  • 12. 11 Interactive interface for finding the specific files where a knowledge signature (kSig) is located: in this case, “global influenza preparedness” can be found in Appendix C of the Health and Human Services pan flu plan and in Chapter 5 of the US White House pan flu plan
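The lookup this interface performs, from a kSig to the files that contain it, can be sketched as a case-insensitive substring search. This is a hedged illustration only: the slides do not describe how the interface is built, and the function `locate_ksig`, the file names, and the file contents below are hypothetical.

```python
def locate_ksig(ksig, files):
    """Return the names of the files whose text contains the kSig (case-insensitive)."""
    needle = ksig.lower()
    return sorted(name for name, text in files.items() if needle in text.lower())


# Hypothetical file names and contents, invented for illustration.
files = {
    "plan_a/chapter_5.txt": "Global influenza preparedness depends on international cooperation.",
    "plan_a/chapter_6.txt": "Border screening procedures are described here.",
    "plan_b/appendix_c.txt": "Funding for global influenza preparedness is summarized below.",
}

hits = locate_ksig("global influenza preparedness", files)
print(hits)
```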
  • 13. 12 Interactive interface for locating highlighted knowledge signatures (kSigs) within a document: in this case, “global influenza preparedness” in the US White House pan flu plan. Navigation column with links to all of the pages and kSigs in a document.
  • 14. 13 Interactive interface visualizing the theme network (tNet) for the US Centers for Disease Control pan flu plan, showing which themes are related to each other and the importance of each theme (most important to least important = red, blue, green, yellow)
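One way to obtain a network of related themes with an importance ranking is to link themes that co-occur in the same text segment and rank them by weighted degree. The slides do not state how THEMAT computes relationships or importance, so co-occurrence and degree are assumptions swapped in here for illustration, and the segments and theme names are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_theme_network(segments):
    """Count how often each pair of themes appears in the same segment."""
    edges = defaultdict(int)
    for themes in segments:
        # Sort so each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(themes)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def weighted_degree(edges):
    """Sum the edge weights attached to each theme."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


# Hypothetical segments, each listing the themes found in it.
segments = [
    ["surveillance", "vaccine"],
    ["surveillance", "antivirals", "vaccine"],
    ["surveillance", "communication"],
]

edges = build_theme_network(segments)
degree = weighted_degree(edges)
# Most connected theme first; ties are broken alphabetically.
ranked = sorted(degree, key=lambda theme: (-degree[theme], theme))
print(ranked)
```

A ranking like this could then be binned into the four colour levels the slide describes, although the binning rule is not given in the source.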
  • 15. 14 Visualization of the relative number of themes in the plans and the degree of similarity among and between them: in this case, the CDC plan and the California plan are least similar to each other and appear on the X-axis; the Arizona plan is a few degrees closer to the California plan; the White House Plan is a few degrees closer to the CDC plan; and the White House plan contains more of the core themes than the other plans do
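A degree of similarity between plans can be computed from their theme sets. The slide does not name the measure behind the visualization, so the Jaccard index (shared themes divided by all themes in either plan) is used here purely as an assumed stand-in, with invented plan names and themes.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two theme sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(taxonomies):
    """Jaccard index for every unordered pair of plans."""
    return {
        (x, y): jaccard(taxonomies[x], taxonomies[y])
        for x, y in combinations(sorted(taxonomies), 2)
    }


# Hypothetical theme sets, invented for illustration.
plans = {
    "plan_a": {"surveillance", "vaccine", "antivirals", "communication"},
    "plan_b": {"surveillance", "vaccine", "quarantine"},
    "plan_c": {"school closure", "vaccine"},
}

scores = pairwise_similarity(plans)
least_similar = min(scores, key=scores.get)
print(scores)
print("least similar pair:", least_similar)
```

The least similar pair plays the role that the CDC and California plans play in the slide, where the two most dissimilar plans anchor the axis.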
  • 16. 15 THEMAT Analysis of Pan Flu Plans – Preliminary Conclusions
      • Displays of common and unique themes reveal similarities and differences in the plans
          • Some of the differences reflect differences in the authorities and responsibilities of the government agencies that created the plans
          • Some of the differences reflect inconsistencies in terminology, which is a potential human factors problem
      • Displays of knowledge signatures (kSigs) highlighted in the text reflect the original document structure
          • A common structure for all of the documents and plans would make it easier for users to read through the highlighted kSigs and compare plans
      • Theme networks (tNets) provide a more compact view of themes than lists
          • Visualizations are not as familiar as lists, so some users may find them difficult to understand
  • 17. 16 CAT – Methodology
      • CAT Components
      • Step 1: Create a repository of content at LANL
      • Step 2: Configure the CAT to include this repository as one of the targets of the federated search
      • Step 3: See what links to relevant content the CAT retrieves as users compose text about pan flu plans
      • Step 4: Revise the targets as needed
  • 18. 17 CAT – Results
      • A repository of content was created at LANL
      • The CAT was configured to include this repository as one of the targets of the federated search
      • The number of links returned depends on the number of targets
          • One link per target is returned for each segment of parsed text (aka REST query)
          • In order to increase the number of links, the number of targets has to be increased
      • The repository of pan flu documents was subsequently divided into a number of smaller targets in order to increase the number of links
      • This allows for finer-grained access
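The one-link-per-target behaviour described above explains why splitting the repository yields more links. The sketch below imitates it with in-memory targets rather than REST queries. It is not the CAT's code: `federated_search`, `make_target`, the word-overlap matching rule, and the documents are all assumptions made for illustration.

```python
def make_target(documents):
    """Hypothetical search target: returns the first title sharing a word with the query."""
    def search(query):
        words = set(query.lower().split())
        for title, text in documents.items():
            if words & set(text.lower().split()):
                return title
        return None
    return search


def federated_search(segments, targets):
    """Query every target with every segment, keeping at most one link per target."""
    links = []
    for segment in segments:
        for name, search in targets.items():
            hit = search(segment)
            if hit is not None:
                links.append((segment, name, hit))
    return links


# Hypothetical documents, invented for illustration.
documents = {
    "state_a": "antiviral stockpile guidance",
    "state_b": "stockpile distribution plan",
    "federal": "border screening policy",
}
segments = ["antiviral stockpile"]

# One large target: a single link, even though two documents match.
single = federated_search(segments, {"repository": make_target(documents)})

# The same content split into one target per document: both matches surface.
split_targets = {title: make_target({title: text}) for title, text in documents.items()}
split = federated_search(segments, split_targets)

print(len(single), len(split))
```

With one target, the second matching document is hidden behind the first; with finer-grained targets, the upper bound on links becomes the number of segments times the number of targets.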
  • 19. 18 CAT – Screen Shot Results of a federated search of (1) one local repository of pandemic flu plans and (2) four websites with pandemic flu content
  • 20. 19 CAT Linking of Pan Flu Plans – Preliminary Conclusions
      • The resulting output, showing links to relevant texts, reflects the target structure
          • Creating a collection of smaller, finer-grained targets optimizes the federated search capabilities of the CAT
      • Under these conditions, the CAT facilitates:
          • Locating content that is related to the topic the user is writing about
          • Following links to that content
          • Creating a list of references for the topic
  • 21. 20 Future Work
      • Use another E-SOS tool, the Web Awareness Tool (WEBAT), to collect more pan flu content from the web
          • Focus on adding pan flu plans from other countries
      • Use the THEMAT to analyze the additional content
      • Use the CAT to make the additional content a target of a federated search
      • Bridge the two tools in order to better support planning
      • Share the resulting text analysis with the appropriate federal agencies in the US
      • Share relevant portions of the text analysis with the appropriate state agencies in the US and request that the person(s) responsible for writing pan flu plans complete a questionnaire providing feedback
  • 23. 22 CAT – Drupal – Screen Shot