2. ✤ How do we improve disease surveillance?
✤ Can social media (e.g. Twitter) be effectively
used to monitor disease outbreaks?
3. Tweets: disease reports
✤ Omg.. The never-ending flu+sore throat.. ☹ bleh.. ☹
✤ Stomach flu. Urgh.
✤ i love puking... f@#k you flu
✤ Having a sore throat,sucks.Having flu,sucks even
MORE.DAMMIT!
✤ Feeling dizzy/ feverish ever since that class at the gym!
overexertion or the flu??
4. Tweets: non-disease reports
✤ Study finds H1N1 flu in pregnancy is critical
risk - Reuters - http://bit.ly/bLiLnz
✤ This March Madness turns out to be the flu!
✤ Smiling is infectious, You can catch it like the
flu. Someone smiled at me today, And I
started smiling too.
5. We need Natural Language
Processing (NLP)
✤ We need an NLP engine in order to process
tweets:
✤ Tweet → NLP Engine → It's the flu!
6. Maybe we need NLP + Ontologies
✤ Do we just search for simple keywords?
✤ An ontology can provide us with organized
concepts relevant to a domain (e.g. health,
biomedicine)
✤ How about processing natural language to match
concepts organized in an ontology?
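A minimal sketch of why simple keyword search falls short, using tweets from the earlier slides. The `mentions_flu` helper is hypothetical and written only for illustration; it flags disease reports and non-reports alike.

```python
import re

def mentions_flu(tweet: str) -> bool:
    """Naive keyword filter: true if the whole word 'flu' appears in the tweet."""
    return re.search(r"\bflu\b", tweet, flags=re.IGNORECASE) is not None

tweets = [
    "Stomach flu. Urgh.",                                     # disease report
    "This March Madness turns out to be the flu!",            # not a disease report
    "Smiling is infectious, You can catch it like the flu.",  # not a disease report
]

# Every tweet is flagged, although only the first is a disease report.
flagged = [t for t in tweets if mentions_flu(t)]
```

The keyword alone cannot separate the three tweets, which is the gap that concept matching and, later, classification are meant to close.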
7. Ontologies help answer these
questions
✤ How do we know if a user is referring to a
symptom or a disease?
✤ We seem to need a set of keywords. Where do we get
this set of symptom and disease names?
✤ How do we link references to one or more
symptoms to a specific disease?
8. The UMLS Ontology
✤ A comprehensive thesaurus and ontology of
biomedical concepts
✤ Facilitates development of computer systems that
behave as if they "understand" the meaning of the
language of biomedicine and health.
✤ Integrates 2+ million names for ~900k concepts
from 60+ families of biomedical vocabularies, and
12 million relations among these concepts.
9. UMLS & MetaMap
✤ MetaMap is a tool that, given an arbitrary
piece of text, finds and returns the relevant
concepts available in the UMLS Ontology
✤ MetaMap is a software interface to query
the “Metathesaurus” and the “Semantic
Network”, both components of UMLS
10. Concept mapping with MetaMap
✤ Using MetaMap to query the
Metathesaurus, we can map the following
text strings to the concept "Atrial
Fibrillation":
✤ Atrial fibrillation
✤ AF
✤ AFib
✤ Atrial fibrillation (disorder)
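The idea of mapping many surface strings to one concept can be sketched with a plain lookup table. This is not MetaMap's API: `SYNONYMS` and `map_to_concept` are hypothetical names, and the table is hand-written from the four strings on this slide. The real tool also handles tokenization, variants and scoring.

```python
# Hypothetical, hand-written synonym table standing in for the Metathesaurus.
SYNONYMS = {
    "atrial fibrillation": "Atrial Fibrillation",
    "af": "Atrial Fibrillation",
    "afib": "Atrial Fibrillation",
    "atrial fibrillation (disorder)": "Atrial Fibrillation",
}

def map_to_concept(text: str):
    """Return the concept name for a text string, or None if it is unknown."""
    key = " ".join(text.lower().split())  # normalise case and whitespace
    return SYNONYMS.get(key)

concept = map_to_concept("AFib")
```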
11. ✤ But who actually tweets “atrial
fibrillation” ??
12. “Having a sore throat, sucks.
Having flu, sucks even MORE”
✤ Matches:
✤ SORETHROAT (Sore Throat) [Sign or
Symptom]
✤ Flu (Influenza) [Disease or Syndrome]
✤ Sucking [Physiologic Function]
13. “i love puking... damn you flu”
✤ Matches:
✤ I (Iodides) [Inorganic Chemical]
✤ Love [Mental Process]
✤ Flu (Influenza) [Disease or Syndrome]
14. “Feeling dizzy/ feverish ever since that class at
the gym! overexertion or the flu??”
✤ Matches:
✤ Feeling dizzy [Sign or Symptom]
✤ Feverish (Fever) [Finding]
✤ Overexertion (Exhaustion due to excessive
exertion) [Injury or Poisoning]
✤ Flu (Influenza) [Disease or Syndrome]
15. “Smiling is infectious, u can catch it like the
flu; someone smiled at me today, and I started
smiling too”
✤ Matches:
✤ Smiling [Social Behavior]
✤ Infection [Disease or Syndrome]
✤ Catch (Catch - Finding of sensory dimension of pain)
[Sign or Symptom]
✤ Flu (Influenza) [Disease or Syndrome]
✤ Today [Temporal Concept]
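One way to work with these results is to treat each match as a (concept, semantic type) pair and keep only the health-related types. The sketch below uses the matches listed for the "Smiling is infectious" tweet. The `Match` type, the `HEALTH_TYPES` set and the `health_concepts` helper are assumptions made for illustration, not MetaMap output structures.

```python
from typing import NamedTuple

class Match(NamedTuple):
    concept: str
    semantic_type: str

# Matches copied from the "Smiling is infectious" slide.
smiling_matches = [
    Match("Smiling", "Social Behavior"),
    Match("Infection", "Disease or Syndrome"),
    Match("Catch - Finding of sensory dimension of pain", "Sign or Symptom"),
    Match("Influenza", "Disease or Syndrome"),
    Match("Today", "Temporal Concept"),
]

# Assumed choice of semantic types that count as health-related.
HEALTH_TYPES = {"Sign or Symptom", "Disease or Syndrome", "Finding"}

def health_concepts(matches):
    """Keep only the concepts whose semantic type is health-related."""
    return [m.concept for m in matches if m.semantic_type in HEALTH_TYPES]

remaining = health_concepts(smiling_matches)
```

Even after filtering, this joke tweet still yields three health concepts, so filtering by semantic type alone does not identify a disease report.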
17. Using MetaMap
✤ Free of Charge!
✤ MetaMap Transfer (MMTx) is a Java-based distributable
version of the MetaMap program
✤ Requires 7GB disk space (uncompressed) and at least 1GB
of RAM (2GB recommended)
✤ “MetaMap is not an end user product. Users will need a
moderate amount of programming knowledge to use
MMTx effectively.” - from UMLS website
18. We identified tweets that mention
a concept...SO WHAT?
✤ We can't assume it's a case report!
✤ How do we get around this?
✤ Are we done here?
19. Supervised learning to improve
the results?
✤ What if we use machine learning?
✤ Supervised learning is a machine learning
technique for inferring a function from
labeled training data
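A toy illustration of learning a function from labeled data. The algorithm here is a simple perceptron, chosen only because it is short; it is not the method used in the slides or in Weka. The training set is just the four example tweets whose matches were shown earlier, labeled by the slide they first appeared on, so it is far too small for real use. All names (`TRAINING`, `train_perceptron`, `is_disease_report`) are hypothetical.

```python
from collections import defaultdict

# (matched concepts, is it a disease report?) taken from the example slides.
TRAINING = [
    ({"Sore Throat", "Influenza", "Sucking"}, True),
    ({"Iodides", "Love", "Influenza"}, True),
    ({"Feeling dizzy", "Fever",
      "Exhaustion due to excessive exertion", "Influenza"}, True),
    ({"Smiling", "Infection",
      "Catch - Finding of sensory dimension of pain",
      "Influenza", "Today"}, False),
]

def train_perceptron(examples, epochs=10):
    """Learn one weight per concept plus a bias with the perceptron rule."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for concepts, label in examples:
            predicted = bias + sum(weights[c] for c in concepts) > 0
            if predicted != label:
                # Nudge the bias and the weights of the concepts present
                # toward the correct label.
                step = 1.0 if label else -1.0
                bias += step
                for c in concepts:
                    weights[c] += step
    return dict(weights), bias

def is_disease_report(concepts, weights, bias):
    """Apply the learned function to a set of matched concepts."""
    return bias + sum(weights.get(c, 0.0) for c in concepts) > 0

weights, bias = train_perceptron(TRAINING)
```

The learned function reproduces the four training labels; whether it generalises to unseen tweets depends entirely on having a much larger labeled dataset.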
20. Is it feasible?
✤ Weka is a collection of machine learning algorithms for data
mining tasks.
✤ Algorithms can be applied directly to a dataset or called from
your own Java code.
✤ Input: dataset of concept matches; Output: Classifier Java
Class
✤ This automatically generated Java class can easily be used
to answer whether a tweet matching medical concepts X and Y is
or is not a disease report
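A sketch of how a dataset of concept matches might be written out for Weka. ARFF is Weka's native input format; the attribute layout (one 0/1 attribute per concept plus a yes/no class) and the names `to_arff` and `disease_report` are assumptions, not something the slides specify.

```python
def to_arff(examples, relation="tweets"):
    """Render (concept set, label) pairs as ARFF text, one 0/1 attribute per concept."""
    vocabulary = sorted({c for concepts, _ in examples for c in concepts})
    lines = [f"@relation {relation}", ""]
    for concept in vocabulary:
        # Names containing spaces must be quoted in ARFF.
        lines.append(f"@attribute '{concept}' {{0,1}}")
    lines.append("@attribute disease_report {yes,no}")
    lines += ["", "@data"]
    for concepts, label in examples:
        row = ["1" if c in concepts else "0" for c in vocabulary]
        row.append("yes" if label else "no")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"

examples = [
    ({"Sore Throat", "Influenza"}, True),
    ({"Smiling", "Influenza"}, False),
]
arff = to_arff(examples)
```

Concept names that themselves contain quote characters would need escaping, which this sketch does not handle.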
21. Processing a tweet: overview
✤ Get Tweet
✤ Process tweet using MetaMap
✤ Get matching concepts from MetaMap
✤ Feed the matches to the Classifier Java Class
✤ Get a true or false answer to the question “is it a
disease report?”
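The steps above can be sketched as a small pipeline. Both stages are stubs: `fake_metamap` is a hypothetical phrase lookup standing in for MetaMap, and `fake_classifier` is a hand-written rule standing in for the generated classifier class. Only the shape of the flow is meant to carry over.

```python
from typing import Callable, Set

def process_tweet(
    tweet: str,
    extract_concepts: Callable[[str], Set[str]],
    classify: Callable[[Set[str]], bool],
) -> bool:
    """Run one tweet through concept extraction, then classification."""
    concepts = extract_concepts(tweet)  # process the tweet, get matching concepts
    return classify(concepts)           # feed the matches to the classifier

def fake_metamap(tweet: str) -> Set[str]:
    """Stub for MetaMap: substring lookup against a tiny hand-written table."""
    known = {"flu": "Influenza", "sore throat": "Sore Throat", "smiling": "Smiling"}
    text = tweet.lower()
    return {concept for phrase, concept in known.items() if phrase in text}

def fake_classifier(concepts: Set[str]) -> bool:
    """Stub for the trained classifier: a fixed rule, not a learned one."""
    return "Influenza" in concepts and "Smiling" not in concepts

report = process_tweet(
    "Having a sore throat, sucks. Having flu, sucks even MORE",
    fake_metamap,
    fake_classifier,
)
```

Passing the two stages in as functions keeps the flow independent of how MetaMap or the classifier is actually invoked.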