Text mining for health knowledge discovery
Suzan Verberne | Emerging technologies in medicine (ETIM) 2022 | June 2022
Credits to Anne Dirkson
Collaboration with Hans Gelderblom, Gerard van Oortmerssen and Wessel Kraaij
Patients have experiential knowledge
Suzan Verberne 2022
Patient discussion forums are a gold mine
- Emotional support
- Patient journey
- Medical information, treatment
- Therapy adherence
- Discover side effects (adverse drug events)
- Discover coping strategies for side effects
- Quality of life, psycho-social problems
Real-world evidence from patient forums can complement medical perspectives
Our case: Gastrointestinal Stromal Tumor (GIST)
GIST forum data
- GIST Support International on Facebook
- The forum data was donated to us by the organization
- Data from 24 Oct 2009 until 1 Nov 2020
- 124,103 posts in 14,631 threads
- Median post length: 20 words
Post-market drug monitoring
Monitoring of side effects suffers from under-reporting
Real-world evidence from patients is necessary
It can be found in patient forum discussions
Medical entity extraction
- Entity extraction is a machine learning task based on sequence labelling
- Word order matters
- One entity can span multiple words
- There are multiple ways to refer to the same concept
- The extracted entities need to be linked to a standard form (in an ontology)
Example: “I have a throbbing pain in my head” and “my head is bursting” both refer to the concept headache
Arjun Magge, Elena Tutubalina, Zulfat Miftahutdinov, Ilseyar Alimova, Anne Dirkson, Suzan Verberne, Davy Weissenbacher, Graciela Gonzalez-Hernandez, DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter, Journal of the American Medical Informatics Association (JAMIA), Volume 28, Issue 10, October 2021, pp 2184–2192
Entity extraction
- Label spans (word sequences) in the text that refer to entities
- Store the data with a label for each word
- Label ‘B’: this word is the beginning of an entity
- Label ‘I’: this word is inside an entity
- Label ‘O’: this word is not in an entity
Example: “Pickle juice reduces my muscle cramps”
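A minimal Python sketch of the B/I/O scheme for the example sentence, assuming whitespace tokenisation and (start, end) token-index spans. The annotation used here, with “muscle cramps” as the only entity, is a hypothetical choice for illustration, and the helper names are illustrative rather than taken from the source.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (first token index, index one past the last token)


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Give every token exactly one label: B (begin), I (inside) or O (outside)."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"          # first word of the entity
        for i in range(start + 1, end):
            labels[i] = "I"          # remaining words of the same entity
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Recover entity spans from a label sequence (e.g. a model's predictions)."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            # A new entity starts here; a stray 'I' without a preceding
            # 'B' is leniently treated as a beginning.
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))  # the open entity ends before this word
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


tokens = "Pickle juice reduces my muscle cramps".split()
labels = spans_to_bio(tokens, [(4, 6)])  # hypothetical annotation: "muscle cramps"
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Because every word gets exactly one label, a multi-word entity is stored as one `B` followed by `I`s, and the original spans can be read back from the label sequence.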
Entity extraction
- Train a classifier to predict the labels while taking the context into account (‘sequence labelling’)
- Hidden Markov Models, Conditional Random Fields
- Recurrent Neural Networks, Bi-LSTMs
- Bidirectional Encoder Representations from Transformers (BERT)
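To illustrate what “taking the context into account” means, here is a toy Viterbi decoder for a Hidden Markov Model over B/I/O labels, the first model family listed above. Every probability, the word classes and the small word lists are invented for this sketch. A real tagger would estimate its parameters from annotated data, and the work described here uses neural models such as BERT rather than an HMM.

```python
import math
from typing import Dict, List

STATES = ["O", "B", "I"]

# Hypothetical parameters, set by hand for illustration only.
START = {"O": 0.8, "B": 0.2, "I": 0.0}   # a sentence cannot start inside an entity
TRANS = {
    "O": {"O": 0.8, "B": 0.2, "I": 0.0},   # 'I' may not follow 'O'
    "B": {"O": 0.3, "B": 0.1, "I": 0.6},
    "I": {"O": 0.5, "B": 0.1, "I": 0.4},
}
# Emissions are defined over coarse word classes rather than raw words.
EMIT = {
    "O": {"symptom": 0.05, "body": 0.15, "other": 0.80},
    "B": {"symptom": 0.50, "body": 0.45, "other": 0.05},
    "I": {"symptom": 0.70, "body": 0.10, "other": 0.20},
}
SYMPTOM_WORDS = {"cramps", "pain", "nausea", "headache", "insomnia"}
BODY_WORDS = {"muscle", "head", "joint", "stomach"}


def word_class(word: str) -> str:
    w = word.lower()
    if w in SYMPTOM_WORDS:
        return "symptom"
    if w in BODY_WORDS:
        return "body"
    return "other"


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens: List[str]) -> List[str]:
    """Return the most probable B/I/O sequence under the toy HMM."""
    if not tokens:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at position t
    first = word_class(tokens[0])
    best: List[Dict[str, float]] = [
        {s: safe_log(START[s]) + safe_log(EMIT[s][first]) for s in STATES}
    ]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(tokens)):
        cls = word_class(tokens[t])
        best.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for reaching s at position t
            prev = max(STATES, key=lambda p: best[t - 1][p] + safe_log(TRANS[p][s]))
            best[t][s] = best[t - 1][prev] + safe_log(TRANS[prev][s]) + safe_log(EMIT[s][cls])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(tokens) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]


print(viterbi("Pickle juice reduces my muscle cramps".split()))
print(viterbi("my muscle".split()))
```

With these made-up numbers, “muscle” is tagged `O` in “my muscle” but `B` in “... my muscle cramps”: the following symptom word changes the best path, which is exactly the effect of decoding the whole sequence instead of labelling each word in isolation.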
Example: “Since I started on Gleevec, I can’t fall asleep at all.” → drug: Imatinib; side effect: Insomnia
Pipeline: trained BERT models for entity extraction; BIOSYN for linking to SNOMED
Data: 125,161 forum messages; 4,195 messages (527 discussions) manually annotated
Anne Dirkson, Suzan Verberne, Wessel Kraaij, Gerard van Oortmerssen, Hans Gelderblom (2022). Automated gathering of real-world evidence from online patient forums can complement pharmacovigilance for rare cancers. To appear in Nature Scientific Reports.
Evaluation
- Entity extraction: Recall 74%, Precision 70%, F1 0.72 (human pairwise F1 0.80)
- Entity-ontology linking (SNOMED): Accuracy@1 65%, Accuracy@5 79%
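Both metric families on this slide follow directly from their definitions. The Python sketch below uses hypothetical function names and toy linking output. F1 is the harmonic mean of precision and recall, so the reported precision (70%) and recall (74%) give F1 ≈ 0.72. Accuracy@k counts a mention as correct when the gold SNOMED concept appears among the top k ranked candidates, which is why Accuracy@5 can never be lower than Accuracy@1.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy_at_k(gold: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Share of mentions whose gold concept is among the first k ranked candidates."""
    if not gold:
        raise ValueError("no mentions to evaluate")
    hits = sum(1 for concept, candidates in zip(gold, ranked) if concept in candidates[:k])
    return hits / len(gold)


# Precision and recall for entity extraction, as reported on the slide
print(round(f1_score(0.70, 0.74), 2))

# Hypothetical linking output: one ranked candidate list per mention
gold = ["Insomnia", "Headache", "Nausea"]
ranked = [["Insomnia", "Fatigue"], ["Pain", "Headache"], ["Vomiting", "Dizziness"]]
print(accuracy_at_k(gold, ranked, 1), accuracy_at_k(gold, ranked, 2))
```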
Model applied to the whole GIST forum
Treatment type            Drug         # of ADEs found
First-line                Imatinib     13,376
Second-line               Sunitinib    2,335
Third-line                Regorafenib  319
Fourth-line               Ripretinib   319
PDGFRA exon 18 mutations  Avapritinib  297
Off-label                 Nilotinib    59
Off-label                 Pazopanib    51
Off-label                 Sorafenib    47
Off-label                 Ponatinib    17
Unknown                                2,948
Total                                  21,051
Comparison with survey data for imatinib
Rank in survey                              Rank in forum
1. Fatigue                                  1. Fatigue
2. Muscle aches, pains or cramps            2. Nausea
3. Swelling of face or around the eyes      3. Cramp
4. Problems remembering things              4. Disorder of skin
4. Aches or pains in joints                 5. Oedema
6. Skin problems                            6. Pain
6. Diarrhoea                                7. Alopecia
8. Feeling weak                             8. Altered bowel function
9. Indigestion or heartburn                 9. Pain in limb
10. Swelling in any part of body (Oedema)   10. Facial swelling
Dide den Hollander, Anne R. Dirkson, Suzan Verberne, Wessel Kraaij,…, Olga Husson (2022). Symptoms reported by gastrointestinal stromal tumour (GIST) patients on imatinib treatment: combining questionnaire and forum data. In Supportive Care in Cancer.
Coping with side effects
No systematic collection
Patients share experiences with each other
Doctors can figure out why it works and whether it is safe
Extracting coping strategies
New task. Challenges:
1. No data
2. No ontology
3. Complex and varied entities
4. Cross-document relations to ADE
Anne Dirkson, Suzan Verberne, Gerard van Oortmerssen, Hans Gelderblom, Wessel Kraaij. How do others cope? Extracting coping strategies for adverse drug events from social media. In revision for Journal of Biomedical Informatics (JBI).
Ontology linking
- Build a basic ontology
- Manual annotation (link text spans to ontology)
- Update ontology based on gaps found
- Discuss discrepancies and come to agreement
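The speaker notes report a substantial token-level inter-annotator agreement (mean Cohen's kappa 0.706) for this manual annotation. Below is a minimal sketch of the textbook two-rater Cohen's kappa over token labels. The label sequences are invented, and the exact averaging used in the study is not specified here, so this is only an illustration of the statistic.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same tokens."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement, from each annotator's own label distribution
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label everywhere
    return (observed - expected) / (1 - expected)


# Hypothetical token labels from two annotators for the same six tokens
annotator_1 = ["B", "I", "O", "O", "B", "O"]
annotator_2 = ["B", "O", "O", "O", "B", "O"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement (here 5 of 6 tokens) for the agreement expected by chance, which matters for token labels because most tokens are `O` for both annotators.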
Hierarchical evaluation
Hierarchy level     F1      Recall  Precision
Baseline            0.220   31%     17%
One level higher    0.498   35%     86%
Top category        0.556   39%     95%
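The slide does not spell out how the hierarchical scores are computed. One plausible reading, sketched here under that assumption, maps both predicted and gold concepts to an ancestor in the ontology before matching, so a prediction that is wrong at the most specific level can still count as correct one level up or at the top category. The hierarchy, mention ids and labels are hypothetical, loosely inspired by the nausea examples in the speaker notes.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy hierarchy (child -> parent); concepts absent as keys are roots.
PARENT: Dict[str, str] = {
    "ginger tea": "herbal tea",
    "peppermint tea": "herbal tea",
    "herbal tea": "food and drink",
    "ondansetron": "anti-nausea medication",
    "anti-nausea medication": "medication",
    "split dosage": "dosage adjustment",
    "dosage adjustment": "medication",
}

Annotation = Set[Tuple[int, str]]  # (mention id, ontology concept)


def one_level_up(concept: str) -> str:
    return PARENT.get(concept, concept)  # roots stay where they are


def top_category(concept: str) -> str:
    while concept in PARENT:
        concept = PARENT[concept]
    return concept


def lift(ann: Annotation, to_level: Callable[[str], str]) -> Annotation:
    """Replace every concept by its ancestor at the chosen level."""
    return {(mention, to_level(concept)) for mention, concept in ann}


def precision_recall_f1(pred: Annotation, gold: Annotation) -> Tuple[float, float, float]:
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = {(1, "ginger tea"), (2, "ondansetron"), (3, "split dosage")}
pred = {(1, "peppermint tea"), (2, "ondansetron"), (3, "ondansetron"), (4, "ginger tea")}

levels = {"exact": lambda c: c, "one level higher": one_level_up, "top category": top_category}
for name, to_level in levels.items():
    p, r, f = precision_recall_f1(lift(pred, to_level), lift(gold, to_level))
    print(f"{name:<17} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

With this toy data, scores rise as matching becomes coarser, in the same direction as the table; the numbers themselves carry no meaning. A real ontology with uneven depths would need a more careful definition of “one level higher”.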
Results: coping with nausea
Conclusions
- Patient forums are valuable resources for experiential knowledge
- Extraction of side effects from patient forums leads to new information compared to clinical trials and survey data
- Extraction of coping strategies is challenging but can also lead to new insights and medical follow-up research
https://dashboard-gist-adr.herokuapp.com
https://github.com/AnneDirkson
https://twitter.com/suzan

Speaker notes

  1. Motivational example: pure chocolate
  2. We use heuristic rules to determine which drug is linked to each ADE. Whenever possible, we select the drug mentioned prior to the ADE in the message. If there is none, we select the drug mentioned after the ADE in the message. If there are no drugs mentioned in the message, we select the first drug mentioned in the conversational thread prior to the message. These rules were determined based on further manual annotation of our annotated subset by the first author. In some cases, it was not clear which drug the patient believed was causing the ADE; these cases were excluded. In this data set, our rules were 93% accurate if we restricted the possible drugs for linking to a predetermined list. BIOSYN: our data does not contain annotations for normalization (i.e., the relevant concept IDs for each ADE mention). We rely on three publicly available data sets for training our normalization model: CADEC [Karimi et al., 2015], PsyTAR [Zolnoori et al., 2019] and the Clinical Findings subset of the COMETA corpus [Basaldella et al., 2020] (acc@1: 0.645; acc@5: 0.791).
  3. 65%: accurate labelling of ADEs when tested on a different data set than the ones on which the model was trained. 79%: a further 14% of mentions have the correct label in the top 5. Manual inspection: for 53 of 100 randomly selected cases, we would consider the first label to be correct or even better than the gold label. E.g., the mention “severe abdominal pain” has the gold label “painful” (i.e., the label given by humans, which was in the top 5), while the top-1 label is “abdominal pain”. Of 100 random cases in which the correct label was not included in the top 5, we would consider the top-1 predicted label correct in 36. Thus, the performance in the table is an underestimation.
  4. Our model can find novel entities: 40.2% of the true positives were not present in the training data.
  5. Forum data contain new side effects; they can help keep questionnaires up to date.
  6. Not only side effects can be gathered from forums. We want to go one step further and also look at the advice that is given for dealing with these side effects. This is unique to forums and not routinely gathered. This would be a new task.
  7. We chose SNOMED CT, NCIT and RxNorm as our source ontologies, in line with the OHDSI project. The inter-annotator agreement was substantial (mean Kappa = 0.706) on the token level.
  8. Strawberry – raspberry – milk example
  9. Manual examination of the underlying messages reveals that eating and drinking different forms of ginger is recommended, as well as drinking herbal tea (both ginger and peppermint). Patients also recommend taking the anti-nausea medication ondansetron and splitting the dosage (“split dosage”). The other categories, which relate to how you consume medication (e.g., “half to one hour before food”), do relate to this broader topic, but the specific labels are incorrect. Amongst others, patients recommend avoiding taking medication on an empty stomach and taking it after dinner or just before bed.
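The drug-linking heuristics described in note 2 can be sketched as a small function. This is an illustration under explicit assumptions: drug mentions are found by exact lookup in a tiny hypothetical brand-to-generic list, “the drug mentioned prior to (or after) the ADE” is read as the nearest such mention, and “the first drug mentioned in the conversational thread” is read as the earliest mention in the preceding messages. The names `DRUG_LIST`, `find_drugs` and `link_drug` are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

Mention = Tuple[int, str]  # (character offset in the message, generic drug name)

# Hypothetical, deliberately tiny "predetermined list": surface form -> generic name
DRUG_LIST = {
    "gleevec": "imatinib",
    "imatinib": "imatinib",
    "sutent": "sunitinib",
    "sunitinib": "sunitinib",
}


def find_drugs(text: str) -> List[Mention]:
    """Find listed drug names in a message, in order of appearance."""
    return [
        (m.start(), DRUG_LIST[m.group(0).lower()])
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group(0).lower() in DRUG_LIST
    ]


def link_drug(
    ade_offset: int,
    drugs_in_message: Sequence[Mention],
    earlier_messages: Sequence[Sequence[str]],
) -> Optional[str]:
    """Attribute an ADE mention to a drug, following the three rules in note 2.

    earlier_messages holds, in thread order, the drugs found in each message
    that precedes the current one in the conversation.
    """
    before = [m for m in drugs_in_message if m[0] < ade_offset]
    if before:
        # Rule 1: a drug mentioned before the ADE (assumed: the nearest one)
        return max(before, key=lambda m: m[0])[1]
    after = [m for m in drugs_in_message if m[0] > ade_offset]
    if after:
        # Rule 2: otherwise a drug mentioned after the ADE (assumed: the nearest one)
        return min(after, key=lambda m: m[0])[1]
    # Rule 3: otherwise the first drug mentioned earlier in the thread
    for drugs in earlier_messages:
        if drugs:
            return drugs[0]
    return None  # no candidate drug; such cases cannot be linked automatically


message = "Since I started on Gleevec, I can't fall asleep at all."
print(link_drug(message.index("can't"), find_drugs(message), []))
```

Restricting candidates to a fixed list keeps the rules simple, which matches the note's observation that accuracy was measured with such a restriction; a nearest-preceding reading of rule 3 would only require iterating over `earlier_messages` in reverse.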