Data4Impact Workshop
on Application of Big Data
in Scientometrics
Rome|ISSI 2019 Conference
Introduction to Data4Impact
Data4Impact: the basics
• Call: CO-CREATION-08-2016-2017: Better integration of evidence on the impact of research
and innovation in policy making
• Expected impacts:
- Improved monitoring of R&I activities: new indicators for assessing research and innovation
performance, including the impact of research and innovation policies
- Prove value to society: determining the societal impact of research and innovation funding in order
to better justify research and innovation spending
Data4Impact addresses key challenges and expected impacts of CO-CREATION-08-2016-2017
through a data driven approach
What is big data?
Definition of Big Data:
"Big Data is high-volume, high-velocity and/or high-variety information assets that demand
cost-effective, innovative forms of information processing that enable enhanced insight, decision
making, and process automation."
Key properties of Big Data:
- Volume, i.e. no sampling is generally applied
- Variety, i.e. structured and unstructured data from various sources, in different formats
- Velocity, i.e. real-time/rapid data
- Veracity, i.e. variations in data quality, cleaning, processing, etc.
Non-intrusiveness -> Big Data is a byproduct of digital interaction and communication;
Key objective: make Big Data small!
Where? Start with an individual
Individual level
Who participated in the programme?
Who were members of the extended team?
Organisation/team level
Research teams in universities & research centres;
Small companies and large enterprises
Project/programme level
Data aggregated at project or programme level
Analytical dimensions
Within researchers themselves; between researchers;
between researchers and organisations; between
organisations; between projects; between programmes
Key questions:
- Whom exactly did the programme attract?
- What happened during and after the projects?
- What was the impact?
How? Build a Knowledge Graph, Integrate Data
Why/what? Answer questions that matter to funders without
ever asking a beneficiary
1. Outputs, products and interventions
- Outputs, products and interventions
- Collaborations
- Scientific publications
- Intellectual Property Rights
- Scientific prizes
2. Outcome-level indicators
- Innovations
- Dissemination activities
- Further funding/investment
- Next destinations
- Effects on the company/private sector
- New companies/organizations created
3. Impact-level indicators
- Impact on health and welfare/health and environmental impacts
- Impacts on creativity, culture & society/social, economic, capability and cultural impact
- Influence on policy making/political impact
Ask less, ask anything
Evaluating Planning Storytelling
Tracking individual researchers
Organisation news/public relations
Tracking organisations
Tracking projects
Key objectives and results
Data4Impact
Data4Impact: objectives
Objective 1: define, develop, analyse new indicators for assessing the
performance of EU and national R&I systems.
Data4Impact: objectives
Objectives 2+3: gather data at input, throughput, output and impact levels,
derive facts and understand impact on health-related challenges
Objectives 4+5: perform community-driven validation and develop user-
centered tools
Key facts about Data4Impact
Project dimension | Coverage
Levels of data collection | Organisation; Project (for EU FP programmes only); Programme
Programmes covered | Over 40 health funders in Europe + EU FPs
Data collection | Yes (strong effort)
Data integration | Yes (moderate effort)
Machine learning, NLP, entity recognition | Yes (strong effort)
Topic modelling | Yes (strong effort)
Project duration & budget | 2 years, EUR 1.5 million
Key facts about Data4Impact
Input data | EC monitoring data (Health & SC1 projects, health related); PubMed data
Data sources: output level indicators | EC monitoring data (Cordis); OpenAIRE; Europe PMC (incl. full text data); PATSTAT (incl. abstracts & full texts); Lens.org data
Data sources: result level indicators | Company websites; Social media (Twitter); Clinical guidelines repositories
Data sources: impact level indicators | EC monitoring data; EMA data on human medicinal products & orphan medicines; DrugBank data; Company websites; Social media (Twitter); News/media sites
Data4Impact framework
Tracking Research Activities
Input Throughput Output Impact
Tracking Research Activities
Methodology attributes:
- automated
- granular
- scalable
- applicable to other domains
FP7/H2020 Projects – DATA
CORDIS
● Call document
● Project description
● Final or periodic project reports (project summary)
● Scholarly publications deriving from the project
● Patents
● Results in Brief – Expected Impact
Automatic extraction of pertinent info from associated documents (NLP), and metadata
Topics in the Health Sector
• International statistical Classification of Diseases and
related health problems
• international standard for reporting diseases and health
conditions
• diagnostic standard for all clinical and research purposes
ICD Chapters
bottom up estimation of associated ICD classes for each project
Funding
Input (Funding) & Topics
H2020-Core, FP7-Core
Pubs & Patents Other Innovations
Throughput & Output
Innovation “Insights” from Project Portfolios
• Diagnostic Tools
• Treatment
• Drug
• Protocol
• Biomarker
• Biorepository
• Gene
• Metabolite
• Clinical Trial
• Method
• Patent
• Device
• Material
• Infrastructure
• Software
• System
• Prototype
• Study
• Publication
• Company
• Education
• Employment
• Dissemination
• *Impact
• *Outcome
Output – Innovations – FP7-Extended
[Figure: bar chart by innovation type; x-axis from 0 to 40,000. Types shown: Treatment, Standard, Publication, Prototype, Protocol, Protein, Metabolite, Material, Gene, Employment, Education, Drug, Dissemination, Diagnostic Tool, Clinical Trial, Biorepository, Biomarker, Device, Infrastructure, Method, Software, System, Study]
Output – Creation of New Companies
● 430 newly created companies in FP7
● 51 of which in FP7-Core
● Sample of FP7-Core projects with 2 or more new companies formed
Project Number Project Acronym # Spin-offs
201924 EDICT 3
223744 DOPAMINET 2
201418 READNA 2
278832 hiPAD 2
279039 ComplexINC 2
Collaboration Networks
ICD Ch9 Diseases of the Circulatory System
Technological Diffusion - Organization Networks
(public vs private, geographic location, etc):
size, density, key bridge organizations,
across fields, fine detail within a subfield
Project Facets
Track I, T, O, extract named entities, links across:
sector (private, public)
geographic location
country
programme, call, etc.
Organizations
Funder
*Research Areas*
Time
estimated
provided
Academic
Publications
• > 5 million
• H2020, FP7
• 20% of sample from 40+
funders of D4I
Project Reports
Deep Learning
NLP
Expert
469 Topics
10 major categories
Topic Modelling
Academic Impact
Citations
Clinicopathologic and 11C-Pittsburgh compound B implications
of Thal amyloid phase across the Alzheimer’s disease spectrum
An autoradiographic evaluation of AV-1451 Tau PET in dementia
Deciphering Interactions of Acquired Risk Factors and ApoE-
mediated Pathways in Alzheimer's Disease
What is normal in normal aging? Effects of aging, amyloid and
Alzheimer's disease on the cerebral cortex and the hippocampus
Soluble apoE complex: mechanism and therapeutic target for
APOE4-induced AD risk
Role of genes linked to sporadic Alzheimer's disease risk in the
production of β-amyloid peptides
Proteolytic Cleavage of Apolipoprotein E4 as the Keystone for
the Heightened Risk Associated with Alzheimer’s Disease
MeSH
alzheimer disease
amyloid beta peptides
amyloid
neurodegenerative diseases
Brain
apolipoprotein e4
amyloidosis
Text
Amyloid
Alzheimer
Apoe
Neurodegeneration
Neurodegenerative
Abeta
Brain
Dementia
Aggregation
Fibrils
Tau
Cognitive
Pathology
Plaques
Deposition
impairment
aging
Phrases
alzheimer disease
neurodegenerative diseases
amyloid fibrils
amyloid deposition
Keywords
alzheimer disease
neurodegeneration
amyloid
dementia
geriatrics
Wikipedia terms
Alzheimer's_disease
Neurodegeneration
Apolipoprotein_E
Amyloid
Neuropathology
What is this Topic about??
Alzheimer’s disease
Topic Modelling: Identifying Topics
• completely bottom up approach
• very little domain knowledge needed (sources for documents &
annotations)
• granularity
• each document associated with a list of topics (and a weight for each) -> fully flexible indicators
• keywords
• each topic associated with keywords -> topic similarity
• removes programmatic structure
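The link from topic keywords to topic similarity can be illustrated with a small sketch. The slides do not say which similarity measure Data4Impact uses, so the Jaccard index below is an assumption chosen for simplicity. The two keyword lists are the ones shown for topics 346 and 78 elsewhere in this deck; the function name `jaccard` is illustrative.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard similarity of two keyword lists: |A and B| / |A or B|."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0  # convention for two empty keyword sets
    return len(a & b) / len(a | b)


# Keyword lists as shown for topics 346 and 78 in this deck.
topic_346 = ["asthma", "copd", "allergic", "airway", "disease",
             "fev", "ige", "respiratory", "lung", "symptoms"]
topic_78 = ["pressure", "lung", "pulmonary", "respiratory", "gas",
            "lungs", "ventilation", "volume", "breathing", "alveolar"]

# The two topics share "lung" and "respiratory" out of 18 distinct keywords.
similarity = jaccard(topic_346, topic_78)
print(round(similarity, 3))
```

A measure that uses keyword weights (for example cosine similarity on weighted vectors) would be a natural alternative if the weights are available.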
Topic Modelling
Academic Impact: Trends per Year
[Figure: topic trends per year, 2004 to 2018; highlighted topic: hydrogen bonds and cyclohexane conformation]
Academic Impact: Safe Bets
Safe bet: a topic with a strong presence every year (weight more than one standard deviation above the average)
Topic
Antibiotic resistant infections
Cardiac (ventricular) remodelling
Community-based health promotion strategies
Health literacy in primary health care
Malaria and leishmaniasis
Organic chemistry synthesis
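A minimal sketch of the safe-bet rule, under stated assumptions: the average and standard deviation are taken across all topics within each year, the population standard deviation is used, and a topic must clear the threshold in every year. The slides do not spell out these details. The topic names and weights below are toy values, not Data4Impact results.

```python
from statistics import mean, pstdev


def safe_bets(weights):
    """Return topics whose weight exceeds mean + 1 st.dev. in every year.

    weights: {topic: [weight for each year]}, all lists of equal length.
    """
    topics = list(weights)
    n_years = len(next(iter(weights.values())))
    thresholds = []
    for year in range(n_years):
        column = [weights[t][year] for t in topics]
        thresholds.append(mean(column) + pstdev(column))
    return [
        t for t in topics
        if all(weights[t][y] > thresholds[y] for y in range(n_years))
    ]


# Toy yearly weights (hypothetical): topic_e is strong in the first year only.
toy_weights = {
    "topic_a": [10, 12, 11],
    "topic_b": [1, 1, 1],
    "topic_c": [2, 1, 2],
    "topic_d": [1, 2, 1],
    "topic_e": [9, 1, 1],
}
print(safe_bets(toy_weights))
```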
Academic Impact: Emerging Topics
Emerging: a topic with low presence before 2015 that is now growing “way” faster than the average.
Topic
Antibiotic resistant bacteria
Chagas disease
Chemometric analysis of volatile compounds
Complementary and alternative medicine
Fluorescein isothiocyanate (FITC)
Hormonal disorders
Immortalised cell lines
Pulmonary hypertension
Sleep apnea
T-cell mediated inflammatory skin diseases
Teratology
Academic Impact: Hibernating Giants
Hibernating Giant: a topic that used to be strong up to [2011-2013] and is now consistently at low levels
Topic
Enhancer-Binding Protein Complexes
Hydrogen bonds and coordination geometry
Hydrogen bonds and cyclohexane conformation
Minority health and health care disparities
Molecular dynamics and protein function
Regulation of protein function
Regulatory T cell function and immune system
Use of Arabidopsis thaliana as a plant model
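The emerging and hibernating-giant definitions can be sketched in the same spirit. The slides give no numeric cut-offs, so every threshold here (`low`, `high`, `growth`) is a hypothetical choice, as is the use of 2013 as the end of the "strong" period. Input weights are assumed to be expressed relative to the all-topic average for the same year (1.0 = average), so a rising relative weight means faster-than-average growth.

```python
from statistics import mean


def classify_topic(rel_weight_by_year, low=0.5, high=1.5, growth=2.0):
    """Label a topic from its yearly weight relative to the all-topic average.

    All thresholds are illustrative assumptions, not values from the project.
    """
    before_2015 = [w for y, w in rel_weight_by_year.items() if y < 2015]
    from_2015 = [w for y, w in rel_weight_by_year.items() if y >= 2015]
    up_to_2013 = [w for y, w in rel_weight_by_year.items() if y <= 2013]
    after_2013 = [w for y, w in rel_weight_by_year.items() if y > 2013]

    # Emerging: low presence before 2015, then a much higher relative weight.
    if (before_2015 and from_2015
            and mean(before_2015) < low
            and mean(from_2015) > growth * mean(before_2015)):
        return "emerging"
    # Hibernating giant: strong up to 2013, consistently low afterwards.
    if (up_to_2013 and after_2013
            and mean(up_to_2013) > high
            and all(w < low for w in after_2013)):
        return "hibernating giant"
    return "other"


# Toy series (hypothetical values).
emerging_example = {2012: 0.2, 2013: 0.2, 2014: 0.2, 2015: 0.6, 2016: 0.9, 2017: 1.4}
giant_example = {2010: 2.0, 2011: 2.2, 2012: 1.9, 2013: 1.8, 2014: 0.4, 2015: 0.3, 2016: 0.3}
flat_example = {year: 1.0 for year in range(2010, 2018)}

print(classify_topic(emerging_example))
print(classify_topic(giant_example))
print(classify_topic(flat_example))
```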
Academic Impact & Project Facets
Evaluate academic impact across:
sector (private, public)
geographic location
country
programme, call, etc.
Organizations
Funder
*Research Areas*
Time
estimated
provided
Academic Impact: Timeliness of Investment
Field Projects
Topic View: Cardiovascular Diseases
Funder Rank
National Institutes of Health (US) 1
Medical Research Council (UK)* 2
European Commission 3
Wellcome Trust (UK) 4
British Heart Foundation (UK) 5
National Health and Medical Research Council (Australia) 6
Research Councils UK* 7
Swedish Research Council (Sweden) 8
Chief Scientist Office (UK) 9
Cancer Research UK 10
Topic Size: large
- twice the size of the average topic in PubMed
Topic Trend: growing
- 1.25 times larger in 2012-18 than in 2005-11
Topic Exclusivity: low
- many funders investing in the topic
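The trend figure above is a ratio between two seven-year periods. A minimal sketch of that arithmetic follows, assuming topic size is measured as a yearly count (the slides do not say whether size is a document count or a summed topic weight). The function names and the toy counts are illustrative.

```python
def period_total(counts_by_year, start, end):
    """Sum yearly counts over the inclusive range start..end."""
    return sum(v for year, v in counts_by_year.items() if start <= year <= end)


def trend_ratio(counts_by_year):
    """Size in 2012-18 divided by size in 2005-11."""
    earlier = period_total(counts_by_year, 2005, 2011)
    later = period_total(counts_by_year, 2012, 2018)
    if earlier == 0:
        raise ValueError("no activity in 2005-11, ratio undefined")
    return later / earlier


# Toy counts (hypothetical): 100 per year in 2005-11, 125 per year in 2012-18.
toy_counts = {year: 100 for year in range(2005, 2012)}
toy_counts.update({year: 125 for year in range(2012, 2019)})

ratio = trend_ratio(toy_counts)
print(ratio)
```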
Cardiovascular Risk Factors
Share-of-Voice for each factor, Dates: 13 January – 15 April 2019
Most Discussed Topics
[Figure: bar chart of mentions per topic; y-axis from 0 to 450,000]
Indicator: Topic Buzz, rank topics by the number of mentions. Dates: 13 January – 31 March 2019
Economic impact
Tracking of data from company
websites
Why?
Current methodologies affected by low and dropping response rates, relatively
high running costs and substantial data lags
Big data offers data scalability, completeness and speed
Growing interest in big data, e.g. future editions of the European Innovation
Scoreboard to contain data derived from big data approaches
Process (how?)
- Input data from Cordis + Orbis
- Scraping/crawling
- Language recognition/translation
- Database with text data from company websites
- Randomly selecting and labelling a subset of data
- Model development
- List of innovation mentions by stage and type
- Aggregating to a number of unique innovations
- Database with company innovation counts
- Aggregation
- Visualisation
Classification of innovations (what?)
Innovations
Input data: company URL link
Innovation type:
- Innovation output: product innovation; service, process, other innovation
- Innovation activity: licensing activities; private/public funding attracted; certification & standardisation; M&A
+
Extraction of entities (product names, trademarks, copyright) associated
with innovation outputs and activities
Examples of innovations identified
2019-09-16
Key results: FP7-Core set
Key results: 2097 FP7 & H2020 companies analysed in total, over 1.5 million URL links
harvested, over 15,000 innovation texts identified
Key results: FP7-Core Set
Indicator | Indicator value (FP7-Core projects)
Number of companies analysed in the FP7-Core set | 1395
Estimated share of enterprises with evidence of innovation activities | 46.0%
Average number of innovation outputs and activities identified per company | 16.1
Estimated share of highly innovative enterprises | 7.4%
Estimated share of enterprises with evidence of licensing activities (incl. patent/trademark license agreements) | 9.3%
Estimated share of enterprises involved in activities related to acquisitions | 20.0%
Estimated share of enterprises with evidence of private investment/capital attracted | 8.0%
Uptake of R&I by companies
Estimated uptake of innovation outputs and activities in FP7-Core projects, by ICD class
Summary
Useful for:
- Monitoring and ex-post evaluation: first use cases for the EIS built; possible to
link company innovations to previous research activities
- Storytelling: rich source of data for innovation success stories and case studies
- Proposal evaluation: innovation track record, previous commercialisation
activities, investment attracted, etc.
Caveats, weaknesses and areas for further work:
- Process and service innovations captured to a lesser degree
- Eudamed (EU database for CE marked medical devices and technologies,
opening in 2020) offers a rich source of data for further work
Societal & health impact
Data4Impact framework
Linking medicines to R&I
Why?
No data currently tracked in a systematic way on the contributions of R&I to
new products on the market
Large investments made in translational medicine and close-to-market research,
but little known about the uptake
New products on the market are a proxy for economic impact, but also
health/societal impact, e.g. orphan medicines, new non-generic medicines,
medicines treating highly resistant pathogens
[Diagram elements: Project; Publication; Clinical trial(s); Sponsor; Medicine and/or active substance; New medicine/product. Edge labels: mentions, linked to, develops]
Process (how?)
Key results: human medicinal
products authorized by the EMA
Selected results: top-5 medicines with the strongest links to FP7
Medicine name | Active substance | Marketing authorisation holder | Total number of mentions of medicine name & active substance
Orfadin | Nitisinone | Swedish Orphan Biovitrum International AB | 4290
Alkindi | Hydrocortisone | Diurnal Europe B.V. | 3144
Ferriprox | Deferiprone | Apotex Europe BV | 2789
Herceptin | Trastuzumab | Roche Registration GmbH | 1210
Aplidin | Plitidepsin | Pharma Mar, S.A. | 650
Example: Alkindi
Example: Olaparib
Example: Olaparib (breast, ovarian cancer)
Summary
TRR and Data4Impact systematically link all key stages of the R&I lifecycle
& follow the logic of impact pathways
Eventually, big data will cover in the health domain:
• Basic research & research in general: traditional indicators +
throughput/output data + new measures of academic impact based on topic
modelling
• Translational research: clinical trials
• Innovation & market uptake: innovation data from company websites, EMA
data on medicines, Eudamed data on medical devices & technologies
• Impact: clinical guidelines, HTAs, Cochrane reviews
Basic research -> Translational research -> Innovation/market uptake -> Societal/health impact
Clinical guidelines
• Clinical guidelines, systematic reviews and treatment
recommendation documents provide traces of clinical and
professional practice
• Proprietary data from Minso Solutions AB, which maintains the Clinical Impact
database (CI:TM). The exceptions are WHO, Cochrane and NICE documents,
which are also available in PubMed
• The coverage is nearly complete at the government level
for Sweden, Denmark, Norway, Germany (at the S3 level),
and the UK (NICE and SIGN guidelines), as well as good
coverage of WHO guideline documents and Cochrane
Systematic Reviews.
• In total 855 clinical guidelines had a total of 3684 (2,073
fractional) references that were matched to 1781
publications found in the D4I database.
Funder type | Number (full) | Number (fract.)
EC funder (FP7/H2020) | 115 | 78.2
European nat’l funders | 1,859 | 1,317.9
International funders | 1,710 | 676.9
Total sum | 3,684 | 2,073.0
Funder (EC breakdown) | Number (full) | Number (fract.)
EC_FP7-CORE | 74 | 49.9
EC_FP7-EXTENDED | 28 | 18.2
EC_H2020-EXTENDED | 1 | 0.1
EC_other | 12 | 10.0
Total sum | 115 | 78.2
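The difference between the full and fractional columns can be illustrated with a short sketch. It shows one common reading of the two counting schemes, which is an assumption here because the slides do not define them: under full counting every funder credited on a matched reference gets 1, and under fractional counting the reference's single unit of credit is split equally among its funders. The funder labels and the three toy references are hypothetical.

```python
from collections import defaultdict


def count_references(references):
    """Return (full, fractional) counts per funder.

    references: one list of credited funders per matched reference.
    """
    full = defaultdict(int)
    fractional = defaultdict(float)
    for funders in references:
        unique = set(funders)
        for funder in unique:
            full[funder] += 1                      # each funder gets full credit
            fractional[funder] += 1 / len(unique)  # credit shared equally
    return dict(full), dict(fractional)


# Toy data (hypothetical): three references with one or two funders each.
toy_references = [
    ["EC"],
    ["EC", "national"],
    ["national", "international"],
]
full, fractional = count_references(toy_references)
print(full)
print(fractional)
```

With this reading, the fractional counts always sum to the number of references, while the full counts sum to the number of reference-funder pairs.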
MeSH terms for funded research
EC
HIV Infections 13 1.97%
Antitubercular Agents 8 1.21%
Mycobacterium tuberculosis 8 1.21%
Stroke 6 0.91%
Antibodies, Monoclonal 5 0.76%
Colorectal Neoplasms 5 0.76%
ErbB Receptors 5 0.76%
Microbial Sensitivity Tests 5 0.76%
ras Proteins 4 0.61%
HIV-1 4 0.61%
Tuberculosis 4 0.61%
Diabetes Mellitus, Type 1 4 0.61%
Europe 4 0.61%
Medical Research Council
HIV Infections 62 2.45%
Stroke 27 1.07%
Anti-HIV Agents 26 1.03%
United Kingdom 26 1.03%
England 18 0.71%
Diabetes Mellitus, Type 2 17 0.67%
Brain 15 0.59%
Primary Health Care 15 0.59%
Smoking 15 0.59%
Smoking Cessation 14 0.55%
Cardiovascular Diseases 13 0.51%
Obesity 13 0.51%
Breast Neoplasms 12 0.47%
Bipolar Disorder 12 0.47%
HIV Seropositivity 12 0.47%
Depression 12 0.47%
Wellcome Trust
HIV Infections 104 4.01%
Antimalarials 60 2.32%
Malaria, Falciparum 48 1.85%
Artemisinins 44 1.70%
Tuberculosis 28 1.08%
Anti-HIV Agents 24 0.93%
Malaria, Vivax 23 0.89%
Malaria 22 0.85%
Plasmodium falciparum 22 0.85%
South Africa 21 0.81%
Pregnancy Complications, Parasitic 19 0.73%
Primaquine 17 0.66%
Quinolines 17 0.66%
MeSH terms in referred works
fastText algorithm
Topical analysis of reference contexts
congue risus feugiat ref264 tincidunt lorem nullam
In the generated topic model, each word is associated
with a probability distribution of topics
For each reference, a symmetric context window of
size k is used as a pseudo-document, and the most
probable topic is calculated for that context window
Asthma, a chronic respiratory condition
affecting 300 million people globally (
aref15080825 ), causes inflammation of the lungs
as well as structural and functional remodelling
of the airways. It is characterised by recurrent
attacks of breathlessness and wheezing with
varying degrees of frequency and severity, which
is caused by swelling of the bronchial tubes
resulting in airflow limitation (WHO 2011).
Although the causes of asthma are not completely
understood, risk factors are known to include
inhaling asthma triggers such as allergens,
tobacco smoke and chemical irritants. Asthma is
incurable and the prevalence is increasing,
particularly in children and young adults (
aref22157151 ), however appropriate management
can control the disorder and enable people to
enjoy a high quality of life (WHO 2011).
https://doi.org/10.1002/14651858.CD001116.pub4
asthma a chronic respiratory condition affecting million people globally aref causes inflammation of
the lungs as well as structural and functional remodelling of the airways
Topic 346 (0.8149): asthma, copd, allergic, airway, disease, fev, ige, respiratory, lung, symptoms
Topic 78 (0.0689): pressure, lung, pulmonary, respiratory, gas, lungs, ventilation, volume, breathing,
alveolar
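The context-window step can be sketched as follows. This is an illustration under stated assumptions, not the project's pipeline: the tokeniser, the `aref<digits>` marker pattern (taken from the example above), the dropping of pure numbers, and the additive scoring are all simplifications. The word-topic table is a hypothetical stand-in for the trained topic model; only the topic numbers 346 and 78 come from the slide.

```python
import re
from collections import defaultdict

REF = re.compile(r"aref\d+")  # reference markers as they appear in the example


def tokenize(text):
    """Lowercase and keep alphanumeric runs, so 'aref15080825' stays one token."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_windows(text, k):
    """Return (reference, words) pairs using k tokens on each side of the marker."""
    tokens = tokenize(text)
    windows = []
    for i, tok in enumerate(tokens):
        if REF.fullmatch(tok):
            around = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
            # Drop other reference markers and pure numbers from the pseudo-document.
            words = [t for t in around if not REF.fullmatch(t) and not t.isdigit()]
            windows.append((tok, words))
    return windows


def most_probable_topic(words, word_topics):
    """Sum per-word topic probabilities and return the highest-scoring topic."""
    scores = defaultdict(float)
    for word in words:
        for topic, p in word_topics.get(word, {}).items():
            scores[topic] += p
    return max(scores, key=scores.get) if scores else None


text = ("Asthma, a chronic respiratory condition affecting 300 million people "
        "globally ( aref15080825 ), causes inflammation of the lungs as well as "
        "structural and functional remodelling of the airways.")

# Hypothetical word-topic probabilities standing in for the trained model.
toy_word_topics = {
    "inflammation": {346: 0.6, 78: 0.1},
    "lungs": {346: 0.3, 78: 0.5},
}

windows = context_windows(text, k=4)
ref, words = windows[0]
print(ref, words, most_probable_topic(words, toy_word_topics))
```

Varying `k` gives the context windows of different size that the coherence analysis compares.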
Topical coherence
Using distance measures defined on spaces of probability distributions, such
as the Bhattacharyya distance and the Hellinger distance, we measure the
divergence between the topics assigned to the same reference in different
contexts, as well as the topics assigned to context windows of different
sizes for a specific in-text citation.
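Both distances named above are built on the Bhattacharyya coefficient, BC(p, q) = sum_i sqrt(p_i * q_i). A minimal sketch for discrete topic distributions follows; the function names are illustrative, and the inputs are assumed to be probability vectors over the same ordered set of topics.

```python
import math


def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p, q):
    """D_B = -ln(BC); infinite when the distributions do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)


def hellinger_distance(p, q):
    """H = sqrt(1 - BC); bounded between 0 (identical) and 1 (disjoint)."""
    bc = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding error


same = hellinger_distance([0.5, 0.5], [0.5, 0.5])
disjoint = hellinger_distance([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

The Hellinger distance is bounded, which makes it convenient for comparing coherence across references; the Bhattacharyya distance is unbounded and grows quickly as overlap vanishes.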
Clinical guideline impact
• Professional impact – One step closer to the implementation of research
within the clinic
• Case: References in context:
- Generic method for academic citations
In Data4Impact:
1. Subject classification of citing document based on cited documents’ MeSH
terms
2. Distinguishing between reference kinds in guideline documents
3. Establishing the "topicality" of each reference based on a trained model of
Europe PMC articles.
Architecture
[Diagram elements: WP4: 500 topic models; WP5.4: 138 topic searches; H2020/FP7 project topics; human expert; web lists of diseases; manual selection; News, Blogs, Fora, Twitter]
Mentions Indicators
• Monthly releases
• ~1.5M documents per release: news, blogs, fora. Expected total size ~5M documents
• ~10M tweets per release; total size ~30M tweets
• 138 topics searched -> 1 dataset per topic
Top-20 Twitter topics (n:~31M tweets)
[Figure: horizontal bar chart of the number of tweets for the 20 topics listed in the table that follows; x-axis from 0 to 2,000,000]
Topic Topic name Num tweets
433 climate change 9,949,906
272 vaccination 1,760,780
175 measles and newborn screening 1,457,110
245 stress disorders 898,758
209 diabetes mellitus 858,118
294 adhd 706,055
315 depression 703,844
348 transplantation 699,582
121 weight loss and obesity 696,612
319 cardiovascular risk factors 647,843
254 alzheimer disease 637,668
362 cancer therapy 570,636
123 eating disorders 513,989
240 hypertension and blood pressure 452,499
302 myocardium and heart failure 445,434
284 breast cancer 415,986
366 schizophrenia and bipolar disorder 407,553
344 dendritic cells and immunity 397,980
169 asthma 383,321
373 env. exposure and air pollution 381,212
Topic fluctuation Jan-Feb
[Figure: daily tweet counts for topics 3 (anorexia), 123 (bulimia), 175 (measles), 254 (Alzheimer), 272 (vaccination) and 362 (cancer); y-axis from 0 to 70,000]
Virality
Five prominent topics according to virality,
the most retweeted tweet together with its url.
ID | Topic | Retweets | URL
47 | lung cancer | 145,421 | https://t.co/nAtqnmKCqW
491 | acute lymphoblastic leukemia | 11,338 | https://t.co/zc4qFt6fy5
433 | climate change | 47,547 | https://t.co/zxzAlorA3O
272 | vaccination | 11,923 | https://t.co/d6l8vfmBVW
348 | transplantation | 60,692 | https://t.co/FSmETQpSkm
Task 5.4.3 Twitter conversation analysis
• Builds on other WP5.4 activities, but takes a somewhat different approach
to collecting data.
- Focuses on relationships between social media posts (retweets, @tweets, #tweets)
- Possible to construct meaningful tests as "scripted dialogs"
- Helps weed out spam
- Amenable to content-based text analysis at the conversation level (e.g. sentiment
analysis, topic modelling)
Referring to research in thread
First collected tweet in thread:
-[tweet id='13441' replyto='14018'] Independent research has shown that individuals who were
vaccinated for the flu had 5.5 times more respiratory illness than those who were not
vaccinated. [/tweet]
- (A number of replies omitted; thread length: 313)
- [tweet id='216387' replyto='216418'] In the light of new info, why not? It happens all the
time.[/tweet]
- (Replies omitted, showing those with reference)
- [tweet id='216302' replyto='216387'] which is???DOI:10.1371/journal.pntd.0005179
[/tweet]
- [tweet id='216261' replyto='216387'] 'Analysis of year 3 results of phase III trials
of Dengvaxia suggest high rates of protection of vaccinated partial dengue immunes
but high rates of hospitalizations during breakthrough dengue infections of persons
who were vaccinated when seronegative...'DOI:10.1371/journal.pntd.0005179
[/tweet]
-- [tweet id='216241' replyto='216387'] Phase III Trials, among our 9-year olds!
FACT. DOI:10.1371/journal.pntd.0005179 [/tweet]
--- [tweet id='215757' replyto='216241'] Phase 2 was all that is required for release
Phase 3 was 'extra' 'Extra' studies are always done throughout the commercial
lifetimes of drugs & vaccines Consequences of phase 3 results are nowhere near
what group wud have us believe DOI:10.1371/journal.pntd.0005179 [/tweet]
Vaccination on Twitter
Topic bursts, user behaviour and referring to research in discussions
Topic burst
• Identify a day when activity is more than 50% above the daily average
• The burst extends up to the next day with activity below the average
• This period is compared to previous and following periods of equal length
• This example: 4 day long burst in topic 272 (vaccination)
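The burst rule above translates into a short scan over daily counts. This is a sketch under two assumptions: the daily average is taken over the whole series supplied, and the first below-average day ends the burst without being part of it (the slide leaves that boundary open). The function names and the toy counts are illustrative.

```python
from statistics import mean


def find_bursts(daily_counts, factor=1.5):
    """Return (start, end) index pairs of bursts, both inclusive.

    A burst starts on a day with more than factor * average activity and
    runs until the day before activity first drops below the average.
    """
    avg = mean(daily_counts)
    bursts = []
    i, n = 0, len(daily_counts)
    while i < n:
        if daily_counts[i] > factor * avg:
            start = i
            while i < n and daily_counts[i] >= avg:
                i += 1
            bursts.append((start, i - 1))
        else:
            i += 1
    return bursts


def neighbouring_periods(burst):
    """Previous and following periods of the same length as the burst.

    The caller must check that both periods fall inside the data.
    """
    start, end = burst
    length = end - start + 1
    return (start - length, start - 1), (end + 1, end + length)


# Toy daily tweet counts (hypothetical), with a 4-day burst in the middle.
toy_counts = [10, 10, 10, 10, 40, 30, 20, 18, 10, 10, 10, 10]
bursts = find_bursts(toy_counts)
print(bursts, neighbouring_periods(bursts[0]))
```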
[Figure: daily tweet counts for topics 3, 123, 175, 254, 272 and 362, 14 January to 10 February; y-axis from 0 to 70,000]
RT networks (similar structures, amount of RTs increases when activity is high)
Word clouds based on hashtags (seemingly a topical shift during burst)
48% RTs, 55% RTs, 42.5% RTs
User groups and their relative activity | Previous (144869 tweets) | Burst (194712) | Next (115557)
Top 1% most active share (overall: 16%) | 12 | 12 | 19
Next 9% share (overall: 17%) | 20 | 18 | 18
90% least active share (overall: 67%) | 68 | 70 | 63
The least active user group is more prominent when general activity is high,
while the most active user group is more prominent when activity is low.
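The group shares in the table can be computed from per-user tweet counts. The sketch below assumes users are ranked by their own tweet volume and split into the top 1%, the next 9% and the remaining 90%; how the project ranks users (for instance over the whole dataset rather than per period) is not stated on the slide. The toy counts are hypothetical and chosen to reproduce the "overall" shares of 16%, 17% and 67%.

```python
def activity_shares(tweets_per_user):
    """Percentage of all tweets posted by the top 1%, next 9% and bottom 90% of users."""
    counts = sorted(tweets_per_user, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no tweets to share out")
    n = len(counts)
    top1 = max(1, round(n * 0.01))       # at least one user in the top group
    top10 = max(top1, round(n * 0.10))
    groups = {
        "top 1%": counts[:top1],
        "next 9%": counts[top1:top10],
        "bottom 90%": counts[top10:],
    }
    return {name: 100 * sum(group) / total for name, group in groups.items()}


# Toy data (hypothetical): 100 users posting 1,000 tweets in total.
toy_users = [160] + [19] * 8 + [18] + [8] * 40 + [7] * 50
shares = activity_shares(toy_users)
print(shares)
```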
"Deniers" (measles, vaxxed, mmr, autism, study, flu, hpv, informedconsent, vaxwoke, cdc, vaccineinjury, learntherisk, maga, gardasil, vaccineskill)
"Non-deniers 1" (measles, vaccineswork, flu, hpv, antivax, vaxfactsfebruary, vaccinessavelives, immunization, antivaxxers, mumps, rotavirus, ethiopia, law, ebola)
"Non-deniers 2" (measles, vaccineswork, publichealth, science, humanitariancrisis, scientificreport, antivax, vaccinessavelives, venezuela, crisis, humanitarianaid, help, antivaxxers, vaccinesaresafe, misinformation, scicomm, itrustvaccines, mmr, factsmatter)
RT and coupled hashtag networks from burst period.
Academic 27%
Academically trained 11%
Other Professional 23%
Media 38%
Policy/decision maker 1%
9,647 plain text biographies from Twitter profiles
classified using a rule-based method: 30 % matched as professionals:
Class | Keyword example
Science student | student, studying
Graduated | MS, MA, graduate
University faculty | lectur, prof., professor
Other scientist | technician, lab manager, -ologist
Education and outreach | curator, teacher, librarian
Applied science organization | nonprofit, philantropy
Other professional | recruiter, entrepreneur, manager
Media professional | journalis, publisher
Policy/decision maker | congressman, senator, parliament
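A rule-based classifier of this kind can be sketched with regular expressions built from the keyword examples in the table. This is not the published method: the matching details (word boundaries, treating "lectur" and "journalis" as stems, case-sensitive degree abbreviations, first matching class wins) are assumptions, and the keywords are only the examples shown on the slide, kept with their original spelling.

```python
import re

# (class, patterns) in table order; the first matching class wins (assumption).
# "(?i)" makes a pattern case-insensitive; MS and MA stay case-sensitive.
RULES = [
    ("Science student", [r"(?i)\bstudent\b", r"(?i)\bstudying\b"]),
    ("Graduated", [r"\bMS\b", r"\bMA\b", r"(?i)\bgraduate\b"]),
    ("University faculty", [r"(?i)\blectur\w*", r"(?i)\bprof\.", r"(?i)\bprofessor\b"]),
    ("Other scientist", [r"(?i)\btechnician\b", r"(?i)\blab manager\b", r"(?i)\w+ologist\b"]),
    ("Education and outreach", [r"(?i)\bcurator\b", r"(?i)\bteacher\b", r"(?i)\blibrarian\b"]),
    ("Applied science organization", [r"(?i)\bnonprofit\b", r"(?i)\bphilantropy\b"]),
    ("Other professional", [r"(?i)\brecruiter\b", r"(?i)\bentrepreneur\b", r"(?i)\bmanager\b"]),
    ("Media professional", [r"(?i)\bjournalis\w*", r"(?i)\bpublisher\b"]),
    ("Policy/decision maker", [r"(?i)\bcongressman\b", r"(?i)\bsenator\b", r"(?i)\bparliament\b"]),
]
COMPILED = [(label, [re.compile(p) for p in patterns]) for label, patterns in RULES]


def classify_bio(bio):
    """Return the first class whose keyword pattern matches the bio, else None."""
    for label, patterns in COMPILED:
        if any(p.search(bio) for p in patterns):
            return label
    return None


# Invented example bios, for illustration only.
for bio in ["Professor of epidemiology", "Health journalist and mum",
            "Virologist, coffee lover", "Dog lover"]:
    print(bio, "->", classify_bio(bio))
```

Rule order matters in this sketch: "lab manager" is caught by "Other scientist" before the generic "manager" rule under "Other professional" can apply.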
Ekström, B. (2019): Developing a rule-based method for identifying researchers on Twitter: The case of vaccine discussions
Poster accepted to ISSI, 17th International Society of Scientometrics and Informetrics Conference, Rome, 2-5 September.
How can we use Twitter-bio personas?
- Retweet data
- Conversation data
Presentation of the platform
Questions & Answers
Data4Impact has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 770531.
Thank you for your attention!
If you would like to be notified when the online monitoring
platform is launched, email us at:
sonata@ppmi.lt
Visit our website:
www.data4impact.eu
Follow us on Twitter and SlideShare:
@Data4Impact

Contenu connexe

Similaire à Workshop Presentation on Application of Big Data in Scientometrics

Meeting the Computational Challenges Associated with Human Health
Meeting the Computational Challenges Associated with Human HealthMeeting the Computational Challenges Associated with Human Health
Meeting the Computational Challenges Associated with Human HealthPhilip Bourne
 
Data4Impact booklet overview of results
Data4Impact booklet overview of resultsData4Impact booklet overview of results
Data4Impact booklet overview of resultsData4Impact
 
PADDI - A business intelligence and data quality platform for Piedmont health
PADDI - A business intelligence and data quality platform for Piedmont healthPADDI - A business intelligence and data quality platform for Piedmont health
PADDI - A business intelligence and data quality platform for Piedmont healthGiuliana Bonello
 
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018Pistoia Alliance
 
Tackling societal challenges through digital transformation
Tackling societal challenges through digital transformationTackling societal challenges through digital transformation
Workshop Presentation on Application of Big Data in Scientometrics

  • 1. Data4Impact Workshop on Application of Big Data in Scientometrics Rome|ISSI 2019 Conference
  • 3. Data4Impact: the basics
      • Call: CO-CREATION-08-2016-2017: Better integration of evidence on the impact of research and innovation in policy making
      • Expected impacts:
        - Improved monitoring of R&I activities: new indicators for assessing research and innovation performance, including the impact of research and innovation policies
        - Prove value to the society: determining the societal impact of research and innovation funding in order better to justify research and innovation spending
      Data4Impact addresses key challenges and expected impacts of CO-CREATION-08-2016-2017 through a data driven approach
  • 5. What is big data?
      Definition of Big Data: "Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."
      Key properties of Big Data:
      - Volume, i.e. no sampling is generally applied
      - Variety, i.e. structured and unstructured data from various sources, in different formats
      - Velocity, i.e. real-time/rapid data
      - Veracity, i.e. variations in data quality, cleaning, processing, etc.
      Non-intrusiveness -> Big Data is a byproduct of digital interaction and communication
      Key objective: make Big Data small!
  • 6. Where? Start with an individual
      - Individual level: Who participated in the programme? Who were members of the extended team?
      - Organisation/team level: Research teams in universities & research centres; small companies and large enterprises
      - Project/programme level: Data aggregated at project or programme level
      - Analytical dimensions: Within researchers themselves; between researchers; between researchers and organisations; between organisations; between projects; between programmes
      Key questions:
      - Whom exactly did the programme attract?
      - What happened during and after the projects?
      - What was the impact?
  • 7. How? Build a Knowledge Graph, Integrate Data
  • 8. Why/what? Answer questions that matter to funders without ever asking a beneficiary
      1. Outputs, products and interventions: Outputs, products and interventions; Collaborations; Scientific publications; Intellectual Property Rights; Scientific prizes
      2. Outcome-level indicators: Innovations; Dissemination activities; Further funding/investment; Next destinations; Effects on the company/private sector; New companies/organizations created
      3. Impact level indicators: Impact on health and welfare/Health and environmental impacts; Impacts on creativity, culture & society/Social, economic, capability and cultural impact; Influence on policy making/political impact
  • 9. Ask less, ask anything Evaluating Planning Storytelling
  • 16. Key objectives and results Data4Impact
  • 17. Data4Impact: objectives Objective 1: define, develop, analyse new indicators for assessing the performance of EU and national R&I systems.
  • 18. Data4Impact: objectives Objectives 2+3: gather data at input, throughput, output and impact levels, derive facts and understand impact on health-related challenges Objectives 4+5: perform community-driven validation and develop user- centered tools
  • 19. Key facts about Data4Impact (project dimension: coverage)
      - Levels of data collection: Organisation; Project (for EU FP programmes only); Programme
      - Programmes covered: Over 40 health funders in Europe + EU FPs
      - Data collection: Yes (strong effort)
      - Data integration: Yes (moderate effort)
      - Machine learning, NLP, entity recognition: Yes (strong effort)
      - Topic modelling: Yes (strong effort)
      - Project duration & budget: 2 years, EUR 1.5 million
  • 20. Key facts about Data4Impact
      - Input data: EC monitoring data (Health & SC1 projects, health related); PubMed data
      - Data sources, output level indicators: EC monitoring data (Cordis); OpenAIRE; Europe PMC (incl. full text data); PATSTAT (incl. abstracts & full texts); Lens.org data
      - Data sources, result level indicators: Company websites; Social media (Twitter); Clinical guidelines repositories
      - Data sources, impact level indicators: EC monitoring data; EMA data on human medicinal products & orphan medicines; DrugBank data; Company websites; Social media (Twitter); News/media sites
  • 23. Input Throughput Output Impact Tracking Research Activities Methodology attributes: - automated - granular - scalable - applicable to other domains
  • 24. FP7/H2020 Projects – DATA CORDIS ● Call document ● Project description ● Final or periodic project reports (project summary) ● Scholarly publications deriving from the project ● Patents ● Results in Brief – Expected Impact automatic extraction of pertinent info from associated documents (NLP), and metadata
  • 25. Topics in the Health Sector • International statistical Classification of Diseases and related health problems • international standard for reporting diseases and health conditions • diagnostic standard for all clinical and research purposes ICD Chapters bottom up estimation of associated ICD classes for each project
  • 26. Input Throughput Output Impact Tracking Research Activities Methodology attributes: - automated - granular - scalable - applicable to other domains Funding
  • 27. Input (Funding) & Topics H2020-CoreFP7-Core
  • 28. Input Throughput Output Impact Tracking Research Activities Methodology attributes: - automated - granular - scalable - applicable to other domains Pubs & Patents Other Innovations
  • 29. FP7/H2020 Projects – DATA CORDIS ● Call document ● Project description ● Final or periodic project reports (project summary) ● Scholarly publications deriving from the project ● Patents ● Results in Brief – Expected Impact automatic extraction of pertinent info from associated documents (NLP), and metadata
  • 30. Throughput & Output Innovation “Insights” from Project Portfolios • Diagnostic Tools • Treatment • Drug • Protocol • Biomarker • Biorepository • Gene • Metabolite • Clinical Trial • Method • Patent • Device • Material • Infrastructure • Software • System • Prototype • Study • Publication • Company • Education • Employment • Dissemination • *Impact • *Outcome
  • 31. Output – Innovations - FP7-Extended [Bar chart, axis from 0 to 40000. Categories: Biomarker, Biorepository, Clinical Trial, Device, Diagnostic Tool, Dissemination, Drug, Education, Employment, Gene, Infrastructure, Material, Metabolite, Method, Protein, Protocol, Prototype, Publication, Software, Standard, Study, System, Treatment]
  • 32. Output – Creation of New Companies
      ● 430 newly created companies in FP7
      ● 51 of which in FP7-Core
      ● Sample of FP7-Core projects with 2 or more new companies formed:
      Project Number | Project Acronym | # Spin-offs
      201924 | EDICT | 3
      223744 | DOPAMINET | 2
      201418 | READNA | 2
      278832 | hiPAD | 2
      279039 | ComplexINC | 2
  • 33. Collaboration Networks ICD Ch9 Diseases of the Circulatory System Technological Diffusion - Organization Networks (public vs private, geographic location, etc): size, density, key bridge organizations, across fields, fine detail within a subfield
  • 34. Project Facets Track I, T, O, extract named entities, links across: sector (private, public) geographic location country programme, call, etc. Organizations Funder *Research Areas* Time estimated provided
  • 35. Input Throughput Output Impact Tracking Research Activities Methodology attributes: - automated - granular - scalable - applicable to other domains Academic
  • 36. Publications • > 5 million • H2020, FP7 • 20% of sample from 40+ funders of D4I Project Reports Deep Learning NLP Expert 469 Topics 10 major categories Topic Modelling Academic Impact
  • 37. Topic Modelling: Identifying Topics
      What is this topic about? Alzheimer's disease
      - Citations:
        Clinicopathologic and 11C-Pittsburgh compound B implications of Thal amyloid phase across the Alzheimer's disease spectrum
        An autoradiographic evaluation of AV-1451 Tau PET in dementia
        Deciphering Interactions of Acquired Risk Factors and ApoE-mediated Pathways in Alzheimer's Disease
        What is normal in normal aging? Effects of aging, amyloid and Alzheimer's disease on the cerebral cortex and the hippocampus
        Soluble apoE complex: mechanism and therapeutic target for APOE4-induced AD risk
        Role of genes linked to sporadic Alzheimer's disease risk in the production of β-amyloid peptides
        Proteolytic Cleavage of Apolipoprotein E4 as the Keystone for the Heightened Risk Associated with Alzheimer's Disease
      - MeSH: alzheimer disease; amyloid beta peptides; amyloid; neurodegenerative diseases; Brain; apolipoprotein e4; amyloidosis
      - Text: Amyloid; Alzheimer; Apoe; Neurodegeneration; Neurodegenerative; Abeta; Brain; Dementia; Aggregation; Fibrils; Tau; Cognitive; Pathology; Plaques; Deposition; impairment; aging
      - Phrases: alzheimer disease; neurodegenerative diseases; amyloid fibrils; amyloid deposition
      - Keywords: alzheimer disease; neurodegeneration; amyloid; dementia; geriatrics
      - Wikipedia terms: Alzheimer's_disease; Neurodegeneration; Apolipoprotein_E; Amyloid; Neuropathology
  • 38. Topic Modelling
      • completely bottom up approach
      • very little domain knowledge needed (sources for documents & annotations)
      • granularity
      • each document associated with a list of topics (and a weight for each) -> fully flexible indicators
      • keywords
      • each topic associated with keywords -> topic similarity
      • removes programmatic structure
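The two ideas on this slide (per-document topic weights feeding flexible indicators, and topic keywords giving topic similarity) can be illustrated with a minimal sketch. Everything below is hypothetical: the document IDs, topic names, weights, funders and keyword sets are invented for illustration, and Jaccard overlap is only one possible similarity measure, not necessarily the one used in Data4Impact.

```python
from collections import defaultdict

# Hypothetical document-topic weights, as a topic model might output them.
doc_topics = {
    "doc1": {"alzheimer": 0.7, "cardio": 0.3},
    "doc2": {"alzheimer": 0.2, "cardio": 0.8},
    "doc3": {"cardio": 1.0},
}
# Hypothetical facet: the funder behind each document.
doc_funder = {"doc1": "Funder A", "doc2": "Funder A", "doc3": "Funder B"}

# Hypothetical keyword sets per topic.
topic_keywords = {
    "alzheimer": {"amyloid", "dementia", "brain", "tau"},
    "cardio": {"heart", "stroke", "pressure", "brain"},
}


def topic_weight_by_facet(doc_topics, doc_facet):
    """Sum topic weights over all documents sharing a facet value."""
    totals = defaultdict(lambda: defaultdict(float))
    for doc, weights in doc_topics.items():
        for topic, weight in weights.items():
            totals[doc_facet[doc]][topic] += weight
    return {facet: dict(topics) for facet, topics in totals.items()}


def keyword_similarity(a, b):
    """Jaccard overlap of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


by_funder = topic_weight_by_facet(doc_topics, doc_funder)
similarity = keyword_similarity(topic_keywords["alzheimer"], topic_keywords["cardio"])
```

Because the aggregation key is just a lookup table, the same function works for any facet (country, programme, sector) without changing the topic model.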
  • 39. Academic Impact: Trends per Year [Chart: yearly trend, 2004 to 2018, for the topic "hydrogen bonds and cyclohexane conformation"]
  • 40. Academic Impact: Safe Bets
      Safe bet: a topic with a strong presence every year (weight more than one standard deviation above the average)
      Topics: Antibiotic resistant infections; Cardiac (ventricular) remodelling; Community-based health promotion strategies; Health literacy in primary health care; Malaria and leishmaniasis; Organic chemistry synthesis
  • 41. Academic Impact: Emerging Topics
      Emerging: a topic with low presence before 2015 that is now growing much faster than the average
      Topics: Antibiotic resistant bacteria; Chagas disease; Chemometric analysis of volatile compounds; Complementary and alternative medicine; Fluorescein isothiocyanate (FITC); Hormonal disorders; Immortalised cell lines; Pulmonary hypertension; Sleep apnea; T-cell mediated inflammatory skin diseases; Teratology
  • 42. Academic Impact: Hibernating Giants
      Hibernating Giant: a topic that used to be strong up to [2011-2013] and is now consistently at low levels
      Topics: Enhancer-Binding Protein Complexes; Hydrogen bonds and coordination geometry; Hydrogen bonds and cyclohexane conformation; Minority health and health care disparities; Molecular dynamics and protein function; Regulation of protein function; Regulatory T cell function and immune system; Use of Arabidopsis thaliana as a plant model
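The three trend categories (safe bet, emerging, hibernating giant) can be turned into simple rules over yearly topic weights. The sketch below is only one possible reading of the slide definitions. The thresholds are assumptions: "strong" means more than one standard deviation above the yearly mean across topics, "low" means below the yearly mean, and `growth_factor` stands in for "much faster than the average". The toy weights are invented, and every topic is assumed to have a positive weight in every year.

```python
from statistics import mean, pstdev


def classify_topics(weights, strong_until=2013, emerging_from=2015, growth_factor=1.5):
    """Label each topic from its yearly weights.

    weights: {topic: {year: weight}}, with the same years for every topic.
    """
    years = sorted({y for series in weights.values() for y in series})
    # Per-year mean and standard deviation across all topics.
    stats = {}
    for y in years:
        values = [series[y] for series in weights.values()]
        stats[y] = (mean(values), pstdev(values))

    def strong(topic, year):
        m, s = stats[year]
        return weights[topic][year] > m + s

    def low(topic, year):
        return weights[topic][year] < stats[year][0]

    recent = [y for y in years if y >= emerging_from]

    def growth(topic):
        # Ratio of the last to the first weight in the recent period.
        return weights[topic][recent[-1]] / weights[topic][recent[0]]

    avg_growth = mean([growth(t) for t in weights])
    early = [y for y in years if y <= strong_until]
    late = [y for y in years if y > strong_until]
    before = [y for y in years if y < emerging_from]

    labels = {}
    for t in weights:
        if all(strong(t, y) for y in years):
            labels[t] = "safe bet"
        elif early and late and all(strong(t, y) for y in early) and all(low(t, y) for y in late):
            labels[t] = "hibernating giant"
        elif before and all(low(t, y) for y in before) and growth(t) > growth_factor * avg_growth:
            labels[t] = "emerging"
        else:
            labels[t] = "other"
    return labels


# Invented toy data: six topics observed in four years.
toy = {
    "safe": {2012: 10, 2014: 10, 2016: 10, 2018: 10},
    "giant": {2012: 10, 2014: 1, 2016: 1, 2018: 1},
    "rising": {2012: 1, 2014: 1, 2016: 2, 2018: 4},
    "flat1": {2012: 1, 2014: 1, 2016: 1, 2018: 1},
    "flat2": {2012: 1, 2014: 1, 2016: 1, 2018: 1},
    "flat3": {2012: 1, 2014: 1, 2016: 1, 2018: 1},
}
labels = classify_topics(toy)
```

The order of the checks matters: a topic that is strong in every year is labelled a safe bet before the other rules are tried.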
  • 43. Academic Impact & Project Facets Evaluate academic impact across: sector (private, public) geographic location country programme, call, etc. Organizations Funder *Research Areas* Time estimated provided
  • 44. Academic Impact: Timeliness of Investment Field Projects
  • 45. Topic View: Cardiovascular Diseases
      Funder | Rank
      National Institutes of Health (US) | 1
      Medical Research Council (UK)* | 2
      European Commission | 3
      Wellcome Trust (UK) | 4
      British Heart Foundation (UK) | 5
      National Health and Medical Research Council (Australia) | 6
      Research Councils UK* | 7
      Swedish Research Council (Sweden) | 8
      Chief Scientist Office (UK) | 9
      Cancer Research UK | 10
      Topic Size: large (twice the average topic in PubMed)
      Topic Trend: growing (1.25 times larger in 2012-18 than in 2005-11)
      Topic Exclusivity: low (many funders investing in the topic)
  • 46. Cardiovascular Risk Factors Share-of-Voice for each factor, Dates: 13 January – 15 April 2019
  • 47. Most Discussed Topics [Bar chart, axis from 0 to 450000] Indicator: Topic Buzz, rank topics by the number of mentions. Dates: 13 January – 31 March 2019
  • 49. Tracking of data from company websites. Why? Current methodologies are affected by low and dropping response rates, relatively high running costs and substantial data lags. Big data offers data scalability, completeness and speed. There is growing interest in big data, e.g. future editions of the European Innovation Scoreboard are to contain data derived from big data approaches.
  • 50. Process (how?) Input data from Cordis + Orbis Scraping/Crawling Language recognition/ Translation Database with text data from company websites Randomly selecting and labelling a sub set of data Model development List of innovation mentions by stage and type Aggregating to a number of unique innovations Database with company innovation counts Aggregation Visualisation
  • 51. Classification of innovations (what?) Innovations Innovation type Input data Company URL link Innovation output Product innovation Service, process, other innovation Innovation activity Licensing activities Private/public funding attracted Certification & standardisation M&A + Extraction of entities (product names, trademarks, copyright) associated with innovation outputs and activities
  • 52. Examples of innovations identified 2019-09-16 52
  • 53. Key results: FP7-Core set Key results: 2097 FP7 & H2020 companies analysed in total, over 1.5 million URL links harvested, over 15,000 innovation texts identified
  • 54. Key results: FP7-Core Set
      Indicator | Indicator value (FP7-Core projects)
      Number of companies analysed in the FP7-Core set | 1395
      Estimated share of enterprises with evidence of innovation activities | 46.0%
      Average number of innovation outputs and activities identified per company | 16.1
      Estimated share of highly innovative enterprises | 7.4%
      Estimated share of enterprises with evidence of licensing activities (incl. patent/trademark license agreements) | 9.3%
      Estimated share of enterprises involved in activities related to acquisitions | 20.0%
      Estimated share of enterprises with evidence of private investment/capital attracted | 8.0%
  • 55. Output – Innovations - FP7-Extended [Bar chart, axis from 0 to 40000. Categories: Biomarker, Biorepository, Clinical Trial, Device, Diagnostic Tool, Dissemination, Drug, Education, Employment, Gene, Infrastructure, Material, Metabolite, Method, Protein, Protocol, Prototype, Publication, Software, Standard, Study, System, Treatment]
  • 56. Uptake of R&I by companies Estimated uptake of innovation outputs and activities in FP7-Core projects, by ICD class
  • 57. Summary Useful for: - Monitoring and ex-post evaluation: first use cases for the EIS built; possible to link company innovations to previous research activities - Storytelling: rich source of data for innovation success stories and case studies - Proposal evaluation: innovation track record, previous commercialisation activities, investment attracted, etc. Caveats, weaknesses and areas for further work: - Process and service innovations captured to a lesser degree - Eudamed (EU database for CE marked medical devices and technologies, opening in 2020) offers a rich source of data for further work
  • 60. Linking medicines to R&I. Why? No data are currently tracked in a systematic way on the contributions of R&I to new products on the market. Large investments are made in translational medicine and close-to-market research, but little is known about the uptake. New products on the market are a proxy for economic impact, but also for health/societal impact, e.g. orphan medicines, new non-generic medicines, medicines treating highly resistant pathogens.
  • 61. Project Medicine Medicine and/or active substance Clinical trial Publication mentions Sponsor Linked to Clinical trials New medicine/ product Linked to develops
  • 63. Key results: human medicinal products authorized by the EMA
  • 64. Selected results: top-5 medicines with the strongest links to FP7
      Medicine name | Active substance | Marketing authorisation holder | Total number of mentions of medicine name & active substance
      Orfadin | Nitisinone | Swedish Orphan Biovitrum International AB | 4290
      Alkindi | Hydrocortisone | Diurnal Europe B.V. | 3144
      Ferriprox | Deferiprone | Apotex Europe BV | 2789
      Herceptin | Trastuzumab | Roche Registration GmbH | 1210
      Aplidin | Plitidepsin | Pharma Mar, S.A. | 650
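The "total number of mentions of medicine name & active substance" indicator can be approximated with a simple text search. The sketch below is an assumption about how such a count might be produced, not the project's actual matching pipeline. It uses case-insensitive whole-word matching, and the example texts are invented.

```python
import re


def count_mentions(texts, terms):
    """Count case-insensitive whole-word occurrences of any term across texts."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return sum(len(pattern.findall(text)) for text in texts)


# Invented example texts standing in for project reports or publications.
example_texts = [
    "Nitisinone (Orfadin) was evaluated in the study.",
    "Patients received nitisinone daily.",
    "No relevant product is discussed here.",
]
mentions = count_mentions(example_texts, ["Orfadin", "Nitisinone"])
```

A production version would likely also need synonym lists and disambiguation, since a common substance name such as hydrocortisone can appear in texts unrelated to the authorised product.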
  • 68. Example: Olaparib (breast, ovarian cancer)
  • 69. Example: Olaparib (breast, ovarian cancer)
  • 70. Key results: human medicinal products authorized by the EMA
  • 71. Summary TRR and Data4Impact systematically link all key stages of the R&I lifecycle & follow the logic of impact pathways Eventually, big data will cover in the health domain: • Basic research & research in general: traditional indicators + throughput/output data + new measures of academic impact based on topic modelling • Translational research: clinical trials • Innovation & market uptake: innovation data from company websites, EMA data on medicines, Eudamed data on medical devices & technologies • Impact: clinical guidelines, HTAs, Cochrane reviews Basic research Translational research Innovation/ market uptake Societal/ health impact
  • 72. Clinical guidelines • Clinical guidelines, systematic reviews and treatment recommendation documents provide traces of clinical and professional practice • Proprietary data from Minso Solutions AB. Maintains a database, Clinical Impact, (CI:TM) (Except WHO, Cochrane, NICE, also available in PubMed) • The coverage is nearly complete at the government level for Sweden, Denmark, Norway, Germany (at the S3 level), and the UK (NICE and SIGN guidelines), as well as good coverage of WHO guideline documents and Cochrane Systematic Reviews. • In total 855 clinical guidelines had a total of 3684 (2,073 fractional) references that were matched to 1781 publications found in the D4I database.
  • 73. Funder (EC breakdown)
      | Funder_type | Number (full) | Number (fract.) |
      |---|---|---|
      | EC_funder (FP7/H2020) | 115 | 78.2 |
      | European nat'l funders | 1,859 | 1,317.9 |
      | International funders | 1,710 | 676.9 |
      | Total sum | 3,684 | 2,073.0 |

      | Funder | Number (full) | Number (fract.) |
      |---|---|---|
      | EC_FP7-CORE | 74 | 49.9 |
      | EC_FP7-EXTENDED | 28 | 18.2 |
      | EC_H2020-EXTENDED | 1 | 0.1 |
      | EC_other | 12 | 10.0 |
      | Total sum | 115 | 78.2 |
  • 74. MeSH terms for funded research
      EC
      | MeSH term | Count | Share |
      |---|---|---|
      | HIV Infections | 13 | 1.97% |
      | Antitubercular Agents | 8 | 1.21% |
      | Mycobacterium tuberculosis | 8 | 1.21% |
      | Stroke | 6 | 0.91% |
      | Antibodies, Monoclonal | 5 | 0.76% |
      | Colorectal Neoplasms | 5 | 0.76% |
      | ErbB Receptors | 5 | 0.76% |
      | Microbial Sensitivity Tests | 5 | 0.76% |
      | ras Proteins | 4 | 0.61% |
      | HIV-1 | 4 | 0.61% |
      | Tuberculosis | 4 | 0.61% |
      | Diabetes Mellitus, Type 1 | 4 | 0.61% |
      | Europe | 4 | 0.61% |

      Medical Research Council
      | MeSH term | Count | Share |
      |---|---|---|
      | HIV Infections | 62 | 2.45% |
      | Stroke | 27 | 1.07% |
      | Anti-HIV Agents | 26 | 1.03% |
      | United Kingdom | 26 | 1.03% |
      | England | 18 | 0.71% |
      | Diabetes Mellitus, Type 2 | 17 | 0.67% |
      | Brain | 15 | 0.59% |
      | Primary Health Care | 15 | 0.59% |
      | Smoking | 15 | 0.59% |
      | Smoking Cessation | 14 | 0.55% |
      | Cardiovascular Diseases | 13 | 0.51% |
      | Obesity | 13 | 0.51% |
      | Breast Neoplasms | 12 | 0.47% |
      | Bipolar Disorder | 12 | 0.47% |
      | HIV Seropositivity | 12 | 0.47% |
      | Depression | 12 | 0.47% |

      Wellcome Trust
      | MeSH term | Count | Share |
      |---|---|---|
      | HIV Infections | 104 | 4.01% |
      | Antimalarials | 60 | 2.32% |
      | Malaria, Falciparum | 48 | 1.85% |
      | Artemisinins | 44 | 1.70% |
      | Tuberculosis | 28 | 1.08% |
      | Anti-HIV Agents | 24 | 0.93% |
      | Malaria, Vivax | 23 | 0.89% |
      | Malaria | 22 | 0.85% |
      | Plasmodium falciparum | 22 | 0.85% |
      | South Africa | 21 | 0.81% |
      | Pregnancy Complications, Parasitic | 19 | 0.73% |
      | Primaquine | 17 | 0.66% |
      | Quinolines | 17 | 0.66% |
  • 75. MeSH terms in referred works (fastText algorithm)
  • 76. Topical analysis of reference contexts
      • In the generated topic model, each word is associated with a probability distribution over topics
      • For each reference, a symmetric context window of size k is used as a pseudo-document, and the most probable topic is calculated for that context window
      Illustration (placeholder text with the reference marker in the middle of the window): congue risus feugiat ref264 tincidunt lorem nullam
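The context-window step can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the slide does not say how the window's topic is computed from the word-level distributions, so averaging them is an assumption, and the names `context_window`, `most_probable_topic` and `word_topics` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def context_window(tokens: Sequence[str], ref_index: int, k: int) -> List[str]:
    """Return up to k tokens on each side of the reference marker (marker excluded)."""
    left = tokens[max(0, ref_index - k):ref_index]
    right = tokens[ref_index + 1:ref_index + 1 + k]
    return list(left) + list(right)


def most_probable_topic(
    window: Sequence[str],
    word_topics: Dict[str, Sequence[float]],
    n_topics: int,
) -> Optional[Tuple[int, float]]:
    """Average the word-level topic distributions and return (topic id, probability).

    Averaging is an assumption made for this sketch; words missing from the
    model are skipped. Returns None if no word in the window is known.
    """
    totals = [0.0] * n_topics
    known = 0
    for word in window:
        dist = word_topics.get(word)
        if dist is None:
            continue
        known += 1
        for topic, prob in enumerate(dist):
            totals[topic] += prob
    if known == 0:
        return None
    probs = [total / known for total in totals]
    best = max(range(n_topics), key=probs.__getitem__)
    return best, probs[best]


# Toy two-topic model (hypothetical numbers, for illustration only).
toy_model = {
    "asthma": [0.9, 0.1],
    "airway": [0.8, 0.2],
    "lung": [0.5, 0.5],
    "ventilation": [0.1, 0.9],
}
toy_tokens = ["asthma", "airway", "ref15080825", "lung", "ventilation"]
toy_window = context_window(toy_tokens, ref_index=2, k=2)
toy_result = most_probable_topic(toy_window, toy_model, n_topics=2)
```

Varying `k` for the same reference yields the differently sized windows whose topic assignments can then be compared.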
  • 77. Asthma, a chronic respiratory condition affecting 300 million people globally ( aref15080825 ), causes inflammation of the lungs as well as structural and functional remodelling of the airways. It is characterised by recurrent attacks of breathlessness and wheezing with varying degrees of frequency and severity, which is caused by swelling of the bronchial tubes resulting in airflow limitation (WHO 2011). Although the causes of asthma are not completely understood, risk factors are known to include inhaling asthma triggers such as allergens, tobacco smoke and chemical irritants. Asthma is incurable and the prevalence is increasing, particularly in children and young adults ( aref22157151 ), however appropriate management can control the disorder and enable people to enjoy a high quality of life (WHO 2011).
      Source: https://doi.org/10.1002/14651858.CD001116.pub4
      Preprocessed context window: asthma a chronic respiratory condition affecting million people globally aref causes inflammation of the lungs as well as structural and functional remodelling of the airways
      Assigned topics:
      • Topic 346 (0.8149): asthma, copd, allergic, airway, disease, fev, ige, respiratory, lung, symptoms
      • Topic 78 (0.0689): pressure, lung, pulmonary, respiratory, gas, lungs, ventilation, volume, breathing, alveolar
  • 78. Topical coherence
      Using distance measures defined on spaces of probability distributions, such as the Bhattacharyya distance and the Hellinger distance, we measure the divergence between the topics assigned to the same reference in different contexts, as well as between the topics assigned to context windows of different sizes for a specific in-text citation.
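For reference, both distances can be computed from the Bhattacharyya coefficient BC(p, q) = Σ √(p_i q_i). The sketch below uses the standard textbook definitions for discrete distributions; the function names are hypothetical, and the inputs are assumed to be topic distributions of equal length that each sum to 1.

```python
import math
from typing import Sequence


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """BC(p, q) = sum_i sqrt(p_i * q_i); 1 for identical, 0 for disjoint distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B(p, q) = -ln BC(p, q); infinite when the distributions do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0.0 else -math.log(bc)


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """H(p, q) = sqrt(1 - BC(p, q)), bounded between 0 and 1."""
    bc = bhattacharyya_coefficient(p, q)
    # Clamp to guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))
```

The Hellinger distance is bounded and satisfies the triangle inequality, which makes it convenient when comparing many context windows; the Bhattacharyya distance is unbounded and does not.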
  • 79. Clinical guideline impact
      • Professional impact: one step closer to the implementation of research within the clinic
      • Case: references in context
           Generic method for academic citations
      In Data4Impact:
      1. Subject classification of the citing document based on the cited documents' MeSH terms
      2. Distinguishing between reference kinds in guideline documents
      3. Establishing the "topicality" of each reference based on a trained model of Europe PMC articles
  • 80. Architecture
      [Diagram labels: WP4, 500 topic models; WP5.4, 138 topic searches; H2020/FP7 project topics; human expert; web lists of diseases; manual selection; News, Blogs, Fora, Twitter; Mentions; Indicators]
      • Monthly releases
      • ~1.5M documents per release (news, blogs, fora); expected total size ~5M documents
      • ~10M tweets per release; total size ~30M tweets
      • 138 topics searched -> 1 dataset per topic
  • 81. Top-20 Twitter topics (n: ~31M tweets)
      [Bar chart: number of tweets per topic; axis 0 to 2,000,000]
      | Topic | Topic name | Num tweets |
      |---|---|---|
      | 433 | climate change | 9,949,906 |
      | 272 | vaccination | 1,760,780 |
      | 175 | measles and newborn screening | 1,457,110 |
      | 245 | stress disorders | 898,758 |
      | 209 | diabetes mellitus | 858,118 |
      | 294 | ADHD | 706,055 |
      | 315 | depression | 703,844 |
      | 348 | transplantation | 699,582 |
      | 121 | weight loss and obesity | 696,612 |
      | 319 | cardiovascular risk factors | 647,843 |
      | 254 | alzheimer disease | 637,668 |
      | 362 | cancer therapy | 570,636 |
      | 123 | eating disorders | 513,989 |
      | 240 | hypertension and blood pressure | 452,499 |
      | 302 | myocardium and heart failure | 445,434 |
      | 284 | breast cancer | 415,986 |
      | 366 | schizophrenia and bipolar disorder | 407,553 |
      | 344 | dendritic cells and immunity | 397,980 |
      | 169 | asthma | 383,321 |
      | 373 | env. exposure and air pollution | 381,212 |
  • 82. Topic fluctuation Jan-Feb
      [Chart: activity over time for six topics; y-axis 0 to 70,000]
      Topics: 3: anorexia, 123: bulimia, 175: measles, 254: Alzheimer, 272: vaccination, 362: cancer
  • 83. Virality
      Five prominent topics according to virality: the most retweeted tweet for each topic, together with its URL.
      | ID | Topic | Retweets | URL |
      |---|---|---|---|
      | 47 | lung cancer | 145,421 | https://t.co/nAtqnmKCqW |
      | 491 | acute lymphoblastic leukemia | 11,338 | https://t.co/zc4qFt6fy5 |
      | 433 | climate change | 47,547 | https://t.co/zxzAlorA3O |
      | 272 | vaccination | 11,923 | https://t.co/d6l8vfmBVW |
      | 348 | transplantation | 60,692 | https://t.co/FSmETQpSkm |
  • 84. Task 5.4.3 Twitter conversation analysis
      • Builds on other WP5.4 activities, but takes a somewhat different approach to collecting data.
           Focuses on relationships between social media posts (retweets, @tweets, #tweets)
           Possible to construct meaningful tests as "scripted dialogs"
           Helps weed out spam
           Amenable to content-based text analysis at the conversation level (e.g. sentiment analysis, topic modelling)
  • 85. Referring to research in thread
      First collected tweet in thread:
      - [tweet id='13441' replyto='14018'] Independent research has shown that individuals who were vaccinated for the flu had 5.5 times more respiratory illness than those who were not vaccinated. [/tweet]
      - (A number of replies omitted; thread length: 313)
      - [tweet id='216387' replyto='216418'] In the light of new info, why not? It happens all the time. [/tweet]
      - (Replies omitted, showing those with reference)
      - [tweet id='216302' replyto='216387'] which is??? DOI:10.1371/journal.pntd.0005179 [/tweet]
      - [tweet id='216261' replyto='216387'] 'Analysis of year 3 results of phase III trials of Dengvaxia suggest high rates of protection of vaccinated partial dengue immunes but high rates of hospitalizations during breakthrough dengue infections of persons who were vaccinated when seronegative...' DOI:10.1371/journal.pntd.0005179 [/tweet]
      -- [tweet id='216241' replyto='216387'] Phase III Trials, among our 9-year olds! FACT. DOI:10.1371/journal.pntd.0005179 [/tweet]
      --- [tweet id='215757' replyto='216241'] Phase 2 was all that is required for release Phase 3 was 'extra' 'Extra' studies are always done throughout the commercial lifetimes of drugs & vaccines Consequences of phase 3 results are nowhere near what group wud have us believe DOI:10.1371/journal.pntd.0005179 [/tweet]
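A reply thread like the one above can be rebuilt from the `id`/`replyto` pairs alone. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the reply graph is assumed to be a tree without cycles, and the pairs used are only the four replies under tweet 216387 that the slide shows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_children(reply_to: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a {tweet id: parent id} mapping into {parent id: [reply ids]}."""
    children: Dict[str, List[str]] = defaultdict(list)
    for tweet_id, parent_id in reply_to.items():
        children[parent_id].append(tweet_id)
    return children


def walk_thread(root: str, children: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Depth-first walk from root; returns (tweet id, depth) pairs, root at depth 0."""
    ordered: List[Tuple[str, int]] = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        ordered.append((node, depth))
        # Push replies in reverse so they are visited in their original order.
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1))
    return ordered


# id -> replyto pairs taken from the thread excerpt on the slide.
slide_replies = {
    "216302": "216387",
    "216261": "216387",
    "216241": "216387",
    "215757": "216241",
}
slide_thread = walk_thread("216387", build_children(slide_replies))
```

Each (id, depth) pair can then be joined with the tweet text to produce one conversation-level document for sentiment analysis or topic modelling.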
  • 86. Vaccination on Twitter Topic bursts, user behaviour and referring to research in discussions
  • 87. Topic burst
      • Identify a day when activity is more than 50% above the daily average
      • The burst extends up to the next day with activity below the average
      • This period is compared to previous and following periods of equal length
      • This example: a 4-day burst in topic 272 (vaccination)
      [Chart: activity for topics 3, 123, 175, 254, 272 and 362, 14 Jan to 10 Feb; y-axis 0 to 70,000]
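The burst rule above can be sketched in a few lines. This is an illustration rather than the project's code: it assumes a non-empty list of daily tweet counts for one topic, takes the average over the whole series, and treats the first below-average day as the end of the burst (not part of it), which the slide leaves open. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def find_burst(daily_counts: Sequence[int], factor: float = 1.5) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the first burst, with end exclusive.

    A burst starts on the first day whose count exceeds factor * daily average
    and runs until the next day whose count falls below the average.
    """
    avg = mean(daily_counts)
    for start, count in enumerate(daily_counts):
        if count > factor * avg:
            end = start + 1
            while end < len(daily_counts) and daily_counts[end] >= avg:
                end += 1
            return start, end
    return None


def comparison_periods(
    daily_counts: Sequence[int], start: int, end: int
) -> Tuple[List[int], List[int], List[int]]:
    """Slice the previous, burst and following periods of equal length.

    The previous or following period is shorter if the series runs out.
    """
    length = end - start
    previous = list(daily_counts[max(0, start - length):start])
    burst = list(daily_counts[start:end])
    following = list(daily_counts[end:end + length])
    return previous, burst, following


# Toy series (hypothetical counts, not project data).
toy_counts = [10, 10, 10, 40, 30, 20, 5, 10, 10, 10]
toy_burst = find_burst(toy_counts)
```

The three slices returned by `comparison_periods` correspond to the "previous", "burst" and "next" periods compared on the following slides.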
  • 88. RT networks (similar structures; the amount of RTs increases when activity is high)
      Word clouds based on hashtags (seemingly a topical shift during the burst)
      Retweet shares shown on the slide: 48% RTs, 55% RTs, 42.5% RTs
      User groups and their relative activity (share of tweets, %):
      | User group | Previous (144,869 tweets) | Burst (194,712) | Next (115,557) |
      |---|---|---|---|
      | Top 1% most active share (overall: 16%) | 12 | 12 | 19 |
      | Next 9% share (overall: 17%) | 20 | 18 | 18 |
      | 90% least active share (overall: 67%) | 68 | 70 | 63 |
      The least active user group is more prominent when general activity is high, while the most active user group is more prominent when activity is low.
  • 89. RT and coupled hashtag networks from the burst period.
      • "Deniers" (measles, vaxxed, mmr, autism, study, flu, hpv, informedconsent, vaxwoke, cdc, vaccineinjury, learntherisk, maga, gardasil, vaccineskill)
      • "Non-deniers 1" (measles, vaccineswork, flu, hpv, antivax, vaxfactsfebruary, vaccinessavelives, immunization, antivaxxers, mumps, rotavirus, ethiopia, law, ebola)
      • "Non-deniers 2" (measles, vaccineswork, publichealth, science, humanitariancrisis, scientificreport, antivax, vaccinessavelives, venezuela, crisis, humanitarianaid, help, antivaxxers, vaccinesaresafe, misinformation, scicomm, itrustvaccines, mmr, factsmatter)
  • 90. 9,647 plain-text biographies from Twitter profiles classified using a rule-based method; 30% matched as professionals:
      [Pie chart: Academic 27%; Academically trained 11%; Other professional 23%; Media 38%; Policy/decision maker 1%]
      | Class | Keyword example |
      |---|---|
      | Science student | student, studying |
      | Graduated | MS, MA, graduate |
      | University faculty | lectur, prof., professor |
      | Other scientist | technician, lab manager, -ologist |
      | Education and outreach | curator, teacher, librarian |
      | Applied science organization | nonprofit, philantropy |
      | Other professional | recruiter, entrepreneur, manager |
      | Media professional | journalis, publisher |
      | Policy/decision maker | congressman, senator, parliament |
      Ekström, B. (2019): Developing a rule-based method for identifying researchers on Twitter: The case of vaccine discussions. Poster accepted to ISSI, 17th International Society of Scientometrics and Informetrics Conference, Rome, 2-5 September.
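A keyword-based classifier of this kind might look like the sketch below. Only the class names and example keywords come from the slide; the regular expressions, the case-sensitive handling of "MS" and "MA", the first-match-wins rule order and the function name `classify_bio` are all assumptions made for illustration, not the method described in the poster.

```python
import re
from typing import List, Optional, Pattern, Tuple

# (class, patterns). Order is an assumption: more specific classes come first,
# so "lab manager" is caught before the generic "manager".
_RULES_SOURCE: List[Tuple[str, List[str]]] = [
    ("University faculty", [r"(?i)\blectur", r"(?i)\bprof\.", r"(?i)\bprofessor\b"]),
    ("Science student", [r"(?i)\bstudent\b", r"(?i)\bstudying\b"]),
    # "MS" and "MA" are matched case-sensitively to avoid hits on "Ms" or "ma".
    ("Graduated", [r"\bMS\b", r"\bMA\b", r"(?i)\bgraduate\b"]),
    ("Other scientist", [r"(?i)\btechnician\b", r"(?i)\blab manager\b", r"(?i)ologist\b"]),
    ("Education and outreach", [r"(?i)\bcurator\b", r"(?i)\bteacher\b", r"(?i)\blibrarian\b"]),
    ("Applied science organization", [r"(?i)\bnonprofit\b", r"(?i)\bphilantropy\b"]),
    ("Media professional", [r"(?i)\bjournalis", r"(?i)\bpublisher\b"]),
    ("Policy/decision maker", [r"(?i)\bcongressman\b", r"(?i)\bsenator\b", r"(?i)\bparliament"]),
    ("Other professional", [r"(?i)\brecruiter\b", r"(?i)\bentrepreneur\b", r"(?i)\bmanager\b"]),
]

RULES: List[Tuple[str, List[Pattern[str]]]] = [
    (label, [re.compile(pattern) for pattern in patterns])
    for label, patterns in _RULES_SOURCE
]


def classify_bio(bio: str) -> Optional[str]:
    """Return the first class whose keyword pattern matches the bio, else None."""
    for label, patterns in RULES:
        if any(pattern.search(bio) for pattern in patterns):
            return label
    return None
```

Bios that match no rule return `None`, which corresponds to the profiles on the slide that were not matched as professionals.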
  • 91. How can we use Twitter-bio personas? - Retweet data
  • 92. How can we use Twitter-bio personas? Conversation data ?
  • 95. Data4Impact has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 770531. Thank you for your attention! If you would like to be notified when the online monitoring platform is launched, email us at: sonata@ppmi.lt Visit our website: www.data4impact.eu Follow us on Twitter and SlideShare: @Data4Impact