1. |
From Maslow’s Hierarchy to Knowledge Graphs: Experiments in Big and Small Data at Elsevier
Anita de Waard, a.dewaard@elsevier.com
VP Research Data Management, Elsevier
Charleston Conference, November 4, 2016
2. |
Big Data vs. Small Data: What Will I Be Talking About?
Data type   | Small                          | Big
User        | UX                             | User analytics
Performance | Pure                           | SciVal
Research    | Research Data Management (RDM) | HPC systems (HEP, astronomy, etc.)
Text        | Text mining                    | Knowledge graphs
Health      | Medical systems                | Precision medicine
Legend (slide color coding not preserved): "Elsevier does" / "I will talk about"
3. |
When You Leave Your Institution, What Happens To Your Data?
[Chart of survey responses: Stays at institution; Take it with me; Don't know; Data is lost; Other]
Source: Bauer, B. (Bruno) et al. (2015), ‘Forschende und ihre Daten. Ergebnisse einer österreichweiten Befragung (eBook)’ [Researchers and Their Data: Results of an Austria-wide Survey] (in German), E-infrastructures Austria, https://phaidra.univie.ac.at/detail_object/o:407736
4. |
When we talk about data, we are really talking about all of the following:
• Machine and environment settings
• Raw data
• Processed data
• Scripts and analyses
• Protocols, methods, algorithms
Goals: Accessibility, Reproducibility, Reusability, Discoverability
Note: images for illustrative purposes only
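One way to make this concrete is to treat "the data" of a study as a package that bundles all five components, so that a missing piece (for example, the scripts) is visible before the data is shared. The sketch below is a hypothetical illustration; the class and field names are assumptions, not an Elsevier or repository schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ResearchDataPackage:
    """Hypothetical container for everything that makes up 'the data' of a study."""
    environment: dict = field(default_factory=dict)     # machine & environment settings
    raw_data: list = field(default_factory=list)        # identifiers of raw files
    processed_data: list = field(default_factory=list)  # identifiers of derived files
    scripts: list = field(default_factory=list)         # scripts & analyses
    protocols: list = field(default_factory=list)       # protocols, methods, algorithms

    def missing_components(self):
        """Names of components that are still empty, in field order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Usage: raw data alone is not enough to reproduce or reuse a result.
pkg = ResearchDataPackage(raw_data=["spectra_run1.csv"])
print(pkg.missing_components())
# ['environment', 'processed_data', 'scripts', 'protocols']
```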
9. |
Access: Linking papers to data: www.Scholix.org
• ICSU/WDS/RDA Publishing Data Service Working Group
• Creating a linked-data model for exposing DOI-to-DOI links outside publishers’ firewalls
• Merged with a National Data Service pilot with the same goal
• Collaboration between CrossRef, DataCite, Europe PubMed Central, ANDS, Thomson Reuters, Elsevier, OpenAIRE
Objective: move from a plethora of (mostly) bilateral arrangements between the different players to a one-for-all cross-referencing service for articles and data.
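The shift from many bilateral arrangements to one shared service can be sketched as a hub that ingests DOI-to-DOI link assertions from any provider, indexes them from both ends, and merges duplicates asserted by different parties. This is a minimal sketch, not the Scholix schema or API: the relationship names are only loosely modeled on Scholix relationship types, and the class names and DOIs are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relationship names loosely modeled on Scholix relationship types; illustrative only.
INVERSE = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "IsRelatedTo": "IsRelatedTo",
}


@dataclass(frozen=True)
class Link:
    source: str    # DOI of the source object (article or dataset)
    target: str    # DOI of the target object
    relation: str  # one of the keys of INVERSE
    provider: str  # who asserted the link (publisher, data center, ...)


class LinkHub:
    """Hypothetical one-for-all hub aggregating DOI-to-DOI links from many providers."""

    def __init__(self):
        self._edges = defaultdict(set)      # DOI -> {(relation, other DOI)}
        self._providers = defaultdict(set)  # unordered DOI pair -> {provider}

    def ingest(self, link):
        # Index the link from both ends so either DOI can be used as the query.
        self._edges[link.source].add((link.relation, link.target))
        self._edges[link.target].add((INVERSE[link.relation], link.source))
        pair = tuple(sorted((link.source, link.target)))
        self._providers[pair].add(link.provider)

    def links_for(self, doi):
        return sorted(self._edges.get(doi, set()))

    def providers_for(self, doi_a, doi_b):
        return sorted(self._providers.get(tuple(sorted((doi_a, doi_b))), set()))


# Usage: a publisher and a data repository assert the same link from opposite ends.
hub = LinkHub()
hub.ingest(Link("10.1234/article.1", "10.5678/dataset.9", "IsSupplementedBy", "Publisher A"))
hub.ingest(Link("10.5678/dataset.9", "10.1234/article.1", "IsSupplementTo", "Repository B"))
print(hub.links_for("10.1234/article.1"))  # [('IsSupplementedBy', '10.5678/dataset.9')]
print(hub.providers_for("10.1234/article.1", "10.5678/dataset.9"))  # ['Publisher A', 'Repository B']
```

Because both assertions describe the same relationship, the hub stores one link and records both providers, which is the practical benefit over each pair of players maintaining its own list.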
10. |
Discover: Data Search (http://datasearch.elsevier.com)
1. Across repositories
2. (Deep) indexing of the data itself, not just its metadata
3. Data preview
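The difference between metadata-only and deep indexing across repositories can be illustrated with a toy inverted index: one index spans several repositories, and each dataset can be indexed on its metadata only or on its data cells as well (including column headers). This is a simplified sketch under stated assumptions; the class, method, and repository names are hypothetical and do not describe how DataSearch is implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase alphanumeric tokens; underscores and punctuation act as separators.
    return re.findall(r"[a-z0-9]+", text.lower())


class DataIndex:
    """Toy cross-repository index over metadata and, optionally, the data values themselves."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> {(repository, dataset_id)}
        self.tables = {}                  # (repository, dataset_id) -> rows, kept for preview

    def add(self, repository, dataset_id, metadata, rows, deep=True):
        key = (repository, dataset_id)
        self.tables[key] = rows
        texts = [metadata]
        if deep:
            # "Deep" indexing: also index every cell, including column headers.
            texts += [str(cell) for row in rows for cell in row]
        for text in texts:
            for token in tokenize(text):
                self.postings[token].add(key)

    def search(self, query):
        tokens = tokenize(query)
        if not tokens:
            return []
        # A dataset matches only if it contains every query token (AND semantics).
        return sorted(set.intersection(*(self.postings.get(t, set()) for t in tokens)))

    def preview(self, key, n=2):
        # Data preview: the first n rows of the matched dataset.
        return self.tables[key][:n]


index = DataIndex()
index.add("RepoA", "ds1", "Retinal imaging study", [["patient", "intraocular_pressure"], ["p1", 24]])
index.add("RepoB", "ds7", "Glaucoma cohort", [["age", "iop_mmhg"], [63, 22]], deep=False)
print(index.search("intraocular pressure"))  # [('RepoA', 'ds1')], found via a column header
print(index.search("age"))                   # [], RepoB was indexed on metadata only
print(index.preview(("RepoA", "ds1")))
```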
13. |
SAMPLE OUTPUT (after deduplication/normalization; downsampled from 49M entity-resolved triples):
glaucoma developed many years after chronic inflammation of uveal tract
glaucoma develop following chronic inflammation of uveal tract
glaucoma can appear soon in family history of glaucoma
glaucoma can appear soon in age over 40
glaucoma the risk of functional visual field loss
glaucoma contributing causes of functional visual field loss
glaucoma contributed to functional visual field loss
glaucoma is considered the second leading cause of functional visual field loss
glaucoma remains the second leading cause of functional visual field loss
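The deduplication/normalization step can be sketched as mapping each extracted (subject, predicate, object) triple to a canonical form and counting how many raw extractions support each canonical triple. This is a deliberately small stand-in: the auxiliary-word and lemma tables are hypothetical, the entity step only case-folds and collapses whitespace (real entity resolution would map mentions to ontology concepts), and the input list is invented toy data adapted from the sample output above.

```python
from collections import Counter

# Hypothetical normalization tables; a production pipeline would use a real
# lemmatizer and an ontology-backed entity resolver instead.
AUXILIARIES = {"is", "are", "was", "can", "considered", "remains"}
LEMMAS = {"developed": "develop", "contributed": "contribute"}


def normalize_entity(text):
    # Case-fold and collapse whitespace so trivially different mentions match.
    return " ".join(text.lower().split())


def normalize_predicate(text):
    # Drop auxiliary/hedging words and map a few inflected verbs to their lemma.
    tokens = [LEMMAS.get(t, t) for t in text.lower().split() if t not in AUXILIARIES]
    return " ".join(tokens)


def deduplicate(triples):
    """Return (normalized triple, support count) pairs, most frequent first."""
    counts = Counter(
        (normalize_entity(s), normalize_predicate(p), normalize_entity(o))
        for s, p, o in triples
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


raw = [  # toy input
    ("glaucoma", "is considered the second leading cause of", "functional visual field loss"),
    ("glaucoma", "remains the second leading cause of", "functional visual field loss"),
    ("Glaucoma", "contributed to", "functional visual field loss"),
    ("glaucoma", "contributed  to", "Functional visual field loss"),
    ("glaucoma", "developed many years after", "chronic inflammation of uveal tract"),
]
for triple, support in deduplicate(raw):
    print(support, triple)
```

Five raw extractions collapse to three canonical triples; the support count is one simple signal for deciding which triples to keep when downsampling.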
14. |
Knowledge Graphs for the Life Sciences:
Bradley Allen, DC Conference, Oct 2016,
http://www.slideshare.net/bpa777/dc2016-keynote-20161013-67164305/15
15. |
Trends driving Digital Health & Precision Medicine: need for health data with consent
1. Genetic testing
• 4,500 tests for gene disorders available (2013: 3,200; +20% CAGR)
• $1,245 cost to sequence a full genome (10/2014: $5,730)
• $199 cost of a 23andMe test
2. Information explosion
• 25 million biomedical articles referenced in PubMed
• 1.2 million new biomedical articles per year
3. Patient data
• 76% of US hospitals use at least a basic EMR
• 130 million patient data sets at a large insurer: 21 million complete for the last 2 years, 7 million with clinical and lab data
• NB: 6 million (no clinical or lab data) in Germany
• 6.5 million in Catalonia
4. Biosensors (IoT in health)
• 105 mm ECG: high ECG quality, heart rate, respiratory, body temperature, activity, body position; watertight, induction charged, Bluetooth, continuous data feed
5. Machine learning
• 30 days → 1 hour: time needed to develop one prediction model at Elsevier, moving from manual work to machine learning
6. Patient empowerment
• PatientsLikeMe has 400,000+ members donating data: 31 million data points covering 2,500+ conditions
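Several of these figures are growth rates and price changes, and a few lines of arithmetic show how they relate. The sketch below is only a consistency check: the slide does not state which year the 4,500 figure refers to, so two years after 2013 is an assumption made purely for illustration, and the function names are hypothetical.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by going from start to end over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def project(start, rate, years):
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1.0 + rate) ** years


def relative_change(old, new):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


# Assumption: the 4,500 figure is taken to be two years after 2013 (not stated on the slide).
print(round(project(3200, 0.20, 2)))          # 4608: 3,200 tests growing at 20% a year
print(round(cagr(3200, 4500, 2), 3))          # 0.186: growth rate implied by 3,200 -> 4,500
print(round(relative_change(5730, 1245), 2))  # -0.78: drop in full-genome sequencing cost
```

Under that assumption, 3,200 tests growing at +20% a year gives roughly 4,600 after two years, in line with the 4,500 on the slide, and the sequencing price fell by about 78% relative to 10/2014.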
16. |
The Elsevier Medical Graph is a deep predictive model that relates attributes of over 2,000 medical conditions to phenotypes of patients at potential risk of re-admission. It gives the probability of occurrence within the next five years for 2,083 ICD-10 conditions, based on the 6-year longitudinal history of 6 million German patients.
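The output described here is multi-label: one probability per condition code, each conditioned on a patient's features. The sketch below shows only that output structure, using a shallow multi-label logistic regression trained by plain gradient descent as a stand-in; it is not the Medical Graph's deep architecture. The two patient features and the training rows are hypothetical toy data; H40 (glaucoma) and I10 (essential hypertension) are real ICD-10 codes used as example labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiLabelRiskModel:
    """One logistic output per condition code: P(condition within horizon | features)."""

    def __init__(self, n_features, codes):
        self.codes = list(codes)
        self.weights = {c: [0.0] * n_features for c in self.codes}
        self.bias = {c: 0.0 for c in self.codes}

    def predict(self, x):
        return {
            c: sigmoid(self.bias[c] + sum(w * xi for w, xi in zip(self.weights[c], x)))
            for c in self.codes
        }

    def fit(self, X, Y, lr=0.5, epochs=200):
        # Y holds, per patient, the set of codes observed within the prediction horizon.
        for _ in range(epochs):
            for x, observed in zip(X, Y):
                probs = self.predict(x)
                for c in self.codes:
                    # Gradient of the log-loss with respect to the logit of code c.
                    err = probs[c] - (1.0 if c in observed else 0.0)
                    self.bias[c] -= lr * err
                    self.weights[c] = [w - lr * err * xi for w, xi in zip(self.weights[c], x)]
        return self


# Hypothetical features: [age_over_40, family_history_of_glaucoma]
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
Y = [{"H40", "I10"}, {"I10"}, {"H40"}, set()]
model = MultiLabelRiskModel(n_features=2, codes=["H40", "I10"]).fit(X, Y)
print(model.predict([0, 1]))  # high H40 probability, low I10 probability
```

A deep model replaces the single linear layer with learned intermediate representations, but it keeps the same shape of output: an independent probability for each condition code.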
17. |
Big Data vs. Small Data: What Did I Talk About?
Data type   | Small                          | Big
User        | UX                             | User analytics
Performance | Pure                           | SciVal
Research    | Research Data Management (RDM) | HPC systems (HEP, astronomy, etc.)
Text        | Text mining                    | Knowledge graphs
Health      | Medical systems                | Precision medicine
Legend (slide color coding not preserved): "Elsevier does" / "I discussed!"
18. |
Thank you!
Anita de Waard, VP Research Data Collaborations,
Elsevier RDM Services
Jericho, VT 05465
a.dewaard@elsevier.com