Data management
what's the use for Research?
dirk.roorda@dans.knaw.nl
2016-04-21 Den Haag
a case study based on the Hebrew Bible
Scholarly letters 16th century
Welcome to
SHEBANQ
Wido van Peursen, leader of ETCBC. Initiator and strategic leader.
Oliver Glanz, Andrews University. ETCBC data expert, contributing numerous queries for teaching.
Dirk Roorda, DANS. Author of most of the code.
Eep Talstra, founder of ETCBC. Still computing (Pascal): participant data in the making.
Constantijn Sikkel, data designer for ETCBC. Inventor of efficient data creation workflows.
Janet Dyk, linguist at ETCBC. Long-time data contributor, specialized in verbal valence and language variation.
Reinoud Oosting, data designer for Leiden University. Contributed ETCBC data, now key user.
Ulrik Sandborg-Petersen, creator of Emdros. Without it, SHEBANQ would not exist!
Henk van den Berg, DANS. Programmed the first versions.
Heleen van de Schraaf, then DANS. Programmed the first user interface.
SHEBANQ relies on data and tools created by contributors in the past
User Guide
System for HEBrew Text: ANnotations for Queries and Markup
funded by CLARIN-NL, The Language Archive
Text. What is it?
bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
Genesis 1:1
In the beginning God created the heavens and the earth.
A string of words ...
bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
... separated by spaces?
bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
A string of letters ...
... in which alefbet?
phonetic
bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
hebrew with vowels and accents
בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
hebrew consonantal
בראשית ברא אלהים את השמים ואת הארץ׃
etcbc transcription (full)
B.:- R;>CI73JT B.@R@74> >:ELOHI92JM >;71T HA- C.@MA73JIM W:- >;71T H@- >@75REY00
etcbc transcription (consonantal)
B R>CJT BR> >LHJM >T H CMJM W >T H >RY
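The relation between the pointed and the consonantal Hebrew representation can be illustrated in Python. This is a minimal sketch, assuming that "consonantal" means dropping every Unicode combining mark (category Mn), which covers vowel points, accents, dagesh and shin/sin dots. The function name `strip_marks` is hypothetical, and this is not necessarily how the ETCBC data itself was produced.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop all combining marks (Unicode category Mn) from text."""
    # Decompose first, so that precomposed presentation forms split
    # into base letter + mark before the marks are filtered out.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# First words of Genesis 1:1 with vowels and accents.
pointed = "בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים"
print(strip_marks(pointed))
```

Punctuation such as the sof pasuq (׃) is not a combining mark, so it survives the stripping.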
1. The Text itself (representations)
2. Linguistics (feature structures)
3. "Manual" (really manual or software-generated)
4. Queries (exegetical search)
layers of annotation
words with highlighted occurrences
queries with highlighted hits
click name to toggle preview of query
click author to go to query and all hits
click entry to go to word and all occurrences
click gloss to toggle preview of word
click any word to toggle its highlight
Context items for this chapter
enlarge preview of query in a pop-up
Data and tradition
text + linguistics => data
+ research =>
Wido van Peursen
What do we find?
wivu wivu hebrew
Observations?
The first hits are from archives, infrastructures
Researchers and their institutes follow later
The hits are mainly books, i.e. publications
What's missing?
metadata: descriptions, manuals, code books
analyses: what use have other researchers made of this data?
instruments: tools to handle this kind of data
the very data!
Explanations?
These researchers started before the internet
they have developed a sophisticated data workflow in their institute
the ETCBC has grown a thick cell membrane
research data cycle ?
religious communities, theol. scholars, enlightened lay people
research data cycle !
religious communities, theol. scholars, enlightened lay people, linguists, comp. hum
Research Data Archiving
DANS, CLARIN, SHEBANQ, LAF-Fabric
More visibility
step 1: website (2008)
wivu.dans.knaw.nl
step 2: demo (2012)
step 3: deposit (2012)
what has been deposited?
step 4: project (2013)
SHEBANQ
System for Hebrew Text: ANnotations for Queries
CLARIN-NL project
	data curation: LAF
	demonstrator: query saver
#!/etc bc
LAF? Yes, ISO
Linguistic Annotation Framework
ISO 24612:2012
Nancy Ide, Laurent Romary
This is LAF
<node xml:id="n_88917">
<link targets="r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11"/>
</node>
<edge xml:id="e1" from="n88917" to="n84383"/>
<a xml:id="ae1" label="parents" ref="e1" as="link"/>
<region xml:id="r_2" anchors="6 23"/>
<node xml:id="n_3"><link targets="r_2"/></node>
<a xml:id="a_3" label="word" ref="n_3" as="monads"/>
labeled edges
nodes
annotations (features)
annotations (empty)
primary data
regions
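How regions, nodes and annotations hang together can be sketched with Python's standard library. The fragment below is a toy, not the real ETCBC data: the primary text, the anchor values, the `<graph>` wrapper and the helper `text_of` are all assumptions made for illustration, and anchors are taken to be start and end character offsets into the primary data. Real LAF files also declare an XML namespace on their elements, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical primary data: a few words in consonantal transcription.
primary = "B R>CJT BR> >LHJM"

# Toy stand-off annotation modelled on the slide; anchors are invented.
laf = """<graph>
  <region xml:id="r_2" anchors="2 7"/>
  <node xml:id="n_3"><link targets="r_2"/></node>
  <a xml:id="a_3" label="word" ref="n_3" as="monads"/>
</graph>"""

root = ET.fromstring(laf)

# region id -> (start, end) offsets into the primary data
regions = {
    r.get(XML_ID): tuple(int(x) for x in r.get("anchors").split())
    for r in root.iter("region")
}
# node id -> list of region ids the node is linked to
node_regions = {
    n.get(XML_ID): n.find("link").get("targets").split()
    for n in root.iter("node")
}


def text_of(node_id: str) -> str:
    """Return the primary text covered by the regions of a node."""
    spans = (regions[r] for r in node_regions[node_id])
    return " ".join(primary[start:end] for start, end in spans)


# Each annotation labels a node; follow node -> regions -> text.
for a in root.iter("a"):
    print(a.get("label"), a.get("ref"), text_of(a.get("ref")))
```

The point of the stand-off design is visible here: the primary text is never modified, and every layer refers to it only through offsets.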
lexeme_utf8=ראשׁית
surface_consonants_utf8=ראשׁית
בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
0-56-2392 72-91 r9 r10 r11
n2 n3
word
sentence
phrase
determination=determined
phrase_function=Objc
phrase_type=PP
parents
mother
subphrase
clause
r11 r10 r9
clause_atom_number=1
clause_atom_relation=0
clause_atom_type=xQtl
indentation=0
<a xml:id="af22" label="ft" ref="n3" as="utf8"><fs>
<f name="lexeme_utf8" value="ראשׁית"/>
<f name="surface_consonants_utf8" value="ראשׁית"/>
</fs></a>
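Reading such a feature structure takes only a few lines of Python. This is a sketch under the assumption that each `<f>` element carries exactly a `name` and a `value` attribute, as in the fragment above; the helper name `features_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The annotation from the slide, with the Hebrew value written out.
annotation = """<a xml:id="af22" label="ft" ref="n3" as="utf8"><fs>
<f name="lexeme_utf8" value="ראשׁית"/>
<f name="surface_consonants_utf8" value="ראשׁית"/>
</fs></a>"""


def features_of(a_elem):
    """Collect the name/value pairs of all <f> elements in an annotation."""
    return {f.get("name"): f.get("value") for f in a_elem.iter("f")}


a = ET.fromstring(annotation)
features = features_of(a)

# The annotation attaches these features to the node named in `ref`.
print(a.get("ref"), sorted(features))
```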
link to
regions
Linguistic Annotation Framework
IPython notebook
excursion: data and tools
data is not available separately
there is always the need for a tool: software
inspect
transport
transform
data science at the command line
http://datascienceatthecommandline.com
http://datasciencetoolbox.org
The Data Science Toolbox is a virtual environment based on Ubuntu Linux that is specifically suited for doing data science. Its purpose is to get you started in a matter of minutes. You can run the Data Science Toolbox either locally (using VirtualBox and Vagrant) or in the cloud (using Amazon Web Services).
inspect
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > ls
etcbc4.hdr etcbc4_lingo.c.xml etcbc4_lingo.sp.xml etcbc4_regions.xml
etcbc4.lst etcbc4_lingo.p.xml etcbc4_lingo.xml etcbc4_sections.xml
etcbc4.txt etcbc4_lingo.pa.xml etcbc4_monads.lex.xml
etcbc4.txt.hdr etcbc4_lingo.s.xml etcbc4_monads.xml
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > du -h .
1.5G .
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > fgrep -l 'BR&gt;' *.xml
etcbc4_monads.lex.xml
BR> = ברא = to make
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > fgrep -c 'BR&gt;' etcbc4_monads.lex.xml
113
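The same inspection can be done from Python when the result has to feed into further processing. The sketch below mimics `fgrep -c`: it counts lines that contain a fixed string, reading the file line by line so that a file of several hundred megabytes does not have to fit in memory. The function name, the sample file and the feature name `lex` in it are made up for illustration.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count the lines of a file that contain needle (like fgrep -c)."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # one line at a time, never the whole file
            if needle in line:
                count += 1
    return count


# Small stand-in for an XML file; note that > is escaped as &gt; in XML,
# which is why the search string is 'BR&gt;' and not 'BR>'.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('<f name="lex" value="BR&gt;"/>\n')
    tmp.write('<f name="lex" value="&gt;LHJM"/>\n')
    tmp.write('<f name="lex" value="BR&gt;"/>\n')
    sample_path = tmp.name

sample_count = count_matching_lines(sample_path, "BR&gt;")
print(sample_count)
os.remove(sample_path)
```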
inspect
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > ls -lh *.txt
-rw-r--r-- 1 dirk staff 5.1M Jul 23 10:58 etcbc4.txt
inspect (xml)
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > ls -lh *.xml
-rw-r--r-- 1 dirk staff 104M Jul 23 11:00 etcbc4_lingo.c.xml
-rw-r--r-- 1 dirk staff 107M Jul 23 11:00 etcbc4_lingo.p.xml
-rw-r--r-- 1 dirk staff 148M Jul 23 11:00 etcbc4_lingo.pa.xml
-rw-r--r-- 1 dirk staff 22M Jul 23 11:00 etcbc4_lingo.s.xml
-rw-r--r-- 1 dirk staff 23M Jul 23 11:00 etcbc4_lingo.sp.xml
-rw-r--r-- 1 dirk staff 299M Jul 23 11:00 etcbc4_lingo.xml
-rw-r--r-- 1 dirk staff 642M Jul 23 10:58 etcbc4_monads.lex.xml
-rw-r--r-- 1 dirk staff 125M Jul 23 10:58 etcbc4_monads.xml
-rw-r--r-- 1 dirk staff 37M Jul 23 10:58 etcbc4_regions.xml
-rw-r--r-- 1 dirk staff 36M Jul 23 10:58 etcbc4_sections.xml
dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > time xmllint --nonet --noout --stream --schema /Users/dirk/Dropbox/laf-fabric-data/etcbc4/decl/graf-standoff.xsd etcbc4_monads.lex.xml
etcbc4_monads.lex.xml validates
real 2m26.029s
user 2m20.308s
sys 0m2.376s
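A streaming pass over a large XML file is also possible in plain Python. Note that this sketch does less than the `xmllint` call above: the standard library cannot validate against an XSD schema, so it only checks well-formedness and counts elements per tag. The function name `count_elements` and the in-memory sample are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Stream through XML and count elements per tag.

    Raises ET.ParseError if the input is not well-formed.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Drop the element's text and children once it is complete;
        # the emptied element itself stays attached to its parent.
        elem.clear()
    return counts


# In-memory stand-in for a file; a path or open file works the same way.
sample = io.BytesIO(
    b'<graph><node xml:id="n_3"><link targets="r_2"/></node>'
    b'<region xml:id="r_2" anchors="6 23"/></graph>'
)
counts = count_elements(sample)
print(counts)
```

For real schema validation from Python, a third-party library would be needed; the command line tool remains the shortest path.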
inspect (xml)
.hdr => .xml
transport
transform
The shortest path to having the computer work for me
scripting                  programming
shell, python              C, C++, Java
scientific programming     software engineering
software as instrument     applications as product
hourly cycle               weekly cycle
by and for researchers     by ICT dev for researcher
what do scholars want
they are not software developers
but they do program
they explore data, knead, massage
their products are not software
but analyses, visualizations, publications
scientific computing
more than (i)Python
more than an interface
more than an ecosystem
a culture:
culture
fragments from a
video of Fernando Perez
4:19 researchers and computing - 7:37
17:00 tools and the data life cycle - 20:26
42:09 data and publishing - 44:20 / 49:22
trees for Data Oriented Parsing
step 6: harvest (2014-2015)
Rens Bod:
ling/dighum
Data Oriented Parsing
Bible Online Learner
Nicolai Winther-Nielsen
EuroPlot, University of Aalborg
Martijn Naaijer
Linguistic Variation:
statistics with R
step 7: better versions (2013-2015)
step 8: website SHEBANQ (2013)
hey, Martijn is after something!
inform your followers with 1 click
just browsing Genesis 4
step 9: mature SHEBANQ (2015-2016)
Queries
visuals
step 10: more (2016-2020)
more projects (digging into data?)
more disciplines (linguistics, data science, archaeology)
more data sources (syriac, qumran)
more users
> 250 people
systems (Bible Online Learner, Tiberias)
institutes (VU University, Andrews University, Aalborg University)
more output (articles, derived data)
more training (workshops, master students, Ph.D students)
better position in the competition
turn-turn-turn
religious communities, theol. scholars, enlightened lay people, linguists, comp. hum
Research Data Archiving
DANS
research environment
function     medium                                infra
data         LAF in dataset                        DANS EASY
web site     web2py                                DANS=>KNAW, Leaseweb, Cloud
tools        LAF-Fabric, Shebanq, Emdros           Github, Sourceforge
publishing   IPython notebooks, Restructured Text  Github, Readthedocs
products     apps, notebooks, articles             Github, Science Clouds, Journals
is this a success story?
there is certainly a degree of success ...
it took 6 years to get a feeling of acceleration
grab opportunities eagerly
persuade liberally
embrace technology
and combine it with affinity with sources and scholarship
make up-front investments (time, relationships)
why is it not going faster?
the team is efficiently organised already
new ways of working have not proved themselves yet
technical support is a rare and expensive commodity for small teams in the humanities
contributing factors
staff changes
new projects
new requirements from funders (open access)
competition and collaboration across disciplines
the digital world is increasingly penetrating people's lives
Data management
useful for researchers? ...
yes, if they realize the importance of re-use
yes, if they find the path to archiving
yes, if archives go out of their way to be relevant for researchers
yes, if archives use ICT proactively
none of these are straightforward
dirk.roorda@dans.knaw.nl

More Related Content

What's hot

DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTHerbert Van de Sompel
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for LibrariesLukas Koster
 
TPDL2013 tutorial linked data for digital libraries 2013-10-22
TPDL2013 tutorial linked data for digital libraries 2013-10-22TPDL2013 tutorial linked data for digital libraries 2013-10-22
TPDL2013 tutorial linked data for digital libraries 2013-10-22jodischneider
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemHerbert Van de Sompel
 
Publishing and Using Linked Open Data - Day 1
Publishing and Using Linked Open Data - Day 1 Publishing and Using Linked Open Data - Day 1
Publishing and Using Linked Open Data - Day 1 Richard Urban
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic WebRoberto García
 
Open hpi semweb-06-part5
Open hpi semweb-06-part5Open hpi semweb-06-part5
Open hpi semweb-06-part5Nadine Ludwig
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesChristophe Guéret
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Andre Freitas
 
Linking American Art to the Cloud
Linking American Art to the CloudLinking American Art to the Cloud
Linking American Art to the CloudGeorgina Goodlander
 
The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)Oscar Corcho
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueHerbert Van de Sompel
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationHerbert Van de Sompel
 
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware ac.uk
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science dataARDC
 

What's hot (20)

DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
TPDL2013 tutorial linked data for digital libraries 2013-10-22
TPDL2013 tutorial linked data for digital libraries 2013-10-22TPDL2013 tutorial linked data for digital libraries 2013-10-22
TPDL2013 tutorial linked data for digital libraries 2013-10-22
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
 
Publishing and Using Linked Open Data - Day 1
Publishing and Using Linked Open Data - Day 1 Publishing and Using Linked Open Data - Day 1
Publishing and Using Linked Open Data - Day 1
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Open hpi semweb-06-part5
Open hpi semweb-06-part5Open hpi semweb-06-part5
Open hpi semweb-06-part5
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & Museums
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Linking American Art to the Cloud
Linking American Art to the CloudLinking American Art to the Cloud
Linking American Art to the Cloud
 
The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning Issue
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustration
 
Semantic Web
Semantic WebSemantic Web
Semantic Web
 
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
 
Memento 101
Memento 101Memento 101
Memento 101
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science data
 

Viewers also liked

Research data management
Research data managementResearch data management
Research data managementHugo Besemer
 
Data management in Stata
Data management in StataData management in Stata
Data management in Stataizahn
 
Resumen tema 3 gestión
Resumen tema 3 gestiónResumen tema 3 gestión
Resumen tema 3 gestiónnoeliags16
 
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...Rhapsody Technologies, Inc.
 
Selecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachSelecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachChristopher Bradley
 
Data Management Strategies
Data Management StrategiesData Management Strategies
Data Management StrategiesMicheal Axelsen
 
An Overview of Data Management Paradigms: Relational, Document, and Graph
An Overview of Data Management Paradigms: Relational, Document, and GraphAn Overview of Data Management Paradigms: Relational, Document, and Graph
An Overview of Data Management Paradigms: Relational, Document, and GraphMarko Rodriguez
 
Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...Jisc RDM
 
3 Keys To Successful Master Data Management - Final Presentation
3 Keys To Successful Master Data Management - Final Presentation3 Keys To Successful Master Data Management - Final Presentation
3 Keys To Successful Master Data Management - Final PresentationJames Chi
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 
How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...Christopher Bradley
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity ModelsAlan McSweeney
 
Compliance: Data Management Plans and Public Access to Data
Compliance: Data Management Plans and Public Access to DataCompliance: Data Management Plans and Public Access to Data
Compliance: Data Management Plans and Public Access to DataMargaret Henderson
 
10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Managementibi
 
Open Science: Research Data Management
Open Science: Research Data ManagementOpen Science: Research Data Management
Open Science: Research Data ManagementLibrary_Connect
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data ManagementSung Kuan
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016 Rebecca Raworth, MLIS
 
Top 10 clinical data manager interview questions and answers
Top 10 clinical data manager interview questions and answersTop 10 clinical data manager interview questions and answers
Top 10 clinical data manager interview questions and answerscadderlux
 
Emerging Trends in Clinical Data Management
Emerging Trends in Clinical Data ManagementEmerging Trends in Clinical Data Management
Emerging Trends in Clinical Data ManagementArshad Mohammed
 

Viewers also liked (20)

Research data management
Research data managementResearch data management
Research data management
 
Data management in Stata
Data management in StataData management in Stata
Data management in Stata
 
Resumen tema 3 gestión
Resumen tema 3 gestiónResumen tema 3 gestión
Resumen tema 3 gestión
 
5 Steps To Master Data Management
5 Steps To Master Data Management5 Steps To Master Data Management
5 Steps To Master Data Management
 
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
 
Selecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachSelecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approach
 
Data Management Strategies
Data Management StrategiesData Management Strategies
Data Management Strategies
 
An Overview of Data Management Paradigms: Relational, Document, and Graph
An Overview of Data Management Paradigms: Relational, Document, and GraphAn Overview of Data Management Paradigms: Relational, Document, and Graph
An Overview of Data Management Paradigms: Relational, Document, and Graph
 
Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...
 
3 Keys To Successful Master Data Management - Final Presentation
3 Keys To Successful Master Data Management - Final Presentation3 Keys To Successful Master Data Management - Final Presentation
3 Keys To Successful Master Data Management - Final Presentation
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...How to identify the correct Master Data subject areas & tooling for your MDM...
How to identify the correct Master Data subject areas & tooling for your MDM...
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity Models
 
Compliance: Data Management Plans and Public Access to Data
Compliance: Data Management Plans and Public Access to DataCompliance: Data Management Plans and Public Access to Data
Compliance: Data Management Plans and Public Access to Data
 
10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management
 
Open Science: Research Data Management
Open Science: Research Data ManagementOpen Science: Research Data Management
Open Science: Research Data Management
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data Management
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016
 
Top 10 clinical data manager interview questions and answers
Top 10 clinical data manager interview questions and answersTop 10 clinical data manager interview questions and answers
Top 10 clinical data manager interview questions and answers
 
Emerging Trends in Clinical Data Management
Emerging Trends in Clinical Data ManagementEmerging Trends in Clinical Data Management
Emerging Trends in Clinical Data Management
 

Similar to Data management for researchers

CNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationCNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationJohn Doove
 
Digital library software
Digital library softwareDigital library software
Digital library softwareavid
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012François Belleau
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
JISC repositories and preservation programme: Plenary presentation 2009
JISC repositories and preservation programme: Plenary presentation 2009JISC repositories and preservation programme: Plenary presentation 2009
JISC repositories and preservation programme: Plenary presentation 2009Kevin Ashley
 
Dspace Installation Requirement
Dspace Installation RequirementDspace Installation Requirement
Dspace Installation RequirementAnil Mishra
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...João Rocha da Silva
 
Academic English With The Electronic Theses Online Service (EThOS) At The Bri...
Academic English With The Electronic Theses Online Service (EThOS) At The Bri...Academic English With The Electronic Theses Online Service (EThOS) At The Bri...
Academic English With The Electronic Theses Online Service (EThOS) At The Bri...Martha Brown
 
Make our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebMake our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebFranck Michel
 
Towards a digital library for York
Towards a digital library for YorkTowards a digital library for York
Towards a digital library for YorkJulie Allinson
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftSebastian Hellmann
 
Inforum 2007 Into The User environment
Inforum 2007 Into The User environmentInforum 2007 Into The User environment
Inforum 2007 Into The User environmentGuus van den Brekel
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaStuart Chalk
 
Introducing the Linked Data Research Centre
Introducing the Linked Data Research CentreIntroducing the Linked Data Research Centre
Data management for researchers

  • 1. Data management: what's the use for Research? A case study based on the Hebrew Bible. dirk.roorda@dans.knaw.nl, 2016-04-21, Den Haag
  • 4. download as pdf Welcome to SHEBANQ Wido van Peursen, leader of ETCBC. Initiator and strategic leader. Oliver Glanz, Andrews University. ETCBC data expert, contributing numerous queries for teaching. Dirk Roorda, DANS. Author of most of the code. Eep Talstra, founder of ETCBC. Still computing (Pascal): participant data in the making. Constantijn Sikkel, data designer for ETCBC. Inventor of efficient data creation workflows. Janet Dyk, linguist at ETCBC. Long-time data contributor, specialized in verbal valence and language variation. Reinoud Oosting, data designer for Leiden University. Contributed ETCBC data, now key user. Ulrik Sandborg-Petersen, creator of Emdros. Without it, SHEBANQ would not exist! Henk van den Berg, DANS. Programmed the first versions. Heleen van de Schraaf, then DANS. Programmed the first user interface. SHEBANQ relies on data and tools created by contributors in the past. User Guide. System for HEBrew Text: ANnotations for Queries and Markup, funded by CLARIN-NL, The Language Archive
  • 5. Text. What is it? bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ . Genesis 1:1 In the beginning God created the heavens and the earth.
  • 6. A string of words ... bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
  • 7. ... separated by spaces? bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ . bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
  • 8. A string of letters ... ... in which alefbet? bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ . phonetic
  • 9. hebrew with vowels and accents בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
  • 10. hebrew consonantal בראשית ברא אלהים את השמים ואת הארץ׃
  • 11. etcbc transcription (full) B.:- R;>CI73JT B.@R@74> >:ELOHI92JM >;71T HA- C.@MA73JIM W:- >;71T H@- >@75REY00
  • 12. etcbc transcription (consonantal) B R>CJT BR> >LHJM >T H CMJM W >T H >RY
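The consonantal transcription on slide 12 can be mapped back to Hebrew script with a plain lookup table. The sketch below is an illustration, not ETCBC code. Only the letters that occur on the slides (B, R, >, C, J, T, L, H, M, W, Y) are attested in this deck; the other table entries and the final-letter rule are assumptions based on the usual ETCBC transliteration convention. The function name `to_hebrew` is hypothetical.

```python
# Consonantal ETCBC transcription -> Hebrew script (illustrative sketch).
# Entries not attested on the slides are assumptions about the convention.
CONSONANTS = {
    ">": "\u05d0",  # alef
    "B": "\u05d1",  # bet
    "G": "\u05d2",  # gimel
    "D": "\u05d3",  # dalet
    "H": "\u05d4",  # he
    "W": "\u05d5",  # waw
    "Z": "\u05d6",  # zayin
    "X": "\u05d7",  # chet
    "V": "\u05d8",  # tet
    "J": "\u05d9",  # yod
    "K": "\u05db",  # kaf
    "L": "\u05dc",  # lamed
    "M": "\u05de",  # mem
    "N": "\u05e0",  # nun
    "S": "\u05e1",  # samekh
    "<": "\u05e2",  # ayin
    "P": "\u05e4",  # pe
    "Y": "\u05e6",  # tsade
    "Q": "\u05e7",  # qof
    "R": "\u05e8",  # resh
    "F": "\u05e9",  # sin (undotted in consonantal text)
    "C": "\u05e9",  # shin (undotted in consonantal text)
    "T": "\u05ea",  # tav
}

# Letters that take a different shape at the end of a word.
FINAL_FORMS = {
    "\u05db": "\u05da",  # kaf   -> final kaf
    "\u05de": "\u05dd",  # mem   -> final mem
    "\u05e0": "\u05df",  # nun   -> final nun
    "\u05e4": "\u05e3",  # pe    -> final pe
    "\u05e6": "\u05e5",  # tsade -> final tsade
}


def to_hebrew(word: str) -> str:
    """Convert one transcribed word to Hebrew consonants.

    Assumes `word` is a complete graphic word; proclitics that the
    transcription writes separately (such as B or W) should be joined
    to the following word before the final-letter rule is applied.
    """
    letters = [CONSONANTS[ch] for ch in word]
    if letters:
        letters[-1] = FINAL_FORMS.get(letters[-1], letters[-1])
    return "".join(letters)
```

Calling `to_hebrew("BR>")` yields the three letters bet, resh, alef, the lexeme that the inspect slide later searches for as `BR&gt;`.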
  • 13. layers of annotation: 1. The Text itself (representations); 2. Linguistics (feature structures); 3. "Manual" (really manual or software-generated); 4. Queries (exegetical search)
  • 14. words with highlighted occurrences queries with highlighted hits click name to toggle preview of query click author to goto query and all hits click entry to goto word and all occurrences click gloss to toggle preview of word click any word to toggle its highlight Context items for this chapter enlarge preview of query in a pop-up
  • 18. What do we find? wivu wivu hebrew
  • 19. What do we find?
  • 20. What do we find?
  • 21. Observations? The first hits are from archives and infrastructures; researchers and their institutes follow later; the hits are mainly books, i.e. publications
  • 22. What's missing? metadata: descriptions, manuals, code books; analyses: what use have other researchers made of this data?; instruments: tools to handle this kind of data; the very data!
  • 23. Explanations? These researchers started before the internet; they have developed a sophisticated data workflow in their institute; the ETCBC has grown a thick cell membrane
  • 25. research data cycle ? religious communities theol. scholars theol. scholars enlightened lay people
  • 26. research data cycle ! religious communities theol. scholars theol. scholars enlightened lay people linguists comp. hum Research Data Archiving DANS CLARIN SHEBANQ LAF-Fabric
  • 28. step 1: website (2008) wivu.dans.knaw.nl
  • 29. step 2: demo (2012)
  • 31. what has been deposited?
  • 32. step 4: project (2013) SHEBANQ System for Hebrew Text: ANnotations for Queries CLARIN-NL project data curation: LAF demonstrator: query saver #!/etc bc
  • 33. LAF? Yes, ISO Linguistic Annotation Framework ISO 24612:2012 Nancy Ide, Laurent Romary
  • 34. This is LAF (Linguistic Annotation Framework). XML fragments on the slide: <node xml:id="n_88917"> <link targets="r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11"/> </node> <edge xml:id="e1" from="n88917" to="n84383"/> <a xml:id="ae1" label="parents" ref="e1" as="link"/> <region xml:id="r_2" anchors="6 23"/> <node xml:id="n_3"><link targets="r_2"/></node> <a xml:id="a_3" label="word" ref="n_3" as="monads"/> <a xml:id="af22" label="ft" ref="n3" as="utf8"><fs> <f name="lexeme_utf8" value="ראשית"/> <f name="surface_consonants_utf8" value="ראשית"/> </fs></a> Diagram labels: primary data (בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃); regions (r9, r10, r11); link to regions; nodes (n2, n3: word, subphrase, phrase, clause, sentence); labeled edges (parents, mother); annotations (empty); annotations (features). Example features: lexeme_utf8=ראשית, surface_consonants_utf8=ראשית, determination=determined, phrase_function=Objc, phrase_type=PP, clause_atom_number=1, clause_atom_relation=0, clause_atom_type=xQtl, indentation=0
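The stand-off idea behind LAF (annotations point to nodes, nodes link to regions, regions anchor into the primary text) can be sketched with the standard library. The three elements in the fragment below are copied from the slide; the `<graph>` wrapper, the English primary text, and the function name `resolve_annotations` are invented for this example. Real LAF files probably declare an XML namespace, in which case the tag names would have to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# <region>, <node> and <a> are taken from the slide; the wrapper element
# and the primary text are assumptions made for this sketch.
LAF_FRAGMENT = """
<graph>
  <region xml:id="r_2" anchors="6 23"/>
  <node xml:id="n_3"><link targets="r_2"/></node>
  <a xml:id="a_3" label="word" ref="n_3" as="monads"/>
</graph>
"""
PRIMARY_TEXT = "Gen 1 In the beginning, God created"


def resolve_annotations(xml_text, primary):
    """Return (label, node id, text spans) for every annotation."""
    root = ET.fromstring(xml_text)

    # region id -> (start, end) offsets into the primary text
    regions = {}
    for region in root.iter("region"):
        start, end = (int(a) for a in region.get("anchors").split())
        regions[region.get(XML_ID)] = (start, end)

    # node id -> list of region ids it links to
    node_regions = {}
    for node in root.iter("node"):
        link = node.find("link")
        targets = link.get("targets").split() if link is not None else []
        node_regions[node.get(XML_ID)] = targets

    # annotation -> node -> regions -> slices of the primary text
    results = []
    for annot in root.iter("a"):
        node_id = annot.get("ref")
        spans = [
            primary[start:end]
            for start, end in (regions[r] for r in node_regions.get(node_id, []))
        ]
        results.append((annot.get("label"), node_id, spans))
    return results
```

The primary text is never modified: every layer of annotation lives in separate elements that only refer to character offsets, which is what makes it possible to add new layers without touching the text.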
  • 37. excursion: data and tools. Data is not available separately: there is always the need for a tool (software) to inspect, transport, transform
  • 38. data science at the command line http://datascienceatthecommandline.com http://datasciencetoolbox.org The Data Science Toolbox is a virtual environment based on Ubuntu Linux that is specifically suited for doing data science. Its purpose is to get you started in a matter of minutes. You can run the Data Science Toolbox either locally (using VirtualBox and Vagrant) or in the cloud (using Amazon Web Services).
  • 39. inspect
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > ls
      etcbc4.hdr      etcbc4_lingo.c.xml   etcbc4_lingo.sp.xml    etcbc4_regions.xml
      etcbc4.lst      etcbc4_lingo.p.xml   etcbc4_lingo.xml       etcbc4_sections.xml
      etcbc4.txt      etcbc4_lingo.pa.xml  etcbc4_monads.lex.xml
      etcbc4.txt.hdr  etcbc4_lingo.s.xml   etcbc4_monads.xml
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > du -h .
      1.5G .
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > fgrep -l 'BR&gt;' *.xml
      etcbc4_monads.lex.xml
      (BR> = ברא = maken, Dutch for "to make")
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > fgrep -c 'BR&gt;' etcbc4_monads.lex.xml
      113
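For readers who prefer a script to the shell, the `fgrep -c` call in the inspect session has a direct Python counterpart. This is a minimal sketch; the function name `count_matching_lines` is hypothetical. Like `fgrep -c`, it counts matching lines, not individual occurrences, and it reads the file line by line so that a file of several hundred megabytes does not have to fit in memory.

```python
def count_matching_lines(path, needle):
    """Count the lines of a text file that contain `needle` literally."""
    count = 0
    # errors="replace" keeps the scan going if a byte sequence is not valid UTF-8
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                count += 1
    return count
```

Note that the search string is the XML-escaped form `BR&gt;`, exactly as in the shell command, because the file stores `>` as an entity.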
  • 40. inspect
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > ls -lh *.txt
      -rw-r--r-- 1 dirk staff 5.1M Jul 23 10:58 etcbc4.txt
  • 42. inspect (xml)
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > ls -lh *.xml
      -rw-r--r-- 1 dirk staff 104M Jul 23 11:00 etcbc4_lingo.c.xml
      -rw-r--r-- 1 dirk staff 107M Jul 23 11:00 etcbc4_lingo.p.xml
      -rw-r--r-- 1 dirk staff 148M Jul 23 11:00 etcbc4_lingo.pa.xml
      -rw-r--r-- 1 dirk staff  22M Jul 23 11:00 etcbc4_lingo.s.xml
      -rw-r--r-- 1 dirk staff  23M Jul 23 11:00 etcbc4_lingo.sp.xml
      -rw-r--r-- 1 dirk staff 299M Jul 23 11:00 etcbc4_lingo.xml
      -rw-r--r-- 1 dirk staff 642M Jul 23 10:58 etcbc4_monads.lex.xml
      -rw-r--r-- 1 dirk staff 125M Jul 23 10:58 etcbc4_monads.xml
      -rw-r--r-- 1 dirk staff  37M Jul 23 10:58 etcbc4_regions.xml
      -rw-r--r-- 1 dirk staff  36M Jul 23 10:58 etcbc4_sections.xml
      dirk:~/Dropbox/laf-fabric-data/etcbc4/laf > time xmllint --nonet --noout --stream --schema /Users/dirk/Dropbox/laf-fabric-data/etcbc4/decl/graf-standoff.xsd etcbc4_monads.lex.xml
      etcbc4_monads.lex.xml validates
      real 2m26.029s
      user 2m20.308s
      sys 0m2.376s
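A rough scripted counterpart to the streaming `xmllint` run is sketched below. It is a deliberately weaker check: the Python standard library cannot validate against an XSD, so this only confirms well-formedness and reports how often each element occurs. The function name `count_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path):
    """Stream through an XML file and count elements per tag.

    Raises xml.etree.ElementTree.ParseError if the file is not
    well-formed. This is not schema validation.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(path, events=("end",)):
        counts[elem.tag] += 1
        # Drop the element's text, attributes and children once it is
        # counted. The emptied element stays attached to its parent, so
        # memory still grows slowly on very large files.
        elem.clear()
    return counts
```

On a file the size of `etcbc4_monads.lex.xml` (642M) a streaming pass like this avoids building the full tree in memory, which is the same reason the slide passes `--stream` to `xmllint`.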
  • 45. transform: the shortest path to having the computer work for me. Scripting (shell, python): scientific programming; software as instrument; hourly cycle; by and for researchers. Programming (C, C++, Java): software engineering; applications as product; weekly cycle; by ICT dev for researcher
  • 46. what do scholars want? They are not software developers, but they do program; they explore data, knead, massage; their products are not software but analyses, visualizations, publications
  • 47. scientific computing: more than (i)Python; more than an interface; more than an ecosystem; a culture:
  • 48. culture: fragments from a video of Fernando Perez. 4:19 - 7:37 researchers and computing; 17:00 - 20:26 tools and the data life cycle; 42:09 - 44:20 (of 49:22) data and publishing
  • 49. trees for Data Oriented Parsing
  • 50. step 6: harvest (2014-2015). Rens Bod: ling/dighum, Data Oriented Parsing; Bible Online Learner: Nicolai Winther-Nielsen, EuroPlot, University of Aalborg; Martijn Naaijer: Linguistic Variation, statistics with R
  • 51. step 7: better versions (2013-2015)
  • 52. step 7: better versions (2013-2015)
  • 53. step 7: better versions (2013-2015)
  • 54. step 8: website SHEBANQ (2013) hey, Martijn is after something! inform your followers with 1 click just browsing Genesis 4
  • 55. step 9: mature SHEBANQ (2015-2016)
  • 58. step 10: more (2016-2020). More projects (digging into data?); more disciplines (linguistics, data science, archaeology); more data sources (syriac, qumran); more users: > 250 people, systems (Bible Online Learner, Tiberias), institutes (VU University, Andrews University, Aalborg University); more output (articles, derived data); more training (workshops, master students, Ph.D students); better position in the competition
  • 60. research environment
      function    | medium                               | infra
      data        | LAF in dataset                       | DANS EASY
      web site    | web2py                               | DANS=>KNAW, Leaseweb, Cloud
      tools       | LAF-Fabric, Shebanq, Emdros          | Github, Sourceforge
      publishing  | IPython notebooks, Restructured Text | Github, Readthedocs
      products    | apps, notebooks, articles            | Github, Science Clouds, Journals
  • 61. is this a success story? There is certainly a degree of success ... it took 6 years to get a feeling of acceleration; grab opportunities eagerly; persuade liberally; embrace technology and combine it with affinity with sources and scholarship; make up-front investments (time, relationships)
  • 62. why is it not going faster? The team is efficiently organised already; new ways of working have not proved themselves yet; technical support is a rare and expensive commodity for small teams in the humanities
  • 63. contributing factors: staff changes; new projects; new requirements from funders (open access); competition and collaboration across disciplines; the digital world is increasingly penetrating people's lives
  • 65. Data management useful for researchers? ... yes, if they realize the importance of re-use; yes, if they find the path to archiving; yes, if archives go out of their way to be relevant for researchers; yes, if archives use ICT proactively. None of these are straightforward. dirk.roorda@dans.knaw.nl