This document provides an overview of the SHEBANQ project, which provides tools for querying annotated Hebrew text data. It describes the data sources and contributors that have built up the underlying text corpus over many years. It also outlines the steps taken to make this data and related tools more accessible, including developing a website, depositing data in archives, running demonstration projects, and integrating the data and tools into broader research environments through additional projects and publications. The goal has been to facilitate wider use of this linguistic resource and foster more digital humanities and data science work based on its contents.
4. download as pdf
Welcome to
SHEBANQ
Wido van Peursen, leader of ETCBC. Initiator and strategic leader.
Oliver Glanz, Andrews University. ETCBC data expert, contributing numerous queries for teaching.
Dirk Roorda, DANS. Author of most of the code.
Eep Talstra, founder of ETCBC. Still computing (Pascal): participant data in the making.
Constantijn Sikkel, data designer for ETCBC. Inventor of efficient data creation workflows.
Janet Dyk, linguist at ETCBC. Long-time data contributor, specialized in verbal valence and language variation.
Reinoud Oosting, data designer for Leiden University. Contributed ETCBC data, now key user.
Ulrik Sandborg-Petersen, creator of Emdros. Without it, SHEBANQ would not exist!
Henk van den Berg, DANS. Programmed the first versions.
Heleen van de Schraaf, then DANS. Programmed the first user interface.
SHEBANQ relies on data and tools created by contributors in the past
User Guide
System for HEBrew Text:
ANnotations for Queries and Markup
funded by
CLARIN-NL, The Language Archive
5. Text. What is it?
bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
Genesis 1:1
In the beginning God created the heavens and the earth.
6. A string of words ...
bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ .
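Treating the verse as a string of words can be sketched in a few lines of Python. This is only an illustration: it splits the transliteration on whitespace and numbers the resulting tokens. The variable names are hypothetical, and the word boundaries used in the actual database may differ from a plain whitespace split.

```python
# Transliteration of Genesis 1:1 as shown on the slide.
verse = "bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ ."

# Split on whitespace and drop the final punctuation token.
words = [token for token in verse.split() if token != "."]

# Give every word a position, starting at 1, so that annotations
# can later refer to words by number instead of by spelling.
numbered = list(enumerate(words, start=1))

for position, word in numbered:
    print(position, word)
```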
13. layers of annotation
1. The Text itself (representations)
2. Linguistics (feature structures)
3. "Manual" (really manual or software-generated)
4. Queries (exegetical search)
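One way to picture the four layers is as stand-off annotations that all point at word positions in the same text. The sketch below is a minimal illustration under that assumption, not the data model SHEBANQ or LAF-Fabric actually uses. The class, the feature names and the annotation values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    layer: str    # "text", "linguistics", "manual" or "query"
    first: int    # position of the first word covered (1-based)
    last: int     # position of the last word covered (inclusive)
    feature: str  # hypothetical feature name
    value: str    # hypothetical feature value


WORDS = "bᵊrēšˈîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˈayim wᵊʔˌēṯ hāʔˈāreṣ".split()

# Illustrative annotations only; none of these values come from the real data.
ANNOTATIONS = [
    Annotation("text", 2, 2, "transliteration", WORDS[1]),
    Annotation("linguistics", 2, 2, "part_of_speech", "verb"),
    Annotation("manual", 1, 7, "note", "opening verse"),
    Annotation("query", 2, 2, "hit_of", "hypothetical query"),
]


def annotations_on(position, annotations=ANNOTATIONS):
    """Return every annotation whose word range covers the given position."""
    return [a for a in annotations if a.first <= position <= a.last]


def layers_on(position):
    """Return the sorted names of the layers that say something about a word."""
    return sorted({a.layer for a in annotations_on(position)})


print(layers_on(2))
print(layers_on(5))
```

Because every layer refers to the text by position, a new layer (for example saved query results) can be added without touching the text or the other layers.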
14. words with highlighted occurrences
queries with highlighted hits
click name to toggle preview of query
click author to go to query and all hits
click entry to go to word and all occurrences
click gloss to toggle preview of word
click any word to toggle its highlight
Context items for this chapter
enlarge preview of query in a pop-up
21. Observations?
The first hits are from archives, infrastructures
Researchers and their institutes follow later
The hits are mainly books, i.e. publications
22. What's missing?
metadata: descriptions, manuals, code books
analyses: what use have other researchers made of this data?
instruments: tools to handle this kind of data
the very data!
23. Explanations?
These researchers started before the internet
they have developed a sophisticated data workflow in their institute
the ETCBC has grown a thick cell membrane
25. research data cycle ?
Diagram nodes: religious communities; theological scholars (shown twice); enlightened lay people
26. research data cycle !
Diagram nodes: religious communities; theological scholars (shown twice); enlightened lay people; linguists; computational humanities; Research Data Archiving; DANS; CLARIN; SHEBANQ; LAF-Fabric
32. step 4: project (2013)
SHEBANQ
System for Hebrew Text: ANnotations for
Queries
CLARIN-NL project
data curation: LAF
demonstrator: query saver
#!/etc bc
37. excursion: data and tools
data is not available separately
there is always the need for a tool: software
inspect
transport
transform
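The three verbs above can be made concrete with a small Python sketch that uses only the standard library. It is an illustration under stated assumptions, not part of the SHEBANQ tooling: the records, the field names and the part-of-speech labels are hypothetical sample values.

```python
import json
from collections import Counter

# Hypothetical sample records; not taken from the actual database.
records = [
    {"word": "bārˈā", "pos": "verb"},
    {"word": "ʔᵉlōhˈîm", "pos": "noun"},
    {"word": "hāʔˈāreṣ", "pos": "noun"},
]

# inspect: look at the size and at the first record
print(len(records), records[0])

# transport: serialize to JSON text and read it back unchanged;
# ensure_ascii=False keeps the transliteration characters readable
payload = json.dumps(records, ensure_ascii=False)
restored = json.loads(payload)

# transform: derive a frequency table of part-of-speech labels
pos_counts = Counter(record["pos"] for record in restored)
print(pos_counts)
```

Even in this tiny case each step needs software, which is the point of the slide: the data is only usable together with a tool.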
38. data science at the command line
http://datascienceatthecommandline.com
http://datasciencetoolbox.org
The Data Science Toolbox is a virtual
environment based on Ubuntu Linux that
is specifically suited for doing data
science. Its purpose is to get you started
in a matter of minutes. You can run the
Data Science Toolbox either locally
(using VirtualBox and Vagrant) or in the
cloud (using Amazon Web Services).
45. transform
The shortest path to having the computer work for me
scripting | programming
shell, python | C, C++, Java
scientific programming | software engineering
software as instrument | applications as product
hourly cycle | weekly cycle
by and for researchers | by ICT dev for researcher
46. what do scholars want
they are not software developers
but they do program
they explore data, knead, massage
their products are not software
but analyses, visualizations, publications
48. culture
fragments from a
video of Fernando Perez
4:19 researchers and computing - 7:37
17:00 tools and the data life cycle - 20:26
42:09 data and publishing - 44:20 / 49:22
50. step 6: harvest (2014-2015)
Rens Bod: ling/dighum, Data Oriented Parsing
Bible Online Learner: Nicolai Winther-Nielsen, EuroPlot, University of Aalborg
Martijn Naaijer: Linguistic Variation, statistics with R
58. step 10: more (2016-2020)
more projects (digging into data?)
more disciplines (linguistics, data science, archaeology)
more data sources (Syriac, Qumran)
more users
> 250 people
systems (Bible Online Learner, Tiberias)
institutes (VU University, Andrews University, Aalborg University)
more output (articles, derived data)
more training (workshops, master's students, PhD students)
better position in the competition
60. research environment
function | medium | infra
data | LAF in dataset | DANS EASY
web site | web2py | DANS=>KNAW, Leaseweb, Cloud
tools | LAF-Fabric, Shebanq, Emdros | Github, Sourceforge
publishing | IPython notebooks, Restructured Text | Github, Readthedocs
products | apps, notebooks, articles | Github, Science Clouds, Journals
61. is this a success story?
there is certainly a degree of success ...
it took 6 years to get a feeling of acceleration
grab opportunities eagerly
persuade liberally
embrace technology
and combine it with affinity with sources and scholarship
make up-front investments (time, relationships)
62. why is it not going faster?
the team is efficiently organised already
new ways of work have not proved themselves yet
technical support is a rare and expensive commodity for small teams in the humanities
63. contributing factors
staff changes
new projects
new requirements from funders (open access)
competition and collaboration across disciplines
the digital world is increasingly penetrating people's lives
64. Data management useful for researchers? ...
yes, if they realize the importance of re-use
yes, if they find the path to archiving
yes, if archives go out of their way to be relevant for researchers
yes, if archives use ICT proactively
dirk.roorda@dans.knaw.nl
65. none of these are straightforward