The Hebrew Bible as Data 
Laboratory - Sharing - Lessons 
dirk.roorda@dans.knaw.nl 
2014-10-02 
TUSTEP meeting 
Amsterdam 
Query the Hebrew Bible through the 
ETCBC database 
and SHEBANQ
overview 
in the beginning: origin story: ETCBC 
six days of working: laboratory: LAF-Fabric 
the sabbath: dissemination: SHEBANQ 
the tree of knowledge of good and evil: lessons
I 
in the beginning: origin story: ETCBC
text + linguistics => 
data + research =>
Data creation 
versus: archiving - sharing - dissemination
research data cycle?
religious communities
theol. 
scholars 
theol. 
scholars 
Research Data 
Archiving 
DANS 
CLARIN 
SHEBANQ 
LAF-Fabric 
comp. hum 
linguists 
enlightened lay 
people
2012 deposit ETCBC3 
2014 deposit ETCBC4
II 
six days of working: laboratory: LAF-Fabric
scientific computing 
fragment from a video of Fernando Perez 
4:19 researchers and computing - 9:55 
17:00 tools and the data life cycle - 20:26 
42:09 data and publishing - 44:20 / 49:22
Linguistic Annotation Framework 
ISO 24612:2012 
Nancy Ide, Laurent Romary
Linguistic Annotation Framework

<node xml:id="n88917">
  <link targets="r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11"/>
</node>
<edge xml:id="e1" from="n88917" to="n84383"/>
<a xml:id="ae1" label="parents" ref="e1" as="link"/>
<a xml:id="af22" label="ft" ref="n3" as="utf8"><fs>
  <f name="lexeme_utf8" value="ראשׁית"/>
  <f name="surface_consonants_utf8" value="ראשׁית"/>
</fs></a>
<region xml:id="r2" anchors="6 23"/>
<node xml:id="n3"><link targets="r2"/></node>
<a xml:id="a3" label="word" ref="n3" as="monads"/>

diagram labels:
nodes: sentence, clause, phrase, subphrase, word (n2, n3)
labeled edges: parents, mother
annotations (features):
  clause_atom_number=1, clause_atom_relation=0, clause_atom_type=xQtl, indentation=0
  determination=determined, phrase_function=Objc, phrase_type=PP
  lexeme_utf8=ראשׁית, surface_consonants_utf8=ראשׁית
annotations (empty)
link to regions
regions: r9 r10 r11 (anchors 0-5, 6-23, 72-91, 92)
primary data:
בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
too big to parse all the time 
compile it
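The "compile it" step can be pictured as follows: parse the stand-off XML once, reduce it to compact Python structures, and store those in a binary file that later sessions load directly. This is a minimal sketch of the idea only, not LAF-Fabric's actual compiler. The miniature XML, the cache file name and the functions `compile_laf` and `load` are hypothetical, and `pickle` merely stands in for whatever binary format the real tool uses.

```python
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature of a LAF resource: regions point into the primary
# data, nodes link to one or more regions.
SAMPLE_LAF = """<graph>
  <region xml:id="r1" anchors="0 2"/>
  <region xml:id="r2" anchors="3 6"/>
  <node xml:id="n1"><link targets="r1"/></node>
  <node xml:id="n2"><link targets="r1 r2"/></node>
</graph>"""


def compile_laf(xml_text):
    """Parse the XML once and reduce it to plain dicts (the slow step)."""
    root = ET.fromstring(xml_text)
    regions = {}
    for region in root.iter("region"):
        start, end = (int(a) for a in region.get("anchors").split())
        regions[region.get(XML_ID)] = (start, end)
    nodes = {}
    for node in root.iter("node"):
        targets = node.find("link").get("targets").split()
        nodes[node.get(XML_ID)] = [regions[t] for t in targets]
    return {"regions": regions, "nodes": nodes}


def load(xml_text, cache_path):
    """Return the compiled data, compiling only if no cache file exists yet."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    data = compile_laf(xml_text)
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)
    return data


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "laf_compiled.bin")
    first = load(SAMPLE_LAF, cache)   # parses the XML and writes the cache
    second = load(SAMPLE_LAF, cache)  # reads the cache, no XML parsing
    print(first == second, second["nodes"]["n2"])
```

The second call never touches the XML, which is the point of compiling: the parsing cost is paid once, and every later session starts from the binary file.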
kindergarten: counting

import collections

# count all nodes
nodes = 0
for n in NN():
    nodes += 1

1m 39s Counting nodes
1m 40s There are 1441144 nodes.

# count nodes per object type
nodes = collections.Counter()
for n in NN():
    nodes[F.otype.v(n)] += 1

7m 56s Counting nodes
7m 59s Nodes counted:
    book : 39x
    chapter : 929x
    clause : 87978x
    clause_atom : 90144x
    half_verse : 44682x
    phrase : 254664x
    phrase_atom : 267965x
    sentence : 66045x
    sentence_atom : 66701x
    subphrase : 112229x
    verse : 23213x
    word : 426555x

http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/Counting.ipynb
primary school: r/w
בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
[the output continues with the fully pointed text of Genesis 1:2-10]
plain_file = outfile("etcbc4_plain.txt")

# write every word followed by its trailer (space, maqqef, verse end, ...)
for i in F.otype.s('word'):
    the_text = F.g_word_utf8.v(i)
    the_trailer = F.trailer_utf8.v(i)
    plain_file.write(the_text + the_trailer)

plain_file.close()
http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/text/plain.ipynb
EXO 06,08 ├─┼♠┼─┼───┤├─┼♠┼──┤├─♠┼─┼─♂─♂──♂┤ 
├─┼♠┼─┼─┼─┤ 
├─┼♂┤ 
EXO 06,09 ├─┼♠┼♂┼─┼──⊙┤ 
├─┼─┼♠┼─♂┼───────┤ 
EXO 06,10 ├─┼♠┼♂┼─♂┤├─♠┤ 
EXO 06,11 ├♠┤ 
├♠┼───⊙┤ 
├─┼♠┼──⊙┼──┤ 
EXO 06,12 ├─┼♠┼♂┼──♂┤├─♠┤ 
├─┤ 
├─⊙┼─┼♠┼─┤ 
├─┼─┼♠┼─┤ 
├─┼─┼──┤ 
EXO 06,13 ├─┼♠┼♂┼─♂──♂┤ 
├─┼♠┼──⊙────⊙┤├─♠┼──⊙┼──⊙┤ 
EXO 06,14 ├─┼───┤ 
├─⊙─⊙┼♂─♂♂─♂┤ 
├─┼─⊙┤ 
EXO 06,15 ├─┼─⊙┼♂─♂─♂─♂─♂─♂───┤ 
├─┼─⊙┤ 
EXO 06,16 ├─┼─┼──⊙┼──┤ 
├♂─♂─♂┤ 
├─┼──⊙┼──────┤ 
EXO 06,17 ├─♂┼♂─♂┼──┤ 
EXO 06,18 ├─┼─♂┼♂─♂─♂─♂┤ 
├─┼──♂┼──────┤ 
EXO 06,19 ├─┼─♂┼♂─♂┤ 
├─┼───┼──┤ 
EXO 06,20 ├─┼♠┼♂┼─♀─┼─┼──┤ 
├─┼♠┼─┼─♂──♂┤ 
├─┼──♂┼──────┤ 
secondary school: graphic

import collections
import sys

out = outfile("properviz.txt")

type_map = collections.defaultdict(lambda: None, [
    ("chapter", 'Ch'),
    ("verse", 'V'),
    ("sentence", 'S'),
    ("clause", 'C'),
    ("phrase", 'P'),
    ("word", 'w'),
])
otypes = ['Ch', 'V', 'S', 'C', 'P', 'w']
watch = collections.defaultdict(lambda: {})
start = {}
cur_verse_label = ['', '']

def print_node(ob, obdata):
    (node, minm, maxm, monads) = obdata
    if ob == "w":
        if not watch:
            out.write("◘".format(monads))
        else:
            # one symbol per word: proper names by gender, verbs, anything else
            outchar = "─"
            p_o_s = F.sp.v(node)
            if p_o_s == "nmpr":
                if F.gn.v(node) == "m": outchar = "♂"
                elif F.gn.v(node) == "f": outchar = "♀"
                elif F.gn.v(node) == "unknown": outchar = "⊙"
            elif p_o_s == "verb":
                outchar = "♠"
            out.write(outchar)
        # close every object that ends at this word
        if monads in watch:
            tofinish = watch[monads]
            for o in reversed(otypes):
                if o in tofinish:
                    if o == 'C':
                        out.write("┤")
                    elif o == 'P':
                        if 'C' not in tofinish:
                            out.write("┼")
                    elif o != 'S':
                        out.write("{}»".format(o))
            del watch[monads]
    elif ob == "Ch":
        this_chapter_label = "{} {}".format(F.book.v(node), F.chapter.v(node))
    elif ob == "V":
        this_verse_label = F.label.v(node).strip(" ")
        cur_verse_label[0] = this_verse_label
        cur_verse_label[1] = this_verse_label
    elif ob == "S":
        out.write("\n{:<11} ".format(cur_verse_label[1]))
        cur_verse_label[1] = ''
        watch[maxm][ob] = None
    elif ob == "C":
        out.write("├")
        watch[maxm][ob] = None
    elif ob == "P":
        watch[maxm][ob] = None
    else:
        out.write("«{}".format(ob))
        watch[maxm][ob] = None

lastmin = None
lastmax = None
for i in NN():
    otype = F.otype.v(i)
    if otype == 'book':
        sys.stderr.write("{:<11}".format(F.book.v(i)))

    ob = type_map[otype]
    if ob is None:
        continue
    monads = F.monads.v(i)
    minm = F.minmonad.v(i)
    maxm = F.maxmonad.v(i)
    if lastmin == minm and lastmax == maxm:
        start[ob] = (i, minm, maxm, monads)
    else:
        # objects with identical extent are printed in the fixed order of otypes
        for o in otypes:
            if o in start:
                print_node(o, start[o])
        start = {ob: (i, minm, maxm, monads)}
        lastmin = minm
        lastmax = maxm
for ob in otypes:
    if ob in start:
        print_node(ob, start[ob])
close()

http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/text/proper.ipynb
adolescence: gender

# stats, cur_chapter, cur_book, ch, m, f and table are assumed to be set up before this loop
for node in NN():
    otype = F.otype.v(node)
    if otype == "word":
        stats[0] += 1
        if F.gn.v(node) == "m":
            stats[1] += 1
        elif F.gn.v(node) == "f":
            stats[2] += 1
    elif otype == "chapter":
        if cur_chapter is not None:
            # a new chapter starts: report the percentages of the previous one
            masc = 0 if not stats[0] else 100 * float(stats[1]) / stats[0]
            fem = 0 if not stats[0] else 100 * float(stats[2]) / stats[0]
            ch.append(cur_chapter)
            m.append(masc)
            f.append(fem)
            table.write("{},{},{}\n".format(cur_chapter, masc, fem))
        else:
            table.write("{},{},{}\n".format('book chapter', 'masculine', 'feminine'))
        this_book = F.book.v(node)
        this_chapnum = F.chapter.v(node)
        this_chapter = "{} {}".format(this_book, this_chapnum)
        if this_book != cur_book:
            sys.stderr.write("\n{}".format(this_book))
            cur_book = this_book
        sys.stderr.write(" {}".format(this_chapnum))
        stats = [0, 0, 0]
        cur_chapter = this_chapter
http://nbviewer.ipython.org/github/ETCBC/laf-fabric/blob/master/examples/gender.ipynb
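To try the percentage calculation without the ETCBC data, the same bookkeeping can be run on a handful of made-up records. The sketch below is a stand-in under stated assumptions: the `words` list, its gender values and the function `gender_percentages` are hypothetical and replace the `NN()` and `F.gn.v(node)` calls of LAF-Fabric. It groups by chapter label instead of watching for chapter nodes, so the last chapter is reported as well.

```python
# Hypothetical records, one per word: (chapter label, gender value).
words = [
    ("Genesis 1", "m"), ("Genesis 1", "f"), ("Genesis 1", "NA"),
    ("Genesis 1", "m"), ("Genesis 2", "f"), ("Genesis 2", "NA"),
]


def gender_percentages(records):
    """Return (chapter, % masculine words, % feminine words) per chapter."""
    stats = {}  # chapter -> [all words, masculine, feminine]
    for chapter, gender in records:
        counts = stats.setdefault(chapter, [0, 0, 0])
        counts[0] += 1
        if gender == "m":
            counts[1] += 1
        elif gender == "f":
            counts[2] += 1
    return [
        (chapter, 100.0 * c[1] / c[0], 100.0 * c[2] / c[0])
        for chapter, c in stats.items()
    ]


if __name__ == "__main__":
    for chapter, masc, fem in gender_percentages(words):
        print("{},{},{}".format(chapter, masc, fem))
```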
university: mining

http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/lingvar/cooccurrences.ipynb

[code cut off at the slide margin; only the left edge is legible: a loop "for node ..." that sets "this_type", "lexeme" and "p_o_s", fills "lexemes[...]" and "lexeme_support_book[...]" in several branches, records "book_name" in "books", and ends with msg("Done" ...)]

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns:viz="http://www.gexf.net/1.2draft/viz" xmlns="http://www.gexf.net/1.2draft" version="1.2">
<meta>
<creator>LAF-Fabric</creator>
</meta>
<graph defaultedgetype="undirected" idtype="string" type="static">
<nodes count="39">

<node id="17" label="Amos"/>
<node id="18" label="Obadia"/>
<node id="19" label="Jona"/>

<edge id="17" source="1" target="18" weight="2.32"/>
<edge id="18" source="1" target="19" weight="5.68"/>
<edge id="19" source="1" target="20" weight="9.54"/>
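Since the loop on the slide is cut off, the following is only a sketch of how a book-by-book co-occurrence graph of this shape could be produced. The lexeme sets are made up, the function names are hypothetical, and the edge weight used here (Jaccard overlap of the lexeme sets of two books) is an assumption, not the measure behind the weights shown in the GEXF fragment.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical lexeme sets per book (membership is illustrative, not real data).
lexemes_by_book = {
    "Amos": {"DBR/", "HJH[", "NQD/"},
    "Obadia": {"DBR/", "HJH[", "XZWN/"},
    "Jona": {"DBR/", "DG/"},
}


def jaccard(a, b):
    """Share of lexemes that two books have in common (assumed weight)."""
    return len(a & b) / len(a | b)


def build_gexf(books):
    """Build a GEXF tree: one node per book, one weighted edge per book pair."""
    gexf = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(
        gexf, "graph",
        {"defaultedgetype": "undirected", "idtype": "string", "type": "static"},
    )
    names = sorted(books)
    nodes = ET.SubElement(graph, "nodes", {"count": str(len(names))})
    ids = {}
    for i, name in enumerate(names):
        ids[name] = str(i)
        ET.SubElement(nodes, "node", {"id": str(i), "label": name})
    edges = ET.SubElement(graph, "edges")
    for n, (a, b) in enumerate(itertools.combinations(names, 2)):
        weight = jaccard(books[a], books[b])
        if weight > 0:  # books without shared lexemes get no edge
            ET.SubElement(edges, "edge", {
                "id": str(n), "source": ids[a], "target": ids[b],
                "weight": "{:.2f}".format(weight),
            })
    return gexf


if __name__ == "__main__":
    print(ET.tostring(build_gexf(lexemes_by_book), encoding="unicode"))
```

The resulting file can be opened in a graph tool that reads GEXF; which weighting is linguistically meaningful is a research decision that this sketch does not settle.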
professional: contributing data 
AMOS 01,01 DBR/ 0 2 -1 -1 -1 5 0 -1 -1 3 2 1 2 0 -1 2 -1 -1 -1 -1 -1 
AMOS 01,01 <MWS/ 0 3 -1 -1 -1 1 -1 -1 -1 1 2 2 3 2 2 -10002 -1 -1 0 521 0 
* 0 1 12 2 12 3 470 0 0 .N 0 LineNr 1 ClauseNr 1: 1: 1: 200: 0 0 SentenceNr 1 TxtType: ? Pargr: 1 ClType:NmCl 
AMOS 01,01 >CR 0 6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 6 6 -1 -1 -1 -1 0 519 0 
AMOS 01,01 HJH[ -2 1 0 0 1 0 0 2 3 1 2 -1 1 1 -1 -1 -1 -1 0 501 0 
AMOS 01,01 B 0 5 -1 -1 -1 -1 0 -1 -1 -1 -1 -1 5 0 -1 -1 -1 -1 -1 -1 -1 
AMOS 01,01 H 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 -1 
AMOS 01,01 NQD/ 0 2 -1 -1 -1 4 0 -1 -1 3 2 2 2 5 2 -1 -1 -1 0 504 0 
AMOS 01,01 MN 0 5 -1 -1 -1 -1 0 -1 -1 -1 -1 -1 5 0 -1 -1 -1 -1 -1 -1 -1 
AMOS 01,01 TQW<=/ 0 3 -1 -1 -1 1 -1 -1 -1 1 0 2 3 5 2 -1 -1 -1 -11 582 0 
* 0 -1 12 0 0 .. 3 LineNr 2 ClauseNr 2: 1: 3: 132: -13 -1007 SentenceNr 1 TxtType: ? Pargr: 1 ClType:xQt0 
px = PX(API)
px.deliver_annots('px/px_data', 'px', 'para', (
    ('etcbc4', 'px', 'instruction'),
    ('etcbc4', 'px', 'number_in_ch'),
    ('etcbc4', 'px', 'pargr'),
))
<?xml version="1.0" encoding="UTF-8"?> 
<graph xmlns="http://www.xces.org/ns/GrAF/1.0/" 
xmlns:graf="http://www.xces.org/ns/GrAF/1.0/"> 
<graphHeader> 
<labelsDecl/> 
<dependencies/> 
<annotationSpaces/> 
</graphHeader> 
<a xml:id="a1" as="etcbc4" label="px" ref="n1298850"><fs> 
<f name="instruction" value=".#"/> 
<f name="number_in_ch" value="32"/> 
<f name="pargr" value="32"/> 
</fs></a> 
<a xml:id="a2" as="etcbc4" label="px" ref="n50738"><fs> 
<f name="instruction" value=".."/> 
<f name="number_in_ch" value="30"/> 
<f name="pargr" value="2.7"/> 
</fs></a> 
ETCBC LAF 
extra/correction
LAF-Fabric 
results 
http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/extradata/para%20from%20px.ipynb
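An annotation file of the shape shown above can be generated with a few lines of standard-library code. This is a hedged sketch, not the implementation of `deliver_annots`: the function `build_annotations` is hypothetical, the header elements are left out, and the two sample records simply repeat the feature values visible on the slide.

```python
import xml.etree.ElementTree as ET

GRAF_NS = "http://www.xces.org/ns/GrAF/1.0/"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialize GrAF elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", GRAF_NS)

# Feature values per target node, copied from the fragment on the slide.
SAMPLE = {
    "n1298850": {"instruction": ".#", "number_in_ch": "32", "pargr": "32"},
    "n50738": {"instruction": "..", "number_in_ch": "30", "pargr": "2.7"},
}


def build_annotations(space, label, features_by_node):
    """Create one <a> element with a feature structure per annotated node."""
    root = ET.Element("{%s}graph" % GRAF_NS)
    for n, (node_id, feats) in enumerate(sorted(features_by_node.items()), start=1):
        a = ET.SubElement(root, "{%s}a" % GRAF_NS, {
            XML_ID: "a%d" % n, "as": space, "label": label, "ref": node_id,
        })
        fs = ET.SubElement(a, "{%s}fs" % GRAF_NS)
        for name, value in feats.items():
            ET.SubElement(fs, "{%s}f" % GRAF_NS, {"name": name, "value": str(value)})
    return root


if __name__ == "__main__":
    root = build_annotations("etcbc4", "px", SAMPLE)
    print(ET.tostring(root, encoding="unicode"))
```

Because the annotations only refer to node identifiers, the file stands apart from the main resource: it can be added, corrected or withdrawn without touching the ETCBC data itself.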
old age: trees

tree = Tree(API, otypes=tree_types,
    clause_type=clause_type,
    ccr_feature='rela',
    pt_feature='typ',
    pos_feature='sp',
    mother_feature='mother',
)
tree.restructure_clauses(ccr_class)
results = tree.relations()
parent = results['rparent']
sisters = results['sisters']
children = results['rchildren']
elder_sister = results['elder_sister']
msg("Ready for processing")

0.00s LOADING API with EXTRAs: please wait ...
0.00s INFO: USING DATA COMPILED AT: 2014-07-23T09-31-37
1.45s INFO: DATA LOADED FROM SOURCE etcbc4 AND ANNOX -- ...
0.00s Start computing parent and children relations for ...
1.36s 100000 nodes
2.74s 200000 nodes
4.08s 300000 nodes
5.48s 400000 nodes
6.79s 500000 nodes
8.20s 600000 nodes
9.63s 700000 nodes
11s 800000 nodes
12s 900000 nodes
13s 947471 nodes: 881423 have parents and 520916 have children
13s Restructuring clauses: deep copying tree relations
19s Pass 0: Storing mother relationship
21s 18580 clauses have a mother
21s All clauses have mothers of types in {'sentence', 'word', 'phrase', 'subphrase', 'clause'}
21s Pass 1: all clauses except those of type Coor
22s Pass 2: clauses of type Coor only
23s Mothers applied. Found 0 motherless clauses.
23s 2497 nodes have 1 sisters
23s 167 nodes have 2 sisters
23s 9 nodes have 3 sisters
23s There are 2858 sisters, 2673 nodes have sisters.
23s Ready for processing

# GEN 01,01 node=1127306 oid=11 bmonad=1
0 1 2 3 4 5 6 7 8 9 10
(S(C(PP(pp "ב")(n "ראשׁית"))(VP(vb "ברא"))(NP(n "אלהים"))(PP(U(pp "את")(dt "ה")(n "שׁמים"))(cj "ו")(U(pp "את")(dt "ה")(n "ארץ")))))

# GEN 01,02 node=1127307 oid=39 bmonad=12
0 1 2 3 4 5 6
(S(C(CP(cj "ו"))(NP(dt "ה")(n "ארץ"))(VP(vb "היתה"))(NP(U(n "תהו"))(cj "ו")(U(n "בהו")))))

http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/trees/trees_etcbc4.ipynb
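Bracketed strings of this kind can be read back into a nested structure with a small recursive parser, which is handy for checking or post-processing the exported trees. The sketch assumes the format `(label child child ...)` with quoted text at the leaves; `parse_tree`, `leaves` and the shortened sample string are hypothetical and not part of LAF-Fabric.

```python
import re

# Tokens: opening bracket, closing bracket, quoted text, or a bare label.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

# Shortened sample in the style of the output above (two phrases only).
SAMPLE = '(S(C(VP(vb "ברא"))(NP(n "אלהים"))))'


def parse_tree(text):
    """Turn a bracketed string into nested (label, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token {}".format(pos))
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                # a leaf: drop the quotes and any padding inside them
                children.append(tokens[pos].strip('"').strip())
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """List (part of speech, word) pairs in text order."""
    label, children = tree
    found = []
    for child in children:
        if isinstance(child, tuple):
            found.extend(leaves(child))
        else:
            found.append((label, child))
    return found


if __name__ == "__main__":
    print(leaves(parse_tree(SAMPLE)))
```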
III 
the sabbath: dissemination: SHEBANQ
back to EMDROS

select all objects
in {1-40}
where
[phrase
    [word]
    [word]
]
..
[phrase
    [word g_cons = 'H']
    [word focus]
]

in {1-40}: optionally restrict results to words 1-40
.. : gap
[word g_cons = 'H']: the first word has value H for feature g_cons
[word focus]: deliver just the second word of the second phrase as result
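Read procedurally, the query asks for a phrase containing two adjacent words, followed after an arbitrary gap by a phrase in which a word with `g_cons = 'H'` is directly followed by another word; only that last word is delivered. The sketch below emulates this reading on toy data. It is a simplification under stated assumptions: the function `focus_words` and the transliterated word list are hypothetical, and real MQL semantics are richer than what is reproduced here.

```python
# Toy data: phrases in text order, each a list of (monad, g_cons) pairs.
PHRASES = [
    [(1, "B"), (2, "R>CJT")],
    [(3, "BR>")],
    [(4, ">LHJM")],
    [(5, ">T"), (6, "H"), (7, "CMJM"), (8, "W"), (9, ">T"), (10, "H"), (11, ">RY")],
]


def focus_words(phrases, first_monad=1, last_monad=40):
    """Emulate the query: return the focus words as (monad, g_cons) pairs."""
    # 'in {1-40}': keep only phrases that lie completely inside the range
    in_range = [
        phrase for phrase in phrases
        if all(first_monad <= monad <= last_monad for monad, _ in phrase)
    ]
    results = []
    seen_two_word_phrase = False
    for phrase in in_range:
        if seen_two_word_phrase:
            # second block: a word with g_cons 'H' followed by the focus word
            for (_, cons1), word2 in zip(phrase, phrase[1:]):
                if cons1 == "H":
                    results.append(word2)
        # first block: a phrase with two adjacent words; '..' allows any gap
        if len(phrase) >= 2:
            seen_two_word_phrase = True
    return results


if __name__ == "__main__":
    print(focus_words(PHRASES))
```

In SHEBANQ the query engine does this work; the emulation is only meant to make the roles of the range, the gap and the `focus` marker concrete.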
SHEBANQ 
System for HEBrew text: ANnotations for 
Queries and markup 
http://shebanq.ancient-data.org 
שִׁבֹּ֜לֶת 
סִבֹּ֗לֶת 
s(h)ibboleth
http://shebanq.ancient-data.org/mql/display_query?id=18
proliferation of queries 
78 queries, in varying degrees of maturity 
who is afraid of lists?
serendipity 
hey, Martijn is after something! 
inform your followers with 1 click 
just browsing Genesis 4
feature doc 
http://shebanq-doc.readthedocs.org/en/latest/features/comments/0_overview.html
IV 
the tree of knowledge of good and evil: lessons
nota bene: formats
LAF = stand-off markup
    XML only for import/export
    Queries: textual (MQL) and by walking (Graph)
TEI = inline markup
    XML tech all over the place
    XQUERY, XSLT, SQL
nota bene: tech
current, mainstream tech: e.g. (I)Python plus packages
versus: cling to what once worked
avoid reinventing the wheel
versus: maximize return on investment
support researchers in coding
versus: shield researchers from coding
abstraction level: scripts, data in data structures
versus: sys programming: C++, Java, data in formalisms: XML, RDF
facilitate import/export/sharing
versus: invest in monoliths and GUIs (over-facilitating)
nota bene: property
share widely: your data, your results, with other fields as well
versus: live in a silo, become idiosyncratic, avoid stimuli from elsewhere
share openly: data into an archive, tools on github
versus: exert copyrights on data, protect your software
you cannot *own* ideas: they grow by being handed over
versus: our ideas are like a bag of potatoes: we have worked for it and you have to pay for it
Query the Hebrew Bible through the ETCBC database and SHEBANQ
dirk.roorda@dans.knaw.nl
יְהִ֣י א֑וֹר וַֽיְהִי־אֽוֹר׃
thank you

Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 

Hebrew Bible as Data: Laboratory, Sharing, Lessons

  • 1. The Hebrew Bible as Data: Laboratory - Sharing - Lessons. dirk.roorda@dans.knaw.nl, 2014-10-02, TUSTEP meeting, Amsterdam. Query the Hebrew Bible through the ETCBC database and SHEBANQ
  • 2. overview: in the beginning: origin story: ETCBC; six days of working: laboratory: LAF-Fabric; the sabbath: dissemination: SHEBANQ; the tree of knowledge of good and evil: lessons
  • 3. I. in the beginning: origin story: ETCBC
  • 4. text + linguistics => data + research =>
  • 5. Data creation versus: archiving - sharing - dissemination
  • 7. research data cycle? (diagram labels: religious communities; theol. scholars; enlightened lay people)
  • 8. research data cycle? (diagram labels: religious communities; theol. scholars; Research Data Archiving: DANS, CLARIN; SHEBANQ; LAF-Fabric; comp. hum; linguists; enlightened lay people)
  • 9. 2012: deposit ETCBC3; 2014: deposit ETCBC4
  • 11. II. six days of working: laboratory: LAF-Fabric
  • 12. scientific computing: fragments from a video of Fernando Perez: 4:19 - 9:55 researchers and computing; 17:00 - 20:26 tools and the data life cycle; 42:09 - 44:20 data and publishing (total length 49:22)
  • 13. Linguistic Annotation Framework ISO 24612:2012 Nancy Ide, Laurent Romary
  • 14. Linguistic Annotation Framework
      <node xml:id="n_88917">
        <link targets="r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11"/>
      </node>
      <edge xml:id="e1" from="n88917" to="n84383"/>
      <a xml:id="ae1" label="parents" ref="e1" as="link"/>
      <a xml:id="af22" label="ft" ref="n3" as="utf8"><fs>
        <f name="lexeme_utf8" value="ראשׁית"/>
        <f name="surface_consonants_utf8" value="ראשׁית"/>
      </fs></a>
      <region xml:id="r_2" anchors="6 23"/>
      <node xml:id="n_3"><link targets="r_2"/></node>
      <a xml:id="a_3" label="word" ref="n_3" as="monads"/>
      diagram labels: sentence; clause; phrase; subphrase; word; nodes (n2, n3); labeled edges (parents, mother); annotations (features): clause_atom_number=1, clause_atom_relation=0, clause_atom_type=xQtl, indentation=0; determination=determined, phrase_function=Objc, phrase_type=PP; lexeme_utf8=ראשׁית, surface_consonants_utf8=ראשׁית; annotations (empty); link to regions; regions (r9, r10, r11; anchors 0-5, 6-23, 72-91, 92); primary data: בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
  • 15. too big to parse all the time: compile it
  • 16. kindergarten: counting
      nodes = 0
      for n in NN():
          nodes += 1
      output:
      1m 39s Counting nodes
      1m 40s There are 1441144 nodes.
      nodes = collections.Counter()
      for n in NN():
          nodes[F.otype.v(n)] += 1
      output:
      7m 56s Counting nodes
      7m 59s Nodes counted:
          book : 39x
          chapter : 929x
          clause : 87978x
          clause_atom : 90144x
          half_verse : 44682x
          phrase : 254664x
          phrase_atom : 267965x
          sentence : 66045x
          sentence_atom : 66701x
          subphrase : 112229x
          verse : 23213x
          word : 426555x
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/Counting.ipynb
  • 17. primary school: r/w
      plain_file = outfile("etcbc4_plain.txt")
      for i in F.otype.s('word'):
          the_text = F.g_word_utf8.v(i)
          the_trailer = F.trailer_utf8.v(i)
          plain_file.write(the_text + the_trailer)
      plain_file.close()
      output (pointed Hebrew text of Genesis 1:1-10), first verse:
      בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/text/plain.ipynb
  • 18. secondary school: graphic
      output:
      EXO 06,08 ├─┼♠┼─┼───┤├─┼♠┼──┤├─♠┼─┼─♂─♂──♂┤ ├─┼♠┼─┼─┼─┤ ├─┼♂┤
      EXO 06,09 ├─┼♠┼♂┼─┼──⊙┤ ├─┼─┼♠┼─♂┼───────┤
      EXO 06,10 ├─┼♠┼♂┼─♂┤├─♠┤
      EXO 06,11 ├♠┤ ├♠┼───⊙┤ ├─┼♠┼──⊙┼──┤
      EXO 06,12 ├─┼♠┼♂┼──♂┤├─♠┤ ├─┤ ├─⊙┼─┼♠┼─┤ ├─┼─┼♠┼─┤ ├─┼─┼──┤
      EXO 06,13 ├─┼♠┼♂┼─♂──♂┤ ├─┼♠┼──⊙────⊙┤├─♠┼──⊙┼──⊙┤
      EXO 06,14 ├─┼───┤ ├─⊙─⊙┼♂─♂♂─♂┤ ├─┼─⊙┤
      EXO 06,15 ├─┼─⊙┼♂─♂─♂─♂─♂─♂───┤ ├─┼─⊙┤
      EXO 06,16 ├─┼─┼──⊙┼──┤ ├♂─♂─♂┤ ├─┼──⊙┼──────┤
      EXO 06,17 ├─♂┼♂─♂┼──┤
      EXO 06,18 ├─┼─♂┼♂─♂─♂─♂┤ ├─┼──♂┼──────┤
      EXO 06,19 ├─┼─♂┼♂─♂┤ ├─┼───┼──┤
      EXO 06,20 ├─┼♠┼♂┼─♀─┼─┼──┤ ├─┼♠┼─┼─♂──♂┤ ├─┼──♂┼──────┤
      code:
      out = outfile("properviz.txt")
      # map ETCBC object types to short codes; all other types are skipped
      type_map = collections.defaultdict(lambda: None, [
          ("chapter", 'Ch'),
          ("verse", 'V'),
          ("sentence", 'S'),
          ("clause", 'C'),
          ("phrase", 'P'),
          ("word", 'w'),
      ])
      otypes = ['Ch', 'V', 'S', 'C', 'P', 'w']
      watch = collections.defaultdict(lambda: {})   # last monad -> objects that end there
      start = {}
      cur_verse_label = ['', '']
      def print_node(ob, obdata):
          (node, minm, maxm, monads) = obdata
          if ob == "w":
              if not watch:
                  out.write("◘".format(monads))
              else:
                  outchar = "─"
                  p_o_s = F.sp.v(node)
                  if p_o_s == "nmpr":
                      if F.gn.v(node) == "m": outchar = "♂"
                      elif F.gn.v(node) == "f": outchar = "♀"
                      elif F.gn.v(node) == "unknown": outchar = "⊙"
                  elif p_o_s == "verb":
                      outchar = "♠"
                  out.write(outchar)
              if monads in watch:
                  # close every object that ends at this word
                  tofinish = watch[monads]
                  for o in reversed(otypes):
                      if o in tofinish:
                          if o == 'C':
                              out.write("┤")
                          elif o == 'P':
                              if 'C' not in tofinish:
                                  out.write("┼")
                          elif o != 'S':
                              out.write("{}»".format(o))
                  del watch[monads]
          elif ob == "Ch":
              this_chapter_label = "{} {}".format(F.book.v(node), F.chapter.v(node))
          elif ob == "V":
              this_verse_label = F.label.v(node).strip(" ")
              cur_verse_label[0] = this_verse_label
              cur_verse_label[1] = this_verse_label
          elif ob == "S":
              # each sentence starts a new line; the verse label is printed once per verse
              out.write("\n{:<11} ".format(cur_verse_label[1]))
              cur_verse_label[1] = ''
              watch[maxm][ob] = None
          elif ob == "C":
              out.write("├")
              watch[maxm][ob] = None
          elif ob == "P":
              watch[maxm][ob] = None
          else:
              out.write("«{}".format(ob))
              watch[maxm][ob] = None
      lastmin = None
      lastmax = None
      for i in NN():
          otype = F.otype.v(i)
          if otype == 'book':
              sys.stderr.write("{:<11}".format(F.book.v(i)))
          ob = type_map[otype]
          if ob is None:
              continue
          monads = F.monads.v(i)
          minm = F.minmonad.v(i)
          maxm = F.maxmonad.v(i)
          if lastmin == minm and lastmax == maxm:
              # same span as the previous node: collect, print later in otypes order
              start[ob] = (i, minm, maxm, monads)
          else:
              for o in otypes:
                  if o in start:
                      print_node(o, start[o])
              start = {ob: (i, minm, maxm, monads)}
              lastmin = minm
              lastmax = maxm
      for ob in otypes:
          if ob in start:
              print_node(ob, start[ob])
      out.close()
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/text/proper.ipynb
  • 19. adolescence: gender
      for node in NN():
          otype = F.otype.v(node)
          if otype == "word":
              # stats = [all words, masculine words, feminine words] in the current chapter
              stats[0] += 1
              if F.gn.v(node) == "m":
                  stats[1] += 1
              elif F.gn.v(node) == "f":
                  stats[2] += 1
          elif otype == "chapter":
              if cur_chapter != None:
                  # a new chapter starts: write the percentages of the chapter just finished
                  masc = 0 if not stats[0] else 100 * float(stats[1]) / stats[0]
                  fem = 0 if not stats[0] else 100 * float(stats[2]) / stats[0]
                  ch.append(cur_chapter)
                  m.append(masc)
                  f.append(fem)
                  table.write("{},{},{}\n".format(cur_chapter, masc, fem))
              else:
                  table.write("{},{},{}\n".format('book chapter', 'masculine', 'feminine'))
              this_book = F.book.v(node)
              this_chapnum = F.chapter.v(node)
              this_chapter = "{} {}".format(this_book, this_chapnum)
              if this_book != cur_book:
                  sys.stderr.write("\n{}".format(this_book))
                  cur_book = this_book
              sys.stderr.write(" {}".format(this_chapnum))
              stats = [0, 0, 0]
              cur_chapter = this_chapter
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric/blob/master/examples/gender.ipynb
  • 20. university: mining
      <?xml version="1.0" encoding="UTF-8"?>
      <gexf xmlns:viz="http:///www.gexf.net/1.2draft/viz" xmlns="http://www.gexf.net/1.1draft" version="1.2">
      <meta>
      <creator>LAF-Fabric</creator>
      </meta>
      <graph defaultedgetype="undirected" idtype="string" type="static">
      <nodes count="39">
      <node id="17" label="Amos"/>
      <node id="18" label="Obadia"/>
      <node id="19" label="Jona"/>
      <edge id="17" source="1" target="18" weight="2.32"/>
      <edge id="18" source="1" target="19" weight="5.68"/>
      <edge id="19" source="1" target="20" weight="9.54"/>
      code fragment, cropped on the slide: for node … this_type … if lexeme … lexemes[ … lexeme_support_book[ … p_o_s … elif … book_name … books … msg( … msg("Done"
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/lingvar/cooccurrences.ipynb
  • 21. professional: contributing data
      AMOS 01,01 DBR/ 0 2 -1 -1 -1 5 0 -1 -1 3 2 1 2 0 -1 2 -1 -1 -1 -1 -1
      AMOS 01,01 <MWS/ 0 3 -1 -1 -1 1 -1 -1 -1 1 2 2 3 2 2 -10002 -1 -1 0 521 0
      * 0 1 12 2 12 3 470 0 0 .N 0 LineNr 1 ClauseNr 1: 1: 1: 200: 0 0 SentenceNr 1 TxtType: ? Pargr: 1 ClType:NmCl
      AMOS 01,01 >CR 0 6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 6 6 -1 -1 -1 -1 0 519 0
      AMOS 01,01 HJH[ -2 1 0 0 1 0 0 2 3 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
      AMOS 01,01 B 0 5 -1 -1 -1 -1 0 -1 -1 -1 -1 -1 5 0 -1 -1 -1 -1 -1 -1 -1
      AMOS 01,01 H 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 -1
      AMOS 01,01 NQD/ 0 2 -1 -1 -1 4 0 -1 -1 3 2 2 2 5 2 -1 -1 -1 0 504 0
      AMOS 01,01 MN 0 5 -1 -1 -1 -1 0 -1 -1 -1 -1 -1 5 0 -1 -1 -1 -1 -1 -1 -1
      AMOS 01,01 TQW<=/ 0 3 -1 -1 -1 1 -1 -1 -1 1 0 2 3 5 2 -1 -1 -1 -11 582 0
      * 0 -1 12 0 0 .. 3 LineNr 2 ClauseNr 2: 1: 3: 132: -13 -1007 SentenceNr 1 TxtType: ? Pargr: 1 ClType:xQt0
      px = PX(API)
      px.deliver_annots('px/px_data', 'px', 'para', (
          ('etcbc4', 'px', 'instruction'),
          ('etcbc4', 'px', 'number_in_ch'),
          ('etcbc4', 'px', 'pargr'),
      ))
      <?xml version="1.0" encoding="UTF-8"?>
      <graph xmlns="http://www.xces.org/ns/GrAF/1.0/" xmlns:graf="http://www.xces.org/ns/GrAF/1.0/">
      <graphHeader>
        <labelsDecl/>
        <dependencies/>
        <annotationSpaces/>
      </graphHeader>
      <a xml:id="a1" as="etcbc4" label="px" ref="n1298850"><fs>
        <f name="instruction" value=".#"/>
        <f name="number_in_ch" value="32"/>
        <f name="pargr" value="32"/>
      </fs></a>
      <a xml:id="a2" as="etcbc4" label="px" ref="n50738"><fs>
        <f name="instruction" value=".."/>
        <f name="number_in_ch" value="30"/>
        <f name="pargr" value="2.7"/>
      </fs></a>
      diagram labels: ETCBC; LAF; extra/correction; LAF-Fabric; results
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/extradata/para%20from%20px.ipynb
  • 22. old age: trees
      tree = Tree(API, otypes=tree_types,
          clause_type=clause_type,
          ccr_feature='rela',
          pt_feature='typ',
          pos_feature='sp',
          mother_feature = 'mother',
      )
      tree.restructure_clauses(ccr_class)
      results = tree.relations()
      parent = results['rparent']
      sisters = results['sisters']
      children = results['rchildren']
      elder_sister = results['elder_sister']
      msg("Ready for processing")
      output:
      0.00s LOADING API with EXTRAs: please wait ...
      0.00s INFO: USING DATA COMPILED AT: 2014-07-23T09-31-37
      1.45s INFO: DATA LOADED FROM SOURCE etcbc4 AND ANNOX -- ...
      0.00s Start computing parent and children relations for ...
      1.36s 100000 nodes
      2.74s 200000 nodes
      4.08s 300000 nodes
      5.48s 400000 nodes
      6.79s 500000 nodes
      8.20s 600000 nodes
      9.63s 700000 nodes
      11s 800000 nodes
      12s 900000 nodes
      13s 947471 nodes: 881423 have parents and 520916 have children
      13s Restructuring clauses: deep copying tree relations
      19s Pass 0: Storing mother relationship
      21s 18580 clauses have a mother
      21s All clauses have mothers of types in
      {'sentence', 'word', 'phrase', 'subphrase', 'clause'}
      21s Pass 1: all clauses except those of type Coor
      22s Pass 2: clauses of type Coor only
      23s Mothers applied. Found 0 motherless clauses.
      23s 2497 nodes have 1 sisters
      23s 167 nodes have 2 sisters
      23s 9 nodes have 3 sisters
      23s There are 2858 sisters, 2673 nodes have sisters.
      23s Ready for processing
      # GEN 01,01  node=1127306  oid=11  bmonad=1
      0 1 2 3 4 5 6 7 8 9 10
      (S(C(PP(pp "ב")(n "ראשׁית"))(VP(vb "ברא"))(NP(n "אלהים"))(PP(U(pp "את")(dt "ה")(n "שׁמים"))(cj "ו")(U(pp "את")(dt "ה")(n "ארץ")))))
      # GEN 01,02  node=1127307  oid=39  bmonad=12
      0 1 2 3 4 5 6
      (S(C(CP(cj "ו"))(NP(dt "ה")(n "ארץ"))(VP(vb "היתה"))(NP(U(n "תהו"))(cj "ו")(U(n "בהו")))))
      http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/trees/trees_etcbc4.ipynb
  • 23. III. the sabbath: dissemination: SHEBANQ
  • 24. back to EMDROS
      select all objects in {1-40} where
      [phrase [word] [word] ]
      ..
      [phrase [word g_cons = 'H'] [word focus] ]
      annotations: "in {1-40}": optionally restrict results to words 1-40; "..": gap; "g_cons = 'H'": the first word has value H for feature g_cons; "focus": deliver just the second word of the second phrase as result
  • 25. SHEBANQ System for HEBrew text: ANnotations for Queries and markup http://shebanq.ancient-data.org שִׁבֹּ֜לֶת סִבֹּ֗לֶת s(h)ibboleth
  • 27. proliferation of queries 78 queries, in varying degrees of maturity who is afraid of lists?
  • 28. serendipity hey, Martijn is after something! inform your followers with 1 click just browsing Genesis 4
  • 30. IV. the tree of knowledge of good and evil: lessons
  • 31. nota bene: formats: LAF = stand-off markup; TEI = inline markup; XML only for import/export; XML tech all over the place; Queries: textual (MQL) and by walking (Graph); XQUERY, XSLT, SQL
  • 32. nota bene: tech: current, mainstream tech: e.g. (I)Python plus packages; cling to what once worked; avoid reinventing the wheel; support researchers in coding; maximize return on investment; shield researchers from coding; abstraction level: scripts, data in data structures; sys programming: C++, Java, data in formalisms: XML, RDF; facilitate import/export/sharing; invest in monoliths and GUIs (over-facilitating)
  • 33. nota bene: property: share widely: your data, your results, with other fields as well; live in a silo, become idiosyncratic, avoid stimuli from elsewhere; share openly: data into an archive, tools on github; exert copyrights on data, protect your software; you cannot *own* ideas: they grow by being handed over; our ideas are like a bag of potatoes: we have worked for it and you have to pay for it
  • 34. Query the Hebrew Bible through the ETCBC database and SHEBANQ. dirk.roorda@dans.knaw.nl. יְהִ֣י א֑וֹר וַֽיְהִי־אֽוֹר׃ thank you
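
The stand-off idea behind slide 14 (regions anchor into the primary data, nodes link to regions, annotations attach features to nodes) and the counting loop of slide 16 can be sketched in a few lines of plain Python. This is a minimal sketch and not the LAF-Fabric API: the names PRIMARY, REGIONS, NODES, FEATURES, node_text and count_by_type are hypothetical, and an English toy sentence stands in for the Hebrew primary data.

```python
import collections

# Hypothetical miniature stand-off model (illustration only, not LAF-Fabric).
# Primary data: the raw text, which is never modified by annotation.
PRIMARY = "In the beginning God created"

# Regions: id -> (start, end) character anchors into PRIMARY.
REGIONS = {
    "r1": (0, 2),    # In
    "r2": (3, 6),    # the
    "r3": (7, 16),   # beginning
    "r4": (17, 20),  # God
    "r5": (21, 28),  # created
}

# Nodes: id -> list of region ids the node is linked to.
NODES = {
    "n1": ["r1"], "n2": ["r2"], "n3": ["r3"], "n4": ["r4"], "n5": ["r5"],
    "n6": ["r1", "r2", "r3"],              # phrase: In the beginning
    "n7": ["r4"],                          # phrase: God
    "n8": ["r5"],                          # phrase: created
    "n9": ["r1", "r2", "r3", "r4", "r5"],  # clause
}

# Annotations: node id -> features (only the object type here).
FEATURES = {
    "n1": {"otype": "word"}, "n2": {"otype": "word"}, "n3": {"otype": "word"},
    "n4": {"otype": "word"}, "n5": {"otype": "word"},
    "n6": {"otype": "phrase"}, "n7": {"otype": "phrase"}, "n8": {"otype": "phrase"},
    "n9": {"otype": "clause"},
}


def node_text(node_id):
    """Resolve a node to text by following its regions into the primary data."""
    spans = (REGIONS[r] for r in NODES[node_id])
    return " ".join(PRIMARY[start:end] for start, end in spans)


def count_by_type():
    """Count nodes per object type, in the spirit of the loop on slide 16."""
    counts = collections.Counter()
    for node_id in NODES:
        counts[FEATURES[node_id]["otype"]] += 1
    return counts


if __name__ == "__main__":
    print(node_text("n6"))
    for otype, n in sorted(count_by_type().items()):
        print("{} : {}x".format(otype, n))
```

Because the text lives only in PRIMARY, new annotation layers can be added as extra entries in FEATURES without touching the text or the existing nodes, which is the property the deck contrasts with inline markup.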